Why did SURMOUNT-4 use a randomized withdrawal design instead of a standard placebo-controlled trial?

A standard design would have required a placebo arm from baseline, which would not answer the clinically relevant question of what happens when an effective treatment is stopped. The withdrawal design directly models the real-world scenario of treatment discontinuation after a successful response period.

Does the open-label lead-in bias the results?

The lead-in selects for responders and tolerators, which limits generalizability to all patients. It also means participants entered the blinded phase with knowledge of the drug's effects, potentially compromising blinding. Both factors tend to amplify the observed treatment difference.

How does SURMOUNT-4 compare to the STEP 1 extension for semaglutide?

The STEP 1 extension showed similar weight regain after semaglutide discontinuation, with participants regaining about two-thirds of lost weight by 52 weeks off treatment. SURMOUNT-4's design was more rigorous (randomized withdrawal vs. observational follow-up), but the broad pattern of substantial regain after GLP-1RA cessation is consistent across both drugs.

Was the 52-week withdrawal period long enough?

The weight regain curve had not fully plateaued by week 88, suggesting that longer follow-up might reveal additional regain. The 52-week period was likely chosen to balance scientific rigor against participant retention and regulatory timelines.

Why were participants with recent GLP-1 receptor agonist use excluded?

Excluding prior GLP-1RA users ensured a drug-naive population, reducing variability in response. This limits applicability to the growing number of patients switching between agents but provides cleaner efficacy data for the regulatory submission.

What does the estimand framework mean for interpreting the primary result?

The treatment-policy estimand (primary) includes all randomized participants regardless of adherence, reflecting real-world assignment. The efficacy estimand assumes full adherence. Both produced similar results in SURMOUNT-4 because completion rates were high, but in other trials with more dropout, these two approaches can yield meaningfully different conclusions.

Could a lower maintenance dose prevent weight regain?

SURMOUNT-4 did not include a dose-reduction arm, so this question remains unanswered. Clinical interest in maintenance dosing strategies is high, and ongoing studies are expected to address whether reduced doses can sustain weight loss with fewer side effects and lower cost.

Does SURMOUNT-4 prove tirzepatide must be taken indefinitely?

The trial shows that stopping tirzepatide after 36 weeks of treatment leads to substantial regain over the next year. It does not test intermittent dosing, dose reduction, or combination strategies that might offer alternatives to continuous full-dose therapy. The data support long-term use but do not rule out other maintenance approaches.

How generalizable are the results to diverse populations?

The trial enrolled a predominantly White and female population. Weight regain kinetics may differ across demographic groups, metabolic phenotypes, and comorbidity profiles. Subgroup analyses were limited by sample size, so population-specific conclusions should be drawn cautiously.

What did SURMOUNT-4 show about cardiometabolic markers after withdrawal?

The placebo group experienced worsening of blood pressure, lipids, fasting insulin, and waist circumference relative to the continuation group. These changes tracked with weight regain, consistent with prior evidence that metabolic improvements from GLP-1RAs are weight-dependent and reversible upon discontinuation.

Inside the SURMOUNT-4 Methodology: What Most Summaries Skip

By HealthRX.com Medical Team

Published May 25, 2026 | Updated May 25, 2026 | Last reviewed May 25, 2026


At a glance

| Detail | Value |
|---|---|
| Trial | SURMOUNT-4 (NCT04660643) |
| N | 670 randomized (from 783 who entered the open-label lead-in) |
| Intervention | Tirzepatide (maximum tolerated dose of 10 or 15 mg weekly) |
| Comparator | Matched placebo (after open-label tirzepatide lead-in) |
| Total duration | 88 weeks (36-week lead-in + 52-week double-blind period) |
| Primary endpoint | Percent change in body weight from randomization (week 36) to week 88 |
| Key result | Continuation arm: −5.5% additional loss; placebo arm: +14.0% regain |
| Journal | JAMA, 2024 |

Why Most Trial Summaries Get SURMOUNT-4 Wrong

The headline number from SURMOUNT-4 is dramatic: a roughly 20-percentage-point difference in weight trajectory between people who stayed on tirzepatide and those switched to placebo. But the number alone obscures what makes this trial genuinely unusual. SURMOUNT-4 was designed from the start as a randomized withdrawal study, a structure borrowed from psychiatry and rheumatology that is still rare in obesity medicine. The design choice carries consequences for how physicians, payers, and patients should interpret the data.

Most coverage treats SURMOUNT-4 like a standard two-arm RCT. It is not. Understanding the methodology is the difference between reading this trial as "tirzepatide works" (already established by SURMOUNT-1 through SURMOUNT-3) and reading it as "here is what happens to a specific population when treatment stops, and here is the magnitude of rebound."

The Randomized Withdrawal Design

Open-Label Lead-In (Weeks 0 to 36)

All 783 enrolled participants received open-label tirzepatide, escalated to the maximum tolerated dose (10 mg or 15 mg subcutaneously once weekly). This 36-week run-in served two purposes. First, it allowed dose titration without unblinding, since all participants reached their plateau dose before randomization. Second, it selected for a population of responders and tolerators. Anyone who discontinued during the lead-in, whether from adverse events, lack of efficacy, or personal choice, was excluded from the randomized phase.

This enrichment is the single most important design feature to understand. The 670 patients who entered the double-blind period had already demonstrated that they could tolerate tirzepatide and lose weight on it. That population does not represent all comers. It represents the subset most likely to benefit from continued therapy, which is exactly the population a clinician would face when deciding whether to maintain a prescription. The FDA label for Zepbound (tirzepatide) does not restrict use to responders, but real-world prescribing effectively does.

Randomization and Blinding (Weeks 36 to 88)

At week 36, participants who had achieved at least 5% body weight loss (the vast majority) were randomized 1:1 to continue tirzepatide at their current dose or switch to matching placebo. Randomization was stratified by two variables:

Dose at randomization (10 mg vs. 15 mg)
Diabetes status (type 2 diabetes vs. no diabetes)
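
To make the stratification concrete, here is a minimal sketch of stratified 1:1 allocation using permuted blocks within each stratum. The two stratification factors come from the description above; the permuted-block method, the block size of 4, and every function and variable name are illustrative assumptions, not details taken from the trial protocol.

```python
import random
from collections import defaultdict


def make_stratified_randomizer(block_size: int = 4, seed: int = 0):
    """Return an assign(dose_mg, has_t2d) function.

    Assumption: permuted blocks of `block_size` within each stratum.
    The article only states that randomization was 1:1 and stratified.
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)
    pending = defaultdict(list)  # stratum -> assignments left in current block

    def assign(dose_mg: int, has_t2d: bool) -> str:
        stratum = (dose_mg, has_t2d)
        if not pending[stratum]:
            # Start a new balanced block for this stratum and shuffle it.
            block = ["tirzepatide", "placebo"] * (block_size // 2)
            rng.shuffle(block)
            pending[stratum] = block
        return pending[stratum].pop()

    return assign


assign = make_stratified_randomizer()
# Eight hypothetical participants, all in the same stratum (15 mg, no diabetes).
arms = [assign(15, False) for _ in range(8)]
print(arms.count("tirzepatide"), arms.count("placebo"))
```

Because each stratum draws from its own balanced blocks, the two arms stay matched on dose and diabetes status even if one stratum is much larger than the other.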

The matched placebo was identical in appearance and injection volume. Participants in the placebo arm underwent a tapered withdrawal rather than abrupt cessation: their dose stepped down over several weeks. This taper is clinically relevant because it distinguishes SURMOUNT-4 from a cold-stop scenario. The observed 14% regain occurred despite a gradual transition off the drug. Abrupt cessation might produce different kinetics of regain, though no head-to-head comparison exists.

Both investigators and participants were blinded from week 36 onward. The lead-in was explicitly open-label, which introduces the possibility that participants' expectations during the blinded phase were shaped by their prior experience on active drug. If a placebo-arm participant noticed the absence of GI side effects (nausea, for instance), functional unblinding could occur. The trial protocol did not include a formal assessment of blinding integrity, a limitation the authors acknowledged.

Inclusion and Exclusion Criteria: Who Was Actually Studied

The enrolled population had a BMI of 30 or greater, or 27 or greater with at least one weight-related comorbidity. Type 2 diabetes was permitted but required an HbA1c <10%. Prior bariatric surgery was excluded. Use of other anti-obesity medications within 3 months was excluded.

Two exclusions deserve closer attention:

Prior GLP-1 receptor agonist use. Participants with recent GLP-1RA exposure were excluded. This means the trial population was drug-naive to the mechanism, which matters because real-world patients increasingly cycle between semaglutide and tirzepatide. Whether regain kinetics differ in GLP-1-experienced patients remains unknown.
Cardiovascular event history. Participants with recent major adverse cardiovascular events were excluded. Given that the SELECT trial demonstrated cardiovascular benefit for semaglutide in a high-CV-risk population, SURMOUNT-4's exclusion of that group limits extrapolation to the very patients for whom long-term maintenance might carry the highest stakes.

The Primary Endpoint and Its Estimand

The primary endpoint was percent change in body weight from week 36 (randomization) to week 88. This is a critical distinction: the baseline for the primary analysis was not the original enrollment weight but the post-lead-in weight. A participant who entered the trial at 110 kg, dropped to 90 kg during the open-label period, and then regained to 100 kg during the placebo phase would show an 11.1% increase from randomization, not a 9.1% decrease from original baseline.
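
The arithmetic behind the two framings can be checked in a few lines. This sketch uses only the hypothetical participant described above; the function and variable names are illustrative.

```python
def percent_change(start_kg: float, end_kg: float) -> float:
    """Percent change from start to end, relative to the start weight."""
    return (end_kg - start_kg) / start_kg * 100.0


enrollment_kg = 110.0     # week 0, original baseline
randomization_kg = 90.0   # week 36, baseline for the primary analysis
week_88_kg = 100.0        # end of the double-blind period

from_randomization = percent_change(randomization_kg, week_88_kg)
from_enrollment = percent_change(enrollment_kg, week_88_kg)

print(f"From randomization: {from_randomization:+.1f}%")  # +11.1%
print(f"From enrollment:    {from_enrollment:+.1f}%")     # -9.1%
```

The same 10 kg regain reads as a larger percentage under the primary framing because the denominator is the lower week-36 weight.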

Both framings are accurate. The trial's primary framing emphasizes what happens after treatment withdrawal. Secondary analyses reported weight change from the original baseline, showing that even after regain, placebo-arm participants remained below their starting weight at week 88. This secondary framing is what patients care about most ("Am I still better off than before I started?"), while the primary framing answers the mechanistic question of treatment withdrawal.

The Estimand Framework

SURMOUNT-4 used an ICH E9(R1) estimand framework, specifying two estimands:

Treatment-policy estimand: Includes all data regardless of treatment adherence or use of rescue medication. This is the intention-to-treat approach. It answers: "What happens when a clinician assigns this strategy?"
Efficacy estimand (hypothetical): Estimates the effect assuming all participants adhered to treatment without rescue medication. It answers: "What would happen if everyone took the drug as prescribed?"

The treatment-policy estimand was the primary regulatory estimand. The two estimands produced similar results because adherence in both arms was high during the blinded period (over 90% completion). In trials with higher dropout, these two estimands can diverge substantially, and the choice of which to prioritize becomes a genuine source of disagreement between sponsors and regulators. The FDA's 2023 guidance on estimands for obesity trials reflects an increasing preference for the treatment-policy approach, which SURMOUNT-4 aligned with.

Statistical Approach

The primary analysis used a mixed model for repeated measures (MMRM) with treatment group, visit, treatment-by-visit interaction, stratification factors (dose, diabetes status), and baseline body weight (at randomization) as covariates.

Missing data were handled under a missing-at-random (MAR) assumption within the MMRM framework. For the treatment-policy estimand, a reference-based multiple imputation approach was used for participants who discontinued: their missing data were imputed assuming they followed the trajectory of the placebo group. This is a conservative choice for the continuation arm and a neutral choice for the placebo arm.
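
Why reference-based imputation is conservative for the continuation arm can be seen in a deliberately simplified sketch. It is not the trial's procedure: real reference-based multiple imputation conditions on each participant's observed history and combines variances across imputations, whereas this version only draws missing week-88 values from a normal distribution fitted to placebo-arm completers and pools the point estimate. All numbers are invented for illustration, and all names are hypothetical.

```python
import random
import statistics


def pooled_mean_with_reference_imputation(
    active_observed, n_active_missing, reference_observed,
    n_imputations=50, seed=0,
):
    """Pooled mean for the active arm after imputing dropouts from the reference arm.

    Simplification (assumption): missing values are drawn from a normal
    distribution fitted to the reference arm's observed values.
    """
    rng = random.Random(seed)
    ref_mean = statistics.mean(reference_observed)
    ref_sd = statistics.stdev(reference_observed)
    per_imputation_means = []
    for _ in range(n_imputations):
        imputed = [rng.gauss(ref_mean, ref_sd) for _ in range(n_active_missing)]
        per_imputation_means.append(
            statistics.mean(list(active_observed) + imputed)
        )
    # Pool the point estimates by averaging across imputed datasets.
    return statistics.mean(per_imputation_means)


# Toy percent changes from randomization (invented, not trial data).
continuation_completers = [-7.0, -5.0, -6.5, -4.0, -5.5, -6.0]
placebo_completers = [12.0, 15.5, 13.0, 16.0, 14.5, 13.5]

completers_only = statistics.mean(continuation_completers)
with_imputation = pooled_mean_with_reference_imputation(
    continuation_completers, n_active_missing=2,
    reference_observed=placebo_completers,
)
print(f"Completers only:        {completers_only:+.2f}%")
print(f"With reference imputed: {with_imputation:+.2f}%")
```

Treating dropouts in the continuation arm as if they followed the placebo trajectory pulls that arm's estimate toward regain, which shrinks the estimated treatment difference rather than inflating it.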

The trial was powered to detect a 4-percentage-point difference in weight change, with 90% power at a two-sided alpha of 0.05. The observed difference of approximately 20 percentage points far exceeded this threshold, yielding a p-value of <0.001. The trial was, in a sense, overpowered for its primary endpoint, which raises the question of whether a smaller sample or shorter blinded period would have sufficed. The larger sample, though, provided useful data for subgroup analyses and safety surveillance.
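
For a rough sense of scale, the standard normal-approximation formula for comparing two means can be applied to the stated design parameters. The article does not report the standard deviation used in the trial's power calculation, so the value of 10 percentage points below is a purely hypothetical placeholder, and the result should not be read as the trial's actual sample size calculation.

```python
import math
from statistics import NormalDist


def n_per_arm(delta: float, sd: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Per-arm sample size for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


ASSUMED_SD = 10.0  # hypothetical SD of percent weight change; not from the article

required = n_per_arm(delta=4.0, sd=ASSUMED_SD)
print(required)  # 132 per arm under this assumed SD
```

Under that assumed SD the formula asks for far fewer participants than the 670 randomized, which is in line with the observation that the trial was overpowered for its primary endpoint. A larger assumed SD would raise the requirement with the square of the SD.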

Results Beyond the Headline

| Outcome | Tirzepatide continuation | Placebo (withdrawal) |
|---|---|---|
| Weight change, randomization to week 88 | −5.5% | +14.0% |
| Weight change, original baseline to week 88 | −25.3% | −9.9% |
| Participants achieving ≥5% loss from original baseline at week 88 | 97.3% | 57.2% |
| Participants achieving ≥20% loss from original baseline at week 88 | 70.4% | 16.7% |
| Waist circumference change from randomization | −4.3 cm | +7.8 cm |
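
The two weight-change rows in the table can be cross-checked against each other. The sketch below back-calculates the lead-in loss implied by each arm's figures and the share of that loss the placebo arm regained. It treats the reported group means as if they applied to a single representative participant, which is an approximation (mean percent changes do not compose exactly), so the outputs are a consistency check rather than reported trial results. All names are illustrative.

```python
def implied_week36_fraction(change_from_baseline_pct: float,
                            change_from_randomization_pct: float) -> float:
    """Week-36 weight as a fraction of enrollment weight, implied by two percent changes."""
    week88_over_baseline = 1 + change_from_baseline_pct / 100
    week88_over_week36 = 1 + change_from_randomization_pct / 100
    return week88_over_baseline / week88_over_week36


# Values taken from the results table above.
continuation_w36 = implied_week36_fraction(-25.3, -5.5)
placebo_w36 = implied_week36_fraction(-9.9, 14.0)

lead_in_loss_continuation = (1 - continuation_w36) * 100
lead_in_loss_placebo = (1 - placebo_w36) * 100

# Share of the lead-in loss that the placebo arm had regained by week 88.
placebo_week88 = 1 - 9.9 / 100
regained_share = (placebo_week88 - placebo_w36) / (1 - placebo_w36)

print(f"Implied lead-in loss, continuation arm: {lead_in_loss_continuation:.1f}%")
print(f"Implied lead-in loss, placebo arm:      {lead_in_loss_placebo:.1f}%")
print(f"Share of lead-in loss regained on placebo: {regained_share:.0%}")
```

Both arms imply a lead-in loss of about 21%, as expected if randomization balanced the groups at week 36, and the placebo arm's figures imply that roughly half of that loss had returned by week 88.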

The trajectory data are as informative as the endpoint data. Weight regain in the placebo arm was not linear. The steepest regain occurred in the first 12 to 16 weeks after randomization, with the curve flattening somewhat by week 88 but not reaching a clear plateau. This suggests that a 52-week withdrawal period may not have captured the full extent of regain. Had the blinded period extended to 104 weeks, the placebo arm might have regained more.

Cardiometabolic markers followed weight. The placebo group saw increases in waist circumference, blood pressure, fasting insulin, and lipid parameters relative to the continuation group. These secondary endpoints were not powered for individual statistical testing, but the pattern held across markers and matched what the STEP 1 extension showed for semaglutide in a similar withdrawal context.

Limitations the Authors Acknowledged

The published manuscript flagged several limitations directly:

Enrichment bias. The randomized population consisted exclusively of responders and tolerators, limiting generalizability.
Blinding integrity. No formal assessment of whether participants guessed their assignment.
Duration of withdrawal. Fifty-two weeks may not capture the full regain trajectory.
Homogeneity. The study population was predominantly White and female, limiting applicability to other demographic groups.
No active comparator. The trial compared continuation to withdrawal, not to a lower maintenance dose or to switching to a different agent.

A limitation the authors did not emphasize: the open-label lead-in could inflate the apparent treatment effect during the blinded phase through expectation effects. A participant who experienced dramatic weight loss during the open-label period and then noticed weight regain might alter diet and exercise behavior differently than someone in a fully blinded trial from day one.

What This Design Cannot Tell You

SURMOUNT-4 was built to answer one question well: what is the consequence of stopping tirzepatide after a meaningful response? It was not designed to answer:

Whether a lower maintenance dose could prevent regain (no dose-reduction arm existed)
Whether intermittent dosing (drug holidays) is feasible
How regain compares between tirzepatide and semaglutide 2.4 mg (Wegovy) in a head-to-head withdrawal design
Whether behavioral intervention during withdrawal modifies the regain curve
What the ceiling of regain is beyond 52 weeks off treatment

These are not criticisms of the trial. They are boundaries of what randomized withdrawal designs can address with a single comparator and fixed timeline. The American Gastroenterological Association's 2024 clinical practice update on anti-obesity medications explicitly calls for longer withdrawal and dose-reduction studies to fill these gaps.

The Clinical Translation Problem

For prescribers, SURMOUNT-4's core finding, that weight regain is substantial and rapid after tirzepatide discontinuation, supports indefinite treatment. But the trial's enrichment design means the 14% regain figure applies to a selected population. A patient who barely tolerated tirzepatide during the lead-in (and was excluded from randomization) might regain on a different curve, or might never have lost enough to make the regain question relevant.

Payers face a different interpretation problem. The trial demonstrates that stopping the drug erases a large portion of its benefit, which can be read as evidence for chronic coverage or as evidence that the drug creates dependency without durable effect. The same data support both readings, and the trial design cannot distinguish between them.

Frequently asked questions


References

Aronne LJ, Sattar N, Horn DB, et al. Continued treatment with tirzepatide for maintenance of weight reduction in adults with obesity: the SURMOUNT-4 randomized clinical trial. JAMA. 2024. https://jamanetwork.com/journals/jama/fullarticle/2814876
Zepbound (tirzepatide) prescribing information. U.S. Food and Drug Administration. https://www.accessdata.fda.gov/drugsatfda_docs/label/2023/217806s000lbl.pdf
Wilding JPH, Batterham RL, Davies M, et al. Weight regain and cardiometabolic effects after withdrawal of semaglutide: the STEP 1 trial extension. Diabetes Obes Metab. 2022. https://pubmed.ncbi.nlm.nih.gov/35441470/
Lincoff AM, Brown-Frandsen K, Colhoun HM, et al. Semaglutide and cardiovascular outcomes in obesity without diabetes (SELECT). N Engl J Med. 2023. https://pubmed.ncbi.nlm.nih.gov/37952131/
Wegovy (semaglutide 2.4 mg) prescribing information. U.S. Food and Drug Administration. https://www.accessdata.fda.gov/drugsatfda_docs/label/2021/215256s000lbl.pdf
American Gastroenterological Association clinical practice update on anti-obesity pharmacotherapy. Gastroenterology. 2024. https://pubmed.ncbi.nlm.nih.gov/38462375/