Inside the SURMOUNT-4 Methodology: What Most Summaries Skip

At a glance
| Detail | Value |
|---|---|
| Trial | SURMOUNT-4 (NCT04660643) |
| N | 670 randomized (from 783 who entered the open-label lead-in) |
| Intervention | Tirzepatide (maximum tolerated dose of 10 or 15 mg weekly) |
| Comparator | Matched placebo (after open-label tirzepatide lead-in) |
| Total duration | 88 weeks (36-week lead-in + 52-week double-blind period) |
| Primary endpoint | Percent change in body weight from randomization (week 36) to week 88 |
| Key result | Continuation arm: −5.5% additional loss; placebo arm: +14.0% regain |
| Journal | JAMA, 2024 |

Why Most Trial Summaries Get SURMOUNT-4 Wrong
The headline number from SURMOUNT-4 is dramatic: a roughly 20-percentage-point difference in weight trajectory between people who stayed on tirzepatide and those switched to placebo. But the number alone obscures what makes this trial genuinely unusual. SURMOUNT-4 was designed from the start as a randomized withdrawal study, a structure borrowed from psychiatry and rheumatology that is still rare in obesity medicine. The design choice carries consequences for how physicians, payers, and patients should interpret the data.
Most coverage treats SURMOUNT-4 like a standard two-arm RCT. It is not. Understanding the methodology is the difference between reading this trial as "tirzepatide works" (already established by SURMOUNT-1 through SURMOUNT-3) and reading it as "here is what happens to a specific population when treatment stops, and here is the magnitude of rebound."
The Randomized Withdrawal Design
Open-Label Lead-In (Weeks 0 to 36)
All 783 enrolled participants received open-label tirzepatide, escalated to the maximum tolerated dose (10 mg or 15 mg subcutaneously once weekly). This 36-week run-in served two purposes. First, it allowed dose titration without unblinding, since all participants reached their plateau dose before randomization. Second, it selected for a population of responders and tolerators. Anyone who discontinued during the lead-in, whether from adverse events, lack of efficacy, or personal choice, was excluded from the randomized phase.
This enrichment is the single most important design feature to understand. The 670 patients who entered the double-blind period had already demonstrated that they could tolerate tirzepatide and lose weight on it. That population does not represent all comers. It represents the subset most likely to benefit from continued therapy, which is exactly the population a clinician would face when deciding whether to maintain a prescription. The FDA label for Zepbound (tirzepatide) does not restrict use to responders, but real-world prescribing effectively does.
Randomization and Blinding (Weeks 36 to 88)
At week 36, participants who had achieved at least 5% body weight loss (the vast majority) were randomized 1:1 to continue tirzepatide at their current dose or switch to matching placebo. Randomization was stratified by two variables:
- Dose at randomization (10 mg vs. 15 mg)
- Diabetes status (type 2 diabetes vs. no diabetes)
The matched placebo was identical in appearance and injection volume. Participants in the placebo arm underwent a tapered withdrawal rather than abrupt cessation: their dose stepped down over several weeks. This taper is clinically relevant because it distinguishes SURMOUNT-4 from a cold-stop scenario. The observed 14% regain occurred despite a gradual transition off the drug. Abrupt cessation might produce different kinetics of regain, though no head-to-head comparison exists.
Both investigators and participants were blinded from week 36 onward. The lead-in was explicitly open-label, which introduces the possibility that participants' expectations during the blinded phase were shaped by their prior experience on active drug. If a placebo-arm participant noticed the absence of GI side effects (nausea, for instance), functional unblinding could occur. The trial protocol did not include a formal assessment of blinding integrity, a limitation the authors acknowledged.
Inclusion and Exclusion Criteria: Who Was Actually Studied
The enrolled population had a BMI of 30 or greater, or 27 or greater with at least one weight-related comorbidity. Type 2 diabetes was permitted but required an HbA1c <10%. Prior bariatric surgery was excluded. Use of other anti-obesity medications within 3 months was excluded.
Two exclusions deserve closer attention:
- Prior GLP-1 receptor agonist use. Participants with recent GLP-1RA exposure were excluded. This means the trial population had no recent exposure to the mechanism, which matters because real-world patients increasingly cycle between semaglutide and tirzepatide. Whether regain kinetics differ in GLP-1-experienced patients remains unknown.
- Cardiovascular event history. Participants with recent major adverse cardiovascular events were excluded. Given that the SELECT trial demonstrated cardiovascular benefit for semaglutide in a high-CV-risk population, SURMOUNT-4's exclusion of that group limits extrapolation to the very patients for whom long-term maintenance might carry the highest stakes.
The Primary Endpoint and Its Estimand
The primary endpoint was percent change in body weight from week 36 (randomization) to week 88. This is a critical distinction: the baseline for the primary analysis was not the original enrollment weight but the post-lead-in weight. A participant who entered the trial at 110 kg, dropped to 90 kg during the open-label period, and then regained to 100 kg during the placebo phase would show an 11.1% increase from randomization, not a 9.1% decrease from original baseline.
Both framings are accurate. The trial's primary framing emphasizes what happens after treatment withdrawal. Secondary analyses reported weight change from the original baseline, showing that even after regain, placebo-arm participants remained below their starting weight at week 88. This secondary framing is what patients care about most ("Am I still better off than before I started?"), while the primary framing answers the mechanistic question of treatment withdrawal.
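The arithmetic behind the two framings can be checked with a short sketch. The function name `percent_change` and the variable names are illustrative, and the weights belong to the hypothetical participant described above, not to trial data.

```python
def percent_change(start_kg: float, end_kg: float) -> float:
    """Percent change in body weight relative to the chosen baseline."""
    return (end_kg - start_kg) / start_kg * 100.0


# Hypothetical participant from the text (not trial data).
enrollment_kg = 110.0     # week 0
randomization_kg = 90.0   # week 36, end of the open-label lead-in
week_88_kg = 100.0        # end of the placebo phase

# Primary framing: baseline is the weight at randomization.
from_randomization = percent_change(randomization_kg, week_88_kg)
# Secondary framing: baseline is the original enrollment weight.
from_enrollment = percent_change(enrollment_kg, week_88_kg)

print(f"From randomization: {from_randomization:+.1f}%")  # +11.1%
print(f"From enrollment: {from_enrollment:+.1f}%")        # -9.1%
```

The same 10 kg regain reads as a gain or as a net loss depending only on the denominator, which is why the choice of baseline has to be stated whenever a SURMOUNT-4 figure is quoted.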
The Estimand Framework
SURMOUNT-4 used an ICH E9(R1) estimand framework, specifying two estimands:
- Treatment-policy estimand: Includes all data regardless of treatment adherence or use of rescue medication. This is the intention-to-treat approach. It answers: "What happens when a clinician assigns this strategy?"
- Efficacy estimand (hypothetical): Estimates the effect assuming all participants adhered to treatment without rescue medication. It answers: "What would happen if everyone took the drug as prescribed?"
The treatment-policy estimand was the primary regulatory estimand. The two estimands produced similar results because adherence in both arms was high during the blinded period (over 90% completion). In trials with higher dropout, these two estimands can diverge substantially, and the choice of which to prioritize becomes a genuine source of disagreement between sponsors and regulators. The FDA's 2023 guidance on estimands for obesity trials reflects an increasing preference for the treatment-policy approach, which SURMOUNT-4 aligned with.
Statistical Approach
The primary analysis used a mixed model for repeated measures (MMRM) with treatment group, visit, treatment-by-visit interaction, stratification factors (dose, diabetes status), and baseline body weight (at randomization) as covariates.
Missing data were handled under a missing-at-random (MAR) assumption within the MMRM framework. For the treatment-policy estimand, a reference-based multiple imputation approach was used for participants who discontinued: their missing data were imputed assuming they followed the trajectory of the placebo group. This is a conservative choice for the continuation arm and a neutral choice for the placebo arm.
The trial was powered to detect a 4-percentage-point difference in weight change, with 90% power at a two-sided alpha of 0.05. The observed difference of approximately 20 percentage points far exceeded this threshold, yielding a p-value of <0.001. The trial was, in a sense, overpowered for its primary endpoint, which raises the question of whether a smaller sample or shorter blinded period would have sufficed. The larger sample, though, provided useful data for subgroup analyses and safety surveillance.
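A back-of-the-envelope calculation illustrates how far the observed effect exceeded the design target. The sketch below uses the standard normal-approximation formula for comparing two means. The standard deviation of 8 percentage points is a placeholder assumption, not a value taken from the trial, and the calculation ignores dropout and the repeated-measures structure, so the outputs are illustrative only.

```python
import math
from statistics import NormalDist


def n_per_arm(delta: float, sd: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Participants per arm for a two-sided, two-sample comparison of means
    (normal approximation, equal allocation, common SD)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


# Hypothetical SD of percent weight change; not reported in this article.
ASSUMED_SD = 8.0

n_design = n_per_arm(delta=4.0, sd=ASSUMED_SD)     # difference the trial was powered for
n_observed = n_per_arm(delta=20.0, sd=ASSUMED_SD)  # approximate observed difference

print(f"Per arm to detect 4 points: {n_design}")
print(f"Per arm to detect 20 points: {n_observed}")
```

Because the required sample size scales with the inverse square of the difference, a fivefold larger effect needs roughly one twenty-fifth of the participants under the same assumptions. The absolute numbers depend entirely on the assumed SD.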
Results Beyond the Headline
| Outcome | Tirzepatide continuation | Placebo (withdrawal) |
|---|---|---|
| Weight change, randomization to week 88 | −5.5% | +14.0% |
| Weight change, original baseline to week 88 | −25.3% | −9.9% |
| Participants achieving ≥5% loss from original baseline at week 88 | 97.3% | 57.2% |
| Participants achieving ≥20% loss from original baseline at week 88 | 70.4% | 16.7% |
| Waist circumference change from randomization | −4.3 cm | +7.8 cm |

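The two weight-change rows in the table can be reconciled by chaining percentages. The sketch below backs out the lead-in weight loss implied by each arm's figures. It treats the reported mean percent changes as if they composed multiplicatively, which is only an approximation because means of ratios do not chain exactly, and the function name `implied_lead_in_loss` is illustrative.

```python
def implied_lead_in_loss(change_from_randomization_pct: float,
                         change_from_baseline_pct: float) -> float:
    """Lead-in loss (%) implied by chaining the two reported percent changes."""
    ratio_from_baseline = 1 + change_from_baseline_pct / 100            # week 88 / week 0
    ratio_from_randomization = 1 + change_from_randomization_pct / 100  # week 88 / week 36
    # week 36 / week 0 is the quotient of the two ratios above.
    return (1 - ratio_from_baseline / ratio_from_randomization) * 100


continuation = implied_lead_in_loss(-5.5, -25.3)
placebo = implied_lead_in_loss(14.0, -9.9)
print(f"Continuation arm: {continuation:.1f}%")
print(f"Placebo arm: {placebo:.1f}%")

# Share of the implied lead-in loss regained by week 88 in the placebo arm,
# with week 0 weight indexed to 100.
week_36_index = 100 - placebo
week_88_index = 100 - 9.9
regained_share = (week_88_index - week_36_index) / (100 - week_36_index) * 100
print(f"Placebo regain as share of lead-in loss: {regained_share:.0f}%")
```

Both arms imply a lead-in loss of about 21%, as expected for groups randomized from the same lead-in population. On the same approximation, the placebo arm gave back roughly half of that loss by week 88.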
The trajectory data are as informative as the endpoint data. Weight regain in the placebo arm was not linear. The steepest regain occurred in the first 12 to 16 weeks after randomization, with the curve flattening somewhat by week 88 but not reaching a clear plateau. This suggests that a 52-week withdrawal period may not have captured the full extent of regain. Had the blinded period extended to 104 weeks, the placebo arm might have regained more.
Cardiometabolic markers followed weight. The placebo group saw increases in waist circumference, blood pressure, fasting insulin, and lipid parameters relative to the continuation group. These secondary endpoints were not powered for individual statistical testing, but the pattern was consistent across markers and consistent with what the STEP 1 extension data for semaglutide showed in a similar withdrawal context.
Limitations the Authors Acknowledged
The published manuscript flagged several limitations directly:
- Enrichment bias. The randomized population consisted exclusively of responders and tolerators, limiting generalizability.
- Blinding integrity. No formal assessment of whether participants guessed their assignment.
- Duration of withdrawal. Fifty-two weeks may not capture the full regain trajectory.
- Homogeneity. The study population was predominantly White and female, limiting applicability to other demographic groups.
- No active comparator. The trial compared continuation to withdrawal, not to a lower maintenance dose or to switching to a different agent.
A limitation the authors did not emphasize: the open-label lead-in could inflate the apparent treatment effect during the blinded phase through expectation effects. A participant who experienced dramatic weight loss during the open-label period and then noticed weight regain might alter diet and exercise behavior differently than someone in a fully blinded trial from day one.
What This Design Cannot Tell You
SURMOUNT-4 was built to answer one question well: what is the consequence of stopping tirzepatide after a meaningful response? It was not designed to answer:
- Whether a lower maintenance dose could prevent regain (no dose-reduction arm existed)
- Whether intermittent dosing (drug holidays) is feasible
- How regain compares between tirzepatide and semaglutide 2.4 mg (Wegovy) in a head-to-head withdrawal design
- Whether behavioral intervention during withdrawal modifies the regain curve
- What the ceiling of regain is beyond 52 weeks off treatment
These are not criticisms of the trial. They are boundaries of what randomized withdrawal designs can address with a single comparator and fixed timeline. The American Gastroenterological Association's 2024 clinical practice update on anti-obesity medications explicitly calls for longer withdrawal and dose-reduction studies to fill these gaps.
The Clinical Translation Problem
For prescribers, SURMOUNT-4's core finding, that weight regain is substantial and rapid after tirzepatide discontinuation, supports indefinite treatment. But the trial's enrichment design means the 14% regain figure applies to a selected population. A patient who barely tolerated tirzepatide during the lead-in (and was excluded from randomization) might regain on a different curve, or might never have lost enough to make the regain question relevant.
Payers face a different interpretation problem. The trial demonstrates that stopping the drug erases a large portion of its benefit, which can be read as evidence for chronic coverage or as evidence that the drug creates dependency without durable effect. The same data support both readings, and the trial design cannot distinguish between them.
References
- Aronne LJ, Sattar N, Horn DB, et al. Continued treatment with tirzepatide for maintenance of weight reduction in adults with obesity: the SURMOUNT-4 randomized clinical trial. JAMA. 2024. https://jamanetwork.com/journals/jama/fullarticle/2814876
- Zepbound (tirzepatide) prescribing information. U.S. Food and Drug Administration. https://www.accessdata.fda.gov/drugsatfda_docs/label/2023/217806s000lbl.pdf
- Wilding JPH, Batterham RL, Davies M, et al. Weight regain and cardiometabolic effects after withdrawal of semaglutide: the STEP 1 trial extension. Diabetes Obes Metab. 2022. https://pubmed.ncbi.nlm.nih.gov/35441470/
- Lincoff AM, Brown-Frandsen K, Colhoun HM, et al. Semaglutide and cardiovascular outcomes in obesity without diabetes (SELECT). N Engl J Med. 2023. https://pubmed.ncbi.nlm.nih.gov/37952131/
- Wegovy (semaglutide 2.4 mg) prescribing information. U.S. Food and Drug Administration. https://www.accessdata.fda.gov/drugsatfda_docs/label/2021/215256s000lbl.pdf
- American Gastroenterological Association clinical practice update on anti-obesity pharmacotherapy. Gastroenterology. 2024. https://pubmed.ncbi.nlm.nih.gov/38462375/