Honest Criticisms and Limitations of the SURMOUNT-4 Trial

At a glance

| Detail | Value |
|---|---|
| N | 670 (randomized after 36-week open-label lead-in) |
| Intervention | Tirzepatide (max tolerated dose: 10 mg or 15 mg weekly) continued for 52 additional weeks |
| Comparator | Switch to placebo for 52 weeks |
| Total duration | 88 weeks (36-week lead-in + 52-week randomized period) |
| Primary endpoint | Percent change in body weight from week 36 (randomization) to week 88 |
| Key result | Continuation group lost an additional 5.5%; placebo-switch group regained 14.0% |

Why This Trial Invites Scrutiny

SURMOUNT-4, published in JAMA in 2024, was designed to answer a specific clinical question: what happens when you stop tirzepatide after significant weight loss? The answer was stark. Patients switched to placebo regained roughly half of the weight they had lost during the open-label lead-in (a 14.0% gain measured from their reduced week-36 weight), while those who continued the drug kept losing.

That finding generated headlines. It also generated criticism. The trial's design choices, population selection, and funding structure each introduce caveats that matter for clinicians interpreting the data and for patients making decisions about indefinite treatment.

The Enrichment Design Problem

SURMOUNT-4 used a single-arm, open-label lead-in followed by randomized withdrawal. This is sometimes called an "enriched enrollment, randomized withdrawal" (EERW) design. Only patients who tolerated tirzepatide and achieved at least 5% weight loss during the 36-week open-label phase were randomized.

What this means in practice: the trial excluded non-responders and those who dropped out due to side effects before randomization even began. The 670 patients who entered the double-blind phase were, by definition, the drug's best-case population.

We can frame the bias with a simple schema:

| Stage | What happened | Who was excluded |
|---|---|---|
| Screening | 783 entered open-label lead-in | Patients who declined, had contraindications, or failed screening |
| Lead-in (weeks 0-36) | All received tirzepatide | Non-responders (<5% loss), dropouts from GI side effects, anyone who discontinued for any reason |
| Randomization (week 36) | 670 randomized | Already filtered to tolerant responders only |
| Analysis (week 88) | Primary endpoint assessed | Results reflect enriched population, not intent-to-treat from drug initiation |

This design is not unusual in withdrawal studies. The FDA has accepted EERW designs for demonstrating maintenance of effect. But the resulting effect sizes are inflated relative to what a clinician would observe starting a new patient on tirzepatide and following them for 88 weeks. The 20.9% total weight loss from baseline reported at week 36 represents a best-case cohort, not the average patient walking into an obesity clinic.
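To make the direction of the enrichment effect concrete, here is a minimal Python sketch. The ten values in `lead_in_losses` are invented for illustration and are not trial data. The 5% cutoff follows the responder threshold described above, and counting dropouts as 0% loss is a deliberately crude assumption.

```python
from statistics import mean

# Hypothetical percent weight losses at week 36 for ten people who start
# the drug. None means the person stopped before week 36. NOT trial data.
lead_in_losses = [24.0, 19.5, 3.0, 22.0, None, 15.0, 4.5, 26.0, None, 18.0]

RESPONSE_THRESHOLD_PCT = 5.0  # responder cutoff as described in the text


def enriched_mean(losses, threshold):
    """Mean loss among completers who met the responder threshold."""
    kept = [x for x in losses if x is not None and x >= threshold]
    return mean(kept)


def all_starters_mean(losses):
    """Mean loss among everyone who started the drug.

    Dropouts are counted as 0% loss, which is a crude assumption made
    only to keep the illustration simple.
    """
    return mean(0.0 if x is None else x for x in losses)


enriched = enriched_mean(lead_in_losses, RESPONSE_THRESHOLD_PCT)
everyone = all_starters_mean(lead_in_losses)

print(f"Mean loss, enriched cohort: {enriched:.2f}%")
print(f"Mean loss, all starters:    {everyone:.2f}%")
```

With these made-up inputs the enriched cohort averages a noticeably larger loss than the full set of starters. The size of the gap depends entirely on the invented numbers; only the direction is the point.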

Follow-Up Duration: 52 Weeks Is Not Long-Term

The randomized phase lasted 52 weeks. For a drug that trials like SURMOUNT-1 suggest patients may take indefinitely, one year of withdrawal data is a starting point, not a conclusion.

Several questions remain unanswered by this timeframe:

  • Weight trajectory plateau. The placebo group was still regaining weight at week 88. Did regain eventually stabilize, or would patients have returned entirely to baseline? The trial cannot say.
  • Metabolic rebound. Improvements in HbA1c, blood pressure, and lipids also reversed in the placebo group. Whether these metabolic parameters would have fully returned to pre-treatment levels with longer follow-up is unknown.
  • Re-treatment response. SURMOUNT-4 did not include a re-treatment arm. Clinicians cannot infer from this trial whether restarting tirzepatide after discontinuation produces the same magnitude of response.

The American Gastroenterological Association's 2024 clinical practice guideline on pharmacotherapy for obesity acknowledged that withdrawal-associated regain is expected for anti-obesity medications broadly, but noted the need for longer-term data to guide decisions about treatment duration and potential drug holidays.

Demographic and Generalizability Gaps

The randomized population in SURMOUNT-4 was predominantly white (84.6%) and female (71.5%), with a mean age of approximately 48 years and mean baseline BMI around 38 kg/m².

Notable exclusions or underrepresentations:

  • Type 2 diabetes. Patients with T2D were excluded. This is a major limitation given that a substantial proportion of patients with obesity have comorbid diabetes, and metabolic context alters both weight-loss magnitude and regain patterns.
  • Older adults. Mean age was under 50. Sarcopenia risk with weight cycling is a particular concern in older adults, and this population was poorly represented.
  • Racial and ethnic diversity. With <16% non-white enrollment, the trial cannot speak confidently to how discontinuation-related regain manifests across different populations. Obesity prevalence and treatment response vary by race and ethnicity in ways that matter clinically.
  • Prior bariatric surgery. Excluded. Many real-world patients considering GLP-1 therapy have surgical histories.
  • Concomitant medications. Use of other weight-affecting medications was restricted. Real-world polypharmacy (antidepressants, antipsychotics, insulin, corticosteroids) was not reflected.

Statistical and Methodological Caveats

Missing data handling. The primary analysis used a treatment-policy estimand with multiple imputation for missing data. Approximately 14% of the placebo group and 10% of the continuation group discontinued before week 88. Multiple imputation typically assumes data are missing at random. That assumption is difficult to verify, and it may fail if dropout is related to weight regain itself (informative missingness).
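The concern about informative missingness can be illustrated with a toy calculation in Python. The regain values are hypothetical, the dropout pattern (the two highest regainers leave) is an assumption chosen to show the mechanism, and filling gaps with the observed mean is only a crude stand-in for an imputation model that assumes missing at random. It is not the multiple-imputation procedure used in the trial.

```python
from statistics import mean

# Hypothetical week-88 regain (% of week-36 weight) for ten placebo
# patients. NOT trial data.
true_regain = [6.0, 9.0, 11.0, 12.0, 13.0, 14.0, 15.0, 17.0, 21.0, 24.0]

# Assumption: the two patients who regained the most left before week 88,
# so their values are never observed.
observed = sorted(true_regain)[:-2]
n_missing = len(true_regain) - len(observed)

# Crude stand-in for imputation under missing-at-random: fill each
# missing value with the mean of the observed values.
imputed = observed + [mean(observed)] * n_missing

true_mean = mean(true_regain)
imputed_mean = mean(imputed)
bias = imputed_mean - true_mean  # negative means regain is understated

print(f"True mean regain:    {true_mean:.2f}%")
print(f"Imputed mean regain: {imputed_mean:.2f}%")
print(f"Bias:                {bias:+.2f} percentage points")
```

In this sketch the imputed estimate understates regain because the missing patients differ systematically from the observed ones. If dropout ran the other way, the bias would flip sign, which is why the direction cannot be determined from the observed data alone.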

No active comparator. The comparison was tirzepatide vs. placebo, not tirzepatide vs. lifestyle intervention, behavioral support, or alternative pharmacotherapy. This design tells us that stopping the drug leads to regain. It does not tell us whether structured behavioral programs, dose reduction, or switching to a less intensive agent could mitigate that regain.

Dose pooling. Patients on 10 mg and 15 mg were pooled in the primary analysis. While subgroup data suggested similar patterns across doses, pooling obscures potential dose-dependent differences in both weight maintenance and regain velocity.

Outcome framing. The primary endpoint was percent change from week 36 to week 88, not from original baseline. This is technically appropriate for the study question but can mislead readers who conflate the randomization-period result with total treatment effect. The continuation group's additional 5.5% loss from week 36 is a smaller absolute number than the 14% regain in the placebo group, but both are measured from an already-reduced weight.
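The rebasing is simple arithmetic, sketched below in Python. The inputs are the rounded mean figures quoted in this article. Applying group means to a single notional patient is a simplifying assumption, and the function names are illustrative.

```python
# Re-express week-36-relative changes as changes from original baseline.
# Inputs are rounded means quoted in the article; chaining them for one
# notional patient is a simplifying assumption.

LEAD_IN_LOSS_PCT = 20.9      # mean loss from baseline at week 36
CONTINUE_CHANGE_PCT = -5.5   # week 36 to week 88, continuation arm
PLACEBO_CHANGE_PCT = 14.0    # week 36 to week 88, placebo arm


def change_from_baseline(lead_in_loss_pct: float, later_change_pct: float) -> float:
    """Net percent change from baseline after two sequential changes."""
    weight_at_week_36 = 1.0 - lead_in_loss_pct / 100.0
    weight_at_week_88 = weight_at_week_36 * (1.0 + later_change_pct / 100.0)
    return (weight_at_week_88 - 1.0) * 100.0


def fraction_of_loss_regained(lead_in_loss_pct: float, later_change_pct: float) -> float:
    """Share of the lead-in loss given back by week 88."""
    net_change = change_from_baseline(lead_in_loss_pct, later_change_pct)
    regained_points = net_change + lead_in_loss_pct
    return regained_points / lead_in_loss_pct


continued = change_from_baseline(LEAD_IN_LOSS_PCT, CONTINUE_CHANGE_PCT)
placebo = change_from_baseline(LEAD_IN_LOSS_PCT, PLACEBO_CHANGE_PCT)
regained = fraction_of_loss_regained(LEAD_IN_LOSS_PCT, PLACEBO_CHANGE_PCT)

print(f"Continuation arm, from baseline: {continued:.1f}%")
print(f"Placebo arm, from baseline:      {placebo:.1f}%")
print(f"Share of lead-in loss regained:  {regained:.0%}")
```

Under these assumptions the continuation arm ends about 25% below baseline and the placebo arm about 10% below baseline, so the 14% regain corresponds to roughly half of the lead-in loss rather than a return to the starting weight.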

Conflict of Interest and Funding

SURMOUNT-4 was funded by Eli Lilly, the manufacturer of tirzepatide (Zepbound/Mounjaro). Multiple authors were Lilly employees. The lead investigators received consulting fees from Lilly and other pharmaceutical companies developing obesity treatments.

This does not invalidate the data. Industry-funded trials undergo the same peer-review process and regulatory scrutiny. But the trial's design choices consistently favored demonstrating a large treatment effect:

  • Enrichment removed poor responders before randomization
  • The comparator was placebo (guaranteed to show regain) rather than a less expensive or less intensive alternative
  • The 36-week lead-in ensured near-maximal weight loss before the withdrawal comparison began
  • The primary endpoint timeframe (week 36 to 88) maximized the visual separation between arms

None of these choices is scientifically inappropriate on its own. Taken together, however, they represent a trial engineered to produce the most favorable framing of the drug's necessity for long-term use. Clinicians should recognize this when translating results to practice.

What Post-Publication Commentary Highlighted

Editorials and letters following SURMOUNT-4's publication raised several points:

  1. The "drug for life" implication. Multiple commentators noted that the trial's primary utility is as evidence that tirzepatide must be continued indefinitely, which directly serves the manufacturer's commercial interest. The clinical question of how to safely discontinue or taper was not addressed.

  2. Body composition concerns. Weight regain after GLP-1 discontinuation may preferentially restore fat mass over lean mass, potentially leaving patients with worse body composition than before treatment. SURMOUNT-4 did not include DEXA or other body-composition endpoints in the withdrawal phase.

  3. Cost and access framing. If indefinite use is necessary, the annual cost of tirzepatide (approximately $12,000-$14,000 at US list price) becomes a lifetime expenditure. Several commentators argued that withdrawal trials should be paired with health-economic analyses.

  4. Psychological impact. Rapid weight regain after drug withdrawal carries psychological consequences (shame, treatment failure perception, disordered eating relapse) that clinical trials rarely capture. Patient-reported outcome measures in SURMOUNT-4 showed worsening quality-of-life scores in the placebo group, but deeper psychological assessment was not performed.

Putting the Limitations in Context

These criticisms do not mean SURMOUNT-4's findings are wrong. The trial clearly demonstrated that tirzepatide withdrawal leads to substantial weight regain. That finding is consistent with the STEP 1 extension data for semaglutide and with decades of obesity physiology research showing that the body defends a higher set point after weight loss.

The criticisms matter because they define the boundaries of what the trial actually proved. It proved that in a selected group of excellent responders without diabetes, stopping tirzepatide after 36 weeks of treatment leads to significant regain over the following year. It did not prove that every patient needs lifelong therapy, that tapering is impossible, that no behavioral or pharmacological alternative exists for maintenance, or that the regain trajectory applies equally across diverse populations.

Clinicians citing SURMOUNT-4 to justify indefinite prescriptions should be transparent with patients about what the trial measured and in whom. Patients deserve to know that the "14% regain" figure comes from a best-case population and a design that, by construction, maximized the gap between continuation and withdrawal.
