Inside the SURMOUNT-2 Methodology: What Most Summaries Skip

At a glance
| Detail | Value |
|---|---|
| N randomized | 938 |
| Intervention | Tirzepatide 10 mg or 15 mg SC weekly |
| Comparator | Matching placebo SC weekly |
| Duration | 72 weeks |
| Population | Adults with BMI ≥27 kg/m² and type 2 diabetes |
| Co-primary endpoints | Percent change in body weight from baseline at week 72; proportion achieving ≥5% weight loss |
| Key result | −12.8% (10 mg) and −14.7% (15 mg) vs −3.2% (placebo), treatment-policy estimand |
Why the Design Matters More Than the Headline
Most coverage of SURMOUNT-2 stops at the weight-loss percentage. That number is real, but it was produced by a specific set of methodological decisions. Change the estimand, the population definition, or the handling of intercurrent events, and the reported effect size shifts. This page walks through each major design choice and explains what it means for clinicians trying to apply these data.
Randomization and Blinding
SURMOUNT-2 used a 1:1:1 randomization ratio across three arms: tirzepatide 10 mg, tirzepatide 15 mg, and placebo. Randomization was stratified by baseline BMI (<35 vs ≥35 kg/m²) and by baseline HbA1c (<8.5% vs ≥8.5%). Stratification keeps the three arms balanced within each category, so participants near the overweight range and those with severe obesity are spread evenly across treatments rather than clustering in one arm by chance. The HbA1c stratification matters because glycemic severity can independently affect weight trajectory, so an imbalance in baseline HbA1c between arms could bias the weight comparison.
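Stratified allocation like this is conventionally implemented with permuted blocks run separately inside each stratum. The sketch below is a minimal illustration under stated assumptions: the block size of 6, the arm labels, and the `StratifiedBlockRandomizer` class are hypothetical and are not the trial's actual allocation system, whose block size this article does not report.

```python
from __future__ import annotations

import random

# Hypothetical arm labels for illustration.
ARMS = ["tirzepatide_10mg", "tirzepatide_15mg", "placebo"]


def stratum(bmi: float, hba1c: float) -> tuple[str, str]:
    """Map baseline values onto the two stratification factors (BMI, HbA1c)."""
    bmi_cat = "BMI<35" if bmi < 35 else "BMI>=35"
    a1c_cat = "HbA1c<8.5" if hba1c < 8.5 else "HbA1c>=8.5"
    return bmi_cat, a1c_cat


class StratifiedBlockRandomizer:
    """1:1:1 permuted-block allocation, run independently within each stratum.

    The block size of 6 (two slots per arm) is an assumption for illustration.
    """

    def __init__(self, block_size: int = 6, seed: int | None = None) -> None:
        if block_size % len(ARMS) != 0:
            raise ValueError("block size must be a multiple of the number of arms")
        self.block_size = block_size
        self.rng = random.Random(seed)
        self.queues: dict[tuple[str, str], list[str]] = {}

    def _new_block(self) -> list[str]:
        # Each block contains every arm equally often, in shuffled order.
        block = ARMS * (self.block_size // len(ARMS))
        self.rng.shuffle(block)
        return block

    def assign(self, bmi: float, hba1c: float) -> str:
        key = stratum(bmi, hba1c)
        if not self.queues.get(key):
            # Start a fresh block when this stratum's current block is used up.
            self.queues[key] = self._new_block()
        return self.queues[key].pop(0)


# Usage: 60 participants in one stratum end up exactly 20/20/20.
randomizer = StratifiedBlockRandomizer(seed=2023)
assignments = [randomizer.assign(bmi=38.0, hba1c=9.1) for _ in range(60)]
```

Because each stratum draws from its own blocks, balance holds within every BMI × HbA1c cell, not just overall; that within-cell balance is what makes stratification protective against chance imbalance.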
Blinding was maintained through matched placebo injections and identical pen devices, while a centralized interactive web-response system handled treatment allocation. Dose escalation followed a fixed schedule (2.5 mg starting dose, increased by 2.5 mg every four weeks until the assigned maintenance dose was reached), and placebo participants received matching injections on the same schedule. This is stronger protection than in some earlier obesity trials where titration was clinician-driven: when investigators adjust doses in response to side effects, the resulting dosing patterns can reveal which participants are on active drug.
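The fixed escalation rule can be written as a one-line function of study week. This is a sketch under assumptions: it counts week 0 as the first injection, the function names are hypothetical, and it models only the active dose (placebo participants would follow the same visit schedule with matching injections).

```python
def tirzepatide_dose_mg(week: int, maintenance_mg: float) -> float:
    """Dose given at a study week under the fixed escalation described above.

    Assumes week 0 is the first injection: 2.5 mg for weeks 0-3, then +2.5 mg
    every 4 weeks, capped at the randomized maintenance dose.
    """
    if week < 0:
        raise ValueError("week must be non-negative")
    if maintenance_mg not in (10.0, 15.0):
        raise ValueError("SURMOUNT-2 maintenance doses were 10 mg or 15 mg")
    return min(2.5 + 2.5 * (week // 4), maintenance_mg)


def first_week_at_maintenance(maintenance_mg: float) -> int:
    """First study week on which the full maintenance dose is given."""
    week = 0
    while tirzepatide_dose_mg(week, maintenance_mg) < maintenance_mg:
        week += 1
    return week


# Dose at each 4-week step for the 15 mg arm.
schedule_15 = [tirzepatide_dose_mg(w, 15.0) for w in range(0, 24, 4)]
# -> [2.5, 5.0, 7.5, 10.0, 12.5, 15.0]
```

Under this counting, the 10 mg arm reaches its maintenance dose at week 12 and the 15 mg arm at week 20, so the 15 mg arm spends eight more weeks in escalation, which is when most gastrointestinal events occur.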
The Population: Narrower Than You Think
The inclusion criteria required a BMI of ≥27 kg/m² (not ≥30) combined with a diagnosis of type 2 diabetes. SURMOUNT-1 enrolled participants with BMI ≥30, or ≥27 with at least one weight-related comorbidity, but excluded people with diabetes; in SURMOUNT-2, type 2 diabetes itself qualified participants at the 27 kg/m² threshold. The practical effect: SURMOUNT-2's population started at a lower average body weight than SURMOUNT-1's, which can affect both absolute and percentage weight loss.
Key exclusion criteria that shape interpretation:
- HbA1c <7.0% or >10.0% at screening. This excluded both well-controlled and severely uncontrolled diabetes, selecting a moderate-severity population.
- Use of GLP-1 receptor agonists within 3 months. Participants were either GLP-1 naive or had completed at least a 3-month washout, so the results describe starting tirzepatide without recent GLP-1 exposure, not switching directly from another GLP-1 agent.
- History of bariatric surgery. Excluding surgical patients removes the confound of altered gut anatomy, but it also means results cannot be extrapolated to the growing post-surgical population.
- eGFR <30 mL/min/1.73 m². Severe renal impairment was excluded, so the results do not directly apply to patients with CKD stage 4-5.
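The exclusion criteria listed above can be expressed as a screening function. This is a hypothetical sketch: it encodes only the criteria named in this article (the trial protocol had others), the `Candidate` fields are illustrative names, and "3 months" is approximated as 90 days, which is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

GLP1_WASHOUT_DAYS = 90  # assumption: "3 months" approximated as 90 days


@dataclass
class Candidate:
    bmi: float                                # kg/m^2
    has_t2d: bool
    hba1c: float                              # %
    egfr: float                               # mL/min/1.73 m^2
    prior_bariatric_surgery: bool = False
    days_since_glp1: Optional[int] = None     # None = never used a GLP-1 RA


def screening_failures(c: Candidate) -> list[str]:
    """Return the reasons a candidate fails the criteria; empty list = eligible."""
    reasons = []
    if c.bmi < 27:
        reasons.append("BMI below 27 kg/m^2")
    if not c.has_t2d:
        reasons.append("no type 2 diabetes diagnosis")
    if not 7.0 <= c.hba1c <= 10.0:
        reasons.append("HbA1c outside 7.0-10.0%")
    if c.days_since_glp1 is not None and c.days_since_glp1 < GLP1_WASHOUT_DAYS:
        reasons.append("GLP-1 RA use within washout window")
    if c.prior_bariatric_surgery:
        reasons.append("history of bariatric surgery")
    if c.egfr < 30:
        reasons.append("eGFR below 30 mL/min/1.73 m^2")
    return reasons


print(screening_failures(Candidate(bmi=36.1, has_t2d=True, hba1c=8.0, egfr=75.0)))
print(screening_failures(Candidate(bmi=36.1, has_t2d=True, hba1c=10.5,
                                   egfr=25.0, days_since_glp1=30)))
```

Listing every failed criterion, rather than stopping at the first, makes it easy to see how each exclusion narrows the eligible population.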
The enrolled population had a mean baseline BMI of 36.1 kg/m², mean body weight of 100.7 kg, and mean HbA1c of 8.02%. Roughly 53% were women. Generalizability is narrower than these averages suggest: the population was predominantly white (68%) and from high-income countries, a limitation the authors themselves acknowledged.
Defining the Primary Endpoint: Two Estimands, Two Numbers
Here is where SURMOUNT-2's methodology diverges from casual reading. The trial pre-specified two estimands for the same primary endpoint (percent change in body weight at 72 weeks), and the distinction between them changes the reported result by roughly one percentage point.
The HealthRX Estimand Clarity Framework for SURMOUNT-2:
| Estimand | What It Measures | Handling of Discontinuation | Reported Result (15 mg) |
|---|---|---|---|
| Treatment-policy (primary) | Effect of being assigned to tirzepatide, regardless of adherence | Includes all participants; uses data collected after discontinuation | −14.7% |
| Efficacy (secondary) | Effect of staying on tirzepatide for 72 weeks | Treats data after discontinuation as missing; estimated with MMRM | −15.7% |
The treatment-policy estimand follows the intention-to-treat principle. If a participant stopped tirzepatide at week 20 and regained weight by week 72, that regain counts against the drug. This is the more conservative, real-world-relevant number: it reflects what happens when you prescribe tirzepatide to a population, knowing some will discontinue.
The efficacy estimand answers a different question: what happens to weight if a patient actually stays on the drug? It uses a mixed model for repeated measures (MMRM) that censors observations after treatment discontinuation. The −15.7% figure that appears in most headlines comes from this estimand. It is clinically meaningful (adherent patients will want to know their expected trajectory) but it overstates what a prescriber should expect across an unselected patient panel.
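A toy calculation shows why the two estimands diverge. The data below are hypothetical, and the efficacy side is deliberately crude: it simply averages participants still on drug, whereas the trial estimated that estimand with an MMRM. The sketch only shows which direction the selection pushes the result.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Participant:
    baseline_kg: float
    week72_kg: float           # measured at week 72, on or off treatment
    on_treatment_at_72: bool


def pct_change(p: Participant) -> float:
    return 100.0 * (p.week72_kg - p.baseline_kg) / p.baseline_kg


def treatment_policy_mean(arm: list[Participant]) -> float:
    """Everyone randomized counts, including weight measured after stopping."""
    values = [pct_change(p) for p in arm]
    return sum(values) / len(values)


def on_treatment_mean(arm: list[Participant]) -> float:
    """Crude stand-in for the efficacy estimand: only participants still on drug.

    Not the trial's method (which used an MMRM); illustrates selection only.
    """
    values = [pct_change(p) for p in arm if p.on_treatment_at_72]
    return sum(values) / len(values)


# Hypothetical four-person arm: three stay on drug, one stops at week 20 and regains.
toy_arm = [
    Participant(100.0, 84.0, True),
    Participant(100.0, 85.0, True),
    Participant(100.0, 86.0, True),
    Participant(100.0, 97.0, False),
]
print(treatment_policy_mean(toy_arm))  # -12.0
print(on_treatment_mean(toy_arm))      # -15.0
```

The toy gap (three points) is exaggerated by the tiny sample and high dropout; in SURMOUNT-2 the difference between estimands at 15 mg was about one point.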
Why this matters clinically: When comparing SURMOUNT-2 to other trials, confirm which estimand you are comparing. The semaglutide 2.4 mg STEP trials used a similar dual-estimand approach, but discontinuation rates differed between programs. Because the efficacy estimand leans on whichever participants stayed on treatment in each trial, cross-trial comparisons of efficacy-estimand results are particularly unreliable.
Statistical Architecture
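The primary analysis used a mixed model for repeated measures (MMRM) with treatment, visit, treatment-by-visit interaction, the stratification factors (BMI category and HbA1c category), and baseline body weight as covariates. The within-participant covariance across visits was unstructured: every visit's variance and every pairwise covariance was estimated from the data rather than constrained to a fixed pattern.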
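One conventional way to write the model just described is shown below. The symbols are our own labels, not notation from the publication, and the response $Y_{ij}$ is assumed to be percent change in body weight for participant $i$ at visit $j$ (the article does not state the response scale). $a(i)$ is the assigned arm, $B_i$ and $H_i$ are the BMI and HbA1c stratum indicators, $W_{i0}$ is baseline weight, and $J$ indexes week 72.

```latex
\[
Y_{ij} = \mu + \alpha_{a(i)} + \gamma_j + (\alpha\gamma)_{a(i)j}
       + \beta_1 B_i + \beta_2 H_i + \beta_3 W_{i0} + \varepsilon_{ij},
\qquad
\boldsymbol{\varepsilon}_i = (\varepsilon_{i1}, \dots, \varepsilon_{iJ})^\top
  \sim \mathcal{N}(\mathbf{0}, \Sigma)
\]
\[
\hat{\Delta}_{15\,\mathrm{mg}} =
  \bigl(\hat\alpha_{15} + \widehat{(\alpha\gamma)}_{15,J}\bigr)
  - \bigl(\hat\alpha_{P} + \widehat{(\alpha\gamma)}_{P,J}\bigr)
\]
```

The week-72 treatment effect is the difference in least-squares means between arms at visit $J$; the intercept, visit, and covariate terms cancel. With an unstructured $\Sigma$ over $J$ visits, the model estimates $J(J+1)/2$ variance and covariance parameters.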
Multiplicity was controlled through a gatekeeping strategy. The 15 mg dose was tested first against placebo for superiority on the co-primary endpoints (percent weight change and proportion achieving ≥5% weight loss). Only if both succeeded at the 0.025 one-sided significance level did testing proceed to the 10 mg dose. This fixed-sequence approach protects the family-wise error rate without splitting alpha between doses, but it makes the 10 mg result conditional: had either 15 mg test failed, the 10 mg comparison could not have been declared significant, however large its effect.
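The fixed-sequence logic fits in a few lines. In this sketch the function name and p-values are hypothetical; only the ordering (15 mg first, then 10 mg), the co-primary requirement, and the 0.025 one-sided level come from the description above.

```python
from __future__ import annotations

ALPHA_ONE_SIDED = 0.025  # significance level stated in the text


def fixed_sequence(
    families: list[tuple[str, list[float]]], alpha: float = ALPHA_ONE_SIDED
) -> dict[str, str]:
    """Test families of co-primary hypotheses in a pre-specified order.

    A family succeeds only if every one-sided p-value in it is below alpha.
    Each family is tested at the full alpha, but only while every earlier
    family has succeeded; after the first failure the rest are not tested.
    """
    outcomes: dict[str, str] = {}
    gate_open = True
    for label, p_values in families:
        if not gate_open:
            outcomes[label] = "not tested"
            continue
        success = all(p < alpha for p in p_values)
        outcomes[label] = "success" if success else "failure"
        gate_open = success
    return outcomes


# Hypothetical p-values: (percent weight change, proportion with >=5% loss).
all_pass = fixed_sequence([("15 mg vs placebo", [0.0001, 0.0001]),
                           ("10 mg vs placebo", [0.0001, 0.0001])])
first_fails = fixed_sequence([("15 mg vs placebo", [0.0001, 0.04]),
                              ("10 mg vs placebo", [0.0001, 0.0001])])
```

In the second call the 10 mg p-values are tiny, yet the 10 mg family is reported as "not tested" because the gate closed at 15 mg.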
Missing data handling is where trials often diverge most. SURMOUNT-2 used multiple imputation under a "retrieved dropout" approach for the treatment-policy estimand: missing week-72 weights for participants who discontinued treatment were imputed from participants in the same arm who also discontinued but stayed in the study and had their weight measured. For the efficacy estimand, data after discontinuation were treated as missing and handled by the MMRM, which assumes missing at random. Neither approach is assumption-free, but retrieved-dropout imputation is more conservative than last-observation-carried-forward (LOCF), which older obesity trials sometimes used. LOCF freezes a dropout's weight at its last on-treatment value, ignoring later regain, and so tends to overestimate treatment effects.
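The contrast between LOCF and retrieved-dropout imputation can be sketched on hypothetical data. This is a simplified illustration, not the trial's imputation model: missing values are drawn from a normal distribution fitted to the retrieved dropouts (a proper imputation would also draw that mean and SD from their posterior and would condition on covariates), and results are pooled with Rubin's rules. The `Record` type and function names are hypothetical.

```python
from __future__ import annotations

import random
import statistics
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    discontinued: bool
    last_on_treatment_pct: float   # % weight change at last on-treatment visit
    week72_pct: Optional[float]    # % change at week 72; None if not collected


def locf_mean(arm: list[Record]) -> float:
    """LOCF: a missing week-72 value is replaced by the last on-treatment value."""
    values = [r.week72_pct if r.week72_pct is not None else r.last_on_treatment_pct
              for r in arm]
    return statistics.mean(values)


def retrieved_dropout_mi(arm: list[Record], n_imputations: int = 20,
                         seed: int = 0) -> tuple[float, float]:
    """Simplified retrieved-dropout imputation; returns (pooled mean, pooled SE).

    Assumes every missing week-72 value belongs to a participant who discontinued.
    Donors are same-arm discontinuers whose week-72 weight was still measured.
    """
    rng = random.Random(seed)
    donors = [r.week72_pct for r in arm
              if r.discontinued and r.week72_pct is not None]
    if len(donors) < 2:
        raise ValueError("need at least two retrieved dropouts")
    mu, sd = statistics.mean(donors), statistics.stdev(donors)

    estimates, within_vars = [], []
    for _ in range(n_imputations):
        completed = [r.week72_pct if r.week72_pct is not None else rng.gauss(mu, sd)
                     for r in arm]
        estimates.append(statistics.mean(completed))
        within_vars.append(statistics.variance(completed) / len(completed))

    # Rubin's rules: total variance = within + (1 + 1/M) * between.
    q_bar = statistics.mean(estimates)
    within = statistics.mean(within_vars)
    between = statistics.variance(estimates)
    total_var = within + (1 + 1 / n_imputations) * between
    return q_bar, total_var ** 0.5


# Hypothetical arm: 4 completers, 3 retrieved dropouts who regained, 2 missing.
arm = (
    [Record(False, v, v) for v in (-16.0, -15.0, -14.0, -15.0)]
    + [Record(True, last, w72) for last, w72 in ((-10.0, -4.0), (-9.0, -6.0), (-11.0, -5.0))]
    + [Record(True, -12.0, None), Record(True, -10.0, None)]
)
mi_mean, mi_se = retrieved_dropout_mi(arm)
print(round(locf_mean(arm), 2), round(mi_mean, 2), round(mi_se, 2))
```

Here LOCF credits the two missing participants with their on-treatment losses (−12% and −10%), while the retrieved-dropout draws centre on the regained values of similar discontinuers (about −5%), so the LOCF arm mean is the more favourable of the two.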
The Comparator Choice and Its Consequences
Placebo plus lifestyle counseling was the only comparator. There was no active-comparator arm (e.g., semaglutide 2.4 mg). This is standard for a registration trial but it limits the data's utility for formulary decision-making, where the relevant question is often "tirzepatide vs. semaglutide" rather than "tirzepatide vs. nothing."
All participants received lifestyle counseling targeting a 500 kcal/day energy deficit and ≥150 minutes/week of physical activity. Counseling was standardized but not monitored with objective measures (no accelerometry, no food diaries with verification). The counseling component likely contributed to placebo-arm weight loss of 3.2%, which is higher than the 1-2% typically seen in pharmacotherapy trials with less intensive lifestyle support.
Concomitant diabetes medications were allowed with restrictions. Participants on metformin continued it at a stable dose. SGLT2 inhibitors, sulfonylureas, and insulin were allowed but could be adjusted by investigators to avoid hypoglycemia. This pragmatic approach reflects real prescribing but introduces variability: a participant whose sulfonylurea was reduced may have had different weight and glycemic trajectories than one whose medication remained unchanged.
Results Beyond the Abstract
The full results table shows a dose-response gradient and consistent effects across secondary endpoints.
| Outcome | Tirzepatide 10 mg (n=312) | Tirzepatide 15 mg (n=311) | Placebo (n=315) |
|---|---|---|---|
| Weight change (treatment-policy) | −12.8% | −14.7% | −3.2% |
| Weight change (efficacy) | −13.4% | −15.7% | −3.3% |
| ≥5% weight loss (% of participants) | 79.2% | 82.8% | 32.5% |
| ≥10% weight loss (% of participants) | 57.1% | 66.1% | 9.2% |
| ≥15% weight loss (% of participants) | 35.6% | 48.2% | 2.6% |
| HbA1c change (percentage points) | −2.1 | −2.1 | −0.5 |
| Waist circumference change (cm) | −10.8 | −12.3 | −3.3 |
The ≥15% threshold is particularly notable. Nearly half of participants on the 15 mg dose achieved it, a benchmark previously associated with improvements in obstructive sleep apnea severity and reduction in cardiovascular risk factors in observational data. The FDA prescribing information for Zepbound reports SURMOUNT-2 results using these same categorical thresholds.
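Because the table reports arm-level values, the placebo-adjusted effect is a subtraction. The sketch below uses only numbers copied from the table; note that these are simple differences of arm means, not the trial's model-based estimated treatment differences with confidence intervals, and the dictionary keys are illustrative names.

```python
# Arm-level values copied from the results table above.
RESULTS = {
    "weight_pct_treatment_policy": {"10 mg": -12.8, "15 mg": -14.7, "placebo": -3.2},
    "weight_pct_efficacy": {"10 mg": -13.4, "15 mg": -15.7, "placebo": -3.3},
    "pct_reaching_15pct_loss": {"10 mg": 35.6, "15 mg": 48.2, "placebo": 2.6},
}


def placebo_adjusted(outcome: str, dose: str) -> float:
    """Active-arm value minus placebo-arm value, in percentage points."""
    arm = RESULTS[outcome]
    return round(arm[dose] - arm["placebo"], 1)


for outcome in RESULTS:
    for dose in ("10 mg", "15 mg"):
        print(f"{outcome:30s} {dose}: {placebo_adjusted(outcome, dose):+.1f}")
```

At 15 mg the placebo-adjusted weight loss is 11.5 points under the treatment-policy estimand and 12.4 points under the efficacy estimand, so the roughly one-point estimand gap survives placebo adjustment.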
Adverse Events and Discontinuation Patterns
Gastrointestinal events were the most common adverse effects: nausea (24-33% in active arms vs 7% placebo), diarrhea (17-22% vs 8%), and vomiting (10-13% vs 2%). Most events were mild to moderate and occurred during the dose-escalation phase.
Treatment discontinuation due to adverse events was 4.8% (10 mg), 7.4% (15 mg), and 2.5% (placebo). This feeds directly into the estimand split: because the two estimands handle participants who stopped differently, the efficacy-estimand result at 15 mg selectively reflects those who tolerated the drug. Clinicians should counsel patients that roughly 1 in 14 may need to stop because of side effects, and those who stop will not achieve the efficacy-estimand weight loss.
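The "roughly 1 in 14" figure is arithmetic on the 7.4% rate. The sketch below reproduces it and also subtracts the placebo rate, since some adverse-event discontinuation occurs even without active drug; that subtraction is a crude illustration, not a formal risk difference with a confidence interval, and the function names are hypothetical.

```python
# Discontinuation due to adverse events, % of each arm (from the text above).
AE_DISCONTINUATION = {"10 mg": 4.8, "15 mg": 7.4, "placebo": 2.5}


def one_in_n(rate_pct: float) -> int:
    """Express a percentage as 'about 1 in N', rounding N to the nearest integer."""
    return round(100 / rate_pct)


def excess_over_placebo(dose: str) -> float:
    """Percentage points of AE-driven discontinuation above the placebo arm."""
    return round(AE_DISCONTINUATION[dose] - AE_DISCONTINUATION["placebo"], 1)


for dose in ("10 mg", "15 mg"):
    print(dose, "about 1 in", one_in_n(AE_DISCONTINUATION[dose]),
          "| excess over placebo:", excess_over_placebo(dose), "points")
```

For 15 mg this gives about 1 in 14 overall, of which 4.9 percentage points exceed the placebo rate.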
Limitations the Authors Acknowledged
The original publication explicitly listed several limitations:
- 72-week duration. Obesity is a chronic disease. Without extension data, durability of weight loss after 72 weeks is unknown from this trial alone (though SURMOUNT-1 extension data suggest weight regain after discontinuation).
- Limited racial and ethnic diversity. The predominantly white, high-income-country population limits generalizability. Pharmacokinetic and pharmacodynamic responses to GIP/GLP-1 receptor agonists may differ across populations.
- No active comparator. As noted above, the trial cannot answer head-to-head questions against semaglutide or other agents.
- Concomitant medication adjustments. Reductions in sulfonylureas or insulin to prevent hypoglycemia could have independently affected weight.
- Lifestyle counseling intensity. The structured counseling program may not reflect real-world clinical settings where nutritional support is less intensive.
How These Methods Shape Clinical Application
For prescribers, the key takeaway is that the treatment-policy estimand (−14.7% at 15 mg) is the more actionable number. It accounts for real-world dropout. The efficacy estimand is useful for setting expectations with patients who tolerate the drug well, but it should not be used for population-level planning.
The ADA Standards of Care now reference tirzepatide as an option for weight management in type 2 diabetes, with the caveat that long-term cardiovascular outcome data (the ongoing SURPASS-CVOT and SURMOUNT-MMO trials) are still pending. The methodology of SURMOUNT-2 supports efficacy but does not yet establish whether that weight loss translates to reduced cardiovascular events in this population.
References
- Garvey WT, Frias JP, Jastreboff AM, et al. Tirzepatide once weekly for the treatment of obesity in people with type 2 diabetes (SURMOUNT-2): a double-blind, randomised, multicentre, placebo-controlled, phase 3 trial. Lancet. 2023;402(10402):613-626.
- Jastreboff AM, Aronne LJ, Ahmad NN, et al. Tirzepatide once weekly for the treatment of obesity (SURMOUNT-1). N Engl J Med. 2022;387(3):205-216.
- U.S. Food and Drug Administration. Zepbound (tirzepatide) prescribing information.
- Wilding JPH, Batterham RL, Calanna S, et al. Once-weekly semaglutide in adults with overweight or obesity (STEP 1). N Engl J Med. 2021;384(11):989-1002.
- American Diabetes Association Professional Practice Committee. Standards of Care in Diabetes, 2024. Diabetes Care. 2024;47(Suppl 1).