Inside the SURMOUNT-3 Methodology: What Most Summaries Skip

At a glance
| Parameter | Detail |
|---|---|
| N (randomized) | 579 |
| Intervention | Tirzepatide, maximum tolerated dose (10 mg or 15 mg), subcutaneous, weekly |
| Comparator | Matching placebo injection |
| Lead-in | 12-week intensive lifestyle intervention (low-calorie diet + exercise counseling) |
| Randomization ratio | 1:1 (tirzepatide : placebo) |
| Duration | 72 weeks post-randomization (84 weeks total including lead-in) |
| Primary endpoint | Percent change in body weight from randomization (Week 0) to Week 72 |
| Key result | −21.1% tirzepatide vs −3.3% placebo (treatment difference: −17.8 percentage points) |
| Registration | NCT04657016 |

Why a Lifestyle Lead-In Changes Everything
Most obesity RCTs randomize participants at baseline and layer pharmacotherapy on top of standard diet-and-exercise advice. SURMOUNT-3 did something different. Every enrolled participant first completed a 12-week intensive lifestyle program consisting of a low-calorie diet (1,200 to 1,500 kcal/day) and 150 minutes per week of physical activity. Only those who lost at least 5.0% of their starting body weight during this phase advanced to randomization.
This is an enrichment design. It selects for a specific population: people who can adhere to structured behavioral intervention and whose bodies respond to caloric restriction with clinically meaningful weight loss. The question the trial answers is therefore not "does tirzepatide cause weight loss in the general population with obesity?" (SURMOUNT-1 already answered that) but rather "among people who have already demonstrated lifestyle responsiveness, does adding tirzepatide produce additional, sustained loss beyond what lifestyle alone achieves?"
That distinction matters for clinical interpretation. Roughly 45% of screened participants failed to meet the 5% threshold during lead-in and were excluded, so the randomized cohort was, by definition, a selected subset. Physicians applying SURMOUNT-3 results to a patient who has never responded to dietary changes should recognize that such a patient would not have qualified for randomization.
Randomization and Blinding Architecture
After the lead-in, the 579 qualifying participants were randomized 1:1 to tirzepatide or placebo. Randomization used an interactive web-response system stratified by two factors: BMI category (<35 vs ≥35 kg/m²) and amount of weight lost during the lead-in (<10% vs ≥10%). These strata ensured balanced distribution of participants who were "super-responders" to lifestyle (those losing ≥10%) across both arms, preventing an imbalance that could confound the primary analysis.
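The stratified allocation described above can be sketched as permuted-block randomization run separately within each stratum. This is a minimal illustration, not the trial's actual system: the block size of 4, the class name `StratifiedRandomizer`, and the seed handling are all assumptions, since the source states only the two stratification factors and the 1:1 ratio.

```python
import random
from collections import defaultdict

ARMS = ("tirzepatide", "placebo")
BLOCK_SIZE = 4  # hypothetical; the source does not state a block size


def stratum(bmi: float, lead_in_loss_pct: float) -> tuple:
    """Map a participant to one of the four strata described in the text."""
    bmi_cat = "<35" if bmi < 35 else ">=35"
    loss_cat = "<10%" if lead_in_loss_pct < 10 else ">=10%"
    return bmi_cat, loss_cat


class StratifiedRandomizer:
    """Permuted-block 1:1 allocation, tracked separately for each stratum."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        # One pending block of assignments per stratum.
        self._blocks = defaultdict(list)

    def assign(self, bmi: float, lead_in_loss_pct: float) -> str:
        key = stratum(bmi, lead_in_loss_pct)
        if not self._blocks[key]:
            # Start a fresh block with equal numbers of each arm, then shuffle.
            block = list(ARMS) * (BLOCK_SIZE // len(ARMS))
            self._rng.shuffle(block)
            self._blocks[key] = block
        return self._blocks[key].pop()


randomizer = StratifiedRandomizer(seed=42)
super_responders = [randomizer.assign(bmi=38.0, lead_in_loss_pct=12.0) for _ in range(8)]
print(super_responders.count("tirzepatide"), super_responders.count("placebo"))
```

Because every completed block within a stratum holds equal numbers of each arm, participants who lost ≥10% during lead-in cannot pile up in one arm by more than half a block.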
Blinding was maintained through identical injection devices and dose-escalation schedules. Both arms followed the same titration: starting at 2.5 mg weekly and increasing by 2.5 mg every 4 weeks until the maximum tolerated dose of 10 mg or 15 mg was reached. In the placebo arm, the same volume and injection schedule were used. The maximum tolerated dose (MTD) design meant that participants who could not tolerate 15 mg remained on 10 mg, which introduces within-arm dose heterogeneity. According to the published results, roughly 79% of tirzepatide-treated participants reached the 15 mg dose.
The HealthRX Estimand Interpretation Matrix
SURMOUNT-3 pre-specified two estimand frameworks, and understanding the distinction between them is critical to interpreting the headline numbers correctly.
Treatment-policy estimand: This estimates the effect of the assigned treatment regardless of whether participants actually took every dose or discontinued early. It includes all randomized participants and uses a mixed model for repeated measures (MMRM). The headline result of −21.1% vs −3.3% comes from this estimand. Under this model, early discontinuers contribute every measurement they provided, and the MMRM accounts for their missing visits implicitly through the fitted model rather than by filling in individual values.
On-treatment estimand: This estimates the treatment effect among participants who continued their assigned injection throughout the trial. It excludes data collected after permanent treatment discontinuation and uses a similar MMRM but censors post-discontinuation observations. The on-treatment result was −25.3% for tirzepatide vs −4.4% for placebo, a wider gap because participants who stayed on drug tended to lose more.
| Estimand | Tirzepatide (%) | Placebo (%) | Difference (pp) |
|---|---|---|---|
| Treatment policy | −21.1 | −3.3 | −17.8 |
| On-treatment | −25.3 | −4.4 | −20.9 |

The treatment-policy estimand is the regulatory-grade number because it reflects the real-world scenario where some patients stop taking the drug. The on-treatment number is clinically informative but optimistic, as it reflects only those who stayed the course. When press coverage cites the "26% loss" figure, it is usually pulling from the on-treatment estimate, which overstates the average experience of someone who fills a tirzepatide prescription.
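As a quick arithmetic check on the two estimands, the sketch below recomputes the treatment differences from the arm-level figures quoted in this section. The dictionary layout and the function name `difference_pp` are illustrative choices, not anything taken from the trial's analysis code.

```python
# Arm-level percent changes in body weight as quoted in this article.
results = {
    "treatment policy": {"tirzepatide": -21.1, "placebo": -3.3},
    "on-treatment": {"tirzepatide": -25.3, "placebo": -4.4},
}


def difference_pp(arms: dict) -> float:
    """Treatment difference in percentage points (tirzepatide minus placebo)."""
    return round(arms["tirzepatide"] - arms["placebo"], 1)


for name, arms in results.items():
    print(f"{name}: {difference_pp(arms)} pp")

# How much larger the on-treatment gap is than the treatment-policy gap.
gap = round(
    difference_pp(results["on-treatment"]) - difference_pp(results["treatment policy"]), 1
)
print(f"on-treatment minus treatment-policy: {gap} pp")
```

The differences are in percentage points, not relative percentages, which is why they are obtained by simple subtraction of the two arm means.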
Statistical Approach: What the MMRM Does and Does Not Handle
The primary analysis used a restricted maximum likelihood MMRM with fixed effects for treatment, visit, treatment-by-visit interaction, stratification factors, and baseline body weight. This is a standard approach in obesity trials and handles intermittent missing data under the missing-at-random (MAR) assumption.
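One way to write down a model with the fixed effects listed above is shown here. The notation is an assumption made for illustration: the symbols, the choice of percent change as the response, and the unstructured within-participant covariance are not taken from the trial's statistical analysis plan.

```latex
Y_{ij} = \mu + \alpha\,T_i + \beta_j + (\alpha\beta)_j\,T_i
         + \boldsymbol{\gamma}^{\top}\mathbf{S}_i + \delta\,W_{i0} + \varepsilon_{ij},
\qquad
\boldsymbol{\varepsilon}_i = (\varepsilon_{i1},\dots,\varepsilon_{iJ})^{\top}
\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})
```

Here $Y_{ij}$ is the percent change in body weight for participant $i$ at visit $j$, $T_i$ is 1 for tirzepatide and 0 for placebo, $\beta_j$ is the visit effect, $(\alpha\beta)_j$ is the treatment-by-visit interaction, $\mathbf{S}_i$ holds the stratification factors, and $W_{i0}$ is baseline body weight. With the usual reference-level constraints, the estimated treatment difference at the final visit $J$ (Week 72) is $\alpha + (\alpha\beta)_J$. No random effects appear because the correlation between a participant's repeated measurements is carried entirely by $\boldsymbol{\Sigma}$.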
MAR assumes that the probability of dropout depends on observed data (prior weight measurements, treatment arm) but not on the unobserved weight the participant would have had. In practice, participants who discontinue due to side effects may have been on a different weight trajectory than completers, which would violate MAR. The investigators addressed this with sensitivity analyses using a retrieved-dropout pattern mixture model and a tipping-point analysis. Neither analysis overturned the primary result, but neither can fully rule out informative dropout.
Discontinuation rates were asymmetric: 14.4% in the tirzepatide arm vs 6.2% in the placebo arm. The higher discontinuation in the active arm was driven primarily by gastrointestinal adverse events (nausea, diarrhea, constipation). This asymmetry pulls the treatment-policy estimate in two directions. Because it averages over participants who stopped the drug, it may slightly underestimate the effect in patients who tolerate treatment. And because the model handles unobserved visits under MAR, it may slightly overestimate the effect for all-comers if those who left were losing less weight than comparable participants who stayed. This is a known tension in obesity-trial statistics and is discussed in the FDA's 2007 guidance on obesity endpoints.
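The logic of a tipping-point analysis can be illustrated with a deliberately simplified sketch using the arm means and the tirzepatide discontinuation rate quoted above. Real tipping-point analyses shift imputed values for individual participants and re-test statistical significance at each step. This version works only with arm-level means, treats the headline mean as the MAR-based estimate for the whole arm, and looks for the shift at which the point estimate of the difference reaches zero. The function names, the 5-point grid, and the search limit are all assumptions.

```python
def arm_mean(observed_mean: float, dropout_rate: float, delta: float) -> float:
    """Arm mean if dropouts did `delta` percentage points worse than MAR implies."""
    stayers = (1 - dropout_rate) * observed_mean
    dropouts = dropout_rate * (observed_mean + delta)
    return stayers + dropouts


def tipping_delta(active_mean, control_mean, active_dropout, step=5.0, max_delta=200.0):
    """Smallest grid value of delta at which the active arm no longer beats control."""
    delta = 0.0
    while delta <= max_delta:
        shifted = arm_mean(active_mean, active_dropout, delta)
        if shifted - control_mean >= 0:
            return delta
        delta += step
    return None  # no tipping point found within the searched range


# Figures quoted in this article: -21.1% vs -3.3%, 14.4% tirzepatide discontinuation.
print(tipping_delta(active_mean=-21.1, control_mean=-3.3, active_dropout=0.144))
```

Under these simplifying assumptions the point estimate only tips when dropouts are assumed to fare more than 100 percentage points worse than the model implies, which is one way to see why a sensitivity analysis of this kind would not overturn the primary result.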
Inclusion and Exclusion Criteria: The Hidden Filters
Beyond the 5% lead-in threshold, SURMOUNT-3 had additional filters that narrow the generalizability of results:
- BMI requirement: ≥30 kg/m², or ≥27 kg/m² with at least one weight-related comorbidity (hypertension, dyslipidemia, obstructive sleep apnea, or cardiovascular disease). This matches the FDA-approved tirzepatide (Zepbound) label.
- Diabetes exclusion: Participants with type 2 diabetes were excluded. This is critical. Tirzepatide's GIP/GLP-1 dual agonism affects glucose homeostasis, and patients with diabetes lose weight differently (often less) on GLP-1-class drugs. SURMOUNT-3 results should not be extrapolated to populations with diabetes.
- Prior anti-obesity medication: Use of prescription anti-obesity drugs within 90 days of screening was exclusionary. The population was therefore pharmacologically "naive" at the point of randomization, though they had just completed a successful behavioral program.
- Surgical history: Prior bariatric surgery was excluded, removing a population with altered gastrointestinal anatomy that could change drug absorption and response.
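The filters above, together with the 5% lead-in threshold, can be expressed as a small screening function. This is a sketch under stated assumptions: the `Candidate` fields and the function name are hypothetical, "within 90 days" is read as 90 days or fewer, and the real protocol contains further criteria that this article does not list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    bmi: float
    has_weight_related_comorbidity: bool
    has_type_2_diabetes: bool
    days_since_anti_obesity_rx: Optional[int]  # None means never used
    prior_bariatric_surgery: bool
    lead_in_loss_pct: float  # percent of starting weight lost during lead-in


def exclusion_reasons(c: Candidate) -> List[str]:
    """Return every filter from this section that the candidate fails."""
    reasons = []
    bmi_ok = c.bmi >= 30 or (c.bmi >= 27 and c.has_weight_related_comorbidity)
    if not bmi_ok:
        reasons.append("BMI criterion not met")
    if c.has_type_2_diabetes:
        reasons.append("type 2 diabetes")
    # Assumption: "within 90 days" is interpreted as <= 90 days before screening.
    if c.days_since_anti_obesity_rx is not None and c.days_since_anti_obesity_rx <= 90:
        reasons.append("recent prescription anti-obesity medication")
    if c.prior_bariatric_surgery:
        reasons.append("prior bariatric surgery")
    if c.lead_in_loss_pct < 5.0:
        reasons.append("lead-in weight loss below 5%")
    return reasons


candidate = Candidate(
    bmi=28.5,
    has_weight_related_comorbidity=False,
    has_type_2_diabetes=True,
    days_since_anti_obesity_rx=None,
    prior_bariatric_surgery=False,
    lead_in_loss_pct=6.2,
)
print(exclusion_reasons(candidate))
```

Returning the full list of failed filters, rather than stopping at the first, makes it easier to see how many independent criteria separate a given patient from the randomized population.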
The Comparator Question
Placebo-controlled designs in obesity trials face a recurring criticism: is placebo the right comparator when multiple anti-obesity medications are already approved? The SURMOUNT program chose placebo across its key trials (SURMOUNT-1 through 4) because the FDA guidance for weight-management drugs requires demonstration of superiority over placebo plus lifestyle. Active-comparator trials exist in the GLP-1 space (STEP 8 compared semaglutide to liraglutide), but regulatory approval does not require head-to-head data.
For SURMOUNT-3 specifically, the placebo comparator has a secondary function: it quantifies how much weight participants regain (or continue to lose) after the intensive lifestyle phase ends. The placebo arm lost 3.3% additional weight over 72 weeks, suggesting that the behavioral changes from lead-in had some residual effect, though modest. Had the comparator been an active drug like semaglutide 2.4 mg, the trial would answer a different question entirely.
Weight Regain After Lead-In: What the Placebo Arm Reveals
The placebo arm is arguably the most underappreciated data in SURMOUNT-3. After achieving a mean 7.4% weight loss during the 12-week lifestyle phase, placebo participants lost an additional 3.3% from randomization through Week 72. This modest continued loss (rather than regain) is somewhat surprising, given that most lifestyle-only studies show gradual weight regain after the intensive phase ends.
Several explanations exist. First, the ongoing lifestyle counseling (reduced-calorie diet plus physical activity advice) continued for both arms throughout the 72-week treatment period. Participants were not abandoned after lead-in. Second, selection bias plays a role: these were the 55% who successfully lost ≥5% during lead-in, a group likely characterized by higher motivation, adherence capacity, or metabolic responsiveness. Third, the 72-week window may not have been long enough to capture full weight regain, which in behavioral studies often peaks at 2 to 5 years.
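Because the lead-in loss and the post-randomization loss are measured against different starting weights, they compound rather than add. The sketch below combines the two placebo-arm figures quoted above into an approximate change from study entry. Multiplying two group means is only an approximation to the mean of each participant's total change, and the function name is an illustrative choice.

```python
def total_change_from_entry(lead_in_change_pct: float, post_randomization_change_pct: float) -> float:
    """Approximate percent change from study entry, compounding the two phases."""
    remaining_fraction = (1 + lead_in_change_pct / 100) * (1 + post_randomization_change_pct / 100)
    return (remaining_fraction - 1) * 100


# Placebo arm, using the figures quoted in this article.
placebo_total = total_change_from_entry(-7.4, -3.3)
print(f"compounded: {placebo_total:.1f}%")    # about -10.5%
print(f"naive sum:  {-7.4 + -3.3:.1f}%")      # -10.7%
```

The gap between the compounded and summed figures is small here, but it grows with the size of the second-phase change, so adding percentages measured from different baselines will increasingly overstate the total.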
Limitations the Authors Acknowledged
The primary publication lists several limitations worth highlighting:
- Population homogeneity: The trial population was predominantly white (72%) and female (59%). Generalizability to other demographics remains uncertain, particularly given known differences in body composition and metabolic responses across racial groups.
- Fixed duration: 72 weeks of treatment does not answer the durability question. SURMOUNT-4 addressed treatment withdrawal, showing significant weight regain after stopping tirzepatide, which contextualizes the SURMOUNT-3 result as maintenance-dependent.
- Enrichment bias: The lead-in design, while clinically meaningful, precludes comparison with trials that enroll all-comers. Cross-trial comparisons with SURMOUNT-1 or STEP trials require adjustment for baseline differences.
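- No active comparator: Tirzepatide was compared with placebo only, so the trial cannot rank it against other approved anti-obesity medications (see "The Comparator Question" above).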
Clinical Translation: Who Is the SURMOUNT-3 Patient?
The patient most clearly represented by SURMOUNT-3 is someone who has already tried structured lifestyle modification (not just "diet and exercise" mentioned at an annual physical, but a real calorie-restricted program with regular check-ins), lost weight successfully, and is considering pharmacotherapy to extend and maintain that loss. This is a narrower clinical scenario than the general "patient with obesity seeking treatment" that SURMOUNT-1 addresses.
The American Gastroenterological Association's 2022 obesity pharmacotherapy guideline recommends considering anti-obesity medications when lifestyle intervention alone is insufficient. SURMOUNT-3 adds nuance: even when lifestyle is sufficient in the short term, pharmacotherapy can compound the benefit.
References
- Wadden TA, et al. Tirzepatide after intensive lifestyle intervention in adults with overweight or obesity: the SURMOUNT-3 randomized clinical trial. Nat Med. 2023;29(11):2909-2918.
- FDA. Zepbound (tirzepatide) prescribing information. 2023.
- FDA. Developing Products for Weight Management (Revised). Guidance for Industry. 2007.
- Aronne LJ, et al. Continued treatment with tirzepatide for maintenance of weight reduction in adults with obesity: the SURMOUNT-4 randomized clinical trial. JAMA. 2024;331(1):38-48.
- Garvey WT, et al. AGA Clinical Practice Guideline on Pharmacological Interventions for Adults with Obesity. Gastroenterology. 2022;163(5):1198-1225.