Honest Criticisms and Limitations of the SURMOUNT-2 Trial

At a glance
| Parameter | Detail |
|---|---|
| N | 938 |
| Intervention | Tirzepatide 10 mg or 15 mg subcutaneous weekly |
| Comparator | Placebo (subcutaneous weekly) |
| Duration | 72 weeks |
| Primary endpoint | Percent change in body weight from baseline |
| Key result | −12.8% (10 mg) and −14.7% (15 mg) vs −3.2% (placebo) by treatment-policy estimand; −13.4% and −15.7% vs −3.3% by efficacy estimand |
Why a Limitations Page Exists
Most coverage of SURMOUNT-2 stops at the headline numbers. A 15.7% body weight reduction in people with type 2 diabetes is clinically impressive, and the trial was well-powered and properly randomized. None of that is in dispute. What is worth examining, though, is the distance between a controlled trial environment and the population that will actually receive tirzepatide prescriptions. This page catalogs the specific gaps.
Enrollment Biases and Who Was Excluded
The trial enrolled adults aged 18 and older with a BMI of 27 kg/m² or greater and an HbA1c between 7.0% and 10.0%. That HbA1c window is worth scrutinizing. Patients with poorly controlled diabetes (HbA1c above 10%) were excluded. These are precisely the patients who may benefit most from dual glucose and weight management, yet they were left out of the dataset.
Additional exclusion criteria narrowed the pool further. Participants could not be taking insulin or SGLT2 inhibitors at screening. Given that current ADA guidelines recommend SGLT2 inhibitors for patients with T2D and cardiovascular or renal comorbidities, this exclusion removes a substantial and clinically relevant subset of the T2D population. The trial also excluded anyone with a history of pancreatitis, medullary thyroid carcinoma, or MEN2 syndrome. These exclusions are consistent with GLP-1 receptor agonist labeling, but they further narrow the population to which the results can be generalized.
Baseline participant characteristics reveal another skew. Mean baseline BMI was approximately 36 kg/m², and mean diabetes duration was roughly 8.5 years. Patients with very long-standing T2D (20+ years), who often have more beta-cell exhaustion and insulin dependence, are underrepresented. The trial population therefore likely had relatively preserved beta-cell function, which may have amplified the glycemic and weight responses observed.
Racial and Geographic Diversity Gaps
SURMOUNT-2 enrolled participants across 14 countries, but the racial breakdown published in the supplementary appendix shows roughly 82% White, 4% Black or African American, and 8% Asian participants. In the United States, Black adults carry a disproportionate burden of both obesity and T2D. A 4% representation rate makes it difficult to draw firm conclusions about efficacy or safety in this group.
Hispanic/Latino participants were better represented at roughly 45% of the overall cohort, largely because of enrollment sites in Latin America. Still, the underrepresentation of Black participants is a recurring problem in obesity pharmacotherapy trials and was not solved here.
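To make the 4% figure concrete, the percentages above can be turned into approximate head counts. The sketch below is only back-of-envelope arithmetic: it applies the rounded percentages quoted on this page to N = 938 and assumes, for illustration, an even split across three arms (the allocation ratio is not stated above), so the outputs are rough estimates rather than published counts.

```python
# Rough head counts implied by the rounded percentages quoted on this page.
N_TOTAL = 938
N_ARMS = 3  # assumption: equal allocation to 10 mg, 15 mg, and placebo

reported_share = {
    "White": 0.82,
    "Black or African American": 0.04,
    "Asian": 0.08,
}

def approx_counts(share, n_total=N_TOTAL, n_arms=N_ARMS):
    """Return (approximate total count, approximate count per arm)."""
    total = share * n_total
    return round(total), round(total / n_arms, 1)

for group, share in reported_share.items():
    total, per_arm = approx_counts(share)
    print(f"{group}: ~{total} participants, ~{per_arm} per arm")
```

Under these assumptions the Black or African American subgroup comes to roughly 38 people, or about a dozen per arm, which is too few to estimate a subgroup-specific treatment effect with any precision.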
The HealthRX Generalizability Stress Test
To evaluate how well SURMOUNT-2 results map onto real-world patients, we apply a five-question framework to every trial in our library:
| Question | SURMOUNT-2 Assessment |
|---|---|
| 1. Does the enrolled population match the prescribing population? | Partially. Excludes insulin users, SGLT2i users, and HbA1c >10%, which together represent a large share of T2D patients seen in primary care. |
| 2. Is the comparator realistic? | No active comparator. Placebo injection does not reflect a treatment decision between tirzepatide and semaglutide or other GLP-1 RAs. |
| 3. Does the follow-up capture durable outcomes? | 72 weeks is adequate for weight trajectory but insufficient for cardiovascular events, cancer screening, or long-term weight regain. |
| 4. Are concomitant medications representative? | Partially. Metformin was allowed, but insulin and SGLT2i were excluded. Many real patients are on combination therapy. |
| 5. Does the safety database cover rare events? | No. N=938 is underpowered for signals like pancreatitis, thyroid neoplasia, or gallbladder events at population-level frequency. |
This framework is not a pass/fail test. It identifies where clinicians should apply extra caution when extrapolating trial data to their specific patient.
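The answer to question 5 can be illustrated with a small calculation. Assuming events occur independently with a fixed per-participant risk p, the chance of seeing at least one event among n exposed participants is 1 - (1 - p)^n, and the "rule of three" gives an approximate 95% upper bound of 3/n on the true risk when zero events are observed. The exposed count of 625 and the event risks below are illustrative assumptions, not figures from the trial.

```python
def prob_at_least_one(risk, n_exposed):
    """Probability of observing >= 1 event, assuming independent events."""
    return 1.0 - (1.0 - risk) ** n_exposed

def rule_of_three_upper_bound(n_exposed):
    """Approximate 95% upper bound on risk when zero events are observed."""
    return 3.0 / n_exposed

N_EXPOSED = 625  # assumption: about two-thirds of 938 received tirzepatide

for risk in (1 / 1_000, 1 / 10_000):  # hypothetical per-participant risks
    p = prob_at_least_one(risk, N_EXPOSED)
    print(f"risk {risk:.4%}: P(at least one event) = {p:.1%}")

bound = rule_of_three_upper_bound(N_EXPOSED)
print(f"zero observed events would still allow a true risk up to ~{bound:.2%}")
```

In this sketch, an adverse event with a true risk of 1 in 10,000 would most likely not appear at all in a trial of this size, so its absence from the safety tables says little about whether it occurs.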
Statistical Design: What the Dual Estimand Approach Obscures
SURMOUNT-2 reported two estimands. The "treatment-policy" estimand includes all randomized participants regardless of adherence or rescue medication use. The "efficacy" estimand, sometimes called the "trial-product" estimand, censors data after treatment discontinuation or initiation of rescue therapy. The headline 15.7% weight loss at 15 mg comes from the efficacy estimand.
This dual-estimand approach follows ICH E9(R1) guidelines and is methodologically sound. The concern is presentational. Media coverage, promotional materials, and even some clinical discussions default to the efficacy estimand figure, which is larger by construction because it excludes data collected after participants stopped treatment or started rescue therapy. The treatment-policy estimate of 14.7% for the 15 mg dose is still substantial, but the gap between the two numbers reflects the reality that not everyone tolerates or stays on the drug.
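The size of that gap can be worked out from the figures in the At a glance table. The sketch below subtracts the placebo change from each tirzepatide arm under both estimands. It is simple arithmetic on the reported arm-level means, not the model-based treatment differences from the publication, and the variable names are illustrative.

```python
# Percent change in body weight at 72 weeks, from the At a glance table.
results = {
    "treatment-policy": {"10 mg": -12.8, "15 mg": -14.7, "placebo": -3.2},
    "efficacy": {"10 mg": -13.4, "15 mg": -15.7, "placebo": -3.3},
}

def placebo_adjusted(arms):
    """Subtract the placebo change from each active arm (percentage points)."""
    return {
        dose: round(change - arms["placebo"], 1)
        for dose, change in arms.items()
        if dose != "placebo"
    }

adjusted = {name: placebo_adjusted(arms) for name, arms in results.items()}

for name, diffs in adjusted.items():
    print(name, diffs)

# Gap between estimands for the 15 mg arm, before placebo adjustment
gap_15mg = round(
    results["efficacy"]["15 mg"] - results["treatment-policy"]["15 mg"], 1
)
print("15 mg gap between estimands:", gap_15mg)
```

On these numbers the 15 mg arm gives up about one percentage point when moving from the efficacy to the treatment-policy estimand, and the placebo-adjusted effect is roughly 11.5 rather than 12.4 percentage points.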
Discontinuation rates matter here. In the SURMOUNT-2 publication, approximately 14% of participants in the tirzepatide 15 mg group discontinued treatment, compared with about 12% in the placebo group. Gastrointestinal adverse events (nausea, diarrhea, vomiting) were the most common reasons for discontinuation in the active arms. In practice, discontinuation rates for GLP-1 RAs and related incretin therapies tend to be higher outside of clinical trials, where patients lack the structured follow-up, dose titration support, and motivation that trial participation provides.
The Missing Active Comparator Problem
SURMOUNT-2 compared tirzepatide against placebo. At the time of trial design, this was a reasonable regulatory strategy. For clinical decision-making in 2026, it leaves a critical question unanswered: how does tirzepatide compare to semaglutide 2.4 mg (Wegovy) or other approved anti-obesity medications in the T2D population?
The SURMOUNT-5 trial later provided a head-to-head comparison with semaglutide 2.4 mg, but that study enrolled participants without T2D. For prescribers managing patients who have both obesity and T2D, direct comparative data remains thin. Cross-trial comparisons between SURMOUNT-2 and STEP-2 (semaglutide in T2D) are tempting but methodologically unreliable due to differences in baseline characteristics, concomitant medications, and endpoint definitions.
Follow-Up Duration and Weight Regain
Seventy-two weeks is long enough to observe near-maximal weight loss on incretin-based therapies such as GLP-1 RAs and tirzepatide, but it tells us nothing about what happens after treatment stops. Data from the STEP-1 extension study showed that participants regained approximately two-thirds of lost weight within one year of semaglutide discontinuation. SURMOUNT-2 did not include a post-treatment observation period.
This omission matters because tirzepatide, like semaglutide, is positioned as a chronic therapy. Patients and payers need data on durability, and 72-week on-treatment results do not answer whether benefits persist, plateau, or reverse over 3, 5, or 10 years. The cardiovascular outcome trial for tirzepatide (SURPASS-CVOT) will eventually provide longer follow-up, but its primary endpoint is MACE, not sustained weight loss.
Conflict of Interest and Funding Considerations
SURMOUNT-2 was funded by Eli Lilly and Company. Lilly employees participated in study design, data collection, data analysis, data interpretation, and manuscript writing. Multiple academic authors reported consulting fees, advisory board participation, or research grants from Lilly and competing manufacturers.
This does not invalidate the data. Industry-funded trials follow the same GCP standards and regulatory oversight as publicly funded research. However, several features of industry-funded obesity trials deserve scrutiny:
- Titration schedules were optimized during development to minimize GI side effects and maximize the proportion of participants reaching target doses. Real-world prescribing may not replicate this careful escalation.
- Lifestyle counseling was standardized across all arms (500 kcal/day deficit, 150 min/week physical activity). The drug effect is measured on top of this structured behavioral intervention, which many patients will not receive in routine care.
- Publication timing aligned with the FDA approval of tirzepatide for chronic weight management (marketed as Zepbound), a sequence that is standard but can influence the framing and emphasis in the manuscript.
What Post-Publication Commentary Identified
Several letters to the editor and invited commentaries published after the original SURMOUNT-2 paper highlighted additional concerns:
- Lean mass loss. Body composition was not a primary or secondary endpoint. In a population already at risk for sarcopenic obesity, the proportion of weight lost as lean mass versus fat mass has clinical significance for functional outcomes, fall risk, and metabolic rate. Subsequent analyses of tirzepatide and body composition data remain limited.
- Gallbladder events. Cholelithiasis and cholecystitis occurred more frequently in the tirzepatide arms. The trial was not powered to detect a statistically significant difference, but the signal is consistent with the known association between rapid weight loss and gallstone formation.
- Applicability to older adults. Mean age in SURMOUNT-2 was approximately 54 years. Adults over 65 are underrepresented, and the risk-benefit calculus for aggressive weight loss is different in older populations where sarcopenia, bone density loss, and frailty are competing concerns.
What SURMOUNT-2 Does and Does Not Prove
The trial proves that tirzepatide produces clinically significant weight loss in adults with T2D and overweight or obesity, with concurrent HbA1c improvements. It does not prove that tirzepatide is the best available treatment for this population, that the weight loss is durable beyond 72 weeks, or that the results apply equally to patients on insulin, patients with long-standing T2D, Black patients, or older adults.
These are not failures of the trial. They are boundaries of what a single Phase 3 RCT can demonstrate. Clinicians should use SURMOUNT-2 as one input alongside cardiovascular outcome data, real-world evidence, insurance formulary considerations, and individual patient factors.
References
- Garvey WT, Frias JP, Jastreboff AM, et al. Tirzepatide once weekly for the treatment of obesity in people with type 2 diabetes (SURMOUNT-2): a double-blind, randomised, multicentre, placebo-controlled, phase 3 trial. Lancet. 2023;402(10402):613-626.
- FDA. Zepbound (tirzepatide) prescribing information. 2023.
- Wilding JPH, Batterham RL, Davies M, et al. Weight regain and cardiometabolic effects after withdrawal of semaglutide. Diabetes Obes Metab. 2022;24(8):1553-1564.
- American Diabetes Association Professional Practice Committee. Standards of Care in Diabetes, 2023. Diabetes Care. 2023;46(Suppl 1).
- Aronne LJ, Horn DB, le Roux CW, et al. Tirzepatide as compared with semaglutide for the treatment of obesity (SURMOUNT-5). N Engl J Med. 2025.