Inside the SURPASS-3 Methodology: What Most Summaries Skip

At a glance

| Field | Detail |
|---|---|
| N | 1,444 randomized |
| Intervention | Tirzepatide 5 mg, 10 mg, or 15 mg SC once weekly |
| Comparator | Insulin degludec, titrated to fasting glucose <90 mg/dL |
| Duration | 52 weeks |
| Primary endpoint | Change from baseline in HbA1c at week 52 |
| Key result | All three tirzepatide doses were superior to insulin degludec for A1C reduction and body weight change |

Why the Comparator Choice Matters More Than You Think
Most summaries of SURPASS-3 state that tirzepatide beat insulin degludec. That is true. But the clinical meaning of that comparison depends entirely on how well the comparator arm was optimized.
Insulin degludec was titrated to a fasting plasma glucose target of <90 mg/dL, following a treat-to-target algorithm. This is aggressive by real-world standards. Many clinicians titrate basal insulin conservatively, leaving patients well above goal for months. The trial's titration protocol actually favored the insulin arm relative to typical practice, which strengthens the comparison for tirzepatide.
Still, the FDA label for tirzepatide notes that the mean insulin dose at week 52 was approximately 49 units per day. Whether that represents full optimization is debatable. Some endocrinologists argue that doses above 0.5 U/kg would have been a fairer test. The SURPASS-3 protocol did not cap insulin dosing, but the titration algorithm's weekly adjustments may have been too conservative for some patients.
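To make the 0.5 U/kg argument concrete, the sketch below converts the quoted mean dose of 49 units per day into units per kilogram. The body weights are hypothetical illustrations, not trial data, and the function names are invented for this example.

```python
MEAN_DEGLUDEC_UNITS_PER_DAY = 49  # week-52 mean dose quoted in the text
THRESHOLD_U_PER_KG = 0.5          # threshold some endocrinologists cite


def units_per_kg(units_per_day, weight_kg):
    """Weight-adjusted daily basal insulin dose."""
    return units_per_day / weight_kg


def weight_at_threshold(units_per_day, threshold=THRESHOLD_U_PER_KG):
    """Body weight (kg) at which the given dose equals the threshold."""
    return units_per_day / threshold


if __name__ == "__main__":
    print("Break-even weight (kg):", weight_at_threshold(MEAN_DEGLUDEC_UNITS_PER_DAY))
    for weight in (80, 98, 110):  # hypothetical body weights in kg
        dose = units_per_kg(MEAN_DEGLUDEC_UNITS_PER_DAY, weight)
        print(weight, "kg:", round(dose, 2), "U/kg")
```

At 49 units per day, the break-even weight is 98 kg: a lighter patient on the mean dose is already above 0.5 U/kg, while a heavier one is below it. Whether the mean dose counts as "optimized" therefore depends on the weight distribution of the insulin arm, which this article does not report.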
The choice of degludec rather than glargine U-100 also carries implications. Degludec has a longer half-life and lower nocturnal hypoglycemia rates than glargine U-100, as established in the SWITCH trials. Using a best-in-class basal insulin was a design strength, not a weakness.
Randomization and the Open-Label Problem
SURPASS-3 randomized 1,444 participants in a 1:1:1:1 ratio across the three tirzepatide doses and insulin degludec. Randomization was stratified by country, baseline A1C (<8.5% vs. ≥8.5%), and baseline BMI (<30 vs. ≥30 kg/m²). The two clinical strata were well chosen because baseline A1C and BMI are both strong predictors of treatment response.
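One common way to implement stratified 1:1:1:1 allocation is permuted-block randomization within each stratum. The sketch below is a minimal illustration under that assumption; the block size, the seed, and the function names are hypothetical, and the article does not describe the allocation system the trial actually used.

```python
import random
from collections import defaultdict

ARMS = [
    "tirzepatide 5 mg",
    "tirzepatide 10 mg",
    "tirzepatide 15 mg",
    "insulin degludec",
]


def stratum_of(country, a1c, bmi):
    """Map a participant to a stratum using the cut points quoted in the text."""
    a1c_band = "A1C>=8.5" if a1c >= 8.5 else "A1C<8.5"
    bmi_band = "BMI>=30" if bmi >= 30 else "BMI<30"
    return (country, a1c_band, bmi_band)


def make_allocator(block_multiple=1, seed=0):
    """Return assign(stratum), which draws from shuffled blocks per stratum.

    Each block holds every arm `block_multiple` times, so the 1:1:1:1 ratio
    is restored inside a stratum whenever a block is completed.
    """
    rng = random.Random(seed)
    pending = defaultdict(list)  # stratum -> assignments left in current block

    def assign(stratum):
        if not pending[stratum]:
            block = ARMS * block_multiple
            rng.shuffle(block)
            pending[stratum] = block
        return pending[stratum].pop()

    return assign


if __name__ == "__main__":
    assign = make_allocator(seed=42)
    stratum = stratum_of("US", a1c=8.9, bmi=31.2)
    print([assign(stratum) for _ in range(4)])  # one full block: each arm once
```

Because balance is enforced within each stratum, no arm can end up with a disproportionate share of, say, high-A1C participants with obesity from one country.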
The trial was open-label. Participants and investigators knew which treatment was assigned. This is standard for injectable comparisons where injection frequency, device, and titration differ (weekly pen vs. daily pen with dose adjustments). Blinding would have required a double-dummy design in which every participant takes both a weekly and a daily injection, one active and one placebo, adding burden that regulators and ethics committees often consider disproportionate.
Open-label design introduces known biases. Participants on tirzepatide who experienced nausea may have read it as a sign that the drug was working, which could reinforce adherence. Investigators titrating insulin may have been less aggressive knowing the patient was in the comparator arm. The GRADE trial, which compared four add-on therapies to metformin, faced similar open-label challenges and acknowledged them as a limitation.
For A1C, an objective lab measurement, open-label bias is relatively contained. For patient-reported outcomes and weight (influenced by dietary behavior), the bias risk is higher. SURPASS-3 did not include a blinded central adjudication committee for weight, and no dietary standardization was enforced beyond routine counseling.
Inclusion and Exclusion Criteria: Who Got In
The trial enrolled adults with type 2 diabetes inadequately controlled on metformin alone (≥1,500 mg/day for ≥3 months). Baseline A1C had to fall between 7.0% and 10.5%. The mean baseline A1C was 8.17%, and the mean baseline BMI was 33.5 kg/m².
Several exclusion criteria shaped the population in ways that matter for generalizability. Patients with an eGFR <45 mL/min/1.73 m² were excluded, removing a group that frequently needs insulin in practice. Prior use of any injectable glucose-lowering therapy within the past 3 months was exclusionary. Patients with a history of pancreatitis were also excluded, consistent with GLP-1 receptor agonist labeling.
The requirement for metformin monotherapy at entry is worth noting. Under the 2021 ADA Standards of Care, many patients with an A1C of 8.17% would already be on dual therapy. The trial population therefore represents a specific clinical scenario: patients who should have been intensified earlier but were not. This is common in practice, which actually increases the trial's real-world relevance.
The Dual-Estimand Framework
This is where most summaries fail. SURPASS-3 used two co-primary estimands, aligned with the ICH E9(R1) addendum:
Treatment-policy estimand (called the treatment-regimen estimand in the publication). This analyzed all randomized patients regardless of whether they discontinued study drug or added rescue medication. Missing data were handled by multiple imputation using a "retrieved dropout" approach. This estimand answers: "What happens if a clinician prescribes this drug in routine practice, knowing some patients will stop taking it?"
Trial-product estimand (called the efficacy estimand in the publication). This analyzed data only while participants were on the assigned treatment without rescue medication. It answers: "Among patients who actually stay on therapy, how much benefit can they expect?"
The distinction matters enormously for the weight data. Under the treatment-policy estimand, the 15 mg tirzepatide group lost 9.5 kg compared to a 2.3 kg gain in the insulin arm. Under the trial-product estimand, the 15 mg group lost 11.3 kg versus a 1.9 kg gain. The gap between estimands reflects what happens to patients who discontinue: they tend to regain weight, diluting the treatment-policy estimate.
For A1C, the pattern is similar but less dramatic. The treatment-policy estimate for 15 mg tirzepatide was a reduction of 2.37 percentage points versus 1.34 for degludec. The trial-product estimate showed reductions of 2.46 versus 1.48 percentage points. The narrower gap reflects the fact that patients who discontinue a glucose-lowering drug see A1C rise, but not as variably as weight.
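The arithmetic behind the two estimands can be laid out explicitly. The sketch below uses only the 15 mg and degludec figures quoted in the paragraphs above; the dictionary layout and function names are illustrative choices, not part of the trial's analysis plan.

```python
# Mean change from baseline at week 52, tirzepatide 15 mg vs. insulin degludec,
# exactly as quoted in the text (kg for weight, percentage points for A1C).
QUOTED = {
    ("weight_kg", "treatment-policy"): (-9.5, 2.3),
    ("weight_kg", "trial-product"): (-11.3, 1.9),
    ("a1c_pct_points", "treatment-policy"): (-2.37, -1.34),
    ("a1c_pct_points", "trial-product"): (-2.46, -1.48),
}


def between_group_difference(outcome, estimand):
    """Tirzepatide change minus degludec change (negative favors tirzepatide)."""
    tirzepatide, degludec = QUOTED[(outcome, estimand)]
    return round(tirzepatide - degludec, 2)


def estimand_gap(outcome):
    """Shift in the tirzepatide-arm estimate from treatment-policy to trial-product."""
    policy, _ = QUOTED[(outcome, "treatment-policy")]
    on_treatment, _ = QUOTED[(outcome, "trial-product")]
    return round(on_treatment - policy, 2)


if __name__ == "__main__":
    for outcome in ("weight_kg", "a1c_pct_points"):
        for estimand in ("treatment-policy", "trial-product"):
            print(outcome, estimand, between_group_difference(outcome, estimand))
        print(outcome, "estimand gap in tirzepatide arm:", estimand_gap(outcome))
```

On these quoted figures, the tirzepatide-arm estimate shifts by 1.8 kg for weight but by only 0.09 percentage points for A1C between the two estimands, which is the contrast the preceding paragraphs describe.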
Statistical Hierarchy and Multiplicity Control
The primary objective was superiority of each tirzepatide dose versus degludec for A1C change at week 52. A fixed-sequence testing procedure controlled the family-wise type I error rate at 5%. The sequence started with the highest dose (15 mg), then 10 mg, then 5 mg. Only if the preceding dose achieved significance could the next dose be tested.
All three doses cleared the superiority bar. The estimated treatment differences versus degludec for A1C under the treatment-policy estimand were: 5 mg, −0.59 percentage points (95% CI: −0.73 to −0.45); 10 mg, −0.86 (−1.00 to −0.72); 15 mg, −1.04 (−1.17 to −0.90). Each p-value was <0.001 per the published results.
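As a rough consistency check, a standard error and p-value can be backed out of each quoted interval. The sketch below assumes a symmetric 95% interval built from a normal approximation (half-width = 1.96 × SE); the trial's actual model may differ, so treat the output as a sanity check and not a reproduction of the published analysis.

```python
import math

# (estimate, lower, upper) in percentage points, as quoted in the text.
ETD = {
    "5 mg": (-0.59, -0.73, -0.45),
    "10 mg": (-0.86, -1.00, -0.72),
    "15 mg": (-1.04, -1.17, -0.90),
}


def approx_se(lower, upper, z_crit=1.96):
    """Standard error implied by a symmetric 95% CI (normal approximation)."""
    return (upper - lower) / (2 * z_crit)


def approx_two_sided_p(estimate, lower, upper):
    """Two-sided p-value for H0: difference = 0 under the same approximation."""
    z = estimate / approx_se(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    for dose, (est, lo, hi) in ETD.items():
        se = approx_se(lo, hi)
        p = approx_two_sided_p(est, lo, hi)
        print(dose, "SE ~", round(se, 3), "z ~", round(est / se, 1), "p < 0.001:", p < 0.001)
```

Each interval implies a standard error of roughly 0.07 percentage points, and every estimate sits more than eight standard errors from zero, which is consistent with the reported p-values below 0.001.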
Weight was a key secondary endpoint tested within the same hierarchical framework. Body weight change was tested for superiority only after A1C superiority was confirmed. This ordering reflects the FDA's guidance that glycemic control is the primary regulatory bar for type 2 diabetes drugs, with weight as supportive.
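The gatekeeping logic of a fixed-sequence procedure is simple enough to sketch. In the example below, each hypothesis is tested at the full alpha level, and testing stops at the first failure. The p-values are hypothetical placeholders, and placing the weight hypotheses after all three A1C hypotheses is an assumption made for illustration.

```python
def fixed_sequence_test(ordered_hypotheses, alpha=0.05):
    """Apply a fixed-sequence procedure.

    ordered_hypotheses: list of (label, p_value) in the prespecified order.
    Returns a dict mapping label to 'rejected', 'not rejected', or 'not tested'.
    """
    results = {}
    gate_open = True
    for label, p_value in ordered_hypotheses:
        if not gate_open:
            results[label] = "not tested"  # an earlier hypothesis failed
        elif p_value < alpha:
            results[label] = "rejected"    # superiority shown; move on
        else:
            results[label] = "not rejected"
            gate_open = False              # close the gate for all that follow
    return results


if __name__ == "__main__":
    # Hypothetical p-values, used only to show how the gate behaves.
    sequence = [
        ("A1C, 15 mg", 0.0005),
        ("A1C, 10 mg", 0.0005),
        ("A1C, 5 mg", 0.20),     # a failure here blocks everything below
        ("Weight, 15 mg", 0.0005),
    ]
    for label, outcome in fixed_sequence_test(sequence).items():
        print(label, "->", outcome)
```

The design choice is that no alpha is split across hypotheses: the order itself pays for multiplicity, so a failure early in the sequence forfeits any formal claim on later endpoints regardless of how small their p-values are.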
Discontinuation Rates and What They Signal
The discontinuation rate in the tirzepatide arms ranged from 15% to 19%, compared with 14% in the degludec arm. Gastrointestinal adverse events (nausea, diarrhea, decreased appetite) were the primary driver of tirzepatide discontinuations, particularly during the dose-escalation phase.
The dose-escalation schedule (2.5 mg starting dose, with 2.5 mg increases every four weeks) was designed to mitigate GI side effects. Patients randomized to 15 mg underwent a longer escalation period than those randomized to 5 mg. Even so, nausea occurred in 12% to 24% of tirzepatide-treated patients versus 2% on insulin, according to the trial report.
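The escalation rule stated above (start at 2.5 mg, add 2.5 mg every four weeks) determines when each arm reaches its assigned dose. The sketch below simply applies that rule; the function names are illustrative, and it ignores any protocol allowances for delayed escalation, which this article does not describe.

```python
def escalation_schedule(target_mg, start_mg=2.5, step_mg=2.5, weeks_per_step=4):
    """Return [(start_week, dose_mg), ...] from week 0 until the target dose."""
    if target_mg < start_mg or (target_mg - start_mg) % step_mg != 0:
        raise ValueError("target dose must be reachable in whole steps")
    schedule = []
    dose, week = start_mg, 0
    while dose <= target_mg:
        schedule.append((week, dose))
        dose += step_mg
        week += weeks_per_step
    return schedule


def weeks_to_target(target_mg):
    """Week at which the assigned maintenance dose is first given."""
    return escalation_schedule(target_mg)[-1][0]


if __name__ == "__main__":
    for target in (5, 10, 15):
        print(target, "mg reached at week", weeks_to_target(target))
```

Under this rule the 5 mg arm reaches its dose at week 4, the 10 mg arm at week 12, and the 15 mg arm at week 20, so participants assigned to 15 mg spend 20 of the 52 weeks below their maintenance dose.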
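These rates explain much of the divergence between the two estimands: the more participants stop treatment, the further the treatment-policy estimate drifts from the trial-product estimate. The SURPASS-2 trial, which compared tirzepatide to semaglutide 1 mg, showed a similar pattern of GI-driven discontinuation, suggesting this is a class-level tolerability feature rather than a SURPASS-3 protocol artifact.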
Limitations the Authors Acknowledged
The published trial listed several limitations explicitly. Open-label design was the most prominent. The 52-week duration, while sufficient for regulatory purposes, does not address durability beyond one year. The SURPASS-4 trial extended follow-up to 104 weeks and confirmed persistent A1C and weight effects, though that trial used glargine U-100 as the comparator.
The trial population was predominantly White (approximately 70%) with a mean diabetes duration of 8.4 years. Generalizability to more diverse populations or to patients with longer disease duration and greater beta-cell failure remains uncertain.
There was no SGLT2 inhibitor combination arm, which is increasingly relevant given that current ADA guidelines recommend cardiorenal-protective agents early in therapy. SURPASS-3 answers the question of tirzepatide versus basal insulin, but not whether tirzepatide plus an SGLT2 inhibitor would outperform insulin plus an SGLT2 inhibitor.
The Bottom Line for Clinicians
SURPASS-3's methodology is rigorous but not without trade-offs. The open-label design, aggressive insulin titration protocol, and dual-estimand framework all shape how to read the results. Quoting the headline A1C difference of approximately 1 percentage point without specifying which estimand, or citing weight loss without acknowledging the open-label effect on behavior, misrepresents what the trial actually showed. The data support tirzepatide's superiority over optimally titrated basal insulin, but the strength of that conclusion depends on which patient, which estimand, and which outcome you prioritize.
References
- Ludvik B, Giorgino F, Jódar E, et al. Once-weekly tirzepatide versus once-daily insulin degludec as add-on to metformin in patients with type 2 diabetes (SURPASS-3): a randomised, open-label, parallel-group, phase 3 trial. Lancet. 2021;398(10300):583-598.
- U.S. Food and Drug Administration. Mounjaro (tirzepatide) prescribing information. 2022.
- Del Prato S, Kahn SE, Pavo I, et al. Tirzepatide versus insulin glargine in type 2 diabetes and increased cardiovascular risk (SURPASS-4): a randomised, open-label, parallel-group, multicentre, phase 3 trial. Lancet. 2021;398(10313):1811-1824.
- Frías JP, Davies MJ, Rosenstock J, et al. Tirzepatide versus semaglutide once weekly in patients with type 2 diabetes (SURPASS-2). N Engl J Med. 2021;385(6):503-515.
- GRADE Study Research Group. Glycemia reduction in type 2 diabetes: glycemic outcomes (GRADE). N Engl J Med. 2022;387(12):1063-1074.
- American Diabetes Association Professional Practice Committee. Standards of Care in Diabetes, 2023. Diabetes Care. 2023;46(Suppl 1).