Inside the SURPASS-2 Methodology: What Most Summaries Skip

Why the Comparator Choice Is the First Thing to Interrogate

Every active-controlled trial lives or dies by the comparator it selects. The SURPASS-2 investigators chose semaglutide 1 mg, the highest approved once-weekly dose for type 2 diabetes at the time of trial design. That was a defensible choice. Semaglutide 1 mg was the dominant GLP-1 receptor agonist by market share and guideline endorsement when the trial was powered, and it was backed by a cardiovascular outcomes trial (SUSTAIN-6) showing fewer major adverse cardiovascular events than placebo.

What the comparator was not is semaglutide 2 mg. The 2 mg dose had already appeared in the SUSTAIN FORTE data by the time SURPASS-2 was published, and that trial showed larger A1C and weight reductions with 2 mg than with 1 mg in patients with inadequately controlled type 2 diabetes on metformin. SUSTAIN FORTE reported a treatment difference of −0.23 percentage points in A1C favoring 2 mg, with weight loss approximately 0.9 kg greater. The 2 mg dose received FDA approval in 2022. Had SURPASS-2 used semaglutide 2 mg, the margin separating tirzepatide from semaglutide would almost certainly have been smaller, possibly smaller than the trial's stated non-inferiority margin for some doses. This is not a flaw in the trial as designed, but it is the single largest contextual factor any clinician should hold in mind when reading the results.
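The comparator argument can be made concrete with some deliberately naive arithmetic. The sketch below takes the week-40 A1C changes reported for SURPASS-2 (the same figures listed in the results table later in this article) and shifts the semaglutide arm by the −0.23 point difference from SUSTAIN FORTE. This is an indirect cross-trial adjustment that assumes the two trial populations and estimands are comparable, which has not been established, so the output is illustrative only. All function and variable names are hypothetical.

```python
# Illustrative only: a naive cross-trial adjustment, not a re-analysis.
# Week-40 A1C changes (percentage points) as reported for SURPASS-2.
A1C_CHANGE = {
    "tirzepatide 5 mg": -2.01,
    "tirzepatide 10 mg": -2.24,
    "tirzepatide 15 mg": -2.30,
}
SEMAGLUTIDE_1MG = -1.86

# SUSTAIN FORTE: additional A1C reduction with semaglutide 2 mg versus 1 mg.
FORTE_EXTRA_REDUCTION = -0.23


def difference_vs_comparator(arm_change: float, comparator_change: float) -> float:
    """Between-group difference; negative values favor tirzepatide."""
    return round(arm_change - comparator_change, 2)


def adjusted_differences(extra_reduction: float = FORTE_EXTRA_REDUCTION) -> dict:
    """ASSUMPTION: semaglutide 2 mg would add `extra_reduction` on top of
    the 1 mg result observed in SURPASS-2. No trial has shown this here."""
    hypothetical_2mg = SEMAGLUTIDE_1MG + extra_reduction
    return {
        arm: difference_vs_comparator(change, hypothetical_2mg)
        for arm, change in A1C_CHANGE.items()
    }


if __name__ == "__main__":
    adjusted = adjusted_differences()
    for arm, change in A1C_CHANGE.items():
        observed = difference_vs_comparator(change, SEMAGLUTIDE_1MG)
        print(f"{arm}: observed {observed:+.2f}, vs hypothetical 2 mg {adjusted[arm]:+.2f}")
```

These are raw differences between reported arm means, so they may differ in the last digit from the model-based treatment differences in the paper. Under this crude assumption the 5 mg difference changes sign and the 10 mg and 15 mg differences shrink by roughly half, which is the direction of the argument above rather than a quantitative prediction.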

The FDA label for semaglutide injection lists the 0.5 mg and 1 mg doses as indicated for glycemic control in adults with type 2 diabetes, with cardiovascular risk reduction language added after SUSTAIN-6. That label context matters: the trial used a fully approved, guideline-recommended dose, not a subtherapeutic one.

Randomization and Allocation Concealment

Participants were randomized 1:1:1:1 to tirzepatide 5 mg, tirzepatide 10 mg, tirzepatide 15 mg, or semaglutide 1 mg. Randomization was stratified by country and by baseline A1C, with two strata split at 8.5%. Stratified randomization matters practically: it prevents chance imbalances in the baseline factors most likely to drive the primary endpoint differential.
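How stratification prevents chance imbalance is easier to see in code. The following is a minimal sketch of permuted-block randomization run separately within each stratum, using country and baseline A1C as the stratification factors. The block size, the seeding, the handling of a value of exactly 8.5%, and the class name are all assumptions made for illustration; the trial's actual allocation algorithm is not described at this level of detail.

```python
import random
from collections import defaultdict

ARMS = [
    "tirzepatide 5 mg",
    "tirzepatide 10 mg",
    "tirzepatide 15 mg",
    "semaglutide 1 mg",
]


class StratifiedBlockRandomizer:
    """Permuted-block randomization kept separately for each stratum.

    Hypothetical sketch: block size and seed are illustrative choices,
    not details taken from the SURPASS-2 protocol.
    """

    def __init__(self, arms, repeats_per_block=1, seed=0):
        self.arms = list(arms)
        self.repeats_per_block = repeats_per_block
        self.rng = random.Random(seed)
        # stratum -> assignments still unused in that stratum's current block
        self.pending = defaultdict(list)

    def assign(self, country: str, baseline_a1c: float) -> str:
        # Which side 8.5% itself falls on is an assumption here.
        a1c_stratum = "A1C<8.5" if baseline_a1c < 8.5 else "A1C>=8.5"
        stratum = (country, a1c_stratum)
        if not self.pending[stratum]:
            # Start a new block containing every arm equally often, shuffled.
            block = self.arms * self.repeats_per_block
            self.rng.shuffle(block)
            self.pending[stratum] = block
        return self.pending[stratum].pop()


if __name__ == "__main__":
    randomizer = StratifiedBlockRandomizer(ARMS, seed=42)
    for a1c in (7.9, 8.1, 9.2, 8.0, 7.6):
        print(a1c, randomizer.assign("US", a1c))
```

Because each stratum draws from its own shuffled blocks, the arms stay balanced within every country-by-A1C cell after each completed block, rather than only in the trial as a whole.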

The published protocol and supplementary appendix confirm that an interactive web response system handled allocation, which is the current standard for preventing investigator-level selection bias. Because a central system releases each assignment only after a participant has been enrolled, site staff cannot foresee or steer the next allocation. Allocation concealment of this kind is a separate safeguard from blinding after assignment.

The Open-Label Design and What Was Actually Blinded

SURPASS-2 is sometimes summarized as a double-blind trial, but the published study was open-label with respect to drug: participants and investigators knew whether the assignment was tirzepatide or semaglutide. The authors attributed this to differences in the injection devices and dose-escalation schemes. Only the tirzepatide dose level (5, 10, or 15 mg) was blinded. A double-dummy design, in which every participant injects one active pen and one matching placebo pen each week, could in principle have blinded the drug assignment, but it was not used.

The dose-escalation schedules show why blinding would have been difficult. Tirzepatide was started at 2.5 mg once weekly and increased by 2.5 mg every four weeks until the assigned dose (5, 10, or 15 mg) was reached, so the 15 mg arm did not reach its target until week 20. Semaglutide was started at 0.25 mg and doubled every four weeks, reaching 1 mg at week 8 per the approved label schedule. These escalation curves differ in shape and timing. Open-label assignment leaves room for expectation effects in subjective outcomes, such as reported GI symptoms, and in decisions such as discontinuation; a laboratory-measured primary endpoint like A1C is less susceptible. This is an inherent difficulty of comparing two drugs with different devices and titration schemes, and the SURPASS-2 authors listed the open-label design among the trial's limitations.
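The difference between the two titration schedules can be laid out explicitly. The sketch below assumes tirzepatide starts at 2.5 mg and rises by 2.5 mg every four weeks, and that semaglutide starts at 0.25 mg and doubles every four weeks, as in the approved label titration. The function names are hypothetical, and the schedule ignores dose delays or reductions for tolerability.

```python
def escalation_schedule(start_mg, target_mg, step, weeks_per_step=4):
    """Return (week, dose_mg) pairs from week 0 until the target dose.

    `step` maps the current dose to the next one; doses are capped at target.
    """
    schedule = [(0, start_mg)]
    week, dose = 0, start_mg
    while dose < target_mg:
        week += weeks_per_step
        dose = min(step(dose), target_mg)
        schedule.append((week, dose))
    return schedule


def tirzepatide_schedule(target_mg):
    # 2.5 mg start, +2.5 mg every four weeks until the assigned dose.
    return escalation_schedule(2.5, target_mg, lambda d: d + 2.5)


def semaglutide_schedule(target_mg=1.0):
    # 0.25 mg start, doubled every four weeks (0.25 -> 0.5 -> 1.0).
    return escalation_schedule(0.25, target_mg, lambda d: d * 2)


def week_target_reached(schedule):
    """Week at which the final (target) dose is first given."""
    return schedule[-1][0]


if __name__ == "__main__":
    for target in (5.0, 10.0, 15.0):
        week = week_target_reached(tirzepatide_schedule(target))
        print(f"tirzepatide {target} mg reached at week {week}")
    print("semaglutide 1 mg reached at week", week_target_reached(semaglutide_schedule()))
```

Under these assumptions the semaglutide arm is at full dose from week 8, while the tirzepatide 15 mg arm spends half of the 40-week trial still escalating, which matters both for tolerability comparisons and for how much time each arm spends at its target dose.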

Inclusion and Exclusion Criteria: Who Was Actually Enrolled

The trial enrolled adults aged 18 or older with type 2 diabetes, an A1C of 7.0 to 10.5%, and a body mass index of at least 25 kg/m². Background therapy was metformin alone at a dose of at least 1,500 mg per day. Key exclusions included a recent cardiovascular event (within 90 days), use of any GLP-1 receptor agonist or insulin within 90 days prior to screening, and an estimated GFR below 45 mL/min/1.73 m².

Several exclusion criteria deserve attention because they define the ceiling on generalizability. First, patients with eGFR <45 were excluded. The ADA Standards of Care prioritize GLP-1 receptor agonists in type 2 diabetes with established cardiovascular disease or high cardiovascular risk, a population that frequently carries CKD, so the trial is silent on many of the patients in whom guidelines most favor this drug class. Second, prior GLP-1 use was an exclusion, meaning the trial says nothing about switching from semaglutide to tirzepatide, a question clinicians encounter regularly. Third, the BMI floor of 25 kg/m² excluded a small but real subset of Asian patients with type 2 diabetes who present at lower BMIs.

The resulting trial population had a mean baseline A1C of approximately 8.28% and a mean baseline body weight of roughly 93.6 kg, which is broadly representative of the North American and European type 2 diabetes population on oral therapy but less so for populations with different phenotypes.

The Estimand Framework: Treatment Policy vs. Hypothetical

The estimand choice in SURPASS-2 deserves more attention than most summaries give it. The primary estimand was the treatment-policy estimand, meaning the analysis captured A1C change regardless of whether participants discontinued the study drug or used rescue therapy. This follows the ICH E9(R1) addendum framework, which asks what the effect of the treatment strategy is in a real-world sense, including discontinuations.

The ICH E9(R1) guidance document distinguishes between this and the hypothetical estimand, which would ask what would have happened if everyone had stayed on the assigned drug without rescue therapy. The treatment-policy estimand is more pragmatic and is less likely to overstate drug efficacy by reflecting only adherent patients. The flip side is that it is sensitive to discontinuation: outcomes measured after a participant stops the drug still count toward that arm, so the arm with more early discontinuations for tolerability has its average pulled toward baseline, which attenuates the observed between-group difference.

In SURPASS-2, GI adverse events were more frequent with tirzepatide 15 mg (nausea 22%, diarrhea 17%) than with semaglutide 1 mg (nausea 17%, diarrhea 12%). If patients who discontinued tirzepatide early because of nausea reverted toward their baseline A1C during the follow-up data collection window, the treatment-policy estimand would capture that reversion and thereby reduce the apparent tirzepatide advantage. The fact that all three tirzepatide doses still showed statistically superior A1C reduction under this conservative estimand makes the finding more credible, not less.
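The attenuation argument reduces to a weighted average, which the sketch below works through. Every number in it is hypothetical and chosen only to show the direction of the effect; none is taken from SURPASS-2, and the two-group split into "stayed on drug" and "discontinued" is a simplification of what a real analysis would model.

```python
def treatment_policy_mean(on_treatment_change, post_discontinuation_change,
                          discontinuation_rate):
    """Arm-level mean A1C change when discontinuers stay in the analysis.

    Simplification: each participant either stays on drug for the whole
    trial or discontinues and ends at `post_discontinuation_change`.
    """
    if not 0.0 <= discontinuation_rate <= 1.0:
        raise ValueError("discontinuation_rate must be between 0 and 1")
    stayed = 1.0 - discontinuation_rate
    return (stayed * on_treatment_change
            + discontinuation_rate * post_discontinuation_change)


# HYPOTHETICAL inputs, not trial data: drug A works better on treatment
# but is stopped more often; discontinuers in both arms end near -0.8.
drug_a = dict(on_treatment_change=-2.4, post_discontinuation_change=-0.8,
              discontinuation_rate=0.10)
drug_b = dict(on_treatment_change=-1.9, post_discontinuation_change=-0.8,
              discontinuation_rate=0.05)

# Hypothetical estimand: compare the on-treatment effects only.
hypothetical_diff = drug_a["on_treatment_change"] - drug_b["on_treatment_change"]

# Treatment-policy estimand: compare arm means including discontinuers.
policy_diff = treatment_policy_mean(**drug_a) - treatment_policy_mean(**drug_b)

if __name__ == "__main__":
    print(f"hypothetical difference:     {hypothetical_diff:+.3f}")
    print(f"treatment-policy difference: {policy_diff:+.3f}")
```

With these made-up inputs the treatment-policy difference is smaller in magnitude than the on-treatment difference, which is the sense in which the estimand is conservative for a drug with more tolerability-driven discontinuation.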

Sensitivity analyses were pre-specified using a retrieved-dropout approach (direct measurement at the scheduled visit even after discontinuation) and a hypothetical estimand (the effect had all participants remained on the assigned drug without rescue therapy). The supplementary appendix of the published trial reports that these sensitivity analyses were consistent with the primary result.

Primary Endpoint Definition and Statistical Hierarchy

The primary endpoint was change in HbA1c from baseline to week 40, analyzed using a mixed model for repeated measures (MMRM) with treatment, country, stratification factors, visit, and treatment-by-visit interaction as covariates, and baseline A1C as a continuous covariate. MMRM uses all available data under a missing-at-random assumption rather than requiring complete cases, which is the appropriate choice for longitudinal trials with some dropout.

The statistical testing hierarchy moved from the 15 mg dose to the 10 mg dose to the 5 mg dose, controlling the family-wise error rate. Secondary endpoints were tested in a pre-specified hierarchical order: body weight change, proportion reaching A1C <7%, proportion reaching A1C <5.7%, and weight loss of ≥5%, ≥10%, and ≥15%. This hierarchy matters because it determines which secondary results are protected from type I error inflation: a result counts as confirmatory only if every test ahead of it in the sequence also succeeded. All pre-specified secondary endpoints met statistical significance, which is why the weight loss comparisons can be read as confirmatory rather than exploratory.
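The gatekeeping logic behind such a hierarchy can be shown with a simple fixed-sequence procedure, one common way to implement ordered testing. This is a sketch under stated assumptions: the trial's actual multiplicity procedure may have been more elaborate, the alpha of 0.05 is an illustrative default, and the p-values in the usage example are invented to show what happens when a test in the middle of the sequence fails.

```python
def fixed_sequence_test(ordered_results, alpha=0.05):
    """Test hypotheses in a pre-specified order, each at the full alpha.

    A hypothesis is confirmed only if it and every hypothesis before it
    has p < alpha. After the first failure, everything later is treated
    as not confirmed, whatever its own p-value.
    """
    decisions = {}
    gate_open = True
    for name, p_value in ordered_results:
        gate_open = gate_open and p_value < alpha
        decisions[name] = gate_open
    return decisions


if __name__ == "__main__":
    # HYPOTHETICAL p-values, not taken from SURPASS-2.
    example = [
        ("A1C change, 15 mg", 0.001),
        ("A1C change, 10 mg", 0.001),
        ("A1C change, 5 mg", 0.020),
        ("body weight change", 0.200),  # fails: the gate closes here
        ("A1C <7%", 0.001),             # small p-value, but not confirmed
    ]
    for name, confirmed in fixed_sequence_test(example).items():
        print(f"{name}: {'confirmed' if confirmed else 'not confirmed'}")
```

In the invented example the last endpoint has a very small p-value yet is not confirmed, because an earlier test failed. That is the protection the hierarchy buys, and it is why the position of the weight endpoints in the SURPASS-2 sequence matters for how they are read.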

| Endpoint | Tirzepatide 5 mg | Tirzepatide 10 mg | Tirzepatide 15 mg | Semaglutide 1 mg |
|---|---|---|---|---|
| A1C change (percentage points) | −2.01 | −2.24 | −2.30 | −1.86 |
| Body weight change (kg) | −7.6 | −9.3 | −11.2 | −5.7 |
| A1C <7% (% of participants) | 82 | 86 | 86 | 79 |
| A1C <5.7% (% of participants) | 27 | 39 | 46 | 19 |
| Weight loss ≥5% (% of participants) | 60 | 71 | 79 | 53 |
| Weight loss ≥15% (% of participants) | 15 | 27 | 36 | 8 |

Weight loss of 11.2 kg at the 15 mg dose versus 5.7 kg with semaglutide 1 mg is a 5.5 kg difference, which is clinically meaningful. For context, the SCALE Obesity trial for liraglutide 3 mg showed approximately 8.4 kg weight loss versus 2.8 kg for placebo, and that was an obesity-specific trial using a higher liraglutide dose than the one used for glycemic control. The magnitude of weight loss seen with tirzepatide 15 mg in a glycemic control trial is notable.

Limitations the Authors Named and Some They Did Not

The trial authors acknowledged several limitations directly. The 40-week duration does not capture long-term durability of glycemic control or cardiovascular outcomes. The comparison between tirzepatide and semaglutide was open-label. The semaglutide 1 mg comparator was not the highest dose that was subsequently approved. The trial was not designed or powered for cardiovascular outcomes, and it excluded patients with recent cardiovascular events. Generalizability to patients on insulin background therapy or those with eGFR <45 is untested.

A limitation the authors mentioned but did not quantify is the effect of open-label assignment. Because participants and investigators knew which drug was being given, expectations about a newer agent could have influenced adverse-event reporting, decisions to discontinue, and patient-reported outcomes. The different titration timelines add to this: semaglutide reaches its full dose at week 8, while tirzepatide 15 mg is still escalating until week 20. Whether any of this created systematic bias is unknowable from the published data.

A further gap is the absence of a head-to-head cardiovascular outcomes comparison between tirzepatide and semaglutide. SUSTAIN-6, which tested semaglutide 0.5 mg and 1 mg, showed a 26% relative risk reduction in MACE versus placebo. SURPASS-CVOT subsequently compared tirzepatide with dulaglutide, an active GLP-1 receptor agonist comparator rather than placebo, in high-risk type 2 diabetes and met its noninferiority objective for MACE. That result points in the same direction, but it is not a MACE comparison against semaglutide.

What the Methodology Tells Practitioners

The SURPASS-2 design was well-constructed for its stated question: is tirzepatide superior to semaglutide 1 mg on glycemic control in metformin-treated type 2 diabetes over 40 weeks? The answer is yes, across all three doses, under a conservative estimand, with a protected statistical hierarchy. The weight loss data are valid secondary endpoints, not exploratory observations.

The limits of that answer are equally clear. The comparison is against 1 mg, not 2 mg. The population is metformin-treated, not insulin-treated. The duration is 40 weeks. And the cardiovascular outcome question requires separate evidence. Clinicians choosing between these agents for a specific patient should weigh those constraints against the magnitude of the glycemic and weight benefits shown, and should consult the full FDA prescribing information for tirzepatide for approved indications, dose titration, and contraindications that the trial protocol itself does not cover.

References

  1. Frias JP, Davies MJ, Rosenstock J, et al. Tirzepatide versus semaglutide once weekly in patients with type 2 diabetes. N Engl J Med. 2021;385(6):503-515. https://pubmed.ncbi.nlm.nih.gov/34170647/

  2. Rosenstock J, Cheng A, Ritzel R, et al. More similarities than differences testing insulin glargine 300 units/mL versus insulin degludec 100 units/mL in insulin-naive type 2 diabetes: the randomized head-to-head BRIGHT trial. Diabetes Care. 2018;41(10):2147-2154. https://pubmed.ncbi.nlm.nih.gov/30120124/

  3. Rosenstock J, Friberg Feldt-Rasmussen H, et al. Effect of additional oral semaglutide versus sitagliptin on glycated hemoglobin in adults with type 2 diabetes uncontrolled with metformin alone or with sulfonylurea: the PIONEER 3 randomized clinical trial. JAMA. 2019;321(15):1466-1480. https://pubmed.ncbi.nlm.nih.gov/30951574/

  4. Lingvay I, Deanfield J, Kahn SE, et al. Semaglutide 2 mg versus 1 mg in type 2 diabetes inadequately controlled by 1 mg semaglutide (SUSTAIN FORTE): a double-blind, randomised, phase 3B trial. Lancet Diabetes Endocrinol. 2021;9(9):563-575. https://pubmed.ncbi.nlm.nih.gov/34170651/

  5. Marso SP, Bain SC, Consoli A, et al. Semaglutide and cardiovascular outcomes in patients with type 2 diabetes. N Engl J Med. 2016;375(19):1834-1844. https://pubmed.ncbi.nlm.nih.gov/27633186/

  6. FDA prescribing information for tirzepatide (Mounjaro). Accessed January 2025. https://www.accessdata.fda.gov/drugsatfda_docs/label/2022/215866s000lbl.pdf

  7. FDA prescribing information for semaglutide injection (Ozempic). Accessed January 2025. https://www.accessdata.fda.gov/drugsatfda_docs/label/2021/209637s008lbl.pdf

  8. ICH E9(R1) addendum on estimands and sensitivity analysis in clinical trials. FDA guidance document. https://www.fda.gov/media/119504/download

  9. American Diabetes Association. Standards of Medical Care in Diabetes 2023. Diabetes Care. 2023;46(Suppl 1):S1-S291. https://diabetesjournals.org/care/article/46/Supplement_1/S1/148040/Standards-of-Medical-Care-in-Diabetes-2023

  10. Pi-Sunyer X, Astrup A, Fujioka K, et al. A randomized, controlled trial of 3.0 mg of liraglutide in weight management. N Engl J Med. 2015;373(1):11-22. https://pubmed.ncbi.nlm.nih.gov/25352197/