Inside the SUSTAIN-6 Methodology: What Most Summaries Skip

At a glance

| Field | Detail |
|---|---|
| Trial Name | SUSTAIN-6 |
| N | 3,297 |
| Intervention | Semaglutide 0.5 mg or 1.0 mg SC once weekly |
| Comparator | Volume-matched placebo (0.5 mg or 1.0 mg) |
| Duration | 104 weeks (2 years) |
| Primary Endpoint | 3-point MACE: CV death, non-fatal MI, non-fatal stroke |
| Key Result | HR 0.74 (95% CI 0.58, 0.95); 26% relative risk reduction |
| Primary Source | Marso et al., NEJM 2016 |

Why Regulatory Context Determines Everything About This Design
SUSTAIN-6 was not conceived as a superiority trial. It was designed to satisfy the 2008 FDA guidance requiring cardiovascular safety evidence for new antidiabetic drugs, a requirement that grew out of concerns about rosiglitazone's cardiac signal. Under that guidance, the upper bound of the 95% confidence interval for the hazard ratio had to stay below 1.8 to support approval, and below 1.3 for approval without a further post-marketing outcomes trial. The 1.8 margin is the statistical bar SUSTAIN-6 was engineered to clear.
This regulatory framing matters enormously. The primary analysis was a non-inferiority test. Superiority was a secondary, pre-specified analysis. When Marso et al. reported in the NEJM that semaglutide achieved superiority (HR 0.74, p = 0.02 for superiority), that result was valid and pre-specified, but it sat one step below the primary non-inferiority conclusion in the statistical hierarchy. Many clinical summaries invert that hierarchy and present the superiority result as the headline without that caveat.
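The ordering described above can be sketched as a fixed-sequence check. This is a minimal illustration of the logic as this article describes it, not the trial's actual analysis code; the class and function names are hypothetical, and the confidence-interval comparison stands in for the formal tests.

```python
from dataclasses import dataclass


@dataclass
class HazardRatioResult:
    hr: float        # point estimate
    ci_lower: float  # lower bound of the two-sided 95% CI
    ci_upper: float  # upper bound of the two-sided 95% CI


def fixed_sequence_conclusions(result, ni_margin=1.8):
    """Test non-inferiority first; test superiority only if it passes."""
    conclusions = []
    # Step 1: non-inferiority holds when the whole CI sits below the margin.
    if result.ci_upper >= ni_margin:
        return conclusions  # gate closed: superiority is never formally tested
    conclusions.append("non-inferiority")
    # Step 2: superiority holds when the whole CI sits below 1.0.
    if result.ci_upper < 1.0:
        conclusions.append("superiority")
    return conclusions


sustain6 = HazardRatioResult(hr=0.74, ci_lower=0.58, ci_upper=0.95)
print(fixed_sequence_conclusions(sustain6))
```

The point of the gate is that a superiority claim only exists downstream of the non-inferiority claim, which is why reporting it alone inverts the hierarchy.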
Randomization and Allocation Concealment
Patients were randomized 1:1 to semaglutide or placebo using an interactive voice or web response system, stratified by two factors: current insulin use and cardiovascular risk category (established CV disease vs. CKD stage 3 or above as the sole high-risk qualifier). Stratified randomization is standard for CVOTs because baseline insulin use substantially alters background glucose-lowering intensity, and unbalanced distribution across arms could confound the glucose-lowering contribution to any observed outcome difference.
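Stratified 1:1 allocation of this kind is commonly implemented with permuted blocks held separately for each stratum. The sketch below assumes that approach; the block size, seed, stratum labels, and class name are all hypothetical, and nothing here is taken from the trial's actual randomization system.

```python
import random
from collections import defaultdict
from itertools import product

ARMS = ("semaglutide", "placebo")
BLOCK_SIZE = 4  # hypothetical; the block size is not stated in the text


class StratifiedBlockRandomizer:
    """Permuted-block 1:1 randomization kept separately for each stratum."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._pending = defaultdict(list)  # stratum -> unused slots in current block

    def assign(self, on_insulin, risk_category):
        stratum = (on_insulin, risk_category)
        if not self._pending[stratum]:
            # Start a new balanced block for this stratum and shuffle its order.
            block = list(ARMS) * (BLOCK_SIZE // len(ARMS))
            self._rng.shuffle(block)
            self._pending[stratum] = block
        return self._pending[stratum].pop()


randomizer = StratifiedBlockRandomizer(seed=2016)
counts = defaultdict(lambda: defaultdict(int))
for on_insulin, risk in product((True, False), ("established_cvd", "ckd_only")):
    for _ in range(8):  # two complete blocks per stratum
        counts[(on_insulin, risk)][randomizer.assign(on_insulin, risk)] += 1

for stratum, arm_counts in counts.items():
    print(stratum, dict(arm_counts))
```

Because each stratum draws from its own blocks, insulin users and non-users end up evenly split between arms regardless of how many of each a site enrolls.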
The allocation was concealed at the site level. Investigators could not access the randomization sequence before enrollment, which is appropriate and reduces selection bias. What the trial could not conceal, however, was the glycemic and weight signal. At 104 weeks, semaglutide 1.0 mg reduced HbA1c by roughly 1.1 percentage points more than placebo, and body weight fell by about 4.5 kg more. Any site clinician familiar with GLP-1 receptor agonist pharmacology could reasonably infer treatment assignment from metabolic markers. That inference risk is an acknowledged limitation of all GLP-1 CVOTs.
Blinding: What "Double-Blind" Actually Meant Here
The trial used volume-matched placebo injections, meaning the injection volume was identical between semaglutide and placebo syringes across both dose levels. Patients, site staff, and outcome adjudicators were blinded to assignment. The adjudication committee reviewed outcomes without knowledge of treatment allocation, which protects the integrity of the primary endpoint.
The practical limitation is the GI side-effect profile. Nausea and vomiting, which Marso et al. documented at rates of 20% and 5% for semaglutide 1.0 mg vs. 5% and 2% for placebo, are a clinical fingerprint of GLP-1 receptor agonism. A patient or nurse who observed these symptoms could reasonably suspect active drug. This form of unblinding does not typically invalidate a hard-endpoint trial where outcomes are adjudicated independently, but it should temper confidence in secondary endpoints like patient-reported outcomes or symptom scales.
Inclusion and Exclusion Criteria: A High-Risk Population by Design
Enrollment required type 2 diabetes plus one of two risk categories. The first was established cardiovascular disease (prior MI, stroke, or peripheral arterial disease). The second was CKD stage 3 or above in patients aged 60 or older without prior CV events. The minimum age was 50 for the CV disease group and 60 for the CKD group.
Several practical consequences follow from these criteria. First, the trial population was older and sicker than typical early-stage T2D cohorts. Mean baseline HbA1c was approximately 8.7%, mean duration of diabetes was 14 years, and roughly 83% had established cardiovascular disease at baseline. Second, background therapy was highly heterogeneous: metformin, sulfonylureas, insulin, and SGLT2 inhibitors were all permitted and present in meaningful proportions. That heterogeneity makes isolating semaglutide's independent contribution difficult.
Third, and critically for generalization, patients with an acute coronary syndrome within the previous 90 days or a planned coronary revascularization were excluded. The population that generates the most MACE in clinical practice, those recently destabilized, was systematically absent. This design feature is shared by most GLP-1 CVOTs (ELIXA, which enrolled patients after a recent acute coronary syndrome, is the exception) and limits extrapolation to patients in the acute or peri-procedural phase.
The Primary Endpoint: How MACE Was Defined and Adjudicated
The primary endpoint was time to first occurrence of a 3-point MACE composite: cardiovascular death, non-fatal myocardial infarction, or non-fatal stroke. This is now the near-universal CVOT composite following FDA guidance and allows cross-trial comparisons with LEADER (liraglutide), EMPA-REG OUTCOME (empagliflozin), and DECLARE-TIMI 58 (dapagliflozin). A blinded, independent clinical events committee adjudicated all suspected events using pre-specified criteria, and those criteria were set before any unblinding.
The choice of a time-to-first-event analysis rather than a total event count is worth noting. Once a patient had a first MACE, any second or third event was invisible to the primary analysis. A recurrent-events analysis would capture more of the total disease burden, but such analyses were not standard in CVOTs at the time of design. The 2020 ACC Expert Consensus Decision Pathway on novel therapies for cardiovascular risk reduction in type 2 diabetes has since recommended interpreting CVOTs in the context of total event burden, not just first events.
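The gap between the two ways of counting is easy to see on a toy dataset. The patients and event weeks below are invented purely for illustration and are not SUSTAIN-6 data; the function names are hypothetical.

```python
# Hypothetical follow-up data: week of each adjudicated MACE per patient.
mace_weeks_by_patient = {
    "P01": [],            # event-free through follow-up
    "P02": [30],          # one event
    "P03": [12, 58, 97],  # three events; only week 12 counts as "first"
    "P04": [80, 101],     # two events; only week 80 counts as "first"
}


def first_event_count(events_by_patient):
    """Patients with at least one event (what time-to-first-event sees)."""
    return sum(1 for weeks in events_by_patient.values() if weeks)


def total_event_count(events_by_patient):
    """Every event, including recurrences (what a recurrent-events analysis sees)."""
    return sum(len(weeks) for weeks in events_by_patient.values())


first = first_event_count(mace_weeks_by_patient)
total = total_event_count(mace_weeks_by_patient)
print(f"first events: {first}, total events: {total}, not counted: {total - first}")
```

In this toy cohort half of all events never enter the primary analysis, which is the burden a recurrent-events approach is meant to recover.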
The Comparator Choice: What Placebo Really Means in Practice
The comparator was placebo, not an active glucose-lowering agent. This is ethically permissible and scientifically cleaner for isolating the drug's effect, but it introduces glucose-control asymmetry over a two-year period. Placebo patients required more frequent background-therapy intensification to maintain HbA1c within acceptable clinical bounds, which the protocol permitted. If those intensifications (often insulin additions) carried their own cardiovascular risks or benefits, the net treatment effect could be confounded.
Investigators tracked background-therapy changes and reported them as safety data. Insulin use increased more in the placebo group, and sulfonylurea use was similar across arms. This suggests most of the HbA1c gap reflects genuine semaglutide pharmacodynamics rather than background-therapy suppression in the active arm, but the asymmetry remains a recognized confound.
Statistical Architecture: Non-Inferiority First, Superiority Second

| Analysis | Threshold | Result | Interpretation |
|---|---|---|---|
| Non-inferiority (primary) | Upper 95% CI < 1.8 | Upper CI = 0.95 | Confirmed non-inferiority |
| Non-inferiority (approval bar) | Upper 95% CI < 1.3 | Upper CI = 0.95 | Well within bound |
| Superiority (pre-specified secondary) | HR < 1.0, p < 0.05 | HR 0.74, p = 0.02 | Confirmed superiority |

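
The p = 0.02 in the table can be sanity-checked from the hazard ratio and its confidence interval alone. The sketch below assumes the interval is a Wald interval, symmetric on the log scale; that is a common convention but an assumption here, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def p_value_from_hr_ci(hr, ci_lower, ci_upper, level=0.95):
    """Approximate two-sided p-value for HR = 1 from a reported CI.

    Assumes the CI is symmetric around log(HR), as in a Wald interval.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se_log_hr = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)
    z = math.log(hr) / se_log_hr
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return se_log_hr, z, p_two_sided


se, z, p = p_value_from_hr_ci(0.74, 0.58, 0.95)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Under that assumption the back-calculated p-value comes out a little under 0.02, in line with the reported figure once rounded to two decimals.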
The trial used a hierarchical testing procedure, which controls the familywise error rate. Superiority was only formally tested after non-inferiority was established, protecting against inflated type I error. The power calculation assumed a 5% annual MACE rate in the placebo group and targeted at least 122 primary events for the non-inferiority analysis. The trial accumulated 254 primary events, giving it substantially more power than originally required, though this also reflects an unexpectedly high event rate in a relatively short trial.
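The 122-event target can be reproduced with the Schoenfeld approximation for a 1:1 trial. The inputs below (one-sided alpha of 0.025, 90% power, a true hazard ratio of 1.0) are assumptions chosen for illustration rather than values taken from the text, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def events_for_noninferiority(margin, alpha_one_sided=0.025, power=0.90, true_hr=1.0):
    """Schoenfeld approximation: events needed to rule out `margin` with 1:1 allocation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha_one_sided)
    z_beta = NormalDist().inv_cdf(power)
    effect = math.log(margin) - math.log(true_hr)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / effect ** 2)


def power_for_events(events, margin, alpha_one_sided=0.025, true_hr=1.0):
    """Approximate power to rule out `margin` given an observed number of events."""
    z_alpha = NormalDist().inv_cdf(1 - alpha_one_sided)
    effect = math.log(margin) - math.log(true_hr)
    return NormalDist().cdf(effect * math.sqrt(events) / 2 - z_alpha)


required = events_for_noninferiority(1.8)
print(required)                                # 122
print(f"{power_for_events(254, 1.8):.3f}")     # power with the 254 observed events
```

With those assumed inputs the formula returns 122 events, and 254 events push the approximate power against the 1.8 margin above 99%, which is the sense in which the trial was overpowered for its primary question.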
One underappreciated feature is that semaglutide was tested at two doses (0.5 mg and 1.0 mg), and both doses were pooled for the primary analysis. The 1.0 mg arm showed a more pronounced point estimate for MACE reduction (HR approximately 0.68) compared with the 0.5 mg arm (HR approximately 0.82), though the interaction test was not statistically significant. This pooling is reasonable given the regulatory purpose, but it means the approved 1.0 mg label indication rests partly on evidence from a lower dose that may not perform identically.
What the Trial Did Not Measure: Gaps That Matter Clinically
SUSTAIN-6 ran for two years. The biological mechanisms proposed for GLP-1 cardiovascular benefit, including anti-atherosclerotic plaque effects and endothelial function improvements, are generally thought to operate over years to decades. The LEADER trial with liraglutide ran 3.8 years and showed a similar but slightly attenuated hazard ratio of 0.87, suggesting longer exposure may modulate effect size. Two years is enough to see a signal, particularly for stroke, but likely insufficient to observe the full trajectory of benefit.
Heart failure hospitalization was a secondary endpoint in SUSTAIN-6, and it showed a non-significant numerical excess with semaglutide (HR 1.11; 95% CI 0.77, 1.61). This is the opposite direction from what SGLT2 inhibitors show and from what subsequent semaglutide data in the SELECT trial suggested. Given the width of the confidence interval, the heart failure signal in SUSTAIN-6 should not be over-interpreted in either direction, but it does underscore that GLP-1 receptor agonists and SGLT2 inhibitors appear to address different cardiovascular phenotypes.
Renal outcomes were included as secondary endpoints, and semaglutide showed a significant reduction in new or worsening nephropathy (HR 0.64; 95% CI 0.46, 0.88). This was an early signal later explored in dedicated renal outcome trials and is now reflected in ADA Standards of Care recommendations for GLP-1 receptor agonist use in CKD.
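A simple rule for reading these secondary results is to look at where the whole confidence interval sits relative to 1.0, not just the point estimate. The helper below is a hypothetical illustration of that rule using the two hazard ratios quoted in this section; it ignores multiplicity, which matters for secondary endpoints.

```python
def read_hazard_ratio(hr, ci_lower, ci_upper):
    """Classify a hazard ratio (active vs. placebo) by where its 95% CI sits."""
    if ci_upper < 1.0:
        return "significantly lower risk on active drug"
    if ci_lower > 1.0:
        return "significantly higher risk on active drug"
    # The CI crosses 1.0, so only the direction of the point estimate remains.
    if hr > 1.0:
        lean = "numerically higher risk on active drug"
    elif hr < 1.0:
        lean = "numerically lower risk on active drug"
    else:
        lean = "no difference"
    return f"inconclusive (point estimate: {lean})"


secondary_endpoints = {
    "heart failure hospitalization": (1.11, 0.77, 1.61),
    "new or worsening nephropathy": (0.64, 0.46, 0.88),
}

for name, (hr, lower, upper) in secondary_endpoints.items():
    print(f"{name}: HR {hr} -> {read_hazard_ratio(hr, lower, upper)}")
```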
Retinopathy: A Safety Finding That Deserves More Attention
One of the most clinically consequential findings from SUSTAIN-6 was an increase in diabetic retinopathy complications in the semaglutide group (HR 1.76; 95% CI 1.11, 2.78). This was a pre-specified secondary safety endpoint, and the signal was statistically significant. The proposed mechanism is rapid HbA1c reduction triggering transient worsening of retinopathy, a phenomenon previously described with insulin intensification and known as "early worsening." Patients with pre-existing proliferative retinopathy had the highest relative risk.
This finding influenced the Ozempic prescribing label, which includes a warning about retinopathy complications, and it remains a factor in ophthalmologic monitoring recommendations for patients initiating semaglutide with advanced baseline retinopathy.
References
- Marso SP, Bain SC, Consoli A, et al. Semaglutide and Cardiovascular Outcomes in Patients with Type 2 Diabetes. N Engl J Med. 2016;375(19):1834-1844. https://pubmed.ncbi.nlm.nih.gov/27633186/
- Marso SP, Daniels GH, Brown-Frandsen K, et al. Liraglutide and Cardiovascular Outcomes in Type 2 Diabetes (LEADER). N Engl J Med. 2016;375(4):311-322. https://pubmed.ncbi.nlm.nih.gov/27295427/
- Lincoff AM, Brown-Frandsen K, Colhoun HM, et al. Semaglutide and Cardiovascular Outcomes in Obesity without Diabetes (SELECT). N Engl J Med. 2023;389(24):2221-2232. https://pubmed.ncbi.nlm.nih.gov/37952131/
- FDA. Ozempic (semaglutide) Prescribing Information. U.S. Food and Drug Administration. https://www.accessdata.fda.gov/drugsatfda_docs/label/2017/209637lbl.pdf
- American Diabetes Association. Standards of Medical Care in Diabetes. Diabetes Care. 2023;46(Suppl 1). https://pubmed.ncbi.nlm.nih.gov/36507631/
- Das SR, Everett BM, Birtcher KK, et al. 2020 Expert Consensus Decision Pathway on Novel Therapies for Cardiovascular Risk Reduction in Patients With Type 2 Diabetes. J Am Coll Cardiol. 2020;76(9):1117-1145. https://www.ahajournals.org/doi/10.1161/CIR.0000000000000938