Honest Criticisms and Limitations of the SUSTAIN-6 Trial

At a glance
| Parameter | Detail |
|---|---|
| N | 3,297 |
| Intervention | Semaglutide 0.5 mg or 1.0 mg subcutaneous, once weekly |
| Comparator | Placebo (added to standard of care) |
| Duration | 104 weeks (median follow-up ~2.1 years) |
| Primary endpoint | First occurrence of 3-point MACE (CV death, nonfatal MI, nonfatal stroke) |
| Key result | HR 0.74 (95% CI 0.58 to 0.95); p <0.001 for noninferiority |

Why a Limitations Page Exists
The SUSTAIN-6 trial changed the trajectory of GLP-1 receptor agonist prescribing. Published in the New England Journal of Medicine in 2016, it was the second major cardiovascular outcomes trial (CVOT), after LEADER for liraglutide, to show a cardiovascular signal favoring a GLP-1 RA over placebo in patients with type 2 diabetes and either established cardiovascular disease or high cardiovascular risk. That signal, a 26% relative reduction in three-point MACE, helped reshape FDA labeling and ADA treatment guidelines.
But every trial has design constraints that bound the confidence we can place in its conclusions. The limitations below are not reasons to dismiss SUSTAIN-6. They are reasons to read it carefully.
1. Designed for Noninferiority, Not Superiority
SUSTAIN-6 was powered as a noninferiority trial. Noninferiority would be declared if the upper bound of the 95% confidence interval for the hazard ratio fell below 1.8, the pre-approval threshold set by the FDA's 2008 cardiovascular guidance for diabetes drugs. The trial was not designed or powered to demonstrate superiority for individual MACE components.
The HealthRX Noninferiority-vs-Superiority Interpretation Framework:
| Question | Noninferiority design | Superiority design |
|---|---|---|
| What is the null hypothesis? | Drug is worse than control by a clinically meaningful margin | Drug has no benefit over control |
| What does rejection prove? | "Not unacceptably worse" | "Better than placebo" |
| Can you claim superiority post hoc? | Only if the CI excludes 1.0 and the trial has adequate power | Yes, directly from the primary analysis |
| Event count needed (typical CVOT) | ~125 to 250 events | ~500 to 800+ events |
| Consequence of small sample | May reach noninferiority easily but lack power for component endpoints | Trial simply fails |
| Risk of interpretation error | Readers conflate "noninferior" with "proven superior" | Lower, but effect magnitudes may still be overestimated |

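The event-count row can be sanity-checked with Schoenfeld's approximation for the number of events a log-rank or Cox comparison needs. This is an illustrative sketch, not the SUSTAIN-6 sample-size calculation: the function name is hypothetical, and it assumes 1:1 allocation, one-sided alpha of 0.025, 90% power, a truly neutral drug (HR 1.0) for the noninferiority case, and a hypothetical true HR of 0.80 for the superiority case.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hr_true: float, hr_null: float,
                      alpha_one_sided: float = 0.025,
                      power: float = 0.90,
                      allocation: float = 0.5) -> int:
    """Approximate events needed to distinguish hr_true from hr_null
    (Schoenfeld's formula for a log-rank / Cox test)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha_one_sided)   # about 1.96
    z_beta = z(power)                  # about 1.28 for 90% power
    log_gap = math.log(hr_true) - math.log(hr_null)
    events = (z_alpha + z_beta) ** 2 / (allocation * (1 - allocation) * log_gap ** 2)
    return math.ceil(events)


# Noninferiority: assume the drug is truly neutral (HR 1.0) and rule out HR 1.8.
ni_events = schoenfeld_events(hr_true=1.0, hr_null=1.8)

# Superiority: assume a hypothetical true HR of 0.80 and rule out HR 1.0.
sup_events = schoenfeld_events(hr_true=0.80, hr_null=1.0)

print(f"noninferiority (margin 1.8): {ni_events} events")
print(f"superiority (true HR 0.80): {sup_events} events")
```

Under these assumptions the noninferiority case needs about 122 events and the superiority case about 845. A trial sized to clear a 1.8 margin can therefore sit far short of the events needed to show a modest benefit; assuming a smaller benefit (HR 0.85) pushes the superiority requirement past 1,500 events.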
The distinction matters here. SUSTAIN-6 accumulated 254 primary endpoint events across 3,297 patients. By comparison, LEADER enrolled 9,340 patients and accumulated 1,302 MACE events, enough to support a prespecified superiority test for liraglutide. The SUSTAIN-6 hazard ratio of 0.74 excluded 1.0, but a superiority test was not prespecified, and the trial's smaller event count leaves the confidence interval (0.58 to 0.95) wide. The data are compatible with a true relative reduction anywhere from about 42% down to about 5%.
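The width of that interval can be unpacked by back-calculating the standard error of log(HR) from the published bounds. This sketch assumes a symmetric Wald-type interval on the log scale (a common construction, but an assumption here), so the p-values it produces are approximations rather than the trial's reported statistics. The function name is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def summarize_hazard_ratio(hr: float, ci_low: float, ci_high: float,
                           ni_margin: float = 1.8, level: float = 0.95) -> dict:
    """Approximate tests implied by a published HR and its confidence interval."""
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)           # 1.96 for a 95% CI
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)  # SE of log(HR)

    # Superiority: H0 is HR = 1 (two-sided).
    z_sup = math.log(hr) / se
    p_sup = 2 * STD_NORMAL.cdf(-abs(z_sup))

    # Noninferiority: H0 is HR >= margin (one-sided).
    z_ni = (math.log(hr) - math.log(ni_margin)) / se
    p_ni = STD_NORMAL.cdf(z_ni)

    return {
        "se_log_hr": se,
        "p_superiority_two_sided": p_sup,
        "p_noninferiority_one_sided": p_ni,
        "upper_bound_below_margin": ci_high < ni_margin,
        # Relative risk reduction implied by each end of the interval.
        "rrr_range": (round(1 - ci_high, 3), round(1 - ci_low, 3)),
    }


result = summarize_hazard_ratio(0.74, 0.58, 0.95)
for key, value in result.items():
    print(f"{key}: {value}")
```

With the SUSTAIN-6 inputs, the standard error is about 0.126, the approximate two-sided superiority p-value is about 0.017, and the noninferiority p-value is vanishingly small. The implied relative reduction spans 5% to 42%, which is the interval width the text describes.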
This is not a flaw in execution. It is a feature of the design. Novo Nordisk structured SUSTAIN-6 to satisfy the FDA's pre-approval cardiovascular safety requirement efficiently, not to generate the definitive superiority dataset.
2. Enriched Population Limits Generalizability
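Enrollment required either age 50 or older with established cardiovascular disease or chronic kidney disease (stage 3 or higher), or age 60 or older with at least one cardiovascular risk factor. In all, 83% of participants had established cardiovascular disease, chronic kidney disease, or both. Participants were older (mean age 64.6 years), had long-standing diabetes (mean duration 13.9 years), and had a mean baseline HbA1c of 8.7%.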
This enrichment achieved two things: it increased the event rate (enabling the trial to finish faster with fewer patients) and it tested the drug in the population where cardiovascular harm would be most dangerous. Both are reasonable from a regulatory standpoint.
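A crude back-of-envelope calculation shows why enrichment shortens a trial. If events accrue at a constant annual rate and every patient is followed for the same time, enrollment needed for a target event count is roughly events divided by (rate × years). This is not a reconstruction of the trial's sample-size plan: the function name is hypothetical, the enriched rate is simply the trial's 254 events over 3,297 patients and about 2.1 years, the 122-event target is an illustrative noninferiority figure (roughly what a 1.8 margin needs at 90% power under Schoenfeld's approximation), and the 1% per year lower-risk rate is an assumption.

```python
import math


def patients_needed(target_events: int, annual_event_rate: float, follow_up_years: float) -> int:
    """Crude enrollment estimate: assumes a constant event rate, identical
    follow-up for every patient, and no dropout or competing risks."""
    expected_events_per_patient = annual_event_rate * follow_up_years
    return math.ceil(target_events / expected_events_per_patient)


FOLLOW_UP_YEARS = 2.1   # median follow-up, treated as if everyone had it
TARGET_EVENTS = 122     # illustrative noninferiority target

# Crude enriched-population rate implied by the trial's own totals (~3.7%/yr).
enriched_rate = 254 / (3297 * FOLLOW_UP_YEARS)
# Hypothetical lower-risk population (illustrative assumption).
low_risk_rate = 0.01

for label, rate in (("enriched", enriched_rate), ("low-risk", low_risk_rate)):
    n = patients_needed(TARGET_EVENTS, rate, FOLLOW_UP_YEARS)
    print(f"{label}: {rate:.2%} per year -> {n} patients")
```

Under these crude assumptions the enriched population reaches the target with roughly 1,600 patients, while a population with events at 1% per year would need about 5,800 for the same follow-up.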
The trade-off is that the SUSTAIN-6 results tell us relatively little about:
- Younger patients with newly diagnosed T2D who lack established CVD
- Primary prevention populations where MACE event rates are far lower
- Patients with chronic kidney disease, who represented a minority of the cohort
- Non-White populations, given that 83% of enrolled participants were White
The 2018 ADA/EASD consensus report acknowledged this by recommending GLP-1 RAs with proven CV benefit specifically for patients with established atherosclerotic cardiovascular disease, not as a blanket recommendation across all T2D phenotypes.
3. Two-Year Follow-Up Is Short for Cardiovascular Conclusions
The median follow-up was approximately 2.1 years. Cardiovascular outcome trials in diabetes have trended toward longer observation periods; LEADER, for example, had a median follow-up of 3.8 years. The UKPDS legacy studies demonstrated that glycemic intervention effects on macrovascular outcomes sometimes take a decade to separate clearly from control.
Two years is long enough to detect a safety signal (the FDA's primary concern) and to identify large treatment effects. It is not long enough to answer several clinically important questions:
- Does the MACE benefit persist, grow, or attenuate after year two?
- Are there late-emerging adverse effects (thyroid C-cell concerns, pancreatic events) that require longer observation?
- Does the retinopathy signal (HR 1.76; 95% CI 1.11 to 2.78) stabilize, worsen, or reverse with continued treatment?
The retinopathy finding is particularly relevant. The SUSTAIN-6 publication noted a statistically significant increase in diabetic retinopathy complications with semaglutide. The authors attributed this primarily to rapid HbA1c reduction in patients with pre-existing retinopathy, a phenomenon observed with insulin intensification as well. But without extended follow-up, the natural history of this signal remains uncertain.
4. Composite Endpoint Masking
Three-point MACE bundles cardiovascular death, nonfatal myocardial infarction, and nonfatal stroke into a single outcome. This is standard for diabetes CVOTs, but composite endpoints can obscure which components are driving the overall result.
In SUSTAIN-6, the component breakdown was:
| Component | Semaglutide, n (%) of 1,648 | Placebo, n (%) of 1,649 | HR (95% CI) |
|---|---|---|---|
| CV death | 44 (2.7%) | 46 (2.8%) | 0.98 (0.65 to 1.48) |
| Nonfatal MI | 47 (2.9%) | 64 (3.9%) | 0.74 (0.51 to 1.08) |
| Nonfatal stroke | 27 (1.6%) | 44 (2.7%) | 0.61 (0.38 to 0.99) |

The composite was driven primarily by nonfatal stroke and nonfatal MI reductions. Cardiovascular death showed no meaningful separation (HR 0.98). This pattern is worth noting: patients prescribed semaglutide based on the headline 26% MACE reduction might reasonably assume their risk of dying from cardiovascular causes is lower, but the trial data do not support that specific interpretation with any statistical confidence.
The nonfatal stroke reduction (HR 0.61) reached nominal significance, but the event counts were small (27 vs. 44). Component comparisons built on so few events are susceptible to random variation, and the trial was not powered to test them individually.
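Relative reductions can make components look more alike than they are on an absolute scale. The sketch below computes crude cumulative risks, absolute risk reductions (ARR), and number needed to treat (NNT) from the component counts, with a simple Wald 95% interval for each difference. It ignores censoring and varying follow-up, so it is a rough illustration rather than a reanalysis. The composite counts (108 vs. 146) come from the primary publication and are included for comparison; the class and variable names are hypothetical.

```python
import math
from dataclasses import dataclass

N_SEMA, N_PBO = 1648, 1649   # randomized patients per arm
Z_95 = 1.96


@dataclass
class Outcome:
    name: str
    events_sema: int
    events_pbo: int

    def risks(self) -> tuple[float, float]:
        """Crude cumulative risk in each arm (events / randomized)."""
        return self.events_sema / N_SEMA, self.events_pbo / N_PBO

    def abs_risk_reduction(self) -> tuple[float, float, float]:
        """Placebo risk minus semaglutide risk, with a crude Wald 95% CI."""
        r_s, r_p = self.risks()
        arr = r_p - r_s
        se = math.sqrt(r_s * (1 - r_s) / N_SEMA + r_p * (1 - r_p) / N_PBO)
        return arr, arr - Z_95 * se, arr + Z_95 * se

    def nnt(self) -> float:
        arr, _, _ = self.abs_risk_reduction()
        return math.inf if arr <= 0 else 1 / arr


OUTCOMES = [
    Outcome("3-point MACE (composite)", 108, 146),
    Outcome("CV death", 44, 46),
    Outcome("Nonfatal MI", 47, 64),
    Outcome("Nonfatal stroke", 27, 44),
]

for o in OUTCOMES:
    arr, lo, hi = o.abs_risk_reduction()
    print(f"{o.name:26s} ARR {arr:+.2%} (95% CI {lo:+.2%} to {hi:+.2%}), NNT ~{o.nnt():.0f}")
```

Only the composite and nonfatal stroke intervals exclude zero, and stroke does so narrowly. The CV death difference is close to zero in absolute terms, which is the masking problem in numerical form.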
5. Sponsor Design and Conduct
Novo Nordisk designed the trial, funded it, and was involved in data collection, analysis, and manuscript preparation. The trial publication discloses this clearly: "The sponsor (Novo Nordisk) designed the trial, with input from the steering committee."
This is standard practice for pharmaceutical CVOTs and does not inherently invalidate the findings. But it introduces structural considerations:
- Protocol design choices (noninferiority margin, enrichment criteria, follow-up duration) were optimized for regulatory efficiency, not necessarily for clinical informativeness
- Data access was managed through the sponsor, with academic investigators analyzing data under contractual arrangements
- Publication timing aligned with regulatory submissions, creating incentive pressure to present results favorably
- The steering committee had "final responsibility for the decision to submit for publication," per the manuscript, though the degree of truly independent analytic access has been debated in post-publication commentary
Cefalu et al., reflecting on the broader challenge of interpreting sponsor-run CVOTs, noted that while results are valid, clinicians should weigh the distinction between regulatory-grade evidence (is the drug safe enough?) and practice-changing evidence (should this drug be preferred?).
6. Background Therapy Was Not Standardized
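Participants continued their existing diabetes medications, and investigators could adjust background therapy (including insulin) at their discretion throughout the trial. This pragmatic design mirrors real-world practice, but it allows the two arms to diverge in background therapy after randomization, a co-intervention effect that randomization itself does not protect against.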
If semaglutide's glucose-lowering effect led to reductions in insulin dose or discontinuation of sulfonylureas, the MACE benefit could be partially attributable to reduced hypoglycemia or changes in concomitant medication rather than a direct vascular effect of semaglutide. The trial did not adjudicate or systematically report background therapy changes in a way that fully addresses this possibility.
7. Withdrawal and Adherence Patterns
Approximately 20% of participants in the semaglutide groups discontinued treatment prematurely, primarily because of gastrointestinal side effects. The analysis used a modified intention-to-treat approach, counting events regardless of treatment discontinuation. That choice is conservative for a superiority claim, although in a noninferiority comparison the same dilution can mask harm by pulling a hazard ratio toward 1.0. Either way, a 20% discontinuation rate means the observed treatment effect reflects a mixture of full exposure, partial exposure, and no exposure to semaglutide.
In a superiority framework, high discontinuation dilutes the treatment effect and biases toward the null, meaning the true on-treatment benefit may be larger. In clinical practice, however, the relevant question is whether patients who stop the drug due to nausea at month three still carry any residual cardiovascular protection. SUSTAIN-6 does not answer that question.
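The dilution argument can be made concrete with a deliberately crude mixing model. If hazards are constant, the placebo hazard is fixed, and a fraction f of semaglutide-arm person-time is spent off drug, the intention-to-treat hazard ratio is approximately (1 - f) × HR_on + f × HR_off, and solving for HR_on gives a rough on-treatment estimate. The function name, f = 0.20, and the off-drug HR values are illustrative assumptions. A 20% discontinuation rate among patients likely corresponds to a smaller share of person-time, because patients who stop partway still contribute time on drug.

```python
def on_treatment_hr(itt_hr: float, off_fraction: float, off_treatment_hr: float = 1.0) -> float:
    """Invert the crude mixing model  itt_hr = (1 - f) * hr_on + f * hr_off.

    off_treatment_hr = 1.0 means no residual protection after stopping;
    values below 1.0 assume some residual benefit persists.
    """
    if not 0.0 <= off_fraction < 1.0:
        raise ValueError("off_fraction must be in [0, 1)")
    return (itt_hr - off_fraction * off_treatment_hr) / (1.0 - off_fraction)


ITT_HR = 0.74  # SUSTAIN-6 primary hazard ratio

# Compare "no residual effect" with a hypothetical residual off-drug HR of 0.9.
for residual in (1.0, 0.9):
    hr_on = on_treatment_hr(ITT_HR, off_fraction=0.20, off_treatment_hr=residual)
    print(f"off-drug HR {residual:.2f} -> implied on-treatment HR {hr_on:.3f}")
```

With no residual effect, the implied on-treatment HR is about 0.68; if some protection persisted after stopping (off-drug HR 0.9), it is about 0.70. The difference between those two scenarios is exactly the residual-protection question the trial cannot answer.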
8. What Post-Publication Commentary Added
Several letters to the editor and subsequent review articles raised points that did not appear in the primary publication:
- The retinopathy signal generated substantial discussion, with multiple correspondents noting the need for prospective ophthalmologic monitoring in future semaglutide trials
- Correspondents compared the interaction between rapid A1c lowering and microvascular outcomes with historical data from the DCCT/EDIC studies
- The SELECT trial (2023, NEJM), which tested semaglutide 2.4 mg in patients with overweight or obesity and established CVD but without diabetes, later found a 20% MACE reduction in a dedicated superiority design with 17,604 participants, partially addressing the power limitations of SUSTAIN-6, albeit in a different population
Putting It Together
SUSTAIN-6 accomplished what it was designed to do: it demonstrated cardiovascular safety and generated a credible signal of benefit for subcutaneous semaglutide in high-risk type 2 diabetes. The 26% MACE reduction is real in the statistical sense. The question that the limitations above collectively raise is not "was semaglutide harmful?" but "how confident should we be in the magnitude and breadth of the benefit?"
For patients resembling the trial population (established CVD, long-duration T2D, older age, predominantly White), the evidence is relatively strong. For everyone else, the evidence is borrowed, extrapolated, or pending from other trial programs.
References
- Marso SP, Bain SC, Consoli A, et al. Semaglutide and cardiovascular outcomes in patients with type 2 diabetes. N Engl J Med. 2016;375(19):1834-1844.
- Marso SP, Daniels GH, Tanaka K, et al. Liraglutide and cardiovascular outcomes in type 2 diabetes (LEADER). N Engl J Med. 2016;375(4):311-322.
- Davies MJ, D'Alessio DA, Fradkin J, et al. Management of hyperglycemia in type 2 diabetes, 2018: a consensus report by ADA and EASD. Diabetes Care. 2018;41(12):2669-2701.
- Lincoff AM, Brown-Frandsen K, Colhoun HM, et al. Semaglutide and cardiovascular outcomes in obesity without diabetes (SELECT). N Engl J Med. 2023;389(24):2221-2232.
- FDA. Ozempic (semaglutide) prescribing information. Revised 2020.
- Cefalu WT, Kaul S, Gerstein HC, et al. Cardiovascular outcomes trials in type 2 diabetes: where do we go from here? Reflections from a Diabetes Care Editors' Expert Forum. Diabetes Care. 2018;41(1):14-31.