Honest Criticisms and Limitations of the EMPA-REG OUTCOME Trial

At a glance
| Parameter | Detail |
|---|---|
| N | 7,020 |
| Intervention | Empagliflozin 10 mg or 25 mg daily |
| Comparator | Placebo (on top of standard care) |
| Duration | Median 3.1 years |
| Primary endpoint | 3-point MACE (CV death, nonfatal MI, nonfatal stroke) |
| Key result | 14% relative risk reduction in MACE (HR 0.86; 95% CI 0.74-0.99; p = 0.04) |

The Trial That Changed Diabetes Cardiology
Published in the New England Journal of Medicine in September 2015, EMPA-REG OUTCOME was the first completed cardiovascular outcomes trial for any SGLT2 inhibitor. The 14% MACE reduction and the striking 38% reduction in cardiovascular death reshaped guidelines, prescribing habits, and the commercial trajectory of the entire drug class. Within two years, the FDA updated the empagliflozin label to include a cardiovascular death reduction indication.
But landmark trials deserve the hardest scrutiny. Below is what the published data, post-hoc commentary, and subsequent trials revealed about the limits of this evidence.
Enrollment Bias: Who Actually Got Into the Trial
The most consequential limitation is the population itself. EMPA-REG OUTCOME required established cardiovascular disease for enrollment. Over 99% of participants had documented atherosclerotic CVD at baseline, and roughly 75% had confirmed coronary artery disease. This was a secondary prevention trial in all but name.
That matters because the majority of patients with type 2 diabetes do not have established CVD at the time of SGLT2 inhibitor initiation. Primary prevention patients, the group most clinicians encounter in routine practice, were functionally excluded. The 2015 NEJM publication acknowledged this directly: the results "cannot be extrapolated to patients at lower cardiovascular risk."
Geographic enrollment also skewed heavily. Forty-two countries contributed patients, but the representation from North America was modest relative to Europe and Asia. Racial and ethnic diversity was limited. Roughly 72% of participants were white, 22% Asian, and only 5% Black. Given known differences in SGLT2 inhibitor pharmacokinetics and heart failure epidemiology across racial groups, this gap is not trivial.
Women made up only 28.5% of the cohort. While this mirrors many cardiology trials, it widens the confidence intervals around sex-specific treatment effects and so weakens any conclusion drawn from them. Subgroup analyses by sex did not reach statistical significance independently, a point often omitted when citing the headline result.
The Dose-Pooling Decision
EMPA-REG OUTCOME randomized patients to empagliflozin 10 mg, empagliflozin 25 mg, or placebo in a 1:1:1 ratio. The primary analysis pooled both empagliflozin arms against placebo. This pooling was pre-specified in the statistical analysis plan, but it effectively doubled the treatment group size relative to placebo, which raises a methodological concern: dose-response relationships were obscured.
When the two doses were examined separately, point estimates for MACE were similar (HR 0.85 for 10 mg, HR 0.86 for 25 mg). However, neither individual dose arm achieved statistical significance for the primary endpoint on its own. The trial was not powered for individual dose comparisons, and the investigators stated this openly. Still, the inability to demonstrate a clear dose-response weakens mechanistic interpretation. If both doses produce identical cardiovascular effects despite different glucose-lowering potency, the cardioprotective mechanism is likely independent of HbA1c reduction, a hypothesis that subsequent research supports but that EMPA-REG OUTCOME alone cannot confirm.
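To see why pooling mattered for statistical power, here is a rough sketch in Python. It uses the primary-outcome event counts from the primary publication (490 events across both empagliflozin arms, 282 on placebo) and the common approximation that the standard error of log(HR) is about sqrt(1/d1 + 1/d0). The even split of empagliflozin events between the two doses is an assumption made for illustration, not a reported figure, and the function names are hypothetical.

```python
import math

def approx_se_log_hr(events_treated, events_control):
    """Rough standard error of log(HR) from event counts: sqrt(1/d1 + 1/d0)."""
    return math.sqrt(1 / events_treated + 1 / events_control)

def two_sided_p(hr, se):
    """Two-sided p-value for log(HR) = 0 under a normal approximation."""
    z = math.log(hr) / se
    return math.erfc(abs(z) / math.sqrt(2))

PLACEBO_EVENTS = 282       # primary-outcome events, placebo arm
POOLED_EMPA_EVENTS = 490   # primary-outcome events, both empagliflozin arms
# ASSUMPTION: events split evenly between the 10 mg and 25 mg arms.
SINGLE_ARM_EVENTS = POOLED_EMPA_EVENTS / 2

# Pooled comparison, using the reported pooled HR of 0.86.
p_pooled = two_sided_p(0.86, approx_se_log_hr(POOLED_EMPA_EVENTS, PLACEBO_EVENTS))
# Single-dose comparison, using the reported 10 mg HR of 0.85.
p_single = two_sided_p(0.85, approx_se_log_hr(SINGLE_ARM_EVENTS, PLACEBO_EVENTS))

print(f"pooled arms vs placebo: p ~ {p_pooled:.3f}")
print(f"one dose arm vs placebo: p ~ {p_single:.3f}")
```

Under these assumptions the pooled comparison lands just below 0.05 while a single dose arm with an almost identical hazard ratio lands just above it. The point estimate barely moves; only the precision changes, which is consistent with the investigators' statement that the trial was not powered for individual doses.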
The HealthRX Limitation-Severity Framework
Not all trial limitations carry equal weight. We categorize each identified concern by how much it could change a clinician's decision if fully resolved.
| Limitation | Severity | Reasoning |
|---|---|---|
| Established CVD-only enrollment | High | Excludes the majority of T2D patients seen in primary care |
| Dose pooling without individual-arm significance | Moderate | Pre-specified, but prevents dose-response conclusions |
| 28.5% female enrollment | Moderate | Sex-specific subgroup underpowered |
| Short median follow-up (3.1 yr) | Moderate | Long-term safety and durability unknown at publication |
| 5% Black enrollment | High | Limits confidence in a population with elevated HF and CKD burden |
| Industry sponsorship and author COI | Low-Moderate | Standard for large CVOTs, but warrants transparency |
| Background therapy heterogeneity | Low | Reflects real-world prescribing, pragmatic design choice |
| No active comparator arm | Moderate | Cannot attribute benefit vs. another glucose-lowering agent |

Statistical Caveats Worth Knowing
The primary endpoint achieved p = 0.04 for superiority. This crossed the pre-specified threshold, but the margin was not wide. In a hierarchical testing procedure, the primary endpoint had to clear first before secondary endpoints could be formally tested. It cleared, but barely.
The cardiovascular death reduction (HR 0.62; 95% CI 0.49-0.77) was far more robust statistically and has been the number most frequently cited in subsequent guidelines, including the 2019 ESC/EASD guidelines on diabetes and cardiovascular disease. This creates an odd situation: the secondary endpoint drives clinical enthusiasm more than the primary one does.
Stroke (fatal or nonfatal) showed a non-significant trend toward harm (HR 1.18; 95% CI 0.89-1.56). While this did not reach significance, it appeared in a trial otherwise dominated by positive signals. The CANVAS program for canagliflozin later showed a similar neutral-to-slightly-unfavorable stroke signal. Whether this represents a class effect, a chance finding, or a consequence of blood pressure reduction and volume depletion in stroke-prone patients remains debated.
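The strength of each of these results can be sanity-checked from the hazard ratios and confidence intervals quoted above. The sketch below recovers an approximate z-score and two-sided p-value, assuming the intervals are symmetric on the log scale (a Wald-type interval). The function name is hypothetical, and the outputs are approximations, not the trial's own Cox-model p-values.

```python
import math

def p_from_hr_ci(hr, lower, upper, z_crit=1.96):
    """Approximate z-score and two-sided p-value from an HR and its 95% CI.

    Assumes the CI is symmetric around log(HR), so its width on the
    log scale equals 2 * z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p

# Hazard ratios and 95% CIs as quoted in this section.
results = {
    "3-point MACE": (0.86, 0.74, 0.99),
    "CV death": (0.62, 0.49, 0.77),
    "Stroke": (1.18, 0.89, 1.56),
}

for name, (hr, lo, hi) in results.items():
    z, p = p_from_hr_ci(hr, lo, hi)
    print(f"{name}: z = {z:.2f}, p = {p:.4f}")
```

The MACE result sits at roughly two standard errors from the null, in line with the reported p = 0.04. The cardiovascular death result sits at roughly four standard errors, and the stroke estimate is well within chance variation.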
The number needed to treat (NNT) to prevent one cardiovascular death over 3.1 years was approximately 45 (3.7% vs. 5.9%); the widely quoted figure of 39 refers to death from any cause (5.7% vs. 8.3%). For MACE, the NNT was approximately 62. These are clinically meaningful but context-dependent. In a population with established CVD, baseline event rates are high enough to make moderate relative reductions translate into reasonable absolute benefit. In a primary prevention cohort, the same relative reduction would yield far higher NNTs.
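The dependence of NNT on baseline risk is simple arithmetic, sketched below. The MACE proportions (12.1% on placebo, 10.5% on pooled empagliflozin) come from the primary publication. The lower baseline rates of 6% and 3% are hypothetical stand-ins for lower-risk cohorts, not data from any trial, and the function names are illustrative.

```python
def nnt(control_rate, treated_rate):
    """Number needed to treat = 1 / absolute risk reduction."""
    arr = control_rate - treated_rate
    if arr <= 0:
        raise ValueError("no absolute risk reduction; NNT is undefined")
    return 1.0 / arr

def nnt_at_baseline(baseline_rate, relative_risk):
    """NNT if the same relative risk applied at a different baseline rate."""
    return nnt(baseline_rate, baseline_rate * relative_risk)

# MACE over the trial: 12.1% placebo vs 10.5% pooled empagliflozin.
PLACEBO_RATE = 0.121
EMPA_RATE = 0.105

mace_nnt = nnt(PLACEBO_RATE, EMPA_RATE)
observed_rr = EMPA_RATE / PLACEBO_RATE
print(f"observed MACE NNT ~ {mace_nnt:.1f}")

# HYPOTHETICAL lower baseline risks, holding the relative risk constant.
for baseline in (0.06, 0.03):
    scaled = nnt_at_baseline(baseline, observed_rr)
    print(f"baseline {baseline:.0%}: NNT ~ {scaled:.0f}")
```

With the relative risk held fixed, NNT scales inversely with baseline risk: halving the baseline event rate doubles the NNT. This assumes the relative effect carries over unchanged to lower-risk patients, which is precisely what the trial could not establish.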
Conflict of Interest and Sponsorship
Boehringer Ingelheim funded the trial, provided the study drug, and employed several of the authors. The company participated in study design, data collection, data analysis, and manuscript preparation. This is standard practice for large cardiovascular outcomes trials (LEADER, SUSTAIN-6, DECLARE-TIMI 58 all followed the same model), but it is not a reason to dismiss the concern.
Independent re-analysis of the data has been limited. Academic statisticians who reviewed the published results and supplementary appendix have not identified irregularities, but full dataset access has not been granted to independent groups for replication. The 2016 FDA medical review did conduct its own analysis from submitted data and confirmed the primary endpoint result, which provides a degree of independent validation.
Several of the trial's steering committee members disclosed consulting fees from Boehringer Ingelheim, Eli Lilly, and competing SGLT2 inhibitor manufacturers. These disclosures were published transparently in the NEJM supplement, but the density of industry relationships across the authorship list is notable even by cardiology trial standards.
What Subsequent Evidence Clarified
Later SGLT2 inhibitor trials helped define what EMPA-REG OUTCOME could not answer alone.
CANVAS (canagliflozin, 2017): Showed a similar MACE reduction (HR 0.86) in a population that included some primary prevention patients (~34%). The amputation signal unique to canagliflozin, not seen with empagliflozin, complicated class-effect assumptions. Published in the NEJM.
DECLARE-TIMI 58 (dapagliflozin, 2019): Enrolled a majority primary prevention cohort (~59% without established CVD). MACE was not significantly reduced (HR 0.93; 95% CI 0.84-1.03). Hospitalization for heart failure was reduced. This directly challenged the generalizability of the EMPA-REG MACE result to lower-risk populations. Published in the NEJM.
EMPA-KIDNEY (2023): Extended the empagliflozin evidence base to chronic kidney disease patients regardless of diabetes status, confirming renal benefits. Published in the NEJM. This filled a gap EMPA-REG OUTCOME left open: whether empagliflozin's kidney benefits extend beyond the T2D-with-CVD population.
The pattern across trials suggests that the MACE reduction observed in EMPA-REG OUTCOME may be specific to established CVD populations rather than a universal effect of SGLT2 inhibition. Heart failure and renal outcomes appear more consistently positive across risk strata.
Follow-Up Duration and Long-Term Safety
Median follow-up was 3.1 years. For a chronic disease drug expected to be used for decades, this is short. Questions about long-term bone health, ketoacidosis incidence over extended exposure, bladder cancer risk (a concern that haunted early SGLT2 development), and durability of cardiovascular benefit beyond 5 years were not addressable.
Post-marketing surveillance and registry data have been reassuring on most fronts. The bladder cancer signal has not materialized with longer follow-up. Euglycemic diabetic ketoacidosis remains a recognized but uncommon risk, primarily in surgical or fasting scenarios. Fournier's gangrene, identified post-marketing as a rare but serious adverse event, was not captured in EMPA-REG OUTCOME's safety reporting. The FDA issued a safety communication about Fournier's gangrene with SGLT2 inhibitors in 2018, three years after the trial was published.
Background Therapy and the Moving Standard of Care
Participants continued their existing glucose-lowering medications, antihypertensives, and lipid-lowering therapies throughout the trial. Use of metformin (~74%), insulin (~48%), and sulfonylureas (~43%) was common. Statin use was approximately 77%, and ACE inhibitor or ARB use exceeded 80%.
This reflects 2010-2013 prescribing patterns, when enrollment occurred. Since then, GLP-1 receptor agonists have become standard co-therapy in high-risk T2D patients. Whether empagliflozin would produce the same incremental benefit when added to a modern regimen that already includes semaglutide or dulaglutide is unknown. The SOUL trial and ongoing combination studies may eventually answer this, but EMPA-REG OUTCOME alone cannot.
The Bottom Line on These Limitations
None of these criticisms invalidate the trial. EMPA-REG OUTCOME was methodologically sound for its stated objective: testing whether empagliflozin was safe and potentially beneficial for cardiovascular outcomes in T2D patients with established CVD. It answered that question.
The problems arise when the result is applied beyond its evidence boundary. Prescribing empagliflozin for cardiovascular protection in a 52-year-old with newly diagnosed T2D and no vascular disease is a clinical judgment call, not a direct application of EMPA-REG OUTCOME data. Subsequent trials (particularly DECLARE-TIMI 58) suggest the MACE benefit may not extend to that population, even if heart failure and renal benefits do.
References
- Zinman B, Wanner C, Lachin JM, et al. Empagliflozin, cardiovascular outcomes, and mortality in type 2 diabetes. N Engl J Med. 2015;373(22):2117-2128.
- Neal B, Perkovic V, Mahaffey KW, et al. Canagliflozin and cardiovascular and renal events in type 2 diabetes. N Engl J Med. 2017;377(7):644-657.
- Wiviott SD, Raz I, Bonaca MP, et al. Dapagliflozin and cardiovascular outcomes in type 2 diabetes. N Engl J Med. 2019;380(4):347-357.
- The EMPA-KIDNEY Collaborative Group. Empagliflozin in patients with chronic kidney disease. N Engl J Med. 2023;388(2):117-127.
- Cosentino F, Grant PJ, Aboyans V, et al. 2019 ESC Guidelines on diabetes, pre-diabetes, and cardiovascular diseases. Eur Heart J. 2020;41(2):255-323.
- FDA. Jardiance (empagliflozin) prescribing information.