Was LEADER designed to prove liraglutide reduces heart attacks and strokes?

Not exactly. LEADER was designed primarily to rule out excess cardiovascular risk (noninferiority). Superiority testing was pre-specified but only tested after noninferiority was confirmed. The trial was adequately powered for both, but the design originated from a safety mandate, not an efficacy hypothesis.

Why did 26% of patients stop taking liraglutide during the trial?

Gastrointestinal side effects (nausea, vomiting, diarrhea) were the most common reasons, consistent with the known GLP-1 RA side-effect profile. Some patients also discontinued for non-medical reasons. The high discontinuation rate dilutes the ITT effect estimate, meaning the true on-treatment benefit is likely larger than 13%.

Does LEADER apply to patients without existing heart disease?

Only about 19% of participants were in the primary prevention category (age ≥60 with risk factors but no established CVD). The trial was not powered to show a significant effect in this subgroup alone. Guidelines recommend GLP-1 RAs primarily for patients with established atherosclerotic cardiovascular disease based on LEADER.

How was cardiovascular death defined in LEADER?

An independent, blinded clinical events committee adjudicated all deaths using standardized definitions. Cardiovascular death included fatal MI, fatal stroke, heart failure death, sudden cardiac death, and death from other cardiovascular causes. Deaths of undetermined cause were classified as cardiovascular, which is standard practice but slightly inflates CV death counts in both arms.

Is the LEADER result a class effect for all GLP-1 drugs?

No. LEADER tested liraglutide specifically. Other GLP-1 RAs have shown varying CVOT results: semaglutide (SUSTAIN-6) showed a larger effect, while lixisenatide (ELIXA) and exenatide extended-release (EXSCEL) showed neutral results. Each drug has distinct pharmacology, and results should not be assumed to transfer across the class.

What was the noninferiority margin and why does it matter?

The FDA-mandated margin was a hazard ratio upper bound of 1.30, meaning liraglutide could not increase MACE risk by more than 30%. This is a deliberately conservative bar. LEADER cleared it decisively (upper CI bound 0.97), then passed the superiority test. The margin matters because it defines the minimum safety standard all new diabetes drugs must meet.

How does LEADER compare to EMPA-REG OUTCOME for empagliflozin?

Both showed cardiovascular benefit in T2D with high CV risk, but with different patterns. EMPA-REG OUTCOME showed a striking 38% reduction in CV death with empagliflozin, alongside a marked reduction in hospitalization for heart failure. LEADER showed a 22% CV death reduction with less effect on heart failure. The two drug classes likely work through different mechanisms, and many clinicians now use them in combination.

Did LEADER look at kidney outcomes?

Yes. A pre-specified secondary renal composite (new-onset persistent macroalbuminuria, doubling of serum creatinine, end-stage renal disease, or renal death) showed a 22% reduction with liraglutide (HR 0.78; 95% CI 0.67, 0.92). This was driven primarily by macroalbuminuria reduction rather than hard renal endpoints.

Why is understanding the estimand important for interpreting LEADER?

The estimand defines exactly what question the trial answers. LEADER used a treatment-policy estimand (what happens when you assign a drug, regardless of adherence). This is conservative and answers the regulatory question. Clinicians asking "will my adherent patient benefit?" need the on-treatment estimate, which suggests a larger effect (~17% reduction). Neither number is wrong; they answer different questions.

Inside the LEADER Methodology: What Most Summaries Skip

Could the blood sugar difference between groups explain the MACE reduction?

The 0.4% HbA1c gap is unlikely to explain the cardiovascular benefit. The ACCORD trial showed that aggressive glycemic lowering in a similar population did not reduce MACE and may have increased mortality. The pattern of benefit in LEADER (driven by CV death rather than MI or stroke) also does not match what glucose lowering alone would predict.

By HealthRX.com Medical Team

Published May 25, 2026 | Updated May 25, 2026 | Last reviewed May 25, 2026


At a glance

| Parameter | Detail |
|---|---|
| N | 9,340 |
| Intervention | Liraglutide 1.8 mg daily (subcutaneous) |
| Comparator | Matching placebo (both added to standard of care) |
| Duration | Median 3.8 years (minimum 3.5 years follow-up) |
| Primary endpoint | First occurrence of 3-point MACE (CV death, nonfatal MI, nonfatal stroke) |
| Key result | HR 0.87 (95% CI 0.78, 0.97); p < 0.001 for noninferiority, p = 0.01 for superiority |

Why the Design Matters More Than the Headline

Most trial summaries stop at "13% relative risk reduction in MACE." That number, while accurate, obscures a set of design decisions that determine what the result actually means for clinical practice. The LEADER trial was constructed under the 2008 FDA guidance requiring cardiovascular outcomes trials (CVOTs) for all new diabetes drugs. That guidance was a response to the rosiglitazone controversy and set a noninferiority margin of 1.3 for the upper bound of the 95% confidence interval. LEADER was built to clear that bar first. Superiority was a secondary objective, tested only if noninferiority succeeded.

This distinction matters. A trial designed from the start to prove benefit would look different from one designed to rule out harm. LEADER sits in between, and its methodology reflects that tension.

Randomization and Stratification

Participants were randomized 1:1 to liraglutide or placebo using an interactive voice/web response system. Randomization was stratified by two variables: whether a participant had prior cardiovascular disease, and whether estimated glomerular filtration rate (eGFR) was above or below 60 mL/min/1.73 m². These strata were not arbitrary. About 81% of the enrolled population had established cardiovascular disease at baseline, which concentrated events in the group most likely to generate them.

The stratification by renal function was clinically well motivated. Renal impairment can alter drug handling for some GLP-1 receptor agonists, and it is an independent cardiovascular risk amplifier. By stratifying rather than excluding these patients, the investigators ensured balanced distribution across arms without sacrificing generalizability.

One point rarely discussed: the LEADER protocol allowed investigators to reduce the dose from 1.8 mg to 1.2 mg if side effects were intolerable. About 6% of liraglutide-treated patients used the lower dose. Because this happened post-randomization, the intention-to-treat analysis includes these patients at whatever dose they received. The effect estimate therefore represents a slightly diluted version of full-dose therapy.

Blinding and the Open-Label Problem

LEADER was double-blind with matching placebo pens. The pens were visually identical, and the injection volume was the same. This is a stronger blinding approach than some CVOTs that used oral placebos for injectable drugs.

But here is where things get complicated. Standard of care was open-label. Both groups received whatever glucose-lowering therapy their physicians chose, excluding other GLP-1 receptor agonists, DPP-4 inhibitors, and pramlintide. Investigators were encouraged to treat to glycemic targets per local guidelines. The result was a 0.4 percentage point HbA1c separation between groups at 36 months (7.0% vs. 7.4%).

This glycemic gap creates an interpretive problem. Was the MACE reduction driven by liraglutide's cardiovascular biology, by better glucose control, or by some combination? The ACCORD trial had shown that aggressive glycemic reduction in a similar population did not reduce MACE and may have increased mortality. That context argues against a purely glycemic explanation for LEADER's result. The separation also narrowed over time as clinicians in the placebo group added more therapy, suggesting that the early glycemic difference may have mattered less than the drug's direct vascular effects.

Inclusion and Exclusion Criteria: Who Was Actually Studied

The enrollment criteria shaped the population in ways that limit extrapolation:

Age ≥50 with established CV disease (coronary, cerebrovascular, peripheral vascular, CKD stage 3+, or heart failure NYHA class II or III)
Age ≥60 with CV risk factors (microalbuminuria, hypertension plus LVH, LV dysfunction, or ankle-brachial index <0.9)
HbA1c ≥7.0%
Excluded: type 1 diabetes, recent acute coronary or cerebrovascular event within 14 days, planned revascularization, dialysis patients, family or personal history of medullary thyroid carcinoma or MEN2

The exclusion of recent ACS patients is standard for safety but means LEADER says nothing about liraglutide in the immediate post-MI period. The MTC/MEN2 exclusion reflects the rodent thyroid C-cell tumor signal from preclinical studies, which remains in the Victoza label as a boxed warning despite no confirmed human cases.

The practical takeaway: LEADER's population was older (mean age 64), had long diabetes duration (mean 12.8 years), and was heavily burdened with prior CV events. Applying these results to a 45-year-old with newly diagnosed T2D and no cardiovascular history requires a leap the data do not directly support.

Primary Endpoint Definition and Adjudication

The primary composite was time to first occurrence of any component of 3-point MACE: cardiovascular death, nonfatal myocardial infarction, or nonfatal stroke. Each event was adjudicated by an independent committee blinded to treatment assignment, using standardized definitions from the Standardized Data Collection for Cardiovascular Trials Initiative.

Breaking the composite apart reveals an uneven pattern:

| Component | Liraglutide (n) | Placebo (n) | HR (95% CI) |
|---|---|---|---|
| CV death | 219 | 278 | 0.78 (0.66, 0.93) |
| Nonfatal MI | 281 | 317 | 0.88 (0.75, 1.03) |
| Nonfatal stroke | 159 | 177 | 0.89 (0.72, 1.11) |
| Composite MACE | 608 | 694 | 0.87 (0.78, 0.97) |

The composite is driven primarily by cardiovascular death. Nonfatal MI and nonfatal stroke showed directionally favorable but non-significant point estimates. This is clinically important: liraglutide appeared to reduce the hardest endpoint (death) more than the softer ones. Some trialists view this as a strength, because mortality is the least susceptible to ascertainment bias. Others note that a drug reducing death without clearly reducing events raises mechanistic questions about whether it prevents events or improves survival after them.
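To put the event counts on an absolute scale, the sketch below computes a crude risk ratio, absolute risk reduction, and number needed to treat from the table's counts. It assumes an exact 1:1 split of the 9,340 participants (4,670 per arm), which is an approximation, and it ignores follow-up time, so the ratios will not exactly match the Cox hazard ratios. The function name `crude_summary` is illustrative.

```python
def crude_summary(events_treated, events_control, n_per_arm):
    """Crude (not time-to-event) comparison of two equally sized arms."""
    risk_treated = events_treated / n_per_arm
    risk_control = events_control / n_per_arm
    arr = risk_control - risk_treated  # absolute risk reduction
    return {
        "risk_ratio": risk_treated / risk_control,
        "absolute_risk_reduction": arr,
        "nnt": 1 / arr,  # number needed to treat over the trial's follow-up
    }


# Assumption: arms are exactly equal in size (9,340 randomized 1:1).
N_PER_ARM = 9340 // 2

composite = crude_summary(608, 694, N_PER_ARM)
cv_death = crude_summary(219, 278, N_PER_ARM)

for name, row in [("Composite MACE", composite), ("CV death", cv_death)]:
    print(
        f"{name}: RR={row['risk_ratio']:.2f}, "
        f"ARR={row['absolute_risk_reduction']:.3%}, NNT={row['nnt']:.0f}"
    )
```

The crude ratios land close to the reported hazard ratios, and the absolute figures are a reminder that a 13% relative reduction corresponds to a difference of under two percentage points over the trial's follow-up.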

Statistical Architecture

The statistical plan used a hierarchical (closed) testing procedure. Noninferiority for the primary composite was tested first at the one-sided 2.5% level against a margin of 1.30. Only if noninferiority was confirmed could superiority be tested, at the two-sided 5% level. This sequential gating preserved the overall type I error rate.
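The gating logic can be sketched in a few lines. This is an illustration, not the trial's actual analysis code: it back-calculates the standard error of the log hazard ratio from the published, rounded 95% CI and applies normal-approximation tests, so the p-values are approximate. The function names are hypothetical.

```python
import math
from statistics import NormalDist

NORM = NormalDist()


def se_from_ci(lower, upper, level=0.95):
    """Standard error of log(HR), recovered from a two-sided CI."""
    z = NORM.inv_cdf(0.5 + level / 2)
    return (math.log(upper) - math.log(lower)) / (2 * z)


def hierarchical_test(hr, lower, upper, margin=1.30, alpha_one_sided=0.025):
    """Test noninferiority first; test superiority only if that gate passes."""
    se = se_from_ci(lower, upper)
    # Step 1: noninferiority, H0: HR >= margin (one-sided).
    p_ni = NORM.cdf((math.log(hr) - math.log(margin)) / se)
    result = {
        "p_noninferiority": p_ni,
        "noninferior": p_ni < alpha_one_sided,
        "p_superiority": None,  # stays None if the gate is not passed
        "superior": False,
    }
    if result["noninferior"]:
        # Step 2: superiority, H0: HR == 1 (two-sided).
        p_sup = 2 * NORM.cdf(-abs(math.log(hr) / se))
        result["p_superiority"] = p_sup
        result["superior"] = hr < 1 and p_sup < 2 * alpha_one_sided
    return result


leader = hierarchical_test(hr=0.87, lower=0.78, upper=0.97)
print(leader)
```

With the primary result (HR 0.87; 95% CI 0.78, 0.97), both steps pass, and the approximate superiority p-value is consistent with the reported p = 0.01. A result such as HR 1.10 (0.90, 1.35) would stop at step 1, and superiority would never be tested.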

The trial was event-driven: it required at least 611 primary-outcome events for 90% power to demonstrate noninferiority and at least 611 events for 80% power to detect a true hazard ratio of 0.85 for superiority. The final dataset included 1,302 first MACE events, giving the trial considerably more power than the minimum for noninferiority and adequate power for the superiority test.
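The link between event counts and power can be approximated with Schoenfeld's formula, in which the standard error of the log hazard ratio under 1:1 allocation is roughly 2 divided by the square root of the number of events. The sketch below is a rough check under that approximation, not the protocol's sample-size calculation. It assumes a true hazard ratio of 1.0 for the noninferiority scenario, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

NORM = NormalDist()


def power_from_events(n_events, true_hr, null_hr=1.0, alpha_one_sided=0.025):
    """Approximate power to show HR < null_hr with 1:1 allocation."""
    se = 2 / math.sqrt(n_events)  # Schoenfeld approximation for SE of log(HR)
    z_alpha = NORM.inv_cdf(1 - alpha_one_sided)
    effect = (math.log(null_hr) - math.log(true_hr)) / se
    return NORM.cdf(effect - z_alpha)


# Noninferiority: rule out HR >= 1.30, assuming the true HR is 1.0.
power_ni_611 = power_from_events(611, true_hr=1.0, null_hr=1.30)

# Superiority: detect a true HR of 0.85 with the observed 1,302 events.
power_sup_1302 = power_from_events(1302, true_hr=0.85)

print(f"Noninferiority, 611 events: {power_ni_611:.2f}")
print(f"Superiority, 1,302 events:  {power_sup_1302:.2f}")
```

Under these assumptions, 611 events give about 90% power against the 1.30 margin. The same function can be applied to other event counts to see how strongly superiority power depends on the number of events accrued.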

Time-to-event analysis used Cox proportional hazards models, stratified by the same two randomization variables (prior CV disease and baseline eGFR). The proportional hazards assumption was tested and held. Kaplan-Meier curves separated gradually after about 12 months, consistent with a treatment effect that accumulates with exposure rather than appearing immediately.
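For readers unfamiliar with how Kaplan-Meier curves are built, the following minimal sketch computes the product-limit estimate from scratch. The data are toy values invented for illustration, not LEADER data, and the function name is hypothetical. A real analysis would use an established survival library.

```python
def kaplan_meier(times, events):
    """Return [(event_time, survival_probability)] by the product-limit method.

    times  -- follow-up time for each participant
    events -- 1 if the event occurred at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - n_events / at_risk
        curve.append((t, survival))
    return curve


# Toy data (hypothetical): five participants, two of them censored.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)

for t, s in toy_curve:
    print(f"t={t}: S(t)={s:.2f}")
```

Censored participants leave the risk set without counting as events, which is why the curve drops only at event times. Computing one curve per arm and plotting both is what reveals the gradual separation described above.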

One nuance: the protocol specified an "in-trial" estimand, meaning all events counted regardless of whether the participant was still taking the drug. About 26% of liraglutide patients and 30% of placebo patients discontinued study drug prematurely. Because these patients were still followed for events, the intention-to-treat analysis captures real-world adherence patterns. A per-protocol or on-treatment analysis would likely show a larger effect size, but the ITT result is more conservative and more generalizable.

What the Estimand Framework Reveals

The ICH E9(R1) addendum on estimands was published after LEADER, but applying its framework retrospectively is instructive. LEADER's primary estimand is a "treatment policy" strategy: it asks, "What happens when patients are assigned to liraglutide, regardless of whether they keep taking it?"

This is the right estimand for a regulatory question ("Is this drug safe to approve?") but arguably the wrong one for a clinical question ("Will my patient benefit if she takes liraglutide and stays on it?"). A "hypothetical" estimand, modeling what would have happened had everyone adhered, would answer the clinical question. Post-hoc on-treatment analyses from LEADER substudies suggest the on-treatment HR was closer to 0.83, a somewhat stronger signal diluted by the ~28% discontinuation rate in the ITT population.

Clinicians prescribing liraglutide should recognize that the 13% reduction is the conservative, intent-to-treat estimate. Patients who tolerate the drug and remain adherent may derive more benefit, though that inference comes with the usual caveats of on-treatment analysis.

Comparator Choice and Standard of Care

The comparator was placebo plus standard of care, not an active GLP-1 comparator or another glucose-lowering drug class. This was dictated by the FDA CVOT framework, which aims to isolate the cardiovascular effect of the test drug from background therapy.

Standard of care evolved during the trial. Metformin use was ~76% at baseline. Insulin use was ~45%. SGLT2 inhibitors were approved while the trial was under way, and a small percentage of patients received them. This creates a moving standard-of-care problem: the control arm received progressively better therapy over the trial's duration, which would, if anything, bias against liraglutide by improving outcomes in the placebo group.

The absence of an active comparator means LEADER cannot answer whether liraglutide is better than an SGLT2 inhibitor for cardiovascular protection. The SUSTAIN-6 trial later showed semaglutide (a more potent GLP-1 RA) achieved a larger MACE reduction (26%), but cross-trial comparisons are unreliable. The 2018 ADA/EASD consensus report subsequently recommended GLP-1 RAs or SGLT2 inhibitors as second-line therapy after metformin in patients with established atherosclerotic disease, giving LEADER direct guideline impact.

Limitations the Authors Acknowledged

The original publication and supplementary materials note several limitations:

High discontinuation rate (~26 to 30%), typical for long injectable trials but diluting the treatment effect in ITT analysis
Glycemic separation between arms, making it impossible to fully distinguish glucose-mediated from direct cardiovascular effects
Population skewed toward secondary prevention (81% with prior CVD), limiting applicability to primary prevention
Single GLP-1 RA tested, so results are not automatically class-wide
Open-label rescue therapy, potentially introducing bias through differential co-intervention

A limitation less discussed: the trial ran across 32 countries with varying standards of cardiovascular care. Regional heterogeneity in background therapy, event rates, and clinical practice introduces noise that the stratification variables only partially address.

Clinical Translation

LEADER changed practice. The FDA added a cardiovascular risk reduction indication to the Victoza label in 2017, making liraglutide the first GLP-1 RA approved specifically for cardiovascular benefit. The trial's methodology, while built for a noninferiority regulatory question, generated a superiority result strong enough to shift guidelines and prescribing patterns worldwide.

The critical lesson is that the 13% MACE reduction is not a single number but a product of specific design choices: the population enrolled, the comparator used, the estimand selected, the discontinuation rate tolerated, and the statistical hierarchy applied. Each of these choices is defensible, but each also constrains interpretation. Reading the trial without reading the methods means missing half the story.


References

Marso SP, Daniels GH, Poulter NR, et al. Liraglutide and cardiovascular outcomes in type 2 diabetes. N Engl J Med. 2016;375(4):311-322.
U.S. Food and Drug Administration. Guidance for industry: diabetes mellitus, evaluating cardiovascular risk in new antidiabetic therapies to treat type 2 diabetes. December 2008.
Victoza (liraglutide) prescribing information. Novo Nordisk. Revised 2017.
Marso SP, Bain SC, Consoli A, et al. Semaglutide and cardiovascular outcomes in patients with type 2 diabetes (SUSTAIN-6). N Engl J Med. 2016;375(19):1834-1844.
Davies MJ, D'Alessio DA, Fradkin J, et al. Management of hyperglycemia in type 2 diabetes, 2018. A consensus report by the ADA and EASD. Diabetes Care. 2018;41(12):2669-2701.
ACCORD Study Group. Effects of intensive glucose lowering in type 2 diabetes. N Engl J Med. 2008;358(24):2545-2559.