Honest Criticisms and Limitations of the STEP-8 Trial

At a glance
| Parameter | Detail |
|---|---|
| N | 338 (semaglutide 2.4 mg: 126, liraglutide 3.0 mg: 127, placebo: 85) |
| Intervention | Subcutaneous semaglutide 2.4 mg once weekly |
| Comparator | Subcutaneous liraglutide 3.0 mg once daily; placebo |
| Duration | 68 weeks (dose escalation over 16 weeks for semaglutide and about 4 weeks for liraglutide, followed by maintenance) |
| Primary endpoint | Percentage change in body weight from baseline to week 68 |
| Key result | Semaglutide: −15.8% vs liraglutide: −6.4% (estimated treatment difference: −9.4 percentage points; P <0.001) |
Why a Limitations Deep-Dive Matters
STEP-8 was the first randomized trial to directly pit semaglutide 2.4 mg against liraglutide 3.0 mg for chronic weight management. The topline result, a roughly 2.5-fold greater weight loss with semaglutide, dominated headlines and shaped prescribing conversations. But headline numbers can obscure real methodological issues. This page catalogs each limitation, drawing on the published trial, editorial commentary, and related evidence so clinicians can form a more complete picture.
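The 2.5-fold figure follows directly from the two reported group means. The sketch below is only a sanity check of that arithmetic using the values in the summary table; the variable names are illustrative, and the trial's estimated treatment difference comes from a statistical model rather than from simple subtraction.

```python
# Reported mean percentage change in body weight at week 68 (summary table values).
semaglutide_change = -15.8
liraglutide_change = -6.4

# Absolute difference in percentage points (semaglutide minus liraglutide).
difference_pp = semaglutide_change - liraglutide_change

# Relative size of the two mean losses (the "fold" figure quoted in headlines).
fold_ratio = semaglutide_change / liraglutide_change

print(f"Difference: {difference_pp:.1f} percentage points")
print(f"Ratio of mean losses: {fold_ratio:.2f}-fold")
```

The ratio describes group means. It does not imply that any individual patient would lose 2.5 times as much weight on semaglutide.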
Limitation Framework for Evaluating STEP-8
The criticisms below are organized into six categories. Each one addresses a different threat to the trial's internal validity, external applicability, or clinical interpretation.
| Category | Core Question |
|---|---|
| Enrollment bias | Who was studied, and who was left out? |
| Duration and design | Was 68 weeks long enough to answer the right questions? |
| Open-label structure | Could knowledge of treatment assignment inflate differences? |
| Statistical considerations | Do the analytical choices hold up under scrutiny? |
| Sponsor and COI | How should industry funding color interpretation? |
| Post-publication commentary | What did peer reviewers and letter-writers flag? |
1. Enrollment Bias and Narrow Demographics
STEP-8 enrolled adults aged 18 or older without diabetes who had a BMI of 30 kg/m² or higher, or 27 kg/m² or higher with at least one weight-related comorbidity. The BMI thresholds mirror the FDA-approved indication for semaglutide 2.4 mg, but the no-diabetes criterion excludes the large population of patients with type 2 diabetes who represent a major segment of real-world GLP-1 prescribing.
The trial's demographic breakdown further narrows generalizability:
| Characteristic | Semaglutide group | Liraglutide group |
|---|---|---|
| Female (%) | 78 | 78 |
| White (%) | 75 | 84 |
| Mean age (years) | 49 | 49 |
| Mean BMI (kg/m²) | 37.2 | 38.0 |
| Mean body weight (kg) | 104.5 | 108.8 |
The sample was predominantly White women in their late 40s. Representation of Black, Hispanic, and Asian participants was limited. Because drug response and body composition can differ across racial and ethnic groups, STEP-8 cannot speak confidently to outcomes in the populations it underrepresented. The exclusion of patients with type 2 diabetes also means the trial says nothing about how diabetes status affects weight-loss magnitude. That matters because STEP-2, which enrolled adults with type 2 diabetes, reported smaller mean weight loss with semaglutide 2.4 mg than the STEP trials in people without diabetes.
Patients with a history of bariatric surgery, severe psychiatric illness, or active eating disorders were excluded. While these exclusions protect internal validity, they remove exactly the complex patients who present most frequently in obesity medicine clinics.
2. Duration and Design Constraints
Sixty-eight weeks is a standard timeframe for obesity pharmacotherapy trials, but it is too short to answer the questions that matter most: weight regain, cardiovascular events, and long-term tolerability.
Data from the STEP-1 extension study demonstrated that participants regained approximately two-thirds of lost weight within one year of stopping semaglutide. STEP-8 included no off-treatment follow-up phase. Clinicians therefore have no data from this trial on what happens when patients discontinue either drug or switch between them.
The dose-escalation schedule is another design element worth scrutinizing. Semaglutide was titrated from 0.25 mg weekly up to the full 2.4 mg dose over 16 weeks. Liraglutide was titrated from 0.6 mg daily to 3.0 mg over roughly 4 weeks, following its approved labeling. As a result, semaglutide spent a larger share of the 68 weeks below its maintenance dose. Some commentators argued this asymmetry might actually have understated semaglutide's advantage, while others noted it simply reflects real-world dosing protocols and is therefore a reasonable design choice.
The trial also used a run-in period with lifestyle counseling before randomization. Participants who could not comply during this phase were excluded, which likely enriched the study population for motivated, adherent individuals.
3. Open-Label Comparator and Blinding Concerns
STEP-8 used a partially blinded design. The semaglutide versus placebo comparison was double-blinded. The semaglutide versus liraglutide comparison was open-label.
This is a significant limitation. Patients and investigators knew which active drug was being administered. Awareness of treatment assignment can influence:
- Patient-reported outcomes and subjective well-being
- Adherence to the lifestyle intervention component
- Investigator enthusiasm during counseling visits
- Threshold for reporting adverse events
The open-label design was a practical concession. Semaglutide is injected once weekly; liraglutide requires daily injections. Maintaining a true double-blind would have required a double-dummy design, with daily placebo injections in the semaglutide group and weekly placebo injections in the liraglutide group, a significant burden. The investigators acknowledged this tradeoff in the published paper, but it remains a vulnerability. In any open-label trial where one drug requires more frequent self-injection, that arm carries a heavier treatment burden, and participants know it.
4. Statistical and Analytical Caveats
The primary analysis used a treatment-policy estimand (in line with the intention-to-treat principle), which estimates the effect in all randomized participants regardless of whether they stayed on treatment. Supplementary analyses using a trial-product estimand (the effect if participants had remained on treatment as assigned) showed even larger differences favoring semaglutide. Both approaches have merit. The treatment-policy approach is more conservative and clinically relevant, since it reflects what happens in practice when some patients stop therapy.
Several statistical points deserve attention:
Sample size. At 338 participants across three arms, STEP-8 was modest in size. The trial was designed to detect a difference of 7 percentage points in weight change with 90% power, and the observed difference exceeded that comfortably. But the sample was not powered for subgroup analyses by sex, race, baseline BMI, or comorbidity status. Any subgroup findings should be treated as hypothesis-generating only.
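To show why a trial sized for the main comparison is too small for subgroups, here is a rough sample-size sketch using the standard normal-approximation formula for comparing two means. The 7-point difference and 90% power come from the text; the standard deviation of 10 percentage points is an assumed, illustrative value, and the function name is hypothetical. This is not the trial's actual power calculation.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.90):
    """Participants per arm for a two-sided, two-sample comparison of means
    (normal approximation, equal group sizes, equal SD)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# delta = 7 percentage points and power = 0.90 come from the text;
# sd = 10 percentage points is an ASSUMED value, not taken from the trial.
n_main = n_per_group(delta=7, sd=10)

# A subgroup contrast half as large as the main effect, same assumed SD.
n_subgroup = n_per_group(delta=3.5, sd=10)

print(n_main, n_subgroup)
```

Under these assumptions, halving the detectable difference roughly quadruples the required sample per arm. Once the enrolled participants are split by sex, race, or baseline BMI, each stratum quickly falls below what such a contrast would need.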
Multiple comparisons. The trial had two primary comparisons (semaglutide vs. liraglutide and semaglutide vs. placebo) and multiple secondary endpoints. A hierarchical testing procedure was used to control the family-wise error rate. This is appropriate methodology, but clinicians should note that a secondary endpoint can be formally tested only if every endpoint ahead of it in the hierarchy reached significance.
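The gatekeeping logic of a hierarchical procedure can be sketched in a few lines. The example below implements the simplest variant, a fixed-sequence test at a single alpha level; it is a generic illustration, and the trial's actual hierarchy may have been structured differently. The endpoint names and p-values are hypothetical.

```python
def fixed_sequence_test(ordered_results, alpha=0.05):
    """Test endpoints in a prespecified order at the full alpha level.

    Stops at the first non-significant endpoint; every later endpoint is
    reported as 'not formally tested', whatever its nominal p-value.
    """
    decisions = {}
    gate_open = True
    for endpoint, p_value in ordered_results:
        if gate_open and p_value < alpha:
            decisions[endpoint] = "significant"
        elif gate_open:
            decisions[endpoint] = "not significant"
            gate_open = False  # the hierarchy stops here
        else:
            decisions[endpoint] = "not formally tested"
    return decisions


# HYPOTHETICAL endpoints and p-values, for illustration only.
example = [
    ("endpoint A", 0.001),
    ("endpoint B", 0.030),
    ("endpoint C", 0.080),
    ("endpoint D", 0.010),
]
decisions = fixed_sequence_test(example)
for endpoint, decision in decisions.items():
    print(endpoint, "->", decision)
```

In this made-up sequence, endpoint D has a small nominal p-value but cannot be claimed as significant, because endpoint C ahead of it failed. A nominal p-value reported for an endpoint low in a hierarchy should be read with that in mind.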
Missing data. Approximately 15-20% of participants in each group discontinued treatment before week 68. The primary analysis used a mixed model for repeated measures, which handles missing data under a missing-at-random assumption. If dropout was related to unmeasured factors (such as privately experienced side effects not captured in the data), the results could be biased.
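One way to gauge how much a departure from missing-at-random could matter is a simple sensitivity calculation: assume a different outcome for participants who dropped out and see how far the arm-level mean moves. The sketch below does this with a weighted average. All inputs are hypothetical apart from the 20% dropout rate, which is the upper end of the range quoted above; the function name is illustrative, and this is not an analysis the trial reported.

```python
def overall_mean(completer_mean, dropout_mean, dropout_fraction):
    """Mean outcome for a whole arm as a weighted average of completers and dropouts."""
    return (1 - dropout_fraction) * completer_mean + dropout_fraction * dropout_mean


# HYPOTHETICAL inputs: a completer mean of -16% is illustrative, not a trial estimate.
completer_mean = -16.0
dropout_fraction = 0.20

# Vary the assumed outcome among dropouts from "same as completers" to "no loss".
for dropout_mean in (-16.0, -8.0, 0.0):
    arm_mean = overall_mean(completer_mean, dropout_mean, dropout_fraction)
    print(f"dropouts at {dropout_mean:+.1f}% -> arm mean {arm_mean:.1f}%")
```

The point is qualitative: with dropout in this range, even pessimistic assumptions about those who left shift an arm's mean by a few percentage points at most, so bias from missing data would need to be large and one-sided to account for a gap of the size observed.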
5. Sponsor Involvement and Conflicts of Interest
STEP-8 was funded by Novo Nordisk, the manufacturer of both semaglutide (Wegovy) and liraglutide (Saxenda). The company was involved in trial design, data collection, statistical analysis, and manuscript preparation.
This creates an unusual conflict dynamic. Novo Nordisk manufactures both drugs being compared, which means a positive result for semaglutide comes partly at the expense of liraglutide sales. In practice, the financial incentive clearly favored semaglutide: Wegovy carries a higher price point and represents the company's growth strategy, while Saxenda (liraglutide 3.0 mg) was already approaching generic competition.
Multiple authors disclosed financial relationships with Novo Nordisk, including consulting fees and research grants. This does not invalidate the data, but it is standard practice to consider sponsor involvement when interpreting effect sizes, particularly in an open-label comparison. An independently funded replication would meaningfully strengthen confidence in the magnitude of difference observed.
6. Post-Publication Commentary and Peer Criticism
Several points were raised in editorials and letters to the editor following the trial's publication in JAMA:
Dose equivalence. Critics questioned whether 2.4 mg of once-weekly semaglutide and 3.0 mg of once-daily liraglutide represent a pharmacologically fair comparison. Each is the maximum dose approved for obesity. But the GLP-1 receptor engagement profiles differ substantially because of differences in half-life (approximately 7 days for semaglutide vs. 13 hours for liraglutide), albumin binding, and steady-state pharmacokinetics. Some argued that comparing maximum approved doses does not necessarily mean comparing maximum pharmacological potential.
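The half-life difference can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a one-compartment model with first-order elimination, which is a deliberate simplification of both drugs' real pharmacokinetics. The half-lives are the approximate values quoted above, and the function names are illustrative.

```python
def trough_fraction(half_life_h, interval_h):
    """Fraction of a dose remaining at the end of one dosing interval,
    assuming first-order (exponential) elimination."""
    return 0.5 ** (interval_h / half_life_h)


def accumulation_ratio(half_life_h, interval_h):
    """Steady-state exposure relative to the first dose under repeated dosing,
    for a one-compartment model with first-order elimination."""
    return 1 / (1 - trough_fraction(half_life_h, interval_h))


# Half-lives are the approximate values from the text; intervals follow each dosing schedule.
drugs = {
    "semaglutide (weekly)": {"half_life_h": 7 * 24, "interval_h": 7 * 24},
    "liraglutide (daily)": {"half_life_h": 13, "interval_h": 24},
}

for name, pk in drugs.items():
    print(
        f"{name}: trough fraction {trough_fraction(**pk):.2f}, "
        f"accumulation ratio {accumulation_ratio(**pk):.2f}"
    )
```

Under this simplified model, about half of each semaglutide dose is still present at the next injection, compared with a little over a quarter for liraglutide. That is one reason a milligram-for-milligram comparison of the two labels says little about relative receptor exposure.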
Gastrointestinal tolerability. Nausea, vomiting, and diarrhea were common in both groups but numerically more frequent with semaglutide. Whether the greater weight loss is partly mediated by reduced caloric intake due to GI side effects (rather than central appetite suppression alone) remains debated. The trial did not include validated dietary recall data sufficient to disentangle these mechanisms.
Absence of hard outcomes. STEP-8 measured weight change, waist circumference, and cardiometabolic biomarkers. It did not assess cardiovascular events, quality of life using validated instruments, or patient-centered functional outcomes. The SELECT trial subsequently demonstrated cardiovascular benefit for semaglutide 2.4 mg, but that was a different population (established cardiovascular disease) and a different question. STEP-8 alone cannot support claims about cardiovascular superiority of semaglutide over liraglutide.
Lifestyle intervention intensity. All arms received monthly counseling sessions encouraging a 500 kcal/day deficit and 150 minutes/week of physical activity. The real-world intensity of lifestyle support varies enormously. Patients receiving less structured counseling may see different absolute weight-loss numbers with either drug.
What This Means for Clinical Decision-Making
STEP-8 provides strong evidence that semaglutide 2.4 mg produces greater short-term weight loss than liraglutide 3.0 mg in a selected population. That finding is internally valid and consistent with the broader STEP program. But clinicians choosing between these agents should recognize what the trial does not tell them: whether the difference persists beyond 68 weeks, whether it holds in diverse populations, whether it translates to differences in hard clinical endpoints when the two drugs are compared directly, and how much of the effect-size gap is amplified by open-label awareness.
For patients with type 2 diabetes, prior bariatric surgery, or significant psychiatric comorbidity, STEP-8's results require extrapolation rather than direct application. Cost, injection frequency preference, insurance formulary constraints, and individual tolerability often matter more in practice than the 9.4-percentage-point difference observed under controlled trial conditions.
References
- Rubino DM, Greenway FL, Khalid U, et al. Effect of weekly subcutaneous semaglutide vs daily liraglutide on body weight in adults with overweight or obesity without diabetes: the STEP 8 randomized clinical trial. JAMA. 2022;327(2):138-150.
- Wilding JPH, Batterham RL, Calanna S, et al. Once-weekly semaglutide in adults with overweight or obesity. N Engl J Med. 2021;384(11):989-1002.
- Wilding JPH, Batterham RL, Davies M, et al. Weight regain and cardiometabolic effects after withdrawal of semaglutide: the STEP 1 trial extension. Diabetes Obes Metab. 2022;24(8):1553-1564.
- Lincoff AM, Brown-Frandsen K, Colhoun HM, et al. Semaglutide and cardiovascular outcomes in obesity without diabetes. N Engl J Med. 2023;389(24):2221-2232.
- Wegovy (semaglutide) prescribing information. U.S. Food and Drug Administration.
- Saxenda (liraglutide) prescribing information. U.S. Food and Drug Administration.