Inside the STEP-5 Methodology: What Most Summaries Skip


At a glance

| Detail | Value |
|---|---|
| N | 304 (semaglutide 203, placebo 101) |
| Intervention | Semaglutide 2.4 mg subcutaneous, once weekly |
| Comparator | Matched placebo injection + lifestyle counseling |
| Duration | 104 weeks (2 years) |
| Primary endpoint | Percentage change in body weight from baseline to week 104 |
| Key result | −15.2% semaglutide vs −2.6% placebo (estimated treatment difference −12.6 percentage points, p < 0.0001) |

Why methodology matters more than the headline number

A 15.2% weight loss over two years sounds impressive on its own. But numbers without methodological context are marketing, not medicine. The value of STEP-5 lies in how the trial was built: what population was enrolled, how missing data was handled, and which statistical lens was applied to the results. Each of these choices can inflate or deflate an effect size by several percentage points.

Most trial summaries stop at the abstract. This page does not.

Randomization and blinding architecture

STEP-5 randomized participants 2:1 to semaglutide or placebo. The 2:1 ratio is common in obesity trials where the sponsor wants more granular safety and efficacy data on the active arm without substantially increasing total enrollment. All participants received identical-appearing pre-filled injection pens, and neither participants nor investigators knew the assignment.

The trial used an interactive web-response system (IWRS) for allocation, with stratification by baseline BMI category (≥27 to <30, ≥30 to <35, ≥35 to <40, and ≥40 kg/m²). Stratification by BMI category is important because response to weight-loss pharmacotherapy can vary by starting weight. Without it, chance imbalances in BMI distribution between arms could confound the primary endpoint.
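The IWRS internals are not described here, but stratified permuted-block randomization is a common way such systems keep an allocation ratio balanced within each stratum. The sketch below is hypothetical. It assumes a block size of 3, with two semaglutide slots and one placebo slot, and the class and function names are illustrative. Real systems typically use larger or variable block sizes so that site staff cannot predict the next assignment.

```python
import random
from collections import defaultdict

# BMI strata from the text, in kg/m²: lower bound inclusive, upper bound exclusive.
STRATA = [(27, 30), (30, 35), (35, 40), (40, float("inf"))]


def bmi_stratum(bmi: float) -> int:
    """Return the index of the BMI stratum a participant falls into."""
    for index, (low, high) in enumerate(STRATA):
        if low <= bmi < high:
            return index
    raise ValueError(f"BMI {bmi} is below the eligibility floor of 27 kg/m²")


class StratifiedBlockRandomizer:
    """Hypothetical permuted-block randomizer with one independent block queue per stratum.

    Each block holds two semaglutide slots and one placebo slot, so every
    completed block within a stratum preserves the 2:1 ratio exactly.
    """

    BLOCK = ["semaglutide", "semaglutide", "placebo"]

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        # stratum index -> assignments still remaining in that stratum's current block
        self.queues = defaultdict(list)

    def assign(self, bmi: float) -> str:
        queue = self.queues[bmi_stratum(bmi)]
        if not queue:
            block = list(self.BLOCK)
            self.rng.shuffle(block)  # random order within the new block
            queue.extend(block)
        return queue.pop()


randomizer = StratifiedBlockRandomizer(seed=42)
arms = [randomizer.assign(36.0) for _ in range(6)]  # six participants, all in the 35 to <40 stratum
print(arms.count("semaglutide"), arms.count("placebo"))  # 4 2
```

Each stratum keeps its own queue, so a run of heavy participants cannot push the 2:1 ratio off balance in the lighter strata.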

One detail that rarely gets discussed: both arms received identical lifestyle counseling consisting of a 500 kcal/day deficit diet and 150 minutes per week of physical activity. This is not a minor point. The placebo arm's 2.6% weight loss reflects the effect of sustained counseling plus injection ritual, meaning the 12.6-percentage-point treatment difference isolates the pharmacological contribution of semaglutide from behavioral and expectation effects.

Inclusion and exclusion criteria: who was actually studied

Eligibility required adults aged ≥18 with BMI ≥30 kg/m², or BMI ≥27 kg/m² with at least one weight-related comorbidity (hypertension, dyslipidemia, obstructive sleep apnea, or cardiovascular disease). Participants needed at least one self-reported unsuccessful dietary effort.

Critical exclusions: type 2 diabetes, prior bariatric surgery, use of GLP-1 receptor agonists within the prior 90 days, and a history of pancreatitis. The diabetes exclusion is significant because STEP-2 studied that population separately. STEP-5's results therefore apply to adults with overweight or obesity who do not have diabetes. They do not apply to patients with type 2 diabetes, who have generally lost less weight on GLP-1 receptor agonists than people without diabetes.

The mean baseline characteristics show who the results actually describe. Participants averaged approximately 106 kg body weight and a BMI of around 38.5 kg/m². They were predominantly female (approximately 78%) and white (approximately 93%). This narrow racial and ethnic composition is a genuine limitation. Novo Nordisk's FDA label for Wegovy does not restrict use by race, so clinicians treating other populations are extrapolating from a demographically limited evidence base.

The dose-escalation protocol most summaries ignore

Semaglutide 2.4 mg is not started at 2.4 mg. The trial used a 16-week dose-escalation schedule:

- 0.25 mg in weeks 1 through 4
- 0.5 mg in weeks 5 through 8
- 1.0 mg in weeks 9 through 12
- 1.7 mg in weeks 13 through 16
- 2.4 mg from week 17 onward

This slow titration exists to reduce gastrointestinal side effects (nausea, vomiting, diarrhea), which are dose-dependent and peak during escalation.

Why does this matter for interpreting the 104-week result? Roughly 15% of the treatment period (16 of 104 weeks) was spent below the 2.4 mg maintenance dose, and weight loss during escalation is shallower than during the maintenance phase. Clinicians initiating Wegovy in practice follow the same schedule per the prescribing information. However, patients who cannot tolerate escalation never reach the target dose, and the trial captures this real-world attrition mechanism only partially.
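The planned escalation can be written as a small lookup table, which also reproduces the 16-of-104-weeks figure. This sketch models only the schedule as stated in the text. It ignores any dose delays or reductions a real patient might need for tolerability, and the function names are illustrative.

```python
# Escalation steps from the text: (first week, last week, weekly dose in mg).
ESCALATION = [
    (1, 4, 0.25),
    (5, 8, 0.5),
    (9, 12, 1.0),
    (13, 16, 1.7),
]
MAINTENANCE_DOSE_MG = 2.4


def dose_for_week(week: int) -> float:
    """Planned weekly semaglutide dose (mg) for a 1-based treatment week."""
    if week < 1:
        raise ValueError("treatment weeks start at 1")
    for first, last, dose in ESCALATION:
        if first <= week <= last:
            return dose
    return MAINTENANCE_DOSE_MG


def weeks_below_target(total_weeks: int = 104) -> int:
    """Count the weeks in which the planned dose is below the maintenance dose."""
    return sum(1 for w in range(1, total_weeks + 1) if dose_for_week(w) < MAINTENANCE_DOSE_MG)


below = weeks_below_target()
print(below, f"{below / 104:.1%}")  # 16 15.4%
```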

The estimand framework: treatment-policy vs. trial-product

This is the single most important methodological feature most summaries skip entirely.

STEP-5 reported its primary endpoint under two estimands:

Treatment-policy estimand. This counts all randomized participants at week 104, regardless of whether they discontinued the drug, switched to rescue medication, or stopped injecting entirely. Missing week-104 data were handled by multiple imputation based on retrieved dropouts, that is, participants who stopped treatment but still attended follow-up visits. In plain language: if a participant stopped semaglutide at week 40 and regained weight by week 104, that regain counted against the drug. It was measured directly if the participant attended the week-104 visit, or imputed if the visit was missed. This is the more conservative estimand, and the one that produced the headline −15.2% result.

Trial-product estimand. This estimates the effect that would have been seen had all participants stayed on treatment for the full 104 weeks. Measurements taken after treatment discontinuation are excluded, and a mixed model for repeated measures (MMRM) is fitted to the on-treatment data. Under this estimand, semaglutide produced approximately −17.4% weight loss at week 104. The gap between −15.2% and −17.4% is a rough measure of how much non-adherence costs within the semaglutide arm.
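The treatment-policy imputation step can be illustrated with a heavily simplified sketch of retrieved-dropout imputation, followed by Rubin's rules for pooling. All numbers are hypothetical and are not STEP-5 data. Missing week-104 values are filled by sampling from the retrieved dropouts in the same arm, a hot-deck simplification. Published analyses of this kind typically fit an imputation model with baseline covariates and also carry the uncertainty in that model's parameters, which this sketch does not.

```python
import random
import statistics


def impute_arm(observed, retrieved_dropouts, n_missing, rng):
    """Complete one arm: observed week-104 values plus draws from that arm's retrieved dropouts."""
    return observed + [rng.choice(retrieved_dropouts) for _ in range(n_missing)]


def mean_and_var(values):
    """Arm mean and the estimated variance of that mean."""
    return statistics.fmean(values), statistics.variance(values) / len(values)


def rubin_pool(estimates, variances):
    """Combine M completed-data estimates with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)       # pooled point estimate
    within = statistics.fmean(variances)      # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance
    return q_bar, within + (1 + 1 / m) * between


# Hypothetical week-104 percent changes in body weight (NOT trial data).
sema_on_treatment = [-18.0, -16.5, -14.0, -20.1, -12.3, -15.8]
sema_retrieved = [-6.0, -3.5, -8.2]  # stopped the drug but were still weighed at week 104
pbo_on_treatment = [-4.0, -1.2, -3.1]
pbo_retrieved = [-0.5, 1.5]
MISSING_PER_ARM = 2  # participants with no week-104 weight at all
M = 20               # number of imputed datasets

rng = random.Random(7)
estimates, variances = [], []
for _ in range(M):
    sema = impute_arm(sema_on_treatment + sema_retrieved, sema_retrieved, MISSING_PER_ARM, rng)
    pbo = impute_arm(pbo_on_treatment + pbo_retrieved, pbo_retrieved, MISSING_PER_ARM, rng)
    mean_s, var_s = mean_and_var(sema)
    mean_p, var_p = mean_and_var(pbo)
    estimates.append(mean_s - mean_p)
    variances.append(var_s + var_p)

diff, total_var = rubin_pool(estimates, variances)
print(f"pooled difference {diff:.1f} pp (SE {total_var ** 0.5:.1f})")
```

Participants who stopped the drug, and the missing participants imputed from them, pull the semaglutide mean toward zero. This is the conservatism the treatment-policy estimand builds in.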

Why this distinction shapes clinical decision-making

| Estimand | What it answers | Semaglutide result | Placebo result |
|---|---|---|---|
| Treatment-policy | "If I prescribe this drug, what weight loss should I expect on average, including patients who stop?" | −15.2% | −2.6% |
| Trial-product | "If a patient stays on this drug continuously, what weight loss should they expect?" | −17.4% | −7.1% |

The placebo results are also instructive. Under the treatment-policy estimand, placebo produced −2.6%; under the trial-product estimand, it produced −7.1%. The 4.5-percentage-point gap in the placebo arm most likely reflects a selection effect: participants who kept taking placebo injections for two full years were probably also more adherent to the diet and exercise counseling.
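Subtracting the arm-level figures in the table gives each estimand's treatment difference. This is simple arithmetic on the rounded means reported above. Published estimated treatment differences come from the fitted models and may differ slightly from a plain subtraction.

```python
# Week-104 mean percent change in body weight, as given in the estimand table.
RESULTS = {
    "treatment-policy": {"semaglutide": -15.2, "placebo": -2.6},
    "trial-product": {"semaglutide": -17.4, "placebo": -7.1},
}


def treatment_difference(arms):
    """Semaglutide minus placebo, in percentage points (rounded to 0.1)."""
    return round(arms["semaglutide"] - arms["placebo"], 1)


for estimand, arms in RESULTS.items():
    print(f"{estimand}: treatment difference {treatment_difference(arms):+.1f} pp")

# How far each arm moves when switching from treatment-policy to trial-product.
for arm in ("semaglutide", "placebo"):
    shift = RESULTS["trial-product"][arm] - RESULTS["treatment-policy"][arm]
    print(f"{arm}: {shift:+.1f} pp")
```

With these figures, the treatment difference is −12.6 points under treatment-policy but −10.3 points under trial-product. The placebo arm shifts by 4.5 points between estimands, while the semaglutide arm shifts by only 2.2. Because adherence-based selection affects both arms, the on-treatment estimand does not automatically produce the larger between-arm difference.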

For clinicians, the treatment-policy estimand is the realistic one. Patients miss doses, discontinue due to side effects, or lose access due to insurance changes. Planning around −15% rather than −17% is more honest counseling.

Statistical analysis and multiplicity control

The primary analysis used an ANCOVA model with treatment group as a factor and baseline body weight as a covariate, applied to the multiple-imputed datasets under the treatment-policy estimand. The trial pre-specified a hierarchical testing procedure across two co-primary endpoints (percent weight change and proportion achieving ≥5% loss) and three confirmatory secondary endpoints, controlling the family-wise type I error rate at 5%.
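With one treatment factor and one baseline covariate, the ANCOVA estimate has a closed form. It is the raw difference in means, corrected for any baseline imbalance using the pooled within-arm slope. The sketch below assumes a common slope in both arms and uses hypothetical data. Per the text, the trial fitted this kind of model to each multiply imputed dataset and then pooled the results; that pooling step is not shown here.

```python
from statistics import fmean


def ancova_treatment_effect(y_treat, x_treat, y_ctrl, x_ctrl):
    """OLS estimate of tau in y = a + tau*T + beta*x with a common slope beta.

    y is percent weight change and x is baseline body weight. Returns (tau, beta).
    """

    def centered_sums(y, x):
        x_mean, y_mean = fmean(x), fmean(y)
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        return sxy, sxx

    sxy_t, sxx_t = centered_sums(y_treat, x_treat)
    sxy_c, sxx_c = centered_sums(y_ctrl, x_ctrl)
    beta = (sxy_t + sxy_c) / (sxx_t + sxx_c)  # pooled within-arm slope
    raw_difference = fmean(y_treat) - fmean(y_ctrl)
    baseline_imbalance = fmean(x_treat) - fmean(x_ctrl)
    return raw_difference - beta * baseline_imbalance, beta


# Hypothetical data with a deliberate baseline imbalance (the treated arm is heavier).
x_treat, y_treat = [100.0, 110.0, 120.0], [-17.0, -18.0, -19.0]
x_ctrl, y_ctrl = [90.0, 95.0, 100.0], [-4.0, -4.5, -5.0]

tau, beta = ancova_treatment_effect(y_treat, x_treat, y_ctrl, x_ctrl)
print(f"raw difference {fmean(y_treat) - fmean(y_ctrl):.1f}, adjusted {tau:.1f}, slope {beta:.2f}")
```

Here the raw difference of −13.5 becomes −12.0 once the heavier baseline in the treated arm is accounted for. Under randomization such imbalances are usually small, and the covariate mainly improves precision.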

All five endpoints in the hierarchy achieved statistical significance, so the testing chain was never broken and every result can be read as confirmatory. Had any endpoint in the chain failed, all subsequent endpoints would have been reported with nominal p-values only. That distinction matters for regulatory interpretation but is routinely omitted from media coverage.
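The hierarchical procedure can be sketched as fixed-sequence testing: each endpoint is tested at the full 5% level, but only while every earlier endpoint in the chain has succeeded. The text does not name the three secondary endpoints, so those labels and all p-values below are hypothetical. The example deliberately includes a failure to show what happens downstream of it.

```python
def fixed_sequence(tests, alpha=0.05):
    """Fixed-sequence (hierarchical) testing of (name, p_value) pairs in pre-specified order.

    Controls the family-wise error rate at alpha: once one hypothesis is not
    rejected, every later one can only be reported as nominal.
    """
    outcomes = []
    chain_intact = True
    for name, p in tests:
        if not chain_intact:
            outcomes.append((name, "nominal only"))
        elif p < alpha:
            outcomes.append((name, "significant (confirmatory)"))
        else:
            outcomes.append((name, "not significant (chain stops)"))
            chain_intact = False
    return outcomes


hierarchy = [
    ("percent weight change", 0.0001),
    ("achieved >=5% weight loss", 0.0001),
    ("secondary endpoint A (hypothetical)", 0.03),
    ("secondary endpoint B (hypothetical)", 0.20),
    ("secondary endpoint C (hypothetical)", 0.001),
]
for name, verdict in fixed_sequence(hierarchy):
    print(f"{name}: {verdict}")
```

Endpoint C has a small p-value but can only be reported as nominal because endpoint B broke the chain. In STEP-5 no break occurred.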

The sample size of 304 was powered to detect a treatment difference of approximately 5 percentage points in weight change with 90% power. The observed difference of 12.6 points was well above this design assumption, so the primary comparison cleared significance by a wide margin. That margin does not make the estimate look more precise than it really is. Confidence-interval width depends on sample size and outcome variability, not on how large the observed effect turns out to be.
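A normal-approximation sketch makes the point concrete. The standard deviation of percent weight change is an assumption (12.67 points), chosen so that the stated 90% power at 5 points is reproduced with 203 vs 101 participants. A real sample-size calculation would also allow for dropout and the imputation strategy.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def se_of_difference(sd, n1, n2):
    """Standard error of a difference in two means, assuming a common SD."""
    return sd * sqrt(1 / n1 + 1 / n2)


def power_two_sided(delta, sd, n1, n2, alpha=0.05):
    """Approximate power of a two-sided z-test (ignores rejections in the wrong direction)."""
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    return STANDARD_NORMAL.cdf(abs(delta) / se_of_difference(sd, n1, n2) - z_crit)


def ci_half_width(sd, n1, n2, alpha=0.05):
    """Half-width of a Wald confidence interval; the effect size does not enter."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / 2) * se_of_difference(sd, n1, n2)


ASSUMED_SD = 12.67  # assumption: picked so that a 5-point difference gives ~90% power
print(f"power at 5 points: {power_two_sided(5.0, ASSUMED_SD, 203, 101):.2f}")
print(f"95% CI half-width: ±{ci_half_width(ASSUMED_SD, 203, 101):.1f} points")
```

Under this assumption, power is about 0.90 and the interval half-width is about ±3.0 points. Because `delta` never enters `ci_half_width`, observing 12.6 points instead of 5 leaves the interval width unchanged.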

Discontinuation rates and what they signal

By week 104, approximately 13% of semaglutide participants and 27% of placebo participants discontinued the trial drug prematurely. The higher placebo discontinuation is expected (less perceived benefit), but the semaglutide discontinuation rate is clinically relevant: roughly 1 in 8 participants could not or chose not to continue the drug for two years.

The most common reasons for discontinuation on semaglutide were adverse events (predominantly gastrointestinal), participant withdrawal of consent, and "other" reasons. The trial does not separate insurance-related discontinuation from personal-preference discontinuation. That distinction matters enormously for real-world implementation, where cost and insurance coverage are commonly cited reasons for stopping.

Weight-loss trajectory: not linear, not a plateau

Weight loss with semaglutide followed a characteristic curve: rapid loss during weeks 0 through 60, gradual deceleration from weeks 60 through 80, and approximate stabilization from week 80 onward. The placebo arm showed modest early loss followed by gradual regain after week 20.

This trajectory has clinical implications. Patients who expect linear loss to continue indefinitely will be disappointed around month 15 and may question whether the drug "stopped working." Clinical guidance, including the AGA's 2022 clinical practice guideline on pharmacological interventions for adults with obesity, treats obesity pharmacotherapy as long-term therapy. Counseling patients that weight stabilization, not continued loss, is the expected long-term outcome helps prevent a plateau from being misread as treatment failure.

Limitations the authors acknowledged (and two they didn't)

The published paper lists several limitations:

- Enrollment concentrated in a small number of countries (primarily U.S., with some Canadian and European sites).
- Limited racial and ethnic diversity.
- No way to assess post-treatment weight regain, because the protocol did not include a withdrawal phase.

One limitation not discussed in the paper: the 2:1 randomization ratio means only 101 participants received placebo. A small comparator arm reduces the precision of the placebo estimate, and the placebo group's confidence intervals at week 104 are visibly wider than the semaglutide group's. Because the treatment difference inherits the uncertainty of both arms, its precision is limited by the smaller arm. This does not invalidate the result, but the placebo trajectory is estimated with less certainty than the drug trajectory.
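The efficiency cost of unequal allocation can be checked with the standard variance formula. For a fixed total, the variance of a difference in means is proportional to 1/n₁ + 1/n₂, which is smallest for an equal split. This comparison assumes equal outcome standard deviations in both arms. It treats a 152/152 split of the same 304 participants as a hypothetical alternative.

```python
from math import sqrt


def variance_factor(n1, n2):
    """Variance of a difference in two means, in units of the (common) outcome variance."""
    return 1 / n1 + 1 / n2


TOTAL = 304
as_enrolled = variance_factor(203, 101)                       # 2:1 allocation from the text
hypothetical_equal = variance_factor(TOTAL // 2, TOTAL // 2)  # 152 vs 152, same total
ratio = as_enrolled / hypothetical_equal
print(f"variance ratio {ratio:.3f}; CI width ratio {sqrt(ratio):.3f}")
```

Under these assumptions, 2:1 allocation costs about 13% more variance, or roughly 6% wider intervals for the treatment difference. In exchange, more participants receive the active drug.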

A second under-discussed limitation: participants in both arms received counseling from trial-trained dietitians and exercise physiologists. The infrastructure of a clinical trial (regular visits, weigh-ins, accountability) likely inflates the behavioral component of both arms relative to routine clinical care, where patients may see their physician every 3 to 6 months. The generalizability gap between trial-delivered lifestyle counseling and real-world primary care is a known issue across the entire obesity pharmacotherapy literature.

How STEP-5 fits into the STEP program

STEP-5 was one of the core trials of the STEP program (STEP-1 through STEP-5, with STEP-4 testing withdrawal). Its unique contribution was duration: STEP-1 ran 68 weeks, while STEP-5 extended follow-up to 104 weeks with the same dose in a comparable population. The evidence base for Wegovy spans the STEP program, but STEP-5 is the primary evidence source for the claim that semaglutide's weight-loss effect is maintained through two years. Without STEP-5, the durability argument would rest on extrapolation rather than direct observation.


References

  • Garvey WT, Batterham RL, Bhatta M, et al. Two-year effects of semaglutide in adults with overweight or obesity: the STEP 5 trial. Nat Med. 2022;28(10):2083-2091.
  • Novo Nordisk. Wegovy (semaglutide) prescribing information. Revised 2023.
  • Wilding JPH, Batterham RL, Calanna S, et al. Once-weekly semaglutide in adults with overweight or obesity (STEP 1). N Engl J Med. 2021;384(11):989-1002.
  • Amaro A, Sugimoto D, Wharton S, et al. Efficacy and safety of semaglutide for weight management: evidence from the STEP program. Postgrad Med. 2022;134(sup1):5-17.
  • American Gastroenterological Association. Clinical practice guideline on pharmacological interventions for adults with obesity. Gastroenterology. 2022;163(5):1198-1225.