Honest Criticisms and Limitations of the ARCH Trial

At a glance

| Parameter | Detail |
|---|---|
| N | 4,093 postmenopausal women |
| Intervention | Romosozumab 210 mg SC monthly for 12 months, then alendronate 70 mg weekly |
| Comparator | Alendronate 70 mg weekly throughout |
| Duration | Median 33 months (12-month double-blind + open-label alendronate extension) |
| Primary endpoint | Cumulative incidence of new vertebral fracture at 24 months |
| Key result | 48% relative risk reduction in new vertebral fractures vs alendronate (6.2% vs 11.9%; p < 0.001) |
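The headline relative reduction can be re-derived from the two cumulative incidences in the table. The sketch below is a crude calculation from the rounded percentages only; the function name `risk_summary` is illustrative, and the published estimate comes from a model-based risk ratio, so small differences from the crude figure are expected.

```python
def risk_summary(treated_risk: float, control_risk: float) -> dict:
    """Crude effect measures from two cumulative incidences (as proportions)."""
    arr = control_risk - treated_risk   # absolute risk reduction
    rrr = arr / control_risk            # relative risk reduction
    nnt = 1.0 / arr                     # number needed to treat over the same period
    return {"arr": arr, "rrr": rrr, "nnt": nnt}


# Rounded 24-month vertebral fracture incidences quoted in the table above.
summary = risk_summary(treated_risk=0.062, control_risk=0.119)

print(f"Absolute risk reduction: {summary['arr']:.1%}")
print(f"Relative risk reduction: {summary['rrr']:.1%}")
print(f"Number needed to treat:  {summary['nnt']:.1f}")
```

Reporting the absolute reduction and number needed to treat next to the relative figure makes it clearer how much of the 48% headline turns into fractures actually avoided in this high-risk cohort.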

Why This Page Exists

Most summaries of the ARCH trial stop at the headline: romosozumab beat alendronate. That result is real, and it earned romosozumab (Evenity) its FDA approval in April 2019. But the trial design contains several layers of complexity that matter for clinical decision-making. This page catalogs the limitations that the authors themselves acknowledged, that peer reviewers flagged, and that subsequent literature has continued to debate.

Enrollment Biases and Population Narrowing

ARCH enrolled postmenopausal women aged 55 to 90 with osteoporosis and prior fragility fracture. The requirement for a T-score of −2.5 or below at the total hip or femoral neck, combined with at least one moderate or severe vertebral fracture or at least two mild vertebral fractures, selected for a high-risk cohort. This is clinically appropriate for demonstrating fracture reduction in a reasonable timeframe. It is not, however, representative of the broader population prescribed osteoporosis therapy.

Key exclusions that limit generalizability:

  • Men were entirely excluded. Male osteoporosis accounts for roughly 20% of hip fractures. The Endocrine Society guidelines recognize this gap, and ARCH does nothing to fill it.
  • Patients with metabolic bone disease other than osteoporosis were excluded, removing those with Paget's disease, hyperparathyroidism, or renal osteodystrophy.
  • Prior use of certain osteoporosis drugs was exclusionary. Women who had received bisphosphonates within 12 months, or denosumab, teriparatide, or strontium ranelate at any time, were ineligible. This rules out the very patients most likely to be considered for romosozumab in practice: those who failed first-line therapy.
  • Geographic concentration. Enrollment sites spanned 41 countries, but the primary publication does not provide a breakdown of enrollment by region. The fracture epidemiology, vitamin D status, and calcium intake norms vary substantially across these populations, and the trial was not powered to detect regional heterogeneity.

The Cardiovascular Safety Signal

This is the single most consequential limitation. During the 12-month double-blind period, adjudicated serious cardiovascular adverse events occurred in 50 patients in the romosozumab group versus 38 in the alendronate group (2.5% vs 1.9%). The difference was not statistically significant, but the numerical imbalance was large enough that the FDA label carries a boxed warning advising against use in patients who have had a myocardial infarction or stroke within the preceding year.
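To see why an imbalance of 50 versus 38 events does not reach statistical significance, a crude risk ratio with a 95% confidence interval can be computed using the standard log-normal approximation. This is a sketch under stated assumptions: the arm sizes `N_ROMO` and `N_ALN` are not given on this page and are assumed values chosen to be consistent with the quoted 2.5% and 1.9%, and the function name is illustrative.

```python
import math


def risk_ratio_ci(events_a: int, n_a: int, events_b: int, n_b: int, z: float = 1.96):
    """Crude risk ratio (a vs b) with a Wald CI on the log scale, plus risk difference."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    rr = risk_a / risk_b
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper, risk_a - risk_b


# ASSUMPTION: approximate safety-population sizes, not stated in this page.
N_ROMO = 2040
N_ALN = 2014

rr, lower, upper, risk_diff = risk_ratio_ci(50, N_ROMO, 38, N_ALN)
print(f"Risk ratio: {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"Risk difference: {risk_diff:.2%}")
```

Under these assumed denominators the interval spans 1, which matches the non-significant result, but its upper bound is also compatible with a clinically important increase. That width is the core reason the signal could not be dismissed.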

HealthRX Cardiovascular Risk Appraisal Framework for ARCH

We propose evaluating the CV signal across five dimensions, because the raw event counts alone do not tell the full story:

| Dimension | What ARCH Shows | What Remains Unknown |
|---|---|---|
| Absolute event rate | 2.5% vs 1.9% over 12 months (delta 0.6%) | Whether this delta persists, widens, or narrows beyond 12 months of exposure |
| Mechanistic plausibility | Sclerostin inhibition may affect vascular calcification pathways | No dedicated vascular imaging sub-study was performed |
| Comparator effect | Alendronate may be mildly cardioprotective, inflating the relative difference | Head-to-head vs placebo (FRAME trial) showed no clear CV signal |
| Baseline CV risk | Patients with recent MI/stroke were not excluded at enrollment | Post-hoc subgroup analyses by baseline CV risk have not been published in full |
| Adjudication rigor | Independent committee adjudicated events | Adjudication criteria were not published in the supplement |

This framework matters because the raw numbers can be read two ways. If alendronate is mildly protective (some observational data suggest this), the gap may reflect alendronate benefit rather than romosozumab harm. The parallel FRAME trial compared romosozumab to placebo and did not show the same cardiovascular imbalance, which supports the comparator-effect hypothesis. Neither interpretation is proven.

Follow-Up Duration and Design Transitions

The trial had two distinct phases: a 12-month double-blind period (romosozumab vs alendronate), followed by an open-label period where both groups received alendronate. The primary endpoint, new vertebral fracture at 24 months, spans both phases. This creates an interpretive challenge.

At 24 months, the romosozumab-to-alendronate sequence reduced vertebral fractures by 48% compared with alendronate alone. But the open-label transition means:

  • Blinding was lost after month 12. Patient behavior, reporting, and even physician attention could have shifted.
  • The "romosozumab effect" at 24 months is actually a "romosozumab followed by alendronate" effect. Clinicians cannot isolate the durability of romosozumab's bone-forming window from the contribution of subsequent antiresorptive consolidation.
  • The median follow-up of 33 months is short for a chronic disease. Osteoporosis treatment decisions typically span 5 to 10 years. The American Association of Clinical Endocrinology guidelines recommend reassessment after 5 years of bisphosphonate therapy, a timeframe ARCH cannot inform.

Nonvertebral fractures showed a significant reduction at the final assessment (cumulative incidence 8.7% vs 10.6%), but this endpoint was event-driven and analyzed as time-to-event rather than measured at a fixed time point. The hip fracture reduction was numerically favorable, but the trial was not independently powered for hip fracture alone.

Statistical Considerations

The primary publication reports several statistical features that merit scrutiny:

Hierarchical testing. The analysis used a pre-specified hierarchical gatekeeping strategy: first test vertebral fractures at 24 months, then clinical fractures, then nonvertebral fractures, then hip fractures. Hip fracture significance depended on all prior endpoints reaching significance. This is methodologically sound, but it means the hip fracture result (2.0% vs 3.2% in the primary publication, nominal P = 0.02) should be read in the context of that hierarchy and not cited as an independently proven benefit.
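The logic of a fixed-sequence gatekeeping procedure can be sketched in a few lines. This is a generic illustration, not the trial's actual statistical analysis plan: the function name is illustrative, and the p-values are hypothetical placeholders chosen to show how the gate behaves, not ARCH results.

```python
from typing import List, Tuple


def fixed_sequence_test(
    ordered_endpoints: List[Tuple[str, float]], alpha: float = 0.05
) -> List[Tuple[str, float, str]]:
    """Test endpoints in a pre-specified order; stop formal testing at the first failure."""
    results = []
    gate_open = True
    for name, p_value in ordered_endpoints:
        if not gate_open:
            # An earlier endpoint failed, so this p-value is descriptive only.
            results.append((name, p_value, "not formally tested"))
        elif p_value < alpha:
            results.append((name, p_value, "significant"))
        else:
            results.append((name, p_value, "not significant"))
            gate_open = False
    return results


# HYPOTHETICAL p-values for illustration only; they are not the trial's results.
hypothetical = [
    ("vertebral fracture at 24 months", 0.001),
    ("clinical fracture", 0.001),
    ("nonvertebral fracture", 0.08),
    ("hip fracture", 0.03),
]

results = fixed_sequence_test(hypothetical)
for name, p_value, verdict in results:
    print(f"{name}: p = {p_value} -> {verdict}")
```

In this hypothetical run the last endpoint has a p-value below 0.05 yet cannot be declared significant, because the gate closed one step earlier. The same reasoning is why endpoints late in a hierarchy need cautious interpretation.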

Missing data handling. Vertebral fracture assessment required lateral spine radiographs at baseline, 12 months, and 24 months. Patients without evaluable radiographs at 24 months were excluded from the primary vertebral fracture analysis. The publication reports that 80.8% of randomized patients had evaluable radiographs at 24 months. The remaining 19.2% represent a meaningful data gap. If fracture risk differed among dropouts (and dropout rates may correlate with frailty, adverse events, or intercurrent illness), the primary result could be biased in either direction.
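One way to gauge how much the missing radiographs could matter is a simple tipping-point calculation: how high would the fracture rate among unobserved romosozumab-arm patients need to be to erase the observed difference? The sketch below assumes, for illustration only, that the 80.8% evaluable fraction applies equally to both arms and that unobserved alendronate-arm patients fractured at the observed alendronate rate. Neither assumption comes from the publication, and the function name is illustrative.

```python
from typing import Optional


def tipping_point_rate(
    obs_treated: float,
    obs_control: float,
    evaluable_fraction: float,
    missing_control_rate: Optional[float] = None,
) -> float:
    """Fracture rate among missing treated patients that would equalize the two arms."""
    if missing_control_rate is None:
        # ASSUMPTION: missing control patients behave like observed control patients.
        missing_control_rate = obs_control
    missing_fraction = 1.0 - evaluable_fraction
    control_overall = (
        evaluable_fraction * obs_control + missing_fraction * missing_control_rate
    )
    # Solve: evaluable * obs_treated + missing * x = control_overall
    return (control_overall - evaluable_fraction * obs_treated) / missing_fraction


tip = tipping_point_rate(obs_treated=0.062, obs_control=0.119, evaluable_fraction=0.808)
print(f"Tipping-point fracture rate among missing treated patients: {tip:.1%}")
```

Under these assumptions, the unobserved romosozumab-arm patients would need a fracture rate several times the observed 6.2% to nullify the result. Bias from missing data is therefore plausible in magnitude, but unlikely on its own to reverse the direction of the primary finding.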

No placebo arm. ARCH was an active-comparator trial. This was ethically appropriate, since withholding treatment from high-risk women would be unjustifiable. But it means the absolute fracture reduction attributable to romosozumab alone cannot be calculated from ARCH. The FRAME trial provides placebo-controlled data, but in a different (lower-risk) population.

Conflict of Interest and Funding

ARCH was funded by UCB Pharma and Amgen, the co-developers of romosozumab. Several authors were employees of or held stock in these companies. The statistical analysis was performed by Amgen. The independent data monitoring committee included academic members, but the sponsor controlled site selection, data collection infrastructure, and the final manuscript.

This does not invalidate the results. Industry-funded trials are the norm for large-scale fracture endpoint studies because the costs (tens of millions of dollars for multi-year, multi-country RCTs with radiographic endpoints) exceed academic funding capacity. But it does mean:

  • The trial was designed to maximize the probability of showing romosozumab superiority. The choice of alendronate rather than a more potent comparator (zoledronic acid, denosumab) may reflect strategic positioning.
  • Post-hoc analyses and sub-study publications are subject to sponsor approval timelines.
  • Independent replication of the primary finding does not exist and is unlikely to be funded.

What Post-Publication Commentary Surfaced

After the 2017 publication, several themes emerged in editorials and correspondence:

  1. The cardiovascular question dominated. An accompanying editorial in the NEJM by Khosla specifically highlighted the CV imbalance and called for dedicated mechanistic studies.
  2. Sequencing uncertainty. Clinicians questioned whether romosozumab should precede or follow bisphosphonate therapy. ARCH tested romosozumab-first, but real-world patients often arrive having already taken alendronate for years.
  3. Cost-effectiveness concerns. At a US list price exceeding $20,000 per year for romosozumab versus generic alendronate at under $200 per year, the incremental cost-effectiveness ratio became a barrier to guideline adoption. The AACE/ACE 2020 guidelines recommend romosozumab for very high-risk patients, effectively rationing by severity.
  4. Duration of anabolic window. Bone formation markers peak early in the treatment course and decline toward baseline by month 12. Whether a longer romosozumab course (18 or 24 months) would add benefit or risk is untested.
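The cost concern in item 3 can be made concrete with a back-of-envelope figure: incremental drug cost per vertebral fracture avoided. This is not a formal incremental cost-effectiveness ratio, which would use quality-adjusted life years and downstream costs. The prices are the rough thresholds quoted above, treated here as point values, and only the first year differs between arms because both arms take alendronate afterwards.

```python
def cost_per_event_avoided(incremental_cost: float, absolute_risk_reduction: float) -> float:
    """Incremental cost divided by absolute risk reduction (cost per event avoided)."""
    return incremental_cost / absolute_risk_reduction


# ASSUMPTION: list-price thresholds from the text used as point estimates (USD per year).
ROMO_ANNUAL_COST = 20_000
ALN_ANNUAL_COST = 200

# Only year 1 differs between arms; year 2 is alendronate in both.
incremental = ROMO_ANNUAL_COST - ALN_ANNUAL_COST

# Crude 24-month absolute reduction in new vertebral fractures (11.9% vs 6.2%).
arr = 0.119 - 0.062

cost = cost_per_event_avoided(incremental, arr)
print(f"Approximate drug cost per vertebral fracture avoided: ${cost:,.0f}")
```

Because the romosozumab price is described as exceeding $20,000, this crude figure is better read as a floor than as an estimate. It also ignores the savings from fractures prevented.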

What ARCH Does and Does Not Prove

It proves: In postmenopausal women with severe osteoporosis and prior fracture, 12 months of romosozumab followed by alendronate reduces vertebral and clinical fractures more effectively than alendronate alone over approximately two years.

It does not prove: That romosozumab is safe in patients with cardiovascular risk factors. That the benefit persists beyond 33 months. That romosozumab is superior to other sequencing strategies (e.g., teriparatide followed by denosumab). That results generalize to men, premenopausal women, or patients with prior biologic osteoporosis therapy.

References

  • Saag KG, Petersen J, Brandi ML, et al. Romosozumab or alendronate for fracture prevention in women with osteoporosis. N Engl J Med. 2017;377(15):1417-1427.
  • Cosman F, Crittenden DB, Adachi JD, et al. Romosozumab treatment in postmenopausal women with osteoporosis (FRAME). N Engl J Med. 2016;375(16):1532-1543.
  • Evenity (romosozumab-aqqg) prescribing information. Amgen/UCB. 2019.
  • Camacho PM, Petak SM, Binkley N, et al. American Association of Clinical Endocrinologists/American College of Endocrinology clinical practice guidelines for the diagnosis and treatment of postmenopausal osteoporosis, 2020 update. Endocr Pract. 2020;26(Suppl 1):1-46.
  • Khosla S. Romosozumab: forward or backward? [Editorial]. N Engl J Med. 2017;377(15):1480-1481.
  • Watts NB, Adler RA, Bilezikian JP, et al. Osteoporosis in men: an Endocrine Society clinical practice guideline. J Clin Endocrinol Metab. 2012;97(6):1802-1822.