Honest Criticisms and Limitations of the KEEPS Trial

At a glance
- N: 727 women randomized across 9 US centers
- Intervention: Oral conjugated equine estrogens (o-CEE 0.45 mg/day) or transdermal 17β-estradiol (t-E2 50 µg/day), both with cyclic oral micronized progesterone (200 mg × 12 days/month)
- Comparator: Matching placebo pills and patches
- Duration: 48 months of treatment
- Primary endpoint: Rate of change in carotid artery intima-media thickness (CIMT)
- Key result: Neither active treatment slowed CIMT progression compared with placebo; secondary findings included improvements in mood, anxiety, and vasomotor symptoms
Why KEEPS Mattered, and Why Its Limits Matter More
After the Women's Health Initiative (WHI) reported increased cardiovascular events with hormone replacement therapy (HRT) in older postmenopausal women, the field needed data on younger, recently menopausal women. The Kronos Early Estrogen Prevention Study (KEEPS) was designed to test the timing hypothesis: that estrogen started within a few years of menopause might protect the vasculature rather than harm it. The trial did not confirm vascular benefit on imaging, but it also found no harm, which many interpreted as supportive of early initiation.
That interpretation, though, rests on a trial with structural weaknesses worth examining carefully.
Enrollment Bias: A Healthier-Than-Average Cohort
KEEPS recruited women aged 42 to 58 who were within 6 to 36 months of their final menstrual period. Exclusion criteria removed anyone with a body mass index above 35, current smoking exceeding 10 cigarettes per day, uncontrolled hypertension, LDL cholesterol above 190 mg/dL, diabetes, or known cardiovascular disease.
The result was a cohort with a mean age of approximately 52 years, mean BMI of roughly 26, and a baseline 10-year Framingham risk score well under 5%. These women were, in effect, the healthiest subset of menopausal patients. Clinical populations that endocrinologists and gynecologists actually treat frequently include women with metabolic syndrome, obesity, or borderline cardiovascular risk factors, none of whom would have qualified for KEEPS.
This matters because the timing hypothesis is most relevant precisely in women who carry some baseline vascular risk. Testing HRT in a population where atherosclerosis progression is already very slow makes it difficult to detect a protective signal, even if one exists. It also means the safety reassurance from KEEPS cannot be extended confidently to women with higher baseline risk.
Four Years Is Not Long Enough
Atherosclerosis is a slow process. The mean annual CIMT progression rate in the placebo arm of KEEPS was approximately 0.007 mm/year, a clinically small increment that makes detecting treatment differences over 48 months statistically challenging. For context, the much larger MESA cohort study followed participants for a median of roughly 10 years to establish reliable CIMT trajectory data.
A useful framework for evaluating trial duration against disease tempo:
| Factor | KEEPS | What Adequate Power Would Require |
|---|---|---|
| Annual CIMT change (placebo) | ~0.007 mm/yr | N/A |
| Total observation window | 4 years | 7-10 years per MESA/ARIC data |
| Cumulative expected Δ CIMT (placebo) | ~0.028 mm | <0.05 mm detectable difference needed |
| Hard CV events captured | 0 adjudicated MIs | Thousands of patient-years for event-driven design |
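The arithmetic behind the table is simple enough to sketch. The Python snippet below multiplies the approximate placebo-arm rate quoted above (0.007 mm/year) by the length of follow-up, then asks how large the between-arm difference could be if treatment slowed progression by a given fraction. The 50% slowing used in the example is a hypothetical assumption, not a KEEPS result, and the function names are illustrative.

```python
# Illustrative arithmetic only: 0.007 mm/yr is the approximate KEEPS placebo-arm
# rate quoted in the text; the 50% slowing below is a hypothetical assumption.
PLACEBO_RATE_MM_PER_YR = 0.007


def cumulative_progression(rate_mm_per_yr: float, years: float) -> float:
    """Total CIMT increase (mm) assuming a constant annual rate."""
    return rate_mm_per_yr * years


def max_between_arm_difference(
    rate_mm_per_yr: float, years: float, relative_slowing: float
) -> float:
    """Difference (mm) between arms if treatment slowed progression by a fraction.

    relative_slowing=0.5 means the treated arm progresses at half the placebo rate.
    """
    return cumulative_progression(rate_mm_per_yr, years) * relative_slowing


if __name__ == "__main__":
    for years in (4, 7, 10):
        total = cumulative_progression(PLACEBO_RATE_MM_PER_YR, years)
        diff = max_between_arm_difference(PLACEBO_RATE_MM_PER_YR, years, 0.5)
        print(f"{years:>2} years: placebo change ~ {total:.3f} mm, "
              f"difference at 50% slowing ~ {diff:.4f} mm")
```

Even under a generous 50% slowing, the four-year difference is about 0.014 mm, which is the scale any CIMT measurement would have had to resolve.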
The trial was not designed or powered to detect hard cardiovascular outcomes such as myocardial infarction or stroke. With only 727 participants over 4 years, the expected event count in this low-risk cohort was near zero. KEEPS can say that early HRT did not cause obvious imaging-level harm in healthy women over a short window. It cannot say whether early HRT prevents heart attacks over a decade.
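A back-of-envelope Poisson sketch shows why so few events could be expected. It assumes, hypothetically, a 10-year cardiovascular risk of 2% (chosen to sit below the "well under 5%" Framingham figure), converts it to a constant annual hazard, and ignores dropout, competing risks, and any treatment effect. The numbers are illustrative, not estimates from the KEEPS data.

```python
import math

# N and follow-up come from the text; the 2% ten-year risk is a hypothetical
# assumption, not a value reported by KEEPS.
N_PARTICIPANTS = 727
YEARS = 4
ASSUMED_10YR_RISK = 0.02
N_ARMS = 3  # o-CEE, t-E2, placebo


def annual_hazard_from_10yr_risk(risk_10yr: float) -> float:
    """Constant annual hazard implied by a cumulative 10-year risk."""
    return -math.log(1.0 - risk_10yr) / 10.0


def expected_events(n: int, years: float, hazard: float) -> float:
    """Expected event count = person-years x hazard (small-hazard approximation)."""
    return n * years * hazard


def prob_zero_events(expected: float) -> float:
    """Poisson probability of observing no events when `expected` are expected."""
    return math.exp(-expected)


if __name__ == "__main__":
    h = annual_hazard_from_10yr_risk(ASSUMED_10YR_RISK)
    total = expected_events(N_PARTICIPANTS, YEARS, h)
    per_arm = total / N_ARMS
    print(f"expected events, all arms: {total:.1f}")
    print(f"expected events per arm:   {per_arm:.1f}")
    print(f"P(zero events in one arm): {prob_zero_events(per_arm):.2f}")
```

Under these assumptions the whole trial would expect about six events, roughly two per arm, so a single chance event could swing any between-arm comparison.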
The ELITE trial, published in 2016 with a 5-year follow-up, partially addressed this gap by showing CIMT benefit in early (but not late) postmenopausal women, lending some support to the timing hypothesis that KEEPS could not confirm.
Statistical Caveats
Underpowered for Its Own Primary Endpoint
The original power calculation assumed a CIMT progression rate difference of approximately 0.012 mm/year between treatment and placebo. The observed difference was far smaller. Post hoc analyses showed that the trial would have needed substantially more participants, or a longer follow-up period, to detect the effect sizes actually observed. A null result in an underpowered trial is not evidence of no effect; it is an inconclusive result.
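The standard normal-approximation formula for comparing two means makes the point concrete: the required sample size per arm scales with the inverse square of the difference to be detected. The sketch below applies it to per-participant annual CIMT slopes. The 0.012 mm/year difference is the design assumption quoted above; the between-participant SD of the slope (0.03 mm/year) and the smaller difference (0.003 mm/year) are hypothetical placeholders, not KEEPS values. The calculation also ignores the three-arm design and the repeated-measures models a real trial analysis would use.

```python
import math
from statistics import NormalDist


def n_per_arm(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants per arm to detect a mean difference `delta` (two-sided test).

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 * sigma**2 / delta**2
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2)


if __name__ == "__main__":
    SIGMA = 0.03           # hypothetical SD of individual annual CIMT slopes (mm/yr)
    DESIGN_DELTA = 0.012   # design assumption quoted in the text (mm/yr)
    SMALLER_DELTA = 0.003  # hypothetical smaller true difference (mm/yr)
    print("per arm, design delta: ", n_per_arm(DESIGN_DELTA, SIGMA))
    print("per arm, smaller delta:", n_per_arm(SMALLER_DELTA, SIGMA))
```

Cutting the detectable difference to a quarter multiplies the required enrollment roughly sixteenfold; with these placeholder inputs the answer moves from about 100 to about 1,600 participants per arm.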
Multiple Comparisons Without Formal Adjustment
KEEPS reported on numerous secondary endpoints: coronary artery calcium scores, lipid panels, insulin sensitivity, mood, sexual function, and vasomotor symptoms. The statistically significant findings on mood and hot flashes were secondary outcomes. The published results did not apply formal correction for multiple comparisons across these endpoints. In a trial with a null primary result, positive secondary findings require cautious interpretation, as they may represent chance associations inflated by the number of tests performed.
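Two quick calculations show why unadjusted secondary results deserve caution. The first gives the chance of at least one false positive across k independent tests at α = 0.05; the six endpoint families listed above are treated as independent, which they are not, so this is only an illustration. The second applies the Holm step-down correction to a set of p-values invented for the example; they are not KEEPS results.

```python
def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) across k independent tests, all nulls true."""
    return 1 - (1 - alpha) ** k


def holm_adjust(pvals: list[float]) -> list[float]:
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, candidate)  # keep adjusted p-values monotone
        adjusted[i] = running_max
    return adjusted


if __name__ == "__main__":
    print(f"FWER for 6 tests: {familywise_error_rate(6):.2f}")
    # Hypothetical p-values for six secondary endpoints (illustration only).
    raw = [0.03, 0.04, 0.20, 0.45, 0.60, 0.01]
    for p, adj in zip(raw, holm_adjust(raw)):
        print(f"raw p = {p:.2f} -> Holm-adjusted p = {adj:.2f}")
```

With six tests, the chance of at least one spurious "significant" result is about 26%, and in this invented example none of the three raw p-values below 0.05 survives adjustment.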
Surrogate Endpoint Controversy
CIMT is a surrogate marker. Its correlation with cardiovascular events, while supported by epidemiologic data, has been questioned by meta-analyses showing that changes in CIMT do not reliably predict changes in event rates at the individual level. The 2012 Cochrane review of HRT and cardiovascular disease emphasized that surrogate imaging endpoints should not substitute for event-driven data when making prescribing recommendations.
Generalizability Gaps
Race and Ethnicity
The KEEPS cohort was approximately 94% non-Hispanic White. This is a significant limitation given known differences in CIMT progression rates, HRT metabolism, cardiovascular risk profiles, and menopausal symptom burden across racial and ethnic groups. Black women, who experience earlier menopause onset and higher cardiovascular mortality, were severely underrepresented.
Progestogen Choice
All participants received oral micronized progesterone (Prometrium). Many real-world prescriptions use medroxyprogesterone acetate (MPA), norethindrone, or other synthetic progestins. Because the cardiovascular signal in the WHI is thought to have been driven partly by the MPA component, the safety data from KEEPS cannot be generalized to regimens using different progestogens. The FDA label for Premarin carries a class-wide boxed warning that does not distinguish between progestogen types.
Dose and Route
The oral CEE dose used in KEEPS (0.45 mg/day) was lower than the WHI dose (0.625 mg/day). The transdermal estradiol dose (50 µg/day) is standard but not the only dose used clinically. Whether the findings apply to higher doses remains unknown. The trial did not include estrogen-only arms without progestogen, which limits conclusions for hysterectomized women.
What Post-Publication Commentary Raised
Several letters to the editor and invited commentaries published after the 2014 primary results highlighted additional concerns:
Adherence and dropout. Approximately 25% of randomized participants discontinued study medication before 48 months. The primary analysis used intention-to-treat methodology, which is appropriate for preserving randomization but dilutes the treatment signal when a quarter of participants stop taking the intervention. Per-protocol analyses were reported but carry their own selection biases.
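A common simplifying model shows how much a 25% discontinuation rate can cost. If participants who stop treatment revert to a placebo-like trajectory, and no one in the placebo arm starts active therapy, the intention-to-treat effect shrinks to (1 - d) times the on-treatment effect, where d is the discontinuation fraction, and the sample size needed to preserve power grows by 1/(1 - d)². With a crossover fraction c in the placebo arm, the factor becomes (1 - d - c). These are modeling assumptions, not features of the KEEPS analysis, and the 0.004 mm/year effect below is hypothetical.

```python
def itt_effect(on_treatment_effect: float, discontinued: float, crossover: float = 0.0) -> float:
    """Diluted ITT effect when non-adherers behave like the other arm.

    discontinued: fraction of the active arm that stopped treatment.
    crossover: fraction of the placebo arm that started active treatment.
    """
    return on_treatment_effect * (1 - discontinued - crossover)


def sample_size_inflation(discontinued: float, crossover: float = 0.0) -> float:
    """Multiplier on the no-dilution sample size needed to keep the same power."""
    return 1 / (1 - discontinued - crossover) ** 2


if __name__ == "__main__":
    TRUE_EFFECT = 0.004  # hypothetical on-treatment slowing of CIMT (mm/yr)
    print(f"ITT effect at 25% discontinuation: {itt_effect(TRUE_EFFECT, 0.25):.4f} mm/yr")
    print(f"sample size multiplier: {sample_size_inflation(0.25):.2f}x")
```

Under these assumptions, a quarter of participants stopping treatment erases a quarter of the effect and requires roughly 1.8 times as many participants to recover the original power.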
Coronary artery calcium sub-study. The CAC score results, published separately, showed a statistically significant reduction with oral CEE compared with placebo. Some commentators argued this was the more clinically relevant imaging endpoint. Others noted that CAC was a pre-specified secondary endpoint in an otherwise null trial and should be treated as hypothesis-generating rather than confirmatory.
Conflict-of-interest considerations. KEEPS was funded by the Aurora Foundation, with additional support from pharmaceutical manufacturers including Pfizer (which markets Premarin) and Abbott (which marketed the estradiol patch at the time of the trial). While industry funding does not invalidate findings, it warrants disclosure scrutiny. The investigators also received speaking fees from hormone therapy manufacturers, which multiple commentators noted.
Comparison to WHI reanalyses. Some post-publication analyses pointed out that the WHI's own age-stratified reanalyses (women aged 50 to 59) showed trends consistent with the timing hypothesis without requiring a separate trial. Critics asked whether KEEPS, given its small size and short duration, added meaningful data beyond what the WHI subgroup analyses already provided.
What KEEPS Can and Cannot Support
The trial supports these conclusions with reasonable confidence: low-dose HRT started in early menopause does not accelerate subclinical atherosclerosis over 4 years in healthy women, and it improves vasomotor symptoms and mood.
The trial cannot support these claims: that early HRT prevents cardiovascular disease, that its safety profile applies to women with metabolic risk factors, that its findings extend beyond the specific formulations tested, or that 4 years of imaging data translate to long-term clinical outcomes. The 2022 NAMS position statement on HRT acknowledges the timing hypothesis as biologically plausible but notes that definitive event-driven trial data in early menopausal women remain lacking.
Clinicians citing KEEPS to reassure patients about cardiovascular safety should be transparent about these boundaries. The trial answered a narrow question in a narrow population. That answer was reassuring, but it was also incomplete.
References
- Harman SM, Black DM, Naftolin F, et al. Arterial imaging outcomes and cardiovascular risk factors in recently menopausal women: a randomized trial. Ann Intern Med. 2014;161(4):249-260.
- Hodis HN, Mack WJ, Henderson VW, et al. Vascular effects of early versus late postmenopausal treatment with estradiol (ELITE). N Engl J Med. 2016;374(13):1221-1231.
- The 2022 Hormone Therapy Position Statement of The North American Menopause Society. Menopause. 2022;29(7):767-794.
- FDA. Premarin (conjugated estrogens) prescribing information.
- Polak JF, Pencina MJ, Pencina KM, et al. Carotid-wall intima-media thickness and cardiovascular events. N Engl J Med. 2011;365(3):213-221.