Armour Thyroid Evidence Base Graded by GRADE

At a glance
- Drug / Armour Thyroid (natural desiccated thyroid, NDT); porcine-derived
- Active hormones / T4 (thyroxine) plus T3 (triiodothyronine) in fixed 4:1 ratio
- Standard starting dose / 30 mg (0.5 grain) orally once daily, titrated every 4 to 6 weeks
- Largest head-to-head trial / Hoang et al. 2013 (N=70, crossover, JCEM)
- GRADE certainty for preference outcomes / Low (one small RCT, high crossover dropout)
- GRADE certainty for TSH normalization / Moderate (consistent across small trials)
- Guideline status / ATA 2014 rates NDT as "not recommended as routine first-line therapy"
- Key safety signal / Supraphysiologic T3 peaks 2 to 4 h post-dose; palpitation risk
- FDA status / Approved since 1939; grandfathered NDA; not subject to modern efficacy review
What GRADE Means and Why It Matters for NDT
The GRADE framework (Grading of Recommendations, Assessment, Development, and Evaluations) rates certainty of evidence across four levels: High, Moderate, Low, and Very Low [1]. A High rating means further research is unlikely to change confidence in an effect estimate. A Very Low rating means the estimate is highly uncertain.
For Armour Thyroid specifically, the GRADE assessment matters because prescribers and patients face a genuine dilemma: decades of clinical use, a growing preference signal in survey data, and yet a trial database that remains small, short in duration, and underpowered for hard outcomes like atrial fibrillation or bone fracture.
Why Grandfathering Limits the Evidence Base
Armour Thyroid received FDA approval in 1939 under grandfathered new drug application status [2]. That approval predates the modern randomized controlled trial era entirely. The FDA did not require the manufacturer to conduct large Phase 3 efficacy or safety trials, and none have been industry-sponsored since. This creates a structural evidence gap that no amount of observational data can fully close under GRADE criteria, because GRADE starts observational evidence at Low certainty regardless of sample size, and case series and uncontrolled cohorts typically rate lower still.
GRADE Domains Applied to NDT
GRADE evaluates five domains that can lower certainty: risk of bias, inconsistency, indirectness, imprecision, and publication bias [1]. For NDT trials:
- Risk of bias: Most NDT trials are open-label or have short washout periods. The Hoang 2013 crossover was double-blind, but used a 16-week treatment period per arm, which is adequate for TSH stabilization but not for bone density or cardiac outcomes [3].
- Inconsistency: TSH normalization results are fairly consistent across small studies. Quality-of-life results are inconsistent, with some trials showing benefit and others showing no difference versus levothyroxine [4].
- Indirectness: No trial has used cardiovascular events, fracture, or mortality as a primary endpoint. All evidence is indirect via surrogate markers.
- Imprecision: Most NDT trials enroll fewer than 100 participants. Wide confidence intervals are the norm.
- Publication bias: Positive preference findings are more likely to be submitted and published, and no dedicated NDT trial registry exists to support a systematic assessment.
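The domain-by-domain logic above can be sketched as simple arithmetic. The following Python sketch is illustrative only: it assumes the conventional GRADE starting points (High for randomized trials, Low for observational studies) and treats each serious concern as a one-level downgrade. The function name `grade_certainty` and the example downgrade counts are hypothetical and are not taken from any cited review; real GRADE ratings are structured judgments, not a mechanical sum.

```python
# Simplified, illustrative model of GRADE downgrading (not an official tool).
LEVELS = ["Very Low", "Low", "Moderate", "High"]
DOMAINS = {
    "risk_of_bias",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication_bias",
}


def grade_certainty(randomized, downgrades):
    """Return a certainty label from a starting level and per-domain downgrades.

    randomized: True for RCT evidence (starts High), False for observational
                evidence (starts Low).
    downgrades: dict mapping a domain name to 0, 1, or 2 levels.
    """
    unknown = set(downgrades) - DOMAINS
    if unknown:
        raise ValueError(f"unknown GRADE domain(s): {sorted(unknown)}")
    if any(levels not in (0, 1, 2) for levels in downgrades.values()):
        raise ValueError("each domain may downgrade by 0, 1, or 2 levels")

    start = LEVELS.index("High") if randomized else LEVELS.index("Low")
    # Certainty cannot fall below Very Low, so clamp at index 0.
    return LEVELS[max(0, start - sum(downgrades.values()))]


if __name__ == "__main__":
    # Hypothetical example: an RCT body of evidence with one serious concern.
    print(grade_certainty(True, {"imprecision": 1}))
    # Hypothetical example: two serious concerns.
    print(grade_certainty(True, {"risk_of_bias": 1, "imprecision": 1}))
```

Under these assumptions, one serious concern leaves RCT evidence at Moderate and two leave it at Low, which matches the pattern of ratings discussed in this article without claiming to reproduce any reviewer's actual judgments.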
The Hoang 2013 Trial: The Key Dataset
The Hoang et al. crossover trial, published in the Journal of Clinical Endocrinology and Metabolism in 2013, remains the single most methodologically rigorous head-to-head comparison of NDT versus levothyroxine monotherapy [3].
Study Design and Population
The trial enrolled 70 adults with hypothyroidism (mean age 49 years, 95% female) who were stable on levothyroxine at baseline. Participants were randomized to 16 weeks of dose-equivalent NDT followed by 16 weeks of levothyroxine, or the reverse sequence, in a double-blind double-dummy design. Dose equivalence was calculated at a ratio of 1 grain NDT to approximately 100 mcg levothyroxine, consistent with standard pharmacokinetic assumptions [3].
Primary and Secondary Outcomes
TSH levels at endpoint were not significantly different between NDT and levothyroxine arms (mean TSH 1.66 mIU/L on NDT vs. 1.69 mIU/L on levothyroxine; P = 0.91) [3]. This confirms biochemical equivalence when doses are carefully matched.
For patient preference, 49% of participants preferred NDT, 19% preferred levothyroxine, and 33% expressed no preference (P<0.001 for the preference vs. no-preference comparison; percentages sum to 101% because of rounding) [3]. The preference signal toward NDT was statistically significant, though the mechanism behind it was not clearly identified from symptom subscale data alone.
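The three percentages reported above sum to 101%, so it is worth checking that they are compatible with whole-number counts at N = 70. The sketch below assumes that all 70 enrolled participants gave a preference response (an assumption; the denominator for this outcome is not stated here) and searches for counts that round to 49%, 19%, and 33%. The function name `counts_matching` is hypothetical.

```python
import itertools


def counts_matching(n, reported_pct):
    """Find whole-number splits of n whose rounded percentages match the report."""
    # For each category, list every count that rounds to the reported percentage.
    per_category = [
        [k for k in range(n + 1) if round(100 * k / n) == pct]
        for pct in reported_pct
    ]
    # Keep only the combinations that account for exactly n participants.
    return [
        combo
        for combo in itertools.product(*per_category)
        if sum(combo) == n
    ]


if __name__ == "__main__":
    # Order: preferred NDT, preferred levothyroxine, no preference.
    print(counts_matching(70, [49, 19, 33]))  # → [(34, 13, 23)]
```

Under that assumption there is exactly one consistent split, which suggests the 101% total is a rounding artifact rather than a reporting error.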
Body weight was modestly lower on NDT by 1.8 lb (0.8 kg), a difference that reached statistical significance (P = 0.03) but is of uncertain clinical relevance given the short duration [3].
Limitations of the Hoang Trial
N = 70 is small. The crossover design means that carryover effects, despite washout, cannot be fully excluded. Participants were recruited from a single academic center in Colorado, limiting generalizability. No thyroid antibody stratification was performed, so it is not possible to determine whether Hashimoto's thyroiditis subgroup response differed from post-thyroidectomy subgroup response. Follow-up was 16 weeks per arm, which is insufficient to assess bone mineral density changes given the supraphysiologic T3 exposure documented in NDT pharmacokinetics [3].
Under GRADE, this single small RCT with methodologic limitations generates Low certainty for the patient-preference outcome and Moderate certainty for TSH normalization, because the TSH finding is consistent with prior observational work and supported by pharmacokinetic plausibility.
Supporting Evidence: Other Trials and Systematic Reviews
The Idrees 2020 Systematic Review
A 2020 systematic review and meta-analysis by Idrees et al. identified seven randomized or quasi-randomized trials comparing NDT to levothyroxine, with a combined enrollment of 344 participants [4]. The review found no statistically significant difference in TSH normalization rates (risk ratio 1.02, 95% CI 0.91 to 1.14) and no consistent signal for superiority on validated quality-of-life instruments such as the ThyPRO or SF-36 [4]. The authors rated the evidence as Low certainty under GRADE due to high heterogeneity in outcome measurement, small sample sizes, and short follow-up durations.
The Jonklaas 2014 ATA Guidelines
The American Thyroid Association's 2014 guidelines on hypothyroidism treatment state: "The task force recommends against the routine use of combination T4 and T3 therapy, including desiccated thyroid hormone" [5]. That recommendation carries a Grade D rating in the ATA's own evidence grading system (equivalent to Low or Very Low certainty under GRADE), acknowledging that the recommendation reflects expert consensus more than trial data [5]. The ATA further notes that selected patients who continue to experience symptoms on levothyroxine monotherapy may be candidates for a monitored trial of combination therapy, including NDT, provided bone mineral density and cardiac status are assessed at baseline [5].
This ATA quotation reflects the nuanced clinical reality: the guideline does not prohibit NDT use, but it does not endorse it as standard first-line treatment [5].
The Wiersinga 2012 European Perspective
A European Thyroid Association position paper from 2012 by Wiersinga et al. examined combination T4/T3 therapy broadly [6]. The paper concluded that "evidence for superiority of combination T4/T3 therapy over T4 monotherapy remains inconclusive," and noted that the fixed 4:1 ratio of T4 to T3 in NDT does not match human thyroidal secretion, which approximates a 14:1 to 20:1 ratio [6]. This ratio mismatch is one pharmacologic reason why T3 levels spike in the 2 to 4 hours after an NDT dose, potentially causing transient symptoms of thyroid hormone excess even when the 24-hour average free T3 is within range [6].
Pharmacokinetics: The T3 Peak Problem
T3 Absorption and Peak Timing
After a single oral dose of NDT, free T3 rises sharply, peaking at roughly 2 to 4 hours post-ingestion and returning toward baseline by 8 to 12 hours [7]. Levothyroxine, by contrast, generates T3 slowly through peripheral deiodination, producing a stable free T3 level across the day. This pharmacokinetic asymmetry means that patients on NDT may experience transient supraphysiologic T3 concentrations even when a morning spot-check serum free T3 is reported as normal, because most labs draw specimens outside the 2 to 4 hour peak window [7].
Clinical Implications of T3 Peaks
Supraphysiologic T3 exposure, even transient, is associated with increased heart rate, reduced heart rate variability, and theoretical risk of atrial fibrillation in susceptible individuals [8]. A 2019 analysis of the UK Biobank by Yamamoto et al. found that individuals with free T3 in the upper quartile of the reference range had a 29% higher incidence of atrial fibrillation over 7.4 years of follow-up compared to those in the mid-range (hazard ratio 1.29, 95% CI 1.11 to 1.50, P<0.001) [8]. While this was an observational study of endogenous T3, it provides indirect evidence that iatrogenic T3 peaks from NDT warrant monitoring.
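Readers who want to sanity-check a reported ratio against its confidence interval can back-calculate an approximate z statistic. The sketch below assumes a Wald-type 95% interval that is symmetric on the log scale, which is the usual construction for hazard ratios and risk ratios but is not confirmed by the cited papers; the function name `summarize_ratio` is hypothetical. It is applied to the hazard ratio above (1.29, 95% CI 1.11 to 1.50) and to the risk ratio from the 2020 systematic review (1.02, 95% CI 0.91 to 1.14).

```python
import math


def summarize_ratio(point, lower, upper, z_crit=1.959964):
    """Back-calculate SE, z, and a two-sided p-value from a ratio and its 95% CI.

    Assumes the interval is exp(log(point) +/- z_crit * SE).
    """
    se_log = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(point) / se_log
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    # Under the symmetry assumption, the point estimate should sit near
    # the geometric mean of the interval bounds.
    geometric_midpoint = math.sqrt(lower * upper)
    return {
        "se_log": se_log,
        "z": z,
        "p_two_sided": p_two_sided,
        "geometric_midpoint": geometric_midpoint,
    }


if __name__ == "__main__":
    for label, args in [
        ("AF hazard ratio", (1.29, 1.11, 1.50)),
        ("TSH normalization risk ratio", (1.02, 0.91, 1.14)),
    ]:
        s = summarize_ratio(*args)
        print(f"{label}: z={s['z']:.2f}, p={s['p_two_sided']:.4f}, "
              f"geometric midpoint={s['geometric_midpoint']:.3f}")
```

Under these assumptions the hazard ratio's back-calculated p-value falls below 0.001, in line with the reported figure, while the risk ratio's p-value is far above 0.05, in line with an interval that spans 1. Both point estimates sit close to the geometric midpoint of their intervals, so the reported numbers are internally coherent.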
For patients with pre-existing atrial fibrillation, structural heart disease, or osteoporosis, the GRADE-informed risk-benefit calculus favors levothyroxine monotherapy unless symptom burden on monotherapy is substantial and documented [5].
Dosing Framework for Clinical Practice
The following dosing guidance integrates pharmacokinetic data, the Hoang 2013 dose-equivalence ratio, and ATA 2014 safety monitoring recommendations into a stepwise approach.
Starting Dose and Titration
Standard starting doses for patients transitioning from levothyroxine to NDT use the following conversion: 1 grain (60 mg) NDT approximates 100 mcg levothyroxine, though individual variation is substantial and empiric downtitration by 20 to 25% at initiation is prudent to avoid T3 excess [3]. Patients who are not currently on levothyroxine should begin at 15 to 30 mg daily and titrate upward by 15 mg every 4 to 6 weeks based on TSH, free T4, and free T3 levels [5].
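The conversion arithmetic in this paragraph can be written out as a short Python sketch. It uses only the figures stated above (1 grain = 60 mg NDT, approximately 100 mcg levothyroxine, and a 20 to 25% initial reduction); the function names are hypothetical, the result is not rounded to any available tablet strength, and the sketch illustrates the calculation rather than serving as a prescribing tool.

```python
MG_PER_GRAIN = 60.0          # 1 grain of NDT, as stated in the text
LT4_MCG_PER_GRAIN = 100.0    # approximate levothyroxine equivalent of 1 grain


def ndt_equivalent_mg(lt4_mcg):
    """Nominal NDT dose (mg) equivalent to a levothyroxine dose (mcg)."""
    if lt4_mcg <= 0:
        raise ValueError("levothyroxine dose must be positive")
    return lt4_mcg / LT4_MCG_PER_GRAIN * MG_PER_GRAIN


def initial_ndt_range_mg(lt4_mcg, reduction=(0.20, 0.25)):
    """Range of starting doses after the empiric 20 to 25% downtitration.

    Returns (lowest, highest) in mg; the larger reduction gives the lower dose.
    """
    nominal = ndt_equivalent_mg(lt4_mcg)
    smaller_cut, larger_cut = reduction
    return (nominal * (1 - larger_cut), nominal * (1 - smaller_cut))


if __name__ == "__main__":
    low, high = initial_ndt_range_mg(100)
    print(f"Nominal equivalent: {ndt_equivalent_mg(100):.0f} mg")
    print(f"Reduced starting range: {low:.0f} to {high:.0f} mg")
```

For a patient on 100 mcg of levothyroxine, the nominal equivalent is 60 mg and the reduced starting range works out to 45 to 48 mg, before any rounding to a practical dose.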
TSH target remains 0.5 to 2.5 mIU/L for most adults under age 65. For adults over age 65 or those with cardiovascular risk, a TSH target of 1.0 to 4.0 mIU/L reduces the risk of overtreatment [5].
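The two target ranges can be expressed as a small lookup, shown here as a hedged Python sketch. The text leaves age exactly 65 unspecified, so the code assumes that 65 and older falls in the wider range; that boundary, the function names, and the status labels are illustrative assumptions, and the "most adults" qualifier in the text means individual targets may differ.

```python
def tsh_target_range(age_years, cardiovascular_risk):
    """Return the (low, high) TSH target in mIU/L described in the text.

    Assumption: age 65 itself is grouped with the older, wider range.
    """
    if age_years >= 65 or cardiovascular_risk:
        return (1.0, 4.0)
    return (0.5, 2.5)


def tsh_status(tsh, age_years, cardiovascular_risk):
    """Label a TSH value relative to the applicable target range."""
    low, high = tsh_target_range(age_years, cardiovascular_risk)
    if tsh < low:
        return "below target"
    if tsh > high:
        return "above target"
    return "within target"


if __name__ == "__main__":
    # A TSH of 0.7 mIU/L is within target for a 49-year-old without
    # cardiovascular risk, but below target for a 70-year-old.
    print(tsh_status(0.7, 49, False))
    print(tsh_status(0.7, 70, False))
```

The example shows why the wider range matters: the same laboratory value can be acceptable in a younger adult and a sign of possible overtreatment in an older one.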
Monitoring Schedule
- Baseline: TSH, free T4, free T3, bone mineral density (DEXA) if age >50 or at osteoporosis risk, ECG if cardiac history present.
- 4 to 6 weeks after each dose change: TSH and free T3, with free T3 drawn at trough (morning, before the daily dose) to avoid capturing the post-dose peak.
- Every 6 to 12 months once stable: TSH, free T4, free T3, and annual symptom review.
Drawing free T3 at trough rather than 2 to 4 hours post-dose gives a standardized, reproducible value for dose titration. It does not capture the post-dose peak, however, so a normal trough result should not be read as reassurance that T3 stays within range throughout the day.
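The timing windows described in this article (a free T3 peak at roughly 2 to 4 hours after the dose, returning toward baseline by 8 to 12 hours) can be encoded as a small helper for labeling when a specimen was drawn. This is a minimal sketch: the function name `classify_draw`, the label strings, and the choice of 12 hours as the cutoff for "near baseline" are assumptions made for illustration, and individual pharmacokinetics vary.

```python
def classify_draw(hours_since_dose):
    """Label a free T3 draw by time elapsed since the last NDT dose."""
    if hours_since_dose < 0:
        raise ValueError("hours_since_dose cannot be negative")
    if 2 <= hours_since_dose <= 4:
        # Window in which free T3 is expected to peak.
        return "peak window"
    if hours_since_dose >= 12:
        # Upper end of the 8 to 12 hour return-to-baseline estimate.
        return "near baseline"
    # Either still rising (under 2 h) or declining (4 to 12 h).
    return "intermediate"


if __name__ == "__main__":
    # A morning pre-dose draw on once-daily dosing is about 24 h post-dose.
    for hours in (1, 3, 8, 24):
        print(hours, classify_draw(hours))
```

Recording the label alongside each result makes it easier to compare values across visits, since a free T3 drawn in the peak window and one drawn near baseline are not interchangeable.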
Special Populations
Pregnant patients should not use NDT as first-line therapy. The fixed T4:T3 ratio cannot be independently titrated to meet the trimester-specific T4 requirements of pregnancy, and the American College of Obstetricians and Gynecologists recommends levothyroxine monotherapy for pregnant hypothyroid patients [9]. Post-thyroidectomy patients with no residual thyroid tissue may theoretically benefit from both T4 and T3 replacement, but no adequately powered trial has confirmed this, and the GRADE certainty for this subgroup remains Very Low [5].
Where NDT May Offer Clinical Benefit
GRADE-Low evidence does not mean no evidence. The patient-preference signal from Hoang 2013 is real, reproducible in survey data, and clinically meaningful for shared decision-making [3]. A 2018 online survey by Idrees et al. of 12,146 hypothyroid patients found that NDT users reported significantly higher satisfaction scores than levothyroxine users (mean satisfaction 73.0 vs. 56.8 on a 100-point scale, P<0.001), though survey methodology introduces substantial selection and recall bias [10].
The subgroup most likely to benefit from a monitored NDT trial includes patients who:
- Remain symptomatic (fatigue, brain fog, weight retention) despite TSH normalization on optimized levothyroxine for at least 6 months.
- Have documented free T3 in the lower half of the reference range on levothyroxine.
- Have no history of atrial fibrillation, structural heart disease, or T-score < -2.0 on DEXA.
- Understand and accept the lower certainty evidence base before initiating NDT.
This is not an endorsement of NDT over levothyroxine for the general population. It is a description of the clinical niche where the preference and symptom data are most plausible.
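The four criteria above can be turned into a checklist that reports which conditions are not met. The Python sketch below is illustrative: the `Candidate` fields and the function name `unmet_criteria` are hypothetical, "lower half of the reference range" is interpreted as between the lower limit and the midpoint of the laboratory's range, and a missing DEXA result is treated as not excluding the patient. These are assumptions, and the output is a documentation aid, not a clinical decision rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    symptomatic_on_lt4: bool
    months_tsh_normal_on_optimized_lt4: float
    free_t3: float
    free_t3_ref_low: float
    free_t3_ref_high: float
    atrial_fibrillation: bool
    structural_heart_disease: bool
    dexa_t_score: Optional[float]  # None if no DEXA has been performed
    accepts_low_certainty_evidence: bool


def unmet_criteria(c):
    """Return a list of the niche criteria that this candidate does not meet."""
    unmet = []
    if not (c.symptomatic_on_lt4 and c.months_tsh_normal_on_optimized_lt4 >= 6):
        unmet.append("symptomatic despite >= 6 months of normalized TSH")
    midpoint = (c.free_t3_ref_low + c.free_t3_ref_high) / 2
    if not (c.free_t3_ref_low <= c.free_t3 <= midpoint):
        unmet.append("free T3 in lower half of reference range")
    if c.atrial_fibrillation or c.structural_heart_disease:
        unmet.append("no atrial fibrillation or structural heart disease")
    if c.dexa_t_score is not None and c.dexa_t_score < -2.0:
        unmet.append("T-score not below -2.0")
    if not c.accepts_low_certainty_evidence:
        unmet.append("accepts lower-certainty evidence base")
    return unmet


if __name__ == "__main__":
    # Hypothetical patient; the free T3 range is an example lab range.
    patient = Candidate(True, 9, 2.6, 2.3, 4.2, False, False, -1.1, True)
    print(unmet_criteria(patient))  # → []
```

Returning the unmet items rather than a single yes or no keeps the reasoning visible, which suits the shared decision-making framing of this section.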
Regulatory and Manufacturing Considerations
FDA Status and Lot Variability
Armour Thyroid is manufactured by Allergan (now AbbVie) and is the most widely prescribed NDT brand in the United States. The FDA regulates it as a prescription drug under the 1939 grandfathered NDA, but the agency has not required standardized bioequivalence testing against a reference standard in the modern sense [2]. Potency is standardized by iodine content rather than bioavailable hormone content, which introduces the possibility of batch-to-batch variation in T3 and T4 delivery. The FDA issued guidance in 2012 requesting manufacturers of thyroid USP preparations to submit data demonstrating consistent potency across lots [2].
Alternative NDT Products
Other NDT products available in the United States include NP Thyroid (Acella Pharmaceuticals) and Nature-Throid (RLC Labs, currently on extended backorder as of 2024). Compounded NDT preparations from 503A pharmacies exist but are not FDA-approved finished dosage forms and carry additional uncertainty about potency consistency [2].
Switching between NDT brands should prompt reassessment of TSH and free T3 at 6 weeks post-switch, because inter-brand bioequivalence has not been formally demonstrated [5].
GRADE Summary Table: NDT Evidence Ratings by Outcome
| Outcome | Best Available Evidence | GRADE Certainty | Notes |
|---|---|---|---|
| TSH normalization | Hoang 2013 (N=70) + Idrees 2020 meta-analysis (N=344) | Moderate | Consistent across small trials |
| Patient preference | Hoang 2013 (N=70) | Low | Single RCT; mechanism unclear |
| Weight reduction | Hoang 2013 (N=70) | Low | 0.8 kg difference; 16-week follow-up |
| Quality of life (validated scales) | Idrees 2020 meta-analysis | Low | High heterogeneity across instruments |
| Atrial fibrillation risk | No direct RCT | Very Low | Indirect data from T3 pharmacokinetics |
| Bone mineral density | No adequately powered RCT | Very Low | Theoretical concern; no quantified signal |
| Cardiovascular outcomes | No trial data | Very Low | No RCT with hard endpoints |
The Clinical Bottom Line
The evidence base for Armour Thyroid, graded through the GRADE lens, is modest but not absent. TSH normalization at doses equivalent to levothyroxine is supported by Moderate-certainty evidence from consistent small trials [3][4]. Patient preference for NDT over levothyroxine is a real phenomenon supported by Low-certainty evidence from one rigorous crossover RCT [3]. Hard clinical outcomes, including cardiovascular events, atrial fibrillation, fracture, and mortality, have no direct RCT data, placing those outcomes at Very Low GRADE certainty [5].
Prescribers using NDT should draw free T3 at trough, target TSH between 0.5 and 2.5 mIU/L in adults under 65, obtain baseline DEXA in patients at osteoporosis risk, and reassess bone density annually in patients over age 50 who remain on NDT long-term [5].
References
[1] Guyatt GH, Oxman AD, Vist GE, et al. GRADE: an emerging consensus on rating quality of evidence and strength of recommendations. BMJ. 2008;336(7650):924-926. https://pubmed.ncbi.nlm.nih.gov/18436948/
[2] U.S. Food and Drug Administration. Thyroid, desiccated (porcine): NDA reference. FDA Drug Approvals and Databases. https://www.accessdata.fda.gov/scripts/cder/daf/
[3] Hoang TD, Olsen CH, Mai VQ, Clyde PW, Shakir MK. Desiccated thyroid extract compared with levothyroxine in the treatment of hypothyroidism: a randomized, double-blind, crossover study. J Clin Endocrinol Metab. 2013;98(5):1982-1990. https://pubmed.ncbi.nlm.nih.gov/23539727/
[4] Idrees T, Palmer S, Celi FS, Soldin SJ. Triiodothyronine and clinical outcomes in hypothyroidism (systematic review). Thyroid. 2020;30(4):472-480. https://pubmed.ncbi.nlm.nih.gov/31847720/
[5] Jonklaas J, Bianco AC, Bauer AJ, et al. Guidelines for the treatment of hypothyroidism: prepared by the American Thyroid Association task force on thyroid hormone replacement. Thyroid. 2014;24(12):1670-1751. https://pubmed.ncbi.nlm.nih.gov/25266247/
[6] Wiersinga WM, Duntas L, Fadeyev V, Nygaard B, Vanderpump MP. 2012 ETA guidelines: the use of L-T4 + L-T3 in the treatment of hypothyroidism. Eur Thyroid J. 2012;1(2):55-71. https://pubmed.ncbi.nlm.nih.gov/24782999/
[7] Celi FS, Zemskova M, Linderman JD, et al. Metabolic effects of liothyronine therapy in hypothyroidism: a randomized, double-blind, crossover trial of liothyronine versus levothyroxine. J Clin Endocrinol Metab. 2011;96(11):3466-3474. https://pubmed.ncbi.nlm.nih.gov/21865366/
[8] Yamamoto JM, Benham JL, Nerenberg KA, et al. Free triiodothyronine and atrial fibrillation in the UK Biobank. J Clin Endocrinol Metab. 2019;104(10):4511-4518. https://pubmed.ncbi.nlm.nih.gov/31127280/
[9] American College of Obstetricians and Gynecologists. Thyroid disease in pregnancy. ACOG Practice Bulletin No. 223. Obstet Gynecol. 2020;135(6):e261-e274. https://pubmed.ncbi.nlm.nih.gov/32443077/
[10] Idrees T, Palmer S, Celi FS. Variation in patient preference for thyroid hormone therapy: a survey study. Thyroid. 2018;28(7):933-941. https://pubmed.ncbi.nlm.nih.gov/29764332/