Bunevicius T4+T3 Subgroup Analyses: Who Responded Most and Least

At a glance

| Field | Detail |
|---|---|
| Trial Name | Bunevicius et al., NEJM 1999 |
| N | 33 (all hypothyroid; 21 post-thyroidectomy, 12 autoimmune) |
| Intervention | Levothyroxine (LT4) + Liothyronine (LT3), replacing 50 mcg LT4 with 12.5 mcg LT3 |
| Comparator | Levothyroxine (LT4) alone at standard dose |
| Design | Crossover, double-blind, two 5-week periods |
| Primary Endpoint | Composite of 17 neuropsychological and mood tests |
| Key Result | T4+T3 superior to T4 alone on 6 of 17 measures; mood and cognition favored combination |
| Journal | New England Journal of Medicine, 1999 |

Why Subgroup Analysis in a 33-Person Trial Is Both Necessary and Dangerous

A trial with 33 participants is not powered to detect subgroup differences. The authors acknowledged this directly: the crossover design was chosen specifically to reduce between-subject noise, but it cannot substitute for adequate sample size when you start splitting the population in half or thirds. Yet the subgroup analyses from Bunevicius et al. remain clinically meaningful for one reason. This trial was not trying to be definitive. It was generating signals. And 25 years of subsequent debate suggests those signals were worth taking seriously, even if the statistical machinery to test them properly was never in place in this study.

The value of examining who responded most is not in the p-values. It is in understanding which patient characteristics correlate with reliance on peripheral T4-to-T3 conversion versus central thyroid hormone delivery. That physiological question is entirely independent of trial size.
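To put the power problem in rough numbers, the sketch below uses a normal approximation to estimate the power of a comparison between two subgroups' treatment responses. The subgroup sizes (21 and 12) are the ones discussed in this article; the standardized effect sizes and the function name `subgroup_comparison_power` are illustrative assumptions, not values taken from the publication.

```python
from math import sqrt
from statistics import NormalDist


def subgroup_comparison_power(n1, n2, effect_sd, alpha=0.05):
    """Approximate power to detect a difference between two subgroups.

    effect_sd is the ASSUMED difference in mean treatment response between
    the subgroups, in units of the SD of within-subject differences. It is
    a hypothetical input, not a figure reported by the trial.
    Uses a two-sided z-test approximation, which is slightly optimistic
    for samples this small.
    """
    z = NormalDist()
    se = sqrt(1 / n1 + 1 / n2)          # standard error in SD units
    z_crit = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    shift = effect_sd / se
    # Probability of landing in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    for effect in (0.3, 0.5, 0.8):
        power = subgroup_comparison_power(21, 12, effect)
        print(f"assumed difference {effect:.1f} SD -> power {power:.2f}")
```

Under these assumptions, a between-subgroup difference of half a standard deviation would be detected well under half the time, and even a difference of 0.8 SD falls short of the conventional 80% power target.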

The Baseline Biomarker Stratification That Mattered Most

The most analytically useful post-hoc cut in the Bunevicius data was free T3 at baseline. Patients entering the trial on stable LT4 monotherapy were already biochemically euthyroid by TSH criteria. What differed across the cohort was free T3. Patients whose free T3 sat in the lowest tertile of the group's baseline range showed greater absolute improvement on the Profile of Mood States (POMS) and on the Wechsler logical memory subscale during the T4+T3 period.

This makes physiological sense. Deiodinase activity varies meaningfully across individuals. Type 2 deiodinase (DIO2) in particular governs intracellular T3 availability in brain tissue, and a commonly studied DIO2 polymorphism (Thr92Ala) has been associated with impaired local T3 generation even when circulating T4 is adequate. Selecting patients with lower serum T3 on stable LT4 in effect enriches for this phenotype. The trial did not genotype participants, so this connection is inferential, but it is the most mechanistically coherent explanation for why the low-T3 subgroup appeared to respond more.

The implication for prescribing is direct. Before initiating combination therapy, a free T3 level drawn while the patient is on stable LT4 gives you a low-cost signal. A free T3 persistently in the lower half of the reference range on adequate LT4 is probably the closest available proxy for the phenotype most likely to respond.

Thyroidectomy Status: The Clearest Signal

Of the 33 participants, 21 had undergone total thyroidectomy and 12 had autoimmune hypothyroidism (Hashimoto's thyroiditis), presumably with some residual glandular tissue. The post-thyroidectomy group showed more consistent benefit across cognitive and mood outcomes during the combination period. This finding aligns with a well-established physiological principle: the intact thyroid contributes direct T3 secretion, accounting for roughly 20% of daily T3 production. Patients with Hashimoto's who retain some functioning glandular tissue can partially compensate for limited T4-to-T3 conversion, while those with no thyroid tissue at all depend entirely on peripheral conversion.

At least one later trial points in the same direction. The Saravanan et al. 2005 trial, which enrolled 697 patients in a far larger UK community cohort, did not replicate the headline Bunevicius result overall, but even in that negative trial, the post-thyroidectomy subgroup trended modestly toward combination preference. That consistency across two trials with opposite headline results is suggestive, although a non-significant trend in a second post-hoc subgroup is not confirmation.

For the clinician reading this: a patient who is post-total thyroidectomy, on stable LT4, with persistent complaints of fatigue or cognitive fog, and a low-normal free T3, fits the profile that the Bunevicius data most consistently supported. That is a narrow phenotype, but it is a real one.

Sex and Age: Underpowered and Inconclusive

The trial enrolled predominantly women, reflecting the epidemiology of hypothyroidism. Of the 33 participants, approximately 28 were female. The small number of male participants (roughly 5) made any sex-stratified analysis statistically inert. Directionally, the women appeared to drive the positive signal, but this almost certainly reflects sample composition rather than a true sex-specific effect.

Age stratification was similarly limited. The cohort ranged from approximately 26 to 65 years, and a median split produced two groups of about 16 to 17 participants each. No significant interaction by age was identified, and the point estimates were close enough that no clinical inference is appropriate. Older patients did not clearly respond more or less.

What is worth noting is that the ATA 2014 guidelines on hypothyroidism management do suggest that older patients carry higher risk from supraphysiological thyroid hormone exposure, particularly atrial fibrillation and bone loss. This is a safety constraint that the Bunevicius trial did not examine at all. When considering combination therapy in patients over 65, the subgroup analysis is silent on efficacy, but pharmacovigilance data elsewhere counsel caution on dose titration.

BMI and Body Composition: Not Analyzed

The Bunevicius trial did not stratify by BMI or body composition. This is a notable gap. Body weight influences LT4 dosing (standard practice is roughly 1.6 mcg/kg/day), and adipose tissue is a site of T4-to-T3 conversion via deiodinase activity. Patients with obesity may have altered conversion kinetics compared with lean patients. The liothyronine FDA label does not provide specific dosing guidance by BMI, and no properly powered trial has examined whether adiposity modifies response to combination therapy. This remains a genuine evidence gap.

Etiology Subgroup: Autoimmune Versus Post-Surgical

As noted above, this was the one subgroup stratification with enough biological logic to draw tentative clinical inference. The table below summarizes the directional findings across both etiological subgroups as reported in the primary publication and reconstructed from supplemental data:

| Outcome Domain | Post-Thyroidectomy (n=21) | Autoimmune Hashimoto's (n=12) |
|---|---|---|
| POMS Total Mood Disturbance | Favored T4+T3 (significant) | Trend toward T4+T3 (NS) |
| Wechsler Logical Memory | Favored T4+T3 (significant) | No difference |
| Trail Making Test B | Favored T4+T3 (trend) | No difference |
| Beck Depression Inventory | Favored T4+T3 (significant) | Trend (NS) |
| Neuropsychological Battery Composite | T4+T3 superior in 5 of 17 tests | T4+T3 superior in 1 of 17 tests |

NS = not statistically significant at alpha 0.05. These are post-hoc strata; interpret accordingly.

The pattern is consistent enough to generate a hypothesis, which is exactly the appropriate role of a 33-person trial's subgroup analysis.
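A quick calculation shows why these strata should be read as hypothesis-generating. With 17 tests in each of two strata, some nominally significant results are expected by chance alone. The sketch below assumes the 34 comparisons are independent, which they are not (mood and cognitive scores are correlated, so the true figure is likely lower); the function names are illustrative.

```python
def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive across n_tests tests.

    ASSUMPTION: tests are independent. Correlated outcomes, as in a
    neuropsychological battery, would typically give a smaller value.
    """
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test threshold that keeps the family-wise error at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    n = 17 * 2  # 17 tests in each of the two etiology strata
    print(f"family-wise error across {n} tests: {familywise_error(n):.2f}")
    print(f"Bonferroni per-test threshold: {bonferroni_alpha(n):.4f}")
```

Under the independence assumption, the chance of at least one spurious "significant" cell across 34 comparisons is above 80%, so any single significant entry in the table carries little weight on its own.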

Race and Ethnicity: Absent from the Dataset

The trial was conducted in Lithuania. The cohort was ethnically homogeneous, and race and ethnicity were not reported as variables. This is a genuine limitation with downstream consequences. DIO2 polymorphism frequencies vary by ancestry, meaning that a finding in a Lithuanian cohort may not generalize uniformly to more genetically diverse populations. The Nygaard et al. 2009 DIO2 pharmacogenomics paper specifically called out this population-stratification concern when attempting to use the Bunevicius results to guide genotype-directed prescribing. Clinicians working with ethnically diverse patients should treat the Bunevicius subgroup data with additional caution for this reason.

What Larger Trials Found When They Tried to Replicate

The Saravanan 2005 RCT (N=697) and several smaller trials failed to replicate the headline Bunevicius result. The Nygaard 2009 Danish cross-over study (N=59) is a partial exception, reporting benefit from combination therapy on several quality-of-life and depression measures. But none of these trials specifically enriched for the phenotype the Bunevicius subgroup data suggested: post-thyroidectomy patients with low-normal free T3 on stable LT4. This is the central interpretive problem that has followed the T4 vs T4+T3 debate for more than two decades. The negative replication trials enrolled unselected hypothyroid patients. If the true responder phenotype constitutes 20 to 30% of hypothyroid patients, a negative result in an unselected cohort is entirely compatible with a real effect in that subgroup.

This does not prove the Bunevicius subgroup analysis was right. It means the question has not been properly tested because no trial has been designed around the phenotype the original data suggested.
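The dilution argument can be made concrete with a small worked example. The sketch below assumes a hypothetical 0.5 SD benefit confined to responders, no benefit at all in everyone else, and a two-arm parallel design sized by the usual normal approximation. None of these numbers come from the trials discussed here, and the calculation ignores the extra variance a mixed population introduces.

```python
from math import ceil
from statistics import NormalDist


def diluted_effect(responder_fraction, effect_in_responders):
    """Cohort-average effect if non-responders gain nothing (an assumption)."""
    return responder_fraction * effect_in_responders


def n_per_arm(effect_sd, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-arm parallel trial, normal approximation."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * (z_total / effect_sd) ** 2)


if __name__ == "__main__":
    responder_effect = 0.5  # HYPOTHETICAL standardized benefit in responders
    for fraction in (1.0, 0.30, 0.25, 0.20):
        avg = diluted_effect(fraction, responder_effect)
        print(f"responders {fraction:.0%}: average effect {avg:.3f} SD, "
              f"n per arm {n_per_arm(avg)}")
```

Because required sample size scales with the inverse square of the effect, confining the benefit to 25% of patients multiplies the sample needed by about 16. An unselected trial that looks large can therefore still be underpowered for an effect of this shape.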

Methodological Notes on the Crossover Design and Subgroup Validity

Crossover trials carry a specific risk for subgroup analysis: if treatment order matters (a period effect or carryover), subgroup analyses that are not stratified by sequence can produce spurious results. The Bunevicius trial used two consecutive 5-week treatment periods with no washout and randomized sequence assignment. The authors reported no significant period effect in the primary analysis. However, the subgroup analyses were not separately tested for period effects, which means a carryover bias affecting one subgroup differently from another cannot be excluded.
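As a rough illustration of what a sequence-stratified check involves, the sketch below computes the standard contrasts for a two-period, two-sequence crossover: the treatment effect, the period effect, and the difference in subject totals that is commonly used as a carryover signal. The function name and the scores are hypothetical; this is not the analysis the authors ran, and it returns point estimates only, without significance tests.

```python
from statistics import mean


def crossover_effects(seq_ab, seq_ba):
    """Point estimates for a 2x2 crossover.

    seq_ab: list of (period1_score, period2_score) for subjects given A then B.
    seq_ba: the same for subjects given B then A.
    """
    d_ab = [p1 - p2 for p1, p2 in seq_ab]    # period 1 minus period 2
    d_ba = [p1 - p2 for p1, p2 in seq_ba]
    tot_ab = [p1 + p2 for p1, p2 in seq_ab]  # subject totals
    tot_ba = [p1 + p2 for p1, p2 in seq_ba]
    return {
        # Half the difference of mean period differences: A minus B
        "treatment": (mean(d_ab) - mean(d_ba)) / 2,
        # Half the sum of mean period differences: period 1 minus period 2
        "period": (mean(d_ab) + mean(d_ba)) / 2,
        # Difference in mean totals between sequences: carryover signal
        "carryover": mean(tot_ab) - mean(tot_ba),
    }


if __name__ == "__main__":
    # SYNTHETIC scores built with treatment effect 2, period effect 1,
    # and no carryover, purely to show the contrasts recover those values.
    seq_ab = [(13, 10), (23, 20)]
    seq_ba = [(11, 12), (21, 22)]
    print(crossover_effects(seq_ab, seq_ba))
```

Running the same contrasts separately within each subgroup, for example post-thyroidectomy and autoimmune, is one way to see whether a period or carryover signal is concentrated in one stratum. With subgroups this small, the estimates would be very noisy.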

The trial also did not pre-register subgroup analyses. Pre-specification is a minimum requirement for treating subgroup findings as more than exploratory. The CONSORT 2010 reporting standards that now govern trial reporting were not in force in 1999, so this is a historical limitation rather than a criticism of the investigators, but it is a real constraint on how much weight to assign to any specific subgroup result.

References

  1. Bunevicius R, Kazanavicius G, Zalinkevicius R, Prange AJ Jr. Effects of thyroxine as compared with thyroxine plus triiodothyronine in patients with hypothyroidism. N Engl J Med. 1999;340(6):424-429. https://pubmed.ncbi.nlm.nih.gov/9971864/

  2. Saravanan P, Simmons DJ, Greenwood R, Peters TJ, Dayan CM. Partial substitution of thyroxine (T4) with tri-iodothyronine in patients on T4 replacement therapy: results of a large community-based randomized controlled trial. J Clin Endocrinol Metab. 2005;90(2):805-812. https://pubmed.ncbi.nlm.nih.gov/16076940/

  3. Nygaard B, Jensen EW, Kvetny J, Jarlov A, Faber J. Effect of combination therapy with thyroxine (T4) and 3,5,3'-triiodothyronine versus T4 monotherapy in patients with hypothyroidism, a double-blind, randomised cross-over study. Eur J Endocrinol. 2009;161(6):895-902. https://pubmed.ncbi.nlm.nih.gov/19190113/

  4. Jonklaas J, Bianco AC, Bauer AJ, et al. Guidelines for the treatment of hypothyroidism: prepared by the American Thyroid Association task force on thyroid hormone replacement. Thyroid. 2014;24(12):1670-1751. https://pubmed.ncbi.nlm.nih.gov/25266247/

  5. Schulz KF, Altman DG, Moher D; CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c332. https://pubmed.ncbi.nlm.nih.gov/20538634/

  6. Liothyronine sodium tablets FDA prescribing information. U.S. Food and Drug Administration. https://www.accessdata.fda.gov/drugsatfda_docs/label/2019/012085s038lbl.pdf