Dear Editor,

Retrospective analyses of data generated in clinical trials can provide valuable contributions to drug development and stimulate innovation and progress. Concordet et al. recently published an opinion paper that questioned the use of average bioequivalence by health authorities for evaluation of a drug like Levothyrox® [1]. They propose instead a “conceptual framework of individual bioequivalence”. The concept of individual bioequivalence has been a topic of intense debate, but no regulatory agency has endorsed a general framework for individual bioequivalence and there are no reference ranges, guidance or directives on its use. This opinion paper is built on a post-hoc analysis of data from a clinical study (EudraCT No: 2013-000274-29) that had demonstrated bioequivalence between the new and old formulations of levothyroxine. The introduction of the new formulation has elicited numerous reports of adverse events in France, which led the authors to investigate the notion of “switchability”, using an individual exposure ratio (IER).

The authors acknowledge that the study and its analysis were performed according to applicable guidelines in France, Europe and the USA, yet they argue for an individual approach and suggest that a subject-by-formulation interaction may explain the outcome of their analysis. We propose an alternative perspective to the hypotheses advanced by Concordet et al.

The authors calculated the IER for the 204 subjects who contributed to the primary analysis in the trial and observed that “50% […] were actually outside the a priori bioequivalence range”. No such established bioequivalence range exists for individual estimates. In addition, the spread of IER estimates may be explained largely by factors other than the putative “subject-by-formulation” interaction about which the authors speculate.

Bioequivalence in this study was concluded because the 90% confidence intervals around the point estimates of the geometric mean ratios of key pharmacokinetic parameters lay between 0.9 and 1.11. This acceptance range, narrower than the “usual” range of 0.8–1.25, is consistent with the narrow therapeutic index of levothyroxine. However, it is not a validated range for ratios calculated for each individual, and there is no scientific rationale for categorising individual responses according to population boundaries.

Furthermore, in accordance with applicable guidelines, the study was not designed to evaluate a subject-by-formulation interaction. It is known that there can be a link between the variance term measuring subject-by-formulation interaction and the spread of IER values. Although we cannot exclude that a subject-by-formulation interaction might contribute to the widening of the distribution of the IER, we propose an alternative explanation.

The exposure metrics for levothyroxine were assessed with or without adjustment for the baseline thyroxine (T4) level. In the reported clinical study in healthy subjects, endogenous T4, i.e., T4 present before levothyroxine treatment, represented about two thirds of the total exposure to T4 (area under the concentration–time curve) after treatment. Each concentration measurement bears an analytical error (assay precision), and we performed simulations to investigate how these errors propagate into the calculation of the IER, on unadjusted and adjusted concentrations. We simulated a cross-over study in 10,000 subjects, each receiving levothyroxine twice, and assumed that the only difference between a subject’s two pharmacokinetic profiles was due to bioanalytical error. The pharmacokinetic profiles were simulated assuming identical and independent lognormal distributions of individual concentrations, with the means based on the geometric mean profiles seen in the study, and assuming a constant standard deviation of 0.10 (corresponding to a coefficient of variation of 10%, which is comparable to the precision of the bioanalytical method). From these simulated concentrations, we derived the IER on both the baseline-adjusted and the unadjusted area under the concentration–time curve, calculated using the linear trapezoidal rule as also employed by Concordet et al. The results are displayed in Table 1 and Fig. 1.

Table 1 Summary outcome of the individual exposure ratio calculated from 10,000 pairs of simulated total thyroxine levels without and with baseline subtraction
Fig. 1 Distribution of individual exposure ratio (area under the curve test/area under the curve reference) obtained with unadjusted thyroxine (left panel) and baseline-adjusted thyroxine (right panel) simulated plasma levels. Red and blue vertical dotted lines represent ranges of 0.8–1.25 and 0.9–1.11, respectively. Pharmacokinetic profiles for 10,000 subjects were simulated over the course of 72 h, assuming a bioanalytical error of 10%, no difference in pharmacokinetic profiles under both formulations and the absence of a subject-by-formulation interaction
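For readers who wish to explore the mechanism, the simulation described above can be sketched roughly as follows. This is a minimal illustration, not the code used for Table 1 and Fig. 1. The sampling times, the shape of the mean profile, the baseline level and the use of a single pre-dose sample as the baseline are all hypothetical placeholders; the standard deviation of 0.10 is assumed here to apply on the log scale. The mean profile was chosen only so that endogenous T4 contributes roughly two thirds of the total area under the curve.

```python
import math
import random

# Hypothetical sampling times (h) over 72 h; NOT the study's actual schedule.
TIMES = [0, 0.5, 1, 1.5, 2, 2.5, 3, 4, 6, 8, 10, 12, 18, 24, 48, 72]
BASELINE = 80.0   # assumed endogenous T4 level (arbitrary units)
AMPLITUDE = 57.0  # assumed size of the post-dose increment


def mean_profile(t):
    """Assumed geometric mean total T4: flat baseline plus a post-dose increment."""
    return BASELINE + AMPLITUDE * (math.exp(-0.01 * t) - math.exp(-1.0 * t))


def auc_trapezoid(times, conc):
    """Area under the curve by the linear trapezoidal rule."""
    return sum((t1 - t0) * (c0 + c1) / 2.0
               for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:]))


def simulate_period(rng, sigma):
    """One treatment period: identical mean profile, independent lognormal error."""
    conc = [mean_profile(t) * math.exp(rng.gauss(0.0, sigma)) for t in TIMES]
    total = auc_trapezoid(TIMES, conc)
    # Baseline adjustment: subtract the (noisy) pre-dose value from every sample.
    adjusted = auc_trapezoid(TIMES, [c - conc[0] for c in conc])
    return total, adjusted


def simulate_ier(n_subjects=10_000, sigma=0.10, seed=1):
    """Return (unadjusted, adjusted) individual exposure ratios, test/reference."""
    rng = random.Random(seed)
    unadjusted, adjusted = [], []
    for _ in range(n_subjects):
        tot_test, adj_test = simulate_period(rng, sigma)
        tot_ref, adj_ref = simulate_period(rng, sigma)
        unadjusted.append(tot_test / tot_ref)
        # A rare non-positive adjusted AUC gives a ratio that simply falls outside the range.
        adjusted.append(adj_test / adj_ref)
    return unadjusted, adjusted


def fraction_outside(ratios, low=0.9, high=1.11):
    """Proportion of ratios lying outside [low, high]."""
    return sum(1 for r in ratios if not (low <= r <= high)) / len(ratios)


if __name__ == "__main__":
    unadj, adj = simulate_ier()
    print("unadjusted outside 0.9-1.11:", fraction_outside(unadj))
    print("adjusted outside 0.9-1.11:  ", fraction_outside(adj))
```

Because both periods share the same mean profile, any spread in the resulting ratios comes from the simulated assay error alone. The exact proportions depend on the placeholder profile and sampling schedule and should not be expected to reproduce the values in Table 1.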

The resulting distribution of IER for adjusted T4 is similar to that seen in the calculations of Concordet et al., with 66.0% of individual ratios outside the 0.9–1.11 range. For unadjusted IER, the proportion of individual ratios falling outside the range (4.75%) was lower than in the study. Hence, even starting from an unadjusted IER distribution narrower than in the original data, applying an experimental (bioanalytical) imprecision to the T4 levels considerably widens the spread of IERs after baseline adjustment, without any other source of variability.

Thus, the distribution of the IER can widen substantially, irrespective of the magnitude of a putative subject-by-formulation interaction. This is due to the (proportionally large) baseline subtraction and the computation of the ratio from the smaller adjusted areas under the concentration–time curve. The subtraction inflates the variability and probably explains the statement that more than 50% of the healthy volunteers’ IERs fell outside a 0.9–1.11 range. Accordingly, this spread may be artefactual and largely the result of inflation of the analytical (im)precision by the calculation.
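The size of this inflation can be illustrated with a first-order error-propagation sketch. The notation is introduced here for illustration only: A_tot is the unadjusted area under the curve, A_base the area attributed to baseline T4, and σ_tot and σ_base their respective standard deviations, treated as approximately independent (a simplifying assumption).

```latex
A_{\mathrm{adj}} = A_{\mathrm{tot}} - A_{\mathrm{base}},
\qquad
\mathrm{CV}(A_{\mathrm{adj}}) \approx
\frac{\sqrt{\sigma_{\mathrm{tot}}^{2} + \sigma_{\mathrm{base}}^{2}}}
     {A_{\mathrm{tot}} - A_{\mathrm{base}}}

% With A_base about two thirds of A_tot, the denominator shrinks to about A_tot / 3:
\mathrm{CV}(A_{\mathrm{adj}}) \approx
\frac{3\sqrt{\sigma_{\mathrm{tot}}^{2} + \sigma_{\mathrm{base}}^{2}}}{A_{\mathrm{tot}}}
\;\ge\; 3\,\mathrm{CV}(A_{\mathrm{tot}})
```

Under these assumptions, the relative imprecision of the adjusted area is at least about three times that of the unadjusted area, before any error in the baseline estimate itself is counted, and the ratio of two such areas is more variable still.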

The authors’ hypothesis that a putative underlying subject-by-formulation interaction could be due to the inclusion of mannitol in the formulation is not supported by data. Mannitol can accelerate intestinal transit in a dose-dependent manner, but at concentrations 11- to 34-fold higher than that used in a 100-µg dose of Levothyrox® [2]. This effect is unlikely to drive significantly different absorption behaviours in individual subjects.

The authors also question the appropriateness of using healthy volunteers to assess bioequivalence for levothyroxine. The use of healthy volunteers in bioequivalence trials allows a more straightforward comparison of pharmacokinetics, offers fewer opportunities for confounding effects, is typically more sensitive to any changes in formulations, especially after single-dose administration, and is considered the gold standard for these studies. Healthy euthyroid subjects show less physiological variation than patients, with little, if any, impact on thyroid function of variables such as physical training, body habitus, posture, immobilisation, exercise, ambulatory status or geographic environment. If patients had been enrolled in this study, other medicines, organ dysfunction or even the disease itself could have varied between the administrations of the two formulations to an individual subject and confounded the findings.

In conclusion, we believe that the findings of Concordet et al. can be explained largely by the propagation of experimental errors implied by the calculation, rather than by any subject-by-formulation interaction. We also believe that the model of single-dose bioequivalence assessment in healthy volunteers, be it with or without baseline subtraction, provides the most stringent test conditions to identify any possible difference between formulations. Real-world data from other countries where the new formulation has been launched show no increased safety reporting and no impact on the benefit-risk balance of the product, and hence do not corroborate the authors’ conjecture of non-bioequivalence for more than 50% of the population.