We are pleased to respond to the letter of Munafo et al. [1]. In our levothyroxine article [2], we concluded that “a statistical analysis conducted in the conceptual framework of individual bioequivalence would have enabled: (i) documentation of possible higher intra-individual variability for the new compared to the old formulation and, hence, possible reconsideration of development of this new formulation” and “(ii) consideration of a possible subject-by-formulation interaction, allowing both regulatory authorities and prescribing clinicians to be better placed to manage and systematically supervise all patients during transition from the old to the new formulation”. We are of the firm opinion that a fundamental responsibility of a drug company is to determine and report these two sources of biological variability when, as is the case for levothyroxine (T4), a drug having a Narrow Therapeutic Index (NTI), a new formulation is imposed on millions of patients.

Switchability is the key issue, and it is neither sufficient nor indeed appropriate to refer to a third source of variability when this is simply ‘noise’, even though this ‘noise’ is specific to drugs such as levothyroxine, for which a baseline correction is in order. The reliability issues raised by correction for a baseline T4 level have been addressed previously, placed in perspective, and even disputed by others [3, 4]. We reiterate that it was not our intention to reanalyze this data set, for which the experimental design was not publicly available. Moreover, we made no claim that individual bioequivalence (BE) does not exist in this case. It was therefore not our goal to explore all possible factors underlying the observation, from the raw data, that almost 70% of subjects enrolled in this trial were outside the a priori BE acceptance range. In short, this value was for us simply a “warning signal”.

We return to and restate our main message, namely that an analysis to determine average bioequivalence (ABE), conducted by tightening the a priori acceptance interval, cannot unequivocally establish the switchability of the two Levothyrox® formulations. This fundamental message is not nullified by referring to difficulties in correcting for the baseline T4 level, a correction that is required by the European Union (EU) guideline and that Merck used to demonstrate ABE. Rather, it should be understood that the 2010 EU guideline on ABE explicitly excluded the issues of substitution and switchability. The word ‘substitution’ appears only once, as follows: “Furthermore, this guideline does not cover aspects related to generic substitution as this is subject to national regulation”, and the word ‘switchability’ is not mentioned in this guideline [5]. This is because there is no consensus among the various EU member states on the issues of BE acceptance criteria and switchability, as reviewed recently by Verbeeck and Musuamba [6]. For example, the Danish Health and Medicines Authority requires tighter acceptance limits of 0.90–1.11 for substances with an NTI, in respect of automatic substitution for a list of substances that are considered switchable. It is important to note that the Danish regulatory authority specifically excludes from this list three substances, including thyroxine [6]. The Federal Agency for Medicines and Health Products of Belgium is even more conservative in its stance, having published a list of 31 substances, including thyroxine, considered to be “non-switchable” [6]. Consequently, EU member states are far from harmonized on whether thyroxine should be treated as a non-switchable substance and on whether switching after initiation of therapy should be discouraged, whereas France chose to impose administratively, on nearly 3 million patients, the replacement of an old levothyroxine formulation by a new one.

In the USA, and contrary to what the authors wrote, the question of BE for NTI drugs has been addressed in at least two product-specific BE recommendations (warfarin and tacrolimus), for which the scaled BE and variability comparison criteria are recommended to demonstrate BE. More precisely, Yu et al., giving a US Food and Drug Administration view, stated that “The US Food and Drug Administration proposes that the bioequivalence of narrow therapeutic index drugs be determined using a scaling approach with a four-way, fully replicated, crossover design study in healthy subjects that permits the simultaneous equivalence comparison of the mean and within-subject variability of the test and reference products” [7]. This approach takes into account the within-subject variability (WSV), because the WSV (which in this instance also encompasses the subject-by-formulation interaction) is regarded as critical for any NTI agent; such an agent must not be a highly variable drug, i.e., a drug having a WSV >30% [7]. When 216 subjects are planned for the evaluation of formulations for ABE, it is legitimate to ask whether this very high number of subjects was the consequence of a high WSV. We note that, in an overview of the WSV of NTI drugs, the Food and Drug Administration reported that the mean WSV for levothyroxine was only 9.3%, with a range of 3.8–15.5%, for the area under the curve in nine BE trials [8]. Is this also the case for Levothyrox®?
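
To illustrate how a scaling approach ties the acceptance limits to the WSV of the reference product, the following minimal Python sketch computes the limits implied by a given WSV. It is an illustration under stated assumptions, not the regulatory procedure itself: the constants (a regulatory within-subject standard deviation of 0.10 on the log scale, an upper limit of 1.11111 at that value, and a cap at 0.80–1.25) are recalled from FDA product-specific guidance and are not given in the text above, the function names are hypothetical, and the actual criterion is assessed through a confidence bound on a linearized statistic and is accompanied by a separate comparison of test and reference variability, neither of which is shown here.

```python
import math

# ASSUMED constants (not stated in the text above); recalled from FDA
# product-specific guidance for NTI drugs. Verify before any real use.
SIGMA_W0 = 0.10              # regulatory within-subject SD on the log scale
RATIO_AT_SIGMA_W0 = 1.11111  # upper limit when the reference SD equals SIGMA_W0
CAP = (0.80, 1.25)           # conventional ABE limits used as an outer cap


def cv_to_log_sd(cv: float) -> float:
    """Convert a within-subject CV (fraction, e.g. 0.093) to a log-scale SD."""
    return math.sqrt(math.log(cv ** 2 + 1.0))


def implied_limits(cv: float) -> tuple:
    """Return (lower, upper) limits implied by a reference WSV, capped at CAP."""
    half_width = math.log(RATIO_AT_SIGMA_W0) / SIGMA_W0 * cv_to_log_sd(cv)
    lower = max(math.exp(-half_width), CAP[0])
    upper = min(math.exp(half_width), CAP[1])
    return lower, upper


# WSV values for the levothyroxine area under the curve quoted from [8].
for cv in (0.038, 0.093, 0.155):
    lower, upper = implied_limits(cv)
    print(f"WSV {cv:.1%}: implied limits {lower:.3f} to {upper:.3f}")
```

Under these assumptions, a reference WSV below the regulatory value yields limits tighter than 0.90–1.11, whereas a WSV of 30% or more would simply hit the 0.80–1.25 cap.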

Finally, we propose that the concern of the authors, relating to the noise introduced and propagated by the T4 baseline correction, could be neatly resolved by apportioning the observed variability to its different sources (noise vs. variability of biological interest), given the full replicate study design. In this regard, the burden of proof clearly lies with the company and not with ourselves or other external observers, who have no access to the full dossier. When almost 70% of subjects were actually outside the a priori acceptance interval, it should come as no surprise to those who designed the trial and interpreted its results that they are questioned about what we prudently termed a “warning signal” regarding switchability of the two formulations. Indeed, it is reported that one approach to assess the magnitude of subject-by-formulation interaction “is to determine what proportion of subjects have individual T/R mean ratios outside some predetermined interval” and “that our judgment (FDA) was that if the proportion of individuals outside 80–125% reached about 15%, that would constitute a large proportion” [9], or even a proportion of 10% [10].
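
The proportion-based check quoted above can be sketched in a few lines of Python. This is a minimal illustration only: the subject data are hypothetical and invented for the example, the function names are ours, and we assume that each individual T/R ratio is formed from the geometric means of that subject's replicate exposures (e.g., area under the curve), which the quoted text does not specify.

```python
from statistics import geometric_mean  # available from Python 3.8


def individual_ratio(test_values, reference_values):
    """Individual T/R ratio from one subject's replicate exposures.

    ASSUMPTION: the ratio is taken between geometric means of replicates.
    """
    return geometric_mean(test_values) / geometric_mean(reference_values)


def proportion_outside(ratios, lower=0.80, upper=1.25):
    """Fraction of subjects whose individual ratio lies outside [lower, upper]."""
    outside = [r for r in ratios if r < lower or r > upper]
    return len(outside) / len(ratios)


# HYPOTHETICAL replicate exposures per subject: (test, reference).
subjects = {
    "S1": ([98.0, 102.0], [101.0, 99.0]),
    "S2": ([128.0, 132.0], [100.0, 100.0]),
    "S3": ([84.0, 86.0], [100.0, 100.0]),
    "S4": ([69.0, 71.0], [100.0, 100.0]),
}

ratios = [individual_ratio(t, r) for t, r in subjects.values()]

# Interval named in the quotation, and a tightened interval for comparison.
share_conventional = proportion_outside(ratios, 0.80, 1.25)
share_tightened = proportion_outside(ratios, 0.90, 1.11)
print(f"Outside 0.80-1.25: {share_conventional:.0%}")
print(f"Outside 0.90-1.11: {share_tightened:.0%}")
```

The resulting proportion can then be compared with whichever threshold is considered large, such as the 15% or 10% values cited above; the choice of interval and threshold is left as a parameter because the text cites more than one.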

Regarding the potential impact of mannitol on the bioavailability and bioequivalence of levothyroxine, we refer to the review entitled “Impact of Osmotically Active Excipients on Bioavailability and Bioequivalence of BCS Class III Drugs” [11]. On the question of levothyroxine BE in patients vs. healthy volunteers, we cite publications reporting that BE in healthy volunteers is not indicative of BE in some sub-groups of patients [12, 13].