Clinical Pharmacokinetics, Volume 52, Issue 12, pp 1033–1043

Random-Effects Linear Modeling and Sample Size Tables for Two Special Crossover Designs of Average Bioequivalence Studies: The Four-Period, Two-Sequence, Two-Formulation and Six-Period, Three-Sequence, Three-Formulation Designs

Authors

  • F.J. Diaz
    • Department of Biostatistics, The University of Kansas Medical Center
  • Michel J. Berg
    • Strong Epilepsy Center, University of Rochester Medical Center
  • Ron Krebill
    • Department of Biostatistics, The University of Kansas Medical Center
  • Timothy Welty
    • Department of Clinical Sciences, College of Pharmacy and Health Sciences, Drake University
  • Barry E. Gidal
    • Department of Neurology, School of Pharmacy, University of Wisconsin
  • Rita Alloway
    • Transplant Section, Division of Nephrology and Hypertension, Department of Internal Medicine, College of Medicine, University of Cincinnati
  • Michael Privitera
    • Epilepsy Center, University of Cincinnati Neuroscience Institute
Leading Article

DOI: 10.1007/s40262-013-0103-4

Cite this article as:
Diaz, F.J., Berg, M.J., Krebill, R. et al. Clin Pharmacokinet (2013) 52: 1033. doi:10.1007/s40262-013-0103-4

Abstract

Due to concern and debate in the epilepsy medical community and to the current interest of the US Food and Drug Administration (FDA) in revising approaches to the approval of generic drugs, the FDA is currently supporting ongoing bioequivalence studies of antiepileptic drugs, the EQUIGEN studies. During the design of these crossover studies, the researchers could not find commercial or non-commercial statistical software that quickly allowed computation of sample sizes for their designs, particularly software implementing the FDA requirement of using random-effects linear models for the analyses of bioequivalence studies. This article presents tables for sample-size evaluations of average bioequivalence studies based on the two crossover designs used in the EQUIGEN studies: the four-period, two-sequence, two-formulation design, and the six-period, three-sequence, three-formulation design. Sample-size computations assume that random-effects linear models are used in bioequivalence analyses with crossover designs. Random-effects linear models have been traditionally viewed by many pharmacologists and clinical researchers as mere mathematical devices for analyzing repeated-measures data. In contrast, a modern view attributes to these models an important mathematical role in theoretical formulations of personalized medicine, because they contain not only parameters that represent average patients but also parameters that represent individual patients. Moreover, the notation and language of random-effects linear models have evolved over the years. Thus, another goal of this article is to provide a presentation of the statistical modeling of data from bioequivalence studies that highlights the modern view of these models, with special emphasis on power analyses and sample-size computations.

1 Introduction

Generic drug substitution saved the US public more than US$734 billion from 1999 to 2008, with savings of approximately US$121 billion in 2008 alone [1]. Before being marketed, a generic drug needs to be examined through a bioequivalence study, which is a highly regulated clinical trial used to establish that the generic can be interchanged with the reference (usually brand) product without safety or efficacy concerns.

The US Food and Drug Administration (FDA) uses two bioavailability pharmacokinetic measures, the area under the drug concentration–time curve (AUC) and the maximum drug concentration (Cmax), to determine in vivo bioequivalence in the Abbreviated New Drug Application (ANDA) process. Although individual bioequivalence, population bioequivalence [2], or scaled average bioequivalence (ABE) [3] analyses are possible, bioequivalence studies are usually designed and powered with the main goal of examining ABE. However, individual, population, or scaled bioequivalence examinations, or analyses of individual responses [4, 5], can be done as additional analyses if appropriate data are available. ABE is established when the 90 % confidence intervals of the ratio of the test (typically generic) to the reference (typically brand) product for the mean AUC (both the measured AUC to the time of the last blood sample and the calculated AUC extrapolated to infinity) and the mean Cmax fall within the 80–125 % range; this is the decision rule of the so-called two one-sided tests [6], required by the FDA. Randomized crossover designs are the most common designs for examining bioequivalence [7]. Although more complex crossover designs are possible and described in FDA guidances [7], the most common (and also the simplest) design consists of a two-period study, typically using single doses in healthy adults.
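The confidence-interval decision rule above can be sketched numerically. The following Python snippet is an illustration only, not the FDA-mandated mixed-model analysis; the function name and inputs are hypothetical. It checks whether a 90 % confidence interval for the log-scale mean difference lies within the 80–125 % limits:

```python
import math
from scipy import stats

def abe_decision(mean_diff_log, se, df, level=0.90):
    """Two one-sided tests expressed as 90 % CI containment on the log scale.

    mean_diff_log: estimated mean log test/reference ratio
    se: its standard error; df: residual degrees of freedom
    Returns (lower %, upper %, bioequivalent?) on the ratio scale.
    """
    t = stats.t.ppf(1 - (1 - level) / 2, df)
    lo = mean_diff_log - t * se
    hi = mean_diff_log + t * se
    # ABE is concluded only if the whole CI sits inside [ln 0.80, ln 1.25]
    ok = math.log(0.80) < lo and hi < math.log(1.25)
    return math.exp(lo) * 100, math.exp(hi) * 100, ok
```

For example, a log-scale estimate of 0.02 with standard error 0.05 and 30 degrees of freedom yields a CI well inside the limits, whereas shifting the estimate toward the upper limit makes the rule fail.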

There is a growing awareness of the limitations of bioequivalence studies based on only two periods, and of those based on single doses. Firstly, individual responses to formulation switches cannot be reliably assessed with statistical methods unless crossover designs based on three (ideally four) or more periods are used [3–5]. A reason for this is that only studies in which there are subjects who receive a formulation in at least two periods, and subjects who receive at least two formulations in different periods, can be used to compute a subject-by-formulation interaction variance [4]. This variance, in turn, is an essential quantity for assessing individual effects of formulations (see below). Convincing arguments have been advanced that personalized medicine research should make greater use of crossover designs with more than two periods [4], particularly in chronic diseases. Secondly, especially with formulations for chronic diseases, bioequivalence studies using single doses may not provide an accurate picture of clinical realities, since some pharmacological phenomena can be detected only after chronic administration of the formulation [8]. However, especially with highly variable drugs, the use of designs with multiple dosing at steady state is still controversial [3].

In the case of epilepsy, professional and patient support organizations around the world have expressed concerns about safety and efficacy with indiscriminate generic product substitution. These concerns are based on case reports and claims database analyses. These organizations have issued position statements stating that generic antiepileptic drug (AED) variability can be problematic for some people with epilepsy [9, 10]. Published reviews suggest that extra caution may be needed for patients at highest risk of seizure complications, such as pregnant patients, patients with recurrent status epilepticus, or patients who have been seizure-free for long periods of time and are driving. It has been argued that the total risks and benefits of generic substitution may not be fully understood [10].

Due to the level of concern and debate in the epilepsy medical community motivating resolution of the above controversy, and to the current FDA interest in revising their approaches to the approval of generic drugs, the FDA is currently supporting ongoing bioequivalence studies of AEDs, including the “Equivalence Among AED Generic and Brand Products in People With Epilepsy: Chronic-Dose 4-Period Replicate Design” and the “Equivalence Among AED Generic and Brand Products in People With Epilepsy: Single-Dose 6-Period Replicate Design”, named the EQUIGEN studies. (Summary protocols can be found at clinicaltrials.gov, identifiers NCT01713777 and NCT01733394.) The former study compares two disparate lamotrigine generics using chronic doses in a four-period, two-sequence randomized crossover design, whereas the latter examines two disparate generics simultaneously with the lamotrigine brand product using single doses in a six-period, three-sequence randomized crossover design, both studies in people with epilepsy.

During the design of the EQUIGEN studies, the researchers could not find commercial or non-commercial statistical software that quickly allowed computation of sample sizes for their designs, particularly software implementing the FDA recommendation of using random-effects linear models for the analysis of data from bioequivalence studies [7]. These models, which are also named linear mixed-effects models, are currently state-of-the-art statistical tools for the analysis of crossover studies. Therefore, making available software and tables for sample-size and power computations for these types of studies should be welcomed by the pharmacological and statistical community. Although FDA guidelines provide a table for sample-size computations for four-period crossover designs such as the design of the chronic-dose EQUIGEN study [7, page 28], this table may not be adequate for some studies because it is based on the seemingly limiting assumption that the noise-adjusted bioavailability correlation between two formulations is 0. This correlation is defined in Sect. 3.4 of the current article. Also, the table is applicable only to narrow ranges of variance parameters, particularly the within-subject variance and the subject-by-formulation interaction variance.

An objective of this article is to present tables for sample-size evaluations of ABE studies based on the two randomized crossover designs used in the EQUIGEN studies: the four-period, two-sequence, two-formulation design, termed here Design 1, and the six-period, three-sequence, three-formulation design, termed here Design 2. In particular, for these designs, tables including noise-adjusted bioavailability correlations higher than 0 are provided. Here, sample-size computations assume that random-effects linear models are used in bioequivalence analyses with crossover designs, as currently required by the FDA. The tables were made with SAS programs that compute power through Monte-Carlo simulations (available in the Electronic Supplementary Material).

Although some methodological studies regarding power computations for random-effects linear models have been published (e.g., Heo and Leon [11], Muller et al. [12], and Verbeke and Molenberghs [13]), these studies have not specifically addressed hypothesis testing for bioequivalence studies or for the particular crossover designs in this article. Some approximate formulas for the computation of power in bioequivalence studies with crossover designs have been published [14], but these formulas assume that certain non-linear mixed-effects models are used in analyses. At present, however, these non-linear models are not usually used in bioequivalence studies. Nonetheless, it is generally accepted that the most straightforward approach to examining the power of studies with complex repeated-measures designs is to perform Monte-Carlo simulations, since the exact probability distributions of the statistics used in these studies are not usually known under the alternative hypotheses [7, 13]. This is particularly true in the case of bioequivalence studies. Thus, Monte-Carlo simulations are the approach that is followed in this article.

Unfortunately, random-effects linear models have been traditionally viewed by many pharmacologists and clinical researchers as mere mathematical devices for analyzing data from studies producing repeated measures. In contrast, a modern view attributes to these models an important mathematical role in theoretical formulations of personalized medicine, because they contain not only parameters that represent average patients but also parameters that represent individual patients [4, 15–18]. Moreover, the notation and language of random-effects linear models have evolved over the years. Thus, another goal of this article is to provide a presentation of the statistical modeling of data from bioequivalence studies that highlights the modern view of these models, with special emphasis on power analyses and sample-size computations. This description will also provide the notation, terminology, and interpretations that are needed to appropriately use the sample-size tables.

This article focuses on the ideas and concepts that pharmacologists, clinicians, and statisticians need to understand the utilization of random-effects linear models in the context of bioequivalence studies and to use the provided sample-size tables. This article does not provide a comprehensive introduction to the analysis of data with random-effects linear models as general tools for the analysis of repeated-measures data. Also, methods for estimating model parameters are outside the scope of this article. Examples of comprehensive treatments of random-effects linear models can be found elsewhere [13, 19, 20].

2 Crossover Study Designs

This section describes the two crossover designs for which sample-size tables are provided in this article and for which the statistical models are detailed. The designs are described in the context of the EQUIGEN studies in this section, although, of course, the designs can be used to investigate bioequivalence in many other types of drugs, not only AEDs, and both designs can be used in chronic- or single-dose studies. Some characteristics of the first design, Design 1, are described in FDA guidances [7]. Design 2 is a novel design used in the single-dose EQUIGEN study.

2.1 Four-Period, Two-Sequence, Two-Formulation Crossover Design (Design 1)

The EQUIGEN-Chronic Dose Study is a prospective, multicenter, blinded, crossover, sequence-randomized, two-sequence, four-period chronic dosing pharmacokinetic trial in people with epilepsy on concomitant AEDs. Each subject is administered each of two disparate lamotrigine products twice, giving a total of four different study periods. Each subject is randomized to one of the two following sequences, where G1 and G2 represent the two generics:
$$ \text{Sequence 1: G1-G2-G1-G2} $$
$$ \text{Sequence 2: G2-G1-G2-G1} $$

Since one of our goals is to describe the statistical model for this design in a more general setting than that of the EQUIGEN studies, we refer to the two generics as the test and reference formulation, following standard terminology of bioequivalence studies. Moreover, the statistical model for this design, which is presented in Sect. 3, is applicable to single-dose, not only to chronic-dose, designs, provided each subject is randomized to one of the two formulation sequences.

2.2 Six-Period, Three-Sequence, Three-Formulation Crossover Design (Design 2)

The EQUIGEN-Single Dose Study is a prospective, multicenter, blinded, crossover, sequence-randomized, three-sequence, six-period single-dose pharmacokinetic trial in people with epilepsy on concomitant AEDs. Each subject is administered each of three products of lamotrigine (the brand and its two most disparate generics) twice, giving a total of six different study periods. Each subject is randomized to one of the following three sequences, where G1 and G2 represent the two generics and B represents the brand:
$$ \text{Sequence 1: G1-G2-B-G1-B-G2} $$
$$ \text{Sequence 2: G2-B-G1-G2-G1-B} $$
$$ \text{Sequence 3: B-G1-G2-B-G2-G1} $$
Advantages of the sequence order of Design 2 include:

  1. Each period has all three products represented, which allows balancing period effects within product effects, if any, and therefore separating period from formulation effects in statistical analyses.
  2. No product repeats in consecutive periods for any of the three sequences.
  3. All three products are tested once in each sequence during the first three periods, providing for usable data if a subject withdraws from the study after the third period. These data can still be used in brand–generic and generic-to-generic comparisons.
  4. Replicate administrations of the three products provide full information for performing individual bioequivalence, scaled bioequivalence, and outlier analyses [5].
  5. Statistical data analysis may be performed by fitting only one random-effects linear model that simultaneously provides all means and variances needed to perform average, individual, and scaled bioequivalence analyses, as well as outlier analyses [5]; this guarantees a greater statistical efficiency than conducting separate two-sequence, four-period crossover studies of comparable sample sizes.
  6. Other possible three-sequence alternative designs can be defined that meet the above criteria, but these designs are statistically equivalent to Design 2.
The statistical model for Design 2 is described in Sect. 4. In that section, the three formulations examined under Design 2 are called formulations 0, 1, and 2. Although Design 2 is used by the EQUIGEN studies with a single lamotrigine dose per period, the described model and sample-size computations are applicable to chronic-dose plans and other drugs as well.

3 Model of Bioavailability for Subjects Under Design 1

In this section the statistical model for Design 1 is described. Let \( Y \) be a measure of bioavailability from a particular subject, such as AUC or Cmax, which is obtained at each of the four periods. The measure \( Y \) can be obtained through compartmental or non-compartmental approaches [22]. Usually, the natural logarithm, \( { \ln }(Y) \), of this measure is used for modeling purposes, since log-transformations usually produce pharmacokinetic data that satisfy the usual assumptions of statistical linear models, an observation that has been reported by many authors [17, 21, 22]. Let \( T \) be an indicator covariate defined as 1 if the test formulation is administered to the subject and 0 if the reference formulation is administered. Let \( \varvec{X} \) be a vector of covariates, which usually contains indicators representing the period and indicators representing the sequence. Other covariates such as demographics, concomitant medications, genotypes, or indicators representing study sites in a multicenter study may be included in \( \varvec{X} \) for secondary analyses. For a particular subject, it is assumed that the relationship between a particular measure \( Y \) from the subject, the formulation \( T \) that produced the measure, and the covariate vector \( \varvec{X} \) is given by Eq. 1:
$$ \ln (Y) = \alpha + \beta T +\varvec{\gamma}^{T} \varvec{X} + \epsilon , $$
(1)
where \( \alpha \) and \( \beta \) are characteristic constants of the subject that do not change during the study, and \( \varvec{\gamma} \) is a vector of regression coefficients that has the same value for all subjects. Since several \( Y \) measures are taken from the same subject across study periods, the value of \( \epsilon \) is assumed to vary from measure to measure \( Y \) from the subject, and thus \( \epsilon \) is viewed as an intra-subject random error assumed to be \( {\text{Normal}}(0,\sigma_\epsilon^2) \).

Although the values of \( \alpha \) and \( \beta \) are unique for a particular subject, these values are assumed to vary from subject to subject in the sense that, at the subject population level, \( \alpha \) and \( \beta \) are considered normally distributed random variables: \( \alpha \sim {\text{Normal}}(\mu_{\alpha } ,\sigma_{\alpha }^{2} ) \) and \( \beta \sim {\text{Normal}}(\mu_{\beta } , \sigma_{\beta }^{2} ) \). For this reason, \( \alpha \) and \( \beta \) are called random coefficients or ‘random effects’. Importantly, the variabilities of \( \alpha \) and \( \beta \) are considered to reflect real variation in the biological and environmental factors that shape a person as an individual [16]. That is, under this view of random-effects linear models, these variabilities are not mere mathematical artifacts for handling a population's heterogeneity [16]. Also, in general, \( \alpha \) and \( \beta \) may not be independent random variables; that is, \( \alpha \) and \( \beta \) may be correlated. An additional, usual assumption is that both \( \alpha \) and \( \beta \) are independent of \( \epsilon \).

The model in Eq. 1 implicitly assumes that the intra-subject variance \( \sigma_{\epsilon}^{2} \) is the same for bioavailability measures taken under the test and measures taken under the reference formulation. The FDA recommends following this assumption for power computations, but requires inclusion of the possibility of different variances for the test and reference formulations in actual data analyses [7]. The presentation in this article focuses on the model that assumes equal variances, but the described concepts are also applicable to the more general model incorporating different variances.

Suppose that, after administering the test formulation to a particular subject, the subject’s bioavailability response is \( Y_{1} \) and, after administering the reference formulation, his/her response is \( Y_{0} \). Then, we can write the two equations (Eq. 2):
$$ \ln (Y_{1} ) = \delta +\varvec{\gamma}^{T} \varvec{X} + \epsilon\quad {\text{and}}\quad\ln (Y_{0} ) = \alpha +\varvec{\gamma}^{T} \varvec{X} + \epsilon , $$
(2)
where \( \delta = \alpha + \beta \), because \( T = 1 \) if the subject received the test and \( T = 0 \) if the subject received the reference formulation. At the subject population level, \( {\text{Var}}(\delta ) \) and \( \sigma_{\alpha }^{2} = {\text{Var}}(\alpha ) \) are considered measures of between-subject variability under the test and reference formulation, respectively. We follow the FDA guidelines that suggest assuming \( {\text{Var(}}\delta )= \sigma_{\alpha }^{2} \) for power computations (although this assumption does not need to be made in statistical analyses) [7].

Denote \( \sigma^{2} = {\text{Var}}\left( \delta \right) = \sigma_{\alpha }^{2} \) and call \( \sigma^{2} \) the between-subjects variability. Whereas \( \sigma^{2} \) is interpreted in the context of the subject population, the variance \( \sigma_\epsilon^2 = {\rm Var}(\epsilon) \) is interpreted only in the context of a single subject and is therefore called ‘within-subject variability’.

As mentioned previously, the variabilities of the random coefficients \( \alpha \) and \( \beta \) are considered to be the result of real variation in biological and environmental factors, and are not just a mathematical trick to handle the variability of subjects’ pharmacological response. Thus, the variability in bioavailability across subjects that is caused by genetic variation must be incorporated in \( \sigma_{\alpha }^{2} \) and \( \sigma_{\beta }^{2} \) [17], and \( {\text{Var}}(\delta ) \) and \( \sigma_{\alpha }^{2} = {\text{Var}}(\alpha ) \) can be considered upper bounds of genetic variability [4]. In contrast, the variability \( \sigma_{\epsilon }^{2} \) of the random error \( \epsilon \) is considered within-subject variability that is caused by momentary (inter-occasion) changes in uncontrolled environmental and biological conditions that may affect drug bioavailability of particular subjects, or caused by the unavoidable imperfections of the laboratory procedures used to measure \( Y \); therefore, \( \sigma_{\epsilon }^{2} \) is not reflective of genetic variation or of overall or constant environmental factors.
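The variance components just described can be illustrated by simulating Eq. 1 directly. In this sketch all parameter values are hypothetical; each simulated subject contributes one measurement under each formulation, so the within-subject log-ratio has variance \( \sigma_{\beta}^{2} + 2\sigma_{\epsilon}^{2} \):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population parameters (illustrative only)
mu_a, mu_b = 5.0, 0.05             # mean log-bioavailability; mean formulation effect
s_a, s_b, s_e = 0.40, 0.15, 0.34   # sigma_alpha, sigma_beta, sigma_epsilon
n = 10_000                         # simulated subjects

alpha = rng.normal(mu_a, s_a, n)   # per-subject random intercepts
beta = rng.normal(mu_b, s_b, n)    # per-subject formulation effects

# One test (T = 1) and one reference (T = 0) measurement per subject (Eq. 1)
ln_y1 = alpha + beta + rng.normal(0, s_e, n)
ln_y0 = alpha + rng.normal(0, s_e, n)

# The log-ratio removes alpha: its mean estimates mu_beta and its
# variance estimates sigma_beta**2 + 2 * sigma_epsilon**2
d = ln_y1 - ln_y0
```

The between-subject variance \( \sigma_{\alpha}^{2} \) cancels in the log-ratio, which is why replicated crossover designs can separate the subject-by-formulation interaction variance from within-subject noise.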

3.1 Test Formulation Effect Size

The parameter \( \beta \) in Eq. 1 can be given an interesting interpretation by using the concept of relative percentiles [15, 17, 23]. Let \( 0 < p < 1 \). For a particular subject, let \( y_{1} (p) \) and \( y_{0} (p) \) be the \( p \times 100\,\% \) percentiles of the bioavailability response \( Y \) of the subject under the test and reference formulations, respectively. Since the two equations in Eq. 2 correspond to the same subject, it can be shown that the following equation is applicable to a particular subject (Eq. 3):
$$ \frac{{y_{1} \left( p \right)}}{{y_{0} \left( p \right)}} = \exp (\beta ) \quad{\text{for}}\, {\text{all}}\, p. $$
(3)
As a consequence, the quantity \( { \exp }(\beta ) \) can be viewed as a measure of relative bioavailability within the subject that compares the bioavailabilities of the test and reference formulations occurring in the subject. Moreover, the quantity given by Eq. 4:
$$ E = (\exp (\beta ) - 1) \times 100, $$
(4)
measures the percent change in the bioavailability of the drug in the subject when the subject’s prescription is changed from the reference to the test formulation.

For a particular subject, the quantity \( E \) can be considered a measure of the size of the effect of the test formulation on the subject’s bioavailability response \( Y \), relative to the reference formulation [17, 2427]. Therefore, \( E \) allows assessment of the pharmacological importance of the difference in bioavailabilities between the two formulations, if there is a difference. As an illustration, if, for a particular patient, \( E = 15\,\% \), then we can infer that the bioavailability \( Y \) of the test formulation is 15 % higher than that of the reference formulation in the patient; and if \( E = - 15\,\% \), then the former bioavailability is 15 % lower than the latter.
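As a quick numeric check of Eq. 4, the following hypothetical helper (not part of the EQUIGEN analysis code) maps a subject's \( \beta \) to the effect size \( E \):

```python
import math

def effect_size_pct(beta):
    """Percent change in bioavailability when the prescription is
    switched from the reference to the test formulation (Eq. 4)."""
    return (math.exp(beta) - 1) * 100
```

A subject with \( \beta = \ln(1.15) \) has \( E = 15\,\% \) (test 15 % higher), and \( \beta = \ln(0.85) \) gives \( E = -15\,\% \), matching the illustration in the text.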

3.2 Average Bioequivalence

In an analysis of ABE, only the quantity \( { \exp }(\mu_{\beta } ) \), that is, only the measure of relative bioavailability for ‘the average subject’, is examined. Here, the average subject is a subject for whom \( \beta = \mu_{\beta } \) and, therefore, for whom the test formulation effect size is given by Eq. 5:
$$ E^{*} = (\exp (\mu_{\beta } ) - 1) \times 100. $$
(5)
Ideally, \( \exp (\mu_{\beta } ) = 1 \), meaning that the bioavailabilities of the test and reference formulations in the average subject are exactly the same. However, such perfect agreement in the manufacture of drugs is not attainable, something that regulatory agencies acknowledge by allowing a departure from this ideal situation. Thus, the test formulation is accepted to be bioequivalent to the reference formulation if there is evidence supporting the inequalities (Eq. 6):
$$ q^{ - 1} < \exp (\mu_{\beta } ) < q, $$
(6)
where \( q \) is a pre-specified number defined by the regulatory agency, \( q > 1 \). The reason for using the apparently asymmetric interval \( (q^{ - 1} ,\, q) \) is that it is immaterial which percentile, \( y_{1} \left( p \right) \) or \( y_{0} \left( p \right) \), goes in the numerator of \( y_{1} (p)/y_{0} (p) \) in Eq. 3. Current FDA guidelines [7] require \( q = 1.25 \) to be used, meaning that, in the average subject, the condition \( - 20\,\% \le E \le 25\,\% \) should be satisfied. That is, the bioavailability of the test formulation should not be more than 25 % larger than that of the reference formulation or more than 20 % lower, in the average subject.
In practice, the null hypothesis of non-bioequivalence (Eq. 7)
$$ H_{0} :\exp (\mu_{\beta } ) \le q^{ - 1}\quad {\text{or }}\exp (\mu_{\beta } ) \ge q $$
(7)
is rejected in favor of the alternative hypothesis of bioequivalence (Eq. 8)
$$ H_{1} : q^{ - 1} < \exp (\mu_{\beta } ) < q $$
(8)
if a confidence interval for \( \mu_{\beta } \) (usually of 90 % confidence) is contained in the interval \( [\ln (q^{ - 1} ),\ln (q)] \) [6].

The power of this statistical test depends on the actual value of \( E^{*} \). Other things being equal, for a fixed study sample size, the closer the ‘true’ value of \( E^{*} \) is to zero, the easier it is to reject the null hypothesis of non-bioequivalence, and therefore the more powerful the test. For this reason, the regulatory agency requires that, for a proposed ABE study sample size, the power of an ABE test be reported for a value of \( E^{*} \) not smaller than a prespecified number, denoted here as \( E_{\text{m}}^{*} \) (5 % per the FDA guidelines [7]). Using the statistical jargon of power analyses, \( E_{\text{m}}^{*} \) can be named the ‘maximum detectable effect size’ because, unless the proposed sample size is increased, the existence of an effect size larger than \( E_{\text{m}}^{*} \) may render a power lower than that reported in the study protocol.
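The Monte-Carlo logic behind such power computations can be sketched as follows. This Python version is a simplified illustration, not the article's SAS programs: it collapses Design 1 to per-subject log-ratio contrasts (whose variance is \( \sigma_{\beta}^{2} + \sigma_{\epsilon}^{2} \) when each formulation is replicated twice) instead of fitting the full random-effects model, so its estimates should be expected to differ somewhat from the tables:

```python
import numpy as np
from scipy import stats

def abe_power(n, mu_beta, s_beta, s_eps, n_sims=2000, seed=0):
    """Monte-Carlo power of the two one-sided tests for a simplified
    version of Design 1 (per-subject contrasts, not the full model)."""
    rng = np.random.default_rng(seed)
    lo_lim, hi_lim = np.log(0.80), np.log(1.25)
    t = stats.t.ppf(0.95, n - 1)                 # for a 90 % CI
    sd_d = np.sqrt(s_beta**2 + s_eps**2)         # SD of a per-subject log-ratio
    hits = 0
    for _ in range(n_sims):
        d = rng.normal(mu_beta, sd_d, n)         # simulated log-ratios
        m = d.mean()
        se = d.std(ddof=1) / np.sqrt(n)
        if lo_lim < m - t * se and m + t * se < hi_lim:
            hits += 1                            # CI inside the limits: ABE concluded
    return hits / n_sims
```

For a power analysis at the maximum detectable effect size, one would call, e.g., `abe_power(28, np.log(1.05), 0.15, 0.34)` and increase `n` until the estimated power reaches the target.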

In this article, SAS® PROC IML and SAS® PROC MIXED were used to write the program for computing power for this hypothesis test. SAS® PROC MIXED was used in the way recommended by FDA guidelines [7, page 34]. Although the SAS® package was used to perform bioequivalence analyses, a number of other reliable commercial statistical packages such as STATA® or SPSS® are quite capable of fitting the model described by Eq. 1 and computing the above confidence interval.

3.3 Subject-by-Formulation Interaction

The model expressed in Eq. 1 implies that the effect size \( E \) varies from subject to subject, since \( \beta \) varies from subject to subject. That is, if \( \sigma_{\beta }^{2} > 0, \) there is an interaction between subjects and test formulation in the informal sense that a particular subject may ‘modify’ the effect size of the test formulation on the bioavailability \( Y \). The quantity \( \sigma_{\beta }^{2} \), which is usually called ‘subject-by-formulation interaction variance’, measures the extent to which subjects may modify this effect size. Statisticians usually think of an interaction between two variables as a process in which one variable modifies the effect of the other variable on a dependent variable; consistent with this conception of ‘interaction’, subjects are viewed here as a variable affecting the difference between the bioavailabilities of two formulations. A power computation for an ABE study requires a careful consideration of the value of this variance, since a value for \( \sigma_{\beta }^{2} \) needs to be used as an input for the computation.

3.4 Noise-Adjusted Bioavailability Correlation

Let \( \rho \) denote the correlation between \( \delta \) and \( \alpha \) across subjects, where \( \delta \) and \( \alpha \) are given in Eq. 2. Note that \( \rho \) is a characteristic of the subject population, not of a particular subject. By Eq. 2, in which \( \varvec{\gamma} \) is assumed to be a population constant, \( \rho \) can be interpreted as the correlation between the bioavailabilities of the two formulations, after controlling for intra-subject variation of bioavailability measures.

In general, other things being equal, subjects exhibiting a high bioavailability for the test formulation can be expected to show a high bioavailability for the reference formulation, relative to average bioavailabilities. Thus, it is reasonable to assume that \( \rho \) is positive for power computations. Sample-size tables provided by FDA guidelines seem to have been computed under the particular assumption that \( \rho = 0 \) [7]. Tables 1, 2, 3, and 4 of this article present sample sizes computed under other values of \( \rho \). Interestingly, as can be seen in Tables 1, 2, 3, and 4, the higher the value of \( \rho , \) the higher the sample size required to achieve a particular power in an ABE study. This means that assuming \( \rho = 0 \), as the FDA guidance [7] does, may overestimate the power of an ABE study, resulting in an artificially low sample size.
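In Monte-Carlo simulations, \( \rho \) enters through the joint distribution of \( \delta \) and \( \alpha \). One way to map the tables' parameters \( (\sigma_{\beta}, \rho) \) to a bivariate normal draw, assuming \( {\text{Var}}(\delta) = {\text{Var}}(\alpha) \) as in the power computations, is sketched below (the function name and zero means are hypothetical):

```python
import numpy as np

def draw_random_effects(n, s_beta, rho, rng):
    """Draw (alpha, delta) with equal variances and correlation rho
    such that beta = delta - alpha has standard deviation s_beta.

    With Var(delta) = Var(alpha) = s2 and Corr(delta, alpha) = rho,
    Var(beta) = 2 * s2 * (1 - rho), so fixing s_beta and rho < 1
    determines s2 = s_beta**2 / (2 * (1 - rho)).
    """
    s2 = s_beta**2 / (2 * (1 - rho))
    cov = np.array([[s2, rho * s2],
                    [rho * s2, s2]])
    alpha, delta = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    return alpha, delta
```

Note that with \( \sigma_{\beta} \) held fixed, a larger \( \rho \) forces a larger common variance \( \sigma^{2} \) in this parameterization, which is one way to see why assuming \( \rho = 0 \) can be optimistic.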
Table 1

Design 1, power ≥80 %

σ_β    ρ     σ_ε = 0.20   0.34   0.43   0.51   0.58   0.64
0.01   0.0         12     28     44     60     76     92
       0.1         12     28     44     60     78     92
       0.2         12     28     44     60     78     92
       0.3         12     28     44     60     78     92
       0.4         12     28     44     60     78     92
0.1    0.0         14     30     46     62     78     94
       0.1         14     30     46     62     78     94
       0.2         14     30     46     62     78     94
       0.3         14     30     46     62     78     94
       0.4         14     30     46     62     78     94
0.15   0.0         16     32     48     64     80     96
       0.1         16     32     48     64     80     96
       0.2         18     32     48     64     80     96
       0.3         18     34     50     66     82     96
       0.4         20     34     50     66     82     98
0.2    0.0         20     34     50     66     82     98
       0.1         20     36     50     68     84     98
       0.2         22     38     54     68     84    100
       0.3         22     40     54     70     88    102
       0.4         26     40     56     72     88    104
0.25   0.0         24     40     54     72     86    104
       0.1         26     42     56     72     86    104
       0.2         28     44     58     74     90    108
       0.3         30     46     60     78     90    110
       0.4         32     48     62     80     96    110
0.3    0.0         30     46     60     76     94    110
       0.1         32     48     64     78     94    110
       0.2         34     50     64     80     96    114
       0.3         38     54     68     86    100    114
       0.4         42     58     74     90    106    120

Smallest total sample sizes \( N \) producing a power ≥80 % for an average bioequivalence study using Design 1, and for a maximum detectable effect size of \( E_{\text{m}}^{*} = 5\,\% \). Described sample sizes are the sum of the required sample sizes for Sequences 1 and 2

\( \sigma_{\beta } \), subject-by-formulation interaction standard deviation; \( \rho \), noise-adjusted bioavailability correlation; \( \sigma_{\epsilon } \), within-subject standard deviation

Table 2

Design 1, power ≥90 %

| \( \sigma_{\beta } \) | \( \rho \) | \( \sigma_{\epsilon } \) = 0.20 | 0.34 | 0.43 | 0.51 | 0.58 | 0.64 |
|------|-----|----|----|----|-----|-----|-----|
| 0.01 | 0.0 | 14 | 36 | 56 | 78 | 98 | 122 |
| 0.01 | 0.1 | 14 | 36 | 58 | 78 | 98 | 124 |
| 0.01 | 0.2 | 14 | 36 | 58 | 80 | 100 | 124 |
| 0.01 | 0.3 | 14 | 36 | 58 | 80 | 100 | 124 |
| 0.01 | 0.4 | 14 | 36 | 58 | 80 | 102 | 124 |
| 0.1  | 0.0 | 18 | 38 | 60 | 82 | 102 | 124 |
| 0.1  | 0.1 | 18 | 40 | 60 | 82 | 104 | 126 |
| 0.1  | 0.2 | 18 | 40 | 60 | 82 | 104 | 128 |
| 0.1  | 0.3 | 18 | 40 | 62 | 82 | 104 | 128 |
| 0.1  | 0.4 | 18 | 40 | 62 | 84 | 104 | 128 |
| 0.15 | 0.0 | 20 | 42 | 62 | 84 | 106 | 126 |
| 0.15 | 0.1 | 22 | 42 | 66 | 86 | 108 | 128 |
| 0.15 | 0.2 | 22 | 44 | 66 | 86 | 108 | 128 |
| 0.15 | 0.3 | 24 | 46 | 66 | 86 | 108 | 130 |
| 0.15 | 0.4 | 24 | 46 | 68 | 88 | 108 | 132 |
| 0.2  | 0.0 | 24 | 46 | 68 | 88 | 110 | 132 |
| 0.2  | 0.1 | 26 | 48 | 68 | 90 | 110 | 134 |
| 0.2  | 0.2 | 28 | 48 | 70 | 90 | 112 | 136 |
| 0.2  | 0.3 | 30 | 50 | 72 | 94 | 114 | 136 |
| 0.2  | 0.4 | 34 | 52 | 74 | 96 | 116 | 138 |
| 0.25 | 0.0 | 32 | 52 | 74 | 96 | 118 | 140 |
| 0.25 | 0.1 | 34 | 56 | 76 | 96 | 120 | 142 |
| 0.25 | 0.2 | 36 | 58 | 78 | 100 | 120 | 144 |
| 0.25 | 0.3 | 40 | 58 | 80 | 102 | 120 | 144 |
| 0.25 | 0.4 | 44 | 64 | 86 | 106 | 128 | 146 |
| 0.3  | 0.0 | 40 | 60 | 80 | 104 | 126 | 148 |
| 0.3  | 0.1 | 42 | 64 | 84 | 106 | 128 | 148 |
| 0.3  | 0.2 | 46 | 66 | 90 | 108 | 130 | 152 |
| 0.3  | 0.3 | 52 | 72 | 94 | 112 | 136 | 156 |
| 0.3  | 0.4 | 56 | 76 | 98 | 118 | 144 | 162 |

Smallest total sample sizes \( N \) producing a power ≥90 % for an average bioequivalence study using Design 1, and for a maximum detectable effect size of \( E_{\text{m}}^{*} = 5\,\% \). Described sample sizes are the sum of the required sample sizes for Sequences 1 and 2

\( \sigma_{\beta } \), subject-by-formulation interaction standard deviation; \( \rho \), noise-adjusted bioavailability correlation; \( \sigma_{\epsilon } \), within-subject standard deviation

Table 3

Design 2, power ≥80 %

| \( \sigma_{\beta } \) | \( \rho \) | \( \sigma_{\epsilon } \) = 0.20 | 0.34 | 0.43 | 0.51 | 0.58 | 0.64 |
|------|-----|----|----|----|----|----|-----|
| 0.01 | 0.0 | 12 | 27 | 42 | 60 | 75 | 93 |
| 0.01 | 0.1 | 12 | 30 | 42 | 60 | 75 | 93 |
| 0.01 | 0.2 | 12 | 30 | 45 | 60 | 75 | 93 |
| 0.01 | 0.3 | 12 | 30 | 45 | 60 | 75 | 93 |
| 0.01 | 0.4 | 12 | 30 | 45 | 60 | 75 | 93 |
| 0.1  | 0.0 | 15 | 30 | 48 | 60 | 78 | 93 |
| 0.1  | 0.1 | 15 | 30 | 48 | 63 | 78 | 93 |
| 0.1  | 0.2 | 15 | 30 | 48 | 63 | 78 | 93 |
| 0.1  | 0.3 | 15 | 30 | 48 | 63 | 78 | 93 |
| 0.1  | 0.4 | 15 | 33 | 48 | 63 | 78 | 96 |
| 0.15 | 0.0 | 15 | 33 | 48 | 63 | 81 | 96 |
| 0.15 | 0.1 | 18 | 33 | 48 | 66 | 81 | 96 |
| 0.15 | 0.2 | 18 | 33 | 51 | 66 | 81 | 99 |
| 0.15 | 0.3 | 18 | 36 | 51 | 66 | 81 | 99 |
| 0.15 | 0.4 | 18 | 36 | 51 | 66 | 81 | 99 |
| 0.2  | 0.0 | 21 | 36 | 51 | 66 | 84 | 99 |
| 0.2  | 0.1 | 21 | 36 | 54 | 66 | 84 | 99 |
| 0.2  | 0.2 | 21 | 39 | 54 | 69 | 84 | 99 |
| 0.2  | 0.3 | 24 | 39 | 54 | 69 | 87 | 99 |
| 0.2  | 0.4 | 24 | 42 | 57 | 75 | 90 | 105 |
| 0.25 | 0.0 | 24 | 42 | 54 | 72 | 87 | 102 |
| 0.25 | 0.1 | 24 | 42 | 57 | 75 | 90 | 102 |
| 0.25 | 0.2 | 27 | 42 | 60 | 75 | 93 | 105 |
| 0.25 | 0.3 | 30 | 45 | 60 | 78 | 93 | 108 |
| 0.25 | 0.4 | 33 | 48 | 63 | 78 | 96 | 108 |
| 0.3  | 0.0 | 30 | 45 | 60 | 78 | 90 | 111 |
| 0.3  | 0.1 | 30 | 45 | 63 | 81 | 93 | 111 |
| 0.3  | 0.2 | 33 | 51 | 66 | 81 | 96 | 111 |
| 0.3  | 0.3 | 36 | 54 | 69 | 84 | 99 | 114 |
| 0.3  | 0.4 | 42 | 57 | 69 | 87 | 99 | 117 |

Smallest total sample sizes \( N \) producing a power ≥80 % for an average bioequivalence study using Design 2, and for a maximum detectable effect size of \( E_{\text{m}}^{*} = 5\,\% \). Described sample sizes are the sum of the required sample sizes for Sequences 1, 2, and 3

\( \sigma_{\beta } \), subject-by-formulation interaction standard deviation; \( \rho \), noise-adjusted bioavailability correlation; \( \sigma_{\epsilon } \), within-subject standard deviation

Table 4

Design 2, power ≥90 %

| \( \sigma_{\beta } \) | \( \rho \) | \( \sigma_{\epsilon } \) = 0.20 | 0.34 | 0.43 | 0.51 | 0.58 | 0.64 |
|------|-----|----|----|----|-----|-----|-----|
| 0.01 | 0.0 | 15 | 36 | 57 | 78 | 99 | 123 |
| 0.01 | 0.1 | 15 | 36 | 57 | 78 | 99 | 123 |
| 0.01 | 0.2 | 15 | 36 | 57 | 81 | 99 | 123 |
| 0.01 | 0.3 | 15 | 36 | 57 | 81 | 102 | 123 |
| 0.01 | 0.4 | 15 | 36 | 57 | 81 | 102 | 126 |
| 0.1  | 0.0 | 18 | 39 | 63 | 81 | 102 | 126 |
| 0.1  | 0.1 | 18 | 39 | 63 | 84 | 102 | 126 |
| 0.1  | 0.2 | 18 | 39 | 63 | 84 | 102 | 126 |
| 0.1  | 0.3 | 18 | 39 | 63 | 84 | 105 | 126 |
| 0.1  | 0.4 | 18 | 39 | 63 | 84 | 105 | 126 |
| 0.15 | 0.0 | 21 | 42 | 63 | 84 | 108 | 126 |
| 0.15 | 0.1 | 24 | 42 | 66 | 84 | 108 | 126 |
| 0.15 | 0.2 | 24 | 45 | 66 | 87 | 108 | 129 |
| 0.15 | 0.3 | 24 | 45 | 66 | 87 | 108 | 129 |
| 0.15 | 0.4 | 24 | 48 | 69 | 90 | 111 | 132 |
| 0.2  | 0.0 | 24 | 48 | 69 | 90 | 111 | 132 |
| 0.2  | 0.1 | 27 | 51 | 69 | 93 | 114 | 132 |
| 0.2  | 0.2 | 27 | 51 | 72 | 93 | 114 | 135 |
| 0.2  | 0.3 | 33 | 51 | 75 | 93 | 114 | 135 |
| 0.2  | 0.4 | 33 | 54 | 75 | 96 | 120 | 135 |
| 0.25 | 0.0 | 33 | 54 | 75 | 93 | 114 | 138 |
| 0.25 | 0.1 | 33 | 54 | 78 | 99 | 117 | 141 |
| 0.25 | 0.2 | 36 | 57 | 81 | 99 | 117 | 141 |
| 0.25 | 0.3 | 39 | 57 | 81 | 105 | 123 | 144 |
| 0.25 | 0.4 | 42 | 63 | 84 | 108 | 129 | 150 |
| 0.3  | 0.0 | 39 | 60 | 81 | 102 | 123 | 147 |
| 0.3  | 0.1 | 42 | 63 | 84 | 108 | 126 | 150 |
| 0.3  | 0.2 | 45 | 66 | 87 | 108 | 126 | 153 |
| 0.3  | 0.3 | 48 | 72 | 90 | 111 | 135 | 156 |
| 0.3  | 0.4 | 54 | 75 | 99 | 117 | 141 | 162 |

Smallest total sample sizes \( N \) producing a power ≥90 % for an average bioequivalence study using Design 2, and for a maximum detectable effect size of \( E_{\text{m}}^{*} = 5\,\% \). Described sample sizes are the sum of the required sample sizes for Sequences 1, 2, and 3

\( \sigma_{\beta } \), subject-by-formulation interaction standard deviation; \( \rho \), noise-adjusted bioavailability correlation; \( \sigma_{\epsilon } \), within-subject standard deviation

4 Model of Bioavailability for Subjects Under Design 2

In Design 2, three formulations, denoted formulations 0, 1, and 2, are compared, and three pairwise comparisons are possible (1 vs. 0, 2 vs. 0, and 1 vs. 2). Assuming that we want to compare formulation 1 versus 0 (or 2 vs. 0), the model for a particular subject and a particular measure \( Y \) from the subject is formulated as follows (Eq. 9):
$$ \ln (Y) = \alpha + \beta_{1} T_{1} + \beta_{2} T_{2} + {\varvec{\gamma}}^{T} X + \epsilon , $$
(9)
where \( T_{1} = 1 \) if \( Y \) is the subject’s measured bioavailability under formulation 1, and \( T_{1} = 0 \) otherwise; and \( T_{2} = 1 \) if \( Y \) is the bioavailability under formulation 2, and \( T_{2} = 0 \) otherwise. In this model, each subject has his/her own, unique values of \( \alpha , \)\( \beta_{1} \), and \( \beta_{2} \), although these values vary from subject to subject following the probability distributions given by Eq. 10:
$$ \alpha \sim {\text{Normal(}}\mu_{\alpha } , \sigma_{\alpha }^{2} ),\beta_{1} \sim {\text{Normal(}}\mu_{{\beta_{1} }} , \sigma_{{\beta_{1} }}^{2} )\,{\text{and }}\beta_{2} \sim {\text{Normal(}}\mu_{{\beta_{2} }} , \sigma_{{\beta_{2} }}^{2} ) . $$
(10)
Also, \( \alpha \), \( \beta_{1} \), and \( \beta_{2} \) may not be independent of each other, although they are considered to be independent of \( \epsilon \). An analogous model equation can be written if a comparison between formulations 1 and 2 is needed.

Analogously to the model in Eq. 1, \( \sigma_{10}^{2} = \sigma_{{\beta_{1} }}^{2} \), \( \sigma_{20}^{2} = \sigma_{{\beta_{2} }}^{2} \), and \( \sigma_{12}^{2} = {\text{Var(}}\beta_{1} - \beta_{2} ) \) are interpreted as subject-by-formulation interaction variances; \( \sigma_{\epsilon }^{2} \) is interpreted as a within-subject variance; the noise-adjusted correlation between the bioavailabilities of formulations 1 and 0 is defined as \( \rho_{10} = {\text{Corr}}(\alpha , \alpha + \beta_{1} ) \), between 2 and 0 as \( \rho_{20} = {\text{Corr}}(\alpha , \alpha + \beta_{2} ) \), and between 1 and 2 as \( \rho_{12} = {\text{Corr}}(\alpha + \beta_{1} , \alpha + \beta_{2} ) \); and \( \varvec{\gamma} \) is usually considered a vector of subject population constants. Between-subject variances of bioavailability measures under formulations 0, 1, and 2 are defined as \( \sigma_{\alpha }^{2} \), \( {\text{Var}}(\alpha + \beta_{1} ) \), and \( {\text{Var}}(\alpha + \beta_{2} ) \), respectively.

For practical reasons, in power computations it is assumed that there is a single between-subject variance \( \sigma^{2} \) across the three formulations, that is (Eq. 11):
$$ \sigma^{2} = {\text{Var}}(\alpha + \beta_{1} ) = {\text{Var}}(\alpha + \beta_{2} ) = \sigma_{\alpha }^{2} ; $$
(11)
a single subject-by-formulation interaction variance, denoted here as \( \sigma_{\beta }^{2} \), that is (Eq. 12):
$$ \sigma_{\beta }^{2} = \sigma_{10}^{2} = \sigma_{20}^{2} = \sigma_{12}^{2} ; $$
(12)
and a single noise-adjusted correlation \( \rho , \) that is (Eq. 13):
$$ \rho = \rho_{10} = \rho_{20} = \rho_{12} . $$
(13)
These assumptions, however, do not need to be made, and are not usually made, when fitting the model in Eq. 9 and testing bioequivalence.
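Under the simplifying assumptions of Eqs. 11–13, together with the relation \( \sigma^{2} = \sigma_{\beta }^{2} /(2(1 - \rho )) \) proved in the Appendix (Eq. 15), the covariance matrix of the random effects \( (\alpha , \beta_{1} , \beta_{2} ) \) is fully determined by \( \sigma_{\beta }^{2} \) and \( \rho \). The following Python sketch is illustrative only (the authors' programs are written in SAS/IML); it builds that implied matrix and verifies Eqs. 11–13 numerically:

```python
import numpy as np

def random_effects_cov(sigma_beta2, rho):
    """Covariance of (alpha, beta1, beta2) implied by Eqs. 11-13.

    Requiring Var(alpha + beta_k) = Var(alpha) forces
    Cov(alpha, beta_k) = -sigma_beta2/2, and requiring
    Corr(alpha + beta1, alpha + beta2) = rho forces
    Cov(beta1, beta2) = sigma_beta2/2; sigma^2 follows from Eq. 15.
    """
    sigma2 = sigma_beta2 / (2.0 * (1.0 - rho))   # Eq. 15
    c_ab = -sigma_beta2 / 2.0                    # Cov(alpha, beta_k)
    c_bb = sigma_beta2 / 2.0                     # Cov(beta1, beta2)
    return np.array([[sigma2, c_ab, c_ab],
                     [c_ab, sigma_beta2, c_bb],
                     [c_ab, c_bb, sigma_beta2]])

rho, sigma_beta2 = 0.3, 0.2 ** 2
sigma2 = sigma_beta2 / (2.0 * (1.0 - rho))
S = random_effects_cov(sigma_beta2, rho)

var_f1 = S[0, 0] + 2 * S[0, 1] + S[1, 1]                  # Var(alpha + beta1), Eq. 11
var_d = S[1, 1] + S[2, 2] - 2 * S[1, 2]                   # Var(beta1 - beta2), Eq. 12
rho10 = (S[0, 0] + S[0, 1]) / np.sqrt(S[0, 0] * var_f1)   # Corr(alpha, alpha + beta1)
rho12 = (S[0, 0] + S[0, 1] + S[0, 2] + S[1, 2]) / var_f1  # Corr(alpha + beta1, alpha + beta2)
```

Running the checks confirms that `var_f1` equals \( \sigma^{2} \), `var_d` equals \( \sigma_{\beta }^{2} \), and both correlations equal \( \rho \), so the single-variance, single-correlation assumptions are internally consistent.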
Analogously to Design 1, the effect size for the average subject, comparing formulations 1 and 0, is given by Eq. 14:
$$ E^{*} = (\exp (\mu_{{\beta_{1} }} ) - 1) \times 100. $$
(14)
ABE between formulations 1 and 0 is declared if a confidence interval for \( \mu_{{\beta_{1} }} \) is contained in the interval \( [\ln (q^{ - 1} ),{ \ln }(q)] \), where \( q > 1 \) is prespecified by the regulatory agency. To compute the power of this test, a maximum detectable effect size \( E_{\text{m}}^{*} \) must be input into the computer program (5 % per the FDA guidelines), as well as \( \sigma_{\epsilon }^{2} \), a sample size \( N \) that must be a multiple of 3, \( \sigma_{\beta }^{2} \), and \( \rho \).
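The CI-containment decision rule is simple to state in code. A minimal Python sketch, taking the conventional regulatory value \( q = 1.25 \) (i.e., 80–125 % limits) as an illustration:

```python
import math

def abe_declared(ci_low, ci_high, q=1.25):
    """Declare ABE when the whole confidence interval for mu_beta1
    (on the log scale) lies inside [ln(1/q), ln(q)]."""
    return math.log(1.0 / q) <= ci_low and ci_high <= math.log(q)

# [ln(1/1.25), ln(1.25)] is approximately (-0.2231, 0.2231)
print(abe_declared(-0.05, 0.10))   # True: the interval fits inside the limits
print(abe_declared(-0.05, 0.25))   # False: 0.25 exceeds ln(1.25)
```

Note that the rule is applied on the log scale; a point estimate of \( \mu_{{\beta_{1} }} = \ln (1.05) \approx 0.0488 \) corresponds to the 5 % maximum detectable effect size discussed above.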

5 Parameter Values Required for Power Computations for Design 1 or 2

In summary, according to the model descriptions in Sects. 3 and 4, the following parameters are needed to compute the power for examining ABE when using random-effects linear models and crossover Design 1 or 2:
  1. A maximum detectable effect size for the average subject, \( E_{\text{m}}^{*} \). FDA guidelines [7] require using \( E_{\text{m}}^{*} = 5\,\% \) or, equivalently, \( \mu_{\beta } = \log (1.05) \).
  2. The within-subject variance \( \sigma_{\epsilon }^{2} \).
  3. The sample size \( N \). For Design 1, \( N \) must be an even number so that the same numbers of subjects are allocated to the two sequences. For Design 2, \( N \) must be a multiple of 3.
  4. The subject-by-formulation interaction variance \( \sigma_{\beta }^{2} \).
  5. The noise-adjusted bioavailability correlation \( \rho \).
Alternatively, for the situation in which the between-subject variance \( \sigma^{2} \) is known or can be conjectured instead of either \( \sigma_{\beta }^{2} \) or \( \rho \), we have derived the following formula, which can be used to compute \( \sigma_{\beta }^{2} \) or \( \rho \) (Eq. 15).
$$ \sigma^{2} = \frac{{\sigma_{\beta }^{2} }}{{2\left( {1 - \rho } \right)}} . $$
(15)
Equation 15, which is proved in the Appendix, is valid for both Designs 1 and 2. Similarly to the FDA guidelines [7], our power computations assume that there are no period or sequence effects on bioavailability, that is, that \( \varvec{\gamma} = 0. \) This assumption is justified if sufficiently long washout time intervals are used and if efforts to keep steady study conditions throughout the entire study are made. This assumption, however, does not need to be made when fitting the model in Eq. 1, or that in Eq. 9, in an actual bioequivalence analysis.
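Equation 15 lets any one of \( \sigma^{2} \), \( \sigma_{\beta }^{2} \), and \( \rho \) be recovered from the other two. A small Python helper (an illustrative sketch with our own function names, not part of the authors' SAS/IML programs):

```python
def sigma2_from(sigma_beta2, rho):
    """Eq. 15 as written: between-subject variance from the other two."""
    return sigma_beta2 / (2.0 * (1.0 - rho))

def sigma_beta2_from(sigma2, rho):
    """Eq. 15 solved for the subject-by-formulation interaction variance."""
    return 2.0 * sigma2 * (1.0 - rho)

def rho_from(sigma2, sigma_beta2):
    """Eq. 15 solved for the noise-adjusted correlation."""
    return 1.0 - sigma_beta2 / (2.0 * sigma2)

# Example: sigma_beta = 0.2 and rho = 0.4 imply sigma^2 = 0.04 / 1.2
s2 = sigma2_from(0.2 ** 2, 0.4)
print(round(s2, 4))                         # 0.0333
print(round(rho_from(s2, 0.2 ** 2), 10))    # 0.4 (recovered)
```

In practice one would conjecture whichever two quantities are easier to elicit (e.g., \( \sigma^{2} \) and \( \rho \) from a pilot study) and derive the third before consulting the tables.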

6 Tables for Sample Size Computations

We used the SAS® programs to construct tables that give optimal total sample sizes for Design 1 (Tables 1 and 2) and Design 2 (Tables 3 and 4) for selected values of \( \sigma_{\epsilon } \), \( \sigma_{\beta } \), and \( \rho \), with \( E_{\text{m}}^{*} = 5\,\% \). Tables 1 and 3 give the smallest sample sizes producing a power of at least 80 %, and Tables 2 and 4 give the smallest sample sizes producing a power of at least 90 %. The programs compute the power of a study of a given sample size by simulating 5,000 studies of that sample size using the model described by Eq. 1 in the case of Design 1, or the model described by Eq. 9 in the case of Design 2. To build Table 1 or 3, studies of different sample sizes were simulated under Design 1 or 2, respectively, and the smallest sample size producing a power of at least 80 % was recorded in the table. A similar methodology was followed for building Tables 2 and 4.
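The simulation idea can be conveyed with a deliberately simplified sketch. The Python code below is not the authors' SAS/IML program, which fits full random-effects models; instead, it analyzes each simulated Design 1 study through the per-subject difference of mean log bioavailabilities (so \( \alpha \) and \( \varvec{\gamma} \) cancel and \( \rho \) drops out) and uses a normal quantile in place of the exact critical value, so its powers only approximate the tabulated values:

```python
import numpy as np

def power_design1(N, sigma_eps, sigma_beta, mu_beta=np.log(1.05),
                  q=1.25, n_sim=2000, seed=1):
    """Monte Carlo ABE power for Design 1 under a simplified paired analysis.

    Each subject receives each formulation twice, so the per-subject
    difference of mean log bioavailabilities is distributed as
    d_i ~ N(mu_beta, sigma_beta**2 + sigma_eps**2).
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.log(1.0 / q), np.log(q)
    z = 1.6449  # normal approximation to the critical value of a 90 % CI
    sd = np.sqrt(sigma_beta ** 2 + sigma_eps ** 2)
    d = rng.normal(mu_beta, sd, size=(n_sim, N))
    m = d.mean(axis=1)
    se = d.std(axis=1, ddof=1) / np.sqrt(N)
    # ABE is declared when the whole 90 % CI lies inside [ln(1/q), ln(q)]
    inside = (m - z * se >= lo) & (m + z * se <= hi)
    return inside.mean()

# Small variabilities: N = 12 already gives high power (cf. Table 1, first row)
p_small = power_design1(N=12, sigma_eps=0.20, sigma_beta=0.01)
# A large within-subject SD at the same N leaves the study badly underpowered
p_large = power_design1(N=12, sigma_eps=0.58, sigma_beta=0.01)
```

Despite the simplifications, the sketch reproduces the qualitative behavior of the tables: power falls steeply as \( \sigma_{\epsilon } \) grows at a fixed \( N \).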

As mentioned in Sect. 3.4, and as seen in the tables, higher values of \( \rho \) usually demand larger sample sizes to achieve a desired power. For actual power computations, the value of \( \rho \) can be obtained from pilot studies. In the absence of empirical information, we believe that \( \rho \) can be assumed to equal 0.4 in many studies, because clinical research rarely finds correlations higher than this. Also, as expected, larger within-subject standard deviations (\( \sigma_{\epsilon } \)) or larger subject-by-formulation interaction standard deviations (\( \sigma_{\beta } \)) tend to imply larger sample sizes.

The tables show that the correlation \( \rho \) between bioavailabilities of different products does not have a substantial impact on study sample size when \( \sigma_{\beta } \) is relatively small, especially for Design 1. However, for both designs \( \rho \) does have an impact for moderate or large values of \( \sigma_{\beta } \). For instance, for Design 1, if \( \sigma_{\epsilon } = 0.34 \) and \( \sigma_{\beta } = 0.1 \), the optimal sample size \( N \) achieving a power of at least 80 % is 30, regardless of whether \( \rho = 0 \) or 0.4 (Table 1). In contrast, if \( \sigma_{\epsilon } = 0.34 \) and \( \sigma_{\beta } = 0.25 \), the optimal sample size jumps from 40 for \( \rho = 0 \) to 48 for \( \rho = 0.4 \) (Table 1). In general, the larger the value of \( \sigma_{\beta } \), the greater the relevance of \( \rho \) in the computation of optimal sample sizes in ABE studies. For Design 2, however, \( \rho \) may be relevant even for small values of \( \sigma_{\beta } \).
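The four Table 1 cells discussed in this example can be collected into a tiny lookup (a sketch with our own names; the full tables cover all tabulated parameter combinations):

```python
# Smallest N giving >= 80 % power for Design 1 at sigma_eps = 0.34,
# keyed by (sigma_beta, rho); values taken from Table 1
TABLE1_COL_034 = {
    (0.10, 0.0): 30, (0.10, 0.4): 30,   # small sigma_beta: rho barely matters
    (0.25, 0.0): 40, (0.25, 0.4): 48,   # larger sigma_beta: rho adds 8 subjects
}

def required_n(sigma_beta, rho):
    """Look up the tabulated minimum total sample size."""
    return TABLE1_COL_034[(sigma_beta, rho)]

print(required_n(0.25, 0.4) - required_n(0.25, 0.0))  # 8
```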

7 Discussion and Conclusion

This article describes a computer program and tables for sample size and power computations for two important ABE designs. We hope that this program and the tables fill, at least in part, a gap in bioequivalence research: the lack of commercial or non-commercial software and tables for power and sample sizes supporting random-effects linear modeling in bioequivalence studies. It is surprising that these tools are not available yet, despite the fact that the FDA issued guidelines recommending the use of this type of modeling more than a decade ago [7], and that there are commercial and non-commercial software packages that include modules for random-effects linear modeling (for instance, SAS®, Stata®, and SPSS®). Although FDA guidelines provide some sample-size tables, these tables are incomplete since only a limited range of model parameters is considered. Moreover, the guidelines do not provide tables for more complex designs such as Design 2, used by the EQUIGEN studies.

Beyond the practical need of following the recommendations of regulatory agencies in the design of bioequivalence studies, there are important scientific reasons for implementing sample-size software and tables supporting random-effects linear modeling. Random-effects linear models are sophisticated statistical and mathematical tools that are changing the landscape of pharmacological and personalized medicine research, and there are strong reasons to believe that in the future these models will constitute the natural mathematical language of drug dosage individualization and of the discovery and applications of personalized medicine interventions in general [4, 1618].

There is an increasing awareness about the importance of a careful assessment of between-patient and within-patient variabilities in the design of personalized pharmacological interventions [16, 18, 28], and also of a correct quantification of the subject-by-formulation interaction variance before evaluating the clinical utility of results from personalized medicine research [4]. This applies not only to pharmacokinetic but also to pharmacodynamic responses to medical treatment. Not surprisingly, assessing these variabilities is also very important in the design and analyses of bioequivalence studies, as shown in Sects. 35.

Arguments have been advanced that individual effects of medical treatments cannot be accurately assessed if subject-by-formulation interaction variances are not computed in medical research [4]. In fact, this variance cannot be separated from other types of variance when using parallel designs or using crossover designs with only two periods [4]. It is no exaggeration to say that being able to estimate this variance is the main methodological advantage when using a crossover design with more than two periods.

The descriptions of the models for Designs 1 and 2, and Tables 1, 2, 3, and 4, also show the importance of considering the correlation \( \rho \) between the bioavailabilities of the examined pharmacological products when computing an optimal sample size for an ABE study. Ignoring this correlation in computations and assuming that it is 0 may underpower the study.

For power and sample size computations in ABE studies, the current work and Tables 1, 2, 3, and 4 also show that a careful assessment of the correlation between the bioavailabilities of test and reference formulations (\( \rho \)) may be needed. In fact, power or sample-size computations for ABE studies using Design 1 or 2 require a previous, educated conjecture as to the extent of the within-subject variability \( \sigma_{\epsilon }^{2} \) and, by Eq. 15, the extent of at least two of the three following quantities: between-subject variability \( \sigma^{2} \), subject-by-formulation interaction variability \( \sigma_{\beta }^{2} \), and bioavailability correlation \( \rho \). For instance, Eq. 15 can be used to assess the value of \( \sigma_{\beta }^{2} \) in situations in which the values of \( \sigma^{2} \) and \( \rho \) are easier to conjecture. In other situations, the values of both \( \sigma_{\beta }^{2} \) and \( \sigma^{2} \) may be easier to conjecture, and in those cases Eq. 15 needs to be used before consulting Tables 1, 2, 3, and 4.

Tables 1, 2, 3, and 4 illustrate that large within-subject variances demand relatively large sample sizes for ABE studies, which has consequences for research, regulation, and ethics that are analyzed by Tothfalusi et al. [3]. As suggested by these authors, scaled ABE analyses may be more appropriate in cases of highly variable drugs, although power and sample-size computations for scaled ABE studies would require software and tables different from those for ABE studies.

Conflict of interest and Sources of Funding

The six-period crossover design and the SAS/IML programs for power and sample size computations were developed in the context of the “Chronic-Dose Bioequivalence Study of Generic Antiepileptic Drugs” (Chronic-dose EQUIGEN) and the “Single-Dose Bioequivalence Study of Generic Antiepileptic Drugs” (Single-dose EQUIGEN) studies, which are currently funded by the US Food and Drug Administration (FDA), the Epilepsy Foundation, and the American Epilepsy Society. The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the supporting institutions. The SAS/IML programs were written by Dr. Diaz. For the writing of this article, or for developing the sample-size tables reported in this article, Dr. Diaz did not receive funding from the above institutions or any other type of external funding, and none of the other authors paid a salary to him. Dr. Privitera has received research support from the FDA, Epilepsy Foundation, American Epilepsy Society, UCB, and Eisai, and has served on data safety monitoring boards for Upsher-Smith and Eli Lilly. The authors declare that they have no conflict of interest.

Supplementary material

Supplementary material 1: 40262_2013_103_MOESM1_ESM.txt (TXT, 31 kb)

Copyright information

© Springer International Publishing Switzerland 2013