# Random-Effects Linear Modeling and Sample Size Tables for Two Special Crossover Designs of Average Bioequivalence Studies: The Four-Period, Two-Sequence, Two-Formulation and Six-Period, Three-Sequence, Three-Formulation Designs

Diaz, F.J., Berg, M.J., Krebill, R., et al. Clin Pharmacokinet (2013) 52: 1033. doi:10.1007/s40262-013-0103-4

## Abstract

Due to concern and debate in the epilepsy medical community, and to the current interest of the US Food and Drug Administration (FDA) in revising approaches to the approval of generic drugs, the FDA is currently supporting ongoing bioequivalence studies of antiepileptic drugs, the EQUIGEN studies. During the design of these crossover studies, the researchers could not find commercial or non-commercial statistical software that quickly allowed computation of sample sizes for their designs, particularly software implementing the FDA requirement of using random-effects linear models for the analyses of bioequivalence studies. This article presents tables for sample-size evaluations of average bioequivalence studies based on the two crossover designs used in the EQUIGEN studies: the four-period, two-sequence, two-formulation design, and the six-period, three-sequence, three-formulation design. Sample-size computations assume that random-effects linear models are used in bioequivalence analyses with crossover designs. Random-effects linear models have traditionally been viewed by many pharmacologists and clinical researchers as mere mathematical devices for analyzing repeated-measures data. In contrast, a modern view attributes to these models an important mathematical role in theoretical formulations of personalized medicine, because they contain not only parameters that represent the average patient but also parameters that represent individual patients. Moreover, the notation and language of random-effects linear models have evolved over the years. Thus, another goal of this article is to provide a presentation of the statistical modeling of data from bioequivalence studies that highlights the modern view of these models, with special emphasis on power analyses and sample-size computations.

## 1 Introduction

Generic drug substitution saved the US public more than US$734 billion from 1999 to 2008, with savings of approximately US$121 billion in 2008 alone [1]. Before being marketed, a generic drug needs to be examined through a bioequivalence study, which is a highly regulated clinical trial used to establish that the generic can be interchanged with the reference (usually brand) product without safety or efficacy concerns.

The US Food and Drug Administration (FDA) uses two bioavailability pharmacokinetic measures, the area under the drug concentration–time curve (AUC) and the maximum drug concentration (*C*_{max}), to determine in vivo bioequivalence in the Abbreviated New Drug Application (ANDA) process. Although individual bioequivalence, population bioequivalence [2], or scaled average bioequivalence (ABE) [3] analyses are possible, bioequivalence studies are usually designed and powered with the main goal of examining ABE. However, individual, population or scaled bioequivalence examinations, or analyses of individual responses [4, 5], can be done as additional analyses if appropriate data are available. ABE is established when the 90 % confidence intervals of the test (typically generic) to reference (typically brand) ratios of the mean AUC (both the measured AUC to time of last blood sample and the calculated AUC to time infinity) and the mean *C*_{max} fall within the 80–125 % range; this is the decision rule of the so-called two one-sided tests [6], required by the FDA. Randomized crossover designs are the most common designs for examining bioequivalence [7]. Although more complex crossover designs are possible and described in FDA guidances [7], the most common (and also the simplest) design consists of a two-period study, typically using single doses in healthy adults.
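As a minimal illustration, the two one-sided tests decision rule can be expressed as a simple interval check. The helper below is hypothetical and assumes the 90 % confidence interval of the test-to-reference geometric mean ratio has already been computed from the fitted model:

```python
import math

def abe_decision(ci_lower: float, ci_upper: float) -> bool:
    """Two one-sided tests (TOST) decision rule for average bioequivalence:
    bioequivalence is concluded when the 90 % confidence interval of the
    test/reference geometric mean ratio lies entirely within 80-125 %."""
    return 0.80 <= ci_lower and ci_upper <= 1.25

def abe_decision_log(log_lower: float, log_upper: float) -> bool:
    """Equivalent rule on the log scale: the interval must lie within
    [ln(0.80), ln(1.25)]."""
    return math.log(0.80) <= log_lower and log_upper <= math.log(1.25)

print(abe_decision(0.92, 1.07))  # CI inside 80-125 % -> True
print(abe_decision(0.85, 1.30))  # upper limit exceeds 125 % -> False
```

The same check is applied separately to each pharmacokinetic measure (both AUC variants and *C*_{max}); all must pass for ABE to be declared.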

There is a growing awareness of the limitations of bioequivalence studies based on only two periods, and of those based on single doses. Firstly, individual responses to formulation switches cannot be reliably assessed with statistical methods unless crossover designs based on three (ideally four) or more periods are used [3–5]. A reason for this is that only studies in which there are subjects who receive a formulation in at least two periods, and subjects who receive at least two formulations in different periods, can be used to compute a subject-by-formulation interaction variance [4]. This variance, in turn, is an essential quantity for assessing individual effects of formulations (see below). Convincing arguments have been advanced that personalized medicine research needs to make greater use of crossover designs with more than two periods [4], particularly in chronic diseases. Secondly, especially with formulations for chronic diseases, bioequivalence studies using single doses may not provide an accurate picture of clinical realities, since there are pharmacological phenomena that can be detected only after chronic administration of the formulation [8]. However, especially with highly variable drugs, the use of designs with multiple dosing at steady state is still controversial [3].

In the case of epilepsy, professional and patient support organizations around the world have expressed concerns about safety and efficacy with indiscriminate generic product substitution. These concerns are based on case reports and claims database analyses. These organizations have issued position statements stating that generic antiepileptic drug (AED) variability can be problematic for some people with epilepsy [9, 10]. Published reviews suggest that extra caution may be needed for patients at highest risk of seizure complications, such as pregnant patients, patients with recurrent status epilepticus, or patients who have been seizure-free for long periods of time and are driving. It has been argued that the total risks and benefits of generic substitution may not be fully understood [10].

Given the level of concern and debate in the epilepsy medical community motivating resolution of the above controversy, and the FDA's current interest in revising its approaches to the approval of generic drugs, the FDA is currently supporting ongoing bioequivalence studies of AEDs, including the “Equivalence Among AED Generic and Brand Products in People With Epilepsy: Chronic-Dose 4-Period Replicate Design” and the “Equivalence Among AED Generic and Brand Products in People With Epilepsy: Single-Dose 6-Period Replicate Design”, named the EQUIGEN studies. (Summary protocols can be found at clinicaltrials.gov, identifiers NCT01713777 and NCT01733394.) The former study compares two disparate lamotrigine generics using chronic doses in a four-period, two-sequence randomized crossover design, whereas the latter examines two disparate generics simultaneously with the lamotrigine brand product using single doses in a six-period, three-sequence randomized crossover design, both studies in people with epilepsy.

During the design of the EQUIGEN studies, the researchers could not find commercial or non-commercial statistical software that quickly allowed computation of sample sizes for their designs, particularly software implementing the FDA recommendation of using random-effects linear models for the analysis of data from bioequivalence studies [7]. These models, which are also named linear mixed-effects models, are currently state-of-the-art statistical tools for the analysis of crossover studies. Therefore, making available software and tables for sample-size and power computations for these types of studies should be welcomed by the pharmacological and statistical community. Although FDA guidelines provide a table for sample-size computations for four-period crossover designs such as the design of the chronic-dose EQUIGEN study [7, page 28], this table may not be adequate for some studies because it is based on the seemingly limiting assumption that the noise-adjusted bioavailability correlation between two formulations is 0. This correlation is defined in Sect. 3.4 of the current article. Also, the table is applicable only to narrow ranges of variance parameters, particularly the within-subject variance and the subject-by-formulation interaction variance.

An objective of this article is to present tables for sample-size evaluations of ABE studies based on the two randomized crossover designs used in the EQUIGEN studies: the four-period, two-sequence, two-formulation design, termed here Design 1, and the six-period, three-sequence, three-formulation design, termed here Design 2. In particular, for these designs, tables including noise-adjusted bioavailability correlations higher than 0 are provided. Here, sample-size computations assume that random-effects linear models are used in bioequivalence analyses with crossover designs, as currently required by the FDA. The tables were made with SAS programs that compute power through Monte-Carlo simulations (available in the Electronic Supplementary Material).

Although some methodological studies regarding power computations for random-effects linear models have been published (e.g., Heo and Leon [11], Muller et al. [12], and Verbeke and Molenberghs [13]), these studies have not specifically addressed hypothesis testing for bioequivalence studies or for the particular crossover designs in this article. Some approximate formulas for the computation of power in bioequivalence studies with crossover designs have been published [14], but these formulas assume that certain non-linear mixed-effects models are used in analyses. At present, however, these non-linear models are not usually used in bioequivalence studies. Nonetheless, it is generally accepted that the most straightforward approach to examining the power of studies with complex repeated-measures designs is to perform Monte-Carlo simulations, since the exact probability distributions of the statistics used in these studies are not usually known under the alternative hypotheses [7, 13]. This is particularly true in the case of bioequivalence studies. Thus, Monte-Carlo simulations are the approach followed in this article.

Unfortunately, random-effects linear models have traditionally been viewed by many pharmacologists and clinical researchers as mere mathematical devices for analyzing data from studies producing repeated measures. In contrast, a modern view attributes to these models an important mathematical role in theoretical formulations of personalized medicine, because they contain not only parameters that represent the average patient but also parameters that represent individual patients [4, 15–18]. Moreover, the notation and language of random-effects linear models have evolved over the years. Thus, another goal of this article is to provide a presentation of the statistical modeling of data from bioequivalence studies that highlights the modern view of these models, with special emphasis on power analyses and sample-size computations. This description will also provide the notation, terminology, and interpretations that are needed to appropriately use the sample-size tables.

This article focuses on the ideas and concepts that pharmacologists, clinicians, and statisticians need to understand the utilization of random-effects linear models in the context of bioequivalence studies and to use the provided sample-size tables. This article does not provide a comprehensive introduction to the analysis of data with random-effects linear models as general tools for the analysis of repeated-measures data. Also, methods for estimating model parameters are out of the scope of this article. Examples of comprehensive treatments of random-effects linear models can be found elsewhere [13, 19, 20].

## 2 Crossover Study Designs

This section describes the two crossover designs for which sample-size tables are provided in this article and for which the statistical models are detailed. The designs are described in the context of the EQUIGEN studies in this section, although, of course, the designs can be used to investigate bioequivalence in many other types of drugs, not only AEDs, and both designs can be used in chronic- or single-dose studies. Some characteristics of the first design, Design 1, are described in FDA guidances [7]. Design 2 is a novel design used in the single-dose EQUIGEN study.

### 2.1 Four-Period, Two-Sequence, Two-Formulation Crossover Design (Design 1)

Since one of our goals is to describe the statistical model for this design in a more general setting than that of the EQUIGEN studies, we refer to the two generics as the test and reference formulations, following standard terminology of bioequivalence studies. Moreover, the statistical model for this design, which is presented in Sect. 3, is applicable to single-dose designs, not only to chronic-dose designs, provided each subject is randomized to one of the two formulation sequences.

### 2.2 Six-Period, Three-Sequence, Three-Formulation Crossover Design (Design 2)

This design has the following characteristics:

- 1.
Each period has all three products represented, which balances period effects across products, if any are present, and therefore allows separating period effects from formulation effects in statistical analyses.

- 2.
No product repeats in consecutive periods for any of the three sequences.

- 3.
All three products are tested once in each sequence during the first three periods, providing for usable data if a subject withdraws from the study after the third period. These data can still be used in brand–generic and generic-to-generic comparisons.

- 4.
Replicate administrations of the three products provide full information for performing individual bioequivalence, scaled bioequivalence, and outlier analyses [5].

- 5.
Statistical data analysis may be performed by fitting only one random-effects linear model that simultaneously provides all means and variances needed to perform average, individual, and scaled bioequivalence analyses, as well as outlier analyses [5]; this guarantees a greater statistical efficiency than conducting separate two-sequence, four-period crossover studies of comparable sample sizes.

- 6.
Other possible three-sequence alternative designs can be defined that meet the above criteria, but these designs are statistically equivalent to Design 2.

The statistical model for Design 2 is described in Sect. 4. In that section, the three formulations examined under Design 2 are called formulations 0, 1, and 2. Although Design 2 is used by EQUIGEN studies implementing a single lamotrigine dose per period, the described model and sample-size computations are applicable to chronic dose plans and other drugs as well.

## 3 Model of Bioavailability for Subjects Under Design 1

Consider a bioavailability measure \( Y \), such as AUC or \( C_{ \text{max} } \), which is obtained at each of the four periods. The measure \( Y \) can be obtained through compartmental or non-compartmental approaches [22]. Usually, the natural logarithm, \( { \ln }(Y) \), of this measure is used for modeling purposes, since log-transformations usually produce pharmacokinetic data that satisfy the usual assumptions of statistical linear models, an observation that has been reported by many authors [17, 21, 22]. Let \( T \) be an indicator covariate defined as 1 if the test formulation is administered to the subject and 0 if the reference formulation is administered. Let \( \varvec{X} \) be a vector of covariates, which usually contains indicators representing the period and indicators representing the sequence. Other covariates such as demographics, concomitant medications, genotypes, or indicators representing study sites in a multicenter study may be included in \( \varvec{X} \) for secondary analyses. For a *particular subject*, it is assumed that the relationship between a *particular measure* \( Y \) from the subject, the formulation \( T \) that produced the measure, and the covariate vector \( \varvec{X} \) is given by Eq. 1:

\( { \ln }(Y) = \alpha + \beta T + \varvec{\gamma }^{\prime } \varvec{X} + \epsilon \)  (Eq. 1)

where \( \epsilon \sim {\text{Normal}}(0, \sigma_{\epsilon }^{2} ) \) is a random error and \( \varvec{\gamma } \) is a vector of coefficients for \( \varvec{X} \).

Although the values of \( \alpha \) and \( \beta \) are unique for a particular subject, these values are assumed to vary from subject to subject in the sense that, at the subject population level, \( \alpha \) and \( \beta \) are considered normally distributed random variables: \( \alpha \sim {\text{Normal}}(\mu_{\alpha } ,\sigma_{\alpha }^{2} ) \) and \( \beta \sim {\text{Normal}}(\mu_{\beta } , \sigma_{\beta }^{2} ) \). For this reason, \( \alpha \) and \( \beta \) are called random coefficients or ‘random effects’. Importantly, the variabilities of \( \alpha \) and \( \beta \) are considered to reflect real variation in the biological and environmental factors that shape a person as an individual [16]. That is, under this view of random-effects linear models, these variabilities are not mere mathematical artifacts for handling population heterogeneity [16]. Also, in general, \( \alpha \) and \( \beta \) may not be independent random variables; that is, \( \alpha \) and \( \beta \) may be correlated. An additional, usual assumption is that both \( \alpha \) and \( \beta \) are independent of \( \epsilon \).

The model in Eq. 1 implicitly assumes that the intra-subject variance \( \sigma_{\epsilon}^{2} \) is the same for bioavailability measures taken under the test and measures taken under the reference formulation. The FDA recommends following this assumption for power computations, but requires inclusion of the possibility of different variances for the test and reference formulations in actual data analyses [7]. The presentation in this article focuses on the model that assumes equal variances, but the described concepts are also applicable to the more general model incorporating different variances.

Denote by \( {\text{Var}}(\delta) \) and \( \sigma_{\alpha }^{2} = {\text{Var}}(\alpha) \) the between-subjects variabilities of bioavailability under the test and reference formulations, respectively. Whereas these variances are interpreted in the context of the subject population, the variance \( \sigma_\epsilon^2 = {\rm Var}(\epsilon) \) is interpreted only in the context of a single subject and is therefore called ‘within-subject variability’.

As mentioned previously, the variabilities of the random coefficients \( \alpha \) and \( \beta \) are considered to be the result of real variation in biological and environmental factors, and are not just a mathematical trick to handle the variability of subjects’ pharmacological response. Thus, the variability in bioavailability across subjects that is caused by genetic variation must be incorporated in \( \sigma_{\alpha }^{2} \) and \( \sigma_{\beta }^{2} \) [17], and \( {\text{Var}}(\delta ) \) and \( \sigma_{\alpha }^{2} = {\text{Var}}(\alpha ) \) can be considered upper bounds of genetic variability [4]. In contrast, the variability \( \sigma_{\epsilon }^{2} \) of the random error \( \epsilon \) is considered within-subject variability that is caused by momentary (inter-occasion) changes in uncontrolled environmental and biological conditions that may affect drug bioavailability of particular subjects, or caused by the unavoidable imperfections of the laboratory procedures used to measure \( Y \); therefore, \( \sigma_{\epsilon }^{2} \) is not reflective of genetic variation or of overall or constant environmental factors.
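To make the random-effects structure concrete, the following sketch simulates log-bioavailability data from a model of the form of Eq. 1 (omitting the covariates \( \varvec{X} \)). All parameter values, and the balanced test/reference allocation, are illustrative assumptions, not values from the article:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative population parameters (assumptions, not values from the article)
mu_alpha, mu_beta = 4.0, np.log(1.05)   # mean log-bioavailability; mean formulation effect
sigma_alpha, sigma_beta = 0.45, 0.15    # between-subject SD; subject-by-formulation SD
cov_ab = 0.0                            # Cov(alpha, beta); may be nonzero in general
sigma_eps = 0.34                        # within-subject SD

n_subjects, n_periods = 24, 4
# Design 1: each subject receives each formulation twice (T=1 test, T=0 reference)
T = np.tile([1, 0, 1, 0], (n_subjects, 1))

# Draw each subject's random coefficients (alpha_i, beta_i) jointly
G = np.array([[sigma_alpha**2, cov_ab], [cov_ab, sigma_beta**2]])
ab = rng.multivariate_normal([mu_alpha, mu_beta], G, size=n_subjects)
alpha, beta = ab[:, 0:1], ab[:, 1:2]

# Eq. 1 without covariates X: ln(Y) = alpha + beta*T + eps
eps = rng.normal(0.0, sigma_eps, size=(n_subjects, n_periods))
log_y = alpha + beta * T + eps
print(log_y.shape)  # one row of four log measures per subject
```

Note that \( \sigma_{\epsilon} \) enters independently at every period, whereas \( \alpha \) and \( \beta \) are drawn once per subject; this is exactly the distinction between within-subject and between-subjects variability drawn above.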

### 3.1 Test Formulation Effect Size

For a particular subject, the quantity \( E = 100\,(e^{\beta } - 1)\,\% \) can be considered a measure of the size of the effect of the test formulation on the subject’s bioavailability response \( Y \), relative to the reference formulation [17, 24–27]. Therefore, \( E \) allows assessment of the pharmacological importance of the difference in bioavailabilities between the two formulations, if there is a difference. As an illustration, if, for a particular patient, \( E = 15\,\% \), then we can infer that the bioavailability \( Y \) of the test formulation is 15 % higher than that of the reference formulation in the patient; and if \( E = - 15\,\% \), then the former bioavailability is 15 % lower than the latter.
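A small sketch of the conversion between the random coefficient \( \beta \) and the percentage effect size, consistent with the equivalence \( E_{\text{m}}^{*} = 5\,\% \Leftrightarrow \mu_{\beta} = \log(1.05) \) stated in Sect. 5:

```python
import math

def effect_size_pct(beta: float) -> float:
    """Effect size E (in %) implied by the coefficient beta on the log scale,
    E = 100*(exp(beta) - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)

def beta_from_effect_size(e_pct: float) -> float:
    """Inverse mapping: the beta that yields a given effect size in %."""
    return math.log(1.0 + e_pct / 100.0)

print(effect_size_pct(math.log(1.15)))  # bioavailability 15 % higher than reference
print(beta_from_effect_size(5.0))       # equals log(1.05)
```

The asymmetry of the log scale is worth noting: \( E = -15\,\% \) corresponds to \( \beta = \log(0.85) \), which is larger in magnitude than \( \log(1.15) \).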

### 3.2 Average Bioequivalence

The power of this statistical test depends on the actual value of \( E^{*} \). Other things being equal, for a fixed study sample size, the smaller the ‘true’ value of \( E^{*} \), the easier it is to reject the null hypothesis of non-bioequivalence, and therefore the more powerful is the test. For this reason, the regulatory agency requires that, for a proposed ABE study sample size, the power of an ABE test be reported for a value of \( E^{*} \) not smaller than a prespecified number, denoted here as \( E_{\text{m}}^{*} \) (5 % per the FDA guidelines [7]). Using the statistical jargon of power analyses, \( E_{\text{m}}^{*} \) can be named the ‘maximum detectable effect size’ because, unless the proposed sample size is increased, the existence of a larger effect size than \( E_{\text{m}}^{*} \) may render a power lower than that reported in the study protocol.

In this article, SAS^{®} PROC IML and SAS^{®} PROC MIXED were used to write the program for computing power for this hypothesis test. SAS^{®} PROC MIXED was used in the way recommended by FDA guidelines [7, page 34]. Although the SAS^{®} package was used to perform bioequivalence analyses, a number of other reliable commercial statistical packages such as STATA^{®} or SPSS^{®} are quite capable of fitting the model described by Eq. 1 and computing the above confidence interval.

### 3.3 Subject-by-Formulation Interaction

The model expressed in Eq. 1 implies that the effect size \( E \) varies from subject to subject, since \( \beta \) varies from subject to subject. That is, if \( \sigma_{\beta }^{2} > 0, \) there is an interaction between subjects and test formulation in the informal sense that a particular subject may ‘modify’ the effect size of the test formulation on the bioavailability \( Y \). The quantity \( \sigma_{\beta }^{2} \), which is usually called ‘subject-by-formulation interaction variance’, measures the extent to which subjects may modify this effect size. Statisticians usually think of an interaction between two variables as a process in which one variable modifies the effect of the other variable on a dependent variable; consistent with this conception of ‘interaction’, subjects are viewed here as a variable affecting the difference between the bioavailabilities of two formulations. A power computation for an ABE study requires a careful consideration of the value of this variance, since a value for \( \sigma_{\beta }^{2} \) needs to be used as an input for the computation.

### 3.4 Noise-Adjusted Bioavailability Correlation

Let \( \rho \) denote the correlation between \( \delta \) and \( \alpha \) across subjects, where \( \delta \) and \( \alpha \) are given in Eq. 2. Note that \( \rho \) is a characteristic of the subject population, not of a particular subject. By Eq. 2, in which \( \varvec{\gamma} \) is assumed to be a population constant, \( \rho \) can be interpreted as the correlation between the bioavailabilities of the two formulations, after controlling for intra-subject variation of bioavailability measures.
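Under the assumption \( \delta = \alpha + \beta \) (consistent with the Design 2 definitions \( \rho_{10} = {\text{Corr}}(\alpha , \alpha + \beta_{1}) \) given in Sect. 4), \( \rho \) can be computed directly from the variance components. A sketch with illustrative inputs:

```python
import math

def noise_adjusted_corr(var_alpha: float, var_beta: float, cov_ab: float = 0.0) -> float:
    """rho = Corr(alpha, delta) with delta = alpha + beta: the correlation
    between the latent (noise-free) log-bioavailabilities of the reference
    and test formulations across subjects."""
    var_delta = var_alpha + var_beta + 2.0 * cov_ab   # Var(alpha + beta)
    cov_alpha_delta = var_alpha + cov_ab              # Cov(alpha, alpha + beta)
    return cov_alpha_delta / math.sqrt(var_alpha * var_delta)

print(noise_adjusted_corr(0.45**2, 0.0, 0.0))   # beta constant -> rho = 1.0
print(noise_adjusted_corr(0.45**2, 0.15**2))    # independent alpha, beta
```

When \( \alpha \) and \( \beta \) are negatively correlated, \( \rho \) can fall well below these values, which is the regime the tables' \( \rho \) range of 0–0.4 addresses.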

Table 1 Design 1, power ≥80 %. Cell entries are smallest total sample sizes \( N \); column headings are values of \( \sigma_{\epsilon } \)

| \( \sigma_{\beta } \) | \( \rho \) | 0.20 | 0.34 | 0.43 | 0.51 | 0.58 | 0.64 |
|---|---|---|---|---|---|---|---|
| 0.01 | 0.0 | 12 | 28 | 44 | 60 | 76 | 92 |
| | 0.1 | 12 | 28 | 44 | 60 | 78 | 92 |
| | 0.2 | 12 | 28 | 44 | 60 | 78 | 92 |
| | 0.3 | 12 | 28 | 44 | 60 | 78 | 92 |
| | 0.4 | 12 | 28 | 44 | 60 | 78 | 92 |
| 0.1 | 0.0 | 14 | 30 | 46 | 62 | 78 | 94 |
| | 0.1 | 14 | 30 | 46 | 62 | 78 | 94 |
| | 0.2 | 14 | 30 | 46 | 62 | 78 | 94 |
| | 0.3 | 14 | 30 | 46 | 62 | 78 | 94 |
| | 0.4 | 14 | 30 | 46 | 62 | 78 | 94 |
| 0.15 | 0.0 | 16 | 32 | 48 | 64 | 80 | 96 |
| | 0.1 | 16 | 32 | 48 | 64 | 80 | 96 |
| | 0.2 | 18 | 32 | 48 | 64 | 80 | 96 |
| | 0.3 | 18 | 34 | 50 | 66 | 82 | 96 |
| | 0.4 | 20 | 34 | 50 | 66 | 82 | 98 |
| 0.2 | 0.0 | 20 | 34 | 50 | 66 | 82 | 98 |
| | 0.1 | 20 | 36 | 50 | 68 | 84 | 98 |
| | 0.2 | 22 | 38 | 54 | 68 | 84 | 100 |
| | 0.3 | 22 | 40 | 54 | 70 | 88 | 102 |
| | 0.4 | 26 | 40 | 56 | 72 | 88 | 104 |
| 0.25 | 0.0 | 24 | 40 | 54 | 72 | 86 | 104 |
| | 0.1 | 26 | 42 | 56 | 72 | 86 | 104 |
| | 0.2 | 28 | 44 | 58 | 74 | 90 | 108 |
| | 0.3 | 30 | 46 | 60 | 78 | 90 | 110 |
| | 0.4 | 32 | 48 | 62 | 80 | 96 | 110 |
| 0.3 | 0.0 | 30 | 46 | 60 | 76 | 94 | 110 |
| | 0.1 | 32 | 48 | 64 | 78 | 94 | 110 |
| | 0.2 | 34 | 50 | 64 | 80 | 96 | 114 |
| | 0.3 | 38 | 54 | 68 | 86 | 100 | 114 |
| | 0.4 | 42 | 58 | 74 | 90 | 106 | 120 |

Table 2 Design 1, power ≥90 %. Cell entries are smallest total sample sizes \( N \); column headings are values of \( \sigma_{\epsilon } \)

| \( \sigma_{\beta } \) | \( \rho \) | 0.20 | 0.34 | 0.43 | 0.51 | 0.58 | 0.64 |
|---|---|---|---|---|---|---|---|
| 0.01 | 0.0 | 14 | 36 | 56 | 78 | 98 | 122 |
| | 0.1 | 14 | 36 | 58 | 78 | 98 | 124 |
| | 0.2 | 14 | 36 | 58 | 80 | 100 | 124 |
| | 0.3 | 14 | 36 | 58 | 80 | 100 | 124 |
| | 0.4 | 14 | 36 | 58 | 80 | 102 | 124 |
| 0.1 | 0.0 | 18 | 38 | 60 | 82 | 102 | 124 |
| | 0.1 | 18 | 40 | 60 | 82 | 104 | 126 |
| | 0.2 | 18 | 40 | 60 | 82 | 104 | 128 |
| | 0.3 | 18 | 40 | 62 | 82 | 104 | 128 |
| | 0.4 | 18 | 40 | 62 | 84 | 104 | 128 |
| 0.15 | 0.0 | 20 | 42 | 62 | 84 | 106 | 126 |
| | 0.1 | 22 | 42 | 66 | 86 | 108 | 128 |
| | 0.2 | 22 | 44 | 66 | 86 | 108 | 128 |
| | 0.3 | 24 | 46 | 66 | 86 | 108 | 130 |
| | 0.4 | 24 | 46 | 68 | 88 | 108 | 132 |
| 0.2 | 0.0 | 24 | 46 | 68 | 88 | 110 | 132 |
| | 0.1 | 26 | 48 | 68 | 90 | 110 | 134 |
| | 0.2 | 28 | 48 | 70 | 90 | 112 | 136 |
| | 0.3 | 30 | 50 | 72 | 94 | 114 | 136 |
| | 0.4 | 34 | 52 | 74 | 96 | 116 | 138 |
| 0.25 | 0.0 | 32 | 52 | 74 | 96 | 118 | 140 |
| | 0.1 | 34 | 56 | 76 | 96 | 120 | 142 |
| | 0.2 | 36 | 58 | 78 | 100 | 120 | 144 |
| | 0.3 | 40 | 58 | 80 | 102 | 120 | 144 |
| | 0.4 | 44 | 64 | 86 | 106 | 128 | 146 |
| 0.3 | 0.0 | 40 | 60 | 80 | 104 | 126 | 148 |
| | 0.1 | 42 | 64 | 84 | 106 | 128 | 148 |
| | 0.2 | 46 | 66 | 90 | 108 | 130 | 152 |
| | 0.3 | 52 | 72 | 94 | 112 | 136 | 156 |
| | 0.4 | 56 | 76 | 98 | 118 | 144 | 162 |

Table 3 Design 2, power ≥80 %. Cell entries are smallest total sample sizes \( N \); column headings are values of \( \sigma_{\epsilon } \)

| \( \sigma_{\beta } \) | \( \rho \) | 0.20 | 0.34 | 0.43 | 0.51 | 0.58 | 0.64 |
|---|---|---|---|---|---|---|---|
| 0.01 | 0.0 | 12 | 27 | 42 | 60 | 75 | 93 |
| | 0.1 | 12 | 30 | 42 | 60 | 75 | 93 |
| | 0.2 | 12 | 30 | 45 | 60 | 75 | 93 |
| | 0.3 | 12 | 30 | 45 | 60 | 75 | 93 |
| | 0.4 | 12 | 30 | 45 | 60 | 75 | 93 |
| 0.1 | 0.0 | 15 | 30 | 48 | 60 | 78 | 93 |
| | 0.1 | 15 | 30 | 48 | 63 | 78 | 93 |
| | 0.2 | 15 | 30 | 48 | 63 | 78 | 93 |
| | 0.3 | 15 | 30 | 48 | 63 | 78 | 93 |
| | 0.4 | 15 | 33 | 48 | 63 | 78 | 96 |
| 0.15 | 0.0 | 15 | 33 | 48 | 63 | 81 | 96 |
| | 0.1 | 18 | 33 | 48 | 66 | 81 | 96 |
| | 0.2 | 18 | 33 | 51 | 66 | 81 | 99 |
| | 0.3 | 18 | 36 | 51 | 66 | 81 | 99 |
| | 0.4 | 18 | 36 | 51 | 66 | 81 | 99 |
| 0.2 | 0.0 | 21 | 36 | 51 | 66 | 84 | 99 |
| | 0.1 | 21 | 36 | 54 | 66 | 84 | 99 |
| | 0.2 | 21 | 39 | 54 | 69 | 84 | 99 |
| | 0.3 | 24 | 39 | 54 | 69 | 87 | 99 |
| | 0.4 | 24 | 42 | 57 | 75 | 90 | 105 |
| 0.25 | 0.0 | 24 | 42 | 54 | 72 | 87 | 102 |
| | 0.1 | 24 | 42 | 57 | 75 | 90 | 102 |
| | 0.2 | 27 | 42 | 60 | 75 | 93 | 105 |
| | 0.3 | 30 | 45 | 60 | 78 | 93 | 108 |
| | 0.4 | 33 | 48 | 63 | 78 | 96 | 108 |
| 0.3 | 0.0 | 30 | 45 | 60 | 78 | 90 | 111 |
| | 0.1 | 30 | 45 | 63 | 81 | 93 | 111 |
| | 0.2 | 33 | 51 | 66 | 81 | 96 | 111 |
| | 0.3 | 36 | 54 | 69 | 84 | 99 | 114 |
| | 0.4 | 42 | 57 | 69 | 87 | 99 | 117 |

Table 4 Design 2, power ≥90 %. Cell entries are smallest total sample sizes \( N \); column headings are values of \( \sigma_{\epsilon } \)

| \( \sigma_{\beta } \) | \( \rho \) | 0.20 | 0.34 | 0.43 | 0.51 | 0.58 | 0.64 |
|---|---|---|---|---|---|---|---|
| 0.01 | 0.0 | 15 | 36 | 57 | 78 | 99 | 123 |
| | 0.1 | 15 | 36 | 57 | 78 | 99 | 123 |
| | 0.2 | 15 | 36 | 57 | 81 | 99 | 123 |
| | 0.3 | 15 | 36 | 57 | 81 | 102 | 123 |
| | 0.4 | 15 | 36 | 57 | 81 | 102 | 126 |
| 0.1 | 0.0 | 18 | 39 | 63 | 81 | 102 | 126 |
| | 0.1 | 18 | 39 | 63 | 84 | 102 | 126 |
| | 0.2 | 18 | 39 | 63 | 84 | 102 | 126 |
| | 0.3 | 18 | 39 | 63 | 84 | 105 | 126 |
| | 0.4 | 18 | 39 | 63 | 84 | 105 | 126 |
| 0.15 | 0.0 | 21 | 42 | 63 | 84 | 108 | 126 |
| | 0.1 | 24 | 42 | 66 | 84 | 108 | 126 |
| | 0.2 | 24 | 45 | 66 | 87 | 108 | 129 |
| | 0.3 | 24 | 45 | 66 | 87 | 108 | 129 |
| | 0.4 | 24 | 48 | 69 | 90 | 111 | 132 |
| 0.2 | 0.0 | 24 | 48 | 69 | 90 | 111 | 132 |
| | 0.1 | 27 | 51 | 69 | 93 | 114 | 132 |
| | 0.2 | 27 | 51 | 72 | 93 | 114 | 135 |
| | 0.3 | 33 | 51 | 75 | 93 | 114 | 135 |
| | 0.4 | 33 | 54 | 75 | 96 | 120 | 135 |
| 0.25 | 0.0 | 33 | 54 | 75 | 93 | 114 | 138 |
| | 0.1 | 33 | 54 | 78 | 99 | 117 | 141 |
| | 0.2 | 36 | 57 | 81 | 99 | 117 | 141 |
| | 0.3 | 39 | 57 | 81 | 105 | 123 | 144 |
| | 0.4 | 42 | 63 | 84 | 108 | 129 | 150 |
| 0.3 | 0.0 | 39 | 60 | 81 | 102 | 123 | 147 |
| | 0.1 | 42 | 63 | 84 | 108 | 126 | 150 |
| | 0.2 | 45 | 66 | 87 | 108 | 126 | 153 |
| | 0.3 | 48 | 72 | 90 | 111 | 135 | 156 |
| | 0.4 | 54 | 75 | 99 | 117 | 141 | 162 |

## 4 Model of Bioavailability for Subjects Under Design 2

Analogously to Design 1, the relationship between a *particular measure* \( Y \) from a *particular subject*, the formulation administered, and the covariate vector \( \varvec{X} \) is formulated as follows (Eq. 9):

\( { \ln }(Y) = \alpha + \beta_{1} T_{1} + \beta_{2} T_{2} + \varvec{\gamma }^{\prime } \varvec{X} + \epsilon \)  (Eq. 9)

where \( T_{1} \) and \( T_{2} \) are indicator covariates equal to 1 if formulation 1 or formulation 2, respectively, is administered in the period (formulation 0 being the reference), and \( \alpha \), \( \beta_{1} \), and \( \beta_{2} \) are the subject’s random coefficients.

Analogously to the model in Eq. 1, \( \sigma_{10}^{2} = \sigma_{{\beta_{1} }}^{2},\sigma_{20}^{2} = \sigma_{{\beta_{2} }}^{2} \), and \( \sigma_{12}^{2} = {\text{Var(}}\beta_{1} - \beta_{2} ) \) are interpreted as subject-by-formulation interaction variances; \( \sigma_{\epsilon }^{2} \) is interpreted as a within-subject variance; the noise-adjusted correlation between the bioavailabilities of formulations 1 and 0 is defined as \( \rho_{10} = {\text{Corr}}(\alpha , \alpha + \beta_{1} ) \), between 2 and 0 as \( \rho_{20} = {\text{Corr}}(\alpha , \alpha + \beta_{2} ) \), and between 1 and 2 as \( \rho_{12} = {\text{Corr}}(\alpha + \beta_{1} , \alpha + \beta_{2} ) \); and \( \varvec{\gamma} \) is usually considered a vector of subject population constants. Between-subject variances of bioavailability measures under formulations 0, 1, and 2 are defined as \( \sigma_{\alpha }^{2} \), \( {\text{Var}}(\alpha + \beta_{1} ) \) and \( {\text{Var}}(\alpha + \beta_{2} ) \), respectively.
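The three noise-adjusted correlations can be obtained jointly from the covariance matrix of the random coefficients \( (\alpha, \beta_{1}, \beta_{2}) \). A sketch with an illustrative (assumed) covariance matrix:

```python
import numpy as np

# Illustrative covariance matrix of (alpha, beta1, beta2);
# the values are assumptions, not estimates from the article.
G = np.array([[0.20, 0.00, 0.00],
              [0.00, 0.02, 0.01],
              [0.00, 0.01, 0.02]])

# Latent log-bioavailabilities: formulation 0 -> alpha,
# formulation 1 -> alpha + beta1, formulation 2 -> alpha + beta2.
# Each row of L maps (alpha, beta1, beta2) to one formulation's latent value.
L = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 0, 1]], dtype=float)
V = L @ G @ L.T                 # covariance of the three latent bioavailabilities
sd = np.sqrt(np.diag(V))
corr = V / np.outer(sd, sd)

rho_10, rho_20, rho_12 = corr[1, 0], corr[2, 0], corr[2, 1]
print(rho_10, rho_20, rho_12)
```

The same linear-map construction gives the subject-by-formulation interaction variances, e.g. \( \sigma_{12}^{2} = {\text{Var}}(\beta_{1} - \beta_{2}) = G_{22} + G_{33} - 2G_{23} \) in this notation.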

## 5 Parameter Values Required for Power Computations for Design 1 or 2

- 1.
A maximum detectable effect size for the average subject, \( E_{\text{m}}^{*} \). FDA guidelines [7] require using \( E_{\text{m}}^{*} = 5\,\% \) or, equivalently, \( \mu_{\beta } = \log (1.05) \).

- 2.
The within-subject variance \( \sigma_{\epsilon }^{2} \).

- 3.
The sample size \( N \). For Design 1, \( N \) must be an even number so that the same numbers of subjects are allocated to the two sequences. For Design 2, \( N \) must be a multiple of 3.

- 4.
The subject-by-formulation interaction variance \( \sigma_{\beta }^{2} \).

- 5.
The noise-adjusted bioavailability correlation \( \rho \).
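The sample-size constraints in item 3 (an even \( N \) for Design 1, a multiple of 3 for Design 2) can be enforced with a small helper; this is a hypothetical sketch, not part of the article's programs:

```python
import math

def next_valid_n(n_required: int, design: int) -> int:
    """Round a required sample size up to the nearest value compatible with
    the design: even for Design 1 (two sequences), a multiple of 3 for
    Design 2 (three sequences), so all sequences receive equal numbers of
    subjects."""
    k = 2 if design == 1 else 3
    return math.ceil(n_required / k) * k

print(next_valid_n(43, design=1))  # 44
print(next_valid_n(43, design=2))  # 45
```

In practice one would apply this rounding after any inflation of \( N \) for anticipated dropout.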

## 6 Tables for Sample Size Computations

We used the SAS^{®} programs to construct tables that give optimal total sample sizes for Design 1 (Tables 1 and 2) and Design 2 (Tables 3 and 4) for selected values of \( \sigma_{\epsilon } \), \( \sigma_{\beta } \), and \( \rho \), with \( E_{\text{m}}^{*} = 5\,\% \). Tables 1 and 3 give the smallest sample sizes producing a power of at least 80 %, and Tables 2 and 4 give the smallest sample sizes producing a power of at least 90 %. The programs compute the power of a study of a particular sample size by simulating 5,000 studies of the same sample size using the model described by Eq. 1 in the case of Design 1, or the model described by Eq. 9 in the case of Design 2. To build Table 1 or 3, studies of different sample sizes were simulated under Design 1 or 2, respectively, and the smallest sample size producing a power of at least 80 % was recorded in the table. A similar methodology was followed for building Tables 2 and 4.
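The Monte-Carlo approach just described can be sketched in simplified form. The article's programs fit the full random-effects model with SAS^{®} PROC MIXED in each simulated study; the sketch below instead analyzes each simulated Design 1 study with per-subject test-minus-reference contrasts and a t-based 90 % confidence interval, which is only a rough approximation of the article's analysis, and all parameter values are illustrative:

```python
import numpy as np
from scipy import stats

def simulate_power_design1(n_subjects, sigma_eps, sigma_beta,
                           mu_beta=np.log(1.05), n_sims=2000, seed=12345):
    """Monte-Carlo power of the TOST ABE test under a simplified Design 1 model.

    Each subject receives each formulation twice. Each subject is collapsed
    to the difference of their two-period formulation means; the within-subject
    noise of that contrast has variance sigma_eps^2 (two averages of two
    measures each). This replaces the full PROC MIXED fit for illustration.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.log(0.80), np.log(1.25)   # ABE limits on the log scale
    t90 = stats.t.ppf(0.95, df=n_subjects - 1)
    successes = 0
    for _ in range(n_sims):
        beta = rng.normal(mu_beta, sigma_beta, n_subjects)   # per-subject effects
        noise = rng.normal(0.0, sigma_eps, n_subjects)       # contrast noise
        d = beta + noise
        m = d.mean()
        se = d.std(ddof=1) / np.sqrt(n_subjects)
        # TOST decision: 90 % CI entirely within the ABE limits
        if lo <= m - t90 * se and m + t90 * se <= hi:
            successes += 1
    return successes / n_sims

power = simulate_power_design1(n_subjects=28, sigma_eps=0.34, sigma_beta=0.15)
print(round(power, 2))
```

The smallest \( N \) for which this estimated power reaches the target (80 % or 90 %) is then recorded, mirroring the table-building procedure described above.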

As mentioned in Sect. 3.4, and as seen in the tables, higher values of \( \rho \) usually demand larger sample sizes to achieve a desired power. For actual power computations, the value of \( \rho \) can be obtained from pilot studies. In the absence of empirical information, we believe that \( \rho \) can be assumed to equal 0.4 in many studies, because clinical research rarely finds correlations higher than this. Also, as expected, larger within-subject standard deviations (\( \sigma_{\epsilon } \)) or larger subject-by-formulation interaction standard deviations (\( \sigma_{\beta } \)) tend to require larger sample sizes.

The tables show that the correlation \( \rho \) between bioavailabilities of different products does not have a substantial impact on study sample size when \( \sigma_{\beta } \) is relatively small, especially for Design 1. However, for both designs \( \rho \) does have an impact for moderate or large values of \( \sigma_{\beta } \). For instance, for Design 1, if \( \sigma_{\epsilon } = 0.34 \) and \( \sigma_{\beta } = 0.1 \), the optimal sample size \( N \) achieving a power of at least 80 % is 30, regardless of whether \( \rho = 0 \) or 0.4 (Table 1). In contrast, if \( \sigma_{\epsilon } = 0.34 \) and \( \sigma_{\beta } = 0.25 \), the optimal sample size jumps from 40 for \( \rho = 0 \) to 48 for \( \rho = 0.4 \) (Table 1). In general, the larger the value of \( \sigma_{\beta } \), the higher the relevance of \( \rho \) in the computation of optimal sample sizes in ABE studies. For Design 2, \( \rho \) may be very relevant even in the presence of small values of \( \sigma_{\beta } \).

## 7 Discussion and Conclusion

This article describes a computer program and tables for sample-size and power computations for two important ABE designs. We hope that the program and tables fill, at least in part, a gap in bioequivalence research: the lack of commercial or non-commercial software and tables for power and sample-size computations that support random-effects linear modeling in bioequivalence studies. It is surprising that these tools are not yet available, given that the FDA issued guidelines recommending the use of this type of modeling more than a decade ago [7], and that several commercial and non-commercial software packages include modules for random-effects linear modeling (for instance, SAS^{®}, STATA^{®}, and SPSS^{®}). Although the FDA guidelines provide some sample-size tables, these tables are incomplete since only a limited range of model parameters is considered. Moreover, the guidelines do not provide tables for more complex designs such as Design 2, used by the EQUIGEN studies.

Beyond the practical need of following the recommendations of regulatory agencies in the design of bioequivalence studies, there are important scientific reasons for implementing sample-size software and tables supporting random-effects linear modeling. Random-effects linear models are sophisticated statistical and mathematical tools that are changing the landscape of pharmacological and personalized medicine research, and there are strong reasons to believe that in the future these models will constitute the natural mathematical language of drug dosage individualization and of the discovery and applications of personalized medicine interventions in general [4, 16–18].

There is an increasing awareness about the importance of a careful assessment of between-patient and within-patient variabilities in the design of personalized pharmacological interventions [16, 18, 28], and also of a correct quantification of the subject-by-formulation interaction variance before evaluating the clinical utility of results from personalized medicine research [4]. This applies not only to pharmacokinetic but also to pharmacodynamic responses to medical treatment. Not surprisingly, assessing these variabilities is also very important in the design and analyses of bioequivalence studies, as shown in Sects. 3–5.

Arguments have been advanced that individual effects of medical treatments cannot be accurately assessed if subject-by-formulation interaction variances are not computed in medical research [4]. In fact, this variance cannot be separated from other sources of variance when using parallel designs or crossover designs with only two periods [4]. It is no exaggeration to say that the ability to estimate this variance is the main methodological advantage of using a crossover design with more than two periods.

The descriptions of the models for Designs 1 and 2, and Tables 1, 2, 3, and 4, also show the importance of considering the correlation \( \rho \) between the bioavailabilities of the examined pharmacological products when computing an optimal sample size for an ABE study. Ignoring this correlation in computations and assuming that it is 0 may underpower the study.

For power and sample-size computations in ABE studies, the current work and Tables 1, 2, 3, and 4 also show that a careful assessment of the correlation between the bioavailabilities of test and reference formulations (\( \rho \)) may be needed. In fact, power or sample-size computations for ABE studies using Design 1 or 2 require a prior, educated conjecture as to the extent of the within-subject variability \( \sigma_{\epsilon }^{2} \) and, by Eq. 15, the extent of at least two of the following three quantities: the between-subject variability \( \sigma^{2} \), the subject-by-formulation interaction variability \( \sigma_{\beta }^{2} \), and the bioavailability correlation \( \rho \). For instance, Eq. 15 can be used to assess the value of \( \sigma_{\beta }^{2} \) in situations in which the values of \( \sigma^{2} \) and \( \rho \) are easier to conjecture. In other situations, the values of both \( \sigma_{\beta }^{2} \) and \( \sigma^{2} \) may be easier to conjecture, and in those cases Eq. 15 needs to be used before using Tables 1, 2, 3, and 4.
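
Eq. 15 itself falls outside this excerpt, so the following helper is only an illustrative sketch: it assumes the standard equal-between-subject-variance relation \( \sigma_{\beta }^{2} = 2\sigma^{2}(1 - \rho) \), which is the usual form linking these three quantities, and should be replaced by the article's actual Eq. 15 in practice.

```python
import math

def sigma_beta_sq(sigma, rho):
    """Subject-by-formulation interaction variance implied by a common
    between-subject SD sigma and bioavailability correlation rho,
    ASSUMING the relation sigma_beta**2 = 2*sigma**2*(1 - rho).
    (Eq. 15 is not reproduced in this excerpt; this standard form is an
    illustrative assumption.)"""
    return 2.0 * sigma**2 * (1.0 - rho)

def rho_from(sigma, sigma_beta):
    """Inverse direction: the correlation implied by sigma and sigma_beta
    under the same assumed relation, for entering the sample-size tables."""
    return 1.0 - sigma_beta**2 / (2.0 * sigma**2)
```

For example, under this assumed relation a between-subject SD of 0.3 with \( \rho = 0.5 \) implies \( \sigma_{\beta }^{2} = 0.09 \), i.e., \( \sigma_{\beta } = 0.3 \).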

Tables 1, 2, 3, and 4 illustrate that large within-subject variances demand relatively large sample sizes for ABE studies, which has consequences for research, regulation, and ethics that are analyzed by Tothfalusi et al. [3]. As suggested by these authors, scaled ABE analyses may be more appropriate in cases of highly variable drugs, although power and sample-size computations for scaled ABE studies would require software and tables different from those for ABE studies.

## Conflict of Interest and Sources of Funding

The development of the six-period crossover design and of the SAS/IML programs for power and sample-size computations was carried out in the context of the “Chronic-Dose Bioequivalence Study of Generic Antiepileptic Drugs” (Chronic-dose EQUIGEN) and “Single-Dose Bioequivalence Study of Generic Antiepileptic Drugs” (Single-dose EQUIGEN) studies, which are currently funded by the US Food and Drug Administration (FDA), the Epilepsy Foundation, and the American Epilepsy Society. The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the supporting institutions. The SAS/IML programs were written by Dr. Diaz. For the writing of this article, or for developing the sample-size tables reported in it, Dr. Diaz did not receive funding from the above institutions or any other external source, nor did any of the other authors pay him a salary. Dr. Privitera has received research support from the FDA, the Epilepsy Foundation, the American Epilepsy Society, UCB, and Eisai, and has served on data safety monitoring boards for Upsher-Smith and Eli Lilly. The authors declare that they have no conflict of interest.