Optimal Design in Hierarchical Random Effect Models for Individual Prediction with Application in Precision Medicine

Hierarchical random effect models are used for different purposes in clinical research and other areas. In general, the main focus is on population parameters related to the expected treatment effects or group differences among all units of an upper level (e.g. subjects in many settings). Optimal designs for the estimation of population parameters are well established for many models. However, optimal designs for the prediction for the individual units may be different. Several settings are identified in which individual prediction may be of interest. In this paper, we determine optimal designs for the individual predictions, e.g. in multi-cluster trials or in trials that investigate a new treatment in a number of different subpopulations, and compare them to a conventional balanced design with respect to treatment allocation. Our investigations show that in the case of uncorrelated cluster intercepts and cluster treatment effects the optimal allocations are far from being balanced if the treatment effects vary strongly as compared to the residual error, and more subjects should be recruited to the active (new) treatment. Nevertheless, efficiency loss may be limited, resulting in a moderate sample size increase when individual predictions are foreseen with a balanced allocation.


Introduction
Hierarchical random effect models are used for different purposes. Common applications, e.g. in clinical research, are random effect meta-analyses of several clinical trials using individual patient data (IPD), multi-centre trials assuming random centre effects, or mixed or random effect models for repeated measurements (MMRM) for longitudinal data within subjects. Clinical trials or meta-analyses of clinical trials may involve several populations or subgroups of patients (e.g. defined by genetic biomarkers), where in some settings between-population variability may be modelled using a hierarchical approach.
Precision medicine aims at investigating new treatments in different subpopulations of patients, e.g. with specific tumour subtypes, in order to detect potentially differential treatment effects indicating the application in specific subpopulations. Basket trials investigate a new treatment in a number of different groups of patients, where the treatment is compared to a control in each of the different subpopulations. If, for example, two binary or dichotomized genetic biomarkers identify biomarker positive or negative patients, the whole patient population could be split into four disjoint subsets of patients. However, far more subpopulations of interest may be identified, especially in oncological settings, where one treatment may be investigated in a large number of different patient populations. Furthermore, in many oncological indications, biomarker defined subgroups might be rather small and early phase clinical trials aim at exploring different subpopulations that may be highly heterogeneous with respect to the response to treatment, since depending on the individual biomarker status, different treatment effects are often expected. In these difficult settings with a limited number of trial participants available, high design efficiency is an important prerequisite for a successful development of new targeted drugs in small populations.
Further applications refer to observational studies, or to data on quality parameters of drugs that arise from hierarchical settings with multiple layers, i.e. with multiple sources of variability according to the underlying manufacturing process, e.g. given by manufacturing sites, production batches and samples within batches. In general, the main focus is on population parameters related to the expected treatment effects or group differences among all units of an upper level (e.g. trials in IPD meta-analyses, centres in multi-centre trials, patients in longitudinal trials, batches in quality control, etc.). Several authors considered optimal designs for the estimation of population parameters (expected values of random effects) in similar models; see e.g. [2][3][4][5][6][8]. However, prediction of the outcome in the individual units may also be of interest in several settings, for example treatment effects in single clusters to assess the qualification of individual clinics, manufacturing sites in manufacturing control, or treatment effects in different subpopulations of patients. In these cases, the question arises whether optimal designs for population parameters can also be used for individual predictions of random effects, whether and to which extent optimal designs for individual predictions differ from those for population parameters, and which efficiency loss (or sample size increase) can be anticipated if another conventional design is chosen.
In this paper, we investigate optimal designs for random effects to be applied, e.g. in basket trials (or multi-cluster trials), investigating a number of disjoint patient populations and compare them to a conventional balanced design with respect to treatment allocation. Nevertheless, the results are applicable to very different settings as outlined above.
The structure of the paper is as follows: In Sect. 2, a hierarchical model is specified and the best linear unbiased predictions of the individual parameters (intercepts and treatment effects for the individual units or clusters) are derived. In Sect. 3, analytical results are presented to characterize optimal designs for prediction of the cluster specific effects. The results are illustrated by some numerical examples. The paper is concluded by a short discussion.

Model Specification
We consider a randomized comparative trial with $K$ different clusters (e.g. subpopulations). In all of these clusters, individuals are allocated to two treatment groups. In the first group (denoted by $x = 1$), the individuals receive an active treatment, while in the second group (denoted by $x = 0$) a placebo or a control treatment is applied.
We consider $K$ clusters $i = 1, \dots, K$. Denote by $\mu_i$ and $\alpha_i$ the intercept (mean response at placebo or control) and the effect of the active treatment (compared to placebo or control), respectively, in cluster $i$, which both may vary across the clusters. The response $Y_{ij}$ of an individual $j = 1, \dots, N_i$ in cluster $i$ can be described as

$$Y_{ij} = \mu_i + \alpha_i x_{ij} + \varepsilon_{ij}, \tag{1}$$

where $x_{ij}$ is equal to 1 if the individual belongs to the treatment group, $x_{ij}$ is equal to 0 for the control (or placebo) group, and $\varepsilon_{ij}$ denotes the random variation in the response of the individuals. The individual variations $\varepsilon_{ij}$ are assumed to have zero mean and to be homoscedastic with common variance $\sigma^2$.
The cluster specific intercepts $\mu_i$ and treatment effects $\alpha_i$ can be assumed as random with (unknown) expected values $E(\mu_i) = \mu_0$ and $E(\alpha_i) = \alpha_0$ characterizing the mean intercept and mean treatment effect across the clusters and covariance structure $\mathrm{Cov}((\mu_i, \alpha_i)^\top) = \sigma^2 \mathbf{D}$ for some $2 \times 2$ positive definite dispersion matrix $\mathbf{D}$. All random effects and all individual variations are assumed to be uncorrelated.
For the sake of simplicity, we further assume that the total number $N$ of individuals per cluster is the same for all clusters ($N_i = N$) and that the allocation rate is constant across the clusters, i.e. the number $n$ of individuals in the treatment group is the same for all clusters. The design problem can then be formulated in terms of finding the optimal allocation rate $w = n/N$ for the treatment group.
Because of exchangeability of the individuals within each cluster, we may sort them in the analysis regardless of randomization in such a way that the first $n$ individuals $j = 1, \dots, n$ to be analyzed receive the active treatment and the remaining $N - n$ individuals $j = n + 1, \dots, N$ are in the control group. Then the experimental settings $x_{ij}$ in (1) can be specified by

$$x_{ij} = \begin{cases} 1, & j = 1, \dots, n, \\ 0, & j = n + 1, \dots, N, \end{cases}$$

and are independent of the cluster $i$.
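Hence, the multi-cluster model (1) can be identified as a particular case of the random coefficient regression model investigated by [7] when the regression functions and the cluster parameters are specified by $\mathbf{f}(x) = (1, x)^\top$ and $\boldsymbol{\beta}_i = (\mu_i, \alpha_i)^\top$, respectively. In general, this model can be written in vector notation as

$$\mathbf{Y}_i = \mathbf{F} \boldsymbol{\beta}_i + \boldsymbol{\varepsilon}_i, \tag{2}$$

where $\mathbf{Y}_i = (Y_{i1}, \dots, Y_{iN})^\top$ and $\boldsymbol{\varepsilon}_i = (\varepsilon_{i1}, \dots, \varepsilon_{iN})^\top$ are the vectors of observations and individual variations at cluster $i$, respectively, and $\mathbf{F} = (\mathbf{f}(x_1), \dots, \mathbf{f}(x_N))^\top$ is the within cluster design matrix, which is equal across all clusters.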
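For the present multi-cluster model, the design matrix simplifies to

$$\mathbf{F} = \begin{pmatrix} \mathbf{1}_n & \mathbf{1}_n \\ \mathbf{1}_{N-n} & \mathbf{0}_{N-n} \end{pmatrix},$$

where $\mathbf{1}_m$ and $\mathbf{0}_m$ denote the $m$-dimensional vectors with all entries equal to 1 and 0, respectively.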
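The block structure of the design matrix can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `design_matrix` and `information_matrix` are hypothetical and not part of the paper, and the values N = 4 and n = 3 are arbitrary.

```python
def design_matrix(N, n):
    """Within-cluster design matrix F: n rows (1, 1) for treatment, N - n rows (1, 0) for control."""
    if not 0 < n < N:
        raise ValueError("need 0 < n < N so that both groups are non-empty")
    return [[1, 1] for _ in range(n)] + [[1, 0] for _ in range(N - n)]


def information_matrix(F):
    """Compute F^T F for a two-column design matrix given as a list of rows."""
    return [[sum(row[a] * row[b] for row in F) for b in range(2)] for a in range(2)]


# Arbitrary illustrative values (assumption): 4 individuals, 3 of them treated
F = design_matrix(N=4, n=3)
M = information_matrix(F)
```

In this sketch `M` has the entries N, n, n, n, so that in terms of the allocation rate w = n/N the matrix F^T F depends on the design only through N and w.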
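According to [7], Corollary 1 (see also [2]), in model (2) the best linear unbiased predictors (BLUPs) of the random parameters $\boldsymbol{\beta}_i$ are weighted combinations

$$\hat{\boldsymbol{\beta}}_i = \mathbf{D} \left( (\mathbf{F}^\top \mathbf{F})^{-1} + \mathbf{D} \right)^{-1} \hat{\boldsymbol{\beta}}_{i;\mathrm{within}} + (\mathbf{F}^\top \mathbf{F})^{-1} \left( (\mathbf{F}^\top \mathbf{F})^{-1} + \mathbf{D} \right)^{-1} \hat{\boldsymbol{\beta}}_0$$

of the estimates $\hat{\boldsymbol{\beta}}_{i;\mathrm{within}} = (\mathbf{F}^\top \mathbf{F})^{-1} \mathbf{F}^\top \mathbf{Y}_i$ based only on the observations within cluster $i$ and the best linear unbiased estimator (BLUE) $\hat{\boldsymbol{\beta}}_0 = (\mathbf{F}^\top \mathbf{F})^{-1} \mathbf{F}^\top \bar{\mathbf{Y}}$ of the population parameter $\boldsymbol{\beta}_0 = E(\boldsymbol{\beta}_i)$, where $\bar{\mathbf{Y}} = \frac{1}{K} \sum_{i=1}^{K} \mathbf{Y}_i$ is the mean observational vector averaged across the clusters.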
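The weighted combination described above can be sketched numerically. The following Python snippet assumes the standard shrinkage form of the BLUP for a known dispersion matrix D, with weight D((F^T F)^{-1} + D)^{-1} on the within-cluster estimate; the function names and all numbers are hypothetical and serve only as an illustration.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[r][k] * q[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def blup(beta_within, beta_pop, N, n, D):
    """Shrink the within-cluster estimate towards the population estimate."""
    M_inv = inv2([[N, n], [n, n]])  # (F^T F)^{-1} for n treated out of N
    S = [[M_inv[r][c] + D[r][c] for c in range(2)] for r in range(2)]
    W = matmul2(D, inv2(S))  # weight of the within-cluster estimate
    # beta_hat_i = W * beta_within + (I - W) * beta_pop
    return [
        beta_pop[r] + sum(W[r][c] * (beta_within[c] - beta_pop[c]) for c in range(2))
        for r in range(2)
    ]


# Illustrative numbers only (assumed, not taken from the paper)
estimate = blup(beta_within=[1.0, 2.0], beta_pop=[0.0, 0.0], N=4, n=2,
                D=[[0.5, 0.0], [0.0, 0.5]])
```

The larger the entries of D relative to (F^T F)^{-1}, the closer the weight W is to the identity and the less the within-cluster estimate is pulled towards the population estimate.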
We additionally assume that the cluster intercepts $\mu_i$ and the cluster treatment effects $\alpha_i$ are uncorrelated for all clusters, i.e. $\mathbf{D} = \mathrm{diag}(u, v)$, where $u = \sigma_\mu^2 / \sigma^2 > 0$ and $v = \sigma_\alpha^2 / \sigma^2 > 0$ are the variance ratios of the intercepts and the treatment effects in relation to the observational variance of the individuals.
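With the common notations

$$\bar{Y}_{i,T} = \frac{1}{n} \sum_{j=1}^{n} Y_{ij} \quad \text{and} \quad \bar{Y}_{i,C} = \frac{1}{N-n} \sum_{j=n+1}^{N} Y_{ij}$$

for the mean response in the treatment ("T") and the control ("C") group in cluster $i$, and $\bar{Y}_{\cdot,T} = \frac{1}{K} \sum_{i=1}^{K} \bar{Y}_{i,T}$ and $\bar{Y}_{\cdot,C} = \frac{1}{K} \sum_{i=1}^{K} \bar{Y}_{i,C}$ for the overall mean of the treatment and the control group, respectively, the BLUPs $\hat{\boldsymbol{\beta}}_i = (\hat{\mu}_i, \hat{\alpha}_i)^\top$ of the random intercepts and the random treatment effects $\boldsymbol{\beta}_i = (\mu_i, \alpha_i)^\top$ in model (1) can be written as weighted averages of the within-cluster estimates $(\bar{Y}_{i,C}, \bar{Y}_{i,T} - \bar{Y}_{i,C})^\top$ and the population estimates $(\bar{Y}_{\cdot,C}, \bar{Y}_{\cdot,T} - \bar{Y}_{\cdot,C})^\top$ (see [7], Corollary 2). The MSE matrix of the vector $\hat{\boldsymbol{\beta}} = (\hat{\boldsymbol{\beta}}_1^\top, \dots, \hat{\boldsymbol{\beta}}_K^\top)^\top$ of all BLUPs is given by

$$\mathrm{Cov}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) = \sigma^2 \left( \tfrac{1}{K} \mathbf{1}_K \mathbf{1}_K^\top \otimes (\mathbf{F}^\top \mathbf{F})^{-1} + \left( \mathbf{I}_K - \tfrac{1}{K} \mathbf{1}_K \mathbf{1}_K^\top \right) \otimes \left( \mathbf{D} - \mathbf{D} \left( (\mathbf{F}^\top \mathbf{F})^{-1} + \mathbf{D} \right)^{-1} \mathbf{D} \right) \right), \tag{7}$$

where $\mathbf{1}_K$ is the $K$-dimensional vector of ones, $\mathbf{I}_m$ denotes the $m \times m$ identity matrix and $\otimes$ is the symbol for the Kronecker product of matrices or vectors.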
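Further denote by $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_K)^\top$ the vector of treatment effects for all clusters and by $\boldsymbol{\beta} = (\boldsymbol{\beta}_1^\top, \dots, \boldsymbol{\beta}_K^\top)^\top$ the vector of all cluster parameters. Then $\boldsymbol{\alpha} = (\mathbf{I}_K \otimes (0, 1)) \boldsymbol{\beta}$ and, hence,

$$\mathrm{Cov}(\hat{\boldsymbol{\alpha}} - \boldsymbol{\alpha}) = (\mathbf{I}_K \otimes (0, 1)) \, \mathrm{Cov}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) \, (\mathbf{I}_K \otimes (0, 1))^\top$$

for the MSE matrix of the BLUP $\hat{\boldsymbol{\alpha}} = (\hat{\alpha}_1, \dots, \hat{\alpha}_K)^\top$ of $\boldsymbol{\alpha}$. Using this and formula (7), we obtain the MSE matrix

$$\mathrm{Cov}(\hat{\boldsymbol{\alpha}} - \boldsymbol{\alpha}) = \sigma^2 \left( \frac{N}{K n (N - n)} \mathbf{1}_K \mathbf{1}_K^\top + \frac{v (N u + 1)}{(N u + 1)(n v + 1) - n^2 u v} \left( \mathbf{I}_K - \tfrac{1}{K} \mathbf{1}_K \mathbf{1}_K^\top \right) \right). \tag{9}$$

Note that the MSE matrix for $\hat{\boldsymbol{\alpha}}$ is completely symmetric and that, hence, the MSE of $\hat{\alpha}_i$ attains the same value for all cluster treatment effects.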
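The complete symmetry of the MSE matrix can be illustrated numerically. The following Python sketch assumes D = diag(u, v) and the representation of the MSE matrix of the treatment effect predictions as (a/K) 1 1^T + b (I - (1/K) 1 1^T), where a is the (2,2) entry of (F^T F)^{-1} and b is the (2,2) entry of (F^T F + D^{-1})^{-1}; the function names and input values are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mse_alpha_entries(N, n, K, u, v):
    """Diagonal and off-diagonal entry of the MSE matrix of alpha_hat, divided by sigma^2."""
    a = inv2([[N, n], [n, n]])[1][1]  # fixed-effects part, equals N / (n (N - n))
    # (F^T F + D^{-1})^{-1} equals D - D ((F^T F)^{-1} + D)^{-1} D for D = diag(u, v)
    b = inv2([[N + 1 / u, n], [n, n + 1 / v]])[1][1]
    diag = a / K + (K - 1) / K * b  # identical for every cluster
    off = (a - b) / K  # identical for every pair of clusters
    return diag, off


# Illustrative values only (assumed): N = 4, n = 2, K = 16, u = v = 0.5
diag, off = mse_alpha_entries(N=4, n=2, K=16, u=0.5, v=0.5)
```

Because only two distinct values occur, any criterion based on the diagonal of the MSE matrix depends on the design through the single number `diag`.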

Optimal Design
As individuals are interchangeable within treatment groups, we may define an exact within cluster design by the allocation numbers $n$ and $N - n$ to the treatment and control groups T and C, respectively. For analytical purposes, we generalize this to the definition of an approximate design

$$\xi = \begin{pmatrix} T & C \\ w & 1 - w \end{pmatrix},$$

where $w = n/N$ is the allocation rate to the treatment group and $1 - w = (N - n)/N$ is the allocation rate to the control group. For finding an optimal design, only the optimal allocation rate $w^*$ to the treatment group has to be determined.
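For an approximate design, the definition of the MSE matrix (9) of the BLUP $\hat{\boldsymbol{\alpha}}$ is extended in a straightforward manner and can be rewritten (neglecting $\sigma^2$) as

$$\mathrm{MSE}_{\boldsymbol{\alpha}}(w) = \frac{1}{K N w (1 - w)} \mathbf{1}_K \mathbf{1}_K^\top + \frac{v (N u + 1)}{(N u + 1)(N v w + 1) - N^2 u v w^2} \left( \mathbf{I}_K - \tfrac{1}{K} \mathbf{1}_K \mathbf{1}_K^\top \right)$$

in terms of the allocation rate $w$.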
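As a cross-check of the dependence on $w$, the two scalar ingredients of the MSE matrix can be worked out directly. The derivation below is a sketch that assumes $\mathbf{D} = \mathrm{diag}(u, v)$ and $n = N w$; the subscript 22 denotes the lower right entry of a $2 \times 2$ matrix.

```latex
\begin{align*}
\mathbf{F}^\top \mathbf{F}
  &= N \begin{pmatrix} 1 & w \\ w & w \end{pmatrix},
  & \left[ (\mathbf{F}^\top \mathbf{F})^{-1} \right]_{22}
  &= \frac{N}{N^2 w (1 - w)} = \frac{1}{N w (1 - w)}, \\
\mathbf{F}^\top \mathbf{F} + \mathbf{D}^{-1}
  &= \begin{pmatrix} N + 1/u & N w \\ N w & N w + 1/v \end{pmatrix},
  & \left[ (\mathbf{F}^\top \mathbf{F} + \mathbf{D}^{-1})^{-1} \right]_{22}
  &= \frac{N + 1/u}{(N + 1/u)(N w + 1/v) - N^2 w^2} \\
  & & &= \frac{v (N u + 1)}{(N u + 1)(N v w + 1) - N^2 u v w^2}.
\end{align*}
```

The last step multiplies numerator and denominator by $u v$. The first entry is the variance contribution already present in the fixed effects model, the second one reflects the between-cluster variability.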
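For the assessment of the MSE matrix, we focus on the A-optimality criterion for the cluster treatment effects $\boldsymbol{\alpha}$, which averages the mean squared errors of the predicted cluster treatment effects $\hat{\alpha}_i$. More specifically, the A-criterion $\Phi_{A,\alpha}$ is the trace of the MSE matrix of the prediction $\hat{\boldsymbol{\alpha}}$ of the cluster treatment effects. For an approximate design, we obtain (neglecting $\sigma^2$)

$$\Phi_{A,\alpha}(w) = \frac{1}{N w (1 - w)} + (K - 1) \frac{v (N u + 1)}{(N u + 1)(N v w + 1) - N^2 u v w^2} \tag{13}$$

for the criterion function $\Phi_{A,\alpha}$ in terms of the allocation rate $w$.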
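For numerical work, the criterion function can be coded directly. The following Python sketch assumes that, with D = diag(u, v) and the residual variance neglected, the A-criterion equals 1/(N w (1 - w)) + (K - 1) v (N u + 1) / ((N u + 1)(N v w + 1) - N^2 u v w^2); the function names are hypothetical.

```python
def a_criterion(w, N, K, u, v):
    """A-criterion for the cluster treatment effects (sigma^2 neglected), 0 < w < 1."""
    if not 0.0 < w < 1.0:
        raise ValueError("allocation rate w must lie strictly between 0 and 1")
    fixed_part = 1.0 / (N * w * (1.0 - w))
    random_part = v * (N * u + 1.0) / (
        (N * u + 1.0) * (N * v * w + 1.0) - N**2 * u * v * w**2
    )
    return fixed_part + (K - 1) * random_part


def mv_criterion(w, N, K, u, v):
    """Common MSE of a single predicted treatment effect: the A-criterion divided by K."""
    return a_criterion(w, N, K, u, v) / K


# Illustrative values only (assumed): balanced allocation, N = 4, K = 16, u = v = 0.5
phi_a = a_criterion(0.5, N=4, K=16, u=0.5, v=0.5)
```

Both functions differ only by the constant factor 1/K, so they are minimized by the same allocation rate.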
Alternatively, one may consider the MV-optimality criterion for the cluster treatment effects $\boldsymbol{\alpha}$, which measures the maximal MSE over all predicted cluster treatment effects $\hat{\alpha}_i$. More specifically, the MV-criterion $\Phi_{MV,\alpha}$ is the maximal diagonal entry of the MSE matrix of the prediction $\hat{\boldsymbol{\alpha}}$ of the cluster treatment effects. Because the MSE is equal for all $\hat{\alpha}_i$, the MV-criterion is related to the A-criterion by $\Phi_{MV,\alpha} = \frac{1}{K} \Phi_{A,\alpha}$, and hence, the A-optimal designs are also MV-optimal. Because, in general, there is no explicit solution for the optimal allocation rate, which minimizes (13), we will give an insight into the qualitative behaviour by some numerical examples below.
It is worth mentioning that the criterion (13) is convex, and therefore, an optimal exact design may be obtained by choosing the better of the two exact designs adjacent to an optimal approximate design.
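The two-step procedure (optimal approximate design first, then the better adjacent exact design) can be sketched in Python. The snippet assumes the A-criterion in the form 1/(N w (1 - w)) + (K - 1) v (N u + 1) / ((N u + 1)(N v w + 1) - N^2 u v w^2) and relies on its convexity to justify a simple ternary search; the function names and the input values are hypothetical.

```python
import math


def a_criterion(w, N, K, u, v):
    """A-criterion for the cluster treatment effects (sigma^2 neglected)."""
    fixed_part = 1.0 / (N * w * (1.0 - w))
    random_part = v * (N * u + 1.0) / (
        (N * u + 1.0) * (N * v * w + 1.0) - N**2 * u * v * w**2
    )
    return fixed_part + (K - 1) * random_part


def optimal_rate(N, K, u, v, tol=1e-10):
    """Ternary search for the minimizer of the (convex) criterion on (0, 1)."""
    lo, hi = 1e-9, 1.0 - 1e-9
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if a_criterion(m1, N, K, u, v) < a_criterion(m2, N, K, u, v):
            hi = m2  # the minimizer cannot lie to the right of m2
        else:
            lo = m1  # the minimizer cannot lie to the left of m1
    return 0.5 * (lo + hi)


def best_exact_design(N, K, u, v):
    """Return w* and the better of the exact designs adjacent to N * w*."""
    w_star = optimal_rate(N, K, u, v)
    # Clamp to 1..N-1 so that both treatment groups stay non-empty
    candidates = {
        min(max(math.floor(N * w_star), 1), N - 1),
        min(max(math.ceil(N * w_star), 1), N - 1),
    }
    n_best = min(candidates, key=lambda n: a_criterion(n / N, N, K, u, v))
    return w_star, n_best


# Illustrative values only (assumed): N = 4, K = 16, u = 0.1, v = 2
w_star, n_best = best_exact_design(N=4, K=16, u=0.1, v=2.0)
```

Any other one-dimensional minimizer would serve equally well here; the ternary search is chosen only because it needs no external library.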

Example 1
For illustrative purposes, we consider a numerical example with $K = 16$ clusters and $N = 4$ individuals in each cluster, corresponding, for example, to a phase II trial investigating the potentially heterogeneous treatment effect in a population that is subdivided by four different dichotomous biomarkers. Obviously, the resulting sample size per subgroup will be small, resulting in a high need for optimal designs. Figure 1 exhibits the behaviour of the optimal allocation rate $w^*$ to the treatment group in dependence on the variance ratio $v$ of the treatment effects for some fixed values 0.01, 0.1, 0.25, 0.5 and 1.5 of the variance ratio $u$ of the intercept. For reasons of presentation, we plot the optimal allocation rate $w^*$ against the rescaled variance ratio $r_v = v/(1 + v)$ in the spirit of intra-class correlation in order to cover all possible values of the treatment effects variance ratio by a finite interval $(0, 1)$, where $r_v \to 0$ and $r_v \to 1$ relate to $v \to 0$ and $v \to \infty$, respectively, and $r_v$ is monotonically increasing in $v$. The figure shows that for fixed values of $u$ the optimal allocation rate $w^*$ is equal to 0.5 for $v \to 0$ and increases with increasing values of the variance ratio $v$ of the treatment effects. The different lines associated with the different values of $u$ appear in descending order, which means that the optimal allocation rates decrease when the variance ratio $u$ of the intercepts gets larger.
The next figure (Fig. 2) shows the behaviour of the optimal allocation rate in dependence on the variance ratio $u$ of the intercepts for fixed values 0.01, 0.1, 0.2, 0.5 and 2 of the variance ratio $v$ of the treatment effects, where again the variance ratio is rescaled ($r_u = u/(u + 1)$). Here, too, it can be seen that the optimal allocation rate decreases with increasing values of $u$ and increases with increasing values of $v$.
Figures 3 and 4 present the efficiency of the equal allocation rate $w_0 = 0.5$, which is optimal in the fixed effects model ($u = v = 0$). The efficiency for the A-criterion (A-efficiency) has been computed using the common formula

$$\mathrm{eff}_A(w_0) = \frac{\Phi_{A,\alpha}(w^*)}{\Phi_{A,\alpha}(w_0)}.$$

Note that for the MV-criterion the MV-efficiency, which is defined analogously, coincides with the A-efficiency. The efficiencies decrease with increasing values of

Example 2
Extending the illustrative example, we further consider trials with $K = 16$ or $K = 32$ clusters and $N = 6$ or $N = 4$ individuals in each cluster, respectively. Figure 5 exhibits the behaviour of the optimal allocation rate $w^*$ to the treatment group in dependence on the variance ratio $q = v/u$ for some fixed values 0.01, 0.1, 0.25, 0.5 and 1.5 of the variance ratio $u$ of the intercept. For reasons of presentation, we again plot the optimal allocation rate $w^*$ against the rescaled variance ratio $r_q = q/(1 + q)$. As can be observed from the graphics, the dependence on the variance ratio $q$ is more pronounced for $K = 32$ and $N = 4$ than for $K = 16$ and $N = 6$. Figure 6 presents the efficiency of the equal allocation rate $w_0 = 0.5$, which is optimal in the fixed effects model. The efficiency is more sensitive with respect to the variance ratio in the case $K = 32$ and $N = 4$ than for $K = 16$ and $N = 6$.
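The efficiency of the balanced allocation can be approximated with a simple grid search. The Python sketch below assumes the A-criterion in the form 1/(N w (1 - w)) + (K - 1) v (N u + 1) / ((N u + 1)(N v w + 1) - N^2 u v w^2) and defines the efficiency as the ratio of the minimal criterion value to the value at w = 0.5; the function names, the grid size and the input values are hypothetical and not taken from the figures.

```python
def a_criterion(w, N, K, u, v):
    """A-criterion for the cluster treatment effects (sigma^2 neglected)."""
    fixed_part = 1.0 / (N * w * (1.0 - w))
    random_part = v * (N * u + 1.0) / (
        (N * u + 1.0) * (N * v * w + 1.0) - N**2 * u * v * w**2
    )
    return fixed_part + (K - 1) * random_part


def balanced_efficiency(N, K, u, v, grid=10_000):
    """A-efficiency of w0 = 0.5, with the optimum approximated on an equidistant grid."""
    best = min(a_criterion(k / grid, N, K, u, v) for k in range(1, grid))
    return best / a_criterion(0.5, N, K, u, v)


# Illustrative values only (assumed): N = 4, K = 16, u = 0.1, v = 2
eff = balanced_efficiency(N=4, K=16, u=0.1, v=2.0)
```

Since the grid contains w = 0.5, the returned value never exceeds 1, and it approaches 1 when the variance ratio v of the treatment effects tends to 0.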

Discussion
In the present paper, we focused on A- and MV-optimality for prediction of the cluster treatment effects $\boldsymbol{\alpha}$. Alternatively, one may be tempted to employ the D-optimality criterion $\Phi_{D,\alpha}$, which is defined as the log determinant of the MSE matrix of the prediction $\hat{\boldsymbol{\alpha}}$ of the cluster treatment effects $\boldsymbol{\alpha}$,

$$\Phi_{D,\alpha} = \log \det \mathrm{Cov}(\hat{\boldsymbol{\alpha}} - \boldsymbol{\alpha}).$$

The D-criterion is commonly used in the case of estimation of fixed effects and essentially measures the volume of a simultaneous confidence ellipsoid for the parameters of interest, taking correlations between the parameter estimates into account. In the present setting of prediction of cluster treatment effects, especially when considering treatment effects in patient subpopulations, stand-alone interpretations for the given clusters are targeted, and the simultaneous prediction ellipsoid corresponding to the D-criterion appears less relevant for the clinical interpretation. From a practical point of view, simultaneous prediction intervals based on a Bonferroni-like approach would better fit the intended application. The corresponding R-optimality criterion (see e.g. [1]) measures the volume of these multidimensional intervals and is defined as the product of the diagonal elements of the MSE matrix.
In the present situation of equal diagonal elements, the R-criterion $\Phi_{R,\alpha}$ is related to the A- and MV-criterion by $\Phi_{R,\alpha} = (\Phi_{MV,\alpha})^K = \left( \frac{1}{K} \Phi_{A,\alpha} \right)^K$, and hence, the A-optimal designs are also R-optimal.
As illustrated in the examples, the larger the between-unit (between-cluster) variability of the treatment effects, the more the optimal weight deviates from equal allocation, especially if the variance of the treatment effects is large compared to the variance of the intercepts of the units. An increasing heterogeneity in the treatment effect leads to a decreased precision of the design that is optimal for overall population parameters: A balanced design is far from optimal if the treatment effects vary strongly as compared to the residual error and more subjects should be recruited to the active (new) treatment in multi-cluster trials. Nevertheless, efficiency loss may still be limited as in the examples resulting in a total sample size increase of about 10-20% in the considered scenarios if individual predictions are foreseen with a less efficient balanced allocation. If between-unit variability of treatment effects is considered to be small, equal allocation may suffice. However, using the results given in the paper, specific settings with different expectations can be assessed properly, in