Optimal designs for comparing regression curves -- dependence within and between groups

We consider the problem of designing experiments for the comparison of two regression curves describing the relation between a predictor and a response in two groups, where the data within and between the groups may be dependent. In order to derive efficient designs we use results from stochastic analysis to identify the best linear unbiased estimator (BLUE) in a corresponding continuous time model. It is demonstrated that in general simultaneous estimation using the data from both groups yields more precise results than estimation of the parameters separately in the two groups. Using the BLUE from simultaneous estimation, we then construct an efficient linear estimator for finite sample size by minimizing the mean squared error between the optimal solution in the continuous time model and its discrete approximation with respect to the weights of the linear estimator. Finally, the optimal design points are determined by minimizing the maximal width of a simultaneous confidence band for the difference of the two regression functions. The advantages of the new approach are illustrated by means of a simulation study, where it is shown that the use of the optimal designs yields substantially narrower confidence bands than the application of uniform designs.


Introduction
The application of optimal or efficient designs can improve the accuracy of statistical analysis substantially, and meanwhile there exists a well established and powerful theory for the construction of (approximate) optimal designs for independent observations; see, for example, the monographs of Pukelsheim (2006) or Fedorov and Leonov (2013). In contrast, the determination of optimal designs for efficient statistical analysis from dependent data is more challenging because the corresponding optimization problems are in general not convex, and therefore the powerful tools of convex analysis are not applicable. Although design problems for correlated data have been discussed for a long time (see, for example, Ylvisaker, 1966, 1968; Bickel and Herzberg, 1979; Näther, 1985, who use asymptotic arguments to develop continuous but in general non-convex optimization problems in this context), a large part of the discussion is restricted to models with a small number of parameters, and we refer to Pázman and Müller (2001), Müller and Pázman (2003), Dette et al. (2008), Kiselak and Stehlík (2008), Harman and Štulajter (2010), Rodriguez-Diaz (2017), Campos-Barreiro and López-Fidalgo (2015) and Attia and Constantinescu (2020), among others. Recently, Dette et al. (2013) suggested a more systematic approach to the problem and determined (asymptotic) optimal designs for least squares estimation, under the additional assumption that the regression functions are eigenfunctions of an integral operator associated with the covariance kernel of the error process, which is used to describe the dependencies. For more general models it has been proposed to construct the optimal design and estimator simultaneously.
More precisely, they construct a class of estimators and corresponding optimal designs with a variance converging (as the sample size increases) to the optimal variance in the continuous time model. Dette et al. (2017a) propose an alternative strategy for this purpose. They start with the construction of the best linear unbiased estimator (BLUE) in the continuous time model using stochastic calculus and determine in a second step an implementable design which is "close" to the solution in the continuous time model. By this approach these authors are able to provide an easily implementable estimator with a corresponding design which is practically indistinguishable from the weighted least squares estimator (WLSE) with corresponding optimal design. Their results are applicable for a broad class of linear regression models with various covariance kernels and have recently been extended to the situation where also derivatives of the process can be observed (see Dette et al., 2019). Dette and Schorning (2016) and Dette et al. (2017b) propose designs for the comparison of regression curves from two independent samples, where the latter reference also allows for dependencies within the samples. Their work is motivated by applications in drug development, where a comparison between two regression models that describe the relation between a common response and the same covariates in two groups is used to establish the non-superiority of one model to the other, or to check whether the difference between two regression models can be neglected. For example, if the similarity between the two regression functions describing the dose response relationships in the individual groups has been established, subsequent inference in drug development can be based on the combined samples, such that a more efficient statistical analysis is possible on the basis of the larger population.
Because of its importance, several procedures for the comparison of curves have been investigated in linear and nonlinear models (see Liu et al., 2009; Gsteiger et al., 2011; Liu et al., 2011; Dette et al., 2018; Bretz et al., 2018; Möllenhoff et al., 2018, 2020). Designs minimizing the maximal width of a (simultaneous) confidence band for the difference between the regression curves calculated from two independent groups are determined by Dette and Schorning (2016) and Dette et al. (2017b), who also demonstrate that the use of these designs yields substantially narrower confidence bands. While these results refer to independent groups, it is the purpose of the present paper to investigate designs for the comparison of regression curves corresponding to two groups, where the data within the groups and between the groups may be dependent. It will be demonstrated that in most cases simultaneous estimation of the parameters in the regression models using the data from both groups yields more efficient inference than estimating the parameters in the models corresponding to the different groups separately. Moreover, the simultaneous estimation procedure can never be worse. While this property holds independently of the design under consideration, we subsequently construct efficient designs for the comparison of curves corresponding to not necessarily independent groups and demonstrate their superiority by means of a simulation study. The remaining part of this paper is organized as follows. In Section 2 we introduce the basics and the design problem. Section 3 is devoted to a continuous time model, which can be interpreted as a limiting experiment of the discrete model if the sample size converges to infinity. In this model we derive an explicit representation of the BLUE if estimation is performed simultaneously in both groups.
In Section 4 we develop a discrete approximation of the continuous BLUE by determining the optimal weights for the linear estimator. Finally, the optimal design points are determined such that the maximal width of the confidence band for the difference of the two regression functions is minimal. Section 5 is devoted to a small numerical comparison of the performance of the optimal designs with uniform designs. In particular, it is demonstrated that optimal designs yield substantially narrower confidence bands. In many cases the maximal width of a confidence band obtained from the uniform design is larger by a factor between 2 and 10 than the width obtained from the optimal design.

Simultaneous estimation of two regression models
Throughout this paper we consider the situation of two groups of observations Y_{1,1}, . . . , Y_{1,n} and Y_{2,1}, . . . , Y_{2,n} taken at time points t_1, . . . , t_n, where there may exist dependencies within and between the groups. We assume that the relation between the response and the covariate t in each group is described by a linear regression model of the form

(Y_{1,j}, Y_{2,j})^⊤ = (f_1^⊤(t_j) θ^(1), f_2^⊤(t_j) θ^(2))^⊤ + Σ^{1/2} ε(t_j), j = 1, . . . , n, (2.1)

where the matrix Σ^{1/2} and the error process ε = (ε_1, ε_2)^⊤ are specified below. Thus in each group n observations are taken at the same time points t_1, . . . , t_n, which can be chosen in a compact interval, say [a, b], and observations at different time points and in different groups might be dependent. The vectors of unknown parameters θ^(1) and θ^(2) are assumed to be p_1- and p_2-dimensional, respectively, with corresponding vectors of regression functions f_1(t) and f_2(t). To address the situation of correlation between the groups, we start with a very simple covariance structure for each group, but we emphasize that all results presented in this paper remain correct for more general covariance structures corresponding to Markov processes; see Remark 3.3 for more details. To be precise, let

E[ε_i(t_j)] = 0, Cov(ε_i(t_j), ε_i(t_k)) = min(t_j, t_k), i = 1, 2, (2.2)

denote the mean value and the covariance of the individual process ε_i at the points t_j and t_k, respectively; that is, ε_1 and ε_2 are independent standard Brownian motions. Let σ_1, σ_2 > 0, ̺ ∈ (−1, 1), and denote by Σ^{1/2} the square root of the covariance matrix

Σ = ((σ_1^2, ̺σ_1σ_2), (̺σ_1σ_2, σ_2^2)), (2.3)

where ε(t) = (ε_1(t), ε_2(t))^⊤. Note that ̺ ∈ (−1, 1) denotes the correlation between the observations Y_1(t_j) and Y_2(t_j) (j = 1, . . . , n), and that in general the correlation between Y_1(t_j) and Y_2(t_k) is given by ̺ min(t_j, t_k)/(t_j t_k)^{1/2}. Considering the two groups individually results in proper (for example weighted least squares) estimators of the parameters θ^(1) and θ^(2). However, this procedure ignores the correlation between the two groups, and estimating the parameters θ^(1) and θ^(2) simultaneously from the data of both groups might result in more precise estimates.
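The covariance structure just described can be sketched in a few lines of code; the following is our own illustration (the values of σ_1, σ_2, ̺, the time grid and the Monte Carlo sizes are arbitrary choices, not from the paper). Two independent Brownian motions are coupled through the square root Σ^{1/2}, and the empirical correlation between the two groups at a common time point reproduces ̺.

```python
import numpy as np

# Sketch: simulate the error structure eta(t) = Sigma^{1/2} eps(t), where
# eps_1, eps_2 are independent standard Brownian motions, so that
# Cov(eta(t_j), eta(t_k)) = min(t_j, t_k) * Sigma.
rng = np.random.default_rng(0)

sigma1, sigma2, rho = 1.0, 1.5, 0.5
Sigma = np.array([[sigma1**2, rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2**2]])
# symmetric square root of Sigma via eigendecomposition
w, V = np.linalg.eigh(Sigma)
Sigma_half = V @ np.diag(np.sqrt(w)) @ V.T

t = np.linspace(1.0, 10.0, 50)          # design interval [a, b] = [1, 10]
n_rep = 20000
dt = np.diff(np.concatenate(([0.0], t)))
# independent Brownian increments for both components, all replications
dW = rng.standard_normal((n_rep, 2, t.size)) * np.sqrt(dt)
eps = np.cumsum(dW, axis=2)             # two independent Brownian motions
eta = np.einsum('ij,njt->nit', Sigma_half, eps)  # correlated group errors

# empirical correlation between the two groups at the same time point
emp_rho = np.corrcoef(eta[:, 0, -1], eta[:, 1, -1])[0, 1]
```

The empirical correlation at a common time point is close to ̺, while the correlation between different time points decays like min(t_j, t_k)/(t_j t_k)^{1/2}, as stated above.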
In order to define estimators for the parameters θ^(1) and θ^(2) using the information from both groups, we now consider a more general two-dimensional regression model, which on the one hand contains the situation described in the previous paragraph as a special case, but on the other hand also allows us to consider the case where some of the components of θ^(1) and θ^(2) coincide; see Example 2.2 and Section 3.3 for details. To be precise, we define the regression model

Y(t) = (Y_1(t), Y_2(t))^⊤ = F^⊤(t) θ + Σ^{1/2} ε(t). (2.5)

In model (2.5) the vector θ = (ϑ_1, . . . , ϑ_p)^⊤ is a p-dimensional parameter and

F(t) = (F_{i,j}(t))_{i=1,2; j=1,...,p}^⊤ (2.6)

denotes a (p × 2) matrix containing continuously differentiable regression functions, where the two-dimensional functions (F_{1,1}(t), F_{2,1}(t))^⊤, . . . , (F_{1,p}(t), F_{2,p}(t))^⊤ are assumed to be linearly independent.
Example 2.1 The individual models defined in (2.1) are contained in this two-dimensional model. More precisely, defining the p = (p_1 + p_2)-dimensional vector of parameters θ by θ = ((θ^(1))^⊤, (θ^(2))^⊤)^⊤ and the regression function F^⊤(t) in (2.6) by the rows

F^⊤(t) = ((f_1^⊤(t), 0_{1×p_2}), (0_{1×p_1}, f_2^⊤(t))),

it follows that model (2.5) coincides with model (2.1). Moreover, this composite model takes the correlation between the groups into account. In this case the models describing the relation between the variable t and the responses Y_1(t) and Y_2(t) do not share any parameters.
Example 2.2 In this example we consider the situation where some of the parameters of the individual models in (2.1) coincide. This situation occurs, for example, if Y_1(t) and Y_2(t) represent clinical parameters (depending on time) before and after treatment, where it can be assumed that the effect at time a coincides before and after the treatment. In this case a reasonable model for the average effect in the two groups enforces this common value in both groups. More generally, we consider the situation where the vectors of the parameters are given by

θ^(1) = ((θ^(0))^⊤, (θ̃^(1))^⊤)^⊤, θ^(2) = ((θ^(0))^⊤, (θ̃^(2))^⊤)^⊤,

where θ^(0) ∈ R^{p_0} denotes the vector of common parameters in both models and the vectors θ̃^(1) ∈ R^{p_1−p_0} and θ̃^(2) ∈ R^{p_2−p_0} contain the different parameters in the two individual models. The corresponding regression functions are given by

f_1^⊤(t) = (f_0^⊤(t), f̃_1^⊤(t)), f_2^⊤(t) = (f_0^⊤(t), f̃_2^⊤(t)), (2.7)

where the vector f_0^⊤(t) contains the regression functions corresponding to the common parameters in the two models, and f̃_1^⊤(t) and f̃_2^⊤(t) denote the vectors of regression functions corresponding to the different parameters θ̃^(1) and θ̃^(2), respectively. Defining the p = (p_1 + p_2 − p_0)-dimensional vector of parameters θ by θ = ((θ^(0))^⊤, (θ̃^(1))^⊤, (θ̃^(2))^⊤)^⊤ and the regression function F^⊤(t) in (2.6) by the rows accordingly, it follows that the model (2.5) contains the individual models in (2.1), where the regression functions are given by (2.7) and the parameters θ^(1) and θ^(2) share the parameter θ^(0). Moreover, this composite model takes the potential correlation between the groups into account.
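The block structure of F^⊤(t) with shared parameters can be illustrated as follows; the concrete regression functions below (a common intercept, a group-specific linear term and group-specific linear and quadratic terms) are hypothetical choices for illustration only.

```python
import numpy as np

# Sketch of the block structure of Example 2.2 (shared parameters).
# Hypothetical regression functions, only for illustration:
#   common part      f0(t)  = (1,)        -> shared parameter theta^(0)
#   group 1 extra    f1t(t) = (t,)        -> parameter tilde-theta^(1)
#   group 2 extra    f2t(t) = (t, t**2)   -> parameter tilde-theta^(2)
def F_T(t):
    """2 x p matrix F^T(t) with p = p0 + (p1 - p0) + (p2 - p0) = 4."""
    f0 = np.array([1.0])
    f1t = np.array([t])
    f2t = np.array([t, t**2])
    row1 = np.concatenate([f0, f1t, np.zeros_like(f2t)])  # group 1
    row2 = np.concatenate([f0, np.zeros_like(f1t), f2t])  # group 2
    return np.vstack([row1, row2])

theta = np.array([1.0, 2.0, 3.0, 4.0])  # (theta0, tilde1, tilde2a, tilde2b)
mean = F_T(2.0) @ theta                 # mean responses of both groups at t = 2
```

Both rows pick up the same first entry of θ, so the two groups share the parameter θ^(0) exactly as in the composite model (2.5).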

Continuous time models
It was demonstrated by Dette et al. (2017a) that efficient designs for dependent data in regression problems can be derived by first considering the estimation problem in a continuous time model. In this model there is no optimal design problem, as the data can be observed over the full interval [a, b]. However, efficient designs can be determined in two steps. First, one derives the best linear unbiased estimator (BLUE) in the continuous time model and, secondly, one determines design points (and an estimator) such that the resulting estimator from the discrete data provides a good approximation of the optimal solution in the continuous time model. In this paper we use this strategy to develop optimal designs for the comparison of regression curves from two (possibly) dependent groups. In the present section we discuss a continuous time model corresponding to the discrete model (2.5), while the second step, the determination of an "optimal" approximation, is postponed to Section 4.

Best linear unbiased estimation
To be precise, we consider the continuous time version of the linear regression model in (2.5), that is

Y(t) = F^⊤(t) θ + Σ^{1/2} ε(t), t ∈ [a, b], (3.1)

where we assume 0 < a < b and that the full trajectory of the process {Y(t) | t ∈ [a, b]} can be observed. Here {ε(t) = (ε_1(t), ε_2(t))^⊤ | t ∈ [a, b]} is a vector of independent Brownian motions as defined in (2.2), and the matrix Σ^{1/2} is the square root of the covariance matrix Σ defined in (2.3). Note that we restrict ourselves to an interval on the positive line, because in this case the notation is slightly simpler. But we emphasize that the theory developed in this section can also be applied for a = 0; see Remark 3.1 for more details. We further assume that the (p × p)-matrix M defined in (3.2) is non-singular.
Theorem 3.1 Consider the continuous time linear regression model (3.1) on the interval [a, b], a > 0, with a continuously differentiable matrix of regression functions F, a vector {ε(t) = (ε_1(t), ε_2(t))^⊤ | t ∈ [a, b]} of independent Brownian motions and a covariance matrix Σ defined by (2.3). The best linear unbiased estimator of the parameter θ is given by the stochastic integral in (3.3), and the minimum variance is given by Cov(θ̂_BLUE) = M^{−1}, where the matrix M is defined in (3.2).
Proof. Multiplying Y by the matrix Σ^{−1/2} yields the transformed regression model (3.5), where Σ^{−1/2} is the inverse of Σ^{1/2}, the square root of the covariance matrix Σ defined in (2.3). Note that the components of the vector Ỹ are independent, and consequently the joint likelihood function can be obtained as the product over the individual components. Next we rewrite the components of the continuous time model (3.5) in terms of the two stochastic differential equations (3.6) and (3.7), where 1_A is the indicator function of the set A. To obtain the joint density of the processes defined by (3.6) and (3.7) it is therefore sufficient to derive the individual densities. It follows from Theorem 1 in Appendix II of Ibragimov and Has'minskii (1981) that these densities exist and, because of independence, the joint density is given by their product. As the processes Ỹ_1 and Ỹ_2 are independent by construction, the maximum likelihood estimator in model (3.1) can be determined by solving the corresponding likelihood equation with respect to θ. The solution coincides with the linear estimator defined in (3.3), and a straightforward calculation, using Ito's formula and the fact that the random variables ∫_a^b Ḟ(t) dε_t and ε_a are independent, gives Cov(θ̂_BLUE) = M^{−1}, where the matrix M is defined in (3.2). Since the covariance matrix M^{−1} is the inverse of the information matrix in the continuous time regression model (3.1) (see Ibragimov and Has'minskii, 1981, p. 81), the linear estimator (3.3) is the BLUE, which completes the proof of Theorem 3.1.
Remark 3.1 In the case a = 0 the process can also be evaluated at the point t = 0, where the error vanishes. This yields the deterministic equation Y(0) = F^⊤(0) θ (3.8), and we have to distinguish three cases.
(1) If the regression function F satisfies F(0) = 0_{p×2} (that is, rank(F(0)) = 0), the deterministic equation (3.8) does not contain any further information about the parameter θ, and the maximum likelihood estimator in model (3.1) is again of the form (3.3), where the minimum variance is given by M^{−1}.
(2) If the rank of the matrix F(0) satisfies rank(F(0)) = 1, the deterministic equation (3.8) contains one informative equation about θ. In that case we assume without loss of generality that F_{1,1}(0) ≠ 0, and it follows from (3.8) that θ_1 can be expressed in terms of θ_2, . . . , θ_p as in (3.9). Using (3.9) in combination with model (3.1), we obtain a reduced model (3.10), where the matrix valued function F̃(t) is defined in (3.11) and the reduced (p−1)-dimensional parameter θ̃ is given by θ̃ = (θ_2, . . . , θ_p)^⊤. It follows from rank(F(0)) = 1 that the matrix valued function F̃(t) defined in (3.11) satisfies F̃^⊤(0) = 0_{2×(p−1)}. Consequently, the modified model (3.10) satisfies the condition of case (1), and the best linear unbiased estimator for the reduced parameter θ̃ is obtained by (3.12), where the process {Z(t); t ∈ [0, b]} is defined by (3.10) and the matrix F̃(t) is given by (3.11); the minimum variance follows analogously. The best linear unbiased estimator for the remaining parameter θ_1 is then obtained from (3.9).
(3) If the rank of the matrix F(0) satisfies rank(F(0)) = 2, equation (3.8) contains two informative equations about θ. Let A(t) be the submatrix (3.13) which contains the first two columns of F^⊤(t). Without loss of generality we assume that A(0) is non-singular (as rank(F(0)) = 2). Then it follows from (3.8) that (θ_1, θ_2)^⊤ can be expressed in terms of θ_3, . . . , θ_p as in (3.14). Using (3.14) in combination with (3.1), we obtain a reduced model (3.15), where the matrix valued function A(t) is given by (3.13), the matrix valued function F̃(t) is defined in (3.16), and the reduced (p − 2)-dimensional parameter θ̃ is given by θ̃ = (θ_3, . . . , θ_p)^⊤. The matrix valued function F̃(t) defined in (3.16) satisfies F̃^⊤(0) = 0_{2×(p−2)}.
Consequently, the modified model (3.15) satisfies the condition of case (1), and the best linear unbiased estimator θ̃_BLUE for the reduced (p − 2)-dimensional parameter θ̃ is obtained by (3.12), using the process {Z(t); t ∈ [0, b]} defined by (3.15) and the matrix valued function F̃(t) given by (3.16). The best linear unbiased estimator for the remaining parameter (θ_1, θ_2)^⊤ is then obtained from (3.14).

Model with no common parameters
Recall the definition of model (2.1) in Section 2. It was demonstrated in Example 2.1 that this case is a special case of model (2.5), where the matrix F^⊤ is given by the block diagonal structure in (3.17) and θ = ((θ^(1))^⊤, (θ^(2))^⊤)^⊤. Considering both components of the vector Y separately, we obtain continuous versions of the two models introduced in (2.1), that is, the models in (3.18) with independent Brownian motions and the matrix Σ defined by (2.3). An application of Theorem 3.1 yields the following result (Corollary 3.1): the best linear unbiased estimator for the parameter θ is given by (3.19), and the minimum variance is given by M^{−1}, where M is the matrix defined in (3.20). It is of interest to compare the estimator (3.19) with the estimator θ̂_mar = ((θ̂^(1)_mar)^⊤, (θ̂^(2)_mar)^⊤)^⊤, which is obtained by estimating the parameters in the two models in (3.18) separately. It follows from Theorem 2.1 in Dette et al. (2017a) that the best linear unbiased estimators in these models are given by (3.21), where the corresponding matrices are defined in (3.22). Moreover, the covariance matrices of the estimators θ̂^(1)_mar and θ̂^(2)_mar are the inverses of the Fisher information matrices in the individual models. The following result compares the variances of the two estimators (3.19) and (3.21).
Theorem 3.2 If the assumptions of Corollary 3.1 are satisfied, we have (with respect to the Loewner ordering)

Cov(θ̂^(ℓ)_BLUE) ≤ Cov(θ̂^(ℓ)_mar), ℓ = 1, 2, (3.23)

where θ̂^(ℓ)_BLUE and θ̂^(ℓ)_mar are the best linear unbiased estimators of the parameter θ^(ℓ) obtained by simultaneous estimation (see (3.19)) and separate estimation in the two groups (see (3.21)), respectively.
Proof. Without loss of generality we consider the case ℓ = 1; the proof for the index ℓ = 2 is obtained by the same arguments. Let K_1^⊤ = (I_{p_1}, 0_{p_1×p_2}) be a p_1 × (p_1 + p_2)-matrix, where I_{p_1} and 0_{p_1×p_2} denote the p_1-dimensional identity matrix and a (p_1 × p_2)-matrix filled with zeros. Then Cov(θ̂^(1)_BLUE) = K_1^⊤ M^{−1} K_1 = (C_{K_1}(M))^{−1}, where C_{K_1}(M) = M_{11} − M_{12} M_{22}^{−1} M_{21} is the Schur complement of the block M_{22} of the information matrix M (see p. 74 in Pukelsheim, 2006). Observing (3.22), we now compare C_{K_1}(M) and (1/σ_1^2) M_{11} and obtain a representation of their difference in terms of a matrix M̃. Note that the matrix M̃ is nonnegative definite. An application of Lemma 3.12 of Pukelsheim (2006) shows that the corresponding Schur complement is also nonnegative definite with respect to the Loewner ordering. Observing (3.24), we have C_{K_1}(M) ≥ (1/σ_1^2) M_{11}, and the statement of the theorem follows.
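The block-inverse identity and the nonnegative definiteness of the Schur complement used in this proof are easy to check numerically; the following is a small sketch of our own (with a randomly generated positive definite matrix, not the paper's information matrix).

```python
import numpy as np

# Numerical check of the block-inverse identity used in the proof:
# for positive definite M = [[M11, M12], [M21, M22]], the upper-left
# block of M^{-1} equals the inverse of the Schur complement
#   C_K1(M) = M11 - M12 M22^{-1} M21.
rng = np.random.default_rng(1)
p1, p2 = 2, 3
A = rng.standard_normal((p1 + p2, p1 + p2))
M = A @ A.T + (p1 + p2) * np.eye(p1 + p2)   # positive definite

M11, M12 = M[:p1, :p1], M[:p1, p1:]
M21, M22 = M[p1:, :p1], M[p1:, p1:]
schur = M11 - M12 @ np.linalg.inv(M22) @ M21

Minv_11 = np.linalg.inv(M)[:p1, :p1]

# M11 - C_K1(M) = M12 M22^{-1} M21 is nonnegative definite, so the Schur
# complement is never larger than M11 in the Loewner ordering
# (cf. Lemma 3.12 in Pukelsheim, 2006).
eigs = np.linalg.eigvalsh(M11 - schur)
```

The identity explains why Cov(θ̂^(1)_BLUE), the upper-left block of M^{−1}, is the inverse of the Schur complement.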
Remark 3.2 If ̺ = 0, we have C_{K_1}(M) = (1/σ_1^2) M_{11}, and it follows from (3.23) that separate estimation in the individual groups does not yield less precise estimates, that is Cov(θ̂^(ℓ)_mar) = Cov(θ̂^(ℓ)_BLUE), ℓ = 1, 2. Moreover, the inequality in Theorem 3.2 is strict in most cases, which means that simultaneous estimation of the parameters θ^(1) and θ^(2) yields more precise estimators. A necessary condition for strict inequality (i.e. positive definiteness of the matrix Cov(θ̂^(ℓ)_mar) − Cov(θ̂^(ℓ)_BLUE)) is the condition ̺ ≠ 0. The following result shows that this condition is not sufficient. It considers the important case where the regression functions f_1 and f_2 in (3.17) are the same and shows that in this case the two estimators θ̂_BLUE and θ̂_mar coincide.
Corollary 3.2 If the assumptions of Corollary 3.1 hold and additionally the regression functions in model (2.5) satisfy f_1 = f_2, the best linear unbiased estimator for the parameter θ has the block structure determined by I_2 ⊗ M_{11}^{−1}, where I_2 denotes the 2 × 2 identity matrix and the matrix M_{11} is defined by (3.27). Moreover, the minimum variance is given by Cov(θ̂_BLUE) = Σ ⊗ M_{11}^{−1}, and the estimators θ̂_BLUE and θ̂_mar coincide.

Models with common parameters
Recall the definition of model (2.1) in Section 2. It was demonstrated in Example 2.2 that this case is a special case of model (2.5), where the matrix of regression functions is given by (3.25) and the vector of parameters is defined by θ = ((θ^(0))^⊤, (θ̃^(1))^⊤, (θ̃^(2))^⊤)^⊤. An application of Theorem 3.1 yields the BLUE in model (2.5) with the matrix F^⊤ defined by (3.25).

Corollary 3.3 Consider the continuous time linear regression model (2.5) on the interval [a, b], where the matrix of regression functions F^⊤ defined in (3.25) is continuously differentiable. The best linear unbiased estimator for the parameter θ is given by (3.26), and the minimum variance is given by the inverse M^{−1} of the corresponding information matrix, whose individual blocks correspond to the common and the group-specific parameters.

Remark 3.3
The results presented so far have been derived for the case where the error process {ε(t) = (ε_1(t), ε_2(t))^⊤ | t ∈ [a, b]} in (2.5) consists of two independent Brownian motions. This assumption has been made to simplify the notation. Similar results can be obtained for Markov processes, and in this remark we indicate the essential arguments. To be precise, assume that the error process {ε(t) = (ε_1(t), ε_2(t))^⊤ | t ∈ [a, b]} in model (2.5) consists of two independent centered Gaussian processes with continuous covariance kernel given by

K(s, t) = u(min(s, t)) v(max(s, t)), (3.28)

where u(·) and v(·) are functions defined on the interval [a, b] such that the function q(·) = u(·)/v(·) is positive and strictly increasing. Kernels of the form (3.28) are called triangular kernels, and a famous result in Doob (1949) essentially shows that a Gaussian process is a Markov process if and only if its covariance kernel is triangular (see also Mehr and McFadden, 1965). In this case model (2.5) can be transformed into a model with an error process consisting of two independent Brownian motions. More precisely, define ε_i(t) = v(t) ε̃_i(q(t)), i = 1, 2, where {ε̃(t) = (ε̃_1(t), ε̃_2(t))^⊤ | t ∈ [ã, b̃]} consists of two independent Brownian motions on the interval [ã, b̃] = [q(a), q(b)]. It then follows from Doob (1949) that the process {ε(t) = (ε_1(t), ε_2(t))^⊤ | t ∈ [a, b]} consists of two independent centered Gaussian processes on the interval [a, b] with covariance kernel (3.28). Consequently, if we consider the correspondingly transformed model (3.29) on the interval [ã, b̃], the results obtained so far are applicable. Thus, a "good" estimator for the parameter θ in model (3.29) is also a "good" estimator for the parameter θ in model (3.1) with an error process consisting of two Gaussian processes with covariance kernel (3.28).
Consequently, we can derive the optimal estimator for the parameter θ in the continuous time model (3.1) with covariance kernel (3.28) from the best linear unbiased estimator in the model (3.29) with Brownian motions by an application of Theorem 3.1. The resulting best linear unbiased estimator for θ in model (3.1) with triangular kernel (3.28) is obtained by transforming the estimator back to the original time scale, and its minimum variance is given by the inverse of the information matrix of the transformed model.
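As a concrete illustration of the triangular-kernel transformation (our own example, not taken from the paper), consider the exponential (Ornstein-Uhlenbeck) kernel K(s, t) = exp(−|s − t|), which is triangular with u(t) = e^t, v(t) = e^{−t} and q(t) = e^{2t}. The following sketch verifies numerically that the transformed process v(t) B(q(t)) has exactly this kernel.

```python
import numpy as np

# Sketch of the Doob transformation of Remark 3.3 for the exponential
# (Ornstein-Uhlenbeck) kernel K(s, t) = exp(-|s - t|), which is triangular
# with u(t) = exp(t), v(t) = exp(-t) and q(t) = u(t)/v(t) = exp(2t).
u = lambda t: np.exp(t)
v = lambda t: np.exp(-t)
q = lambda t: u(t) / v(t)

a, b = 1.0, 10.0
s_grid = np.linspace(a, b, 40)

# K(s, t) = u(min(s, t)) v(max(s, t)) must coincide with the covariance of
# the transformed process eps(t) = v(t) * B(q(t)), B a Brownian motion:
# Cov(eps(s), eps(t)) = v(s) v(t) min(q(s), q(t)).
S, T = np.meshgrid(s_grid, s_grid)
K_triangular = u(np.minimum(S, T)) * v(np.maximum(S, T))
K_transformed = v(S) * v(T) * np.minimum(q(S), q(T))
max_dev = np.abs(K_triangular - np.exp(-np.abs(S - T))).max()
```

Here q is positive and strictly increasing on [a, b], so the time change t ↦ q(t) maps the Markovian error process to a Brownian motion, exactly as used above.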

Optimal designs for comparing curves
In this section we will derive optimal designs for comparing curves. The first part is devoted to a discretization of the BLUE in the continuous time model. In the second part we develop an optimality criterion to obtain efficient designs for the comparison of curves based on the discretized estimators.

From the continuous to the discrete model
To obtain a discrete design for n observations at the points a = t_1, . . . , t_n = b from the continuous solution derived in Section 3, we use a similar approach as in Dette et al. (2017a) and construct a discrete approximation of the stochastic integral in (3.3). For this purpose we consider the linear estimator θ̂_n defined in (4.1), where a = t_1 < t_2 < . . . < t_{n−1} < t_n = b, Ω_2, . . . , Ω_n are p × p weight matrices and Φ_2 = Ω_2 Ḟ(t_1), . . . , Φ_n = Ω_n Ḟ(t_{n−1}) are p × 2 matrices, which have to be chosen in a reasonable way. The matrix M^{−1} is given in (3.4). To determine these weights in an "optimal" way we first derive a representation of the mean squared error between the best linear estimator (3.3) in the continuous time model and its discrete approximation (4.1); this representation is a direct consequence of Ito's formula.
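The basic discretization idea, replacing a stochastic integral by a weighted sum of increments of the observed process, can be sketched as follows. This is a minimal illustration with a scalar integrand of our own choosing; by Ito's isometry the variance of the integral is the integral of the squared integrand, which the discrete sum reproduces.

```python
import numpy as np

# Sketch of the discretization idea: a stochastic integral such as the one
# in (3.3) is approximated by a weighted sum of increments of the observed
# process, int_a^b g(s) dW_s ~ sum_i g(t_{i-1}) (W(t_i) - W(t_{i-1})).
# By Ito's isometry its variance is int_a^b g(s)^2 ds, checked empirically
# here for g(s) = s on [1, 10].
rng = np.random.default_rng(2)
a, b, n = 1.0, 10.0, 500
t = np.linspace(a, b, n)
g = t[:-1]                                  # left-endpoint values of g(s) = s

n_rep = 20000
dW = rng.standard_normal((n_rep, n - 1)) * np.sqrt(np.diff(t))
approx = dW @ g                             # one discrete Ito sum per replication

emp_var = approx.var()
exact_var = (b**3 - a**3) / 3.0             # int_1^10 s^2 ds = 333
```

In the estimator (4.1) the scalar weights g(t_{i−1}) are replaced by the matrix weights Φ_i, which are then optimized in the next step.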
In the following we choose optimal p × 2 matrices Φ_i = Ω_i Ḟ(t_{i−1}) and design points t_2, . . . , t_{n−1} (t_1 = a, t_n = b) such that the linear estimator (4.1) is unbiased and the mean squared error matrix in (4.2) "becomes small". An alternative criterion is the mean squared error between the estimator θ̂_n defined in (4.1) and the "true" vector of parameters θ. The following result shows that in the class of unbiased estimators both optimization problems yield the same solution. The proof is similar to the proof of Theorem 3.1 in Dette et al. (2017a).
Theorem 4.1 Assume that the estimator θ̂_n in (4.1) is unbiased, that is, condition (4.3) is satisfied. Then, for any linear unbiased estimator of the form θ̂_n = ∫_a^b G(s) dY_s, we have the decomposition

Cov(θ̂_n) = M^{−1} + E_θ[(θ̂_BLUE − θ̂_n)(θ̂_BLUE − θ̂_n)^⊤].

In order to describe a solution in terms of optimal "weights" Φ*_i and design points t*_i, we recall that the condition of unbiasedness of the estimator θ̂_n in (4.1) is given by (4.3), and we introduce the matrices A_2, . . . , A_n by the notation (4.4).
It follows from Lemma 4.1 and Theorem 4.1 that for an unbiased estimator θ̂_n of the form (4.1) the mean squared error has the representation (4.5), which has to be "minimized" subject to the constraint (4.6). The following result shows that a minimization with respect to the weights Φ_i (or equivalently A_i) can actually be carried out with respect to the Loewner ordering.
Theorem 4.2 Assume that the assumptions of Theorem 3.1 are satisfied and that the matrix B defined in (4.7) is non-singular. Let Φ*_2, . . . , Φ*_n denote (p × 2)-matrices satisfying the equations (4.8). Then the corresponding estimator θ̂*_n minimizes the mean squared error with respect to the Loewner ordering among all unbiased estimators of the form (4.1), and its variance is minimal in this class.
Proof. Let v denote a p-dimensional vector and consider the problem of minimizing the criterion G_v defined in (4.9) subject to the constraint (4.6). Observing (4.5), this yields the Lagrange function (4.10), where A_2, . . . , A_n are (p × 2)-matrices and Λ = (λ_{k,ℓ})_{k,ℓ=1}^p is a (p × p)-matrix of Lagrange multipliers. The function G_v is convex with respect to A_2, . . . , A_n. Therefore, taking derivatives with respect to A_j yields necessary and sufficient conditions for the extremum (here we use matrix differential calculus), j = 2, . . . , n.
Rewriting this system of linear equations in (p × 2)-matrix form, substituting the expression into (4.6) and using the non-singularity of the matrices M and B yields equation (4.11) for the matrix of Lagrange multipliers. Note that one solution of (4.11) does not depend on the vector v. Therefore, the corresponding tuple of matrices (A*_2, . . . , A*_n) minimizes the convex function G_v in (4.10) for all v ∈ R^p. Observing the notation in (4.4) shows that the optimal matrix weights are given by (4.8). Moreover, the weights in (4.8) do not depend on the vector v either and provide a simultaneous minimizer of the criterion defined in (4.9) for all v ∈ R^p. Consequently, the weights defined in (4.8) minimize E_θ[(θ̂_BLUE − θ̂_n)(θ̂_BLUE − θ̂_n)^⊤] under the unbiasedness constraint (4.6) with respect to the Loewner ordering.
Remark 4.1 If the matrix B in Theorem 4.2 is singular, the optimal weights are not uniquely determined, and we propose to replace the inverse B^{−1} by the Moore-Penrose inverse of B.
Note that for fixed design points t_1, . . . , t_n, Theorem 4.2 yields universally optimal weights Φ*_2, . . . , Φ*_n (with respect to the Loewner ordering) for estimators of the form (4.1) satisfying (4.3). On the other hand, a further optimization of the choice of the points t_2, . . . , t_{n−1} (t_1 = a, t_n = b) with respect to the Loewner ordering is not possible, and we have to apply a real valued optimality criterion for this purpose. In the following section we derive such a criterion, which explicitly addresses the comparison of the regression curves from the two groups introduced in Section 2.

Confidence bands
We return to the practical scenario of the two groups introduced in (2.1), where we now focus on the comparison of these groups on the interval [a, b]. More precisely, consider the model introduced in (2.5) and let θ̂*_n be the estimator (4.1) with optimal weights defined by (4.8) from n observations taken at the time points a = t_1 < t_2 < . . . < t_{n−1} < t_n = b. This estimator is normally distributed with mean E[θ̂*_n] = θ and a covariance matrix determined by the matrices M, M_0 and B given by (3.2), (4.6) and (4.7), respectively. Note that the covariance matrix depends on the time points t_1, . . . , t_n through the matrix B^{−1}. Moreover, using the estimator θ̂*_n, the variance of the predicted difference of the two curves at a fixed time point t ∈ [a, b] is given by

h(t; t_1, . . . , t_n) = (1, −1) F^⊤(t) Cov(θ̂*_n) F(t) (1, −1)^⊤.

We now use this result and the results of Gsteiger et al. (2011) to obtain a simultaneous confidence band for the difference of the two curves. More precisely, if the interval [a, b] is the range where the two curves should be compared, the simultaneous confidence band is defined as follows. Consider the statistic

T̂ = sup_{t ∈ [a,b]} |(1, −1) F^⊤(t) (θ̂*_n − θ)| / {h(t; t_1, . . . , t_n)}^{1/2},

and define D as the (1 − α)-quantile of the corresponding distribution. Note that Gsteiger et al. (2011) propose the parametric bootstrap for choosing the critical value D. Define

u(t; t_1, . . . , t_n) = (1, −1) F^⊤(t) θ̂*_n + D · {h(t; t_1, . . . , t_n)}^{1/2},
l(t; t_1, . . . , t_n) = (1, −1) F^⊤(t) θ̂*_n − D · {h(t; t_1, . . . , t_n)}^{1/2};

then the confidence band for the difference of the two regression functions is defined by (4.12). Consequently, good time points t_1 = a < t_2 < . . . < t_{n−1} < t_n = b should minimize the width u(t; t_1, . . . , t_n) − l(t; t_1, . . . , t_n) = 2 · D · {h(t; t_1, . . . , t_n)}^{1/2} of this band at each t ∈ [a, b]. As this is only possible in rare circumstances, we propose to minimize an L_p-norm of the function h(·; t_1, . . . , t_n) as a design criterion, that is

Φ_p(t_1, . . . , t_n) = ‖h(·; t_1, . . . , t_n)‖_p, (4.13)

where the case p = ∞ corresponds to the maximal deviation ‖h(·; t_1, . . . , t_n)‖_∞ = sup_{t ∈ [a,b]} |h(t; t_1, . . . , t_n)|.
Finally, the optimal points a = t * 1 < t * 2 < . . . < t * n = b (minimizing (4.13)) and the corresponding weights derived in Theorem 4.2 provide the optimal linear unbiased estimator of the form (4.1) (with the corresponding optimal design).
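The evaluation of the criterion Φ_∞ can be sketched as follows; the regression functions and the covariance matrix below are hypothetical stand-ins (the paper computes Cov(θ̂*_n) from the optimal weights), chosen only to show how h(t) and its maximum are evaluated.

```python
import numpy as np

# Sketch of the design criterion: for a given covariance matrix of the
# parameter estimator, h(t) is the variance of the plugged-in difference
# (1, -1) F^T(t) theta-hat, and Phi_infty is its maximum over [a, b].
def F_T(t):
    # no common parameters: block-diagonal structure as in (3.17),
    # with (hypothetical) linear models f_1(t) = f_2(t) = (1, t)^T
    return np.array([[1.0, t, 0.0, 0.0],
                     [0.0, 0.0, 1.0, t]])

def h(t, cov):
    c = F_T(t).T @ np.array([1.0, -1.0])    # vector F(t) (1, -1)^T
    return c @ cov @ c

def phi_inf(cov, a=1.0, b=10.0, n_grid=200):
    grid = np.linspace(a, b, n_grid)
    return max(h(t, cov) for t in grid)

cov = 0.01 * np.eye(4)                      # hypothetical Cov(theta-hat*_n)
width = phi_inf(cov)
```

In the actual design problem the covariance matrix, and hence h, depends on the design points through B^{−1}, so `phi_inf` becomes a function of t_2, . . . , t_{n−1} that is minimized numerically.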
Example 4.1 We now conclude this section by considering the cases of no common and common parameters, respectively.
(a) If we are in the situation of Example 2.1 (no common parameters), the regression function F^⊤(t) is of the block diagonal form (3.17), and the variance of the prediction of the difference at a fixed point t ∈ [a, b] reduces to

h(t; t_1, . . . , t_n) = (f_1^⊤(t), −f_2^⊤(t)) Cov(θ̂*_n) (f_1^⊤(t), −f_2^⊤(t))^⊤.

The corresponding design criterion is given by (4.14).
(b) If we are in the situation of Example 2.2 (common parameters), the regression function F^⊤(t) is given by (3.25), and the variance of the prediction of the difference at a fixed point t ∈ [a, b] reduces to

h(t; t_1, . . . , t_n) = (0, f̃_1^⊤(t), −f̃_2^⊤(t)) Cov(θ̂*_n) (0, f̃_1^⊤(t), −f̃_2^⊤(t))^⊤,

as the common part f_0^⊤(t) θ^(0) cancels in the difference. The corresponding design criterion is defined analogously to (4.14).

Numerical Examples
In this section the methodology is illustrated by means of a simulation study. To be precise, we consider the regression model (2.5), where the matrix F(t) is given by (3.17), corresponding to the case that the regression functions do not share common parameters; see Section 3.2 for more details. In this case the bounds u(t; t 1 , . . . , t n ) and l(t; t 1 , . . . , t n ) of the confidence band (4.12) are obtained from the estimator θ * n = ((θ * (1) n ) ⊤ , (θ * (2) n ) ⊤ ) ⊤ in (4.1) with optimal weights defined in (4.8). The design space is the interval [a, b] = [1, 10], and we consider three choices f A , f B and f C for the functions f 1 and f 2 in the matrix (3.17); see (4.15). To model the dependence between the two groups we use the covariance matrix in (2.5), where the correlations are chosen as ̺ = 0.2, 0.5, 0.7. Following the discussion in Section 4.1, we focus on the comparison of the regression curves for the two groups and derive optimal designs minimizing the criterion Φ ∞ defined in (4.14). As a result, we obtain simultaneous confidence bands with a smaller maximal width for the difference of the curves describing the relation in the two groups. Similar results can be obtained for other values p ∈ (0, ∞) in (4.14), but for the sake of brevity we concentrate on the criterion Φ ∞ , which is probably also the easiest to interpret for practitioners. We denote by θ * n the linear unbiased estimator derived in Section 4. For each combination of two different regression functions from (4.15), the optimal weights have been determined by Theorem 4.2 and the optimal design points t * i have been determined by minimizing the criterion Φ ∞ defined in (4.14). For the numerical optimization the Particle Swarm Optimization (PSO) algorithm is used (see, for example, Clerc, 2006), assuming a sample size of four observations in each group, that is, n = 4.
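A minimal global-best PSO of the kind used for this optimization can be sketched as follows. The hyper-parameters are common textbook defaults, and the toy objective (a stand-in for Φ ∞ , which would require the model matrices) is an illustrative assumption:

```python
import numpy as np

def pso_minimize(crit, dim, lower, upper, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO (a sketch; hyper-parameters are common
    defaults, not necessarily those used in the paper)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lower, upper, size=(n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_val = np.array([crit(p) for p in x])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        # velocity update: inertia + cognitive + social terms
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)
        vals = np.array([crit(p) for p in x])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()

# toy stand-in for Phi_inf with n = 4: optimize the two interior points
# t_2, t_3 on [1, 10] (t_1 = 1 and t_4 = 10 are fixed); as a placeholder
# objective we maximize the smallest spacing between consecutive points
def toy_crit(z):
    t = np.concatenate(([1.0], np.sort(z), [10.0]))
    return -np.min(np.diff(t))

best, val = pso_minimize(toy_crit, dim=2, lower=1.0, upper=10.0)
# for this toy objective the optimum is the equally spaced design
# (1, 4, 7, 10) with criterion value -3
```

For the actual design problem, `toy_crit` would be replaced by the grid-based evaluation of Φ ∞ for the candidate design points.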
Furthermore, the uniform design used in the following calculations is the design with four equally spaced design points in the interval [1, 10]. The Φ ∞ -optimal design points minimizing the criterion in (4.14) are given in Table 1 for all combinations of models and correlations under consideration. Note that for each model the optimal design points change with the value of the correlation ̺. In order to investigate the impact of the optimal design on the structure of the confidence bands we have performed a small simulation study, simulating confidence bands for the difference of the regression functions. The vector of parameter values used for each model is θ = (θ (1) ⊤ , θ (2) ⊤ ) ⊤ = (1, 1, 1, 1, 1, 1) ⊤ . In Figure 1 we display the averages of the simultaneous confidence bands defined in (4.12) under the uniform and the optimal design, calculated from 100 simulation runs. The left, middle and right columns show the results for the correlations ̺ = 0.2, ̺ = 0.5 and ̺ = 0.7, respectively, while the rows correspond to different combinations of the functions f 1 and f 2 (first row: f 1 = f A , f 2 = f B ; middle row: f 1 = f A , f 2 = f C ; last row: f 1 = f B , f 2 = f C ). In each graph, the confidence bands from the Φ ∞ -optimal and the uniform design are plotted using solid and dashed lines, respectively, along with the true difference f ⊤ 1 (t)θ (1) − f ⊤ 2 (t)θ (2) (solid grey lines). We observe that in all cases under consideration the use of Φ ∞ -optimal designs yields a clearly visible improvement compared to the uniform design. The maximal width of the confidence band is reduced substantially. Moreover, the bands from the Φ ∞ -optimal designs are nearly uniformly narrower than the bands based on the uniform design. Even more importantly, the confidence bands based on the Φ ∞ -optimal design show a structure similar to that of the true differences, while the confidence bands from the uniform design oscillate.
A comparison of the left, middle and right columns in Figure 1 shows that the maximum width of the confidence bands based on the optimal design decreases with increasing (absolute) correlation ̺. This effect is not visible for the confidence bands based on the uniform design. For example, in the middle row of Figure 1, which corresponds to the case f 1 = f A and f 2 = f C , the maximum width of the confidence bands based on the equally spaced design points even seems to increase. Table 2 presents the values of the criterion Φ ∞ in (4.14) for the different scenarios and confirms the conclusions drawn from the visual inspection of the confidence band plots. We observe that the use of the optimal design points reduces the maximum width of the confidence bands substantially. Moreover, for the optimal design the maximum width becomes smaller with increasing (absolute) correlation. On the other hand, this monotonicity cannot be observed in all cases for the uniform designs.
Summarizing, the use of the proposed Φ ∞ -optimal design improves statistical inference substantially by reducing the maximum variance of the difference of the two estimated regression curves. Moreover, simultaneous estimation in combination with a Φ ∞ -optimal design yields a further reduction of the maximum width of the confidence bands, thus providing a more precise inference for the difference of the curves describing the relation between t and the responses in the two groups. Table 2: Values of the criterion Φ ∞ for the optimal and uniform design with four observations in each group in the interval [1, 10]. The error process is given by two Brownian motions with correlation ̺ = 0.2, 0.5, 0.7 between the groups, respectively.