1 Introduction

The application of optimal or efficient designs can improve the accuracy of statistical analysis substantially, and meanwhile, there exists a well-established and powerful theory for the construction of (approximate) optimal designs for independent observations; see, for example, the monographs of Pukelsheim [28] or Fedorov and Leonov [15]. In contrast, the determination of optimal designs for efficient statistical analysis from dependent data is more challenging because the corresponding optimization problems are in general not convex and therefore the powerful tools of convex analysis are not applicable. Although design problems for correlated data have been discussed for a long time (see, for example [2, 26, 30, 31], who use asymptotic arguments to develop continuous but in general non-convex optimization problems in this context), a large part of the discussion is restricted to models with a small number of parameters and we refer to Pázman and Müller [27], Müller and Pázman [25], Dette et al. [7], Kiselak and Stehlík [19], Harman and Štulajter [17], Rodriguez-Diaz [29], Campos-Barreiro and López-Fidalgo [4] and Attia and Constantinescu [1] among others.

Recently, Dette et al. [9] suggest a more systematic approach to the problem and determine (asymptotic) optimal designs for least squares estimation, under the additional assumption that the regression functions are eigenfunctions of an integral operator associated with the covariance kernel of the error process. For more general models Dette et al. [10] propose to construct the optimal design and estimator simultaneously. More precisely, they construct a class of estimators and corresponding optimal designs with a variance converging (as the sample size increases) to the optimal variance in the continuous model. Dette et al. [6] propose an alternative strategy for this purpose. They start with the construction of the best linear unbiased estimator (BLUE) in the continuous model using stochastic calculus and determine in a second step an implementable design, which is “close” to the solution in the continuous model. By this approach these authors are able to provide an easily implementable estimator with a corresponding design which is practically indistinguishable from the weighted least squares estimate (WLSE) with corresponding optimal design. Their results are applicable to a broad class of linear regression models with various covariance kernels and have recently been extended to the situation where derivatives of the process can also be observed (see [11]).

Dette and Schorning [12] and Dette et al. [13] propose designs for the comparison of regression curves from two independent samples, where the latter reference also allows for dependencies within the samples. Their work is motivated by applications in drug development, where a comparison between two regression models that describe the relation between a common response and the same covariates for two groups is used to establish the non-superiority of one model to the other or to check whether the difference between two regression models can be neglected. For example, if the similarity between two regression functions describing the dose–response relationships in the groups individually has been established, subsequent inference in drug development could be based on the combined samples such that a more efficient statistical analysis is possible on the basis of the larger population. Because of its importance, several procedures for the comparison of curves have been investigated in linear and nonlinear models (see [3, 8, 16, 20, 21, 23, 24], among others). Designs minimizing the maximal width of a (simultaneous) confidence band for the difference between the regression curves calculated from two independent groups are determined by Dette and Schorning [12] and Dette et al. [13], who also demonstrate that the use of these designs yields substantially narrower confidence bands.

While these results refer to independent groups, it is the purpose of the present paper to investigate designs for the comparison of regression curves corresponding to two groups, where the data within the groups and between the groups may be dependent. It will be demonstrated that in most cases simultaneous estimation of the parameters in the regression models using the data from both groups yields more efficient inference than estimating the parameters in the models corresponding to the different groups separately. Moreover, the simultaneous estimation procedure can never be worse. While this property holds independently of the design under consideration, we subsequently construct efficient designs for the comparison of curves corresponding to not necessarily independent groups and demonstrate their superiority by means of a simulation study.

The remaining part of this paper is organized as follows. In Sect. 2 we introduce the basics and the design problem. Section 3 is devoted to a continuous model, which could be interpreted as a limiting experiment of the discrete model if the sample size converges to infinity. In this model we derive an explicit representation of the BLUE if estimation is performed simultaneously in both groups. In Sect. 4 we develop a discrete approximation of the continuous BLUE by determining the optimal weights for the linear estimator. Finally, the optimal design points are determined such that the maximum width of the confidence band for the difference of the two regression functions is minimal. Section 5 is devoted to a small numerical comparison of the performance of the optimal designs with uniform designs. In particular, it is demonstrated that optimal designs yield substantially narrower confidence bands. In many cases the maximal width of a confidence band based on the uniform design is larger by a factor between 2 and 10 than the width of a confidence band based on the optimal design.

2 Simultaneous Estimation of Two Regression Models

Throughout this paper we consider the situation of two groups of observations \(Y_{1,1} , \ldots , Y_{1,n}\) and \(Y_{2,1} , \ldots , Y_{2,n}\) taken at the points \(t_1, \ldots , t_n\), where there may exist dependencies within and between the groups. We assume that the relation between the response and the covariate t in each group is described by a linear regression model given by

$$\begin{aligned} Y_{ij}= Y_i(t_j) = f_i^\top (t_j) \theta ^{(i)} + {\eta }_i(t_j) , \, j=1, \ldots , n , i=1, 2 \, . \end{aligned}$$
(2.1)

Thus in each group n observations are taken at the same points \(t_1, \ldots , t_n\), which can be chosen in a compact interval, say \([a, b]\), and observations at different points and in different groups might be dependent. The vectors of the unknown parameters \(\theta ^{(1)}\) and \(\theta ^{(2)}\) are assumed to be \(p_1\)- and \(p_2\)-dimensional, respectively, and the corresponding vectors of regression functions \(f_i(t) = (f_{i,1}(t), \ldots , f_{i,p_i}(t))^\top \), \(i=1, 2\), have continuously differentiable linearly independent components.

To address the situation of correlation between the groups, we start with a very simple covariance structure for each group, but we emphasize that all results presented in this paper remain valid for more general covariance structures corresponding to Markov processes; see Remark 3.3 for more details. To be precise, let \(\{{\varepsilon }_1(t)| ~t\in [a, b]\}\) and \(\{{\varepsilon }_2(t)| ~t\in [a, b]\}\) denote two independent Brownian motions, such that

$$\begin{aligned} \mathbb {E}[\varepsilon _i(t_j)]= 0,~~K_i(t_j, t_k) = \mathbb {E}[\varepsilon _i(t_j)\varepsilon _i(t_k)] = \min (t_j,t_k) \end{aligned}$$
(2.2)

denote the mean value and the covariance of the individual process \(\varepsilon _i\) at the points \(t_{j}\) and \(t_k\), respectively. Let \(\sigma _1, \sigma _2 >0 \) and \(\varrho \in (-1, 1)\), and denote by \(\varvec{\Sigma }^{1/2}\) the square root of the covariance matrix

$$\begin{aligned} \varvec{\Sigma } = \begin{pmatrix} \sigma ^2_1 &{} \sigma _1\sigma _2 \varrho \\ \sigma _1\sigma _2 \varrho &{} \sigma ^2_2 \end{pmatrix} \,, \end{aligned}$$
(2.3)

and define for \(t\in [a,b]\) the two-dimensional process \( \{ \varvec{\eta }(t)|~ t \in [a,b] \} \) by

$$\begin{aligned} \varvec{\eta }(t)= \begin{pmatrix} \eta _1(t) \\ \eta _2 (t) \end{pmatrix} =\varvec{\Sigma }^{1/2} \varvec{\varepsilon }(t) , \end{aligned}$$
(2.4)

where \( \varvec{\varepsilon }(t) =(\varepsilon _1(t) , \varepsilon _2(t))^\top \). Note that \(\varrho \in (-1, 1)\) denotes the correlation between the observations \(Y_1(t_j)\) and \(Y_2(t_j)\) (\(j=1, \ldots , n\)), and that in general the correlation between \(Y_1(t_j)\) and \(Y_2(t_k)\) is given by

$$\begin{aligned} \text{ Corr } (Y_1(t_j), Y_2(t_k)) = \varrho \min \left\{ \sqrt{\frac{t_j}{t_k}}\, , \, \sqrt{\frac{t_k}{t_j}}\right\} \end{aligned}$$
(2.5)

if \(t_j,t_k \in [a,b]\), for \(a>0\). If the interval is given by \([a=0, b]\) instead, the correlation between \(Y_1(t_j)\) and \(Y_2(t_k)\) is given by (2.5) if \(t_j, t_k \in (0, b]\), whereas \(\text{ Corr } (Y_1(0), Y_2(t_j))= \text{ Corr } (Y_1(t_j), Y_2(0)) = 0\) for \(t_j \in [0, b]\).
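For illustration, the following Python sketch simulates the two-dimensional error process \(\varvec{\eta }(t)= \varvec{\Sigma }^{1/2} \varvec{\varepsilon }(t)\) defined in (2.4) on a grid and compares the empirical correlation of \(\eta_1(t_j)\) and \(\eta_2(t_k)\) (which coincides with the correlation of \(Y_1(t_j)\) and \(Y_2(t_j)\), since the regression part is deterministic) with formula (2.5). The interval, grid size and the values of \(\sigma_1, \sigma_2, \varrho\) are arbitrary choices made only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 1.0, 2.0, 50                      # design interval [a, b] and grid size (illustrative)
t = np.linspace(a, b, n)
sigma1, sigma2, rho = 1.0, 2.0, 0.5         # hypothetical values of sigma_1, sigma_2, rho

# covariance matrix Sigma from (2.3) and its symmetric square root Sigma^{1/2}
Sigma = np.array([[sigma1**2, sigma1 * sigma2 * rho],
                  [sigma1 * sigma2 * rho, sigma2**2]])
w, V = np.linalg.eigh(Sigma)
Sigma_half = V @ np.diag(np.sqrt(w)) @ V.T

# simulate eta(t) = Sigma^{1/2} eps(t) for two independent Brownian motions eps_1, eps_2
n_rep = 20000
dt = np.diff(t, prepend=0.0)                # increment variances t_1, t_2 - t_1, ...
dW = rng.normal(scale=np.sqrt(dt), size=(n_rep, 2, n))
eps = np.cumsum(dW, axis=-1)                # Brownian paths evaluated at t_1, ..., t_n
eta = np.einsum('kl,rln->rkn', Sigma_half, eps)

# compare the empirical correlation of eta_1(t_j) and eta_2(t_k) with formula (2.5)
j, k = 10, 40
emp = np.corrcoef(eta[:, 0, j], eta[:, 1, k])[0, 1]
theo = rho * min(np.sqrt(t[j] / t[k]), np.sqrt(t[k] / t[j]))
print(f"empirical: {emp:.3f}, formula (2.5): {theo:.3f}")
```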

Considering the two groups individually results in proper (for example, weighted least squares) estimators of the parameters \(\theta ^{(1)}\) and \(\theta ^{(2)}\). However, this procedure ignores the correlation between the two groups, and estimating the parameters \(\theta ^{(1)}\) and \(\theta ^{(2)}\) simultaneously from the data of both groups might result in more precise estimates. In order to define estimators for the parameters \(\theta ^{(1)}\) and \( \theta ^{(2)}\) using the information from both groups we now consider a more general two-dimensional regression model, which on the one hand contains the situation described in the previous paragraph as a special case, but on the other hand also allows us to consider the case where some of the components in \(\theta ^{(1)}\) and \( \theta ^{(2)}\) coincide; see Example 2.2 and Sect. 3.3 for details. To be precise, we define the regression model

$$\begin{aligned} \mathbf {Y}(t_j) = \begin{pmatrix} Y_1(t_j) \\ Y_2(t_j) \end{pmatrix} = \mathbf {F}^\top (t_j)\theta + \varvec{\eta }(t_j) = \mathbf {F}^\top (t_j)\theta + \varvec{\Sigma }^{1/2} \varvec{\varepsilon }(t_j), \quad \quad j=1, \ldots , n, \end{aligned}$$
(2.6)

where two-dimensional observations

$$\begin{aligned} \mathbf {Y}(t_1) = (Y_1(t_1), Y_2(t_1))^\top , \ldots , \mathbf {Y}(t_n) = (Y_1(t_n), Y_2(t_n))^\top \end{aligned}$$

are available at points \(t_1, \ldots , t_n \in [a,b]\). In model (2.6) the vector \(\theta = (\vartheta _1, \ldots , \vartheta _p)^\top \) is a p-dimensional parameter and

$$\begin{aligned} \mathbf {F}^\top (t) = \begin{pmatrix} F^\top _1(t) \\ F^\top _2(t) \end{pmatrix} = \begin{pmatrix} F_{1,1}(t) &{} \ldots &{} F_{1,p}(t) \\ F_{2,1}(t) &{} \ldots &{} F_{2,p}(t) \end{pmatrix} \end{aligned}$$
(2.7)

denotes a \((2\times p)\) matrix containing continuously differentiable regression functions, where the two-dimensional functions \((F_{1,1}(t) , F_{2,1}(t) )^\top , \ldots , (F_{1,p}(t) , F_{2,p}(t) )^\top \) are assumed to be linearly independent.

Example 2.1

The individual models defined in (2.1) are contained in this two-dimensional model. More precisely, defining the \(p=(p_1+ p_2)\)-dimensional vector of parameters \(\theta \) by \(\theta = ((\theta ^{(1)})^\top , (\theta ^{(2)})^\top )^\top \) and the regression function \(\mathbf {F}^\top (t)\) in (2.7) by the rows

$$\begin{aligned} F^\top _1(t)= ({f}^\top _1(t), 0^\top _{p_2}) , ~~ F^\top _2(t)= (0^\top _{p_1}, {f}^\top _2(t)), \end{aligned}$$

it follows that model (2.6) coincides with model (2.1). Moreover, this composite model takes the correlation between the groups into account. In this case the models describing the relation between the variable t and the responses \(Y_1(t)\) and \(Y_2(t)\) do not share any parameters.
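For concreteness, the following minimal sketch assembles the matrix \(\mathbf {F}^\top (t)\) in (2.7) according to Example 2.1; the specific choices \(f_1(t)=(1,t)^\top\) and \(f_2(t)=(1,t,t^2)^\top\) are hypothetical and only serve to illustrate the block structure.

```python
import numpy as np

# hypothetical individual regression functions (p1 = 2, p2 = 3)
f1 = lambda t: np.array([1.0, t])           # group 1: straight line
f2 = lambda t: np.array([1.0, t, t**2])     # group 2: quadratic

def F_T(t):
    """2 x (p1 + p2) matrix F^T(t) of Example 2.1: block-structured stacking of f1 and f2."""
    p1, p2 = f1(t).size, f2(t).size
    row1 = np.concatenate([f1(t), np.zeros(p2)])    # F_1^T(t) = (f_1^T(t), 0_{p2}^T)
    row2 = np.concatenate([np.zeros(p1), f2(t)])    # F_2^T(t) = (0_{p1}^T, f_2^T(t))
    return np.vstack([row1, row2])

print(F_T(0.5))
```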

Example 2.2

In this example we consider the situation where some of the parameters of the individual models in (2.1) coincide. This situation occurs, for example, if \(Y_1(t)\) and \(Y_2(t)\) represent clinical parameters (depending on time) before and after treatment, where it can be assumed that the effect at time a coincides before and after the treatment. In this case a reasonable model for the average effect in the two groups is given by

$$\begin{aligned} \mathbb {E} [Y_i (t) ] = \theta ^{(0)} + ({{\tilde{\theta }}}^{(i )})^\top {\tilde{f}}_i (t) ~,~~i =1,2~. \end{aligned}$$

More generally, we consider the situation where the vectors of the parameters are given by

$$\begin{aligned} \theta ^{(1)}= (\theta ^{(0)^\top }, {\tilde{\theta }}^{(1)^\top })^\top \quad , \quad \theta ^{(2)}= (\theta ^{(0)^\top }, {\tilde{\theta }}^{(2)^\top })^\top , \end{aligned}$$

where \(\theta ^{(0)} \in \mathbb {R}^{p_0}\) denotes the vector of common parameters in both models and vectors \({\tilde{\theta }}^{(1)} \in \mathbb {R}^{p_1 - p_0}\) and \({\tilde{\theta }}^{(2)} \in \mathbb {R}^{p_2 - p_0}\) contain the different parameters in the two individual models. The corresponding regression functions are given by

$$\begin{aligned} f_1^\top (t) = (f^\top _{0}(t), {\tilde{f}_1}^\top (t)) \quad , \quad f_2^\top (t) = (f^\top _{0}(t), \tilde{f}_2^\top (t)) , \end{aligned}$$
(2.8)

where the vector \(f^\top _0(t)\) contains the regression functions corresponding to the common parameters in the two models, and \({\tilde{f}_1}^\top (t)\) and \({\tilde{f}_2}^\top (t)\) denote the vectors of regression functions corresponding to the different parameters \({\tilde{\theta }}^{(1)}\) and \({\tilde{\theta }}^{(2)}\), respectively.

Defining the \(p=(p_1+p_2-p_0)\)-dimensional vector of parameters \(\theta \) by \(\theta = ((\theta ^{(0)})^\top , ({{\tilde{\theta }}}^{(1)})^\top , ({{\tilde{\theta }}}^{(2)})^\top )^\top \) and the regression function \(\mathbf {F}^\top (t)\) in (2.7) by the rows

$$\begin{aligned} F^\top _1(t)= (f^\top _0(t), \tilde{f}^\top _1(t), 0^\top _{p_2-p_0}) , ~~ F^\top _2(t)= (f^\top _0(t), 0^\top _{p_1-p_0}, \tilde{f}^\top _2(t)), \end{aligned}$$

it follows that model (2.6) contains the individual models in (2.1), where the regression functions are given by (2.8) and the parameters \(\theta ^{(1)}\) and \(\theta ^{(2)}\) share the parameter \(\theta ^{(0)}\). Moreover, this composite model takes the potential correlation between the groups into account.
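A corresponding sketch for Example 2.2 shows how the shared block \(f_0\) enters both rows of \(\mathbf {F}^\top (t)\); the choices \(f_0(t)=1\), \(\tilde f_1(t)=t\) and \(\tilde f_2(t)=(t,t^2)^\top\) are again hypothetical.

```python
import numpy as np

# hypothetical regression functions: shared intercept, group-specific slope and curvature
f0       = lambda t: np.array([1.0])        # common part f_0 (p0 = 1)
f1_tilde = lambda t: np.array([t])          # group 1 specific part (p1 - p0 = 1)
f2_tilde = lambda t: np.array([t, t**2])    # group 2 specific part (p2 - p0 = 2)

def F_T(t):
    """2 x (p1 + p2 - p0) matrix F^T(t) of Example 2.2 with the shared block f_0 in both rows."""
    q1, q2 = f1_tilde(t).size, f2_tilde(t).size
    row1 = np.concatenate([f0(t), f1_tilde(t), np.zeros(q2)])
    row2 = np.concatenate([f0(t), np.zeros(q1), f2_tilde(t)])
    return np.vstack([row1, row2])

print(F_T(0.5))   # columns correspond to (theta^(0), theta_tilde^(1), theta_tilde^(2))
```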

3 Continuous Models

It was demonstrated by Dette et al. [6] that efficient designs for dependent data in regression problems can be derived by first considering the estimation problem in a continuous model. In this model there is no optimal design problem as the data can be observed over the full interval \([a, b]\). However, efficient designs can be determined in two steps. First, one derives the best linear unbiased estimator (BLUE) in the continuous model and, secondly, one determines design points (and an estimator) such that the resulting estimator from the discrete data provides a good approximation of the optimal solution in the continuous model. In this paper we will use this strategy to develop optimal designs for the comparison of regression curves from two (possibly) dependent groups. In the present section we discuss a continuous model corresponding to discrete model (2.6), while the second step, the determination of an “optimal” approximation, is postponed to the following Sect. 4.

3.1 Best Linear Unbiased Estimation

To be precise, we consider the continuous version of the linear regression model in (2.6), that is,

$$\begin{aligned} \mathbf {Y} (t)= \begin{pmatrix} Y_1(t) \\ Y_2(t) \end{pmatrix} = \mathbf {F}^\top (t)\theta + \varvec{\Sigma }^{1/2} \varvec{\varepsilon }(t) , \quad t\in [a,b] , \end{aligned}$$
(3.1)

where we assume \(0<a<b\) and the full trajectory of the process \(\{ {\varvec{Y}}(t) \mid t\in [a,b]\}\) is observed, \(\{\varvec{\varepsilon }(t)=(\varepsilon _1(t), \varepsilon _2(t))^\top \mid t\in [a,b]\}\) is a vector of independent Brownian motions as defined in (2.2), and the matrix \(\varvec{\Sigma }^{1/2}\) is the square root of the covariance matrix \(\varvec{\Sigma }\) defined in (2.3). Note that we restrict ourselves to an interval on the positive line, because in this case the notation is slightly simpler. But we emphasize that the theory developed in this section can also be applied for \(a=0\), see Remark 3.1 for more details. We further assume that the \((p \times p)\)-matrix

$$\begin{aligned} \mathbf {M} =\int _a^b {{\dot{\mathbf{F}}}}(t) \varvec{\Sigma }^{-1} {\dot{\mathbf{F}}}^\top (t) \,\mathrm{d}t + \frac{1}{a} \mathbf {{F}}(a)\varvec{\Sigma }^{-1} \mathbf {{F}}^\top (a) \end{aligned}$$
(3.2)

is non-singular.

Theorem 3.1

Consider continuous linear regression model (3.1) on the interval \([a, b]\), \(a >0\), with a continuously differentiable matrix of regression functions \(\mathbf {F}\), a vector \(\{\varvec{\varepsilon }(t)=(\varepsilon _1(t), \varepsilon _2(t))^\top \mid t\in [a,b]\}\) of independent Brownian motions and a covariance matrix \(\varvec{\Sigma }\) defined by (2.3). The best linear unbiased estimator of the parameter \(\theta \) is given by

$$\begin{aligned} \hat{\theta }_\mathrm{BLUE} = \mathbf {M}^{-1} \Big ( \int _a^b \dot{\mathbf {F}}(t) \varvec{\Sigma }^{-1}\,\mathrm{d}\mathbf {Y} (t) + \frac{1}{a} \mathbf {F}(a)\varvec{\Sigma }^{-1} \mathbf {Y}(a) \Big ) . \end{aligned}$$
(3.3)

Moreover, the minimum variance is given by

$$\begin{aligned} \text{ Cov }(\hat{\theta }_\mathrm{BLUE} ) = \mathbf {M}^{-1} =\left( \int _a^b {\dot{\mathbf{F}}}(t) \varvec{\Sigma }^{-1} {\dot{\mathbf{F}}}^\top (t) \,\mathrm{d}t + \frac{1}{a} \mathbf {{F}}(a)\varvec{\Sigma }^{-1} \mathbf {{F}}^\top (a)\right) ^{-1} \, . \end{aligned}$$
(3.4)

Proof

Multiplying \( \varvec{{Y}}\) by the matrix \(\varvec{\Sigma }^{-1/2}\) yields a transformed regression model

$$\begin{aligned} \varvec{\tilde{Y}}(t) = \begin{pmatrix}\tilde{Y}_1(t) \\ \tilde{Y}_2(t) \end{pmatrix} = \varvec{\Sigma }^{-1/2} \begin{pmatrix}{Y}_1(t) \\ {Y}_2(t) \end{pmatrix} = \varvec{\Sigma }^{-1/2}\mathbf {F}^\top (t) \theta + \ \varvec{\varepsilon }(t) , \end{aligned}$$
(3.5)

where \(\varvec{\Sigma }^{-1/2}\) is the inverse of \(\varvec{\Sigma }^{1/2}\), the square root of the covariance matrix \(\varvec{\Sigma }\) defined in (2.3). Note that the components of the vector \(\varvec{\tilde{Y}}\) are independent, and consequently, the joint likelihood function can be obtained as the product of the likelihoods of the individual components. Next we rewrite the components of continuous model (3.5) in terms of two stochastic differential equations, that is

$$\begin{aligned} \mathrm{d}\tilde{Y}_i(t)= & {} \mathbf {1}_{[a,b]}(t) \varvec{\Sigma }^{-1/2}_i {{\dot{\mathbf{F}}}}^\top (t) \theta \mathrm{d}t + \mathrm{d}\varepsilon _i(t) , t\in [0, b]~, \end{aligned}$$
(3.6)
$$\begin{aligned} \tilde{Y}_i(a)= & {} \varvec{\Sigma }^{-1/2}_i \mathbf {F}^\top (a) \theta + \varepsilon _i(a)~, \end{aligned}$$
(3.7)

where \( \mathbf {1}_{A}\) is the indicator function of the set A and \(\varvec{\Sigma }^{-1/2}_i\) denotes the i-th row of the matrix \(\varvec{\Sigma }^{-1/2}\) (\(i=1, 2\)). Since \(\{\varepsilon _i(t)| ~t\in [a, b]\}\) is a Brownian motion, its increments are independent. Consequently, the process \(\{\tilde{Y}_i(t)| ~t\in [0,b]\}\) defined by (3.6) and the random variable \({\tilde{Y}}_i(a)\) defined by (3.7) are independent. To obtain the joint density of the quantities defined by (3.6) and (3.7) it is therefore sufficient to derive the individual densities.

Let \(\mathbb {P}_\theta ^{(i)}\) and \(\mathbb {P}_0^{(i)}\) denote the measures on C([0, b]) associated with the processes \({\tilde{Y}}_i = \{ \tilde{Y}_i(t) | \ t \in [0,b] \}\) and \(\{ \varepsilon _{i} (t) | \ t \in [0,b] \}\), respectively. It follows from Theorem 1 in Appendix II of Ibragimov and Has’minskii [18] that \(\mathbb {P}_\theta ^{(i)} \) is absolutely continuous with respect to \(\mathbb {P}_0^{(i)}\) with Radon–Nikodym density

$$\begin{aligned} \frac{\mathrm{d}\mathbb {P}_\theta ^{(i)}}{d \mathbb {P}_0^{(i)}} ( \tilde{Y}_i) = \exp \left\{ \int _a^b\Sigma ^{-1/2}_i {\dot{\mathbf{F}}}^\top (t)\theta \mathrm{d}\tilde{Y}_i(t) - \frac{1}{2} \int _a^b (\Sigma ^{-1/2}_i {\dot{\mathbf{F}}}^\top (t)\theta )^2 \mathrm{d}t\right\} \, . \end{aligned}$$

Similarly, if \(\mathbb {Q}^{(i)}_\theta \) denotes the distribution of the random variable \({\tilde{Y}}_i(a) \sim {{\mathcal {N}} } (\Sigma ^{-1/2}_i \mathbf {F}^\top (a)\theta , a) \) in (3.7), then the Radon–Nikodym density of \(\mathbb {Q}^{(i)}_\theta \) with respect to \(\mathbb {Q}^{(i)}_0 \) is given by

$$\begin{aligned} \frac{\mathrm{d}\mathbb {Q}^{(i)}_\theta }{d \mathbb {Q}^{(i)}_0} (\tilde{Y}_i(a))= \exp \left\{ \frac{\tilde{Y}_i(a)\Sigma ^{-1/2}_i \mathbf {F}^\top (a)\theta }{a} - \frac{1}{2} \frac{(\varvec{\Sigma }^{-1/2}_i \mathbf {F}^\top (a) \theta )^2}{a} \right\} \, . \end{aligned}$$

Consequently, because of independence, the joint density of \((\mathbb {P}^{(i)}_\theta ,\mathbb {Q}^{(i)}_\theta )\) with respect to \((\mathbb {P}^{(i)}_0,\mathbb {Q}^{(i)}_0)\) is obtained as

$$\begin{aligned} \begin{aligned} \frac{\mathrm{d}\mathbb {P}^{(i)}_\theta }{\mathrm{d}\mathbb {P}^{(i)}_0} (\tilde{Y}_i)\times \frac{\mathrm{d}\mathbb {Q}^{(i)}_\theta }{\mathrm{d}\mathbb {Q}^{(i)}_0} (\tilde{Y}_i(a)) =&\exp \left\{ \left( \int _a^b\varvec{\Sigma }^{-1/2}_i {\dot{\mathbf{F}}}^\top (t)\theta \mathrm{d}\tilde{Y}_i(t)+ \frac{\tilde{Y}_i(a)\varvec{\Sigma }^{-1/2}_i \mathbf {F}^\top (a)\theta }{a} \right) \right. \\&\left. - \frac{1}{2}\left( \int _a^b (\varvec{\Sigma }^{-1/2}_i {\dot{\mathbf{F}}}^\top (t)\theta )^2 \mathrm{d}t\ + \frac{(\varvec{\Sigma }^{-1/2}_i \mathbf {F}^\top (a) \theta )^2}{a}\right) \right\} \, . \end{aligned} \end{aligned}$$

As the processes \(\tilde{Y}_1\) and \(\tilde{Y}_2\) are independent by construction, the maximum likelihood estimator in model (3.1) can be determined by solving the equation

$$\begin{aligned} \begin{aligned}&\frac{\partial }{\partial \theta } \log \Big \{ \prod _{i=1}^{2} \frac{\mathrm{d}\mathbb {P}^{(i)}_\theta }{\mathrm{d}\mathbb {P}^{(i)}_0} (\tilde{Y}_i)\times \frac{\mathrm{d}\mathbb {Q}^{(i)}_\theta }{\mathrm{d}\mathbb {Q}^{(i)}_0} (\tilde{Y}_i(a))\Big \} \\&\quad = \sum _{i=1}^{2} \Big \{ \int _a^b{\dot{\mathbf{F}}}(t)(\varvec{\Sigma }^{-1/2}_i)^\top \mathrm{d}\tilde{Y}_i(t)+ \frac{\mathbf {F}(a)(\varvec{\Sigma }^{-1/2}_i)^\top \tilde{Y}_i(a)}{a} \\&\qquad - \Big ( \int _a^b {\dot{\mathbf{F}}}(t)(\varvec{\Sigma }^{-1/2}_i)^\top \varvec{\Sigma }^{-1/2}_i {\dot{\mathbf{F}}}^\top (t) \,\mathrm{d}t + \frac{1}{a} {\mathbf{F}}(a)(\varvec{\Sigma }^{-1/2}_i)^\top \varvec{\Sigma }^{-1/2}_i {\mathbf{F}}^\top (a) \Big ) \theta \Big \}= 0 \end{aligned} \end{aligned}$$

with respect to \(\theta \). The solution coincides with the linear estimate defined in (3.3), and a straightforward calculation, using Ito’s formula and the fact that the random variables \(\int ^b_a \varvec{\dot{F}} (t) \mathrm{d} \varvec{\varepsilon }(t) \) and \( \varvec{\varepsilon }(a)\) are independent, gives

$$\begin{aligned} \text{ Cov }(\hat{\theta }_\mathrm{BLUE} )= & {} \mathbf {M}^{-1} \mathbb {E}_{\theta } \Bigl [ \Big ( \int _a^b \dot{\mathbf {F}}(t) \varvec{\Sigma }^{-1}\,\mathrm{d}\mathbf {Y}(t) + \frac{1}{a} \mathbf {F}(a)\varvec{\Sigma }^{-1} \mathbf {Y}(a) \Big ) \\&\times \Big ( \int _a^b \dot{\mathbf {F}}(t) \varvec{\Sigma }^{-1}\,\mathrm{d}\mathbf {Y}(t) + \frac{1}{a} \mathbf {F}(a)\varvec{\Sigma }^{-1} \mathbf {Y}(a) \Big )^\top \Bigr ] \mathbf {M}^{-1} \\= & {} \mathbf {M}^{-1} \Big ( \int _a^b {\dot{\mathbf{F}}}(t) \varvec{\Sigma }^{-1} {\dot{\mathbf{F}}}^\top (t) \,\mathrm{d}t + \frac{1}{a} \mathbf {{F}}(a)\varvec{\Sigma }^{-1} \mathbf {{F}}^\top (a)\Big ) \mathbf {M}^{-1} = \mathbf {M}^{-1}, \end{aligned}$$

where the matrix \(\mathbf {M}\) is defined in (3.2). Since the covariance matrix \(\mathbf {M}^{-1}\) is the inverse of the information matrix in the continuous regression model in (3.1) (see [18], p. 81), linear estimator (3.3) is the BLUE, which completes the proof of Theorem 3.1. \(\square \)
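To illustrate formulas (3.2)–(3.4) numerically, the following sketch evaluates the matrix \(\mathbf {M}\) by quadrature and inverts it to obtain the covariance of the continuous BLUE. The model (both groups linear without common parameters, as in Example 2.1, with \(f_1(t)=f_2(t)=(1,t)^\top\)) and all numerical values are hypothetical choices made only for this illustration.

```python
import numpy as np
from scipy.integrate import quad

a, b = 1.0, 2.0
sigma1, sigma2, rho = 1.0, 1.5, 0.4
Sigma = np.array([[sigma1**2, sigma1 * sigma2 * rho],
                  [sigma1 * sigma2 * rho, sigma2**2]])
Sigma_inv = np.linalg.inv(Sigma)

# hypothetical model: both groups linear, no common parameters (Example 2.1), so p = 4
def F(t):        # p x 2 matrix F(t), the transpose of (2.7)
    return np.array([[1.0, 0.0], [t, 0.0], [0.0, 1.0], [0.0, t]])

def F_dot(t):    # derivative of F(t)
    return np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])

def information_matrix():
    """Matrix M from (3.2), computed entrywise by numerical quadrature."""
    p = F(a).shape[0]
    M = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            M[i, j] = quad(lambda t: F_dot(t)[i] @ Sigma_inv @ F_dot(t)[j], a, b)[0]
    return M + F(a) @ Sigma_inv @ F(a).T / a

M = information_matrix()
print("Cov(theta_BLUE) = M^{-1}:\n", np.linalg.inv(M))   # formula (3.4)
```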

Remark 3.1

The proof of Theorem 3.1 can easily be modified to obtain the BLUE for the continuous model on the interval \([a=0, b]\). More precisely, for \(a=0\) Eq. (3.7) becomes a deterministic equation equivalent to

$$\begin{aligned} \mathbf {Y}(0) = \mathbf {F}^\top (0) \theta \, , \end{aligned}$$
(3.8)

and we have to distinguish three cases.

  1. (1)

    If the regression function \(\mathbf {F}\) satisfies \(\mathbf {F}(0) = \mathbf {0}_{p\times 2}\) (that is \(\text{ rank }(\mathbf {F}(0))= 0\)), deterministic Eq. (3.8) does not contain any further information about the parameter \(\theta \) and the maximum likelihood estimator in model (3.1) is given by

    $$\begin{aligned} \hat{\theta }_\mathrm{BLUE} = \mathbf {M}_0^{-1} \Big ( \int _0^b \dot{\mathbf {F}}(t) \varvec{\Sigma }^{-1}\,\mathrm{d}\mathbf {Y} (t) \Big ) , \end{aligned}$$

    where the minimum variance is given by

    $$\begin{aligned} \text{ Cov }(\hat{\theta }_\mathrm{BLUE} ) = \mathbf {M}_0^{-1} =\left( \int _0^b {\dot{\mathbf{F}}}(t) \varvec{\Sigma }^{-1} {\dot{\mathbf{F}}}^\top (t) \,\mathrm{d}t\right) ^{-1} \, . \end{aligned}$$
  2. (2)

    If the rank of the matrix \(\mathbf {F}(0)\) satisfies \(\text{ rank }(\mathbf {F}(0))=1\), deterministic Eq. (3.8) contains one informative equation about \(\theta \). In that case, we assume without loss of generality that \(F_{1,1}(0) \ne 0 \), and it follows from (3.8) that \(\theta _1\) can be expressed in terms of \(\theta _2, \ldots , \theta _p\) as

    $$\begin{aligned} \theta _1 = \frac{Y_1(0) - \sum _{j=2}^p\theta _j F_{1,j}(0)}{F_{1,1}(0)} \, . \end{aligned}$$
    (3.9)

    Using (3.9) in combination with model (3.1), we obtain a reduced model by

    $$\begin{aligned} \mathbf {Z}(t) = \mathbf {Y}(t) - \frac{Y_1(0)}{F_{1,1}(0)}\begin{pmatrix}F_{1, 1}(t) \\ F_{2, 1}(t)\end{pmatrix} = \tilde{\mathbf {F}}^\top (t){\tilde{\theta }} + \varvec{\Sigma }^{1/2} \varvec{\varepsilon }(t) , \end{aligned}$$
    (3.10)

    where the matrix-valued function \(\tilde{\mathbf {F}}(t)\) is defined by

    $$\begin{aligned} \tilde{\mathbf {F}}^\top (t) = \Bigl ( F_{i,j}(t) - \frac{F_{1,j}(0)}{F_{1, 1}(0)} F_{i, 1}(t)\Bigr )_{i=1, 2,\; j=2, \ldots, p} \end{aligned}$$
    (3.11)

    and the reduced \((p-1)\)-dimensional parameter \({{\tilde{\theta }}}\) is given by \({{\tilde{\theta }}} = (\theta _2, \ldots , \theta _p)\). It follows from \(\text{ rank }(\mathbf {F}(0)) = 1\) that the matrix-valued function \(\tilde{\mathbf {F}}(t)\) defined in (3.11) satisfies \(\tilde{\mathbf {F}}^\top (0) = \mathbf {0}_{2\times (p-1)}\). Consequently, the modified model given by (3.10) satisfies the condition of case (1) and the best linear unbiased estimator for the reduced parameter \({{\tilde{\theta }}}\) is obtained by

    $$\begin{aligned} \hat{{{\tilde{\theta }}}}_\mathrm{BLUE} = \mathbf {M}_0^{-1} \Big ( \int _0^b \dot{\tilde{\mathbf {F}}}(t) \varvec{\Sigma }^{-1}\,\mathrm{d}\mathbf {Z} (t) \Big ) , \end{aligned}$$
    (3.12)

    where the process \(\{\mathbf {Z}(t)\mid t\in [0, b]\}\) is defined by (3.10), the matrix \(\tilde{\mathbf {F}}(t)\) is given by (3.11), and the minimum variance is given by

    $$\begin{aligned} \text{ Cov }(\hat{{{\tilde{\theta }}}}_\mathrm{BLUE} ) = \mathbf {M}_0^{-1} =\left( \int _0^b \dot{\tilde{\mathbf {F}}}(t) \varvec{\Sigma }^{-1} \dot{\tilde{\mathbf {F}}}^\top (t) \,\mathrm{d}t\right) ^{-1} \, . \end{aligned}$$

    The best linear unbiased estimator for the remaining parameter \(\theta _1\) is then obtained by

    $$\begin{aligned} {\hat{\theta }}_1 = \frac{Y_1(0) - \sum _{j=2}^p\hat{\tilde{\theta }}_{\mathrm{BLUE},j} F_{1,j}(0)}{F_{1,1}(0)} \, . \end{aligned}$$
  3. (3)

    If the rank of the matrix \(\mathbf {F}(0)\) satisfies \(\text{ rank }(\mathbf {F}(0)) = 2\), Eq. (3.8) contains two informative equations about \(\theta \).

    Let

    $$\begin{aligned} \mathbf {A}(t) = \begin{pmatrix} F_{1,1}(t) &{} F_{1,2}(t) \\ F_{2,1}(t) &{} F_{2, 2}(t) \end{pmatrix} \end{aligned}$$
    (3.13)

    be the submatrix of \(\mathbf {F}\) which contains the first two columns of \(\mathbf {F}^T(t)\). Without loss of generality, we assume that \(\mathbf {A}(0)\) is non-singular (as \(\text{ rank }(\mathbf {F}(0)) = 2\)).

    Then it follows by (3.8) that

    $$\begin{aligned} \begin{pmatrix} \theta _1 \\ \theta _2 \end{pmatrix} = \mathbf {A}^{-1}(0)\Bigl (\mathbf {Y}(0) - \bigl (\sum _{j=3}^{p} F_{i,j}(0)\theta _j\bigr )_{i=1, 2} \Bigr ) \, . \end{aligned}$$
    (3.14)

    Using (3.14) in combination with (3.1) we obtain a reduced model given by

    $$\begin{aligned} \mathbf {Z}(t) = \mathbf {Y}(t) - \mathbf {A}(t)\mathbf {A}^{-1}(0) \mathbf {Y}(0) = \tilde{\mathbf {F}}^\top (t){\tilde{\theta }} + \varvec{\Sigma }^{1/2} \varvec{\varepsilon }(t) \end{aligned}$$
    (3.15)

    where the matrix-valued function \(\mathbf {A}(t)\) is given by (3.13), the matrix-valued function \(\tilde{\mathbf {F}}^T(t)\) is of the form

    $$\begin{aligned} \tilde{\mathbf {F}}^\top (t) = \bigl (F_{i,j}(t)\bigr )_{i=1, 2;\, j=3, \ldots, p } - \mathbf {A}(t) \mathbf {A}^{-1}(0)\bigl (F_{i,j}(0)\bigr )_{i=1, 2;\, j=3, \ldots, p } \end{aligned}$$
    (3.16)

    and the reduced \((p-2)\)-dimensional parameter \({{\tilde{\theta }}}\) is given by \({{\tilde{\theta }}} = (\theta _3, \ldots , \theta _p) \) .

    The matrix-valued function \(\tilde{\mathbf {F}}(t)\) defined in (3.16) satisfies \(\tilde{\mathbf {F}}^\top (0) = \mathbf {0}_{2\times (p-2)}\). Consequently, the modified model given by (3.15) satisfies the condition of case (1) and the best linear unbiased estimator \(\hat{{{\tilde{\theta }}}}_\mathrm{BLUE}\) for the reduced \((p-2)\)-dimensional parameter \({{\tilde{\theta }}}\) is obtained by (3.12) using the process \(\{\mathbf {Z}(t)\mid t\in [0, b]\}\) defined by (3.15) and the matrix-valued function \(\tilde{\mathbf {F}}(t)\) given by (3.16). The best linear unbiased estimator for the remaining parameter \((\theta _1, \theta _2)^\top \) is then obtained by

    $$\begin{aligned} \begin{pmatrix} {\hat{\theta }}_1 \\ {\hat{\theta }}_2 \end{pmatrix} = \mathbf {A}^{-1}(0)\Bigl (\mathbf {Y}(0) - \bigl (\sum _{j=3}^{p} F_{i,j}(0)\hat{{{\tilde{\theta }}}}_\mathrm{BLUE,j}\bigr )_{i=1, 2} \Bigr ) \, . \end{aligned}$$

3.2 Model with No Common Parameters

Recall the definition of model (2.1) in Sect. 2. It was demonstrated in Example 2.1 that this case is a special case of model (2.6), where the matrix \(\mathbf {F}^\top \) is given by

$$\begin{aligned} \mathbf {F}^\top (t) = \begin{pmatrix} {f}^\top _1(t) &{} 0^\top _{p_2}\\ 0^\top _{p_1} &{} {f}^\top _2(t)\end{pmatrix} \end{aligned}$$
(3.17)

and \(\theta =({\theta ^{(1)}}^\top ,{\theta ^{(2)}}^\top )^\top \). Considering both components in the vector \(\mathbf {Y}\) separately, we obtain continuous versions of the two models introduced in (2.1), that is,

$$\begin{aligned} Y_i(t) = {f}^\top _i(t) \theta ^{(i)} + \eta _i(t), \, i=1, 2 , \end{aligned}$$
(3.18)

where the error processes \(\{\eta _i(t) \mid t \in [a, b]\}\), \(i=1,2\), are defined by (2.4). An application of Theorem 3.1 yields the following BLUE.

Corollary 3.1

Consider continuous linear regression model (2.6) on the interval \([a, b]\), with the continuously differentiable matrix of regression functions defined in (3.17), a vector \(\{\varvec{\varepsilon }(t)=(\varepsilon _1(t), \varepsilon _2(t))^\top \mid t\in [a,b]\}\) of independent Brownian motions and a matrix \(\varvec{\Sigma }\) defined by (2.3). The best linear unbiased estimator for the parameter \(\theta \) is given by

$$\begin{aligned}&\begin{aligned} {\hat{\theta }}_\mathrm{BLUE}&= \begin{pmatrix} {\hat{\theta }}^{(1)}_\mathrm{BLUE}\\ {\hat{\theta }}^{(2)}_\mathrm{BLUE} \end{pmatrix} \\ {}&= \frac{1}{\sigma ^2_1\sigma ^2_2(1-\varrho ^2)} \mathbf {M} ^{-1} \left\{ \int _a^b \begin{pmatrix} \sigma _2^2 \dot{f}_1(t) &{} - \sigma _1\sigma _2\varrho \dot{f}_1(t) \\ - \sigma _1\sigma _2\varrho \dot{f}_2(t) &{} \sigma ^2_1\dot{f}_2(t) \end{pmatrix} d\begin{pmatrix}Y_1(t) \\ Y_2(t) \end{pmatrix} \, \right. \\&\quad \left. + \frac{1}{a} \begin{pmatrix} \sigma _2^2 {f}_1(a) &{} - \sigma _1\sigma _2\varrho {f}_1(a) \\ - \sigma _1\sigma _2\varrho {f}_2(a) &{} \sigma ^2_1{f}_2(a) \end{pmatrix}\begin{pmatrix}Y_1(a) \\ Y_2(a) \end{pmatrix} \right\} \, . \end{aligned}\nonumber \\ \end{aligned}$$
(3.19)

The minimum variance is given by \(\mathbf {M}^{-1}\), where

$$\begin{aligned} \mathbf {M} = \frac{1}{\sigma ^2_1\sigma ^2_2(1-\varrho ^2)} \begin{pmatrix} \sigma _2^2 \mathbf {M}_{11} &{} -\sigma _1\sigma _2\varrho \mathbf {M}_{12} \\ -\sigma _1\sigma _2\varrho \mathbf {M}_{21} &{} \sigma ^2_1\mathbf {M}_{22}\end{pmatrix} \end{aligned}$$

and

$$\begin{aligned} \mathbf {M}_{ij} = \int _a^b \dot{f}_i(t) \dot{f}^\top _j(t)\mathrm{d}t \, + \frac{1}{a} f_i(a) f_j^T(a) \, ~~~~i,j=1, 2 . \end{aligned}$$
(3.20)

It is of interest to compare estimator (3.19) with the estimator \({\hat{\theta }}_\mathrm{sep} = (({\hat{\theta }}_\mathrm{sep}^{(1)})^\top , ({\hat{\theta }}_\mathrm{sep}^{(2)})^\top )^\top \), which is obtained by estimating the parameters in the two models (3.18) separately. It follows from Theorem 2.1 in Dette et al. [6] that the best linear unbiased estimators in these models are given by

$$\begin{aligned} \hat{\theta }^{(i)}_\mathrm{sep} =\mathbf {M}^{-1}_{ii}\left( \int _{a}^b \dot{{f}}_i(t) \mathrm{d}Y_i(t) + \frac{1}{a}{f}_i(a) Y_i(a) \right) , \quad \quad i =1,2 , \end{aligned}$$
(3.21)

where the matrices are defined by

$$\begin{aligned} \mathbf {M}_{ii} = \int _a^b \dot{f}_i(t) \dot{f}^\top _i(t)\mathrm{d}t + \frac{1}{a}f_i(a) f_i^\top (a) , ~~~i=1, 2. \end{aligned}$$

Moreover, the covariance matrices of the estimators \({\hat{\theta }}^{(1)}_\mathrm{sep}\) and \({\hat{\theta }}^{(2)}_\mathrm{sep}\) are the inverses of the Fisher information matrices in the individual models, that is

$$\begin{aligned} \text{ Cov }({\hat{\theta }}^{(i)}_\mathrm{sep}) = \sigma _i^2 \mathbf {M}^{-1}_{ii} ~~~i=1,2. \end{aligned}$$
(3.22)

The following result compares the variances of the two estimators (3.19) and (3.21).

Theorem 3.2

If the assumptions of Corollary 3.1 are satisfied, we have (with respect to the Loewner ordering)

$$\begin{aligned} \text{ Cov }( {\hat{\theta }}^{(i )}_\mathrm{BLUE} ) \le \text{ Cov }({\hat{\theta }}^{(i)}_\mathrm{sep}) ~, ~~ i =1,2 , \end{aligned}$$

for all \(\varrho \in (-1, 1)\), where \( {\hat{\theta }}^{(i)}_\mathrm{BLUE} \) and \( {\hat{\theta }}^{(i)}_\mathrm{sep}\) are the best linear unbiased estimators of the parameter \(\theta ^{(i)}\) obtained by simultaneous estimation (see (3.19)) and separate estimation in the two groups (see (3.21)), respectively.

Proof

Without loss of generality we consider the case \(i =1\); the proof for the index \(i=2\) is obtained by the same arguments. Let \(\mathbf {K}_1^\top = (\mathbf {I}_{p_1}, \mathbf {0}_{p_1\times p_2})\) be a \(p_1\times (p_1+ p_2)\)-matrix, where \(\mathbf {I}_{p_1}\) and \(\mathbf {0}_{p_1\times p_2}\) denote the \(p_1\times p_1\) identity matrix and the \((p_1\times p_2)\)-matrix of zeros, respectively. Then,

$$\begin{aligned} \text{ Cov }( {\hat{\theta }}^{(1)}_\mathrm{BLUE} ) = (\mathbf {C}_{\mathbf {K}_1}(\mathbf {M}))^{-1}, \end{aligned}$$

where

$$\begin{aligned} \mathbf {C}_{\mathbf {K}_1}(\mathbf {M}) = (\mathbf {K}^\top _1\mathbf {M}^{-1}\mathbf {K}_1)^{-1} = \frac{1}{\sigma ^2_1(1-\varrho ^2)} \left( \mathbf {M}_{11} -\varrho ^2 \mathbf {M}_{12}\mathbf {M}^{-1}_{22}\mathbf {M}^\top _{12}\right) \end{aligned}$$
(3.23)

is the Schur complement of the block \(\mathbf {M}_{22}\) of the information matrix \(\mathbf {M}\) (see p. 74 in [28]). Observing (3.22) we now compare \(\mathbf {C}_{\mathbf {K}_1}(\mathbf {M})\) and \(\tfrac{1}{\sigma ^2_1}\mathbf {M}_{11}\) and obtain

$$\begin{aligned} \begin{aligned} \mathbf {C}_{\mathbf {K}_1}(\mathbf {M})-\frac{1}{\sigma ^2_1}\mathbf {M}_{11} =&\frac{1}{\sigma ^2_1(1-\varrho ^2)} \left( \mathbf {M}_{11} - \varrho ^2 \mathbf {M}_{12}\mathbf {M}^{-1}_{22} \mathbf {M}_{12}^\top \right) - \frac{1}{\sigma ^2_1}\mathbf {M}_{11} \\ =&\frac{\varrho ^2}{\sigma ^2_1(1-\varrho ^2)}\left( \mathbf {M}_{11} - \mathbf {M}_{12}\mathbf {M}^{-1}_{22} \mathbf {M}_{12}^\top \right) \, \\ :=&\frac{\varrho ^2}{\sigma ^2_1(1-\varrho ^2)} \mathbf {C}_{\mathbf {K}_1}({{\tilde{\mathbf {M}}}}), \end{aligned} \end{aligned}$$
(3.24)

where \(\mathbf {C}_{\mathbf {K}_1}({{\tilde{\mathbf {M}}}})\) is the Schur complement of the block \(\mathbf {M}_{22}\) of the matrix

$$\begin{aligned} {\tilde{\mathbf {M}}} = \begin{pmatrix} \mathbf {M}_{11} &{} \mathbf {M}_{12} \\ \mathbf {M}_{21} &{} \mathbf {M}_{22}\end{pmatrix} \, . \end{aligned}$$

Note that the matrix \({\tilde{\mathbf {M}}}\) is non-negative definite. An application of Lemma 3.12 of Pukelsheim [28] shows that the Schur complement \(\mathbf {C}_{\mathbf {K}_1}({\tilde{\mathbf {M}}})\) is also non-negative definite, that is \(\mathbf {C}_{\mathbf {K}_1}({\tilde{\mathbf {M}}})\ge 0 \) with respect to the Loewner ordering. Observing (3.24) we have

$$\begin{aligned} \big ( \text{ Cov }( {\hat{\theta }}^{(1 )}_\mathrm{BLUE} )\big )^{-1} = \mathbf {C}_{\mathbf {K}_1}(\mathbf {M}) \ge \frac{1}{\sigma ^2_1}\mathbf {M}_{11} = \big ( \text{ Cov }({\hat{\theta }}^{(1)}_\mathrm{sep}) \big )^{-1} \end{aligned}$$

and the statement of the theorem follows. \(\square \)
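The inequality of Theorem 3.2 can also be checked numerically. The following sketch computes the Schur complement (3.23) and compares \(\text{Cov}({\hat{\theta }}^{(1)}_\mathrm{BLUE})\) with \(\text{Cov}({\hat{\theta }}^{(1)}_\mathrm{sep})\) from (3.22); the regression functions \(f_1(t)=(1,t)^\top\), \(f_2(t)=(1,t,t^2)^\top\) and the values of \(\sigma_1,\sigma_2,\varrho\) are hypothetical choices made only for this illustration.

```python
import numpy as np
from scipy.integrate import quad

a, b = 1.0, 2.0
sigma1, sigma2, rho = 1.0, 1.5, 0.6

# hypothetical regression functions of the two groups
f1 = lambda t: np.array([1.0, t]);        f1_dot = lambda t: np.array([0.0, 1.0])
f2 = lambda t: np.array([1.0, t, t**2]);  f2_dot = lambda t: np.array([0.0, 1.0, 2.0 * t])

def M_block(g, g_dot, h, h_dot):
    """Block M_ij from (3.20): int g' h'^T dt + g(a) h(a)^T / a."""
    p, q = g(a).size, h(a).size
    M = np.array([[quad(lambda t: g_dot(t)[i] * h_dot(t)[j], a, b)[0]
                   for j in range(q)] for i in range(p)])
    return M + np.outer(g(a), h(a)) / a

M11 = M_block(f1, f1_dot, f1, f1_dot)
M22 = M_block(f2, f2_dot, f2, f2_dot)
M12 = M_block(f1, f1_dot, f2, f2_dot)

# Schur complement (3.23) and the two covariance matrices
C_K1 = (M11 - rho**2 * M12 @ np.linalg.inv(M22) @ M12.T) / (sigma1**2 * (1 - rho**2))
cov_blue = np.linalg.inv(C_K1)               # Cov(theta^(1)_BLUE)
cov_sep = sigma1**2 * np.linalg.inv(M11)     # Cov(theta^(1)_sep), see (3.22)

# Theorem 3.2: the difference is non-negative definite, so all eigenvalues are >= 0
print(np.linalg.eigvalsh(cov_sep - cov_blue))
```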

Remark 3.2

If \(\varrho = 0\) we have \(\mathbf {C}_{\mathbf{K}_1}(\mathbf {M})= \tfrac{1}{\sigma ^2_1}\mathbf {M}_{11}\), and it follows from (3.23) that separate estimation in the individual groups yields equally precise estimates, that is \( \text{ Cov }({\hat{\theta }}_\mathrm{sep}^{(i)}) = \text{ Cov }({\hat{\theta }}^{(i)}_\mathrm{BLUE})\) \((i =1,2)\). However, in general we have \(\text{ Cov } ({\hat{\theta }}_\mathrm{sep}^{(i)}) \ge \text{ Cov } ({\hat{\theta }}^{(i)}_\mathrm{BLUE})\). Moreover, the inequality is strict in most cases, which means that simultaneous estimation of the parameters \(\theta ^{(1)}\) and \(\theta ^{(2)}\) yields more precise estimators. A necessary condition for strict inequality (i.e., for the matrix \(\text{ Cov } ({\hat{\theta }}_\mathrm{sep}^{(i)}) - \text{ Cov }({\hat{\theta }}^{(i)}_\mathrm{BLUE})\) to be positive definite) is \(\varrho \not = 0\). The following result shows that this condition is not sufficient. It considers the important case where the regression functions \(f_1\) and \(f_2\) in (3.17) are the same and shows that in this case the two estimators \({\hat{\theta }}_\mathrm{BLUE}\) and \({\hat{\theta }}_\mathrm{sep}\) coincide.

Corollary 3.2

If the assumptions of Corollary 3.1 hold and additionally the regression functions in model (2.6) satisfy \(f_1 = f_2\), the best linear unbiased estimator for the parameter \(\theta \) is given by

$$\begin{aligned} \begin{aligned} {\hat{\theta }}_\mathrm{BLUE} = \begin{pmatrix} {\hat{\theta }}^{(1)}_\mathrm{BLUE}\\ {\hat{\theta }}^{(2)}_\mathrm{BLUE} \end{pmatrix}= \int _a^b \left( \mathbf {I}_2 \otimes \mathbf {M}_{11}^{-1}\dot{f}_1(t)\right) \mathrm{d}\mathbf {Y} (t) + \frac{1}{a}\left( \mathbf {I}_2\otimes \mathbf {M}_{11}^{-1} f_1(a)\right) \mathbf {Y} (a) \end{aligned}, \end{aligned}$$

where \(\mathbf {I}_{2}\) denotes the \(2\times 2\)-identity matrix and the matrix \(\mathbf {M}_{11}\) is defined by (3.20). Moreover, the minimum variance is given by \(\text{ Cov }({\hat{\theta }}_\mathrm{BLUE})= \varvec{\Sigma } \otimes \mathbf {M}_{11}^{-1} \) and

$$\begin{aligned} \text{ Cov }({\hat{\theta }}_\mathrm{sep}^{(i)}) = \text{ Cov }({\hat{\theta }}^{(i)}_\mathrm{BLUE}) ~~~(i =1,2)~. \end{aligned}$$
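The Kronecker structure of Corollary 3.2 is easily verified numerically. The sketch below compares \(\varvec{\Sigma }\otimes \mathbf {M}_{11}^{-1}\) with the inverse of the matrix \(\mathbf {M}\) from Corollary 3.1, which reduces to \(\varvec{\Sigma }^{-1}\otimes \mathbf {M}_{11}\) when \(f_1=f_2\); the common regression function \(f_1(t)=f_2(t)=(1,t)^\top\) and all numerical values are hypothetical choices.

```python
import numpy as np
from scipy.integrate import quad

a, b = 1.0, 2.0
sigma1, sigma2, rho = 1.0, 1.5, 0.4
Sigma = np.array([[sigma1**2, sigma1 * sigma2 * rho],
                  [sigma1 * sigma2 * rho, sigma2**2]])
Sigma_inv = np.linalg.inv(Sigma)

# hypothetical common regression function f1 = f2 = (1, t)^T
f = lambda t: np.array([1.0, t]); f_dot = lambda t: np.array([0.0, 1.0])
p1 = 2

# M_11 from (3.20) for the common regression function
M11 = np.array([[quad(lambda t: f_dot(t)[i] * f_dot(t)[j], a, b)[0]
                 for j in range(p1)] for i in range(p1)]) + np.outer(f(a), f(a)) / a

# Corollary 3.2: Cov(theta_BLUE) = Sigma kron M11^{-1}
cov_kron = np.kron(Sigma, np.linalg.inv(M11))

# the matrix M of Corollary 3.1 reduces to Sigma^{-1} kron M11 when f1 = f2
M = np.block([[Sigma_inv[0, 0] * M11, Sigma_inv[0, 1] * M11],
              [Sigma_inv[1, 0] * M11, Sigma_inv[1, 1] * M11]])
print(np.allclose(np.linalg.inv(M), cov_kron))   # True
```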

3.3 Models with Common Parameters

Recall the definition of model (2.1) in Sect. 2. It was demonstrated in Example 2.2 that this case is a special case of model (2.6), where the matrix of regression functions is given by

$$\begin{aligned} \mathbf {F}^\top (t) = \begin{pmatrix} f^\top _0(t), \tilde{f}^\top _1(t), 0^\top _{p_2-p_0}\\ f^\top _0(t), 0^\top _{p_1-p_0}, \tilde{f}^\top _2(t) \end{pmatrix} \end{aligned}$$
(3.25)

and the vector of parameters is defined by

$$\begin{aligned} \theta = (\theta ^{(0)^\top }, {{\tilde{\theta }}}^{(1)^\top }, {{\tilde{\theta }}}^{(2)^\top })^\top \, . \end{aligned}$$

An application of Theorem 3.1 yields the BLUE in model (2.6) with the matrix \( \mathbf {F}^\top \) defined by (3.25).

Corollary 3.3

Consider continuous linear regression model (2.6) on the interval \([a, b]\), where the matrix of regression functions \( \mathbf {F}^\top \) defined in (3.25) is continuously differentiable. The best linear unbiased estimator for the parameter \(\theta \) is given by

$$\begin{aligned} \begin{aligned} {\hat{\theta }}_\mathrm{BLUE}&= \begin{pmatrix} {\hat{\theta }}^{(0)}_\mathrm{BLUE} \\ \hat{{{\tilde{\theta }}}}^{(1)}_\mathrm{BLUE}\\ \hat{{\tilde{\theta }}}^{(2)}_\mathrm{BLUE} \end{pmatrix}= \frac{1}{\sigma _1^2\sigma _2^2(1-\varrho ^2)} \mathbf {M} ^{-1} \\&\qquad \left\{ \int _a^b \begin{pmatrix} (\sigma _2^2-\sigma _1\sigma _2\varrho )\dot{f}_0(t) &{} (\sigma _1^2-\sigma _1\sigma _2\varrho )\dot{f}_0(t) \\ \sigma _2^2\dot{\tilde{f}}_1(t) &{} - \sigma _1\sigma _2 \varrho \dot{\tilde{f}}_1(t) \\ - \sigma _1\sigma _2 \varrho \dot{\tilde{f}}_2(t) &{} \sigma ^2_1\dot{\tilde{f}}_2(t) \end{pmatrix} d\begin{pmatrix}Y_1(t) \\ Y_2(t) \end{pmatrix} \right. \\&\qquad \left. +\frac{1}{a} \begin{pmatrix} (\sigma _2^2-\sigma _1\sigma _2\varrho ){f}_0(a) &{} (\sigma _1^2-\sigma _1\sigma _2\varrho ){f}_0(a) \\ \sigma _2^2{\tilde{f}}_1(a) &{} - \sigma _1\sigma _2 \varrho {\tilde{f}}_1(a) \\ - \sigma _1\sigma _2 \varrho {\tilde{f}}_2(a) &{} \sigma ^2_1{\tilde{f}}_2(a) \end{pmatrix}\,\begin{pmatrix}Y_1(a) \\ Y_2(a) \end{pmatrix} \right\} . \end{aligned} \end{aligned}$$
(3.26)

The minimum variance is

$$\begin{aligned} \text{ Cov }( {\hat{\theta }}_\mathrm{BLUE} ) = \mathbf {M}^{-1}~, \end{aligned}$$

where

$$\begin{aligned} \mathbf {M} = \frac{1}{\sigma _1^2\sigma _2^2(1-\varrho ^2)} \begin{pmatrix} (\sigma _1^2+\sigma _2^2 - 2\sigma _1\sigma _2\varrho )\mathbf {M}_{00} &{} (\sigma _2^2-\sigma _1\sigma _2\varrho )\mathbf {M}_{01} &{} (\sigma _1^2-\sigma _1\sigma _2\varrho )\mathbf {M}_{02}\\ (\sigma _2^2-\sigma _1\sigma _2\varrho )\mathbf {M}_{10} &{} \sigma _2^2\mathbf {M}_{11} &{} - \sigma _1\sigma _2 \varrho \mathbf {M}_{12} \\ (\sigma _1^2-\sigma _1\sigma _2\varrho )\mathbf {M}_{20} &{} - \sigma _1\sigma _2 \varrho \mathbf {M}_{21} &{} \sigma _1^2\mathbf {M}_{22}\end{pmatrix} \end{aligned}$$

and individual blocks in this matrix are given by

$$\begin{aligned} \mathbf {M}_{ij} = \int _a^b \dot{g}_i(t) \dot{g}^\top _j(t)\mathrm{d}t + \frac{1}{a}g_i(a) g_j^\top (a), \end{aligned}$$
(3.27)

for \(i, j=0, 1, 2\), where \(g_0(t) = f_0(t)\) and \(g_i(t)= \tilde{f}_i(t)\) for \(i=1, 2\) .

It is again of interest to compare estimate (3.26) with the estimate \({\hat{\theta }}_\mathrm{sep} = (({\hat{\theta }}_\mathrm{sep}^{(1)})^\top , ({\hat{\theta }}_\mathrm{sep}^{(2)})^\top )^\top \), which is obtained by estimating the parameter \(\theta ^{(i)} = ((\theta ^{(0)})^\top ,\) \(({\tilde{\theta }}^{(i)})^\top )^\top \) in the two models (3.18) (\(i=1, 2\)) separately using (3.21). The corresponding covariances of the estimators \({\hat{\theta }}^{(1)}_{\mathrm{sep}}\) and \({\hat{\theta }}^{(2)}_{\mathrm{sep}}\) are given by (3.22). The following result compares the variances of the two estimators (3.26) and (3.21). Its proof is similar to the proof of Theorem 3.2 and therefore omitted.

Theorem 3.3

If the assumptions of Corollary 3.3 are satisfied, we have (with respect to the Loewner ordering)

$$\begin{aligned} \text{ Cov }( {\hat{\theta }}^{(i )}_\mathrm{BLUE} ) \le \text{ Cov }({\hat{\theta }}^{(i)}_\mathrm{sep}) ~, ~~ i =1,2 , \end{aligned}$$

for all \(\varrho \in (-1, 1)\), where \( {\hat{\theta }}^{(i)}_\mathrm{BLUE} \) and \( {\hat{\theta }}^{(i)}_\mathrm{sep}\) are the best linear unbiased estimators of the parameter \(\theta ^{(i)}\) obtained by simultaneous and separate estimation, respectively.

Remark 3.3

The results presented so far have been derived for the case where the error process \(\{\varvec{\varepsilon }(t) = (\varepsilon _1(t), \varepsilon _2(t))^\top | ~t\in [a, b]\}\) in (2.6) consists of two independent Brownian motions. This assumption has been made to simplify the notation. Similar results can be obtained for Markov processes, and in this remark, we indicate the essential arguments.

To be precise, assume that the error processes \(\{\varvec{\varepsilon }(t) = (\varepsilon _1(t), \varepsilon _2(t))^\top | ~t\in [a, b]\}\) in model (2.6) consist of two independent centered Gaussian processes with continuous covariance kernel given by

$$\begin{aligned} K(s, t) = \mathbb {E}[\varepsilon _i(s) \varepsilon _i(t) ] = v(s)v(t) \min \{q(s), q(t)\} \quad s, t \in [a, b] , \end{aligned}$$
(3.28)

where \(u(\cdot )\) and \(v(\cdot )\) are functions defined on the interval \([a, b]\) such that the function \(q(\cdot ) = u(\cdot )/v(\cdot )\) is positive and strictly increasing. Kernels of the form (3.28) are called triangular kernels, and a famous result in Doob [14] essentially shows that a Gaussian process is a Markov process if and only if its covariance kernel is triangular (see also [22]). In this case model (2.6) can be transformed into a model with an error process consisting of two independent Brownian motions using the arguments given in Appendix B of Dette et al. [10]. More precisely, define

$$\begin{aligned} q(t) = \frac{u(t)}{v(t)} \end{aligned}$$

and consider the stochastic process

$$\begin{aligned} \varvec{\varepsilon } (t) = v(t) \varvec{ {\tilde{\varepsilon }}} ({q(t)}) , \end{aligned}$$

where \(\{ \varvec{{\tilde{\varepsilon }}} (\tilde{t}) = ({\tilde{\varepsilon }}_1({\tilde{t}}), \tilde{\varepsilon }_2({\tilde{t}}))^\top | \ \tilde{t} \in [\tilde{a},\tilde{b}] \}\) consists of two independent Brownian motions on the interval \([\tilde{a},\tilde{b}]= [ q (a), q (b)]\). It now follows from Doob [14] that the process \(\{ \varvec{ \varepsilon }(t) = (\varepsilon _1(t), \varepsilon _2(t) )^\top | \ t \in [a,b] \}\) consists of two independent centered Gaussian processes on the interval \([a, b]\) with covariance kernel (3.28). Consequently, if we consider the model

$$\begin{aligned} \varvec{\tilde{Y}}(\tilde{t}) = \begin{pmatrix} \tilde{Y}_1(\tilde{t}) \\ \tilde{Y}_2(\tilde{t}) \end{pmatrix} = \tilde{\mathbf {F}}^\top (\tilde{t})\theta + \varvec{\Sigma }^{1/2} \varvec{\tilde{\varepsilon }}(\tilde{t}) \, , \, \, \tilde{t} \in [q(a), q(b)], \end{aligned}$$
(3.29)

and

$$\begin{aligned} \tilde{\mathbf {F}}(\tilde{t}) = \frac{\mathbf {F}(q^{-1}(\tilde{t}))}{v(q^{-1}(\tilde{t}))}~,~ \varvec{ \tilde{\varepsilon }} ({\tilde{t}}) = \frac{\varvec{ \varepsilon } (q^{-1}(\tilde{t}))}{v(q^{-1}(\tilde{t}))} ~,~ \varvec{ \tilde{Y}} ({{\tilde{t}}}) = \frac{{\varvec{Y}} (q^{-1}(\tilde{t}))}{v(q^{-1}(\tilde{t}))}, \end{aligned}$$

the results obtained so far are applicable. Thus, a “good” estimator obtained for the parameter \(\theta \) in model (3.29) is also a “good” estimator for the parameter \(\theta \) in model (3.1) with an error process consisting of two Gaussian processes with covariance kernel (3.28). Consequently, we can derive the optimal estimator for the parameter \(\theta \) in continuous model (3.1) with covariance kernel (3.28) from the best linear unbiased estimator in the model given in (3.29) with Brownian motions by an application of Theorem 3.1. The resulting best linear unbiased estimator for \(\theta \) in model (3.1) with triangular kernel (3.28) is of the form

$$\begin{aligned} \hat{\theta }_\mathrm{BLUE} = \mathbf {M}^{-1} \Big \{ \int _a^b \frac{\dot{\mathbf {F}}(t)v(t) - \mathbf {F}(t)\dot{v}(t)}{\dot{u}(t)v(t)- u(t)\dot{v}(t)} \varvec{\Sigma }^{-1}\,\mathrm{d}\left( \frac{\mathbf {Y} (t)}{v(t)}\right) + \frac{\mathbf {F}(a)\varvec{\Sigma }^{-1} \mathbf {Y}(a)}{u(a)v(a)} \Big \}, \end{aligned}$$

where the minimum variance is given by

$$\begin{aligned} \mathbf {M}^{-1}&=\left( \int _a^b \frac{\left( \dot{\mathbf {F}}(t)v(t)- \mathbf {F}(t) \dot{v}(t)\right) \varvec{\Sigma }^{-1}\left( \dot{\mathbf {F}}(t)v(t)- \mathbf {F}(t) \dot{v}(t)\right) ^\top }{v^2(t) [\dot{u}(t)v(t)- u(t)\dot{v}(t)]} \,\mathrm{d}t \right. \\&\quad \left. + \frac{ \mathbf {{F}}(a)\varvec{\Sigma }^{-1} \mathbf {{F}}^\top (a)}{u(a)v(a)}\right) ^{-1} \, . \end{aligned}$$
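As an illustration of the transformation described in this remark, the sketch below writes the exponential kernel \(e^{-\lambda |t-s|}\) (a hypothetical example of a triangular kernel, with \(u(t)=e^{\lambda t}\), \(v(t)=e^{-\lambda t}\) and \(q(t)=e^{2\lambda t}\)) in the form (3.28) and computes the transformed design interval \([q(a),q(b)]\) used in model (3.29).

```python
import numpy as np

# hypothetical triangular kernel: the exponential kernel exp(-lam * |t - s|)
lam = 2.0
u = lambda t: np.exp(lam * t)
v = lambda t: np.exp(-lam * t)
q = lambda t: u(t) / v(t)                       # q(t) = exp(2 * lam * t), positive and increasing

def K_triangular(s, t):
    """Kernel in the triangular form (3.28): v(s) v(t) min(q(s), q(t))."""
    return v(s) * v(t) * np.minimum(q(s), q(t))

# check that the triangular representation reproduces exp(-lam * |t - s|)
s, t = np.meshgrid(np.linspace(1.0, 2.0, 5), np.linspace(1.0, 2.0, 5))
print(np.allclose(K_triangular(s, t), np.exp(-lam * np.abs(t - s))))   # True

# the transformed model (3.29) is defined on the interval [q(a), q(b)]
a, b = 1.0, 2.0
print("transformed interval:", (q(a), q(b)))
```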

4 Optimal Designs for Comparing Curves

In this section we will derive optimal designs for comparing curves. The first part is devoted to a discretization of the BLUE in the continuous model. In the second part we develop an optimality criterion to obtain efficient designs for the comparison of curves based on the discretized estimators.

4.1 From the Continuous to the Discrete Model

To obtain a discrete design for n observations at the points \(a=t_1, \ldots , t_n\) from the solution in the continuous model derived in Sect. 3, we use a similar approach as in Dette et al. [6] and construct a discrete approximation of the stochastic integral in (3.3). For this purpose we consider the linear estimator

$$\begin{aligned} \hat{\theta }_n= & {} \mathbf {M}^{-1} \Big \{ \sum _{i=2}^n \varvec{\Omega }_i \dot{\mathbf {F}}(t_{i-1})\varvec{\Sigma }^{-1} (Y({t_i})-Y({t_{i-1}})) + \frac{\mathbf {F}(a)}{a}\varvec{\Sigma }^{-1} Y_a \Big \} \nonumber \\= & {} \mathbf {M}^{-1} \Big \{ \sum _{i=2}^n \varvec{\Phi }_i \varvec{\Sigma }^{-1} (Y(t_i)-Y({t_{i-1}})) + \frac{\mathbf {F}(a)}{a}\varvec{\Sigma }^{-1} Y_a \Big \}, \end{aligned}$$
(4.1)

where \(a= t_1< t_2< \ldots< t_{n-1}< t_n = b\), \(\varvec{\Omega }_2, \ldots , \varvec{\Omega }_n\) are \(p \times p\) weight matrices and \(\varvec{\Phi }_2=\varvec{\Omega }_2 \dot{\mathbf {F}}(t_{1}), \ldots , \varvec{\Phi }_n = \varvec{\Omega }_n \dot{\mathbf {F}}(t_{n-1})\) are \(p\times 2\) matrices, which have to be chosen in a reasonable way. The matrix \(\mathbf {M}^{-1}\) is given in (3.4). To determine these weights in an “optimal” way we first derive a representation of the mean squared error between the best linear unbiased estimator (3.3) in the continuous model and its discrete approximation (4.1). The following result is a direct consequence of Ito’s formula.

Lemma 4.1

Consider continuous model (3.1). If the assumptions of Theorem 3.1 are satisfied, we have

$$\begin{aligned}&\mathbb {E}_\theta [(\hat{\theta }_\mathrm{BLUE} - \hat{\theta }_n)(\hat{\theta }_\mathrm{BLUE} - \hat{\theta }_n)^\top ] = \mathbf {M}^{-1} \Big \{ \sum _{i=2}^n \int _{t_{i-1}}^{t_i} \big [ \dot{\mathbf {F}}(s) - \varvec{\Phi }_i \big ] \varvec{\Sigma }^{-1}\big [ \dot{\mathbf {F}}(s) - \varvec{\Phi }_i \big ]^\top \,\mathrm{d}s \nonumber \\&+ \sum _{i,j=2}^n \int _{t_{i-1}}^{t_i} \big [ \dot{\mathbf {F}}(s) - \varvec{\Phi }_i \big ]\varvec{\Sigma }^{-1}\dot{\mathbf {F}}^\top (s) \,\mathrm{d}s \, \theta \, \theta ^\top \int _{t_{j-1}}^{t_j} \dot{\mathbf {F}}(s)\varvec{\Sigma }^{-1}\big [ \dot{\mathbf {F}}(s) - \varvec{\Phi }_j \big ]^\top \,\mathrm{d}s \Big \} \mathbf {M}^{-1}. \end{aligned}$$
(4.2)

In the following we choose optimal \(p\times 2\) matrices \(\Phi _i=\varvec{\Omega }_i \dot{\mathbf {F}}(t_{i-1})\) and design points \(t_2, \ldots , t_{n-1}\) \((t_1=a, t_n=b)\), such that linear estimate (4.1) is unbiased and the mean squared error matrix in (4.2) “becomes small.” An alternative criterion is to replace the mean squared error \(\mathbb {E}_\theta [({\hat{\theta }}_\mathrm{BLUE} - {\hat{\theta }}_n)({\hat{\theta }}_\mathrm{BLUE} - {\hat{\theta }}_n)^\top ]\) by the mean squared error

$$\begin{aligned} \mathbb {E}_\theta [(\hat{\theta }_n - \theta )(\hat{\theta }_n - \theta )^\top ] \end{aligned}$$

between the estimate \({\hat{\theta }}_n\) defined in (4.1) and the “true” vector of parameters. The following result shows that in the class of unbiased estimators both optimization problems yield the same solution. The proof is similar to the proof of Theorem 3.1 in Dette et al. [6].

Theorem 4.1

The estimator \(\hat{\theta }_n\) defined in (4.1) is unbiased if and only if the identity

$$\begin{aligned} \mathbf {M}_0= & {} \int _a^b \dot{\mathbf {F}}(s)\varvec{\Sigma }^{-1} \dot{\mathbf {F}}^\top (s) \,\mathrm{d}s = \sum _{i=2}^n \Phi _i\varvec{\Sigma }^{-1} \int _{t_{i-1}}^{t_i} \dot{\mathbf {F}}^\top (s) \,\mathrm{d}s\nonumber \\= & {} \sum ^n_{i=2} \varvec{\Phi }_i\varvec{\Sigma }^{-1}(\mathbf {F}(t_i)-\mathbf {F}(t_{i-1}))^\top , \end{aligned}$$
(4.3)

is satisfied. Moreover, for any linear unbiased estimator of the form \(\tilde{\theta }_n = \int _a^b \mathbf {G}(s) dY_s \) we have

$$\begin{aligned} \mathbb {E}_\theta [(\tilde{\theta }_n - \theta )(\tilde{\theta }_n - \theta )^\top ] = \mathbb {E}_\theta [(\tilde{\theta }_n - \hat{\theta }_\mathrm{BLUE} )(\tilde{\theta }_n - \hat{\theta }_\mathrm{BLUE} )^\top ] + \mathbf {M}^{-1}. \end{aligned}$$

In order to describe a solution in terms of optimal “weights” \(\varvec{\Phi }^*_i\) and design points \(t^*_i\) we recall that the condition of unbiasedness of the estimate \({\hat{\theta }}_n\) in (4.1) is given by (4.3) and introduce the notation

$$\begin{aligned}&\mathbf {B}_i = [\mathbf {F}(t_i) - \mathbf {F}(t_{i-1})]\varvec{\Sigma }^{-1/2} / \sqrt{t_i-t_{i-1}}, \\&\mathbf {A}_i = \varvec{\Phi }_i\varvec{\Sigma }^{-1/2} \sqrt{t_i-t_{i-1}}.\nonumber \end{aligned}$$
(4.4)

It follows from Lemma 4.1 and Theorem  4.1 that for an unbiased estimator \({\hat{\theta }}_n\) of form (4.1) the mean squared error has the representation

$$\begin{aligned} \mathbb {E}_\theta \big [( {\hat{\theta }}_\mathrm{BLUE} - {\hat{\theta }}_n)({\hat{\theta }}_\mathrm{BLUE} - {\hat{\theta }}_n)^\top \big ] = -\mathbf {M}^{-1} \mathbf {M}_0 \mathbf {M}^{-1} + \sum _{i=2}^n\mathbf {M}^{-1} \mathbf {A}_i \mathbf {A}_i{^\top } \mathbf {M}^{-1} , \end{aligned}$$
(4.5)

which has to be “minimized” subject to the constraint

$$\begin{aligned} \mathbf {M}_0 = \int ^b_a \dot{\mathbf {F}} (s) \varvec{\Sigma }^{-1} \dot{\mathbf {F}}^\top (s)\mathrm{d}s= \sum _{i=2}^n \mathbf {A}_i \mathbf {B}_i^\top . \end{aligned}$$
(4.6)

The following result shows that a minimization with respect to the weights \(\varvec{\Phi }_i\) (or equivalently \(\mathbf {A}_i\)) can actually be carried out with respect to the Loewner ordering.

Theorem 4.2

Assume that the assumptions of Theorem 3.1 are satisfied and that the matrix

$$\begin{aligned} \mathbf {B} = \sum _{i=2}^{n} \mathbf {B}_i \mathbf {B}^\top _i = \sum ^n_{i=2} \frac{[\mathbf {F}(t_i) - \mathbf {F}(t_{i-1})]\varvec{\Sigma }^{-1}[\mathbf {F}(t_i) - \mathbf {F}(t_{i-1})]^\top }{t_i - t_{i-1}} , \end{aligned}$$
(4.7)

is non-singular. Let \(\varvec{\Phi }^*_2, \ldots , \varvec{\Phi }^*_n\) denote \((p \times 2)\)-matrices satisfying the equations

$$\begin{aligned} \varvec{\Phi }^*_i = \mathbf {M}_0 \mathbf {B}^{-1} \frac{\mathbf {F}(t_i) - \mathbf {F}(t_{i-1})}{t_i - t_{i-1}}\qquad i=2,\ldots ,n, \end{aligned}$$
(4.8)

then \(\varvec{\Phi }^*_2, \ldots , \varvec{\Phi }^*_n\) are optimal weight matrices minimizing \( \mathbb {E}_\theta [(\hat{\theta }_\mathrm{BLUE} - \hat{\theta }_n)(\hat{\theta }_\mathrm{BLUE} - \hat{\theta }_n)^\top ] \) with respect to the Loewner ordering among all unbiased estimators of form (4.1). Moreover, the variance of the resulting estimator \({\hat{\theta }}^*_n\) is given by

$$\begin{aligned} \mathrm {Cov}({\hat{\theta }}^*_n) = \mathbf {M}^{-1}\left\{ \mathbf {M}_0 \mathbf {B}^{-1}\mathbf {M}_0 + \frac{1}{a} \mathbf {F}(a)\varvec{\Sigma }^{-1}\mathbf {F}^\top (a)\right\} \mathbf {M}^{-1} \end{aligned}$$

Proof

Let v denote a p-dimensional vector and consider the problem of minimizing the criterion

$$\begin{aligned} v^\top \mathbb {E}_\theta [(\hat{\theta }_\mathrm{BLUE} - \hat{\theta }_n)(\hat{\theta }_\mathrm{BLUE} - \hat{\theta }_n)^\top ] v \end{aligned}$$
(4.9)

subject to constraint (4.6). Observing (4.5) this yields the Lagrange function

$$\begin{aligned} G_{v}(\mathbf {A}_2, \ldots , \mathbf {A}_n)&= - v^\top \mathbf {M}^{-1} \mathbf {M}_0 \mathbf {M}^{-1} v+ \sum ^n_{i=2} (v^\top \mathbf {M}^{-1} \mathbf {A}_i\mathbf {A}_i^\top \mathbf {M}^{-1}v) \nonumber \\&\quad - \text{ tr }\big \{\varvec{\Lambda }(\mathbf {M}_0 - \sum _{i=2}^{n} \mathbf {A}_i \mathbf {B}_i^\top ) \big \} , \end{aligned}$$
(4.10)

where \(\mathbf {A}_2, \ldots , \mathbf {A}_n\) are \((p\times 2)\)-matrices and \(\varvec{\Lambda } = (\lambda _{k,i})^p_{k,i =1}\) is a \((p\times p)\)-matrix of Lagrange multipliers. The function \(G_v\) is convex with respect to \(\mathbf {A}_2,\ldots ,\mathbf {A}_n\). Therefore, taking derivatives with respect to \(\mathbf {A}_i\) yields the following necessary and sufficient condition for an extremum (here we use matrix differential calculus)

$$\begin{aligned} 2(\mathbf {M}^{-1}v)^\top \mathbf {A}_i\otimes (\mathbf {M}^{-1}v)^\top + \text{ vec }\left\{ \varvec{\Lambda }\mathbf {B}_i\right\} = 0^\top _{2p} , \qquad i=2, \ldots , n \, . \end{aligned}$$

Rewriting this system of linear equations in a \((p\times 2)\)-matrix form gives

$$\begin{aligned} 2\mathbf {M}^{-1}vv^\top \mathbf {M}^{-1} \mathbf {A}_i = - \varvec{\Lambda } \mathbf {B}_i \qquad i =2, \ldots , n \, . \end{aligned}$$

Substituting this expression in (4.6) and using the non-singularity of the matrices \(\mathbf {M}\) and \(\mathbf {B}\) yields for the matrix of Lagrange multipliers

$$\begin{aligned} \varvec{\Lambda } = -2\mathbf {M}^{-1}vv^\top \mathbf {M}^{-1}\mathbf {M}_0\mathbf {B}^{-1}~, \end{aligned}$$

which gives

$$\begin{aligned} 2\mathbf {M}^{-1}vv^\top \mathbf {M}^{-1} \mathbf {A}_i = 2\mathbf {M}^{-1}vv^\top \mathbf {M}^{-1}\mathbf {M}_0\mathbf {B}^{-1} \mathbf {B}_i \qquad i=2, \ldots , n \, . \end{aligned}$$
(4.11)

Note that one solution of (4.11) is given by

$$\begin{aligned} \mathbf {A}^*_i = \mathbf {M}_0 \mathbf {B}^{-1}\mathbf {B}_i , \qquad i=2, \ldots , n \end{aligned}$$

which does not depend on the vector v. Therefore, the tuple of matrices \((\mathbf {A}^*_2, \ldots , \mathbf {A}^*_n)\) minimizes the convex function \(G_v\) in (4.10) for all \(v\in \mathbb {R}^{p}\).

Observing the notations in (4.4) shows that the optimal matrix weights are given by (4.8). Moreover, these weights in (4.8) do not depend on the vector v either and provide a simultaneous minimizer of the criterion defined in (4.9) for all \(v\in \mathbb {R}^p\). Consequently, the weights defined in (4.8) minimize \( \mathbb {E}_\theta [(\hat{\theta }_\mathrm{BLUE} - \hat{\theta }_n)(\hat{\theta }_\mathrm{BLUE} - \hat{\theta }_n)^\top ] \) under unbiasedness constraint (4.6) with respect to the Loewner ordering. \(\square \)

Remark 4.1

If the matrix \(\mathbf {B}\) in Theorem 4.2 is singular, the optimal weights are not uniquely determined, and we propose to replace the inverse \(\mathbf {B}^{-1}\) by the Moore–Penrose inverse of \(\mathbf {B}\).

Note that for fixed design points \(t_1, \ldots , t_n \) Theorem 4.2 yields universally optimal weights \(\varvec{\Phi }^*_2,\ldots ,\varvec{\Phi }^*_n\) (with respect to the Loewner ordering) for estimators of form (4.1) satisfying (4.3). On the other hand, a further optimization with respect to the Loewner ordering over the choice of the points \(t_2,\ldots ,t_{n-1}\) \((t_1=a, t_n=b)\) is not possible, and we have to apply a real-valued optimality criterion for this purpose. In the following section, we will derive such a criterion which explicitly addresses the comparison of the regression curves from the two groups introduced in Sect. 2.
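
For illustration, the quantities appearing in Theorem 4.2 can be computed by elementary numerical linear algebra. The following Python code is only a minimal sketch (not the implementation used for the results below): it assumes that \(\mathbf {F}(\cdot )\) is supplied as a callable returning a \((p\times 2)\) array and that the matrix \(\mathbf {M}\) from (3.2) is computed separately; the derivative \(\dot{\mathbf {F}}\) in (4.6) is approximated by central differences, the integral by the trapezoidal rule, and all function names are chosen for this illustration only.

```python
import numpy as np

def design_matrices(F, Sigma, t, n_grid=400, fd_step=1e-5):
    """Compute M_0 from (4.6), B from (4.7) and the optimal weights (4.8).

    F     : callable, F(s) returns the (p x 2) matrix F(s)
    Sigma : (2 x 2) covariance matrix of the error vector
    t     : increasing design points a = t_1 < ... < t_n = b
    """
    Sigma_inv = np.linalg.inv(Sigma)
    a, b = t[0], t[-1]
    p = F(a).shape[0]

    # M_0 = int_a^b Fdot(s) Sigma^{-1} Fdot(s)^T ds  (trapezoidal rule,
    # with Fdot approximated by central differences)
    grid = np.linspace(a, b, n_grid)
    vals = np.empty((n_grid, p, p))
    for k, s in enumerate(grid):
        Fdot = (F(s + fd_step) - F(s - fd_step)) / (2 * fd_step)
        vals[k] = Fdot @ Sigma_inv @ Fdot.T
    M0 = np.trapz(vals, grid, axis=0)

    # B = sum_i [F(t_i)-F(t_{i-1})] Sigma^{-1} [F(t_i)-F(t_{i-1})]^T / (t_i - t_{i-1})
    B = np.zeros((p, p))
    dF = []
    for i in range(1, len(t)):
        d = (F(t[i]) - F(t[i - 1])) / (t[i] - t[i - 1])
        dF.append(d)
        B += d @ Sigma_inv @ d.T * (t[i] - t[i - 1])

    # Phi*_i = M_0 B^{-1} [F(t_i)-F(t_{i-1})] / (t_i - t_{i-1}); the
    # Moore-Penrose inverse is used if B is singular (Remark 4.1)
    B_inv = np.linalg.pinv(B)
    Phi_star = [M0 @ B_inv @ d for d in dF]
    return M0, B, Phi_star


def cov_estimator(M, M0, B, F, Sigma, a):
    """Covariance matrix of theta*_n from Theorem 4.2; M is the matrix from (3.2)."""
    Minv = np.linalg.inv(M)
    core = M0 @ np.linalg.pinv(B) @ M0 + F(a) @ np.linalg.inv(Sigma) @ F(a).T / a
    return Minv @ core @ Minv
```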

4.2 Confidence Bands

We return to the practical scenario of the two groups introduced in (2.1), where we now focus on the comparison of these groups on the interval \([a,b]\).

More precisely, consider the model introduced in (2.6) and let \(\hat{\theta }^*_n\) be estimator (4.1) with optimal weights defined by (4.8) from n observations taken at the points \(a=t_1<t_2<\ldots< t_{n-1} <t_n = b\). Then this estimator is normally distributed with mean \(\mathbb {E}[\hat{\theta }^*_n]=\theta \) and covariance matrix

$$\begin{aligned} \text{ Cov }(\hat{\theta }^*_n)= \mathbf {M}^{-1}\left\{ \mathbf {M}_0 \mathbf {B}^{-1}\mathbf {M}_0 + \frac{1}{a} \mathbf {F}(a)\varvec{\Sigma }^{-1}\mathbf {F}^\top (a)\right\} \mathbf {M}^{-1} \end{aligned}$$

where the matrices \(\mathbf {M}, \mathbf {M}_0\) and \(\mathbf {B}\) are given by (3.2), (4.6) and (4.7), respectively. Note that the covariance matrix depends on the points \(t_1, \ldots , t_n\) through the matrix \(\mathbf {B}^{-1}\). Moreover, using the estimator \(\hat{\theta }^*_n\), the prediction of the difference at a fixed point \(t\in [a,b]\) satisfies

$$\begin{aligned} (1, -1) \mathbf {F}^\top (t) \hat{\theta }^*_n - (1, -1) \mathbf {F}^\top (t) \theta \sim \mathcal {N}(0, h(t; t_1, \ldots , t_n))~, \end{aligned}$$

where

$$\begin{aligned} h(t; t_1, \ldots , t_n)&= (1, -1)\mathbf {F}^\top (t) \mathbf {M}^{-1}\left\{ \mathbf {M}_0 \mathbf {B}^{-1}\mathbf {M}_0 + \frac{1}{a} \mathbf {F}(a)\varvec{\Sigma }^{-1}\mathbf {F}^\top (a)\right\} \\&\quad \mathbf {M}^{-1}\mathbf {F}(t) (1, -1)^\top \, . \end{aligned}$$
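
Given the matrices \(\mathbf {M}\), \(\mathbf {M}_0\) and \(\mathbf {B}\) (computed, for instance, with the sketch in Sect. 4.1), the function \(h\) can be evaluated directly. The following lines are again only a sketch under these assumptions; the helper name h_value is ours.

```python
import numpy as np

def h_value(t, F, Sigma, M, M0, B, a):
    """Evaluate h(t; t_1,...,t_n); the design enters through M0 and B."""
    c = np.array([1.0, -1.0])                       # the contrast (1, -1)
    Minv = np.linalg.inv(M)
    core = (M0 @ np.linalg.inv(B) @ M0
            + F(a) @ np.linalg.inv(Sigma) @ F(a).T / a)
    left = (F(t) @ c) @ Minv                        # (1,-1) F^T(t) M^{-1}
    right = Minv @ (F(t) @ c)                       # M^{-1} F(t) (1,-1)^T
    return float(left @ core @ right)
```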

We now use this result and the results of Gsteiger et al. [16] to obtain a simultaneous confidence band for the difference of the two curves. More precisely, if the interval \([a,b]\) is the range where the two curves should be compared, the simultaneous confidence band is defined as follows. Consider the statistic

$$\begin{aligned} {\hat{T}} = \sup _{t \in [a,b]} \ \frac{|(1, -1) \mathbf {F}^\top (t) \hat{\theta }^*_n - (1, -1) \mathbf {F}^\top (t) \theta |}{ \{h(t; t_1, \ldots , t_n)\}^{1/2}} , \end{aligned}$$

and define D as the \((1-\alpha )\)-quantile of the corresponding distribution, that is

$$\begin{aligned} \mathbb {P}({\hat{T}} \le D) = 1-\alpha . \end{aligned}$$

Note that Gsteiger et al. [16] propose the parametric bootstrap for choosing the critical value D. Define

$$\begin{aligned} u (t; t_1, \ldots , t_n)&= (1, -1) \mathbf {F}^\top (t) \hat{\theta }^*_n + D \cdot { \{h(t; t_1, \ldots , t_n)\}^{1/2}} ,\\ l (t; t_1, \ldots , t_n)&= (1, -1) \mathbf {F}^\top (t) \hat{\theta }^*_n - D \cdot { \{h(t; t_1, \ldots , t_n)\}^{1/2}}, \end{aligned}$$

then the confidence band for the difference of the two regression functions is defined by

$$\begin{aligned} {{\mathcal {C}}}_{1-\alpha } = \big \{ g: [a,b] \rightarrow \mathbb {R}~|~ l (t; t_1, \ldots , t_n) \le g(t ) \le u (t; t_1, \ldots , t_n) \text{ for } \text{ all } t \in [a,b] \big \}.\nonumber \\ \end{aligned}$$
(4.12)
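
The critical value \(D\) can be approximated by simulation. The sketch below is not the parametric bootstrap of Gsteiger et al. [16] itself, but a direct Monte Carlo approximation that exploits the fact that \(\hat{\theta }^*_n - \theta \) is normally distributed with the known covariance matrix \(\mathrm {Cov}(\hat{\theta }^*_n)\); it assumes the hypothetical helper h_value from the previous sketch and evaluates the supremum over a grid.

```python
import numpy as np

def critical_value(F, cov_theta, h_fun, a, b, alpha=0.05,
                   n_sim=10_000, n_grid=200, seed=0):
    """Monte Carlo approximation of the (1-alpha)-quantile D of the statistic T-hat."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(a, b, n_grid)
    c = np.array([1.0, -1.0])
    contrasts = np.array([F(t) @ c for t in grid])       # rows: (1,-1) F^T(t)
    scale = np.sqrt(np.array([h_fun(t) for t in grid]))  # sqrt(h(t; t_1,...,t_n))
    L = np.linalg.cholesky(cov_theta)                    # Cov(theta*_n) = L L^T
    p = cov_theta.shape[0]
    sups = np.empty(n_sim)
    for r in range(n_sim):
        z = L @ rng.standard_normal(p)                   # draw of theta*_n - theta
        sups[r] = np.max(np.abs(contrasts @ z) / scale)
    return float(np.quantile(sups, 1.0 - alpha))
```

Here h_fun is a function of \(t\) only, for example lambda t: h_value(t, F, Sigma, M, M0, B, a) with the quantities from the previous sketches.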

Consequently, good points \(t_1=a< t_2< \ldots< t_{n-1} <t_n=b\) should minimize the width

$$\begin{aligned} u (t; t_1, \ldots , t_n) - l (t; t_1, \ldots , t_n) = 2 \cdot D \cdot { \{h(t; t_1, \ldots , t_n)\}^{1/2}} \end{aligned}$$

of this band at each \(t \in [a,b]\). As this is only possible in rare circumstances, we propose to minimize an \(L_p\)-norm of the function \(h(\cdot ; t_1, \ldots , t_n)\) as a design criterion, that is

$$\begin{aligned} \Phi _p(t_1, \ldots , t_n) = \Vert h(\cdot ; t_1, \ldots , t_n) \Vert _p := \Big ( \int _a^b [h(t; t_1, \ldots , t_n)]^p \,\mathrm{d}t \Big )^{1/p} , \quad 1 \le p \le \infty ,\nonumber \\ \end{aligned}$$
(4.13)

where the case \(p=\infty \) corresponds to the maximal deviation

$$\begin{aligned} \Vert h(\cdot ; t_1, \ldots , t_n) \Vert _{\infty } = \sup _{t \in [a,b]} |h(t; t_1, \ldots , t_n)|. \end{aligned}$$

Finally, the optimal points \(a=t^*_1< t_2^*< \ldots < t_n^*=b\) (minimizing (4.13)) and the corresponding weights derived in Theorem 4.2 provide the optimal linear unbiased estimator of form (4.1) (with the corresponding optimal design).
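
The criterion (4.13) can then be evaluated on a grid; the following minimal sketch again uses the hypothetical helper h_value and approximates the integral by the trapezoidal rule (the supremum for \(p=\infty \) is replaced by a maximum over the grid).

```python
import numpy as np

def phi_p(h_fun, a, b, p=np.inf, n_grid=400):
    """L_p design criterion (4.13) for the variance function h(.; t_1,...,t_n)."""
    grid = np.linspace(a, b, n_grid)
    h = np.array([h_fun(t) for t in grid])
    if np.isinf(p):
        return float(h.max())                 # Phi_infinity: maximal deviation
    return float(np.trapz(h ** p, grid) ** (1.0 / p))
```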

Example 4.1

We conclude this section by considering the cases without and with common parameters, respectively.

(a)

    If we are in the situation of Example 2.1 (no common parameters), the regression function \(\mathbf {F}^\top (t)\) is of the form given in (3.17) and the variance of the prediction of the difference at a fixed point \(t\in [a,b]\) reduces to

    $$\begin{aligned} h(t; t_1, \ldots , t_n)&= (f^\top _1(t), - f^\top _2(t)) \mathbf {M}^{-1}\left\{ \mathbf {M}_0 \mathbf {B}^{-1}\mathbf {M}_0 \right. \nonumber \\&\quad \left. + \frac{1}{a} \mathbf {F}(a)\varvec{\Sigma }^{-1}\mathbf {F}^\top (a)\right\} \mathbf {M}^{-1} (f^\top _1(t), - f^\top _2(t))^\top . \end{aligned}$$

    The corresponding design criterion is given by

    $$\begin{aligned} \Phi _p \big (t_1, \ldots , t_n\big )&= \Vert (f^\top _1, - f^\top _2) \mathbf {M}^{-1}\left\{ \mathbf {M}_0 \mathbf {B}^{-1}\mathbf {M}_0 \right. \nonumber \\&\quad \left. + \frac{1}{a} \mathbf {F}(a)\varvec{\Sigma }^{-1}\mathbf {F}^\top (a)\right\} \mathbf {M}^{-1}(f^\top _1, - f^\top _2)^\top \Vert _p \, . \end{aligned}$$
    (4.14)
(b)

    If we are in the situation of Example 2.2 (common parameters), the regression function \(\mathbf {F}^\top (t)\) is given by (3.25) and the variance of the prediction of the difference at a fixed point \(t\in [a, b]\) reduces to

    $$\begin{aligned} h(t; t_1, \ldots , t_n)&= (0^\top _{p_0}, \tilde{f}^\top _1(t), - \tilde{f}^\top _2(t)) \mathbf {M}^{-1}\left\{ \mathbf {M}_0 \mathbf {B}^{-1}\mathbf {M}_0 \right. \nonumber \\&\quad \left. + \frac{1}{a} \mathbf {F}(a)\varvec{\Sigma }^{-1}\mathbf {F}^\top (a)\right\} \mathbf {M}^{-1} (0^\top _{p_0}, \tilde{f}^\top _1(t), - \tilde{f}^\top _2(t)) ^\top \, . \end{aligned}$$

    The corresponding design criterion is given by

    $$\begin{aligned} \Phi _p \big (t_1, \ldots , t_n\big )&= \Vert (0^\top _{p_0}, \tilde{f}^\top _1, - \tilde{f}^\top _2) \mathbf {M}^{-1}\left\{ \mathbf {M}_0 \mathbf {B}^{-1}\mathbf {M}_0 \right. \nonumber \\&\quad \left. + \frac{1}{a} \mathbf {F}(a)\varvec{\Sigma }^{-1}\mathbf {F}^\top (a)\right\} \mathbf {M}^{-1}(0^\top _{p_0}, \tilde{f}^\top _1, - \tilde{f}^\top _2)^\top \Vert _p \, . \end{aligned}$$

5 Numerical Examples

In this section the methodology is illustrated by means of a simulation study. To be precise, we consider regression model (2.6), where the matrix \(\mathbf {F} (t)\) is given by (3.17), corresponding to the case that the regression functions do not share common parameters; see Sect. 3.2 for more details. In this case the corresponding bounds for the confidence band are given by (4.12), where

$$\begin{aligned} u (t; t_1, \ldots , t_n)&= (\hat{\theta }^{*(1)}_n)^\top f^{(1)} (t) - (\hat{\theta }^{*(2)}_n)^\top f^{(2)} (t) + D \cdot { \{h(t; t_1, \ldots , t_n)\}^{1/2}} ,\\ l (t; t_1, \ldots , t_n)&= (\hat{\theta }^{*(1)}_n)^\top f^{(1)} (t) - (\hat{\theta }^{*(2)}_n)^\top f^{(2)} (t) - D \cdot { \{h(t; t_1, \ldots , t_n)\}^{1/2}}, \end{aligned}$$

and \(\hat{\theta }^*_n = ( (\hat{\theta }^{*(1)}_n)^\top , (\hat{\theta }^{*(2)}_n)^\top )^\top \) is estimator (4.1) with optimal weights defined in (4.8). The design space is given by the interval \([a, b] = [1, 10]\), and we consider three choices for the functions \(f_1\) and \(f_2\) in matrix (3.17), that is

$$\begin{aligned} f_A(t)&= (t, \sin (t), \cos (t))^\top , \nonumber \\ f_B(t)&= (t^2, \cos (t), \cos (2t))^\top , \\ f_C(t)&= \big (t, \log (t), \frac{1}{t}\big )^\top \, . \nonumber \end{aligned}$$
(5.1)

To model the dependence between the two groups we use the covariance matrix

$$\begin{aligned} \varvec{\Sigma } = \begin{pmatrix} 0.1 & 0.1\varrho \\ 0.1\varrho & 0.1 \end{pmatrix} , \end{aligned}$$

in (2.6), where the correlations are chosen as \(\varrho = 0.2, 0.5, 0.7\). Following the discussion in Sect. 4.1 we focus on the comparison of the regression curves for the two groups and derive optimal designs minimizing the criterion \( \Phi _\infty \) defined in (4.14). As a result, we obtain simultaneous confidence bands with a smaller maximal width for the difference of the curves describing the relation in the two groups. Similar results can be obtained for other values \(p \in [1,\infty ) \) in (4.14), but for the sake of brevity we concentrate on the criterion \(\Phi _{\infty }\), which is probably also the easiest to interpret for practitioners.

We denote by \(\hat{\theta }_{n}^*\) the linear unbiased estimator derived in Sect. 4. For each combination of two different regression functions from (5.1), the optimal weights are obtained from Theorem 4.2 and the optimal design points \(t_{i}^*\) are determined by minimizing the criterion \(\Phi _\infty \) defined in (4.14). For the numerical optimization the particle swarm optimization (PSO) algorithm is used (see, for example, [5]), assuming a sample size of four observations in each group, that is, \(n=4\). Furthermore, the uniform design used in the following calculations is the design with four equally spaced points in the interval [1, 10]. The \(\Phi _{\infty }\)-optimal design points minimizing the criterion in (4.14) are given in Table 1 for all combinations of models and correlations under consideration. Note that for each model the optimal design points change with the value of the correlation \(\varrho \).
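
To indicate how such optimal design points can be computed in practice, the following sketch treats the combination \(f_1=f_A\), \(f_2=f_C\) with \(\varrho =0.5\) and \(n=4\). It reuses the hypothetical helpers design_matrices, h_value and phi_p from the sketches in Sect. 4, assumes the block structure of \(\mathbf {F}\) implied by (3.17) (so that \((1,-1)\mathbf {F}^\top (t) = (f_1^\top (t), -f_2^\top (t))\)), takes \(\mathbf {M}_0 + \mathbf {F}(a)\varvec{\Sigma }^{-1}\mathbf {F}^\top (a)/a\) as a stand-in for the matrix \(\mathbf {M}\) defined in (3.2), and uses SciPy's differential evolution instead of the PSO algorithm; it therefore only illustrates the workflow and is not intended to reproduce the numbers in Table 1.

```python
import numpy as np
from scipy.optimize import differential_evolution

a, b, rho = 1.0, 10.0, 0.5
Sigma = np.array([[0.1, 0.1 * rho], [0.1 * rho, 0.1]])

f_A = lambda t: np.array([t, np.sin(t), np.cos(t)])
f_C = lambda t: np.array([t, np.log(t), 1.0 / t])

def F(t):
    """(6 x 2) block matrix with (1,-1) F^T(t) = (f_A^T(t), -f_C^T(t))."""
    return np.column_stack([np.concatenate([f_A(t), np.zeros(3)]),
                            np.concatenate([np.zeros(3), f_C(t)])])

def objective(interior):
    t = np.concatenate(([a], np.sort(interior), [b]))   # t_1 = a < t_2 < t_3 < t_4 = b
    if np.min(np.diff(t)) < 1e-4:                       # guard against coinciding points
        return 1e10
    M0, B, _ = design_matrices(F, Sigma, t)
    M = M0 + F(a) @ np.linalg.inv(Sigma) @ F(a).T / a   # assumed stand-in for (3.2)
    h_fun = lambda s: h_value(s, F, Sigma, M, M0, B, a)
    return phi_p(h_fun, a, b, p=np.inf, n_grid=200)     # Phi_infinity criterion

res = differential_evolution(objective, bounds=[(a, b), (a, b)], seed=1)
print("interior design points:", np.sort(res.x), "  Phi_inf:", res.fun)
```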

Table 1 Optimal design points on the interval [1, 10] for the estimator \(\hat{\theta }_{n}^*\) in (4.1) minimizing the criterion \(\Phi _{\infty }\) in (4.14)
Fig. 1 Confidence bands for the difference of the regression functions (solid gray line) on the basis of an optimal (solid lines) and a uniform design (dashed lines). Left panel: \(\varrho =0.2\). Middle panel: \(\varrho = 0.5\). Right panel: \(\varrho = 0.7\). First row: model with \(f_1=f_A\) and \(f_2=f_B\). Second row: model with \(f_1=f_A\) and \(f_2=f_C\). Third row: model with \(f_1=f_B\) and \(f_2=f_C\)

Table 2 Values of the criterion \(\Phi _\infty \) for the optimal and uniform design with four observations in each group in the interval [1, 10]

In order to investigate the impact of the optimal design on the structure of the confidence bands, we have performed a small simulation study generating confidence bands for the difference of the regression functions. The vector of parameter values used for each model is \(\theta = ({\theta ^{(1)}}^\top , {\theta ^{(2)}}^\top )^\top = (1, 1, 1, 1, 1, 1)^\top \). In Fig. 1 we display the averages, over 100 simulation runs, of the simultaneous confidence bands defined in (4.12) under the uniform and the optimal design.

The left, middle and right columns show the results for the correlations \(\varrho = 0.2\), \(\varrho =0.5\) and \(\varrho = 0.7\), respectively, while the rows correspond to different combinations of the functions \(f_1\) and \(f_2\) (first row: \(f_1=f_A\), \(f_2=f_B\); middle row: \(f_1 = f_A\), \(f_2= f_C\); last row: \(f_1= f_B\), \(f_2=f_C\)). In each graph, the confidence bands from the \(\Phi _{\infty }\)-optimal design and the uniform design are plotted as solid and dashed lines, respectively, along with the true difference \(f_1^\top (t)\theta ^{(1)} - f_2^\top (t)\theta ^{(2)}\) (solid gray lines).

We observe that in all cases under consideration the use of the \(\Phi _{\infty }\)-optimal designs yields a clearly visible improvement compared to the uniform design. The maximal width of the confidence band is reduced substantially. Moreover, the bands from the \(\Phi _{\infty }\)-optimal designs are nearly uniformly narrower than the bands based on the uniform design (except for the confidence bands displayed in the second row of the first panel of Fig. 1). Even more importantly, the confidence bands based on the \(\Phi _{\infty }\)-optimal design show a structure similar to that of the true differences, while the confidence bands from the uniform design oscillate.

A comparison of the left, middle and right columns in Fig. 1 shows that the maximum width of the confidence bands based on the optimal design decreases with increasing (absolute) correlation \(\varrho \). This effect is not visible for the confidence bands based on the uniform design. For example, in the middle row of Fig. 1, which corresponds to the case \(f_1 = f_A\) and \(f_2 = f_C\), the maximum width of the confidence bands based on the equally spaced design points even seems to increase.

Table 2 presents the values of the criterion \(\Phi _{\infty }\) in (4.14) for the different scenarios and confirms the conclusions drawn from the visual inspection of the confidence band plots. We observe that the use of the optimal design points reduces the maximum width of the confidence bands substantially. Moreover, for the optimal design the maximum width becomes smaller with increasing (absolute) correlation. On the other hand, this monotonicity cannot be observed in all cases for the uniform design.

In an additional numerical study we investigated the model robustness of the optimal designs reported in Table 1. For the sake of brevity, we only state the main results here: the optimal designs based on model combinations involving the strongly oscillating function \(f_B\) still produce narrow confidence bands for the other model combinations, and these bands remain narrower than the corresponding confidence bands based on the uniform design. On the other hand, if the optimal design based on the model combination \(f_A\) and \(f_C\) is used to construct confidence bands for the other model combinations, the resulting bands become wider, but still have shapes similar to the ones based on the uniform design. In general, using the optimal designs even under a misspecified model combination does not result in wider confidence bands for all models under consideration.

Summarizing, the use of the proposed \(\Phi _{\infty }\)-optimal design improves statistical inference substantially by reducing the maximum variance of the difference of the two estimated regression curves, even if the regression curves are misspecified. Moreover, simultaneous estimation in combination with a \(\Phi _\infty \)-optimal design yields a further reduction of the maximum width of the confidence bands, thus providing more precise inference for the difference of the curves describing the relation between \(t\) and the responses in the two groups.