1 Introduction

Unbalanced data have challenged statisticians for decades. In a simple two-way analysis of variance model with interactions and an unequal number of replicates per cell, often referred to as the unbalanced case, it is not at all obvious how to test various hypotheses and estimate effects. For this model, however, it is always possible to construct F-distributed test statistics under standard normality assumptions, although interpreting the results remains a serious issue. Approximate tests in non-normal unbalanced two-way random models have been studied by Güven (2012).

In the case of a two-way analysis of variance model with random effects, the situation becomes even more challenging, since it is not obvious how to set up a test statistic so that inference can rely on the F-distribution. In this context, the idea of sacrificing some power in order to create a balanced data set, proposed by Khuri (1986), Khuri and Littell (1987), Khuri (1990), Gallo and Khuri (1990), Öfversten (1993, 1995), and Christensen (1996), is very appealing.

The above-mentioned works were mainly concerned with constructing tests for variance components and estimators of fixed effects in the two-way model with interactions. Their key idea is that one can resample from the residuals to increase variation and thereby mimic a balanced model.

In this article we adopt these ideas and focus on the estimation of the fixed effects parameters in a mixed linear model with two variance components. The newly proposed estimator can be considered as an alternative to the classical moment estimator, e.g., see Henderson (1953) or Al Sarraj and von Rosen (2009); in the latter reference improved moment estimators were discussed, although the focus was on variance components estimation.

Throughout the article, vectors and matrices will be denoted by bold lower case and bold upper case letters, respectively. \({\mathcal C}(\cdot )\) denotes the column vector space, and for any matrix \(\varvec{A}\), \(P_{\varvec{A}}\) denotes the orthogonal projection onto \({\mathcal C}(\varvec{A})\). \({\sim }N_p(\varvec{\mu },\varvec{\Sigma })\) stands for distributed as a p-dimensional multivariate normal distribution with mean \(\varvec{\mu }\) and dispersion matrix \(\varvec{\Sigma }\). For a random vector (variable), \(E[\cdot ]\) stands for its expectation, \(D[\cdot ]\) for its dispersion matrix (variance), and \({\mathrm{cov}}[\cdot ,\cdot ]\) for the covariance between two random vectors (variables). Other notation will be defined in the subsequent text as needed.

2 Model

The main focus in this article is on the explicit estimation of \(\varvec{X}\varvec{\beta }\), the vector of fixed effects parameters, in the following mixed linear model,

$$\begin{aligned} \varvec{y}=\varvec{X}\varvec{\beta }+\varvec{Z}\varvec{\gamma }+\varvec{\epsilon }, \end{aligned}$$
(2.1)

where \(\varvec{y}\) is an \(n\times 1\) observable random vector, \(\varvec{X}\) is an \(n\times p\) known model matrix of rank \(\mathrm{rank}(\varvec{X})=k\le p < n\), and \(\varvec{Z}\) is an \(n\times m\), \(m<n\), known matrix such that \({\mathcal C}(\varvec{Z}) \not \subset {\mathcal C}(\varvec{X})\).

Here, \(\varvec{\beta }\in R^p\) is an unknown fixed parameter vector of the mean. Furthermore, the vector of random effects, \(\varvec{\gamma }\sim N_{m}(\varvec{0},\sigma ^2_\gamma \varvec{I}_m)\), is assumed to be distributed independently of the random errors \(\varvec{\epsilon }\sim N_n(\varvec{0},\sigma ^2\varvec{I}_n)\); the variances \(\sigma ^2_\gamma \ge 0\) and \(\sigma ^2>0\) are unknown scalar parameters. Note that the model comprises the important case when \(E[\varvec{\gamma }]\ne \varvec{0}\), because the mean of \(\varvec{Z}\varvec{\gamma }\) can then be "moved into" \(\varvec{X}\varvec{\beta }\).

Let \(\rho =\sigma ^2_\gamma /\sigma ^2\). The covariance matrix of \(\varvec{y}\) is then \(D[\varvec{y}]=\sigma ^2(\rho \varvec{Z}\varvec{Z}'+\varvec{I}_n)\). The ordinary least squares (OLS) estimator of \(\varvec{X}\varvec{\beta }\) is always available, i.e., \(\widetilde{\varvec{X}\varvec{\beta }}=P_{\varvec{X}}\varvec{y}\), but it is known from maximum likelihood theory that, in general, the weighted least squares estimator performs better asymptotically. In model (2.1), explicit closed-form maximum likelihood estimators are available only in special cases. Therefore, the aim of this article is to find an explicit estimator of \(\varvec{X}\varvec{\beta }\) which also exploits the variation due to the random effects \(\varvec{\gamma }\).

3 Main result

Let \(l=\mathrm{rank}(\varvec{X}:\varvec{Z})\); then the dimension of the column space \({\mathcal C}(\varvec{X})^\perp \cap {\mathcal C}(\varvec{X}:\varvec{Z})\) is \(l-k\). Let \(\varvec{A}\) be an \(n\times (l-k)\) matrix such that \({\mathcal C}(\varvec{A})={\mathcal C}(\varvec{X})^\perp \cap {\mathcal C}(\varvec{X}:\varvec{Z}) = {\mathcal C}\left( P_{(\varvec{X}:\varvec{Z})} - P_{\varvec{X}}\right) \) and \(\varvec{A}'\varvec{A}=\varvec{I}_{l-k}\). The matrix \(\varvec{A}\) can be obtained through the Gram-Schmidt orthogonalization algorithm.

Let \(\varvec{B}_1\) be an \(n\times k\) matrix such that \(\varvec{B}_1\varvec{B}_1'=P_{\varvec{X}}\) and \(\varvec{B}_1'\varvec{B}_1=\varvec{I}_{k}\). Let \(\varvec{B}_2=\varvec{A}(\varvec{A}'\varvec{Z}\varvec{Z}'\varvec{A})^{-1/2}\), and let \(\varvec{B}_3\) be any \(n\times (n-l)\) matrix such that \({\mathcal C}(\varvec{B}_3)={\mathcal C}(\varvec{X}:\varvec{Z})^\perp \) and \(\varvec{B}_3'\varvec{B}_3=\varvec{I}_{n-l}\). From the assumptions it follows that \(\varvec{A}'\varvec{Z}\varvec{Z}'\varvec{A}\) is positive definite, so its inverse exists: indeed, \(\mathrm{rank}(\varvec{A}'\varvec{Z}\varvec{Z}'\varvec{A})=\mathrm{rank}(\varvec{A}'\varvec{Z})=\mathrm{rank}(\varvec{A}:\varvec{Z}^o)-\mathrm{rank}(\varvec{Z}^o)\), where \(\varvec{Z}^o\) is any matrix spanning \({\mathcal C}(\varvec{Z})^\perp \), and since \(\varvec{A}\) is of full column rank, the statement follows because \({\mathcal C}(\varvec{A})\cap {\mathcal C}(\varvec{Z})^\perp ={\mathcal C}(\varvec{X}:\varvec{Z})\cap {\mathcal C}(\varvec{X})^\perp \cap {\mathcal C}(\varvec{Z})^\perp =\{\varvec{0}\}\).
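To make the construction concrete, a minimal numerical sketch in Python/NumPy follows (our own illustration, not part of the original derivation); the bases are obtained via SVD rather than Gram-Schmidt, which is a valid alternative since any orthonormal bases of the respective spaces will do, and the helper names are ours:

```python
import numpy as np

def orth(M, tol=1e-10):
    """Orthonormal basis of the column space of M, via SVD."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > tol]

def construct_bases(X, Z, tol=1e-10):
    """Return B1, B2, B3 realizing the decomposition of R^n below."""
    # B1: B1 B1' = P_X and B1'B1 = I_k
    B1 = orth(X, tol)
    # Q spans C(X:Z); l = rank(X:Z)
    XZ = np.hstack([X, Z])
    Q = orth(XZ, tol)
    l = Q.shape[1]
    # A: orthonormal basis of C(X)^perp intersected with C(X:Z),
    # i.e. of C(P_(X:Z) - P_X)
    A = orth(Q @ Q.T - B1 @ B1.T, tol)
    # B2 = A (A'ZZ'A)^{-1/2}, using the symmetric inverse square root
    w, V = np.linalg.eigh(A.T @ Z @ Z.T @ A)   # positive definite by assumption
    B2 = A @ (V @ np.diag(w ** -0.5) @ V.T)
    # B3: orthonormal basis of C(X:Z)^perp, from the full SVD of (X:Z)
    U_full = np.linalg.svd(XZ, full_matrices=True)[0]
    B3 = U_full[:, l:]
    return B1, B2, B3
```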

The matrices \(\varvec{B}_i\), \(i=1,2,3\), satisfy

$$\begin{aligned} R^n={\mathcal C}(\varvec{B}_1)\boxplus {\mathcal C}(\varvec{B}_2)\boxplus {\mathcal C}(\varvec{B}_3). \end{aligned}$$

Thus, a one-to-one transformation \((\varvec{B}_1:\varvec{B}_2:\varvec{B}_3)'\) of the model (2.1) yields the equivalent representation through the following three models:

$$\begin{aligned} \varvec{B}_1'\varvec{y} &= \varvec{B}_1'\varvec{X}\varvec{\beta }+\varvec{B}_1'\varvec{Z}\varvec{\gamma }+\varvec{B}_1'\varvec{\epsilon },\end{aligned}$$
(3.1)
$$\begin{aligned} \varvec{B}_2'\varvec{y} &= \varvec{B}_2'\varvec{Z}\varvec{\gamma }+\varvec{B}_2'\varvec{\epsilon },\end{aligned}$$
(3.2)
$$\begin{aligned} \varvec{B}_3'\varvec{y} &= \varvec{B}_3'\varvec{\epsilon }. \end{aligned}$$
(3.3)

Since \(\varvec{y}\) is normally distributed, \(\varvec{B}_3'\varvec{Z}=\varvec{0}\), and \(\varvec{B}_3\) is orthogonal to \((\varvec{B}_1:\varvec{B}_2)\), the vector \(\varvec{B}_3'\varvec{y}\) is independent of both \(\varvec{B}_1'\varvec{y}\) and \(\varvec{B}_2'\varvec{y}\).

In what follows, models (3.1), (3.2), and (3.3) constitute the basis for the derivation of the estimator of \(\varvec{X}\varvec{\beta }\).

By the construction of \(\varvec{B}_1\) we have \(\varvec{B}_1\varvec{B}_1'\varvec{y}=P_{\varvec{X}}\varvec{y}= \widetilde{\varvec{X}\varvec{\beta }}\), the OLS estimator of \(\varvec{X}\varvec{\beta }\), which is functionally independent of the variance components. However, the random effects vector \(\varvec{\gamma }\) appears in both (3.1) and (3.2), and thus, for an alternative to the OLS estimator, i.e., a weighted estimator, (3.2) contains essential information about \(\varvec{X}\varvec{\beta }\).

Notice that \(E[\varvec{B}_2'\varvec{y}]=\varvec{0}\), \(\varvec{B}_2'\varvec{B}_2= (\varvec{A}'\varvec{Z}\varvec{Z}'\varvec{A})^{-1}\), and \(\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_2=\varvec{I}_{l-k}\). Hence \(D[\varvec{B}_2'\varvec{y}]=\sigma ^2(\rho \varvec{I}_{l-k}+\varvec{B}_2'\varvec{B}_2)\), which implies that the components of \(\varvec{B}_2'\varvec{y}\) are correlated. This can cause technical inference problems: if, for example, \(\varvec{B}_1'\varvec{y}\) is conditioned on \(\varvec{B}_2'\varvec{y}\), the inverse of \(D[\varvec{B}_2'\varvec{y}]\) is needed, and it may be difficult to handle.

Now, one can add an extra random term to \(\varvec{B}_2'\varvec{y}\), i.e., add extra variation, in such a way that the dispersion matrix of the resulting random vector becomes diagonal; its inverse is then easy to utilize. This is a key idea of this article. The main remaining issue is how to add the proper portion of variation, since \(\sigma ^2_\gamma \) and \(\sigma ^2\) are both unknown.

The problem can be resolved by adopting the ideas of Gallo, Khuri, Öfversten (see the Introduction for references), and others. These authors performed some kind of abstract bootstrapping. Implementing their ideas means taking the important step of adding observable random variables from (3.3) to (3.2) so that eventually a diagonal dispersion matrix is obtained. Let the new random vector \(\varvec{u}_2\) be defined by

$$\begin{aligned} \varvec{u}_2=\varvec{B}_2'\varvec{y}+(\lambda \varvec{I}_{l-k}-\varvec{B}_2'\varvec{B}_2)^{1/2}\mathrm{sel}(\varvec{B}_3'\varvec{y},(l-k)). \end{aligned}$$
(3.4)

Here we have introduced a selection operator, \(\mathrm{sel}(\varvec{B}_3'\varvec{y},(l-k))\), which selects \(l-k\) independent observations from \(\varvec{B}_3'\varvec{y}\sim N_{n-l}(\varvec{0},\sigma ^2\varvec{I})\). This selection may be represented by, e.g., an \((l-k) \times (n-l)\) matrix \(\varvec{R}\) whose rows are \(l-k\) arbitrarily chosen rows of \(\varvec{I}_{n-l}\). Obviously, \(\varvec{R}\varvec{R}'= \varvec{I}_{l-k}\). One possible version of (3.4) is then

$$\begin{aligned} \varvec{u}_2=\varvec{B}_2'\varvec{y}+(\lambda \varvec{I}_{l-k}-\varvec{B}_2'\varvec{B}_2)^{1/2}\varvec{R}\varvec{B}_3'\varvec{y}. \end{aligned}$$
(3.5)

The scalar \(\lambda \) is chosen large enough so that the matrix \(\lambda \varvec{I}_{l-k}-\varvec{B}_2'\varvec{B}_2\) is nonnegative definite and hence the square root in (3.4) exists. There exist several matrix square roots, but the different choices do not affect the results in any way.
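A minimal numerical sketch of (3.5) follows, assuming \(\varvec{B}_2\) and \(\varvec{B}_3\) from the earlier sketch; the symmetric square root is used, \(\varvec{R}\) simply selects the first \(l-k\) rows of \(\varvec{I}_{n-l}\), and the helper name make_u2 is ours:

```python
import numpy as np

def make_u2(y, B2, B3, lam):
    """Sketch of (3.5): u2 = B2'y + (lam*I - B2'B2)^{1/2} R B3'y.

    lam must make lam*I - B2'B2 nonnegative definite; a natural
    choice (see below) is the largest eigenvalue of B2'B2.
    """
    q = B2.shape[1]                          # q = l - k
    w, V = np.linalg.eigh(lam * np.eye(q) - B2.T @ B2)
    # symmetric square root; clip guards against tiny negative round-off
    S_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    sel = (B3.T @ y)[:q]                     # sel(B3'y, l-k): first l-k components
    return B2.T @ y + S_half @ sel
```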

Theorem 3.1

Let \(\varvec{u}_2\) be given by (3.5). Then,

$$\begin{aligned} \varvec{u}_2\sim N_{(l-k)}(\varvec{0}, \sigma ^2(\rho +\lambda )\varvec{I}_{l-k}). \end{aligned}$$

Proof

Since \(\varvec{u}_2\) is a linear combination of the two independent normally distributed random vectors \(\varvec{B}_2'\varvec{y}\) and \(\varvec{B}_3'\varvec{y}\), it is itself normally distributed. Thus, only the first two moments of \(\varvec{u}_2\) have to be determined. For the expectation, \(E[\varvec{u}_2]=\varvec{0}\) follows immediately because \(E[\varvec{B}_2'\varvec{y}]=\varvec{0}\) and \(E[\varvec{B}_3'\varvec{y}]=\varvec{0}\). Further, since \(D[\varvec{B}_2'\varvec{y}]=\sigma ^2(\rho \varvec{I}_{l-k}+\varvec{B}_2'\varvec{B}_2)\) and \(D[\varvec{R}\varvec{B}_3'\varvec{y}]=\sigma ^2\varvec{I}_{l-k}\), a direct calculation gives \(D[\varvec{u}_2]=\sigma ^2(\rho \varvec{I}_{l-k}+\varvec{B}_2'\varvec{B}_2)+\sigma ^2(\lambda \varvec{I}_{l-k}-\varvec{B}_2'\varvec{B}_2)=\sigma ^2(\rho +\lambda )\varvec{I}_{l-k}\), which completes the proof.\(\square \)

It is clear that if \(\varvec{u}_2\) is to be used, the parameter \(\lambda \) should be chosen as small as possible (subject to the existence of the square root), because the dispersion of \(\varvec{u}_2\) is increasing in \(\lambda \). Thus, a natural choice for \(\lambda \) is the largest eigenvalue of \(\varvec{B}'_2\varvec{B}_2=(\varvec{A}'\varvec{Z}\varvec{Z}'\varvec{A})^{-1}\).

Note that all components of \(\varvec{u}_2\) are mutually independent, which indeed is a very nice property. Moreover, let \(\varvec{u}_1=\varvec{B}_1'\varvec{y}\). Then, because of the independence between \(\varvec{u}_1\) and \(\varvec{B}_3'\varvec{y}\),

$$\begin{aligned} {\mathrm{cov}}[\varvec{u}_1,\varvec{u}_2]={\mathrm{cov}}[\varvec{B}_1'\varvec{Z}\varvec{\gamma },\varvec{B}_2'\varvec{Z}\varvec{\gamma }]= \sigma ^2\rho \,\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2. \end{aligned}$$

Conditioning \(\varvec{u}_1\) on \(\varvec{u}_2\) we obtain

$$\begin{aligned} E[\varvec{u}_1|\varvec{u}_2] &= E[\varvec{u}_1]+{\mathrm{cov}}[\varvec{u}_1,\varvec{u}_2]D[\varvec{u}_2]^{-1}\varvec{u}_2\\ &= \varvec{B}_1'\varvec{X}\varvec{\beta }+ \frac{\rho }{\rho +\lambda }\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2. \end{aligned}$$

Thus, if the ratio of the variances \(\rho =\sigma ^2_\gamma /\sigma ^2\) is known, then, because the distribution of \(\varvec{u}_2\) does not depend on \(\varvec{\beta }\), a "natural" estimator of \(\varvec{X}\varvec{\beta }\) is given by \(\varvec{B}_1\widehat{\widehat{\varvec{B}_1'\varvec{X}\varvec{\beta }}}\), where

$$\begin{aligned} \widehat{\widehat{\varvec{B}_1'\varvec{X}\varvec{\beta }}}=\varvec{u}_1- \frac{\rho }{\rho +\lambda }\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2. \end{aligned}$$
(3.6)

In general, the variances or their ratio are not known, hence the ratio

$$\begin{aligned} \frac{\rho }{\rho +\lambda }=1-\frac{\lambda }{\rho +\lambda } \end{aligned}$$
(3.7)

has to be estimated.

After the selection, denote the remaining \(n-2l+k\) variables in (3.3) by

$$\begin{aligned} \varvec{u}_3\sim N_{n-2l+k}(\varvec{0},\sigma ^2\varvec{I}_{n-2l+k}). \end{aligned}$$
(3.8)

Moreover,

$$\begin{aligned}&E[(\varvec{u}_2'\varvec{u}_2)^{-1}]=\frac{1}{(l-k-2)\sigma ^2}(\rho +\lambda )^{-1},\quad l-k>2, \end{aligned}$$
(3.9)

since

$$\begin{aligned} \frac{\varvec{u}_2'\varvec{u}_2}{\sigma ^2(\rho +\lambda )}\sim \chi ^2(l-k), \end{aligned}$$

and \(E[\varvec{u}_3'\varvec{u}_3]=(n-2l+k)\sigma ^2\).

Because \(\varvec{u}_2\) and \(\varvec{u}_3\) are independently distributed, it immediately follows that \((\rho +\lambda )^{-1}\) is unbiasedly estimable, i.e.,

$$\begin{aligned} \frac{l-k-2}{n-2l+k}E[\varvec{u}_3'\varvec{u}_3(\varvec{u}_2'\varvec{u}_2)^{-1}]=\frac{1}{\rho +\lambda }, \end{aligned}$$
(3.10)

and thus also \(\varvec{X}\varvec{\beta }\) can be estimated.
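Explicitly, multiplying (3.10) by \(\lambda \) and inserting the result into (3.7) gives the plug-in weight that appears in the proposition below:

$$\begin{aligned} \widehat{\frac{\rho }{\rho +\lambda }}=1-\lambda \,\frac{l-k-2}{n-2l+k}\,\varvec{u}_3'\varvec{u}_3(\varvec{u}_2'\varvec{u}_2)^{-1}. \end{aligned}$$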

Proposition 3.2

For the model in (2.1), let \(\varvec{u}_i\), \(i=1,2,3\), and \(\varvec{B}_j\), \(j=1,2\), be defined as in the text preceding this proposition. Then, if \(n-2l+k>0\) and \(l-k-2>0\),

$$\begin{aligned} \widehat{\varvec{X}\varvec{\beta }}=\varvec{B}_1(\varvec{u}_1- \varvec{f}), \end{aligned}$$
(3.11)

where \(\varvec{f}=(1-c \varvec{u}_3'\varvec{u}_3(\varvec{u}_2'\varvec{u}_2)^{-1})\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2\), \(c=\lambda (l-k-2)/(n-2l+k)\), and \(\lambda \) equals the largest eigenvalue of \(\varvec{B}_2'\varvec{B}_2\).

The proposed estimator of \(\varvec{X}\varvec{\beta }\) is a nonlinear estimator, since it is nonlinear in \(\varvec{u}_2\) and \(\varvec{u}_3\). The main objection against this estimator might be that the choice of the \(l-k\) components from \(\varvec{B}_3'\varvec{y}\) is arbitrary. Notice, however, that the distributions of \(\varvec{u}_2\) and \(\varvec{u}_3\) do not depend on the choice of \(\varvec{R}\) in (3.5). Some kind of U-statistics approach might circumvent this type of arbitrariness, but this will not be explored in this article.
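A numerical sketch of Proposition 3.2, reusing construct_bases and make_u2 from the earlier sketches (the helper names are ours), assuming \(n-2l+k>0\) and \(l-k>2\):

```python
import numpy as np

def xb_hat(y, X, Z, B1, B2, B3):
    """Sketch of estimator (3.11): XB_hat = B1 (u1 - f)."""
    n = X.shape[0]
    k, q = B1.shape[1], B2.shape[1]            # q = l - k
    l = k + q
    lam = np.linalg.eigvalsh(B2.T @ B2).max()  # largest eigenvalue of B2'B2
    u1 = B1.T @ y
    u2 = make_u2(y, B2, B3, lam)               # uses the first l-k components of B3'y
    u3 = (B3.T @ y)[q:]                        # the remaining n-2l+k components
    c = lam * (q - 2) / (n - 2 * l + k)
    weight = 1.0 - c * (u3 @ u3) / (u2 @ u2)   # plug-in estimate of rho/(rho+lam)
    f = weight * (B1.T @ Z @ Z.T @ B2 @ u2)
    return B1 @ (u1 - f)
```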

4 \(E[\widehat{\varvec{X}\varvec{\beta }}]\) and \(D[\widehat{\varvec{X}\varvec{\beta }}]\)

In this section, \(E[\widehat{\varvec{X}\varvec{\beta }}]\) and \(D[\widehat{\varvec{X}\varvec{\beta }}]\) will be studied, where \(\widehat{\varvec{X}\varvec{\beta }}\) was given in Proposition 3.2. The calculations are somewhat lengthy but fairly straightforward. The following lemma is needed.

Lemma 4.1

Let \(\varvec{u}_2\) be given by (3.5). Then,

  (i) \(E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']=(l-k)^{-1}\varvec{I}_{l-k}\);

  (ii) \(E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']= (l-k)^{-1}(l-k-2)^{-1}(\rho +\lambda )^{-1}\displaystyle \frac{1}{\sigma ^2}\varvec{I}_{l-k}\).

Proof

Let \(\varvec{\Gamma }\) be an arbitrary orthogonal matrix. Since \(\varvec{\Gamma }'\varvec{u}_2\) has the same distribution as \(\varvec{u}_2\), we have \(\varvec{\Gamma }'E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']\varvec{\Gamma }= E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']\) and \(\varvec{\Gamma }'E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']\varvec{\Gamma } = E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']\). Therefore,

$$\begin{aligned}&E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']=c_1\varvec{I}_{l-k},\\&E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']=c_2\varvec{I}_{l-k}, \end{aligned}$$

for some constants \(c_1\) and \(c_2\). Applying the trace function to these relations yields \(c_1=(l-k)^{-1}\), since \(\mathrm{tr}\,E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']=E[\mathrm{tr}\{(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2'\varvec{u}_2\}]=1\), and

$$\begin{aligned} c_2=(l-k)^{-1}E[(\varvec{u}_2'\varvec{u}_2)^{-1}]=(l-k)^{-1}(l-k-2)^{-1}(\rho +\lambda )^{-1}\frac{1}{\sigma ^2}, \end{aligned}$$

where (3.9) was utilized (see, e.g., Kollo and von Rosen 2005, Lemma 2.4.1, for a discussion). \(\square \)

The following moment relations will be used in the subsequent calculations:

$$\begin{aligned}&E[\varvec{u}_1]=\varvec{B}_1'\varvec{X}\varvec{\beta },\, E[\varvec{u}_i]=\varvec{0},\,\,i=2,3,\quad E[\varvec{f}]=\varvec{0},\nonumber \\&E[\varvec{u}_1|\varvec{u}_2]=\varvec{B}_1'\varvec{X}\varvec{\beta }+ \rho (\rho +\lambda )^{-1}\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2,\end{aligned}$$
(4.1)
$$\begin{aligned}&D[\varvec{u}_1]=\sigma ^2(\rho \varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_1+\varvec{I}_{k}),\end{aligned}$$
(4.2)
$$\begin{aligned}&D[\varvec{u}_2]=\sigma ^2(\rho +\lambda )\varvec{I}_{l-k},\end{aligned}$$
(4.3)
$$\begin{aligned}&E[\varvec{u}_3'\varvec{u}_3]=(n-2l+k)\sigma ^2,\end{aligned}$$
(4.4)
$$\begin{aligned}&E[(\varvec{u}_3'\varvec{u}_3)^2]=(n-2l+k)(n-2(l-1)+k)(\sigma ^2)^2. \end{aligned}$$
(4.5)

Here we have used \(\varvec{u}_3'\varvec{u}_3/\sigma ^2\sim \chi ^2(n-2l+k)\).

Since \(E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}]=\varvec{0}\), it follows from Proposition 3.2 that for all \(\varvec{\beta }\in R^p\),

$$\begin{aligned} E[\widehat{\varvec{X}\varvec{\beta }}]=E[\varvec{B}_1\varvec{u}_1]=\varvec{X}\varvec{\beta }. \end{aligned}$$
(4.6)

Furthermore, from Proposition 3.2 it follows that

$$\begin{aligned} D[\widehat{\varvec{X}\varvec{\beta }}]=\varvec{B}_1 \left( D[\varvec{u}_1]+D[\varvec{f}]- 2{\mathrm{cov}}[\varvec{u}_1,\varvec{f}] \right) \varvec{B}_1', \end{aligned}$$
(4.7)

since it can be shown that \({\mathrm{cov}}[\varvec{u}_1,\varvec{f}]={\mathrm{cov}}[\varvec{f},\varvec{u}_1]\). The dispersion \(D[\varvec{u}_1]\) was presented in (4.2), and it remains to derive \(D[\varvec{f}]\) and \(2{\mathrm{cov}}[\varvec{u}_1,\varvec{f}]\). We start by calculating

$$\begin{aligned} D[\varvec{f}]=E[(1-c\varvec{u}_3'\varvec{u}_3(\varvec{u}_2'\varvec{u}_2)^{-1})^2\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2\varvec{u}_2'\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1], \end{aligned}$$

where as in Proposition 3.2,

$$\begin{aligned} c=\lambda \frac{l-k-2}{n-2l+k}. \end{aligned}$$

Hence, we calculate

$$\begin{aligned}&E[\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2\varvec{u}_2'\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1]=\sigma ^2(\rho +\lambda )\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1, \end{aligned}$$

where (4.3) was used;

$$\begin{aligned}&-2E[c\varvec{u}_3'\varvec{u}_3(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2\varvec{u}_2'\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1]\\&\quad =-2c(n-2l+k)\sigma ^2E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1\\&\quad =-\sigma ^2 2c(n-2l+k)(l-k)^{-1} \varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1, \end{aligned}$$

where (4.4) and Lemma 4.1 (i) were used;

$$\begin{aligned}&E[c^2(\varvec{u}_3'\varvec{u}_3)^2(\varvec{u}_2'\varvec{u}_2)^{-2}\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2\varvec{u}_2'\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1]\\&\quad =c^2(n-2l+k)(n-2(l-1)+k)(\sigma ^2)^2E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']\\&\qquad \times \varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1\\&\quad =\sigma ^2c^2\frac{(n-2l+k)(n-2(l-1)+k)}{(l-k)(l-k-2)(\rho +\lambda )} \varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1, \end{aligned}$$

where (4.5) and Lemma 4.1 (ii) were used, which together yield

$$\begin{aligned} D[\varvec{f}] &= \sigma ^2\left( \rho +\lambda -2c\frac{n-2l+k}{l-k} + c^2\frac{(n-2l+k)(n-2(l-1)+k)}{(l-k)(l-k-2)(\rho +\lambda )}\right) \varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1. \end{aligned}$$
(4.8)

Next, \({\mathrm{cov}}[\varvec{f},\varvec{u}_1]\) will be calculated by conditioning on \(\varvec{u}_2\). Here \(E_{\varvec{u}}[\cdot ]\) indicates that the expectation is taken with respect to the distribution of \(\varvec{u}\); for example, \(E_{\varvec{u}_1|\varvec{u}_2}\) denotes expectation with respect to the conditional distribution of \(\varvec{u}_1\) given \(\varvec{u}_2\).

$$\begin{aligned} {\mathrm{cov}}[\varvec{f},\varvec{u}_1] &= E_{\varvec{u}_2}E_{\varvec{u}_1|\varvec{u}_2}E_{\varvec{u}_3}[(1-c\varvec{u}_3'\varvec{u}_3(\varvec{u}_2'\varvec{u}_2)^{-1}) \varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2\varvec{u}_1']\nonumber \\ &= \rho (\rho +\lambda )^{-1}E_{\varvec{u}_2}[(1-c(n-2l+k)\sigma ^2(\varvec{u}_2'\varvec{u}_2)^{-1})\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2\varvec{u}_2'\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1]\nonumber \\ &= \sigma ^2\rho \left( 1-c\frac{n-2l+k}{(l-k)(\rho +\lambda )} \right) \varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1, \end{aligned}$$
(4.9)

where (4.4) and Lemma 4.1 (i) have been applied. Thus, \(D[\widehat{\varvec{X}\varvec{\beta }}]\) is obtained.

Collecting the constants from \(D[\varvec{f}]\) and \(-2\,{\mathrm{cov}}[\varvec{f},\varvec{u}_1]\), with \(c\) inserted, denote

$$\begin{aligned} \rho _1 &= \rho +\lambda -2\frac{l-k-2}{l-k}\, \lambda + \frac{l-k-2}{l-k}\, \frac{n-2(l-1)+k}{n-2l+k}\,\frac{\lambda ^{2}}{\rho +\lambda }\nonumber \\ &\quad -2\rho \left( 1-\frac{(l-k-2)\lambda }{(l-k)(\rho +\lambda )}\right) . \end{aligned}$$
(4.10)

Substituting (4.8) and (4.9) into (4.7) and using the notation (4.10), we get

$$\begin{aligned} D[\widehat{\varvec{X}\varvec{\beta }}]=\sigma ^2P_{\varvec{X}} + \sigma ^2 \rho P_{\varvec{X}}\varvec{Z}\left[ I + \frac{\rho _1}{\rho } \varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\right] \varvec{Z}'P_{\varvec{X}}. \end{aligned}$$
(4.11)

The results obtained above are summarized in the following theorem.

Theorem 4.2

Let \(\widehat{\varvec{X}\varvec{\beta }}\) be given by (3.11) and let \(\widetilde{\varvec{X}\varvec{\beta }}\) be the OLS estimator of \(\varvec{X}\varvec{\beta }\). Then:

  (i) for all \(\varvec{\beta }\in R^p\), \(\sigma ^2>0\), \(\rho \ge 0\),

    $$\begin{aligned} E[\widehat{\varvec{X}\varvec{\beta }}]=\varvec{X}\varvec{\beta }; \end{aligned}$$

  (ii)

    $$\begin{aligned} D[\widehat{\varvec{X}\varvec{\beta }}] = D[\widetilde{\varvec{X}\varvec{\beta }}] + \sigma ^2\rho _1P_{\varvec{X}}\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'P_{\varvec{X}}, \end{aligned}$$

    where \(\rho _1\) is given by (4.10).
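For numerical work, the constant \(\rho _1\) of (4.10) can be evaluated directly; by Theorem 4.2 (ii), a negative value means that the new estimator has the smaller dispersion matrix. A sketch, assuming \(\rho \) is known:

```python
def rho1(rho, lam, n, l, k):
    """Evaluate the constant rho_1 of (4.10); rho1 < 0 favours the new estimator."""
    d = (l - k - 2) / (l - k)
    return (rho + lam - 2 * d * lam
            + d * (n - 2 * (l - 1) + k) / (n - 2 * l + k) * lam ** 2 / (rho + lam)
            - 2 * rho * (1 - d * lam / (rho + lam)))
```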

5 Comparison of \(D[\widehat{\varvec{g}'\varvec{\beta }}]\) with the dispersion matrix of the OLS estimator

Let \(\varvec{g}'\varvec{\beta }\) be an arbitrary estimable function of \(\varvec{\beta }\), i.e., \(\varvec{g}\in {\mathcal C}(\varvec{X}')\) or, equivalently, \(\varvec{g}=\varvec{X}'\varvec{h}\) for some \(\varvec{h}\). From (4.11) and Theorem 4.2 we immediately get

$$\begin{aligned} D[\widehat{\varvec{g}'\varvec{\beta }}] = D[\widehat{\varvec{h}'\varvec{X}\varvec{\beta }}]= D[\widetilde{\varvec{g}'\varvec{\beta }}] + \sigma ^2\rho _1\varvec{h}'P_{\varvec{X}}\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'P_{\varvec{X}}\varvec{h}. \end{aligned}$$

As presented in (4.2), the dispersion matrix of \(\widetilde{\varvec{X}\varvec{\beta }}=\varvec{B}_1\varvec{u}_1=P_{\varvec{X}}\varvec{y}\) equals

$$\begin{aligned} \sigma ^2 \varvec{B}_1(\rho \varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_1 +\varvec{I}_k)\varvec{B}_1' = \sigma ^2 P_{\varvec{X}}(\rho \varvec{Z}\varvec{Z}' +\varvec{I}_n)P_{\varvec{X}}, \end{aligned}$$

from which it immediately follows that

$$\begin{aligned} D[\widetilde{\varvec{g}'\varvec{\beta }}] = \sigma ^2 \varvec{h}'P_{\varvec{X}}(\rho \varvec{Z}\varvec{Z}' +\varvec{I}_n)P_{\varvec{X}}\varvec{h}. \end{aligned}$$

Let \(d=\tfrac{l-k-2}{l-k}\). In order to see whether the proposed estimator \(\widehat{\varvec{X}\varvec{\beta }}\) improves on the ordinary least squares estimator, according to Theorem 4.2 (ii) the condition under which \(\rho _1<0\), i.e.,

$$\begin{aligned}&\rho +\lambda -2d\lambda +d\tfrac{n-2(l-1)+k}{n-2l+k}\tfrac{(\lambda )^{2}}{\rho +\lambda } -2\left( \rho -d\tfrac{\rho \lambda }{\rho +\lambda }\right) <0 \end{aligned}$$

has to be studied. This inequality is equivalent to

$$\begin{aligned} \rho >\lambda +d\left( \tfrac{n-2(l-1)+k}{n-2l+k}-2\right) \tfrac{\lambda ^2}{\rho +\lambda }, \end{aligned}$$
(5.1)

where it has been used that

$$\begin{aligned} \tfrac{\rho }{\rho +\lambda }=1-\tfrac{\lambda }{\rho +\lambda }. \end{aligned}$$

Theorem 5.1

Let \(\widehat{\varvec{X}\varvec{\beta }}\) be given by (3.11). Then for every estimable function \(\varvec{g}'\varvec{\beta }\), \(D[\widetilde{\varvec{g}'\varvec{\beta }}] - D[\widehat{\varvec{g}'\varvec{\beta }}] >0\) if and only if

$$\begin{aligned} \rho >\lambda (1-(1-\tfrac{2}{n-2l+k})(1-\tfrac{2}{l-k}))^{\tfrac{1}{2}}. \end{aligned}$$

Proof

Manipulating (5.1) shows that \({\widehat{\varvec{X}\varvec{\beta }}}\) has a smaller dispersion matrix than the OLS estimator if and only if

$$\begin{aligned} \rho -\lambda +\tfrac{\lambda ^2}{\rho +\lambda }\tfrac{n-2l+k-2}{n-2l+k}\tfrac{l-k-2}{l-k}> 0. \end{aligned}$$

The statement of the theorem is the solution to this inequality. \(\square \)

From Theorem 5.1 it follows that the inequality \(\rho >\lambda \) is a simple sufficient criterion for deciding whether \({\widehat{\varvec{X}\varvec{\beta }}}\) should be used instead of the OLS estimator, although \(\rho \) has to be estimated. Here we can use the variance estimator \({\widehat{\sigma }}^2=(n-2l+k)^{-1}\varvec{u}_3'\varvec{u}_3\). However, if we use the estimator \({\widehat{\rho }}=\tfrac{n-2(l+1)+k}{l-k} \varvec{u}_2'\varvec{u}_2(\varvec{u}_3'\varvec{u}_3)^{-1}-\lambda \), which is motivated by Theorem 3.1 and (3.8), we observe that if \(\rho \ll \lambda \), \({\widehat{\rho }}\) can take negative values. Similarly, if we use the estimator \({\widehat{\sigma }}_\gamma ^2=(l-k)^{-1} \varvec{u}_2'\varvec{u}_2-\lambda {\widehat{\sigma }}^2\), also motivated by Theorem 3.1 and (3.8), we observe that if \(\sigma _\gamma ^2\ll \sigma ^2\), \({\widehat{\sigma }}_\gamma ^2\) can take negative values, indicating that the OLS estimator is preferable.
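The moment estimators and the decision rule just described can be sketched as follows, using the exact threshold from Theorem 5.1 (the helper name decide is ours):

```python
import numpy as np

def decide(u2, u3, lam, n, l, k):
    """Estimate sigma^2, sigma_gamma^2, rho; check the Theorem 5.1 criterion."""
    sigma2_hat = (u3 @ u3) / (n - 2 * l + k)
    rho_hat = (n - 2 * (l + 1) + k) / (l - k) * (u2 @ u2) / (u3 @ u3) - lam
    sigma2_gamma_hat = (u2 @ u2) / (l - k) - lam * sigma2_hat   # may be negative
    threshold = lam * np.sqrt(1 - (1 - 2 / (n - 2 * l + k)) * (1 - 2 / (l - k)))
    use_new = rho_hat > threshold            # otherwise prefer the OLS estimator
    return sigma2_hat, sigma2_gamma_hat, rho_hat, use_new
```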

Table 1 Configuration settings and calculated \(\lambda \) for the simulation study
Table 2 Empirical estimates, and the theoretical variances \(D[{\tilde{\beta }}]\) and \(D[{\widehat{\beta }}]\) for Configuration 1
Table 3 Empirical estimates, and the theoretical variances \(D[{\tilde{\beta }}]\) and \(D[{\widehat{\beta }}]\) for Configuration 2

6 Example: unbalanced one-way random model

A special case of model (2.1) is the one-way random effects model

$$\begin{aligned} y_{ij} = \beta + \gamma _i + \epsilon _{ij}, \end{aligned}$$
(6.1)

where \(i=1, \dots , l\), \(j=1,\dots ,n_i\), and \(n=\sum _{i=1}^ln_i\). The expectation \(\beta \in R\) is our parameter of interest. All distributional assumptions of model (2.1) are assumed to hold. Here \(k=1\), the matrix \(\varvec{X}\) is an n-dimensional column vector of ones, \(\varvec{X}={\varvec{1}} _n\), and the matrix \(\varvec{Z}\) is block diagonal with column vectors of ones of length \(n_i\) as blocks, \(\varvec{Z}={\mathrm{Diag}}\{{\varvec{1}} _{n_i}\}\). The OLS estimator of \({\beta }\) is \(\bar{y}\), the average of all n observations (with \(\varvec{B}_1=n^{-1/2}{\varvec{1}}_n\), one has \(\varvec{u}_1=\sqrt{n}\,\bar{y}\)). It is well known that the maximum likelihood estimator (MLE) of \(\beta \), in the case of an unbalanced model, has to be obtained iteratively. Note that \({\mathcal C}(\varvec{X})\subseteq {\mathcal C}(\varvec{Z})\) and thus \(l=\mathrm{rank}(\varvec{Z})\).
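For this example the design matrices are easy to build; a small sketch:

```python
import numpy as np

def one_way_design(n_i):
    """X = 1_n and Z = Diag{1_{n_i}} for the one-way random model (6.1)."""
    n = sum(n_i)
    X = np.ones((n, 1))
    Z = np.zeros((n, len(n_i)))
    row = 0
    for i, ni in enumerate(n_i):
        Z[row:row + ni, i] = 1.0    # block of ones for level i
        row += ni
    return X, Z
```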

We illustrate our procedure by a small simulation study for model (6.1). A broad range of values of the variance ratio \(\rho \) is considered. The number of levels \(l\) of \(\varvec{\gamma }\) is chosen either as small as \(l=3\), with a relatively large number of observations \(n_i\) per level, or relatively large, \(l=10\), with smaller numbers \(n_i\). Because of proportionality, only \(\sigma ^2=1\) was considered.

All configurations are presented in Table 1. For each configuration, 10,000 simulations were carried out. The OLS estimates as well as the \({\widehat{\beta }}\) estimates are reported as averages over the 10,000 observed estimates. In addition, the estimates of \(\sigma ^2\) and of \(\sigma ^2_\gamma \) are presented, together with the observed MSEs of the OLS estimator and of \({\widehat{\beta }}\). The results are presented in Tables 2 and 3.
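A condensed version of one such simulation run, combining the sketches above; the group sizes and parameter values here are hypothetical placeholders, not the Table 1 configurations:

```python
import numpy as np

rng = np.random.default_rng(1)
n_i = [5, 8, 10, 12, 6, 9, 7, 11, 10, 8]   # hypothetical group sizes (l = 10)
rho, sigma2, beta = 10.0, 1.0, 2.0          # hypothetical parameter values
X, Z = one_way_design(n_i)
B1, B2, B3 = construct_bases(X, Z)
ols, new = [], []
for _ in range(10_000):
    gamma = rng.normal(0.0, np.sqrt(rho * sigma2), Z.shape[1])
    eps = rng.normal(0.0, np.sqrt(sigma2), X.shape[0])
    y = X @ np.array([beta]) + Z @ gamma + eps
    ols.append(y.mean())                    # OLS estimate of beta
    new.append(xb_hat(y, X, Z, B1, B2, B3)[0])  # B1(u1 - f) = beta_hat * 1_n
print(np.mean(ols), np.var(ols))            # empirical mean and variance, OLS
print(np.mean(new), np.var(new))            # empirical mean and variance, new estimator
```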

Fig. 1 Histogram from 10,000 estimates of \({\widehat{\sigma }}_\gamma ^2\) for Configuration 2, for \(\rho =5\), illustrating the positive probability of negative estimates of \(\sigma _\gamma ^2\)

Fig. 2 Histogram from 10,000 estimates of \({\widehat{\beta }}\) for Configuration 2, for \(\rho =5\), illustrating the symmetry of the distribution of \({\widehat{\beta }}\)

Fig. 3 Histogram from 10,000 estimates of the OLS estimator for Configuration 2, for \(\rho =5\)

In Sect. 5, it was shown that the newly proposed estimator has a smaller dispersion if \(\rho >\lambda =7.175\), which is in complete agreement with Table 2: for \(\rho =10\) and \(\rho =20\) the simulations indicate that the new estimator has a smaller dispersion than the OLS estimator, but not for \(\rho =5\). The results for Configuration 2 are presented in Table 3. Using Table 3 and Theorem 5.1, it follows that \(\rho \) should be larger than 1.97 if \({\widehat{\varvec{\beta }}}\) is to be applied instead of the OLS estimator, a strategy supported by the simulation results in Table 3. The tables indicate that even a smaller \(\rho \) could be used, but one has to remember that estimated variances and MSEs are applied and, in particular, that \({\widehat{\sigma }}_{\gamma }^2\) can become negative (see Figure 1). However, we can state with confidence that the new estimator is better than the least squares estimator in certain regions of the parameter space (described through \(\rho \)), as shown in Theorem 5.1 and in the simulations. In addition, we would like to point out that although \({\widehat{\varvec{\beta }}}\) is a nonlinear estimator, it is unbiased, as shown in (4.6), and, as observed in our simulations, its distribution seems to be symmetric around its expectation. For an illustration, see the histograms of \({\widehat{\beta }}\) and of the OLS estimator in Figures 2 and 3, respectively.