1 Introduction

Unbalanced data have challenged statisticians for decades. In a simple two-way analysis of variance model with interactions and an unequal number of replicates per cell, often referred to as the unbalanced case, it is not at all obvious how to test various hypotheses and estimate effects. For this model, however, it is always possible to construct F-distributed test statistics under standard normality assumptions, although interpreting the results remains a serious issue. Approximate tests in non-normal unbalanced two-way random models have been studied by Güven (2012).

In the case of a two-way analysis of variance model with random effects, the situation becomes even more challenging, since it is not obvious how to set up a test statistic so that inference can rely on the F-distribution. In this context, the idea of sacrificing some power in order to create a balanced data set, proposed by Khuri (1986), Khuri and Littell (1987), Khuri (1990), Gallo and Khuri (1990), Öfversten (1993, 1995), and Christensen (1996), is very appealing.

The above-mentioned works were mainly concerned with constructing tests for variance components and estimators of fixed effects in the two-way model with interactions. Their key idea is that one can resample from the residuals to increase variation and thereby mimic a balanced model.

In this article we adopt these ideas and focus on the estimation of the fixed effects parameters in a mixed linear model with two variance components. The newly proposed estimator can be considered as an alternative to the classical moment estimator, e.g., see Henderson (1953) or Al Sarraj and von Rosen (2009); in the latter reference improved moment estimators were discussed, although the focus was on variance components estimation.

Throughout the article, vectors and matrices will be denoted by bold lower case and bold upper case letters, respectively. \({\mathcal C}(\cdot )\) denotes the column vector space, and for any matrix \(\varvec{A}\), \(P_{\varvec{A}}\) denotes the orthogonal projection onto \({\mathcal C}(\varvec{A})\). \({\sim }N_p(\varvec{\mu },\varvec{\Sigma })\) stands for distributed as a p-dimensional multivariate normal distribution with mean \(\varvec{\mu }\) and dispersion matrix \(\varvec{\Sigma }\). For a random vector (variable), \(E[\cdot ]\) stands for its expectation, \(D[\cdot ]\) for its dispersion matrix (variance), and \({\mathrm{cov}}[\cdot ,\cdot ]\) for the covariance between two random vectors (variables). Other notation will be defined in the subsequent text as needed.

2 Model

The main focus in this article is on the explicit estimation of \(\varvec{X}\varvec{\beta }\), the vector of fixed effects parameters, in the following mixed linear model,

$$\begin{aligned} \varvec{y}=\varvec{X}\varvec{\beta }+\varvec{Z}\varvec{\gamma }+\varvec{\epsilon }, \end{aligned}$$
(2.1)

where \(\varvec{y}\) is an \(n\times 1\) observable random vector, \(\varvec{X}\) is an \(n\times p\) known model matrix of rank \(\mathrm{rank}(\varvec{X})=k\le p < n\), and \(\varvec{Z}\) is an \(n\times m\), \(m<n\), known matrix such that \({\mathcal C}(\varvec{Z}) \not \subset {\mathcal C}(\varvec{X})\).

Here, \(\varvec{\beta }\in R^p\) is an unknown fixed parameter vector of the mean. Furthermore, the vector of random effects, \(\varvec{\gamma }\sim N_{m}(\varvec{0},\sigma ^2_\gamma \varvec{I}_m)\), is assumed to be distributed independently of the random errors \(\varvec{\epsilon }\sim N_n(\varvec{0},\sigma ^2\varvec{I}_n)\); the variances \(\sigma ^2_\gamma \ge 0\) and \(\sigma ^2>0\) are unknown scalar parameters. Note that the model comprises the important case when \(E[\varvec{\gamma }]\ne \varvec{0}\), because the mean of \(\varvec{Z}\varvec{\gamma }\) can then be "moved into" \(\varvec{X}\varvec{\beta }\).

Let \(\rho =\sigma ^2_\gamma /\sigma ^2\). The covariance matrix of \(\varvec{y}\) is then \(D[\varvec{y}]=\sigma ^2(\rho \varvec{Z}\varvec{Z}'+\varvec{I}_n)\). The ordinary least squares (OLS) estimator of \(\varvec{X}\varvec{\beta }\) is always available, i.e., \(\widetilde{\varvec{X}\varvec{\beta }}=P_{\varvec{X}}\varvec{y}\), but it is known from maximum likelihood theory that, in general, the weighted least squares estimator performs better asymptotically. In model (2.1), explicit closed-form maximum likelihood estimators are available only in special cases. Therefore, the aim of this article is to find an explicit estimator of \(\varvec{X}\varvec{\beta }\) which also exploits the variation due to the random effects \(\varvec{\gamma }\).

3 Main result

Let \(l=\mathrm{rank}(\varvec{X}:\varvec{Z})\); then the dimension of the column space \({\mathcal C}(\varvec{X})^\perp \cap {\mathcal C}(\varvec{X}:\varvec{Z})\) is \(l-k\). Let \(\varvec{A}\) be an \(n\times (l-k)\) matrix such that \({\mathcal C}(\varvec{A})={\mathcal C}(\varvec{X})^\perp \cap {\mathcal C}(\varvec{X}:\varvec{Z}) = {\mathcal C}\left( P_{(\varvec{X}:\varvec{Z})} - P_{\varvec{X}}\right) \) and \(\varvec{A}'\varvec{A}=\varvec{I}_{l-k}\). The matrix \(\varvec{A}\) can be obtained through the Gram-Schmidt orthogonalization algorithm.

Let \(\varvec{B}_1\) be an \(n\times k\) matrix such that \(\varvec{B}_1\varvec{B}_1'=P_{\varvec{X}}\) and \(\varvec{B}_1'\varvec{B}_1=\varvec{I}_{k}\). Let \(\varvec{B}_2=\varvec{A}(\varvec{A}'\varvec{Z}\varvec{Z}'\varvec{A})^{-1/2}\), and let \(\varvec{B}_3\) be any \(n\times (n-l)\) matrix such that \({\mathcal C}(\varvec{B}_3)={\mathcal C}(\varvec{X}:\varvec{Z})^\perp \) and \(\varvec{B}_3'\varvec{B}_3=\varvec{I}_{n-l}\). From the assumptions it follows that \(\varvec{A}'\varvec{Z}\varvec{Z}'\varvec{A}\) is positive definite, so its inverse exists: indeed, \(\mathrm{rank}(\varvec{A}'\varvec{Z}\varvec{Z}'\varvec{A})=\mathrm{rank}(\varvec{A}'\varvec{Z})=\mathrm{rank}(\varvec{A}:\varvec{Z}^o)-\mathrm{rank}(\varvec{Z}^o)\), where \(\varvec{Z}^o\) is any matrix spanning \({\mathcal C}(\varvec{Z})^\perp \), and since \(\varvec{A}\) is of full column rank, the statement follows because \({\mathcal C}(\varvec{A})\cap {\mathcal C}(\varvec{Z})^\perp ={\mathcal C}(\varvec{X}:\varvec{Z})\cap {\mathcal C}(\varvec{X})^\perp \cap {\mathcal C}(\varvec{Z})^\perp =\{\varvec{0}\}\).
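To make the construction concrete, a minimal numerical sketch in Python/NumPy follows (our own illustration, not part of the original derivation); the bases are obtained via SVD rather than Gram-Schmidt, which is a valid alternative since any orthonormal bases of the respective spaces will do, and the helper names are ours:

```python
import numpy as np

def orth(M, tol=1e-10):
    """Orthonormal basis of the column space of M, via SVD."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > tol]

def construct_bases(X, Z, tol=1e-10):
    """Return B1, B2, B3 realizing the decomposition of R^n below."""
    # B1: B1 B1' = P_X and B1'B1 = I_k
    B1 = orth(X, tol)
    # Q spans C(X:Z); l = rank(X:Z)
    XZ = np.hstack([X, Z])
    Q = orth(XZ, tol)
    l = Q.shape[1]
    # A: orthonormal basis of C(X)^perp intersected with C(X:Z),
    # i.e. of C(P_(X:Z) - P_X)
    A = orth(Q @ Q.T - B1 @ B1.T, tol)
    # B2 = A (A'ZZ'A)^{-1/2}, using the symmetric inverse square root
    w, V = np.linalg.eigh(A.T @ Z @ Z.T @ A)   # positive definite by assumption
    B2 = A @ (V @ np.diag(w ** -0.5) @ V.T)
    # B3: orthonormal basis of C(X:Z)^perp, from the full SVD of (X:Z)
    U_full = np.linalg.svd(XZ, full_matrices=True)[0]
    B3 = U_full[:, l:]
    return B1, B2, B3
```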

The matrices \(\varvec{B}_i\), \(i=1,2,3\), satisfy

$$\begin{aligned} R^n={\mathcal C}(\varvec{B}_1)\boxplus {\mathcal C}(\varvec{B}_2)\boxplus {\mathcal C}(\varvec{B}_3). \end{aligned}$$

Thus, a one-to-one transformation \((\varvec{B}_1:\varvec{B}_2:\varvec{B}_3)'\) of the model (2.1) yields the equivalent representation through the following three models:

$$\begin{aligned} \varvec{B}_1'\varvec{y} &= \varvec{B}_1'\varvec{X}\varvec{\beta }+\varvec{B}_1'\varvec{Z}\varvec{\gamma }+\varvec{B}_1'\varvec{\epsilon },\end{aligned}$$
(3.1)
$$\begin{aligned} \varvec{B}_2'\varvec{y} &= \varvec{B}_2'\varvec{Z}\varvec{\gamma }+\varvec{B}_2'\varvec{\epsilon },\end{aligned}$$
(3.2)
$$\begin{aligned} \varvec{B}_3'\varvec{y} &= \varvec{B}_3'\varvec{\epsilon }. \end{aligned}$$
(3.3)

Since \(\varvec{y}\) is normally distributed, \(\varvec{B}_3'\varvec{Z}=\varvec{0}\), and \(\varvec{B}_3\) is orthogonal to \((\varvec{B}_1:\varvec{B}_2)\), the vector \(\varvec{B}_3'\varvec{y}\) is independent of both \(\varvec{B}_1'\varvec{y}\) and \(\varvec{B}_2'\varvec{y}\).

In what follows, models (3.1), (3.2), and (3.3) constitute the basis for the derivation of the estimator of \(\varvec{X}\varvec{\beta }\).

By the construction of \(\varvec{B}_1\) we have \(\varvec{B}_1\varvec{B}_1'\varvec{y}=P_{\varvec{X}}\varvec{y}= \widetilde{\varvec{X}\varvec{\beta }}\), the OLS estimator of \(\varvec{X}\varvec{\beta }\), which is functionally independent of the variance components. However, the random effects vector \(\varvec{\gamma }\) appears in both (3.1) and (3.2), and thus, for an alternative to the OLS estimator, i.e., a weighted estimator, (3.2) contains essential information about \(\varvec{X}\varvec{\beta }\).

Notice that \(E[\varvec{B}_2'\varvec{y}]=\varvec{0}\), \(\varvec{B}_2'\varvec{B}_2= (\varvec{A}'\varvec{Z}\varvec{Z}'\varvec{A})^{-1}\), and \(\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_2=\varvec{I}_{l-k}\). Hence \(D[\varvec{B}_2'\varvec{y}]=\sigma ^2(\rho \varvec{I}_{l-k}+\varvec{B}_2'\varvec{B}_2)\), which implies that the components of \(\varvec{B}_2'\varvec{y}\) are correlated. This can cause technical inference problems: if, for example, \(\varvec{B}_1'\varvec{y}\) is conditioned on \(\varvec{B}_2'\varvec{y}\), the inverse of \(D[\varvec{B}_2'\varvec{y}]\) is needed, and it may be difficult to handle.

Now, one can add an extra random term to \(\varvec{B}_2'\varvec{y}\), i.e., add extra variation, in such a way that the dispersion matrix of the resulting random vector becomes diagonal; its inverse is then easy to utilize. This is a key idea of this article. The main remaining issue is how to add the proper portion of variation, since \(\sigma ^2_\gamma \) and \(\sigma ^2\) are both unknown.

The problem can be resolved by adopting the ideas of Gallo, Khuri, Öfversten (see the Introduction for references), and others. These authors performed some kind of abstract bootstrapping. Implementing their ideas means taking the important step of adding observable random variables from (3.3) to (3.2) so that eventually a diagonal dispersion matrix is obtained. Let the new random vector \(\varvec{u}_2\) be defined by

$$\begin{aligned} \varvec{u}_2=\varvec{B}_2'\varvec{y}+(\lambda \varvec{I}_{l-k}-\varvec{B}_2'\varvec{B}_2)^{1/2}\mathrm{sel}(\varvec{B}_3'\varvec{y},(l-k)). \end{aligned}$$
(3.4)

Here we have introduced a selection operator, \(\mathrm{sel}(\varvec{B}_3'\varvec{y},(l-k))\), which selects \(l-k\) independent observations from \(\varvec{B}_3'\varvec{y}\sim N_{n-l}(\varvec{0},\sigma ^2\varvec{I})\). This selection may be represented by, e.g., an \((l-k) \times (n-l)\) matrix \(\varvec{R}\) whose rows are \(l-k\) arbitrarily chosen rows of \(\varvec{I}_{n-l}\). Obviously, \(\varvec{R}\varvec{R}'= \varvec{I}_{l-k}\). One possible version of (3.4) is then

$$\begin{aligned} \varvec{u}_2=\varvec{B}_2'\varvec{y}+(\lambda \varvec{I}_{l-k}-\varvec{B}_2'\varvec{B}_2)^{1/2}\varvec{R}\varvec{B}_3'\varvec{y}. \end{aligned}$$
(3.5)

The scalar \(\lambda \) is chosen large enough so that the matrix \(\lambda \varvec{I}_{l-k}-\varvec{B}_2'\varvec{B}_2\) is nonnegative definite and hence the square root in (3.4) exists. There exist several matrix square roots, but the different choices do not affect the results in any way.
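A minimal numerical sketch of (3.5) follows, assuming \(\varvec{B}_2\) and \(\varvec{B}_3\) from the earlier sketch; the symmetric square root is used, \(\varvec{R}\) simply selects the first \(l-k\) rows of \(\varvec{I}_{n-l}\), and the helper name make_u2 is ours:

```python
import numpy as np

def make_u2(y, B2, B3, lam):
    """Sketch of (3.5): u2 = B2'y + (lam*I - B2'B2)^{1/2} R B3'y.

    lam must make lam*I - B2'B2 nonnegative definite; a natural
    choice (see below) is the largest eigenvalue of B2'B2.
    """
    q = B2.shape[1]                          # q = l - k
    w, V = np.linalg.eigh(lam * np.eye(q) - B2.T @ B2)
    # symmetric square root; clip guards against tiny negative round-off
    S_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    sel = (B3.T @ y)[:q]                     # sel(B3'y, l-k): first l-k components
    return B2.T @ y + S_half @ sel
```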

Theorem 3.1

Let \(\varvec{u}_2\) be given by (3.5). Then,

$$\begin{aligned} \varvec{u}_2\sim N_{(l-k)}(\varvec{0}, \sigma ^2(\rho +\lambda )\varvec{I}_{l-k}). \end{aligned}$$

Proof

Since \(\varvec{u}_2\) is a linear combination of the two independent normally distributed random vectors \(\varvec{B}_2'\varvec{y}\) and \(\varvec{B}_3'\varvec{y}\), it is itself normally distributed. Thus, only the first two moments of \(\varvec{u}_2\) have to be determined. For the expectation, \(E[\varvec{u}_2]=\varvec{0}\) follows immediately because \(E[\varvec{B}_2'\varvec{y}]=\varvec{0}\) and \(E[\varvec{B}_3'\varvec{y}]=\varvec{0}\). Further, since \(D[\varvec{B}_2'\varvec{y}]=\sigma ^2(\rho \varvec{I}_{l-k}+\varvec{B}_2'\varvec{B}_2)\) and \(D[\varvec{R}\varvec{B}_3'\varvec{y}]=\sigma ^2\varvec{I}_{l-k}\), a direct calculation gives \(D[\varvec{u}_2]=\sigma ^2(\rho \varvec{I}_{l-k}+\varvec{B}_2'\varvec{B}_2)+\sigma ^2(\lambda \varvec{I}_{l-k}-\varvec{B}_2'\varvec{B}_2)=\sigma ^2(\rho +\lambda )\varvec{I}_{l-k}\), which completes the proof.\(\square \)

It is clear that if \(\varvec{u}_2\) is to be used, the parameter \(\lambda \) should be chosen as small as possible (subject to the existence of the square root), because the dispersion of \(\varvec{u}_2\) is increasing in \(\lambda \). Thus, a natural choice for \(\lambda \) is the largest eigenvalue of \(\varvec{B}'_2\varvec{B}_2=(\varvec{A}'\varvec{Z}\varvec{Z}'\varvec{A})^{-1}\).

Note that all components of \(\varvec{u}_2\) are mutually independent, which indeed is a very nice property. Moreover, let \(\varvec{u}_1=\varvec{B}_1'\varvec{y}\). Then, because of the independence between \(\varvec{u}_1\) and \(\varvec{B}_3'\varvec{y}\),

$$\begin{aligned} {\mathrm{cov}}[\varvec{u}_1,\varvec{u}_2]={\mathrm{cov}}[\varvec{B}_1'\varvec{Z}\varvec{\gamma },\varvec{B}_2'\varvec{Z}\varvec{\gamma }]= \sigma ^2\rho \,\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2. \end{aligned}$$

Conditioning \(\varvec{u}_1\) on \(\varvec{u}_2\) we obtain

$$\begin{aligned} E[\varvec{u}_1|\varvec{u}_2] &= E[\varvec{u}_1]+{\mathrm{cov}}[\varvec{u}_1,\varvec{u}_2]D[\varvec{u}_2]^{-1}\varvec{u}_2\\ &= \varvec{B}_1'\varvec{X}\varvec{\beta }+ \frac{\rho }{\rho +\lambda }\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2. \end{aligned}$$

Thus, if the ratio of the variances \(\rho =\sigma ^2_\gamma /\sigma ^2\) is known, then, because the distribution of \(\varvec{u}_2\) does not depend on \(\varvec{\beta }\), a "natural" estimator of \(\varvec{X}\varvec{\beta }\) is given by \(\varvec{B}_1\widehat{\widehat{\varvec{B}_1'\varvec{X}\varvec{\beta }}}\), where

$$\begin{aligned} \widehat{\widehat{\varvec{B}_1'\varvec{X}\varvec{\beta }}}=\varvec{u}_1- \frac{\rho }{\rho +\lambda }\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2. \end{aligned}$$
(3.6)

In general, the variances or their ratio are not known, hence the ratio

$$\begin{aligned} \frac{\rho }{\rho +\lambda }=1-\frac{\lambda }{\rho +\lambda } \end{aligned}$$
(3.7)

has to be estimated.

After the selection, denote the remaining \(n-2l+k\) variables in (3.3) by

$$\begin{aligned} \varvec{u}_3\sim N_{n-2l+k}(\varvec{0},\sigma ^2\varvec{I}_{n-2l+k}). \end{aligned}$$
(3.8)

Moreover,

$$\begin{aligned}&E[(\varvec{u}_2'\varvec{u}_2)^{-1}]=\frac{1}{(l-k-2)\sigma ^2}(\rho +\lambda )^{-1},\quad l-k>2, \end{aligned}$$
(3.9)

since

$$\begin{aligned} \frac{\varvec{u}_2'\varvec{u}_2}{\sigma ^2(\rho +\lambda )}\sim \chi ^2(l-k), \end{aligned}$$

and \(E[\varvec{u}_3'\varvec{u}_3]=(n-2l+k)\sigma ^2\).

Because \(\varvec{u}_2\) and \(\varvec{u}_3\) are independently distributed, it immediately follows that \((\rho +\lambda )^{-1}\) is unbiasedly estimable, i.e.,

$$\begin{aligned} \frac{l-k-2}{n-2l+k}E[\varvec{u}_3'\varvec{u}_3(\varvec{u}_2'\varvec{u}_2)^{-1}]=\frac{1}{\rho +\lambda }, \end{aligned}$$
(3.10)

and thus also \(\varvec{X}\varvec{\beta }\) can be estimated.
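Explicitly, multiplying (3.10) by \(\lambda \) and inserting the result into (3.7) gives the plug-in weight that appears in the proposition below:

$$\begin{aligned} \widehat{\frac{\rho }{\rho +\lambda }}=1-\lambda \,\frac{l-k-2}{n-2l+k}\,\varvec{u}_3'\varvec{u}_3(\varvec{u}_2'\varvec{u}_2)^{-1}. \end{aligned}$$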

Proposition 3.2

For the model in (2.1), let \(\varvec{u}_i\), \(i=1,2,3\), and \(\varvec{B}_j\), \(j=1,2\), be defined as in the text preceding this proposition. Then, if \(n-2l+k>0\) and \(l-k-2>0\),

$$\begin{aligned} \widehat{\varvec{X}\varvec{\beta }}=\varvec{B}_1(\varvec{u}_1- \varvec{f}), \end{aligned}$$
(3.11)

where \(\varvec{f}=(1-c \varvec{u}_3'\varvec{u}_3(\varvec{u}_2'\varvec{u}_2)^{-1})\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2\), \(c=\lambda (l-k-2)/(n-2l+k)\), and \(\lambda \) equals the largest eigenvalue of \(\varvec{B}_2'\varvec{B}_2\).

The proposed estimator of \(\varvec{X}\varvec{\beta }\) is a nonlinear estimator, since it is nonlinear in \(\varvec{u}_2\) and \(\varvec{u}_3\). The main objection against this estimator might be that the choice of the \(l-k\) components from \(\varvec{B}_3'\varvec{y}\) is arbitrary. Notice, however, that the distributions of \(\varvec{u}_2\) and \(\varvec{u}_3\) do not depend on the choice of \(\varvec{R}\) in (3.5). Some kind of U-statistics approach might circumvent this type of arbitrariness, but this will not be explored in this article.
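A numerical sketch of Proposition 3.2, reusing construct_bases and make_u2 from the earlier sketches (the helper names are ours), assuming \(n-2l+k>0\) and \(l-k>2\):

```python
import numpy as np

def xb_hat(y, X, Z, B1, B2, B3):
    """Sketch of estimator (3.11): XB_hat = B1 (u1 - f)."""
    n = X.shape[0]
    k, q = B1.shape[1], B2.shape[1]            # q = l - k
    l = k + q
    lam = np.linalg.eigvalsh(B2.T @ B2).max()  # largest eigenvalue of B2'B2
    u1 = B1.T @ y
    u2 = make_u2(y, B2, B3, lam)               # uses the first l-k components of B3'y
    u3 = (B3.T @ y)[q:]                        # the remaining n-2l+k components
    c = lam * (q - 2) / (n - 2 * l + k)
    weight = 1.0 - c * (u3 @ u3) / (u2 @ u2)   # plug-in estimate of rho/(rho+lam)
    f = weight * (B1.T @ Z @ Z.T @ B2 @ u2)
    return B1 @ (u1 - f)
```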

4 \(E[\widehat{\varvec{X}\varvec{\beta }}]\) and \(D[\widehat{\varvec{X}\varvec{\beta }}]\)

In this section, \(E[\widehat{\varvec{X}\varvec{\beta }}]\) and \(D[\widehat{\varvec{X}\varvec{\beta }}]\) will be studied, where \(\widehat{\varvec{X}\varvec{\beta }}\) was given in Proposition 3.2. The calculations are somewhat lengthy but fairly straightforward. The following lemma is needed.

Lemma 4.1

Let \(\varvec{u}_2\) be given by (3.5). Then,

  (i) \(E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']=(l-k)^{-1}\varvec{I}_{l-k}\);

  (ii) \(E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']= (l-k)^{-1}(l-k-2)^{-1}(\rho +\lambda )^{-1}\displaystyle \frac{1}{\sigma ^2}\varvec{I}_{l-k}\).

Proof

Let \(\varvec{\Gamma }\) be an arbitrary orthogonal matrix. Since \(\varvec{\Gamma }'\varvec{u}_2\) has the same distribution as \(\varvec{u}_2\), we have \(\varvec{\Gamma }'E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']\varvec{\Gamma }= E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']\) and \(\varvec{\Gamma }'E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']\varvec{\Gamma } = E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']\). Therefore,

$$\begin{aligned}&E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']=c_1\varvec{I}_{l-k},\\&E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']=c_2\varvec{I}_{l-k}, \end{aligned}$$

for some constants \(c_1\) and \(c_2\). Applying the trace function to these relations yields \(c_1=(l-k)^{-1}\), since \(\mathrm{tr}\,E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']=E[\mathrm{tr}\{(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2'\varvec{u}_2\}]=1\), and

$$\begin{aligned} c_2=(l-k)^{-1}E[(\varvec{u}_2'\varvec{u}_2)^{-1}]=(l-k)^{-1}(l-k-2)^{-1}(\rho +\lambda )^{-1}\frac{1}{\sigma ^2}, \end{aligned}$$

where (3.9) was utilized (see, e.g., Kollo and von Rosen 2005, Lemma 2.4.1, for a discussion). \(\square \)

The following moment relations will be used in the subsequent calculations:

$$\begin{aligned}&E[\varvec{u}_1]=\varvec{B}_1'\varvec{X}\varvec{\beta },\, E[\varvec{u}_i]=\varvec{0},\,\,i=2,3,\quad E[\varvec{f}]=\varvec{0},\nonumber \\&E[\varvec{u}_1|\varvec{u}_2]=\varvec{B}_1'\varvec{X}\varvec{\beta }+ \rho (\rho +\lambda )^{-1}\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2,\end{aligned}$$
(4.1)
$$\begin{aligned}&D[\varvec{u}_1]=\sigma ^2(\rho \varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_1+\varvec{I}_{k}),\end{aligned}$$
(4.2)
$$\begin{aligned}&D[\varvec{u}_2]=\sigma ^2(\rho +\lambda )\varvec{I}_{l-k},\end{aligned}$$
(4.3)
$$\begin{aligned}&E[\varvec{u}_3'\varvec{u}_3]=(n-2l+k)\sigma ^2,\end{aligned}$$
(4.4)
$$\begin{aligned}&E[(\varvec{u}_3'\varvec{u}_3)^2]=(n-2l+k)(n-2(l-1)+k)(\sigma ^2)^2. \end{aligned}$$
(4.5)

Here we have used \(\varvec{u}_3'\varvec{u}_3/\sigma ^2\sim \chi ^2(n-2l+k)\).

Since \(E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}]=\varvec{0}\), it follows from Proposition 3.2 that for all \(\varvec{\beta }\in R^p\),

$$\begin{aligned} E[\widehat{\varvec{X}\varvec{\beta }}]=E[\varvec{B}_1\varvec{u}_1]=\varvec{X}\varvec{\beta }. \end{aligned}$$
(4.6)

Furthermore, from Proposition 3.2 it follows that

$$\begin{aligned} D[\widehat{\varvec{X}\varvec{\beta }}]=\varvec{B}_1 \left( D[\varvec{u}_1]+D[\varvec{f}]- 2{\mathrm{cov}}[\varvec{u}_1,\varvec{f}] \right) \varvec{B}_1', \end{aligned}$$
(4.7)

since it can be shown that \({\mathrm{cov}}[\varvec{u}_1,\varvec{f}]={\mathrm{cov}}[\varvec{f},\varvec{u}_1]\). The dispersion \(D[\varvec{u}_1]\) was presented in (4.2), and it remains to derive \(D[\varvec{f}]\) and \(2{\mathrm{cov}}[\varvec{u}_1,\varvec{f}]\). We start by calculating

$$\begin{aligned} D[\varvec{f}]=E[(1-c\varvec{u}_3'\varvec{u}_3(\varvec{u}_2'\varvec{u}_2)^{-1})^2\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2\varvec{u}_2'\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1], \end{aligned}$$

where as in Proposition 3.2,

$$\begin{aligned} c=\lambda \frac{l-k-2}{n-2l+k}. \end{aligned}$$

Hence, we calculate

$$\begin{aligned}&E[\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2\varvec{u}_2'\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1]=\sigma ^2(\rho +\lambda )\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1, \end{aligned}$$

where (4.3) was used;

$$\begin{aligned}&-2E[c\varvec{u}_3'\varvec{u}_3(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2\varvec{u}_2'\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1]\\&\quad =-2c(n-2l+k)\sigma ^2E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1\\&\quad =-\sigma ^2 2c(n-2l+k)(l-k)^{-1} \varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1, \end{aligned}$$

where (4.4) and Lemma 4.1 (i) were used;

$$\begin{aligned}&E[c^2(\varvec{u}_3'\varvec{u}_3)^2(\varvec{u}_2'\varvec{u}_2)^{-2}\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2\varvec{u}_2'\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1]\\&\quad =c^2(n-2l+k)(n-2(l-1)+k)(\sigma ^2)^2E[\varvec{u}_2(\varvec{u}_2'\varvec{u}_2)^{-1}(\varvec{u}_2'\varvec{u}_2)^{-1}\varvec{u}_2']\\&\qquad \times \varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1\\&\quad =\sigma ^2c^2\frac{(n-2l+k)(n-2(l-1)+k)}{(l-k)(l-k-2)(\rho +\lambda )} \varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1, \end{aligned}$$

where (4.5) and Lemma 4.1 (ii) were used, which together yield

$$\begin{aligned} D[\varvec{f}] &= \sigma ^2\left( \rho +\lambda -2c\frac{n-2l+k}{l-k} + c^2\frac{(n-2l+k)(n-2(l-1)+k)}{(l-k)(l-k-2)(\rho +\lambda )}\right) \varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1. \end{aligned}$$
(4.8)

Next, \({\mathrm{cov}}[\varvec{f},\varvec{u}_1]\) will be calculated by conditioning on \(\varvec{u}_2\). Here \(E_{\varvec{u}}[\cdot ]\) indicates that the expectation is taken with respect to the distribution of \(\varvec{u}\); for example, \(E_{\varvec{u}_1|\varvec{u}_2}\) denotes expectation with respect to the conditional distribution of \(\varvec{u}_1\) given \(\varvec{u}_2\).

$$\begin{aligned} {\mathrm{cov}}[\varvec{f},\varvec{u}_1] &= E_{\varvec{u}_2}E_{\varvec{u}_1|\varvec{u}_2}E_{\varvec{u}_3}[(1-c\varvec{u}_3'\varvec{u}_3(\varvec{u}_2'\varvec{u}_2)^{-1}) \varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2\varvec{u}_1']\nonumber \\ &= \rho (\rho +\lambda )^{-1}E_{\varvec{u}_2}[(1-c(n-2l+k)\sigma ^2(\varvec{u}_2'\varvec{u}_2)^{-1})\varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{u}_2\varvec{u}_2'\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1]\nonumber \\ &= \sigma ^2\rho \left( 1-c\frac{n-2l+k}{(l-k)(\rho +\lambda )} \right) \varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'\varvec{B}_1, \end{aligned}$$
(4.9)

where (4.4) and Lemma 4.1 (i) have been applied. Thus, \(D[\widehat{\varvec{X}\varvec{\beta }}]\) is obtained.

Collecting the constants from \(D[\varvec{f}]\) and \(-2\,{\mathrm{cov}}[\varvec{f},\varvec{u}_1]\), with \(c\) inserted, denote

$$\begin{aligned} \rho _1 &= \rho +\lambda -2\frac{l-k-2}{l-k}\, \lambda + \frac{l-k-2}{l-k}\, \frac{n-2(l-1)+k}{n-2l+k}\,\frac{\lambda ^{2}}{\rho +\lambda }\nonumber \\ &\quad -2\rho \left( 1-\frac{(l-k-2)\lambda }{(l-k)(\rho +\lambda )}\right) . \end{aligned}$$
(4.10)

Substituting (4.8) and (4.9) into (4.7) and using the notation (4.10), we get

$$\begin{aligned} D[\widehat{\varvec{X}\varvec{\beta }}]=\sigma ^2P_{\varvec{X}} + \sigma ^2 \rho P_{\varvec{X}}\varvec{Z}\left[ I + \frac{\rho _1}{\rho } \varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\right] \varvec{Z}'P_{\varvec{X}}. \end{aligned}$$
(4.11)

The results obtained above are summarized in the following theorem.

Theorem 4.2

Let \(\widehat{\varvec{X}\varvec{\beta }}\) be given by (3.11) and let \(\widetilde{\varvec{X}\varvec{\beta }}\) be the OLS estimator of \(\varvec{X}\varvec{\beta }\). Then:

  (i) for all \(\varvec{\beta }\in R^p\), \(\sigma ^2>0\), \(\rho \ge 0\),

    $$\begin{aligned} E[\widehat{\varvec{X}\varvec{\beta }}]=\varvec{X}\varvec{\beta }; \end{aligned}$$

  (ii)

    $$\begin{aligned} D[\widehat{\varvec{X}\varvec{\beta }}] = D[\widetilde{\varvec{X}\varvec{\beta }}] + \sigma ^2\rho _1P_{\varvec{X}}\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'P_{\varvec{X}}, \end{aligned}$$

    where \(\rho _1\) is given by (4.10).
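For numerical work, the constant \(\rho _1\) of (4.10) can be evaluated directly; by Theorem 4.2 (ii), a negative value means that the new estimator has the smaller dispersion matrix. A sketch, assuming \(\rho \) is known:

```python
def rho1(rho, lam, n, l, k):
    """Evaluate the constant rho_1 of (4.10); rho1 < 0 favours the new estimator."""
    d = (l - k - 2) / (l - k)
    return (rho + lam - 2 * d * lam
            + d * (n - 2 * (l - 1) + k) / (n - 2 * l + k) * lam ** 2 / (rho + lam)
            - 2 * rho * (1 - d * lam / (rho + lam)))
```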

5 Comparison of \(D[\widehat{\varvec{g}'\varvec{\beta }}]\) with the dispersion matrix of the OLS estimator

Let \(\varvec{g}'\varvec{\beta }\) be an arbitrary estimable function of \(\varvec{\beta }\), i.e., \(\varvec{g}\in {\mathcal C}(\varvec{X}')\) or, equivalently, \(\varvec{g}=\varvec{X}'\varvec{h}\) for some \(\varvec{h}\). From (4.11) and Theorem 4.2 we immediately get

$$\begin{aligned} D[\widehat{\varvec{g}'\varvec{\beta }}] = D[\widehat{\varvec{h}'\varvec{X}\varvec{\beta }}]= D[\widetilde{\varvec{g}'\varvec{\beta }}] + \sigma ^2\rho _1\varvec{h}'P_{\varvec{X}}\varvec{Z}\varvec{Z}'\varvec{B}_2\varvec{B}_2'\varvec{Z}\varvec{Z}'P_{\varvec{X}}\varvec{h}. \end{aligned}$$

As presented in (4.2), the dispersion matrix of \(\widetilde{\varvec{X}\varvec{\beta }}=\varvec{B}_1\varvec{u}_1=P_{\varvec{X}}\varvec{y}\) equals

$$\begin{aligned} \sigma ^2 \varvec{B}_1(\rho \varvec{B}_1'\varvec{Z}\varvec{Z}'\varvec{B}_1 +\varvec{I}_k)\varvec{B}_1' = \sigma ^2 P_{\varvec{X}}(\rho \varvec{Z}\varvec{Z}' +\varvec{I}_n)P_{\varvec{X}}, \end{aligned}$$

from which it immediately follows that

$$\begin{aligned} D[\widetilde{\varvec{g}'\varvec{\beta }}] = \sigma ^2 \varvec{h}'P_{\varvec{X}}(\rho \varvec{Z}\varvec{Z}' +\varvec{I}_n)P_{\varvec{X}}\varvec{h}. \end{aligned}$$

Let \(d=\tfrac{l-k-2}{l-k}\). In order to see whether the proposed estimator \(\widehat{\varvec{X}\varvec{\beta }}\) improves on the ordinary least squares estimator, according to Theorem 4.2 (ii) the condition under which \(\rho _1<0\), i.e.,

$$\begin{aligned}&\rho +\lambda -2d\lambda +d\tfrac{n-2(l-1)+k}{n-2l+k}\tfrac{(\lambda )^{2}}{\rho +\lambda } -2\left( \rho -d\tfrac{\rho \lambda }{\rho +\lambda }\right) <0 \end{aligned}$$

has to be studied. This inequality is equivalent to

$$\begin{aligned} \rho >\lambda +d\left( \tfrac{n-2(l-1)+k}{n-2l+k}-2\right) \tfrac{\lambda ^2}{\rho +\lambda }, \end{aligned}$$
(5.1)

where it has been used that

$$\begin{aligned} \tfrac{\rho }{\rho +\lambda }=1-\tfrac{\lambda }{\rho +\lambda }. \end{aligned}$$

Theorem 5.1

Let \(\widehat{\varvec{X}\varvec{\beta }}\) be given by (3.11). Then for every estimable function \(\varvec{g}'\varvec{\beta }\), \(D[\widetilde{\varvec{g}'\varvec{\beta }}] - D[\widehat{\varvec{g}'\varvec{\beta }}] >0\) if and only if

$$\begin{aligned} \rho >\lambda (1-(1-\tfrac{2}{n-2l+k})(1-\tfrac{2}{l-k}))^{\tfrac{1}{2}}. \end{aligned}$$

Proof

Manipulating (5.1) shows that \({\widehat{\varvec{X}\varvec{\beta }}}\) has a smaller dispersion matrix than the OLS estimator if and only if

$$\begin{aligned} \rho -\lambda +\tfrac{\lambda ^2}{\rho +\lambda }\tfrac{n-2l+k-2}{n-2l+k}\tfrac{l-k-2}{l-k}> 0. \end{aligned}$$

The statement of the theorem is the solution to this inequality. \(\square \)

From Theorem 5.1 it follows that the inequality \(\rho >\lambda \) is a simple sufficient criterion for deciding whether \({\widehat{\varvec{X}\varvec{\beta }}}\) should be used instead of the OLS estimator, although \(\rho \) has to be estimated. Here we can use the variance estimator \({\widehat{\sigma }}^2=(n-2l+k)^{-1}\varvec{u}_3'\varvec{u}_3\). However, if we use the estimator \({\widehat{\rho }}=\tfrac{n-2(l+1)+k}{l-k} \varvec{u}_2'\varvec{u}_2(\varvec{u}_3'\varvec{u}_3)^{-1}-\lambda \), which is motivated by Theorem 3.1 and (3.8), we observe that if \(\rho \ll \lambda \), \({\widehat{\rho }}\) can take negative values. Similarly, if we use the estimator \({\widehat{\sigma }}_\gamma ^2=(l-k)^{-1} \varvec{u}_2'\varvec{u}_2-\lambda {\widehat{\sigma }}^2\), also motivated by Theorem 3.1 and (3.8), we observe that if \(\sigma _\gamma ^2\ll \sigma ^2\), \({\widehat{\sigma }}_\gamma ^2\) can take negative values, indicating that the OLS estimator is preferable.
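The moment estimators and the decision rule just described can be sketched as follows, using the exact threshold from Theorem 5.1 (the helper name decide is ours):

```python
import numpy as np

def decide(u2, u3, lam, n, l, k):
    """Estimate sigma^2, sigma_gamma^2, rho; check the Theorem 5.1 criterion."""
    sigma2_hat = (u3 @ u3) / (n - 2 * l + k)
    rho_hat = (n - 2 * (l + 1) + k) / (l - k) * (u2 @ u2) / (u3 @ u3) - lam
    sigma2_gamma_hat = (u2 @ u2) / (l - k) - lam * sigma2_hat   # may be negative
    threshold = lam * np.sqrt(1 - (1 - 2 / (n - 2 * l + k)) * (1 - 2 / (l - k)))
    use_new = rho_hat > threshold            # otherwise prefer the OLS estimator
    return sigma2_hat, sigma2_gamma_hat, rho_hat, use_new
```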

Table 1 Configuration settings and calculated \(\lambda \) for the simulation study
Table 2 Empirical estimates, and the theoretical variances \(D[{\tilde{\beta }}]\) and \(D[{\widehat{\beta }}]\) for Configuration 1
Table 3 Empirical estimates, and the theoretical variances \(D[{\tilde{\beta }}]\) and \(D[{\widehat{\beta }}]\) for Configuration 2

6 Example: unbalanced one-way random model

A special case of model (2.1) is the one-way random effects model

$$\begin{aligned} y_{ij} = \beta + \gamma _i + \epsilon _{ij}, \end{aligned}$$
(6.1)

where \(i=1, \dots , l\), \(j=1,\dots ,n_i\), and \(n=\sum _{i=1}^ln_i\). The expectation \(\beta \in R\) is our parameter of interest. All distributional assumptions of model (2.1) are assumed to hold. Here \(k=1\), the matrix \(\varvec{X}\) is an n-dimensional column vector of ones, \(\varvec{X}={\varvec{1}} _n\), and the matrix \(\varvec{Z}\) is block diagonal with column vectors of ones of length \(n_i\) as blocks, \(\varvec{Z}={\mathrm{Diag}}\{{\varvec{1}} _{n_i}\}\). The OLS estimator of \({\beta }\) is \(\bar{y}\), the average of all n observations (with \(\varvec{B}_1=n^{-1/2}{\varvec{1}}_n\), one has \(\varvec{u}_1=\sqrt{n}\,\bar{y}\)). It is well known that the maximum likelihood estimator (MLE) of \(\beta \), in the case of an unbalanced model, has to be obtained iteratively. Note that \({\mathcal C}(\varvec{X})\subseteq {\mathcal C}(\varvec{Z})\) and thus \(l=\mathrm{rank}(\varvec{Z})\).
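For this example the design matrices are easy to build; a small sketch:

```python
import numpy as np

def one_way_design(n_i):
    """X = 1_n and Z = Diag{1_{n_i}} for the one-way random model (6.1)."""
    n = sum(n_i)
    X = np.ones((n, 1))
    Z = np.zeros((n, len(n_i)))
    row = 0
    for i, ni in enumerate(n_i):
        Z[row:row + ni, i] = 1.0    # block of ones for level i
        row += ni
    return X, Z
```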

We illustrate our procedure by a small simulation study for model (6.1). A broad range of values of the variance ratio \(\rho \) is considered. The number of levels \(l\) of \(\varvec{\gamma }\) is chosen either as small as \(l=3\), with a relatively large number of observations \(n_i\) per level, or relatively large, \(l=10\), with smaller numbers \(n_i\). Because of proportionality, only \(\sigma ^2=1\) was considered.

All configurations are presented in Table 1. For each configuration, 10,000 simulations were carried out. The OLS estimates as well as the \({\widehat{\beta }}\) estimates are reported as averages over the 10,000 observed estimates. In addition, the estimates of \(\sigma ^2\) and of \(\sigma ^2_\gamma \) are presented, together with the observed MSEs of the OLS estimator and of \({\widehat{\beta }}\). The results are presented in Tables 2 and 3.
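A condensed version of one such simulation run, combining the sketches above; the group sizes and parameter values here are hypothetical placeholders, not the Table 1 configurations:

```python
import numpy as np

rng = np.random.default_rng(1)
n_i = [5, 8, 10, 12, 6, 9, 7, 11, 10, 8]   # hypothetical group sizes (l = 10)
rho, sigma2, beta = 10.0, 1.0, 2.0          # hypothetical parameter values
X, Z = one_way_design(n_i)
B1, B2, B3 = construct_bases(X, Z)
ols, new = [], []
for _ in range(10_000):
    gamma = rng.normal(0.0, np.sqrt(rho * sigma2), Z.shape[1])
    eps = rng.normal(0.0, np.sqrt(sigma2), X.shape[0])
    y = X @ np.array([beta]) + Z @ gamma + eps
    ols.append(y.mean())                    # OLS estimate of beta
    new.append(xb_hat(y, X, Z, B1, B2, B3)[0])  # B1(u1 - f) = beta_hat * 1_n
print(np.mean(ols), np.var(ols))            # empirical mean and variance, OLS
print(np.mean(new), np.var(new))            # empirical mean and variance, new estimator
```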

Fig. 1 Histogram from 10,000 estimates of \({\widehat{\sigma }}_\gamma ^2\) for Configuration 2, for \(\rho =5\), illustrating the positive probability of negative estimates of \(\sigma _\gamma ^2\)

Fig. 2 Histogram from 10,000 estimates of \({\widehat{\beta }}\) for Configuration 2, for \(\rho =5\), illustrating the symmetry of the distribution of \({\widehat{\beta }}\)

Fig. 3 Histogram from 10,000 estimates of the OLS estimator for Configuration 2, for \(\rho =5\)

In Sect. 5, it was shown that the newly proposed estimator has a smaller dispersion if \(\rho >\lambda =7.175\), which is in complete agreement with Table 2: for \(\rho =10\) and \(\rho =20\) the simulations indicate that the new estimator has a smaller dispersion than the OLS estimator, but not for \(\rho =5\). The results for Configuration 2 are presented in Table 3. Using Table 3 and Theorem 5.1, it follows that \(\rho \) should be larger than 1.97 if \({\widehat{\varvec{\beta }}}\) is to be applied instead of the OLS estimator, a strategy supported by the simulation results in Table 3. The tables indicate that even a smaller \(\rho \) could be used, but one has to remember that estimated variances and MSEs are applied and, in particular, that \({\widehat{\sigma }}_{\gamma }^2\) can become negative (see Figure 1). However, we can state with confidence that the new estimator is better than the least squares estimator in certain regions of the parameter space (described through \(\rho \)), as shown in Theorem 5.1 and in the simulations. In addition, we would like to point out that although \({\widehat{\varvec{\beta }}}\) is a nonlinear estimator, it is unbiased, as shown in (4.6), and, as observed in our simulations, its distribution seems to be symmetric around its expectation. For an illustration, see the histograms of \({\widehat{\beta }}\) and of the OLS estimator in Figures 2 and 3, respectively.