On the variance parameter estimator in general linear models

Lindholm, Mathias; Wahl, Felix

doi:10.1007/s00184-019-00751-4

Metrika, Volume 83, pages 243–254 (2020)

Open access. Published: 06 November 2019

Abstract

In the present note we consider general linear models where the covariates may be both random and non-random, and where the only restrictions on the error terms are that they are independent and have finite fourth moments. For this class of models we analyse the variance parameter estimator. In particular we obtain finite sample size bounds for the variance of the variance parameter estimator which are independent of covariate information regardless of whether the covariates are random or not. For the case with random covariates this immediately yields bounds on the unconditional variance of the variance estimator—a situation which in general is analytically intractable. The situation with random covariates is illustrated in an example where a certain vector autoregressive model which appears naturally within the area of insurance mathematics is analysed. Further, the obtained bounds are sharp in the sense that both the lower and upper bound will converge to the same asymptotic limit when scaled with the sample size. By using the derived bounds it is simple to show convergence in mean square of the variance parameter estimator for both random and non-random covariates. Moreover, the derivation of the bounds for the above general linear model is based on a lemma which applies in greater generality. This is illustrated by applying the used techniques to a class of mixed effects models.

1 The general linear model

The primary model class that will be analysed is defined as follows: Let ${\varvec{y}}$ be a random $n \times 1$ vector and let ${\varvec{X}}$ be a random $n \times p_x$ matrix of almost surely full column rank, $\mathrm {rank}({\varvec{X}}) = p_x$, with $n > p_x$. Further, let ${\varvec{\Sigma }}$ be a symmetric almost surely strictly positive definite $n\times n$ matrix. This ensures that we may define ${\varvec{\Sigma }}^{1/2}$ in the standard way using orthogonalization. The class of GLMs which will be studied in the present note is of the form

$$\begin{aligned} {\varvec{y}}= {\varvec{X}}\varvec{\beta }+ \sigma {\varvec{\Sigma }}^{1/2} {\varvec{e}}, \end{aligned}$$

(1)

where $\varvec{\beta }$ is a $p_x \times 1$ vector, $\sigma > 0$ is a scalar and ${\varvec{e}}$ is some random $n \times 1$ vector whose elements are independent. Moreover, we assume that ${\varvec{e}}$ has, conditional on ${\varvec{X}}$ and ${\varvec{\Sigma }}$, mean $\varvec{0}$ and covariance ${\varvec{I}}$, together with common central fourth moments $\mu _4 \ge 1$ (since $\mu _4 := {\mathbb {E}}[{\varvec{e}}_i^4] \ge {\text {Var}}({\varvec{e}}_i)^2 = 1$). Here ${\varvec{I}}$ denotes the $n \times n$ identity matrix. The standard generalized least squares estimator of $\varvec{\beta }$, conditional on ${\varvec{X}}$ and ${\varvec{\Sigma }}$, is given by:

$$\begin{aligned} {\hat{{\varvec{\beta }}}}= ({\varvec{X}}' {\varvec{\Sigma }}^{-1} {\varvec{X}})^{-1} {\varvec{X}}' {\varvec{\Sigma }}^{-1} {\varvec{y}}, \end{aligned}$$

see for instance (Seber and Lee 2003, Sec. 3.10) for this and more on the general linear model. Moreover, an unbiased estimator of $\sigma ^2$ (conditional on ${\varvec{X}}$ and ${\varvec{\Sigma }}$), and the estimator we will focus on in the present note, is given by:

$$\begin{aligned} {\hat{\sigma }}^2 := {\hat{\sigma }}_n^2({\varvec{X}},{\varvec{\Sigma }}) = \frac{1}{n - p_x} \left( {\varvec{y}}- {\varvec{X}}{\hat{{\varvec{\beta }}}}\right) ' {\varvec{\Sigma }}^{-1} \left( {\varvec{y}}- {\varvec{X}}{\hat{{\varvec{\beta }}}}\right) . \end{aligned}$$

(2)

The results below will of course remain valid, with the obvious changes, if we consider an estimator normalized with some other, non-degenerate, function of n and $p_x$, for instance simply n. It is important to note that when ${\varvec{X}}$ is assumed to be random it is assumed to be possible to observe perfectly. That is, we are not dealing with an errors-in-variables model, which would lead to problems such as biased estimators. In Example 3 we comment on the situation when the regression coefficients are allowed to be random instead of the covariates, i.e. we consider mixed effects models.

For the results below it will be useful to define

$$\begin{aligned} {\varvec{P}}:= {\varvec{X}}({\varvec{X}}' {\varvec{\Sigma }}^{-1} {\varvec{X}})^{-1} {\varvec{X}}' {\varvec{\Sigma }}^{-1}, \end{aligned}$$

which corresponds to the projection matrix associated with the linear model (1), and to also define the idempotent matrices

$$\begin{aligned} {\varvec{V}}:= {\varvec{\Sigma }}^{-1/2} {\varvec{P}}{\varvec{\Sigma }}^{1/2}, \end{aligned}$$

and

$$\begin{aligned} {\varvec{K}}:= {\varvec{I}}- {\varvec{V}}. \end{aligned}$$

In particular, using the above it is possible to rewrite ${\hat{\sigma }}^2$ according to

$$\begin{aligned} {\hat{\sigma }}^2 = \frac{\sigma ^2}{p_k} {\varvec{e}}' {\varvec{K}}{\varvec{e}}, \end{aligned}$$

(3)

where $p_k = {\text {rank}}({\varvec{K}}) = n-p_x$, see the proof of Proposition 1 for more details.
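The identity (3) can be checked numerically. The following is a minimal sketch, assuming a diagonal ${\varvec{\Sigma }}$ and a single covariate ($p_x = 1$) so that all matrix operations reduce to sums; the function names and the simulated data are illustrative and not taken from the paper.

```python
import math
import random


def sigma2_hat_from_residuals(y, x, w):
    """Estimator (2) for p_x = 1 and Sigma = diag(w)."""
    n = len(y)
    # GLS estimate of beta: (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} y
    beta_hat = sum(xi * yi / wi for xi, yi, wi in zip(x, y, w)) / sum(
        xi * xi / wi for xi, wi in zip(x, w)
    )
    rss = sum((yi - xi * beta_hat) ** 2 / wi for xi, yi, wi in zip(x, y, w))
    return rss / (n - 1)


def sigma2_hat_from_quadratic_form(e, x, w, sigma):
    """Right-hand side of (3): sigma^2 / p_k * e' K e, with K = I - V."""
    n = len(e)
    # For p_x = 1, V = v v' / (v' v) with v = Sigma^{-1/2} x.
    v = [xi / math.sqrt(wi) for xi, wi in zip(x, w)]
    vv = sum(vi * vi for vi in v)
    ve = sum(vi * ei for vi, ei in zip(v, e))
    e_K_e = sum(ei * ei for ei in e) - ve * ve / vv
    return sigma ** 2 * e_K_e / (n - 1)


random.seed(0)
n, beta, sigma = 8, 1.5, 0.7
x = [random.uniform(1.0, 5.0) for _ in range(n)]
w = [random.uniform(0.5, 2.0) for _ in range(n)]    # diagonal of Sigma
e = [random.choice([-1.0, 1.0]) for _ in range(n)]  # mean 0, variance 1
y = [xi * beta + sigma * math.sqrt(wi) * ei for xi, wi, ei in zip(x, w, e)]

lhs = sigma2_hat_from_residuals(y, x, w)
rhs = sigma2_hat_from_quadratic_form(e, x, w, sigma)
```

Under these assumptions `lhs` and `rhs` agree up to floating point error, which is what (3) asserts for this special case.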

We will henceforth focus on properties of the variance of the variance parameter estimator ${\hat{\sigma }}^2$. One can note that the special case of a sample variance in the non-Gaussian setting, i.e. an intercept only GLM, was treated already in e.g. Cramér (1946, Eq. 27.4.2). Other similar results are obtained in the theory of minimum variance component estimation, see e.g. Rao (1970, 1971) and the proof of Seber and Lee (2003, Thm. 3.4). More general results can be found in for instance (Dette et al. 1998) where estimation of the variance parameter in the case of nonparametric regression is treated. Lemma 1 may be seen as a special case of the corresponding expression for the mean squared error (MSE) from Dette et al. (1998, Eq. 6). In Example 3 we discuss extensions to mixed effects models and comment on the results in Li (2012) which extend the analysis in Dette et al. (1998) w.r.t. mixed effects. We will return to these comparisons in more detail below.

The general problem formulation above, of course, relies on the theory of random quadratic forms. For more on this topic, see e.g. Eaton (1983), Mathai et al. (2012) and Seber and Lee (2003) and the references therein.

The results that we obtain for the variance of the variance estimator rely on the fact that ${\text {Var}}({\hat{\sigma }}^2 | {\varvec{X}}, {\varvec{\Sigma }})$ can be calculated explicitly. For the particular situation of interest this variance is obtained using the following result from Plackett (1960, Eq. (2), p. 16), which we state as a lemma:

Lemma 1

(Plackett 1960) Let ${\varvec{z}}$ be an $n\times 1$ dimensional vector of independent random variables with mean 0, and common variance $\sigma ^2$, and common fourth central moment $\mu _4$, and let ${\varvec{W}}$ denote an arbitrary $n\times n$ matrix. It then holds that

$$\begin{aligned} {\text {Var}}({\varvec{z}}'{\varvec{W}}{\varvec{z}}) = \sigma ^4 \left( 2{{\,\mathrm{tr}\,}}({\varvec{W}}^2) + (\mu _4-3)\sum _{i=1}^n {\varvec{W}}_{ii}^2 \right) . \end{aligned}$$

(4)

N.B. The last sum in (4) corresponds to the sum of the squared diagonal elements of ${\varvec{W}}$, which should not be confused with ${{\,\mathrm{tr}\,}}({\varvec{W}}^2)$. The proof of Lemma 1 is given in Plackett (1960), and a more general version can be found, without proof, in Atiqullah (1962a), whose proof can be found in Seber and Lee (2003, Thm. 1.6). See also the derivation of the MSE expressions given in Dette et al. (1998) and Li (2012, Lemma 3).
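As a small illustration of (4), the variance of a quadratic form can be computed exactly by enumeration when the components of ${\varvec{z}}$ are independent and uniform on $\{-1, +1\}$, in which case $\sigma ^2 = 1$ and $\mu _4 = 1$. The sketch below assumes a symmetric ${\varvec{W}}$; the matrix and the helper names are chosen here for illustration only.

```python
from fractions import Fraction
from itertools import product


def quad_form(W, z):
    """Return z' W z."""
    n = len(z)
    return sum(z[i] * W[i][j] * z[j] for i in range(n) for j in range(n))


def exact_variance_rademacher(W):
    """Exact Var(z' W z) when the z_i are independent and uniform on {-1, +1}."""
    n = len(W)
    values = [quad_form(W, z) for z in product([-1, 1], repeat=n)]
    mean = Fraction(sum(values), len(values))
    return Fraction(sum(v * v for v in values), len(values)) - mean ** 2


def plackett_variance(W, sigma2, mu4):
    """Right-hand side of (4) for a symmetric W."""
    n = len(W)
    trace_W2 = sum(W[i][j] * W[j][i] for i in range(n) for j in range(n))
    diag_sq = sum(W[i][i] ** 2 for i in range(n))
    return sigma2 ** 2 * (2 * trace_W2 + (mu4 - 3) * diag_sq)


# A symmetric (not idempotent) test matrix with exact rational entries.
W = [
    [Fraction(2), Fraction(1), Fraction(0)],
    [Fraction(1), Fraction(3), Fraction(-1)],
    [Fraction(0), Fraction(-1), Fraction(1)],
]

# Variables uniform on {-1, +1} have mean 0, variance 1 and fourth moment 1.
exact = exact_variance_rademacher(W)
formula = plackett_variance(W, sigma2=1, mu4=1)
```

Here ${{\,\mathrm{tr}\,}}({\varvec{W}}^2) = 18$ while the sum of squared diagonal elements is 14, so the two quantities that the N.B. warns against confusing do differ, and both `exact` and `formula` equal 8.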

Further, the main objective of the current note is to obtain finite sample bounds for the variance of the variance estimator of the general linear model defined by (1) when the covariates may be both random and non-random. In order to prove such bounds we will make use of the following lemma:

Lemma 2

Given that ${\varvec{W}}$ from Lemma 1 is idempotent and symmetric it follows that the variance expression (4) may be bounded according to

$$\begin{aligned} \left\{ \begin{array}{ll} {\text {Var}}({\varvec{z}}'{\varvec{W}}{\varvec{z}}) \in [\nu _n - \kappa _n, \nu _n] &{} \quad \mathrm {if}~ \mu _4 > 3, \\ {\text {Var}}({\varvec{z}}'{\varvec{W}}{\varvec{z}}) \in [\nu _n, \nu _n - \kappa _n] &{}\quad \mathrm {if}~ 1 \le \mu _4 \le 3, \end{array}\right. \end{aligned}$$

where $\nu _n := \sigma ^4(\mu _4 - 1)(n-p_u)$ and

$$\begin{aligned} \kappa _n := \left\{ \begin{array}{ll} \sigma ^4 (\mu _4 - 3)p_u &{}\quad \mathrm {if}~ p_u \le n/2,\\ \sigma ^4 (\mu _4 - 3)(n-p_u) &{}\quad \mathrm {if}~ p_u > n/2,\\ \end{array}\right. \end{aligned}$$

and where ${\varvec{U}}= {\varvec{I}}- {\varvec{W}}$ and $p_u = {\text {rank}}({\varvec{U}})$.

Note that Lemma 2 only relies on Lemma 1 in terms of the explicit form of the variance given by (4), a fact which will be exploited further in Example 3 given below. The usefulness of Lemma 2 becomes apparent when the decomposition of ${\varvec{W}}$ is in terms of a ${\varvec{U}}$ with $p_u$ being a constant (much) smaller than n. This is what will be exploited in Corollaries 1 and 2, and which is the motivation for why the bounds in Lemma 2 are expressed in terms of $p_u$ instead of $p_w := n - p_u$. Further, note that the split between $p_u \le n/2$ and $p_u > n/2$ ascertains that all bounds are positive.

Proof of Lemma 2

Since ${\varvec{W}}= {\varvec{I}}- {\varvec{U}}$ is idempotent it follows that ${\varvec{W}}^2 = {\varvec{W}}$, that ${{\,\mathrm{tr}\,}}({\varvec{W}}) = {\text {rank}}({\varvec{W}}) = p_w = n - p_u$, and that (4) simplifies to

$$\begin{aligned} {\text {Var}}({\varvec{z}}'{\varvec{W}}{\varvec{z}})&= \sigma ^4\left( 2(n-p_u) + (\mu _4 - 3) \sum _{i = 1}^n (1 - {\varvec{U}}_{ii})^2\right) . \end{aligned}$$

Further, expanding the square and noting that $\sum _{i=1}^n {\varvec{U}}_{ii} = p_u$ yields

$$\begin{aligned} {\text {Var}}({\varvec{z}}'{\varvec{W}}{\varvec{z}})&= \sigma ^4\left( (\mu _4 - 1)(n-p_u) + (\mu _4 - 3)\left( \sum _{i = 1}^n {\varvec{U}}_{ii}^2 - p_u \right) \right) . \end{aligned}$$

(5)

Now, since ${\varvec{U}}$ is idempotent and symmetric it follows that

$$\begin{aligned} {\varvec{U}}_{ii} = {\varvec{U}}_{ii}^2 + \sum _{j \ne i} {\varvec{U}}_{ij}^2, \end{aligned}$$

and in turn that

$$\begin{aligned} 0 \le \sum _{i = 1}^n {\varvec{U}}_{ii}^2 \le \sum _{i = 1}^n {\varvec{U}}_{ii} = p_u. \end{aligned}$$

(6)

Inserting the lower and upper bound into (5) finishes the proof of the lemma for $p_u \le n/2$. The corresponding bounds for $p_u > n/2$ are obtained by noting that $\sum {\varvec{W}}_{ii}^2 \ge 0$. $\square $
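The bounds of Lemma 2 can be illustrated with the centering matrix ${\varvec{W}}= {\varvec{I}}- \varvec{1}\varvec{1}'/n$, which is symmetric and idempotent with ${\varvec{U}}= \varvec{1}\varvec{1}'/n$ and $p_u = 1$. This is a sketch under that specific choice of ${\varvec{W}}$; the function names and the grid of $\mu _4$ and n values are our own.

```python
def lemma2_bounds(sigma2, mu4, n, p_u):
    """Return (lower, upper) bounds on Var(z' W z) from Lemma 2."""
    nu = sigma2 ** 2 * (mu4 - 1) * (n - p_u)
    if p_u <= n / 2:
        kappa = sigma2 ** 2 * (mu4 - 3) * p_u
    else:
        kappa = sigma2 ** 2 * (mu4 - 3) * (n - p_u)
    # kappa >= 0 when mu4 > 3 and kappa <= 0 when mu4 <= 3, so sorting
    # reproduces both cases of the lemma.
    return tuple(sorted((nu - kappa, nu)))


def variance_centering_matrix(sigma2, mu4, n):
    """Evaluate (4) for W = I - 11'/n, where W_ii = 1 - 1/n and tr(W^2) = n - 1."""
    diag_sq = n * (1 - 1 / n) ** 2
    return sigma2 ** 2 * (2 * (n - 1) + (mu4 - 3) * diag_sq)


results = []
# mu4 = 1: symmetric two-point, 1.8: uniform, 3: Gaussian, 9: centred exponential
for mu4 in (1.0, 1.8, 3.0, 9.0):
    for n in (2, 5, 50):
        lower, upper = lemma2_bounds(1.0, mu4, n, p_u=1)
        var = variance_centering_matrix(1.0, mu4, n)
        # Small tolerance guards against floating point rounding at the boundary.
        results.append(lower - 1e-12 <= var <= upper + 1e-12)
```

For instance, with $\mu _4 = 9$ and $n = 5$ the variance from (4) is 27.2, which lies in the interval $[26, 32]$ given by the lemma.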

We may now state our main result:

Proposition 1

Consider the general linear model given by (1) and let

(i)
${\varvec{\Sigma }}$ be a random symmetric almost surely positive definite $n\times n$ matrix and ${\varvec{X}}$ be a random $n\times p_x$ matrix of almost surely full column rank,
(ii)
the error term components defining the random $n\times 1$ vector ${\varvec{e}}$ be independent with, conditional on ${\varvec{X}}$ and ${\varvec{\Sigma }}$, mean 0, variance 1 and common fourth central moments $\mu _4$.

It then holds that

$$\begin{aligned} \left\{ \begin{array}{ll} {\text {Var}}({\hat{\sigma }}^2 |{\varvec{X}}, {\varvec{\Sigma }}) \in [\nu _n - \kappa _n, \nu _n] &{} \quad \mathrm {if}~ \mu _4 > 3, \\ {\text {Var}}({\hat{\sigma }}^2 |{\varvec{X}}, {\varvec{\Sigma }}) \in [\nu _n, \nu _n - \kappa _n] &{} \quad \mathrm {if}~ 1 \le \mu _4 \le 3, \end{array}\right. \end{aligned}$$

and

$$\begin{aligned} \left\{ \begin{array}{ll} {\text {Var}}({\hat{\sigma }}^2) \in [\nu _n - \kappa _n, \nu _n] &{} \quad \mathrm {if}~ \mu _4 > 3, \\ {\text {Var}}({\hat{\sigma }}^2) \in [\nu _n, \nu _n - \kappa _n] &{} \quad \mathrm {if}~ 1 \le \mu _4 \le 3, \end{array}\right. \end{aligned}$$

where $\nu _n := \sigma ^4\frac{\mu _4 - 1}{n-p_x}$ and

$$\begin{aligned} \kappa _n := \left\{ \begin{array}{ll} \sigma ^4 \frac{\mu _4 - 3}{(n-p_x)^2}p_x &{} \quad \mathrm {if}~ p_x \le n/2,\\ \sigma ^4 \frac{\mu _4 - 3}{n-p_x} &{}\quad \mathrm {if}~ p_x > n/2. \end{array}\right. \end{aligned}$$

The proof of the conditional bounds in Proposition 1 rests on the fact that ${\hat{\sigma }}^2$ may be expressed according to (3), together with an application of Lemmas 1 and 2. Moreover, note that the conditional variance bounds for ${\hat{\sigma }}^2$ are deterministic regardless of whether the covariates ${\varvec{X}}$ are random or not. This fact, combined with the fact that ${\hat{\sigma }}^2$ is unbiased conditional on ${\varvec{X}}$ and ${\varvec{\Sigma }}$ and an application of variance decomposition, proves the unconditional variance bounds in Proposition 1. The full proof of Proposition 1 is given in the appendix.
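Since the bounds in Proposition 1 depend only on $\sigma ^2$, $\mu _4$, n and $p_x$, they are straightforward to evaluate. The helper below is a minimal sketch; the function name, the input checks and the numerical values are assumptions made for illustration.

```python
def proposition1_bounds(sigma2, mu4, n, p_x):
    """Return (lower, upper) bounds on Var(sigma_hat^2) from Proposition 1."""
    if not 0 < p_x < n:
        raise ValueError("need 0 < p_x < n")
    if mu4 < 1:
        raise ValueError("mu4 must be at least 1")
    nu_n = sigma2 ** 2 * (mu4 - 1) / (n - p_x)
    if p_x <= n / 2:
        kappa_n = sigma2 ** 2 * (mu4 - 3) * p_x / (n - p_x) ** 2
    else:
        kappa_n = sigma2 ** 2 * (mu4 - 3) / (n - p_x)
    # Sorting covers both cases: kappa_n >= 0 if mu4 > 3, kappa_n <= 0 otherwise.
    return tuple(sorted((nu_n - kappa_n, nu_n)))


# n = 10 and p_x = 1; mu4 = 3 matches Gaussian errors, mu4 = 6 is heavier tailed.
gaussian = proposition1_bounds(sigma2=1.0, mu4=3.0, n=10, p_x=1)
heavy = proposition1_bounds(sigma2=1.0, mu4=6.0, n=10, p_x=1)
```

With $\mu _4 = 3$ the interval collapses to the single value $2\sigma ^4/(n - p_x)$, the familiar variance of ${\hat{\sigma }}^2$ under Gaussian errors, whereas $\mu _4 = 6$ gives the interval $[14/27, 5/9]$.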

We can also state a finite sample upper bound on the difference between the conditional and unconditional variances together with convergence of these using the bounds in Proposition 1.

Corollary 1

If the assumptions of Proposition 1 hold, then

$$\begin{aligned} |{\text {Var}}({\hat{\sigma }}^2 | {\varvec{X}}, {\varvec{\Sigma }}) - {\text {Var}}({\hat{\sigma }}^2)| \le |\kappa _n|, ~\mathrm {for~ all}~ n, \end{aligned}$$

and

$$\begin{aligned} n {\text {Var}}({\hat{\sigma }}^2) \rightarrow \nu , \quad n {\text {Var}}({\hat{\sigma }}^2 | {\varvec{X}}, {\varvec{\Sigma }}) \rightarrow \nu , \quad \text {uniformly as } n \rightarrow \infty \end{aligned}$$

(7)

where $\nu = \sigma ^4 (\mu _4 - 1)$ and $\mu _4 \ge 1$.

Remark 1

It is of course possible to state the rate of convergence in (7) by noting that

$$\begin{aligned} |(n - p_x) {\text {Var}}({\hat{\sigma }}^2 | {\varvec{X}}, {\varvec{\Sigma }}) - \nu | \le \frac{\sigma ^4 |\mu _4 - 3|}{n - p_x}p_x, \end{aligned}$$

and

$$\begin{aligned} |(n - p_x) {\text {Var}}({\hat{\sigma }}^2) - \nu | \le \frac{\sigma ^4 |\mu _4 - 3|}{n - p_x}p_x, \end{aligned}$$

hold for all n. Since Corollary 1 is a convergence result in terms of n where $p_x$ is a fixed quantity, the primary interest is in the situation where $p_x \le n/2$.
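To spell out how these rate bounds follow, consider the case $p_x \le n/2$ and $\mu _4 > 3$ (the case $1 \le \mu _4 \le 3$ is analogous, with the interval reversed). Multiplying the quantities in Proposition 1 by $n - p_x$ gives

$$\begin{aligned} (n-p_x)\nu _n = \sigma ^4(\mu _4 - 1) = \nu , \qquad (n-p_x)\kappa _n = \frac{\sigma ^4(\mu _4 - 3)}{n-p_x}p_x, \end{aligned}$$

so that

$$\begin{aligned} \nu - \frac{\sigma ^4(\mu _4 - 3)}{n-p_x}p_x \le (n-p_x){\text {Var}}({\hat{\sigma }}^2 | {\varvec{X}}, {\varvec{\Sigma }}) \le \nu . \end{aligned}$$

As an illustration with values chosen here only for concreteness, $\sigma = 1$, $\mu _4 = 6$ and $p_x = 1$ give a right-hand side of the rate bound equal to $3/9 \approx 0.33$ for $n = 10$ and $3/99 \approx 0.03$ for $n = 100$.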

By Corollary 1 it follows that ${\text {Var}}({\hat{\sigma }}^2) \rightarrow 0$, uniformly as $n \rightarrow \infty $, and therefore we can state the following result.

Corollary 2

If the assumptions of Proposition 1 hold, then

$$\begin{aligned} {\hat{\sigma }}^2 := {\hat{\sigma }}_n^2 {\mathop {\rightarrow }\limits ^{L^2}} \sigma ^2, \end{aligned}$$

as $n\rightarrow \infty $.
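The step from the variance bounds to convergence in $L^2$ can be written out as follows, for fixed $p_x$ and n large enough that $p_x \le n/2$. Since ${\hat{\sigma }}^2$ is unbiased, the mean squared error equals the variance, and the larger endpoint of either interval in Proposition 1 is at most $\nu _n + |\kappa _n|$:

$$\begin{aligned} {\mathbb {E}}\left[ ({\hat{\sigma }}^2 - \sigma ^2)^2\right] = {\text {Var}}({\hat{\sigma }}^2) + \left( {\mathbb {E}}[{\hat{\sigma }}^2] - \sigma ^2\right) ^2 = {\text {Var}}({\hat{\sigma }}^2) \le \frac{\sigma ^4(\mu _4 - 1)}{n-p_x} + \frac{\sigma ^4|\mu _4 - 3|}{(n-p_x)^2}p_x \rightarrow 0. \end{aligned}$$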

Remark 2

Note that ${\hat{\sigma }}^2$ from (2) is a special case of the variance parameter estimator ${\hat{\sigma }}_{ON}^2$ defined for the nonparametric mixed effects models analysed in Li (2012), see their Eq. (11). One can also note that the mixed effects models treated in Li (2012) are an extension of the model class treated in Dette et al. (1998). An alternative proof of the consistency of ${\hat{\sigma }}^2$ stated in Corollary 2 in the case with fixed covariates, hence, follows from Li (2012, Thm. 1) by neglecting the random effects part of their model. Further, as already commented upon above, in Dette et al. (1998) and Li (2012) the focus is on the MSE of the variance parameter estimator. Moreover, their MSE expressions are stated in terms of asymptotic equivalence, i.e. using “o(1 / n)”. Thus, it is not straightforward to use their expressions to obtain bounds similar to those provided by Proposition 1 (or Lemma 2) without carefully inspecting the o(1 / n) terms.

2 Examples

Example 1

Consider the following vector autoregressive model:

$$\begin{aligned} {\varvec{X}}_{t+1} = {\varvec{X}}_t \varvec{\beta }_t + \sigma _t {\varvec{\Sigma }}_t^{1/2} {\varvec{e}}_{t+1}, \end{aligned}$$

(8)

where all dimensions are in accordance with the linear model given by (1). Model (8) is closely connected to the distribution free Chain–Ladder model, which is a widely used actuarial reserving model, where t denotes time and ${\varvec{X}}_t$ denotes the amount of payments made in the time interval [0, t], see e.g. Mack (1993). More specifically, the Chain–Ladder model assumes that ${\varvec{X}}_t$ is $n\times 1$, $({\varvec{X}}_t)_i \ge 0$ and that ${\varvec{\Sigma }}_t := {\mathrm {diag}}({\varvec{X}}_t)$. In this situation we may analyse the variance of the variance parameter estimator, ${{\hat{\sigma }}}_t^2$, both conditional on ${\varvec{X}}_t$ as well as unconditionally using Proposition 1 and its corollaries. In either situation the above results provide us with finite sample bounds as well as ascertaining that ${{\hat{\sigma }}}_t^2$ is a (mean square) consistent estimator of $\sigma _t^2$. A practical application of the finite sample bounds is for the Chain–Ladder model w.r.t. the discussion of the appropriateness of using conditional versus unconditional prediction error, see e.g. Buchwalder et al. (2006) and Lindholm et al. (2019). Moreover, the relevance of using finite sample bounds is also apparent in many real-world insurance applications, when highly aggregated data is used, where the sample size often is around $n = 10$ and $p_x = 1 \ll n$.

Further, the Chain–Ladder model assumes a diagonal structure of ${\varvec{\Sigma }}_t$ which, even though making ${\mathrm {diag}}({\varvec{V}}) = {\mathrm {diag}}({\varvec{P}})$, does not make the variance bounds any more explicit.
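For the Chain–Ladder special case with $p_x = 1$ and ${\varvec{\Sigma }}_t = {\mathrm {diag}}({\varvec{X}}_t)$, the estimator (2) and the exact conditional variance from (3) and (4) reduce to simple sums, since here ${\varvec{K}}_{ii} = 1 - ({\varvec{X}}_t)_i / \sum _j ({\varvec{X}}_t)_j$. The sketch below uses hypothetical payment data and assumed values of $\sigma ^2$ and $\mu _4$ (both unknown in practice); the function names are ours.

```python
def chain_ladder_sigma2_hat(x_t, x_next):
    """Variance parameter estimator (2) with p_x = 1 and Sigma_t = diag(x_t)."""
    n = len(x_t)
    # GLS estimate of the development factor: sum(x_next) / sum(x_t).
    beta_hat = sum(x_next) / sum(x_t)
    rss = sum((xn - beta_hat * xt) ** 2 / xt for xt, xn in zip(x_t, x_next))
    return beta_hat, rss / (n - 1)


def conditional_variance(x_t, sigma2, mu4):
    """Var(sigma_hat^2 | X_t) from (3) and (4), using K_ii = 1 - x_i / sum(x)."""
    n = len(x_t)
    total = sum(x_t)
    diag_sq = sum((1 - xi / total) ** 2 for xi in x_t)
    return sigma2 ** 2 * (2 * (n - 1) + (mu4 - 3) * diag_sq) / (n - 1) ** 2


# Hypothetical cumulative payments for n = 10 accident years.
x_t = [120.0, 95.0, 143.0, 110.0, 87.0, 132.0, 101.0, 150.0, 99.0, 125.0]
x_next = [180.0, 150.0, 205.0, 170.0, 128.0, 201.0, 149.0, 230.0, 152.0, 186.0]

beta_hat, sigma2_hat = chain_ladder_sigma2_hat(x_t, x_next)

# sigma^2 and mu4 are unknown in practice; the values below are assumed.
sigma2, mu4 = 1.0, 6.0
n = len(x_t)
nu_n = sigma2 ** 2 * (mu4 - 1) / (n - 1)
kappa_n = sigma2 ** 2 * (mu4 - 3) / (n - 1) ** 2   # p_x = 1 <= n / 2
var_cond = conditional_variance(x_t, sigma2, mu4)
```

The exact conditional variance `var_cond` depends on the observed ${\varvec{X}}_t$, whereas the interval $[\nu _n - \kappa _n, \nu _n]$ containing it does not.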

For more on other models than the distribution free Chain–Ladder model that are used in an insurance context, see e.g. Kremer (1984) and Lindholm et al. (2017).

Example 2

One example of a more restricted sub-class of the GLMs from (1) is when we assume that ${\varvec{\Sigma }}= {\varvec{I}}$ and explicitly include an intercept, i.e. ${\varvec{X}}$ contains a column of ones. In this situation it is shown in Seber and Lee (2003, Eq. 10.12) that

$$\begin{aligned} 1/n \le {\varvec{V}}_{ii} \le 1 \end{aligned}$$

which makes it possible to tighten e.g. the lower bound in Lemma 2 when $\mu _4 > 3$ and, hence, tighten the corresponding bound in Proposition 1 to $\nu _n - \kappa _n + \frac{\mu _4 - 3}{(n-p_x)n}$, and analogously for the upper bound when $\mu _4 \le 3$.
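The inequality $1/n \le {\varvec{V}}_{ii} \le 1$ can be illustrated for a simple linear regression with intercept, where the diagonal elements of ${\varvec{V}}$ have the well-known closed form $1/n + (x_i - {\bar{x}})^2 / \sum _j (x_j - {\bar{x}})^2$. The covariate values below are made up for illustration.

```python
def leverages(x):
    """Diagonal of V for a simple linear regression with intercept (Sigma = I)."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((xi - mean) ** 2 for xi in x)
    return [1 / n + (xi - mean) ** 2 / sxx for xi in x]


x = [0.3, 1.1, 1.9, 2.4, 3.8, 5.0, 9.5]   # illustrative covariate values
h = leverages(x)
within_bounds = all(1 / len(x) <= hi <= 1 for hi in h)
trace_V = sum(h)                           # equals p_x = 2 up to rounding
```

The diagonal elements sum to $p_x = 2$, in line with ${{\,\mathrm{tr}\,}}({\varvec{V}}) = p_x$ from the proof of Proposition 1.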

Example 3

In order to illustrate the usefulness of the techniques of the present paper we will now exploit the essentially model-free aspects of Lemma 1 and, in particular, Lemma 2. As already noted after stating Lemma 2, the results of Lemma 2 are purely algebraic, given that you somehow have arrived at a variance expression of the form (4). Purely as an illustration, consider the following special case of a mixed effects model introduced in Atiqullah (1962b):

$$\begin{aligned} {\varvec{y}} = {\varvec{A}}\varvec{\theta } + {\varvec{B}}\varvec{\tau } + \varvec{\epsilon }, \end{aligned}$$

(9)

where ${\varvec{y}}$ is $n\times 1$, and where $\varvec{\theta }$ is $p_a \times 1$, and $\varvec{\epsilon }$ is $n\times 1$, are independent random vectors whose components all have mean 0 and variances $\sigma _\theta ^2$ and $\sigma _\epsilon ^2$, respectively, together with the kurtoses $\mu _4^\theta ~(= \gamma _{2\theta } +3)$ and $\mu _4^\epsilon ~(= \gamma _{2\epsilon } + 3)$, and where $\varvec{\tau }$ is a $p_b\times 1$ vector of fixed effects. Further, ${\varvec{A}}$ and ${\varvec{B}}$ are assumed to be of full column rank $p_a$ and $p_b$, respectively. Note that compared with (1) the model given by (9) is defined in terms of ${\varvec{\Sigma }}= {\varvec{I}}$ in accordance with Atiqullah (1962b). Further, the following unbiased estimators of $\sigma _\theta ^2$ and $\sigma _\epsilon ^2$ are given in Atiqullah (1962b):

$$\begin{aligned} {{\hat{\sigma }}}_\theta ^2&= \frac{p_h(M_h - M_r)}{{{\,\mathrm{tr}\,}}({\varvec{U}})}, \end{aligned}$$

(10)

$$\begin{aligned} {{\hat{\sigma }}}_\epsilon ^2&= M_r, \end{aligned}$$

(11)

where

$$\begin{aligned} M_h = \frac{{\varvec{y}}'{\varvec{H}}{\varvec{y}}}{p_h}, \quad M_r = \frac{{\varvec{y}}'{\varvec{R}}{\varvec{y}}}{p_r}, \end{aligned}$$

and where

$$\begin{aligned} {\varvec{G}}&= {\varvec{B}}({\varvec{B}}'{\varvec{B}})^{-1}{\varvec{B}}',\\ {\varvec{L}}&= {\varvec{I}} - {\varvec{G}},\\ {\varvec{H}}&= {\varvec{L}}{\varvec{A}}({\varvec{A}}'{\varvec{L}}{\varvec{A}})^{-1}{\varvec{A}}'{\varvec{L}},\\ {\varvec{R}}&= {\varvec{I}} - {\varvec{G}} - {\varvec{H}},\\ {\varvec{U}}&= {\varvec{A}}'{\varvec{H}}{\varvec{A}} = {\varvec{A}}'{\varvec{L}}{\varvec{A}}, \end{aligned}$$

and where $p_g, p_h$ and $p_r$ denote the rank of ${\varvec{G}}, {\varvec{H}}$ and ${\varvec{R}}$. Moreover, in Atiqullah (1962b) it is stated that the introduced variance parameter estimators have the following variances:

$$\begin{aligned} {\text {Var}}({{\hat{\sigma }}}_\theta ^2)&= \frac{2\sigma _\epsilon ^4 ((p_rp_h + p_h^2) + p_r\xi (\xi {{\,\mathrm{tr}\,}}({\varvec{U}}^2) + 2{{\,\mathrm{tr}\,}}({\varvec{U}})))}{p_r{{\,\mathrm{tr}\,}}({\varvec{U}})^2}\nonumber \\&\quad \times \left( 1 + \frac{(\mu _4^\epsilon - 3)(p_r{\varvec{h}} - p_h{\varvec{r}})'(p_r{\varvec{h}} - p_h{\varvec{r}}) + (\mu _4^\theta - 3){\varvec{U}}'{\varvec{U}}p_r^2\xi ^2}{2p_r((p_rp_h + p_h^2) + p_r\xi (\xi {{\,\mathrm{tr}\,}}({\varvec{U}}^2) + 2{{\,\mathrm{tr}\,}}({\varvec{U}})))}\right) \end{aligned}$$

(12)

$$\begin{aligned} {\text {Var}}({{\hat{\sigma }}}_\epsilon ^2)&= \frac{2\sigma _\epsilon ^4}{p_r}\left( 1 + \frac{1}{2}(\mu _4^\epsilon - 3) \frac{{\varvec{r}}'{\varvec{r}}}{p_r}\right) , \end{aligned}$$

(13)

where $\xi = \sigma _\theta ^2/\sigma _\epsilon ^2$ and where ${\varvec{h}}$ and ${\varvec{r}}$ correspond to the vectors of the diagonal elements of ${\varvec{H}}$ and ${\varvec{R}}$. Regarding ${{\hat{\sigma }}}_\epsilon ^2$ we may start off by noting that (13) may be re-written according to

$$\begin{aligned} {\text {Var}}(p_r{{\hat{\sigma }}}_\epsilon ^2) = \sigma _\epsilon ^4\left( 2p_r + (\mu _4^\epsilon - 3) \sum _{i=1}^n {\varvec{R}}_{ii}^2\right) \end{aligned}$$

which is of the same form as (4) from Lemma 1, since ${\varvec{R}}$ is idempotent, and Lemma 2 applies. Moreover, by using the same arguments as those used in the proof of Proposition 1, it follows that $p_r = n - p_a - p_b$, where both $p_a$ and $p_b$ are constants, thus ascertaining $L^2$-consistency and that the corollaries of Proposition 1 hold as well.
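For completeness, the rewriting of (13) can be carried out in one line. Multiplying (13) by $p_r^2$ and using ${\varvec{r}}'{\varvec{r}} = \sum _{i=1}^n {\varvec{R}}_{ii}^2$ gives

$$\begin{aligned} {\text {Var}}(p_r{{\hat{\sigma }}}_\epsilon ^2) = p_r^2 \frac{2\sigma _\epsilon ^4}{p_r}\left( 1 + \frac{1}{2}(\mu _4^\epsilon - 3) \frac{{\varvec{r}}'{\varvec{r}}}{p_r}\right) = \sigma _\epsilon ^4\left( 2p_r + (\mu _4^\epsilon - 3) \sum _{i=1}^n {\varvec{R}}_{ii}^2\right) , \end{aligned}$$

and since ${\varvec{R}}$ is idempotent, ${{\,\mathrm{tr}\,}}({\varvec{R}}^2) = {{\,\mathrm{tr}\,}}({\varvec{R}}) = p_r$, so the first term equals $2{{\,\mathrm{tr}\,}}({\varvec{R}}^2)$ as in (4).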

Concerning the variance of ${{\hat{\sigma }}}_\theta ^2$, the expression of (12) cannot be approached using Lemma 2. It is, however, possible to make use of trace inequalities to show that when e.g. $\mu _4^\theta ,\mu _4^\epsilon > 3$

$$\begin{aligned} {\text {Var}}({{\hat{\sigma }}}_\theta ^2) \ge \frac{2\sigma _\theta ^4}{p_u} > 0 \end{aligned}$$

(14)

where $p_u := \mathrm {rank}({\varvec{U}})$ is a constant, see the appendix. That is, ${\text {Var}}({{\hat{\sigma }}}_\theta ^2)$ is bounded from below by a positive constant for all n, and ${\hat{\sigma }}_\theta ^2$ is, hence, not $L^2$-consistent. It is, however, possible to obtain sharper finite sample bounds on ${\text {Var}}({{\hat{\sigma }}}_\theta ^2)$ by using other trace inequalities—a matter not pursued further in the present note. For other examples and a deeper discussion concerning the performance of estimators of variance components, see e.g. Christensen (2019, Ex. 5.1.1 and Sec. 5.4).

Even though the above mixed effects example is only intended for illustration purposes, without any claim of practical relevance, it is still worth commenting on the relation to the results in Li (2012). In Li (2012) mixed effects nonparametric regression is considered w.r.t. the MSE of the total variance parameter estimator. That is, the situation with $\sigma _\epsilon = \sigma _\theta $ is considered. Consequently, the decomposition from Atiqullah (1962b) is not covered as a special case. Moreover, by setting $\sigma _\epsilon = \sigma _\theta $ above, the resulting estimator is not contained in the estimators covered in Li (2012, Thm. 1) either. For more on differences compared with Li (2012) (and Dette et al. 1998), see Remark 2 above.

References

Atiqullah M (1962a) The estimation of residual variance in quadratically balanced least-squares problems and the robustness of the F-test. Biometrika 49(1–2):83–91
Atiqullah M (1962b) On the effect of non-normality on the estimation of components of variance. J R Stat Soc Ser B (Methodol) 24:140–147
Buchwalder M, Bühlmann H, Merz M, Wüthrich MV (2006) The mean square error of prediction in the chain ladder reserving method (Mack and Murphy revisited). ASTIN Bull 36(2):521–542
Christensen R (2019) Advanced linear modeling: statistical learning and dependent data, 3rd edn. Springer, New York
Cramér H (1946) Mathematical methods of statistics. Princeton University Press, Princeton
Dette H, Munk A, Wagner T (1998) Estimating the variance in nonparametric regression—What is a reasonable choice? J R Stat Soc Ser B (Stat Methodol) 60(4):751–764
Eaton ML (1983) Multivariate statistics: a vector space approach. Wiley, New York
Kremer E (1984) A class of autoregressive models for predicting the final claims amount. Insur Math Econ 3(2):111–119
Li Z (2012) A comparison of error variance estimates in nonparametric mixed models. Commun Stat Theory Methods 41(4):778–790
Lindholm M, Lindskog F, Wahl F (2017) Valuation of non-life liabilities from claims triangles. Risks 5(3):39
Lindholm M, Lindskog F, Wahl F (2019) Estimation of conditional mean squared error of prediction for claims reserving. Ann Actuar Sci. https://doi.org/10.1017/S174849951900006X
Mack T (1993) Distribution-free calculation of the standard error of chain ladder reserve estimates. ASTIN Bull 23(2):213–225
Mathai AM, Provost SB, Hayakawa T (2012) Bilinear forms and zonal polynomials, vol 102. Springer, Berlin
Plackett RL (1960) Principles of regression analysis. Clarendon Press, Oxford
Rao CR (1970) Estimation of heteroscedastic variances in linear models. J Am Stat Assoc 65(329):161–172
Rao CR (1971) Minimum variance quadratic unbiased estimation of variance components. J Multivar Anal 1:445–456
Seber G, Lee A (2003) Linear regression analysis, 2nd edn. Wiley, Hoboken
Wolkowicz H, Styan GPH (1980) Bounds for eigenvalues using traces. Linear Algebra Appl 29:471–506

Acknowledgements

Open access funding provided by Stockholm University. The authors acknowledge beneficial discussion with Rolf Sundberg. The authors are also most grateful to Ronald Christensen for the comments on an earlier version of the paper which resulted in the split of $p_u \le n/2$ and $p_u > n/2$ in Lemma 2, and for pointing out that we had missed that ${\varvec{R}}$ from the current Example 3 is idempotent, which made it possible to more easily connect Example 3 to the results in the current paper and to the reference Christensen (2019). We also want to acknowledge the comments from an anonymous reviewer which pointed out a number of typos and provided comments which we believe have improved the paper.

Author information

Authors and Affiliations

Stockholms Universitet Matematiska Institutionen, Stockholm, Sweden
Mathias Lindholm & Felix Wahl

Corresponding author

Correspondence to Mathias Lindholm.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Appendix A. Proofs

Proof of Proposition 1

We start by reformulating the linear model of (1) as follows:

$$\begin{aligned} {\widetilde{{\varvec{y}}}}= {\widetilde{{\varvec{X}}}}\varvec{\beta }+ \sigma {\varvec{e}}, \end{aligned}$$

where ${\widetilde{{\varvec{y}}}}:= {\varvec{\Sigma }}^{-1/2} {\varvec{y}}$ and ${\widetilde{{\varvec{X}}}}:= {\varvec{\Sigma }}^{-1/2} {\varvec{X}}$. Now let ${\hat{{\widetilde{{\varvec{e}}}}}}:= {\widetilde{{\varvec{y}}}}- {\widetilde{{\varvec{X}}}}{\hat{{\varvec{\beta }}}}$. It then follows that

$$\begin{aligned} {\hat{\sigma }}^2 = \frac{1}{n-p_x} {\hat{{\widetilde{{\varvec{e}}}}}}' {\hat{{\widetilde{{\varvec{e}}}}}}, \end{aligned}$$

which may be expressed in terms of ${\varvec{e}}$ by noting that

$$\begin{aligned} {\hat{{\widetilde{{\varvec{e}}}}}}&= \left( {\widetilde{{\varvec{X}}}}\varvec{\beta }+ \sigma {\varvec{e}}\right) - {\varvec{V}}{\widetilde{{\varvec{y}}}}\\&= {\widetilde{{\varvec{X}}}}\varvec{\beta }+ \sigma {\varvec{e}}- {\varvec{V}}\left( {\widetilde{{\varvec{X}}}}\varvec{\beta }+ \sigma {\varvec{e}}\right) \\&= \sigma \left( {\varvec{I}}- {\varvec{V}}\right) {\varvec{e}}= \sigma {\varvec{K}}{\varvec{e}}. \end{aligned}$$

Hence, we have

$$\begin{aligned} {\hat{\sigma }}^2&= \frac{\sigma ^2}{n - p_x} {\varvec{e}}' {\varvec{K}}' {\varvec{K}}{\varvec{e}}. \end{aligned}$$

Further, since ${\varvec{V}}$ is idempotent and symmetric it follows that ${\varvec{K}}$ will inherit these properties as well and we arrive at

$$\begin{aligned} {\hat{\sigma }}^2 = \frac{\sigma ^2}{n - p_x} {\varvec{e}}' {\varvec{K}}{\varvec{e}}. \end{aligned}$$

Finally, since ${\varvec{V}}$ is idempotent we know that ${\text {rank}}({\varvec{V}}) = p_v = {{\,\mathrm{tr}\,}}({\varvec{V}})$, where

$$\begin{aligned} {{\,\mathrm{tr}\,}}({\varvec{V}})&= {{\,\mathrm{tr}\,}}({\varvec{\Sigma }}^{-1/2} {\varvec{P}}{\varvec{\Sigma }}^{1/2})\\&= {{\,\mathrm{tr}\,}}({\varvec{P}}{\varvec{\Sigma }}^{1/2}{\varvec{\Sigma }}^{-1/2} )\\&= {{\,\mathrm{tr}\,}}({\varvec{P}}), \end{aligned}$$

and by repeating this argument it follows that ${{\,\mathrm{tr}\,}}({\varvec{P}})= {{\,\mathrm{tr}\,}}({\varvec{I}}_{p_x}) = p_x$, that is,

$$\begin{aligned} {\hat{\sigma }}^2 = \frac{\sigma ^2}{p_k} {\varvec{e}}' {\varvec{K}}{\varvec{e}}. \end{aligned}$$

This ascertains that the conditions of Lemmas 1 and 2 are fulfilled, which proves the conditional part of Proposition 1.

The unconditional variance bounds are obtained by first noting that ${\hat{\sigma }}^2$ is unbiased and then applying a variance decomposition. This argument of course holds also for biased estimators as long as ${\mathbb {E}}[{\hat{\sigma }}^2 | {\varvec{X}}, {\varvec{\Sigma }}]$ is nonrandom, e.g. when the normalization constant is n and not $n-p_x$. $\square $
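Written out, the variance decomposition step reads as follows. Since ${\mathbb {E}}[{\hat{\sigma }}^2 | {\varvec{X}}, {\varvec{\Sigma }}] = \sigma ^2$ is nonrandom, the second term of the decomposition vanishes:

$$\begin{aligned} {\text {Var}}({\hat{\sigma }}^2)&= {\mathbb {E}}\left[ {\text {Var}}({\hat{\sigma }}^2 | {\varvec{X}}, {\varvec{\Sigma }})\right] + {\text {Var}}\left( {\mathbb {E}}[{\hat{\sigma }}^2 | {\varvec{X}}, {\varvec{\Sigma }}]\right) \\&= {\mathbb {E}}\left[ {\text {Var}}({\hat{\sigma }}^2 | {\varvec{X}}, {\varvec{\Sigma }})\right] . \end{aligned}$$

Because the conditional variance lies almost surely between two deterministic endpoints, its expectation lies between the same endpoints, which gives the unconditional bounds.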

Proof of Corollary 1

The first part follows trivially from Proposition 1, but we also get that

$$\begin{aligned} \left\{ \begin{array}{ll} n {\text {Var}}({\hat{\sigma }}^2 |{\varvec{X}}, {\varvec{\Sigma }}) \in [n\nu _n - n\kappa _n, n\nu _n] &{}\quad \mathrm {if}~ \mu _4 > 3, \\ n {\text {Var}}({\hat{\sigma }}^2 |{\varvec{X}}, {\varvec{\Sigma }}) \in [n\nu _n, n\nu _n - n\kappa _n] &{}\quad \mathrm {if}~ \mu _4 \le 3, \end{array}\right. \end{aligned}$$

and, since $\lim _n n \nu _n = \nu $ and $\lim _n n \kappa _n = 0$, it follows that

$$\begin{aligned} \lim _{n \rightarrow \infty } n {\text {Var}}({\hat{\sigma }}^2 | {\varvec{X}}, {\varvec{\Sigma }}) = \nu , \end{aligned}$$

due to the assumptions on ${\varvec{X}}$ and ${\varvec{\Sigma }}$. That is, $n {\text {Var}}({\hat{\sigma }}^2 | {\varvec{X}}, {\varvec{\Sigma }}) \rightarrow \nu $ uniformly. By the same argument it follows that $n {\text {Var}}({\hat{\sigma }}^2) \rightarrow \nu $ uniformly in n. $\square $
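The limit can be made explicit with a short squeeze argument, sketched here as one possible reading of the step. In both cases of the display above, $n {\text {Var}}({\hat{\sigma }}^2 | {\varvec{X}}, {\varvec{\Sigma }})$ lies between $n\nu _n$ and $n\nu _n - n\kappa _n$, so the triangle inequality gives

$$\begin{aligned} \left| n {\text {Var}}({\hat{\sigma }}^2 | {\varvec{X}}, {\varvec{\Sigma }}) - \nu \right| \le \left| n\nu _n - \nu \right| + \left| n\kappa _n \right| \rightarrow 0. \end{aligned}$$

Provided $\nu _n$ and $\kappa _n$ do not depend on ${\varvec{X}}$ and ${\varvec{\Sigma }}$, the right-hand side is free of covariate information, which is what makes the convergence uniform.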

Proof of (14)

We will now show that ${{\hat{\sigma }}}_\theta ^2$ is not in general $L^2$ consistent. If we restrict our attention to $\mu _4^\epsilon ,\mu _4^\theta \ge 3$ it follows that (12) may be bounded according to

$$\begin{aligned} {\text {Var}}({{\hat{\sigma }}}_\theta ^2) \ge \frac{2\sigma _\epsilon ^4 ((p_rp_h + p_h^2) + p_r\xi (\xi {{\,\mathrm{tr}\,}}({\varvec{U}}^2) + 2{{\,\mathrm{tr}\,}}({\varvec{U}})))}{p_r{{\,\mathrm{tr}\,}}({\varvec{U}})^2}, \end{aligned}$$

since the second factor consists of quadratic forms and non-negative functions/constants. Further, note that by construction it is assumed that $({\varvec{A}}'{\varvec{L}}{\varvec{A}})^{-1}$ exists, i.e. ${\varvec{A}}'{\varvec{L}}{\varvec{A}}$ is positive definite and of full column rank $p_a = \mathrm {rank}({\varvec{A}})$, from which it follows that $0 < p_h = p_u = p_a = O(1)$. Moreover, an application of the inequality

$$\begin{aligned} \frac{{{\,\mathrm{tr}\,}}({\varvec{U}})^2}{{{\,\mathrm{tr}\,}}({\varvec{U}}^2)} \le p_u, \end{aligned}$$

see e.g. Wolkowicz and Styan (1980, Eq. 2.35), yields

$$\begin{aligned} {\text {Var}}({{\hat{\sigma }}}_\theta ^2) \ge 2\sigma _\epsilon ^4\left( \frac{ p_rp_h + p_h^2}{p_r{{\,\mathrm{tr}\,}}({\varvec{U}})^2} + \frac{\xi ^2}{p_u} + \frac{2\xi }{{{\,\mathrm{tr}\,}}({\varvec{U}})}\right) . \end{aligned}$$
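The intermediate step, omitted above, is a term-by-term division of the first lower bound by $p_r{{\,\mathrm{tr}\,}}({\varvec{U}})^2$:

$$\begin{aligned} \frac{(p_rp_h + p_h^2) + p_r\xi (\xi {{\,\mathrm{tr}\,}}({\varvec{U}}^2) + 2{{\,\mathrm{tr}\,}}({\varvec{U}}))}{p_r{{\,\mathrm{tr}\,}}({\varvec{U}})^2} = \frac{p_rp_h + p_h^2}{p_r{{\,\mathrm{tr}\,}}({\varvec{U}})^2} + \xi ^2\frac{{{\,\mathrm{tr}\,}}({\varvec{U}}^2)}{{{\,\mathrm{tr}\,}}({\varvec{U}})^2} + \frac{2\xi }{{{\,\mathrm{tr}\,}}({\varvec{U}})}, \end{aligned}$$

after which the trace inequality is used in the reciprocal form ${{\,\mathrm{tr}\,}}({\varvec{U}}^2)/{{\,\mathrm{tr}\,}}({\varvec{U}})^2 \ge 1/p_u$ on the middle term.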

Furthermore, since $p_r \ge 0$ it follows that

$$\begin{aligned} {\text {Var}}({{\hat{\sigma }}}_\theta ^2) \ge \frac{2\sigma _\epsilon ^4 \xi ^2}{p_u} = \frac{2\sigma _\theta ^4}{p_u}> 0, \end{aligned}$$

i.e. ${\text {Var}}({{\hat{\sigma }}}_\theta ^2)$ is bounded from below by a constant value greater than 0. $\square $
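The trace inequality used in this proof is easy to check numerically. The sketch below is illustrative only: the matrix `U` is a randomly generated symmetric positive semi-definite stand-in of rank $p_u$, not the matrix ${\varvec{U}}$ of the mixed effects example, and the helper name `trace_ratio` is hypothetical.

```python
import numpy as np


def trace_ratio(U):
    """Return tr(U)^2 / tr(U^2) for a square matrix U."""
    return np.trace(U) ** 2 / np.trace(U @ U)


rng = np.random.default_rng(1)
n, p_u = 10, 3  # illustrative dimension and rank

# Stand-in for U: symmetric positive semi-definite with rank p_u.
G = rng.normal(size=(n, p_u))
U = G @ G.T

rank_U = np.linalg.matrix_rank(U)
ratio = trace_ratio(U)

# Equality case: an orthogonal projection of rank p_u has all nonzero
# eigenvalues equal to one, so tr(U)^2 / tr(U^2) = p_u^2 / p_u = p_u.
Q, _ = np.linalg.qr(G)
U_proj = Q @ Q.T
ratio_proj = trace_ratio(U_proj)

print("rank(U):", rank_U)
print("tr(U)^2 / tr(U^2):", ratio)
print("same ratio for a projection:", ratio_proj)
```

For a symmetric matrix the inequality is the Cauchy-Schwarz inequality applied to the nonzero eigenvalues, which explains why equality holds exactly when those eigenvalues coincide.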

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Lindholm, M., Wahl, F. On the variance parameter estimator in general linear models. Metrika 83, 243–254 (2020). https://doi.org/10.1007/s00184-019-00751-4

Received: 05 June 2018
Published: 06 November 2019
Issue Date: February 2020
DOI: https://doi.org/10.1007/s00184-019-00751-4
