Efficient estimation of variance components in nonparametric mixed-effects models with large samples

Helwig, Nathaniel E.

doi:10.1007/s11222-015-9610-5

Published: 12 November 2015

Volume 26, pages 1319–1336 (2016)
Statistics and Computing

Abstract

Linear mixed-effects (LME) regression models are a popular approach for analyzing correlated data. Nonparametric extensions of the LME regression model have been proposed, but the heavy computational cost makes these extensions impractical for analyzing large samples. In particular, simultaneous estimation of the variance components and smoothing parameters poses a computational challenge when working with large samples. To overcome this computational burden, we propose a two-stage estimation procedure for fitting nonparametric mixed-effects regression models. Our results reveal that, compared to currently popular approaches, our two-stage approach produces more accurate estimates that can be computed in a fraction of the time.

Notes

For multilevel data with one grouping factor, the bigsplines package exploits the sparsity of ${\mathbf {Z}}$ and only stores the relevant crossproduct matrices for estimation.
Letting s.t denote subject.trial, we excluded the following: 11.2, 12.8, 12.10, 18.0, 18.2, 27.4, 27.6, 27.8, 39.6, 43.0, 46.0, 53.8, 55.2, 55.4, 56.0, 57.2, 82.4, 83.0, 84.10, 84.12, 92.18, 99.0.
By convention, ERPs are plotted upside down, i.e., positive voltages downwards and negative voltages upwards.
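To illustrate the first note, the NumPy sketch below shows why a single grouping factor makes the crossproducts cheap. It is not the bigsplines code (bigsplines is an R package, and its internals may differ), and the helper name `grouped_crossproducts` is hypothetical. Because the indicator matrix ${\mathbf {Z}}$ has exactly one 1 per row, ${\mathbf {Z}}^{\prime }{\mathbf {Z}}$ is diagonal with the group sizes, and ${\mathbf {Z}}^{\prime }{\mathbf {y}}$ and ${\mathbf {X}}^{\prime }{\mathbf {Z}}$ are group sums that can be accumulated without storing the $n \times q$ matrix ${\mathbf {Z}}$.

```python
import numpy as np


def grouped_crossproducts(groups, X, y, n_groups):
    """Crossproducts with the indicator matrix Z of one grouping factor.

    Z (with Z[i, groups[i]] = 1) is never formed: Z'Z is diagonal (group sizes),
    Z'y holds group sums of y, and X'Z holds group sums of each column of X.
    """
    counts = np.bincount(groups, minlength=n_groups)              # diagonal of Z'Z
    Zty = np.bincount(groups, weights=y, minlength=n_groups)      # Z'y
    XtZ = np.stack([np.bincount(groups, weights=X[:, j], minlength=n_groups)
                    for j in range(X.shape[1])])                   # X'Z, shape (p, q)
    return counts, Zty, XtZ


# Usage: compare with the dense computation on a small random example.
rng = np.random.default_rng(0)
n, p, q = 50, 3, 5
groups = rng.integers(0, q, size=n)
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
counts, Zty, XtZ = grouped_crossproducts(groups, X, y, q)

Z = np.zeros((n, q))
Z[np.arange(n), groups] = 1.0                                      # dense Z, for checking only
```

Only $O(q + pq)$ numbers are kept instead of the $n \times q$ indicator matrix, which is what makes this layout attractive when $n$ is large.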

References

Bache, K., Lichman, M.: UCI machine learning repository. http://archive.ics.uci.edu/ml (2013)
Bates, D., Maechler, M., Bolker, B., Walker, S.: lme4: Linear mixed-effects models using Eigen and S4. http://CRAN.R-project.org/package=lme4, R package version 1.1-8 (2015)
Craven, P., Wahba, G.: Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31, 377–403 (1979)
de Leeuw, J., Meijer, E.: Introduction to multilevel analysis. In: de Leeuw, J., Meijer, E. (eds.) Handbook of Multilevel Analysis, pp. 1–75. Springer, New York (2008)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1–38 (1977)
Fu, W.J.: Penalized estimating equations. Biometrics 59, 126–132 (2003)
Gilmour, A.R., Thompson, R., Cullis, B.R.: Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51, 1440–1450 (1995)
Goldstein, H.: Multilevel Statistical Models, 4th edn. Wiley, West Sussex (2011)
Gu, C.: Smoothing Spline ANOVA Models, 2nd edn. Springer, New York (2013)
Gu, C.: gss: General smoothing splines. http://CRAN.R-project.org/package=gss, R package version 2.1-5 (2014)
Gu, C., Ma, P.: Optimal smoothing in nonparametric mixed-effect models. Ann. Stat. 33, 1357–1379 (2005)
Hartley, H.O., Rao, J.N.K.: Maximum likelihood estimation for the mixed analysis of variance model. Biometrika 54, 93–108 (1967)
Harville, D.A.: Maximum-likelihood approaches to variance component estimation and to related problems. J. Am. Stat. Assoc. 72, 320–340 (1977)
Hastie, T., Tibshirani, R.: Generalized Additive Models. Chapman and Hall/CRC, New York (1990)
Helwig, N.E.: bigsplines: smoothing splines for large samples. http://CRAN.R-project.org/package=bigsplines, R package version 1.0-7 (2015)
Helwig, N.E., Ma, P.: Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. J. Comput. Graph. Stat. 24, 715–732 (2015)
Helwig, N.E., Ma, P.: Smoothing spline ANOVA for super-large samples: scalable computation via rounding parameters. Statistics and Its Interface (in press)
Henderson, C.R.: Estimation of genetic parameters (abstract). Ann. Math. Stat. 21, 309–310 (1950)
Henderson, C.R.: Estimation of variance and covariance components. Biometrics 9, 226–252 (1953)
Henderson, C.R., Kempthorne, O., Searle, S.R., von Krosigk, C.M.: The estimation of environmental and genetic trends from records subject to culling. Biometrics 15, 192–218 (1959)
Henderson, H.V., Searle, S.R.: On deriving the inverse of a sum of matrices. SIAM Rev. 23, 53–60 (1981)
Kenward, M., Molenberghs, G.: Likelihood based frequentist inference when data are missing at random. Stat. Sci. 13, 236–247 (1998)
Kim, Y.J., Gu, C.: Smoothing spline Gaussian regression: more scalable computation via efficient approximation. J. R. Stat. Soc. Ser. B 66, 337–356 (2004)
Laird, N.M.: Computation of variance components using the EM algorithm. J. Stat. Comput. Simul. 14, 295–303 (1982)
Li, K.C.: Asymptotic optimality for ${C}_{p}$, ${C}_{L}$, cross-validation and generalized cross-validation: discrete index set. Ann. Stat. 15, 958–975 (1987)
Ma, P., Huang, J., Zhang, N.: Efficient computation of smoothing splines via adaptive basis sampling. Biometrika 102, 631–645 (2015)
Moore, E.H.: On the reciprocal of the general algebraic matrix. Bull. Am. Math. Soc. 26, 394–395 (1920)
Paterson, L.: Socio-economic status and educational attainment: a multidimensional and multilevel study. Eval. Res. Educ. 5, 97–121 (1991)
Patterson, H.D., Thompson, R.: Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554 (1971)
Penrose, R.: A generalized inverse for matrices. Math. Proc. Camb. Philos. Soc. 51, 406–413 (1955)
Porjesz, B., Begleiter, H.: Event-related potentials for individuals at risk for alcoholism. Alcohol 7, 465–469 (1990a)
Porjesz, B., Begleiter, H.: Neuroelectric processes in individuals at risk for alcoholism. Alcohol Alcohol. 25, 251–256 (1990b)
Porjesz, B., Begleiter, H., Garozzo, R.: Visual evoked potential correlates of information deficits in chronic alcoholics. In: Begleiter, H. (ed.) Biological Effects of Alcohol, pp. 603–623. Plenum Press, New York (1980)
Porjesz, B., Begleiter, H., Bihari, B., Kissin, B.: The N2 component of the event-related brain potential in abstinent alcoholics. Electroencephalogr. Clin. Neurophysiol. 66, 121–131 (1987)
Ruppert, D., Wand, M.P., Carroll, R.J.: Semiparametric Regression. Cambridge University Press, Cambridge (2003)
Searle, S.R.: Applying the EM algorithm to calculating ML and REML estimates of variance components. Technical report BU-1213-M, Biometrics Unit, Cornell University (1993)
Verbeke, G., Molenberghs, G.: Linear Mixed Models for Longitudinal Data. Springer, New York (2000)
Wahba, G.: A comparison of GCV and GML for choosing the smoothing parameters in the generalized spline smoothing problem. Ann. Stat. 13, 1378–1402 (1985)
Wahba, G.: Spline Models for Observational Data. Society for Industrial and Applied Mathematics, Philadelphia (1990)
Wang, Y.: Mixed effects smoothing spline analysis of variance. J. R. Stat. Soc. Ser. B 60, 159–174 (1998a)
Wang, Y.: Smoothing spline models with correlated random errors. J. Am. Stat. Assoc. 93, 341–348 (1998b)
Wang, Y., Ke, C.: assist: A Suite of S functions for Implementing Spline smoothing Techniques. http://CRAN.R-project.org/package=assist, R package version 3.1.2 (2013)
Wood, S.N.: Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Am. Stat. Assoc. 99, 673–686 (2004)
Wood, S.N.: Generalized Additive Models: An Introduction with R. Chapman & Hall, Boca Raton (2006)
Wood, S.N.: mgcv: Mixed GAM computation vehicle with GCV/AIC/REML smoothness estimation and GAMMs by REML/PQL. http://CRAN.R-project.org/package=mgcv, R package version 1.8-7 (2015)
Zhang, D., Lin, X., Raz, J., Sowers, M.: Semiparametric stochastic mixed models for longitudinal data. J. Am. Stat. Assoc. 93, 710–719 (1998)


Author information

Authors and Affiliations

Department of Psychology & School of Statistics, University of Minnesota-Twin Cities, Minneapolis, USA
Nathaniel E. Helwig

Corresponding author

Correspondence to Nathaniel E. Helwig.

Electronic supplementary material

Supplementary material 1 (zip 495 KB)

Appendix

1.1 Smoothing parameter initialization for REML

We initialized $\theta _{k}=1 \ \forall k$ when defining the fixed-effects design matrix ${\mathbf {X}}_{{\varvec{\theta }}} =({\mathbf {K}},{\mathbf {J}}_{{\varvec{\theta }}}) = \tilde{{\mathbf {F}}}_{1}\tilde{{\mathbf {R}}}$ used for the REML error contrasts. To understand the motivation for this, note that we can write

$$\begin{aligned} {\mathbf {X}}_{{\varvec{\theta }}} = \left( \begin{array}{c@{\quad }c} \tilde{{\mathbf {F}}}_{K}&\tilde{{\mathbf {F}}}_{J} \end{array} \right) \left( \begin{array}{c@{\quad }c} \tilde{{\mathbf {R}}}_{K} &{} \tilde{{\mathbf {R}}}_{KJ} \\ {\mathbf {0}} &{} \tilde{{\mathbf {R}}}_{J} \end{array} \right) \end{aligned}$$

where $\tilde{{\mathbf {F}}}_{1} = (\tilde{{\mathbf {F}}}_{K} \ \tilde{{\mathbf {F}}}_{J})$, and $\tilde{{\mathbf {F}}}_{K}\tilde{{\mathbf {R}}}_{K}$ is the QR decomposition of ${\mathbf {K}}$. This implies that $\tilde{{\mathbf {R}}}_{K} = \tilde{{\mathbf {F}}}_{K}^{\prime }{\mathbf {K}}$ and $\tilde{{\mathbf {R}}}_{J} = \tilde{{\mathbf {F}}}_{J}^{\prime }{\mathbf {J}}_{{\varvec{\theta }}}$ are upper-triangular, that $\tilde{{\mathbf {R}}}_{KJ} = \tilde{{\mathbf {F}}}_{K}^{\prime }{\mathbf {J}}_{{\varvec{\theta }}}$, and that $\tilde{{\mathbf {F}}}_{J}\tilde{{\mathbf {R}}}_{J}$ is the QR decomposition of $({\mathbf {I}}_n - \tilde{{\mathbf {F}}}_{K}\tilde{{\mathbf {F}}}_{K}^{\prime }){\mathbf {J}}_{{\varvec{\theta }}}$. Combining these results, we have $\tilde{{\mathbf {F}}}_{J}\tilde{{\mathbf {F}}}_{J}^{\prime }{\mathbf {J}}_{{\varvec{\theta }}} = ({\mathbf {I}}_n - \tilde{{\mathbf {F}}}_{K}\tilde{{\mathbf {F}}}_{K}^{\prime }){\mathbf {J}}_{{\varvec{\theta }}}$, where $\tilde{{\mathbf {F}}}_{K}$ does not depend on ${\varvec{\theta }}$. Write ${\mathbf {J}}_{{\varvec{\theta }}} = \sum _{k=1}^{s} \theta _k {\mathbf {J}}_{k}$, and let $\tilde{{\mathbf {F}}}_{J_k}\tilde{{\mathbf {R}}}_{J_k} = {\mathbf {J}}_{k}$ denote the QR decomposition of ${\mathbf {J}}_k$. A sufficient condition for $\tilde{{\mathbf {R}}}_{J} = \tilde{{\mathbf {F}}}_{J}^{\prime }{\mathbf {J}}_{{\varvec{\theta }}}$ to be upper-triangular for any ${\varvec{\theta }}$ is that $\tilde{{\mathbf {F}}}_{J}^{\prime }\tilde{{\mathbf {F}}}_{J_k}$ be upper-triangular for all $k$. For $s>1$ this sufficient condition imposes more constraints than are necessary, so we recommend setting $\theta _k = 1 \ \forall k$ and requiring $\tilde{{\mathbf {F}}}_{J}^{\prime }\tilde{{\mathbf {F}}}_{J_\bullet }$ to be upper-triangular, where $\tilde{{\mathbf {F}}}_{J_\bullet }\tilde{{\mathbf {R}}}_{J_\bullet }$ is the QR decomposition of ${\mathbf {J}}_{\bullet } = \sum _{k=1}^{s} {\mathbf {J}}_{k}$.
Given that the ${\mathbf {J}}_{k}$ matrices span different (orthogonal) contrast spaces, this aggregate requirement should produce a reasonable approximation to the sufficient condition.
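The block-wise QR decomposition above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: small random matrices stand in for ${\mathbf {K}}$ and the ${\mathbf {J}}_k$, and the function name `two_stage_qr` is hypothetical. With $\theta _k = 1$ for all $k$, ${\mathbf {J}}_{{\varvec{\theta }}}$ equals ${\mathbf {J}}_{\bullet }$, and the checks confirm $\tilde{{\mathbf {F}}}_{1}\tilde{{\mathbf {R}}} = {\mathbf {X}}_{{\varvec{\theta }}}$, $\tilde{{\mathbf {R}}}_{J} = \tilde{{\mathbf {F}}}_{J}^{\prime }{\mathbf {J}}_{{\varvec{\theta }}}$, and $\tilde{{\mathbf {F}}}_{J}\tilde{{\mathbf {F}}}_{J}^{\prime }{\mathbf {J}}_{{\varvec{\theta }}} = ({\mathbf {I}}_n - \tilde{{\mathbf {F}}}_{K}\tilde{{\mathbf {F}}}_{K}^{\prime }){\mathbf {J}}_{{\varvec{\theta }}}$.

```python
import numpy as np


def two_stage_qr(K, J):
    """QR of X = (K, J) computed block-wise, mirroring the partition of X_theta."""
    FK, RK = np.linalg.qr(K)                  # K = F_K R_K
    RKJ = FK.T @ J                            # R_KJ = F_K' J
    FJ, RJ = np.linalg.qr(J - FK @ RKJ)       # QR of (I - F_K F_K') J
    F1 = np.hstack([FK, FJ])
    R = np.block([[RK, RKJ],
                  [np.zeros((RJ.shape[0], RK.shape[1])), RJ]])
    return F1, R, FK, FJ, RJ


# Hypothetical example: K spans an intercept and a linear trend; J_1, J_2 are random.
rng = np.random.default_rng(1)
n = 40
K = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
J_blocks = [rng.normal(size=(n, 3)) for _ in range(2)]
theta = np.ones(len(J_blocks))                             # theta_k = 1 for all k
J_theta = sum(t * Jk for t, Jk in zip(theta, J_blocks))    # equals J_bullet here
F1, R, FK, FJ, RJ = two_stage_qr(K, J_theta)
```

Because $\tilde{{\mathbf {F}}}_{K}$ is computed from ${\mathbf {K}}$ alone, only the second QR step has to be redone when ${\varvec{\theta }}$ changes.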

1.2 REML expected information matrix

We first present the results of Gilmour et al. (1995) for the gradient of the REML log-likelihood in the VC model. Note that ${\mathbf {H}}_{k}={\mathbf {Z}}_{k}{\mathbf {Z}}_{k}^{\prime }$ and that we can write ${\mathbf {Z}}_{k}={\mathbf {W}}{\mathbf {S}}_{k}$, where ${\mathbf {W}}=({\mathbf {X}}_{{\varvec{\theta }}},{\mathbf {Z}})$ and ${\mathbf {S}}_{k}$ is the 0/1 matrix that selects the columns of ${\mathbf {W}}$ belonging to ${\mathbf {Z}}_{k}$. We can also write ${\mathbf {C}}={\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}{\mathbf {W}} +{\varvec{\varSigma }}_{0}$, where ${\varvec{\varSigma }}_{0}$ is ${\varvec{\varSigma }}^{-1}$ augmented with zero rows and columns corresponding to ${\mathbf {X}}_{{\varvec{\theta }}}$ in ${\mathbf {W}}$. The trace $\alpha = {\mathrm {tr}}({\mathbf {H}}_{k}{\mathbf {P}})$ needed for the gradient has the form

$$\begin{aligned} \alpha&= {\mathrm {tr}}\left[ {\mathbf {Z}}_{k}{\mathbf {Z}}_{k}^{\prime }\left( {\varvec{\varPsi }}^{-1} -{\varvec{\varPsi }}^{-1}{\mathbf {W}}{\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}\right) \right] \nonumber \\&={\mathrm {tr}}\left[ {\mathbf {S}}_{k}^{\prime }\left( {\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}{\mathbf {W}} -{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}{\mathbf {W}}{\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}{\mathbf {W}}\right) {\mathbf {S}}_{k}\right] \nonumber \\&= {\mathrm {tr}}\left[ {\mathbf {S}}_{k}^{\prime } \{{\mathbf {C}} - {\varvec{\varSigma }}_{0}\} {\mathbf {C}}^{-1} {\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}\right] \nonumber \\&= {\mathrm {tr}}\left[ {\mathbf {S}}_{k}^{\prime } {\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}\right] - {\mathrm {tr}}\left[ {\mathbf {S}}_{k}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {C}}^{-1} {\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}\right] \nonumber \\&= q_{k}/\tau _{k} - {\mathrm {tr}}\left( {\mathbf {C}}_{kk}^{-1}\right) \big /\tau _{k}^{2} \end{aligned} \tag{24}$$

The last piece for the gradient calculation derives from the fact that ${\mathbf {Z}}^{\prime }{\mathbf {P}}{\mathbf {y}}={\varvec{\varSigma }}^{-1}\hat{{\mathbf {b}}}$.
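The identities behind Eq. (24) and ${\mathbf {Z}}^{\prime }{\mathbf {P}}{\mathbf {y}}={\varvec{\varSigma }}^{-1}\hat{{\mathbf {b}}}$ can be checked numerically on a small problem. The sketch below assumes a known diagonal ${\varvec{\varPsi }}$, a ${\varvec{\varSigma }}$ that is block-diagonal with blocks $\tau _k {\mathbf {I}}_{q_k}$ (consistent with the $q_k/\tau _k$ term), two hypothetical grouping factors, and random data. It forms $n \times n$ matrices only so the cheaper ${\mathbf {C}}$-based expressions have something to be compared against.

```python
import numpy as np


def indicator(groups, q):
    """Dense n x q indicator matrix for one grouping factor."""
    Z = np.zeros((groups.size, q))
    Z[np.arange(groups.size), groups] = 1.0
    return Z


rng = np.random.default_rng(2)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])         # plays the role of X_theta
Zs = [indicator(rng.integers(0, 4, n), 4), indicator(rng.integers(0, 3, n), 3)]
q = [Z.shape[1] for Z in Zs]
tau = np.array([1.5, 0.7])                                     # Sigma = blockdiag(tau_k I)
psi = rng.uniform(0.5, 2.0, size=n)                            # known diagonal Psi
y = rng.normal(size=n)

# Direct n x n REML projection P = V^{-1} - V^{-1} X (X' V^{-1} X)^{-1} X' V^{-1}
V = np.diag(psi) + sum(t * Z @ Z.T for t, Z in zip(tau, Zs))
Vi = np.linalg.inv(V)
P = Vi - Vi @ X @ np.linalg.solve(X.T @ Vi @ X, X.T @ Vi)

# Mixed-model-equation form: W = (X, Z), C = W' Psi^{-1} W + Sigma_0
W = np.hstack([X] + Zs)
PsiW = W / psi[:, None]                                        # Psi^{-1} W
sigma0 = np.concatenate([np.zeros(X.shape[1])] + [np.full(qk, 1.0 / t) for qk, t in zip(q, tau)])
C = W.T @ PsiW + np.diag(sigma0)
Ci = np.linalg.inv(C)
P_mme = np.diag(1.0 / psi) - PsiW @ Ci @ PsiW.T

# Column offsets of each Z_k inside W (the columns selected by S_k)
offsets = np.cumsum([X.shape[1]] + q[:-1])
alpha_direct, alpha_mme = [], []
for k, (Z, t) in enumerate(zip(Zs, tau)):
    blk = slice(offsets[k], offsets[k] + q[k])
    alpha_direct.append(np.trace(Z @ Z.T @ P))                  # tr(H_k P)
    alpha_mme.append(q[k] / t - np.trace(Ci[blk, blk]) / t**2)  # Eq. (24)

# Remaining gradient piece: Z'Py = Sigma^{-1} b_hat, with (beta_hat, b_hat) solving C u = W' Psi^{-1} y
b_hat = np.linalg.solve(C, W.T @ (y / psi))[X.shape[1]:]
lhs = np.hstack(Zs).T @ P @ y
rhs = sigma0[X.shape[1]:] * b_hat
```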

Similarly, the trace $\beta = {\mathrm {tr}}({\mathbf {H}}_{j}{\mathbf {P}}{\mathbf {H}}_{k}{\mathbf {P}})$ needed for the expected information matrix has the form

$$\begin{aligned} \beta&= {\mathrm {tr}}\left[ {\mathbf {Z}}_{j}{\mathbf {Z}}_{j}^{\prime }{\mathbf {P}}{\mathbf {Z}}_{k}{\mathbf {Z}}_{k}^{\prime }{\mathbf {P}}\right] \nonumber \\&={\mathrm {tr}}\left[ {\mathbf {S}}_{j}^{\prime }{\mathbf {W}}^{\prime }{\mathbf {P}}{\mathbf {W}}{\mathbf {S}}_{k}{\mathbf {S}}_{k}^{\prime }{\mathbf {W}}^{\prime }{\mathbf {P}}{\mathbf {W}}{\mathbf {S}}_{j}\right] \nonumber \\&= {\mathrm {tr}}\left[ {\mathbf {S}}_{j}^{\prime }\{{\mathbf {C}} - {\varvec{\varSigma }}_{0}\} {\mathbf {C}}^{-1} {\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}{\mathbf {S}}_{k}^{\prime } \{{\mathbf {C}} - {\varvec{\varSigma }}_{0}\} {\mathbf {C}}^{-1} {\varvec{\varSigma }}_{0}{\mathbf {S}}_{j}\right] \nonumber \\&={\mathrm {tr}}\left( {\mathbf {S}}_{j}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}{\mathbf {S}}_{k}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {S}}_{j}\right) -2{\mathrm {tr}}\left( {\mathbf {S}}_{j}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}{\mathbf {S}}_{k}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {C}}^{-1}{\varvec{\varSigma }}_{0}{\mathbf {S}}_{j} \right) \nonumber \\&\qquad + {\mathrm {tr}}\left( {\mathbf {S}}_{j}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {C}}^{-1}{\varvec{\varSigma }}_{0}{\mathbf {S}}_{k} {\mathbf {S}}_{k}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {C}}^{-1}{\varvec{\varSigma }}_{0}{\mathbf {S}}_{j}\right) \nonumber \\&= I_{\{j=k\}}\left\{ q_{k}/\tau _{k}^{2} - 2{\mathrm {tr}}\left( {\mathbf {C}}_{kk}^{-1}\right) \big /\tau _{k}^{3}\right\} \nonumber \\&\quad + {\mathrm {tr}}\left( {\mathbf {C}}_{jk}^{-1}{\mathbf {C}}_{kj}^{-1}\right) \big / \left( \tau _{j}^{2}\tau _{k}^{2}\right) \end{aligned} \tag{25}$$

where $I_{\{j=k\}}$ is the indicator function (note ${\mathbf {S}}_{j}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}={\mathbf {0}}$ whenever $j\ne k$), and ${\mathbf {C}}_{jk}^{-1}$ denotes the portion of ${\mathbf {C}}^{-1}$ that corresponds to ${\mathbf {Z}}_{j}^{\prime }{\mathbf {Z}}_{k}$.
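Eq. (25) can be checked the same way. Under the same assumptions (known diagonal ${\varvec{\varPsi }}$, ${\varvec{\varSigma }}$ with blocks $\tau _k {\mathbf {I}}_{q_k}$, hypothetical random data), the sketch below compares the direct $n \times n$ trace ${\mathrm {tr}}({\mathbf {H}}_{j}{\mathbf {P}}{\mathbf {H}}_{k}{\mathbf {P}})$ with the version that only needs blocks of ${\mathbf {C}}^{-1}$.

```python
import numpy as np


def indicator(groups, q):
    """Dense n x q indicator matrix for one grouping factor."""
    Z = np.zeros((groups.size, q))
    Z[np.arange(groups.size), groups] = 1.0
    return Z


rng = np.random.default_rng(3)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Zs = [indicator(rng.integers(0, 4, n), 4), indicator(rng.integers(0, 3, n), 3)]
q = [Z.shape[1] for Z in Zs]
tau = np.array([1.5, 0.7])
psi = rng.uniform(0.5, 2.0, size=n)

V = np.diag(psi) + sum(t * Z @ Z.T for t, Z in zip(tau, Zs))
Vi = np.linalg.inv(V)
P = Vi - Vi @ X @ np.linalg.solve(X.T @ Vi @ X, X.T @ Vi)

W = np.hstack([X] + Zs)
sigma0 = np.concatenate([np.zeros(X.shape[1])] + [np.full(qk, 1.0 / t) for qk, t in zip(q, tau)])
Ci = np.linalg.inv(W.T @ (W / psi[:, None]) + np.diag(sigma0))
offsets = np.cumsum([X.shape[1]] + q[:-1])
blk = [slice(o, o + qk) for o, qk in zip(offsets, q)]          # block of C^{-1} for each Z_k

s = len(Zs)
beta_direct = np.empty((s, s))
beta_mme = np.empty((s, s))
for j in range(s):
    for k in range(s):
        Hj, Hk = Zs[j] @ Zs[j].T, Zs[k] @ Zs[k].T
        beta_direct[j, k] = np.trace(Hj @ P @ Hk @ P)            # tr(H_j P H_k P)
        val = np.trace(Ci[blk[j], blk[k]] @ Ci[blk[k], blk[j]]) / (tau[j]**2 * tau[k]**2)
        if j == k:                                               # indicator term of Eq. (25)
            val += q[k] / tau[k]**2 - 2 * np.trace(Ci[blk[k], blk[k]]) / tau[k]**3
        beta_mme[j, k] = val
```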

1.3 REML algorithms

In the boxed algorithm, $\epsilon $ is the convergence tolerance and $\omega $ is the maximum number of allowed iterations; by default, we set $\epsilon =10^{-4}$ and $\omega =500$. The REML–FS and REML–EM algorithms differ only in Step 1 of the iterative procedure, so we present both within one framework, with two alternative versions of Step 1; in practice, either the FS update or the EM update is used. Convergence is assessed using the relative change in $\tilde{{\mathcal {L}}} = -2{\mathcal {L}}$, where ${\mathcal {L}}$ is the REML log-likelihood.
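A dense-matrix sketch of the combined FS/EM framework is given below for illustration only. It assumes a known ${\varvec{\varPsi }}$, ${\varvec{\varSigma }}$ with blocks $\tau _k {\mathbf {I}}_{q_k}$, the standard REML score $\tfrac{1}{2}\{{\mathbf {y}}^{\prime }{\mathbf {P}}{\mathbf {H}}_k{\mathbf {P}}{\mathbf {y}} - {\mathrm {tr}}({\mathbf {H}}_k{\mathbf {P}})\}$ and expected information $\tfrac{1}{2}{\mathrm {tr}}({\mathbf {H}}_j{\mathbf {P}}{\mathbf {H}}_k{\mathbf {P}})$ for the FS step, the textbook EM-REML update for the EM step, and a relative change defined as $|\tilde{{\mathcal {L}}}^{(t)} - \tilde{{\mathcal {L}}}^{(t-1)}| / |\tilde{{\mathcal {L}}}^{(t-1)}|$. The function names are hypothetical, and truncating negative FS steps at a small positive value is an added safeguard. Unlike the REML–VC algorithm, it builds $n \times n$ matrices at every iteration, so it shows the control flow rather than a large-sample implementation.

```python
import numpy as np


def reml_m2ll(tau, y, X, Zs, Psi):
    """Return -2 x REML log-likelihood (up to a constant) and the projection P."""
    V = Psi + sum(t * Z @ Z.T for t, Z in zip(tau, Zs))
    Vi = np.linalg.inv(V)
    XtViX = X.T @ Vi @ X
    P = Vi - Vi @ X @ np.linalg.solve(XtViX, X.T @ Vi)
    m2ll = np.linalg.slogdet(V)[1] + np.linalg.slogdet(XtViX)[1] + y @ P @ y
    return m2ll, P


def reml_vc_dense(y, X, Zs, Psi, tau0, method="fs", eps=1e-4, omega=500):
    """Hypothetical dense REML-FS / REML-EM loop; returns (tau, iterations, converged)."""
    tau = np.array(tau0, dtype=float)
    Hs = [Z @ Z.T for Z in Zs]
    old, P = reml_m2ll(tau, y, X, Zs, Psi)
    for it in range(1, omega + 1):
        Py = P @ y
        if method == "fs":
            # Step 1 (FS): tau <- tau + I^{-1} g with score g and expected information I
            g = np.array([0.5 * (Py @ H @ Py - np.trace(H @ P)) for H in Hs])
            info = np.array([[0.5 * np.trace(Hj @ P @ Hk @ P) for Hk in Hs] for Hj in Hs])
            tau = np.maximum(tau + np.linalg.solve(info, g), 1e-8)   # keep tau positive
        else:
            # Step 1 (EM): tau_k <- [b_k'b_k + tau_k q_k - tau_k^2 tr(Z_k' P Z_k)] / q_k
            new_tau = []
            for t, Z in zip(tau, Zs):
                b = t * (Z.T @ Py)                                     # BLUP of b_k
                qk = Z.shape[1]
                new_tau.append((b @ b + t * qk - t**2 * np.trace(Z.T @ P @ Z)) / qk)
            tau = np.array(new_tau)
        # Recompute -2L and test its relative change against eps
        new, P = reml_m2ll(tau, y, X, Zs, Psi)
        if abs(new - old) / abs(old) < eps:
            return tau, it, True
        old = new
    return tau, omega, False


# Usage on simulated data: 20 groups of 10 observations, true tau = 2, Psi = I.
rng = np.random.default_rng(3)
m, r = 20, 10
n = m * r
groups = np.repeat(np.arange(m), r)
Z = np.zeros((n, m))
Z[np.arange(n), groups] = 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + Z @ rng.normal(scale=np.sqrt(2.0), size=m) + rng.normal(size=n)
tau_fs, it_fs, ok_fs = reml_vc_dense(y, X, [Z], np.eye(n), [1.0], method="fs")
tau_em, it_em, ok_em = reml_vc_dense(y, X, [Z], np.eye(n), [1.0], method="em")
```

Only the update inside the loop differs between the two methods, which mirrors the single-framework presentation with two versions of Step 1.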

1.4 Efficient crossproduct calculation

For the mixed-effects model, the numerator of the GCV score can be written as

$$\begin{aligned} n\left\| \left( {\mathbf {I}}_{n}-{\mathbf {S}}_{{\varvec{\tau }}{\varvec{\lambda }}}\right) \tilde{{\mathbf {y}}}\right\| ^2 = n\left[ \Vert \tilde{{\mathbf {y}}}\Vert ^{2} - 2\tilde{{\mathbf {y}}}_{{\varvec{\theta }}}^{\prime }\tilde{{\mathbf {z}}} + \tilde{{\mathbf {z}}}^{\prime }\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}^{\prime }\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}\tilde{{\mathbf {z}}}\right] \end{aligned} \tag{26}$$

where $\tilde{{\mathbf {y}}}_{{\varvec{\theta }}} =\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}^{\prime }\tilde{{\mathbf {y}}}$, and $\tilde{{\mathbf {z}}}={\mathbf {V}}{\mathbf {E}}^{-1}{\mathbf {V}}^{\prime } \tilde{{\mathbf {y}}}_{{\varvec{\theta }}}$ with ${\mathbf {V}}{\mathbf {E}}{\mathbf {V}}^{\prime }$ denoting the full-rank spectral decomposition of $\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}^{\prime } \tilde{{\mathbf {X}}}_{{\varvec{\theta }}} + n\lambda \tilde{{\mathbf {Q}}}_{{\varvec{\theta }}}$. Furthermore, ${\mathrm {tr}}({\mathbf {S}}_{{\varvec{\tau }}{\varvec{\lambda }}}) ={\mathrm {tr}}({\mathbf {V}}{\mathbf {E}}^{-1}{\mathbf {V}}^{\prime }\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}^{\prime } \tilde{{\mathbf {X}}}_{{\varvec{\theta }}})$, and the inverse of $\hat{{\varvec{\varSigma }}}_{*}$ has the form

$$\begin{aligned} \hat{{\varvec{\varSigma }}}_{*}^{-1} = {\varvec{\varPsi }}^{-1} - {\varvec{\varPsi }}^{-1}{\mathbf {Z}}\left( \hat{{\varvec{\varSigma }}}{}^{-1} +{\mathbf {Z}}^{\prime }{\varvec{\varPsi }}^{-1}{\mathbf {Z}}\right) ^{-1}{\mathbf {Z}}^{\prime }{\varvec{\varPsi }}^{-1} \end{aligned} \tag{27}$$

which uses Eq. (17) of Henderson and Searle (1981). As a result, the pieces of $\tilde{{\mathbf {y}}}_{{\varvec{\theta }}}$ and $\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}^{\prime } \tilde{{\mathbf {X}}}_{{\varvec{\theta }}}$ can be formed from the crossproduct vectors and matrices that were calculated for the REML estimation of the variance components. Thus, it is never necessary to form the $n \times n$ matrix $\hat{{\varvec{\varSigma }}}_{*}^{-1}$ to evaluate the GCV score.
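The sketch below checks that the GCV numerator in Eq. (26) and ${\mathrm {tr}}({\mathbf {S}}_{{\varvec{\tau }}{\varvec{\lambda }}})$ can be computed from $m \times m$ crossproducts alone. Random matrices stand in for $\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}$, $\tilde{{\mathbf {y}}}$, and $\tilde{{\mathbf {Q}}}_{{\varvec{\theta }}}$ (all hypothetical), and the smoother is taken to be ${\mathbf {S}}_{{\varvec{\tau }}{\varvec{\lambda }}} = \tilde{{\mathbf {X}}}_{{\varvec{\theta }}}{\mathbf {V}}{\mathbf {E}}^{-1}{\mathbf {V}}^{\prime }\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}^{\prime }$, which is what the two crossproduct formulas imply. It also checks Eq. (27), taking $\hat{{\varvec{\varSigma }}}_{*} = {\varvec{\varPsi }} + {\mathbf {Z}}\hat{{\varvec{\varSigma }}}{\mathbf {Z}}^{\prime }$ as implied by that equation. The $n \times n$ quantities appear only for comparison.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 100, 6
X_t = rng.normal(size=(n, m))              # stand-in for X~_theta
y_t = rng.normal(size=n)                   # stand-in for y~
A = rng.normal(size=(m, m))
Q_t = A @ A.T                              # stand-in psd penalty Q~_theta
lam = 0.01

# Crossproduct route: only m x m matrices and length-m vectors are needed
XtX = X_t.T @ X_t
y_theta = X_t.T @ y_t                      # y~_theta = X~' y~
E, V = np.linalg.eigh(XtX + n * lam * Q_t) # full-rank spectral decomposition V E V'
z = V @ ((V.T @ y_theta) / E)              # z~ = V E^{-1} V' y~_theta
num_cross = n * (y_t @ y_t - 2 * y_theta @ z + z @ XtX @ z)   # Eq. (26)
tr_cross = np.trace((V / E) @ V.T @ XtX)                      # tr(V E^{-1} V' X~'X~)

# Direct route with the n x n smoother matrix, for comparison only
S = X_t @ np.linalg.solve(XtX + n * lam * Q_t, X_t.T)
num_direct = n * np.sum((y_t - S @ y_t) ** 2)
tr_direct = np.trace(S)

# Eq. (27): Woodbury form of Sigma_*^{-1}, assuming Sigma_* = Psi + Z Sigma Z'
q = 5
Z = rng.normal(size=(n, q))
psi = rng.uniform(0.5, 2.0, size=n)
Sig = np.diag(rng.uniform(0.5, 2.0, size=q))
Psi_i = np.diag(1.0 / psi)
inv_direct = np.linalg.inv(np.diag(psi) + Z @ Sig @ Z.T)
inv_wood = Psi_i - Psi_i @ Z @ np.linalg.inv(np.linalg.inv(Sig) + Z.T @ Psi_i @ Z) @ Z.T @ Psi_i
```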

1.5 Extensions for unknown error covariance matrices

Suppose that ${\varvec{\varPsi }}$ is non-diagonal and depends on the unknown parameters ${\varvec{\nu }} = (\nu _{1},\ldots ,\nu _m)$. Letting ${\varvec{\varPsi }}_k = \partial {\mathbf {H}} / \partial \nu _k$ denote the derivative of ${\mathbf {H}} = {\varvec{\varPsi }} + {\mathbf {Z}}{\varvec{\varSigma }}{\mathbf {Z}}^{\prime }$ with respect to $\nu _k$, analogues of the formulas in Eqs. (13) and (14) can be applied to obtain the gradient and expected information matrix corresponding to the variance parameters ${\varvec{\nu }}$. In this case, the needed trace calculation $\alpha = {\mathrm {tr}}({\varvec{\varPsi }}_k {\mathbf {P}})$ has the form

$$\begin{aligned} \alpha&= {\mathrm {tr}}\left[ {\varvec{\varPsi }}_k \left( {\varvec{\varPsi }}^{-1} - {\varvec{\varPsi }}^{-1}{\mathbf {W}}{\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}\right) \right] \\&= {\mathrm {tr}} \left[ {\varvec{\varPsi }}_k {\varvec{\varPsi }}^{-1} \right] - {\mathrm {tr}} \left[ {\mathbf {C}}^{-1}{\mathbf {C}}_k \right] \end{aligned}$$

where ${\mathbf {C}}_k = {\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1} {\varvec{\varPsi }}_{k} {\varvec{\varPsi }}^{-1} {\mathbf {W}}$. Similarly, the trace calculation $\beta = {\mathrm {tr}}({\varvec{\varPsi }}_j{\mathbf {P}}{\varvec{\varPsi }}_k{\mathbf {P}})$ needed for the information matrix has the form

$$\begin{aligned} \beta&= {\mathrm {tr}} \left[ {\varvec{\varPsi }}_j \left( {\varvec{\varPsi }}^{-1} - {\varvec{\varPsi }}^{-1}{\mathbf {W}}{\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}\right) \right. \\&\quad \times \left. {\varvec{\varPsi }}_k \left( {\varvec{\varPsi }}^{-1} - {\varvec{\varPsi }}^{-1}{\mathbf {W}}{\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}\right) \right] \\&={\mathrm {tr}}\left[ {\varvec{\varPsi }}_{j}{\varvec{\varPsi }}^{-1}{\varvec{\varPsi }}_{k}{\varvec{\varPsi }}^{-1}\right] \\&\quad -2{\mathrm {tr}}\left[ {\varvec{\varPsi }}_{j}{\varvec{\varPsi }}^{-1} {\varvec{\varPsi }}_{k}{\varvec{\varPsi }}^{-1}{\mathbf {W}}{\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}\right] \\&\quad +{\mathrm {tr}}\left[ {\mathbf {C}}_{j}{\mathbf {C}}^{-1}{\mathbf {C}}_{k}{\mathbf {C}}^{-1}\right] \end{aligned}$$

where ${\mathbf {C}}_j = {\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1} {\varvec{\varPsi }}_{j} {\varvec{\varPsi }}^{-1} {\mathbf {W}}$. Consequently, when ${\varvec{\varPsi }}$ depends on the unknown parameters ${\varvec{\nu }}$, the efficiency of the estimation will depend on the form of ${\varvec{\varPsi }}_k$, which will depend on the assumed structure of ${\varvec{\varPsi }}$.
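As a numerical check of the two trace formulas for an unknown ${\varvec{\varPsi }}$, the sketch below assumes AR(1) errors within subjects, ${\varvec{\varPsi }} = \sigma ^2({\mathbf {I}}_m \otimes {\mathbf {R}}(\rho ))$ with ${\varvec{\nu }} = (\sigma ^2, \rho )$, random subject intercepts with variance $\tau $, and hypothetical random covariates. The derivatives ${\varvec{\varPsi }}_k$ are computed analytically; since ${\mathbf {Z}}{\varvec{\varSigma }}{\mathbf {Z}}^{\prime }$ does not depend on ${\varvec{\nu }}$, they equal $\partial {\varvec{\varPsi }}/\partial \nu _k$.

```python
import numpy as np


def ar1(r, rho):
    """AR(1) correlation R with entries rho^|i-j| and its derivative dR/drho."""
    d = np.abs(np.subtract.outer(np.arange(r), np.arange(r)))
    R = rho ** d
    dR = np.where(d > 0, d * rho ** np.maximum(d - 1, 0), 0.0)
    return R, dR


rng = np.random.default_rng(5)
m, r, sigma2, rho, tau = 8, 5, 1.3, 0.4, 0.9
n = m * r
R, dR = ar1(r, rho)
Psi = sigma2 * np.kron(np.eye(m), R)
Psi_k = [np.kron(np.eye(m), R), sigma2 * np.kron(np.eye(m), dR)]  # d/d sigma2, d/d rho
Z = np.kron(np.eye(m), np.ones((r, 1)))                           # subject intercepts
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Direct REML projection P from H = Psi + tau Z Z'
H = Psi + tau * Z @ Z.T
Hi = np.linalg.inv(H)
P = Hi - Hi @ X @ np.linalg.solve(X.T @ Hi @ X, X.T @ Hi)

# C, C^{-1}, and C_k = W' Psi^{-1} Psi_k Psi^{-1} W
W = np.hstack([X, Z])
Psi_i = np.linalg.inv(Psi)
C = W.T @ Psi_i @ W + np.diag(np.r_[np.zeros(X.shape[1]), np.full(m, 1.0 / tau)])
Ci = np.linalg.inv(C)
C_k = [W.T @ Psi_i @ D @ Psi_i @ W for D in Psi_k]
M = Psi_i @ W @ Ci @ W.T @ Psi_i                                  # Psi^{-1} W C^{-1} W' Psi^{-1}

alpha_direct = np.array([np.trace(D @ P) for D in Psi_k])
alpha_mme = np.array([np.trace(D @ Psi_i) - np.trace(Ci @ Cd) for D, Cd in zip(Psi_k, C_k)])

beta_direct = np.array([[np.trace(Dj @ P @ Dk @ P) for Dk in Psi_k] for Dj in Psi_k])
beta_mme = np.array([[np.trace(Dj @ Psi_i @ Dk @ Psi_i)
                      - 2 * np.trace(Dj @ Psi_i @ Dk @ M)
                      + np.trace(Cj @ Ci @ Ck @ Ci)
                      for Dk, Ck in zip(Psi_k, C_k)]
                     for Dj, Cj in zip(Psi_k, C_k)])
```

Note that ${\mathbf {C}}$ and every ${\mathbf {C}}_k$ change whenever ${\varvec{\nu }}$ changes, which is the source of the extra cost discussed next.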

Unlike the REML–VC algorithm presented for known ${\varvec{\varPsi }}$, the computational cost for unknown ${\varvec{\varPsi }}$ will generally depend on the sample size n. This is because the matrix ${\mathbf {C}}={\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}{\mathbf {W}} +{\varvec{\varSigma }}_{0}$ must be updated iteratively, i.e., recalculated for each new ${\varvec{\varPsi }}$. In this case, it may be possible to apply the rounding parameter approximation proposed by Helwig and Ma (in press) to reduce the data to $u \ll n$ unique data points, which would reduce the computational burden of repeatedly forming ${\mathbf {C}}$. However, for simple parameterizations of ${\varvec{\varPsi }}$ (e.g., first-order autoregressive), it may be more computationally efficient to perform a grid search, i.e., fix the parameters in ${\varvec{\nu }}$ and use the REML–VC algorithm for known ${\varvec{\varPsi }}$ at each grid point. This approach is easily parallelizable and, assuming the number of $\nu _{k}$ parameters is small, could be much more efficient than estimating the $\nu _{k}$ parameters with a REML approach.
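A sketch of the grid-search alternative follows, under loudly stated assumptions: the errors are AR(1) within subjects with unit error variance, so $\rho $ is the only unknown in ${\varvec{\nu }}$; the model has one random intercept per subject; and a dense single-component Fisher-scoring loop stands in for the REML–VC algorithm. All helper names are hypothetical. For each grid value, ${\varvec{\varPsi }}$ is treated as known, $\tau $ is fitted, and the grid value with the smallest $-2{\mathcal {L}}$ is kept.

```python
import numpy as np


def ar1_corr(r, rho):
    """AR(1) correlation matrix with entries rho^|i-j|."""
    d = np.abs(np.subtract.outer(np.arange(r), np.arange(r)))
    return rho ** d


def m2ll_and_P(tau, y, X, Z, Psi):
    """-2 x REML log-likelihood (up to a constant) and the REML projection P."""
    V = Psi + tau * Z @ Z.T
    Vi = np.linalg.inv(V)
    XtViX = X.T @ Vi @ X
    P = Vi - Vi @ X @ np.linalg.solve(XtViX, X.T @ Vi)
    return np.linalg.slogdet(V)[1] + np.linalg.slogdet(XtViX)[1] + y @ P @ y, P


def fit_tau_known_psi(y, X, Z, Psi, tau=1.0, eps=1e-4, omega=500):
    """Fisher scoring for one variance component with Psi held fixed."""
    H = Z @ Z.T
    old, P = m2ll_and_P(tau, y, X, Z, Psi)
    new = old
    for _ in range(omega):
        Py = P @ y
        grad = 0.5 * (Py @ H @ Py - np.trace(H @ P))
        info = 0.5 * np.trace(H @ P @ H @ P)
        tau = max(tau + grad / info, 1e-8)
        new, P = m2ll_and_P(tau, y, X, Z, Psi)
        if abs(new - old) / abs(old) < eps:
            break
        old = new
    return tau, new


def grid_search_rho(y, X, Z, m, r, grid):
    """Fix rho on a grid, fit tau with Psi known, keep the fit with smallest -2L."""
    fits = []
    for rho in grid:
        Psi = np.kron(np.eye(m), ar1_corr(r, rho))   # unit error variance assumed
        tau, m2ll = fit_tau_known_psi(y, X, Z, Psi)
        fits.append((m2ll, rho, tau))
    return min(fits)                                 # tuples compare on -2L first


# Usage on simulated data: 30 subjects, 8 time points, true rho = 0.5, tau = 1.
rng = np.random.default_rng(6)
m, r = 30, 8
n = m * r
Z = np.kron(np.eye(m), np.ones((r, 1)))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
L = np.linalg.cholesky(ar1_corr(r, 0.5))
e = (L @ rng.normal(size=(r, m))).T.ravel()          # AR(1) errors, subject by subject
y = X @ np.array([1.0, -0.5]) + Z @ rng.normal(size=m) + e
grid = np.linspace(0.0, 0.9, 10)
best_m2ll, best_rho, best_tau = grid_search_rho(y, X, Z, m, r, grid)
```

Each grid value is fitted independently, so the loop in `grid_search_rho` could be distributed across workers (for example with Python's `concurrent.futures`) without changing the result.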

About this article

Cite this article

Helwig, N.E. Efficient estimation of variance components in nonparametric mixed-effects models with large samples. Stat Comput 26, 1319–1336 (2016). https://doi.org/10.1007/s11222-015-9610-5

Received: 16 April 2015
Accepted: 01 November 2015
Published: 12 November 2015
Issue Date: November 2016
DOI: https://doi.org/10.1007/s11222-015-9610-5
