
Efficient estimation of variance components in nonparametric mixed-effects models with large samples


Abstract

Linear mixed-effects (LME) regression models are a popular approach for analyzing correlated data. Nonparametric extensions of the LME regression model have been proposed, but the heavy computational cost makes these extensions impractical for analyzing large samples. In particular, simultaneous estimation of the variance components and smoothing parameters poses a computational challenge when working with large samples. To overcome this computational burden, we propose a two-stage estimation procedure for fitting nonparametric mixed-effects regression models. Our results reveal that, compared to currently popular approaches, our two-stage approach produces more accurate estimates that can be computed in a fraction of the time.



Notes

  1. For multilevel data with one grouping factor, the bigsplines package exploits the sparsity of \({\mathbf {Z}}\) and only stores the relevant crossproduct matrices for estimation.

  2. Letting s.t denote subject.trial, we excluded the following: 11.2, 12.8, 12.10, 18.0, 18.2, 27.4, 27.6, 27.8, 39.6, 43.0, 46.0, 53.8, 55.2, 55.4, 56.0, 57.2, 82.4, 83.0, 84.10, 84.12, 92.18, 99.0.

  3. By convention, ERPs are plotted upside down, i.e., positive voltages downwards and negative voltages upwards.

References

  • Bache, K., Lichman, M.: UCI machine learning repository. http://archive.ics.uci.edu/ml (2013)

  • Bates, D., Maechler, M., Bolker, B., Walker, S.: lme4: linear mixed-effects models using Eigen and S4. http://CRAN.R-project.org/package=lme4, R package version 1.1-8 (2015)

  • Craven, P., Wahba, G.: Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31, 377–403 (1979)

  • de Leeuw, J., Meijer, E.: Introduction to multilevel analysis. In: de Leeuw, J., Meijer, E. (eds.) Handbook of Multilevel Analysis, pp. 1–75. Springer, New York (2008)

  • Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1–38 (1977)

  • Fu, W.J.: Penalized estimating equations. Biometrics 59, 126–132 (2003)

  • Gilmour, A.R., Thompson, R., Cullis, B.R.: Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51, 1440–1450 (1995)

  • Goldstein, H.: Multilevel Statistical Models, 4th edn. Wiley, West Sussex (2011)

  • Gu, C.: Smoothing Spline ANOVA Models, 2nd edn. Springer, New York (2013)

  • Gu, C.: gss: general smoothing splines. http://CRAN.R-project.org/package=gss, R package version 2.1-5 (2014)

  • Gu, C., Ma, P.: Optimal smoothing in nonparametric mixed-effect models. Ann. Stat. 33, 1357–1379 (2005)

  • Hartley, H.O., Rao, J.N.K.: Maximum likelihood estimation for the mixed analysis of variance model. Biometrika 54, 93–108 (1967)

  • Harville, D.A.: Maximum-likelihood approaches to variance component estimation and to related problems. J. Am. Stat. Assoc. 72, 320–340 (1977)

  • Hastie, T., Tibshirani, R.: Generalized Additive Models. Chapman and Hall/CRC, New York (1990)

  • Helwig, N.E.: bigsplines: smoothing splines for large samples. http://CRAN.R-project.org/package=bigsplines, R package version 1.0-7 (2015)

  • Helwig, N.E., Ma, P.: Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. J. Comput. Graph. Stat. 24, 715–732 (2015)

  • Helwig, N.E., Ma, P.: Smoothing spline ANOVA for super-large samples: scalable computation via rounding parameters. Statistics and Its Interface (in press)

  • Henderson, C.R.: Estimation of genetic parameters (abstract). Ann. Math. Stat. 21, 309–310 (1950)

  • Henderson, C.R.: Estimation of variance and covariance components. Biometrics 9, 226–252 (1953)

  • Henderson, C.R., Kempthorne, O., Searle, S.R., von Krosigk, C.M.: The estimation of environmental and genetic trends from records subject to culling. Biometrics 15, 192–218 (1959)

  • Henderson, H.V., Searle, S.R.: On deriving the inverse of a sum of matrices. SIAM Rev. 23, 53–60 (1981)

  • Kenward, M., Molenberghs, G.: Likelihood based frequentist inference when data are missing at random. Stat. Sci. 12, 236–247 (1998)

  • Kim, Y.J., Gu, C.: Smoothing spline Gaussian regression: more scalable computation via efficient approximation. J. R. Stat. Soc. Ser. B 66, 337–356 (2004)

  • Laird, N.M.: Computation of variance components using the EM algorithm. J. Stat. Comput. Simul. 14, 295–303 (1982)

  • Li, K.C.: Asymptotic optimality for \({C}_{p}\), \({C}_{L}\), cross-validation and generalized cross-validation: discrete index set. Ann. Stat. 15, 958–975 (1987)

  • Ma, P., Huang, J., Zhang, N.: Efficient computation of smoothing splines via adaptive basis sampling. Biometrika 102, 631–645 (2015)

  • Moore, E.H.: On the reciprocal of the general algebraic matrix. Bull. Am. Math. Soc. 26, 394–395 (1920)

  • Paterson, L.: Socio-economic status and educational attainment: a multidimensional and multilevel study. Eval. Res. Educ. 5, 97–121 (1991)

  • Patterson, H.D., Thompson, R.: Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554 (1971)

  • Penrose, R.: A generalized inverse for matrices. Math. Proc. Camb. Philos. Soc. 51, 406–413 (1955)

  • Porjesz, B., Begleiter, H.: Event-related potentials for individuals at risk for alcoholism. Alcohol 7, 465–469 (1990a)

  • Porjesz, B., Begleiter, H.: Neuroelectric processes in individuals at risk for alcoholism. Alcohol Alcohol. 25, 251–256 (1990b)

  • Porjesz, B., Begleiter, H., Garozzo, R.: Visual evoked potential correlates of information deficits in chronic alcoholics. In: Begleiter, H. (ed.) Biological Effects of Alcohol, pp. 603–623. Plenum Press, New York (1980)

  • Porjesz, B., Begleiter, H., Bihari, B., Kissin, B.: The N2 component of the event-related brain potential in abstinent alcoholics. Electroencephalogr. Clin. Neurophysiol. 66, 121–131 (1987)

  • Ruppert, D., Wand, M.P., Carroll, R.J.: Semiparametric Regression. Cambridge University Press, Cambridge (2003)

  • Searle, S.R.: Applying the EM algorithm to calculating ML and REML estimates of variance components. Technical report BU-1213-M, Biometrics Unit, Cornell University (1993)

  • Verbeke, G., Molenberghs, G.: Linear Mixed Models for Longitudinal Data. Springer, New York (2000)

  • Wahba, G.: A comparison of GCV and GML for choosing the smoothing parameters in the generalized spline smoothing problem. Ann. Stat. 13, 1378–1402 (1985)

  • Wahba, G.: Spline Models for Observational Data. Society for Industrial and Applied Mathematics, Philadelphia (1990)

  • Wang, Y.: Mixed effects smoothing spline analysis of variance. J. R. Stat. Soc. Ser. B 60, 159–174 (1998a)

  • Wang, Y.: Smoothing spline models with correlated random errors. J. Am. Stat. Assoc. 93, 341–348 (1998b)

  • Wang, Y., Ke, C.: assist: a suite of S functions implementing spline smoothing techniques. http://CRAN.R-project.org/package=assist, R package version 3.1.2 (2013)

  • Wood, S.N.: Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Am. Stat. Assoc. 99, 673–686 (2004)

  • Wood, S.N.: Generalized Additive Models: An Introduction with R. Chapman & Hall, Boca Raton (2006)

  • Wood, S.N.: mgcv: mixed GAM computation vehicle with GCV/AIC/REML smoothness estimation and GAMMs by REML/PQL. http://CRAN.R-project.org/package=mgcv, R package version 1.8-7 (2015)

  • Zhang, D., Lin, X., Raz, J., Sowers, M.: Semiparametric stochastic mixed models for longitudinal data. J. Am. Stat. Assoc. 93, 710–719 (1998)


Author information


Corresponding author

Correspondence to Nathaniel E. Helwig.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 495 KB)

Appendix

1.1 Smoothing parameter initialization for REML

We initialized \(\theta _{k}=1 \ \forall k\) when defining the fixed-effects design matrix \({\mathbf {X}}_{{\varvec{\theta }}} =({\mathbf {K}},{\mathbf {J}}_{{\varvec{\theta }}}) = \tilde{{\mathbf {F}}}_{1}\tilde{{\mathbf {R}}}\) used for the REML error contrasts. To understand the motivation for this, note that we can write

$$ {\mathbf {X}}_{{\varvec{\theta }}} = \begin{pmatrix} \tilde{{\mathbf {F}}}_{K} & \tilde{{\mathbf {F}}}_{J} \end{pmatrix} \begin{pmatrix} \tilde{{\mathbf {R}}}_{K} & \tilde{{\mathbf {R}}}_{KJ} \\ {\mathbf {0}} & \tilde{{\mathbf {R}}}_{J} \end{pmatrix} $$

where \(\tilde{{\mathbf {F}}}_{1} = (\tilde{{\mathbf {F}}}_{K} \ \tilde{{\mathbf {F}}}_{J})\), and \(\tilde{{\mathbf {F}}}_{K}\tilde{{\mathbf {R}}}_{K}\) is the QR decomposition of \({\mathbf {K}}\). Note that this implies \(\tilde{{\mathbf {R}}}_{K} = \tilde{{\mathbf {F}}}_{K}^{\prime }{\mathbf {K}}\) and \(\tilde{{\mathbf {R}}}_{J} = \tilde{{\mathbf {F}}}_{J}^{\prime }{\mathbf {J}}_{{\varvec{\theta }}}\) are upper-triangular, \(\tilde{{\mathbf {R}}}_{KJ} = \tilde{{\mathbf {F}}}_{K}^{\prime }{\mathbf {J}}_{{\varvec{\theta }}}\), and \(\tilde{{\mathbf {F}}}_{J}\tilde{{\mathbf {R}}}_{J}\) is the QR decomposition of \(({\mathbf {I}}_n - \tilde{{\mathbf {F}}}_{K}\tilde{{\mathbf {F}}}_{K}^{\prime }){\mathbf {J}}_{{\varvec{\theta }}}\). Combining the results, we have \(\tilde{{\mathbf {F}}}_{J}\tilde{{\mathbf {F}}}_{J}^{\prime }{\mathbf {J}}_{{\varvec{\theta }}} = ({\mathbf {I}}_n - \tilde{{\mathbf {F}}}_{K}\tilde{{\mathbf {F}}}_{K}^{\prime }){\mathbf {J}}_{{\varvec{\theta }}}\), where \(\tilde{{\mathbf {F}}}_{K}\) does not depend on \({\varvec{\theta }}\). A sufficient condition to ensure that \(\tilde{{\mathbf {R}}}_{J} = \tilde{{\mathbf {F}}}_{J}^{\prime }{\mathbf {J}}_{{\varvec{\theta }}}\) is upper-triangular for any \({\varvec{\theta }}\) is obtained by requiring \(\tilde{{\mathbf {F}}}_{J}^{\prime }\tilde{{\mathbf {F}}}_{J_k}\) to be upper-triangular for all k, where \(\tilde{{\mathbf {F}}}_{J_k}\tilde{{\mathbf {R}}}_{J_k} = {\mathbf {J}}_{k}\) is the QR decomposition of \({\mathbf {J}}_k\), and \({\mathbf {J}}_{{\varvec{\theta }}} = \sum _{k=1}^{s} \theta _k {\mathbf {J}}_{k}\). For \(k>1\) this sufficient condition imposes more constraints than are necessary, so we recommend setting \(\theta _k = 1 \ \forall k\) and requiring \(\tilde{{\mathbf {F}}}_{J}^{\prime }\tilde{{\mathbf {F}}}_{J_\bullet }\) to be upper-triangular, where \(\tilde{{\mathbf {F}}}_{J_\bullet }\tilde{{\mathbf {R}}}_{J_\bullet }\) is the QR decomposition of \({\mathbf {J}}_{\bullet } = \sum _{k=1}^{s} {\mathbf {J}}_{k}\). Given that the \({\mathbf {J}}_{k}\) matrices span different (orthogonal) contrast spaces, this aggregate requirement should produce a reasonable approximation to the sufficient condition.
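To make the construction concrete, the following is a minimal NumPy sketch of the recommended initialization; the matrices standing in for \({\mathbf {K}}\) and the \({\mathbf {J}}_{k}\) are synthetic assumptions used purely for illustration.

```python
import numpy as np

# Minimal sketch of the theta_k = 1 initialization with synthetic K and J_k;
# all toy matrices here are assumptions for illustration only.
rng = np.random.default_rng(0)
n, s = 100, 2
K = rng.standard_normal((n, 3))                      # null-space basis K
J = [rng.standard_normal((n, 4)) for _ in range(s)]  # contrast bases J_k

F_K, R_K = np.linalg.qr(K)                           # K = F_K R_K (QR)

# With theta_k = 1 for all k, J_theta equals J_bullet = sum_k J_k, and F_J is
# the Q factor of the projection of J_bullet away from the column space of K.
J_bullet = sum(J)
F_J, R_J = np.linalg.qr((np.eye(n) - F_K @ F_K.T) @ J_bullet)

# Sanity check: F_J F_J' J_theta = (I_n - F_K F_K') J_theta at the initial theta
J_theta = sum(J)                                     # theta_k = 1 for all k
print(np.allclose(F_J @ F_J.T @ J_theta,
                  (np.eye(n) - F_K @ F_K.T) @ J_theta))   # True
```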

1.2 REML expected information matrix

We first present the results of Gilmour et al. (1995) for the gradient of the REML log-likelihood in the VC model. Note that \({\mathbf {H}}_{k}={\mathbf {Z}}_{k}{\mathbf {Z}}_{k}^{\prime }\) and that we can write \({\mathbf {Z}}_{k}={\mathbf {W}}{\mathbf {S}}_{k}\), where \({\mathbf {W}}=({\mathbf {X}}_{{\varvec{\theta }}},{\mathbf {Z}})\) and \({\mathbf {S}}_{k}\) is the selection matrix that extracts the columns of \({\mathbf {W}}\) corresponding to \({\mathbf {Z}}_{k}\). Also note that we can write \({\mathbf {C}}={\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}{\mathbf {W}} +{\varvec{\varSigma }}_{0}\), where \({\varvec{\varSigma }}_{0}\) is \({\varvec{\varSigma }}^{-1}\) augmented with zeros corresponding to \({\mathbf {X}}_{{\varvec{\theta }}}\) in \({\mathbf {W}}\). The needed trace \(\alpha = {\mathrm {tr}}({\mathbf {H}}_{k}{\mathbf {P}})\) has the form

$$\begin{aligned} \alpha&= {\mathrm {tr}}\left[ {\mathbf {Z}}_{k}{\mathbf {Z}}_{k}^{\prime }\left( {\varvec{\varPsi }}^{-1} -{\varvec{\varPsi }}^{-1}{\mathbf {W}}{\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}\right) \right] \\ &={\mathrm {tr}}\left[ {\mathbf {S}}_{k}^{\prime }\left( {\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}{\mathbf {W}} -{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}{\mathbf {W}}{\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}{\mathbf {W}}\right) {\mathbf {S}}_{k}\right] \\ &= {\mathrm {tr}}\left[ {\mathbf {S}}_{k}^{\prime } \left( {\mathbf {C}} - {\varvec{\varSigma }}_{0}\right) {\mathbf {C}}^{-1} {\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}\right] \\ &= {\mathrm {tr}}\left[ {\mathbf {S}}_{k}^{\prime } {\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}\right] - {\mathrm {tr}}\left[ {\mathbf {S}}_{k}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {C}}^{-1} {\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}\right] \\ &= q_{k}/\tau _{k} - {\mathrm {tr}}\left( {\mathbf {C}}_{kk}^{-1}\right) \big /\tau _{k}^{2} \end{aligned}$$
(24)

The last piece for the gradient calculation derives from the fact that \({\mathbf {Z}}^{\prime }{\mathbf {P}}{\mathbf {y}}={\varvec{\varSigma }}^{-1}\hat{{\mathbf {b}}}\).
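As a sanity check on the algebra in Eq. (24), the identity can be verified numerically from the representation \({\mathbf {P}}={\varvec{\varPsi }}^{-1} -{\varvec{\varPsi }}^{-1}{\mathbf {W}}{\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}\). The following NumPy sketch does so with synthetic toy matrices; all dimensions and values below are assumptions, not quantities from the paper.

```python
import numpy as np

# Toy numerical check of the trace identity in Eq. (24), using the
# representation P = Psi^{-1} - Psi^{-1} W C^{-1} W' Psi^{-1}.
rng = np.random.default_rng(1)
n, p = 50, 2
q = [3, 4]                                   # random-effect counts q_k
tau = np.array([0.7, 1.3])                   # variance components tau_k
X = rng.standard_normal((n, p))              # fixed-effects design X_theta
Z = rng.standard_normal((n, sum(q)))         # random-effects design Z = (Z_1, Z_2)
W = np.hstack([X, Z])                        # W = (X_theta, Z)
Psi = np.eye(n)                              # known error covariance

# Sigma_0 = Sigma^{-1} augmented with zeros for the X_theta block
Sigma0 = np.zeros((p + sum(q),) * 2)
Sigma0[p:, p:] = np.diag(np.repeat(1.0 / tau, q))
C = W.T @ np.linalg.solve(Psi, W) + Sigma0
Cinv = np.linalg.inv(C)

# alpha = tr(H_k P) computed directly, for k = 1
Psi_inv = np.linalg.inv(Psi)
P = Psi_inv - Psi_inv @ W @ Cinv @ W.T @ Psi_inv
Z1 = Z[:, :q[0]]
alpha_direct = np.trace(Z1 @ Z1.T @ P)

# alpha via the closed form q_k / tau_k - tr(C_kk^{-1}) / tau_k^2
Ckk = Cinv[p:p + q[0], p:p + q[0]]
alpha_closed = q[0] / tau[0] - np.trace(Ckk) / tau[0] ** 2
print(np.isclose(alpha_direct, alpha_closed))    # True
```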

Now, note that the trace calculation \(\beta = {\mathrm {tr}}({\mathbf {H}}_{j}{\mathbf {P}}{\mathbf {H}}_{k}{\mathbf {P}})\) needed for the information matrix has the form

$$\begin{aligned} \beta&= {\mathrm {tr}}\left[ {\mathbf {Z}}_{j}{\mathbf {Z}}_{j}^{\prime }{\mathbf {P}}{\mathbf {Z}}_{k}{\mathbf {Z}}_{k}^{\prime }{\mathbf {P}}\right] \\ &={\mathrm {tr}}\left[ {\mathbf {S}}_{j}^{\prime }{\mathbf {W}}^{\prime }{\mathbf {P}}{\mathbf {W}}{\mathbf {S}}_{k}{\mathbf {S}}_{k}^{\prime }{\mathbf {W}}^{\prime }{\mathbf {P}}{\mathbf {W}}{\mathbf {S}}_{j}\right] \\ &= {\mathrm {tr}}\left[ {\mathbf {S}}_{j}^{\prime }\left( {\mathbf {C}} - {\varvec{\varSigma }}_{0}\right) {\mathbf {C}}^{-1} {\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}{\mathbf {S}}_{k}^{\prime } \left( {\mathbf {C}} - {\varvec{\varSigma }}_{0}\right) {\mathbf {C}}^{-1} {\varvec{\varSigma }}_{0}{\mathbf {S}}_{j}\right] \\ &={\mathrm {tr}}\left( {\mathbf {S}}_{j}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}{\mathbf {S}}_{k}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {S}}_{j}\right) -2\,{\mathrm {tr}}\left( {\mathbf {S}}_{j}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}{\mathbf {S}}_{k}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {C}}^{-1}{\varvec{\varSigma }}_{0}{\mathbf {S}}_{j} \right) \\ &\qquad + {\mathrm {tr}}\left( {\mathbf {S}}_{j}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {C}}^{-1}{\varvec{\varSigma }}_{0}{\mathbf {S}}_{k} {\mathbf {S}}_{k}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {C}}^{-1}{\varvec{\varSigma }}_{0}{\mathbf {S}}_{j}\right) \\ &= I_{\{j=k\}}\left\{ q_{k}/\tau _{k}^{2} - 2\,{\mathrm {tr}}\left( {\mathbf {C}}_{kk}^{-1}\right) \big /\tau _{k}^{3}\right\} + {\mathrm {tr}}\left( {\mathbf {C}}_{jk}^{-1}{\mathbf {C}}_{kj}^{-1}\right) \big / \left( \tau _{j}^{2}\tau _{k}^{2}\right) \end{aligned}$$
(25)

where \(I_{\{j=k\}}\) is the indicator function (note \({\mathbf {S}}_{j}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}={\mathbf {0}}\) whenever \(j\ne k\)), and \({\mathbf {C}}_{jk}^{-1}\) denotes the portion of \({\mathbf {C}}^{-1}\) that corresponds to \({\mathbf {Z}}_{j}^{\prime }{\mathbf {Z}}_{k}\).
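Continuing the toy sketch from Sect. 1.2 (same `W`, `P`, `Cinv`, `Z1`, `Ckk`, `q`, and `tau`), the diagonal case \(j=k\) of Eq. (25) can be checked in the same way.

```python
# Continuing the sketch above: check the j = k case of Eq. (25), where the
# off-diagonal block C_jk^{-1} reduces to C_kk^{-1}.
beta_direct = np.trace(Z1 @ Z1.T @ P @ Z1 @ Z1.T @ P)
beta_closed = (q[0] / tau[0] ** 2
               - 2 * np.trace(Ckk) / tau[0] ** 3
               + np.trace(Ckk @ Ckk) / tau[0] ** 4)
print(np.isclose(beta_direct, beta_closed))      # True
```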

1.3 REML algorithms

In the boxed algorithm, \(\epsilon \) is the convergence tolerance and \(\omega \) is the maximum number of allowed iterations; in practice, we use \(\epsilon =10^{-4}\) and \(\omega =500\) as defaults. The only difference between the REML–FS and REML–EM algorithms is Step 1 of the iterative procedure, so we combine the two algorithms into one framework below (with two distinct versions of Step 1). In practice, either the FS update or the EM update is used as Step 1 of the iterative procedure. Convergence is assessed via the relative change in \(\tilde{{\mathcal {L}}} = -2{\mathcal {L}}\), where \({\mathcal {L}}\) is the REML log-likelihood.
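To make the shared structure explicit, here is a minimal Python skeleton of the combined loop; `step1_update` and `neg2_reml` are hypothetical placeholders for the paper's Step 1 update (FS or EM) and the REML criterion, not actual implementations.

```python
# Hedged skeleton of the shared REML-FS / REML-EM loop; step1_update and
# neg2_reml are hypothetical placeholders, not the paper's implementations.
def reml_fit(tau, step1_update, neg2_reml, eps=1e-4, omega=500):
    crit_old = neg2_reml(tau)                 # tilde{L} = -2 * REML log-lik
    for iteration in range(1, omega + 1):
        tau = step1_update(tau)               # Step 1: FS update or EM update
        crit_new = neg2_reml(tau)             # re-evaluate -2L after the update
        if abs(crit_old - crit_new) / abs(crit_old) < eps:
            break                             # relative change below tolerance
        crit_old = crit_new
    return tau, iteration

# Usage: reml_fit(tau0, fs_update, neg2_reml) or reml_fit(tau0, em_update, ...)
```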

1.4 Efficient crossproduct calculation

For the mixed-effects model, the numerator of the GCV score can be written as

$$\begin{aligned} n\left\| \left( {\mathbf {I}}_{n}-{\mathbf {S}}_{{\varvec{\tau }}{\varvec{\lambda }}}\right) \tilde{{\mathbf {y}}}\right\| ^2 = n\left[ \Vert \tilde{{\mathbf {y}}}\Vert ^{2} - 2\tilde{{\mathbf {y}}}_{{\varvec{\theta }}}^{\prime }\tilde{{\mathbf {z}}} + \tilde{{\mathbf {z}}}^{\prime }\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}^{\prime }\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}\tilde{{\mathbf {z}}}\right] \end{aligned}$$
(26)

where \(\tilde{{\mathbf {y}}}_{{\varvec{\theta }}} =\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}^{\prime }\tilde{{\mathbf {y}}}\), and \(\tilde{{\mathbf {z}}}={\mathbf {V}}{\mathbf {E}}^{-1}{\mathbf {V}}^{\prime } \tilde{{\mathbf {y}}}_{{\varvec{\theta }}}\) with \({\mathbf {V}}{\mathbf {E}}{\mathbf {V}}^{\prime }\) denoting the full-rank spectral decomposition of \(\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}^{\prime } \tilde{{\mathbf {X}}}_{{\varvec{\theta }}} + n\lambda \tilde{{\mathbf {Q}}}_{{\varvec{\theta }}}\). Furthermore, \({\mathrm {tr}}({\mathbf {S}}_{{\varvec{\tau }}{\varvec{\lambda }}}) ={\mathrm {tr}}({\mathbf {V}}{\mathbf {E}}^{-1}{\mathbf {V}}^{\prime }\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}^{\prime } \tilde{{\mathbf {X}}}_{{\varvec{\theta }}})\), and the inverse of \(\hat{{\varvec{\varSigma }}}_{*}\) has the form

$$\begin{aligned} \hat{{\varvec{\varSigma }}}_{*}^{-1} = {\varvec{\varPsi }}^{-1} - {\varvec{\varPsi }}^{-1}{\mathbf {Z}}\left( \hat{{\varvec{\varSigma }}}{}^{-1} +{\mathbf {Z}}^{\prime }{\varvec{\varPsi }}^{-1}{\mathbf {Z}}\right) ^{-1}{\mathbf {Z}}^{\prime }{\varvec{\varPsi }}^{-1} \end{aligned}$$
(27)

which uses Eq. (17) of Henderson and Searle (1981). As a result, the pieces of \(\tilde{{\mathbf {y}}}_{{\varvec{\theta }}}\) and \(\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}^{\prime } \tilde{{\mathbf {X}}}_{{\varvec{\theta }}}\) can be formed from the crossproduct vectors and matrices that were calculated for the REML estimation of the variance components. Thus, it is never necessary to form the \(n \times n\) matrix \(\hat{{\varvec{\varSigma }}}_{*}^{-1}\) to evaluate the GCV score.
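As an illustration of why the \(n \times n\) matrix is never needed, the following toy sketch evaluates Eq. (26) and \({\mathrm {tr}}({\mathbf {S}}_{{\varvec{\tau }}{\varvec{\lambda }}})\) from the \(m \times m\) crossproducts alone; the \(\hat{{\varvec{\varSigma }}}_{*}\) weighting is abstracted away, and all matrices below are synthetic assumptions.

```python
import numpy as np

# Toy sketch of Eq. (26): the GCV numerator from m x m crossproducts only.
# Xt stands in for tilde{X}_theta and Qt for tilde{Q}_theta (assumptions).
rng = np.random.default_rng(2)
n, m = 200, 5
Xt = rng.standard_normal((n, m))
A = rng.standard_normal((m, m))
Qt = A @ A.T                                  # nonnegative definite penalty
y = rng.standard_normal(n)                    # stands in for tilde{y}
nlam = 0.1                                    # n * lambda

XtX = Xt.T @ Xt                               # crossproduct matrix
y_theta = Xt.T @ y                            # crossproduct vector
E, V = np.linalg.eigh(XtX + nlam * Qt)        # full-rank spectral decomposition
z = V @ ((V.T @ y_theta) / E)                 # tilde{z} = V E^{-1} V' y_theta

num = n * (y @ y - 2 * y_theta @ z + z @ XtX @ z)       # Eq. (26)
B = V.T @ Xt.T
trS = np.sum(B * B / E[:, None])                        # tr(S_{tau,lambda})

# Check against the direct n x n computation (never needed in practice)
S = Xt @ np.linalg.solve(XtX + nlam * Qt, Xt.T)
print(np.isclose(num, n * np.sum(((np.eye(n) - S) @ y) ** 2)),
      np.isclose(trS, np.trace(S)))           # True True
```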

1.5 Extensions for unknown error covariance matrices

Suppose that \({\varvec{\varPsi }}\) is non-diagonal and depends on the unknown parameters \({\varvec{\nu }} = (\nu _{1},\ldots ,\nu _m)\). Letting \({\varvec{\varPsi }}_k = \partial {\mathbf {H}} / \partial \nu _k\) denote the derivative of \({\mathbf {H}} = {\varvec{\varPsi }} + {\mathbf {Z}}{\varvec{\varSigma }}{\mathbf {Z}}^{\prime }\) with respect to \(\nu _k\), analogues of the formulas in Eqs. (13) and (14) can be applied to obtain the gradient and expected information matrix corresponding to the variance parameters \({\varvec{\nu }}\). In this case, the needed trace calculation \(\alpha = {\mathrm {tr}}({\varvec{\varPsi }}_k {\mathbf {P}})\) has the form

$$\begin{aligned} \alpha&= {\mathrm {tr}}\left[ {\varvec{\varPsi }}_k \left( {\varvec{\varPsi }}^{-1} - {\varvec{\varPsi }}^{-1}{\mathbf {W}}{\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}\right) \right] \\&= {\mathrm {tr}} \left[ {\varvec{\varPsi }}_k {\varvec{\varPsi }}^{-1} \right] - {\mathrm {tr}} \left[ {\mathbf {C}}^{-1}{\mathbf {C}}_k \right] \end{aligned}$$

where \({\mathbf {C}}_k = {\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1} {\varvec{\varPsi }}_{k} {\varvec{\varPsi }}^{-1} {\mathbf {W}}\). Similarly, the trace calculation \(\beta = {\mathrm {tr}}({\varvec{\varPsi }}_j{\mathbf {P}}{\varvec{\varPsi }}_k{\mathbf {P}})\) needed for the information matrix has the form

$$\begin{aligned} \beta&= {\mathrm {tr}} \left[ {\varvec{\varPsi }}_j \left( {\varvec{\varPsi }}^{-1} - {\varvec{\varPsi }}^{-1}{\mathbf {W}}{\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}\right) \right. \\&\quad \times \left. {\varvec{\varPsi }}_k \left( {\varvec{\varPsi }}^{-1} - {\varvec{\varPsi }}^{-1}{\mathbf {W}}{\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}\right) \right] \\&={\mathrm {tr}}\left[ {\varvec{\varPsi }}_{j}{\varvec{\varPsi }}^{-1}{\varvec{\varPsi }}_{k}{\varvec{\varPsi }}^{-1}\right] \\&\quad -2{\mathrm {tr}}\left[ {\varvec{\varPsi }}_{j}{\varvec{\varPsi }}^{-1} {\varvec{\varPsi }}_{k}{\varvec{\varPsi }}^{-1}{\mathbf {W}}{\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}\right] \\&\quad +{\mathrm {tr}}\left[ {\mathbf {C}}_{j}{\mathbf {C}}^{-1}{\mathbf {C}}_{k}{\mathbf {C}}^{-1}\right] \end{aligned}$$

where \({\mathbf {C}}_j = {\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1} {\varvec{\varPsi }}_{j} {\varvec{\varPsi }}^{-1} {\mathbf {W}}\). Consequently, when \({\varvec{\varPsi }}\) depends on the unknown parameters \({\varvec{\nu }}\), the efficiency of the estimation depends on the form of \({\varvec{\varPsi }}_k\), which is determined by the assumed structure of \({\varvec{\varPsi }}\).
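The first of these trace identities can likewise be checked numerically. In the sketch below, every matrix is a synthetic assumption and \({\varvec{\varPsi }}_k\) is an arbitrary symmetric stand-in for the derivative.

```python
import numpy as np

# Toy numerical check of alpha = tr(Psi_k Psi^{-1}) - tr(C^{-1} C_k) for a
# non-diagonal Psi; every matrix here is a synthetic assumption.
rng = np.random.default_rng(3)
n, r = 40, 6
W = rng.standard_normal((n, r))               # stand-in for W = (X_theta, Z)
A = rng.standard_normal((n, n))
Psi = A @ A.T + n * np.eye(n)                 # non-diagonal, positive definite
Psi_k = A + A.T                               # symmetric stand-in for dPsi/dnu_k
Sigma0 = np.diag([0.0, 0.0, 2.0, 2.0, 2.0, 2.0])  # zeros for the X_theta block

Psi_inv = np.linalg.inv(Psi)
C = W.T @ Psi_inv @ W + Sigma0
C_k = W.T @ Psi_inv @ Psi_k @ Psi_inv @ W
P = Psi_inv - Psi_inv @ W @ np.linalg.solve(C, W.T) @ Psi_inv

alpha = np.trace(Psi_k @ Psi_inv) - np.trace(np.linalg.solve(C, C_k))
print(np.isclose(alpha, np.trace(Psi_k @ P)))     # True
```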

Unlike the REML–VC algorithm for known \({\varvec{\varPsi }}\), whose iterative cost is insensitive to the sample size n, the algorithm for unknown \({\varvec{\varPsi }}\) will generally scale with n. This is because the matrix \({\mathbf {C}}={\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}{\mathbf {W}} +{\varvec{\varSigma }}_{0}\) must be recalculated for each new \({\varvec{\varPsi }}\). In this case, it may be possible to apply the rounding parameter approximation proposed by Helwig and Ma (in press) to reduce the data to \(u \ll n\) unique data points, which would reduce the computational burden of iteratively forming \({\mathbf {C}}\). However, for simple parameterizations of \({\varvec{\varPsi }}\) (e.g., first-order autoregressive), it may be more computationally efficient to perform a grid search, i.e., fix the parameters in \({\varvec{\nu }}\) and apply the REML–VC algorithm for known \({\varvec{\varPsi }}\) at each grid point, as sketched below. This approach is easily parallelizable and, assuming the number of \(\nu _{k}\) parameters is small, could be much more efficient than estimating the \(\nu _{k}\) parameters using a REML approach.
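For concreteness, here is a hedged sketch of the grid-search idea for a single AR(1) parameter; `fit_known_psi` is a hypothetical stand-in for a known-\({\varvec{\varPsi }}\) REML–VC fit that returns the \(-2\) REML criterion.

```python
import numpy as np

# Hedged sketch of the grid-search idea for an AR(1) error structure; the
# function fit_known_psi is a hypothetical stand-in for the REML-VC fit
# with Psi held fixed (assumed to return a dict with key "neg2reml").
def ar1_psi(n, rho):
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])   # Psi[i,j] = rho^|i-j|

def grid_search_ar1(y, fit_known_psi, rhos=np.linspace(-0.9, 0.9, 19)):
    # Each grid point is an independent known-Psi fit, so this loop is
    # trivially parallelizable (e.g., multiprocessing.Pool.map over rhos).
    fits = [fit_known_psi(y, ar1_psi(len(y), rho)) for rho in rhos]
    best = int(np.argmin([fit["neg2reml"] for fit in fits]))
    return rhos[best], fits[best]
```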


Cite this article

Helwig, N.E. Efficient estimation of variance components in nonparametric mixed-effects models with large samples. Stat Comput 26, 1319–1336 (2016). https://doi.org/10.1007/s11222-015-9610-5

