Abstract
Linear mixed-effects (LME) regression models are a popular approach for analyzing correlated data. Nonparametric extensions of the LME regression model have been proposed, but the heavy computational cost makes these extensions impractical for analyzing large samples. In particular, simultaneous estimation of the variance components and smoothing parameters poses a computational challenge when working with large samples. To overcome this computational burden, we propose a two-stage estimation procedure for fitting nonparametric mixed-effects regression models. Our results reveal that, compared to currently popular approaches, our two-stage approach produces more accurate estimates that can be computed in a fraction of the time.
Notes
For multilevel data with one grouping factor, the bigsplines package exploits the sparsity of \({\mathbf {Z}}\) and only stores the relevant crossproduct matrices for estimation.
Letting s.t denote subject.trial, we excluded the following: 11.2, 12.8, 12.10, 18.0, 18.2, 27.4, 27.6, 27.8, 39.6, 43.0, 46.0, 53.8, 55.2, 55.4, 56.0, 57.2, 82.4, 83.0, 84.10, 84.12, 92.18, 99.0.
By convention, ERPs are plotted upside down, i.e., positive voltages downwards and negative voltages upwards.
References
Bache, K., Lichman, M.: UCI machine learning repository. http://archive.ics.uci.edu/ml (2013)
Bates, D., Maechler, M., Bolker, B., Walker, S.: lme4: Linear mixed-effects models using Eigen and S4. http://CRAN.R-project.org/package=lme4, R package version 1.1-8 (2015)
Craven, P., Wahba, G.: Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31, 377–403 (1979)
de Leeuw, J., Meijer, E.: Introduction to multilevel analysis. In: de Leeuw, J., Meijer, E. (eds.) Handbook of Multilevel Analysis, pp. 1–75. Springer, New York (2008)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1–38 (1977)
Fu, W.J.: Penalized estimating equations. Biometrics 59, 126–132 (2003)
Gilmour, A.R., Thompson, R., Cullis, B.R.: Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51, 1440–1450 (1995)
Goldstein, H.: Multilevel Statistical Models, 4th edn. Wiley, West Sussex (2011)
Gu, C.: Smoothing Spline ANOVA Models, 2nd edn. Springer, New York (2013)
Gu, C.: gss: General smoothing splines. http://CRAN.R-project.org/package=gss, R package version 2.1-5 (2014)
Gu, C., Ma, P.: Optimal smoothing in nonparametric mixed-effect models. Ann. Stat. 33, 1357–1379 (2005)
Hartley, H.O., Rao, J.N.K.: Maximum likelihood estimation for the mixed analysis of variance model. Biometrika 54, 93–108 (1967)
Harville, D.A.: Maximum-likelihood approaches to variance component estimation and to related problems. J. Am. Stat. Assoc. 72, 320–340 (1977)
Hastie, T., Tibshirani, R.: Generalized Additive Models. Chapman and Hall/CRC, New York (1990)
Helwig, N.E.: bigsplines: smoothing splines for large samples. http://CRAN.R-project.org/package=bigsplines, R package version 1.0-7 (2015)
Helwig, N.E., Ma, P.: Fast and stable multiple smoothing parameter selection in smoothing spline analysis of variance models with large samples. J. Comput. Graph. Stat. 24, 715–732 (2015)
Helwig, N.E., Ma, P.: Smoothing spline ANOVA for super-large samples: scalable computation via rounding parameters. Stat. Interface (in press)
Henderson, C.R.: Estimation of genetic parameters (abstract). Ann. Math. Stat. 21, 309–310 (1950)
Henderson, C.R.: Estimation of variance and covariance components. Biometrics 9, 226–252 (1953)
Henderson, C.R., Kempthorne, O., Searle, S.R., von Krosigk, C.M.: The estimation of environmental and genetic trends from records subject to culling. Biometrics 15, 192–218 (1959)
Henderson, H.V., Searle, S.R.: On deriving the inverse of a sum of matrices. SIAM Rev. 23, 53–60 (1981)
Kenward, M., Molenberghs, G.: Likelihood based frequentist inference when data are missing at random. Stat. Sci. 13, 236–247 (1998)
Kim, Y.J., Gu, C.: Smoothing spline Gaussian regression: more scalable computation via efficient approximation. J. R. Stat. Soc. Ser. B 66, 337–356 (2004)
Laird, N.M.: Computation of variance components using the EM algorithm. J. Stat. Comput. Simul. 14, 295–303 (1982)
Li, K.C.: Asymptotic optimality for \({C}_{p}\), \({C}_{L}\), cross-validation and generalized cross-validation: discrete index set. Ann. Stat. 15, 958–975 (1987)
Ma, P., Huang, J., Zhang, N.: Efficient computation of smoothing splines via adaptive basis sampling. Biometrika 102, 631–645 (2015)
Moore, E.H.: On the reciprocal of the general algebraic matrix. Bull. Am. Math. Soc. 26, 394–395 (1920)
Paterson, L.: Socio-economic status and educational attainment: a multidimensional and multilevel study. Eval. Res. Educ. 5, 97–121 (1991)
Patterson, H.D., Thompson, R.: Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554 (1971)
Penrose, R.: A generalized inverse for matrices. Math. Proc. Camb. Philos. Soc. 51, 406–413 (1955)
Porjesz, B., Begleiter, H.: Event-related potentials for individuals at risk for alcoholism. Alcohol 7, 465–469 (1990a)
Porjesz, B., Begleiter, H.: Neuroelectric processes in individuals at risk for alcoholism. Alcohol Alcohol. 25, 251–256 (1990b)
Porjesz, B., Begleiter, H., Garozzo, R.: Visual evoked potential correlates of information deficits in chronic alcoholics. In: Begleiter, H. (ed.) Biological Effects of Alcohol, pp. 603–623. Plenum Press, New York (1980)
Porjesz, B., Begleiter, H., Bihari, B., Kissin, B.: The N2 component of the event-related brain potential in abstinent alcoholics. Electroencephalogr. Clin. Neurophysiol. 66, 121–131 (1987)
Ruppert, D., Wand, M.P., Carroll, R.J.: Semiparametric Regression. Cambridge University Press, Cambridge (2003)
Searle, S.R.: Applying the EM algorithm to calculating ML and REML estimates of variance components. Technical report BU-1213-M, Biometrics Unit, Cornell University (1993)
Verbeke, G., Molenberghs, G.: Linear Mixed Models for Longitudinal Data. Springer, New York (2000)
Wahba, G.: A comparison of GCV and GML for choosing the smoothing parameters in the generalized spline smoothing problem. Ann. Stat. 13, 1378–1402 (1985)
Wahba, G.: Spline Models for Observational Data. Society for Industrial and Applied Mathematics, Philadelphia (1990)
Wang, Y.: Mixed effects smoothing spline analysis of variance. J. R. Stat. Soc. Ser. B 60, 159–174 (1998a)
Wang, Y.: Smoothing spline models with correlated random errors. J. Am. Stat. Assoc. 93, 341–348 (1998b)
Wang, Y., Ke, C.: assist: A Suite of S functions for Implementing Spline smoothing Techniques. http://CRAN.R-project.org/package=assist, R package version 3.1.2 (2013)
Wood, S.N.: Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Am. Stat. Assoc. 99, 673–686 (2004)
Wood, S.N.: Generalized Additive Models: An Introduction with R. Chapman & Hall, Boca Raton (2006)
Wood, S.N.: mgcv: Mixed GAM computation vehicle with GCV/AIC/REML smoothness estimation and GAMMs by REML/PQL. http://CRAN.R-project.org/package=mgcv, R package version 1.8-7 (2015)
Zhang, D., Lin, X., Raz, J., Sowers, M.: Semiparametric stochastic mixed models for longitudinal data. J. Am. Stat. Assoc. 93, 710–719 (1998)
Appendix
1.1 Smoothing parameter initialization for REML
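We initialized \(\theta _{k}=1 \ \forall k\) when defining the fixed-effects design matrix \({\mathbf {X}}_{{\varvec{\theta }}} =({\mathbf {K}},{\mathbf {J}}_{{\varvec{\theta }}}) = \tilde{{\mathbf {F}}}_{1}\tilde{{\mathbf {R}}}\) used for the REML error contrasts. To understand the motivation for this, note that we can write
\[
{\mathbf {X}}_{{\varvec{\theta }}} = ({\mathbf {K}},{\mathbf {J}}_{{\varvec{\theta }}}) = \tilde{{\mathbf {F}}}_{1}\tilde{{\mathbf {R}}} = (\tilde{{\mathbf {F}}}_{K} \ \tilde{{\mathbf {F}}}_{J}) \begin{pmatrix} \tilde{{\mathbf {R}}}_{K} & \tilde{{\mathbf {R}}}_{KJ} \\ {\mathbf {0}} & \tilde{{\mathbf {R}}}_{J} \end{pmatrix}
\]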
where \(\tilde{{\mathbf {F}}}_{1} = (\tilde{{\mathbf {F}}}_{K} \ \tilde{{\mathbf {F}}}_{J})\), and \(\tilde{{\mathbf {F}}}_{K}\tilde{{\mathbf {R}}}_{K}\) is the QR decomposition of \({\mathbf {K}}\). Note that this implies \(\tilde{{\mathbf {R}}}_{K} = \tilde{{\mathbf {F}}}_{K}^{\prime }{\mathbf {K}}\) and \(\tilde{{\mathbf {R}}}_{J} = \tilde{{\mathbf {F}}}_{J}^{\prime }{\mathbf {J}}_{{\varvec{\theta }}}\) are upper-triangular, \(\tilde{{\mathbf {R}}}_{KJ} = \tilde{{\mathbf {F}}}_{K}^{\prime }{\mathbf {J}}_{{\varvec{\theta }}}\), and \(\tilde{{\mathbf {F}}}_{J}\tilde{{\mathbf {R}}}_{J}\) is the QR decomposition of \(({\mathbf {I}}_n - \tilde{{\mathbf {F}}}_{K}\tilde{{\mathbf {F}}}_{K}^{\prime }){\mathbf {J}}_{{\varvec{\theta }}}\). Combining the results, we have \(\tilde{{\mathbf {F}}}_{J}\tilde{{\mathbf {F}}}_{J}^{\prime }{\mathbf {J}}_{{\varvec{\theta }}} = ({\mathbf {I}}_n - \tilde{{\mathbf {F}}}_{K}\tilde{{\mathbf {F}}}_{K}^{\prime }){\mathbf {J}}_{{\varvec{\theta }}}\), where \(\tilde{{\mathbf {F}}}_{K}\) does not depend on \({\varvec{\theta }}\). A sufficient condition to ensure that \(\tilde{{\mathbf {R}}}_{J} = \tilde{{\mathbf {F}}}_{J}^{\prime }{\mathbf {J}}_{{\varvec{\theta }}}\) is upper-triangular for any \({\varvec{\theta }}\) is obtained by requiring \(\tilde{{\mathbf {F}}}_{J}^{\prime }\tilde{{\mathbf {F}}}_{{J}_k}\) to be upper-triangular for all k, where \(\tilde{{\mathbf {F}}}_{{\!J}_k}\tilde{{\mathbf {R}}}_{{J}_k} = {\mathbf {J}}_{k}\) is the QR decomposition of \({\mathbf {J}}_k\), and \({\mathbf {J}}_{{\varvec{\theta }}} = \sum _{k=1}^{s} \theta _k {\mathbf {J}}_{k}\). 
For \(k>1\) this sufficient condition imposes more constraints than are necessary, so we recommend setting \(\theta _k = 1 \forall k\) and requiring \(\tilde{{\mathbf {F}}}_{J}^{\prime }\tilde{{\mathbf {F}}}_{J_\bullet }\) to be upper-triangular, where \(\tilde{{\mathbf {F}}}_{J_\bullet }\tilde{{\mathbf {R}}}_{J_\bullet }\) is the QR decomposition of \({\mathbf {J}}_{\bullet } = \sum _{k=1}^{s} {\mathbf {J}}_{k}\). Given that the \({\mathbf {J}}_{k}\) matrices span different (orthogonal) contrast spaces, this aggregate requirement should produce a reasonable approximation to the sufficient condition.
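The block QR structure described above can be checked numerically. The following is a minimal NumPy sketch, not the paper's implementation: the matrices `K` and `J` are random stand-ins (the sizes `n`, `p`, `q` are hypothetical), with `J` playing the role of \({\mathbf {J}}_{\bullet }\), i.e., \({\mathbf {J}}_{{\varvec{\theta }}}\) with all \(\theta _k = 1\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 2, 4                 # hypothetical sizes: rows, columns of K, columns of J
K = rng.standard_normal((n, p))    # stand-in for the matrix K
J = rng.standard_normal((n, q))    # stand-in for J_theta with every theta_k = 1

# QR decomposition of K: K = F_K R_K
F_K, R_K = np.linalg.qr(K)

# Off-diagonal block, then QR of the part of J orthogonal to K:
# (I - F_K F_K') J = F_J R_J
R_KJ = F_K.T @ J
F_J, R_J = np.linalg.qr(J - F_K @ R_KJ)

# Assemble X = (K, J) = F_1 R with F_1 = (F_K F_J) and block upper-triangular R
F_1 = np.hstack([F_K, F_J])
R = np.block([[R_K, R_KJ], [np.zeros((q, p)), R_J]])
X = np.hstack([K, J])
```

Because \(\tilde{{\mathbf {F}}}_{K}\) is computed from \({\mathbf {K}}\) alone, only the second QR step would need to be repeated if \({\varvec{\theta }}\) changed, which is the situation the initialization \(\theta _k = 1\) is meant to avoid.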
1.2 REML expected information matrix
We first present the results of Gilmour et al. (1995) for the gradient of the REML log-likelihood in the VC model. First, note that \({\mathbf {H}}_{k}={\mathbf {Z}}_{k}{\mathbf {Z}}_{k}^{\prime }\) and note that we can write \({\mathbf {Z}}_{k}={\mathbf {W}}{\mathbf {S}}_{k}\) where \({\mathbf {W}}=({\mathbf {X}}_{{\varvec{\theta }}},{\mathbf {Z}})\). Also note that we can write \({\mathbf {C}}={\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}{\mathbf {W}} +{\varvec{\varSigma }}_{0}\) where \({\varvec{\varSigma }}_{0}\) is \({\varvec{\varSigma }}^{-1}\) augmented with zeros corresponding to \({\mathbf {X}}_{{\varvec{\theta }}}\) in \({\mathbf {W}}\). The needed trace \(\alpha = {\mathrm {tr}}({\mathbf {H}}_{k}{\mathbf {P}})\) has the form
The last piece for the gradient calculation derives from the fact that \({\mathbf {Z}}^{\prime }{\mathbf {P}}{\mathbf {y}}={\varvec{\varSigma }}^{-1}\hat{{\mathbf {b}}}\).
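As a worked sketch of the trace \(\alpha \) (not a reproduction of the original display), the definitions above can be combined with the standard mixed-model identity \({\mathbf {P}}={\varvec{\varPsi }}^{-1}-{\varvec{\varPsi }}^{-1}{\mathbf {W}}{\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}\), which is assumed here, and with \({\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}{\mathbf {W}} = {\mathbf {C}}-{\varvec{\varSigma }}_{0}\). The final equality additionally assumes that \({\varvec{\varSigma }}\) is block diagonal with blocks \(\sigma _k^2{\mathbf {I}}_{q_k}\); the symbols \(\sigma _k^2\) and \(q_k\) (the number of columns of \({\mathbf {Z}}_{k}\)) are notation introduced for this illustration, and \({\mathbf {C}}_{kk}^{-1}\) denotes the block of \({\mathbf {C}}^{-1}\) corresponding to \({\mathbf {Z}}_{k}^{\prime }{\mathbf {Z}}_{k}\).

\[
\begin{aligned}
\alpha = {\mathrm {tr}}({\mathbf {H}}_{k}{\mathbf {P}}) &= {\mathrm {tr}}({\mathbf {S}}_{k}^{\prime }{\mathbf {W}}^{\prime }{\mathbf {P}}{\mathbf {W}}{\mathbf {S}}_{k}) \\
&= {\mathrm {tr}}\{{\mathbf {S}}_{k}^{\prime }[({\mathbf {C}}-{\varvec{\varSigma }}_{0}) - ({\mathbf {C}}-{\varvec{\varSigma }}_{0}){\mathbf {C}}^{-1}({\mathbf {C}}-{\varvec{\varSigma }}_{0})]{\mathbf {S}}_{k}\} \\
&= {\mathrm {tr}}({\mathbf {S}}_{k}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}) - {\mathrm {tr}}({\mathbf {S}}_{k}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {C}}^{-1}{\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}) \\
&= \frac{q_k}{\sigma _k^2} - \frac{{\mathrm {tr}}({\mathbf {C}}_{kk}^{-1})}{\sigma _k^4}
\end{aligned}
\]

Under these assumptions only a \(q_k \times q_k\) block of \({\mathbf {C}}^{-1}\) is needed, so no \(n \times n\) matrix has to be formed.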
Now, note that the trace calculation \(\beta = {\mathrm {tr}}({\mathbf {H}}_{j}{\mathbf {P}}{\mathbf {H}}_{k}{\mathbf {P}})\) needed for the information matrix has the form
where \(I_{\{j=k\}}\) is the indicator function (note \({\mathbf {S}}_{j}^{\prime }{\varvec{\varSigma }}_{0}{\mathbf {S}}_{k}={\mathbf {0}}\) whenever \(j\ne k\)), and \({\mathbf {C}}_{jk}^{-1}\) denotes the portion of \({\mathbf {C}}^{-1}\) that corresponds to \({\mathbf {Z}}_{j}^{\prime }{\mathbf {Z}}_{k}\).
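The two trace calculations can be checked against a direct \(n \times n\) computation. The sketch below is illustrative only: it assumes \({\varvec{\varPsi }}={\mathbf {I}}_n\), two variance components with hypothetical values `sig2`, and random design matrices. It relies on the relation \({\mathbf {Z}}_{j}^{\prime }{\mathbf {P}}{\mathbf {Z}}_{k} = {\mathbf {S}}_{j}^{\prime }({\varvec{\varSigma }}_{0}-{\varvec{\varSigma }}_{0}{\mathbf {C}}^{-1}{\varvec{\varSigma }}_{0}){\mathbf {S}}_{k}\), which follows from the standard identity \({\mathbf {P}}={\varvec{\varPsi }}^{-1}-{\varvec{\varPsi }}^{-1}{\mathbf {W}}{\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 40, 2, (3, 4)                  # hypothetical sizes
sig2 = (0.5, 2.0)                        # hypothetical variance components
X = rng.standard_normal((n, p))
Zs = [rng.standard_normal((n, qk)) for qk in q]
Z = np.hstack(Zs)
Psi = np.eye(n)                          # known error covariance (identity assumed)

# Direct n x n computation of P, used here only as a reference
Sigma = np.diag(np.repeat(sig2, q))
H = Psi + Z @ Sigma @ Z.T
Hi = np.linalg.inv(H)
P = Hi - Hi @ X @ np.linalg.solve(X.T @ Hi @ X, X.T @ Hi)

# Mixed-model-equation route: C = W' Psi^{-1} W + Sigma_0
W = np.hstack([X, Z])
Sigma0 = np.zeros((p + sum(q), p + sum(q)))
Sigma0[p:, p:] = np.linalg.inv(Sigma)
C = W.T @ np.linalg.solve(Psi, W) + Sigma0
Cinv = np.linalg.inv(C)

# Column positions of each Z_k inside W
starts = p + np.concatenate([[0], np.cumsum(q)[:-1]])
idx = [np.arange(s, s + qk) for s, qk in zip(starts, q)]

def zpz(j, k):
    """Z_j' P Z_k computed from a block of C^{-1} only."""
    M = -Cinv[np.ix_(idx[j], idx[k])] / (sig2[j] * sig2[k])
    if j == k:                           # indicator term I_{j=k}
        M = M + np.eye(q[k]) / sig2[k]
    return M

# alpha_k = tr(H_k P); beta_jk = tr(H_j P H_k P) = squared Frobenius norm of Z_j' P Z_k
alpha = [np.trace(zpz(k, k)) for k in range(2)]
beta = [[np.sum(zpz(j, k) ** 2) for k in range(2)] for j in range(2)]
```

The indicator term only contributes when \(j = k\), mirroring the role of \(I_{\{j=k\}}\) in the text.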
1.3 REML algorithms
In the boxed algorithm, \(\epsilon \) is the convergence tolerance, and \(\omega \) is the maximum number of allowed iterations. In practice, we set \(\epsilon =10^{-4}\) and \(\omega =500\) as defaults. The only difference between the REML–FS and REML–EM algorithms is Step 1 of the iterative procedure, so we combine the two algorithms into one framework below, with two alternative versions of Step 1: either the FS update or the EM update is used. Convergence is determined using the relative change in \(\tilde{{\mathcal {L}}} = -2{\mathcal {L}}\), where \({\mathcal {L}}\) is the REML log-likelihood.
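To make the iteration and stopping rule concrete, here is a minimal sketch of the EM variant only (the FS update is not shown). It is not the boxed algorithm itself: it assumes \({\varvec{\varPsi }}={\mathbf {I}}_n\) is known, a block-scalar \({\varvec{\varSigma }}\) with hypothetical components `sig2`, starting values of 1, and the textbook REML–EM update \(\sigma _k^2 \leftarrow \{\hat{{\mathbf {b}}}_k^{\prime }\hat{{\mathbf {b}}}_k + {\mathrm {tr}}({\mathbf {C}}_{kk}^{-1})\}/q_k\). The function name `reml_em` and the simulated data are illustrative, and the additive constant of \(\tilde{{\mathcal {L}}}\) is omitted.

```python
import numpy as np

def reml_em(X, Zs, y, eps=1e-4, omega=500):
    """REML-EM sketch for variance components, assuming Psi = I is known."""
    p = X.shape[1]
    q = np.array([Zk.shape[1] for Zk in Zs])
    W = np.hstack([X] + list(Zs))
    WtW, Wty, yty = W.T @ W, W.T @ y, y @ y       # crossproducts, formed once
    edges = p + np.concatenate([[0], np.cumsum(q)])
    idx = [np.arange(edges[k], edges[k + 1]) for k in range(len(q))]
    sig2 = np.ones(len(q))                         # starting values (assumed)

    def solve(sig2):
        # C = W'W + Sigma_0, coefficient estimates, and -2 * REML log-likelihood
        C = WtW.copy()
        for k, ix in enumerate(idx):
            C[ix, ix] += 1.0 / sig2[k]             # adds to the diagonal entries
        Cinv = np.linalg.inv(C)
        coef = Cinv @ Wty
        neg2L = (np.linalg.slogdet(C)[1] + np.sum(q * np.log(sig2))
                 + yty - Wty @ coef)               # additive constant omitted
        return Cinv, coef, neg2L

    Cinv, coef, old = solve(sig2)
    history = [old]
    for _ in range(omega):
        # Step 1 (EM update) for each variance component
        sig2 = np.array([(coef[ix] @ coef[ix] + np.trace(Cinv[np.ix_(ix, ix)])) / len(ix)
                         for ix in idx])
        Cinv, coef, new = solve(sig2)
        history.append(new)
        if abs(new - old) / abs(old) < eps:        # relative change in -2L
            break
        old = new
    return sig2, history

# Usage on simulated data with one grouping factor (20 levels)
rng = np.random.default_rng(2)
n = 400
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Z = np.eye(20)[rng.integers(0, 20, n)]
y = X @ np.array([1.0, -0.5]) + Z @ (1.5 * rng.standard_normal(20)) + rng.standard_normal(n)
sig2_hat, history = reml_em(X, [Z], y)
```

All quantities inside the loop are built from crossproducts computed once, so under these assumptions the per-iteration cost does not grow with \(n\).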
1.4 Efficient crossproduct calculation
For the mixed-effects model, the numerator of the GCV score can be written as
where \(\tilde{{\mathbf {y}}}_{{\varvec{\theta }}} =\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}^{\prime }\tilde{{\mathbf {y}}}\), and \(\tilde{{\mathbf {z}}}={\mathbf {V}}{\mathbf {E}}^{-1}{\mathbf {V}}^{\prime } \tilde{{\mathbf {y}}}_{{\varvec{\theta }}}\) with \({\mathbf {V}}{\mathbf {E}}{\mathbf {V}}^{\prime }\) denoting the full-rank spectral decomposition of \(\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}^{\prime } \tilde{{\mathbf {X}}}_{{\varvec{\theta }}} + n\lambda \tilde{{\mathbf {Q}}}_{{\varvec{\theta }}}\). Furthermore, \({\mathrm {tr}}({\mathbf {S}}_{{\varvec{\tau }}{\varvec{\lambda }}}) ={\mathrm {tr}}({\mathbf {V}}{\mathbf {E}}^{-1}{\mathbf {V}}^{\prime }\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}^{\prime } \tilde{{\mathbf {X}}}_{{\varvec{\theta }}})\), and the inverse of \(\hat{{\varvec{\varSigma }}}_{*}\) has the form
which uses Eq. (17) of Henderson and Searle (1981). As a result, the pieces of \(\tilde{{\mathbf {y}}}_{{\varvec{\theta }}}\) and \(\tilde{{\mathbf {X}}}_{{\varvec{\theta }}}^{\prime } \tilde{{\mathbf {X}}}_{{\varvec{\theta }}}\) can be formed from the crossproduct vectors and matrices that were calculated for the REML estimation of the variance components. Thus, it is never necessary to form the \(n \times n\) matrix \(\hat{{\varvec{\varSigma }}}_{*}^{-1}\) to evaluate the GCV score.
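The crossproduct idea can be illustrated with a small NumPy sketch. It is a sketch under stated assumptions rather than the paper's formulas: it takes \(\hat{{\varvec{\varSigma }}}_{*} = {\mathbf {I}}_n + {\mathbf {Z}}{\varvec{\varSigma }}{\mathbf {Z}}^{\prime }\), uses random stand-ins for the design, penalty, and covariance matrices, and reports the score in the usual ratio form \((\mathrm {RSS}/n)/\{1-{\mathrm {tr}}({\mathbf {S}})/n\}^2\). The inverse is applied through the Woodbury-type identity \(({\mathbf {I}}+{\mathbf {Z}}{\varvec{\varSigma }}{\mathbf {Z}}^{\prime })^{-1} = {\mathbf {I}} - {\mathbf {Z}}({\varvec{\varSigma }}^{-1}+{\mathbf {Z}}^{\prime }{\mathbf {Z}})^{-1}{\mathbf {Z}}^{\prime }\).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q, lam = 200, 5, 6, 0.01                  # hypothetical sizes and smoothing parameter
X = rng.standard_normal((n, p))                 # stand-in for X_theta
Z = rng.standard_normal((n, q))
y = rng.standard_normal(n)
Sigma = np.diag(rng.uniform(0.5, 2.0, q))       # stand-in random-effects covariance
A = rng.standard_normal((p, p))
Q = A @ A.T                                     # stand-in penalty matrix

# Crossproducts: the only quantities that touch all n rows
XtX, XtZ, Xty = X.T @ X, X.T @ Z, X.T @ y
ZtZ, Zty, yty = Z.T @ Z, Z.T @ y, y @ y

# Apply (I + Z Sigma Z')^{-1} through the q x q matrix D = Sigma^{-1} + Z'Z
D = np.linalg.inv(Sigma) + ZtZ
tXtX = XtX - XtZ @ np.linalg.solve(D, XtZ.T)    # X' Sigma_*^{-1} X
ty_theta = Xty - XtZ @ np.linalg.solve(D, Zty)  # X' Sigma_*^{-1} y
tyty = yty - Zty @ np.linalg.solve(D, Zty)      # y' Sigma_*^{-1} y

# Spectral decomposition V E V' of the penalized crossproduct matrix
E, V = np.linalg.eigh(tXtX + n * lam * Q)
z = V @ ((V.T @ ty_theta) / E)                  # V E^{-1} V' y_theta
rss = tyty - 2 * ty_theta @ z + z @ tXtX @ z    # residual sum of squares
trS = np.trace((V / E) @ V.T @ tXtX)            # tr(V E^{-1} V' X'X)
gcv = (rss / n) / (1 - trS / n) ** 2
```

Every solve involves a matrix of order \(p\) or \(q\), so the \(n \times n\) inverse never appears.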
1.5 Extensions for unknown error covariance matrices
Suppose that \({\varvec{\varPsi }}\) is non-diagonal and depends on the unknown parameters \({\varvec{\nu }} = (\nu _{1},\ldots ,\nu _m)\). Letting \({\varvec{\varPsi }}_k = \partial {\mathbf {H}} / \partial \nu _k\) denote the derivative of \({\mathbf {H}} = {\varvec{\varPsi }} + {\mathbf {Z}}{\varvec{\varSigma }}{\mathbf {Z}}^{\prime }\) with respect to \(\nu _k\), analogues of the formulas in Eqs. (13) and (14) can be applied to obtain the gradient and expected information matrix corresponding to the variance parameters \({\varvec{\nu }}\). In this case, the needed trace calculation \(\alpha = {\mathrm {tr}}({\varvec{\varPsi }}_k {\mathbf {P}})\) has the form
where \({\mathbf {C}}_k = {\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1} {\varvec{\varPsi }}_{k} {\varvec{\varPsi }}^{-1} {\mathbf {W}}\). Similarly, the trace calculation \(\beta = {\mathrm {tr}}({\varvec{\varPsi }}_j{\mathbf {P}}{\varvec{\varPsi }}_k{\mathbf {P}})\) needed for the information matrix has the form
where \({\mathbf {C}}_j = {\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1} {\varvec{\varPsi }}_{j} {\varvec{\varPsi }}^{-1} {\mathbf {W}}\). Consequently, when \({\varvec{\varPsi }}\) depends on the unknown parameters \({\varvec{\nu }}\), the efficiency of the estimation will depend on the form of \({\varvec{\varPsi }}_k\), which will depend on the assumed structure of \({\varvec{\varPsi }}\).
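As a worked sketch (not a reproduction of the original displays), both traces can be derived from the definitions of \({\mathbf {C}}\), \({\mathbf {C}}_j\), and \({\mathbf {C}}_k\), assuming the standard identity \({\mathbf {P}}={\varvec{\varPsi }}^{-1}-{\varvec{\varPsi }}^{-1}{\mathbf {W}}{\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}\) and symmetric \({\varvec{\varPsi }}_j\) and \({\varvec{\varPsi }}_k\). Expanding the products and using the cyclic property of the trace gives

\[
\begin{aligned}
\alpha &= {\mathrm {tr}}({\varvec{\varPsi }}^{-1}{\varvec{\varPsi }}_{k}) - {\mathrm {tr}}({\mathbf {C}}^{-1}{\mathbf {C}}_{k}), \\
\beta &= {\mathrm {tr}}({\varvec{\varPsi }}^{-1}{\varvec{\varPsi }}_{j}{\varvec{\varPsi }}^{-1}{\varvec{\varPsi }}_{k}) - 2\,{\mathrm {tr}}({\mathbf {C}}^{-1}{\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}{\varvec{\varPsi }}_{j}{\varvec{\varPsi }}^{-1}{\varvec{\varPsi }}_{k}{\varvec{\varPsi }}^{-1}{\mathbf {W}}) + {\mathrm {tr}}({\mathbf {C}}^{-1}{\mathbf {C}}_{j}{\mathbf {C}}^{-1}{\mathbf {C}}_{k}).
\end{aligned}
\]

The leading terms involve only \({\varvec{\varPsi }}\) and its derivatives, which is why the cost depends on the assumed structure of \({\varvec{\varPsi }}\).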
Unlike the REML–VC algorithm presented for known \({\varvec{\varPsi }}\), the computational cost for unknown \({\varvec{\varPsi }}\) will generally depend on the sample size n. This is because the \({\mathbf {C}}={\mathbf {W}}^{\prime }{\varvec{\varPsi }}^{-1}{\mathbf {W}} +{\varvec{\varSigma }}_{0}\) matrix will have to be iteratively updated, i.e., recalculated for each new \({\varvec{\varPsi }}\). In this case, it may be possible to apply the rounding parameter approximation proposed by Helwig and Ma (in press) to reduce the data to \(u \ll n\) unique data points, which would reduce the computational burden involved with the iterative formation of \({\mathbf {C}}\). However, for simple parameterizations of \({\varvec{\varPsi }}\) (e.g., first-order autoregressive), it may be more computationally efficient to perform a grid search, i.e., fix the parameters in \({\varvec{\nu }}\) and use the REML–VC algorithm for known \({\varvec{\varPsi }}\). This sort of approach would be easily parallelizable, and (assuming the number of \(\nu _{k}\) parameters is small) could be much more efficient than estimating the \(\nu _{k}\) parameters using a REML approach.
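The grid-search idea can be sketched as follows for a first-order autoregressive \({\varvec{\varPsi }}\) with a single parameter \(\rho \). This is a simplified illustration, not the proposed procedure: to keep the example short, \({\varvec{\varSigma }}\) is held fixed at a hypothetical value instead of being re-estimated by REML–VC at each grid point, and the criterion is evaluated with dense \(n \times n\) algebra. The helper names `ar1` and `neg2_reml` and the simulated data are assumptions made for the example.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def ar1(n, rho):
    """AR(1) correlation matrix with entries rho^|i-j|."""
    i = np.arange(n)
    return rho ** np.abs(i[:, None] - i[None, :])

def neg2_reml(rho, X, Z, Sigma, y):
    """-2 * REML log-likelihood (up to a constant) for fixed rho and Sigma."""
    H = ar1(len(y), rho) + Z @ Sigma @ Z.T
    HiX, Hiy = np.linalg.solve(H, X), np.linalg.solve(H, y)
    XtHiX = X.T @ HiX
    XtHiy = X.T @ Hiy
    yPy = y @ Hiy - XtHiy @ np.linalg.solve(XtHiX, XtHiy)
    return np.linalg.slogdet(H)[1] + np.linalg.slogdet(XtHiX)[1] + yPy

# Simulated data with AR(1) errors (true rho = 0.6) and a random intercept
rng = np.random.default_rng(4)
n, q = 120, 6
X = np.column_stack([np.ones(n), np.linspace(0, 1, n)])
Z = np.eye(q)[rng.integers(0, q, n)]
Sigma = 0.5 * np.eye(q)                        # held fixed here for brevity
e = np.linalg.cholesky(ar1(n, 0.6)) @ rng.standard_normal(n)
y = X @ np.array([1.0, 2.0]) + Z @ (np.sqrt(0.5) * rng.standard_normal(q)) + e

# Each grid point is independent, so the evaluations can run in parallel
grid = np.linspace(0.0, 0.9, 10)
with ThreadPoolExecutor() as pool:
    crit = list(pool.map(lambda r: neg2_reml(r, X, Z, Sigma, y), grid))
rho_hat = grid[int(np.argmin(crit))]
```

In a fuller version, the body of `neg2_reml` would be replaced by a REML–VC fit at the given \(\rho \), returning the criterion at the fitted variance components; the outer loop over the grid would stay the same.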
Cite this article
Helwig, N.E. Efficient estimation of variance components in nonparametric mixed-effects models with large samples. Stat Comput 26, 1319–1336 (2016). https://doi.org/10.1007/s11222-015-9610-5
DOI: https://doi.org/10.1007/s11222-015-9610-5