On the estimation of variance parameters in non-standard generalised linear mixed models: application to penalised smoothing


We present a novel method for the estimation of variance parameters in generalised linear mixed models. The method has its roots in Harville (J Am Stat Assoc 72(358):320–338, 1977)’s work, but it is able to deal with models that have a precision matrix for the random effect vector that is linear in the inverse of the variance parameters (i.e., the precision parameters). We call the method SOP (separation of overlapping precision matrices). SOP is based on applying the method of successive approximations to easy-to-compute estimate updates of the variance parameters. These estimate updates have an appealing form: they are the ratio of a (weighted) sum of squares to a quantity related to effective degrees of freedom. We provide the sufficient and necessary conditions for these estimates to be strictly positive. An important application field of SOP is penalised regression estimation of models where multiple quadratic penalties act on the same regression coefficients. We discuss in detail two of those models: penalised splines for locally adaptive smoothness and for hierarchical curve data. Several data examples in these settings are presented.

This is a preview of subscription content, log in to check access.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5


  1. Breslow, N.E., Clayton, D.G.: Approximate inference in generalized linear mixed models. J. Am. Stat. Assoc. 88(421), 9–25 (1993)

    MATH  Google Scholar 

  2. Camarda, C.G., Eilers, P.H., Gampe, J.: Sums of smooth exponentials to decompose complex series of counts. Stat. Model. 16(4), 279–296 (2016)

    MathSciNet  Article  Google Scholar 

  3. Crainiceanu, C.M., Ruppert, D., Carroll, R.J., Joshi, A., Goodner, B.: Spatially adaptive Bayesian penalized splines with heteroscedastic errors. J. Comput. Graph. Stat. 16(2), 265–288 (2007)

    MathSciNet  Article  Google Scholar 

  4. Crump, S.L.: The present status of variance component analysis. Biometrics 7(1), 1–16 (1951)

    MathSciNet  Article  Google Scholar 

  5. Cui, Y., Hodges, J.S., Kong, X., Carlin, B.P.: Partitioning degrees of freedom in hierarchical and other richly-parameterized models. Technometrics 52, 124–136 (2010)

    MathSciNet  Article  Google Scholar 

  6. Currie, I.D., Durban, M.: Flexible smoothing with P-splines: a unified approach. Stat. Model. 2(4), 333–349 (2002)

    MathSciNet  MATH  Article  Google Scholar 

  7. Currie, I.D., Durban, M., Eilers, P.H.C.: Generalized linear array models with applications to multidimensional smoothing. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 68(2), 259–280 (2006)

    MathSciNet  MATH  Article  Google Scholar 

  8. Davies, P.L., Gather, U., Meise, M., Mergel, D., Mildenberger, T.: Residual-based localization and quantification of peaks in X-ray diffractograms. Ann. Appl. Stat. 2(3), 861–886 (2008)

    MathSciNet  MATH  Article  Google Scholar 

  9. Davies, P.L., Gather, U., Meise, M. Mergel, D., Mildenberger, T., Bernholt, T., Hofmeister, T.: diffractometry: baseline identification and peak decomposition for x-ray diffractograms. R package version 0.1-10 (2018)

  10. Djeundje, V.A., Currie, I.D.: Appropriate covariance-specification via penalties for penalized splines in mixed models for longitudinal data. Electron. J. Stat. 4, 1202–1224 (2010)

    MathSciNet  MATH  Article  Google Scholar 

  11. Durban, M., Aguilera-Morillo, M.C.: On the estimation of functional random effects. Stat. Model. 17(1–2), 50–58 (2017)

    MathSciNet  Article  Google Scholar 

  12. Durban, M., Harezlak, J., Wand, M.P., Carroll, R.J.: Simple fitting of subject-specific curves for longitudinal data. Stat. Med. 24(8), 1153–1167 (2005)

    MathSciNet  Article  Google Scholar 

  13. Eilers, P.H.C.: Discussion of Verbyla et al. J. R. Stat. Soc. Ser. C (Appl. Stat.) 48, 300–311 (1999)

    Google Scholar 

  14. Eilers, P.H.C., Marx, B.D.: Flexible smoothing with B-splines and penalties. Stat. Sci. 11(2), 89–121 (1996)

    MathSciNet  MATH  Article  Google Scholar 

  15. Engel, B.: The analysis of unbalanced linear models with variance components. Stat. Neerl. 44, 195–219 (1990)

    MathSciNet  Article  Google Scholar 

  16. Engel, B., Buist, W.: Analysis of a generalized linear mixed model: a case study and simulation results. Biom. J. 38(1), 61–80 (1996)

    MATH  Article  Google Scholar 

  17. Engel, B., Keen, A.: A simple approach for the analysis of generalizea linear mixed models. Stat. Neerl. 48(1), 1–22 (1994)

    MATH  Article  Google Scholar 

  18. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)

    MathSciNet  MATH  Article  Google Scholar 

  19. Gilmour, A.R., Thompson, R., Cullis, B.R.: Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51(4), 1440–1450 (1995)

    MATH  Article  Google Scholar 

  20. Goldsmith, J., Bobb, J., Crainiceanu, C.M., Caffo, B., Reich, D.: Penalized functional regression. J. Comput. Graph. Stat. 20(4), 830–851 (2011)

    MathSciNet  Article  Google Scholar 

  21. Goldsmith, J., Crainiceanu, C.M., Caffo, B., Reich, D.: Longitudinal penalized functional regression for cognitive outcomes on neuronal tract measurements. J. R. Stat. Soc. Ser. C (Appl. Stat.) 61(3), 453–469 (2012)

    MathSciNet  Article  Google Scholar 

  22. Goldsmith, J., Scheipl, F., Huang, L., Wrobel, J., Gellar, J., Harezlak, J., McLean, M.W., Swihart, B., Xiao, L., Crainiceanu, C., Reiss, P.T.: refund: Regression with Functional Data. R package version 0.1-16 (2016)

  23. Graser, H.-U., Smith, S.P., Tier, B.: A derivative-free approach for estimating variance components in animal models by restricted maximum likelihood. J. Anim. Sci. 2(64), 1362–1373 (1987)

    Article  Google Scholar 

  24. Green, P.J.: Penalized likelihood for general semi-parametric regression models. Int. Stat. Rev./Revue Internationale de Statistique 55(3), 245–259 (1987)

    MathSciNet  MATH  Google Scholar 

  25. Greven, S., Scheipl, F.: A general framework for functional regression modelling. Stat. Model. 17(1–2), 1–35 (2017)

    MathSciNet  Article  Google Scholar 

  26. Groll, A., Tutz, G.: Variable selection for generalized linear mixed models by L1-penalized estimation. Stat. Comput. 24(2), 137–154 (2014)

    MathSciNet  MATH  Article  Google Scholar 

  27. Harville, D.A.: Maximum likelihood approaches to variance component estimation and to related problems. J. Am. Stat. Assoc. 72(358), 320–338 (1977)

    MathSciNet  MATH  Article  Google Scholar 

  28. Harville, D.A.: Matrix Algebra from a Statistician’s Perspective. Springer, Berlin (1997)

    Google Scholar 

  29. Hastie, T.J., Tibshirani, R.J.: Generalized Additive Models. Chapman & Hall, London (1990)

    Google Scholar 

  30. Heckman, N., Lockhart, R., Nielsen, J.D.: Penalized regression, mixed effects models and appropriate modelling. Electron. J. Stat. 7, 1517–1552 (2013)

    MathSciNet  MATH  Article  Google Scholar 

  31. Henderson, C.R.: Selection index and expected genetic advance. Stat. Genet. Plant Breed. 982, 141–163 (1963)

    Google Scholar 

  32. Hunter, D.R., Li, R.: Variable selection using MM algorithms. Ann. Stat. 33(4), 1617–1642 (2005)

    MathSciNet  MATH  Article  Google Scholar 

  33. Johnson, D.L., Thompson, R.: Restricted maximum likelihood estimation of variance components for univariate animal models using sparse matrix techniques and average information. J. Dairy Sci. 78, 449–456 (1995)

    Article  Google Scholar 

  34. Karas, M., Brzyski, D., Dzemidzic, M., Goñi, J., Kareken, D.A., Randolph, T.W., Harezlak, J.: Brain connectivity-informed regularization methods for regression. Stat. Biosci. (2017). https://doi.org/10.1007/s12561-017-9208-x

  35. Krivobokova, T.: Smoothing parameter selection in two frameworks for penalized splines. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 75(4), 725–741. https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/rssb.12010 (2009)

  36. Krivobokova, T., Crainiceanu, C.M., Kauermann, G.: Fast adaptive penalized splines. J. Comput. Graph. Stat. 17(1), 1–20 (2008)

    MathSciNet  Article  Google Scholar 

  37. Lee, D.-J.: Smoothing mixed model for spatial and spatio-temporal data. PhD thesis. Department of Statistics, Universidad Carlos III de Madrid, Spain (2010)

  38. McCullagh, P., Nelder, J.: Generalized Linear Models. Chapman and Hall/CRC Monographs on Statistics and Applied Probability Series, 2nd edn. Chapman & Hall, London (1989)

    Google Scholar 

  39. Patterson, H.D., Thompson, R.: Recovery of inter-block information when block sizes are unequal. Biometrika 58(3), 545–554 (1971)

    MathSciNet  MATH  Article  Google Scholar 

  40. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2018)

  41. Reiss, P .T., Ogden, R .T.: Smoothing parameter selection for a class of semiparametric linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 71(2), 505–523 (2009)

    MathSciNet  MATH  Article  Google Scholar 

  42. Rodríguez-Álvarez, M.X., Durban, M., Lee, D.-J., Eilers, P.H.C.: Fast estimation of multidimensional adaptive p-spline models. In: Friedl, H., Wagner, H. (eds.) Proceedings of the 30th International Workshop on Statistical Modelling, pp 330 – 335. arXiv:1610.06861 (2015a)

  43. Rodríguez-Álvarez, M.X., Lee, D.-J., Kneib, T., Durban, M., Eilers, P.H.C.: Fast smoothing parameter separation in multidimensional generalized P-splines: the sap algorithm. Stat. Comput. 25, 941–957 (2015b)

    MathSciNet  MATH  Article  Google Scholar 

  44. Rodríguez-Álvarez, M.X., Durban, M., Lee, D.-J., Eilers, P.H.C., Gonzalez, F.: Spatio-temporal adaptive penalized splines with application to neuroscience. In: Dupuy, J.-F., Josse, J. (eds.) Proceedings of the 31th International Workshop on Statistical Modelling, pp. 267–272. arXiv:1610.06860 (2016)

  45. Rodríguez-Álvarez, M.X., Boer, M.P., van Eeuwijk, F.A., Eilers, P.H.: Correcting for spatial heterogeneity in plant breeding experiments with p-splines. Spat. Stat. 23, 52–71 (2018)

    MathSciNet  Article  Google Scholar 

  46. Ruppert, D., Carroll, R.J.: Spatially-adaptive penalties for spline fitting. Aust. N. Z. J. Stat. 42(2), 205–223 (2000)

    Article  Google Scholar 

  47. Ruppert, D., Wand, M.P., Carroll, R.: Semiparametric Regression. Cambridge University Press, Cambridge (2003)

    Google Scholar 

  48. Schall, R.: Estimation in generalized linear models with random effects. Biometrika 78(4), 719–727 (1991)

    MathSciNet  MATH  Article  Google Scholar 

  49. Simpkin, A., Newell, J.: An additive penalty p-spline approach to derivative estimation. Comput. Stat. Data Anal. 68, 30–43 (2013)

    MathSciNet  MATH  Article  Google Scholar 

  50. Smith S.P.: Estimation of genetic parameters in non-linear models. In: Gianola, D., Hammond, K. (eds.) Advances in Statistical Methods for Genetic Improvement of Livestock. Advanced Series in Agricultural Sciences, vol. 18. Springer, Berlin, Heidelberg (1990)

  51. Taylor, J.D., Verbyla, A.P., Cavanagh, C., Newberry, M.: Variable selection in linear mixed models using an extended class of penalties. Aust. N. Z. J. Stat. 54(4), 427–449 (2012)

    MathSciNet  MATH  Article  Google Scholar 

  52. Tibshirani, R.J.: Adaptive piecewise polynomial estimation via trend filtering. Ann. Stat. 42(1), 285–323 (2014)

    MathSciNet  MATH  Article  Google Scholar 

  53. Wand, M.P.: Smoothing and mixed models. Comput. Stat. 18(2), 223–249 (2003)

    MATH  Article  Google Scholar 

  54. Wood, S.N.: Fast stable direct fitting and smoothness selection for generalized additive models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70(3), 495–518 (2008)

    MathSciNet  MATH  Article  Google Scholar 

  55. Wood, S .N.: Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 73(1), 2–36 (2011)

    MathSciNet  MATH  Article  Google Scholar 

  56. Wood, S.N.: Generalized Additive Models: An Introduction with R, 2nd edn. Chapman & Hall CRC, London (2017)

    Google Scholar 

  57. Wood, S.N., Fasiolo, M.: A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models. Biometrics 73, 1071–1081 (2017)

    MathSciNet  MATH  Article  Google Scholar 

  58. Wood, S.N., Pya, N., Säfken, B.: Smoothing parameter and model selection for general smooth models. J. Am. Stat. Assoc. 111(516), 1548–1563 (2016)

    MathSciNet  Article  Google Scholar 

  59. Zou, H., Li, R.: One-step sparse estimates in nonconcave penalized likelihood models. Ann. Stat. 36(4), 1509–1533 (2008)

    MathSciNet  MATH  Article  Google Scholar 

Download references


This research was supported by the Basque Government through the BERC 2018-2021 program and by Spanish Ministry of Economy and Competitiveness MINECO through BCAM Severo Ochoa excellence accreditation SEV-2013-0323 and through projects MTM2017-82379-R funded by (AEI/FEDER, UE) and acronym “AFTERAM”, MTM2014-52184-P and MTM2014-55966-P. The MRI/DTI data were collected at Johns Hopkins University and the Kennedy-Krieger Institute. We are grateful to Pedro Caro and Iain Currie for useful discussions, to Martin Boer and Cajo ter Braak for the detailed reading of the paper and their many suggestions, and to Bas Engel for sharing with us his knowledge. We are also grateful to the two peer referees for their constructive comments of the paper.

Author information



Corresponding author

Correspondence to María Xosé Rodríguez-Álvarez.


A Proof of Theorem 1


We first note that the first-order partial derivatives of the (approximate) REML log-likelihood function can be expressed as (see, e.g., Rodríguez-Álvarez et al. 2015b)

$$\begin{aligned} \frac{\partial {l}}{\partial {\sigma _{k_l}^2}} = - \frac{1}{2}\hbox {trace}\left( {\varvec{Z}}^{\top }{\varvec{P}}{\varvec{Z}}\frac{\partial {{\varvec{G}}}}{\partial {\sigma _{k_l}^2}}\right) + \frac{1}{2}\widehat{\varvec{\alpha }}^{\top }{\varvec{G}}^{-1}\frac{\partial {{\varvec{G}}}}{\partial {\sigma _{k_l}^2}}{\varvec{G}}^{-1}\widehat{\varvec{\alpha }}. \end{aligned}$$

Given that \({\varvec{G}}\) is a positive definite matrix, we have the identity

$$\begin{aligned} \frac{\partial {{\varvec{G}}}}{\partial {\sigma _{k_l}^2}} = - {\varvec{G}}\frac{\partial {{\varvec{G}}^{-1}}}{\partial {\sigma _{k_l}^2}}{\varvec{G}}, \end{aligned}$$

and thus

$$\begin{aligned} \frac{\partial {{\varvec{G}}}}{\partial {\sigma _{k_l}^2}} = \frac{1}{\sigma ^{4}_{k_l}}\hbox {diag}\left( {\varvec{0}}^{(1)},{\varvec{G}}_k\varvec{\Lambda }_{k_l}{\varvec{G}}_k,{\varvec{0}}^{(2)}\right) , \end{aligned}$$

where \({\varvec{0}}^{(1)}\) and \({\varvec{0}}^{(2)}\) are zero square matrices of appropriate dimensions.

The first-order partial derivatives of the REML log-likelihood function are then expressed as

$$\begin{aligned} 2\frac{\partial {l}}{\partial {\sigma _{k_l}^2}} = - \frac{1}{\sigma _{k_l}^4}\hbox {trace}\left( {\varvec{Z}}_k^{\top }{\varvec{P}}{\varvec{Z}}_k{\varvec{G}}_k\varvec{\Lambda }_{k_l}{\varvec{G}}_k\right) + \frac{1}{\sigma _{k_l}^4}\widehat{\varvec{\alpha }}_k^{\top }\varvec{\Lambda }_{k_l}\widehat{\varvec{\alpha }}_k. \end{aligned}$$

When the REML estimates are positive, they are obtained by equating the former expression to zero (see, e.g., Engel 1990)

$$\begin{aligned} \frac{\widehat{\varvec{\alpha }}^{\top }_k\varvec{\Lambda }_{k_l}\widehat{\varvec{\alpha }}_k}{\hbox {trace}\left( {\varvec{Z}}_k^{\top }{\varvec{P}}{\varvec{Z}}_k{\varvec{G}}_k\varvec{\Lambda }_{k_l}{\varvec{G}}_k\right) } = 1. \end{aligned}$$

We now multiply both sides with \(\sigma _{k_l}^2\) and evaluate the left-hand side for the previous iterates and the right-hand side for the update, obtaining

$$\begin{aligned} {\widehat{\sigma }}^{2}_{k_l}= & {} \frac{\widehat{\varvec{\alpha }}_k^{[t]\top }\varvec{\Lambda }_{k_l}\widehat{\varvec{\alpha }}_k^{[t]}}{\hbox {trace}\left( {\varvec{Z}}_k^{\top }{\varvec{P}}^{[t]}{\varvec{Z}}_k{\varvec{G}}_k^{[t]}\varvec{\Lambda }_{k_l}{\varvec{G}}_k^{[t]}\right) }{\widehat{\sigma }}_{k_l}^{2[t]}\\= & {} \frac{\widehat{\varvec{\alpha }}_k^{[t]\top }\varvec{\Lambda }_{k_l}\widehat{\varvec{\alpha }}_k^{[t]}}{\hbox {trace}\left( {\varvec{Z}}_k^{\top }{\varvec{P}}^{[t]}{\varvec{Z}}_k{\varvec{G}}_k^{[t]}\frac{\varvec{\Lambda }_{k_l}}{{\widehat{\sigma }}_{k_l}^{2[t]}}{\varvec{G}}_k^{[t]}\right) }. \end{aligned}$$

\(\square \)

B Proof of Theorem 2


First let us recall some notation and introduce some needed results. We denote as \({\varvec{P}} = {\varvec{V}}^{-1} - {\varvec{V}}^{-1}{\varvec{X}}\left( {\varvec{X}}^{\top }{\varvec{V}}^{-1}{\varvec{X}}\right) {\varvec{X}}^{\top }{\varvec{V}}^{-1}\), where \({\varvec{V}} = {\varvec{R}} + {\varvec{Z}}{\varvec{G}}{\varvec{Z}}^{\top }\), \({\varvec{R}} = \phi {\varvec{W}}^{-1}\) with \({\varvec{W}}\) being the diagonal weight matrix involved in the Fisher scoring algorithm.

Denote as \({\mathcal {C}}\left( {\varvec{A}}\right) \) the linear space spanned by the columns of \({\varvec{A}}\), and let \({\varvec{P}}_{{\varvec{X}}{\varvec{V}}^{-1}} = {\varvec{X}}\left( {\varvec{X}}^{\top }{\varvec{V}}^{-1}{\varvec{X}}\right) ^{-1}{\varvec{X}}^{\top }{\varvec{V}}^{-1}\) be the orthogonal projection matrix for \({\mathcal {C}}\left( {\varvec{X}}\right) \) with respect to \({\varvec{V}}^{-1}\). It is easy to show that

$$\begin{aligned} {\varvec{P}}&= {\varvec{V}}^{-1}\left( {\varvec{I}}_n - {\varvec{P}}_{{\varvec{X}}{\varvec{V}}^{-1}}\right) \\&= \left( {\varvec{I}}_n - {\varvec{P}}_{{\varvec{X}}{\varvec{V}}^{-1}}\right) {\varvec{V}}^{-1}\left( {\varvec{I}}_n - {\varvec{P}}_{{\varvec{X}}{\varvec{V}}^{-1}}\right) . \end{aligned}$$

By Theorem 14.2.9 in Harville (1997), \({\varvec{P}}\) is a (symmetric) positive semi-definite matrix. In addition,

$$\begin{aligned} {\varvec{P}}{\varvec{X}} = {\varvec{0}}, \end{aligned}$$


$$\begin{aligned} {\hbox {rank}}({\varvec{P}})&= \hbox {rank}\left( {\varvec{V}}^{-1}\left( {\varvec{I}}_n - {\varvec{P}}_{{\varvec{X}}{\varvec{V}}^{-1}}\right) \right) \\&= \hbox {rank}\left( \left( {\varvec{I}}_n - {\varvec{P}}_{{\varvec{X}}{\varvec{V}}^{-1}}\right) \right) \\&= n - \hbox {rank}\left( {\varvec{P}}_{{\varvec{X}}{\varvec{V}}^{-1}}\right) \\&= n - \hbox {rank}\left( {\varvec{X}}\right) . \end{aligned}$$


$$\begin{aligned} ker\left( {\varvec{P}}\right) = {\mathcal {C}}\left( {\varvec{X}}\right) , \end{aligned}$$

i.e., \({\varvec{P}}{\varvec{x}} = {\varvec{0}}\) if and only if \({\varvec{x}}\) is in \({\mathcal {C}}\left( {\varvec{X}}\right) \).

Let \(\varvec{\Lambda }_{k_l} = {\varvec{U}}{\varvec{\Sigma }}{\varvec{U}}^{\top }\) be the eigenvalue decomposition of \(\varvec{\Lambda }_{k_l}\). Note that \(\varvec{\Lambda }_{k_l} = {\varvec{U}}_{+}{\varvec{\Sigma }}_{+}{\varvec{U}}_{+}^{\top }\), where \({\varvec{U}}_{+}\) and \({\varvec{\Sigma }}_{+}\) are the sub-matrices corresponding to the nonzero eigenvalues. Then

$$\begin{aligned} \widehat{\varvec{\alpha }}_k^{\top }\varvec{\Lambda }_{k_l}\widehat{\varvec{\alpha }}_k&= \widehat{\varvec{\alpha }}_k^{\top }{\varvec{U}}_{+}{\varvec{\Sigma }}_{+}{\varvec{U}}_{+}^{\top }\widehat{\varvec{\alpha }}_k\\&= {\varvec{z}}^{\top }{\varvec{P}}{\varvec{Z}}_k{\varvec{G}}_k{\varvec{U}}_{+}{\varvec{\Sigma }}_{+}{\varvec{U}}_{+}^{\top }{\varvec{G}}_k{\varvec{Z}}_k^{\top }{\varvec{P}}{\varvec{z}} \geqslant 0, \end{aligned}$$

with equality holding if and only if \({\varvec{U}}_{+}^{\top }{\varvec{G}}_k{\varvec{Z}}_k^{\top }{\varvec{P}}{\varvec{z}} = {\varvec{0}}\) (since \({\varvec{\Sigma }}_{+}\) is positive definite). Thus, using result (26), the equality holds if \({\varvec{z}}\) is in \({\mathcal {C}}\left( {\varvec{X}}\right) \) or \({\mathcal {C}}\left( {\varvec{Z}}_k{\varvec{G}}_k{\varvec{U}}_{+}\right) \subset {\mathcal {C}}\left( {\varvec{X}}\right) \). By Lemma 4.2.2 and Corollary 4.5.2 in Harville (1997), we have

$$\begin{aligned} {\mathcal {C}}\left( {\varvec{Z}}_k{\varvec{G}}_k{\varvec{U}}_{+}\right)&= {\mathcal {C}}\left( {\varvec{Z}}_k{\varvec{G}}_k\varvec{\Lambda }_{k_l}\right) \subset {\mathcal {C}}\left( {\varvec{X}}\right) \\&\iff \hbox {rank}\left( {\varvec{Z}}_k{\varvec{G}}_k\varvec{\Lambda }_{k_l},{\varvec{X}}\right) = \hbox {rank}\left( {\varvec{X}}\right) . \end{aligned}$$

Regarding the denominator of the REML-based estimates updates, we follow a similar reasoning. Using Corollary 14.7.5 (and Theorem 14.2.9) in Harville (1997), we have

$$\begin{aligned} \hbox {trace}&\left( {\varvec{Z}}_k^{\top }{\varvec{P}}{\varvec{Z}}_k{\varvec{G}}_k\varvec{\Lambda }_{k_l}{\varvec{G}}_k\right) \\&= \hbox {trace}\left( {\varvec{U}}_{+}^{\top }{\varvec{G}}_k{\varvec{Z}}_k^{\top }{\varvec{P}}{\varvec{Z}}_k{\varvec{G}}_k{\varvec{U}}_{+}{\varvec{\Sigma }}_{+}\right) \geqslant 0, \end{aligned}$$

with equality holding if and only if \({\varvec{U}}_{+}^{\top }{\varvec{G}}_k{\varvec{Z}}_k^{\top }{\varvec{P}}{\varvec{Z}}_k{\varvec{G}}_k{\varvec{U}}_{+} = {\varvec{0}}\). Again, this equality holds if and only if \({\mathcal {C}}\left( {\varvec{Z}}_k{\varvec{G}}_k{\varvec{U}}_{+}\right) \subset {\mathcal {C}}\left( {\varvec{X}}\right) \) (i.e., \(\iff \hbox {rank}\left( {\varvec{Z}}_k{\varvec{G}}_k\varvec{\Lambda }_{k_l},{\varvec{X}}\right) = \hbox {rank}\left( {\varvec{X}}\right) \)). \(\square \)

C Estimating algorithm

This appendix summarises the steps of the estimating algorithm for model (10) based on the SOP method. Recall that interest lies in estimating model

$$\begin{aligned} g\left( {\varvec{\mu }}\right) = {\varvec{X}}\varvec{\beta } + {\varvec{Z}}\varvec{\alpha } = {\varvec{X}}\varvec{\beta } + \sum _{k=1}^{c}{\varvec{Z}}_k\varvec{\alpha }_k, \end{aligned}$$

where \({\varvec{Z}} = \left[ {\varvec{Z}}_{1},\ldots ,{\varvec{Z}}_{c}\right] \), \(\varvec{\alpha } = \left( \varvec{\alpha }_{1}^{\top },\ldots ,\varvec{\alpha }_{c}^{\top }\right) ^{\top }\), \(\varvec{\alpha }_k\sim N\left( {\varvec{0}}, {\varvec{G}}_k\right) \) with \( {\varvec{G}}_{k}^{-1} = \sum _{l = 1}^{p_k}\sigma _{k_l}^{-2}\varvec{\Lambda }_{k_l}\), and \(\varvec{\alpha }\sim N\left( {\varvec{0}}, {\varvec{G}}\right) \) with \({\varvec{G}} = \bigoplus _{k = 1}^{c}{\varvec{G}}_{k}\).

  1. Initialise

    Set initial values for \(\widehat{{\varvec{\mu }}}^{[0]}\) and the variance parameters \({\widehat{\sigma }}_{k_l}^{2[0]}\) (\(l = 1,\ldots ,p_k\) and \(k = 1, \ldots , c\)). In those situations where \(\phi \) is unknown, establish an initial value for this parameter, \({\widehat{\phi }}^{[0]}\). Set \(t = 0\).

  2. Step 1

    Construct the working response vector \({\varvec{z}}\) and the matrix of weights \({\varvec{W}}\) as follows

    $$\begin{aligned} z_i = g({\widehat{\mu }}_i^{[t]}) + (y_i - {\widehat{\mu }}_i^{[t]})g^{\prime }({\widehat{\mu }}_i^{[t]}), \\ w_{ii} = \left\{ g'({\widehat{\mu }}_i^{[t]})^2\nu ({\widehat{\mu }}_i^{[t]})\right\} ^{-1}. \end{aligned}$$
    1. Step 1.1

      Given the initial estimates of variance parameters, estimate \(\varvec{\alpha }\) and \(\varvec{\beta }\) by solving

      $$\begin{aligned} \underbrace{ \begin{bmatrix} {\varvec{X}}^{\top }{{\varvec{R}}^{[t]}}^{-1}{\varvec{X}}&{\varvec{X}}^{\top }{{\varvec{R}}^{[t]}}^{-1}{\varvec{Z}} \\ {\varvec{Z}}^{\top }{{\varvec{R}}^{[t]}}^{-1}{\varvec{X}}&{\varvec{Z}}^{\top }{{\varvec{R}}^{[t]}}^{-1}{\varvec{Z}} + {{\varvec{G}}^{[t]}}^{-1} \end{bmatrix}}_{{\varvec{C}}^{[t]}} \begin{bmatrix} \varvec{{\widehat{\beta }}}\\ \varvec{{\widehat{\alpha }}} \end{bmatrix} = \begin{bmatrix} {\varvec{X}}^{\top }{{\varvec{R}}^{[t]}}^{-1}{\varvec{z}}\\ {\varvec{Z}}^{\top }{{\varvec{R}}^{[t]}}^{-1}{\varvec{z}} \end{bmatrix},\nonumber \\ \end{aligned}$$

      where \({\varvec{R}}^{[t]} = {\widehat{\phi }}^{[t]}{\varvec{W}}^{-1}\). Let \(\varvec{{\widehat{\alpha }}}^{[t]}\) and \(\varvec{{\widehat{\beta }}}^{[t]}\) be these estimates.

    2. Step 1.2

      Update the variance parameters as follows

      $$\begin{aligned} {\widehat{\sigma }}^{2}_{k_l} = \frac{\widehat{\varvec{\alpha }}_k^{{[t]}\top }\varvec{\Lambda }_{k_l}\widehat{\varvec{\alpha }}_k^{[t]}}{\hbox {ED}_{k_l}^{[t]}}, \end{aligned}$$

      and, when necessary,

      $$\begin{aligned} {\widehat{\phi }} = \frac{\left( {\varvec{z}}-{\varvec{X}}\varvec{{\widehat{\beta }}}^{[t]}-{\varvec{Z}}\varvec{{\widehat{\alpha }}}^{[t]}\right) ^{\top }{\varvec{W}}\left( {\varvec{z}}-{\varvec{X}}\varvec{{\widehat{\beta }}}^{[t]}-{\varvec{Z}}\varvec{{\widehat{\alpha }}}^{[t]}\right) }{n - \hbox {rank}({\varvec{X}}) - \sum _{k=1}^c\sum _{l=1}^{p_k}\hbox {ED}_{k_l}^{[t]}}, \end{aligned}$$


      $$\begin{aligned} \hbox {ED}_{k_l}^{[t]} = \hbox {trace}\left( \left( {\varvec{G}}^{[t]}_k - {{\varvec{C}}^{[t]}}^{*}_{kk}\right) \frac{\varvec{\Lambda }_{k_l}}{{{\widehat{\sigma }}_{k_l}^{2[t]}}}\right) . \end{aligned}$$

      Recall that \({{\varvec{C}}^{[t]}}^{*}\) denotes the inverse of \({\varvec{C}}^{[t]}\) in (27), and \({{\varvec{C}}^{[t]}}^{*}_{kk}\) is that partition of \({{\varvec{C}}^{[t]}}^{*}\) corresponding to the k-th random component \(\varvec{\alpha }_k\).

    3. Step 1.3

      Repeat Step 1.1 and Step 1.2, replacing \({\widehat{\sigma }}_{k_l}^{2[t]}\), and, if updated, \({\widehat{\phi }}^{[t]}\), by \({\widehat{\sigma }}_{k_l}^{2}\) and \({\widehat{\phi }}\), until convergence. For the examples presented in Sect. 5, we use the REML deviance as the convergence criterion.

  3. Step 2

    Repeat Step 1, replacing the variance parameters and the model’s fixed and random effects (and thus \(\widehat{{\varvec{\mu }}}^{[t]}\)) by those obtained in the last iteration of Steps 1.1–Step 1.3, until convergence.

It is worth noting that for notational convenience, in the examples described in Sect. 4, the precision matrix was rewritten as \({\varvec{G}}^{-1} = \sum _{l=1}^{p}\sigma _l^{-2}\widetilde{\varvec{\Lambda }}_l\), where \(p = \sum _{k=1}^{c}p_k\) is the number of variance parameters, and \(\widetilde{\varvec{\Lambda }}_l\) are the matrices \(\varvec{\Lambda }_{k_l}\) padded out with zeroes. Here, \(\widetilde{\varvec{\Lambda }}_l\) are matrices of dimension \(q \times q\), where \( q = \sum _{k=1}^{c}q_k\) is the number of random effects coefficients. For this specification, the estimating algorithm discussed above remains essentially the same, but in Step 1.2, \(\varvec{\alpha }^{[t]}_k\), \({\varvec{G}}^{[t]}_k\), and \(\varvec{\Lambda }_{k_l}\) are replaced by, respectively, \(\varvec{\alpha }^{[t]}\), \({\varvec{G}}^{[t]}\) and \(\widetilde{\varvec{\Lambda }}_l\), and \({{\varvec{C}}^{[t]}}^{*}_{kk}\) would be that partition of \({{\varvec{C}}^{[t]}}^{*}\) corresponding to the random vector \(\varvec{\alpha }\) (and thus the same for all variance parameters).

D Factor-by-curve hierarchical curve model

This appendix describes in detail the factor-by-curve interaction model discussed in Sect. 5.3, i.e.,

$$\begin{aligned} y_{ij} = f_{z_j}\left( t_i\right) + g_j\left( t_i\right) + \varepsilon _{ij} \;\; 1\le i \le s,\;1\le j \le m, \end{aligned}$$

where \(z_j = 1\) if the j-th individual is affected by MS (case) and \(z_j = 0\) otherwise (control). Let’s order the data with the observations on controls first, followed by observations on MS patients. In matrix notation, the model can be expressed as

$$\begin{aligned} {\varvec{y}} = [{\varvec{Q}}\otimes {\varvec{B}}]{\varvec{\theta }} + [{\varvec{I}}_{m}\otimes \breve{{\varvec{B}}}]\breve{{\varvec{\theta }}} + \varvec{\varepsilon }, \end{aligned}$$

with \({\varvec{B}}\), \(\breve{{\varvec{B}}}\), \(\breve{{\varvec{\theta }}}\) as defined in Sect. 4.2, and \(\varvec{\varepsilon } = \left( \varvec{\varepsilon }_{\cdot ,1}^{\top },\ldots , \varvec{\varepsilon }_{\cdot ,m}^{\top }\right) ^{\top }\), with \(\varvec{\varepsilon }_{\cdot j}= \left( \varepsilon _{1j},\ldots , \varepsilon _{sj}\right) ^{\top }\). Matrix \({\varvec{Q}}\) is any suitable contrast matrix of dimension \(m \times 2\), where \(m = m_0 + m_1\), with \(m_0\) being the number of controls and \(m_1\) the number of MS patients. For our application, we consider

$$\begin{aligned} {\varvec{Q}} = \begin{pmatrix} {\varvec{1}}_{m_0} &{} {\varvec{0}}_{m_0}\\ {\varvec{0}}_{m_1} &{} {\varvec{1}}_{m_1} \end{pmatrix}, \end{aligned}$$

and a different amount of smoothing is assumed for \(f_{0}\) and \(f_{1}\), i.e., the penalty matrix acting over the vector of coefficients \({\varvec{\theta }}\) is of the form

$$\begin{aligned} {\varvec{P}} = \begin{pmatrix} \lambda _{1}{\varvec{D}}_q^{\top }{\varvec{D}}_q &{} {\varvec{0}}_{c \times c}\\ {\varvec{0}}_{c \times c} &{} \lambda _{2}{\varvec{D}}_q^{\top }{\varvec{D}}_q \end{pmatrix}. \end{aligned}$$

The reformulation as a mixed model can be done in a similar fashion to that described in Sect. 4.2, with, in this case

$$\begin{aligned} {\varvec{X}} =&[{\varvec{Q}}\otimes {\varvec{B}}{\varvec{U}}_{0}],\\ {\varvec{Z}} =&[{\varvec{Q}}\otimes {\varvec{B}}{\varvec{U}}_{+}:{\varvec{I}}_m\otimes \breve{{\varvec{B}}}], \end{aligned}$$


$$\begin{aligned} {\varvec{G}}^{-1} = \begin{pmatrix} \sigma _{1}^{-2}{\varvec{\Sigma }}_{+} &{} {\varvec{0}}_{d \times d} &{} {\varvec{0}}_{d\times \left( m \breve{d}\right) }\\ {\varvec{0}}_{d\times d} &{} \sigma _{2}^{-2}{\varvec{\Sigma }}_{+} &{} {\varvec{0}}_{d\times \left( m \breve{d}\right) }\\ {\varvec{0}}_{\left( m \breve{d}\right) \times d}&{} {\varvec{0}}_{\left( m \breve{d}\right) \times c} &{} {\varvec{I}}_{m}\otimes \breve{{\varvec{G}}}^{-1} \end{pmatrix}, \end{aligned}$$


$$\begin{aligned} \breve{{\varvec{G}}}^{-1} = \sigma _3^{-2}\breve{{\varvec{D}}}_{\breve{q}}^{\top }\breve{{\varvec{D}}}_{\breve{q}} + \sigma _4^{-2}{\varvec{I}}_{\breve{d}}. \end{aligned}$$

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Rodríguez-Álvarez, M.X., Durban, M., Lee, D. et al. On the estimation of variance parameters in non-standard generalised linear mixed models: application to penalised smoothing. Stat Comput 29, 483–500 (2019). https://doi.org/10.1007/s11222-018-9818-2

Download citation


  • Generalised linear mixed models
  • Generalised additive models
  • Variance parameters
  • Smoothing parameters
  • REML
  • Effective degrees of freedom