Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm

Abstract

A new computational algorithm for estimating the smoothing parameters of a multidimensional penalized spline generalized linear model with anisotropic penalty is presented. This new proposal is based on the mixed model representation of a multidimensional P-spline, in which the smoothing parameter for each covariate is expressed in terms of variance components. On the basis of penalized quasi-likelihood methods, closed-form expressions for the estimates of the variance components are obtained. This formulation leads to an efficient implementation that considerably reduces the computational burden. The proposed algorithm can be seen as a generalization of the variance components estimation algorithm of Schall (1991) that deals with non-standard structures of the covariance matrix of the random effects. The practical performance of the proposed algorithm is evaluated by means of simulations, and comparisons with alternative methods are made on the basis of the mean square error criterion and the computing time. Finally, we illustrate our proposal with the analysis of two real datasets: a two-dimensional example of historical records of monthly precipitation data in the USA and a three-dimensional one of mortality data from respiratory disease according to the age at death, the year of death and the month of death.

References

  • Breslow, N.E., Clayton, D.G.: Approximate inference in generalized linear mixed models. J. Am. Stat. Assoc. 88, 9–25 (1993)

  • Currie, I., Durban, M., Eilers, P.H.C.: Generalized linear array models with applications to multidimensional smoothing. J. R. Stat. Soc. Ser. B 68, 259–280 (2006)

  • Currie, I., Durban, M.: Flexible smoothing with P-splines: a unified approach. Stat. Model. 2, 333–349 (2002)

  • de Boor, C.A.: A Practical Guide to Splines. Revised Edition. Springer, New York (2001)

  • Eilers, P.H.C., Marx, B.D.: Flexible smoothing with B-splines and penalties. Stat. Sci. 11, 89–121 (1996)

  • Eilers, P.H.C., Marx, B.D.: Multivariate calibration with temperature interaction using two-dimensional penalized signal regression. Chemom. Intell. Lab. Syst. 66, 159–174 (2003)

  • Eilers, P.H.C., Currie, I., Durban, M.: Fast and compact smoothing on large multidimensional grids. Comput. Stat. Data Anal. 50, 61–76 (2006)

  • Fahrmeir, L., Kneib, T., Lang, S.: Penalized structured additive regression for space-time data: a Bayesian perspective. Stat. Sin. 14, 715–745 (2004)

  • Gilmour, A.R., Thompson, R., Cullis, B.R.: Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51, 1440–1450 (1995)

  • Hastie, T.J., Tibshirani, R.J.: Generalized Additive Models. Chapman and Hall, London (1990)

  • Hastie, T.J., Tibshirani, R.J.: Varying-coefficient models. J. R. Stat. Soc. Ser. B 55, 757–796 (1993)

  • Harville, D.A.: Maximum likelihood approaches to variance component estimation and to related problems. J. Am. Stat. Assoc. 72, 320–338 (1977)

  • Johns, C., Nychka, D., Kittel, T., Daly, C.: Infilling sparse records of spatial fields. J. Am. Stat. Assoc. 98, 796–806 (2003)

  • Krivobokova, T., Crainiceanu, C.M., Kauermann, G.: Fast adaptive penalized splines. J. Comput. Graph. Stat. 17, 1–20 (2008)

  • Lang, S., Brezger, A.: Bayesian P-splines. J. Comput. Graph. Stat. 13, 183–212 (2004)

  • Lee, D.-J.: Smoothing mixed model for spatial and spatio-temporal data. PhD thesis, Department of Statistics, Universidad Carlos III de Madrid, Spain (2010)

  • Lee, D.-J., Durbán, M.: P-spline ANOVA-type interaction models for spatio-temporal smoothing. Stat. Model. 11, 49–69 (2011)

  • Lee, D.-J., Durbán, M., Eilers, P.H.C.: Efficient two-dimensional smoothing with P-spline ANOVA mixed models and nested bases. Comput. Stat. Data Anal. 61, 22–37 (2013)

  • Lin, X., Breslow, N.E.: Bias correction in generalized linear mixed models with multiple components of dispersion. J. Am. Stat. Assoc. 91, 1007–1016 (1996)

  • Lin, X., Zhang, D.: Inference in generalized additive mixed models using smoothing splines. J. R. Stat. Soc. Ser. B 61, 381–400 (1999)

  • Pawitan, Y.: In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford University Press, USA (2001)

  • R Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/ (2013)

  • Ruppert, D., Wand, M.P., Carroll, R.J.: Semiparametric Regression. Cambridge University Press, Cambridge (2003)

  • Schall, R.: Estimation in generalized linear models with random effects. Biometrika 78, 719–727 (1991)

  • Stiratelli, R., Laird, N.M., Ware, J.H.: Random-effects models for serial observations with binary response. Biometrics 40, 961–971 (1984)

  • Wand, M.P.: Smoothing and mixed models. Comput. Stat. 18, 223–249 (2003)

  • Wood, S.N.: Thin plate regression splines. J. R. Stat. Soc. Ser. B 65, 95–114 (2003)

  • Wood, S.N.: Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Am. Stat. Assoc. 99, 673–686 (2004)

  • Wood, S.N.: Generalized Additive Models. An introduction with R. Chapman & Hall/CRC, Boca Raton (2006a)

  • Wood, S.N.: Low-rank scale-invariant tensor product smooths for generalized additive mixed models. Biometrics 62, 1025–1036 (2006b)

  • Wood, S.N.: Fast stable direct fitting and smoothness selection for generalized additive models. J. R. Stat. Soc. Ser. B 70, 495–518 (2008)

  • Wood, S.N.: Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J. R. Stat. Soc. Ser. B 73, 3–36 (2011)

  • Wood, S.N., Scheipl, F., Faraway, J.J.: Straightforward intermediate rank tensor product smoothing in mixed models. Stat. Comput. 23, 341–360 (2013)

Acknowledgments

The authors would like to express their gratitude for the support received in the form of the Spanish Ministry of Economy and Competitiveness grants MTM2011-28285-C02-01 and MTM2011-28285-C02-02. The research of Dae-Jin Lee was funded by an NIH grant for the Superfund Metal Mixtures, Biomarkers and Neurodevelopment project 1PA2ES016454-01A2. We are also grateful to the associate editor and the two peer referees for their valuable comments and suggestions, which served to make a substantial improvement to this paper.

Author information

Corresponding author

Correspondence to María Xosé Rodríguez-Álvarez.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 243 KB)

Appendices

Appendix 1: Fixed and random effects coefficients estimation

For given values of the variance components \(\tau _d^2\) (\(d = 1, 2\)) and \(\phi \), estimates of the fixed and random effects coefficients of model (4) can be obtained by maximizing, with respect to \(\varvec{\beta }\) and \(\varvec{\alpha }\), the approximate penalized log-likelihood (see Eq. (6) in Breslow and Clayton 1993)

$$\begin{aligned} -\frac{1}{2\phi }\sum _{i=1}^{n}Dev_i\left( y_i, \mu _i\right) - \frac{1}{2}\varvec{\alpha }^{t}\varvec{G}^{-1}\varvec{\alpha }, \end{aligned}$$

where \(Dev_i\) denotes the deviance contribution of the \(i\)th observation. This maximization can be carried out with a Fisher scoring algorithm, which involves a working dependent variable and a weight matrix that are updated at each iteration. Specifically, at the \((k+1)\)th Fisher scoring iteration, the working vector \(\varvec{z}\) is obtained as

$$\begin{aligned} z_i = g(\mu _i^{(k)}) + (y_i - \mu _i^{(k)})g^{\prime }(\mu _i^{(k)}), \end{aligned}$$

and the model’s fixed and random effects are then estimated as

$$\begin{aligned} \varvec{\hat{\beta }}^{(k+1)}&= \left( \varvec{X}^{t}\varvec{V}^{-1} \varvec{X}\right) ^{-1}\varvec{X}^{t}\varvec{V}^{-1}\varvec{z}, \end{aligned}$$
(10)
$$\begin{aligned} \varvec{\hat{\alpha }}^{(k+1)}&= \varvec{G}\varvec{Z}^{t} \varvec{V}^{-1}\left( \varvec{z}- \varvec{X} \varvec{\hat{\beta }}^{(k+1)}\right) \nonumber \\&= \varvec{G}\varvec{Z}^{t}\varvec{P}\varvec{z}, \end{aligned}$$
(11)

where

$$\begin{aligned} \varvec{V}&= \varvec{W}^{-1} +\varvec{Z}\varvec{G} \varvec{Z}^{t},\\ \varvec{P}&= \varvec{V}^{-1} - \varvec{V^{-1}} \varvec{X} \left( \varvec{X^{t}V^{-1}X}\right) ^{-1} \varvec{X}^{t}\varvec{V^{-1}}, \end{aligned}$$

and \(\varvec{W}\) is a diagonal matrix of weights with elements \(w_{ii} = \left\{ \phi [g'(\mu _i^{(k)})]^2\nu (\mu _i^{(k)})\right\} ^{-1}\).
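As an illustration, the update defined by the working vector, the weights and expressions (10) and (11) can be sketched in a few lines of NumPy. This is only a minimal sketch under stated assumptions: a Poisson response with log link (so that \(g'(\mu ) = 1/\mu \) and \(\nu (\mu ) = \mu \)), a fixed and known \(\varvec{G}\), and simulated toy data. The function name `fisher_scoring_step` and all data are hypothetical and not part of the paper.

```python
import numpy as np

def fisher_scoring_step(y, X, Z, G, beta, alpha, phi=1.0):
    """One update of (beta, alpha) via expressions (10) and (11).

    Assumption for this sketch: Poisson response with log link, so
    g(mu) = log(mu), g'(mu) = 1 / mu and variance function nu(mu) = mu.
    """
    eta = X @ beta + Z @ alpha
    mu = np.exp(eta)
    z = eta + (y - mu) / mu                      # working vector z_i
    w = 1.0 / (phi * (1.0 / mu) ** 2 * mu)       # w_ii = {phi [g'(mu)]^2 nu(mu)}^{-1}
    V = np.diag(1.0 / w) + Z @ G @ Z.T           # V = W^{-1} + Z G Z^t
    V_inv = np.linalg.inv(V)
    XtVi = X.T @ V_inv
    beta_new = np.linalg.solve(XtVi @ X, XtVi @ z)        # expression (10)
    alpha_new = G @ Z.T @ V_inv @ (z - X @ beta_new)      # expression (11)
    return beta_new, alpha_new

# Hypothetical toy data (not from the paper).
rng = np.random.default_rng(0)
n, q = 60, 5
x = np.linspace(-1.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
Z = 0.3 * rng.normal(size=(n, q))
G = 0.5 * np.eye(q)                               # variance components held fixed
y = rng.poisson(np.exp(0.5 + 0.3 * x)).astype(float)

# Iterate the update until the coefficients stop changing.
beta = np.array([np.log(y.mean()), 0.0])
alpha = np.zeros(q)
for _ in range(100):
    beta_new, alpha_new = fisher_scoring_step(y, X, Z, G, beta, alpha)
    change = max(np.abs(beta_new - beta).max(), np.abs(alpha_new - alpha).max())
    beta, alpha = beta_new, alpha_new
    if change < 1e-10:
        break

mu_hat = np.exp(X @ beta + Z @ alpha)
```

For this canonical-link case, a fixed point of the update satisfies the penalized score equations \(\varvec{X}^{t}(\varvec{y} - \varvec{\mu }) = \varvec{0}\) and \(\varvec{Z}^{t}(\varvec{y} - \varvec{\mu }) = \varvec{G}^{-1}\varvec{\hat{\alpha }}\), which gives a simple way to check convergence.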

From a computational point of view, a more convenient way of jointly obtaining \(\varvec{\hat{\beta }}\) and \(\varvec{\hat{\alpha }}\) is to solve the linear system (see Eq. (9) in Breslow and Clayton 1993)

$$\begin{aligned} \underbrace{ \begin{bmatrix} \varvec{X}^t\varvec{W}\varvec{X}&\varvec{X}^t\varvec{W}\varvec{Z}\varvec{G} \\ \varvec{Z}^t\varvec{W}\varvec{X}&\varvec{I} + \varvec{Z}^t\varvec{W}\varvec{Z}\varvec{G} \end{bmatrix}}_{\varvec{C}} \begin{bmatrix} \varvec{\hat{\beta }}^{(k+1)}\\ \varvec{\hat{b}}^{(k+1)} \end{bmatrix} = \begin{bmatrix} \varvec{X}^{t}\varvec{W}\varvec{z}\\ \varvec{Z}^{t}\varvec{W}\varvec{z} \end{bmatrix}, \end{aligned}$$
(12)

where \(\varvec{\hat{b}}^{(k+1)} = \varvec{G}^{-1} \varvec{\hat{\alpha }}^{(k+1)}\). Note that (12) corresponds to the normal equations of the best linear unbiased estimation of \(\varvec{\beta }\) and the best linear unbiased prediction of \(\varvec{\alpha }\) under the working linear mixed model

$$\begin{aligned} \varvec{z}&= \varvec{X}\varvec{\beta } + \varvec{Z}\varvec{\alpha } + \varvec{\epsilon },\;\;\; \text{ with }\;\;\; \varvec{\alpha }\sim N(\varvec{0},\varvec{G})\\&\quad \text{ and }\;\;\;\varvec{\epsilon }\sim N(\varvec{0},\varvec{W}^{-1}). \end{aligned}$$
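The equivalence between the system (12) and expressions (10) and (11) can be checked numerically. The following sketch, on arbitrary simulated inputs, solves both formulations for a given working vector and diagonal weight matrix. The helper names `solve_system_12` and `solve_via_v` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def solve_system_12(X, Z, G, w, z):
    """Solve the linear system (12) for (beta, b) and return (beta, alpha = G b)."""
    p, q = X.shape[1], Z.shape[1]
    XtW, ZtW = X.T * w, Z.T * w              # X^t W and Z^t W for diagonal W
    C = np.block([
        [XtW @ X, XtW @ Z @ G],
        [ZtW @ X, np.eye(q) + ZtW @ Z @ G],
    ])
    rhs = np.concatenate([XtW @ z, ZtW @ z])
    sol = np.linalg.solve(C, rhs)
    beta, b = sol[:p], sol[p:]
    return beta, G @ b                       # alpha = G b, since b = G^{-1} alpha

def solve_via_v(X, Z, G, w, z):
    """Expressions (10) and (11), which require the n x n matrix V."""
    V = np.diag(1.0 / w) + Z @ G @ Z.T
    V_inv = np.linalg.inv(V)
    XtVi = X.T @ V_inv
    beta = np.linalg.solve(XtVi @ X, XtVi @ z)
    alpha = G @ Z.T @ V_inv @ (z - X @ beta)
    return beta, alpha

# Hypothetical toy inputs (not from the paper).
rng = np.random.default_rng(1)
n, q = 40, 6
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, q))
G = np.diag(rng.uniform(0.2, 2.0, size=q))
w = rng.uniform(0.5, 2.0, size=n)            # diagonal of W
z = rng.normal(size=n)

beta_12, alpha_12 = solve_system_12(X, Z, G, w, z)
beta_v, alpha_v = solve_via_v(X, Z, G, w, z)
```

Both routes return the same coefficients up to rounding. The system (12) involves a matrix whose dimension is the number of coefficients rather than the number of observations, and it uses \(\varvec{G}\) rather than \(\varvec{G}^{-1}\).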

Appendix 2: Proof of theorem

Proof

Ignoring the dependence of \(\varvec{W}\) on \(\tau _d\) (\(d = 1, 2\)), the approximate restricted log-likelihood of the working linear mixed model is given by (Breslow and Clayton 1993)

$$\begin{aligned} l^{*}&= -\frac{1}{2}\log |\varvec{V}|-\frac{1}{2}\log |\varvec{X^{t} V^{-1}X}|\\&-\frac{1}{2}(\varvec{z}-\varvec{X}\varvec{\hat{\beta }})^{t} \varvec{V}^{-1}(\varvec{z}-\varvec{X}\varvec{\hat{\beta }}). \end{aligned}$$

The REML estimates of the variance components are then obtained in the usual manner by maximizing this quantity. Taking derivatives with respect to the variance components \(\tau _d^2\) (\(d=1,2\)), we obtain (see online Supplementary Material for details)

$$\begin{aligned} \frac{\partial {l^{*}}}{\partial {\tau _d^2}} =-\frac{1}{2}trace \left( \varvec{Z}^{t}\varvec{P}\varvec{Z} \frac{\partial {\varvec{G}}}{\partial {\tau _d^2}}\right) +\frac{1}{2}\varvec{\hat{\alpha }}^{t}\varvec{G}^{-1} \frac{\partial {\varvec{G}}}{\partial {\tau _d^2}} \varvec{G}^{-1}\varvec{\hat{\alpha }}.\nonumber \\ \end{aligned}$$
(13)

Applying matrix differentiation properties, we have

$$\begin{aligned} \frac{\partial {\varvec{G}}}{\partial {\tau _d^2}} = - \varvec{G}\frac{\partial {\varvec{G}^{-1}}}{\partial {\tau _d^2}} \varvec{G}=\frac{1}{\tau _d^4}\varvec{G}\varvec{\Lambda }_d \varvec{G}, \end{aligned}$$
(14)

where

$$\begin{aligned}&\varvec{G} = \text{ diag }\left( \tau _2^2/\vec {\varvec{d}}_2, \tau _1^2/\vec {\varvec{d}}_1, 1/(\vec {\varvec{d}}_2^*/ \tau _2^2 + \vec {\varvec{d}}_1^*/ \tau _1^2)\right) , \\&\varvec{\Lambda }_1 = \text{ diag }(\vec {\varvec{0}}_{q_1(c_2 - q_2)},\vec {\varvec{d}}_1,\vec {\varvec{d}}_1^*),\nonumber \\&\varvec{\Lambda }_2 = \text{ diag }(\vec {\varvec{d}}_2, \vec {\varvec{0}}_{q_2(c_1 - q_1)}, \vec {\varvec{d}}_2^*),\nonumber \end{aligned}$$
(15)

with \(\vec {\varvec{0}}_{r}\) being a vector of zeroes of length \(r\), and \(\varvec{d}_1 = \varvec{I}_{q_2}\otimes \tilde{\varvec{\Sigma }}_1\), \(\varvec{d}_2 = \tilde{\varvec{\Sigma }}_2 \otimes \varvec{I}_{q_1}\), \(\varvec{d}_1^*= \varvec{I}_{c_2-q_2}\otimes \tilde{\varvec{\Sigma }}_1\), \(\varvec{d}_2^*= \tilde{\varvec{\Sigma }}_2 \otimes \varvec{I}_{c_1-q_1}\). By plugging expression (14) into (13) we obtain that the first-order partial derivatives of the approximate restricted log-likelihood become

$$\begin{aligned} 2\frac{\partial {l^{*}}}{\partial {\tau _d^2}} = -\frac{1}{\tau _d^2}trace \left( \varvec{Z}^{t}\varvec{P}\varvec{Z}\varvec{G} \frac{\varvec{\Lambda }_d}{\tau _d^2}\varvec{G}\right) +\frac{1}{\tau _d^4}\varvec{\hat{\alpha }}^{t}\varvec{\Lambda }_d \varvec{\hat{\alpha }}. \end{aligned}$$
(16)

Then, REML estimates of the variance components \(\tau _d^2\) (\(d=1,2\)) are found by equating expression (16) to zero, which gives

$$\begin{aligned} \hat{\tau }_d^2=\frac{\varvec{\hat{\alpha }}^{t}\varvec{\Lambda }_d \varvec{\hat{\alpha }}}{trace\left( \varvec{Z}^{t}\varvec{P} \varvec{Z}\varvec{G}\frac{\varvec{\Lambda }_d}{\tau _d^2} \varvec{G}\right) }. \end{aligned}$$
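A single update of the variance components based on this closed-form expression can be sketched as follows. The sketch assumes a fixed working vector and fixed weights, stores the diagonal matrices \(\varvec{G}\) and \(\varvec{\Lambda }_d\) as vectors, and uses simulated values in place of the actual penalty eigenvalues. The function names, the block sizes and the data are hypothetical. The restricted log-likelihood \(l^{*}\) is also coded so that its numerical derivative can be compared with (16).

```python
import numpy as np

def reml_pieces(X, Z, w, z, tau2, lambdas):
    """Return diag(G), V, X^t V^{-1} X, P and alpha-hat for tau2 = (tau_1^2, tau_2^2)."""
    g = 1.0 / sum(lam / t for lam, t in zip(lambdas, tau2))   # G^{-1} = sum_d Lambda_d / tau_d^2
    V = np.diag(1.0 / w) + (Z * g) @ Z.T                      # V = W^{-1} + Z G Z^t
    V_inv = np.linalg.inv(V)
    XtVi = X.T @ V_inv
    A = XtVi @ X
    P = V_inv - XtVi.T @ np.linalg.solve(A, XtVi)
    alpha = g * (Z.T @ (P @ z))                               # alpha-hat = G Z^t P z
    return g, V, A, P, alpha

def variance_update(X, Z, w, z, tau2, lambdas):
    """Closed-form update of tau_d^2 together with the traces in its denominator."""
    g, _, _, P, alpha = reml_pieces(X, Z, w, z, tau2, lambdas)
    ztpz = np.einsum("ij,ij->j", Z, P @ Z)                    # diagonal of Z^t P Z
    ed = np.array([np.sum(ztpz * g * (lam / t) * g)           # trace(Z^t P Z G Lambda_d/tau_d^2 G)
                   for lam, t in zip(lambdas, tau2)])
    tau2_new = np.array([alpha @ (lam * alpha) for lam in lambdas]) / ed
    return tau2_new, ed

def reml_loglik(X, Z, w, z, tau2, lambdas):
    """Approximate restricted log-likelihood l*, using r^t V^{-1} r = z^t P z."""
    _, V, A, P, _ = reml_pieces(X, Z, w, z, tau2, lambdas)
    return -0.5 * (np.linalg.slogdet(V)[1] + np.linalg.slogdet(A)[1] + z @ P @ z)

# Hypothetical toy setting: three blocks of random coefficients (sizes 3, 3, 4).
rng = np.random.default_rng(2)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 10))
d2, d1 = rng.uniform(0.5, 2.0, 3), rng.uniform(0.5, 2.0, 3)
d2s, d1s = rng.uniform(0.5, 2.0, 4), rng.uniform(0.5, 2.0, 4)
lambdas = [np.concatenate([np.zeros(3), d1, d1s]),            # diagonal of Lambda_1
           np.concatenate([d2, np.zeros(3), d2s])]            # diagonal of Lambda_2
w = np.ones(n)
z = X @ np.array([1.0, -0.5]) + Z @ rng.normal(size=10) + rng.normal(size=n)

tau2 = np.array([0.8, 1.5])                                   # current values of tau_d^2
tau2_new, ed = variance_update(X, Z, w, z, tau2, lambdas)
```

Writing \(ed_d\) for the trace in the denominator, expression (16) can be rearranged as \(2\,\partial l^{*}/\partial \tau _d^2 = ed_d(\hat{\tau }_d^2 - \tau _d^2)/\tau _d^4\), so the update moves each variance component in the direction of the gradient and leaves it unchanged exactly when the gradient is zero. In practice the update would be alternated with the coefficient update until convergence.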

Before proceeding with the estimation of \(\phi \) (if unknown), it is important to observe that the sum of the quantities involved in the denominators of the variance components estimates corresponds to the effective dimension of the penalized part (or random part) of the fitted model

$$\begin{aligned}&trace\left( \varvec{Z}^{t}\varvec{P}\varvec{Z}\varvec{G} \frac{\varvec{\Lambda }_1}{\tau _1^2}\varvec{G}\right) + trace\left( \varvec{Z}^{t}\varvec{P}\varvec{Z}\varvec{G} \frac{\varvec{\Lambda }_2}{\tau _2^2}\varvec{G}\right) \\&\quad =trace\left( \varvec{Z}^{t}\varvec{P}\varvec{Z} \varvec{G}\right) \\&\quad = trace\left( \varvec{Z}\varvec{G}\varvec{Z}^{t} \varvec{P}\right) \\&\quad = trace\left( \varvec{H}_{Random}\right) , \end{aligned}$$

where \(\varvec{H}_{Random}\) denotes the hat matrix (Hastie and Tibshirani 1990) of the random part [see (11)].
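As a brief justification of the first equality above, note from (15) that \(\varvec{G}\), \(\varvec{\Lambda }_1\) and \(\varvec{\Lambda }_2\) are diagonal, and that the scaled matrices \(\varvec{\Lambda }_d/\tau _d^2\) add up to the inverse of \(\varvec{G}\):

$$\begin{aligned} \frac{\varvec{\Lambda }_1}{\tau _1^2} + \frac{\varvec{\Lambda }_2}{\tau _2^2}&= \text{ diag }\left( \vec {\varvec{d}}_2/\tau _2^2, \vec {\varvec{d}}_1/\tau _1^2, \vec {\varvec{d}}_2^*/\tau _2^2 + \vec {\varvec{d}}_1^*/\tau _1^2\right) = \varvec{G}^{-1},\\ \varvec{G}\left( \frac{\varvec{\Lambda }_1}{\tau _1^2} + \frac{\varvec{\Lambda }_2}{\tau _2^2}\right) \varvec{G}&= \varvec{G}\varvec{G}^{-1}\varvec{G} = \varvec{G}. \end{aligned}$$

The first equality then follows from the linearity of the trace, and the second one from its invariance under cyclic permutations.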

Finally, an estimate of \(\phi \) is obtained, as before, by taking derivatives of the approximate restricted log-likelihood with respect to \(\phi \)

$$\begin{aligned} \frac{\partial {l^{*}}}{\partial {\phi }}&= -\frac{1}{2}trace\left( \varvec{P} \frac{\partial {\varvec{V}}}{\partial {\phi }}\right) \\&+ \frac{1}{2}(\varvec{z} -\varvec{X}\varvec{\hat{\beta }})^{t} \varvec{V}^{-1}\frac{\partial {\varvec{V}}}{\partial {\phi }} \varvec{V}^{-1}(\varvec{z}- \varvec{X}\varvec{\hat{\beta }}). \end{aligned}$$

First, by Eq. (5.2) in Harville (1977), we have that \(\varvec{V}^{-1}(\varvec{z} - \varvec{X} \varvec{\hat{\beta }}) = \varvec{W}(\varvec{z} - \varvec{X} \varvec{\hat{\beta }} - \varvec{Z} \varvec{\hat{\alpha }})\). Moreover, \(\varvec{V}\) depends on \(\phi \) only through \(\varvec{W}^{-1}\), where \(\varvec{W}\) can be written as \(\varvec{W} = \frac{1}{\phi } \widetilde{\varvec{W}}\), with \(\widetilde{\varvec{W}}\) being a diagonal matrix with elements \(\widetilde{w}_{ii} = \left\{ [g'\left( \mu _i\right) ]^2\nu \left( \mu _i\right) \right\} ^{-1}\). Ignoring again the dependence of \(\widetilde{\varvec{W}}\) on \(\phi \), we have \(\partial \varvec{V}/\partial \phi = \widetilde{\varvec{W}}^{-1} = \varvec{W}^{-1}/\phi \), and it then follows that

$$\begin{aligned} 2\frac{\partial {l^{*}}}{\partial {\phi }}&= - \frac{1}{\phi }trace \left( \varvec{P}\varvec{W}^{-1}\right) \nonumber \\&+ \frac{1}{\phi ^2} (\varvec{z} -\varvec{X}\varvec{\hat{\beta }} - \varvec{Z}\varvec{\hat{\alpha }})^{t}\widetilde{\varvec{W}} (\varvec{z} - \varvec{X}\varvec{\hat{\beta }} - \varvec{Z}\varvec{\hat{\alpha }}). \end{aligned}$$

By equating the above expression to zero, we obtain

$$\begin{aligned} \hat{\phi } = \frac{(\varvec{z} -\varvec{X}\varvec{\hat{\beta }} - \varvec{Z}\varvec{\hat{\alpha }})^{t}\widetilde{\varvec{W}}(\varvec{z} - \varvec{X}\varvec{\hat{\beta }} -\varvec{Z}\varvec{\hat{\alpha }})}{trace\left( \varvec{P}\varvec{W}^{-1}\right) }, \end{aligned}$$

where [see Eq. (5.3) in Harville 1977 and expressions (10), (11), and (12)]

$$\begin{aligned}&trace\left( \varvec{P}\varvec{W}^{-1}\right) = trace\left( \varvec{W}^{-1}\varvec{P}\right) \\&\quad = trace\left( \varvec{I}_n - [\varvec{X}|\varvec{Z}\varvec{G}]\varvec{C}^{-1} \begin{bmatrix} \varvec{X}^{t}\varvec{W}\\ \varvec{Z}^{t}\varvec{W} \end{bmatrix}\right) \\&\quad = trace\left( \varvec{I}_n - [\varvec{X}|\varvec{Z}\varvec{G}] \begin{bmatrix} \left( \varvec{X}^{t}\varvec{V}^{-1}\varvec{X}\right) ^{-1} \varvec{X}^{t}\varvec{V}^{-1}\\ \varvec{Z}^{t}\varvec{P} \end{bmatrix}\right) \\&\quad = n - trace\left( \varvec{X}\left( \varvec{X}^{t}\varvec{V}^{-1} \varvec{X}\right) ^{-1}\varvec{X}^{t}\varvec{V}^{-1}\right) \\&\qquad - trace\left( \varvec{Z}\varvec{G}\varvec{Z}^{t}\varvec{P}\right) \\&\quad = n - rank\left( \varvec{X}\right) - \sum _{d=1}^{2}ed_d. \end{aligned}$$

Note that \(\varvec{H} =[\varvec{X}|\varvec{Z}\varvec{G}] \varvec{C}^{-1}[\varvec{X}|\varvec{Z}]^{t}\varvec{W}\) corresponds to the hat matrix of the fitted model, whose trace, as shown, can be decomposed as the sum of the traces of the hat matrices of the unpenalized (or fixed) part and the penalized (or random) part. \(\square \)
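The trace identity and the resulting estimate of \(\phi \) can also be checked numerically. The sketch below assumes a diagonal \(\varvec{G}\) stored as a vector, a full column rank \(\varvec{X}\), and arbitrary simulated inputs. The function name `dispersion_estimate` and all data are hypothetical.

```python
import numpy as np

def dispersion_estimate(X, Z, g, w_tilde, z, phi):
    """Estimate phi and return both sides of the trace identity.

    g holds the diagonal of G, w_tilde the diagonal of W-tilde, and phi is
    the current value of the dispersion parameter, so that W = W-tilde / phi.
    """
    w = w_tilde / phi
    V = np.diag(1.0 / w) + (Z * g) @ Z.T                # V = W^{-1} + Z G Z^t
    V_inv = np.linalg.inv(V)
    XtVi = X.T @ V_inv
    A = XtVi @ X
    P = V_inv - XtVi.T @ np.linalg.solve(A, XtVi)
    beta = np.linalg.solve(A, XtVi @ z)                 # expression (10)
    alpha = g * (Z.T @ (P @ z))                         # expression (11)
    resid = z - X @ beta - Z @ alpha
    ed_random = np.trace((Z * g) @ Z.T @ P)             # trace(Z G Z^t P)
    denom = len(z) - np.linalg.matrix_rank(X) - ed_random
    trace_pw_inv = np.sum(np.diag(P) / w)               # trace(P W^{-1}) computed directly
    phi_hat = resid @ (w_tilde * resid) / denom
    return phi_hat, denom, trace_pw_inv

# Hypothetical toy inputs (not from the paper).
rng = np.random.default_rng(3)
n, q = 50, 8
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, q))
g = rng.uniform(0.2, 2.0, size=q)
w_tilde = rng.uniform(0.5, 2.0, size=n)
z = X @ np.array([0.5, 1.0]) + Z @ rng.normal(size=q) + rng.normal(size=n)

phi_hat, denom, trace_pw_inv = dispersion_estimate(X, Z, g, w_tilde, z, phi=1.3)
```

Here `denom` is computed as \(n - rank(\varvec{X})\) minus the trace of the hat matrix of the random part, and it agrees with `trace_pw_inv` up to rounding, so the denominator of \(\hat{\phi }\) is available once the effective dimensions have been computed.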

Cite this article

Rodríguez-Álvarez, M.X., Lee, DJ., Kneib, T. et al. Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm. Stat Comput 25, 941–957 (2015). https://doi.org/10.1007/s11222-014-9464-2

  • DOI: https://doi.org/10.1007/s11222-014-9464-2

Keywords

  • Smoothing
  • P-splines
  • Tensor product
  • Anisotropic penalty
  • Mixed models