Modular regression - a Lego system for building structured additive distributional regression models with tensor product interactions

Kneib, Thomas; Klein, Nadja; Lang, Stefan; Umlauf, Nikolaus

doi:10.1007/s11749-019-00631-z

Invited Paper
Published: 15 February 2019

TEST, Volume 28, pages 1–39 (2019)

Abstract

Semiparametric regression models offer considerable flexibility concerning the specification of additive regression predictors, including effects as diverse as nonlinear effects of continuous covariates, spatial effects, random effects, or varying coefficients. Recently, such flexible model predictors have been combined with the possibility to go beyond pure mean-based analyses by specifying regression predictors on potentially all parameters of the response distribution in a distributional regression framework. In this paper, we discuss a generic concept for defining interaction effects in such semiparametric distributional regression models based on tensor products of main effects. These interactions can be assigned anisotropic penalties, i.e. different amounts of smoothness are associated with the interacting covariates. We investigate identifiability and the decomposition of interactions into main effects and pure interaction effects (similar to a smoothing spline analysis of variance) to facilitate a modular model building process. The decomposition is based on orthogonality in function spaces, which allows for considerable flexibility in setting up the effect decomposition. Inference is based on Markov chain Monte Carlo simulations with iteratively weighted least squares proposals under constraints to ensure identifiability and effect decomposition. An important aspect is therefore to maintain the sparse matrix structure of the tensor product also in identifiable, decomposed model formulations. The performance of modular regression is verified in a simulation study on decomposed interaction surfaces of two continuous covariates and in two applications: the construction of spatio-temporal interactions for the analysis of precipitation, and functional random effects for analysing house prices.

References

Adler D, Kneib T, Lang S, Umlauf N, Zeileis A (2012) BayesXsrc: R Package Distribution of the BayesX C++ Sources. R package version 3.0-0. https://CRAN.R-project.org/package=BayesXsrc. Accessed 29 Jan 2019
Belitz C, Brezger A, Klein N, Kneib T, Lang S, Umlauf N (2015) BayesX—Software for Bayesian inference in structured additive regression models. Version 3.0.2. http://www.bayesx.org. Accessed 29 Jan 2019
Besag J, Higdon D (1999) Bayesian analysis of agricultural field experiments. J R Stat Soc Ser B (Methodol) 61:691–746
Brezger A, Lang S (2006) Generalized structured additive regression based on Bayesian P-splines. Comput Stat Data Anal 50:967–991
Fahrmeir L, Kneib T (2011) Bayesian smoothing and regression for longitudinal, spatial and event history data. Oxford University Press, New York
Fahrmeir L, Kneib T, Lang S (2004) Penalized structured additive regression for space–time data: a Bayesian perspective. Stat Sin 14:731–761
Fahrmeir L, Kneib T, Lang S, Marx B (2013) Regression—models, methods and applications. Springer, Berlin
Gamerman D (1997) Sampling from the posterior distribution in generalized linear mixed models. Stat Comput 7:57–68
Gelfand AE, Sahu SK (1999) Identifiability, improper priors, and Gibbs sampling for generalized linear models. J Am Stat Assoc 94:247–253
Gelman A (2006) Prior distributions for variance parameters in hierarchical models. Bayesian Anal 1:515–533
Goicoa T, Adin A, Ugarte MD, Hodges JS (2018) In spatio-temporal disease mapping models, identifiability constraints affect PQL and INLA results. Stoch Environ Res Risk Assess 32:749–770
Gu C (2002) Smoothing spline ANOVA models. Springer, New York
Hodges JS (2013) Richly parameterized linear models: additive, time series, and spatial models using random effects. Chapman & Hall/CRC, New York/Boca Raton
Hughes J, Haran M (2013) Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. J R Stat Soc Ser B (Stat Methodol) 75:139–159
Klein N (2018) sdPrior: scale-dependent hyperpriors in structured additive distributional regression. R package version 1.0
Klein N, Kneib T (2016a) Scale-dependent priors for variance parameters in structured additive distributional regression. Bayesian Anal 11:1071–1106
Klein N, Kneib T (2016b) Simultaneous inference in structured additive conditional copula regression models: a unifying Bayesian approach. Stat Comput 26:841–860
Klein N, Kneib T, Klasen S, Lang S (2015a) Bayesian structured additive distributional regression for multivariate responses. J R Stat Soc Ser C (Appl Stat) 64:569–591
Klein N, Kneib T, Lang S (2015b) Bayesian generalized additive models for location, scale and shape for zero-inflated and overdispersed count data. J Am Stat Assoc 110:405–419
Klein N, Kneib T, Lang S, Sohn A (2015c) Bayesian structured additive distributional regression with an application to regional income inequality in Germany. Ann Appl Stat 9:1024–1052
Knorr-Held L (2000) Bayesian modelling of inseparable space-time variation in disease risk. Stat Med 19:2555–2567
Lang S, Brezger A (2004) Bayesian P-splines. J Comput Graph Stat 13:183–212
Lang S, Umlauf N, Wechselberger P, Harttgen K, Kneib T (2014) Multilevel structured additive regression. Stat Comput 24:223–238
Lavine M, Hodges JS (2012) On rigorous specification of ICAR models. Am Stat 66:42–49
Lee D-J, Durbán M (2011) P-spline ANOVA-type interaction models for spatio-temporal smoothing. Stat Model 11:46–69
Marí-Dell’Olmo M, Martinez-Beneito MA, Gotsens M, Palència L (2014) A smoothed ANOVA model for multivariate ecological regression. Stoch Environ Res Risk Assess 28:695–706
Marra G, Radice R (2017) Bivariate copula additive models for location, scale and shape. Comput Stat Data Anal 112:99–113
Marra G, Wood SN (2012) Coverage properties of confidence intervals for generalized additive model components. Scand J Stat 39:53–74
Paciorek CJ (2007) Bayesian smoothing with Gaussian processes using Fourier basis functions in the spectralGP package. J Stat Softw 19:1–38
R Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. https://www.R-project.org/. Accessed 29 Jan 2019
Reich BJ, Hodges JS, Zadnik V (2006) Effects of residual smoothing on the posterior of the fixed effects in disease-mapping models. Biometrics 62:1197–1206
Rigby RA, Stasinopoulos DM (2005) Generalized additive models for location, scale and shape (with discussion). J R Stat Soc Ser C (Appl Stat) 54:507–554
Rodriguez Alvarez MX, Lee D-J, Kneib T, Durban M, Eilers P (2015) Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm. Stat Comput 25:941–957
Rue H, Held L (2005) Gaussian Markov random fields. Chapman & Hall/CRC, New York/Boca Raton
Ruppert D, Wand MP, Carroll RJ (2003) Semiparametric regression. Cambridge University Press, Cambridge
Simpson D, Rue H, Martins TG, Riebler A, Sørbye SH (2017) Penalising model component complexity: a principled, practical approach to constructing priors. Stat Sci 32(1):1–28
Stauffer R, Mayr GJ, Messner JW, Umlauf N, Zeileis A (2016) Spatio-temporal precipitation climatology over complex terrain using a censored additive regression model. Int J Climatol 15:3264
Stauffer R, Umlauf N, Messner JW, Mayr GJ, Zeileis A (2017) Ensemble postprocessing of daily precipitation sums over complex terrain using censored high-resolution standardized anomalies. Mon Weather Rev 145(3):955–969
Ugarte MD, Adin A, Goicoa T (2017) One-dimensional, two-dimensional, and three-dimensional B-splines to specify space-time interactions in Bayesian disease mapping: model fitting and model identifiability. Spat Stat 22:451–468
Umlauf N, Klein N, Zeileis A, Köhler M (2018) bamlss: Bayesian additive models for location scale and shape (and beyond). R package version 1.0-0. http://CRAN.R-project.org/package=bamlss. Accessed 29 Jan 2019
Wahba G, Wang Y, Gu C, Klein R, Klein B (1995) Smoothing spline ANOVA for exponential families, with application to the Wisconsin epidemiological study of diabetic retinopathy. Ann Stat 23:1865–1895
Wood SN (2006) Low-rank scale-invariant tensor product smooths for generalized additive mixed models. Biometrics 62:1025–1036
Wood SN (2008) Fast stable direct fitting and smoothness selection for generalized additive models. J R Stat Soc Ser B (Stat Methodol) 70:495–518
Wood S (2015) mgcv: Mixed GAM computation vehicle with GCV/AIC/REML smoothness estimations. R package version 1.8-5
Wood SN (2017) Generalized additive models: an introduction with R. Chapman & Hall/CRC, New York/Boca Raton
Wood SN, Scheipl F, Faraway JJ (2013) Straightforward intermediate rank tensor product smoothing in mixed models. Stat Comput 23:341–360

Acknowledgements

We thank the referees and the associate editor for many valuable comments that led to a significant improvement of our paper upon the original submission. We are grateful to Jim Hodges for pointing us to the alternative representation of the tensor product precision matrix based on eigendecompositions. Financial support by the German Research Foundation (DFG), grant KN 922/9-1, is gratefully acknowledged.

Author information

Authors and Affiliations

Chair of Statistics, Georg-August-Universität Göttingen, Göttingen, Germany
Thomas Kneib
Humboldt Universität zu Berlin, Berlin, Germany
Nadja Klein
Department of Statistics, Universität Innsbruck, Innsbruck, Austria
Stefan Lang & Nikolaus Umlauf

Corresponding author

Correspondence to Thomas Kneib.

Additional information

This invited paper is discussed in comments available at https://doi.org/10.1007/s11749-019-00632-y, https://doi.org/10.1007/s11749-019-00633-x, https://doi.org/10.1007/s11749-019-00634-w and https://doi.org/10.1007/s11749-019-00635-9.

Appendix

Mixed model decomposition

For the further developments, it is useful to study functional effects with partially improper priors in order to determine which parts of the function are assigned an informative prior and which are assigned a flat prior. While this is hidden in the multivariate normal prior (1), it can be made explicit by reparameterising the basis functions via the mixed model representation of structured additive regression terms. This reparameterisation is obtained from the spectral decomposition of the prior precision matrix as

$$\begin{aligned} {{\varvec{K}}}= {\varvec{\Gamma }}{\varvec{\Omega }}{\varvec{\Gamma }}' \end{aligned}$$

where

$$\begin{aligned} {\varvec{\Omega }}= {{\,\mathrm{diag}\,}}(\omega _1,\ldots ,\omega _D), \qquad \omega _1\le \ldots \le \omega _D \end{aligned}$$

comprises the eigenvalues (in ascending order) and

$$\begin{aligned} {\varvec{\Gamma }}'{\varvec{\Gamma }}={{\varvec{I}}}\end{aligned}$$

is the orthonormal matrix of corresponding eigenvectors. Now let $M = \dim ({\varvec{\gamma }})-{{\,\mathrm{rk}\,}}({{\varvec{K}}}) = D - R > 0$ denote the rank deficiency of ${{\varvec{K}}}$, with $D=\dim ({\varvec{\gamma }})$ and $R={{\,\mathrm{rk}\,}}({{\varvec{K}}})$. Then the first M eigenvalues are equal to zero, i.e. $\omega _1=\ldots =\omega _M=0$. Accordingly, we split the spectral decomposition into

$$\begin{aligned} {\varvec{\Gamma }}=[{\varvec{\Gamma }}_1,{\varvec{\Gamma }}_2], \qquad {\varvec{\Omega }}= {{\,\mathrm{blockdiag}\,}}({\varvec{\Omega }}_1,{\varvec{\Omega }}_2) \end{aligned}$$

where ${\varvec{\Omega }}_1={{\,\mathrm{diag}\,}}(\omega _1,\ldots ,\omega _M)$ and ${\varvec{\Omega }}_2={{\,\mathrm{diag}\,}}(\omega _{M+1},\ldots ,\omega _D)$ comprise zero and nonzero eigenvalues, respectively, while ${\varvec{\Gamma }}_1$ and ${\varvec{\Gamma }}_2$ are the corresponding matrices of eigenvectors of dimension $D\times M$ and $D\times R$. We then obtain a reduced representation of the precision matrix as

$$\begin{aligned} {{\varvec{K}}}={\varvec{\Gamma }}_2{\varvec{\Omega }}_2{\varvec{\Gamma }}_2'. \end{aligned}$$
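The split into null-space and range parts can be checked numerically. The following is a minimal sketch assuming NumPy; as an illustrative choice of ${{\varvec{K}}}$ it uses a P-spline-style second-order difference penalty with $D=10$ coefficients. The helper names `difference_penalty` and `split_spectral`, as well as the tolerance used to declare an eigenvalue zero, are hypothetical and not taken from the paper.

```python
import numpy as np


def difference_penalty(D, order=2):
    """Hypothetical helper: K = Delta' Delta built from order-th differences."""
    Delta = np.diff(np.eye(D), n=order, axis=0)  # (D - order) x D difference matrix
    return Delta.T @ Delta


def split_spectral(K, tol=1e-10):
    """Hypothetical helper: split K = Gamma Omega Gamma' into Gamma1 and (Gamma2, omega2)."""
    omega, Gamma = np.linalg.eigh(K)              # eigenvalues in ascending order
    zero = omega < tol * max(omega.max(), 1.0)    # numerically zero eigenvalues
    M = int(zero.sum())                           # rank deficiency M = D - R
    return Gamma[:, :M], Gamma[:, M:], omega[M:]


D = 10
K = difference_penalty(D)
Gamma1, Gamma2, omega2 = split_spectral(K)
print(Gamma1.shape, Gamma2.shape)   # (10, 2) (10, 8)

# Reduced representation K = Gamma2 Omega2 Gamma2'
K_reduced = Gamma2 @ np.diag(omega2) @ Gamma2.T
print(np.allclose(K, K_reduced))    # True
```

Because `numpy.linalg.eigh` returns eigenvalues in ascending order, the numerically zero eigenvalues come first, matching the ordering $\omega _1\le \ldots \le \omega _D$ used above.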

Based on ${\tilde{{\varvec{X}}}}={\varvec{\Gamma }}_1$ and ${\tilde{{\varvec{Z}}}}={\varvec{\Gamma }}_2{\varvec{\Omega }}_2^{-1/2}$, we can now reparameterise ${\varvec{\gamma }}$ as

$$\begin{aligned} {\varvec{\gamma }}= {\tilde{{\varvec{X}}}}{\varvec{\beta }}+ {\tilde{{\varvec{Z}}}}{\varvec{\alpha }}\end{aligned}$$

where the new regression parameters ${\varvec{\beta }}$ and ${\varvec{\alpha }}$ (which are of dimension M and R, respectively) follow the prior specifications

$$\begin{aligned} p({\varvec{\beta }})\propto {{\,\mathrm{const}\,}}\qquad {\varvec{\alpha }}\sim {{\,\mathrm{N}\,}}(0,\tau ^2{{\varvec{I}}}_R). \end{aligned}$$
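To see why the reparameterised coefficients obtain these priors, assume (as is usual for structured additive regression terms) that prior (1) has the partially improper Gaussian form $p({\varvec{\gamma }}\mid \tau ^2)\propto \exp (-{\varvec{\gamma }}'{{\varvec{K}}}{\varvec{\gamma }}/(2\tau ^2))$. Using ${{\varvec{K}}}{\tilde{{\varvec{X}}}}={\varvec{\Gamma }}_2{\varvec{\Omega }}_2{\varvec{\Gamma }}_2'{\varvec{\Gamma }}_1=\mathbf {0}$ and ${\varvec{\Gamma }}_2'{\varvec{\Gamma }}_2={{\varvec{I}}}_R$, the quadratic form in the exponent reduces to

$$\begin{aligned} {\varvec{\gamma }}'{{\varvec{K}}}{\varvec{\gamma }} = ({\tilde{{\varvec{X}}}}{\varvec{\beta }}+{\tilde{{\varvec{Z}}}}{\varvec{\alpha }})'{{\varvec{K}}}({\tilde{{\varvec{X}}}}{\varvec{\beta }}+{\tilde{{\varvec{Z}}}}{\varvec{\alpha }}) = {\varvec{\alpha }}'{\varvec{\Omega }}_2^{-1/2}{\varvec{\Gamma }}_2'{\varvec{\Gamma }}_2{\varvec{\Omega }}_2{\varvec{\Gamma }}_2'{\varvec{\Gamma }}_2{\varvec{\Omega }}_2^{-1/2}{\varvec{\alpha }} = {\varvec{\alpha }}'{\varvec{\alpha }}. \end{aligned}$$

The exponent no longer involves ${\varvec{\beta }}$, and the linear one-to-one transformation has a constant Jacobian, which yields the flat prior for ${\varvec{\beta }}$ and ${\varvec{\alpha }}\sim {{\,\mathrm{N}\,}}(0,\tau ^2{{\varvec{I}}}_R)$.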

This can be interpreted as follows: a flat, noninformative prior is assigned to the vector of regression coefficients ${\varvec{\beta }}$, which therefore represents the part of the function that is not affected by the prior specification, while ${\varvec{\alpha }}$ follows an i.i.d. Gaussian prior. In mixed model terminology, ${\varvec{\beta }}$ comprises fixed effects while ${\varvec{\alpha }}$ are i.i.d. Gaussian random effects. Note that ${\tilde{{\varvec{X}}}}$ defines a basis of the null space of the prior precision matrix ${{\varvec{K}}}$; in fact, any alternative basis of this null space can be used as well. In particular, for specific types of effects one can derive bases that are easier to interpret.
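As an illustration of an easier-to-interpret basis, consider again a second-order difference penalty (an assumption made for this sketch, which uses NumPy). Its null space is spanned by a constant and a linear trend in the coefficient index; the snippet checks that these two vectors are left unpenalised and span the same space as the eigenvector basis ${\varvec{\Gamma }}_1$.

```python
import numpy as np

D = 10
Delta = np.diff(np.eye(D), n=2, axis=0)   # second-order difference matrix, (D - 2) x D
K = Delta.T @ Delta

# Interpretable null-space basis: constant and linear trend in the coefficient index
X_alt = np.column_stack([np.ones(D), np.arange(1, D + 1)])
print(np.allclose(K @ X_alt, 0.0))        # True: both columns are unpenalised

# Eigenvector basis Gamma1 (columns for the two zero eigenvalues)
omega, Gamma = np.linalg.eigh(K)
Gamma1 = Gamma[:, :2]

# Projecting X_alt onto span(Gamma1) reproduces it, so both bases span the same space
print(np.allclose(Gamma1 @ (Gamma1.T @ X_alt), X_alt))   # True
```

For B-spline bases with equidistant knots, these two coefficient vectors correspond to a constant and a linear function of the covariate, so that ${\varvec{\beta }}$ can be read as intercept and slope of a linear effect.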

In matrix notation, the mixed model reparameterisation induces a similar decomposition of the vector of function evaluations, i.e.

$$\begin{aligned} {{\varvec{f}}}= {{\varvec{X}}}{\varvec{\beta }}+ {{\varvec{Z}}}{\varvec{\alpha }}\end{aligned}$$

where ${{\varvec{X}}}={{\varvec{B}}}{\tilde{{\varvec{X}}}}$ and ${{\varvec{Z}}}={{\varvec{B}}}{\tilde{{\varvec{Z}}}}$. For the individual function evaluations, we obtain

$$\begin{aligned} f(\nu )= & {} x_1\beta _1 + \ldots + x_M\beta _M + \tilde{f}(\nu )\\= & {} {{\varvec{x}}}'{\varvec{\beta }}+ {{\varvec{z}}}(\nu )'{\varvec{\alpha }}\end{aligned}$$

where ${{\varvec{x}}}'={{\varvec{b}}}(\nu )'{\tilde{{\varvec{X}}}}$ and ${{\varvec{z}}}(\nu )'={{\varvec{b}}}(\nu )'{\tilde{{\varvec{Z}}}}$. Following the mixed model representation above, $x_1,\ldots ,x_M$ are covariates with fixed effects, whereas $\tilde{f}(\nu )$ is the function obtained from $f(\nu )$ by removing these fixed effects through the reparameterisation. The basis functions of the new function $\tilde{f}(\nu )$ are collected in the vector ${{\varvec{z}}}(\nu )$.
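A numerical sketch of the reparameterised design matrices, again assuming NumPy and a second-order difference penalty. Since the basis matrix ${{\varvec{B}}}$ depends on the application, a random matrix `B` stands in for the $n\times D$ matrix of basis-function evaluations here; it is purely hypothetical. The check confirms that ${{\varvec{X}}}{\varvec{\beta }}+{{\varvec{Z}}}{\varvec{\alpha }}$ reproduces ${{\varvec{B}}}{\varvec{\gamma }}$.

```python
import numpy as np

rng = np.random.default_rng(1)
D, n, M = 10, 50, 2                       # M = 2 for a second-order difference penalty
Delta = np.diff(np.eye(D), n=2, axis=0)
K = Delta.T @ Delta

omega, Gamma = np.linalg.eigh(K)
Gamma1, Gamma2, omega2 = Gamma[:, :M], Gamma[:, M:], omega[M:]

X_tilde = Gamma1                          # null-space basis (fixed effects)
Z_tilde = Gamma2 / np.sqrt(omega2)        # Gamma2 Omega2^{-1/2}: column j scaled by omega2[j]^{-1/2}

# Hypothetical stand-in for the n x D matrix B of basis-function evaluations
B = rng.uniform(size=(n, D))
X, Z = B @ X_tilde, B @ Z_tilde

beta = rng.normal(size=M)                 # fixed effects
alpha = rng.normal(size=D - M)            # i.i.d. Gaussian random effects
gamma = X_tilde @ beta + Z_tilde @ alpha

f_original = B @ gamma
f_mixed = X @ beta + Z @ alpha
print(np.allclose(f_original, f_mixed))   # True
```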

Note that by construction the reparameterisation is a one-to-one transformation, and therefore, explicit expressions for the reparameterised regression coefficients can be obtained from

$$\begin{aligned} ({\tilde{{\varvec{X}}}},{\tilde{{\varvec{Z}}}})^{-1} = \begin{pmatrix}{\varvec{\Gamma }}_1'\\ {\varvec{\Omega }}_2^{1/2}{\varvec{\Gamma }}_2'\end{pmatrix} \end{aligned}$$

and are then given by

$$\begin{aligned} {\varvec{\beta }}= {\varvec{\Gamma }}_1'{\varvec{\gamma }}\qquad {\varvec{\alpha }}= {\varvec{\Omega }}_2^{1/2}{\varvec{\Gamma }}_2'{\varvec{\gamma }}. \end{aligned}$$
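The back-transformation can be verified in a few lines. This is a minimal NumPy sketch under the same illustrative second-order difference penalty: it maps an arbitrary ${\varvec{\gamma }}$ to $({\varvec{\beta }},{\varvec{\alpha }})$ with the formulas above and back again.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 10, 2
Delta = np.diff(np.eye(D), n=2, axis=0)
K = Delta.T @ Delta
omega, Gamma = np.linalg.eigh(K)
Gamma1, Gamma2, omega2 = Gamma[:, :M], Gamma[:, M:], omega[M:]

gamma = rng.normal(size=D)                        # arbitrary original coefficients

beta = Gamma1.T @ gamma                           # beta  = Gamma1' gamma
alpha = np.sqrt(omega2) * (Gamma2.T @ gamma)      # alpha = Omega2^{1/2} Gamma2' gamma

# Map back with gamma = X_tilde beta + Z_tilde alpha
gamma_back = Gamma1 @ beta + (Gamma2 / np.sqrt(omega2)) @ alpha
print(np.allclose(gamma, gamma_back))             # True
```

The round trip works because ${\varvec{\Gamma }}_1{\varvec{\Gamma }}_1'+{\varvec{\Gamma }}_2{\varvec{\Gamma }}_2'={\varvec{\Gamma }}{\varvec{\Gamma }}'={{\varvec{I}}}$.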

Algorithm for constrained sampling

As shown in Rue and Held (2005, Algorithm 2.6), a sample ${\varvec{\gamma }}$ can be modified to fulfil the constraint ${{\varvec{A}}}{\varvec{\gamma }}=\mathbf {0}$, where ${{\varvec{A}}}$ is an $a\times D$ constraint matrix, by applying the following steps:

1. Compute the $D\times a$ matrix ${{\varvec{V}}}={{\varvec{P}}}^{-1}{{\varvec{A}}}'$ by solving the equation systems ${{\varvec{P}}}{{\varvec{V}}}={{\varvec{A}}}'$ for each of the columns of ${{\varvec{V}}}$, see Rue and Held (2005, Algorithm 2.1).
2. Compute the $a\times a$ matrix ${{\varvec{W}}}={{\varvec{A}}}{{\varvec{V}}}$.
3. Compute the $a\times D$ matrix ${{\varvec{U}}}={{\varvec{W}}}^{-1}{{\varvec{V}}}'$ by solving the equation systems ${{\varvec{W}}}{{\varvec{U}}}={{\varvec{V}}}'$ for each of the columns of ${{\varvec{U}}}$.
4. Compute the constrained sample ${\varvec{\gamma }}^*={\varvec{\gamma }}-{{\varvec{U}}}'{{\varvec{A}}}{\varvec{\gamma }}$, where ${\varvec{\gamma }}$ is an unconstrained sample from ${{\,\mathrm{N}\,}}({\varvec{\mu }},{{\varvec{P}}}^{-1})$.

Note that all four steps have to be conducted in each iteration of the MCMC algorithm, since the precision matrix ${{\varvec{P}}}$ typically changes over the iterations.
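The four steps translate directly into code. The sketch below assumes NumPy and dense matrices, and the function name `sample_constrained` is hypothetical; it draws the unconstrained sample via a Cholesky factor of ${{\varvec{P}}}$ and reuses that factor in step 1. An MCMC implementation in the spirit of the paper would instead exploit the sparsity of ${{\varvec{P}}}$ (sparse Cholesky factorisation and triangular solves), which this dense version does not attempt.

```python
import numpy as np


def sample_constrained(mu, P, A, rng):
    """Hypothetical helper: draw gamma ~ N(mu, P^{-1}) and correct it so that A @ gamma = 0.

    Dense sketch of the four steps; P must be symmetric positive definite.
    """
    L = np.linalg.cholesky(P)                        # P = L L' with L lower triangular
    # Unconstrained draw: mu + L'^{-1} z with z ~ N(0, I) has covariance P^{-1}
    z = rng.standard_normal(mu.shape[0])
    gamma = mu + np.linalg.solve(L.T, z)
    # Step 1: V = P^{-1} A' (D x a), solving P V = A' through the Cholesky factor
    V = np.linalg.solve(L.T, np.linalg.solve(L, A.T))
    # Step 2: W = A V (a x a)
    W = A @ V
    # Step 3: U = W^{-1} V' (a x D), solving W U = V'
    U = np.linalg.solve(W, V.T)
    # Step 4: constrained sample gamma* = gamma - U' A gamma
    return gamma - U.T @ (A @ gamma)


rng = np.random.default_rng(42)
D = 6
Delta = np.diff(np.eye(D), n=1, axis=0)              # first-order differences, rank D - 1
P = Delta.T @ Delta + 0.5 * np.eye(D)                # hypothetical positive definite precision
mu = rng.normal(size=D)
A = np.ones((1, D))                                  # sum-to-zero constraint (a = 1)
gamma_star = sample_constrained(mu, P, A, rng)
print(np.allclose(A @ gamma_star, 0.0))              # True
```

The precision matrix in the usage lines (a first-order difference penalty plus a ridge term) and the sum-to-zero constraint are illustrative choices only. The correction works because ${{\varvec{A}}}{{\varvec{U}}}'={{\varvec{A}}}{{\varvec{V}}}{{\varvec{W}}}^{-1}={{\varvec{I}}}$, so ${{\varvec{A}}}{\varvec{\gamma }}^*={{\varvec{A}}}{\varvec{\gamma }}-{{\varvec{A}}}{\varvec{\gamma }}=\mathbf {0}$.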

About this article

Cite this article

Kneib, T., Klein, N., Lang, S. et al. Modular regression - a Lego system for building structured additive distributional regression models with tensor product interactions. TEST 28, 1–39 (2019). https://doi.org/10.1007/s11749-019-00631-z

Issue Date: 12 March 2019