
Modular regression - a Lego system for building structured additive distributional regression models with tensor product interactions

  • Invited Paper

Abstract

Semiparametric regression models offer considerable flexibility concerning the specification of additive regression predictors, including effects as diverse as nonlinear effects of continuous covariates, spatial effects, random effects, or varying coefficients. Recently, such flexible model predictors have been combined with the possibility to go beyond pure mean-based analyses by specifying regression predictors on potentially all parameters of the response distribution in a distributional regression framework. In this paper, we discuss a generic concept for defining interaction effects in such semiparametric distributional regression models based on tensor products of main effects. These interactions can be assigned anisotropic penalties, i.e. different amounts of smoothness are associated with the interacting covariates. We investigate identifiability and the decomposition of interactions into main effects and pure interaction effects (similar to a smoothing spline analysis of variance) to facilitate a modular model building process. The decomposition is based on orthogonality in function spaces, which allows for considerable flexibility in setting up the effect decomposition. Inference is based on Markov chain Monte Carlo simulations with iteratively weighted least squares proposals under constraints to ensure identifiability and effect decomposition. One important aspect is therefore to maintain sparse matrix structures of the tensor product also in identifiable, decomposed model formulations. The performance of modular regression is verified in a simulation study on decomposed interaction surfaces of two continuous covariates and in two applications: the construction of spatio-temporal interactions for the analysis of precipitation on the one hand, and functional random effects for analysing house prices on the other.


References

  • Adler D, Kneib T, Lang S, Umlauf N, Zeileis A (2012) BayesXsrc: R package distribution of the BayesX C++ sources. R package version 3.0-0. https://CRAN.R-project.org/package=BayesXsrc. Accessed 29 Jan 2019

  • Belitz C, Brezger A, Klein N, Kneib T, Lang S, Umlauf N (2015) BayesX—Software for Bayesian inference in structured additive regression models. Version 3.0.2. http://www.bayesx.org. Accessed 29 Jan 2019

  • Besag J, Higdon D (1999) Bayesian analysis of agricultural field experiments. J R Stat Soc Ser B (Methodol) 61:691–746

  • Brezger A, Lang S (2006) Generalized structured additive regression based on Bayesian P-splines. Comput Stat Data Anal 50:967–991

  • Fahrmeir L, Kneib T (2011) Bayesian smoothing and regression for longitudinal, spatial and event history data. Oxford University Press, New York

  • Fahrmeir L, Kneib T, Lang S (2004) Penalized structured additive regression for space–time data: a Bayesian perspective. Stat Sin 14:731–761

  • Fahrmeir L, Kneib T, Lang S, Marx B (2013) Regression—models, methods and applications. Springer, Berlin

  • Gamerman D (1997) Sampling from the posterior distribution in generalized linear mixed models. Stat Comput 7:57–68

  • Gelfand AE, Sahu SK (1999) Identifiability, improper priors, and Gibbs sampling for generalized linear models. J Am Stat Assoc 94:247–253

  • Gelman A (2006) Prior distributions for variance parameters in hierarchical models. Bayesian Anal 1:515–533

  • Goicoa T, Adin A, Ugarte MD, Hodges JS (2018) In spatio-temporal disease mapping models, identifiability constraints affect PQL and INLA results. Stoch Environ Res Risk Assess 32:749–770

  • Gu C (2002) Smoothing spline ANOVA models. Springer, New York

  • Hodges JS (2013) Richly parameterized linear models: additive, time series, and spatial models using random effects. Chapman & Hall/CRC, New York/Boca Raton

  • Hughes J, Haran M (2013) Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. J R Stat Soc Ser B (Stat Methodol) 75:139–159

  • Klein N (2018) sdPrior: scale-dependent hyperpriors in structured additive distributional regression. R package version 1.0

  • Klein N, Kneib T (2016a) Scale-dependent priors for variance parameters in structured additive distributional regression. Bayesian Anal 11:1071–1106

  • Klein N, Kneib T (2016b) Simultaneous inference in structured additive conditional copula regression models: a unifying Bayesian approach. Stat Comput 26:841–860

  • Klein N, Kneib T, Klasen S, Lang S (2015a) Bayesian structured additive distributional regression for multivariate responses. J R Stat Soc Ser C (Appl Stat) 64:569–591

  • Klein N, Kneib T, Lang S (2015b) Bayesian generalized additive models for location, scale and shape for zero-inflated and overdispersed count data. J Am Stat Assoc 110:405–419

  • Klein N, Kneib T, Lang S, Sohn A (2015c) Bayesian structured additive distributional regression with an application to regional income inequality in Germany. Ann Appl Stat 9:1024–1052

  • Knorr-Held L (2000) Bayesian modelling of inseparable space–time variation in disease risk. Stat Med 19:2555–2567

  • Lang S, Brezger A (2004) Bayesian P-splines. J Comput Graph Stat 13:183–212

  • Lang S, Umlauf N, Wechselberger P, Harttgen K, Kneib T (2014) Multilevel structured additive regression. Stat Comput 24:223–238

  • Lavine M, Hodges JS (2012) On rigorous specification of ICAR models. Am Stat 66:42–49

  • Lee D-J, Durbán M (2011) P-spline ANOVA type interaction models for spatio-temporal smoothing. Stat Model 11:46–69

  • Marí-Dell’Olmo M, Martinez-Beneito MA, Mercè Gotsens M, Palència L (2014) A smoothed ANOVA model for multivariate ecological regression. Stoch Environ Res Risk Assess 28:695–706

  • Marra G, Radice R (2017) Bivariate copula additive models for location, scale and shape. Comput Stat Data Anal 112:99–113

  • Marra G, Wood SN (2012) Coverage properties of confidence intervals for generalized additive model components. Scand J Stat 39:53–74

  • Paciorek CJ (2007) Bayesian smoothing with Gaussian processes using Fourier basis functions in the spectralGP package. J Stat Softw 19:1–38

  • R Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. https://www.R-project.org/. Accessed 29 Jan 2019

  • Reich BJ, Hodges JS, Zadnik V (2006) Effects of residual smoothing on the posterior of the fixed effects in disease-mapping models. Biometrics 62:1197–1206

  • Rigby RA, Stasinopoulos DM (2005) Generalized additive models for location, scale and shape (with discussion). J R Stat Soc Ser C (Appl Stat) 54:507–554

  • Rodriguez Alvarez MX, Lee D-J, Kneib T, Durban M, Eilers P (2015) Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm. Stat Comput 25:941–957

  • Rue H, Held L (2005) Gaussian Markov random fields. Chapman & Hall/CRC, New York/Boca Raton

  • Ruppert D, Wand MP, Carroll RJ (2003) Semiparametric regression. Cambridge University Press, Cambridge

  • Simpson D, Rue H, Martins TG, Riebler A, Sørbye SH (2017) Penalising model component complexity: a principled, practical approach to constructing priors. Stat Sci 32(1):1–28

  • Stauffer R, Mayr GJ, Messner JW, Umlauf N, Zeileis A (2016) Spatio-temporal precipitation climatology over complex terrain using a censored additive regression model. Int J Climatol 15:3264

  • Stauffer R, Umlauf N, Messner JW, Mayr GJ, Zeileis A (2017) Ensemble postprocessing of daily precipitation sums over complex terrain using censored high-resolution standardized anomalies. Mon Weather Rev 145(3):955–969

  • Ugarte MD, Adin A, Goicoa T (2017) One-dimensional, two-dimensional, and three-dimensional B-splines to specify space–time interactions in Bayesian disease mapping: model fitting and model identifiability. Spat Stat 22:451–468

  • Umlauf N, Klein N, Zeileis A, Köhler M (2018) bamlss: Bayesian additive models for location, scale and shape (and beyond). R package version 1.0-0. http://CRAN.R-project.org/package=bamlss. Accessed 29 Jan 2019

  • Wahba G, Wang Y, Gu C, Klein R, Klein B (1995) Smoothing spline ANOVA for exponential families, with application to the Wisconsin epidemiological study of diabetic retinopathy. Ann Stat 23:1865–1895

  • Wood SN (2006) Low-rank scale-invariant tensor product smooths for generalized additive mixed models. Biometrics 62:1025–1036

  • Wood SN (2008) Fast stable direct fitting and smoothness selection for generalized additive models. J R Stat Soc Ser B (Stat Methodol) 70:495–518

  • Wood S (2015) mgcv: mixed GAM computation vehicle with GCV/AIC/REML smoothness estimation. R package version 1.8-5

  • Wood SN (2017) Generalized additive models: an introduction with R. Chapman & Hall/CRC, New York/Boca Raton

  • Wood SN, Scheipl F, Faraway JJ (2013) Straightforward intermediate rank tensor product smoothing in mixed models. Stat Comput 23:341–360


Acknowledgements

We thank the referees and the associate editor for many valuable comments that led to a significant improvement of our paper over the original submission. We are grateful to Jim Hodges for pointing us to the alternative representation of the tensor product precision matrix based on eigen decompositions. Financial support by the German Research Foundation (DFG), Grant KN 922/9-1, is gratefully acknowledged.

Author information


Correspondence to Thomas Kneib.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This invited paper is discussed in comments available at https://doi.org/10.1007/s11749-019-00632-y, https://doi.org/10.1007/s11749-019-00633-x, https://doi.org/10.1007/s11749-019-00634-w, and https://doi.org/10.1007/s11749-019-00635-9.


Appendix

Mixed model decomposition

For the further developments, it is useful to study functional effects with a partially improper prior, to determine which parts of the function are assigned an informative prior and which ones are assigned a flat prior. While this is hidden in the multivariate normal prior (1), it can be made explicit by reparameterising the basis functions utilising the mixed model representation of structured additive regression terms. This reparameterisation can be obtained from the spectral decomposition of the prior precision matrix as

$$\begin{aligned} {{\varvec{K}}}= {\varvec{\Gamma }}{\varvec{\Omega }}{\varvec{\Gamma }}' \end{aligned}$$

where

$$\begin{aligned} {\varvec{\Omega }}= {{\,\mathrm{diag}\,}}(\omega _1,\ldots ,\omega _D), \qquad \omega _1\le \ldots \le \omega _D \end{aligned}$$

comprises the eigenvalues (in ascending order) and

$$\begin{aligned} {\varvec{\Gamma }}'{\varvec{\Gamma }}={{\varvec{I}}}\end{aligned}$$

is the orthonormal matrix of corresponding eigenvectors. Now let \(M = \dim ({\varvec{\gamma }})-{{\,\mathrm{rk}\,}}({{\varvec{K}}}) = D - R > 0\) denote the rank deficiency of \({{\varvec{K}}}\); then the first M eigenvalues are equal to zero, i.e. \(\omega _1=\ldots =\omega _M=0\). Accordingly, we split the spectral decomposition into

$$\begin{aligned} {\varvec{\Gamma }}=[{\varvec{\Gamma }}_1,{\varvec{\Gamma }}_2], \qquad {\varvec{\Omega }}= {{\,\mathrm{blockdiag}\,}}({\varvec{\Omega }}_1,{\varvec{\Omega }}_2) \end{aligned}$$

where \({\varvec{\Omega }}_1={{\,\mathrm{diag}\,}}(\omega _1,\ldots ,\omega _M)\) and \({\varvec{\Omega }}_2={{\,\mathrm{diag}\,}}(\omega _{M+1},\ldots ,\omega _D)\) comprise zero and nonzero eigenvalues, respectively, while \({\varvec{\Gamma }}_1\) and \({\varvec{\Gamma }}_2\) are the corresponding matrices of eigenvectors of dimension \(D\times M\) and \(D\times R\). We then obtain a reduced representation of the precision matrix as

$$\begin{aligned} {{\varvec{K}}}={\varvec{\Gamma }}_2{\varvec{\Omega }}_2{\varvec{\Gamma }}_2'. \end{aligned}$$

Based on \({\tilde{{\varvec{X}}}}={\varvec{\Gamma }}_1\) and \({\tilde{{\varvec{Z}}}}={\varvec{\Gamma }}_2{\varvec{\Omega }}_2^{-1/2}\), we can now reparameterise \({\varvec{\gamma }}\) as

$$\begin{aligned} {\varvec{\gamma }}= {\tilde{{\varvec{X}}}}{\varvec{\beta }}+ {\tilde{{\varvec{Z}}}}{\varvec{\alpha }}\end{aligned}$$

where the new regression parameters \({\varvec{\beta }}\) and \({\varvec{\alpha }}\) (which are of dimension M and R, respectively) follow the prior specifications

$$\begin{aligned} p({\varvec{\beta }})\propto {{\,\mathrm{const}\,}}\qquad {\varvec{\alpha }}\sim {{\,\mathrm{N}\,}}(0,\tau ^2{{\varvec{I}}}_R). \end{aligned}$$

This can be interpreted as follows: A flat, noninformative prior is assigned to the vector of regression coefficients \({\varvec{\beta }}\) which therefore represents the part of the function that is not affected by the prior specification while \({\varvec{\alpha }}\) follows an i.i.d. Gaussian prior. In mixed model terminology, this means that \({\varvec{\beta }}\) comprises fixed effects while \({\varvec{\alpha }}\) are i.i.d. Gaussian random effects. Note that \({\tilde{{\varvec{X}}}}\) defines a basis of the null space of the prior precision matrix \({{\varvec{K}}}\) and in fact any alternative basis of this null space can be considered as well. In particular, for specific types of effects one can deduce bases that entail an easier interpretation.

In matrix notation, the mixed model reparameterisation induces a similar decomposition of the vector of function evaluations, i.e.

$$\begin{aligned} {{\varvec{f}}}= {{\varvec{X}}}{\varvec{\beta }}+ {{\varvec{Z}}}{\varvec{\alpha }}\end{aligned}$$

where \({{\varvec{X}}}={{\varvec{B}}}{\tilde{{\varvec{X}}}}\) and \({{\varvec{Z}}}={{\varvec{B}}}{\tilde{{\varvec{Z}}}}\). For the individual function evaluations, we obtain

$$\begin{aligned} f(\nu ) &= x_1\beta _1 + \ldots + x_M\beta _M + \tilde{f}(\nu )\\ &= {{\varvec{x}}}'{\varvec{\beta }}+ {{\varvec{z}}}(\nu )'{\varvec{\alpha }}\end{aligned}$$

where \({{\varvec{x}}}'={{\varvec{b}}}(\nu )'{\tilde{{\varvec{X}}}}\) and \({{\varvec{z}}}(\nu )'={{\varvec{b}}}(\nu )'{\tilde{{\varvec{Z}}}}\). Following the mixed model representation above, this means that \(x_1,\ldots ,x_M\) are covariates with fixed effects, whereas \(\tilde{f}(\nu )\) is a function obtained from \(f(\nu )\) by removing these fixed effects by means of a reparameterisation. For the new function \(\tilde{f}(\nu )\), the basis functions are collected in the vector \({{\varvec{z}}}(\nu )\).

Note that by construction the reparameterisation is a one-to-one transformation, and therefore, explicit expressions for the reparameterised regression coefficients can be obtained from

$$\begin{aligned} ({\tilde{{\varvec{X}}}},{\tilde{{\varvec{Z}}}})^{-1} = \begin{pmatrix}{\varvec{\Gamma }}_1'\\ {\varvec{\Omega }}_2^{1/2}{\varvec{\Gamma }}_2'\end{pmatrix} \end{aligned}$$

and are then given by

$$\begin{aligned} {\varvec{\beta }}= {\varvec{\Gamma }}_1'{\varvec{\gamma }}\qquad {\varvec{\alpha }}= {\varvec{\Omega }}_2^{1/2}{\varvec{\Gamma }}_2'{\varvec{\gamma }}. \end{aligned}$$
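The following NumPy sketch illustrates this reparameterisation for a hypothetical first-order random walk penalty with \(D=6\) coefficients (an illustrative assumption, not the paper's actual implementation, which exploits sparse matrix structures): it builds \({\tilde{{\varvec{X}}}}\) and \({\tilde{{\varvec{Z}}}}\) from the spectral decomposition and verifies that the map between \({\varvec{\gamma }}\) and \(({\varvec{\beta }},{\varvec{\alpha }})\) is one-to-one.

```python
import numpy as np

# Hypothetical example: first-order random walk penalty for D = 6 coefficients.
D = 6
Delta = np.diff(np.eye(D), axis=0)            # (D-1) x D first-order difference matrix
K = Delta.T @ Delta                           # prior precision matrix, rank D - 1

# Spectral decomposition K = Gamma Omega Gamma' with eigenvalues in ascending order.
omega, Gamma = np.linalg.eigh(K)
M = D - np.linalg.matrix_rank(K)              # dimension of the null space (here M = 1)
Gamma1, Gamma2 = Gamma[:, :M], Gamma[:, M:]   # null-space / penalised eigenvectors

# Reparameterisation gamma = Xtilde beta + Ztilde alpha.
Xtilde = Gamma1                               # fixed-effect part, flat prior on beta
Ztilde = Gamma2 @ np.diag(omega[M:] ** -0.5)  # random-effect part, alpha ~ N(0, tau^2 I)

# The transformation is one-to-one: beta = Gamma1' gamma, alpha = Omega2^{1/2} Gamma2' gamma.
rng = np.random.default_rng(1)
gamma = rng.normal(size=D)
beta = Gamma1.T @ gamma
alpha = np.diag(omega[M:] ** 0.5) @ Gamma2.T @ gamma
assert np.allclose(gamma, Xtilde @ beta + Ztilde @ alpha)
```

For the random walk penalty chosen here, the null space is spanned by the constant vector, so \({\varvec{\beta }}\) captures the level of the function while \({\varvec{\alpha }}\) captures its penalised deviations.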

Algorithm for constrained sampling

As shown in Rue and Held (2005, Algorithm 2.6), a sample \({\varvec{\gamma }}\) can be modified to fulfil the constraint \({{\varvec{A}}}{\varvec{\gamma }}=\mathbf {0}\) by applying the following steps:

  • Compute the \(D\times a\) matrix \({{\varvec{V}}}={{\varvec{P}}}^{-1}{{\varvec{A}}}'\) by solving the equation systems \({{\varvec{P}}}{{\varvec{V}}}={{\varvec{A}}}'\) for each of the columns of \({{\varvec{V}}}\), see Rue and Held (2005, Algorithm 2.1).

  • Compute the \(a\times a\) matrix \({{\varvec{W}}}={{\varvec{A}}}{{\varvec{V}}}\).

  • Compute the \(a\times D\) matrix \({{\varvec{U}}}={{\varvec{W}}}^{-1}{{\varvec{V}}}'\) by solving the equation systems \({{\varvec{W}}}{{\varvec{U}}}={{\varvec{V}}}'\) for each of the columns of \({{\varvec{U}}}\).

  • Compute the constrained sample \({\varvec{\gamma }}^*={\varvec{\gamma }}-{{\varvec{U}}}'{{\varvec{A}}}{\varvec{\gamma }}\) where \({\varvec{\gamma }}\) is an unconstrained sample from \({{\,\mathrm{N}\,}}({\varvec{\mu }},{{\varvec{P}}}^{-1})\).

Note that all four steps of the sampling scheme have to be conducted in each iteration of the MCMC algorithm since typically the precision matrix \({{\varvec{P}}}\) changes over the iterations. A small numerical sketch of this correction step is given below.
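For concreteness, the sketch uses a hypothetical five-dimensional Gaussian with a single sum-to-zero constraint (both are illustrative assumptions); in the actual MCMC algorithm, \({{\varvec{P}}}\) would be the current full conditional precision matrix and the linear systems would be solved with sparse Cholesky routines as in Rue and Held (2005, Algorithm 2.1).

```python
import numpy as np

rng = np.random.default_rng(2)
D, a = 5, 1
A = np.ones((a, D))                       # hypothetical sum-to-zero constraint A gamma = 0
mu = rng.normal(size=D)
L = rng.normal(size=(D, D))
P = L @ L.T + D * np.eye(D)               # some symmetric positive definite precision matrix

# Unconstrained draw gamma ~ N(mu, P^{-1}) via the Cholesky factor P = C C'.
C = np.linalg.cholesky(P)
gamma = mu + np.linalg.solve(C.T, rng.normal(size=D))

# Steps 1-4: V = P^{-1} A', W = A V, U = W^{-1} V', gamma* = gamma - U' A gamma.
V = np.linalg.solve(P, A.T)               # D x a
W = A @ V                                 # a x a
U = np.linalg.solve(W, V.T)               # a x D
gamma_star = gamma - U.T @ A @ gamma

assert np.allclose(A @ gamma_star, 0.0)   # the corrected sample fulfils the constraint
```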


Cite this article

Kneib, T., Klein, N., Lang, S. et al. Modular regression - a Lego system for building structured additive distributional regression models with tensor product interactions. TEST 28, 1–39 (2019). https://doi.org/10.1007/s11749-019-00631-z
