Abstract
Semiparametric regression models offer considerable flexibility concerning the specification of additive regression predictors including effects as diverse as nonlinear effects of continuous covariates, spatial effects, random effects, or varying coefficients. Recently, such flexible model predictors have been combined with the possibility to go beyond pure mean-based analyses by specifying regression predictors on potentially all parameters of the response distribution in a distributional regression framework. In this paper, we discuss a generic concept for defining interaction effects in such semiparametric distributional regression models based on tensor products of main effects. These interactions can be assigned anisotropic penalties, i.e. different amounts of smoothness will be associated with the interacting covariates. We investigate identifiability and the decomposition of interactions into main effects and pure interaction effects (similar to a smoothing spline analysis of variance) to facilitate a modular model building process. The decomposition is based on orthogonality in function spaces, which allows for considerable flexibility in setting up the effect decomposition. Inference is based on Markov chain Monte Carlo simulations with iteratively weighted least squares proposals under constraints to ensure identifiability and effect decomposition. One important aspect is therefore to maintain sparse matrix structures of the tensor product also in identifiable, decomposed model formulations. The performance of modular regression is verified in a simulation on decomposed interaction surfaces of two continuous covariates and in two applications: the construction of spatio-temporal interactions for the analysis of precipitation, and functional random effects for analysing house prices.
References
Adler D, Kneib T, Lang S, Umlauf N, Zeileis A (2012) BayesXsrc: R Package Distribution of the BayesX C++ Sources. R package version 3.0-0. https://CRAN.R-project.org/package=BayesXsrc. Accessed 29 Jan 2019
Belitz C, Brezger A, Klein N, Kneib T, Lang S, Umlauf N (2015) BayesX—Software for Bayesian inference in structured additive regression models. Version 3.0.2. http://www.bayesx.org. Accessed 29 Jan 2019
Besag J, Higdon D (1999) Bayesian analysis of agricultural field experiments. J R Stat Soc Ser B (Methodol) 61:691–746
Brezger A, Lang S (2006) Generalized structured additive regression based on Bayesian P-splines. Comput Stat Data Anal 50:967–991
Fahrmeir L, Kneib T (2011) Bayesian smoothing and regression for longitudinal, spatial and event history data. Oxford University Press, New York
Fahrmeir L, Kneib T, Lang S (2004) Penalized structured additive regression for space–time data: a Bayesian perspective. Stat Sin 14:731–761
Fahrmeir L, Kneib T, Lang S, Marx B (2013) Regression—models, methods and applications. Springer, Berlin
Gamerman D (1997) Sampling from the posterior distribution in generalized linear mixed models. Stat Comput 7:57–68
Gelfand AE, Sahu SK (1999) Identifiability, improper priors, and Gibbs sampling for generalized linear models. J Am Stat Assoc 94:247–253
Gelman A (2006) Prior distributions for variance parameters in hierarchical models. Bayesian Anal 1:515–533
Goicoa T, Adin A, Ugarte MD, Hodges JS (2018) In spatio-temporal disease mapping models, identifiability constraints affect PQL and INLA results. Stoch Environ Res Risk Assess 32:749–770
Gu C (2002) Smoothing spline ANOVA models. Springer, New York
Hodges JS (2013) Richly parameterized linear models: additive, time series, and spatial models using random effects. Chapman & Hall/CRC, New York/Boca Raton
Hughes J, Haran M (2013) Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. J R Stat Soc Ser B (Stat Methodol) 75:139–159
Klein N (2018) sdPrior: scale-dependent hyperpriors in structured additive distributional regression. R package version 1.0
Klein N, Kneib T (2016a) Scale-dependent priors for variance parameters in structured additive distributional regression. Bayesian Anal 11:1071–1106
Klein N, Kneib T (2016b) Simultaneous inference in structured additive conditional copula regression models: a unifying Bayesian approach. Stat Comput 26:841–860
Klein N, Kneib T, Klasen S, Lang S (2015a) Bayesian structured additive distributional regression for multivariate responses. J R Stat Soc Ser C (Appl Stat) 64:569–591
Klein N, Kneib T, Lang S (2015b) Bayesian generalized additive models for location, scale and shape for zero-inflated and overdispersed count data. J Am Stat Assoc 110:405–419
Klein N, Kneib T, Lang S, Sohn A (2015c) Bayesian structured additive distributional regression with an application to regional income inequality in Germany. Ann Appl Stat 9:1024–1052
Knorr-Held L (2000) Bayesian modelling of inseparable space-time variation in disease risk. Stat Med 19:2555–2567
Lang S, Brezger A (2004) Bayesian P-splines. J Comput Graph Stat 13:183–212
Lang S, Umlauf N, Wechselberger P, Harttgen K, Kneib T (2014) Multilevel structured additive regression. Stat Comput 24:223–238
Lavine M, Hodges JS (2012) On rigorous specification of ICAR models. Am Stat 66:42–49
Lee D-J, Durbán M (2011) P-spline ANOVA type interaction models for spatio temporal smoothing. Stat Model 11:46–69
Marí-Dell’Olmo M, Martinez-Beneito MA, Mercè Gotsens M, Palència L (2014) A smoothed ANOVA model for multivariate ecological regression. Stoch Environ Res Risk Assess 28:695–706
Marra G, Radice R (2017) Bivariate copula additive models for location, scale and shape. Comput Stat Data Anal 112:99–113
Marra G, Wood SN (2012) Coverage properties of confidence intervals for generalized additive model components. Scand J Stat 39:53–74
Paciorek CJ (2007) Bayesian smoothing with Gaussian processes using Fourier basis functions in the spectralGP package. J Stat Softw 19:1–38
R Core Team (2017) R: a Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. https://www.R-project.org/. Accessed 29 Jan 2019
Reich BJ, Hodges JS, Zadnik V (2006) Effects of residual smoothing on the posterior of the fixed effects in disease-mapping models. Biometrics 62:1197–1206
Rigby RA, Stasinopoulos DM (2005) Generalized additive models for location, scale and shape (with discussion). J R Stat Soc Ser C (Appl Stat) 54:507–554
Rodriguez Alvarez MX, Lee D-J, Kneib T, Durban M, Eilers P (2015) Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm. Stat Comput 25:941–957
Rue H, Held L (2005) Gaussian Markov random fields. Chapman & Hall/CRC, New York/Boca Raton
Ruppert D, Wand MP, Carroll RJ (2003) Semiparametric regression. Cambridge University Press, Cambridge
Simpson D, Rue H, Martins TG, Riebler A, Sørbye SH (2017) Penalising model component complexity: a principled, practical approach to constructing priors. Stat Sci 32(1):1–28
Stauffer R, Mayr GJ, Messner JW, Umlauf N, Zeileis A (2016) Spatio-temporal precipitation climatology over complex terrain using a censored additive regression model. Int J Climatol 15:3264
Stauffer R, Umlauf N, Messner JW, Mayr GJ, Zeileis A (2017) Ensemble postprocessing of daily precipitation sums over complex terrain using censored high-resolution standardized anomalies. Mon Weather Rev 145(3):955–969
Ugarte MD, Adin A, Goicoa T (2017) One-dimensional, two-dimensional, and three-dimensional B-splines to specify space-time interactions in Bayesian disease mapping: model fitting and model identifiability. Spat Stat 22:451–468
Umlauf N, Klein N, Zeileis A, Köhler M (2018) bamlss: Bayesian additive models for location scale and shape (and beyond). R package version 1.0-0. http://CRAN.R-project.org/package=bamlss. Accessed 29 Jan 2019
Wahba G, Wang Y, Gu C, Klein R, Klein B (1995) Smoothing spline ANOVA for exponential families, with application to the Wisconsin epidemiological study of diabetic retinopathy. Ann Stat 23:1865–1895
Wood SN (2006) Low-rank scale-invariant tensor product smooths for generalized additive mixed models. Biometrics 62:1025–1036
Wood SN (2008) Fast stable direct fitting and smoothness selection for generalized additive models. J R Stat Soc Ser B (Stat Methodol) 70:495–518
Wood S (2015) mgcv: Mixed GAM computation vehicle with GCV/AIC/REML smoothness estimations. R package version 1.8-5
Wood SN (2017) Generalized additive models: an introduction with R. Chapman & Hall/CRC, New York/Boca Raton
Wood SN, Scheipl F, Faraway JJ (2013) Straightforward intermediate rank tensor product smoothing in mixed models. Stat Comput 23:341–360
Acknowledgements
We thank the referees and the associate editor for many valuable comments that led to a significant improvement in our paper upon the original submission. We are grateful to Jim Hodges for pointing us to the alternative representation of the tensor product precision matrix based on eigen decompositions. Financial support by the German Research Foundation (DFG), Grant KN 922/9-1, is gratefully acknowledged.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This invited paper is discussed in comments available at: https://doi.org/10.1007/s11749-019-00632-y, https://doi.org/10.1007/s11749-019-00633-x, https://doi.org/10.1007/s11749-019-00634-w, and https://doi.org/10.1007/s11749-019-00635-9
Appendix
Mixed model decomposition
For the further developments, it is useful to study functional effects with partially improper priors to determine which parts of the function are assigned an informative prior and which ones are assigned a flat prior. While this is hidden in the multivariate normal prior (1), it can be made explicit by reparameterising the basis functions utilising the mixed model representation of structured additive regression terms. This reparameterisation can be obtained from the spectral decomposition of the prior precision matrix as
\[{{\varvec{K}}}={\varvec{\Gamma }}{\varvec{\Omega }}{\varvec{\Gamma }}'\]
where \({\varvec{\Omega }}={{\,\mathrm{diag}\,}}(\omega _1,\ldots ,\omega _D)\) comprises the eigenvalues (in ascending order) and \({\varvec{\Gamma }}\) is the orthonormal matrix of corresponding eigenvectors. Let now \(M = \dim ({\varvec{\gamma }})-{{\,\mathrm{rk}\,}}({{\varvec{K}}}) = D - R > 0\) denote the rank deficiency of \({{\varvec{K}}}\), then the first M eigenvalues are equal to zero, i.e. \(\omega _1=\ldots =\omega _M=0\). Accordingly, we split the spectral decomposition into
\[{{\varvec{K}}}=\begin{pmatrix}{\varvec{\Gamma }}_1&{\varvec{\Gamma }}_2\end{pmatrix}\begin{pmatrix}{\varvec{\Omega }}_1&\mathbf {0}\\ \mathbf {0}&{\varvec{\Omega }}_2\end{pmatrix}\begin{pmatrix}{\varvec{\Gamma }}_1'\\ {\varvec{\Gamma }}_2'\end{pmatrix}\]
where \({\varvec{\Omega }}_1={{\,\mathrm{diag}\,}}(\omega _1,\ldots ,\omega _M)\) and \({\varvec{\Omega }}_2={{\,\mathrm{diag}\,}}(\omega _{M+1},\ldots ,\omega _D)\) comprise zero and nonzero eigenvalues, respectively, while \({\varvec{\Gamma }}_1\) and \({\varvec{\Gamma }}_2\) are the corresponding matrices of eigenvectors of dimension \(D\times M\) and \(D\times R\). Since \({\varvec{\Omega }}_1=\mathbf {0}\), we then obtain a reduced representation of the precision matrix as
\[{{\varvec{K}}}={\varvec{\Gamma }}_2{\varvec{\Omega }}_2{\varvec{\Gamma }}_2'.\]
Based on \({\tilde{{\varvec{X}}}}={\varvec{\Gamma }}_1\) and \({\tilde{{\varvec{Z}}}}={\varvec{\Gamma }}_2{\varvec{\Omega }}_2^{-1/2}\), we can now reparameterise \({\varvec{\gamma }}\) as
\[{\varvec{\gamma }}={\tilde{{\varvec{X}}}}{\varvec{\beta }}+{\tilde{{\varvec{Z}}}}{\varvec{\alpha }}\]
where the new regression parameters \({\varvec{\beta }}\) and \({\varvec{\alpha }}\) (which are of dimension M and R, respectively) follow the prior specifications
\[p({\varvec{\beta }})\propto \text {const},\qquad {\varvec{\alpha }}\sim {{\,\mathrm{N}\,}}(\mathbf {0},\tau ^2{{\varvec{I}}}_R),\]
where \(\tau ^2\) is the variance parameter of the prior (1).
This can be interpreted as follows: A flat, noninformative prior is assigned to the vector of regression coefficients \({\varvec{\beta }}\) which therefore represents the part of the function that is not affected by the prior specification while \({\varvec{\alpha }}\) follows an i.i.d. Gaussian prior. In mixed model terminology, this means that \({\varvec{\beta }}\) comprises fixed effects while \({\varvec{\alpha }}\) are i.i.d. Gaussian random effects. Note that \({\tilde{{\varvec{X}}}}\) defines a basis of the null space of the prior precision matrix \({{\varvec{K}}}\) and in fact any alternative basis of this null space can be considered as well. In particular, for specific types of effects one can deduce bases that entail an easier interpretation.
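As an illustration of this decomposition, the following minimal NumPy sketch builds \({\tilde{{\varvec{X}}}}\) and \({\tilde{{\varvec{Z}}}}\) for a second-order difference penalty, which is only one possible choice of \({{\varvec{K}}}\). The function names `difference_penalty` and `mixed_model_bases`, the dimension 10 and the tolerance for detecting zero eigenvalues are assumptions made for this example and are not taken from the paper or its software.

```python
import numpy as np

def difference_penalty(dim, order=2):
    """Return K = D'D for a difference matrix D of the given order (assumed example penalty)."""
    D = np.diff(np.eye(dim), n=order, axis=0)
    return D.T @ D

def mixed_model_bases(K, tol=1e-8):
    """Split the spectral decomposition of K into null-space and penalised parts."""
    omega, Gamma = np.linalg.eigh(K)              # eigenvalues in ascending order
    null = omega < tol * omega.max()              # numerically zero eigenvalues
    X_tilde = Gamma[:, null]                      # D x M basis of the null space of K
    Z_tilde = Gamma[:, ~null] / np.sqrt(omega[~null])  # D x R, equals Gamma_2 Omega_2^{-1/2}
    return X_tilde, Z_tilde

K = difference_penalty(10, order=2)
X_tilde, Z_tilde = mixed_model_bases(K)

# Any gamma = X_tilde beta + Z_tilde alpha has penalty gamma' K gamma = alpha' alpha,
# so beta is unpenalised (flat prior) and alpha receives an i.i.d. Gaussian prior.
rng = np.random.default_rng(1)
beta = rng.normal(size=X_tilde.shape[1])
alpha = rng.normal(size=Z_tilde.shape[1])
gamma = X_tilde @ beta + Z_tilde @ alpha
penalty = gamma @ K @ gamma
```

For this penalty the null space has dimension M = 2 (constant and linear trend), so `X_tilde` has two columns and `Z_tilde` has eight. The eigenvectors returned for the null space are an arbitrary orthonormal basis of it, in line with the remark that any alternative basis can be used.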
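In matrix notation, the mixed model reparameterisation induces a similar decomposition of the vector of function evaluations, i.e.
\[{{\varvec{f}}}={{\varvec{B}}}{\varvec{\gamma }}={{\varvec{B}}}({\tilde{{\varvec{X}}}}{\varvec{\beta }}+{\tilde{{\varvec{Z}}}}{\varvec{\alpha }})={{\varvec{X}}}{\varvec{\beta }}+{{\varvec{Z}}}{\varvec{\alpha }}\]
where \({{\varvec{X}}}={{\varvec{B}}}{\tilde{{\varvec{X}}}}\) and \({{\varvec{Z}}}={{\varvec{B}}}{\tilde{{\varvec{Z}}}}\). For the individual function evaluations, we obtain
\[f(\nu )={{\varvec{b}}}(\nu )'{\varvec{\gamma }}={{\varvec{x}}}'{\varvec{\beta }}+{{\varvec{z}}}(\nu )'{\varvec{\alpha }}={{\varvec{x}}}'{\varvec{\beta }}+\tilde{f}(\nu )\]
where \({{\varvec{x}}}'={{\varvec{b}}}(\nu )'{\tilde{{\varvec{X}}}}\) and \({{\varvec{z}}}(\nu )'={{\varvec{b}}}(\nu )'{\tilde{{\varvec{Z}}}}\). Following the mixed model representation above, this means that \(x_1,\ldots ,x_M\) are covariates with fixed effects, whereas \(\tilde{f}(\nu )\) is a function that is obtained from \(f(\nu )\) by removing these fixed effects by means of a reparameterisation. For the new function \(\tilde{f}(\nu )\), the basis functions are defined in the vector \({{\varvec{z}}}(\nu )\).
Note that by construction the reparameterisation is a one-to-one transformation, and therefore, explicit expressions for the reparameterised regression coefficients can be obtained from
\[{\varvec{\gamma }}=\begin{pmatrix}{\tilde{{\varvec{X}}}}&{\tilde{{\varvec{Z}}}}\end{pmatrix}\begin{pmatrix}{\varvec{\beta }}\\ {\varvec{\alpha }}\end{pmatrix}\]
and are then given by
\[\begin{pmatrix}{\varvec{\beta }}\\ {\varvec{\alpha }}\end{pmatrix}=\begin{pmatrix}{\tilde{{\varvec{X}}}}&{\tilde{{\varvec{Z}}}}\end{pmatrix}^{-1}{\varvec{\gamma }}=\begin{pmatrix}{\varvec{\Gamma }}_1'\\ {\varvec{\Omega }}_2^{1/2}{\varvec{\Gamma }}_2'\end{pmatrix}{\varvec{\gamma }}.\]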
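The one-to-one property can be checked numerically. The sketch below assumes a first-order difference penalty and a randomly generated design matrix `B`, both hypothetical choices made only for illustration. It relies on the orthonormality of the eigenvectors, which gives \({\varvec{\beta }}={\varvec{\Gamma }}_1'{\varvec{\gamma }}\) and \({\varvec{\alpha }}={\varvec{\Omega }}_2^{1/2}{\varvec{\Gamma }}_2'{\varvec{\gamma }}\).

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8

# Assumed example penalty: first-order differences, rank D - 1, null space = constants.
Dmat = np.diff(np.eye(D), n=1, axis=0)
K = Dmat.T @ Dmat

omega, Gamma = np.linalg.eigh(K)                  # ascending eigenvalues
M = int(np.sum(omega < 1e-8 * omega.max()))      # rank deficiency of K
Gamma1, Gamma2 = Gamma[:, :M], Gamma[:, M:]
Omega2 = omega[M:]                                # nonzero eigenvalues

X_tilde = Gamma1
Z_tilde = Gamma2 / np.sqrt(Omega2)                # Gamma_2 Omega_2^{-1/2}

# Map an arbitrary coefficient vector to (beta, alpha) and back.
gamma = rng.normal(size=D)
beta = Gamma1.T @ gamma                           # Gamma_1' gamma
alpha = np.sqrt(Omega2) * (Gamma2.T @ gamma)      # Omega_2^{1/2} Gamma_2' gamma
gamma_back = X_tilde @ beta + Z_tilde @ alpha

# The same decomposition carries over to the function evaluations f = B gamma.
B = rng.normal(size=(20, D))                      # hypothetical design matrix
f = B @ gamma
f_decomposed = (B @ X_tilde) @ beta + (B @ Z_tilde) @ alpha
```

Both `gamma_back` and `f_decomposed` reproduce the original quantities up to floating point error, which is what the one-to-one property implies.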
Algorithm for constrained sampling
As shown in Rue and Held (2005, Algorithm 2.6), a sample \({\varvec{\gamma }}\) can be modified to fulfil the constraint \({{\varvec{A}}}{\varvec{\gamma }}=\mathbf {0}\) applying the following steps:
- Compute the \(D\times a\) matrix \({{\varvec{V}}}={{\varvec{P}}}^{-1}{{\varvec{A}}}'\) by solving the equation systems \({{\varvec{P}}}{{\varvec{V}}}={{\varvec{A}}}'\) for each of the columns of \({{\varvec{V}}}\), see Rue and Held (2005, Algorithm 2.1).
- Compute the \(a\times a\) matrix \({{\varvec{W}}}={{\varvec{A}}}{{\varvec{V}}}\).
- Compute the \(a\times D\) matrix \({{\varvec{U}}}={{\varvec{W}}}^{-1}{{\varvec{V}}}'\) by solving the equation systems \({{\varvec{W}}}{{\varvec{U}}}={{\varvec{V}}}'\) for each of the columns of \({{\varvec{U}}}\).
- Compute the constrained sample \({\varvec{\gamma }}^*={\varvec{\gamma }}-{{\varvec{U}}}'{{\varvec{A}}}{\varvec{\gamma }}\) where \({\varvec{\gamma }}\) is an unconstrained sample from \({{\,\mathrm{N}\,}}({\varvec{\mu }},{{\varvec{P}}}^{-1})\).

Note that all four steps of the sampling scheme have to be conducted in each iteration of the MCMC algorithm since typically the precision matrix \({{\varvec{P}}}\) changes over the iterations.
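The four steps can be sketched in a few lines of NumPy. This is a dense illustration under assumed inputs, not the implementation used in the paper: the function name `constrain_sample`, the randomly generated precision matrix `P` and the sum-to-zero constraint `A` are hypothetical, and an implementation that preserves sparsity would replace the dense solves with sparse Cholesky factorisations.

```python
import numpy as np

def constrain_sample(gamma, P, A):
    """Correct an unconstrained draw gamma so that A @ gamma_star = 0."""
    V = np.linalg.solve(P, A.T)        # step 1: D x a matrix, solves P V = A'
    W = A @ V                          # step 2: a x a matrix
    U = np.linalg.solve(W, V.T)        # step 3: a x D matrix, solves W U = V'
    return gamma - U.T @ (A @ gamma)   # step 4: constrained sample

rng = np.random.default_rng(42)
D = 6

# Hypothetical symmetric positive definite precision matrix and mean.
L = rng.normal(size=(D, D))
P = L @ L.T + D * np.eye(D)
mu = np.zeros(D)

# Assumed constraint: the coefficients sum to zero (a = 1).
A = np.ones((1, D))

# Unconstrained draw from N(mu, P^{-1}) using the Cholesky factor P = C C'.
C = np.linalg.cholesky(P)
gamma = mu + np.linalg.solve(C.T, rng.normal(size=D))

gamma_star = constrain_sample(gamma, P, A)
```

After the correction, `A @ gamma_star` is zero up to floating point error, and applying `constrain_sample` to `gamma_star` again leaves it unchanged.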
Cite this article
Kneib, T., Klein, N., Lang, S. et al. Modular regression - a Lego system for building structured additive distributional regression models with tensor product interactions. TEST 28, 1–39 (2019). https://doi.org/10.1007/s11749-019-00631-z
Keywords
- Constrained sampling
- Distributional regression
- Functional random effects
- Markov chain Monte Carlo simulations
- Penalised splines
- Smoothing spline analysis of variance
- Space–time models
- Tensor product interactions