Feature significance in generalized additive models

Abstract

This paper develops inference for the significance of features such as peaks and valleys observed in additive modeling through an extension of the SiZer-type methodology of Chaudhuri and Marron (1999) and Godtliebsen et al. (2002, 2004) to the case where the outcome is discrete. We consider two problems: determining the significance of features such as peaks or valleys in observed covariate effects when the main predictor of interest is univariate, and studying the significance of features such as peaks, inclines, ridges and valleys when the main predictor of interest is geographical location. We work with low-rank radial spline smoothers to allow for the handling of sparse designs and large sample sizes. Reducing the problem to a Generalised Linear Mixed Model (GLMM) framework enables derivation of simulation-based critical value approximations and guards against the problem of multiple inferences over a range of predictor values. Such a reduction also allows for easy adjustment for confounders, including those which have an unknown or complex effect on the outcome. A simulation study indicates that our method has satisfactory power. Finally, we illustrate our methodology on several data sets.

References

  1. Berndt E.R. 1991. The Practice of Econometrics: Classical and Contemporary. Addison-Wesley: Reading, Massachusetts.

  2. Breslow N.E. and Clayton D.G. 1993. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88: 9–25.

  3. Chaudhuri P. and Marron J.S. 1999. SiZer for exploration of structures in curves. Journal of the American Statistical Association 94: 807–823.

  4. Chaudhuri P. and Marron J.S. 2000. Scale space view of curve estimation. The Annals of Statistics 28: 408–428.

  5. Cressie N. 1989. Geostatistics. The American Statistician 43: 197–202.

  6. Fan J., Heckman N.E., and Wand M.P. 1995. Local polynomial kernel regression for generalized linear models and quasi-likelihood functions. Journal of the American Statistical Association 90: 141–150.

  7. French J.L., Kammann E.E., and Wand M.P. 2001. Comment on paper by Ke and Wang. Journal of the American Statistical Association 96: 1285–1288.

  8. Ganguli B. and Wand M.P. 2004. Feature significance in geostatistics. Journal of Computational and Graphical Statistics 13: 954–973.

  9. Godtliebsen F., Marron J.S., and Chaudhuri P. 2002. Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics 11: 1–22.

  10. Godtliebsen F., Marron J.S., and Chaudhuri P. 2004. Statistical significance of features in digital images. Image and Vision Computing 13: 1093–1104.

  11. Green P.J. and Silverman B.W. 1994. Nonparametric Regression and Generalized Linear Models. Chapman and Hall, London.

  12. Hastie T. 1996. Pseudosplines. Journal of the Royal Statistical Society, Series B 58: 379–396.

  13. Kammann E.E. and Wand M.P. 2003. Geoadditive models. Applied Statistics 52: 1–18.

  14. Kaufman L. and Rousseeuw P.J. 1990. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.

  15. Nychka D. and Saltzman N. 1998. Design of Air Quality Monitoring Networks. In: D. Nychka, L. Cox, and W. Piegorsch (Eds.), Case Studies in Environmental Statistics, Lecture Notes in Statistics, Springer-Verlag, pp. 51–76.

  16. Ruppert D. and Wand M.P. 1994. Multivariate locally weighted least squares regression. The Annals of Statistics 22: 1346–1370.

  17. Wolfinger R. and O’Connell M. 1993. Generalized linear mixed models: A pseudo-likelihood approach. Journal of Statistical Computation and Simulation 48: 233–243.

  18. Zanobetti A., Wand M.P., Schwartz J., and Ryan L.M. 2000. Generalized additive distributed lag models. Biostatistics 1: 279–292.

Author information

Correspondence to B. Ganguli.

About this article

Cite this article

Ganguli, B., Wand, M.P. Feature significance in generalized additive models. Stat Comput 17, 179–192 (2007). https://doi.org/10.1007/s11222-006-9011-x

Keywords

  • Additive models
  • Best linear unbiased prediction (BLUP)
  • Bivariate smoothing
  • Generalised linear mixed models
  • Geostatistics
  • Low-rank mixed models
  • Penalised splines
  • Penalised quasi-likelihood (PQL)