Statistics and Computing, Volume 17, Issue 2, pp 179–192

Feature significance in generalized additive models

  • B. Ganguli
  • M. P. Wand
Article

Abstract

This paper develops inference for the significance of features, such as peaks and valleys, observed in additive modeling, by extending the SiZer-type methodology of Chaudhuri and Marron (1999) and Godtliebsen et al. (2002, 2004) to the case where the outcome is discrete. We consider the problem of determining the significance of features such as peaks or valleys in observed covariate effects, both in additive models where the main predictor of interest is univariate and in the study of features such as peaks, inclines, ridges and valleys when the main predictor of interest is geographical location. We work with low-rank radial spline smoothers to allow for the handling of sparse designs and large sample sizes. Reducing the problem to a Generalised Linear Mixed Model (GLMM) framework enables derivation of simulation-based critical value approximations and guards against the problem of multiple inferences over a range of predictor values. Such a reduction also allows for easy adjustment for confounders, including those that have an unknown or complex effect on the outcome. A simulation study indicates that our method has satisfactory power. Finally, we illustrate our methodology on several data sets.
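The multiple-inference point can be illustrated with a minimal sketch. It assumes that slope (derivative) estimates on a grid of predictor values are approximately Gaussian with a known covariance matrix, for example one obtained from a fitted mixed model. The functions `cholesky`, `simultaneous_critical_value` and `classify_slopes` are hypothetical illustrations, not the authors' code. The sketch shows how a Monte Carlo critical value for the maximum of the standardized estimates replaces the pointwise 1.96, so that all grid points are tested simultaneously at level alpha. It then labels each point in the spirit of SiZer.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite
    matrix given as a list of lists."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def simultaneous_critical_value(cov, alpha=0.05, n_sim=20000, seed=0):
    """Monte Carlo approximation of q with
    P(max_i |Z_i| / sd_i <= q) ~= 1 - alpha, where Z ~ N(0, cov).

    Illustrative sketch only: cov is assumed to be the (estimated)
    covariance of the derivative estimates on the grid."""
    rng = random.Random(seed)
    L = cholesky(cov)
    m = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(m)]
    maxima = []
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # Z = L e has covariance L L^T = cov
        z = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(m)]
        maxima.append(max(abs(z[i]) / sd[i] for i in range(m)))
    maxima.sort()
    # empirical (1 - alpha) quantile of the simulated maxima
    idx = min(n_sim - 1, int(math.ceil((1.0 - alpha) * n_sim)) - 1)
    return maxima[idx]


def classify_slopes(deriv, cov, q):
    """SiZer-style label for each grid point: a slope is called significant
    when its standardized value exceeds the simultaneous critical value q."""
    labels = []
    for i, d in enumerate(deriv):
        t = d / math.sqrt(cov[i][i])
        if t > q:
            labels.append("increasing")
        elif t < -q:
            labels.append("decreasing")
        else:
            labels.append("not significant")
    return labels
```

With independent grid points, the critical value grows with the number of points (about 3.0 for 20 points at alpha = 0.05). Strongly correlated estimates, as produced by a smooth fit, keep it close to 1.96. This is why simulating from the joint distribution is less conservative than a Bonferroni correction.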

Keywords

Additive models; Best linear unbiased prediction (BLUP); Bivariate smoothing; Generalised linear mixed models; Geostatistics; Low-rank mixed models; Penalised splines; Penalised quasi-likelihood (PQL)


References

  1. Berndt E.R. 1991. The Practice of Econometrics: Classical and Contemporary. Addison-Wesley, Reading, Massachusetts.
  2. Breslow N.E. and Clayton D.G. 1993. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88: 9–25.
  3. Chaudhuri P. and Marron J.S. 1999. SiZer for exploration of structures in curves. Journal of the American Statistical Association 94: 807–823.
  4. Chaudhuri P. and Marron J.S. 2000. Scale space view of curve estimation. The Annals of Statistics 28: 408–428.
  5. Cressie N. 1989. Geostatistics. The American Statistician 43: 197–202.
  6. Fan J., Heckman N.E., and Wand M.P. 1995. Local polynomial kernel regression for generalized linear models and quasi-likelihood functions. Journal of the American Statistical Association 90: 141–150.
  7. French J.L., Kammann E.E., and Wand M.P. 2001. Comment on paper by Ke and Wang. Journal of the American Statistical Association 96: 1285–1288.
  8. Ganguli B. and Wand M.P. 2004. Feature significance in geostatistics. Journal of Computational and Graphical Statistics 13: 954–973.
  9. Godtliebsen F., Marron J.S., and Chaudhuri P. 2002. Significance in scale space for bivariate density estimation. Journal of Computational and Graphical Statistics 11: 1–22.
  10. Godtliebsen F., Marron J.S., and Chaudhuri P. 2004. Statistical significance of features in digital images. Image and Vision Computing 13: 1093–1104.
  11. Green P.J. and Silverman B.W. 1994. Nonparametric Regression and Generalized Linear Models. Chapman and Hall, London.
  12. Hastie T. 1996. Pseudosplines. Journal of the Royal Statistical Society, Series B 58: 379–396.
  13. Kammann E.E. and Wand M.P. 2003. Geoadditive models. Applied Statistics 52: 1–18.
  14. Kaufman L. and Rousseeuw P.J. 1990. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.
  15. Nychka D. and Saltzman N. 1998. Design of Air Quality Monitoring Networks. In: D. Nychka, L. Cox, and W. Piegorsch (Eds.), Case Studies in Environmental Statistics, Lecture Notes in Statistics, Springer-Verlag, pp. 51–76.
  16. Ruppert D. and Wand M.P. 1994. Multivariate locally weighted least squares regression. The Annals of Statistics 22: 1346–1370.
  17. Wolfinger R. and O’Connell M. 1993. Generalized linear mixed models: A pseudo-likelihood approach. Journal of Statistical Computation and Simulation 48: 233–243.
  18. Zanobetti A., Wand M.P., Schwartz J., and Ryan L.M. 2000. Generalized additive distributed lag models. Biostatistics 1: 279–292.

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • B. Ganguli (1)
  • M. P. Wand (2)
  1. Department of Statistics, University of Calcutta, Calcutta, India
  2. Department of Statistics, School of Mathematics and Statistics, University of New South Wales, Sydney, Australia
