Journal of Classification, Volume 27, Issue 1, pp 89–110

Parsimonious Classification Via Generalized Linear Mixed Models

  • G. Kauermann
  • J. T. Ormerod
  • M. P. Wand

Abstract

We devise a classification algorithm based on generalized linear mixed model (GLMM) technology. The algorithm incorporates spline smoothing, additive model-type structures and model selection. For reasons of speed we employ the Laplace approximation, rather than Monte Carlo methods. Tests on real and simulated data show the algorithm to have good classification performance. Moreover, the resulting classifiers are generally interpretable and parsimonious.
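To illustrate why a Laplace approximation is attractive on speed grounds, here is a minimal sketch for a toy model that is an assumption of this example, not the paper's algorithm: a logistic model with one Gaussian random intercept, fixed intercept `beta` and standard deviation `sigma`. The function names and the data are hypothetical. The Laplace estimate of the log marginal likelihood needs only a few Newton steps and is compared with a Gauss-Hermite quadrature reference.

```python
import numpy as np


def log_lik_given_u(u, y, beta):
    """Bernoulli log-likelihood of y given random intercept u (logit link)."""
    eta = beta + u
    # sum_i [ y_i * eta - log(1 + exp(eta)) ], written stably with logaddexp
    return np.sum(y) * eta - y.size * np.logaddexp(0.0, eta)


def laplace_log_marginal(y, beta, sigma, n_iter=50):
    """Laplace approximation to log of the integral of p(y | u) N(u; 0, sigma^2) du."""
    n, s = y.size, np.sum(y)
    u = 0.0
    # Newton iterations for the mode of the (strictly concave) log integrand
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(beta + u)))
        grad = s - n * p - u / sigma**2
        hess = -n * p * (1.0 - p) - 1.0 / sigma**2
        step = grad / hess
        u -= step
        if abs(step) < 1e-12:
            break
    # Curvature and log integrand evaluated at the mode
    p = 1.0 / (1.0 + np.exp(-(beta + u)))
    hess = -n * p * (1.0 - p) - 1.0 / sigma**2
    h = (log_lik_given_u(u, y, beta)
         - 0.5 * u**2 / sigma**2
         - 0.5 * np.log(2.0 * np.pi * sigma**2))
    # Gaussian integral of the second-order expansion around the mode
    return h + 0.5 * np.log(2.0 * np.pi) - 0.5 * np.log(-hess)


def quadrature_log_marginal(y, beta, sigma, deg=40):
    """Gauss-Hermite reference value for the same one-dimensional integral."""
    x, w = np.polynomial.hermite.hermgauss(deg)
    u = np.sqrt(2.0) * sigma * x  # change of variables to the N(0, sigma^2) scale
    log_terms = np.log(w) + np.array([log_lik_given_u(ui, y, beta) for ui in u])
    m = log_terms.max()
    # log-sum-exp, then the 1/sqrt(pi) normalising constant
    return m + np.log(np.sum(np.exp(log_terms - m))) - 0.5 * np.log(np.pi)


# Hypothetical binary responses for a single group
y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1])
lap = laplace_log_marginal(y, beta=0.3, sigma=1.0)
ref = quadrature_log_marginal(y, beta=0.3, sigma=1.0)
print("Laplace:", lap, "quadrature:", ref, "difference:", lap - ref)
```

In this one-dimensional toy case quadrature is cheap as well. The practical appeal of the Laplace approach is that it stays a mode-finding problem when the random-effects vector is high-dimensional, where quadrature and Monte Carlo integration become expensive.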

Keywords

Akaike Information Criterion; Feature selection; Generalized additive models; Penalized splines; Supervised learning; Model selection; Rao statistics; Variance components

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Faculty of Economics, University Bielefeld, Bielefeld, Germany
  2. School of Mathematics and Statistics, University of New South Wales, Sydney, Australia
  3. School of Mathematics and Applied Statistics, University of Wollongong, Wollongong, Australia
