Statistical Methods & Applications, Volume 20, Issue 1, pp 65–82

Robust variable selection with application to quality of life research

  • Andreas Alfons
  • Wolfgang E. Baaske
  • Peter Filzmoser
  • Wolfgang Mader
  • Roland Wieser

Abstract

A large database containing socioeconomic data from 60 communities in Austria and Germany has been built, stemming from 18,000 citizens’ responses to a survey, together with data from official statistical institutes about these communities. This paper describes a procedure for extracting a small set of explanatory variables to explain response variables such as the cognition of quality of life. For better interpretability, the set of explanatory variables needs to be very small and the dependencies among the selected variables need to be low. Due to possible inhomogeneities within the data set, it is further required that the solution is robust to outliers and deviating points. In order to achieve these goals, a robust model selection method, combined with a strategy to reduce the number of selected predictor variables to a necessary minimum, is developed. In addition, this context-sensitive method is applied to obtain responsible factors describing quality of life in communities.

Keywords

Robustness · Model selection · Success factors · Quality of life

Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Andreas Alfons (1)
  • Wolfgang E. Baaske (2)
  • Peter Filzmoser (1)
  • Wolfgang Mader (3)
  • Roland Wieser (2)

  1. Department of Statistics and Probability Theory, Vienna University of Technology, Vienna, Austria
  2. STUDIA-Schlierbach, Studienzentrum für internationale Analysen, Schlierbach, Austria
  3. SPES Academy, Schlierbach, Austria
