Robust variable selection with application to quality of life research

Statistical Methods & Applications

Abstract

A large database of socioeconomic data from 60 communities in Austria and Germany has been built from 18,000 citizens’ responses to a survey, together with data on these communities from official statistical institutes. This paper describes a procedure for extracting a small set of explanatory variables that explain response variables such as the perceived quality of life. For better interpretability, the set of explanatory variables needs to be very small, and the dependencies among the selected variables need to be low. Because the data set may be inhomogeneous, the solution must also be robust to outliers and deviating points. To achieve these goals, a robust model selection method is developed and combined with a strategy that reduces the number of selected predictor variables to the necessary minimum. This context-sensitive method is then applied to identify the factors that determine quality of life in communities.
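The abstract names three requirements: few selected variables, low dependence among them, and robustness to outliers. The following Python sketch illustrates how these three constraints can interact in a simple greedy screening. It is not the procedure developed in the paper, whose details are not given here. As a stand-in, it uses Spearman rank correlation as an outlier-resistant measure of association. The function names (`ranks`, `spearman`, `select_variables`) and the parameters `max_vars` and `max_dependence` are hypothetical and chosen for illustration only.

```python
import math
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks.

    Ranks limit the influence of extreme values, which is why this
    illustration uses it (an assumption, not the paper's choice).
    """
    return pearson(ranks(x), ranks(y))


def select_variables(
    predictors: Dict[str, Sequence[float]],
    response: Sequence[float],
    max_vars: int = 3,
    max_dependence: float = 0.5,
) -> List[str]:
    """Greedy screening (illustrative only).

    Candidates are visited in order of decreasing |Spearman| with the
    response. A candidate is kept only if its |Spearman| with every
    already selected variable is at most `max_dependence`. Selection
    stops once `max_vars` variables have been chosen.
    """
    relevance = {
        name: abs(spearman(column, response))
        for name, column in predictors.items()
    }
    selected: List[str] = []
    for name in sorted(relevance, key=relevance.get, reverse=True):
        if len(selected) >= max_vars:
            break
        if all(
            abs(spearman(predictors[name], predictors[chosen])) <= max_dependence
            for chosen in selected
        ):
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Toy data: the response contains one gross outlier (100).
    y = [1, 2, 3, 4, 5, 6, 7, 100]
    data = {
        "x1": [1, 2, 3, 4, 5, 6, 7, 8],
        "x2": [1, 2, 3, 4, 5, 6, 8, 7],  # nearly redundant with x1
        "x3": [4, 5, 1, 8, 2, 7, 3, 6],  # weakly related to x1
    }
    print(select_variables(data, y, max_vars=2, max_dependence=0.5))
```

In this toy example the outlier in the response does not change the ranking of `x1`, and the nearly redundant `x2` is skipped because of the dependence threshold. A regression-based approach would instead compare fitted models with a robust criterion; the sketch only shows how a size limit and a dependence limit can be enforced together.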

Author information

Corresponding author

Correspondence to Andreas Alfons.

Additional information

The research was supported by a grant of the Austrian Research Promotion Agency (FFG), Project Ref. No. 813000/10345.

About this article

Cite this article

Alfons, A., Baaske, W.E., Filzmoser, P. et al. Robust variable selection with application to quality of life research. Stat Methods Appl 20, 65–82 (2011). https://doi.org/10.1007/s10260-010-0151-y

  • DOI: https://doi.org/10.1007/s10260-010-0151-y