Abstract
A large database of socioeconomic data from 60 communities in Austria and Germany has been built from the survey responses of 18,000 citizens, together with data about these communities from official statistical institutes. This paper describes a procedure for extracting a small set of explanatory variables that explain response variables such as the perceived quality of life. For better interpretability, the set of explanatory variables needs to be very small, and the dependencies among the selected variables need to be low. Because the data set may be inhomogeneous, the solution is further required to be robust to outliers and deviating points. To achieve these goals, a robust model selection method is developed and combined with a strategy that reduces the number of selected predictor variables to the necessary minimum. This context-sensitive method is then applied to identify the factors that describe quality of life in communities.
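The three requirements stated above (few predictors, low dependence among them, insensitivity to outliers) can be illustrated with a small sketch. It is not the method developed in the paper: it swaps in a simple greedy filter based on Spearman rank correlation, and the function names, the thresholds `max_vars` and `max_dependence`, and the simulated data are all assumptions made for illustration.

```python
import random

def ranks(values):
    """Return 1-based ranks, assigning the average rank to ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(a, b):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

def select_variables(columns, response, max_vars=2, max_dependence=0.7):
    """Hypothetical greedy filter: rank candidates by |Spearman| with the
    response, then keep a candidate only if its |Spearman| with every
    already selected variable does not exceed max_dependence."""
    relevance = {name: abs(spearman(col, response)) for name, col in columns.items()}
    selected = []
    for name in sorted(relevance, key=relevance.get, reverse=True):
        if len(selected) == max_vars:
            break
        if all(abs(spearman(columns[name], columns[s])) <= max_dependence
               for s in selected):
            selected.append(name)
    return selected

# Simulated data (an assumption, not the survey data): x2 nearly duplicates x1,
# x3 is an independent predictor, x4 is pure noise.
random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [v + 0.1 * random.gauss(0, 1) for v in x1]
x3 = [random.gauss(0, 1) for _ in range(n)]
x4 = [random.gauss(0, 1) for _ in range(n)]
y = [2 * a + b + 0.5 * random.gauss(0, 1) for a, b in zip(x1, x3)]
for i in range(10):  # contaminate 5% of the responses with gross outliers
    y[i] = 50.0

selected = select_variables({"x1": x1, "x2": x2, "x3": x3, "x4": x4}, y)
print(selected)
```

Rank correlations are used here because a few gross outliers barely change them, and the dependence threshold keeps the near-duplicate pair `x1`/`x2` from both entering the selection. A regression-based robust criterion, as described in the abstract, would replace the marginal relevance score in a fuller treatment.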
Additional information
The research was supported by a grant from the Austrian Research Promotion Agency (FFG), Project Ref. No. 813000/10345.
About this article
Cite this article
Alfons, A., Baaske, W.E., Filzmoser, P. et al. Robust variable selection with application to quality of life research. Stat Methods Appl 20, 65–82 (2011). https://doi.org/10.1007/s10260-010-0151-y
DOI: https://doi.org/10.1007/s10260-010-0151-y