Abstract
When a linear model is chosen by searching for the best subset among a set of candidate predictors, a fixed penalty such as that imposed by the Akaike information criterion may penalize model complexity inadequately, leading to biased model selection. We study resampling-based information criteria that aim to overcome this problem through improved estimation of the effective model dimension. The first approach builds upon previous work on bootstrap-based model selection. We then propose a second, more novel approach based on cross-validation. Simulations and analyses of a functional neuroimaging data set illustrate the strong performance of our resampling-based methods, which are implemented in a new R package.
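The selection bias described above — a fixed AIC penalty under-penalizing complexity once the "best" subset has been searched for — is easy to reproduce. The sketch below (in Python rather than the paper's R; the function name `best_subset_aic` is ours, not from the authors' package) runs an exhaustive best-subset search scored by AIC on data in which only one of eight candidate predictors carries signal. Because the minimum of AIC over many candidate subsets is taken, the winning model tends to absorb spurious predictors along with the real one, which is exactly the effect a resampling-based estimate of effective dimension is meant to correct.

```python
import itertools
import numpy as np

def best_subset_aic(X, y):
    """Exhaustive best-subset linear regression scored by Gaussian AIC.

    Returns (best_aic, best_subset), where best_subset is a tuple of
    column indices of X. An intercept is always included.
    """
    n, p = X.shape
    best_aic, best_subset = np.inf, ()
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            # Design matrix: intercept plus the chosen columns.
            Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            # Gaussian AIC: n*log(RSS/n) + 2 * (#slopes + intercept + variance).
            aic = n * np.log(rss / n) + 2 * (k + 2)
            if aic < best_aic:
                best_aic, best_subset = aic, subset
    return best_aic, best_subset

rng = np.random.default_rng(0)
n, p = 50, 8
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + rng.standard_normal(n)  # only predictor 0 is real

aic, subset = best_subset_aic(X, y)
print(subset)  # the true predictor 0, possibly joined by noise columns
```

With a strong true signal the search reliably recovers predictor 0, but repeating the experiment over many seeds shows noise predictors entering the selected subset far more often than the nominal AIC penalty would suggest — the fixed penalty does not account for the search itself.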
Cite this article
Reiss, P.T., Huang, L., Cavanaugh, J.E. et al. Resampling-based information criteria for best-subset regression. Ann Inst Stat Math 64, 1161–1186 (2012). https://doi.org/10.1007/s10463-012-0353-1