Simple measures of uncertainty for model selection

  • Original Paper
  • Journal: TEST

Abstract

We develop two simple measures of uncertainty for a model selection procedure. The first measure is similar in spirit to a confidence set in parameter estimation; the second focuses on the error in model selection. The proposed methods are simpler, both conceptually and computationally, than existing measures of uncertainty in model selection. We recognize major differences between model selection and traditional estimation or prediction problems, and propose reasonable frameworks under which these measures are developed and their theoretical properties are established. Empirical studies demonstrate the performance of the proposed measures, their superiority over existing methods, and their relevance to real-life applications.

Acknowledgements

Xiaohui Liu’s research is supported by the NNSF of China (Grant Nos. 11601197 and 11461029), a China Postdoctoral Science Foundation funded project (2016M600511, 2017T100475), and the NSF of Jiangxi Province (Nos. 2017ACB21030, 2018ACB21002). The research of Jiming Jiang is partially supported by NSF Grants DMS-1510219 and DMS-1713120. The authors are grateful for comments from an Associate Editor and two referees that have led to substantial improvement of the manuscript.

Author information

Corresponding author

Correspondence to Jiming Jiang.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 478 KB)

About this article

Cite this article

Liu, X., Li, Y. & Jiang, J. Simple measures of uncertainty for model selection. TEST 30, 673–692 (2021). https://doi.org/10.1007/s11749-020-00737-9

  • DOI: https://doi.org/10.1007/s11749-020-00737-9
