Abstract
We develop two simple measures of uncertainty for a model selection procedure. The first measure is similar in spirit to a confidence set in parameter estimation; the second focuses on the error in model selection. The proposed methods are simpler, both conceptually and computationally, than existing measures of uncertainty in model selection. We recognize major differences between model selection and traditional estimation or prediction problems, and propose reasonable frameworks under which these measures are developed and their theoretical properties are established. Empirical studies demonstrate the performance of the proposed measures, their superiority over existing methods, and their relevance to real-life applications.
Acknowledgements
Xiaohui Liu’s research is supported by NNSF of China (Grant Nos. 11601197 and 11461029), China Postdoctoral Science Foundation funded project (2016M600511, 2017T100475), and NSF of Jiangxi Province (Nos. 2017ACB21030, 2018ACB21002). The research of Jiming Jiang is partially supported by the NSF Grants DMS-1510219 and DMS-1713120. The authors are grateful for comments from an Associate Editor and two referees that have led to substantial improvement of the manuscript.
About this article
Cite this article
Liu, X., Li, Y. & Jiang, J. Simple measures of uncertainty for model selection. TEST 30, 673–692 (2021). https://doi.org/10.1007/s11749-020-00737-9
DOI: https://doi.org/10.1007/s11749-020-00737-9
Keywords
- Average probability of coverage
- Bootstrapping
- Consistency
- LogP measure
- Model confidence set
- Model selection
- Uncertainty