Model selection criteria based on cross-validatory concordance statistics
In the logistic regression framework, we present the development and investigation of three model selection criteria based on cross-validatory analogues of the traditional and adjusted c-statistics. These criteria are designed to estimate three corresponding measures of predictive error: the model misspecification prediction error, the fitting sample prediction error, and the sum of prediction errors. We aim to show that these estimators serve as suitable model selection criteria, facilitating the identification of a model that appropriately balances goodness-of-fit and parsimony, while achieving generalizability. We examine the properties of the selection criteria via an extensive simulation study designed as a factorial experiment. We then employ these measures in a practical application based on modeling the occurrence of heart disease.
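To make the idea of a cross-validatory c-statistic concrete, the following is a minimal sketch in plain Python. It is not the three criteria developed in the paper, whose definitions are not given in this abstract. It only illustrates the general pattern of fitting a logistic regression on training folds and computing the concordance statistic on pooled held-out predictions. All function names (`c_statistic`, `fit_logistic`, `predict_one`, `cv_c_statistic`), the gradient-ascent fitting routine, the fold scheme, and the simulated data are assumptions made for illustration.

```python
import math
import random


def c_statistic(y, p):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted probability; tied pairs count one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(nonevents))


def predict_one(beta, x):
    """Fitted probability for one observation; beta[0] is the intercept."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    eta = max(-30.0, min(30.0, eta))  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(X, y, steps=300, lr=0.5):
    """Illustrative fit by gradient ascent on the mean log-likelihood
    (an assumption of this sketch, not the paper's fitting method)."""
    n, d = len(X), len(X[0])
    beta = [0.0] * (d + 1)
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            resid = yi - predict_one(beta, xi)
            grad[0] += resid
            for j in range(d):
                grad[j + 1] += resid * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def cv_c_statistic(X, y, k=5, seed=0):
    """K-fold cross-validatory c-statistic from pooled held-out predictions."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    held_out = [None] * len(y)
    for fold in folds:
        in_fold = set(fold)
        train = [i for i in idx if i not in in_fold]
        beta = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            held_out[i] = predict_one(beta, X[i])
    return c_statistic(y, held_out)


# Simulated data (hypothetical): x1 drives the outcome, x2 is pure noise.
rng = random.Random(42)
X_demo, y_demo = [], []
for _ in range(150):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    prob = 1.0 / (1.0 + math.exp(-2.0 * x1))
    X_demo.append([x1, x2])
    y_demo.append(1 if rng.random() < prob else 0)

# Compare the apparent (in-sample) and cross-validatory c for two candidates.
for name, cols in [("x1 only", [0]), ("x1 + x2", [0, 1])]:
    Xc = [[row[j] for j in cols] for row in X_demo]
    beta = fit_logistic(Xc, y_demo)
    apparent = c_statistic(y_demo, [predict_one(beta, x) for x in Xc])
    print(name, round(apparent, 3), round(cv_c_statistic(Xc, y_demo), 3))
```

The apparent c-statistic is computed on the same data used for fitting and so cannot penalize an unnecessary predictor, whereas the cross-validatory version is evaluated on observations the fitted model has not seen. That contrast is the motivation for using cross-validatory analogues as selection criteria.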
Keywords: Akaike information criterion, Logistic regression, Prediction, ROC curve, Variable selection
We wish to thank our referees for their valuable feedback, which served to improve the original version of this manuscript.