Abstract
A key challenge in machine learning is to design interpretable models that can reduce their inputs to the best subset for making transparent predictions, especially in the clinical domain. In this work, we propose a certifiably optimal feature selection procedure for logistic regression, formulated from a mixed-integer conic optimization perspective, that can take an auxiliary cost of obtaining each feature into account. Based on an extensive review of the literature, we carefully design a synthetic dataset generator for clinical prognostic model research. This allows us to systematically evaluate different heuristic and optimal cardinality- and budget-constrained feature selection procedures. The analysis reveals key limitations of these methods in the low-data regime and in the presence of label noise. Our paper not only provides empirical recommendations for suitable methods and dataset designs, but also paves the way for future research in the area of meta-learning.
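The abstract does not spell out the optimization model. The block below is an orientation aid that writes one standard version of cost-constrained best subset selection for logistic regression as a mixed-integer exponential-cone program. It is assembled from textbook ingredients: a ridge-regularized logistic loss, big-M constraints that link each coefficient β_j to a binary selection variable z_j, and the exponential-cone representation of log(1 + e^u) described in the MOSEK modeling cookbook listed in the references. It is not necessarily the paper's exact formulation. Labels y_i ∈ {−1, +1}, the big-M bound M, the ridge weight λ, the feature costs c_j, and the budget B are all assumptions of this sketch. Setting c_j = 1 and B = k recovers the cardinality-constrained case.

```latex
% Illustrative formulation (assumed, not taken from the paper)
\begin{aligned}
\min_{\beta_0 \in \mathbb{R},\, \beta \in \mathbb{R}^p,\, z \in \{0,1\}^p,\, t \in \mathbb{R}^n}\quad
  & \sum_{i=1}^{n} t_i + \lambda \lVert \beta \rVert_2^2 \\
\text{s.t.}\quad
  & t_i \ge \log\bigl(1 + e^{u_i}\bigr), \quad u_i = -y_i\,(\beta_0 + x_i^\top \beta), && i = 1,\dots,n, \\
  & -M z_j \le \beta_j \le M z_j, && j = 1,\dots,p, \\
  & \sum_{j=1}^{p} c_j z_j \le B .
\end{aligned}

% Conic form of each softplus constraint, with
% K_exp = cl{ (x_1, x_2, x_3) : x_1 >= x_2 exp(x_3 / x_2), x_2 > 0 }:
t_i \ge \log\bigl(1 + e^{u_i}\bigr)
\iff
\exists\, v_i, w_i:\;
v_i + w_i \le 1,\quad
(v_i,\, 1,\, u_i - t_i) \in K_{\exp},\quad
(w_i,\, 1,\, -t_i) \in K_{\exp}.
```

The equivalence holds because the two cone memberships give v_i ≥ e^{u_i − t_i} and w_i ≥ e^{−t_i}. Their sum being at most 1 is then the same as e^{−t_i}(1 + e^{u_i}) ≤ 1. The ridge term can be moved into a rotated quadratic cone in the usual way, so every nonlinear piece of the model is conic and only z is integer.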
Notes
- 1. We note that selecting some of the “uninformative” features could in principle still reduce noise and improve prognostic performance [23]. However, for interpretability it would still be preferable if only the informative features were recovered.
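The distinction drawn in this note can be made concrete on a toy problem. The Python sketch below is illustrative only and is not the authors' method. It replaces the mixed-integer conic solver with plain exhaustive enumeration over every feature subset whose total acquisition cost fits a budget. This returns the best subset up to the accuracy of the inner gradient-descent fit, but it is only tractable for a handful of features. The synthetic data generator, the costs, the budgets, and the helper names (`make_data`, `penalized_loss`, `best_subset`) are hypothetical.

```python
import itertools
import math
import random


def sigmoid(t):
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log1pexp(t):
    """Numerically stable log(1 + exp(t))."""
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))


def make_data(n, n_informative, n_noise, weight=3.0, seed=0):
    """Hypothetical generator: only the first n_informative columns affect the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_informative + n_noise)]
        p = sigmoid(weight * sum(x[:n_informative]))
        X.append(x)
        y.append(1 if rng.random() < p else -1)  # labels in {-1, +1}
    return X, y


def penalized_loss(X, y, subset, lam=1e-3, steps=200, lr=0.5):
    """Fit ridge-penalized logistic regression on the columns in `subset` by
    gradient descent; return mean log-loss + lam * ||beta||^2 at the final iterate."""
    n = len(X)
    b0, beta = 0.0, [0.0] * len(subset)

    def margins():
        # y_i * (b0 + x_i[subset] . beta) for every sample
        return [yi * (b0 + sum(bk * xi[f] for bk, f in zip(beta, subset)))
                for xi, yi in zip(X, y)]

    for _ in range(steps):
        g0, g = 0.0, [0.0] * len(subset)
        for xi, yi, m in zip(X, y, margins()):
            s = -yi * sigmoid(-m)  # derivative of log(1 + exp(-y * score)) w.r.t. score
            g0 += s
            for k, f in enumerate(subset):
                g[k] += s * xi[f]
        b0 -= lr * g0 / n  # the intercept is not penalized
        beta = [bk - lr * (gk / n + 2.0 * lam * bk) for bk, gk in zip(beta, g)]
    return sum(log1pexp(-m) for m in margins()) / n + lam * sum(bk * bk for bk in beta)


def best_subset(X, y, costs, budget):
    """Enumerate all subsets with total cost <= budget and keep the lowest objective."""
    best = None
    for r in range(len(costs) + 1):
        for subset in itertools.combinations(range(len(costs)), r):
            if sum(costs[j] for j in subset) > budget:
                continue  # violates the budget constraint
            value = penalized_loss(X, y, subset)
            if best is None or value < best[1]:
                best = (subset, value)
    return best


X, y = make_data(n=100, n_informative=2, n_noise=3)  # columns 0 and 1 are informative
costs = [1.0, 1.0, 0.5, 0.5, 0.5]  # hypothetical per-feature acquisition costs
results = {b: best_subset(X, y, costs, b) for b in (2.0, 2.5)}
for budget, (subset, value) in results.items():
    print(f"budget={budget}: subset={subset}, objective={value:.4f}")
```

Adding a feature can never increase the optimal training objective. A looser budget therefore tends to be spent in full, and the search may then pick up an uninformative column next to the informative ones. That is the interpretability concern this note raises, even when the extra feature does no harm to predictive performance.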
References
1. Abbott, J.H., Kingan, E.M.: Accuracy of physical therapists’ prognosis of low back pain from the clinical examination: a prospective cohort study. J. Manual Manip. Therapy 22(3), 154–161 (2014)
2. Aytug, H.: Feature selection for support vector machines using generalized Benders decomposition. Eur. J. Oper. Res. 244(1), 210–218 (2015)
3. Bakker, E.W., Verhagen, A.P., Lucas, C., Koning, H.J., Koes, B.W.: Spinal mechanical load: a predictor of persistent low back pain? A prospective cohort study. Eur. Spine J. 16, 933–941 (2007)
4. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM, Philadelphia (2001)
5. Bertsimas, D., Copenhaver, M.S.: Characterization of the equivalence of robustification and regularization in linear and matrix regression. Eur. J. Oper. Res. 270(3), 931–942 (2018)
6. Bertsimas, D., Dunn, J.: Machine Learning Under a Modern Optimization Lens. Dynamic Ideas, LLC, Charlestown (2019)
7. Bertsimas, D., Dunn, J., Pawlowski, C., Zhuo, Y.D.: Robust classification. INFORMS J. Optim. 1(1), 2–34 (2019)
8. Bertsimas, D., Pauphilet, J., Van Parys, B.: Sparse classification: a scalable discrete optimization perspective. Mach. Learn. 110, 3177–3209 (2021)
9. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
10. Breiman, L.: Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat. Sci. 16(3), 199–231 (2001)
11. Christodoulou, E., Ma, J., Collins, G.S., Steyerberg, E.W., Verbakel, J.Y., Van Calster, B.: A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J. Clin. Epidemiol. 110, 12–22 (2019)
12. Curtin, R.R., Im, S., Moseley, B., Pruhs, K., Samadian, A.: On coresets for regularized loss minimization. arXiv preprint arXiv:1905.10845 (2019)
13. Davidson, R., MacKinnon, J.G.: Bootstrap tests: how many bootstraps? Economet. Rev. 19(1), 55–68 (2000)
14. Dedieu, A., Hazimeh, H., Mazumder, R.: Learning sparse classifiers: continuous and mixed integer optimization perspectives. J. Mach. Learn. Res. 22(1), 6008–6054 (2021)
15. Deza, A., Atamturk, A.: Safe screening for logistic regression with ℓ0–ℓ2 regularization. arXiv preprint arXiv:2202.00467 (2022)
16. DIN, DKE: Deutsche Normungsroadmap Künstliche Intelligenz (Ausgabe 2) (2022). https://www.din.de/go/normungsroadmapki/
17. Dionne, C.E., Le Sage, N., Franche, R.L., Dorval, M., Bombardier, C., Deyo, R.A.: Five questions predicted long-term, severe, back-related functional limitations: evidence from three large prospective studies. J. Clin. Epidemiol. 64(1), 54–66 (2011)
18. Dunning, I., Huchette, J., Lubin, M.: JuMP: a modeling language for mathematical optimization. SIAM Rev. 59(2), 295–320 (2017)
19. European Commission: Proposal for a Regulation Of The European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts (2021). https://artificialintelligenceact.eu/the-act/
20. Evans, D.W., et al.: Estimating risk of chronic pain and disability following musculoskeletal trauma in the United Kingdom. JAMA Netw. Open 5(8), e2228870 (2022)
21. van der Gaag, W.H., et al.: Developing clinical prediction models for nonrecovery in older patients seeking care for back pain: the back complaints in the elders prospective cohort study. Pain 162(6), 1632 (2021)
22. Guyon, I.: Design of experiments of the NIPS 2003 variable selection benchmark. In: NIPS 2003 Workshop on Feature Extraction and Feature Selection, vol. 253, p. 40 (2003)
23. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3(Mar), 1157–1182 (2003)
24. Guyon, I., Gunn, S., Ben-Hur, A., Dror, G.: Result analysis of the NIPS 2003 feature selection challenge. In: Advances in Neural Information Processing Systems, vol. 17 (2004)
25. Guyon, I., Li, J., Mader, T., Pletscher, P.A., Schneider, G., Uhr, M.: Competitive baseline methods set new standards for the NIPS 2003 feature selection benchmark. Pattern Recogn. Lett. 28(12), 1438–1444 (2007)
26. Hancock, M.J., Maher, C.G., Latimer, J., Herbert, R.D., McAuley, J.H.: Can rate of recovery be predicted in patients with acute low back pain? Development of a clinical prediction rule. Eur. J. Pain 13(1), 51–55 (2009)
27. Harrell, F.E.: Regression Modeling Strategies: with Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19425-7
28. Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2009). https://doi.org/10.1007/978-0-387-84858-7
29. Heinze, G., Wallisch, C., Dunkler, D.: Variable selection – a review and recommendations for the practicing statistician. Biom. J. 60(3), 431–449 (2018)
30. Hollmann, N., Müller, S., Eggensperger, K., Hutter, F.: TabPFN: a transformer that solves small tabular classification problems in a second. arXiv preprint arXiv:2207.01848 (2022)
31. Kennedy, C.A., Haines, T., Beaton, D.E.: Eight predictive factors associated with response patterns during physiotherapy for soft tissue shoulder disorders were identified. J. Clin. Epidemiol. 59(5), 485–496 (2006)
32. Kuijpers, T., van der Windt, D.A., Boeke, A.J.P., Twisk, J.W., Vergouwe, Y., Bouter, L.M., van der Heijden, G.J.: Clinical prediction rules for the prognosis of shoulder pain in general practice. Pain 120(3), 276–285 (2006)
33. Kuijpers, T., van der Windt, D.A., van der Heijden, G.J., Twisk, J.W., Vergouwe, Y., Bouter, L.M.: A prediction rule for shoulder pain related sick leave: a prospective cohort study. BMC Musculoskelet. Disord. 7, 1–11 (2006)
34. Labbé, M., Martínez-Merino, L.I., Rodríguez-Chía, A.M.: Mixed integer linear programming for feature selection in support vector machine. Discret. Appl. Math. 261, 276–304 (2019)
35. LeDell, E., Petersen, M., van der Laan, M.: Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates. Electron. J. Stat. 9(1), 1583 (2015)
36. Lee, I.G., Zhang, Q., Yoon, S.W., Won, D.: A mixed integer linear programming support vector machine for cost-effective feature selection. Knowl. Based Syst. 203, 106145 (2020)
37. Little, M., McSharry, P., Hunter, E., Spielman, J., Ramig, L.: Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. In: Nature Precedings, p. 1 (2008)
38. Moons, K.G., et al.: Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Ann. Intern. Med. 162(1), W1–W73 (2015)
39. Moons, K.G., et al.: PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann. Intern. Med. 170(1), W1–W33 (2019)
40. MOSEK ApS: MOSEK modeling cookbook (2022)
41. MOSEK ApS: MOSEK optimizer API for Python (2023)
42. Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995)
43. Scheele, J., et al.: Course and prognosis of older back pain patients in general practice: a prospective cohort study. Pain 154(6), 951–957 (2013)
44. Steinberg, E., Jung, K., Fries, J.A., Corbin, C.K., Pfohl, S.R., Shah, N.H.: Language models are an effective representation learning technique for electronic health record data. J. Biomed. Inform. 113, 103637 (2021)
45. Steyerberg, E.W.: Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-16399-0
46. Tamura, R., Takano, Y., Miyashiro, R.: Feature subset selection for kernel SVM classification via mixed-integer optimization. arXiv preprint arXiv:2205.14325 (2022)
47. Tibshirani, R.: Regression shrinkage and selection via the LASSO. J. Roy. Stat. Soc.: Ser. B (Methodol.) 58(1), 267–288 (1996)
48. Tillmann, A.M., Bienstock, D., Lodi, A., Schwartz, A.: Cardinality minimization, constraints, and regularization: a survey. arXiv preprint arXiv:2106.09606 (2021)
49. Wippert, P.M., et al.: Development of a risk stratification and prevention index for stratified care in chronic low back pain. Focus yellow flags (MiSpEx network). Pain Rep. 2(6), e623 (2017)
50. Wornow, M., et al.: The shaky foundations of clinical foundation models: a survey of large language models and foundation models for EMRs. arXiv preprint arXiv:2303.12961 (2023)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Knauer, R., Rodner, E. (2023). Cost-Sensitive Best Subset Selection for Logistic Regression: A Mixed-Integer Conic Optimization Perspective. In: Seipel, D., Steen, A. (eds) KI 2023: Advances in Artificial Intelligence. KI 2023. Lecture Notes in Computer Science, vol 14236. Springer, Cham. https://doi.org/10.1007/978-3-031-42608-7_10
DOI: https://doi.org/10.1007/978-3-031-42608-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42607-0
Online ISBN: 978-3-031-42608-7
eBook Packages: Computer Science, Computer Science (R0)