Cost-Sensitive Best Subset Selection for Logistic Regression: A Mixed-Integer Conic Optimization Perspective

Knauer, Ricardo; Rodner, Erik

doi:10.1007/978-3-031-42608-7_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14236)

Included in the following conference series:

German Conference on Artificial Intelligence (Künstliche Intelligenz)

Abstract

A key challenge in machine learning is to design interpretable models that can reduce their inputs to the best subset for making transparent predictions, especially in the clinical domain. In this work, we propose a certifiably optimal feature selection procedure for logistic regression from a mixed-integer conic optimization perspective that can take an auxiliary cost to obtain features into account. Based on an extensive review of the literature, we carefully create a synthetic dataset generator for clinical prognostic model research. This allows us to systematically evaluate different heuristic and optimal cardinality- and budget-constrained feature selection procedures. The analysis shows key limitations of the methods for the low-data regime and when confronted with label noise. Our paper not only provides empirical recommendations for suitable methods and dataset designs, but also paves the way for future research in the area of meta-learning.
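
To make the budget-constrained selection problem in the abstract concrete, the following is a minimal sketch in pure Python. It is not the authors' mixed-integer conic formulation: it substitutes brute-force enumeration of all feature subsets, which is only feasible for a handful of features, and fits each candidate with plain gradient descent. The function names (`fit_logistic`, `best_subset`), the toy data, the feature costs, and the hyperparameters are all assumptions made for illustration.

```python
import itertools
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, cols, steps=300, lr=0.5, l2=1e-3):
    """Gradient descent on mean log loss + (l2/2)*||w||^2 using only `cols`.

    Returns the final objective value (hypothetical helper, not from the paper).
    """
    n = len(X)
    w, b = [0.0] * len(cols), 0.0
    for _ in range(steps):
        gw, gb = [0.0] * len(cols), 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xi[c] for wj, c in zip(w, cols))) - yi
            gb += err
            for j, c in enumerate(cols):
                gw[j] += err * xi[c]
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    loss = 0.0
    for xi, yi in zip(X, y):
        z = b + sum(wj * xi[c] for wj, c in zip(w, cols))
        m = z if yi == 1 else -z  # signed margin
        loss += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))  # log(1 + e^-m)
    return loss / n + 0.5 * l2 * sum(wj * wj for wj in w)


def best_subset(X, y, costs, budget):
    """Enumerate every subset whose total feature cost fits the budget
    and return the one with the lowest training objective."""
    best_cols, best_loss = (), float("inf")
    p = len(costs)
    for k in range(p + 1):
        for cols in itertools.combinations(range(p), k):
            if sum(costs[c] for c in cols) > budget:
                continue  # violates the budget constraint
            loss = fit_logistic(X, y, cols)
            if loss < best_loss:
                best_cols, best_loss = cols, loss
    return best_cols, best_loss


# Toy data (assumed): feature 0 is informative but costly,
# features 1 and 2 are cheap and carry no information about y.
X = [
    [0.1, 0.0, 0.0], [0.3, 1.0, 0.0], [0.2, 0.0, 1.0], [0.6, 1.0, 1.0],
    [0.5, 0.0, 0.0], [0.8, 1.0, 0.0], [0.9, 0.0, 1.0], [0.7, 1.0, 1.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

cols, loss = best_subset(X, y, costs=[2, 1, 1], budget=2)
print(cols, round(loss, 3))
```

With a budget of 2, the search has to choose between the single costly feature and the two cheap ones. Setting every cost to 1 turns the budget into a plain cardinality constraint. The enumeration grows exponentially with the number of features, which is why exact approaches rely on mixed-integer solvers instead.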

Notes

1. We note that selecting some of the "uninformative" features could in principle still reduce noise and improve prognostic performance [23]. However, for interpretability it would still be preferable if only the informative features were recovered.

References

1. Abbott, J.H., Kingan, E.M.: Accuracy of physical therapists' prognosis of low back pain from the clinical examination: a prospective cohort study. J. Manual Manip. Therapy 22(3), 154–161 (2014)
2. Aytug, H.: Feature selection for support vector machines using generalized Benders decomposition. Eur. J. Oper. Res. 244(1), 210–218 (2015)
3. Bakker, E.W., Verhagen, A.P., Lucas, C., Koning, H.J., Koes, B.W.: Spinal mechanical load: a predictor of persistent low back pain? A prospective cohort study. Eur. Spine J. 16, 933–941 (2007)
4. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM (2001)
5. Bertsimas, D., Copenhaver, M.S.: Characterization of the equivalence of robustification and regularization in linear and matrix regression. Eur. J. Oper. Res. 270(3), 931–942 (2018)
6. Bertsimas, D., Dunn, J.: Machine Learning Under a Modern Optimization Lens. Dynamic Ideas, LLC, Charlestown (2019)
7. Bertsimas, D., Dunn, J., Pawlowski, C., Zhuo, Y.D.: Robust classification. INFORMS J. Optim. 1(1), 2–34 (2019)
8. Bertsimas, D., Pauphilet, J., Van Parys, B.: Sparse classification: a scalable discrete optimization perspective. Mach. Learn. 110, 3177–3209 (2021)
9. Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
10. Breiman, L.: Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat. Sci. 16(3), 199–231 (2001)
11. Christodoulou, E., Ma, J., Collins, G.S., Steyerberg, E.W., Verbakel, J.Y., Van Calster, B.: A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J. Clin. Epidemiol. 110, 12–22 (2019)
12. Curtin, R.R., Im, S., Moseley, B., Pruhs, K., Samadian, A.: On coresets for regularized loss minimization. arXiv preprint arXiv:1905.10845 (2019)
13. Davidson, R., MacKinnon, J.G.: Bootstrap tests: how many bootstraps? Economet. Rev. 19(1), 55–68 (2000)
14. Dedieu, A., Hazimeh, H., Mazumder, R.: Learning sparse classifiers: continuous and mixed integer optimization perspectives. J. Mach. Learn. Res. 22(1), 6008–6054 (2021)
15. Deza, A., Atamtürk, A.: Safe screening for logistic regression with l0–l2 regularization. arXiv preprint arXiv:2202.00467 (2022)
16. DIN, DKE: Deutsche Normungsroadmap Künstliche Intelligenz (Ausgabe 2) (2022). https://www.din.de/go/normungsroadmapki/
17. Dionne, C.E., Le Sage, N., Franche, R.L., Dorval, M., Bombardier, C., Deyo, R.A.: Five questions predicted long-term, severe, back-related functional limitations: evidence from three large prospective studies. J. Clin. Epidemiol. 64(1), 54–66 (2011)
18. Dunning, I., Huchette, J., Lubin, M.: JuMP: a modeling language for mathematical optimization. SIAM Rev. 59(2), 295–320 (2017)
19. European Commission: Proposal for a Regulation of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts (2021). https://artificialintelligenceact.eu/the-act/
20. Evans, D.W., et al.: Estimating risk of chronic pain and disability following musculoskeletal trauma in the United Kingdom. JAMA Netw. Open 5(8), e2228870 (2022)
21. van der Gaag, W.H., et al.: Developing clinical prediction models for nonrecovery in older patients seeking care for back pain: the back complaints in the elders prospective cohort study. Pain 162(6), 1632 (2021)
22. Guyon, I.: Design of experiments of the NIPS 2003 variable selection benchmark. In: NIPS 2003 Workshop on Feature Extraction and Feature Selection, vol. 253, p. 40 (2003)
23. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3(Mar), 1157–1182 (2003)
24. Guyon, I., Gunn, S., Ben-Hur, A., Dror, G.: Result analysis of the NIPS 2003 feature selection challenge. In: Advances in Neural Information Processing Systems, vol. 17 (2004)
25. Guyon, I., Li, J., Mader, T., Pletscher, P.A., Schneider, G., Uhr, M.: Competitive baseline methods set new standards for the NIPS 2003 feature selection benchmark. Pattern Recogn. Lett. 28(12), 1438–1444 (2007)
26. Hancock, M.J., Maher, C.G., Latimer, J., Herbert, R.D., McAuley, J.H.: Can rate of recovery be predicted in patients with acute low back pain? Development of a clinical prediction rule. Eur. J. Pain 13(1), 51–55 (2009)
27. Harrell, F.E.: Regression Modeling Strategies: with Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19425-7
28. Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2009). https://doi.org/10.1007/978-0-387-84858-7
29. Heinze, G., Wallisch, C., Dunkler, D.: Variable selection - a review and recommendations for the practicing statistician. Biom. J. 60(3), 431–449 (2018)
30. Hollmann, N., Müller, S., Eggensperger, K., Hutter, F.: TabPFN: a transformer that solves small tabular classification problems in a second. arXiv preprint arXiv:2207.01848 (2022)
31. Kennedy, C.A., Haines, T., Beaton, D.E.: Eight predictive factors associated with response patterns during physiotherapy for soft tissue shoulder disorders were identified. J. Clin. Epidemiol. 59(5), 485–496 (2006)
32. Kuijpers, T., van der Windt, D.A., Boeke, A.J.P., Twisk, J.W., Vergouwe, Y., Bouter, L.M., van der Heijden, G.J.: Clinical prediction rules for the prognosis of shoulder pain in general practice. Pain 120(3), 276–285 (2006)
33. Kuijpers, T., van der Windt, D.A., van der Heijden, G.J., Twisk, J.W., Vergouwe, Y., Bouter, L.M.: A prediction rule for shoulder pain related sick leave: a prospective cohort study. BMC Musculoskelet. Disord. 7, 1–11 (2006)
34. Labbé, M., Martínez-Merino, L.I., Rodríguez-Chía, A.M.: Mixed integer linear programming for feature selection in support vector machine. Discret. Appl. Math. 261, 276–304 (2019)
35. LeDell, E., Petersen, M., van der Laan, M.: Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates. Elect. J. Statist. 9(1), 1583 (2015)
36. Lee, I.G., Zhang, Q., Yoon, S.W., Won, D.: A mixed integer linear programming support vector machine for cost-effective feature selection. Knowl. Based Syst. 203, 106145 (2020)
37. Little, M., McSharry, P., Hunter, E., Spielman, J., Ramig, L.: Suitability of dysphonia measurements for telemonitoring of Parkinson's disease. In: Nature Precedings, p. 1 (2008)
38. Moons, K.G., et al.: Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Ann. Intern. Med. 162(1), W1–W73 (2015)
39. Moons, K.G., et al.: PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann. Intern. Med. 170(1), W1–W33 (2019)
40. MOSEK ApS: MOSEK modeling cookbook (2022)
41. MOSEK ApS: MOSEK optimizer API for Python (2023)
42. Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995)
43. Scheele, J., et al.: Course and prognosis of older back pain patients in general practice: a prospective cohort study. Pain 154(6), 951–957 (2013)
44. Steinberg, E., Jung, K., Fries, J.A., Corbin, C.K., Pfohl, S.R., Shah, N.H.: Language models are an effective representation learning technique for electronic health record data. J. Biomed. Inform. 113, 103637 (2021)
45. Steyerberg, E.W.: Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-16399-0
46. Tamura, R., Takano, Y., Miyashiro, R.: Feature subset selection for kernel SVM classification via mixed-integer optimization. arXiv preprint arXiv:2205.14325 (2022)
47. Tibshirani, R.: Regression shrinkage and selection via the LASSO. J. Roy. Stat. Soc.: Ser. B (Methodol.) 58(1), 267–288 (1996)
48. Tillmann, A.M., Bienstock, D., Lodi, A., Schwartz, A.: Cardinality minimization, constraints, and regularization: a survey. arXiv preprint arXiv:2106.09606 (2021)
49. Wippert, P.M., et al.: Development of a risk stratification and prevention index for stratified care in chronic low back pain. Focus yellow flags (MiSpEx network). Pain Rep. 2(6), e623 (2017)
50. Wornow, M., et al.: The shaky foundations of clinical foundation models: a survey of large language models and foundation models for EMRs. arXiv preprint arXiv:2303.12961 (2023)

Author information

Authors and Affiliations

KI-Werkstatt, University of Applied Sciences Berlin, Berlin, Germany
Ricardo Knauer & Erik Rodner

Corresponding author

Correspondence to Ricardo Knauer.

Editor information

Editors and Affiliations

Universität Würzburg, Würzburg, Germany
Dietmar Seipel
University of Greifswald, Greifswald, Germany
Alexander Steen

About this paper

Cite this paper

Knauer, R., Rodner, E. (2023). Cost-Sensitive Best Subset Selection for Logistic Regression: A Mixed-Integer Conic Optimization Perspective. In: Seipel, D., Steen, A. (eds) KI 2023: Advances in Artificial Intelligence. KI 2023. Lecture Notes in Computer Science, vol 14236. Springer, Cham. https://doi.org/10.1007/978-3-031-42608-7_10

DOI: https://doi.org/10.1007/978-3-031-42608-7_10
Published: 18 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42607-0
Online ISBN: 978-3-031-42608-7
eBook Packages: Computer Science, Computer Science (R0)