Cost-Sensitive Best Subset Selection for Logistic Regression: A Mixed-Integer Conic Optimization Perspective

  • Conference paper
KI 2023: Advances in Artificial Intelligence (KI 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14236)

Abstract

A key challenge in machine learning is to design interpretable models that can reduce their inputs to the best subset for making transparent predictions, especially in the clinical domain. In this work, we propose a certifiably optimal feature selection procedure for logistic regression from a mixed-integer conic optimization perspective that can take an auxiliary cost to obtain features into account. Based on an extensive review of the literature, we carefully create a synthetic dataset generator for clinical prognostic model research. This allows us to systematically evaluate different heuristic and optimal cardinality- and budget-constrained feature selection procedures. The analysis shows key limitations of the methods for the low-data regime and when confronted with label noise. Our paper not only provides empirical recommendations for suitable methods and dataset designs, but also paves the way for future research in the area of meta-learning.
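
For orientation, a budget-constrained best subset selection problem for logistic regression can be sketched as follows. This is a generic textbook-style formulation under stated assumptions, not necessarily the exact model used in the paper: the labels $y_i \in \{-1, +1\}$, the ridge weight $\lambda \ge 0$, the big-$M$ bound $M > 0$, the per-feature costs $c_j \ge 0$, and the budget $B$ are all illustrative symbols that do not appear in the abstract.

```latex
\begin{aligned}
\min_{\beta \in \mathbb{R}^p,\ \beta_0 \in \mathbb{R},\ z \in \{0,1\}^p} \quad
  & \sum_{i=1}^{n} \log\!\bigl(1 + \exp\bigl(-y_i (x_i^\top \beta + \beta_0)\bigr)\bigr)
    + \lambda \lVert \beta \rVert_2^2 \\
\text{s.t.} \quad
  & -M z_j \le \beta_j \le M z_j, \qquad j = 1, \dots, p, \\
  & \sum_{j=1}^{p} c_j z_j \le B .
\end{aligned}
% Conic representation of each loss term (softplus), with u_i = -y_i (x_i^T beta + beta_0):
%   t_i >= log(1 + exp(u_i))
%   <=>  there exist v_i, w_i with  v_i + w_i <= 1,
%        v_i >= exp(u_i - t_i),  w_i >= exp(-t_i),
% and each of the last two inequalities is an exponential-cone constraint.
```

Here $z_j = 1$ means feature $j$ is selected, and the first constraint forces $\beta_j = 0$ whenever $z_j = 0$. Setting $c_j = 1$ for all $j$ and $B = k$ recovers the cardinality-constrained case with at most $k$ features. Because the loss terms are exponential-cone representable, a mixed-integer conic solver can in principle return a solution together with an optimality certificate.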

Notes

  1. We note that selecting some of the “uninformative” features could in principle still reduce noise and improve prognostic performance [23]. However, it would still be preferable for interpretability if only the informative features were recovered.

References

  1. Abbott, J.H., Kingan, E.M.: Accuracy of physical therapists’ prognosis of low back pain from the clinical examination: a prospective cohort study. J. Manual Manip. Therapy 22(3), 154–161 (2014)

  2. Aytug, H.: Feature selection for support vector machines using generalized benders decomposition. Eur. J. Oper. Res. 244(1), 210–218 (2015)

  3. Bakker, E.W., Verhagen, A.P., Lucas, C., Koning, H.J., Koes, B.W.: Spinal mechanical load: a predictor of persistent low back pain? A prospective cohort study. Eur. Spine J. 16, 933–941 (2007)

  4. Ben-Tal, A., Nemirovski, A.: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM, Philadelphia (2001)

  5. Bertsimas, D., Copenhaver, M.S.: Characterization of the equivalence of robustification and regularization in linear and matrix regression. Eur. J. Oper. Res. 270(3), 931–942 (2018)

  6. Bertsimas, D., Dunn, J.: Machine Learning Under a Modern Optimization Lens. Dynamic Ideas, LLC, Charlestown (2019)

  7. Bertsimas, D., Dunn, J., Pawlowski, C., Zhuo, Y.D.: Robust classification. INFORMS J. Optim. 1(1), 2–34 (2019)

  8. Bertsimas, D., Pauphilet, J., Van Parys, B.: Sparse classification: a scalable discrete optimization perspective. Mach. Learn. 110, 3177–3209 (2021)

  9. Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

  10. Breiman, L.: Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat. Sci. 16(3), 199–231 (2001)

  11. Christodoulou, E., Ma, J., Collins, G.S., Steyerberg, E.W., Verbakel, J.Y., Van Calster, B.: A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J. Clin. Epidemiol. 110, 12–22 (2019)

  12. Curtin, R.R., Im, S., Moseley, B., Pruhs, K., Samadian, A.: On coresets for regularized loss minimization. arXiv preprint arXiv:1905.10845 (2019)

  13. Davidson, R., MacKinnon, J.G.: Bootstrap tests: how many bootstraps? Economet. Rev. 19(1), 55–68 (2000)

  14. Dedieu, A., Hazimeh, H., Mazumder, R.: Learning sparse classifiers: continuous and mixed integer optimization perspectives. J. Mach. Learn. Res. 22(1), 6008–6054 (2021)

  15. Deza, A., Atamturk, A.: Safe screening for logistic regression with l0–l2 regularization. arXiv preprint arXiv:2202.00467 (2022)

  16. DIN, DKE: Deutsche Normungsroadmap Künstliche Intelligenz (Ausgabe 2) (2022). https://www.din.de/go/normungsroadmapki/

  17. Dionne, C.E., Le Sage, N., Franche, R.L., Dorval, M., Bombardier, C., Deyo, R.A.: Five questions predicted long-term, severe, back-related functional limitations: evidence from three large prospective studies. J. Clin. Epidemiol. 64(1), 54–66 (2011)

  18. Dunning, I., Huchette, J., Lubin, M.: JuMP: a modeling language for mathematical optimization. SIAM Rev. 59(2), 295–320 (2017)

  19. European Commission: Proposal for a Regulation Of The European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts (2021). https://artificialintelligenceact.eu/the-act/

  20. Evans, D.W., et al.: Estimating risk of chronic pain and disability following musculoskeletal trauma in the United Kingdom. JAMA Netw. Open 5(8), e2228870–e2228870 (2022)

  21. van der Gaag, W.H., et al.: Developing clinical prediction models for nonrecovery in older patients seeking care for back pain: the back complaints in the elders prospective cohort study. Pain 162(6), 1632 (2021)

  22. Guyon, I.: Design of experiments of the NIPS 2003 variable selection benchmark. In: NIPS 2003 Workshop on Feature Extraction and Feature Selection, vol. 253, p. 40 (2003)

  23. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3(Mar), 1157–1182 (2003)

  24. Guyon, I., Gunn, S., Ben-Hur, A., Dror, G.: Result analysis of the NIPS 2003 feature selection challenge. In: Advances in Neural Information Processing Systems, vol. 17 (2004)

  25. Guyon, I., Li, J., Mader, T., Pletscher, P.A., Schneider, G., Uhr, M.: Competitive baseline methods set new standards for the NIPS 2003 feature selection benchmark. Pattern Recogn. Lett. 28(12), 1438–1444 (2007)

  26. Hancock, M.J., Maher, C.G., Latimer, J., Herbert, R.D., McAuley, J.H.: Can rate of recovery be predicted in patients with acute low back pain? Development of a clinical prediction rule. Eur. J. Pain 13(1), 51–55 (2009)

  27. Harrell, F.E.: Regression Modeling Strategies: with Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19425-7

  28. Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2009). https://doi.org/10.1007/978-0-387-84858-7

  29. Heinze, G., Wallisch, C., Dunkler, D.: Variable selection – a review and recommendations for the practicing statistician. Biom. J. 60(3), 431–449 (2018)

  30. Hollmann, N., Müller, S., Eggensperger, K., Hutter, F.: TabPFN: a transformer that solves small tabular classification problems in a second. arXiv preprint arXiv:2207.01848 (2022)

  31. Kennedy, C.A., Haines, T., Beaton, D.E.: Eight predictive factors associated with response patterns during physiotherapy for soft tissue shoulder disorders were identified. J. Clin. Epidemiol. 59(5), 485–496 (2006)

  32. Kuijpers, T., van der Windt, D.A., Boeke, A.J.P., Twisk, J.W., Vergouwe, Y., Bouter, L.M., van der Heijden, G.J.: Clinical prediction rules for the prognosis of shoulder pain in general practice. Pain 120(3), 276–285 (2006)

  33. Kuijpers, T., van der Windt, D.A., van der Heijden, G.J., Twisk, J.W., Vergouwe, Y., Bouter, L.M.: A prediction rule for shoulder pain related sick leave: a prospective cohort study. BMC Musculoskelet. Disord. 7, 1–11 (2006)

  34. Labbé, M., Martínez-Merino, L.I., Rodríguez-Chía, A.M.: Mixed integer linear programming for feature selection in support vector machine. Discret. Appl. Math. 261, 276–304 (2019)

  35. LeDell, E., Petersen, M., van der Laan, M.: Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates. Elect. J. Statist. 9(1), 1583 (2015)

  36. Lee, I.G., Zhang, Q., Yoon, S.W., Won, D.: A mixed integer linear programming support vector machine for cost-effective feature selection. Knowl. Based Syst. 203, 106145 (2020)

  37. Little, M., McSharry, P., Hunter, E., Spielman, J., Ramig, L.: Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. In: Nature Precedings, p. 1 (2008)

  38. Moons, K.G., et al.: Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): explanation and elaboration. Ann. Intern. Med. 162(1), W1–W73 (2015)

  39. Moons, K.G., et al.: PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann. Intern. Med. 170(1), W1–W33 (2019)

  40. MOSEK ApS: MOSEK modeling cookbook (2022)

  41. MOSEK ApS: MOSEK optimizer API for Python (2023)

  42. Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995)

  43. Scheele, J., et al.: Course and prognosis of older back pain patients in general practice: a prospective cohort study. Pain 154(6), 951–957 (2013)

  44. Steinberg, E., Jung, K., Fries, J.A., Corbin, C.K., Pfohl, S.R., Shah, N.H.: Language models are an effective representation learning technique for electronic health record data. J. Biomed. Inform. 113, 103637 (2021)

  45. Steyerberg, E.W.: Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-16399-0

  46. Tamura, R., Takano, Y., Miyashiro, R.: Feature subset selection for kernel SVM classification via mixed-integer optimization. arXiv preprint arXiv:2205.14325 (2022)

  47. Tibshirani, R.: Regression shrinkage and selection via the LASSO. J. Roy. Stat. Soc.: Ser. B (Methodol.) 58(1), 267–288 (1996)

  48. Tillmann, A.M., Bienstock, D., Lodi, A., Schwartz, A.: Cardinality minimization, constraints, and regularization: a survey. arXiv preprint arXiv:2106.09606 (2021)

  49. Wippert, P.M., et al.: Development of a risk stratification and prevention index for stratified care in chronic low back pain. Focus yellow flags (MiSpEx network). Pain Rep. 2(6), e623 (2017)

  50. Wornow, M., et al.: The shaky foundations of clinical foundation models: a survey of large language models and foundation models for EMRs. arXiv preprint arXiv:2303.12961 (2023)

Author information

Corresponding author

Correspondence to Ricardo Knauer.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Knauer, R., Rodner, E. (2023). Cost-Sensitive Best Subset Selection for Logistic Regression: A Mixed-Integer Conic Optimization Perspective. In: Seipel, D., Steen, A. (eds) KI 2023: Advances in Artificial Intelligence. KI 2023. Lecture Notes in Computer Science, vol 14236. Springer, Cham. https://doi.org/10.1007/978-3-031-42608-7_10

  • DOI: https://doi.org/10.1007/978-3-031-42608-7_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42607-0

  • Online ISBN: 978-3-031-42608-7

  • eBook Packages: Computer Science, Computer Science (R0)
