Machine Learning—Automated Machine Learning (AutoML) for Disease Prediction

  • Chapter
Clinical Applications of Artificial Intelligence in Real-World Data

Abstract

Selecting and tuning algorithms for feature selection, feature engineering, and classification or regression is a major challenge in machine learning, for beginners and experts alike. Automated machine learning (AutoML) addresses this challenge by automating the construction of machine learning pipelines, eliminating the guesswork associated with a manual process. This chapter reviews the challenges of building pipelines and introduces some of the most widely used AutoML methods and open-source software. We focus on TPOT, an AutoML method that represents pipelines as expression trees and uses genetic programming to discover and optimize them. We also explore TPOT extensions and the use of TPOT in handling biomedical big data.
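
To make the idea of a pipeline as an expression tree concrete, the following is a minimal sketch in plain Python. It is not TPOT's implementation or API: the `Node` class, the `OPERATORS` table, and the `random_pipeline` and `mutate` helpers are hypothetical names introduced only for illustration, and the operator names are labels, not live estimators. It assumes a fixed selector, transformer, classifier shape and shows only one genetic programming operation, a point mutation of the root node.

```python
import random
from dataclasses import dataclass, field
from typing import List

# Hypothetical operator labels, grouped by role; illustration only.
OPERATORS = {
    "selector": ["SelectKBest", "VarianceThreshold"],
    "transformer": ["StandardScaler", "PCA"],
    "classifier": ["LogisticRegression", "RandomForest"],
}


@dataclass
class Node:
    """One node of a pipeline expression tree: an operator and its inputs."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def render(self) -> str:
        """Render the tree as a nested call expression."""
        if not self.children:
            return self.name
        args = ", ".join(child.render() for child in self.children)
        return f"{self.name}({args})"


def random_pipeline(rng: random.Random) -> Node:
    """Build a random selector -> transformer -> classifier tree."""
    data = Node("input_data")
    selector = Node(rng.choice(OPERATORS["selector"]), [data])
    transformer = Node(rng.choice(OPERATORS["transformer"]), [selector])
    return Node(rng.choice(OPERATORS["classifier"]), [transformer])


def mutate(tree: Node, rng: random.Random) -> Node:
    """Point mutation: swap the root classifier, reusing the same subtree."""
    alternatives = [n for n in OPERATORS["classifier"] if n != tree.name]
    return Node(rng.choice(alternatives), tree.children)


if __name__ == "__main__":
    rng = random.Random(0)
    parent = random_pipeline(rng)
    child = mutate(parent, rng)
    print("parent:", parent.render())
    print("child: ", child.render())
```

In a genetic programming search, many such trees would be generated, scored (for example by cross-validated accuracy), and varied over successive generations; the scoring and selection steps are omitted here.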

Author information

Corresponding author

Correspondence to Jason H. Moore.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Moore, J.H., Ribeiro, P.H., Matsumoto, N., Saini, A.K. (2023). Machine Learning—Automated Machine Learning (AutoML) for Disease Prediction. In: Asselbergs, F.W., Denaxas, S., Oberski, D.L., Moore, J.H. (eds) Clinical Applications of Artificial Intelligence in Real-World Data. Springer, Cham. https://doi.org/10.1007/978-3-031-36678-9_10

  • DOI: https://doi.org/10.1007/978-3-031-36678-9_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36677-2

  • Online ISBN: 978-3-031-36678-9

  • eBook Packages: Medicine, Medicine (R0)
