Abstract
Choosing and tuning algorithms for feature selection, feature engineering, and classification or regression is a major challenge in machine learning for beginners and experts alike. Automated machine learning (AutoML) addresses this challenge by automating the construction of machine learning pipelines, removing the guesswork of a manual process. This chapter reviews the challenges of building pipelines and introduces some of the most widely used AutoML methods and open-source software. We focus on TPOT, an AutoML method that represents pipelines as expression trees and uses genetic programming to discover and optimize them. We also explore extensions of TPOT and its application to biomedical big data.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Moore, J.H., Ribeiro, P.H., Matsumoto, N., Saini, A.K. (2023). Machine Learning—Automated Machine Learning (AutoML) for Disease Prediction. In: Asselbergs, F.W., Denaxas, S., Oberski, D.L., Moore, J.H. (eds) Clinical Applications of Artificial Intelligence in Real-World Data. Springer, Cham. https://doi.org/10.1007/978-3-031-36678-9_10
DOI: https://doi.org/10.1007/978-3-031-36678-9_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36677-2
Online ISBN: 978-3-031-36678-9
eBook Packages: Medicine, Medicine (R0)