A review of automatic selection methods for machine learning algorithms and hyper-parameter values

Gang Luo

Original Article


Abstract

Machine learning studies automatic algorithms that improve themselves through experience. It is widely used for analyzing and extracting value from large biomedical data sets, or "big biomedical data," advancing biomedical research, and improving healthcare. Before a machine learning model is trained, the user of a machine learning software tool typically must manually select a machine learning algorithm and set one or more model parameters termed hyper-parameters. The algorithm and hyper-parameter values used can greatly impact the resulting model's performance, but their selection requires special expertise as well as many labor-intensive manual iterations. To make machine learning accessible to lay users with limited computing expertise, computer science researchers have proposed various automatic selection methods for algorithms and/or hyper-parameter values for a given supervised machine learning problem. This paper reviews these methods, identifies several of their limitations in the big biomedical data environment, and provides preliminary thoughts on how to address these limitations. These findings establish a foundation for future research on automatically selecting algorithms and hyper-parameter values for analyzing big biomedical data.
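To make the selection task concrete, the following is a minimal sketch of automatic hyper-parameter value selection via random search (Bergstra and Bengio 2012), one of the method families this review covers, using the scikit-learn library. The data set, candidate algorithm, and search space here are illustrative assumptions, not taken from the paper.

```python
# Automatic hyper-parameter value selection by random search:
# sample hyper-parameter settings from a distribution and score
# each setting with cross-validation, keeping the best one.
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Hyper-parameter search space for one candidate algorithm
# (a random forest classifier).
param_distributions = {
    "n_estimators": randint(10, 200),
    "max_depth": randint(2, 12),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=20,      # number of hyper-parameter settings sampled
    cv=3,           # 3-fold cross-validation to score each setting
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)   # best hyper-parameter values found
print(search.best_score_)    # their mean cross-validation accuracy
```

Full automatic selection systems such as Auto-WEKA (Thornton et al. 2013) extend this idea by searching jointly over the choice of algorithm and its hyper-parameter values, typically with Bayesian optimization rather than pure random sampling.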


Keywords: Machine learning · Big biomedical data · Automatic algorithm selection · Automatic hyper-parameter value selection



Acknowledgements

We thank Qing T. Zeng, Michael Conway, Philip J. Brewster, David E. Jones, Angela P. Presson, Yue Zhang, Tom Greene, Alun Thomas, and Selena B. Thomas for helpful discussions.

Compliance with ethical standards

Conflict of interest

The author reports no conflicts of interest.



Copyright information

© Springer-Verlag Wien 2016

Authors and Affiliations

  1. Department of Biomedical Informatics, University of Utah, Salt Lake City, USA
