
Automated machine learning with dynamic ensemble selection

Published in Applied Intelligence

Abstract

Automated machine learning (AutoML) automates the construction of effective machine learning pipelines. However, existing AutoML frameworks use a single pipeline or a weighted ensemble of several pipelines as the final predictive model, which ignores differences among unseen instances and can lead to suboptimal performance. To construct customized models for different unseen instances, we propose a novel AutoML method based on dynamic ensemble selection of machine learning pipelines, in which the most competent combination of base pipelines is selected and aggregated to predict each specific unseen instance. First, an effective base pipeline pool is generated by filtering out underperforming pipelines. Second, when an unseen instance arrives, a new dynamic balanced accuracy criterion selects the most competent ensemble of base pipelines according to the instance's local region. Finally, the outputs of the selected pipelines are integrated to give the final prediction. Comprehensive experiments on 39 publicly available datasets demonstrate the superiority of the proposed method over several state-of-the-art AutoML frameworks.
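The three steps described in the abstract can be illustrated with a short sketch. The code below is not the authors' implementation; it is a minimal scikit-learn illustration under assumed settings: the candidate pipelines, the pool-filtering threshold (0.6), the local-region size (15 neighbours), and the majority-vote aggregation are all illustrative choices, and plain balanced accuracy over the local region stands in for the paper's dynamic balanced accuracy criterion.

```python
# Minimal sketch of dynamic ensemble selection over pipelines (illustrative only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn.metrics import balanced_accuracy_score

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
# Hold out a selection set used both for pool filtering and for local regions.
X_train, X_dsel, y_train, y_dsel = train_test_split(X, y, test_size=0.5, random_state=0)

# Step 1: build a pool of candidate pipelines and keep only those whose
# balanced accuracy on the selection set exceeds an assumed threshold.
candidates = [
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    make_pipeline(DecisionTreeClassifier(max_depth=5, random_state=0)),
]
pool = []
for pipe in candidates:
    pipe.fit(X_train, y_train)
    if balanced_accuracy_score(y_dsel, pipe.predict(X_dsel)) > 0.6:  # assumed cut-off
        pool.append(pipe)

# Step 2: for an unseen instance, find its nearest neighbours in the selection
# set and score each pooled pipeline by balanced accuracy on that local region.
knn = NearestNeighbors(n_neighbors=15).fit(X_dsel)

def predict_one(x):
    _, idx = knn.kneighbors(x.reshape(1, -1))
    X_loc, y_loc = X_dsel[idx[0]], y_dsel[idx[0]]
    scores = [balanced_accuracy_score(y_loc, p.predict(X_loc)) for p in pool]
    best = np.max(scores)
    # Step 3: aggregate the pipelines tied at the best local score by majority vote.
    selected = [p for p, s in zip(pool, scores) if s >= best]
    votes = np.array([p.predict(x.reshape(1, -1))[0] for p in selected])
    return np.bincount(votes).argmax()

print(predict_one(X_dsel[0]))
```

In the paper's setting the base pipelines would come from an AutoML search rather than a fixed list, but the per-instance selection logic follows the same pattern.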





Author information

Corresponding author

Correspondence to Xiaoyan Zhu.

Ethics declarations

Conflicts of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhu, X., Ren, J., Wang, J. et al. Automated machine learning with dynamic ensemble selection. Appl Intell 53, 23596–23612 (2023). https://doi.org/10.1007/s10489-023-04770-7

