Exploring the Reduction of Configuration Spaces of Workflows

  • Conference paper
Discovery Science (DS 2023)

Abstract

Many current AutoML platforms include a very large space of alternatives (the configuration space), which makes it difficult to identify the best alternative for a given dataset. In this paper we explore a method that can reduce a large configuration space to a significantly smaller one and thus help to reduce the search time for the potentially best workflow. We empirically validate the method on a set of workflows that include four ML algorithms (SVM, RF, LogR and LD) with different sets of hyperparameters. Our results show that it is possible to reduce the given space by more than one order of magnitude, from a few thousand to tens of workflows, while the risk that the best workflow is eliminated is nearly zero. The system after reduction is about one order of magnitude faster than the original one, while maintaining the same predictive accuracy and loss.
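
The abstract does not spell out the reduction procedure, so the following is only an illustrative sketch of what reducing a configuration space can look like, not the method of the paper. It assumes that each workflow has already been evaluated on every dataset and applies a simple greedy cover heuristic: workflows are kept until every dataset has at least one kept workflow whose accuracy is within a tolerance of the best accuracy observed on that dataset. The function name `reduce_configuration_space`, the parameter `tol`, the workflow names and the accuracy values are all hypothetical.

```python
from typing import Dict, List, Set


def reduce_configuration_space(
    scores: Dict[str, List[float]],  # hypothetical input: workflow -> accuracy per dataset
    tol: float = 0.02,               # hypothetical tolerance below the per-dataset best
) -> List[str]:
    """Greedy cover heuristic (an assumption, not the paper's algorithm)."""
    names = list(scores)
    n_datasets = len(next(iter(scores.values())))

    # Best accuracy reached by any workflow on each dataset.
    best = [max(scores[w][d] for w in names) for d in range(n_datasets)]

    # Datasets on which each workflow is within `tol` of the best.
    covers: Dict[str, Set[int]] = {
        w: {d for d in range(n_datasets) if scores[w][d] >= best[d] - tol}
        for w in names
    }

    uncovered = set(range(n_datasets))
    kept: List[str] = []
    while uncovered:
        # Keep the workflow that is near-best on the most uncovered datasets.
        # The best workflow of any uncovered dataset covers it, so the loop ends.
        w = max(names, key=lambda name: len(covers[name] & uncovered))
        kept.append(w)
        uncovered -= covers[w]
    return kept


# Toy accuracies for four made-up workflows on three made-up datasets.
toy_scores = {
    "svm_a": [0.90, 0.70, 0.80],
    "rf_a": [0.85, 0.95, 0.80],
    "logr_a": [0.60, 0.60, 0.79],
    "ld_a": [0.89, 0.94, 0.50],
}
reduced = reduce_configuration_space(toy_scores, tol=0.02)
print(reduced)
```

In this toy setting two of the four workflows suffice to stay within the tolerance on every dataset. A larger tolerance would shrink the kept set further, at the cost of a higher risk of discarding the best workflow for some dataset.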



Acknowledgements

This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020. The authors of this paper wish to thank the anonymous referees for their useful comments that helped us to improve the paper.

Author information

Corresponding author

Correspondence to Fernando Freitas.

Appendix

List of the 41 datasets used in the experiments, each given as OpenML-Dataset-Name (OpenML-Dataset-ID). This set is a subset of the 72 datasets of the benchmarking suite OpenML-CC18 (https://docs.openml.org/benchmark/#openml-cc18):

kr-vs-kp (3), letter (6), balance-scale (11), mfeat-factors (12),

mfeat-fourier (14), mfeat-karhunen (16), cmc (23), optdigits (28),

pendigits (32), diabetes (37), splice (46), tic-tac-toe (50),

vehicle (54), electricity (151), satimage (182), vowel (307),

isolet (300), analcatdata_authorship (458), analcatdata_dmft (469),

Bioresponse (4134), wdbc (1510), phoneme (1489), qsar-biodeg (1494),

wall-robot-navigation (1497), semeion (1501), ilpd (1480), madelon (1485),

ozone-level-8hr (1487), cnae-9 (1468), PhishingWebsites (4534),

GesturePhaseSegmentationProcessed (4538), har (1478), texture (40499),

climate-model-simulation-crashes (40994), wilt (40983), car (40975),

segment (40984), mfeat-pixel (40979), Internet-Advertisements (40978),

dna (40670), churn (40701).

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Freitas, F., Brazdil, P., Soares, C. (2023). Exploring the Reduction of Configuration Spaces of Workflows. In: Bifet, A., Lorena, A.C., Ribeiro, R.P., Gama, J., Abreu, P.H. (eds) Discovery Science. DS 2023. Lecture Notes in Computer Science, vol 14276. Springer, Cham. https://doi.org/10.1007/978-3-031-45275-8_3

  • DOI: https://doi.org/10.1007/978-3-031-45275-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-45274-1

  • Online ISBN: 978-3-031-45275-8

  • eBook Packages: Computer Science, Computer Science (R0)
