Exploring the Reduction of Configuration Spaces of Workflows

Freitas, Fernando; Brazdil, Pavel; Soares, Carlos

doi:10.1007/978-3-031-45275-8_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14276)

Included in the following conference series:

International Conference on Discovery Science


Abstract

Many current AutoML platforms include a very large space of alternatives (the configuration space), which makes it difficult to identify the best alternative for a given dataset. In this paper we explore a method that can reduce a large configuration space to a significantly smaller one and thus help to reduce the search time for the potentially best workflow. We empirically validate the method on a set of workflows that include four ML algorithms (SVM, RF, LogR and LD) with different sets of hyperparameters. Our results show that it is possible to reduce the given space by more than one order of magnitude, from a few thousand to tens of workflows, while the risk that the best workflow is eliminated is nearly zero. The system after reduction is about one order of magnitude faster than the original one, while maintaining the same predictive accuracy and loss.
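The abstract does not describe how the reduction itself is carried out. As a purely hypothetical illustration, and not the authors' method, one simple way to shrink a configuration space is a greedy set cover over a table of per-dataset accuracies: keep adding the workflow that is near-best on the most still-uncovered datasets until every dataset has a retained workflow within a tolerance `tol` of its best accuracy. The sketch then reports the two quantities the abstract mentions: the accuracy loss relative to the full space and how often a dataset's best workflow is eliminated. The function names, workflow names, accuracies and `tol` are all invented for illustration; the toy numbers are not results from the paper.

```python
from typing import Dict, List, Set

# Hypothetical performance table: dataset -> {workflow -> accuracy}.
Perf = Dict[str, Dict[str, float]]


def near_best_sets(perf: Perf, tol: float) -> Dict[str, Set[str]]:
    """For each dataset, the workflows whose accuracy is within `tol` of the best one."""
    out: Dict[str, Set[str]] = {}
    for ds, scores in perf.items():
        best = max(scores.values())
        out[ds] = {wf for wf, acc in scores.items() if acc >= best - tol}
    return out


def reduce_space(perf: Perf, tol: float = 0.02) -> List[str]:
    """Greedy set cover (illustrative assumption): repeatedly keep the workflow
    that is near-best on the most still-uncovered datasets."""
    near = near_best_sets(perf, tol)
    uncovered = set(perf)
    kept: List[str] = []
    while uncovered:
        candidates = sorted({wf for ds in uncovered for wf in near[ds]})
        # max() returns the first maximum, so ties go to the alphabetically first name.
        choice = max(candidates, key=lambda wf: sum(wf in near[ds] for ds in uncovered))
        kept.append(choice)
        uncovered = {ds for ds in uncovered if choice not in near[ds]}
    return kept


def mean_accuracy_loss(perf: Perf, kept: List[str]) -> float:
    """Average gap between the best workflow overall and the best kept workflow."""
    gaps = [max(s.values()) - max(s[wf] for wf in kept) for s in perf.values()]
    return sum(gaps) / len(gaps)


def elimination_rate(perf: Perf, kept: List[str]) -> float:
    """Fraction of datasets whose single best workflow was removed by the reduction."""
    lost = [ds for ds, s in perf.items() if max(s, key=s.get) not in kept]
    return len(lost) / len(perf)


if __name__ == "__main__":
    perf = {
        "d1": {"logr_a": 0.80, "rf_a": 0.895, "svm_a": 0.90},
        "d2": {"logr_a": 0.845, "rf_a": 0.85, "svm_a": 0.70},
        "d3": {"logr_a": 0.76, "rf_a": 0.75, "svm_a": 0.60},
        "d4": {"logr_a": 0.70, "rf_a": 0.80, "svm_a": 0.95},
    }
    kept = reduce_space(perf, tol=0.02)
    print(kept)  # ['rf_a', 'svm_a']
    print(round(mean_accuracy_loss(perf, kept), 4), elimination_rate(perf, kept))  # 0.0025 0.25
```

Raising `tol` trades a smaller retained set for a larger possible accuracy loss; with `tol=0.0` and no ties, every dataset's best workflow is kept by construction, so the elimination rate on the datasets used for the reduction is zero.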



Acknowledgements

This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020. The authors of this paper wish to thank the anonymous referees for their useful comments that helped us to improve the paper.

Author information

Authors and Affiliations

Faculdade de Engenharia, Universidade do Porto, Porto, Portugal
Fernando Freitas & Carlos Soares
INESCTEC, Porto, Portugal
Pavel Brazdil
Faculdade de Economia, Universidade do Porto, Porto, Portugal
Pavel Brazdil
Fraunhofer AICOS Portugal, Porto, Portugal
Carlos Soares
Laboratory for Artificial Intelligence and Computer Science (LIACC), Porto, Portugal
Carlos Soares


Corresponding author

Correspondence to Fernando Freitas.

Editor information

Editors and Affiliations

Waikato University, Hamilton, New Zealand
Albert Bifet
Aeronautics Institute of Technology, São José dos Campos, Brazil
Ana Carolina Lorena
University of Porto, Porto, Portugal
Rita P. Ribeiro
University of Porto, Porto, Portugal
João Gama
University of Coimbra, Coimbra, Portugal
Pedro H. Abreu

Appendix

List of the 41 datasets used in the experiments, given as OpenML-Dataset-Name (OpenML-Dataset-ID). This set is a subset of the 72 datasets of the OpenML-CC18 benchmarking suite (https://docs.openml.org/benchmark/#openml-cc18):

kr-vs-kp (3), letter (6), balance-scale (11), mfeat-factors (12),
mfeat-fourier (14), mfeat-karhunen (16), cmc (23), optdigits (28),
pendigits (32), diabetes (37), splice (46), tic-tac-toe (50),
vehicle (54), electricity (151), satimage (182), vowel (307),
isolet (300), analcatdata_authorship (458), analcatdata_dmft (469),
Bioresponse (4134), wdbc (1510), phoneme (1489), qsar-biodeg (1494),
wall-robot-navigation (1497), semeion (1501), ilpd (1480), madelon (1485),
ozone-level-8hr (1487), cnae-9 (1468), PhishingWebsites (4534),
GesturePhaseSegmentationProcessed (4538), har (1478), texture (40499),
climate-model-simulation-crashes (40994), wilt (40983), car (40975),
segment (40984), mfeat-pixel (40979), Internet-Advertisements (40978),
dna (40670), churn (40701).


About this paper

Cite this paper

Freitas, F., Brazdil, P., Soares, C. (2023). Exploring the Reduction of Configuration Spaces of Workflows. In: Bifet, A., Lorena, A.C., Ribeiro, R.P., Gama, J., Abreu, P.H. (eds) Discovery Science. DS 2023. Lecture Notes in Computer Science, vol 14276. Springer, Cham. https://doi.org/10.1007/978-3-031-45275-8_3


DOI: https://doi.org/10.1007/978-3-031-45275-8_3
Published: 08 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45274-1
Online ISBN: 978-3-031-45275-8
eBook Packages: Computer Science, Computer Science (R0)
