Abstract
Many current AutoML platforms include a very large space of alternatives (the configuration space), which makes it difficult to identify the best alternative for a given dataset. In this paper we explore a method that can reduce a large configuration space to a significantly smaller one and thereby shorten the search for the potentially best workflow. We empirically validate the method on a set of workflows that include four ML algorithms (SVM, RF, LogR and LD) with different sets of hyperparameters. Our results show that it is possible to reduce the given space by more than one order of magnitude, from a few thousand to tens of workflows, while the risk that the best workflow is eliminated is nearly zero. After reduction, the system is about one order of magnitude faster than the original one, yet it maintains the same predictive accuracy and loss.
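To make the idea of reducing a configuration space concrete, the following is a minimal toy sketch. It is not the method of the paper, which the abstract does not spell out. It assumes a hypothetical performance matrix with one row per dataset and one column per workflow, filled with random scores, and keeps every workflow that is within a tolerance of the best score on at least one dataset. The function name `reduce_configuration_space` and the parameter `tolerance` are illustrative assumptions.

```python
import numpy as np


def reduce_configuration_space(perf, tolerance=0.0):
    """Return indices of workflows (columns) whose score is within
    `tolerance` of the best score on at least one dataset (row).

    Hypothetical helper for illustration only; not the paper's method.
    """
    best_per_dataset = perf.max(axis=1, keepdims=True)
    competitive = perf >= best_per_dataset - tolerance
    return np.flatnonzero(competitive.any(axis=0))


# Synthetic scores (assumption): 10 datasets, 2000 candidate workflows.
rng = np.random.default_rng(0)
n_datasets, n_workflows = 10, 2000
perf = rng.uniform(0.5, 1.0, size=(n_datasets, n_workflows))

kept = reduce_configuration_space(perf, tolerance=0.0)

# Check that the best workflow of every dataset survived the reduction.
best_workflows = perf.argmax(axis=1)
best_retained = bool(np.isin(best_workflows, kept).all())

# Size of the original space relative to the reduced one.
reduction_factor = n_workflows / len(kept)

print(f"kept {len(kept)} of {n_workflows} workflows")
print(f"reduction factor: {reduction_factor:.0f}x")
print(f"best workflow retained on every dataset: {best_retained}")
```

With a tolerance of zero, at most one workflow per dataset is kept on this data, so the best workflow of each dataset in the matrix is retained by construction. A reduction applied to unseen datasets carries no such guarantee, which is why the risk of eliminating the best workflow has to be assessed empirically.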
Acknowledgements
This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia, within project UIDB/50014/2020. The authors of this paper wish to thank the anonymous referees for their useful comments that helped us to improve the paper.
Appendix
List of the 41 datasets used in the experiments, each given as OpenML-Dataset-Name (OpenML-Dataset-ID). This set is a subset of the 72 datasets of the OpenML-CC18 benchmarking suite (https://docs.openml.org/benchmark/#openml-cc18):
kr-vs-kp (3), letter (6), balance-scale (11), mfeat-factors (12),
mfeat-fourier (14), mfeat-karhunen (16), cmc (23), optdigits (28),
pendigits (32), diabetes (37), splice (46), tic-tac-toe (50),
vehicle (54), electricity (151), satimage (182), vowel (307),
isolet (300), analcatdata_authorship (458), analcatdata_dmft (469),
Bioresponse (4134), wdbc (1510), phoneme (1489), qsar-biodeg (1494),
wall-robot-navigation (1497), semeion (1501), ilpd (1480), madelon (1485),
ozone-level-8hr (1487), cnae-9 (1468), PhishingWebsites (4534),
GesturePhaseSegmentationProcessed (4538), har (1478), texture (40499),
climate-model-simulation-crashes (40994), wilt (40983), car (40975),
segment (40984), mfeat-pixel (40979), Internet-Advertisements (40978),
dna (40670), churn (40701).
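For readers who want to reuse this list, the sketch below parses the entries above into a name-to-ID mapping and checks that there are 41 of them. The optional `load_dataset` helper is an assumption about how one might fetch a dataset: it relies on the third-party `openml` Python package and network access, and it is not part of the original experiments.

```python
import re

# The appendix entries, copied verbatim as "name (OpenML ID)" pairs.
APPENDIX_TEXT = """
kr-vs-kp (3), letter (6), balance-scale (11), mfeat-factors (12),
mfeat-fourier (14), mfeat-karhunen (16), cmc (23), optdigits (28),
pendigits (32), diabetes (37), splice (46), tic-tac-toe (50),
vehicle (54), electricity (151), satimage (182), vowel (307),
isolet (300), analcatdata_authorship (458), analcatdata_dmft (469),
Bioresponse (4134), wdbc (1510), phoneme (1489), qsar-biodeg (1494),
wall-robot-navigation (1497), semeion (1501), ilpd (1480), madelon (1485),
ozone-level-8hr (1487), cnae-9 (1468), PhishingWebsites (4534),
GesturePhaseSegmentationProcessed (4538), har (1478), texture (40499),
climate-model-simulation-crashes (40994), wilt (40983), car (40975),
segment (40984), mfeat-pixel (40979), Internet-Advertisements (40978),
dna (40670), churn (40701).
"""


def parse_datasets(text):
    """Extract {dataset_name: openml_id} from 'name (id)' entries."""
    pairs = re.findall(r"([\w-]+)\s*\((\d+)\)", text)
    return {name: int(dataset_id) for name, dataset_id in pairs}


DATASETS = parse_datasets(APPENDIX_TEXT)


def load_dataset(dataset_id):
    """Fetch features and target for one dataset from OpenML.

    Assumption: requires the third-party `openml` package and network
    access; imported lazily so the parsing above works without it.
    """
    import openml

    dataset = openml.datasets.get_dataset(dataset_id)
    X, y, _, _ = dataset.get_data(target=dataset.default_target_attribute)
    return X, y


print(f"{len(DATASETS)} datasets parsed")
```

A call such as `load_dataset(DATASETS["diabetes"])` would then download the corresponding dataset, provided `openml` is installed.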
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Freitas, F., Brazdil, P., Soares, C. (2023). Exploring the Reduction of Configuration Spaces of Workflows. In: Bifet, A., Lorena, A.C., Ribeiro, R.P., Gama, J., Abreu, P.H. (eds) Discovery Science. DS 2023. Lecture Notes in Computer Science, vol 14276. Springer, Cham. https://doi.org/10.1007/978-3-031-45275-8_3
DOI: https://doi.org/10.1007/978-3-031-45275-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45274-1
Online ISBN: 978-3-031-45275-8
eBook Packages: Computer Science, Computer Science (R0)