Abstract
This chapter describes the various types of experiments that can be carried out with the vast amounts of data stored in experiment databases. We focus on three types of experiments conducted with the data stored in OpenML.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
Cite this chapter
Brazdil, P., van Rijn, J.N., Soares, C., Vanschoren, J. (2022). Learning from Metadata in Repositories. In: Metalearning. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-030-67024-5_17
DOI: https://doi.org/10.1007/978-3-030-67024-5_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67023-8
Online ISBN: 978-3-030-67024-5