Abstract
This chapter describes the various types of experiments that can be carried out with the vast amounts of data stored in experiment databases. We focus on three types of experiments conducted with the data stored in OpenML.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
Cite this chapter
Brazdil, P., van Rijn, J.N., Soares, C., Vanschoren, J. (2022). Learning from Metadata in Repositories. In: Metalearning. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-030-67024-5_17
DOI: https://doi.org/10.1007/978-3-030-67024-5_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67023-8
Online ISBN: 978-3-030-67024-5