Machine Learning, Volume 87, Issue 2, pp 127–158

Experiment databases

A new way to share, organize and learn from experiments
  • Joaquin Vanschoren
  • Hendrik Blockeel
  • Bernhard Pfahringer
  • Geoffrey Holmes
Open Access


Thousands of machine learning research papers contain extensive experimental comparisons. However, the details of those experiments are often lost after publication, making it impossible to reuse these experiments in further research, or reproduce them to verify the claims made. In this paper, we present a collaboration framework designed to easily share machine learning experiments with the community, and automatically organize them in public databases. This enables immediate reuse of experiments for subsequent, possibly much broader investigation and offers faster and more thorough analysis based on a large set of varied results. We describe how we designed such an experiment database, currently holding over 650,000 classification experiments, and demonstrate its use by answering a wide range of interesting research questions and by verifying a number of recent studies.


Keywords: Experimental methodology, Machine learning, Databases, Meta-learning



Copyright information

© The Author(s) 2011

Authors and Affiliations

  • Joaquin Vanschoren (1, 2)
  • Hendrik Blockeel (1, 2)
  • Bernhard Pfahringer (3)
  • Geoffrey Holmes (3)

  1. LIACS, Universiteit Leiden, Leiden, The Netherlands
  2. Dept. of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium
  3. Dept. of Computer Science, The University of Waikato, Hamilton, New Zealand
