Organizing the World’s Machine Learning Information

  • Joaquin Vanschoren
  • Hendrik Blockeel
  • Bernhard Pfahringer
  • Geoff Holmes
Part of the Communications in Computer and Information Science book series (CCIS, volume 17)


All around the globe, thousands of learning experiments are executed every day, only to be discarded after interpretation. Yet the information contained in these experiments might have uses beyond their original intent and, if properly stored, could be of great use to future research. In this paper, we hope to stimulate the development of such learning experiment repositories by providing a bird's-eye view of how they can be created and used in practice, bringing together existing approaches and new ideas. We draw parallels with how experiments are curated in other sciences, and then discuss how both the empirical and theoretical details of learning experiments can be expressed, organized and made universally accessible. Finally, we discuss a range of possible services such a resource can offer, either used directly or integrated into data mining tools.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Joaquin Vanschoren (1)
  • Hendrik Blockeel (1)
  • Bernhard Pfahringer (2)
  • Geoff Holmes (2)
  1. Computer Science Dept., K.U. Leuven, Leuven, Belgium
  2. Computer Science Dept., University of Waikato, Hamilton, New Zealand
