Skip to main content

UCI++: Improved Support for Algorithm Selection Using Datasetoids

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI,volume 5476)

Abstract

As companies employ a larger number of models, the problem of algorithm (and parameter) selection is becoming increasingly important. Two approaches to obtain empirical knowledge that is useful for that purpose are empirical studies and metalearning. However, most empirical (meta)knowledge is obtained from a relatively small set of datasets. In this paper, we propose a method to obtain a large number of datasets which is based on a simple transformation of existing datasets, referred to as datasetoids. We test our approach on the problem of using metalearning to predict when to prune decision trees. The results show significant improvement when using datasetoids. Additionally, we identify a number of potential anomalies in the generated datasetoids and propose methods to solve them.

Keywords

  • Algorithm Selection
  • Test Dataset
  • Class Distribution
  • Target Attribute
  • Symbolic Attribute

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

This is a preview of subscription content, access via your institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • DOI: 10.1007/978-3-642-01307-2_46
  • Chapter length: 8 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
eBook
USD   169.00
Price excludes VAT (USA)
  • ISBN: 978-3-642-01307-2
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   219.00
Price excludes VAT (USA)

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Asuncion, A., Newman, D.J.: UCI machine learning repository (2007)

    Google Scholar 

  2. Blockeel, H., Vanschoren, J.: Experiment databases: Towards an improved experimental methodology in machine learning. In: Kok, J.N., Koronacki, J., Lopez de Mantaras, R., Matwin, S., Mladenič, D., Skowron, A. (eds.) PKDD 2007. LNCS, vol. 4702, pp. 6–17. Springer, Heidelberg (2007)

    CrossRef  Google Scholar 

  3. Brazdil, P., Giraud-Carrier, C., Soares, C., Vilalta, R.: Metalearning: Applications to Data Mining. In: Cognitive Technologies. Springer, Heidelberg (2009)

    Google Scholar 

  4. Brazdil, P., Soares, C., Costa, J.: Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results. Machine Learning 50(3), 251–277 (2003)

    CrossRef  MATH  Google Scholar 

  5. Henery, R.J.: Classification. In: Michie, D., Spiegelhalter, D.J., Taylor, C.C. (eds.) Machine Learning, Neural and Statistical Classification, Ellis Horwood, vol. 2, pp. 6–16 (1994)

    Google Scholar 

  6. Hilario, M., Kalousis, A.: Quantifying the resilience of inductive classification algorithms. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS, vol. 1910, pp. 106–115. Springer, Heidelberg (2000)

    CrossRef  Google Scholar 

  7. Macià, N., Orriols-Puig, A., Bernadó-Mansilla, E.: Genetic-based synthetic data sets for the analysis of classifiers behavior. his 0, 507–512 (2008)

    Google Scholar 

  8. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2008) ISBN 3-900051-07-0

    Google Scholar 

  9. Soulié-Fogelman, F.: Data mining in the real world: What do we need and what do we have? In: Ghani, R., Soares, C. (eds.) Proceedings of the Workshop on Data Mining for Business Applications, pp. 44–48 (2006)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Soares, C. (2009). UCI++: Improved Support for Algorithm Selection Using Datasetoids. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science(), vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_46

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-01307-2_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01306-5

  • Online ISBN: 978-3-642-01307-2

  • eBook Packages: Computer ScienceComputer Science (R0)