Machine Learning, Volume 54, Issue 3, pp 275–312

On Data and Algorithms: Understanding Inductive Performance

  • Alexandros Kalousis
  • João Gama
  • Melanie Hilario

Abstract

In this paper we address two symmetrical issues: the discovery of similarities among classification algorithms, and among datasets. Both rest on error measures, which we use to define the error correlation between two algorithms and to determine the relative performance of a list of algorithms. We use error correlation to discover similarities between learners, and both quantities to discover similarities between datasets. The latter sketch maps of the dataset space, in which regions exhibit specific patterns of error correlation or relative performance. To understand the factors that determine these regions, we describe each region in terms of the distributions of simple dataset characteristics within it.

Keywords: classification, meta-learning, error correlation, classifier ranking, clustering datasets, clustering classifiers
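
The two quantities named in the abstract lend themselves to a compact illustration. The sketch below is a minimal, hypothetical formalisation rather than the paper's own code: it assumes error correlation is measured as the Pearson correlation of two classifiers' per-instance error indicators, and relative performance as a ranking of algorithms by estimated error rate; the helper names error_correlation and rank_algorithms are invented for this example, and the paper's exact definitions may differ.

    import numpy as np

    def error_correlation(pred_a, pred_b, y_true):
        """Pearson correlation of the per-instance error indicators of two
        classifiers: high values mean the learners err on largely the same
        instances (one plausible reading of the paper's measure)."""
        err_a = (np.asarray(pred_a) != np.asarray(y_true)).astype(float)
        err_b = (np.asarray(pred_b) != np.asarray(y_true)).astype(float)
        # A classifier that is always right (or always wrong) on this sample
        # gives a constant error vector, for which correlation is undefined.
        if err_a.std() == 0.0 or err_b.std() == 0.0:
            return 0.0
        return float(np.corrcoef(err_a, err_b)[0, 1])

    def rank_algorithms(error_rates):
        """Order algorithm names from lowest to highest estimated error;
        such a per-dataset ranking is one way to express relative
        performance over a list of algorithms."""
        return sorted(error_rates, key=error_rates.get)

    # Toy usage on a five-instance test set.
    y = [0, 1, 1, 0, 1]
    rho = error_correlation([0, 1, 0, 0, 1], [0, 1, 0, 1, 1], y)
    ranking = rank_algorithms({"c50": 0.12, "ripper": 0.15, "nb": 0.18})

Computed pairwise over a pool of learners, such correlations give each dataset a profile that can be clustered to sketch the maps of the dataset space discussed in the abstract.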

Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Alexandros Kalousis (1)
  • João Gama (2)
  • Melanie Hilario (1)

  1. University of Geneva, Computer Science Department, Geneva 4, Switzerland
  2. LIACC, FEP, University of Porto, Porto, Portugal
