Machine Learning

Volume 50, Issue 3, pp 251–277

Ranking Learning Algorithms: Using IBL and Meta-Learning on Accuracy and Time Results

  • Pavel B. Brazdil
  • Carlos Soares
  • Joaquim Pinto da Costa

Abstract

We present a meta-learning method to support the selection of candidate learning algorithms. It uses a k-Nearest Neighbor algorithm to identify the datasets that are most similar to the one at hand. The distance between datasets is assessed using a relatively small set of data characteristics, selected to represent properties that affect algorithm performance. The performance of the candidate algorithms on those similar datasets is used to generate a recommendation to the user in the form of a ranking. Performance is assessed using a multicriteria evaluation measure that takes not only accuracy but also time into account. As working with rankings is not common in Machine Learning, we had to identify and adapt existing statistical techniques to devise an appropriate evaluation methodology. Using that methodology, we show that the meta-learning method presented leads to significantly better rankings than the baseline ranking method. The evaluation methodology is general and can be adapted to other ranking problems. Although we concentrate here on ranking classification algorithms, the meta-learning framework presented can also assist in the selection of combinations of methods or of more complex problem-solving strategies.
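As an illustration of the method described above, the following minimal Python sketch ranks candidate algorithms for a new dataset by averaging a multicriteria score over the k most similar stored datasets in meta-feature space. Every name, every meta-feature vector, and the simple accuracy/time trade-off in `score` are assumptions made for the example, not the paper's actual data characterization or evaluation measure.

```python
import math

# Hypothetical meta-data: for each stored dataset, a vector of data
# characteristics (meta-features) and the measured (accuracy, time in
# seconds) of each candidate algorithm. All values are made up.
meta_features = {
    "dataset_a": [0.20, 10.0, 3.0],
    "dataset_b": [0.80, 200.0, 12.0],
    "dataset_c": [0.30, 15.0, 4.0],
}
performance = {
    "dataset_a": {"c5.0": (0.91, 2.0), "mlp": (0.88, 40.0)},
    "dataset_b": {"c5.0": (0.75, 30.0), "mlp": (0.82, 600.0)},
    "dataset_c": {"c5.0": (0.89, 3.0), "mlp": (0.86, 55.0)},
}

def distance(x, y):
    # Unweighted Euclidean distance between meta-feature vectors;
    # the paper's actual distance and normalisation may differ.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def score(accuracy, seconds, time_weight=0.1):
    # Toy multicriteria score: reward accuracy, penalise log(time).
    # This particular formula is an assumption, not the paper's measure.
    return accuracy - time_weight * math.log(1.0 + seconds)

def recommend_ranking(new_features, k=2):
    # 1) Identify the k stored datasets most similar to the new one.
    neighbours = sorted(
        meta_features,
        key=lambda d: distance(meta_features[d], new_features),
    )[:k]
    # 2) Average each algorithm's multicriteria score over the
    #    neighbours and rank algorithms by that average.
    algorithms = performance[neighbours[0]]
    avg = {
        alg: sum(score(*performance[d][alg]) for d in neighbours) / k
        for alg in algorithms
    }
    return sorted(avg, key=avg.get, reverse=True)

print(recommend_ranking([0.25, 12.0, 3.5]))  # -> ['c5.0', 'mlp']
```

The recommendation is a full ranking rather than a single winner, so the user can fall back on the second- or third-placed algorithm when the first is unsuitable.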
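One established statistic that can be adapted for evaluating such recommendations is Spearman's rank correlation coefficient, which compares a recommended ranking with the ranking actually observed on the new dataset. The sketch below, with illustrative algorithm names, computes it for two tie-free rankings.

```python
def spearman_rho(recommended, ideal):
    # Spearman's rank correlation for two tie-free rankings of the
    # same n items: rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)).
    n = len(recommended)
    ideal_pos = {item: i for i, item in enumerate(ideal)}
    d2 = sum((i - ideal_pos[item]) ** 2
             for i, item in enumerate(recommended))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical recommended vs. observed rankings of three algorithms.
print(spearman_rho(["c5.0", "rip", "mlp"],
                   ["c5.0", "mlp", "rip"]))  # -> 0.5
```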

Keywords: algorithm recommendation, meta-learning, data characterization, ranking


Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Pavel B. Brazdil (1)
  • Carlos Soares (1)
  • Joaquim Pinto da Costa (1)

  1. LIACC/Faculty of Economics, University of Porto, Portugal
