Abstract
The paper introduces meta decision trees (MDTs), a novel method for combining multiple classifiers. Instead of giving a prediction, an MDT leaf specifies which classifier should be used to obtain a prediction. We present an algorithm for learning MDTs based on the C4.5 algorithm for learning ordinary decision trees (ODTs). An extensive experimental evaluation of the new algorithm is performed on twenty-one data sets, combining classifiers generated by five learning algorithms: two algorithms for learning decision trees, a rule learning algorithm, a nearest neighbor algorithm, and a naive Bayes algorithm. In terms of performance, stacking with MDTs combines classifiers better than both voting and stacking with ODTs. In addition, MDTs are much more concise than ODTs and are thus a step towards comprehensible combination of multiple classifiers. MDTs also perform better than several other approaches to stacking.
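The core idea above — leaves that name a base classifier rather than a class — can be illustrated with a minimal sketch. This is not the paper's implementation; all names (`nearest_neighbor`, `naive_bayes`, the `nn_conf` meta-attribute, the hand-built tree) are hypothetical, and the meta-level attributes are simplified to the base classifiers' confidences:

```python
# Toy base classifiers, assumed trained elsewhere; each returns a
# (predicted_class, confidence) pair for an example x.
def nearest_neighbor(x):
    return ("pos", 0.9) if x["f1"] > 0.5 else ("neg", 0.6)

def naive_bayes(x):
    return ("pos", 0.7) if x["f2"] > 0.5 else ("neg", 0.8)

BASE = {"nn": nearest_neighbor, "nb": naive_bayes}

# A meta decision tree: internal nodes test meta-level attributes
# (here, base-classifier confidences); leaves name a base classifier
# instead of predicting a class directly.
MDT = {
    "attr": "nn_conf", "threshold": 0.8,
    "left": {"leaf": "nb"},   # nn not confident -> delegate to naive Bayes
    "right": {"leaf": "nn"},  # nn confident     -> delegate to nearest neighbor
}

def mdt_predict(tree, x):
    """Route x through the MDT, then return the prediction of the
    base classifier named by the leaf that is reached."""
    # Meta-level attributes: properties of the base predictions on x.
    meta = {"nn_conf": BASE["nn"](x)[1], "nb_conf": BASE["nb"](x)[1]}
    node = tree
    while "leaf" not in node:
        side = "right" if meta[node["attr"]] >= node["threshold"] else "left"
        node = node[side]
    cls, _conf = BASE[node["leaf"]](x)
    return cls
```

For example, `mdt_predict(MDT, {"f1": 0.9, "f2": 0.1})` delegates to the nearest-neighbor classifier (its confidence 0.9 passes the 0.8 test) and returns `"pos"`. The contrast with an ODT used for stacking is that the tree's output is a *choice of expert*, which is what makes the combined model small and inspectable.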
References
Ali, K. M. (1996). On explaining degree of error reduction due to combining multiple decision trees. In AAAI-96 Workshop on Integrating Multiple Learned Models for Improving and Scaling Machine Learning Algorithms. Available at http://www.cs.fit.edu/~imlm/imlm96/
Ali, K. M., & Pazzani, M. J. (1996). Error reduction through learning multiple descriptions. Machine Learning, 24, 173–202.
Blake, C. L., & Merz, C. J. (1998). UCI repository of machine learning databases. University of California, Irvine, Department of Information and Computer Sciences. Available at http://www.ics.uci.edu/~mlearn/MLRepository.html
Brazdil, P. B., & Henery, R. J. (1994). Analysis of results. In D. Michie, D. J. Spiegelhalter, & C. C. Taylor (Eds.), Machine learning, neural and statistical classification. Ellis Horwood.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24:2, 123–140.
Chan, P. K., & Stolfo, S. J. (1997). On the accuracy of meta-learning for scalable data mining. Journal of Intelligent Information Systems, 8:1, 5–28.
Clark, P., & Boswell, R. (1991). Rule induction with CN2: Some recent improvements. In Proceedings of the Fifth European Working Session on Learning (pp. 151–163). Berlin: Springer-Verlag.
Dietterich, T. G. (1997). Machine-learning research: Four current directions. AI Magazine, 18:4, 97–136.
Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Proceedings of the Thirteenth International Conference on Machine Learning. San Mateo, CA: Morgan Kaufmann.
Gama, J. (1999). Discriminant trees. In Proceedings of the Sixteenth International Conference on Machine Learning (pp. 134–142). San Mateo, CA: Morgan Kaufmann.
Gama, J. (2000). A linear-Bayes classifier. Technical Report. Artificial Intelligence and Computer Science Laboratory, University of Porto.
Gama, J., Brazdil, P., & Valdes-Perez, R. (2000). Cascade generalization. Machine Learning, 41:3, 315–343.
Koppel, M., & Engelson, S. P. (1996). Integrating multiple classifiers by finding their areas of expertise. In AAAI-96 Workshop on Integrating Multiple Learned Models for Improving and Scaling Machine Learning Algorithms. Available at http://www.cs.fit.edu/~imlm/imlm96/
Merz, C. J. (1999). Using correspondence analysis to combine classifiers. Machine Learning, 36:1/2, 33–58.
Ortega, J. (1996). Exploiting multiple existing models and learning algorithms. In AAAI-96 Workshop on Integrating Multiple Learned Models for Improving and Scaling Machine Learning Algorithms. Available at http://www.cs.fit.edu/~imlm/imlm96/
Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning, 5, 239–266.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.
Todorovski, L., & Džeroski, S. (1999). Experiments in meta-level learning with ILP. In Proceedings of the Third European Conference on Principles of Data Mining and Knowledge Discovery (pp. 98–106). Berlin: Springer-Verlag.
Todorovski, L., & Džeroski, S. (2000). Combining two aspects of meta-learning with heterogeneous meta decision trees. In Proceedings of the Fifth International Workshop on Multistrategy Learning (pp. 221–232).
Wettschereck, D. (1994). A study of distance-based machine learning algorithms. Ph.D. Thesis, Department of Computer Science, Oregon State University, Corvallis.
Witten, I. H., & Frank, E. (1999). Data mining: Practical machine learning tools and techniques with Java implementations. San Mateo, CA: Morgan Kaufmann.
Wolpert, D. (1992). Stacked generalization. Neural Networks, 5:2, 241–260.
Todorovski, L., Džeroski, S. Combining Classifiers with Meta Decision Trees. Machine Learning 50, 223–249 (2003). https://doi.org/10.1023/A:1021709817809