Machine Learning, Volume 54, Issue 3, pp 255–273

Is Combining Classifiers with Stacking Better than Selecting the Best One?

  • Saso Džeroski
  • Bernard Ženko

Abstract

We empirically evaluate several state-of-the-art methods for constructing ensembles of heterogeneous classifiers with stacking and show that they perform (at best) comparably to selecting the best classifier from the ensemble by cross validation. Among state-of-the-art stacking methods, stacking with probability distributions and multi-response linear regression performs best. We propose two extensions of this method, one using an extended set of meta-level features and the other using multi-response model trees to learn at the meta-level. We show that the latter extension performs better than existing stacking approaches and better than selecting the best classifier by cross validation.
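Below is a minimal sketch of the best-performing existing method discussed in the abstract: stacking with class-probability distributions as meta-level features and a multi-response linear regression (MLR) meta-learner. The choice of base learners, the Iris data set, and the use of scikit-learn are illustrative assumptions only, not the authors' experimental setup, and the paper's proposed extension (multi-response model trees at the meta-level) is not implemented here.

```python
# Sketch of stacking with probability distributions + multi-response linear
# regression (MLR). Base learners, data, and library choices are assumptions
# made for illustration; they do not reproduce the paper's experiments.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

base_learners = [GaussianNB(), KNeighborsClassifier(), DecisionTreeClassifier(random_state=0)]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Meta-level features: the class-probability distribution predicted by each
# base learner, computed out-of-fold so the meta-learner never sees
# resubstitution predictions.
meta_tr = np.hstack([
    cross_val_predict(clf, X_tr, y_tr, cv=cv, method="predict_proba")
    for clf in base_learners
])

# MLR meta-learner: one linear regression per class, fit on 0/1 indicator
# targets; at prediction time the class with the largest response wins.
classes = np.unique(y_tr)
mlr = [LinearRegression().fit(meta_tr, (y_tr == c).astype(float)) for c in classes]

# Refit the base learners on all training data and build test meta-features.
meta_te = np.hstack([clf.fit(X_tr, y_tr).predict_proba(X_te) for clf in base_learners])
pred = classes[np.argmax(np.column_stack([m.predict(meta_te) for m in mlr]), axis=1)]
print("stacking (probabilities + MLR) accuracy:", (pred == y_te).mean())
```

The cross-validated meta-features also make it straightforward to implement the baseline the paper compares against, selecting the single best base classifier by cross validation, since the same out-of-fold predictions can be used to estimate each base learner's accuracy.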

Keywords: multi-response model trees, stacking, combining classifiers, ensembles of classifiers, meta-learning

Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Saso Džeroski (1)
  • Bernard Ženko
  1. Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia