Diversity in Random Subspacing Ensembles

  • Alexey Tsymbal
  • Mykola Pechenizkiy
  • Pádraig Cunningham
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3181)

Abstract

Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. It has been shown both theoretically and experimentally that, for an ensemble to be effective, its member classifiers should be diverse in their predictions. A number of measures are known for quantifying diversity in ensembles, but little research has examined their appropriateness. In this paper, we compare eight measures of ensemble diversity with respect to their correlation with the accuracy improvement due to ensembles. We conduct experiments on 21 data sets from the UCI machine learning repository, comparing the correlations for random subspacing ensembles of different sizes and with six different ensemble integration methods. Our experiments show that, on average, the accuracy improvement correlates most strongly with the disagreement, entropy, and ambiguity diversity measures and, surprisingly, least with the Q and double-fault measures. Normally, the correlation decreases linearly as the ensemble size increases. Much higher correlation values are observed with the dynamic integration methods, which are shown to utilize ensemble diversity better than their static analogues.
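
To make the comparison concrete, the following is a minimal sketch (not taken from the paper) of how the pairwise and non-pairwise diversity measures named above are typically computed. It assumes the ensemble's per-classifier correctness on an evaluation set is available as an L×N boolean matrix `correct` (L classifiers, N instances); the disagreement, double-fault, Q, and entropy measures follow their standard definitions (see Kuncheva and Whitaker, reference 11), while the ambiguity measure is omitted. All function names and the matrix layout are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def pairwise_counts(ci, ck):
    """2x2 contingency counts for two boolean correctness vectors."""
    n11 = np.sum(ci & ck)        # both classifiers correct
    n00 = np.sum(~ci & ~ck)      # both classifiers wrong
    n10 = np.sum(ci & ~ck)       # only the first correct
    n01 = np.sum(~ci & ck)       # only the second correct
    return n11, n10, n01, n00

def pairwise_average(correct, pair_fn):
    """Average a pairwise measure over all classifier pairs.

    `correct` is an L x N boolean matrix: correct[i, j] is True iff
    classifier i predicts instance j correctly."""
    L = correct.shape[0]
    vals = [pair_fn(*pairwise_counts(correct[i], correct[k]))
            for i, k in combinations(range(L), 2)]
    return float(np.mean(vals))

def disagreement(correct):
    # Fraction of instances on which exactly one classifier of a pair is correct.
    N = correct.shape[1]
    return pairwise_average(correct, lambda n11, n10, n01, n00: (n10 + n01) / N)

def double_fault(correct):
    # Fraction of instances on which both classifiers of a pair are wrong.
    N = correct.shape[1]
    return pairwise_average(correct, lambda n11, n10, n01, n00: n00 / N)

def q_statistic(correct):
    # Yule's Q per pair: -1 (errors on different instances) to +1 (errors coincide).
    def q(n11, n10, n01, n00):
        denom = n11 * n00 + n01 * n10
        return (n11 * n00 - n01 * n10) / denom if denom else 0.0
    return pairwise_average(correct, q)

def entropy_measure(correct):
    # Non-pairwise entropy measure: 0 when all classifiers agree on every
    # instance, 1 when the correct/incorrect votes split as evenly as possible.
    L, N = correct.shape
    l = correct.sum(axis=0)      # number of correct votes per instance
    return float(np.mean(np.minimum(l, L - l) / (L - np.ceil(L / 2))))
```

Given such a matrix computed for a random subspacing ensemble, each function returns a single diversity value that can then be correlated, across data sets or ensemble runs, with the ensemble's accuracy improvement over its member classifiers.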

References

  1. Bauer, E., Kohavi, R.: An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Machine Learning 36(1-2), 105–139 (1999)
  2. Blake, C.L., Keogh, E., Merz, C.J.: UCI repository of machine learning databases. Dept. of Information and Computer Science, University of California, Irvine, CA (1999), http://www.ics.uci.edu/~mlearn/MLRepository.html
  3. Brodley, C., Lane, T.: Creating and exploiting coverage and diversity. In: Proc. AAAI 1996 Workshop on Integrating Multiple Learned Models, Portland, OR, pp. 8–14 (1996)
  4. Cunningham, P., Carney, J.: Diversity versus quality in classification ensembles based on feature selection. In: Lopez de Mantaras, R., Plaza, E. (eds.) ECML 2000. LNCS (LNAI), vol. 1810, pp. 109–116. Springer, Heidelberg (2000)
  5. Dietterich, T.G.: An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Machine Learning 40(2), 139–157 (2000)
  6. Dietterich, T.G.: Machine learning research: four current directions. AI Magazine 18(4), 97–136 (1997)
  7. Domingos, P., Pazzani, M.: On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning 29(2-3), 103–130 (1997)
  8. Giacinto, G., Roli, F.: Design of effective neural network ensembles for image classification processes. Image and Vision Computing 19(9-10), 699–707 (2001)
  9. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8), 832–844 (1998)
  10. Krogh, A., Vedelsby, J.: Neural network ensembles, cross validation, and active learning. In: Touretzky, D., Leen, T. (eds.) Advances in Neural Information Processing Systems, vol. 7, pp. 231–238. MIT Press, Cambridge (1995)
  11. Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning 51(2), 181–207 (2003)
  12. Opitz, D.: Feature selection for ensembles. In: Proc. 16th National Conf. on Artificial Intelligence, pp. 379–384. AAAI Press, Menlo Park (1999)
  13. Puuronen, S., Terziyan, V., Tsymbal, A.: A dynamic integration algorithm for an ensemble of classifiers. In: Raś, Z.W., Skowron, A. (eds.) ISMIS 1999. LNCS, vol. 1609, pp. 592–600. Springer, Heidelberg (1999)
  14. Schaffer, C.: Selecting a classification method by cross-validation. Machine Learning 13, 135–143 (1993)
  15. Shipp, C.A., Kuncheva, L.I.: Relationship between combination methods and measures of diversity in combining classifiers. Information Fusion 3, 135–148 (2002)
  16. Skalak, D.B.: The sources of increased accuracy for two proposed boosting algorithms. In: AAAI 1996 Workshop on Integrating Multiple Models for Improving and Scaling Machine Learning Algorithms (in conjunction with AAAI 1996), Portland, OR, pp. 120–125 (1996)
  17. Skurichina, M., Duin, R.P.W.: Bagging and the random subspace method for redundant feature spaces. In: Kittler, J., Roli, F. (eds.) MCS 2001. LNCS, vol. 2096, pp. 1–10. Springer, Heidelberg (2001)
  18. Tsymbal, A., Puuronen, S., Patterson, D.: Ensemble feature selection with the simple Bayesian classification. Information Fusion 4(2), 87–100 (2003)
  19. Tsymbal, A., Puuronen, S., Skrypnyk, I.: Ensemble feature selection with dynamic integration of classifiers. In: Int. ICSC Congress on Computational Intelligence Methods and Applications CIMA 2001, Bangor, Wales, UK, pp. 558–564 (2001)

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Alexey Tsymbal¹
  • Mykola Pechenizkiy²
  • Pádraig Cunningham¹

  1. Department of Computer Science, Trinity College Dublin, Ireland
  2. Department of Computer Science and Information Systems, University of Jyväskylä, Finland
