Bagging and Boosting for the Nearest Mean Classifier: Effects of Sample Size on Diversity and Accuracy

  • Marina Skurichina
  • Liudmila I. Kuncheva
  • Robert P. W. Duin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2364)

Abstract

In combining classifiers, it is believed that diverse ensembles perform better than non-diverse ones. In order to test this hypothesis, we study the accuracy and diversity of ensembles obtained by bagging and boosting applied to the nearest mean classifier. In our simulation study we consider two diversity measures: the Q statistic and the disagreement measure. The experiments, carried out on four data sets, show that both the diversity and the accuracy of the ensembles depend on the training sample size. With the exception of very small training sample sizes, both bagging and boosting are more useful when the ensembles consist of diverse classifiers. However, the relationship between diversity and the efficiency of the ensemble is much stronger in boosting than in bagging.
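
A minimal sketch of the quantities discussed above may help fix the definitions: the nearest mean classifier, a bagged ensemble of such classifiers built on bootstrap replicates of the training set, and the two pairwise diversity measures (the Q statistic and the disagreement measure). This is not the authors' code; the majority-vote combining rule, the ensemble size, the seed, and all helper names are illustrative assumptions, while the two measures follow their standard pairwise definitions.

# Minimal sketch (not the paper's code): nearest mean classifier, bagging,
# and the pairwise Q statistic / disagreement measure. NumPy only.
import numpy as np

def nearest_mean_fit(X, y):
    """Store the class labels and the per-class mean vectors."""
    labels = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in labels])
    return labels, means

def nearest_mean_predict(model, X):
    """Assign each sample to the class with the closest mean (Euclidean distance)."""
    labels, means = model
    dist = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return labels[dist.argmin(axis=1)]

def bagging(X, y, n_classifiers=25, seed=0):
    """Train an ensemble of nearest mean classifiers on bootstrap replicates."""
    rng = np.random.default_rng(seed)
    n = len(y)
    ensemble = []
    for _ in range(n_classifiers):
        idx = rng.integers(0, n, size=n)      # bootstrap sample of the training set
        ensemble.append(nearest_mean_fit(X[idx], y[idx]))
    return ensemble

def majority_vote(ensemble, X):
    """Combine member decisions by simple majority vote (assumes integer labels)."""
    votes = np.array([nearest_mean_predict(m, X) for m in ensemble])
    return np.array([np.bincount(col).argmax() for col in votes.T])

def q_and_disagreement(pred_i, pred_k, y_true):
    """Pairwise Q statistic and disagreement measure of two ensemble members."""
    ci, ck = pred_i == y_true, pred_k == y_true      # correctness of each member
    n11 = np.sum(ci & ck); n00 = np.sum(~ci & ~ck)   # both correct / both wrong
    n10 = np.sum(ci & ~ck); n01 = np.sum(~ci & ck)   # exactly one correct
    # Q is undefined when the denominator is zero (e.g. perfectly identical members).
    q = (n11 * n00 - n01 * n10) / (n11 * n00 + n01 * n10)
    disagreement = (n10 + n01) / len(y_true)
    return q, disagreement

An ensemble-level diversity value is then commonly obtained by averaging the pairwise quantities over all pairs of ensemble members.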


Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Marina Skurichina (1)
  • Liudmila I. Kuncheva (2)
  • Robert P. W. Duin (1)
  1. Pattern Recognition Group, Department of Applied Physics, Faculty of Applied Sciences, Delft University of Technology, Delft, The Netherlands
  2. School of Informatics, University of Wales, Bangor, UK
