
Statistics and Computing, Volume 19, Issue 3, pp 317–327

A novel method for constructing ensemble classifiers

  • Chun-Xia Zhang
  • Jiang-She Zhang

Abstract

This paper presents a novel method for generating ensemble classifiers that integrates bootstrap aggregation (bagging) with principal component analysis (PCA). To create each member of the ensemble, PCA is applied to the out-of-bag sample and the computed coefficients of all principal components are stored; the principal components of the corresponding bootstrap sample, calculated with those coefficients, are then appended to the original feature set. A base classifier is trained on the bootstrap sample using some features randomly selected from this augmented feature set, and the final ensemble classifies by majority vote over the trained base classifiers. Empirical experiments and statistical tests show that the proposed method performs better than, or comparably to, several other ensemble methods on benchmark data sets publicly available from the UCI repository. Furthermore, the diversity-accuracy patterns of the resulting ensembles are investigated with kappa-error diagrams.
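To make the procedure concrete, the sketch below implements one plausible reading of the abstract in Python. It is illustrative only, not the authors' code: the number of base classifiers, the fraction of features sampled per member (feature_fraction), and the use of scikit-learn's PCA and DecisionTreeClassifier as the PCA routine and base learner are all assumptions; the paper's own choices may differ.

    # Minimal sketch of the bagging + PCA ensemble described in the abstract.
    # Assumes integer class labels 0..K-1 and numeric features.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.tree import DecisionTreeClassifier

    def fit_pca_bagging_ensemble(X, y, n_estimators=50,
                                 feature_fraction=0.5, random_state=0):
        rng = np.random.default_rng(random_state)
        n, p = X.shape
        members = []
        for _ in range(n_estimators):
            # Draw a bootstrap sample and form its out-of-bag (OOB) complement.
            boot = rng.integers(0, n, size=n)
            oob = np.setdiff1d(np.arange(n), boot)
            if oob.size < 2:  # degenerate OOB sample: skip this round
                continue
            # PCA coefficients are computed on the OOB sample only ...
            pca = PCA().fit(X[oob])
            # ... and applied to the bootstrap sample; the resulting
            # components augment the original features.
            X_boot = np.hstack([X[boot], pca.transform(X[boot])])
            # Randomly select a subset of the augmented feature set
            # (the fraction is a free choice in this sketch).
            k = max(1, int(feature_fraction * X_boot.shape[1]))
            feats = rng.choice(X_boot.shape[1], size=k, replace=False)
            tree = DecisionTreeClassifier(
                random_state=int(rng.integers(1 << 31)))
            tree.fit(X_boot[:, feats], y[boot])
            members.append((pca, feats, tree))
        return members

    def predict_majority(members, X):
        # Each member augments X with its own stored PCA features,
        # restricts to its stored feature subset, and votes.
        votes = np.stack([
            tree.predict(np.hstack([X, pca.transform(X)])[:, feats])
            for pca, feats, tree in members
        ])
        # Per-column majority vote over the member predictions.
        return np.apply_along_axis(
            lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)

    # Example usage (hypothetical data):
    #   members = fit_pca_bagging_ensemble(X_train, y_train)
    #   y_hat = predict_majority(members, X_test)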

Keywords

Ensemble classifier · Bootstrap · Bagging · Random forest · AdaBoost · Principal component analysis · Kappa-error diagram

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. School of Science, Xi’an Jiaotong University, Xi’an, Shaanxi, People’s Republic of China
