Pattern Analysis and Applications, Volume 13, Issue 1, pp 59–77

A variant of Rotation Forest for constructing ensemble classifiers

Theoretical Advances

Abstract

Rotation Forest, an effective technique for generating ensemble classifiers, uses principal component analysis (PCA) to rotate the original feature axes so that different training sets can be formed for learning the base classifiers. This paper presents a variant of Rotation Forest that can be viewed as a combination of Bagging and Rotation Forest: Bagging is used to inject additional randomness into Rotation Forest and thereby increase the diversity among the ensemble members. Experiments conducted on 33 benchmark classification data sets from the UCI repository, with a classification tree adopted as the base learning algorithm, demonstrate that the proposed method generally produces ensemble classifiers with lower error than Bagging, AdaBoost and Rotation Forest. A bias–variance analysis of the error shows that the proposed method improves on the prediction error of a single classifier mainly by reducing the variance term, and reduces it more than the other ensemble procedures considered. Furthermore, results on data sets corrupted with artificial classification noise indicate that the new method is more robust to noise, and kappa-error diagrams are employed to investigate the diversity–accuracy patterns of the ensemble classifiers.

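As an illustration of the idea summarised above, the sketch below shows, in scikit-learn-style Python, how Bagging can be layered on top of a Rotation Forest-style rotation: each base tree is trained on a bootstrap sample of the training data whose feature axes have been rotated by PCA models fitted on random disjoint feature subsets. The function names and defaults are assumptions of this sketch, and the rotation step is simplified (Rotation Forest additionally subsamples classes and instances before fitting each PCA); it is not the authors' implementation.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.tree import DecisionTreeClassifier

    def fit_bagged_rotation_forest(X, y, n_trees=10, n_subsets=3, rng=None):
        """Return a list of (rotation_matrix, fitted_tree) pairs."""
        rng = np.random.default_rng(rng)
        n_samples, n_features = X.shape
        ensemble = []
        for _ in range(n_trees):
            # Bagging step: draw a bootstrap sample of the training instances.
            idx = rng.integers(0, n_samples, size=n_samples)
            Xb, yb = X[idx], y[idx]

            # Rotation Forest-style step (simplified): split the features into
            # disjoint random subsets, fit a PCA on each subset of the bootstrap
            # sample, and assemble the loadings into a block rotation matrix.
            subsets = np.array_split(rng.permutation(n_features), n_subsets)
            R = np.zeros((n_features, n_features))
            for cols in subsets:
                R[np.ix_(cols, cols)] = PCA().fit(Xb[:, cols]).components_.T

            # Base learner: a classification tree trained on the rotated sample.
            tree = DecisionTreeClassifier(random_state=0).fit(Xb @ R, yb)
            ensemble.append((R, tree))
        return ensemble

    def predict_majority(ensemble, X):
        """Majority vote of the base trees (assumes non-negative integer labels)."""
        votes = np.stack([tree.predict(X @ R) for R, tree in ensemble]).astype(int)
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

In use, an ensemble returned by fit_bagged_rotation_forest would be combined with predict_majority on held-out data; the majority vote mirrors the combination scheme of Bagging and Rotation Forest.
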
Keywords

Ensemble classifier · Rotation Forest · Bagging · AdaBoost · Kappa-error diagram

Copyright information

© Springer-Verlag London Limited 2009

Authors and Affiliations

  1. Faculty of Science, Xi’an Jiaotong University, Xi’an, China
