Computational Statistics, Volume 29, Issue 3–4, pp 849–867

Canonical Forest

  • Yu-Chuan Chen
  • Hyejung Ha
  • Hyunjoong Kim
  • Hongshik Ahn (Email author)
Original Paper


We propose a new classification ensemble method named Canonical Forest. The method uses canonical linear discriminant analysis (CLDA) and bootstrapping to obtain the accurate and diverse classifiers that constitute an ensemble. Here CLDA serves as a linear transformation tool rather than a dimension reduction tool. Because CLDA finds a transformed space in which the class distributions are farther apart, classifiers built on this space will be more accurate than those built on the original space. To further promote diversity among the classifiers in an ensemble, CLDA is applied to only a subset of the features for each bootstrap sample. To compare Canonical Forest with other widely used ensemble methods, we tested them on 29 real or artificial data sets. Canonical Forest was significantly more accurate than the other ensemble methods on most data sets. According to an investigation of the bias and variance decomposition, the success of Canonical Forest can be attributed to variance reduction.
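The procedure summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state: the features are split into disjoint random subsets, CLDA is fitted on a separate bootstrap sample for each subset, and the resulting coefficient matrices are assembled into one block-structured transformation (a layout borrowed from Rotation Forest). A nearest-centroid rule stands in for the base classifier, a small ridge term keeps the within-class scatter invertible, and the names `clda_matrix`, `fit_canonical_forest`, and `predict` are hypothetical.

```python
import numpy as np


def clda_matrix(X, y, ridge=1e-6):
    """Square matrix of canonical discriminant coefficients (no dimension reduction)."""
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    W = np.zeros((p, p))  # within-class scatter
    B = np.zeros((p, p))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        W += (Xc - mc).T @ (Xc - mc)
        d = (mc - grand_mean)[:, None]
        B += len(Xc) * (d @ d.T)
    # Assumption: a small ridge keeps W positive definite.
    W += ridge * (1.0 + np.trace(W) / p) * np.eye(p)
    # Solve B a = lambda W a by whitening with the Cholesky factor of W.
    L = np.linalg.cholesky(W)
    L_inv = np.linalg.inv(L)
    _, V = np.linalg.eigh(L_inv @ B @ L_inv.T)  # eigenvalues in ascending order
    return L_inv.T @ V[:, ::-1]  # all p directions, most discriminating first


def fit_canonical_forest(X, y, n_members=25, n_subsets=2, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    classes = np.unique(y)
    members = []
    for _ in range(n_members):
        R = np.zeros((p, p))
        # Assumption: disjoint random feature subsets, one CLDA fit per subset.
        for cols in np.array_split(rng.permutation(p), n_subsets):
            if cols.size == 0:
                continue
            boot = rng.integers(0, n, size=n)  # bootstrap sample of the rows
            R[np.ix_(cols, cols)] = clda_matrix(X[boot][:, cols], y[boot])
        Z = X @ R  # transform the full training set
        # Stand-in base classifier: class centroids in the transformed space.
        centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
        members.append((R, centroids))
    return classes, members


def predict(model, X):
    classes, members = model
    rows = np.arange(X.shape[0])
    votes = np.zeros((X.shape[0], len(classes)), dtype=int)
    for R, centroids in members:
        Z = X @ R
        dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        votes[rows, dist.argmin(axis=1)] += 1  # one vote per ensemble member
    return classes[votes.argmax(axis=1)]
```

Because every member draws its own feature partition and bootstrap samples, the transformations differ from member to member, which is the source of diversity in this sketch. Swapping the nearest-centroid rule for a decision tree would only change the last two lines of the inner loop and the distance step in `predict`.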


Canonical linear discriminant analysis, Classification, Ensemble, Linear discriminant analysis, Rotation Forest



Hyunjoong Kim’s work was partly supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science, and Technology (2012R1A1A2042177). Hongshik Ahn’s work was partially supported by the IT Consilience Creative Project through the Ministry of Knowledge Economy, Republic of Korea.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Yu-Chuan Chen (1)
  • Hyejung Ha (3)
  • Hyunjoong Kim (3)
  • Hongshik Ahn (1, 2) (Email author)
  1. Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, USA
  2. SUNY Korea, Incheon, South Korea
  3. Department of Applied Statistics, Yonsei University, Seoul, South Korea
