Decision tree ensembles based on kernel features

Abstract

A classifier ensemble is a set of classifiers whose individual decisions are combined to classify new examples. Classifiers that can represent complex decision boundaries are accurate, and kernel functions can also represent complex decision boundaries. In this paper, we study the usefulness of kernel features for decision tree ensembles, as they can improve the representational power of the individual classifiers. We first propose decision tree ensembles based on kernel features and find that the performance of these ensembles depends strongly on the kernel parameters: the selected kernel and the dimension of the kernel feature space. To overcome this problem, we present another approach to creating ensembles that combines existing ensemble methods with the kernel machine philosophy. In this approach, kernel features are created and concatenated with the original features, and the classifiers of an ensemble are trained on these extended feature spaces. Experimental results suggest that the approach is quite robust to the selection of parameters. Experiments also show that different ensemble methods (Random Subspace, Bagging, AdaBoost.M1 and Random Forests) can be improved by using this approach.
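As an illustration of the extended feature space described above, the following is a minimal sketch and not the paper's implementation. It assumes that kernel feature j of an example x is the kernel value K(x, z_j) against a landmark z_j drawn at random from the training data, and that K is a Gaussian (RBF) kernel; the abstract does not specify either choice. The names `rbf_kernel`, `extend_with_kernel_features`, `n_kernel_features` and `gamma` are hypothetical.

```python
import math
import random


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def extend_with_kernel_features(X, n_kernel_features, gamma=1.0, seed=0):
    """Concatenate kernel features with the original features of each row.

    Assumption (not stated in the abstract): kernel feature j of x is
    K(x, z_j), where the landmarks z_j are rows sampled at random from X.
    Requires n_kernel_features <= len(X).
    """
    rng = random.Random(seed)
    landmarks = rng.sample(X, n_kernel_features)
    extended = [
        list(x) + [rbf_kernel(x, z, gamma) for z in landmarks]
        for x in X
    ]
    return extended, landmarks


# Toy data: four 2-dimensional examples.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
X_ext, landmarks = extend_with_kernel_features(X, n_kernel_features=2, gamma=0.5)

# Each row now holds 2 original features followed by 2 kernel features.
for row in X_ext:
    print(row)
```

Under these assumptions, each member of an ensemble (for example, each tree in Bagging or Random Subspace) would be trained on `X_ext` rather than `X`, and the same `landmarks` would be reused to extend test examples. Here `n_kernel_features` plays the role of the dimension of the kernel feature space, and the choice of `rbf_kernel` and `gamma` corresponds to the selected kernel.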


Author information

Corresponding author

Correspondence to Amir Ahmad.

About this article

Cite this article

Ahmad, A. Decision tree ensembles based on kernel features. Appl Intell 41, 855–869 (2014). https://doi.org/10.1007/s10489-014-0575-4

Keywords

  • Classifier ensembles
  • Decision trees
  • Kernel features
  • Random subspaces
  • Bagging
  • AdaBoost.M1
  • Random forests