Advertisement

Frontiers of Computer Science

, Volume 12, Issue 3, pp 560–570 | Cite as

Structural diversity for decision tree ensemble learning

Research Article

Abstract

Decision trees are a kind of off-the-shelf predictive models, and they have been successfully used as the base learners in ensemble learning. To construct a strong classifier ensemble, the individual classifiers should be accurate and diverse. However, diversity measure remains a mystery although there were many attempts. We conjecture that a deficiency of previous diversity measures lies in the fact that they consider only behavioral diversity, i.e., how the classifiers behave when making predictions, neglecting the fact that classifiers may be potentially different even when they make the same predictions. Based on this recognition, in this paper, we advocate to consider structural diversity in addition to behavioral diversity, and propose the TMD (tree matching diversity) measure for decision trees. To investigate the usefulness of TMD, we empirically evaluate performances of selective ensemble approaches with decision forests by incorporating different diversity measures. Our results validate that by considering structural and behavioral diversities together, stronger ensembles can be constructed. This may raise a new direction to design better diversity measures and ensemble methods.

Keywords

ensemble learning structural diversity decision tree 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Notes

Acknowledgements

The authors would like to thank anonymous reviewers for their helpful comments and suggestions. This research was supported by the National Natural Science Foundation of China (Grant No. 61333014).

Supplementary material

11704_2018_7151_MOESM1_ESM.ppt (109 kb)
Supplementary material, approximately 109 KB.

References

  1. 1.
    Stiglic G, Kocbek S, Pernek I, Kokol P. Comprehensive decision tree models in bioinformatics. PloS One, 2012, 7(3): e33812CrossRefGoogle Scholar
  2. 2.
    Creamer G, Freund Y. Using boosting for financial analysis and performance prediction: application to s&p 500 companies, latin american adrs and banks. Computational Economics, 2010, 36(2): 133–151CrossRefGoogle Scholar
  3. 3.
    Rokach L. Decision forest: twenty years of research. Information Fusion, 2016, 27: 111–125CrossRefGoogle Scholar
  4. 4.
    Zhou Z H. Ensemble Methods: Foundations and Algorithms. Boca Raton, FL: Chapman & Hall/CRC, 2012Google Scholar
  5. 5.
    Breiman L. Random forests. Machine Learning, 2001, 45(1): 5–32CrossRefMATHGoogle Scholar
  6. 6.
    Geurts P, Ernst D, Wehenkel L. Extremely randomized trees. Machine Learning, 2006, 63(1): 3–42CrossRefMATHGoogle Scholar
  7. 7.
    Rodriguez J J, Kuncheva L I, Alonso C J. Rotation forest: a new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(10): 1619–1630CrossRefGoogle Scholar
  8. 8.
    Brown G, Wyatt J, Harris R, Yao X. Diversity creation methods: a survey and categorisation. Information Fusion, 2005, 6(1): 5–20CrossRefGoogle Scholar
  9. 9.
    Melville P, Mooney R J. Constructing diverse classifier ensembles using artificial training examples. In: Proceedings of the 18th International Joint Conference on Artificial Intelligence. 2003, 505–510Google Scholar
  10. 10.
    Yu Y, Li Y F, Zhou Z H. Diversity regularized machine. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence. 2011, 1603–1608Google Scholar
  11. 11.
    Breiman L. Randomizing outputs to increase prediction accuracy. Machine Learning, 2000, 40(3): 229–242CrossRefMATHGoogle Scholar
  12. 12.
    Kuncheva L I, Whitaker C J. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning, 2003, 51(2): 181–207CrossRefMATHGoogle Scholar
  13. 13.
    Tang E K, Suganthan P N, Yao X. An analysis of diversity measures. Machine Learning, 2006, 65(1): 247–271CrossRefGoogle Scholar
  14. 14.
    Didaci L, Fumera G, Roli F. Diversity in classifier ensembles: fertile concept or dead end? In: Proceedings of the 11th International Workshop on Multiple Classifier Systems. 2013, 37–48CrossRefGoogle Scholar
  15. 15.
    Reyzin L, Schapire R E. How boosting the margin can also boost classifier complexity. In: Proceedings of the 23rd International Conference on Machine Learning. 2006, 753–760Google Scholar
  16. 16.
    Quinlan J R. Simplifying decision trees. International Journal of Human-Computer Studies, 1999, 51(2): 497–510CrossRefGoogle Scholar
  17. 17.
    Freund Y, Mason L. The alternating decision tree learning algorithm. In: Proceedings of the 16th International Conference on Machine Learning. 1999, 124–133Google Scholar
  18. 18.
    Friedman J H. Greedy function approximation: a gradient boosting machine. Annals of Statistics. 2001, 1189–1232Google Scholar
  19. 19.
    Krogh A, Vedelsby J. Neural network ensembles, cross validation, and active learning. In: Proceedings of International Conference on Neural Information Processing Systems. 1995, 231–238Google Scholar
  20. 20.
    Geman S, Bienenstock E, Doursat R. Neural networks and the bias/variance dilemma. Neural Computation, 1992, 4(1): 1–58CrossRefGoogle Scholar
  21. 21.
    Margineantu D D, Dietterich T G. Pruning adaptive boosting. In: Proceedings of the 14th International Conference on Machine Learning. 1997, 211–218Google Scholar
  22. 22.
    Brown G, Kuncheva L I. “Good” and “bad” diversity in majority vote ensembles. In: Proceedings of the 9th International Workshop on Multiple Classifier Systems. 2010, 124–133CrossRefGoogle Scholar
  23. 23.
    Brown G. An information theoretic perspective on multiple classifier systems. In: Proceedings of the 8th International Workshop on Multiple Classifier Systems. 2009, 344–353CrossRefGoogle Scholar
  24. 24.
    Zhou Z H, Li N. Multi-information ensemble diversity. In: Proceedings of the 9th International Workshop on Multiple Classifier Systems. 2010, 134–144CrossRefGoogle Scholar
  25. 25.
    Brooks F P Jr. Three great challenges for half-century-old computer science. Journal of the ACM, 2003, 50(1): 25–26CrossRefGoogle Scholar
  26. 26.
    Zhou Z H, Wu J, TangW. Ensembling neural networks: many could be better than all. Artificial Intelligence, 2002, 137(1): 239–263MathSciNetCrossRefMATHGoogle Scholar
  27. 27.
    Martínez-Munoz G, Hernández-Lobato D, Suárez A. An analysis of ensemble pruning techniques based on ordered aggregation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(2): 245–259CrossRefGoogle Scholar
  28. 28.
    Giacinto G, Roli F, Fumera G. Design of effective multiple classifier systems by clustering of classifiers. In: Proceedings of the 15th International Conference on Pattern Recognition. 2000, 160–163Google Scholar
  29. 29.
    Lazarevic A, Obradovic Z. Effective pruning of neural network classifier ensembles. In: Proceedings of International Joint Conference on Neural Networks. 2001, 796–801Google Scholar
  30. 30.
    Zhang Y, Burer S, Street W N. Ensemble pruning via semi-definite programming. Journal of Machine Learning Research, 2006, 7: 1315–1338MathSciNetMATHGoogle Scholar
  31. 31.
    Li N, Zhou Z H. Selective ensemble under regularization framework. In: Proceedings of the 8th International Workshop on Multiple Classifier Systems. 2009, 293–303CrossRefGoogle Scholar
  32. 32.
    Qian C, Yu Y, Zhou Z H. Pareto ensemble pruning. In: Proceedings of the 29th AAAI Conference on Artificial Intelligence. 2015, 2935–2941Google Scholar
  33. 33.
    Pawlik M, Augsten N. Tree edit distance: Robust and memoryefficient. Information Systems, 2016, 56: 157–173CrossRefGoogle Scholar
  34. 34.
    Wolberg WH, Mangasarian O L. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences, 1990, 87(23): 9193–9196CrossRefMATHGoogle Scholar
  35. 35.
    Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten I H. The weka data mining software: an update. ACM SIGKDD explorations newsletter, 2009, 11(1): 10–18CrossRefGoogle Scholar

Copyright information

© Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. 1.National Key Laboratory for Novel Software TechnologyNanjing UniversityNanjingChina

Personalised recommendations