Structural diversity for decision tree ensemble learning

Sun, Tao; Zhou, Zhi-Hua

doi:10.1007/s11704-018-7151-8


Research Article
Published: 15 February 2018

Volume 12, pages 560–570 (2018)

Frontiers of Computer Science

Tao Sun & Zhi-Hua Zhou


Abstract

Decision trees are off-the-shelf predictive models, and they have been successfully used as base learners in ensemble learning. To construct a strong classifier ensemble, the individual classifiers should be both accurate and diverse. However, how to measure diversity remains an open question despite many attempts. We conjecture that a deficiency of previous diversity measures is that they consider only behavioral diversity, i.e., how the classifiers behave when making predictions, neglecting the fact that classifiers may differ even when they make the same predictions. Based on this observation, we advocate considering structural diversity in addition to behavioral diversity, and propose the TMD (tree matching diversity) measure for decision trees. To investigate the usefulness of TMD, we empirically evaluate the performance of selective ensemble approaches on decision forests that incorporate different diversity measures. Our results validate that stronger ensembles can be constructed by considering structural and behavioral diversity together. This may open a new direction for designing better diversity measures and ensemble methods.
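The abstract's central point, that two trees can make identical predictions yet still differ in structure, can be illustrated with a small sketch. This is not the TMD measure from the paper, whose definition is not reproduced here. The nested-tuple tree encoding, the simplified top-down `structural_distance` (which compares nodes position by position and ignores split thresholds), and the `alpha` weighting in `combined_diversity` are all hypothetical choices made for illustration. The behavioral side uses the standard pairwise disagreement measure.

```python
from itertools import combinations

# Hypothetical tree encoding (an assumption, not from the paper):
#   leaf:          ("leaf", label)
#   internal node: ("split", feature, threshold, left, right)
#                  samples with x[feature] <= threshold go left


def predict(tree, x):
    """Route one sample down to a leaf and return that leaf's label."""
    while tree[0] == "split":
        _, feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree[1]


def size(tree):
    """Number of nodes in the tree."""
    if tree[0] == "leaf":
        return 1
    return 1 + size(tree[3]) + size(tree[4])


def structural_distance(t1, t2):
    """Simplified top-down structural distance (hypothetical, not TMD).

    Nodes are compared position by position: a differing leaf label or
    split feature costs 1, and a subtree with no counterpart costs its size.
    """
    if t1[0] == "leaf" and t2[0] == "leaf":
        return 0 if t1[1] == t2[1] else 1
    if t1[0] == "leaf":
        return 1 + size(t2[3]) + size(t2[4])
    if t2[0] == "leaf":
        return 1 + size(t1[3]) + size(t1[4])
    root_cost = 0 if t1[1] == t2[1] else 1  # thresholds are ignored
    return (root_cost
            + structural_distance(t1[3], t2[3])
            + structural_distance(t1[4], t2[4]))


def disagreement(t1, t2, X):
    """Behavioral diversity: fraction of samples on which the trees disagree."""
    return sum(predict(t1, x) != predict(t2, x) for x in X) / len(X)


def combined_diversity(trees, X, alpha=0.5):
    """Average over tree pairs of a weighted mix of behavioral and
    size-normalized structural diversity (alpha is a hypothetical weight)."""
    pairs = list(combinations(trees, 2))
    total = 0.0
    for a, b in pairs:
        structural = structural_distance(a, b) / (size(a) + size(b))  # in [0, 1]
        total += alpha * disagreement(a, b, X) + (1 - alpha) * structural
    return total / len(pairs)


# Two trees that split on different features but, on data where
# x[0] == x[1], always make the same predictions.
X = [(0.0, 0.0), (1.0, 1.0)]
t_a = ("split", 0, 0.5, ("leaf", 0), ("leaf", 1))
t_b = ("split", 1, 0.5, ("leaf", 0), ("leaf", 1))

print(disagreement(t_a, t_b, X))      # 0.0: behaviorally identical
print(structural_distance(t_a, t_b))  # 1: structurally different
```

In this toy example a purely behavioral measure reports zero diversity, while the structural term does not. How the two should be weighted, and how trees should be matched in general, is exactly what a measure such as TMD has to settle; the positional matching above is only the simplest possible choice.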



Acknowledgements

The authors thank the anonymous reviewers for their helpful comments and suggestions. This research was supported by the National Natural Science Foundation of China (Grant No. 61333014).

Author information

Authors and Affiliations

National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China
Tao Sun & Zhi-Hua Zhou


Corresponding author

Correspondence to Zhi-Hua Zhou.

Additional information

Tao Sun received his BS degree from the School of Automation, Huazhong University of Science and Technology, China, in 2015. He is currently a graduate student in the Department of Computer Science and Technology, Nanjing University, China. His research interests include machine learning and data mining.

Zhi-Hua Zhou is a professor in the Department of Computer Science and Technology, Nanjing University, China. He is the founding director of LAMDA. He is a foreign member of the Academy of Europe, and a fellow of the ACM, AAAI, AAAS, IEEE, IAPR, and CCF. His main research interests are in artificial intelligence, machine learning, and data mining.

Electronic supplementary material

Supplementary material, approximately 109 KB.


About this article

Cite this article

Sun, T., Zhou, ZH. Structural diversity for decision tree ensemble learning. Front. Comput. Sci. 12, 560–570 (2018). https://doi.org/10.1007/s11704-018-7151-8


Received: 28 April 2017
Accepted: 21 December 2017
Published: 15 February 2018
Issue Date: June 2018
DOI: https://doi.org/10.1007/s11704-018-7151-8
