Decision tree induction with a constrained number of leaf nodes

Wu, Chia-Chi; Chen, Yen-Liang; Liu, Yi-Hung; Yang, Xiang-Yu

doi:10.1007/s10489-016-0785-z

Published: 15 April 2016

Applied Intelligence, volume 45, pages 673–685 (2016)

Abstract

With the advantages of being easy to understand and efficient to compute, the decision tree method has long been one of the most popular classifiers. Decision trees constructed with existing approaches, however, tend to be huge and complex, and are consequently difficult to use in practical applications. In this study, we address the problem of tree complexity by allowing users to specify the number of leaf nodes, and then constructing a decision tree that achieves maximum classification accuracy with the given number of leaf nodes. A new algorithm, the Size Constrained Decision Tree (SCDT), is proposed for constructing such a tree, paying close attention to how to use the limited number of leaf nodes efficiently. Experimental results show that the SCDT method can successfully generate a simpler decision tree and offers better accuracy.
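To make the idea of a leaf-node budget concrete, the following is a minimal sketch, not the SCDT algorithm itself, whose details are not given in this abstract. It assumes a generic best-first strategy: repeatedly split whichever leaf yields the largest Gini impurity reduction until the user-specified number of leaves is reached. The function names (`impurity`, `best_split`, `grow`, `predict`, `count_leaves`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter

def impurity(labels):
    """Gini impurity weighted by leaf size: n * (1 - sum of squared class shares)."""
    n = len(labels)
    return n - sum(c * c for c in Counter(labels).values()) / n if n else 0.0

def best_split(rows):
    """Return (gain, feature, threshold, left, right) for the best binary split, or None."""
    base = impurity([y for _, y in rows])
    best = None
    for f in range(len(rows[0][0])):
        # Every distinct value except the largest is a candidate threshold.
        for t in sorted({x[f] for x, _ in rows})[:-1]:
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            gain = base - impurity([y for _, y in left]) - impurity([y for _, y in right])
            if best is None or gain > best[0]:
                best = (gain, f, t, left, right)
    return best

def grow(rows, max_leaves):
    """Best-first growth (an assumed strategy): spend each new leaf where it helps most."""
    root = {"rows": rows}
    leaves = [root]
    while len(leaves) < max_leaves:
        candidates = [(best_split(leaf["rows"]), leaf) for leaf in leaves]
        candidates = [(s, leaf) for s, leaf in candidates if s and s[0] > 1e-12]
        if not candidates:
            break  # no remaining leaf can be improved by splitting
        (gain, f, t, left, right), leaf = max(candidates, key=lambda c: c[0][0])
        leaf.update(feature=f, threshold=t, left={"rows": left}, right={"rows": right})
        leaves = [l for l in leaves if l is not leaf] + [leaf["left"], leaf["right"]]
    return root

def predict(node, x):
    """Walk down to a leaf and return its majority class."""
    while "feature" in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return Counter(y for _, y in node["rows"]).most_common(1)[0][0]

def count_leaves(node):
    if "feature" not in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

# Toy one-feature data set (hypothetical): class "a", then "b", then "a" again.
rows = [((v,), "a" if v <= 3 or v >= 7 else "b") for v in range(1, 10)]
tree = grow(rows, max_leaves=3)
accuracy = sum(predict(tree, x) == y for x, y in rows) / len(rows)
```

On this toy data, a budget of three leaves is enough to separate the three class regions, while a budget of two forces one mixed leaf. A production alternative with a similar constraint is the `max_leaf_nodes` parameter of scikit-learn's `DecisionTreeClassifier`, which also grows the tree in best-first fashion.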

Acknowledgments

The authors would like to thank the Editor-in-Chief, Dr. Moonis Ali, and the anonymous referees for their help and valuable comments, which improved this paper. This research was supported by the Fundamental Research Funds of Shantou University of China (Grant no. 120-760161).

Author information

Authors and Affiliations

The Advanced Research Institute, Institute for Information Industry, Taipei, Taiwan, 115, Republic of China
Chia-Chi Wu
Department of Information Management, National Central University, Chung-Li, Taiwan, 320, Republic of China
Yen-Liang Chen & Xiang-Yu Yang
Business School, Shantou University, Shantou, Guangdong, 515063, China
Yi-Hung Liu

Corresponding author

Correspondence to Yi-Hung Liu.

About this article

Cite this article

Wu, CC., Chen, YL., Liu, YH. et al. Decision tree induction with a constrained number of leaf nodes. Appl Intell 45, 673–685 (2016). https://doi.org/10.1007/s10489-016-0785-z

Issue Date: October 2016