Skip to main content
Log in

Decision tree induction with a constrained number of leaf nodes

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

With the advantages of being easy to understand and efficient to compute, the decision tree method has long been one of the most popular classifiers. Decision trees constructed with existing approaches, however, tend to be huge and complex, and consequently are difficult to use in practical applications. In this study, we deal with the problem of tree complexity by allowing users to specify the number of leaf nodes, and then construct a decision tree that allows maximum classification accuracy with the given number of leaf nodes. A new algorithm, the Size Constrained Decision Tree (SCDT), is proposed with which to construct a decision tree, paying close attention on how to efficiently use the limited number of leaf nodes. Experimental results show that the SCDT method can successfully generate a simpler decision tree and offers better accuracy.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

References

  1. Almuallim H (1996) An efficient algorithm for optimal pruning of decision trees. Artif Intell 83:347–362

    Article  Google Scholar 

  2. Aamodt A, Plazas E (1994) Case-based reasoning: Foundational issues, methodological variations, and system approaches. AI Commun 7:39–59

    Google Scholar 

  3. Ahmad A (2014) Decision tree ensembles based on kernel features. Appl Intell 41(3):855–869

    Article  Google Scholar 

  4. Asuncion A, Newman DJ (2007) UCI Machine Learning Repository. http://www.ics.uci.edu/mlearn/MLRepository.html. Accessed 08 August 2015

  5. Barros RC, Basgalupp MP (2012) A survey of evolutionary algorithms for Decision-Tree induction. IEEE Trans Syst Man Cybern Part C Appl Rev 42(3):291–312

    Article  Google Scholar 

  6. Benferhat S, Boudjelida A, Tabia K, Drias H (2013) An intrusion detection and alert correlation approach based on revising probabilistic classifiers using expert knowledge. Appl Intell 38(4):520–540

    Article  Google Scholar 

  7. Bernardo D, Hagras H, Tsang E (2013) A genetic type-2 fuzzy logic based system for the generation of summarised linguistic predictive models for financial applications. Soft Comput 17:2185–2201

    Article  Google Scholar 

  8. Ben-Assuli O, Leshno M (2013) Using electronic medical records in admission decisions: a cost effectiveness analysis. Decis Sci 44(3):463–481

    Article  Google Scholar 

  9. Bohanec M, Bratko I (1994) Trading accuracy for simplicity in decision trees. Mach Learn 15:223–250

    MATH  Google Scholar 

  10. Borgonovo E, Marinacci M (2015) Decision analysis under ambiguity. Eur J Oper Res 244(3):823–836

    Article  MathSciNet  Google Scholar 

  11. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont

    MATH  Google Scholar 

  12. Chih-Chung Chang, Chih-Jen Lin (2011) LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol 2(27):1–27

    Article  Google Scholar 

  13. Cheeseman P, Kelly J, Self M (1988) Autoclass: A Bayesian classification system. In: proceedings of the Fifth Intl Workshop on Machine Learning, vol 27, pp 54–64

  14. Deng H, Runger G, Tuv E, Bannister W (2014) CBC: An associative classifier with a small number of rules. Decis Support Syst 59:163–170

    Article  Google Scholar 

  15. De Jong KA, Spears WM, Gordon DF (1993) Using genetic algorithms for concept learning. Mach Learn 13:161–188

    Google Scholar 

  16. Fournier D, Cremilleux B (2002) A quality index for decision tree pruning. Knowl-Based Syst 15:37–43

    Article  Google Scholar 

  17. Frini A, Guitouni A, Martel JM (2012) A general decomposition approach for multi-criteria decision trees. Eur J Oper Res 220(2):452–460

    Article  MathSciNet  MATH  Google Scholar 

  18. Garofalakis M, Hyun D, Rastogi R, Shim K (2003) Building decision trees with constraints. Data Min Knowl Disc 7:187–214

    Article  MathSciNet  Google Scholar 

  19. Gehrke J, Ganti V, Ramakrishnan R, Loh WY (1999) BOAT-Optimistic decision tree construction. In: Proceedings of the 1999 ACM-SIGMOD International Conference, pp 311–323

  20. Gharehgozli AH, Yu Y, de Koster R, Udding JT (2014) A decision-tree stacking heuristic minimising the expected number of reshuffles at a container terminal. Int J Prod Res 52(9):2592–2611

    Article  Google Scholar 

  21. Han J (2006) Data Mining: Concepts and Techniques. Morgan Kaufmann

  22. Huang YL, Kammerdiner A (2013) Reduction of service time variation in patient visit groups using decision tree method for an effective scheduling. Int J Healthc Technol Manag 14(1-2): 3–21

    Article  Google Scholar 

  23. Janikow CZ (1993) A knowledge-intensive genetic algorithm for supervised learning. Mach Learn 13:189–228

    Article  Google Scholar 

  24. Kotsiantis SB (2013) Decision trees: a recent overview. Artif Intell Rev 39:261–283

    Article  Google Scholar 

  25. Kwon S, Kim YG, Cha S (2012) Web robot detection based on pattern-matching technique. J Inf Sci 38(2):118–126

    Article  Google Scholar 

  26. Lee CC, Mower E, Busso C, Lee S, Narayanan S (2011) Emotion recognition using a hierarchical binary decision tree approach. Speech Comm 53:1162–1171

    Article  Google Scholar 

  27. Lehnfeld J, Knust S (2014) Loading, unloading and premarshalling of stacks in storage areas: Survey and classification. Eur J Oper Res 239:297–312

    Article  MathSciNet  MATH  Google Scholar 

  28. Lin WY, Hu YH, Tsai CF (2012) Machine learning in financial crisis prediction: a survey. IEEE Trans Syst Man Cybern Part C Appl Rev 42(4):421–436

    Article  Google Scholar 

  29. Lomax S, Vadera S (2013) A survey of Cost-Sensitive decision tree induction algorithms. ACM Comput Surv 45(2):1–35

    Article  MATH  Google Scholar 

  30. Mehta M, Rissanen J, Agrawal R (1995) MDL-Based decision tree pruning. In: Proceedings of Int’l Conference on Knowledge Discovery in Databases and Data Mining (KDD-95), Montreal, Canada

  31. Mohammed N, Chen R, Fung BCM, Yu PS (2011) Differentially private data release for data mining. In: proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 493–501

  32. Murthy SK (1998) Automatic construction of decision trees from data: a multi-disciplinary survey. Data Min Knowl Disc 2:345–389

    Article  Google Scholar 

  33. Nijssen S, Fromont E (2010) Optimal constraint-based decision tree induction from itemset lattices. Data Min Knowl Disc 21:9–41

    Article  MathSciNet  Google Scholar 

  34. Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106

    Google Scholar 

  35. Quinlan JR (1987) Simplifying decision trees. Int J Man-Mach Stud 27:221–234

    Article  Google Scholar 

  36. Quinlan JR (1993) C4.5: Programs for machine learning. Morgan Kaufmann, San Mateo

    Google Scholar 

  37. Rastogi R, Shim K (2000) PUBLIC: A decision tree classifier that integrates building and pruning. Data Min Knowl Disc 4:315–344

    Article  MATH  Google Scholar 

  38. Ripley BD (1996) Pattern recognition and neural networks. Cambridge University Press, UK

    Book  MATH  Google Scholar 

  39. Shafer J, Agrawal R, Mehta M (1996) SPRINT: A scalable parallel classifier for data mining. In: Proceedings of 1996 International Conference on Very Large Data Bases, pp 544–555

  40. Stahl F, Bramer M (2012) Jmax-pruning: a facility for the information theoretic pruning of modular classification rules. Knowl-Based Syst 29:12–19

    Article  Google Scholar 

  41. Turney PD (1995) Cost-Sensitive Classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm. J Artif Intell Res 2:369–409

    Google Scholar 

  42. Tsang S, Kao B, Ho WS, Lee SD (2011) Decision trees for uncertain data. IEEE Trans Knowl Data Eng 23(1):64–78

    Article  Google Scholar 

  43. Wagstaff KL, Kocurek M, Mazzoni D, Tang B (2010) Progressive refinement for support vector machines. Data Min Knowl Disc 20:53–69

    Article  MathSciNet  Google Scholar 

  44. Wang BX, Japkowicz N (2010) Boosting support vector machines for imbalanced data sets. Knowl Inf Syst 25:1–20

    Article  Google Scholar 

  45. Wozniak M (2010) A hybrid decision tree training method using data streams. Knowl Inf Syst:1–13

  46. Wu CH, Wei WL, Lin JC, Lee WY (2013) Speaking effect removal on emotion recognition from facial expressions based on eigenface conversion. IEEE Trans Multimed 15(8):1732–1744

    Article  Google Scholar 

  47. Zhao H (2008) Instance weighting versus threshold adjusting for cost-sensitive classification. Knowl Inf Syst 15:321–334

    Article  Google Scholar 

  48. Zhang S (2012) Decision tree classifiers sensitive to heterogeneous costs. J Syst Softw 85:771–779

    Article  Google Scholar 

  49. Zhang WD, Wang SH, Phillips P, Ji GL (2014) Binary PSO with mutation operator for feature selection using decision tree applied to spam detection. Knowl-Based Syst 64:22–31

    Article  Google Scholar 

  50. Zhao Z, Chen Y, Liu J, Shen Z, Liu M (2011) Cross-people mobile-phone based activity recognition

Download references

Acknowledgments

The authors would like to thank the Editor-in-Chief, Dr. Moonis Ali, and the anonymous referees for their helps and valuable comments to improve this paper. This research was supported by the Fundamental Research Funds of Shantou University of China (Grant no. 120-760161).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yi-Hung Liu.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wu, CC., Chen, YL., Liu, YH. et al. Decision tree induction with a constrained number of leaf nodes. Appl Intell 45, 673–685 (2016). https://doi.org/10.1007/s10489-016-0785-z

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-016-0785-z

Keywords

Navigation