
A framework to induce more stable decision trees for pattern classification

  • Theoretical Advances
  • Published in Pattern Analysis and Applications

Abstract

Decision tree learning algorithms are known to be unstable: small changes in the training data can yield markedly different output models. Instability is an important but often overlooked issue in machine learning. In this paper, we illustrate and discuss the instability of decision tree induction algorithms and propose a framework for inducing more stable decision trees. In the proposed framework, the split test has two advantageous properties. First, it can involve multiple attributes; this alleviates the race among competing attributes to be installed at an internal node, which is the major cause of instability. Second, it has a polylithic structure, which can improve stability by localizing the effect of individual instances on the split test. We demonstrate the effectiveness of the proposed framework by providing a complying decision tree learning algorithm and conducting several experiments, evaluating the structural stability of the algorithms with three measures. The experimental results reveal that the decision trees induced by the proposed framework exhibit high stability and competitive accuracy compared with several well-known decision tree learning algorithms.
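The instability the abstract describes, and the "race" among near-tied attributes at an internal node, can be illustrated with a toy example. The sketch below is not the paper's algorithm; it is a minimal ID3-style split selection (helper names are our own) on a hand-built dataset where attributes A and B have exactly equal information gain, so the tie-break picks A, yet removing a single training instance flips the chosen root attribute to B:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr):
    """Information gain of splitting rows (a, b, ..., label) on attribute index attr."""
    base = entropy([row[-1] for row in rows])
    parts = {}
    for row in rows:
        parts.setdefault(row[attr], []).append(row[-1])
    return base - sum(len(p) / len(rows) * entropy(p) for p in parts.values())

def best_attribute(rows, attrs=("A", "B")):
    # Ties broken by attribute order, as in classic ID3-style implementations.
    gains = [info_gain(rows, i) for i in range(len(attrs))]
    return attrs[gains.index(max(gains))]

# Rows are (a, b, label); A and B are in an exact tie on the full data.
data = [
    (0, 0, 0), (0, 1, 0), (1, 0, 1),
    (1, 1, 1), (0, 1, 1), (1, 0, 0),
]

print(best_attribute(data))                 # -> "A" (tie, broken by order)
print(best_attribute(data[:2] + data[3:]))  # -> "B" (one instance removed)
```

Because the winning attribute determines the entire subtree below it, this single-instance perturbation changes the whole tree structure, which is exactly the locality problem a multi-attribute, polylithic split test is meant to mitigate.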


Author information

Correspondence to Zahra Mirzamomen.
Cite this article

Mirzamomen, Z., Kangavari, M.R. A framework to induce more stable decision trees for pattern classification. Pattern Anal Applic 20, 991–1004 (2017). https://doi.org/10.1007/s10044-016-0542-2
