Classification Trees

  • Lior Rokach
  • Oded Maimon


Decision trees are considered one of the most popular approaches for representing classifiers. Researchers from various disciplines, such as statistics, machine learning, pattern recognition, and data mining, have dealt with the issue of growing a decision tree from available data. This chapter presents an updated survey of current methods for constructing decision tree classifiers in a top-down manner. It suggests a unified algorithmic framework for presenting these algorithms and describes various splitting criteria and pruning methodologies.

Key words

Decision tree, Information Gain, Gini Index, Gain Ratio, Pruning, Minimum Description Length, C4.5, CART, Oblivious Decision Trees




Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Information System Engineering, Ben-Gurion University, Beer-Sheba, Israel
  2. Department of Industrial Engineering, Tel-Aviv University, Ramat-Aviv, Israel
