Abstract
Decision-tree induction algorithms are widely used across a variety of domains for knowledge discovery and pattern recognition. They have the advantage of producing a comprehensible classification/regression model, and they achieve satisfactory accuracy in several application domains, such as medical diagnosis and credit risk assessment. In this chapter, we present in detail the most common approach for decision-tree induction: top-down induction (Sect. 2.3). Furthermore, we briefly comment on some alternative strategies for inducing decision trees (Sect. 2.4). Our goal is to summarize the main design options one faces when building decision-tree induction algorithms. These design choices are especially relevant when designing an evolutionary algorithm for evolving decision-tree induction algorithms.
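To make the top-down (greedy, recursive-partitioning) strategy of Sect. 2.3 concrete, the sketch below grows a tree with a univariate, information-gain split criterion. This is a minimal illustration under our own assumptions, not the algorithm evaluated in this book; the names (`grow`, `best_split`, `min_samples`) are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label multiset, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels, n_features):
    """Pick the (feature, threshold) pair with the highest information gain."""
    base, best = entropy(labels), (None, None, 0.0)
    for f in range(n_features):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # degenerate split: all instances on one side
            rem = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if base - rem > best[2]:
                best = (f, t, base - rem)
    return best

def grow(rows, labels, min_samples=2):
    """Recursively partition the data; returns a nested dict or a class label."""
    if len(set(labels)) == 1 or len(rows) < min_samples:
        return Counter(labels).most_common(1)[0][0]   # majority-class leaf
    f, t, gain = best_split(rows, labels, len(rows[0]))
    if f is None or gain == 0.0:
        return Counter(labels).most_common(1)[0][0]   # no useful split left
    l = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    r = [(r2, y) for r2, y in zip(rows, labels) if r2[f] > t]
    return {"feature": f, "thr": t,
            "left": grow([a for a, _ in l], [b for _, b in l], min_samples),
            "right": grow([a for a, _ in r], [b for _, b in r], min_samples)}

def predict(tree, row):
    """Route an instance down the tree until a leaf (class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["thr"] else tree["right"]
    return tree
```

The `min_samples` stopping rule in the sketch is one common guard against the data-fragmentation problem discussed in note 2 below.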
Notes
- 1.
- 2.
Data fragmentation is a well-known problem in top-down decision-tree induction: nodes with few instances usually lack the statistical support required for further partitioning. The phenomenon affects most of the available split criteria, since their distributions depend on the number of instances in each particular node.
- 3.
OC1 only allows the option of employing oblique splits when \(N > 2n\), though this threshold can be user-defined.
- 4.
- 5.
The Gauss-Jordan procedure inverts a matrix in \(O(n^{3})\) time. The asymptotically fastest matrix-inversion algorithm to date runs in \(O(n^{2.376})\) time (via Coppersmith-Winograd matrix multiplication).
- 6.
Nominal attributes are not used more than once in a given subtree.
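The complexity claim in note 5 can be made concrete. Below is a minimal sketch (ours, not taken from the chapter) of Gauss-Jordan elimination with partial pivoting for matrix inversion; the three nested loops over \(n\) make the \(O(n^{3})\) cost explicit.

```python
def gauss_jordan_inverse(a):
    """Invert an n x n matrix via Gauss-Jordan elimination with partial
    pivoting. Three nested O(n) loops give the O(n^3) cost cited in note 5."""
    n = len(a)
    # Build the augmented matrix [A | I].
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: swap up the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Normalise the pivot row so the pivot entry becomes 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds A^{-1}.
    return [row[n:] for row in aug]
```

For example, `gauss_jordan_inverse([[4.0, 7.0], [2.0, 6.0]])` yields `[[0.6, -0.7], [-0.2, 0.4]]`, i.e. \(\frac{1}{10}\begin{pmatrix}6 & -7\\ -2 & 4\end{pmatrix}\).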
References
A. Agresti, Categorical Data Analysis, 2nd edn., Wiley Series in Probability and Statistics (Wiley-Interscience, Hoboken, 2002)
E. Alpaydin, Introduction to Machine Learning (MIT Press, Cambridge, 2010). ISBN: 026201243X, 9780262012430
P.W. Baim, A method for attribute selection in inductive learning systems. IEEE Trans. Pattern Anal. Mach. Intell. 10(6), 888–896 (1988)
R.C. Barros et al., A bottom-up oblique decision tree induction algorithm, in 11th International Conference on Intelligent Systems Design and Applications, pp. 450–456 (2011)
R.C. Barros et al., A framework for bottom-up induction of decision trees. Neurocomputing (2013, in press)
R.C. Barros et al., A survey of evolutionary algorithms for decision-tree induction. IEEE Trans. Syst. Man, Cybern. Part C: Appl. Rev. 42(3), 291–312 (2012)
M.P. Basgalupp et al., A beam-search based decision-tree induction algorithm, in Machine Learning Algorithms for Problem Solving in Computational Applications: Intelligent Techniques. IGI-Global (2011)
K. Bennett, Global tree optimization: a non-greedy decision tree algorithm. Comput. Sci. Stat. 26, 156–160 (1994)
K. Bennett, O. Mangasarian, Multicategory discrimination via linear programming. Optim. Methods Softw. 2, 29–39 (1994)
K. Bennett, O. Mangasarian, Robust linear programming discrimination of two linearly inseparable sets. Optim. Methods Softw. 1, 23–34 (1992)
L. Bobrowski, M. Kretowski, Induction of multivariate decision trees by using dipolar criteria, in European Conference on Principles of Data Mining and Knowledge Discovery. pp. 331–336 (2000)
L. Breiman et al., Classification and Regression Trees (Wadsworth, Belmont, 1984)
L. Breslow, D. Aha, Simplifying decision trees: a survey. Knowl. Eng. Rev. 12(01), 1–40 (1997)
C.E. Brodley, P.E. Utgoff, Multivariate versus univariate decision trees. Technical Report. Department of Computer Science, University of Massachusetts at Amherst (1992)
A. Buja, Y.-S. Lee, Data mining criteria for tree-based regression and classification, in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 27–36 (2001)
W. Buntine, A theory of learning classification rules, PhD thesis. University of Technology, Sydney (1992)
W. Buntine, Learning classification trees. Stat. Comput. 2, 63–73 (1992)
R. Casey, G. Nagy, Decision tree design using a probabilistic model. IEEE Trans. Inf. Theory 30(1), 93–99 (1984)
B. Cestnik, I. Bratko, On estimating probabilities in tree pruning, Machine Learning-EWSL-91, Vol. 482. Lecture Notes in Computer Science (Springer, Berlin, 1991), pp. 138–150
B. Chandra, R. Kothari, P. Paul, A new node splitting measure for decision tree construction. Pattern Recognit. 43(8), 2725–2731 (2010)
B. Chandra, P.P. Varghese, Moving towards efficient decision tree construction. Inf. Sci. 179(8), 1059–1069 (2009)
J. Ching, A. Wong, K. Chan, Class-dependent discretization for inductive learning from continuous and mixed-mode data. IEEE Trans. Pattern Anal. Mach. Intell. 17(7), 641–651 (1995)
P. Chou, Optimal partitioning for classification and regression trees. IEEE Trans. Pattern Anal. Mach. Intell. 13(4), 340–354 (1991)
P. Clark, T. Niblett, The CN2 induction algorithm. Mach. Learn. 3(4), 261–283 (1989)
D. Coppersmith, S.J. Hong, J.R.M. Hosking, Partitioning nominal attributes in decision trees. Data Min. Knowl. Discov. 3, 197–217 (1999)
R.L. De Mántaras, A distance-based attribute selection measure for decision tree induction. Mach. Learn. 6(1), 81–92 (1991). ISSN: 0885-6125
G. De’ath, Multivariate regression trees: A new technique for modeling species-environment relationships. Ecology 83(4), 1105–1117 (2002)
L. Devroye, L. Györfi, G. Lugosi, A Probabilistic Theory of Pattern Recognition (Springer, New York, 1996)
M. Dong, R. Kothari, Look-ahead based fuzzy decision tree induction. IEEE Trans. Fuzzy Syst. 9(3), 461–468 (2001)
B. Draper, C. Brodley, Goal-directed classification using linear machine decision trees. IEEE Trans. Pattern Anal. Mach. Intell. 16(9), 888–893 (1994)
S. Esmeir, S. Markovitch, Anytime learning of decision trees. J. Mach. Learn. Res. 8, 891–933 (2007)
F. Esposito, D. Malerba, G. Semeraro, A comparative analysis of methods for pruning decision trees. IEEE Trans. Pattern Anal. Mach. Intell. 19(5), 476–491 (1997)
F. Esposito, D. Malerba, G. Semeraro, A further study of pruning methods in decision tree induction, in Fifth International Workshop on Artificial Intelligence and Statistics. pp. 211–218 (1995)
F. Esposito, D. Malerba, G. Semeraro, Simplifying decision trees by pruning and grafting: new results (extended abstract), in 8th European Conference on Machine Learning. ECML’95. (Springer, London, 1995) pp. 287–290
U. Fayyad, K. Irani, The attribute selection problem in decision tree generation, in National Conference on Artificial Intelligence. pp. 104–110 (1992)
A. Frank, A. Asuncion, UCI Machine Learning Repository (2010)
A.A. Freitas, A critical review of multi-objective optimization in data mining: a position paper. SIGKDD Explor. Newsl. 6(2), 77–86 (2004). ISSN: 1931-0145
J.H. Friedman, A recursive partitioning decision rule for nonparametric classification. IEEE Trans. Comput. 100(4), 404–408 (1977)
S.B. Gelfand, C.S. Ravishankar, E.J. Delp, An iterative growing and pruning algorithm for classification tree design. IEEE Int. Conf. Syst. Man Cybern. 2, 818–823 (1989)
M.W. Gillo, MAID: A Honeywell 600 program for an automatised survey analysis. Behav. Sci. 17, 251–252 (1972)
M. Gleser, M. Collen, Towards automated medical decisions. Comput. Biomed. Res. 5(2), 180–189 (1972)
L.A. Goodman, W.H. Kruskal, Measures of association for cross classifications. J. Am. Stat. Assoc. 49(268), 732–764 (1954)
T. Hancock et al., Lower bounds on learning decision lists and trees. Inf. Comput. 126(2) (1996)
C. Hartmann et al., Application of information theory to the construction of efficient decision trees. IEEE Trans. Inf. Theory 28(4), 565–577 (1982)
R. Haskell, A. Noui-Mehidi, Design of hierarchical classifiers, in Computing in the 90s, Vol. 507. Lecture Notes in Computer Science, ed. by N. Sherwani, E. de Doncker, J. Kapenga (Springer, Berlin, 1991), pp. 118–124
H. Hauska, P. Swain, The decision tree classifier: design and potential, in 2nd Symposium on Machine Processing of Remotely Sensed Data (1975)
D. Heath, S. Kasif, S. Salzberg, Induction of oblique decision trees. J. Artif. Intell. Res. 2, 1–32 (1993)
W. Hsiao, Y. Shih, Splitting variable selection for multivariate regression trees. Stat. Probab. Lett. 77(3), 265–271 (2007)
E.B. Hunt, J. Marin, P.J. Stone, Experiments in Induction (Academic Press, New York, 1966)
L. Hyafil, R. Rivest, Constructing optimal binary decision trees is NP-complete. Inf. Process. Lett. 5(1), 15–17 (1976)
A. Ittner, Non-linear decision trees, in 13th International Conference on Machine Learning. pp. 1–6 (1996)
B. Jun et al., A new criterion in selection and discretization of attributes for the generation of decision trees. IEEE Trans. Pattern Anal. Mach. Intell. 19(2), 1371–1375 (1997)
G. Kalkanis, The application of confidence interval error analysis to the design of decision tree classifiers. Pattern Recognit. Lett. 14(5), 355–361 (1993)
A. Karalič, Employing linear regression in regression tree leaves, in 10th European Conference on Artificial Intelligence. ECAI’92 (Wiley, New York, 1992)
G.V. Kass, An exploratory technique for investigating large quantities of categorical data. Appl. Stat. 29(2), 119–127 (1980)
B. Kim, D. Landgrebe, Hierarchical classifier design in high-dimensional numerous class cases. IEEE Trans. Geosci. Remote Sens. 29(4), 518–528 (1991)
I. Kononenko, I. Bratko, E. Roskar, Experiments in automatic learning of medical diagnostic rules. Technical Report Ljubljana, Yugoslavia: Jozef Stefan Institute (1984)
I. Kononenko, Estimating attributes: analysis and extensions of RELIEF, Proceedings of the European Conference on Machine Learning on Machine Learning (Springer, New York, 1994). ISBN: 3-540-57868-4
G. Landeweerd et al., Binary tree versus single level tree classification of white blood cells. Pattern Recognit. 16(6), 571–577 (1983)
D.R. Larsen, P.L. Speckman, Multivariate regression trees for analysis of abundance data. Biometrics 60(2), 543–549 (2004)
Y.-S. Lee, A new splitting approach for regression trees. Technical Report. Dongguk University, Department of Statistics: Dongguk University, Department of Statistics (2001)
X. Li, R.C. Dubes, Tree classifier design with a permutation statistic. Pattern Recognit. 19(3), 229–235 (1986)
X.-B. Li et al., Multivariate decision trees using linear discriminants and tabu search. IEEE Trans. Syst., Man, Cybern.-Part A: Syst. Hum. 33(2), 194–205 (2003)
H. Liu, R. Setiono, Feature transformation and multivariate decision tree induction. Discov. Sci. 1532, 279–291 (1998)
W. Loh, Y. Shih, Split selection methods for classification trees. Stat. Sin. 7, 815–840 (1997)
W. Loh, Regression trees with unbiased variable selection and interaction detection. Stat. Sin. 12, 361–386 (2002)
D. Malerba et al., Top-down induction of model trees with regression and splitting nodes. IEEE Trans. Pattern Anal. Mach. Intell. 26(5), 612–625 (2004)
O. Mangasarian, R. Setiono, W. H. Wolberg, Pattern recognition via linear programming: theory and application to medical diagnosis, in SIAM Workshop on Optimization (1990)
N. Manwani, P. Sastry, A Geometric Algorithm for Learning Oblique Decision Trees, in Pattern Recognition and Machine Intelligence, ed. by S. Chaudhury, et al. (Springer, Berlin, 2009), pp. 25–31
J. Martin, An exact probability metric for decision tree splitting and stopping. Mach. Learn. 28(2), 257–291 (1997)
J. Mingers, An empirical comparison of pruning methods for decision tree induction. Mach. Learn. 4(2), 227–243 (1989)
J. Mingers, An empirical comparison of selection measures for decision-tree induction. Mach. Learn. 3(4), 319–342 (1989)
J. Mingers, Expert systems—rule induction with statistical data. J. Oper. Res. Soc. 38, 39–47 (1987)
T.M. Mitchell, Machine Learning (McGraw-Hill, New York, 1997)
F. Mola, R. Siciliano, A fast splitting procedure for classification trees. Stat. Comput. 7(3), 209–216 (1997)
J.N. Morgan, R.C. Messenger, THAID: a sequential search program for the analysis of nominal scale dependent variables. Technical Report. Institute for Social Research, University of Michigan (1973)
S.K. Murthy, S. Kasif, S.S. Salzberg, A system for induction of oblique decision trees. J. Artif. Intell. Res. 2, 1–32 (1994)
S.K. Murthy, Automatic construction of decision trees from data: A multi-disciplinary survey. Data Min. Knowl. Discov. 2(4), 345–389 (1998)
S.K. Murthy, S. Salzberg, Lookahead and pathology in decision tree induction, in 14th International Joint Conference on Artificial Intelligence. (Morgan Kaufmann, San Francisco, 1995), pp. 1025–1031
S.K. Murthy et al., OC1: A randomized induction of oblique decision trees, in Proceedings of the 11th National Conference on Artificial Intelligence (AAAI’93), pp. 322–327 (1993)
G.E. Naumov, NP-completeness of problems of construction of optimal decision trees. Sov. Phys. Doklady 36(4), 270–271 (1991)
T. Niblett, I. Bratko, Learning decision rules in noisy domains, in 6th Annual Technical Conference on Research and Development in Expert Systems III. pp. 25–34 (1986)
N.J. Nilsson, The Mathematical Foundations of Learning Machines (Morgan Kaufmann Publishers Inc., San Francisco, 1990). ISBN: 1-55860-123-6
S.W. Norton, Generating better decision trees, 11th International Joint Conference on Artificial Intelligence (Morgan Kaufmann Publishers Inc., San Francisco, 1989)
K. Osei-Bryson, Post-pruning in regression tree induction: an integrated approach. Expert Syst. Appl. 34(2), 1481–1490 (2008)
D. Page, S. Ray, Skewing: An efficient alternative to lookahead for decision tree induction, in 18th International Joint Conference on Artificial Intelligence (Morgan Kaufmann Publishers Inc., San Francisco, 2003), pp. 601–607
A. Patterson, T. Niblett, ACLS User Manual (Intelligent Terminals Ltd., Glasgow, 1983)
K. Pattipati, M. Alexandridis, Application of heuristic search and information theory to sequential fault diagnosis. IEEE Trans. Syst. Man Cybern. 20, 872–887 (1990)
J.R. Quinlan, C4.5: Programs for Machine Learning (Morgan Kaufmann, San Francisco, 1993). ISBN: 1-55860-238-0
J.R. Quinlan, Decision trees as probabilistic classifiers, in 4th International Workshop on Machine Learning (1987)
J.R. Quinlan, Discovering rules by induction from large collections of examples, in Expert Systems in the Micro-elect Age, ed. by D. Michie (Edinburgh University Press, Edinburgh, 1979)
J.R. Quinlan, Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)
J.R. Quinlan, Learning with continuous classes, in 5th Australian Joint Conference on Artificial Intelligence (AI’92), pp. 343–348 (1992)
J.R. Quinlan, Simplifying decision trees. Int. J. Man-Mach. Stud. 27, 221–234 (1987)
J.R. Quinlan, Unknown attribute values in induction, in 6th International Workshop on Machine Learning. pp. 164–168 (1989)
J.R. Quinlan, R.L. Rivest, Inferring decision trees using the minimum description length principle. Inf. Comput. 80(3), 227–248 (1989)
M. Robnik-Sikonja, I. Kononenko, Pruning regression trees with MDL, in European Conference on Artificial Intelligence. pp. 455–459 (1998)
L. Rokach, O. Maimon, Top-down induction of decision trees classifiers—a survey. IEEE Trans. Syst. Man, Cybern. Part C: Appl. Rev. 35(4), 476–487 (2005)
E.M. Rounds, A combined nonparametric approach to feature selection and binary decision tree design. Pattern Recognit. 12(5), 313–317 (1980)
J.P. Sá et al., Decision trees using the minimum entropy-of-error principle, in 13th International Conference on Computer Analysis of Images and Patterns (Springer, Berlin, 2009), pp. 799–807
S. Safavian, D. Landgrebe, A survey of decision tree classifier methodology. IEEE Trans. Syst. Man Cybern. 21(3), 660–674 (1991). ISSN: 0018-9472
I.K. Sethi, G.P.R. Sarvarayudu, Hierarchical classifier design using mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 4(4), 441–445 (1982)
S. Shah, P. Sastry, New algorithms for learning and pruning oblique decision trees. IEEE Trans. Syst. Man, Cybern. Part C: Appl. Rev. 29(4), 494–505 (1999)
C.E. Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948)
Y. Shih, Selecting the best categorical split for classification trees. Stat. Probab. Lett. 54, 341–345 (2001)
L.M. Silva et al., Error entropy in classification problems: a univariate data analysis. Neural Comput. 18(9), 2036–2061 (2006)
J.A. Sonquist, E.L. Baker, J.N. Morgan, Searching for structure. Technical Report. Institute for Social Research University of Michigan (1971)
J. Talmon, A multiclass nonparametric partitioning algorithm. Pattern Recognit. Lett. 4(1), 31–38 (1986)
P.J. Tan, D.L. Dowe, MML inference of oblique decision trees, in 17th Australian Joint Conference on AI. pp. 1082–1088 (2004)
P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining (Addison-Wesley, Boston, 2005)
P.C. Taylor, B.W. Silverman, Block diagrams and splitting criteria for classification trees. Stat. Comput. 3, 147–161 (1993)
P. Taylor, M. Jones, Splitting criteria for regression trees. J. Stat. Comput. Simul. 55(4), 267–285 (1996)
L. Torgo, Functional models for regression tree leaves, in 14th International Conference on Machine Learning. ICML’97. (Morgan Kaufmann Publishers Inc., San Francisco, 1997), pp. 385–393
L. Torgo, A comparative study of reliable error estimators for pruning regression trees, in Iberoamerican Conference on Artificial Intelligence (Springer, Berlin, 1998), pp. 1–12
L. Torgo, Error estimators for pruning regression trees, in 10th European Conference on Machine Learning (Springer, Berlin, 1998), pp. 125–130
K.P. Unnikrishnan, K.P. Venugopal, Alopex: A correlation-based learning algorithm for feedforward and recurrent neural networks. Neural Comput. 6, 469–490 (1994)
P.E. Utgoff, Perceptron trees: a case study in hybrid concept representations. Connect. Sci. 1(4), 377–391 (1989)
P.E. Utgoff, N.C. Berkman, J.A. Clouse, Decision tree induction based on efficient tree restructuring. Mach. Learn. 29(1), 5–44 (1997)
P.E. Utgoff, C.E. Brodley, Linear machine decision trees. Technical Report. University of Massachusetts, Dept of Comp Sci (1991)
P.E. Utgoff, J.A. Clouse, A Kolmogorov-Smirnoff metric for decision tree induction. Technical Report 96-3, University of Massachusetts (1996)
P. Utgoff, C. Brodley, An incremental method for finding multivariate splits for decision trees, in 7th International Conference on Machine Learning. pp. 58–65 (1990)
P.K. Varshney, C.R.P. Hartmann, J.M.J. de Faria, Application of information theory to sequential fault diagnosis. IEEE Trans. Comput. 31(2), 164–170 (1982)
D. Wang, L. Jiang, An improved attribute selection measure for decision tree induction, in: 4th International Conference on Fuzzy Systems and Knowledge Discovery. pp. 654–658 (2007)
Y. Wang, I.H. Witten, Induction of model trees for predicting continuous classes, in Poster papers of the 9th European Conference on Machine Learning (Springer, Berlin, 1997)
A.P. White, W.Z. Liu, Technical note: Bias in information-based measures in decision tree induction. Mach. Learn. 15(3), 321–329 (1994)
S.S. Wilks, Mathematical Statistics (Wiley, New York, 1962)
I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (Morgan Kaufmann, San Francisco, 1999). ISBN: 1558605525
C.T. Yildiz, E. Alpaydin, Omnivariate decision trees. IEEE Trans. Neural Netw. 12(6), 1539–1546 (2001)
H. Zantema, H. Bodlaender, Finding small equivalent decision trees is hard. Int. J. Found. Comput. Sci. 11(2), 343–354 (2000)
X. Zhou, T. Dillon, A statistical-heuristic feature selection criterion for decision tree induction. IEEE Trans. Pattern Anal. Mach. Intell. 13(8), 834–841 (1991)
Copyright information
© 2015 The Author(s)
Cite this chapter
Barros, R.C., de Carvalho, A.C.P.L.F., Freitas, A.A. (2015). Decision-Tree Induction. In: Automatic Design of Decision-Tree Induction Algorithms. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-14231-9_2
Print ISBN: 978-3-319-14230-2
Online ISBN: 978-3-319-14231-9