Abstract
We evaluate the power of decision tables as a hypothesis space for supervised learning algorithms. Decision tables are one of the simplest hypothesis spaces possible, and they are usually easy to understand. Experimental results show that on artificial and real-world domains containing only discrete features, IDTM, an algorithm inducing decision tables, can sometimes outperform state-of-the-art algorithms such as C4.5. Surprisingly, performance is quite good on some datasets with continuous features, indicating that many datasets used in machine learning either do not require these features or contain features with only a few distinct values. We also describe an incremental method for performing cross-validation that is applicable to incremental learning algorithms, including IDTM. Using incremental cross-validation, it is possible to cross-validate a given dataset and IDTM in time that is linear in the number of instances, the number of features, and the number of label values. The time for incremental cross-validation is independent of the number of folds chosen; hence leave-one-out cross-validation and ten-fold cross-validation take the same time.
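To make the abstract's claims concrete, the following Python sketch shows a decision-table classifier with a majority rule together with incremental leave-one-out cross-validation. It is a minimal illustration under stated assumptions, not the paper's IDTM: the feature-subset search that IDTM performs is omitted, the table schema (the set of selected features) is assumed to be given, and the names DecisionTableMajority and incremental_loo_error are hypothetical. The point it illustrates is that deleting and re-inserting an instance touches only a single table entry, so the total cost of cross-validation is linear in the data and independent of the number of folds.

from collections import defaultdict, Counter

class DecisionTableMajority:
    """Decision table with a majority rule: instances are projected onto a
    fixed feature subset (the schema); an unseen projection falls back to
    the overall majority label."""

    def __init__(self, schema):
        self.schema = schema                  # indices of the selected features
        self.table = defaultdict(Counter)     # projected instance -> label counts
        self.overall = Counter()              # label counts over the whole table

    def _key(self, x):
        return tuple(x[i] for i in self.schema)

    def add(self, x, y):                      # O(|schema|) insertion
        self.table[self._key(x)][y] += 1
        self.overall[y] += 1

    def remove(self, x, y):                   # O(|schema|) deletion
        self.table[self._key(x)][y] -= 1
        self.overall[y] -= 1

    def predict(self, x):
        counts = self.table.get(self._key(x))
        if counts and sum(counts.values()) > 0:
            return counts.most_common(1)[0][0]
        return self.overall.most_common(1)[0][0]   # majority fallback

def incremental_loo_error(data, schema):
    """Leave-one-out error by incremental cross-validation: delete an
    instance, classify it, re-insert it."""
    dtm = DecisionTableMajority(schema)
    for x, y in data:
        dtm.add(x, y)
    errors = 0
    for x, y in data:
        dtm.remove(x, y)
        errors += int(dtm.predict(x) != y)
        dtm.add(x, y)
    return errors / len(data)

The same loop structure handles k-fold cross-validation by removing and re-inserting one fold at a time; every instance is still removed and re-inserted exactly once, so the total work does not change with the number of folds.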
References
Aha, D. W. & Bankert, R. L. (1994), A comparative evaluation of sequential feature selection algorithms, in “Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics”, pp. 1–7.
Almuallim, H. & Dietterich, T. G. (1991), Learning with many irrelevant features, in “Ninth National Conference on Artificial Intelligence”, MIT Press, pp. 547–552.
Almuallim, H. & Dietterich, T. G. (1992), On learning more concepts, in “Proceedings of the Ninth International Conference on Machine Learning”, Morgan Kaufmann, pp. 11–19.
Boyce, D., Farhi, A. & Weischedel, R. (1974), Optimal Subset Selection, Springer-Verlag.
Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. (1984), Classification and Regression Trees, Wadsworth International Group.
Caruana, R. & Freitag, D. (1994), Greedy attribute selection, in W. W. Cohen & H. Hirsh, eds, “Machine Learning: Proceedings of the Eleventh International Conference”, Morgan Kaufmann.
Clark, P. & Niblett, T. (1989), “The CN2 induction algorithm”, Machine Learning 3(4), 261–283.
Cormen, T. H., Leiserson, C. E. & Rivest, R. L. (1990), Introduction to Algorithms, McGraw-Hill.
Devijver, P. A. & Kittler, J. (1982), Pattern Recognition: A Statistical Approach, Prentice-Hall International.
Efron, B. (1983), “Estimating the error rate of a prediction rule: improvement on cross-validation”, Journal of the American Statistical Association 78(382), 316–330.
Garey, M. R. (1972), “Optimal binary identification procedures”, SIAM Journal on Applied Mathematics 23, 173–186.
Ginsberg, M. L. (1993), Essentials of Artificial Intelligence, Morgan Kaufmann.
Hartmann, C. R. P., Varshney, P. K., Mehrotra, K. G. & Gerberich, C. L. (1982), “Application of information theory to the construction of efficient decision trees”, IEEE Transactions on Information Theory IT-28(4), 565–577.
Holte, R. C. (1993), “Very simple classification rules perform well on most commonly used datasets”, Machine Learning 11, 63–90.
Hyafil, L. & Rivest, R. L. (1976), “Constructing optimal binary decision trees is NP-complete”, Information Processing Letters 5(1), 15–17.
John, G. H. (1994), Cross-validated C4.5: Using error estimation for automatic parameter selection, Technical Report STAN-CS-TN-94-12, Computer Science Department, Stanford University.
John, G., Kohavi, R. & Pfleger, K. (1994), Irrelevant features and the subset selection problem, in “Machine Learning: Proceedings of the Eleventh International Conference”, Morgan Kaufmann, pp. 121–129. Available by anonymous ftp from: starry.Stanford.EDU:pub/ronnyk/ml94.ps.
Kohavi, R. (1994a), Bottom-up induction of oblivious, read-once decision graphs, in “Proceedings of the European Conference on Machine Learning”. Available by anonymous ftp from starry.Stanford.EDU:pub/ronnyk/euroML94.ps.
Kohavi, R. (1994b), Bottom-up induction of oblivious, read-once decision graphs: strengths and limitations, in “Twelfth National Conference on Artificial Intelligence”, pp. 613–618. Available by anonymous ftp from Starry.Stanford.EDU:pub/ronnyk/aaai94.ps.
Kohavi, R. (1994c), Feature subset selection as search with probabilistic estimates, in “AAAI Fall Symposium on Relevance”, pp. 122–126. Available by anonymous ftp from: starry.Stanford.EDU:pub/ronnyk/aaaiSymposium94.ps.
Kohavi, R. & Frasca, B. (1994), Useful feature subsets and rough set reducts, in “Third International Workshop on Rough Sets and Soft Computing”, pp. 310–317. Available by anonymous ftp from: starry.Stanford.EDU:pub/ronnyk/rough.ps.
Langley, P. & Sage, S. (1994), Induction of selective Bayesian classifiers, in “Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence”, Morgan Kaufmann, Seattle, WA, pp. 399–406.
Maron, O. & Moore, A. W. (1994), Hoeffding races: Accelerating model selection search for classification and function approximation, in “Advances in Neural Information Processing Systems”, Vol. 6, Morgan Kaufmann.
Miller, A. J. (1990), Subset Selection in Regression, Chapman and Hall.
Modrzejewski, M. (1993), Feature selection using rough sets theory, in P. B. Brazdil, ed., “Proceedings of the European Conference on Machine Learning”, pp. 213–226.
Moore, A. W. & Lee, M. S. (1994), Efficient algorithms for minimizing cross validation error, in W. W. Cohen & H. Hirsh, eds, “Machine Learning: Proceedings of the Eleventh International Conference”, Morgan Kaufmann.
Moore, A. W., Hill, D. J. & Johnson, M. P. (1992), An empirical investigation of brute force to choose features, smoothers and function approximators, in “Computational Learning Theory and Natural Learning Systems Conference”.
Murphy, P. M. & Aha, D. W. (1994), UCI repository of machine learning databases. For information contact ml-repository@ics.uci.edu.
Nilsson, N. J. (1980), Principles of Artificial Intelligence, Morgan Kaufmann.
Pawlak, Z. (1987), “Decision tables — a rough sets approach”, Bulletin of the EATCS 33, 85–96.
Pawlak, Z. (1991), Rough Sets, Kluwer Academic Publishers.
Pawlak, Z., Wong, S. & Ziarko, W. (1988), “Rough sets: Probabilistic versus deterministic approach”, International Journal of Man-Machine Studies 29, 81–95.
Quinlan, J. R. (1986), “Induction of decision trees”, Machine Learning 1, 81–106. Reprinted in Shavlik and Dietterich (eds.) Readings in Machine Learning.
Quinlan, J. R. (1993), C4.5: Programs for Machine Learning, Morgan Kaufmann, Los Altos, California.
Reinwald, L. T. & Soland, R. M. (1966), “Conversion of limited-entry decision tables to optimal computer programs I: Minimum average processing time”, Journal of the ACM 13(3), 339–358.
Reinwald, L. T. & Soland, R. M. (1967), “Conversion of limited-entry decision tables to optimal computer programs II: Minimum storage requirement”, Journal of the ACM 14(4), 742–755.
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. (1986), Learning Internal Representations by Error Propagation, MIT Press, chapter 8.
Schaffer, C. (1994), A conservation law for generalization performance, in “Machine Learning: Proceedings of the Eleventh International Conference”, Morgan Kaufmann, pp. 259–265.
Schumacher, H. & Sevcik, K. C. (1976), “The synthetic approach to decision table conversion”, Communications of the ACM 19(6), 343–351.
Shao, J. (1993), “Linear model selection via cross-validation”, Journal of the American Statistical Association 88(422), 486–494.
Slowinski, R. (1992), Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, Kluwer Academic Publishers.
Stone, M. (1974), “Cross-validatory choice and assessment of statistical predictions”, Journal of the Royal Statistical Society B 36, 111–147.
Taylor, C., Michie, D. & Spiegelhalter, D. (1994), Machine Learning, Neural and Statistical Classification, Paramount Publishing International.
Thrun, S. et al. (1991), The MONK's problems: A performance comparison of different learning algorithms, Technical Report CMU-CS-91-197, Carnegie Mellon University.
Utgoff, P. E. (1994), An improved algorithm for incremental induction of decision trees, in “Machine Learning: Proceedings of the Eleventh International Conference”, Morgan Kaufmann, pp. 318–325.
Weiss, S. M. (1991), “Small sample error rate estimation for k-nearest neighbor classifiers”, IEEE Transactions on Pattern Analysis and Machine Intelligence 13(3), 285–289.
Weiss, S. M. & Kulikowski, C. A. (1991), Computer Systems that Learn, Morgan Kaufmann, San Mateo, CA.
Wolpert, D. H. (1994), The relationship between PAC, the statistical physics framework, the Bayesian framework, and the VC framework, Technical report, The Santa Fe Institute, Santa Fe, NM.
Zhang, P. (1992), “On the distributional properties of model selection criteria”, Journal of the American Statistical Association 87(419), 732–737.
Ziarko, W. (1991), The discovery, analysis, and representation of data dependencies in databases, in G. Piatetsky-Shapiro & W. Frawley, eds, “Knowledge Discovery in Databases”, MIT Press.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kohavi, R. (1995). The power of decision tables. In: Lavrac, N., Wrobel, S. (eds) Machine Learning: ECML-95. ECML 1995. Lecture Notes in Computer Science, vol 912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-59286-5_57
DOI: https://doi.org/10.1007/3-540-59286-5_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-59286-0
Online ISBN: 978-3-540-49232-0