Abstract
We evaluate the power of decision tables as a hypothesis space for supervised learning algorithms. Decision tables are one of the simplest hypothesis spaces possible, and they are usually easy to understand. Experimental results show that on artificial and real-world domains containing only discrete features, IDTM, an algorithm inducing decision tables, can sometimes outperform state-of-the-art algorithms such as C4.5. Surprisingly, performance is quite good on some datasets with continuous features, indicating that many datasets used in machine learning either do not require these features or contain features with only a few distinct values. We also describe an incremental method for performing cross-validation that is applicable to incremental learning algorithms, including IDTM. Using incremental cross-validation, it is possible to cross-validate a given dataset and IDTM in time that is linear in the number of instances, the number of features, and the number of label values. The time for incremental cross-validation is independent of the number of folds chosen; hence leave-one-out cross-validation and ten-fold cross-validation take the same time.
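To make the abstract's claims concrete, the following Python sketch shows a decision-table classifier with a majority rule together with incremental leave-one-out cross-validation. It is a minimal illustration under stated assumptions, not the paper's IDTM: the feature-subset search that IDTM performs is omitted, the table schema (the set of selected features) is assumed to be given, and the names DecisionTableMajority and incremental_loo_error are hypothetical. The point it illustrates is that deleting and re-inserting an instance touches only a single table entry, so the total cost of cross-validation is linear in the data and independent of the number of folds.

from collections import defaultdict, Counter

class DecisionTableMajority:
    """Decision table with a majority rule: instances are projected onto a
    fixed feature subset (the schema); an unseen projection falls back to
    the overall majority label."""

    def __init__(self, schema):
        self.schema = schema                  # indices of the selected features
        self.table = defaultdict(Counter)     # projected instance -> label counts
        self.overall = Counter()              # label counts over the whole table

    def _key(self, x):
        return tuple(x[i] for i in self.schema)

    def add(self, x, y):                      # O(|schema|) insertion
        self.table[self._key(x)][y] += 1
        self.overall[y] += 1

    def remove(self, x, y):                   # O(|schema|) deletion
        self.table[self._key(x)][y] -= 1
        self.overall[y] -= 1

    def predict(self, x):
        counts = self.table.get(self._key(x))
        if counts and sum(counts.values()) > 0:
            return counts.most_common(1)[0][0]
        return self.overall.most_common(1)[0][0]   # majority fallback

def incremental_loo_error(data, schema):
    """Leave-one-out error by incremental cross-validation: delete an
    instance, classify it, re-insert it."""
    dtm = DecisionTableMajority(schema)
    for x, y in data:
        dtm.add(x, y)
    errors = 0
    for x, y in data:
        dtm.remove(x, y)
        errors += int(dtm.predict(x) != y)
        dtm.add(x, y)
    return errors / len(data)

The same loop structure handles k-fold cross-validation by removing and re-inserting one fold at a time; every instance is still removed and re-inserted exactly once, so the total work does not change with the number of folds.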
References
Aha, D. W. & Bankert, R. L. (1994), A comparative evaluation of sequential feature selection algorithms, in “Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics”, pp. 1–7.
Almuallim, H. & Dietterich, T. G. (1991), Learning with many irrelevant features, in “Ninth National Conference on Artificial Intelligence”, MIT Press, pp. 547–552.
Almuallim, H. & Dietterich, T. G. (1992), On learning more concepts, in “Proceedings of the Ninth International Conference on Machine Learning”, Morgan Kaufmann, pp. 11–19.
Boyce, D., Farhi, A. & Weischedel, R. (1974), Optimal Subset Selection, Springer-Verlag.
Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. (1984), Classification and Regression Trees, Wadsworth International Group.
Caruana, R. & Freitag, D. (1994), Greedy attribute selection, in W. W. Cohen & H. Hirsh, eds, “Machine Learning: Proceedings of the Eleventh International Conference”, Morgan Kaufmann.
Clark, P. & Niblett, T. (1989), “The CN2 induction algorithm”, Machine Learning 3(4), 261–283.
Cormen, T. H., Leiserson, C. E. & Rivest, R. L. (1990), Introduction to Algorithms, McGraw-Hill.
Devijver, P. A. & Kittler, J. (1982), Pattern Recognition: A Statistical Approach, Prentice-Hall International.
Efron, B. (1983), “Estimating the error rate of a prediction rule: improvement on cross-validation”, Journal of the American Statistical Association 78(382), 316–330.
Garey, M. R. (1972), “Optimal binary identification procedures”, SIAM Journal on Applied Mathematics 23, 173–186.
Ginsberg, M. L. (1993), Essentials of Artificial Intelligence, Morgan Kaufmann.
Hartmann, C. R. P., Varshney, P. K., Mehrotra, K. G. & Gerberich, C. L. (1982), “Application of information theory to the construction of efficient decision trees”, IEEE Transactions on Information Theory IT-28(4), 565–577.
Holte, R. C. (1993), “Very simple classification rules perform well on most commonly used datasets”, Machine Learning 11, 63–90.
Hyafil, L. & Rivest, R. L. (1976), “Constructing optimal binary decision trees is NP-complete”, Information Processing Letters 5(1), 15–17.
John, G. H. (1994), Cross-validated C4.5: Using error estimation for automatic parameter selection, Technical Report STAN-CS-TN-94-12, Computer Science Department, Stanford University.
John, G., Kohavi, R. & Pfleger, K. (1994), Irrelevant features and the subset selection problem, in “Machine Learning: Proceedings of the Eleventh International Conference”, Morgan Kaufmann, pp. 121–129. Available by anonymous ftp from: starry.Stanford.EDU:pub/ronnyk/ml94.ps.
Kohavi, R. (1994a), Bottom-up induction of oblivious, read-once decision graphs, in “Proceedings of the European Conference on Machine Learning”. Available by anonymous ftp from starry.Stanford.EDU:pub/ronnyk/euroML94.ps.
Kohavi, R. (1994b), Bottom-up induction of oblivious, read-once decision graphs: strengths and limitations, in “Twelfth National Conference on Artificial Intelligence”, pp. 613–618. Available by anonymous ftp from Starry.Stanford.EDU:pub/ronnyk/aaai94.ps.
Kohavi, R. (1994c), Feature subset selection as search with probabilistic estimates, in “AAAI Fall Symposium on Relevance”, pp. 122–126. Available by anonymous ftp from: starry.Stanford.EDU:pub/ronnyk/aaaiSymposium94.ps.
Kohavi, R. & Frasca, B. (1994), Useful feature subsets and rough set reducts, in “Third International Workshop on Rough Sets and Soft Computing”, pp. 310–317. Available by anonymous ftp from: starry.Stanford.EDU:pub/ronnyk/rough.ps.
Langley, P. & Sage, S. (1994), Induction of selective Bayesian classifiers, in “Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence”, Morgan Kaufmann, Seattle, WA, pp. 399–406.
Maron, O. & Moore, A. W. (1994), Hoeffding races: Accelerating model selection search for classification and function approximation, in “Advances in Neural Information Processing Systems”, Vol. 6, Morgan Kaufmann.
Miller, A. J. (1990), Subset Selection in Regression, Chapman and Hall.
Modrzejewski, M. (1993), Feature selection using rough sets theory, in P. B. Brazdil, ed., “Proceedings of the European Conference on Machine Learning”, pp. 213–226.
Moore, A. W. & Lee, M. S. (1994), Efficient algorithms for minimizing cross validation error, in W. W. Cohen & H. Hirsh, eds, “Machine Learning: Proceedings of the Eleventh International Conference”, Morgan Kaufmann.
Moore, A. W., Hill, D. J. & Johnson, M. P. (1992), An empirical investigation of brute force to choose features, smoothers and function approximators, in “Computational Learning Theory and Natural Learning Systems Conference”.
Murphy, P. M. & Aha, D. W. (1994), UCI repository of machine learning databases. For information contact ml-repository@ics.uci.edu.
Nilsson, N. J. (1980), Principles of Artificial Intelligence, Morgan Kaufmann.
Pawlak, Z. (1987), “Decision tables — a rough sets approach”, Bulletin of the EATCS 33, 85–96.
Pawlak, Z. (1991), Rough Sets, Kluwer Academic Publishers.
Pawlak, Z., Wong, S. & Ziarko, W. (1988), “Rough sets: Probabilistic versus deterministic approach”, International Journal of Man-Machine Studies 29, 81–95.
Quinlan, J. R. (1986), “Induction of decision trees”, Machine Learning 1, 81–106. Reprinted in Shavlik and Dietterich (eds.) Readings in Machine Learning.
Quinlan, J. R. (1993), C4.5: Programs for Machine Learning, Morgan Kaufmann, Los Altos, California.
Reinwald, L. T. & Soland, R. M. (1966), “Conversion of limited-entry decision tables to optimal computer programs I: Minimum average processing time”, Journal of the ACM 13(3), 339–358.
Reinwald, L. T. & Soland, R. M. (1967), “Conversion of limited-entry decision tables to optimal computer programs II: Minimum storage requirement”, Journal of the ACM 14(4), 742–755.
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. (1986), Learning Internal Representations by Error Propagation, MIT Press, chapter 8.
Schaffer, C. (1994), A conservation law for generalization performance, in “Machine Learning: Proceedings of the Eleventh International Conference”, Morgan Kaufmann, pp. 259–265.
Schumacher, H. & Sevcik, K. C. (1976), “The synthetic approach to decision table conversion”, Communications of the ACM 19(6), 343–351.
Shao, J. (1993), “Linear model selection via cross-validation”, Journal of the American Statistical Association 88(422), 486–494.
Slowinski, R. (1992), Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, Kluwer Academic Publishers.
Stone, M. (1974), “Cross-validatory choice and assessment of statistical predictions”, Journal of the Royal Statistical Society B 36, 111–147.
Taylor, C., Michie, D. & Spiegelhalter, D. (1994), Machine Learning, Neural and Statistical Classification, Paramount Publishing International.
Thrun, S. et al. (1991), The MONK's problems: A performance comparison of different learning algorithms, Technical Report CMU-CS-91-197, Carnegie Mellon University.
Utgoff, P. E. (1994), An improved algorithm for incremental induction of decision trees, in “Machine Learning: Proceedings of the Eleventh International Conference”, Morgan Kaufmann, pp. 318–325.
Weiss, S. M. (1991), “Small sample error rate estimation for k-nearest neighbor classifiers”, IEEE Transactions on Pattern Analysis and Machine Intelligence 13(3), 285–289.
Weiss, S. M. & Kulikowski, C. A. (1991), Computer Systems that Learn, Morgan Kaufmann, San Mateo, CA.
Wolpert, D. H. (1994), The relationship between PAC, the statistical physics framework, the Bayesian framework, and the VC framework, Technical report, The Santa Fe Institute, Santa Fe, NM.
Zhang, P. (1992), “On the distributional properties of model selection criteria”, Journal of the American Statistical Association 87(419), 732–737.
Ziarko, W. (1991), The discovery, analysis, and representation of data dependencies in databases, in G. Piatetsky-Shapiro & W. Frawley, eds, “Knowledge Discovery in Databases”, MIT Press.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kohavi, R. (1995). The power of decision tables. In: Lavrac, N., Wrobel, S. (eds) Machine Learning: ECML-95. ECML 1995. Lecture Notes in Computer Science, vol 912. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-59286-5_57
DOI: https://doi.org/10.1007/3-540-59286-5_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-59286-0
Online ISBN: 978-3-540-49232-0