The power of decision tables
 Ron Kohavi
Abstract
We evaluate the power of decision tables as a hypothesis space for supervised learning algorithms. Decision tables are one of the simplest hypothesis spaces possible, and they are usually easy to understand. Experimental results show that on artificial and real-world domains containing only discrete features, IDTM, an algorithm inducing decision tables, can sometimes outperform state-of-the-art algorithms such as C4.5. Surprisingly, performance is quite good on some datasets with continuous features, indicating that many datasets used in machine learning either do not require these features, or that these features have few values. We also describe an incremental method for performing cross-validation that is applicable to incremental learning algorithms, including IDTM. Using incremental cross-validation, it is possible to cross-validate a given dataset and IDTM in time that is linear in the number of instances, the number of features, and the number of label values. The time for incremental cross-validation is independent of the number of folds chosen, hence leave-one-out cross-validation and ten-fold cross-validation take the same time.
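The incremental cross-validation idea in the abstract can be sketched as follows: if a learner supports constant-time insertion and deletion of a single instance, leave-one-out cross-validation needs only one pass over the data — delete an instance, classify it, re-insert it. Below is a minimal Python sketch under that assumption; the class and function names, and the majority-label fallback for empty table cells, are illustrative choices, not the actual IDTM algorithm.

```python
from collections import defaultdict, Counter

class DecisionTable:
    """Minimal decision-table classifier: each distinct feature tuple is a
    table cell, and a cell predicts the majority label of its instances."""
    def __init__(self):
        self.cells = defaultdict(Counter)   # feature tuple -> label counts
        self.default = Counter()            # overall label counts (fallback)

    def add(self, x, y):                    # O(1) insertion of one instance
        self.cells[tuple(x)][y] += 1
        self.default[y] += 1

    def remove(self, x, y):                 # O(1) deletion of one instance
        self.cells[tuple(x)][y] -= 1
        self.default[y] -= 1

    def predict(self, x):
        cell = self.cells.get(tuple(x))
        # An unseen or emptied cell falls back to the global majority label.
        counts = cell if cell and sum(cell.values()) > 0 else self.default
        return counts.most_common(1)[0][0]

def incremental_loo_cv(data):
    """Leave-one-out CV in one pass: train on all instances once, then for
    each instance delete it, classify it, and re-insert it."""
    table = DecisionTable()
    for x, y in data:
        table.add(x, y)
    correct = 0
    for x, y in data:
        table.remove(x, y)
        correct += table.predict(x) == y
        table.add(x, y)
    return correct / len(data)
```

Because each delete/classify/re-insert step costs the same regardless of how the data is partitioned, the same pass could serve any fold count — which is the sense in which the running time is independent of the number of folds.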
 Book Title
 Machine Learning: ECML95
 Book Subtitle
 8th European Conference on Machine Learning, Heraclion, Crete, Greece, April 25–27, 1995, Proceedings
 Pages
 pp. 174–189
 Copyright
 1995
 DOI
 10.1007/3-540-59286-5_57
 Print ISBN
 978-3-540-59286-0
 Online ISBN
 978-3-540-49232-0
 Series Title
 Lecture Notes in Computer Science
 Series Volume
 912
 Series Subtitle
 Lecture Notes in Artificial Intelligence
 Series ISSN
 0302-9743
 Publisher
 Springer Berlin Heidelberg
 Copyright Holder
 Springer-Verlag
 Authors

 Ron Kohavi ^{(1)}
 Author Affiliations

 1. Computer Science Department, Stanford University, Stanford, CA 94305