Machine Learning, Volume 11, Issue 1, pp. 63–90

Very Simple Classification Rules Perform Well on Most Commonly Used Datasets

  • Robert C. Holte

Abstract

This article reports an empirical investigation of the accuracy of rules that classify examples on the basis of a single attribute. On most datasets studied, the best of these very simple rules is as accurate as the rules induced by the majority of machine learning systems. The article explores the implications of this finding for machine learning research and applications.

Keywords: empirical learning, accuracy–complexity tradeoff, pruning, ID3
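The single-attribute rules studied here are of the kind often called "1R" rules: for each attribute, map every observed value of that attribute to the majority class among training examples having that value, then keep the attribute whose rule makes the fewest training errors. The Python sketch below illustrates that idea under simplifying assumptions (discrete attribute values only, no handling of ties, missing values, or continuous attributes; the function name and toy data are invented for illustration):

```python
from collections import Counter, defaultdict

def one_rule(examples, labels):
    """Fit a 1R-style rule: pick the single attribute whose
    value -> majority-class mapping misclassifies the fewest
    training examples."""
    n_attributes = len(examples[0])
    best = None  # (training errors, attribute index, rule dict)
    for a in range(n_attributes):
        # Class counts for each observed value of attribute a.
        counts = defaultdict(Counter)
        for x, y in zip(examples, labels):
            counts[x[a]][y] += 1
        # The rule predicts the majority class for each value.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(rule[x[a]] != y for x, y in zip(examples, labels))
        if best is None or errors < best[0]:
            best = (errors, a, rule)
    return best

# Toy data: attribute 0 predicts the class perfectly, attribute 1 does not.
X = [("sunny", "hot"), ("sunny", "cool"), ("rain", "hot"), ("rain", "cool")]
y = ["no", "no", "yes", "yes"]
print(one_rule(X, y))  # (0, 0, {'sunny': 'no', 'rain': 'yes'})
```

The full method in the article additionally discretizes continuous attributes and treats "missing" as an attribute value; the sketch omits both.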

Copyright information

© Kluwer Academic Publishers 1993

Authors and Affiliations

  • Robert C. Holte
    Computer Science Department, University of Ottawa, Ottawa, Canada
