Very Simple Classification Rules Perform Well on Most Commonly Used Datasets

Abstract

This article reports an empirical investigation of the accuracy of rules that classify examples on the basis of a single attribute. On most datasets studied, the best of these very simple rules is as accurate as the rules induced by the majority of machine learning systems. The article explores the implications of this finding for machine learning research and applications.
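For concreteness, the kind of rule studied tests one attribute and maps each of its values to the most frequent class observed for that value in the training data; the attribute whose rule misclassifies the fewest training examples is kept. The sketch below is an illustrative reconstruction of that idea (the paper's learner is called 1R), not the paper's implementation; it assumes discrete attribute values, and all names in it are hypothetical.

```python
from collections import Counter, defaultdict

def one_rule(examples, labels, n_attributes):
    """Learn a one-attribute ("1R"-style) rule: for each attribute,
    map each observed value to its majority class, then keep the
    attribute whose rule makes the fewest errors on the training set.
    Returns (attribute_index, {value: predicted_class})."""
    best = None  # (error_count, attribute_index, value -> class map)
    for a in range(n_attributes):
        # Class frequency counts for each value of attribute a.
        counts = defaultdict(Counter)
        for x, y in zip(examples, labels):
            counts[x[a]][y] += 1
        # Majority class per value; errors are the non-majority examples.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[0]:
            best = (errors, a, rule)
    _, attr, rule = best
    return attr, rule

# Toy usage: two discrete attributes, binary class.
X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot")]
y = ["no", "no", "yes", "yes"]
attr, rule = one_rule(X, y, n_attributes=2)
print(attr, rule)  # attribute 0 alone separates the classes: 0 errors
```

In practice, continuous attributes would first be discretized into intervals before a rule of this form could be learned; the point of the sketch is only to show how little machinery a single-attribute classifier requires.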



Cite this article

Holte, R.C. Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning 11, 63–90 (1993). https://doi.org/10.1023/A:1022631118932


Keywords

  • empirical learning
  • accuracy–complexity tradeoff
  • pruning
  • ID3