Machine Learning

, Volume 82, Issue 3, pp 475–481 | Cite as

An experimental test of Occam’s razor in classification

Technical Note

Abstract

A widely persisting interpretation of Occam’s razor is that given two classifiers with the same training error, the simpler classifier is more likely to generalize better. Within a long-lasting debate in the machine learning community over Occam’s razor, Domingos (Data Min. Knowl. Discov. 3:409–425, 1999) rejects this interpretation and proposes that model complexity is only a confounding factor usually correlated with the number of models from which the learner selects. It is thus hypothesized that the risk of overfitting (poor generalization) follows only from the number of model tests rather than the complexity of the selected model. We test this hypothesis on 30 UCI data sets using polynomial classification models. The results confirm Domingos’ hypothesis on the 0.05 significance level and thus refutes the above interpretation of Occam’s razor. Our experiments however also illustrate that decoupling the two factors (model complexity and number of model tests) is problematic.

Keywords

Model complexity Generalization Empirical evaluation 

References

  1. Domingos, P. (1999). The role of Occam’s razor in knowledge discovery. Data Mining and Knowledge Discovery, 3, 409–425. CrossRefGoogle Scholar
  2. Esmeir, S., & Markovitch, S. (2007). Occam’s razor just got sharper. In M. Veloso (Ed.), IJCAI’07: proceedings of the 20th international joint conference on artificial intelligence (pp. 768–773). San Mateo: Morgan Kaufmann. Google Scholar
  3. Gamberger, D., & Lavrač, N. (1997). Conditions for Occam’s razor applicability and noise elimination. In M. van Someren & G. Widmer (Eds.), ECML’97: proceedings of the 9th European conference on machine learning (pp. 108–123). Berlin: Springer. Google Scholar
  4. Grünwald, P. (2001). Occam, Bayes, MDL and the real world (presentation slides). In NIPS 2001 workshop on Occam’s razor. Google Scholar
  5. Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. Berlin: Springer. MATHGoogle Scholar
  6. Jensen, D., & Cohen, P. R. (2000). Multiple comparisons in induction algorithms. Machine Learning, 38, 308–338. CrossRefGoogle Scholar
  7. Murphy, P. M., & Pazzani, M. J. (1994). Exploring the decision forest: an empirical investigation of Occam’s razor in decision tree induction. Journal of Artificial Intelligence Research, 1(1), 257–275. MATHGoogle Scholar
  8. Needham, S. L., & Dowe, D. L. (2001). Message length as an effective Ockham’s razor in decision tree induction. In T. Richardson & T. Jaakkola (Eds.), Proceedings of the 8th international workshop on artificial intelligence and statistics (pp. 253–260). Google Scholar
  9. Nolan, D. (1997). Quantitative parsimony. The British Journal for the Philosophy of Science, 48(2), 329–343. CrossRefGoogle Scholar
  10. Paes, A., Železný, F., Zaverucha, G., Page, D., & Srinivasan, A. (2007). ILP through propositionalization and stochastic k-term DNF learning. In S. Muggleton, R. Otero, & A. Tamaddoni-Nezhad (Eds.), ILP’06: proceedings of the 16th international conference on inductive logic programming (pp. 379–393). Google Scholar
  11. Piatetsky-Shapiro, G. (1996). Editorial comments. KDD Nuggets, 96, 28. Google Scholar
  12. Quinlan, J. R., & Cameron-Jones, R. (1995). Oversearching and layered search in empirical learning. In L. P. Kaelbling & A. Saffiotti (Eds.), IJCAI’95: proceedings of the 14th international joint conference on artificial intelligence (pp. 1019–1024). San Mateo: Morgan Kaufmann. Google Scholar
  13. Rückert, U., & Kramer, S. (2003). Stochastic local search in k-term DNF learning. In T. Fawcett & N. Mishra (Eds.), ICML 2003: proceedings of the 20th international conference on machine learning (pp. 648–655). New York: AAAI Press. Google Scholar
  14. Železný, F., Srinivasan, A., & Page, D. (2006). Randomised restarted search in ILP. Machine Learning, 64(1–3), 183–208. MATHCrossRefGoogle Scholar
  15. Webb, G. I. (1996). Further experimental evidence against the utility of Occam’s razor. Journal of Artificial Intelligence Research, 4, 397–417. MATHGoogle Scholar

Copyright information

© The Author(s) 2010

Authors and Affiliations

  1. 1.Faculty of Electrical EngineeringCzech Technical University in PraguePrahaCzech Republic

Personalised recommendations