Shorter Rules Are Better, Aren’t They?

  • Julius Stecher
  • Frederik Janssen
  • Johannes FürnkranzEmail author
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9956)


It is conventional wisdom in inductive rule learning that shorter rules should be preferred over longer rules, a principle also known as Occam’s Razor. This is typically justified with the fact that longer rules tend to be more specific and are therefore also more likely to overfit the data. In this position paper, we would like to challenge this assumption by demonstrating that variants of conventional rule learning heuristics, so-called inverted heuristics, learn longer rules that are not more specific than the shorter rules learned by conventional heuristics. Moreover, we will argue with some examples that such longer rules may in many cases be more understandable than shorter rules, again contradicting a widely held view. This is not only relevant for subgroup discovery but also for related concepts like characteristic rules, formal concept analysis, or closed itemsets.


Formal Concept Analysis Subgroup Discovery Selection Heuristic Medical Dataset Brain Stroke 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



We would like to thank Dragan Gamberger, Nada Lavrač, and Heiko Paulheim for letting us play with their data.


  1. 1.
    Bensusan, H.: God doesn’t always shave with Occam’s Razor - learning when and how to prune. In: Nédellec, C., Rouveirol, C. (eds.) Proceedings of the 10th European Conference on Machine Learning (ECML 1998), pp. 119–124 (1998)Google Scholar
  2. 2.
    Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Occam’s Razor. Inf. Process. Lett. 24, 377–380 (1987)MathSciNetCrossRefzbMATHGoogle Scholar
  3. 3.
    Domingos, P.: The role of Occam’s Razor in knowledge discovery. Data Min. Knowl. Discovery 3(4), 409–425 (1999)CrossRefGoogle Scholar
  4. 4.
    Fürnkranz, J.: Separate-and-conquer rule learning. Artif. Intell. Rev. 13(1), 3–54 (1999)CrossRefzbMATHGoogle Scholar
  5. 5.
    Fürnkranz, J., Flach, P.A.: ROC ’n’ rule learning - towards a better understanding of covering algorithms. Mach. Learn. 58(1), 39–77 (2005)CrossRefzbMATHGoogle Scholar
  6. 6.
    Fürnkranz, J., Gamberger, D., Lavrač, N.: Foundations of Rule Learning. Springer, Heidelberg (2012)CrossRefzbMATHGoogle Scholar
  7. 7.
    Gamberger, D., Lavrač, N.: Active subgroup mining: a case study in coronary heart disease risk group detection. Artif. Intell. Med. 28(1), 27–57 (2003)CrossRefGoogle Scholar
  8. 8.
    Ganter, B., Wille, R.: Formal Concept Analysis - Mathematical Foundations. Springer, Heidelberg (1999)CrossRefzbMATHGoogle Scholar
  9. 9.
    Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)CrossRefGoogle Scholar
  10. 10.
    Janssen, F., Fürnkranz, J.: On the quest for optimal rule learning heuristics. Mach. Learn. 78(3), 343–379 (2010)MathSciNetCrossRefGoogle Scholar
  11. 11.
    Kralj, P., Lavrač, N., Gamberger, D., Krstačić, A.: Contrast set mining through subgroup discovery applied to brain ischaemina data. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2016. LNCS (LNAI), vol. 4426, pp. 579–586. Springer, Heidelberg (2007). doi: 10.1007/978-3-540-71701-0_61 CrossRefGoogle Scholar
  12. 12.
    Kralj Novak, P., Lavrač, N., Webb, G.I.: Supervised descriptive rule discovery: a unifying survey of contrast set, emerging pattern and subgroup mining. J. Mach. Learn. Res. 10, 377–403 (2009)zbMATHGoogle Scholar
  13. 13.
    Lavrač, N., Kavšek, B., Flach, P., Todorovski, L.: Subgroup discovery with CN2-SD. J. Mach. Learn. Res. 5, 153–188 (2004)MathSciNetGoogle Scholar
  14. 14.
    Michalski, R.S.: On the quasi-minimal solution of the general covering problem. In: Proceedings of the 5th International Symposium on Information Processing (FCIP 1969), pp. 125–128, Bled, Yugoslavia (1969)Google Scholar
  15. 15.
    Michalski, R.S.: A theory and methodology of inductive learning. Artif. Intell. 20(2), 111–162 (1983)MathSciNetCrossRefGoogle Scholar
  16. 16.
    Mitchell, T.M.: The Need for Biases in Learning Generalizations. Technical report, Computer Science Department, Rutgers University, New Brunswick, MA (1980)Google Scholar
  17. 17.
    Murphy, P.M., Pazzani, M.J.: Exploring the decision forest: an empirical investigation of Occam’s Razor in decision tree induction. J. Artif. Intell. Res. 1, 257–275 (1994)zbMATHGoogle Scholar
  18. 18.
    Paulheim, H., Fürnkranz, J.: Unsupervised generation of data mining features from linked open data. In: Proceedings of the International Conference on Web Intelligence and Semantics (WIMS 2012) (2012)Google Scholar
  19. 19.
    Ristoski, P., Paulheim, H.: Analyzing statistics with background knowledge from linked open data. In: Proceedings of the 1st International Workshop on Semantic Statistics (SemStats-2013). CEUR workshop proceedings, Sydney, Australia (2013)Google Scholar
  20. 20.
    Stecher, J., Janssen, F., Fürnkranz, J.: Separating rule refinement and rule selection heuristics in inductive rule learning. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) ECML PKDD 2014. LNCS (LNAI), vol. 8726, pp. 114–129. Springer, Heidelberg (2014). doi: 10.1007/978-3-662-44845-8_8 Google Scholar
  21. 21.
    Stumme, G., Taouil, R., Bastide, Y., Pasquier, N., Lakhal, L.: Computing iceberg concept lattices with Titanic. Data Knowl. Eng. 42(2), 189–222 (2002)CrossRefzbMATHGoogle Scholar
  22. 22.
    Webb, G.I.: Further experimental evidence against the utility of Occam’s Razor. J. Artif. Intell. Res. 4, 397–417 (1996)zbMATHGoogle Scholar
  23. 23.
    Wille, R.: Restructuring lattice theory: an approach based on hierarchies of concepts. In: Rival, I. (ed.) Ordered Sets, pp. 445–470. Reidel, Dordrecht-Boston (1982)CrossRefGoogle Scholar
  24. 24.
    Zaki, M.J., Hsiao, C.J.: CHARM: an efficient algorithm for closed itemset mining. In: Grossman, R.L., Han, J., Kumar, V., Mannila, H., Motwani, R. (eds.) Proceedings of the 2nd SIAM International Conference on Data Mining (SDM-02), pp. 457–473. Arlington, VA (2002)Google Scholar

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Julius Stecher
    • 1
  • Frederik Janssen
    • 1
  • Johannes Fürnkranz
    • 1
    Email author
  1. 1.Technische Universität Darmstadt, Knowledge EngineeringDarmstadtGermany

Personalised recommendations