Why Is Rule Learning Optimistic and How to Correct It

  • Martin Možina
  • Janez Demšar
  • Jure Žabkar
  • Ivan Bratko
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)


In their search through a huge space of possible hypotheses, rule induction algorithms compare the estimated qualities of a large number of rules to find the one that appears to be best. This mechanism can easily find random patterns in the data whose quality estimates are optimistically high, even when the estimation method itself (such as relative frequency) is unbiased. It is generally believed that this problem, which eventually leads to overfitting, can be alleviated by using the m-estimate of probability. We show that this only partially mends the problem, and we propose a novel way to make common rule evaluation functions account for the multiple comparisons made during the search. Experiments on artificial data sets and on data sets from the UCI repository show a large improvement in the accuracy of probability predictions and a decent gain in the AUC of the constructed models.
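The optimism described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the correction method the paper proposes: every "rule" is random and covers the same number of examples, the true class probability is the same for all rules, and the names `best_rule_estimates`, `n_rules`, `coverage`, `true_p` and `m` are hypothetical. The m-estimate is taken in its usual form, (s + m·p_a) / (n + m), with the prior p_a set to the true probability.

```python
import random


def best_rule_estimates(n_rules=1000, coverage=20, true_p=0.5, m=2.0, seed=0):
    """Return the best relative frequency and the best m-estimate found
    among n_rules purely random rules (illustrative assumption: no rule
    carries any real information about the class)."""
    rng = random.Random(seed)
    best_freq = 0.0
    best_m = 0.0
    for _ in range(n_rules):
        # Each covered example is positive with probability true_p,
        # independently of the rule, so every rule's true quality is true_p.
        positives = sum(rng.random() < true_p for _ in range(coverage))
        rel_freq = positives / coverage                    # unbiased for a single rule
        m_est = (positives + m * true_p) / (coverage + m)  # shrunk toward the prior
        best_freq = max(best_freq, rel_freq)
        best_m = max(best_m, m_est)
    return best_freq, best_m


best_freq, best_m = best_rule_estimates()
print(f"true probability:         {0.5:.3f}")
print(f"best relative frequency:  {best_freq:.3f}")
print(f"best m-estimate (m=2):    {best_m:.3f}")
```

Each individual estimate is unbiased, yet the maximum over many rules lies well above the true probability. The m-estimate pulls the winning value toward the prior but still leaves it above the true probability, which matches the abstract's claim that it mends the problem only partially.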


Keywords: Class Probability; True Probability; Inductive Logic Programming; Extreme Value Distribution; Rule Quality



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Martin Možina (1)
  • Janez Demšar (1)
  • Jure Žabkar (1)
  • Ivan Bratko (1)

  1. Faculty of Computer and Information Science, University of Ljubljana, Ljubljana
