Why Is Rule Learning Optimistic and How to Correct It

  • Martin Možina
  • Janez Demšar
  • Jure Žabkar
  • Ivan Bratko
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)


While searching a huge space of possible hypotheses, rule induction algorithms compare the quality estimates of a large number of rules to find the one that appears best. This mechanism can easily pick up random patterns in the data, and the selected rules will have optimistically high quality estimates even when the estimation method itself (such as relative frequency) is unbiased. It is generally believed that this problem, which eventually leads to overfitting, can be alleviated by using the m-estimate of probability. We show that the m-estimate mends the problem only partially, and propose a novel solution that makes the common rule evaluation functions account for the multiple comparisons made during the search. Experiments on artificial data sets and on data sets from the UCI repository show a large improvement in the accuracy of probability predictions and a decent gain in the AUC of the constructed models.
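The selection effect described in the abstract can be illustrated with a small simulation. The sketch below is not the correction method proposed in the paper. It only shows, under simplifying assumptions, why the best of many rules looks better than it is. The assumptions are that every candidate rule covers the same number of examples, that the class is independent of every rule (so each rule's true accuracy equals the prior), and that the m-estimate takes the usual form (s + m·p_a)/(n + m). The function name and all parameters are hypothetical.

```python
import random


def best_rule_estimates(n_rules=200, coverage=20, p_true=0.5, m=2.0, seed=0):
    """Simulate picking the best of many purely random rules.

    Each rule covers `coverage` examples whose class is positive with
    probability `p_true`, independently of the rule. Returns the relative
    frequency and the m-estimate of the rule with the most positives.
    """
    rng = random.Random(seed)
    best_pos = 0
    for _ in range(n_rules):
        # Number of positive examples covered by one random rule.
        pos = sum(rng.random() < p_true for _ in range(coverage))
        best_pos = max(best_pos, pos)

    # Relative frequency is unbiased for a single rule, but not for the
    # maximum taken over many rules.
    rel_freq = best_pos / coverage
    # m-estimate: shrinks the relative frequency towards the prior p_true.
    m_est = (best_pos + m * p_true) / (coverage + m)
    return rel_freq, m_est


if __name__ == "__main__":
    rel_freq, m_est = best_rule_estimates()
    print(f"true accuracy of every rule: 0.5")
    print(f"relative frequency of best rule: {rel_freq:.3f}")
    print(f"m-estimate of best rule:         {m_est:.3f}")
```

In this setting the m-estimate of the selected rule lies between the prior and the relative frequency, so it reduces the optimism without removing it, which is consistent with the abstract's claim that the m-estimate is only a partial remedy.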





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Martin Možina (1)
  • Janez Demšar (1)
  • Jure Žabkar (1)
  • Ivan Bratko (1)
  1. Faculty of Computer and Information Science, University of Ljubljana, Ljubljana
