Why Is Rule Learning Optimistic and How to Correct It

Možina, Martin; Demšar, Janez; Žabkar, Jure; Bratko, Ivan

doi:10.1007/11871842_33

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4212)

Abstract

In their search through a huge space of possible hypotheses, rule induction algorithms compare quality estimates of a large number of rules to find the one that appears to be best. This mechanism can easily find random patterns in the data that have optimistically high quality estimates, even when the estimation method itself is unbiased (as relative frequency is). It is generally believed that this problem, which eventually leads to overfitting, can be alleviated by using the m-estimate of probability. We show that this only partially mends the problem, and we propose a novel method that makes the common rule evaluation functions account for multiple comparisons in the search. Experiments on artificial data sets and data sets from the UCI repository show a large improvement in the accuracy of probability predictions and also a decent gain in the AUC of the constructed models.
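The optimism described above can be reproduced with a small simulation. This is a minimal sketch, not the authors' correction method. It assumes rules with no real predictive power: labels are drawn at random with a fixed class prior, and every rule covers the same number of examples. The function names and parameter values (for example m = 2 and 1000 rules) are hypothetical choices made for illustration. The sketch uses two estimators. The first is relative frequency, s/n. The second is Cestnik's m-estimate, (s + m·p_a)/(n + m). Here s is the number of positive examples among the n examples a rule covers, and p_a is the prior probability of the positive class. Every rule's true accuracy equals the prior. Even so, the best rule's relative frequency comes out well above it, and the m-estimate pulls it back only part of the way.

```python
import random

# Hypothetical helper names chosen for this illustration; this is NOT the
# paper's correction method, only a demonstration of the optimism it targets.

def relative_frequency(s: int, n: int) -> float:
    """Unbiased accuracy estimate of one rule: s positives among n covered examples."""
    return s / n

def m_estimate(s: int, n: int, prior: float, m: float) -> float:
    """Cestnik's m-estimate: shrinks s/n toward the prior class probability."""
    return (s + m * prior) / (n + m)

def best_random_rule(n_rules, coverage, prior, m, rng):
    """Search n_rules random rules and return the best score under each measure.

    Assumption: every rule covers `coverage` examples whose labels are drawn
    independently with P(positive) = prior, so every rule's true accuracy is
    exactly `prior` and none has real predictive power.
    """
    best_rf = best_m = 0.0
    for _ in range(n_rules):
        # Count positives among the covered examples (True sums as 1).
        s = sum(rng.random() < prior for _ in range(coverage))
        best_rf = max(best_rf, relative_frequency(s, coverage))
        best_m = max(best_m, m_estimate(s, coverage, prior, m))
    return best_rf, best_m

def average_optimism(trials, n_rules, coverage, prior, m, seed=0):
    """Average the best rule's estimates over repeated independent searches."""
    rng = random.Random(seed)
    rf_total = m_total = 0.0
    for _ in range(trials):
        rf, me = best_random_rule(n_rules, coverage, prior, m, rng)
        rf_total += rf
        m_total += me
    return rf_total / trials, m_total / trials

if __name__ == "__main__":
    prior = 0.5
    rf, me = average_optimism(trials=100, n_rules=1000, coverage=20, prior=prior, m=2.0)
    print(f"true accuracy of every rule:   {prior:.3f}")
    print(f"best rule, relative frequency: {rf:.3f}")
    print(f"best rule, m-estimate (m = 2): {me:.3f}")
```

A larger m pulls the estimate further toward the prior. For a fixed m and coverage, however, the best rule's estimate still rises as more rules are compared. This is the multiple-comparisons effect that shrinkage alone does not remove.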

Author information

Authors and Affiliations

Faculty of Computer and Information Science, University of Ljubljana, Tržaška cesta 25, SI-1001, Ljubljana
Martin Možina, Janez Demšar, Jure Žabkar & Ivan Bratko

Editor information

Editors and Affiliations

Knowledge Engineering Group, Technische Universität Darmstadt, Germany
Johannes Fürnkranz
Max Planck Institute for Computer Science, Saarbrücken, Germany
Tobias Scheffer
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou

About this paper

Cite this paper

Možina, M., Demšar, J., Žabkar, J., Bratko, I. (2006). Why Is Rule Learning Optimistic and How to Correct It. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. Lecture Notes in Computer Science, vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_33

DOI: https://doi.org/10.1007/11871842_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45375-8
Online ISBN: 978-3-540-46056-5
eBook Packages: Computer Science (R0)