Why Is Rule Learning Optimistic and How to Correct It
While searching a huge space of possible hypotheses, rule induction algorithms compare the estimated qualities of a large number of rules and select the one that appears best. Because the winner is chosen from so many candidates, this procedure readily picks up random patterns in the data, and their quality estimates are optimistically high even when the estimation method itself (such as relative frequency) is unbiased. It is generally believed that this problem, which eventually leads to overfitting, can be alleviated by using the m-estimate of probability. We show that the m-estimate mends the problem only partially, and we propose a novel way of making common rule evaluation functions account for the multiple comparisons made during the search. Experiments on artificial data sets and on data sets from the UCI repository show a large improvement in the accuracy of probability predictions and a moderate gain in the AUC of the constructed models.
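The optimism described above can be illustrated with a small simulation. The following is only a minimal sketch under assumed settings, not the correction method proposed in the paper: every "rule" is pure noise that covers a fixed number of examples, each positive with the same true probability, and the rule with the most covered positives is kept. The function names, the number of rules, the coverage, and the value of m are all hypothetical choices made for illustration.

```python
import random


def relative_frequency(pos, n):
    """Unbiased estimate of class probability for a single, pre-chosen rule."""
    return pos / n


def m_estimate(pos, n, prior, m):
    """m-estimate of probability: shrinks the relative frequency toward the prior."""
    return (pos + m * prior) / (n + m)


def best_of_random_rules(n_rules, coverage, true_p, rng):
    """Positive count of the best-looking rule among purely random rules.

    Each hypothetical rule covers `coverage` examples, and every covered
    example is positive with probability `true_p`, so no rule is truly
    better than any other.
    """
    best_pos = 0
    for _ in range(n_rules):
        pos = sum(rng.random() < true_p for _ in range(coverage))
        best_pos = max(best_pos, pos)
    return best_pos


def simulate(n_rules=200, coverage=20, true_p=0.5, m=2.0, trials=200, seed=0):
    """Average quality estimates of the selected rule over repeated searches."""
    rng = random.Random(seed)
    rf_sum = 0.0
    m_sum = 0.0
    for _ in range(trials):
        pos = best_of_random_rules(n_rules, coverage, true_p, rng)
        rf_sum += relative_frequency(pos, coverage)
        m_sum += m_estimate(pos, coverage, true_p, m)
    return rf_sum / trials, m_sum / trials


if __name__ == "__main__":
    rf, me = simulate()
    print(f"true probability:            {0.5:.3f}")
    print(f"mean relative frequency:     {rf:.3f}")
    print(f"mean m-estimate (m = 2):     {me:.3f}")
```

In this setup the selected rule's relative frequency lies well above the true probability although each individual estimate is unbiased. The m-estimate pulls the value toward the prior, but with a fixed m it does not depend on how many rules were compared, so part of the optimism remains.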