On the quest for optimal rule learning heuristics

Janssen, Frederik; Fürnkranz, Johannes

doi:10.1007/s10994-009-5162-2

Published: 09 December 2009

Machine Learning, Volume 78, pages 343–379 (2010)

Abstract

The primary goal of the research reported in this paper is to identify which criteria are responsible for the good performance of a heuristic rule evaluation function in a greedy top-down covering algorithm. We first argue that search heuristics for inductive rule learning algorithms typically trade off consistency and coverage, and we investigate this trade-off by determining optimal parameter settings for five different parametrized heuristics. To avoid biasing our study toward known functional families, we also investigate the potential of using metalearning to obtain alternative rule learning heuristics. The key results of this experimental study are practical default values for commonly used heuristics, a broad comparative evaluation of known and novel rule learning heuristics, and theoretical insights into the factors responsible for good performance. For example, we observe that consistency should be weighted more heavily than coverage, presumably because a lack of coverage can later be corrected by learning additional rules.
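The consistency/coverage trade-off can be made concrete with two standard parametrized heuristics from the rule learning literature: Cestnik's m-estimate and the relative cost measure. The Python sketch below is illustrative only and is not the authors' implementation; the `RuleStats` class, the function names, and the example counts are hypothetical. Here p and n are the positive and negative examples covered by a candidate rule, and P and N are the total numbers of positive and negative examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleStats:
    """Hypothetical container for the coverage counts of one candidate rule."""
    p: int  # positive examples covered by the rule
    n: int  # negative examples covered by the rule
    P: int  # total positive examples (assumed > 0)
    N: int  # total negative examples (assumed > 0)


def precision(s: RuleStats) -> float:
    """Consistency: fraction of the covered examples that are positive."""
    covered = s.p + s.n
    return s.p / covered if covered else 0.0


def m_estimate(s: RuleStats, m: float) -> float:
    """Cestnik's m-estimate: precision smoothed toward the prior P / (P + N).

    m = 0 yields plain precision; larger m pulls small rules toward the
    prior, so rules that cover more examples are favoured.
    """
    prior = s.P / (s.P + s.N)
    denominator = s.p + s.n + m
    if denominator == 0:  # empty rule with m = 0: value undefined, use 0.0
        return 0.0
    return (s.p + m * prior) / denominator


def relative_cost(s: RuleStats, c_r: float) -> float:
    """Relative cost measure: c_r * TPR - (1 - c_r) * FPR.

    c_r close to 1 rewards coverage of positives (true positive rate);
    c_r close to 0 penalises covered negatives (false positive rate).
    """
    tpr = s.p / s.P
    fpr = s.n / s.N
    return c_r * tpr - (1.0 - c_r) * fpr


if __name__ == "__main__":
    # Made-up counts: a small, pure rule versus a large, slightly impure one.
    small_pure = RuleStats(p=10, n=0, P=100, N=100)
    large_impure = RuleStats(p=60, n=10, P=100, N=100)
    for m in (0, 2, 10, 100):
        prefers_small = m_estimate(small_pure, m) > m_estimate(large_impure, m)
        print(f"m={m}: prefers small pure rule? {prefers_small}")
    # Prints True for m=0 and m=2, False for m=10 and m=100.
```

In both functions a smaller parameter value weights consistency more heavily: m = 0 reduces the m-estimate to precision, and a small c_r makes every covered negative example expensive. The counts are invented for illustration and do not reproduce the parameter values determined in the study.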

Author information

Authors and Affiliations

Technische Universität Darmstadt, Darmstadt, Germany
Frederik Janssen & Johannes Fürnkranz

Corresponding author

Correspondence to Johannes Fürnkranz.

Additional information

Editor: Hendrik Blockeel.

About this article

Cite this article

Janssen, F., Fürnkranz, J. On the quest for optimal rule learning heuristics. Mach Learn 78, 343–379 (2010). https://doi.org/10.1007/s10994-009-5162-2

Received: 14 March 2008
Revised: 02 November 2009
Accepted: 04 November 2009
Published: 09 December 2009
Issue Date: March 2010
DOI: https://doi.org/10.1007/s10994-009-5162-2