Abstract
The primary goal of the research reported in this paper is to identify which criteria are responsible for the good performance of a heuristic rule evaluation function in a greedy top-down covering algorithm. We first argue that search heuristics for inductive rule learning algorithms typically trade off consistency and coverage, and we investigate this trade-off by determining optimal parameter settings for five different parametrized heuristics. To avoid biasing our study toward known functional families, we also investigate the potential of metalearning for obtaining alternative rule learning heuristics. The key results of this experimental study are practical default values for commonly used heuristics and a broad comparative evaluation of known and novel rule learning heuristics; beyond these, we gain theoretical insights into the factors responsible for good performance. For example, we observe that consistency should be weighted more heavily than coverage, presumably because a lack of coverage can later be corrected by learning additional rules.
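To make the trade-off between consistency and coverage concrete, the following minimal sketch implements two parametrized heuristics that are well known in the rule learning literature, the m-estimate and the F-measure. The abstract does not name the five heuristics studied, so treating these two as representative is an assumption, as are the function names and the example counts. Here p and n are the positive and negative examples covered by a rule, and P and N are the class totals.

```python
def m_estimate(p, n, P, N, m=2.0):
    """Precision smoothed toward the class prior P / (P + N).

    m = 0 gives plain precision (pure consistency); larger m pulls the
    score toward the prior, which rewards rules with more coverage.
    """
    denom = p + n + m
    if denom == 0:
        return 0.0
    return (p + m * P / (P + N)) / denom


def f_measure(p, n, P, beta=1.0):
    """Weighted harmonic mean of precision (consistency) and recall (coverage).

    beta < 1 weights precision more heavily, beta > 1 weights recall more.
    """
    prec = p / (p + n) if p + n > 0 else 0.0
    rec = p / P if P > 0 else 0.0
    if prec == 0.0 and rec == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * prec * rec / (b2 * prec + rec)


# Hypothetical data set with 50 positive and 50 negative examples.
P, N = 50, 50
pure_rule = (5, 0)     # consistent, but covers few positives
broad_rule = (40, 10)  # covers many positives, but also some negatives

for m in (0.0, 2.0, 100.0):
    a = m_estimate(*pure_rule, P, N, m=m)
    b = m_estimate(*broad_rule, P, N, m=m)
    print(f"m={m:>5}: pure={a:.3f} broad={b:.3f}")

for beta in (0.1, 1.0):
    a = f_measure(*pure_rule, P, beta=beta)
    b = f_measure(*broad_rule, P, beta=beta)
    print(f"beta={beta}: pure={a:.3f} broad={b:.3f}")
```

With small m or small beta the pure rule scores higher, and with large m or beta = 1 the broad rule does, so a single parameter moves each heuristic along the consistency-coverage axis. Determining which setting works best in practice is the kind of question the study addresses empirically.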
Additional information
Editor: Hendrik Blockeel.
About this article
Cite this article
Janssen, F., Fürnkranz, J. On the quest for optimal rule learning heuristics. Mach Learn 78, 343–379 (2010). https://doi.org/10.1007/s10994-009-5162-2
DOI: https://doi.org/10.1007/s10994-009-5162-2