
FOIL-D: Efficiently Scaling FOIL for Multi-relational Data Mining of Large Datasets

  • Joseph Bockhorst
  • Irene Ong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3194)

Abstract

Multi-relational rule mining is important for knowledge discovery in relational databases because it allows the discovery of patterns that span multiple relational tables. Inductive logic programming (ILP) techniques have had considerable success on a variety of multi-relational rule mining tasks; however, most ILP systems do not scale to very large datasets. In this paper we present two extensions to a popular ILP system, FOIL, that improve its scalability. (i) We show how to interface FOIL directly to a relational database management system. This enables FOIL to run on datasets that were previously out of its scope. (ii) We describe estimation methods, based on histograms, that significantly decrease the computational cost of learning a set of rules. We present experimental results indicating that, on a set of standard ILP datasets, the rule sets learned using our extensions are equivalent to those learned with standard FOIL but are obtained at considerably less cost.
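The abstract does not spell out the histogram-based estimators, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes equi-width histograms over a single integer join attribute and uniformly distributed values within each bucket, and it estimates the size of an equi-join without computing the join. All function names and the sample data are hypothetical.

```python
from collections import Counter


def equi_width_histogram(values, lo, hi, buckets):
    """Count how many values fall into each of `buckets` equal-width bins over [lo, hi)."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        # Clamp so that a value equal to `hi` still lands in the last bucket.
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return counts


def estimate_join_size(hist_r, hist_s, distinct_per_bucket):
    """Estimate |R join S| on one attribute.

    Assumption (ours, for illustration): each bucket holds `distinct_per_bucket`
    distinct values and tuples are spread uniformly across them.
    """
    return sum(r * s / distinct_per_bucket for r, s in zip(hist_r, hist_s))


def exact_join_size(values_r, values_s):
    """Exact equi-join size, computed from per-value frequencies for comparison."""
    freq_r, freq_s = Counter(values_r), Counter(values_s)
    return sum(count * freq_s[v] for v, count in freq_r.items())


# Hypothetical join-attribute columns of two relations, values in [0, 8).
r_values = [0, 1, 1, 2, 5, 5, 6]
s_values = [1, 1, 3, 5, 7, 7]

hist_r = equi_width_histogram(r_values, lo=0, hi=8, buckets=4)
hist_s = equi_width_histogram(s_values, lo=0, hi=8, buckets=4)

estimate = estimate_join_size(hist_r, hist_s, distinct_per_bucket=2)
exact = exact_join_size(r_values, s_values)
print(f"estimated join size: {estimate}, exact join size: {exact}")
```

In a setting like the one the abstract describes, an estimate of this kind could stand in for an expensive database query when scoring candidate literals; how the paper actually uses its estimates is not stated here.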

Keywords

Bayesian Network; Inductive Logic Programming; Target Relation; Relational Database Management System; True Gain
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Joseph Bockhorst (1)
  • Irene Ong (1)
  1. Department of Computer Sciences, University of Wisconsin, Madison, USA
