Efficient and Scalable Induction of Logic Programs Using a Deductive Database System

  • Michel Ferreira
  • Nuno A. Fonseca
  • Ricardo Rocha
  • Tiago Soares
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4455)

Abstract

Because ILP systems are implemented in Prolog or built on Prolog libraries, they usually store and manipulate data in Prolog's internal database. In real-world problems, however, the original data is rarely in Prolog format. It is often kept in a Relational Database Management System (RDBMS) and must be converted to a format the ILP system accepts. A more attractive approach is to link the ILP system to the RDBMS and manipulate the data without converting it. This scheme is also more scalable, since the ILP system does not need to load the whole dataset into memory. In this paper we study several approaches to coupling ILP systems with RDBMSs and evaluate their impact on performance. We propose using a Deductive Database (DDB) system to transparently translate hypotheses into relational algebra expressions. The empirical evaluation shows that a DDB can effectively reduce the execution time of ILP algorithms and that larger problems can be handled because the data is stored outside main memory.
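
To illustrate what translating a hypothesis into a relational expression can look like, the following Python sketch turns a conjunctive clause body into a single SQL join and lets the database count the examples it covers. It is not the DDB-based implementation evaluated in the paper: the schema, the relations atom and bond, and the helpers clause_to_sql, coverage and build_demo_db are all hypothetical, and SQLite stands in for the RDBMS.

```python
import sqlite3

# Hypothetical schema (an assumption for illustration): relation -> column names.
SCHEMA = {
    "atom": ["mol", "atom_id", "element"],
    "bond": ["mol", "a1", "a2", "kind"],
}


def is_var(term):
    # Prolog convention: variables start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def clause_to_sql(head_vars, body):
    """Translate a conjunctive clause body into a counting SQL query.

    body is a list of (relation, args) literals. A repeated variable becomes
    a join condition; a constant becomes a selection. Returns (sql, params).
    """
    from_parts, where, params = [], [], []
    first_seen = {}  # variable -> "alias.column" of its first occurrence
    for i, (rel, args) in enumerate(body):
        alias = f"t{i}"
        from_parts.append(f"{rel} AS {alias}")
        for col, term in zip(SCHEMA[rel], args):
            ref = f"{alias}.{col}"
            if is_var(term):
                if term in first_seen:
                    where.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
            else:
                where.append(f"{ref} = ?")
                params.append(term)
    select = ", ".join(first_seen[v] for v in head_vars)
    inner = f"SELECT DISTINCT {select} FROM {', '.join(from_parts)}"
    if where:
        inner += " WHERE " + " AND ".join(where)
    return f"SELECT COUNT(*) FROM ({inner})", params


def coverage(conn, head_vars, body):
    # The join runs inside the database; only the count is returned.
    sql, params = clause_to_sql(head_vars, body)
    return conn.execute(sql, params).fetchone()[0]


def build_demo_db():
    # Toy data, invented for this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE atom (mol TEXT, atom_id TEXT, element TEXT)")
    conn.execute("CREATE TABLE bond (mol TEXT, a1 TEXT, a2 TEXT, kind TEXT)")
    conn.executemany("INSERT INTO atom VALUES (?, ?, ?)", [
        ("m1", "a1", "c"), ("m1", "a2", "o"),
        ("m2", "a1", "c"), ("m2", "a2", "c"),
        ("m3", "a1", "n"),
    ])
    conn.executemany("INSERT INTO bond VALUES (?, ?, ?, ?)", [
        ("m1", "a1", "a2", "double"),
        ("m2", "a1", "a2", "single"),
        ("m3", "a1", "a1", "double"),
    ])
    return conn


if __name__ == "__main__":
    # Hypothesis: active(M) :- atom(M, A, c), bond(M, A, B, double).
    body = [("atom", ["M", "A", "c"]), ("bond", ["M", "A", "B", "double"])]
    print(clause_to_sql(["M"], body)[0])
    print(coverage(build_demo_db(), ["M"], body))
```

Counting inside the database means only one integer crosses the boundary per hypothesis, which is the kind of saving the abstract attributes to avoiding loading the whole dataset into the ILP system's memory.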

Keywords

Implementation Performance Deductive Databases 

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Michel Ferreira (1)
  • Nuno A. Fonseca (1)
  • Ricardo Rocha (1)
  • Tiago Soares (1)
  1. DCC-FC & LIACC, University of Porto, Portugal
