Abstract
The growth of machine-generated relational databases, both in the sciences and in industry, is rapidly outpacing our ability to extract useful information from them by manual means. This has brought into focus machine learning techniques like Inductive Logic Programming (ILP) that are able to extract human-comprehensible models for complex relational data. The price to pay is that ILP techniques are not efficient: they can be seen as performing a form of discrete optimisation, which is known to be computationally hard; and the complexity is usually some super-linear function of the number of examples. While little can be done to alter the theoretical bounds on the worst-case complexity of ILP systems, some practical gains may follow from the use of multiple processors. In this paper we survey the state-of-the-art on parallel ILP. We implement several parallel algorithms and study their performance using some standard benchmarks. The principal findings of interest are these: (1) of the techniques investigated, one that simply constructs models in parallel on each processor using a subset of data and then combines the models into a single one, yields the best results; and (2) sequential (approximate) ILP algorithms based on randomized searches have lower execution times than (exact) parallel algorithms, without sacrificing the quality of the solutions found.
Article PDF
Similar content being viewed by others
References
Blaták, J., & Popelínský, L. (2006). dRAP: a framework for distributed mining first-order frequent patterns. In Proceedings of the 16th conference on inductive logic programming (pp. 25–27). Berlin: Springer.
Boström, H. (2000). Induction of recursive transfer rules. In J. Cussens & S. Džeroski (Eds.), Lecture notes in computer science : Vol. 1925. Learning language in logic (pp. 237–246). Berlin: Springer.
Botta, M., Giordana, A., Saitta, L., & Sebag, M. (2003). Relational learning as search in a critical region. Journal of Machine Learning Research, 4, 431–463.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Belmont: Wadsworth.
Clare, A., & King, R. D. (2003). Data mining the yeast genome in a lazy functional language. In Proceedings of the fifth international symposium on practical aspects of declarative languages (pp. 19–36).
Colton, S., & Muggleton, S. (2003). ILP for mathematical discovery. In Proceedings of the 13th international conference on inductive logic programming (pp. 93–111).
Cussens, J. (1997). Part-of-speech tagging using Progol. In Proceedings of the 7th international workshop on inductive logic programming (pp. 93–108).
Dehaspe, L., & De Raedt, L. (1995). Parallel inductive logic programming. In Proceedings of the MLnet familiarization workshop on statistics, machine learning and knowledge discovery in databases.
Dehaspe, L., Toivonen, H., & King, R. D. (1998). Finding frequent substructures in chemical compounds. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98) (pp. 30–36). Menlo Park: AAAI Press.
Dolšak, B., Bratko, I., & Jezernik, A. (1997). Application of machine learning in finite element computation. In Machine learning, data mining and knowledge discovery: methods and applications. New York: Wiley.
Džeroski, S., Demšar, D., & Grbović, J. (2000). Predicting chemical parameters of river water quality from bioindicator data. Applied Intelligence, 13(1), 7–17.
Everitt, B. S. (1992). The analysis of contingency tables (2nd ed.). London: Chapman and Hall.
Fonseca, N. A., Silva, F., & Camacho, R. (2006). April—an inductive logic programming system. In Lecture notes in artificial intelligence : Vol. 4160. Proceedings of the 10th European conference on logics in artificial intelligence (JELIA06) (pp. 481–484), Liverpool, 2006. Berlin: Springer.
Graham, J., Page, D., & Kamal, A. (2003). Accelerating the drug design process through parallel inductive logic programming data mining. In Proceeding of the computational systems bioinformatics (CSB’03). New York: IEEE.
Grama, A., Gupta, A., Karypis, G., & Kumar, V. (2003). Introduction to parallel computing (2nd ed.). Reading: Addison-Wesley.
King, R. D. (2004). Applying inductive logic programming to predicting gene function. AI Magazine, 25(1), 57–68.
King, R. D., & Srinivasan, A. (1996). Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming. Environmental Health Perspectives, 104(5), 1031–1040.
King, R. D., Muggleton, S., & Sternberg, M. J. E. (1992). Drug design by machine learning: the use of inductive logic programming to model the structure-activity relationships of trimethoprim analogues binding to dihydrofolate reductase. In Proceedings of the national academy of sciences (Vol. 89, pp. 11322–11326).
Konstantopoulos, S. K. (2003). A data-parallel version of Aleph. In Proceedings of the workshop on parallel and distributed computing for machine learning, co-located with ECML/PKDD’2003, Dubrovnik, Croatia.
Marchand-Geneste, N., Watson, K. A., Alsberg, B., & King, R. D. (2002). A new approach to pharmacophore mapping and QSAR analysis using inductive logic programming. Application to thermolysin inhibitors and glycogen phosphorylase B inhibitors. Journal of Medicinal Chemistry, 45, 399–409 (Erratum: Journal of Medicinal Chemistry, 46, 653).
Matsui, T., Inuzuka, N., Seki, H., & Itoh, H. (1992). Comparison of three parallel implementations of an induction algorithm. In 8th international parallel computing workshop (pp. 181–188), Singapore.
Michalski, R. S. (1980). Pattern recognition as rule-guided inductive inference. In Proceedings of IEEE transactions on pattern analysis and machine intelligence (pp. 349–361).
Message Passing Interface Forum. (1994). MPI: a message-passing interface standard (Technical Report UT-CS-94-230). University of Tennessee, Knoxville, TN, USA.
Muggleton, S. (1994). Inductive logic programming: derivations, successes and shortcomings. SIGART Bulletin, 5(1), 5–11.
Muggleton, S. (1995). Inverse entailment and Progol. New Generation Computing, Special Issue on Inductive Logic Programming, 13(3–4), 245–286.
Muggleton, S., & Feng, C. (1990). Efficient induction of logic programs. In Proceedings of the 1st conference on algorithmic learning theory (pp. 368–381), Ohmsma, Tokyo, Japan.
Muggleton, S., & Feng, C. (1992). Efficient induction in logic programs. In S. Muggleton (Ed.), Proceedings of the 2nd international workshop on inductive logic programming (pp. 281–298). New York: Academic Press.
Muggleton, S., & Firth, J. (2001). Relational rule induction with CProgol4.4: a tutorial introduction. In S. Džeroski & N. Lavrač (Eds.), Relational data mining (pp. 160–188). Berlin: Springer.
Ohwada, H., & Mizoguchi, F. (1999). Parallel execution for speeding up inductive logic programming systems. In Lecture notes in artificial intelligence : Vol. 1721. Proceedings of the 9th international workshop on inductive logic programming (pp. 277–286). Berlin: Springer.
Ohwada, H., Nishiyama, H., & Mizoguchi, F. (2000). Concurrent execution of optimal hypothesis search for inverse entailment. In J. Cussens & A. Frisch (Eds.), Lecture notes in artificial intelligence : Vol. 1866. Proceedings of the 10th international conference on inductive logic programming (pp. 165–173). Berlin: Springer.
Papadimitriou, C. H., & Steiglitz, K. (1982). Combinatorial optimisation. Edgewood-Cliffs: Prentice-Hall.
Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning Journal, 5(3), 239–266.
Quinlan, J. R., & Cameron-Jones, R. M. (1993). FOIL: a midterm report. In P. Brazdil (Ed.), Proceedings of the 6th European conference on machine learning (Vol. 667, pp. 3–20). Berlin: Springer.
Rocha, R., Fonseca, N. A., & Santos Costa, V. (2005). On applying tabling to inductive logic programming. In Lecture notes in artificial intelligence : Vol. 3720. Proceedings of the 16th European conference on machine learning, ECML-05 (pp. 707–714). Berlin: Springer.
Santos Costa, V., Srinivasan, A., Camacho, R., Blockeel, H., Demoen, B., Janssens, G., Struyf, J., Vandecasteele, H., & Van Laer, W. (2003). Query transformations for improving the efficiency of ILP systems. Journal of Machine Learning Research, 4, 465–491.
Sebag, M., & Rouveirol, C. (1997). Tractable induction and classification in first order logic via stochastic matching. In Proceedings of the 15th international joint conference on artificial intelligence (pp. 888–893). San Mateo: Morgan Kaufmann.
Skillicorn, D. B., & Wang, Y. (2001). Parallel and sequential algorithms for data mining using inductive logic. Knowledge and Information Systems, 3(4), 405–421.
Smith, R. G. (1980). The contract net protocol: high-level communication and control in a distributed problem solver. IEEE Transactions on Computers, 29(12), 1104–1113.
Squyres, J. M., & Lumsdaine, A. (2003). A component architecture for LAM/MPI. In Lecture notes in computer science : Vol. 2840. Proceedings, 10th European PVM/MPI users’ group meeting, Venice, Italy, 2003. Berlin: Springer.
Srinivasan, A. (1999). A study of two sampling methods for analysing large datasets with ILP. Data Mining and Knowledge Discovery, 3(1), 95–123.
Srinivasan, A. (2000). A study of two probabilistic methods for searching large spaces with ILP (Technical Report PRG-TR-16-00). Oxford University Computing Laboratory.
Srinivasan, A. (2003). The Aleph manual. Available from http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph.
Srinivasan, A., & Kothari, R. (2005). A study of applying dimensionality reduction to restrict the size of a hypothesis space. In Proceedings of the 15th international conference on inductive logic programming (pp. 348–365).
Srinivasan, A., Muggleton, S., King, R. D., & Sternberg, M. J. E. (1994a). Mutagenesis: ILP experiments in a non-determinate biological domain. In S. Wrobel (Ed.), GMD-Studien: Vol. 237. Proceedings of the 4th international workshop on inductive logic programming (pp. 217–232).
Srinivasan, A., Muggleton, S., King, R. D., & Sternberg, M. J. E. (1994b). Mutagenesis: ILP experiments in a non-determinate biological domain. In S. Wrobel (Ed.), GMD-Studien: Vol. 237. Proceedings of the 4th international workshop on inductive logic programming (pp. 217–232).
Srinivasan, A., King, R. D., Muggleton, S., & Sternberg, M. J. E. (1997). Carcinogenesis predictions using ILP. In S. Džeroski & N. Lavrač (Eds.), Proceedings of the 7th international workshop on inductive logic programming (Vol. 1297, pp. 273–287). Berlin: Springer.
Tang, L. R., & Mooney, R. J. (2001). Using multiple clause constructors in inductive logic programming for semantic parsing. In EMCL ’01: proceedings of the 12th European conference on machine learning (pp. 466–477). London, UK, 2001. Berlin: Springer.
Tobudic, A., & Widmer, G. (2003). Relational IBL in music with a new structural similarity measure. In Proceedings of the 13th international conference on inductive logic programming (pp. 365–382).
Todorovski, L., Ljubič, P., & Džeroski, S. (2004). Inducing polynomial equations for regression. In Proceedings of the 15th European conference on machine learning (pp. 441–452).
Turcotte, M., Muggleton, S. H., & Sternberg, M. J. E. (2001). Automated discovery of structural signatures of protein fold and function. Journal of Molecular Biology, 306, 591–605.
Wielemaker, J. (2003). Native preemptive threads in SWI-Prolog. In C. Palamidessi (Ed.), Lecture notes in artificial intelligence : Vol. 2916. Proceedings of the 19th international conference on logic programming (pp. 331–345). Berlin: Springer.
Železný, F., Srinivasan, A., & Page, D. (2002). Lattice-search runtime distributions may be heavy-tailed. In S. Matwin & C. Sammut (Eds.), Lecture notes in artificial intelligence : Vol. 2583. Proceedings of the 12th international conference on inductive logic programming (pp. 333–345). Berlin: Springer.
Železný, F., Srinivasan, A., & Page, D. (2006). Randomised restarted search in ILP. Machine Learning, 64(1–3), 183–208.
Author information
Authors and Affiliations
Corresponding author
Additional information
Editor: Hendrik Blockeel.
This is an extended version of the paper entitled Strategies to Parallelize ILP Systems, published in the Proceedings of the 15th International Conference on Inductive Logic Programming (ILP 2005), vol. 3625 of LNAI, pp. 136–153, Springer-Verlag.
Rights and permissions
About this article
Cite this article
Fonseca, N.A., Srinivasan, A., Silva, F. et al. Parallel ILP for distributed-memory architectures. Mach Learn 74, 257–279 (2009). https://doi.org/10.1007/s10994-008-5094-2
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10994-008-5094-2