
Machine Learning, Volume 74, Issue 3, pp 257–279

Parallel ILP for distributed-memory architectures

  • Nuno A. Fonseca
  • Ashwin Srinivasan
  • Fernando Silva
  • Rui Camacho

Abstract

The growth of machine-generated relational databases, both in the sciences and in industry, is rapidly outpacing our ability to extract useful information from them by manual means. This has brought into focus machine learning techniques like Inductive Logic Programming (ILP) that are able to extract human-comprehensible models for complex relational data. The price to pay is that ILP techniques are not efficient: they can be seen as performing a form of discrete optimisation, which is known to be computationally hard; and the complexity is usually some super-linear function of the number of examples. While little can be done to alter the theoretical bounds on the worst-case complexity of ILP systems, some practical gains may follow from the use of multiple processors. In this paper we survey the state-of-the-art on parallel ILP. We implement several parallel algorithms and study their performance using some standard benchmarks. The principal findings of interest are these: (1) of the techniques investigated, one that simply constructs models in parallel on each processor using a subset of data and then combines the models into a single one, yields the best results; and (2) sequential (approximate) ILP algorithms based on randomized searches have lower execution times than (exact) parallel algorithms, without sacrificing the quality of the solutions found.
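
Finding (1) describes a data-parallel scheme: each processor builds a model from its own subset of the examples, and the local models are then merged into one. The following is a minimal sketch of that idea only, not the algorithm evaluated in the paper. It assumes a toy propositional rule learner in place of a real ILP system, uses threads as stand-ins for distributed-memory processors, and all names (`learn_rules`, `partition`, `combine`, `parallel_learn`, `DATA`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def covers(rule, attrs):
    """A rule (a set of required attributes) covers an example if all are present."""
    return rule <= attrs


def learn_rules(examples):
    """Toy sequential learner: greedy covering with single-attribute rules.

    Stand-in for an ILP system; a rule is accepted if it covers no
    negative example in the subset it was learned from.
    """
    positives = [a for a, label in examples if label]
    negatives = [a for a, label in examples if not label]
    rules = set()
    for attrs in positives:
        if any(covers(r, attrs) for r in rules):
            continue  # already explained by an earlier rule
        for attr in sorted(attrs):
            rule = frozenset([attr])
            if not any(covers(rule, n) for n in negatives):
                rules.add(rule)
                break
    return rules


def partition(examples, n_parts):
    """Round-robin split of the examples into n_parts subsets."""
    return [examples[i::n_parts] for i in range(n_parts)]


def combine(models, examples):
    """Merge local models, keeping only rules consistent with the full data."""
    negatives = [a for a, label in examples if not label]
    merged = set().union(*models)
    return {r for r in merged if not any(covers(r, n) for n in negatives)}


def parallel_learn(examples, n_workers=2):
    """Learn one model per subset concurrently, then combine them."""
    parts = partition(examples, n_workers)
    # Threads stand in for separate processors in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        models = list(pool.map(learn_rules, parts))
    return combine(models, examples)


# Hypothetical toy data: (attributes, is_positive).
DATA = [
    (frozenset({"ring", "nitro"}), True),
    (frozenset({"ring", "methyl"}), False),
    (frozenset({"chain", "methyl"}), False),
    (frozenset({"nitro", "chain"}), True),
]

if __name__ == "__main__":
    print(parallel_learn(DATA, n_workers=2))
```

In this sketch the second worker learns a rule on `chain` that looks consistent on its own subset but covers a negative example held by the first worker, so the combination step discards it. Rules learned from a subset are only locally validated, which is why the merge re-checks them against the full data.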

Keywords

ILP, Parallelism, Efficiency

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Nuno A. Fonseca (1)
  • Ashwin Srinivasan (2, 3)
  • Fernando Silva (4)
  • Rui Camacho (5)

  1. Instituto de Biologia Molecular e Celular (IBMC) & CRACS, Universidade do Porto, Porto, Portugal
  2. IBM India Research Laboratory, Block 1, Indian Institute of Technology, New Delhi, India
  3. Department of CSE & Centre for Health Informatics, University of New South Wales, Sydney, Australia
  4. CRACS & Faculdade de Ciências, Universidade do Porto, Porto, Portugal
  5. LIAAD & Faculdade de Engenharia, Universidade do Porto, Porto, Portugal
