
Parallel Algorithms for Multirelational Data Mining: Application to Life Science Problems

  • Rui Camacho
  • Jorge G. Barbosa
  • Altino Sampaio
  • João Ladeiras
  • Nuno A. Fonseca
  • Vítor S. Costa
Chapter
Part of the Computer Communications and Networks book series (CCN)

Abstract

Data Mining (DM) algorithms construct models from available data that can be very useful for both business and science. However, a powerful representation language is required to express the highly complex models that stem from structured data. Multirelational algorithms take advantage of such a representation for both data and models. The drawback is that, for very large or highly complex domains, multirelational algorithms may require long running times. This problem can be substantially reduced by using parallel implementations. In this chapter, we present a survey of parallel approaches to running Inductive Logic Programming (ILP), a flavor of multirelational algorithms. We also analyze different scheduling approaches for those implementations and describe two applications where the proposed approaches may be very useful.

Keywords

Inductive logic programming; Knowledge discovery; Supervised learning; Independent and-parallelism; Load balancing

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Rui Camacho (1)
  • Jorge G. Barbosa (1, corresponding author)
  • Altino Sampaio (2)
  • João Ladeiras (1)
  • Nuno A. Fonseca (3)
  • Vítor S. Costa (4)

  1. DEI & Faculty of Engineering of University of Porto, Porto, Portugal
  2. IPP, Escola Superior de Tecnologia e Gestão de Felgueiras, CIICESI, Portugal
  3. EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
  4. DCC & Faculty of Sciences of University of Porto, Porto, Portugal
