Relational Learning with GPUs: Accelerating Rule Coverage

  • Carlos Alberto Martínez-Angeles
  • Haicheng Wu
  • Inês Dutra
  • Vítor Santos Costa
  • Jorge Buenabad-Chávez
Article

Abstract

Relational learning algorithms mine complex databases for interesting patterns. Usually, the search space of patterns grows very quickly with the increase in data size, making it impractical to solve important problems. In this work we present the design of a relational learning system, that takes advantage of graphics processing units (GPUs) to perform the most time consuming function of the learner, rule coverage. To evaluate performance, we use four applications: a widely used relational learning benchmark for predicting carcinogenesis in rodents, an application in chemo-informatics, an application in opinion mining, and an application in mining health record data. We compare results using a single and multiple CPUs in a multicore host and using the GPU version. Results show that the GPU version of the learner is up to eight times faster than the best CPU version.

Keywords

Relational learning Inductive logic Logic programming Datalog Relational databases Parallel computing GPUs 

References

  1. 1.
    Afrati, F.N., Borkar, V., Carey, M., Polyzotis, N., Ullman, J.D.: Cluster computing, recursion and datalog. In: Proceedings of the First International Conference on Datalog Reloaded, Datalog’10, pp. 120–144. Springer, Berlin (2011)Google Scholar
  2. 2.
    Beeri, C., Ramakrishnan, R.: On the power of magic. J. Log. Program. 10(3&4), 255–299 (1991)MathSciNetCrossRefMATHGoogle Scholar
  3. 3.
    Bekkerman, R., Bilenko, M., Langford, J. (eds.): Scaling up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, Cambridge (2011)Google Scholar
  4. 4.
    Chakrabarti, D., Faloutsos, C.: Graph mining: laws, generators, and algorithms. ACM Comput. Surv. 38(1) (2006). doi:10.1145/1132952.1132954
  5. 5.
    Collins, J.M.: The DTP AIDS antiviral screen program (1999). http://dtp.nci.nih.gov/docs/aids/aidsdata.html
  6. 6.
    Côrte-Real, J., Dutra, I., Rocha, R.: A map-reduce constructor for prolog. In: Proceedings of the International Conference on Principles and Practice of Declarative Programming (PPDP) (2013)Google Scholar
  7. 7.
    Costa, V.S., Sagonas, K., Lopes, R.: Demand-driven indexing of prolog clauses. In: Veronica D., Ilkka N. (eds.) Proceedings of the 23rd International Conference on Logic Programming, volume 4670 of Lecture Notes in Computer Science, pp. 305–409. Springer (2007)Google Scholar
  8. 8.
    Costa, V.S., Srinivasan, A., Camacho, R., Blockeel, H., Demoen, B., Janssens, G., Struyf, J., Vandecasteele, H., Van Laer, W.: Query transformations for improving the efficiency of ilp systems. J. Mach. Learn. Res. 4, 465–491 (2003)MATHGoogle Scholar
  9. 9.
    Costa, V.S., Rocha, R., Damas, L.: The yap prolog system. Theory Pract. Log. Program. 12(1–2), 5–34 (2012)MathSciNetCrossRefMATHGoogle Scholar
  10. 10.
  11. 11.
    Dastgeer, U., Li, L., Kessler, C.: Smart containers and skeleton programming for GPU-based systems. In: Proceedings 7th International Symposium on High-Level Parallel Programming and Applications (HLPP’14), Amsterdam (2014)Google Scholar
  12. 12.
    De Raedt, L.: Logical and Relational Learning. Springer, Berlin (2008)CrossRefMATHGoogle Scholar
  13. 13.
    Dehaspe, L., De Raedt, L.: Parallel inductive logic programming. In: In Proceedings of the MLnet Familiarization Workshop on Statistics, Machine Learning and Knowledge Discovery in Databases, pp. 112–117 (1995)Google Scholar
  14. 14.
    Diamos, G., Wu, H., Lele, A., Wang, J., Yalamanchili, S.: Efficient relational algebra algorithms and data structures for GPU. Technical report, Georgia Institute of Technology (2012)Google Scholar
  15. 15.
    Diamos, G., Wu, H., Wang, J., Lele, A., Yalamanchili, S.: Relational algorithms for multi-bulk-synchronous processors. In: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’13, New York, NY, USA, pp. 301–302. ACM (2013)Google Scholar
  16. 16.
    Fonseca, N.A., Srinivasan, A., Silva, F.M.A., Camacho, R.: Parallel ILP for distributed-memory architectures. Mach. Learn. 74(3), 257–279 (2009)CrossRefGoogle Scholar
  17. 17.
    Gavanelli, M., Riguzzi, F., Milano, M., Cagnoli, P.: Constraint and optimization techniques for supporting policy making. In: Yu, T., Chawla, N., Simoff, S. (eds) Computational Intelligent Data Analysis for Sustainable Development, Data Mining and Knowledge Discovery Series, chap. 12, pp. 361–382. Chapman & Hall/CRC, Abingdon (2013)Google Scholar
  18. 18.
    Green, T.J., Aref, M., Karvounarakis, G.: Logicblox, platform and language: a tutorial. In: Proceedings of the Second International Conference on Datalog in Academia and Industry, Datalog 2.0’12, pp. 1–8. Springer, Berlin (2012)Google Scholar
  19. 19.
    Green, O., McColl, R., Bader, D.A.: GPU merge path: a GPU merging algorithm. In: Proceedings of the 26th ACM International Conference on Supercomputing, ICS ’12, New York, NY, USA, pp. 331–340. ACM (2012)Google Scholar
  20. 20.
    He, B., Mian, L., Yang, K., Fang, R., Govindaraju, N.K., Luo, Q., Sander, P.V.: Relational query coprocessing on graphics processors. ACM Trans. Database Syst. 34(4), 21:1–21:39 (2009)CrossRefGoogle Scholar
  21. 21.
    Huang, S.S., Green, T.J., Loo, B.T.: Datalog and emerging applications: an interactive tutorial. In: Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD ’11, New York, NY, USA, pp. 1213–1216. ACM (2011)Google Scholar
  22. 22.
    Martínez-Angeles, C.A., Dutra, I., Costa, V.S., Buenabad-Chávez, J.: A datalog engine for GPUs. In: WFLP-2013: 22nd International Workshop on Functional and (Constraint) Logic Programming, Kiel, Germany, 11–13 Sept, pp. 239–253 (2013)Google Scholar
  23. 23.
    Muggleton, S.: Inverse entailment and progol. New Gener. Comput. 13, 245–286 (1995)CrossRefGoogle Scholar
  24. 24.
    Odeh, S., Green, O., Mwassi, Z., Shmueli, O., Birk, Y.: Merge path—parallel merging made simple. In: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, IPDPSW ’12, Washington, DC, USA, IEEE Computer Society, pp. 1611–1618 (2012)Google Scholar
  25. 25.
    Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets. Cambridge University Press, Cambridge (2012)Google Scholar
  26. 26.
  27. 27.
    Ryan, P.B., Schuemie, M.J.: Evaluating performance of risk identification methods through a large-scale simulation of observational data. Drug Saf. 36(1), 171–180 (2013)CrossRefGoogle Scholar
  28. 28.
    Sean Baxter: modern GPU library—tutorial. http://nvlabs.github.io/moderngpu/index.html (visited in Jan 2015) (2013)
  29. 29.
    Srinivasan, A.: The Aleph manual. University of Oxford, England (2001). http://www.cs.ox.ac.uk/activities/machlearn/Aleph/aleph.html
  30. 30.
    Srinivasan, A., King, R.D., Muggleton, S.H., Sternberg, M.J.E.: Carcinogenesis predictions using ILP. In: Lavrac, N., Dszeroski, S. (eds.) Inductive Logic Programming, volume 1297 of Lecture Notes in Computer Science, pp. 273–287. Springer, Berlin (1997)Google Scholar
  31. 31.
    Srinivasan, A., Faruquie, T.A., Joshi, S.: Data and task parallelism in ILP using MapReduce. Mach. Learn. 86(1), 141–168 (2012)MathSciNetCrossRefMATHGoogle Scholar
  32. 32.
    Taskar, B., Getoor, L.: Introduction to Statistical Relational Learning. MIT Press, Cambridge (2007)MATHGoogle Scholar
  33. 33.
    Tekle, K.T., Liu, Y.A.: More efficient datalog queries: subsumptive tabling beats magic sets. In: SIGMOD Conference, pp. 661–672 (2011)Google Scholar
  34. 34.
    Thrust: a parallel template library. http://thrust.github.io/
  35. 35.
    TPC-H transaction processing performance council benchmark H. http://www.tpc.org/tpch/
  36. 36.
    Ullman, J.D.: Principles of Database and Knowledge-Base Systems, vol. I. Computer Science Press, Rockville (1988)Google Scholar
  37. 37.
    Ullman, J.D.: Principles of Database and Knowledge-Base Systems, vol. II. Computer Science Press, Rockville (1989)Google Scholar
  38. 38.
    Weislow, O.S., Kiser, R., Fine, D.L., Bader, J., Shoemaker, R.H., Boyd, M.R.: New soluble-formazan assay for hiv-1 cytopathic effects: application to high-flux screening of synthetic and natural products for aids-antiviral activity. J. Natl. Cancer Inst. 81(8), 577–586 (1989)CrossRefGoogle Scholar
  39. 39.
    Wu, H., Diamos, G., Cadambi, S., Yalamanchili, S.: Kernel weaver: automatically fusing database primitives for efficient GPU computation. In: Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-45, Washington, DC, USA, IEEE Computer Society, pp. 107–118 (2012)Google Scholar
  40. 40.
    Wu, H., Diamos, G., Sheard, T., Aref, M., Baxter, S., Garland, M., Yalamanchili, S.: Red fox: an execution environment for relational query processing on gpus. In: International Symposium on Code Generation and Optimization (CGO) (2014)Google Scholar
  41. 41.
    Wu, H., Diamos, G., Wang, J., Cadambi, S., Yalamanchili, S., Chakradhar, S.: Optimizing data warehousing applications for gpus using kernel fusion/fission. In: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, IPDPSW ’12, Washington, DC, USA, IEEE Computer Society, pp. 2433–2442 (2012)Google Scholar
  42. 42.
    Young, J., Wu, H., Yalamanchili, S.: Satisfying data-intensive queries using GPU clusters. In: 2012 SC Companion High Performance Computing, Networking, Storage and Analysis (SCC), pp. 1314–1314 (2012)Google Scholar

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Carlos Alberto Martínez-Angeles
    • 1
  • Haicheng Wu
    • 3
  • Inês Dutra
    • 2
  • Vítor Santos Costa
    • 2
  • Jorge Buenabad-Chávez
    • 1
  1. 1.Departamento de ComputaciónCINVESTAV-IPNMexicoMexico
  2. 2.Departmento de Ciência de ComputadoresCRACS INESC-TEC LA and Universidade do PortoPortoPortugal
  3. 3.Georgia Institute of TechnologyAtlantaUSA

Personalised recommendations