International Journal of Parallel Programming, Volume 42, Issue 4, pp 583–600

Efficient Abstractions for GPGPU Programming

  • Mathias Bourgoin
  • Emmanuel Chailloux
  • Jean-Luc Lamotte


General-purpose GPU (GPGPU) programming requires coupling highly parallel computing units with classic CPUs to obtain high performance. Heterogeneous systems lead to complex designs that combine multiple paradigms and programming languages to manage each hardware architecture. In this paper, we present tools to harness GPGPU programming through the high-level OCaml programming language. We describe the SPOC library, which handles GPGPU subprograms (kernels) and data transfers between devices. We then present how SPOC expresses GPGPU kernels: through interoperability with common low-level extensions (from the Cuda and OpenCL frameworks), but also via an embedded DSL for OCaml. Using simple benchmarks as well as real-world HPC software, we show that SPOC can offer high performance while easing development. To allow better abstractions over tasks and data, we introduce parallel skeletons built upon SPOC, as well as composition constructs over those skeletons.


GPGPU, DSL, OCaml, Parallel skeletons, Parallel abstractions



The authors thank Stan Scott and the “High Performance and Distributed Computing” department of Queen’s University of Belfast (United Kingdom) for the 2DRMP library and the program PROP as well as Rachid Habel from Telecom SudParis for sharing his knowledge on PROP. This work is part of the OpenGPU project and is partially funded by SYSTEMATIC PARIS REGION SYSTEMS & ICT CLUSTER (



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Mathias Bourgoin (1)
  • Emmanuel Chailloux (1)
  • Jean-Luc Lamotte (1)
  1. Laboratoire d’Informatique de Paris 6 (LIP6-UMR 7606), Université Pierre et Marie Curie (UPMC-Paris 6), Sorbonne Universités, Paris, France
