International Journal of Parallel Programming, Volume 45, Issue 2, pp 242–261

High Level Data Structures for GPGPU Programming in a Statically Typed Language

  • Mathias Bourgoin
  • Emmanuel Chailloux
  • Jean-Luc Lamotte


To increase software performance, it is now common to use hardware accelerators. Currently, GPUs are the most widespread accelerators able to handle general-purpose computations, which requires GPGPU frameworks such as CUDA or OpenCL. Both are very low level and make the benefits of GPGPU programming difficult to achieve. In particular, they require writing programs as a combination of two subprograms, and manually managing devices and memory transfers. This increases the complexity of the overall software design. The idea we develop in this paper is to guarantee expressiveness and safety for CPU and GPU computations and memory management through high-level data structures and static type checking. We present how statically typed languages, compilers, and libraries help harness high-level GPGPU programming. In particular, we show how we added high-level user-defined data structures to a GPGPU programming framework based on a statically typed programming language: OCaml. We describe the introduction of records and tagged unions, shared between the host program and GPGPU kernels expressed via a domain-specific language, as well as a simple pattern-matching control structure to manage them. Examples, practical tests, and comparisons with state-of-the-art tools show that our solutions improve code design, productivity, and safety while providing a high level of performance.
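To illustrate the kind of user-defined data structures the abstract refers to, the following sketch defines a record and a tagged union in plain OCaml and dispatches over the union with pattern matching. This is a hypothetical host-side illustration, not the paper's actual kernel DSL syntax, and all type and function names are invented for the example.

```ocaml
(* Illustrative sketch only: a record and a tagged union of the kind the
   paper shares between host code and GPU kernels. This is plain OCaml;
   the paper's kernel DSL syntax is not reproduced here. *)

type particle = { pos : float; vel : float }

(* A tagged union describing per-element operations. *)
type op =
  | Scale of float   (* multiply the position by a factor *)
  | Shift of float   (* translate the position *)
  | Damp             (* halve the velocity *)

(* Pattern matching dispatches on the tag; the compiler statically checks
   that every constructor of [op] is handled. *)
let apply (o : op) (p : particle) : particle =
  match o with
  | Scale f -> { p with pos = p.pos *. f }
  | Shift d -> { p with pos = p.pos +. d }
  | Damp    -> { p with vel = p.vel /. 2.0 }

let () =
  let p = { pos = 1.0; vel = 4.0 } in
  let p' = apply (Scale 3.0) p in
  Printf.printf "pos = %.1f, vel = %.1f\n" p'.pos p'.vel
```

In the framework the paper describes, such declarations are compiled both to host-side OCaml values and to equivalent structures usable inside GPU kernels, so the static exhaustiveness check shown here applies to kernel code as well.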


Keywords: GPGPU · Static typing · High-level programming · Data structures · DSL · Parallel skeletons · Composition · OCaml



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Mathias Bourgoin (1, 2)
  • Emmanuel Chailloux (3)
  • Jean-Luc Lamotte (3)

  1. CNRS, VERIMAG, University of Grenoble Alpes, Grenoble, France
  2. INSA Centre Val de Loire, University of Orléans, Orléans, France
  3. Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6, Paris, France
