The Journal of Supercomputing

Volume 69, Issue 1, pp 25–33

SkelCL: a high-level extension of OpenCL for multi-GPU systems

  • Michel Steuwer
  • Sergei Gorlatch


Application development for modern high-performance systems with graphics processing units (GPUs) currently relies on low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs. We present SkelCL, a high-level programming approach for systems with multiple GPUs, and its implementation as a library on top of OpenCL. SkelCL makes three main enhancements to the OpenCL standard: (1) memory management is simplified using parallel container data types (vectors and matrices); (2) an automatic data (re)distribution mechanism allows for implicit data movements between GPUs and ensures scalability when using multiple GPUs; (3) computations are conveniently expressed using parallel algorithmic patterns (skeletons). We demonstrate how SkelCL is used to implement parallel applications, and we report an experimental evaluation of our approach in terms of programming effort and performance.
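The three ideas in the abstract (container data types, block-wise data distribution, and skeletons) can be illustrated with a small CPU-only sketch. This is not the SkelCL API: the names `block_offsets`, `zip`, `reduce`, `dot_product` and the `devices` parameter are hypothetical, and the "devices" are simulated by processing each block in turn on the host. The sketch only shows how a dot product might be composed from a zip skeleton followed by a reduce skeleton over block-distributed vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of SkelCL): block distribution of n elements
// over `parts` simulated devices. Returns parts + 1 offsets; block p covers
// the index range [offsets[p], offsets[p + 1]).
std::vector<std::size_t> block_offsets(std::size_t n, std::size_t parts) {
    assert(parts > 0);
    std::vector<std::size_t> offsets(parts + 1);
    for (std::size_t p = 0; p <= parts; ++p) {
        offsets[p] = n * p / parts;
    }
    return offsets;
}

// Zip skeleton: combines two equally sized vectors element by element.
// Each block stands in for the part of the data one device would own.
template <typename T, typename F>
std::vector<T> zip(const std::vector<T>& a, const std::vector<T>& b, F f,
                   std::size_t devices) {
    assert(a.size() == b.size());
    std::vector<T> out(a.size());
    const auto off = block_offsets(a.size(), devices);
    for (std::size_t d = 0; d < devices; ++d) {
        for (std::size_t i = off[d]; i < off[d + 1]; ++i) {
            out[i] = f(a[i], b[i]);
        }
    }
    return out;
}

// Reduce skeleton: each simulated device folds its own block into a partial
// result, then the partial results are combined. The operator f is assumed
// to be associative, with `identity` as its neutral element.
template <typename T, typename F>
T reduce(const std::vector<T>& v, F f, T identity, std::size_t devices) {
    const auto off = block_offsets(v.size(), devices);
    T result = identity;
    for (std::size_t d = 0; d < devices; ++d) {
        T partial = identity;
        for (std::size_t i = off[d]; i < off[d + 1]; ++i) {
            partial = f(partial, v[i]);
        }
        result = f(result, partial);
    }
    return result;
}

// Dot product expressed as a composition of the two skeletons.
float dot_product(const std::vector<float>& a, const std::vector<float>& b,
                  std::size_t devices) {
    const auto products =
        zip(a, b, [](float x, float y) { return x * y; }, devices);
    return reduce(products, [](float x, float y) { return x + y; }, 0.0f,
                  devices);
}
```

Calling `dot_product(a, b, 2)` on two four-element vectors splits each into two blocks of two elements. The caller never writes index arithmetic or transfer code, which is the kind of programming-effort reduction the abstract describes; an actual multi-GPU library would additionally move each block to its device and launch kernels there.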


Keywords: Parallel programming, GPU programming, OpenCL, Algorithmic skeletons, SkelCL, Many-cores



This work is partially supported by the OFERTIE (FP7) and MONICA projects. We would like to thank NVIDIA for their generous hardware donation.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. University of Muenster, Münster, Germany
