SkelCL: Enhancing OpenCL for High-Level Programming of Multi-GPU Systems
Abstract
Application development for modern high-performance systems with Graphics Processing Units (GPUs) currently relies on low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs.
In this paper, we present SkelCL – a high-level programming approach for systems with multiple GPUs and its implementation as a library on top of OpenCL. SkelCL provides three main enhancements to the OpenCL standard: 1) computations are conveniently expressed using parallel algorithmic patterns (skeletons); 2) memory management is simplified using parallel container data types (vectors and matrices); 3) an automatic data (re)distribution mechanism allows for implicit data movements between GPUs and ensures scalability when using multiple GPUs. We demonstrate how SkelCL is used to implement parallel applications on one- and two-dimensional data. We report experimental results to evaluate our approach in terms of programming effort and performance.
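To make the skeleton idea more concrete, the following is a minimal CPU-only sketch in standard C++. It is not SkelCL's API and does not use OpenCL. The names `block_ranges`, `zip_skeleton`, `reduce_skeleton`, and `dot_product` are hypothetical and introduced only for illustration, and the "devices" are simulated by contiguous index blocks. It shows how a dot product might be composed from a zip and a reduce pattern over a vector that is block-distributed across several devices.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: split the index range [0, n) into `devices`
// contiguous blocks, mimicking a block distribution across GPUs.
std::vector<std::pair<std::size_t, std::size_t>>
block_ranges(std::size_t n, std::size_t devices) {
    if (devices == 0) devices = 1;  // avoid division by zero
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::size_t base = n / devices;
    const std::size_t extra = n % devices;
    std::size_t begin = 0;
    for (std::size_t d = 0; d < devices; ++d) {
        // The first `extra` blocks each take one additional element.
        const std::size_t len = base + (d < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + len);
        begin += len;
    }
    return ranges;
}

// Zip pattern: combine two equally sized vectors element by element.
// Each simulated device processes only its own block.
template <typename T, typename F>
std::vector<T> zip_skeleton(const std::vector<T>& a, const std::vector<T>& b,
                            F f, std::size_t devices) {
    assert(a.size() == b.size());
    std::vector<T> out(a.size());
    for (const auto& r : block_ranges(a.size(), devices))
        for (std::size_t i = r.first; i < r.second; ++i)
            out[i] = f(a[i], b[i]);
    return out;
}

// Reduce pattern: each simulated device reduces its block to a partial
// result; the partial results are then combined sequentially.
template <typename T, typename F>
T reduce_skeleton(const std::vector<T>& v, F f, T identity,
                  std::size_t devices) {
    T result = identity;
    for (const auto& r : block_ranges(v.size(), devices)) {
        T partial = identity;
        for (std::size_t i = r.first; i < r.second; ++i)
            partial = f(partial, v[i]);
        result = f(result, partial);
    }
    return result;
}

// Dot product expressed as a composition of the two patterns above.
float dot_product(const std::vector<float>& a, const std::vector<float>& b,
                  std::size_t devices) {
    const std::vector<float> products =
        zip_skeleton(a, b, std::multiplies<float>{}, devices);
    return reduce_skeleton(products, std::plus<float>{}, 0.0f, devices);
}
```

In this sketch the caller passes only the operator and the number of devices; how the data is partitioned and how partial results are combined stays hidden inside the pattern. That separation is the kind of convenience the abstract attributes to skeletons and automatic data distribution, although a real implementation would run the per-block loops as GPU kernels and move data between device memories.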