Towards High-Level Programming for Systems with Many Cores

  • Conference paper
Perspectives of System Informatics (PSI 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8974)

Abstract

Application development for modern high-performance systems with many cores, i.e., comprising multiple Graphics Processing Units (GPUs) and multi-core CPUs, currently exploits low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs. In this paper, we advocate a high-level programming approach for such systems, which relies on the following two main principles: (a) the model is based on the current OpenCL standard, such that programs remain portable across various many-core systems, independently of the vendor, and all low-level code optimizations can be applied; (b) the model extends OpenCL with three high-level features which simplify many-core programming and are automatically translated by the system into OpenCL code. The high-level features of our programming model are as follows: (1) memory management is simplified and automated using parallel container data types (vectors and matrices); (2) a data (re)distribution mechanism supports data partitioning and generates automatic data movements between multiple GPUs; (3) computations are precisely and concisely expressed using parallel algorithmic patterns (skeletons). The well-defined skeletons allow for semantics-preserving transformations of SkelCL programs which can be applied in the process of program development, as well as in the compilation and optimization phase. We demonstrate how our programming model and its implementation are used to express several parallel applications, and we report first experimental results on evaluating our approach in terms of program size and target performance.
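The three features named in the abstract (container data types, data distribution, and skeletons) can be illustrated with a minimal sketch in plain C++. This is not the SkelCL API and generates no OpenCL code: the names `zip_skeleton`, `reduce_skeleton`, `block_distribute` and `dot_product` are hypothetical, `std::vector` stands in for a parallel container, and each contiguous chunk stands in for the part of the data that one GPU would hold. The example computes a dot product by zipping two vectors with multiplication and reducing the result with addition, chunk by chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical "zip" skeleton: combines two equally sized vectors element-wise.
template <typename T, typename F>
std::vector<T> zip_skeleton(const std::vector<T>& a, const std::vector<T>& b, F f) {
    assert(a.size() == b.size());
    std::vector<T> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = f(a[i], b[i]);
    }
    return out;
}

// Hypothetical "reduce" skeleton: folds a vector with a binary operator,
// starting from the operator's identity element.
template <typename T, typename F>
T reduce_skeleton(const std::vector<T>& v, T identity, F f) {
    return std::accumulate(v.begin(), v.end(), identity, f);
}

// Hypothetical block distribution: splits a vector into `parts` contiguous
// chunks whose sizes differ by at most one element. In a multi-GPU setting
// each chunk would live on one device; here the chunks are ordinary vectors.
template <typename T>
std::vector<std::vector<T>> block_distribute(const std::vector<T>& v, std::size_t parts) {
    assert(parts > 0);
    std::vector<std::vector<T>> chunks(parts);
    const std::size_t base = v.size() / parts;
    const std::size_t extra = v.size() % parts;
    std::size_t pos = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t len = base + (p < extra ? 1 : 0);
        chunks[p].assign(v.begin() + pos, v.begin() + pos + len);
        pos += len;
    }
    return chunks;
}

// Dot product expressed with the two skeletons above. Each chunk is processed
// independently (sequentially in this sketch), then the partial sums are combined.
float dot_product(const std::vector<float>& a, const std::vector<float>& b,
                  std::size_t parts) {
    assert(a.size() == b.size());
    const auto as = block_distribute(a, parts);
    const auto bs = block_distribute(b, parts);
    float total = 0.0f;
    for (std::size_t p = 0; p < parts; ++p) {
        const auto products = zip_skeleton(as[p], bs[p], std::multiplies<float>());
        total += reduce_skeleton(products, 0.0f, std::plus<float>());
    }
    return total;
}
```

Because the computation is written only in terms of zip and reduce, the per-chunk loop could in principle be replaced by one task per device without changing `dot_product`'s result, provided the reduction operator is associative; that separation of the pattern from its execution is the point the abstract makes about skeletons.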

Acknowledgments

This work is partially supported by the OFERTIE (FP7) and MONICA projects. We would like to thank the anonymous reviewers for their valuable comments, as well as NVIDIA for their generous hardware donation used in our experiments.

Correspondence to Sergei Gorlatch.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gorlatch, S., Steuwer, M. (2015). Towards High-Level Programming for Systems with Many Cores. In: Voronkov, A., Virbitskaite, I. (eds) Perspectives of System Informatics. PSI 2014. Lecture Notes in Computer Science, vol 8974. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46823-4_10

  • DOI: https://doi.org/10.1007/978-3-662-46823-4_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46822-7

  • Online ISBN: 978-3-662-46823-4

  • eBook Packages: Computer Science, Computer Science (R0)
