Abstract
The computer industry is at a major inflection point in its hardware roadmap due to the end of a decades-long trend of exponentially increasing clock frequencies. Instead, future computer systems are expected to be built using homogeneous and heterogeneous many-core processors with tens to hundreds of cores per chip, and complex hardware designs that address the challenges of concurrency, energy efficiency, and resiliency. Unlike previous generations of hardware evolution, this shift towards many-core computing will have a profound impact on software. These software challenges are further compounded by the need to enable parallelism in workloads and application domains that did not previously have to contend with multiprocessor parallelism. A recent trend in mainstream desktop systems is the use of graphics processing units (GPUs) to obtain order-of-magnitude performance improvements relative to general-purpose CPUs. Unfortunately, hybrid programming models that support multithreaded execution on CPUs in parallel with CUDA execution on GPUs have proved too complex for mainstream programmers and domain experts, especially when targeting platforms with multiple CPU cores and multiple GPU devices.
In this paper, we extend past work on Intel's Concurrent Collections (CnC) programming model to address the hybrid programming challenge with a model called CnC-CUDA. CnC is a declarative and implicitly parallel coordination language that supports flexible combinations of task and data parallelism while retaining determinism. CnC computations are built from steps related by data and control dependence edges, which together form a CnC graph. The CnC-CUDA extensions in this paper include the definition of multithreaded steps for execution on GPUs, and the automatic generation of data and control flow between CPU steps and GPU steps. Experimental results show that this approach can yield significant performance benefits with both GPU execution and hybrid CPU/GPU execution.
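The CnC concepts the abstract relies on can be illustrated in plain Java: a tag collection prescribes step instances, steps read and write single-assignment item collections, and determinism holds regardless of the order in which prescribed steps run. The sketch below is an illustrative miniature under those assumptions only; the class and collection names are hypothetical and are not the CnC-HJ or CnC-CUDA API.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical miniature of the CnC model: a tag collection prescribes
// step instances; steps read and write single-assignment item collections.
public class MiniCnC {
    // Item collections: tag -> value, each key written at most once
    // (dynamic single assignment).
    static final Map<Integer, Double> items = new ConcurrentHashMap<>();
    static final Map<Integer, Double> results = new ConcurrentHashMap<>();
    // Tag collection: each tag prescribes one instance of the step below.
    static final Queue<Integer> tags = new ConcurrentLinkedQueue<>();

    // A "step": a pure function of the items it reads. Determinism follows
    // because items are single-assignment and steps have no other effects.
    static void squareStep(int tag) {
        double x = items.get(tag);   // in real CnC, a get on a missing item
        results.put(tag, x * x);     // would suspend the step until it is put
    }

    public static void main(String[] args) {
        for (int t = 0; t < 4; t++) {
            items.put(t, (double) t);
            tags.add(t);
        }
        // A CnC runtime would schedule prescribed step instances in parallel;
        // the execution order does not affect the result.
        tags.parallelStream().forEach(MiniCnC::squareStep);
        System.out.println(results.get(3)); // prints 9.0
    }
}
```

In CnC-CUDA, a step like `squareStep` would instead be declared as a multithreaded GPU step, with the generated runtime moving its input and output items between CPU and GPU memory.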
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grossman, M., Simion Sbîrlea, A., Budimlić, Z., Sarkar, V. (2011). CnC-CUDA: Declarative Programming for GPUs. In: Cooper, K., Mellor-Crummey, J., Sarkar, V. (eds) Languages and Compilers for Parallel Computing. LCPC 2010. Lecture Notes in Computer Science, vol 6548. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19595-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19594-5
Online ISBN: 978-3-642-19595-2