Separate Compilation in a Language-Integrated Heterogeneous Environment

Murphy, Mike; Marathe, Jaydeep; Bharambe, Girish; Lee, Sean; Grover, Vinod

doi:10.1007/978-3-319-09967-5_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8664)

Included in the following conference series:

International Workshop on Languages and Compilers for Parallel Computing

Abstract

Heterogeneous computing platforms have become more common in recent years. Effective programming languages and tools will play a key role in unlocking the performance potential of these systems. In this paper, we present the design and implementation of separate compilation and linking support for the CUDA programming platform. CUDA provides a language-integrated environment for writing parallel programs that target hybrid systems with CPUs and GPUs (Graphics Processing Units). We present a novel linker that allows linking of multiple subsets of GPU executable code. We also describe a link-time optimization of GPU shared memory layout. Finally, we measure the impact of separate compilation with real-world benchmarks and present our conclusions.
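To make the idea of linking separately compiled device code concrete, the following is a minimal sketch in Python of the symbol-resolution step that any linker performs. It is a toy model and not the linker described in the paper: the `DeviceObject` class, the `link_device_objects` function, and the file and symbol names are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceObject:
    """Hypothetical stand-in for the device-code part of one object file."""
    name: str
    defines: set = field(default_factory=set)     # symbols defined here
    references: set = field(default_factory=set)  # symbols used here


def link_device_objects(objects):
    """Resolve references across objects.

    Returns a dict mapping each symbol to the name of the object defining it.
    Raises ValueError on duplicate or missing definitions.
    """
    symbol_table = {}
    # Pass 1: collect definitions and reject duplicates.
    for obj in objects:
        for sym in sorted(obj.defines):
            if sym in symbol_table:
                raise ValueError(
                    f"duplicate definition of {sym!r} in "
                    f"{symbol_table[sym]} and {obj.name}"
                )
            symbol_table[sym] = obj.name
    # Pass 2: every reference must resolve to some definition.
    for obj in objects:
        missing = sorted(obj.references - set(symbol_table))
        if missing:
            raise ValueError(f"{obj.name}: undefined device symbols {missing}")
    return symbol_table


# One object defines a device function, another defines a kernel that calls it.
a = DeviceObject("a.o", defines={"square"})
b = DeviceObject("b.o", defines={"kernel"}, references={"square"})
table = link_device_objects([a, b])
```

Without separate compilation, the reference from `b.o` to `square` could not be left unresolved at compile time, so both definitions would have to live in the same translation unit.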

Notes

1. Original objects are required during host linking, since they may define host entities (e.g., functions) with external linkage that are referenced from host code in other objects.
2. In this case, the device linker ignores any object files without device code.
3. Names of entities with static linkage are mangled with a unique translation-unit-specific prefix.
4. Thus, it may reduce the occupancy [7] of the GPU.
5. Patching object files during linking may also complicate “rule-based” build environments. These define rules to produce a “result” given one or more inputs, and the input entities are not expected to be modified in the user-provided implementation of the rule. Also, the object files may not be modifiable, e.g., because of file permissions or because they contain objects that are part of multiple programs, such as a system-provided library.
6. Some exceptions are template instantiations and inline function definitions. These are ignored for module-id calculation.
7. In uncommon cases where no such function or variable is available, the current time value is used along with the file name and path.
8. Disallowing device function cloning reduces overall program size. We found that this significantly reduces overall compile time for a few large files in our repository (up to 7x reduction).
9. Shared memory variables can have extern specifiers and be in different translation units.
10. 90% of Lawa’s compilation time is spent on device code, so any improvements are predominantly due to changes on the device side.
11. No run time is reported for Thrust, since we don’t have performance tests for this benchmark.
12. Only kernels with non-zero shared memory sizes are reported here.
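
The notes on static-linkage name mangling and module-id calculation can be illustrated with a small Python sketch. The notes do not give the actual hashing scheme or prefix format, so everything here is an assumption: the `module_id` and `mangle_static` functions, the use of SHA-1, the `__tu_` prefix, and the file and symbol names are hypothetical.

```python
import hashlib
import time


def module_id(path, external_symbol=None, now=None):
    """Hypothetical per-translation-unit identifier.

    Assumption: the id hashes the file path together with the name of an
    entity defined in the file; when no such entity is available, the
    current time is used with the path instead.
    """
    if external_symbol is not None:
        salt = external_symbol
    else:
        salt = str(time.time() if now is None else now)
    return hashlib.sha1(f"{path}:{salt}".encode()).hexdigest()[:8]


def mangle_static(name, tu_id):
    """Prefix a static-linkage name with its translation unit's id."""
    return f"__tu_{tu_id}_{name}"


# Two translation units that each define a static function named "helper".
id_a = module_id("src/a.cu", external_symbol="kernel_a")
id_b = module_id("src/b.cu", external_symbol="kernel_b")
helper_a = mangle_static("helper", id_a)
helper_b = mangle_static("helper", id_b)
```

Because the prefix differs per translation unit, the two `helper` definitions get distinct names and do not collide when device code from both files is linked together. An id derived from a defined symbol is stable across rebuilds, whereas the time-based fallback is not, which may be why the notes describe it as reserved for uncommon cases.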

References

1. Buck, I.: GPU computing: programming a massively parallel processor. In: International Symposium on Code Generation and Optimization (2007)
2. Levine, J.R.: Linkers and Loaders. Morgan Kaufmann, San Francisco, CA (1999)
3. Presser, L., White, J.R.: Linkers and loaders. ACM Comput. Surv. 4(3), 149–167 (1972)
4. Taylor, I.L.: Part 1 of 20 on linkers (2007). http://www.airs.com/blog/archives/38
5. ELF specification: System V Application Binary Interface (2010). http://www.sco.com/developers/gabi/latest/contents.html
6. Khronos OpenCL Working Group: OpenCL Specification version 1.2 (2011). http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf
7. NVIDIA Corporation: NVIDIA CUDA programming guide (2012)
8. The C++ Standards Committee ISO/IEC JTC1/SC22/WG21: 14882:2011(E), Programming Languages C++ (2011)
9. Top500 Project: TOP500 Supercomputer Sites (2012). http://i.top500.org/overtime
10. Clavel, M., Durán, F., Eker, S., Lincoln, P., Martí-Oliet, N., Meseguer, J., Talcott, C.: Introduction. In: Clavel, M., Durán, F., Eker, S., Lincoln, P., Martí-Oliet, N., Meseguer, J., Talcott, C. (eds.) All About Maude - A High-Performance Logical Framework. LNCS, vol. 4350, pp. 1–28. Springer, Heidelberg (2007)
11. OpenACC Corporation: The OpenACC Application Programming Interface (2012). http://www.openacc.org/sites/default/files/OpenACC.1.0_0.pdf
12. The Portland Group: PGI CUDA Fortran Compiler (2012). http://www.pgroup.com/resources/cudafortran.htm
13. The Portland Group: PGI Accelerator Compilers with OpenACC Directives (2012). http://www.pgroup.com/resources/accel.htm
14. Microsoft Corporation: C++ AMP: Language and Programming Model (2012). http://msdn.microsoft.com/en-us/library/hh265137.aspx
15. Intel Corporation: Intel Array Building Blocks (2012). http://software.intel.com/en-us/articles/intel-array-building-blocks
16. Chow, A.C.: Programming the Cell Broadband Engine (2012). http://www.gamasutra.com/view/feature/130278/programming_the_cell_broadband_.php
17. LLVM: LLVM gold plugin (2013). http://llvm.org/docs/GoldPlugin.html

Author information

Authors and Affiliations

NVIDIA Corporation, Santa Clara, USA
Mike Murphy, Jaydeep Marathe, Sean Lee & Vinod Grover
NVIDIA Corporation, Pune, India
Girish Bharambe

Corresponding author

Correspondence to Mike Murphy.

Editor information

Editors and Affiliations

Silicon Valley, Qualcomm Research, San Jose, California, USA
Călin Cașcaval
Silicon Valley, Qualcomm Research, San Jose, California, USA
Pablo Montesinos

About this paper

Cite this paper

Murphy, M., Marathe, J., Bharambe, G., Lee, S., Grover, V. (2014). Separate Compilation in a Language-Integrated Heterogeneous Environment. In: Cașcaval, C., Montesinos, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2013. Lecture Notes in Computer Science, vol 8664. Springer, Cham. https://doi.org/10.1007/978-3-319-09967-5_7

DOI: https://doi.org/10.1007/978-3-319-09967-5_7
Published: 01 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09966-8
Online ISBN: 978-3-319-09967-5
eBook Packages: Computer Science, Computer Science (R0)
