Separate Compilation in a Language-Integrated Heterogeneous Environment

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8664)

Abstract

Heterogeneous computing platforms have become more common in recent years. Effective programming languages and tools will play a key role in unlocking the performance potential of these systems. In this paper, we present the design and implementation of separate compilation and linking support for the CUDA programming platform. CUDA provides a language-integrated environment for writing parallel programs targeting hybrid systems with CPUs and GPUs (Graphics Processing Units). We present a novel linker that allows linking of multiple subsets of GPU executable code. We also describe a link-time optimization of GPU shared memory layout. Finally, we measure the impact of separate compilation on real-world benchmarks and present our conclusions.

Notes

  1. Original objects are required during host linking, since they may define host entities (e.g., functions) with external linkage that are referenced from host code in other objects.

  2. In this case, the device linker ignores any object files without device code.

  3. Names of entities with static linkage are mangled with a unique translation-unit-specific prefix.

  4. Thus, it may reduce the occupancy [7] of the GPU.

  5. Patching object files during linking may also complicate “rule-based” build environments. These define rules to produce a “result” given one or more inputs, and the input entities are not expected to be modified in the user-provided implementation of the rule. Also, the object files may not be modifiable, e.g., because of file permissions or because they contain objects that are part of multiple programs, such as a system-provided library.

  6. Some exceptions are template instantiations and inline function definitions. These are ignored for module-id calculation.

  7. In uncommon cases where no such function or variable is available, the current time value is used along with the file name and path.

  8. Disallowing device function cloning reduces overall program size. We found that this significantly reduces overall compile time for a few large files in our repository (up to 7x reduction).

  9. Shared memory variables can have extern specifiers and be in different translation units.

  10. 90% of Lawa’s compilation time is spent on device code, so any improvements are predominantly due to changes on the device side.

  11. No run time is reported for Thrust, since we don’t have performance tests for this benchmark.

  12. Only kernels with non-zero shared memory sizes are reported here.
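
Notes 3, 6, and 7 together suggest how a per-translation-unit prefix might be derived. The following is a minimal C++ sketch under stated assumptions, not the scheme the toolchain actually uses: the function names `fnv1a`, `module_id`, and `mangle_static`, the FNV-1a hash, the `|` separator, and the `tu<id>_` prefix format are all hypothetical. It assumes the module-id is seeded by the name of a function or variable with external linkage defined in the translation unit (with the exclusions of note 6 already applied by the caller), and falls back to file path plus time as in note 7.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// FNV-1a is used only to obtain a deterministic hash for this sketch;
// the hash used by the real tools is not specified in the text.
std::uint64_t fnv1a(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // 64-bit FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                  // 64-bit FNV prime
    }
    return h;
}

// Hypothetical module-id computation.
// external_symbol: name of an external-linkage function or variable defined
// in this translation unit, or an empty string if none qualifies.
// When it is empty, the file path and a time value are hashed instead.
std::string module_id(const std::string& file_path,
                      const std::string& external_symbol,
                      std::uint64_t timestamp) {
    const std::string seed = !external_symbol.empty()
        ? external_symbol
        : file_path + "|" + std::to_string(timestamp);
    std::ostringstream out;
    out << std::hex << fnv1a(seed);
    return out.str();
}

// Hypothetical mangling of a static-linkage name: the module-id becomes a
// prefix, so equal static names in different translation units stay distinct.
std::string mangle_static(const std::string& id, const std::string& name) {
    return "tu" + id + "_" + name;
}
```

With this shape, two files that each define a static `helper` produce different linked names as long as their module-ids differ, and the fallback path yields a different id on each build because the time value changes.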

References

  1. Buck, I.: GPU computing: programming a massively parallel processor. In: International Symposium on Code Generation and Optimization (2007)

  2. Levine, J.R.: Linkers and Loaders. Morgan Kaufmann, San Francisco, CA (1999)

  3. Presser, L., White, J.R.: Linkers and loaders. ACM Comput. Surv. 4(3), 149–167 (1972)

  4. Taylor, I.L.: Part 1 of 20 on linkers (2007). http://www.airs.com/blog/archives/38

  5. ELF specification: System V Application Binary Interface (2010). http://www.sco.com/developers/gabi/latest/contents.html

  6. Khronos OpenCL Working Group: OpenCL Specification version 1.2 (2011). http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf

  7. NVIDIA Corporation: NVIDIA CUDA programming guide (2012)

  8. The C++ Standards Committee ISO/IEC JTC1/SC22/WG21: 14882:2011(E), Programming Languages C++ (2011)

  9. Top500 Project: TOP500 Supercomputer Sites (2012). http://i.top500.org/overtime

  10. Clavel, M., Durán, F., Eker, S., Lincoln, P., Martí-Oliet, N., Meseguer, J., Talcott, C.: Introduction. In: Clavel, M., Durán, F., Eker, S., Lincoln, P., Martí-Oliet, N., Meseguer, J., Talcott, C. (eds.) All About Maude - A High-Performance Logical Framework. LNCS, vol. 4350, pp. 1–28. Springer, Heidelberg (2007)

  11. OpenACC Corporation: The OpenACC Application Programming Interface (2012). http://www.openacc.org/sites/default/files/OpenACC.1.0_0.pdf

  12. The Portland Group: PGI CUDA Fortran Compiler (2012). http://www.pgroup.com/resources/cudafortran.htm

  13. The Portland Group: PGI Accelerator Compilers with OpenACC Directives (2012). http://www.pgroup.com/resources/accel.htm

  14. Microsoft Corporation: C++ AMP: Language and Programming Model (2012). http://msdn.microsoft.com/en-us/library/hh265137.aspx

  15. Intel Corporation: Intel Array Building Blocks (2012). http://software.intel.com/en-us/articles/intel-array-building-blocks

  16. Chow, A.C.: Programming the Cell Broadband Engine (2012). http://www.gamasutra.com/view/feature/130278/programming_the_cell_broadband_.php

  17. LLVM: LLVM gold plugin (2013). http://llvm.org/docs/GoldPlugin.html

Author information

Corresponding author

Correspondence to Mike Murphy.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Murphy, M., Marathe, J., Bharambe, G., Lee, S., Grover, V. (2014). Separate Compilation in a Language-Integrated Heterogeneous Environment. In: Cașcaval, C., Montesinos, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2013. Lecture Notes in Computer Science, vol 8664. Springer, Cham. https://doi.org/10.1007/978-3-319-09967-5_7

  • DOI: https://doi.org/10.1007/978-3-319-09967-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09966-8

  • Online ISBN: 978-3-319-09967-5

  • eBook Packages: Computer Science, Computer Science (R0)
