Abstract
OpenMP has supported offloading computations to accelerators such as GPUs since version 4.0. A crucial aspect of OpenMP offloading is managing the accelerator data environment. Currently, this must be explicitly programmed by users, which is non-trivial and often results in suboptimal performance. The unified memory feature available in recent GPU architectures introduces another option, implicit management. However, our experiments show that it incurs several performance issues, especially under GPU memory oversubscription. In this paper, we propose a collaborative compiler and runtime approach to managing OpenMP GPU data under unified memory. In our framework, the compiler performs data reuse analysis to assist runtime data management, and the runtime combines static and dynamic information to make optimized data management decisions. We have implemented the proposed technology in the LLVM framework. Our evaluation shows that this method achieves significant performance improvements for OpenMP GPU offloading.
Acknowledgement
This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Li, L., Finkel, H., Kong, M., Chapman, B. (2018). Manage OpenMP GPU Data Environment Under Unified Address Space. In: de Supinski, B., Valero-Lara, P., Martorell, X., Mateo Bellido, S., Labarta, J. (eds) Evolving OpenMP for Evolving Architectures. IWOMP 2018. Lecture Notes in Computer Science(), vol 11128. Springer, Cham. https://doi.org/10.1007/978-3-319-98521-3_5
Print ISBN: 978-3-319-98520-6
Online ISBN: 978-3-319-98521-3