Abstract
The Compute Unified Device Architecture (CUDA) programming environment from NVIDIA is a milestone toward making many-core GPU programming more accessible. However, CUDA still poses many challenges for programmers. One is the need to manage GPU device memory, and the data transfers between host memory and device memory, explicitly. In this study, source-to-source compilation and runtime-library techniques are used to implement an experimental programming system based on CUDA, called memCUDA, which automatically maps GPU device memory to host memory. With a few pragma directives, the programmer can use host memory directly in CUDA kernel functions, while the tedious and error-prone data transfers and device-memory management are shielded from the programmer. Performance is also improved with several near-optimal techniques. Experimental results show that memCUDA programs achieve performance comparable to well-optimized CUDA programs, with more compact source code.
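To make concrete what the abstract calls "tedious and error-prone data transfer and device memory management", the sketch below shows a minimal plain-CUDA vector addition. The marked host-side calls (allocation, host-to-device and device-to-host copies, deallocation) are exactly the boilerplate a memCUDA-style system is described as generating automatically from pragma annotations; the paper's actual pragma syntax is not reproduced here, so this is the baseline CUDA code only.

```cuda
// Plain CUDA vector add. The bracketed host-side section is the explicit
// device-memory bookkeeping that a source-to-source tool such as memCUDA
// is meant to generate for the programmer.
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    // --- Boilerplate that memCUDA's pragmas are designed to shield ----
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);                              // explicit device allocation
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);  // explicit transfer in
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);  // explicit transfer out
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);          // explicit deallocation
    // ------------------------------------------------------------------

    printf("c[42] = %f\n", h_c[42]);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

With the memCUDA approach described in the abstract, the kernel would instead be invoked directly on `h_a`, `h_b`, and `h_c`, and the bracketed section would be produced by the compiler.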
This work is supported by the National 973 Basic Research Program of China under grant No. 2007CB310900, the Ministry of Education-Intel Information Technology special research fund (No. MOE-INTEL-10-05), and the National Natural Science Foundation of China (No. 60803006).
Copyright information
© 2010 IFIP International Federation for Information Processing
Jin, H., Li, B., Zheng, R., Zhang, Q., Ao, W. (2010). memCUDA: Map Device Memory to Host Memory on GPGPU Platform. In: Ding, C., Shao, Z., Zheng, R. (eds) Network and Parallel Computing. NPC 2010. Lecture Notes in Computer Science, vol 6289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15672-4_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15671-7
Online ISBN: 978-3-642-15672-4