Abstract
The Compute Unified Device Architecture (CUDA) programming environment from NVIDIA is a milestone toward making many-core GPU programming more accessible. However, CUDA still poses many challenges for programmers. One is the need to manage GPU device memory, and the data transfers between host memory and device memory, explicitly. In this study, source-to-source compilation and runtime-library techniques are used to implement an experimental programming system based on CUDA, called memCUDA, which automatically maps GPU device memory to host memory. With a few pragma directives, the programmer can use host memory directly in CUDA kernel functions, while the tedious and error-prone data transfers and device-memory management are shielded from the programmer. Performance is also improved with several near-optimal techniques. Experimental results show that memCUDA programs achieve performance comparable to well-optimized CUDA programs, with more compact source code.
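To make concrete what the abstract calls "tedious and error-prone data transfer and device memory management", the sketch below shows a minimal plain-CUDA vector addition. The marked host-side calls (allocation, host-to-device and device-to-host copies, deallocation) are exactly the boilerplate a memCUDA-style system is described as generating automatically from pragma annotations; the paper's actual pragma syntax is not reproduced here, so this is the baseline CUDA code only.

```cuda
// Plain CUDA vector add. The bracketed host-side section is the explicit
// device-memory bookkeeping that a source-to-source tool such as memCUDA
// is meant to generate for the programmer.
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = (float)i; h_b[i] = 2.0f * i; }

    // --- Boilerplate that memCUDA's pragmas are designed to shield ----
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes);                              // explicit device allocation
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);  // explicit transfer in
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);  // explicit transfer out
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);          // explicit deallocation
    // ------------------------------------------------------------------

    printf("c[42] = %f\n", h_c[42]);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

With the memCUDA approach described in the abstract, the kernel would instead be invoked directly on `h_a`, `h_b`, and `h_c`, and the bracketed section would be produced by the compiler.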
This work is supported by the National 973 Basic Research Program of China under grant No. 2007CB310900, the Ministry of Education-Intel Information Technology special research fund (No. MOE-INTEL-10-05), and the National Natural Science Foundation of China (No. 60803006).
Copyright information
© 2010 IFIP International Federation for Information Processing
Jin, H., Li, B., Zheng, R., Zhang, Q., Ao, W. (2010). memCUDA: Map Device Memory to Host Memory on GPGPU Platform. In: Ding, C., Shao, Z., Zheng, R. (eds) Network and Parallel Computing. NPC 2010. Lecture Notes in Computer Science, vol 6289. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15672-4_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15671-7
Online ISBN: 978-3-642-15672-4