Abstract
Because GPUs offer powerful floating-point capacity, heterogeneous parallel systems that combine general-purpose CPUs with GPUs have become a focus of research in high performance computing (HPC). However, GPU programming is complex, so porting the large body of existing scientific computing applications to heterogeneous parallel systems remains a major challenge. The OpenMP programming interface is widely adopted on multi-core CPUs in scientific computing. To reuse existing OpenMP applications and reduce the cost of porting, we extend OpenMP with a group of compiler directives that explicitly divide tasks between the CPU and the GPU and map time-consuming computing fragments to run on the GPU, which greatly simplifies porting. We have designed and implemented MPtoStream, a compiler for the extended OpenMP that targets AMD's stream processing GPUs. Our experimental results show that programs written with the extended directives require less than 11% modification relative to the original OpenMP programs and achieve speedups ranging from 3.1 to 17.3 on a heterogeneous system with an Intel Xeon E5405 CPU and an AMD FireStream 9250 GPU, compared with execution on the Xeon CPU alone.
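To make the starting point concrete, the following is a minimal sketch of the kind of existing OpenMP loop that such directives would annotate. It uses only standard OpenMP; the function name `saxpy` and the kernel itself are illustrative assumptions, not taken from the paper. The abstract does not give the syntax of the extended directives, so the place where one would go is marked only by a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative kernel (an assumption, not from the paper):
 * y[i] = a * x[i] + y[i], a typical data-parallel loop in scientific code. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* Placeholder: an extended directive marking this loop as a
     * time-consuming fragment to run on the GPU would be added here.
     * Its syntax is not given in the abstract, so none is shown. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result, which is the property that makes directive-based porting incremental: the original source stays valid while annotations are added.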
Cite this article
Yang, X., Tang, T., Wang, G. et al. MPtostream: an OpenMP compiler for CPU-GPU heterogeneous parallel systems. Sci. China Inf. Sci. 55, 1961–1971 (2012). https://doi.org/10.1007/s11432-011-4342-4
DOI: https://doi.org/10.1007/s11432-011-4342-4