Science China Information Sciences, Volume 55, Issue 9, pp 1961–1971

MPtostream: an OpenMP compiler for CPU-GPU heterogeneous parallel systems

  • XueJun Yang
  • Tao Tang
  • GuiBin Wang
  • Jia Jia
  • XinHai Xu
Research Paper

Abstract

In light of GPUs' powerful floating-point capability, heterogeneous parallel systems that combine general-purpose CPUs with GPUs have become a highlight in high performance computing (HPC) research. However, owing to the complexity of GPU programming, porting the large body of existing scientific computing applications to such heterogeneous systems remains a major challenge. OpenMP is a programming interface widely adopted on multi-core CPUs in scientific computing. To reuse existing OpenMP applications effectively and reduce the porting cost, we extend OpenMP with a group of compiler directives that explicitly divide tasks between the CPU and the GPU and map time-consuming computing fragments onto the GPU, dramatically simplifying the porting process. We have designed and implemented MPtostream, a compiler for the extended OpenMP targeting AMD's stream processing GPUs. Our experimental results show that programming with the extended directives requires modifying less than 11% of the original OpenMP code and achieves speedups ranging from 3.1 to 17.3 on a heterogeneous system combining an Intel Xeon E5405 CPU with an AMD FireStream 9250 GPU, relative to execution on the Xeon CPU alone.
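To make the directive-based approach concrete, here is a minimal sketch in C. The abstract does not give MPtostream's actual directive syntax, so the `stream` pragma and its `in`/`out` clauses below are hypothetical placeholders that merely illustrate the style of extension the paper describes: standard OpenMP work stays on the multi-core CPU, while an annotated fragment is mapped to the GPU with its data movement made explicit.

```c
/* Illustrative sketch only: the "omp stream" directive and its
 * in/out clauses are assumptions, not MPtostream's real syntax.
 * A standard C compiler ignores unrecognized pragmas, so this
 * compiles and runs sequentially on a CPU-only system. */
#include <stdio.h>
#include <stdlib.h>

#define N 1048576

int main(void) {
    float *a = malloc(N * sizeof *a);
    float *b = malloc(N * sizeof *b);
    float *c = malloc(N * sizeof *c);
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    /* Standard OpenMP: this loop is parallelized across CPU cores. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    /* Hypothetical extended directive: mark the time-consuming
     * fragment so the compiler maps it onto the GPU, with the
     * input and output arrays stated explicitly. */
    #pragma omp stream in(a[0:N], b[0:N]) out(c[0:N])
    for (int i = 0; i < N; i++)
        c[i] = a[i] * b[i] + c[i];

    printf("c[42] = %f\n", c[42]);
    free(a); free(b); free(c);
    return 0;
}
```

Because the annotations are pragmas rather than new language constructs, an annotated program remains a valid OpenMP program, which is precisely what keeps the reported porting effort below 11% of the original code.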

Keywords

GPGPU, stream, OpenMP, compiler

Copyright information

© Science China Press and Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • XueJun Yang¹
  • Tao Tang¹
  • GuiBin Wang¹
  • Jia Jia¹
  • XinHai Xu¹

  1. National Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China
