Griffon – GPU Programming APIs for Scientific and General Purpose Computing

  • Pisit Makpaisit
  • Worawan Marurngsith
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 91)


Applications can be accelerated by up to hundreds of times by offloading computation from the CPU to graphics processing units (GPUs). This technique is called general-purpose computation on graphics processing units (GPGPU). Recent research on accelerating various applications with NVIDIA's programming model, the Compute Unified Device Architecture (CUDA), has shown significant performance improvements. However, writing an efficient CUDA program requires an in-depth understanding of the GPU architecture in order to develop a suitable data-parallel strategy and to express it in a low-level style of code. Thus, CUDA programming is still considered complex and error-prone. This paper proposes a new set of application programming interfaces (APIs), called Griffon, and its compiler framework for the automatic translation of C programs to CUDA-based programs. The Griffon APIs allow programmers to exploit the performance of multicore machines using OpenMP and to offload computations to GPUs using Griffon directives. The Griffon compiler framework uses a new graph algorithm to exploit data locality efficiently. Experimental results for four workloads on a 16-core NVIDIA GeForce 8400M GS show that Griffon-based programs run up to 89 times faster than their sequential implementations.


Keywords: GPU · Accelerating Computing · Automatic Translation · CUDA · Parallel Programming




Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Pisit Makpaisit¹
  • Worawan Marurngsith¹
  1. Department of Computer Science, Faculty of Science and Technology, Thammasat University, Pathum Thani, Thailand
