Accelerating S3D: A GPGPU Case Study

  • Kyle Spafford
  • Jeremy Meredith
  • Jeffrey Vetter
  • Jacqueline Chen
  • Ray Grout
  • Ramanan Sankaran
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6043)


The graphics processor (GPU) has evolved into an appealing choice for high-performance computing due to its superior memory bandwidth, raw processing power, and flexible programmability. As such, GPUs represent an excellent platform for accelerating scientific applications. This paper explores a methodology for identifying applications that present significant potential for acceleration. In particular, this work focuses on experiences from accelerating S3D, a high-fidelity turbulent reacting flow solver. The acceleration process is examined from a holistic viewpoint and includes details that arise from different phases of the conversion. This paper also addresses the issue of floating-point accuracy and precision on the GPU, a topic of immense importance to scientific computing. Several performance experiments are conducted, and results are presented from the NVIDIA Tesla C1060 GPU. We generalize from our experiences to provide a roadmap for deploying existing scientific applications on heterogeneous GPU platforms.


Keywords: Single Precision; Graphic Processor; Memory Access Pattern; Precision Version; Timestep Size
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Kyle Spafford (1)
  • Jeremy Meredith (1)
  • Jeffrey Vetter (1)
  • Jacqueline Chen (2)
  • Ray Grout (2)
  • Ramanan Sankaran (1)

  1. Oak Ridge National Laboratory
  2. Sandia National Laboratories
