Accelerating S3D: A GPGPU Case Study
The graphics processing unit (GPU) has evolved into an appealing choice for high performance computing due to its superior memory bandwidth, raw processing power, and flexible programmability. GPUs are therefore an excellent platform for accelerating scientific applications. This paper explores a methodology for identifying applications that show significant potential for acceleration. In particular, this work focuses on experiences from accelerating S3D, a high-fidelity turbulent reacting flow solver. The acceleration process is examined holistically, including details that arise in the different phases of the conversion. This paper also addresses floating point accuracy and precision on the GPU, a topic of central importance to scientific computing. We report the results of several performance experiments on the NVIDIA Tesla C1060 GPU. We generalize from our experiences to provide a roadmap for deploying existing scientific applications on heterogeneous GPU platforms.
Keywords: Single Precision; Graphic Processor; Memory Access Pattern; Precision Version; Timestep Size
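The abstract raises floating point accuracy and precision as a concern. One way to see why is to run a small hypothetical experiment, which is not taken from the paper: add the same increment many times in single precision and again in double precision, then compare the totals.

This sketch is plain Python, not GPU code. Single-precision arithmetic is emulated by rounding every intermediate sum to IEEE 754 binary32 with the standard `struct` module. That approach assumes the platform packs `float` as IEEE 754 binary32, which holds on mainstream hardware. The helper names (`to_f32`, `accumulate_f32`, `accumulate_f64`, `relative_error`) are illustrative only.

```python
import struct

_F32 = struct.Struct("f")  # IEEE 754 binary32 packing (assumed platform format)


def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest binary32 value."""
    return _F32.unpack(_F32.pack(x))[0]


def accumulate_f32(increment: float, n: int) -> float:
    """Add `increment` n times, rounding to single precision after every add.

    Double has more than twice binary32's precision, so rounding the exact
    double sum of two binary32 values gives the correctly rounded
    single-precision result, mimicking a float32 accumulator.
    """
    inc = to_f32(increment)
    total = 0.0
    for _ in range(n):
        total = to_f32(total + inc)
    return total


def accumulate_f64(increment: float, n: int) -> float:
    """Add `increment` n times in double precision (Python's native float)."""
    total = 0.0
    for _ in range(n):
        total += increment
    return total


def relative_error(value: float, exact: float) -> float:
    """Relative deviation of `value` from `exact`."""
    return abs(value - exact) / abs(exact)


if __name__ == "__main__":
    n = 1_000_000
    exact = n / 10  # 100000.0
    single = accumulate_f32(0.1, n)
    double = accumulate_f64(0.1, n)
    print(f"single precision: {single:.4f}  rel. error {relative_error(single, exact):.2e}")
    print(f"double precision: {double:.4f}  rel. error {relative_error(double, exact):.2e}")
```

After a million additions of 0.1, the single-precision accumulator drifts well away from the exact value of 100000, while the double-precision total stays accurate to many significant digits. A time-stepping solver applies many such updates. Effects of this kind are a reason to check a single-precision port against a double-precision reference.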