On the GPU Performance of 3D Stencil Computations Implemented in OpenCL

  • Conference paper
Supercomputing (ISC 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7905)

Abstract

Aiming at a close examination of the OpenCL performance myth, we study in this paper OpenCL implementations of several representative 3D stencil computations. We find that typical optimization techniques such as array padding, plane sweeping and chunking give the OpenCL implementations performance boosts similar to those obtained in the corresponding CUDA programs. The key to good performance lies in maximizing the use of a GPU’s on-chip resources, which holds for both OpenCL and CUDA programming. In most cases, the FLOPS rates achieved on NVIDIA’s Fermi and Kepler GPUs are fully comparable between the two programming alternatives. For four typical 3D stencil computations, the OpenCL implementations are on average 9% and 2% faster than their CUDA counterparts on GTX590 and Tesla K20, respectively. At the moment, the only clear advantage of CUDA programming for stencil computations arises from CUDA’s ability to use the read-only data cache on NVIDIA’s Kepler GPUs. The skepticism about OpenCL’s GPU performance thus seems unjustified for 3D stencil computations.
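
To make the optimization vocabulary in the abstract concrete, here is a minimal CPU-side sketch in Python. It is not the paper's OpenCL code, which is not shown on this page. It assumes a 7-point stencil with hypothetical coefficients `c0` and `c1`, stores the grid as a flattened array whose row length is rounded up to a multiple of `align` elements (array padding), and visits the z planes in order, loosely mirroring plane sweeping. The names `padded_pitch`, `make_grid` and `stencil7_sweep` are illustrative assumptions.

```python
def padded_pitch(nx, align=32):
    """Round the row length up to a multiple of `align` elements (array padding)."""
    return ((nx + align - 1) // align) * align


def make_grid(nx, ny, nz, pitch, f):
    """Fill a padded, flattened grid with f(i, j, k); padding cells stay 0.0."""
    u = [0.0] * (pitch * ny * nz)
    for k in range(nz):
        for j in range(ny):
            for i in range(nx):
                u[i + pitch * (j + ny * k)] = f(i, j, k)
    return u


def stencil7_sweep(u, nx, ny, nz, pitch, c0=-6.0, c1=1.0):
    """Apply a 7-point stencil to the interior of a padded nx*ny*nz grid.

    Element (i, j, k) lives at index i + pitch * (j + ny * k). The outer loop
    walks the z planes in order, so each (i, j) column is advanced along z.
    """
    plane = pitch * ny          # number of stored elements per z plane
    out = list(u)               # boundary and padding values are copied unchanged
    for k in range(1, nz - 1):  # sweep over interior z planes
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                idx = i + pitch * j + plane * k
                out[idx] = (c0 * u[idx]
                            + c1 * (u[idx - 1] + u[idx + 1]
                                    + u[idx - pitch] + u[idx + pitch]
                                    + u[idx - plane] + u[idx + plane]))
    return out
```

With the default coefficients the sketch computes an unscaled discrete Laplacian, so a grid filled with i² + j² + k² yields 6.0 at every interior point, which makes a convenient sanity check. On a GPU, the same padded indexing would typically be used so that each row starts at an aligned address; the right `align` value there depends on the hardware and is left as an assumption here.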

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Su, H., Wu, N., Wen, M., Zhang, C., Cai, X. (2013). On the GPU Performance of 3D Stencil Computations Implemented in OpenCL. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2013. Lecture Notes in Computer Science, vol 7905. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38750-0_10

  • DOI: https://doi.org/10.1007/978-3-642-38750-0_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38749-4

  • Online ISBN: 978-3-642-38750-0

  • eBook Packages: Computer Science, Computer Science (R0)
