OpenCL as a Programming Model for GPU Clusters

  • Jungwon Kim
  • Sangmin Seo
  • Jun Lee
  • Jeongho Nah
  • Gangwon Jo
  • Jaejin Lee
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7146)

Abstract

In this paper, we propose an OpenCL framework for GPU clusters. The target cluster architecture consists of a single host node and multiple compute nodes, connected by an interconnection network such as Gigabit Ethernet or InfiniBand switches. Each compute node contains multiple GPUs, and each GPU becomes an OpenCL compute device. The host node executes the host program of an OpenCL application. Our OpenCL framework provides the illusion of a single system to the user: it allows the application to use GPUs in the compute nodes as if they were attached to the host node, and no communication API, such as the MPI library, is required in the application source. We show that the original OpenCL semantics naturally fits the GPU cluster environment and that the framework achieves both high performance and ease of programming. We implement the OpenCL framework and evaluate its performance on a GPU cluster that consists of one host node and eight compute nodes, using six OpenCL benchmark applications.
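To illustrate the programming model described above, the following is a minimal sketch of an ordinary OpenCL 1.1 host program, written without any cluster-specific or MPI calls. It is not code from the paper; the kernel name, array size, and device limit are illustrative assumptions. The point is that under a framework like the one proposed, the devices returned by clGetDeviceIDs could include GPUs on remote compute nodes, so the same unmodified host code would drive the whole cluster.

```c
/* Hedged sketch: a plain OpenCL 1.1 host program with no communication API.
 * Under the proposed framework, the GPUs enumerated here could reside on
 * remote compute nodes; the code is written exactly as for a single machine. */
#include <stdio.h>
#include <CL/cl.h>

static const char *src =
    "__kernel void scale(__global float *a, float f) {"
    "  size_t i = get_global_id(0);"
    "  a[i] = a[i] * f;"
    "}";

int main(void) {
  cl_platform_id platform;
  cl_uint num_devices;
  clGetPlatformIDs(1, &platform, NULL);
  clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &num_devices);

  cl_device_id devices[16];                    /* illustrative upper bound */
  if (num_devices > 16) num_devices = 16;
  clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, num_devices, devices, NULL);
  printf("%u GPU device(s) visible to the host program\n", num_devices);

  cl_context ctx = clCreateContext(NULL, num_devices, devices, NULL, NULL, NULL);
  cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
  clBuildProgram(prog, num_devices, devices, NULL, NULL, NULL);

  const size_t n = 1024;                       /* illustrative problem size */
  float host_buf[1024];
  for (size_t i = 0; i < n; ++i) host_buf[i] = (float)i;

  /* One command queue, buffer, and kernel instance per visible GPU. */
  for (cl_uint d = 0; d < num_devices; ++d) {
    cl_command_queue q = clCreateCommandQueue(ctx, devices[d], 0, NULL);
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                n * sizeof(float), host_buf, NULL);
    cl_kernel k = clCreateKernel(prog, "scale", NULL);
    float factor = 2.0f;
    clSetKernelArg(k, 0, sizeof(cl_mem), &buf);
    clSetKernelArg(k, 1, sizeof(float), &factor);
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, n * sizeof(float), host_buf,
                        0, NULL, NULL);
    clReleaseKernel(k);
    clReleaseMemObject(buf);
    clReleaseCommandQueue(q);
  }

  clReleaseProgram(prog);
  clReleaseContext(ctx);
  return 0;
}
```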



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Jungwon Kim¹
  • Sangmin Seo¹
  • Jun Lee¹
  • Jeongho Nah¹
  • Gangwon Jo¹
  • Jaejin Lee¹

  1. Center for Manycore Programming, School of Computer Science and Engineering, Seoul National University, Seoul, Korea
