Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL

  • Roger Ferrer
  • Judit Planas
  • Pieter Bellens
  • Alejandro Duran
  • Marc Gonzalez
  • Xavier Martorell
  • Rosa M. Badia
  • Eduard Ayguade
  • Jesus Labarta
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6548)

Abstract

In this paper we present OMPSs, a programming model based on OpenMP and StarSs that can also incorporate OpenCL or CUDA kernels. We evaluate the proposal on three different architectures (SMP, Cell/B.E., and GPUs), demonstrating the broad applicability of the approach. The evaluation uses four benchmarks: Matrix Multiply, BlackScholes, Perlin Noise, and Julia Set. We compare the results with those obtained by running the same benchmarks written directly in OpenCL on the same architectures. The results show that OMPSs greatly outperforms the OpenCL environment: it is more flexible in exploiting multiple accelerators and, thanks to the simplicity of its annotations, it increases programmer productivity.
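
As a rough illustration of the kind of annotations the abstract refers to, the sketch below shows a blocked matrix multiply written in an OMPSs style. This is a minimal sketch under assumed syntax (the target device, copy_deps, input, and inout clauses follow the OpenMP tasking extensions that OMPSs builds on); it is not code taken from the paper, and the block size, names, and kernel are hypothetical.

    /*
     * Illustrative sketch only: the clause names below (target device,
     * copy_deps, input, inout) are assumed from the OpenMP tasking
     * extensions OMPSs builds on; they are not quoted from the paper.
     */
    #define NB 4      /* hypothetical number of blocks per dimension */
    #define BS 64     /* hypothetical block size */

    /* One block product. The runtime may offload it as a CUDA or OpenCL
     * kernel and copies the blocks it needs according to the declared
     * dependences. */
    #pragma omp target device(cuda) copy_deps
    #pragma omp task input([BS*BS] a, [BS*BS] b) inout([BS*BS] c)
    void matmul_block(const float *a, const float *b, float *c);

    void matmul(float *a[NB*NB], float *b[NB*NB], float *c[NB*NB])
    {
        /* Each call creates a task; the runtime builds the dependence
         * graph from the input/inout annotations and schedules tasks
         * on the available CPUs and accelerators. */
        for (int i = 0; i < NB; i++)
            for (int j = 0; j < NB; j++)
                for (int k = 0; k < NB; k++)
                    matmul_block(a[i*NB + k], b[k*NB + j], c[i*NB + j]);
        #pragma omp taskwait    /* wait for all block tasks to complete */
    }

The same annotated host code can, in principle, be scheduled on SMP cores, the Cell/B.E., or a GPU, which is the flexibility the evaluation above highlights.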

Keywords

Runtime System · Multicore Processor · Cell Processor · Matrix Multiply · CUDA Kernel

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Roger Ferrer (1)
  • Judit Planas (1)
  • Pieter Bellens (1)
  • Alejandro Duran (1)
  • Marc Gonzalez (1, 2)
  • Xavier Martorell (1, 2)
  • Rosa M. Badia (1, 3)
  • Eduard Ayguade (1, 2)
  • Jesus Labarta (1, 2)

  1. Barcelona Supercomputing Center, Barcelona, Spain
  2. Departament d’Arquitectura de Computadors, Univ. Politècnica de Catalunya, Barcelona, Spain
  3. IIIA, Artificial Intelligence Research Institute, CSIC (Spanish National Research Council), Spain
