Exploiting Fine- and Coarse-Grained Parallelism Using a Directive Based Approach

  • Arpith C. Jacob
  • Ravi Nair
  • Alexandre E. Eichenberger
  • Samuel F. Antao
  • Carlo Bertolli
  • Tong Chen
  • Zehra Sura
  • Kevin O’Brien
  • Michael Wong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9342)

Abstract

Modern high-performance machines are challenging to program because they offer a wide array of compute resources that often require low-level, specialized knowledge to exploit. OpenMP is a directive-based approach that effectively exploits shared-memory multicores. The recently introduced OpenMP 4.0 standard extends the directive-based approach to accelerators. Programming clusters, however, still requires other specialized languages or libraries.

In this work we propose using the target offloading constructs to program nodes distributed across a cluster. We introduce an abstract model of a cluster that defines a clique of distinct shared-memory domains manipulated with the target constructs. We have implemented this model in the LLVM compiler with an OpenMP runtime that supports transparent offloading to cluster nodes using MPI. Our initial results on HMMER, a widely used bioinformatics tool, show excellent scaling behavior with a small constant-factor overhead compared to a baseline MPI implementation. Our work raises the intriguing possibility of a natural progression from a program compiled for serial execution, to parallel execution on a multicore, to offloading onto accelerators, and finally, with minimal additional effort, to execution on a cluster.


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. IBM T.J. Watson Research Center, Yorktown Heights, USA: Arpith C. Jacob, Ravi Nair, Alexandre E. Eichenberger, Samuel F. Antao, Carlo Bertolli, Tong Chen, Zehra Sura, Kevin O’Brien
  2. IBM Software Group, Toronto, Canada: Michael Wong