
Exploiting Fine- and Coarse-Grained Parallelism Using a Directive Based Approach

  • Conference paper
OpenMP: Heterogenous Execution and Data Movements (IWOMP 2015)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 9342)


Abstract

Modern high-performance machines are challenging to program because of the wide array of compute resources they offer, which often require low-level, specialized knowledge to exploit. OpenMP is a directive-based approach that effectively exploits shared-memory multicores. The recently introduced OpenMP 4.0 standard extends the directive-based approach to accelerators. However, programming clusters still requires the use of other specialized languages or libraries.

In this work we propose the use of the target offloading constructs to program nodes distributed in a cluster. We introduce an abstract model of a cluster that defines a clique of distinct shared-memory domains that are manipulated with the target constructs. We have implemented this model in the LLVM compiler with an OpenMP runtime that supports transparent offloading to nodes in a cluster using MPI. Our initial results on HMMER, a widely used Bioinformatics tool, show excellent scaling behavior with a small constant-factor overhead compared to a baseline MPI implementation. Our work raises the intriguing possibility of a natural progression of a program compiled for serial execution, to parallel execution on a multicore, to offloading onto accelerators, and finally, with minimal additional effort, extension onto a cluster.


Notes

  1. For example, the IBM Power® System E880 is configurable up to 16 TB. See http://www-03.ibm.com/systems/power/hardware/e880/.

  2. HMMER 3.1b2: http://hmmer.org.



Author information

Correspondence to Arpith C. Jacob.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Jacob, A.C. et al. (2015). Exploiting Fine- and Coarse-Grained Parallelism Using a Directive Based Approach. In: Terboven, C., de Supinski, B., Reble, P., Chapman, B., Müller, M. (eds) OpenMP: Heterogenous Execution and Data Movements. IWOMP 2015. Lecture Notes in Computer Science, vol 9342. Springer, Cham. https://doi.org/10.1007/978-3-319-24595-9_3

  • DOI: https://doi.org/10.1007/978-3-319-24595-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24594-2

  • Online ISBN: 978-3-319-24595-9

  • eBook Packages: Computer Science (R0)
