Exploiting Fine- and Coarse-Grained Parallelism Using a Directive Based Approach
Modern high-performance machines are challenging to program because they expose a wide array of compute resources that often require low-level, specialized knowledge to exploit. OpenMP is a directive-based approach that effectively exploits shared-memory multicores. The recently introduced OpenMP 4.0 standard extends the directive-based approach to accelerators. However, programming clusters still requires other specialized languages or libraries.
In this work we propose using the target offloading constructs to program the nodes of a cluster. We introduce an abstract model of a cluster that defines a clique of distinct shared-memory domains manipulated through the target constructs. We have implemented this model in the LLVM compiler with an OpenMP runtime that supports transparent offloading to cluster nodes over MPI. Our initial results on HMMER, a widely used bioinformatics tool, show excellent scaling behavior with a small constant-factor overhead compared to a baseline MPI implementation. Our work raises the intriguing possibility of a natural progression: a program compiled for serial execution, then parallel execution on a multicore, then offloading onto accelerators, and finally, with minimal additional effort, execution on a cluster.