From MPI to OpenSHMEM: Porting LAMMPS
This work details the opportunities and challenges of porting a petascale MPI-based application, LAMMPS, to OpenSHMEM. We investigate the major programming challenges stemming from the differences in communication semantics, address space organization, and synchronization operations between the two programming models. We present several approaches to address those challenges for representative communication patterns in LAMMPS, e.g., group synchronization, tracking the status of peers' buffers, and direct transfer of scattered data without intermediate packing and unpacking. The performance of LAMMPS is evaluated on the Titan HPC system at ORNL, and the OpenSHMEM implementations are compared with the MPI versions in terms of both strong and weak scaling. The results show that OpenSHMEM provides rich semantics for implementing scalable scientific applications. In addition, the experiments demonstrate that OpenSHMEM can compete with, and often improve on, the optimized MPI implementation.
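To make these patterns concrete, the following is a minimal, hypothetical OpenSHMEM 1.2 sketch, not taken from the LAMMPS port described here: PE 0 writes every other element of a local array directly into a peer's symmetric buffer with a strided put (avoiding a pack/send/unpack cycle), then raises a flag that the peer polls to track its buffer's status; the barriers stand in for group synchronization. Names such as NELEMS, buf, and ready are illustrative.

```c
/* Hypothetical sketch of two patterns named above: unpacked strided
 * transfer and flag-based peer buffer-status tracking (OpenSHMEM 1.2).
 * Build with an OpenSHMEM compiler wrapper, e.g. oshcc. */
#include <shmem.h>
#include <stdio.h>

#define NELEMS 8   /* illustrative size, not from the paper */

int main(void) {
    shmem_init();
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    /* Symmetric (remotely accessible) buffer and readiness flag. */
    double *buf = shmem_malloc(NELEMS * sizeof(double));
    static long ready = 0;   /* data segment: symmetric by default */

    double src[2 * NELEMS];
    for (int i = 0; i < 2 * NELEMS; i++)
        src[i] = me * 100.0 + i;

    /* Group synchronization (here over all PEs). */
    shmem_barrier_all();

    if (me == 0 && npes > 1) {
        /* Put every other element of src straight into PE 1's buf:
         * the strided put replaces an explicit pack/send/unpack cycle. */
        shmem_double_iput(buf, src, 1, 2, NELEMS, 1);
        shmem_fence();              /* order the data before the flag */
        shmem_long_p(&ready, 1, 1); /* tell PE 1 its buffer is filled */
    } else if (me == 1) {
        /* Track the buffer's status instead of posting a receive. */
        shmem_long_wait_until(&ready, SHMEM_CMP_EQ, 1);
        printf("PE 1 received: buf[0]=%g buf[%d]=%g\n",
               buf[0], NELEMS - 1, buf[NELEMS - 1]);
    }

    shmem_barrier_all();
    shmem_free(buf);
    shmem_finalize();
    return 0;
}
```

The fence-then-flag idiom mirrors how a one-sided model replaces MPI's matched send/receive: ordering is enforced explicitly by the sender, and the receiver synchronizes only on the flag rather than on a message envelope.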
Keywords: Message Passing Interface · Address Space · Collective Operation · Strong Scaling · Synchronization Operation
This material is based upon work supported by the U.S. Department of Energy, under contract #DE-AC05-00OR22725, through UT-Battelle subcontract #4000123323. The work at Oak Ridge National Laboratory (ORNL) is supported by the United States Department of Defense and used the resources of the Extreme Scale Systems Center located at ORNL.