Tools for Simulation and Benchmark Generation at Exascale
The path to exascale high-performance computing (HPC) poses several challenges related to power, performance, resilience, productivity, programmability, data movement, and data management. Investigating the performance of parallel applications at scale on future architectures, and the performance impact of different architecture choices, is an important component of HPC hardware/software co-design. Simulations using models of future HPC systems and communication traces from applications running on existing HPC systems can offer insight into the performance of future architectures. This work targets technology developed for scalable application tracing of communication events and memory profiles, but it can be extended to other areas, such as I/O, control flow, and data flow. It further focuses on extreme-scale simulation of millions of Message Passing Interface (MPI) ranks using a lightweight parallel discrete event simulation (PDES) toolkit for performance evaluation. Instead of simply replaying a trace within a simulation, the approach is to generate a benchmark from the trace and to run this benchmark within a simulation using models that reflect the performance characteristics of future-generation HPC systems. This provides a number of benefits, such as eliminating the data-intensive trace replay and enabling simulations at different scales. The presented work utilizes the ScalaTrace tool to generate scalable trace files, the ScalaBenchGen tool to generate the benchmark, and the xSim tool to run the benchmark within a simulation.
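The core idea of the abstract can be sketched in a few lines: rather than replaying every recorded event, a compressed trace is turned into a small program that regenerates the communication pattern. The toy below is not the ScalaTrace or ScalaBenchGen implementation; it only illustrates the principle with run-length encoding, the simplest form of the loop compression that keeps trace sizes near-constant in the number of time steps, and then emits a C-like benchmark skeleton from the compressed form. All function and variable names here are hypothetical.

```python
def rle_compress(events):
    """Collapse consecutive identical events into (count, event) pairs.

    A real trace compressor (e.g., ScalaTrace) detects nested loop
    structures and merges traces across ranks; this toy version only
    handles immediate repetition of a single event.
    """
    compressed = []
    for event in events:
        if compressed and compressed[-1][1] == event:
            compressed[-1] = (compressed[-1][0] + 1, event)
        else:
            compressed.append((1, event))
    return compressed


def emit_benchmark(compressed):
    """Emit a C-like benchmark skeleton that regenerates the events.

    Repeated events become loops, so the generated benchmark stays small
    regardless of how many iterations the original application ran.
    """
    lines = []
    for count, (op, size) in compressed:
        call = f"{op}(buf, {size} /* bytes */, ...);"
        if count > 1:
            lines.append(f"for (i = 0; i < {count}; i++)")
            lines.append(f"    {call}")
        else:
            lines.append(call)
    return "\n".join(lines)


# A per-rank event stream: 1000 sends followed by one wait.
trace = [("MPI_Isend", 4096)] * 1000 + [("MPI_Waitall", 0)]
compressed = rle_compress(trace)
# Two entries instead of 1001 raw events:
# [(1000, ('MPI_Isend', 4096)), (1, ('MPI_Waitall', 0))]
print(emit_benchmark(compressed))
```

The generated skeleton can then be compiled and run inside a simulator such as xSim, which models the timing of each communication call for a hypothetical target machine instead of replaying recorded timestamps.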
Keywords: Message Passing Interface · Trace File · Benchmark Generator · Delta Time · Message Passing Interface Application
This work was supported in part by NSF grants 1217748, 0937908 and 0958311, as well as a subcontract from ORNL. Research sponsored in part by the Laboratory Directed Research and Development Program of ORNL, managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.