Tools for Simulation and Benchmark Generation at Exascale

  • Mahesh Lagadapati
  • Frank Mueller
  • Christian Engelmann (corresponding author)
Conference paper


The path to exascale high-performance computing (HPC) poses several challenges related to power, performance, resilience, productivity, programmability, data movement, and data management. Investigating the performance of parallel applications at scale on future architectures, and the performance impact of different architecture choices, is an important component of HPC hardware/software co-design. Simulations using models of future HPC systems, driven by communication traces from applications running on existing HPC systems, can offer insight into the performance of future architectures. This work targets technology developed for scalable application tracing of communication events and memory profiles, but can be extended to other areas, such as I/O, control flow, and data flow. It further focuses on extreme-scale simulation of millions of Message Passing Interface (MPI) ranks using a lightweight parallel discrete event simulation (PDES) toolkit for performance evaluation. Instead of simply replaying a trace within a simulation, the approach is to generate a benchmark from the trace and to run this benchmark within a simulation, using models to reflect the performance characteristics of future-generation HPC systems. This provides a number of benefits, such as eliminating the data-intensive trace replay and enabling simulations at different scales. The presented work utilizes the ScalaTrace tool to generate scalable trace files, the ScalaBenchGen tool to generate the benchmark, and the xSim tool to run the benchmark within a simulation.
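The core idea above, generating compilable benchmark source from a loop-compressed communication trace instead of replaying raw trace data, can be illustrated with a minimal sketch. The trace record layout, field names, and the emitted C skeleton below are illustrative assumptions for exposition only; they do not reflect the actual ScalaTrace trace format or ScalaBenchGen output.

```python
# Hypothetical sketch of trace-to-benchmark generation: a
# loop-compressed trace record carries the MPI operation, its
# arguments, the preceding compute ("delta") time, and a repetition
# count; the generator emits a C/MPI benchmark skeleton that
# preserves both communication structure and compute deltas.
# All names and formats here are assumptions, not ScalaBenchGen's.
from dataclasses import dataclass

@dataclass
class TraceEvent:
    op: str        # MPI operation name, e.g. "MPI_Send"
    args: str      # argument list as it should appear in C
    delta_us: int  # compute time preceding the call, microseconds
    count: int     # loop-compressed repetition count

def emit_benchmark(events):
    """Emit a self-contained C benchmark skeleton from a compressed trace."""
    lines = ["#include <mpi.h>",
             "#include <unistd.h>",
             "",
             "int main(int argc, char **argv) {",
             "    MPI_Init(&argc, &argv);"]
    for ev in events:
        body = [f"        usleep({ev.delta_us});  /* preserved compute delta */",
                f"        {ev.op}({ev.args});"]
        if ev.count > 1:
            # Compressed loops stay loops in the generated benchmark,
            # so its size is independent of the original run length.
            lines.append(f"    for (int i = 0; i < {ev.count}; i++) {{")
            lines.extend(body)
            lines.append("    }")
        else:
            lines.extend(l[4:] for l in body)  # un-indent single events
    lines += ["    MPI_Finalize();",
              "    return 0;",
              "}"]
    return "\n".join(lines)

trace = [
    TraceEvent("MPI_Send", "buf, 1024, MPI_BYTE, 1, 0, MPI_COMM_WORLD", 50, 100),
    TraceEvent("MPI_Barrier", "MPI_COMM_WORLD", 10, 1),
]
print(emit_benchmark(trace))
```

Because the output is ordinary source code rather than trace data, the generated benchmark can be compiled once and then executed inside a PDES-based simulator such as xSim at rank counts and on machine models that differ from the original traced run.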


Keywords: Message Passing Interface · Trace File · Benchmark Generator · Delta Time · MPI Application



This work was supported in part by NSF grants 1217748, 0937908, and 0958311, as well as a subcontract from ORNL. Research sponsored in part by the Laboratory Directed Research and Development Program of ORNL, managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.


  1. MPI-2: Extensions to the Message Passing Interface (July 1997)
  2. Havlak, P., Kennedy, K.: An implementation of interprocedural bounded regular section analysis. IEEE Trans. Parallel Distrib. Syst. 2(3), 350–360 (1991)
  3. Marathe, J., Mueller, F.: Detecting memory performance bottlenecks via binary rewriting. In: Workshop on Binary Translation (Sept 2002)
  4. Noeth, M., Mueller, F., Schulz, M., de Supinski, B.R.: Scalable compression and replay of communication traces in massively parallel environments. In: International Parallel and Distributed Processing Symposium, Long Beach (April 2007)
  5. Noeth, M., Mueller, F., Schulz, M., de Supinski, B.R.: ScalaTrace: scalable compression and replay of communication traces in high performance computing. J. Parallel Distrib. Comput. 69(8), 696–710 (2009)
  6. Ratn, P., Mueller, F., de Supinski, B.R., Schulz, M.: Preserving time in large-scale communication traces. In: International Conference on Supercomputing, Island of Kos, pp. 46–55 (June 2008)
  7. Vetter, J.S., de Supinski, B.R.: Dynamic software testing of MPI applications with Umpire. In: Supercomputing, Dallas, p. 51 (2000)
  8. Wu, X., Deshpande, V., Mueller, F.: ScalaBenchGen: auto-generation of communication benchmark traces. In: International Parallel and Distributed Processing Symposium, Shanghai (April 2012)
  9. Wu, X., Mueller, F.: ScalaExtrap: trace-based communication extrapolation for SPMD programs. In: ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Antonio, pp. 113–122 (Feb 2011)
  10. Wu, X., Mueller, F.: Elastic and scalable tracing and accurate replay of non-deterministic events. In: International Conference on Supercomputing, Eugene, pp. 59–68 (June 2013)
  11. Wu, X., Vijayakumar, K., Mueller, F., Ma, X., Roth, P.C.: Probabilistic communication and I/O tracing with deterministic replay at scale. In: International Conference on Parallel Processing, Taipei, pp. 196–205 (Sept 2011)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Mahesh Lagadapati (1)
  • Frank Mueller (1)
  • Christian Engelmann (2, corresponding author)
  1. Department of Computer Science, North Carolina State University, Raleigh, USA
  2. Oak Ridge National Laboratory, Oak Ridge, USA
