Compiler-Assisted Source-to-Source Skeletonization of Application Models for System Simulation

  • Jeremiah J. Wilke
  • Joseph P. Kenny
  • Samuel Knight
  • Sebastien Rumley
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10876)


Performance modeling of networks through simulation requires application endpoint models that inject traffic into the simulation models. Endpoint models for system-scale studies today consist mainly of post-mortem trace replay, but these off-line simulations may lack flexibility and scalability. On-line simulations instead execute so-called skeleton applications: reduced versions of an application that generate the same or similar traffic as the full application. Skeleton apps offer advantages in flexibility and scalability, but they often must be custom-written for the simulator itself. Auto-skeletonization of existing application source code via compiler tools would provide endpoint models with minimal development effort, yet such source-to-source transformations have been only narrowly explored. We introduce a pragma language and a corresponding Clang-driven source-to-source compiler that performs auto-skeletonization based on the provided pragma annotations. We describe the compiler toolchain, validate the generated skeletons, and show scalability of the generated simulation models beyond 100K endpoints for example MPI applications. Overall, we assert that our proposed auto-skeletonization approach and the flexible skeletons it produces can be an important tool in realizing balanced exascale interconnect designs.
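To make the notion of a skeleton concrete, the following is a minimal Python sketch under stated assumptions; it is not the paper's pragma language or the output of its Clang-based compiler. It assumes a hypothetical `Endpoint` object whose `compute` method stands in for an elided kernel by advancing a simulated clock (flops divided by an assumed flop rate), and whose `send` method records message metadata instead of moving data. The 1-D halo exchange in the hypothetical `halo_skeleton` function drops the numerical work but keeps the communication pattern (who sends how many bytes to whom, and when), which is the information a network model consumes.

```python
from dataclasses import dataclass, field


@dataclass
class SendEvent:
    """One message a skeleton injects into a network model (hypothetical record)."""
    src: int
    dst: int
    nbytes: int
    time_s: float  # endpoint's simulated clock when the send is issued


@dataclass
class Endpoint:
    """Hypothetical simulated rank: a local clock plus the traffic it generates."""
    rank: int
    nranks: int
    clock_s: float = 0.0
    events: list = field(default_factory=list)

    def compute(self, flops: float, flops_per_s: float) -> None:
        # Stand-in for an elided kernel: no arithmetic on application data,
        # only an estimated delay added to the simulated clock.
        self.clock_s += flops / flops_per_s

    def send(self, dst: int, nbytes: int) -> None:
        # Record message metadata instead of moving a real buffer.
        self.events.append(SendEvent(self.rank, dst, nbytes, self.clock_s))


def halo_skeleton(ep: Endpoint, iters: int, local_cells: int,
                  flops_per_cell: float = 5.0, flops_per_s: float = 1e9) -> None:
    """Skeleton of a hypothetical 1-D periodic halo exchange.

    The full application would update `local_cells` mesh values each iteration
    and exchange one ghost cell with each neighbor. The skeleton drops the mesh
    entirely but keeps the same message pattern and an estimate of compute time.
    """
    halo_bytes = 8  # one double-precision ghost cell per neighbor
    left = (ep.rank - 1) % ep.nranks
    right = (ep.rank + 1) % ep.nranks
    for _ in range(iters):
        ep.compute(flops_per_cell * local_cells, flops_per_s)
        ep.send(left, halo_bytes)
        ep.send(right, halo_bytes)


if __name__ == "__main__":
    endpoints = [Endpoint(rank=r, nranks=4) for r in range(4)]
    for ep in endpoints:
        halo_skeleton(ep, iters=2, local_cells=1_000_000)
    traffic = [e for ep in endpoints for e in ep.events]
    print(len(traffic), "sends")  # 4 ranks x 2 iterations x 2 neighbors = 16
    print(traffic[0])
```

Each rank here runs its skeleton to completion in turn, so the sketch only captures per-rank injection times. An on-line simulator would instead interleave the endpoints with a network model that delays message delivery and feeds those delays back into each endpoint's clock.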



This work was funded by Sandia National Laboratories, which is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s (DOE) National Nuclear Security Administration (NNSA) under contract DE-NA-0003525.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Jeremiah J. Wilke (1)
  • Joseph P. Kenny (1)
  • Samuel Knight (1)
  • Sebastien Rumley (2)

  1. Sandia National Laboratories, Livermore, USA
  2. Lightwave Research Laboratory, Columbia University, New York City, USA