Predictive analysis of a hydrodynamics application on large-scale CMP clusters

  • J. A. Davis
  • G. R. Mudalige
  • S. D. Hammond
  • J. A. Herdman
  • I. Miller
  • S. A. Jarvis
Special Issue Paper

Abstract

We present the development of a predictive performance model for the high-performance computing code Hydra, a hydrodynamics benchmark developed and maintained by the United Kingdom Atomic Weapons Establishment (AWE). The model captures the parallel computation of Hydra and can be used to predict its run-time and scaling performance on a range of large-scale chip multiprocessor (CMP) clusters. A key feature of the model is its granularity: it separates the contributing costs, including computation, point-to-point communications, collectives, message buffering and message synchronisation. The predictions are validated on two contrasting large-scale HPC systems, an AMD Opteron/InfiniBand cluster and an IBM BlueGene/P, both located at the Lawrence Livermore National Laboratory (LLNL) in the US. We validate the model on up to 2,048 cores, where it achieves greater than 85% accuracy in weak-scaling studies. We also use the model to expose the increasing cost of collectives for this application and the influence of node density on network accesses, highlighting the impact of machine choice when running this hydrodynamics application at scale.
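
As a rough illustration of what a granular model of this kind could look like, the Python sketch below sums separate cost terms for one weak-scaling prediction and scores it against a measured run-time. It is not the authors' model: the class and function names, the constants and the logarithmic collective term are all assumptions made for illustration, and the accuracy metric (100% minus the relative error) is one common convention rather than the one confirmed by the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class CostBreakdown:
    """Hypothetical per-run cost terms, in seconds (names are illustrative)."""
    compute: float
    point_to_point: float
    collectives: float
    buffering: float
    synchronisation: float

    def total(self) -> float:
        # The predicted run-time is taken to be the plain sum of the terms.
        return (self.compute + self.point_to_point + self.collectives
                + self.buffering + self.synchronisation)


def predict(cores: int) -> CostBreakdown:
    """Toy weak-scaling prediction; every constant here is made up."""
    return CostBreakdown(
        compute=10.0,                         # fixed work per core (weak scaling)
        point_to_point=0.5,                   # assumed constant neighbour exchange
        collectives=0.05 * math.log2(cores),  # assumes a tree-based collective
        buffering=0.1,
        synchronisation=0.2,
    )


def accuracy(predicted: float, measured: float) -> float:
    """Accuracy as a percentage: 100 * (1 - relative error)."""
    return 100.0 * (1.0 - abs(predicted - measured) / measured)


if __name__ == "__main__":
    for cores in (64, 512, 2048):
        costs = predict(cores)
        print(cores, round(costs.total(), 3), round(costs.collectives, 3))
```

Keeping each term as a separate field is what allows a single contribution, such as the collectives, to be tracked as the core count grows.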

Keywords

Hydrodynamics; Performance modelling; High performance computing; Multi-core

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • J. A. Davis (1)
  • G. R. Mudalige (2)
  • S. D. Hammond (1)
  • J. A. Herdman (3)
  • I. Miller (3)
  • S. A. Jarvis (1)
  1. Performance Computing and Visualisation, Department of Computer Science, University of Warwick, Coventry, UK
  2. Oxford eResearch Centre, University of Oxford, Oxford, UK
  3. Atomic Weapons Establishment, Aldermaston, Reading, UK
