Impact of Quad-Core Cray XT4 System and Software Stack on Scientific Computation

  • S. R. Alam
  • R. F. Barrett
  • H. Jagode
  • J. A. Kuehn
  • S. W. Poole
  • R. Sankaran
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5704)


An upgrade from dual-core to quad-core AMD processors on the Cray XT system at the Oak Ridge National Laboratory (ORNL) Leadership Computing Facility (LCF) has brought significant changes to the hardware and software stack, including a deeper memory hierarchy, SIMD instructions, and a multi-core-aware MPI library. In this paper, we evaluate the impact of a subset of these key changes on large-scale scientific applications. We provide insights into the application tuning and optimization process and report how different strategies yield varying degrees of success and failure across application domains. For instance, we demonstrate that the vectorization (SSE) instructions provide a performance boost of as much as 50% on fusion and combustion applications. Moreover, we show how resource contention can limit achievable performance, and we provide insights into how applications could exploit the hierarchical parallelism of the Petascale XT5 system.
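The abstract attributes gains of up to 50% on fusion and combustion codes to SSE vectorization. The paper's own kernels are not reproduced here; as a generic illustration only, the sketch below contrasts a scalar loop with a hand-vectorized SSE2 version of a hypothetical daxpy kernel (y = a*x + y). The function names `daxpy_scalar` and `daxpy_sse2` are illustrative, and the sketch assumes an x86-64 C compiler that provides `<emmintrin.h>`. Production codes typically rely on the compiler to emit such instructions; intrinsics are used here only to make the "two doubles per instruction" packing explicit.

```c
#include <assert.h>     /* assert: simple precondition check */
#include <stddef.h>     /* size_t */
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_*_pd */

/* Scalar reference: y[i] += a * x[i], one double per operation. */
static void daxpy_scalar(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* SSE2 version: each 128-bit register holds two doubles, so one packed
 * multiply and one packed add update two elements per loop iteration. */
static void daxpy_sse2(size_t n, double a, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    const __m128d va = _mm_set1_pd(a);      /* broadcast a into both lanes */
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d vx = _mm_loadu_pd(&x[i]);   /* unaligned loads work for any array */
        __m128d vy = _mm_loadu_pd(&y[i]);
        vy = _mm_add_pd(vy, _mm_mul_pd(va, vx));
        _mm_storeu_pd(&y[i], vy);
    }
    for (; i < n; ++i)                      /* scalar remainder when n is odd */
        y[i] += a * x[i];
}
```

Calling both functions on the same inputs should give identical results; the packed version simply halves the number of arithmetic instructions. Whether that translates into a real speedup depends on whether the loop is compute-bound: a memory-bound loop, especially when four cores contend for the same memory bandwidth, may see little benefit.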







Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • S. R. Alam, R. F. Barrett, H. Jagode, J. A. Kuehn, S. W. Poole, and R. Sankaran: Oak Ridge National Laboratory, Oak Ridge, USA
