The co-design architecture for exascale systems, a novel approach for scalable designs

  • Gilad Shainer
  • Todd Wilde
  • Pak Lui
  • Tong Liu
  • Michael Kagan
  • Mike Dubman
  • Yiftah Shahar
  • Richard Graham
  • Pavel Shamis
  • Steve Poole
Special Issue Paper

Abstract

High performance computing (HPC) has begun scaling beyond the Petaflop range towards the Exaflop (1000 Petaflops) mark. One of the major concerns throughout the development toward such performance capability is scalability, both at the system level and at the application layer. In this paper we present a new design concept, the co-design approach, which enables tighter joint development of the application communication libraries and the underlying hardware interconnect solution in order to overcome scalability issues and to enable a more efficient design approach towards Exascale computing. We propose a new application programming interface and demonstrate a 50x performance improvement along with increased scalability.

Keywords

Co-design, Exascale, ScalableHPC, ScalableSHMEM

Notes

Acknowledgements

Part of this work was supported by the United States Department of Defense and used resources of the Extreme Scale Systems Center at Oak Ridge National Laboratory.

Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Gilad Shainer (1)
  • Todd Wilde (1)
  • Pak Lui (1)
  • Tong Liu (1)
  • Michael Kagan (1)
  • Mike Dubman (1)
  • Yiftah Shahar (1)
  • Richard Graham (2)
  • Pavel Shamis (2)
  • Steve Poole (2)

  1. Mellanox Technologies, Sunnyvale, USA
  2. Oak Ridge National Laboratory, Oak Ridge, USA
