ScELA: Scalable and Extensible Launching Architecture for Clusters

  • Jaidev K. Sridhar
  • Matthew J. Koop
  • Jonathan L. Perkins
  • Dhabaleswar K. Panda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5374)


As cluster sizes head into the tens of thousands of processors, current job launch mechanisms do not scale: they are limited by resource constraints as well as performance bottlenecks. The job launch process includes two phases – spawning of processes on processors, and information exchange between processes for job initialization. Implementations of various programming models follow distinct protocols for the information exchange phase. We present the design of a scalable, extensible and high-performance job launch architecture for very large scale parallel computing. We present implementations of this architecture that achieve a speedup of more than 700% in launching a simple Hello World MPI application on 10,240 processor cores, and that scale to more than 3 times the number of processor cores handled by prior solutions.
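The abstract's scaling claim rests on the gap between flat and hierarchical process spawning. The paper's actual architecture is not reproduced here, but the back-of-the-envelope argument for tree-structured launching can be sketched as follows (the fan-out degree of 32 is an illustrative assumption, not a parameter taken from the paper):

```python
def flat_launch_rounds(n_nodes: int) -> int:
    # A flat launcher contacts every node sequentially from a single
    # root, so the number of launch steps grows linearly with the job.
    return n_nodes


def tree_launch_rounds(n_nodes: int, degree: int = 32) -> int:
    # In a tree-structured launch, every node that has already been
    # started spawns up to `degree` children concurrently in the next
    # round, so coverage grows geometrically: after d rounds the tree
    # reaches 1 + k + k^2 + ... + k^d nodes.
    rounds, total, frontier = 0, 1, 1
    while total < n_nodes:
        frontier *= degree
        total += frontier
        rounds += 1
    return rounds
```

With 10,240 nodes and a fan-out of 32, the tree launch completes in 3 concurrent rounds, versus 10,240 sequential contacts for a flat launcher, which is the kind of gap that makes hierarchical launching attractive at the scales the abstract targets.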





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Jaidev K. Sridhar¹
  • Matthew J. Koop¹
  • Jonathan L. Perkins¹
  • Dhabaleswar K. Panda¹

  1. Network-Based Computing Laboratory, The Ohio State University, Columbus, USA
