Middleware in Modern High Performance Computing System Architectures

  • Christian Engelmann
  • Hong Ong
  • Stephen L. Scott
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4488)


A recent trend in modern high performance computing (HPC) system architectures employs "lean" compute nodes running a lightweight operating system (OS). Certain parts of the OS, as well as other system software services, are moved to service nodes in order to increase performance and scalability. This paper examines the impact of this HPC system architecture trend on HPC "middleware" software solutions, which traditionally equip HPC systems with advanced features, such as parallel and distributed programming models, appropriate system resource management mechanisms, and remote application steering and user interaction techniques. Since the approach of keeping the compute node software stack small and simple is orthogonal to the middleware concept of adding missing OS features between OS and application, the role and architecture of middleware in modern HPC systems need to be revisited. The result is a paradigm shift in HPC middleware design, where individual middleware services are moved to service nodes, while runtime environments (RTEs) continue to reside on compute nodes.
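The compute-node/service-node split described above is often realized by "function shipping": the lightweight compute-node OS implements almost no services locally and instead forwards requests (for example, file I/O) to a full-featured service node. A minimal in-process sketch of this pattern follows; the class names and request format are illustrative assumptions, not from the paper.

```python
# Toy sketch of the function-shipping pattern behind lean compute nodes.
# All names here are hypothetical illustrations of the idea, not an API
# from any of the systems discussed in the paper.

class ServiceNode:
    """Runs the full OS service stack, e.g. a file system service."""
    def __init__(self):
        self._files = {}  # stand-in for a real parallel file system

    def handle(self, request):
        op, path, payload = request
        if op == "write":
            self._files[path] = payload
            return len(payload)
        if op == "read":
            return self._files.get(path, b"")
        raise ValueError(f"unsupported operation: {op}")

class LeanComputeNode:
    """Lightweight OS image: no local file system; ships I/O requests."""
    def __init__(self, service_node):
        self._service = service_node

    def write(self, path, data):
        # Instead of servicing a local syscall, marshal the request
        # and forward it to the service node.
        return self._service.handle(("write", path, data))

    def read(self, path):
        return self._service.handle(("read", path, None))

service = ServiceNode()
compute = LeanComputeNode(service)
assert compute.write("/scratch/out.dat", b"result") == 6
assert compute.read("/scratch/out.dat") == b"result"
```

In a real system the `handle` call would cross the machine's internal network, and the RTE on the compute node would hide this forwarding from the application.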


Keywords: High Performance Computing · Middleware · Lean Compute Node · Lightweight Operating System



Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Christian Engelmann (1)
  • Hong Ong (1)
  • Stephen L. Scott (1)

  1. Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831-6164, USA
