Reducing the Overhead of Intra-Node Communication in Clusters of SMPs

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3758)


This article presents vShark, a C++ library that reduces the intra-node communication overhead of parallel programs on clusters of SMPs. The library is built on top of message-passing libraries such as MPI. It provides thread-safe communication and, most importantly, improves the communication between threads within one SMP node. vShark uses a modular but transparent design that makes it independent of specific communication libraries. As a result, subsystems other than MPI, such as CORBA or PVM, could also be used for low-level communication. We present an implementation of vShark based on MPI and the POSIX thread library, and show that vShark's efficient intra-node communication improves the performance of parallel algorithms.


Keywords: clusters of SMPs, parallel programming models, message passing between threads





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  1. Department of Mathematics, Physics and Computer Science, University of Bayreuth, Germany
