Supporting parallel applications on clusters of workstations: The Virtual Communication Machine‐based architecture

Roşu, Marcel‐Cătălin; Schwan, Karsten; Fujimoto, Richard

doi:10.1023/A:1019064911399

Published: May 1998

Cluster Computing, Volume 1, pages 51–67 (1998)

Abstract

This paper presents a novel networking architecture designed for communication-intensive parallel applications running on clusters of workstations (COWs) connected by high-speed networks. The architecture addresses what is considered one of the most important problems of cluster-based parallel computing: the inherent inability to scale the performance of communication software along with host CPU performance. The Virtual Communication Machine (VCM), resident on the network coprocessor, offers a scalable software solution by providing configurable communication functionality that is directly accessible at user level. The VCM architecture is configurable in that selected communication-related functionality, traditionally part of the application and/or the host kernel, can be transferred to the VCM. Such transfers are beneficial when a significant reduction of the host CPU's load costs only a small increase in the coprocessor's load. The functionality implemented by the coprocessor is available at the application level as VCM instructions. Host CPU(s) and coprocessor interact through shared memory regions, thereby avoiding expensive CPU context switches. The host kernel is not involved in this interaction; it simply “connects” the application to the VCM during the initialization phase and is called infrequently to handle exceptional conditions. Protection is enforced by the VCM based on information supplied by the kernel. The VCM-based communication architecture admits low-cost and open implementations, as demonstrated by its current ATM-based implementation, which is built from off-the-shelf hardware components and uses standard AAL5 packets. The architecture makes it easy to implement communication software that imposes negligible overhead on the host CPU(s) and offers latencies and bandwidths close to the hardware limits of the underlying network. These characteristics are due to the VCM's support for zero-copy messaging with gather/scatter capabilities and to the VCM's direct access to any data structure in an application's address space.

This paper describes two versions of an ATM-based VCM implementation, which differ in the way they use the memory on the network adapter. Their performance under heavy load is compared in the context of a synthetic client/server application. The same application is used to evaluate how the architecture scales to multiple VCM-based network interfaces per host. Parallel implementations of the Traveling Salesman Problem and of Georgia Tech Time Warp, an engine for discrete-event simulation, are used to demonstrate VCM functionality and the high performance of its implementation. The distributed- and shared-memory versions of these two applications exhibit comparable performance, despite the significant cost-performance advantage of the distributed-memory platform.
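The interaction pattern described in the abstract (the host posts instructions into a shared memory region, and the coprocessor polls that region and gathers message data straight from application memory) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's implementation: the names SendInstruction, CommandRing, coprocessor_step and allowed_connections are hypothetical, the ring layout is an ordinary single-producer/single-consumer queue chosen for illustration, and a Python bytearray stands in for the application's address space.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class SendInstruction:
    """Hypothetical 'send' instruction: a gather list over application memory."""
    connection: int
    segments: List[Tuple[int, int]]  # (offset, length) pairs in the app buffer


class CommandRing:
    """Single-producer/single-consumer ring standing in for a shared memory region."""

    def __init__(self, slots: int) -> None:
        self.slots: List[Optional[SendInstruction]] = [None] * slots
        self.head = 0  # next slot to write; advanced only by the host
        self.tail = 0  # next slot to read; advanced only by the coprocessor

    def post(self, instr: SendInstruction) -> bool:
        """Host side: enqueue without entering the kernel; False if the ring is full."""
        nxt = (self.head + 1) % len(self.slots)
        if nxt == self.tail:  # one slot is kept empty to tell full from empty
            return False
        self.slots[self.head] = instr
        self.head = nxt
        return True

    def poll(self) -> Optional[SendInstruction]:
        """Coprocessor side: dequeue the next instruction, or None if the ring is empty."""
        if self.tail == self.head:
            return None
        instr = self.slots[self.tail]
        self.slots[self.tail] = None
        self.tail = (self.tail + 1) % len(self.slots)
        return instr


def coprocessor_step(ring: CommandRing, app_memory: bytearray,
                     allowed_connections: Set[int]) -> Optional[bytes]:
    """Process one instruction: check protection, then gather the outgoing packet.

    allowed_connections models the protection information that, in this sketch,
    the kernel is assumed to have supplied at initialization time.
    """
    instr = ring.poll()
    if instr is None:
        return None
    if instr.connection not in allowed_connections:
        raise PermissionError(f"connection {instr.connection} not permitted")
    view = memoryview(app_memory)  # slices of a memoryview do not copy the data
    # The only copy is the one that assembles the packet for the wire.
    return b"".join(view[off:off + length] for off, length in instr.segments)


# Usage: the host describes a message made of two non-contiguous regions.
memory = bytearray(b"HEADERxxxxPAYLOAD")
ring = CommandRing(slots=4)
ring.post(SendInstruction(connection=1, segments=[(0, 6), (10, 7)]))
packet = coprocessor_step(ring, memory, allowed_connections={1})
```

In this sketch the host never copies message data into an intermediate buffer; it only writes a small descriptor, which is the property the abstract attributes to zero-copy messaging with gather/scatter. Keeping separate head and tail indices, each written by one side only, is one common way to let two processors share a queue without locks; the paper's actual data structures may differ.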

Author information

Authors and Affiliations

College of Computing, Georgia Institute of Technology, Atlanta, GA, 30332‐0280, USA
Marcel‐Cătălin Roşu, Karsten Schwan & Richard Fujimoto

About this article

Cite this article

Roşu, M., Schwan, K. & Fujimoto, R. Supporting parallel applications on clusters of workstations: The Virtual Communication Machine‐based architecture. Cluster Computing 1, 51–67 (1998). https://doi.org/10.1023/A:1019064911399

Issue Date: May 1998
DOI: https://doi.org/10.1023/A:1019064911399
