Workshop on Run-Time Systems for Parallel Programming: Matthew Haines, University of Wyoming, USA; Koen Langendoen, Vrije Universiteit, The Netherlands; Greg Benson, University of California at Davis, USA

Parallel and Distributed Processing

Volume 1388 of the series Lecture Notes in Computer Science pp 159-170


Migration and rollback transparency for arbitrary distributed applications in workstation clusters

  • Stefan Petri, Institute for Computer Engineering, Medical University at Lübeck
  • Matthias Bolze, Institute for Operating Systems and Computer Networks, Technical University Braunschweig
  • Horst Langendörfer



Programmers and users of compute-intensive scientific applications often do not want to (or simply cannot) code load balancing and fault tolerance into their programs.

The Beam system [18] uses a global virtual name space to provide migration and rollback transparency in user space for distributed groups of processes on workstations. System calls are interposed, and their parameters are translated between the virtual name space and the local one. Unlike other migration mechanisms, Beam does not require applications to be written for a specific programming model or communication library.
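The translation step can be illustrated with a minimal sketch. It is not Beam's implementation, which the abstract does not detail; the `NameSpace` class, the wrapper functions, and the choice of process IDs as the translated names are all assumptions made for illustration. The real system calls are passed in as plain callables so the sketch runs without touching any actual process.

```python
from dataclasses import dataclass, field


@dataclass
class NameSpace:
    """Hypothetical two-way map between virtual and real process IDs."""
    virt_to_real: dict = field(default_factory=dict)
    real_to_virt: dict = field(default_factory=dict)

    def bind(self, virt, real):
        # Drop any stale binding for this virtual ID before rebinding,
        # as would be needed after a migration or rollback.
        old = self.virt_to_real.pop(virt, None)
        if old is not None:
            self.real_to_virt.pop(old, None)
        self.virt_to_real[virt] = real
        self.real_to_virt[real] = virt


def interposed_kill(ns, os_kill, virt_pid, sig):
    """Translate the virtual PID argument, then forward to the real call."""
    return os_kill(ns.virt_to_real[virt_pid], sig)


def interposed_getpid(ns, os_getpid):
    """Translate the real PID result back into the virtual name space."""
    return ns.real_to_virt[os_getpid()]
```

In this sketch the application only ever sees virtual ID 7. If the process it names is restarted elsewhere under a different real ID, a single `ns.bind(7, new_real_pid)` updates the mapping, and later interposed calls reach the new process without the application noticing.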

In this paper we describe the design and implementation of a separate system call interposition process [3] that accesses the application via the debugging interface. The main advantage of this approach is that it can handle even unmodified (e.g., commercially bought) application programs. We compare measured performance figures with those of previous, similar approaches [15, 20].