Performing tasks on restartable message-passing processors

Chlebus, Bogdan S.; De Prisco, Roberto; Shvartsman, Alex A.

doi:10.1007/BFb0030678

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1320))

Included in the following conference series:

International Workshop on Distributed Algorithms

Abstract

This work presents new algorithms for the “Do-All” problem, which consists of performing t tasks reliably in a synchronous message-passing system of p fault-prone processors. The algorithms are based on an aggressive coordination paradigm in which multiple coordinators may be active as a result of failures. The first algorithm tolerates f < p stop-failures and does not allow restarts. Its available processor steps complexity is S = O((t + p log p / log log p) · log f) and its message complexity is M = O(t + p log p / log log p + f · p). Unlike prior solutions, our algorithm uses redundant broadcasts when encountering failures and, for large f, it has better S complexity. This algorithm serves as the basis for a second algorithm, which tolerates any pattern of stop-failures and restarts. The second algorithm is the first solution for the Do-All problem that deals efficiently with processor restarts. Its available processor steps complexity is S = O((t + p log p + f) · min{log p, log f}), and its message complexity is M = O(t + p log p + f · p), where f is the number of failures.
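To get a feel for how the stated bounds scale, the expressions inside the O(·) terms can be evaluated numerically. The following is only a sketch and not part of the paper: the function names are hypothetical, constant factors are ignored, logarithms are taken base 2, and every logarithm is clamped below at 1 so that small values of p or f do not produce zero or negative terms. All of these choices are assumptions made for illustration.

```python
import math


def _log2(x: float) -> float:
    """Base-2 logarithm, clamped below at 1 (an illustrative assumption)."""
    return max(1.0, math.log2(max(x, 2.0)))


def no_restart_bounds(t: int, p: int, f: int) -> tuple[float, float]:
    """Evaluate the expressions inside the O(.) bounds of the first algorithm.

    S ~ (t + p log p / log log p) * log f
    M ~  t + p log p / log log p + f * p
    The abstract states these for f < p stop-failures without restarts.
    """
    if not 0 <= f < p:
        raise ValueError("the first algorithm assumes 0 <= f < p")
    p_term = p * _log2(p) / _log2(_log2(p))
    steps = (t + p_term) * _log2(f)
    messages = t + p_term + f * p
    return steps, messages


def restart_bounds(t: int, p: int, f: int) -> tuple[float, float]:
    """Evaluate the expressions inside the O(.) bounds of the second algorithm.

    S ~ (t + p log p + f) * min(log p, log f)
    M ~  t + p log p + f * p
    Here f counts failures and may exceed p, since processors can restart.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    p_term = p * _log2(p)
    steps = (t + p_term + f) * min(_log2(p), _log2(f))
    messages = t + p_term + f * p
    return steps, messages


if __name__ == "__main__":
    # Hypothetical instance: 64 tasks, 16 processors, 4 failures.
    print(no_restart_bounds(64, 16, 4))
    print(restart_bounds(64, 16, 4))
```

Because the hidden constants are dropped, the returned numbers are meaningful only for comparing growth rates across parameter choices, not as predictions of actual step or message counts.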

This work was supported by the following contracts: ARPA N00014-92-J-4033 and F19628-95-C-0118, NSF 922124-CCR, ONR-AFOSR F49620-94-1-01997, and DFG-Graduiertenkolleg “Parallele Rechnernetzwerke in der Produktionstechnik” ME 872/4-1, DFG-SFB 376 “Massive Parallelität: Algorithmen, Entwurfsmethoden, Anwendungen”. The research of the third author was substantially done at the Massachusetts Institute of Technology. The research of the first and the third authors was partly done while visiting Heinz Nixdorf Institut, Universität-GH Paderborn.

Author information

Authors and Affiliations

Instytut Informatyki, Uniwersytet Warszawski, Banacha 2, 02-097, Warszawa, Poland
Bogdan S. Chlebus
Laboratory for Computer Science, Massachusetts Institute of Technology, 545 Technology Square, NE43-368, Cambridge, MA, USA
Roberto De Prisco
Department of Computer Science and Engineering, University of Connecticut, 191 Auditorium Road, U-155, Storrs, CT, USA
Alex A. Shvartsman

Editor information

Marios Mavronicolas, Philippas Tsigas

Cite this paper

Chlebus, B.S., De Prisco, R., Shvartsman, A.A. (1997). Performing tasks on restartable message-passing processors. In: Mavronicolas, M., Tsigas, P. (eds) Distributed Algorithms. WDAG 1997. Lecture Notes in Computer Science, vol 1320. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0030678

DOI: https://doi.org/10.1007/BFb0030678
Published: 20 June 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63575-8
Online ISBN: 978-3-540-69600-1
eBook Packages: Springer Book Archive
