Performing tasks on restartable message-passing processors

Distributed Algorithms (WDAG 1997)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1320))

Abstract

This work presents new algorithms for the “Do-All” problem, which consists of performing t tasks reliably in a synchronous message-passing system of p fault-prone processors. The algorithms are based on an aggressive coordination paradigm in which multiple coordinators may be active as the result of failures. The first algorithm tolerates f < p stop-failures and does not allow restarts. Its available processor steps complexity is S = O((t + p log p / log log p) · log f) and its message complexity is M = O(t + p log p / log log p + f · p). Unlike prior solutions, our algorithm uses redundant broadcasts when encountering failures and, for large f, it has better S complexity. This algorithm serves as the basis for a second algorithm, which tolerates any pattern of stop-failures and restarts. This new algorithm is the first solution for the Do-All problem that efficiently deals with processor restarts. Its available processor steps complexity is S = O((t + p log p + f) · min{log p, log f}), and its message complexity is M = O(t + p log p + f · p), where f is the number of failures.
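To get a feel for how the four bounds stated in the abstract scale, the following minimal sketch evaluates the expressions inside the O(·) for concrete values of t, p, and f. It is only an illustration: the function names are hypothetical, base-2 logarithms are an assumption (the abstract does not fix a base), and the constant factors hidden by the O-notation are ignored, so the returned numbers are growth terms rather than actual step or message counts.

```python
import math


def work_no_restarts(t: int, p: int, f: int) -> float:
    """Growth term of S = O((t + p log p / log log p) * log f).

    Assumes base-2 logs, p >= 4 (so log log p > 0) and 1 <= f < p.
    """
    lg_p = math.log2(p)
    return (t + p * lg_p / math.log2(lg_p)) * math.log2(f)


def messages_no_restarts(t: int, p: int, f: int) -> float:
    """Growth term of M = O(t + p log p / log log p + f * p)."""
    lg_p = math.log2(p)
    return t + p * lg_p / math.log2(lg_p) + f * p


def work_with_restarts(t: int, p: int, f: int) -> float:
    """Growth term of S = O((t + p log p + f) * min{log p, log f}).

    Here f counts failures and may exceed p, since processors can restart.
    """
    return (t + p * math.log2(p) + f) * min(math.log2(p), math.log2(f))


def messages_with_restarts(t: int, p: int, f: int) -> float:
    """Growth term of M = O(t + p log p + f * p)."""
    return t + p * math.log2(p) + f * p


if __name__ == "__main__":
    t, p, f = 16, 16, 4  # illustrative values only
    print(work_no_restarts(t, p, f), messages_no_restarts(t, p, f))
    print(work_with_restarts(t, p, f), messages_with_restarts(t, p, f))
```

With restarts allowed, f is not bounded by p, which is why the work term for the second algorithm uses min{log p, log f}: once f exceeds p, the multiplicative factor stops growing at log p.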

This work was supported by the following contracts: ARPA N00014-92-J-4033 and F19628-95-C-0118, NSF 922124-CCR, ONR-AFOSR F49620-94-1-01997, and DFG-Graduiertenkolleg “Parallele Rechnernetzwerke in der Produktionstechnik” ME 872/4-1, DFG-SFB 376 “Massive Parallelität: Algorithmen, Entwurfsmethoden, Anwendungen”. The research of the third author was substantially done at the Massachusetts Institute of Technology. The research of the first and the third authors was partly done while visiting Heinz Nixdorf Institut, Universität-GH Paderborn.

Editor information

Marios Mavronicolas, Philippas Tsigas

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chlebus, B.S., De Prisco, R., Shvartsman, A.A. (1997). Performing tasks on restartable message-passing processors. In: Mavronicolas, M., Tsigas, P. (eds) Distributed Algorithms. WDAG 1997. Lecture Notes in Computer Science, vol 1320. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0030678

  • DOI: https://doi.org/10.1007/BFb0030678

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63575-8

  • Online ISBN: 978-3-540-69600-1

  • eBook Packages: Springer Book Archive
