Fault-tolerant shared memory simulations

Extended abstract
  • Petra Berenbrink
  • Friedhelm Meyer auf der Heide
  • Volker Stemann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1046)

Abstract

We consider the problem of simulating a PRAM on a faulty distributed memory machine (DMM). We focus on dynamic faults, i.e., each processor or memory module independently fails during the simulation of a PRAM step with fixed probability and remains faulty for the rest of the simulation. We build upon randomized hashing-based simulations on non-faulty DMMs from [14], which achieve delay O(log log n) with high probability. We design and analyze routines for handling faults that occur during the simulation. Based on these routines, we present simulations on faulty DMMs with the same delay O(log log n) as in the non-faulty case, provided that the failure probability of processors and modules is small enough that, in expectation, a linear number of processors and modules survive the simulation. Thus, resilience to memory or processor faults increases the delay of the simulation by at most a constant factor.
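
The survival condition in the abstract can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes that a component fails independently with probability p in each of T simulated steps, so it survives all T steps with probability (1 - p)^T, and the expected number of survivors among n components is n(1 - p)^T. The names `survival_probability` and `expected_survivors` and the parameter values are hypothetical, chosen only for illustration.

```python
def survival_probability(p: float, steps: int) -> float:
    """Probability that one component is still working after `steps` steps,
    assuming it fails independently with probability p in each step
    and never recovers (assumed reading of the dynamic fault model)."""
    return (1.0 - p) ** steps


def expected_survivors(n: int, p: float, steps: int) -> float:
    """Expected number of surviving components out of n (linearity of expectation)."""
    return n * survival_probability(p, steps)


if __name__ == "__main__":
    n, steps = 1024, 100
    # p on the order of 1/steps keeps a constant fraction alive;
    # a larger fixed p leaves almost no survivors after the same number of steps.
    for p in (1.0 / steps, 0.1):
        print(f"p={p:.3f}: expected survivors = {expected_survivors(n, p, steps):.3f}")
```

Under these assumptions, the expected number of survivors stays linear in n when p is on the order of 1/T, which is one way to read "small enough" in the abstract.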

References

  [1] J.R. Anderson and G.L. Miller: Optical communication for pointer based algorithms. Technical Report CRI 88-14, Computer Science Department, University of Southern California, Los Angeles, CA 90089-0782 USA, 1988.
  [2] Ö. Babaoglu, R. Drummond and P. Stephenson: The impact of communication network properties on reliable broadcast protocols. Technical Report, Department of Computer Science, Cornell University, Ithaca, New York, 1988.
  [3] P. Berenbrink, F. Meyer auf der Heide and V. Stemann: Fault-tolerant shared memory simulations. Technical Report, to appear.
  [4] B.S. Chlebus, A. Gambin and P. Indyk: PRAM computations resilient to memory faults. In Proc. of the 2nd Annual European Symposium on Algorithms, pp 401–412, 1994.
  [5] F. Cristian, H. Aghili, D. Dolev and R. Strong: Atomic broadcast: from simple message diffusion to Byzantine agreement. Computer Science, 1984.
  [6] A. Czumaj, F. Meyer auf der Heide and V. Stemann: Shared memory simulations with triple logarithmic delay. In Proc. of the 3rd Annual European Symposium on Algorithms, pp 46–59, 1995.
  [7] M. Dietzfelbinger and F. Meyer auf der Heide: Simple, efficient shared memory simulations. In Proc. of the 5th Annual ACM Symposium on Parallel Algorithms and Architectures, pp 110–119, 1993.
  [8] M. Dietzfelbinger and F. Meyer auf der Heide: How to distribute a hash table in a complete network. In Proc. of the 22nd ACM Symposium on Theory of Computing, pp 117–127, 1990.
  [9] L.A. Goldberg, M. Jerrum and T. Leighton: A doubly logarithmic communication algorithm for the completely connected optical communication parallel computer. In Proc. of the 5th Annual ACM Symposium on Parallel Algorithms and Architectures, pp 300–309, 1993.
  [10] L.A. Goldberg, Y. Matias and S. Rao: An optical simulation of shared memory. In Proc. of the 6th Annual ACM Symposium on Parallel Algorithms and Architectures, pp 257–267, 1994.
  [11] R. Karp, M. Luby and F. Meyer auf der Heide: Efficient PRAM simulations on distributed memory machine. In Proc. of the 24th Annual ACM Symposium on Theory of Computing, pp 318–326, 1992.
  [12] P.D. MacKenzie, C.G. Plaxton and R. Rajaraman: On contention resolution protocols and associated phenomena. Technical Report 94-06, University of Texas at Austin, 1994.
  [13] F. Meyer auf der Heide: Hashing strategies for simulating shared memory on distributed memory machines. In Proc. of the 1st Heinz Nixdorf Symposium “Parallel Architectures and their Efficient Use”, F. Meyer auf der Heide, B. Monien, A.L. Rosenberg, eds., pp 20–29, 1992.
  [14] F. Meyer auf der Heide, C. Scheideler and V. Stemann: Exploiting storage redundancy to speed up randomized shared memory simulations. In Proc. of the 12th Annual Symposium on Theoretical Aspects of Computer Science, pp 267–278, 1995.
  [15] J.P. Schmidt, A. Siegel and A. Srinivasan: Chernoff-Hoeffding bounds for applications with limited independence. In Proc. of the 4th ACM-SIAM Symposium on Discrete Algorithms, pp 331–340, 1993.
  [16] A. Siegel: On universal classes of fast high performance hash functions, their time-space tradeoff and their applications. In Proc. of the 30th IEEE Annual Symposium on Foundations of Computer Science, pp 20–25, 1989.
  [17] E. Upfal and A. Wigderson: How to share memory in a distributed system. J. Assoc. Comput. Mach. 34, pp 116–127, 1987.

Copyright information

© Springer-Verlag 1996

Authors and Affiliations

  • Petra Berenbrink (1)
  • Friedhelm Meyer auf der Heide (1)
  • Volker Stemann (1)
  1. Heinz Nixdorf Institute and Dept. of Computer Science, University of Paderborn, Paderborn, Germany
  2. International Computer Science Institute, Berkeley