Abstract

This study of fault-tolerant parallel computation uses models of computation based on the parallel random access machine, or PRAM. The PRAM model is generally accepted as a convenient abstraction for defining and analyzing parallel algorithms. However, it makes some assumptions that call its practicality into question. The main ones are global synchronization of processors, high-bandwidth concurrent access to shared memory, and infallibility of processors, interconnections and memory. In this monograph we pursue the goal of preserving the high-level PRAM abstraction that makes it attractive for programming parallel algorithms, while narrowing the gap between PRAMs and realizable parallel machines. Our primary focus is the removal of the assumption that the processors are failure-free. In some settings we also show how to relax the assumption of global synchrony and how to limit shared-memory access concurrency in fault-tolerant algorithms while preserving their efficiency.
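To make the infallibility assumption concrete, here is a minimal sketch, not taken from the monograph, of a toy lock-step PRAM in Python. The class `ToyPRAM`, the program `mark_own_cell`, and the crash model (a failed processor simply stops and never acts again, i.e. fail-stop) are hypothetical illustrations chosen for brevity. The sketch shows how an algorithm that is correct on a failure-free, globally synchronous PRAM can leave work undone once a single processor crashes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPRAM:
    """Hypothetical lock-step PRAM simulator with fail-stop processors (illustration only)."""
    num_procs: int
    memory: list
    crashed: set = field(default_factory=set)

    def step(self, program, crash_now=frozenset()):
        # Assumed crash model: processors in crash_now stop before this step
        # and never act again (fail-stop).
        self.crashed |= set(crash_now)
        live = [p for p in range(self.num_procs) if p not in self.crashed]
        # Read/compute phase: every live processor sees the same snapshot,
        # modelling global synchronization.
        snapshot = list(self.memory)
        writes = [program(pid, snapshot) for pid in live]
        # Write phase: all writes are applied only after all reads.
        for w in writes:
            if w is not None:
                addr, value = w
                self.memory[addr] = value


def mark_own_cell(pid, mem):
    # Hypothetical failure-free algorithm: processor pid marks cell pid.
    return (pid, 1)


pram = ToyPRAM(num_procs=4, memory=[0] * 4)
pram.step(mark_own_cell)
print(pram.memory)    # [1, 1, 1, 1]

faulty = ToyPRAM(num_procs=4, memory=[0] * 4)
faulty.step(mark_own_cell, crash_now={2})
print(faulty.memory)  # [1, 1, 0, 1]: cell 2 is never written
```

The static assignment of one cell per processor is exactly what fails here: nothing reassigns the crashed processor's work, which is the kind of gap a fault-tolerant PRAM algorithm must close.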

Copyright information

© 1997 Springer Science+Business Media New York

About this chapter

Cite this chapter

Kanellakis, P.C., Shvartsman, A.A. (1997). Introduction. In: Fault-Tolerant Parallel Computation. The Springer International Series in Engineering and Computer Science, vol 401. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-5210-6_1

  • DOI: https://doi.org/10.1007/978-1-4757-5210-6_1

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-5177-9

  • Online ISBN: 978-1-4757-5210-6

  • eBook Packages: Springer Book Archive