Abstract
This study of fault-tolerant parallel computation uses models of computation based on the parallel random access machine, or PRAM. The PRAM model is generally accepted as a convenient abstraction for defining and analyzing parallel algorithms. However, it makes several assumptions that call its practicality into question. The main ones are global synchronization of processors, high-bandwidth concurrent access to shared memory, and infallibility of processors, interconnections, and memory. In this monograph we aim to preserve the high-level PRAM abstraction that makes it attractive for programming parallel algorithms, while narrowing the gap between PRAMs and realizable parallel machines. Our primary focus is removing the assumption that processors are failure-free. In some settings we also show how to relax the assumption of global synchrony, and how to limit shared-memory access concurrency in fault-tolerant algorithms while preserving their efficiency.
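To make the failure-free assumption concrete, here is a minimal, hypothetical Python sketch of a synchronous PRAM whose processors may crash. It assumes fail-stop behavior, meaning a crashed processor simply takes no further steps. In each step, all live processors read the same snapshot of shared memory and then write, which models global synchrony. The names `run_sync_pram`, `naive_fill`, `scan_fill` and the crash schedule `crash_step` are illustrative assumptions. They are not the monograph's notation or algorithms.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a processor program gets (pid, step, snapshot of shared
# memory) and returns the (address, value) writes it performs in that step.
Program = Callable[[int, int, List[int]], List[Tuple[int, int]]]


def run_sync_pram(n_procs: int, memory: List[int], program: Program,
                  steps: int, crash_step: Dict[int, int]) -> List[int]:
    """Run `steps` lockstep PRAM steps on `memory` (modified in place).

    Processor pid fail-stops at step crash_step[pid]: from that step on it
    performs no reads or writes. Processors absent from crash_step never fail.
    """
    for t in range(steps):
        # Read phase: every live processor sees the same snapshot (global synchrony).
        snapshot = list(memory)
        writes: List[Tuple[int, int]] = []
        for pid in range(n_procs):
            if pid in crash_step and crash_step[pid] <= t:
                continue  # crashed processors stay silent forever
            writes.extend(program(pid, t, snapshot))
        # Write phase: apply writes only after all reads of this step.
        # In the programs below no two processors write the same cell in one step.
        for addr, val in writes:
            memory[addr] = val
    return memory


def naive_fill(pid: int, t: int, snapshot: List[int]) -> List[Tuple[int, int]]:
    # Static assignment: processor pid writes 1 into cell pid at step 0 only.
    return [(pid, 1)] if t == 0 else []


def scan_fill(pid: int, t: int, snapshot: List[int]) -> List[Tuple[int, int]]:
    # Redundant assignment: at step t, processor pid writes cell (pid + t) mod n,
    # so after n steps any single surviving processor has visited every cell.
    n = len(snapshot)
    return [((pid + t) % n, 1)]


if __name__ == "__main__":
    # Processor 2 crashes before its first step: cell 2 is never written.
    print(run_sync_pram(4, [0] * 4, naive_fill, 1, {2: 0}))  # [1, 1, 0, 1]
    # Only processor 0 survives, yet every cell is written after 4 steps.
    print(run_sync_pram(4, [0] * 4, scan_fill, 4, {1: 0, 2: 0, 3: 0}))  # [1, 1, 1, 1]
```

With the static assignment, one crash leaves part of the work undone. The redundant scan tolerates up to n - 1 crashes. However, each live processor performs n steps, so the total work is roughly n² rather than n. Obtaining crash tolerance without this kind of blowup in work is the efficiency concern raised above.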
Copyright information
© 1997 Springer Science+Business Media New York
About this chapter
Cite this chapter
Kanellakis, P.C., Shvartsman, A.A. (1997). Introduction. In: Fault-Tolerant Parallel Computation. The Springer International Series in Engineering and Computer Science, vol 401. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-5210-6_1
DOI: https://doi.org/10.1007/978-1-4757-5210-6_1
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-5177-9
Online ISBN: 978-1-4757-5210-6
eBook Packages: Springer Book Archive