Fault-Tolerance and Efficiency in Massively Parallel Algorithms

Kanellakis, Paris C.; Shvartsman, Alex A.

doi:10.1007/978-0-585-27316-7_5

Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 284)

Abstract

We present an overview of massively parallel deterministic algorithms that combine high fault-tolerance and efficiency. This desirable combination (called robustness here) is nontrivial, since increasing efficiency implies removing redundancy whereas increasing fault-tolerance requires adding redundancy to computations. We study a spectrum of algorithmic models for which significant robustness is achievable, from synchronous computation with static faults to asynchronous computation with dynamic faults. In addition to fail-stop processor models, we examine and address arbitrarily initialized memory and restricted memory-access concurrency. We survey the deterministic upper bounds for the basic Write-All primitive and the lower bounds on its efficiency, and we identify some of the key open questions. We also generalize the robust computing of functions to relations; this new approach can model approximate computations. We show how to compute approximate Write-All optimally. Finally, we synthesize the state of the art in a complexity classification that extends the traditional classification of efficient parallel algorithms to account for fault-tolerance.
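
The following is a minimal Python sketch of the Write-All problem mentioned above. It assumes the usual formulation: P synchronous processors must set every cell of a shared N-element array (initially 0) to 1, while an adversary crashes processors (fail-stop, no restart). The sketch also assumes that the last live processor is never crashed. The strategy is deliberately naive and is not any of the robust algorithms surveyed in the chapter. Processor i starts at offset i*N/P and scans cyclically, rewriting cells even if they are already set. Work is counted as available processor steps, meaning the number of live processors summed over all steps; treat this accounting as an assumption of the sketch. The name `write_all` and its parameters are hypothetical. Termination is checked by the simulator with global knowledge that real processors do not have.

```python
import random
from typing import List, Optional, Tuple


def write_all(
    n: int, p: int, crash_prob: float = 0.0, seed: Optional[int] = None
) -> Tuple[List[int], int, int]:
    """Naive synchronous Write-All with fail-stop (no restart) crashes.

    Hypothetical sketch: returns (array, steps, work), where work is the
    number of live processors summed over all synchronous steps.
    """
    if n < 1 or p < 1:
        raise ValueError("need n >= 1 cells and p >= 1 processors")
    rng = random.Random(seed)
    x = [0] * n                      # shared array, initially all zeros
    alive = set(range(p))            # processors that have not crashed
    # Processor i starts at the beginning of its own segment of the array.
    pos = {i: (i * n) // p for i in range(p)}
    steps = 0
    work = 0
    # The simulator (not the processors) checks global completion.
    while 0 in x:
        steps += 1
        work += len(alive)           # every live processor is charged one step
        for i in sorted(alive):      # sorted() copies, so removal is safe
            # Adversary: crash i before its write, but never the last survivor.
            if len(alive) > 1 and rng.random() < crash_prob:
                alive.discard(i)
                continue
            x[pos[i]] = 1            # write, then advance cyclically
            pos[i] = (pos[i] + 1) % n
    return x, steps, work
```

With no crashes, `write_all(16, 4)` finishes in 4 steps with work 16. When crashes occur, the survivors keep scanning through cells that are already set. This wasted work is the redundancy-versus-efficiency tension that the robust algorithms are designed to control.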

This research was supported by ONR grant N00014-91-J-1613.

Author information

Authors and Affiliations

Computer Science Department, Brown University, PO Box 1910, Providence, RI, 02912, USA
Paris C. Kanellakis
Digital Equipment Corporation, Digital Consulting Technology Office, 30 Porter Road, Littleton, MA, 01460, USA
Alex A. Shvartsman

Editor information

Editors and Affiliations

Office of Naval Research, USA
Gary M. Koob & Clifford G. Lau

About this chapter

Cite this chapter

Kanellakis, P.C., Shvartsman, A.A. (1994). Fault-Tolerance and Efficiency in Massively Parallel Algorithms. In: Koob, G.M., Lau, C.G. (eds) Foundations of Dependable Computing. The Springer International Series in Engineering and Computer Science, vol 284. Springer, Boston, MA. https://doi.org/10.1007/978-0-585-27316-7_5

DOI: https://doi.org/10.1007/978-0-585-27316-7_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-7923-9485-3
Online ISBN: 978-0-585-27316-7
eBook Packages: Springer Book Archive
