Abstract
We present an overview of massively parallel deterministic algorithms that combine high fault-tolerance with efficiency. This desirable combination (called robustness here) is nontrivial: increasing efficiency means removing redundancy, whereas increasing fault-tolerance requires adding redundancy to computations. We study a spectrum of algorithmic models in which significant robustness is achievable, ranging from synchronous computation with static faults to asynchronous computation with dynamic faults. In addition to fail-stop processor models, we examine and handle arbitrarily initialized memory and restricted memory access concurrency. We survey deterministic upper bounds for the basic Write-All primitive and lower bounds on its efficiency, and we identify some of the key open questions. We also generalize robust computation from functions to relations; this new approach can model approximate computations. We show how to compute approximate Write-All optimally. Finally, we synthesize the state of the art in a complexity classification that extends the traditional classification of efficient parallel algorithms with fault-tolerance.
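To make the tension between efficiency and redundancy concrete, here is a minimal Python sketch of the Write-All problem, in which P fail-stop processors must set all N cells of a shared array to 1. It is not one of the algorithms surveyed here. It only contrasts two naive baselines, a non-redundant partition and a fully replicated schedule, under an assumed lockstep simulation with no restarts. The names `partitioned`, `replicated`, `run_write_all` and the `fail_at` map are hypothetical. Work is counted here as the number of writes actually executed, a simplification of counting all processor steps.

```python
from typing import Callable, Dict, List, Tuple

# A schedule maps (processor id, P, N) to the ordered list of cells it writes.
Schedule = Callable[[int, int, int], List[int]]

def partitioned(pid: int, P: int, N: int) -> List[int]:
    """No redundancy: processor pid owns cells pid, pid + P, pid + 2P, ..."""
    return list(range(pid, N, P))

def replicated(pid: int, P: int, N: int) -> List[int]:
    """Full redundancy: every processor writes every cell, starting at a
    different offset so that surviving processors cover the array early."""
    start = (pid * N) // P
    return [(start + k) % N for k in range(N)]

def run_write_all(P: int, N: int, schedule: Schedule,
                  fail_at: Dict[int, int]) -> Tuple[bool, int]:
    """Simulate synchronous steps. Processor pid stops permanently
    (fail-stop, no restart) before executing step fail_at[pid].
    Returns (all cells written, number of writes executed)."""
    x = [0] * N
    plans = [schedule(pid, P, N) for pid in range(P)]
    steps = max(len(plan) for plan in plans)
    work = 0
    for t in range(steps):
        for pid in range(P):
            crashed = fail_at.get(pid, steps) <= t
            finished = t >= len(plans[pid])
            if crashed or finished:
                continue
            x[plans[pid][t]] = 1
            work += 1
    return all(x), work
```

With P = 4 and N = 16, the partitioned schedule performs 16 writes when no processor fails, but if processor 0 fails before its first step, cells 0, 4, 8 and 12 are never written. The replicated schedule still succeeds with a single survivor, at the cost of 64 writes in the failure-free case. Robust algorithms aim to combine the work of the first schedule with the fault-tolerance of the second.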
This research was supported by ONR grant N00014-91-J-1613.
Copyright information
© 1994 Kluwer Academic Publishers
About this chapter
Cite this chapter
Kanellakis, P.C., Shvartsman, A.A. (1994). Fault-Tolerance and Efficiency in Massively Parallel Algorithms. In: Koob, G.M., Lau, C.G. (eds) Foundations of Dependable Computing. The Springer International Series in Engineering and Computer Science, vol 284. Springer, Boston, MA. https://doi.org/10.1007/978-0-585-27316-7_5
DOI: https://doi.org/10.1007/978-0-585-27316-7_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-7923-9485-3
Online ISBN: 978-0-585-27316-7
eBook Packages: Springer Book Archive