Abstract
An application structured as a fault-tolerant bag of tasks adapts easily to changing resources. To be represented by a single bag of tasks, however, a computation must decompose into purely independent tasks. The work summarised here investigates the performance of structuring approaches that apply where this ideal cannot be met, partly through analysis and partly through measurements of a realistic fault-tolerant computation.
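As an illustration of why a bag of independent tasks adapts to changing resources, the following is a minimal single-process sketch and not the implementation studied in the paper. The function name `run_bag_of_tasks`, the worker names, the `fails_on` set used to simulate crashes, and the squaring step standing in for real work are all assumptions made for this example.

```python
from collections import deque


def run_bag_of_tasks(tasks, workers, fails_on):
    """Drain a bag of independent tasks, re-queueing any task whose worker fails.

    tasks    -- iterable of (task_id, argument) pairs, assumed independent
    workers  -- list of hypothetical worker names, served round-robin
    fails_on -- set of (worker, task_id) pairs simulating a crash mid-task
    """
    bag = deque(tasks)
    live = list(workers)
    results = {}
    turn = 0
    while bag:
        if not live:
            raise RuntimeError("no workers left to drain the bag")
        worker = live[turn % len(live)]
        task_id, arg = bag.popleft()
        if (worker, task_id) in fails_on:
            # Simulated crash: the worker leaves and its task returns to the bag.
            live.remove(worker)
            bag.append((task_id, arg))
            continue
        results[task_id] = arg * arg  # stand-in for the real computation
        turn += 1
    return results


squares = run_bag_of_tasks(
    tasks=[(i, i) for i in range(5)],
    workers=["w0", "w1", "w2"],
    fails_on={("w1", 1)},
)
```

Because no task depends on another, the task lost with `w1` can simply be picked up later by any surviving worker. A computation with dependencies between tasks has no such freedom, which is the situation the paper addresses.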
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Smith, J. (1997). On synchronisation in fault-tolerant data and compute intensive programs over a network of workstations. In: Lengauer, C., Griebl, M., Gorlatch, S. (eds) Euro-Par'97 Parallel Processing. Euro-Par 1997. Lecture Notes in Computer Science, vol 1300. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0002847
DOI: https://doi.org/10.1007/BFb0002847
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63440-9
Online ISBN: 978-3-540-69549-3
eBook Packages: Springer Book Archive