An Extended Atomic Consistency Protocol for Recoverable DSM Systems

Brzezinski, Jerzy; Szychowiak, Michal

doi:10.1007/978-3-540-24669-5_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3019)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

This paper describes a new checkpoint recovery protocol for Distributed Shared Memory (DSM) systems with read-write objects. It is based on independent checkpointing integrated with a coherence protocol for the atomic consistency model. The protocol offers high availability of shared objects despite multiple node and communication failures, while introducing little overhead. It ensures fast recovery after multiple node failures and enables a DSM system to tolerate network partitioning, as long as a majority partition can be formed. A formal proof of correctness of the protocol is also presented.
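
The abstract does not give the protocol's details, so the following is only a minimal sketch of the majority-partition condition it mentions, not the authors' algorithm. It assumes that a partition may keep operating only when it contains a strict majority of all nodes; the function name `is_majority_partition` and the node identifiers are hypothetical.

```python
def is_majority_partition(reachable_nodes: set, all_nodes: set) -> bool:
    """Return True if the reachable nodes form a strict majority of the system.

    Assumption (not from the paper): a majority means more than half of
    all nodes, so at most one partition can satisfy the condition.
    """
    # Ignore any identifiers that are not members of the system.
    members = reachable_nodes & all_nodes
    return 2 * len(members) > len(all_nodes)


# Hypothetical five-node system split into two partitions.
nodes = {"n1", "n2", "n3", "n4", "n5"}
larger = {"n1", "n2", "n3"}
smaller = {"n4", "n5"}

print(is_majority_partition(larger, nodes))
print(is_majority_partition(smaller, nodes))
```

Because two disjoint partitions cannot both hold more than half of the nodes, at most one side of a split passes this check, which is what allows a single partition to continue serving shared objects.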

This work has been partially supported by the State Committee for Scientific Research grant no. 7T11C 036 21.

Author information

Authors and Affiliations

Institute of Computing Science, Poznan University of Technology, Piotrowo 3a, 60-965, Poznan, Poland
Jerzy Brzezinski & Michal Szychowiak

Editor information

Editors and Affiliations

Institute of Computational and Information Sciences, Czestochowa University of Technology,
Roman Wyrzykowski
Computer Science Department, University of Tennessee, TN 37996-3450, Knoxville, USA
Jack Dongarra
Systems Research Institute, Polish Academy of Science, Warsaw, Poland
Marcin Paprzycki
Informatics & Mathematical Modeling, Technical University of Denmark, DK-2800, Lyngby, Denmark
Jerzy Waśniewski

About this paper

Cite this paper

Brzezinski, J., Szychowiak, M. (2004). An Extended Atomic Consistency Protocol for Recoverable DSM Systems. In: Wyrzykowski, R., Dongarra, J., Paprzycki, M., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2003. Lecture Notes in Computer Science, vol 3019. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24669-5_2

DOI: https://doi.org/10.1007/978-3-540-24669-5_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21946-0
Online ISBN: 978-3-540-24669-5
eBook Packages: Springer Book Archive
