An Efficient Recoverable DSM on a Network of Workstations: Design and Implementation

Abstract

Given the increasing throughput of local area networks, Networks Of Workstations (NOWs) have become a convenient and cheaper alternative to parallel architectures for executing long-running parallel applications. However, because they are made up of a large number of components, they may experience failures. ICARE is a recoverable distributed shared memory (RDSM) based on backward error recovery, implemented on an ATM-based platform running the CHORUS microkernel. This paper presents the implementation and performance evaluation of ICARE, which exhibits a low overhead. ICARE achieves this by taking advantage of features that already exist in a DSM system in order to combine availability and efficiency: shared data are stored in standard memories and are managed by extending the coherence protocol.
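The abstract says only that ICARE relies on backward error recovery and manages data held in ordinary memories by extending the DSM coherence protocol. The Python sketch below illustrates that general idea under assumptions of our own, not ICARE's actual protocol. It uses a toy write-invalidate protocol, a coordinated checkpoint that places recovery copies of each page in the memories of two distinct live nodes, fail-stop failure of one node, and a global rollback. The class and method names (`RecoverableDSM`, `checkpoint`, `fail_and_rollback`) and the "owner plus ring successor" placement rule are hypothetical. The key point is that recovery copies sit beside the normal coherence state and are ignored by ordinary invalidations, so they survive until the next checkpoint.

```python
from enum import Enum, auto


class State(Enum):
    INVALID = auto()
    SHARED = auto()
    EXCLUSIVE = auto()


class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.pages = {}     # page_id -> (State, data): current working copies
        self.recovery = {}  # page_id -> data: checkpoint copies kept in plain memory


class RecoverableDSM:
    """Hypothetical toy: write-invalidate DSM with in-memory recovery copies."""

    def __init__(self, n_nodes):
        self.nodes = [Node(i) for i in range(n_nodes)]
        self.alive = set(range(n_nodes))
        self.owner = {}  # page_id -> node holding the most recent copy

    def _next_alive(self, node_id):
        # Next live node after node_id in ring order (node_id itself if alone).
        n = len(self.nodes)
        for step in range(1, n + 1):
            candidate = (node_id + step) % n
            if candidate in self.alive:
                return candidate

    def write(self, node_id, page_id, data):
        # Write-invalidate: every other working copy becomes INVALID.
        # Recovery copies are deliberately left untouched.
        for n in self.nodes:
            if n.node_id != node_id and page_id in n.pages:
                n.pages[page_id] = (State.INVALID, None)
        self.nodes[node_id].pages[page_id] = (State.EXCLUSIVE, data)
        self.owner[page_id] = node_id

    def read(self, node_id, page_id):
        node = self.nodes[node_id]
        state, data = node.pages.get(page_id, (State.INVALID, None))
        if state is State.INVALID:
            # Read miss: fetch from the owner; both copies become SHARED.
            owner = self.nodes[self.owner[page_id]]
            _, data = owner.pages[page_id]
            owner.pages[page_id] = (State.SHARED, data)
            node.pages[page_id] = (State.SHARED, data)
        return data

    def checkpoint(self):
        # Phase 1: collect each page's latest value and choose two distinct
        # live holders (the owner and its ring successor).
        new_copies = {}
        for page_id, owner_id in self.owner.items():
            _, data = self.nodes[owner_id].pages[page_id]
            holders = {owner_id, self._next_alive(owner_id)}
            new_copies[page_id] = (data, holders)
        # Phase 2: replace the previous checkpoint everywhere.
        for n in self.nodes:
            n.recovery.clear()
        for page_id, (data, holders) in new_copies.items():
            for h in holders:
                self.nodes[h].recovery[page_id] = data

    def fail_and_rollback(self, failed_id):
        # Fail-stop: the failed node loses all of its memory.
        self.alive.discard(failed_id)
        self.nodes[failed_id].recovery.clear()
        # Backward error recovery: all nodes discard work since the checkpoint.
        for n in self.nodes:
            n.pages.clear()
        # Rebuild working copies from the surviving recovery copies.
        self.owner = {}
        for n in self.nodes:
            if n.node_id in self.alive:
                for page_id, data in n.recovery.items():
                    n.pages[page_id] = (State.SHARED, data)
                    self.owner.setdefault(page_id, n.node_id)
        # Re-replicate so each page again has two copies on live nodes.
        self.checkpoint()
```

In this toy, a write made after the last checkpoint (such as one by node 2) is lost when node 0 fails, because every node rolls back to the checkpointed values. Collecting the new checkpoint before discarding the old one mirrors the usual two-phase shape of checkpointing, although the sketch does not model a failure that occurs during the commit.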

This work has been partially funded by the DRET research contract number 93.124.

Copyright information

© 1998 Springer Science+Business Media New York

About this chapter

Cite this chapter

Kermarrec, AM., Morin, C. (1998). An Efficient Recoverable DSM on a Network of Workstations: Design and Implementation. In: Fault-Tolerant Parallel and Distributed Systems. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5449-3_7

  • DOI: https://doi.org/10.1007/978-1-4615-5449-3_7

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-7488-6

  • Online ISBN: 978-1-4615-5449-3

  • eBook Packages: Springer Book Archive
