Checkpointing and Communication Pattern-Neutral Algorithm for Removing Messages Logged by Senders

  • JinHo Ahn
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4208)

Abstract

Traditional sender-based message logging protocols rely on a garbage collection algorithm that incurs a large number of additional messages and forced checkpoints. In our previous work, we introduced an algorithm that lets each process autonomously remove useless log information from its volatile storage by piggybacking only a small amount of additional information, without requiring any extra messages or forced checkpoints. However, even after a process has executed that algorithm, its storage buffer may still be overloaded under some communication and checkpointing patterns. This paper proposes a new garbage collection algorithm for sender-based message logging, CCPNA, to address these problems. The algorithm uses the size of each process's log information to considerably reduce the number of processes that participate in garbage collection. CCPNA does incur more additional messages and forced checkpoints than our previous algorithm, but it avoids the risk of overloading the storage buffers regardless of the specific checkpointing and communication patterns. It also requires fewer additional messages and forced checkpoints than the traditional algorithm.
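
The abstract does not spell out how CCPNA chooses participants, so the following is only an illustrative sketch of one way log sizes could limit the number of processes involved. It assumes, hypothetically, that a sender tracks how many bytes it has logged for each receiver and, when it needs buffer space, contacts the receivers with the largest logs first. The function name `select_participants`, the `log_size` mapping, and the greedy rule are all assumptions made for illustration, not the paper's procedure.

```python
from typing import Dict, List


def select_participants(log_size: Dict[str, int], bytes_needed: int) -> List[str]:
    """Greedily pick receivers whose logged messages occupy the most space.

    log_size is hypothetical bookkeeping: it maps a receiver id to the number
    of bytes this sender has logged for that receiver in its volatile buffer.
    """
    chosen: List[str] = []
    freed = 0
    # Visit the largest logs first so that as few processes as possible
    # have to take part in this garbage collection round.
    by_size = sorted(log_size.items(), key=lambda kv: kv[1], reverse=True)
    for receiver, size in by_size:
        if freed >= bytes_needed:
            break
        if size == 0:
            continue  # nothing to reclaim from this receiver
        chosen.append(receiver)
        freed += size
    return chosen


if __name__ == "__main__":
    sizes = {"p1": 10, "p2": 300, "p3": 50}
    print(select_participants(sizes, 320))
```

In this reading, each selected receiver would be asked to take a checkpoint, after which the sender could presumably discard the messages it had logged for that receiver. Choosing the largest logs first keeps the number of such requests small, which is the trade-off the abstract describes in general terms.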

Keywords

message-passing system, fault-tolerance, message logging, checkpointing, garbage collection

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • JinHo Ahn, Dept. of Computer Science, College of Science, Kyonggi University, Kyonggido, Republic of Korea
