Checkpointing and Communication Pattern-Neutral Algorithm for Removing Messages Logged by Senders
Traditional sender-based message logging protocols rely on a garbage collection algorithm that generates a large number of additional messages and forced checkpoints. In our previous work, we introduced an algorithm that allows each process to autonomously remove useless log information from its volatile storage by piggybacking a small amount of additional information, without requiring any extra messages or forced checkpoints. However, even after a process has executed that algorithm, its storage buffer may still be overloaded under some communication and checkpointing patterns. This paper proposes a new garbage collection algorithm, CCPNA, for sender-based message logging that addresses the problems mentioned above. The algorithm uses the size of each process's log information to considerably reduce the number of processes that must participate in garbage collection. Unlike our previous algorithm, CCPNA does incur some additional messages and forced checkpoints, but it avoids the risk of overloading the storage buffers regardless of the specific checkpointing and communication patterns. It also requires fewer additional messages and forced checkpoints than the traditional algorithm.
Keywords: message-passing system, fault-tolerance, message logging, checkpointing, garbage collection
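The abstract does not give the steps of CCPNA, so the following is only a minimal sketch of the general idea it describes: using per-process log sizes to limit how many processes take part in garbage collection. It assumes the usual sender-based logging property that a sender may discard the messages it logged for a receiver once that receiver has taken a checkpoint. The class `SenderLog`, its methods, and the greedy largest-first selection are all hypothetical illustrations, not the authors' algorithm.

```python
class SenderLog:
    """Hypothetical volatile log kept by a sender, grouped by receiver."""

    def __init__(self):
        # receiver id -> list of logged message payloads (bytes)
        self.by_receiver = {}

    def record(self, receiver, payload):
        """Log a sent message so it can be replayed if the receiver fails."""
        self.by_receiver.setdefault(receiver, []).append(payload)

    def size_of(self, receiver):
        """Bytes of log information currently held for one receiver."""
        return sum(len(p) for p in self.by_receiver.get(receiver, []))

    def total_size(self):
        return sum(self.size_of(r) for r in self.by_receiver)

    def pick_receivers(self, bytes_to_free):
        """Assumed heuristic: ask the receivers with the largest logs first,
        stopping as soon as enough space would be reclaimed."""
        chosen, freed = [], 0
        for r in sorted(self.by_receiver, key=self.size_of, reverse=True):
            if freed >= bytes_to_free:
                break
            chosen.append(r)
            freed += self.size_of(r)
        return chosen

    def on_checkpoint_ack(self, receiver):
        """The receiver has checkpointed, so its logged messages are useless."""
        self.by_receiver.pop(receiver, None)


log = SenderLog()
log.record("p1", b"x" * 60)
log.record("p2", b"y" * 30)
log.record("p3", b"z" * 20)

# Only p1 needs to be asked to checkpoint in order to free 50 bytes.
to_ask = log.pick_receivers(bytes_to_free=50)
for r in to_ask:
    log.on_checkpoint_ack(r)
```

In this sketch a single request (and one forced checkpoint) is enough to bring the buffer back under its limit, instead of involving every process the sender has communicated with.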