IPPS 1998: Parallel and Distributed Processing, pp 742-755
Checkpointing protocols in distributed systems with mobile hosts: A performance analysis
Abstract
Checkpointing distributed applications that involve mobile hosts is important both for reducing rollback during recovery from a failure and for managing voluntary disconnections. In this paper we identify the basic characteristics a checkpointing protocol needs in order to work with mobile hosts: a reduced number of checkpoints, the use of incremental checkpointing, and consistent global checkpoints built on the fly. These characteristics must be achieved with as little control information as possible while ensuring little rollback. We present a comparative performance analysis of some interesting communication-induced checkpointing protocols adapted to a mobile setting. The analysis was carried out by discrete event simulation, and several models of host mobility were considered.
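To make "consistent global checkpoint built on the fly" concrete, the following is a minimal sketch of an index-based communication-induced checkpointing rule. It is an illustration under stated assumptions, not necessarily one of the protocols evaluated in the paper: the class names `Process` and `Message`, the field `sn`, and the host names are hypothetical. Each process piggybacks a single integer on every message and takes a forced checkpoint before delivering a message that carries a larger index, so checkpoints with the same index are never separated by an orphan message.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    sn: int  # sender's checkpoint index: the only piggybacked control information


@dataclass
class Process:
    name: str
    sn: int = 0  # index of the most recent local checkpoint
    checkpoints: list = field(default_factory=list)  # (index, kind) pairs

    def __post_init__(self):
        # Every process starts from an initial checkpoint with index 0.
        self.checkpoints.append((self.sn, "initial"))

    def basic_checkpoint(self):
        # Checkpoint taken on the process's own initiative
        # (assumed here to model, e.g., a voluntary disconnection).
        self.sn += 1
        self.checkpoints.append((self.sn, "basic"))

    def send(self, payload):
        # Piggyback the current index on the outgoing message.
        return Message(payload, self.sn)

    def receive(self, msg):
        # Forced checkpoint: taken BEFORE delivery when the sender is ahead.
        if msg.sn > self.sn:
            self.sn = msg.sn
            self.checkpoints.append((self.sn, "forced"))
        # The message would be delivered to the application here.


a = Process("mh_a")
b = Process("mh_b")
a.basic_checkpoint()          # a moves to index 1
b.receive(a.send("hello"))    # b is behind, so it takes a forced checkpoint
b.receive(a.send("again"))    # same index: no new checkpoint is needed
print(a.checkpoints)
print(b.checkpoints)
```

In this sketch the control information is one integer per message, which matches the abstract's requirement of keeping piggybacked data small. The cost is that forced checkpoints can be triggered by other hosts' activity.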