Checkpointing protocols in distributed systems with mobile hosts: A performance analysis

  • F. Quaglia
  • B. Ciciani
  • R. Baldoni
Workshop on Fault-Tolerant Parallel and Distributed Systems (Dimiter Avresky, Boston University; David B. Kaeli, Northeastern University)
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1388)


Checkpointing distributed applications that involve mobile hosts is important both to limit rollback during recovery from a failure and to manage voluntary disconnections. In this paper we identify the basic characteristics a checkpointing protocol needs in order to work with mobile hosts: a reduced number of checkpoints, the use of incremental checkpointing, and consistent global checkpoints built on the fly. These characteristics must be achieved with as little control information as possible while still ensuring little rollback. We present a comparative performance analysis of some communication-induced checkpointing protocols adapted to a mobile setting. The analysis was carried out by discrete-event simulation, and several models of host mobility were considered.
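As a rough illustration of what "communication-induced" and "built on the fly" can mean, the following minimal sketch shows one well-known style of rule: each process keeps a checkpoint index, piggybacks it on every outgoing message as the only control information, and takes a forced checkpoint before delivering a message that carries a higher index. The class and method names (`Process`, `Message`, `basic_checkpoint`, `receive`) are hypothetical and chosen for illustration; this is not claimed to be any of the specific protocols evaluated in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    sn: int  # sender's checkpoint index, piggybacked as control information


@dataclass
class Process:
    name: str
    sn: int = 0  # index of the latest local checkpoint
    checkpoints: list = field(default_factory=list)  # (index, kind) records

    def __post_init__(self):
        # Every process starts from an initial checkpoint with index 0.
        self.checkpoints.append((0, "initial"))

    def basic_checkpoint(self):
        """Checkpoint taken on the process's own initiative
        (assumed triggers: a timer or a voluntary disconnection)."""
        self.sn += 1
        self.checkpoints.append((self.sn, "basic"))

    def send(self, payload):
        # The current index is the only control information attached.
        return Message(payload, self.sn)

    def receive(self, msg):
        """Take a forced checkpoint before delivery if the sender is ahead."""
        if msg.sn > self.sn:
            self.sn = msg.sn
            self.checkpoints.append((self.sn, "forced"))
        return msg.payload  # delivery to the application


p, q = Process("p"), Process("q")
p.basic_checkpoint()          # p moves to index 1
q.receive(p.send("hello"))    # q is behind, so it takes a forced checkpoint
q.receive(p.send("again"))    # same index, no additional checkpoint
```

Under this rule, local checkpoints that share an index can be grouped into a global checkpoint without extra coordination messages, at the cost of one integer per message. The trade-off is the number of forced checkpoints, which is the kind of quantity a simulation study can compare across protocols.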





Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • F. Quaglia (1)
  • B. Ciciani (1)
  • R. Baldoni (1)
  1. Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”, Roma, Italy
