Safe Termination Detection in an Asynchronous Distributed System When Processes May Crash and Recover

  • Neeraj Mittal
  • Kuppahalli L. Phaneesh
  • Felix C. Freiling
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4305)

Abstract

The termination detection problem involves detecting whether an ongoing distributed computation has ceased all its activities. We investigate the termination detection problem in an asynchronous distributed system under the crash-recovery model. It has been shown that, in general, the problem is impossible to solve under the crash-recovery model. We identify two conditions under which the termination detection problem can be solved in a safe manner. We also propose algorithms to detect termination under the conditions identified.
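As a rough illustration of what "ceased all its activities" means, the sketch below checks the usual textbook termination condition for a failure-free system: every process is passive and no message is still in transit. It is not one of the algorithms proposed in the paper and it ignores crashes and recoveries entirely. The names `Snapshot`, `active`, `in_transit`, and `has_terminated` are hypothetical, and the code assumes a consistent global view is available, which a real asynchronous system does not provide directly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical consistent global view of a distributed computation."""
    # process id -> True if the process is active, False if passive
    active: dict[str, bool]
    # (sender, receiver) -> number of messages sent but not yet received
    in_transit: dict[tuple[str, str], int] = field(default_factory=dict)


def has_terminated(snap: Snapshot) -> bool:
    """Return True when all processes are passive and all channels are empty."""
    all_passive = not any(snap.active.values())
    channels_empty = all(count == 0 for count in snap.in_transit.values())
    return all_passive and channels_empty


# Both processes are passive, but one message is still in transit:
# delivering it could reactivate q, so the computation has not terminated.
pending = Snapshot({"p": False, "q": False}, {("p", "q"): 1})

# Both processes are passive and the channel is empty: terminated.
quiet = Snapshot({"p": False, "q": False}, {("p", "q"): 0})
```

The in-transit check matters because a passive process can become active again only by receiving a message, so passivity alone is not enough to declare termination.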

Keywords

Termination Detection; Stable Storage; Failure Detector; Good Process; Passive Recovery

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Neeraj Mittal (1)
  • Kuppahalli L. Phaneesh (1)
  • Felix C. Freiling (2)

  1. Department of Computer Science, The University of Texas at Dallas, Richardson, USA
  2. Department of Computer Science, University of Mannheim, Mannheim, Germany
