Failure detectors for crash faults in cloud

  • Bharati Sinha
  • Awadhesh Kumar Singh
  • Poonam Saini
Original Research

Abstract

A failure detector (FD) is an inherent component of atomic broadcast and consensus protocols. Failures are broadly categorized into two types: crash and Byzantine. A crash failure simply stops a system from working, whereas a Byzantine failure reflects malicious behavior during ongoing communication. Detecting a failure becomes more challenging in a dynamic, asynchronous environment such as cloud computing. This paper proposes a failure detector that handles crash faults in the cloud while addressing scalability. We use BCMP networks to compute the performance parameters of the proposed algorithm and thereby detect failures accurately. Although failure detection schemes trade off efficiency against latency, the proposed algorithm achieves an optimal balance between the two metrics.

Keywords

Failure detectors · Cloud computing · Crash faults

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Bharati Sinha (1)
  • Awadhesh Kumar Singh (1)
  • Poonam Saini (2)
  1. National Institute of Technology, Kurukshetra, India
  2. Punjab Engineering College, Chandigarh, India
