On Affirmative Adaptive Failure Detection

Noor, Ahmad Shukri Mohd; Deris, Mustafa Mat; Herawan, Tutut; Hassan, Mohamad Nor

doi:10.1007/978-3-642-33065-0_13

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7440)

Abstract

Fault detection is a crucial part of providing a scalable, dependable, and highly available grid computing environment. The most popular technique for detecting faults is the heartbeat mechanism, which monitors grid resources at very short intervals. However, heartbeat-based fault detection suffers from a trade-off: it offers either fast detection with low accuracy, or complete detection of failures at the cost of a lengthy timeout. In this paper, we propose Affirmative Adaptive Failure Detection (AAFD). In this technique, integrating the newly proposed failure detection algorithm with a ping service is essential not only for dynamically improving the certainty level of accuracy, but also for verifying the aliveness of a site, which supports strong completeness in failure detection and reduces waiting time. The model outperforms existing techniques by 18% to 39% in terms of algorithm performance. On average, AAFD detection is about 30% better than other detection algorithms.
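The general idea of pairing an adaptive heartbeat timeout with a ping confirmation can be sketched as follows. This is a minimal illustration, not the AAFD algorithm itself, whose details are not given in the abstract. The class name `HeartbeatMonitor`, the mean-plus-deviation timeout rule, the `safety_factor` and `window` parameters, and the injected `ping` callable are all assumptions made for this example.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Callable, Deque, Optional


class HeartbeatMonitor:
    """Illustrative monitor: adaptive timeout plus a ping check before
    declaring failure. Hypothetical design, not the paper's algorithm."""

    def __init__(self, ping: Callable[[], bool], window: int = 10,
                 safety_factor: float = 2.0,
                 initial_timeout: float = 1.0) -> None:
        self.ping = ping                    # assumed probe: True if the site answers
        self.intervals: Deque[float] = deque(maxlen=window)
        self.safety_factor = safety_factor  # assumed tuning parameter
        self.initial_timeout = initial_timeout
        self.last_arrival: Optional[float] = None

    def record_heartbeat(self, now: float) -> None:
        """Store the gap since the previous heartbeat, then update the arrival time."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def timeout(self) -> float:
        """Adapt the timeout to recent inter-arrival times (mean + k * std dev)."""
        if len(self.intervals) < 2:
            return self.initial_timeout
        return mean(self.intervals) + self.safety_factor * pstdev(self.intervals)

    def status(self, now: float) -> str:
        """Return 'unknown', 'alive', or 'failed' for the monitored site."""
        if self.last_arrival is None:
            return "unknown"
        if now - self.last_arrival <= self.timeout():
            return "alive"
        # Heartbeat is overdue: confirm with a ping before declaring failure.
        return "alive" if self.ping() else "failed"


# Usage: heartbeats arrive every second, then stop.
monitor = HeartbeatMonitor(ping=lambda: False)
for t in (0.0, 1.0, 2.0, 3.0):
    monitor.record_heartbeat(t)
print(monitor.status(3.5), monitor.status(5.0))
```

In this sketch the ping is consulted only after the adaptive timeout expires, so a late heartbeat alone does not mark a site as failed. The real AAFD procedure may combine the two signals differently.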

Author information

Authors and Affiliations

Department of Computer Science, Faculty of Science and Technology, Universiti Malaysia Terengganu, 21030, Kuala Terengganu, Malaysia
Ahmad Shukri Mohd Noor & Mohamad Nor Hassan
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, 86400, Parit Raja, Batu Pahat, Johor Darul Takzim, Malaysia
Mustafa Mat Deris
Faculty of Computer System and Software Engineering, Universiti Malaysia Pahang, Lebuh Raya Tun Razak, 26300 Gambang, Kuantan, Pahang, Malaysia
Tutut Herawan

Editor information

Editors and Affiliations

School of Information Technology, Deakin University, Melbourne Burwood Campus, 221 Burwood Highway, 3125, Burwood, VIC, Australia
Yang Xiang
SEECS, University of Ottawa, 8, King Edward Ave, K1N 6N5, Ottawa, ON, Canada
Ivan Stojmenovic
Department of Intelligent Informatics, Kyushu Sangyo University, 2-3-1 Matsukadai, Higashi-ku, 813-8503, Fukuoka, Japan
Bernady O. Apduhan
School of Information Science and Engineering, Central South University, 410083, Changsha, Hunan Province, P.R. China
Guojun Wang
Department of Information Engineering, Hiroshima University, 1-4-1, Kagamiyama, 739-8527, Higashi-Hiroshima, Japan
Koji Nakano
School of Information Technologies, University of Sydney, Building J12, 2006, Sydney, NSW, Australia
Albert Zomaya

Cite this paper

Noor, A.S.M., Deris, M.M., Herawan, T., Hassan, M.N. (2012). On Affirmative Adaptive Failure Detection. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33065-0_13

DOI: https://doi.org/10.1007/978-3-642-33065-0_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33064-3
Online ISBN: 978-3-642-33065-0
eBook Packages: Computer Science, Computer Science (R0)
