Abstract
Fault detection is a crucial part of providing a scalable, dependable and highly available grid computing environment. The most popular fault detection technique is the heartbeat mechanism, which monitors grid resources at very short intervals. However, heartbeat-based fault detection suffers from a trade-off: it achieves either fast detection with low accuracy, or complete detection of failures at the cost of a lengthy timeout. In this paper, we propose Affirmative Adaptive Failure Detection (AAFD). In this technique, the integration of a newly proposed failure detection algorithm with a ping service is essential not only for dynamically improving the certainty level of accuracy, but also for verifying the aliveness of a site, which provides strong completeness in failure detection and reduces waiting time. The model outperforms existing techniques by 18% to 39% in terms of algorithm performance. On average, AAFD detection is about 30% better than other detection algorithms.
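The general idea of pairing an adaptive heartbeat timeout with a confirming ping can be illustrated with a minimal Python sketch. This is not the AAFD algorithm itself, whose details the abstract does not give. The class name `AdaptiveDetector`, the mean-of-recent-intervals estimate, the `safety_margin` parameter and the `ping` callback are all assumptions introduced here for illustration only.

```python
from collections import deque
from statistics import mean
from typing import Callable, Deque, Optional


class AdaptiveDetector:
    """Toy heartbeat monitor: adaptive timeout plus a ping confirmation step.

    Hypothetical illustration only; not the authors' AAFD algorithm.
    """

    def __init__(self, ping: Callable[[], bool], window: int = 10,
                 safety_margin: float = 0.5) -> None:
        self.ping = ping  # assumed callback: True if the site answers a ping
        self.intervals: Deque[float] = deque(maxlen=window)
        self.safety_margin = safety_margin
        self.last_arrival: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat arrival time (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def deadline(self) -> Optional[float]:
        """Expected latest arrival of the next heartbeat, or None if unknown."""
        if self.last_arrival is None or not self.intervals:
            return None
        # The timeout adapts to the mean of recently observed intervals.
        return self.last_arrival + mean(self.intervals) + self.safety_margin

    def status(self, now: float) -> str:
        """Return 'unknown', 'alive' or 'failed' at time `now`."""
        deadline = self.deadline()
        if deadline is None:
            return "unknown"
        if now <= deadline:
            return "alive"
        # Heartbeat is overdue: confirm with a ping before declaring failure,
        # rather than waiting out a longer fixed timeout.
        return "alive" if self.ping() else "failed"


# Usage: heartbeats arrive every second, then stop after t = 3.0.
down = AdaptiveDetector(ping=lambda: False)
slow = AdaptiveDetector(ping=lambda: True)
for t in (0.0, 1.0, 2.0, 3.0):
    down.heartbeat(t)
    slow.heartbeat(t)
```

In this sketch a late heartbeat alone never marks a site as failed. The ping result decides, which is one plausible way to trade a short timeout against false suspicions.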
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Noor, A.S.M., Deris, M.M., Herawan, T., Hassan, M.N. (2012). On Affirmative Adaptive Failure Detection. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33065-0_13
DOI: https://doi.org/10.1007/978-3-642-33065-0_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33064-3
Online ISBN: 978-3-642-33065-0
eBook Packages: Computer Science, Computer Science (R0)