On Affirmative Adaptive Failure Detection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7440)

Abstract

Fault detection is a crucial part of providing a scalable, dependable, and highly available grid computing environment. The most popular fault detection technique is the heartbeat mechanism, which monitors grid resources at very short intervals. However, heartbeat-based fault detection suffers from a trade-off: it offers either fast detection with low accuracy, or complete failure detection at the cost of a lengthy timeout. In this paper, we propose Affirmative Adaptive Failure Detection (AAFD). AAFD integrates a newly proposed failure detection algorithm with a ping service. This integration not only dynamically improves the level of certainty in detection accuracy, but also verifies whether a site is alive, which provides strong completeness in failure detection and reduces waiting time. In terms of algorithm performance, the model outperforms existing techniques by 18% to 39%; on average, AAFD detection is about 30% better than other detection algorithms.
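The abstract describes two ingredients: a heartbeat monitor whose timeout adapts to observed behaviour, and a ping check that confirms whether a site is really down before declaring it failed. The sketch below illustrates that general pattern only; it is not the AAFD algorithm from the paper. The adaptive timeout rule (mean of recent heartbeat inter-arrival times plus a multiple of their standard deviation) is a common heuristic swapped in for illustration, and the class name `AdaptiveHeartbeatDetector`, its parameters, and the `ping` callable (a stand-in for a real ping service) are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable, List, Optional


@dataclass
class AdaptiveHeartbeatDetector:
    """Hypothetical monitor for a single grid site (illustration only, not AAFD)."""

    ping: Callable[[], bool]       # hypothetical probe: returns True if the site answers a ping
    window: int = 10               # number of recent heartbeat inter-arrival times to keep
    safety_factor: float = 3.0     # standard deviations of jitter tolerated before suspecting
    initial_timeout: float = 5.0   # timeout (seconds) used until enough samples exist
    min_timeout: float = 0.5       # lower bound on the adaptive timeout (seconds)
    last_arrival: Optional[float] = field(default=None, init=False)
    intervals: List[float] = field(default_factory=list, init=False)

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
            if len(self.intervals) > self.window:
                self.intervals.pop(0)  # drop the oldest sample to keep a sliding window
        self.last_arrival = now

    def timeout(self) -> float:
        """Adaptive timeout: mean inter-arrival time plus a jitter margin."""
        if len(self.intervals) < 2:
            return self.initial_timeout
        margin = self.safety_factor * pstdev(self.intervals)
        return max(self.min_timeout, mean(self.intervals) + margin)

    def status(self, now: float) -> str:
        """Return 'unknown', 'alive', or 'failed' for the monitored site at time `now`."""
        if self.last_arrival is None:
            return "unknown"
        if now - self.last_arrival <= self.timeout():
            return "alive"  # last heartbeat is still within the adaptive timeout
        # Timeout expired: confirm with a ping before declaring failure,
        # so a late heartbeat alone does not cause a false suspicion.
        return "alive" if self.ping() else "failed"


if __name__ == "__main__":
    site_up = True
    detector = AdaptiveHeartbeatDetector(ping=lambda: site_up)
    for t in [0.0, 1.0, 2.0, 3.0, 4.1]:
        detector.heartbeat(t)
    print(detector.status(4.5))   # → alive (within the adaptive timeout)
    print(detector.status(10.0))  # → alive (timeout expired, but the ping succeeds)
    site_up = False
    print(detector.status(10.0))  # → failed (timeout expired and the ping fails)
```

Passing the current time explicitly keeps the sketch deterministic and easy to test. The ping is issued only after the timeout expires, so the common case stays cheap, while a tight timeout no longer has to trade accuracy for speed on its own.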

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Noor, A.S.M., Deris, M.M., Herawan, T., Hassan, M.N. (2012). On Affirmative Adaptive Failure Detection. In: Xiang, Y., Stojmenovic, I., Apduhan, B.O., Wang, G., Nakano, K., Zomaya, A. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2012. Lecture Notes in Computer Science, vol 7440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33065-0_13

  • DOI: https://doi.org/10.1007/978-3-642-33065-0_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33064-3

  • Online ISBN: 978-3-642-33065-0

  • eBook Packages: Computer Science, Computer Science (R0)