FTMXT: Fault-Tolerant Immediate Mode Heuristics in Computational Grid

  • Sanjaya Kumar Panda
  • Pabitra Mohan Khilar
  • Durga Prasad Mohapatra
Conference paper

Abstract

Fault tolerance plays a key role in computational grids. It enables a system to keep working smoothly when one or more of its components fail. Components fail for unavoidable reasons such as power failure, network failure and system failure. In this chapter, we address the problem of machine failure in computational grids. The proposed system model uses the round trip time to detect failures and a checkpointing strategy to recover from them. The model is applied to the traditional immediate mode heuristics minimum execution time (MET) and minimum completion time (MCT), collectively referred to as MXT. The resulting Fault-Tolerant MET (FTMET) and Fault-Tolerant MCT (FTMCT) heuristics, collectively referred to as FTMXT, are simulated using MATLAB, and the experimental results are compared with those of the traditional heuristics. The results show that the proposed approaches bypass permanent failures and reduce the makespan.
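The abstract names the heuristics but not their mechanics. The Python below is a minimal sketch, under stated assumptions, of how immediate mode MET and MCT map tasks one at a time as they arrive, and how a fault-tolerant variant can skip machines judged to have failed. Everything beyond MET, MCT and makespan is an illustrative assumption and not the authors' FTMXT model (which was simulated in MATLAB): the input format (an expected-time-to-compute matrix `etc`), the hypothetical helper names `detect_alive`, `immediate_map` and `remaining_work`, the rule that a probe round trip time above a fixed timeout means "failed", and the periodic checkpoint rollback model.

```python
from typing import List, Optional, Sequence, Tuple

INF = float("inf")  # stands in for "no reply" to a probe


def detect_alive(rtts: Sequence[float], timeout: float) -> List[bool]:
    # Hypothetical RTT-based detector (assumption): a machine whose probe
    # round trip time exceeds `timeout` is treated as failed.
    return [rtt <= timeout for rtt in rtts]


def immediate_map(
    etc: Sequence[Sequence[float]],
    mode: str,
    alive: Optional[Sequence[bool]] = None,
) -> Tuple[List[int], float]:
    """Map tasks one by one, in arrival order, onto machines.

    etc[i][j] is the expected time to compute task i on machine j (assumed input).
    mode "met": pick the machine with the minimum execution time etc[i][j].
    mode "mct": pick the machine with the minimum completion time ready[j] + etc[i][j].
    Machines marked False in `alive` are skipped (hypothetical fault-tolerant variant).
    Returns the task-to-machine assignment and the makespan.
    """
    n_machines = len(etc[0])
    if alive is None:
        alive = [True] * n_machines
    ready = [0.0] * n_machines  # time at which each machine becomes free
    assignment: List[int] = []
    for row in etc:
        candidates = [j for j in range(n_machines) if alive[j]]
        if not candidates:
            raise RuntimeError("no live machine left")
        if mode == "met":
            best = min(candidates, key=lambda j: row[j])
        elif mode == "mct":
            best = min(candidates, key=lambda j: ready[j] + row[j])
        else:
            raise ValueError(f"unknown mode: {mode}")
        ready[best] += row[best]
        assignment.append(best)
    # Makespan: the time at which the last machine finishes its tasks.
    return assignment, max(ready)


def remaining_work(exec_time: float, elapsed: float, interval: float) -> float:
    # Hypothetical periodic checkpointing (assumption): progress is saved every
    # `interval` time units, so after a failure the task resumes from its last
    # checkpoint and only the unsaved part has to be re-executed.
    saved = min((elapsed // interval) * interval, exec_time)
    return exec_time - saved


if __name__ == "__main__":
    etc = [[4, 6], [3, 5], [8, 2], [5, 7]]  # 4 tasks x 2 machines
    print(immediate_map(etc, "met"))  # ([0, 0, 1, 0], 12.0)
    print(immediate_map(etc, "mct"))  # ([0, 1, 1, 0], 9.0)
    alive = detect_alive([0.5, INF], timeout=2.0)  # machine 1 does not reply
    print(immediate_map(etc, "mct", alive))  # ([0, 0, 0, 0], 20.0)
    print(remaining_work(10, 7, 3))  # 4
```

On this toy input, MET ignores machine load and piles three tasks onto machine 0, while MCT balances them and shortens the makespan from 12 to 9. In the sketch, masking out the unresponsive machine is what lets the schedule route around a permanent failure instead of leaving tasks stranded on it.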

Keywords

Immediate mode; Minimum execution time; Minimum completion time; Scheduling; Fault tolerance; Grid computing

Copyright information

© Springer India 2015

Authors and Affiliations

  • Sanjaya Kumar Panda (1)
  • Pabitra Mohan Khilar (2)
  • Durga Prasad Mohapatra (2)
  1. Department of CS&E, ISM Dhanbad, Dhanbad, India
  2. Department of CS&E, NIT Rourkela, Rourkela, India