
Development of cognitive fault tolerant model for scientific workflows by integrating overlapped migration and check-pointing approach

  • P. Padmakumari
  • A. Umamakeswari
Original Research

Abstract

High computation power and storage space are needed to execute complex scientific workflows, and cloud computing resources are used effectively to execute them. Tasks in a workflow are interdependent, so the failure of one task affects the overall performance of the execution. To execute a workflow without interruption, a proactive, intelligent fault-tolerant model is necessary. This paper proposes a cognitive fault tolerant (CFT) model with three phases for proactively tolerating task and VM failures. In the prediction phase, a combined ensemble prediction method predicts task failures, and label tuning algorithms generate intermediate labels to strengthen the prediction. The segregation phase isolates tasks based on priority assignment. The last phase is recovery: fitness checking determines whether a predicted failure is due to the task or the VM. The post prediction checkpointing (PPC) method is used to recover from task failures, while VM failures are recovered using a post- or pre-replication overlapped migration method. The proposed CFT model is validated by comparison with existing algorithms. Experimental analysis shows that the proposed CFT model improves the reliability of workflow execution in a cloud environment.
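
The three phases described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Task` fields, the majority-vote rule standing in for the combined ensemble prediction, and the boolean VM fitness flag are all assumptions made for illustration, and the label tuning step is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Task:
    name: str
    priority: int          # hypothetical: a larger value means more critical
    features: List[float]  # hypothetical monitoring metrics for the task

# A predictor maps task features to True when it expects a failure.
Predictor = Callable[[List[float]], bool]

def predict_failure(task: Task, predictors: List[Predictor]) -> bool:
    """Prediction phase (assumed): strict majority vote over base predictors."""
    votes = sum(1 for p in predictors if p(task.features))
    return votes * 2 > len(predictors)

def segregate(tasks: List[Task], predictors: List[Predictor]) -> List[Task]:
    """Segregation phase (assumed): isolate at-risk tasks, highest priority first."""
    at_risk = [t for t in tasks if predict_failure(t, predictors)]
    return sorted(at_risk, key=lambda t: t.priority, reverse=True)

def choose_recovery(vm_is_fit: bool) -> str:
    """Recovery phase (assumed): a fit VM implies a task-level failure,
    handled by checkpointing; an unfit VM triggers migration."""
    return "checkpoint_task" if vm_is_fit else "migrate_vm"

# Example usage with made-up thresholds and metrics.
predictors = [
    lambda f: f[0] > 0.5,
    lambda f: f[1] > 0.5,
    lambda f: sum(f) > 1.9,
]
tasks = [
    Task("a", 1, [0.9, 0.8]),
    Task("b", 3, [0.95, 0.9]),
    Task("c", 2, [0.1, 0.1]),
]
isolated = segregate(tasks, predictors)
```

In this sketch the fitness check is reduced to a single flag; a real check would presumably inspect VM health metrics before deciding between checkpointing and migration.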

Keywords

Fault tolerant; Check-pointing; Scientific workflows; Migration; Cloud computing

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. School of Computing, SASTRA Deemed University, Thanjavur, India
