On improvement of cloud virtual machine availability with virtualization fault tolerance mechanism

The Journal of Supercomputing

Abstract

Virtualization is a common strategy for making better use of existing computing resources, particularly in cloud computing. Hadoop, one of the Apache projects, is designed to scale from single servers to thousands of machines, each offering local computation and storage. However, guaranteeing both the stability and the reliability of virtualized systems has become an important topic. In this article, we address this goal using current open-source software and platforms, namely the Xen hypervisor virtualization technology and the OpenNebula virtual machine management tool. After extending the capabilities of these components, we developed a mechanism, called virtualization fault tolerance (VFT), that achieves high availability for Hadoop. We considered a practical problem, the single-point-of-failure issue that occurs frequently in virtualization systems, and the experimental results confirm that the downtime interval can be greatly shortened even when a failure occurs. As a result, VFT is useful not only for Hadoop applications but also for other cluster-based systems.

Acknowledgements

This work is sponsored by Tunghai University, The U-Care ICT Integration Platform for the Elderly, No. 102GREEnS004-2, Aug. 2013. This work was also supported in part by the National Science Council, Taiwan ROC, under grant numbers NSC102-2218-E-029-002 and NSC101-2218-E-029-004.

Author information

Corresponding author

Correspondence to Chao-Tung Yang.

About this article

Cite this article

Yang, CT., Liu, JC., Hsu, CH. et al. On improvement of cloud virtual machine availability with virtualization fault tolerance mechanism. J Supercomput 69, 1103–1122 (2014). https://doi.org/10.1007/s11227-013-1045-1

  • DOI: https://doi.org/10.1007/s11227-013-1045-1
