On improvement of cloud virtual machine availability with virtualization fault tolerance mechanism

Yang, Chao-Tung; Liu, Jung-Chun; Hsu, Ching-Hsien; Chou, Wei-Li

doi:10.1007/s11227-013-1045-1

Published: 10 December 2013

Volume 69, pages 1103–1122 (2014)

The Journal of Supercomputing

Chao-Tung Yang¹,
Jung-Chun Liu¹,
Ching-Hsien Hsu² &
…
Wei-Li Chou¹

Abstract

Virtualization, particularly in cloud computing, is a common strategy for making better use of existing computing resources. Hadoop, one of the Apache projects, is designed to scale up from single servers to thousands of machines, each offering local computation and storage. However, guaranteeing both the stability and the reliability of virtualization has become an important topic. In this article, we pursue this goal using current open-source software and platforms, for instance, the Xen hypervisor virtualization technology and the OpenNebula virtual machine management tool. After extending the capabilities of these components, we developed a mechanism, called virtualization fault tolerance (VFT), that supports our ideas and achieves high availability with Hadoop. We considered a practical problem, the single-point-of-failure issue that occurs frequently in virtualization systems, and the experimental results confirm that the downtime interval can be greatly shortened even when a failure occurs. As a result, VFT is useful not only for Hadoop applications but also for other cluster-based systems.
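To illustrate why a shorter downtime interval matters, the following minimal sketch uses the standard steady-state availability relation, availability = MTBF / (MTBF + MTTR). It is not taken from the article: the function names and the sample figures (a failure roughly every 30 days, one hour of manual recovery versus 30 seconds of automated failover) are hypothetical values chosen only for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def availability(mtbf_s: float, mttr_s: float) -> float:
    """Steady-state availability from mean time between failures (MTBF)
    and mean time to repair (MTTR), both in seconds."""
    if mtbf_s <= 0 or mttr_s < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_s / (mtbf_s + mttr_s)


def yearly_downtime_s(avail: float) -> float:
    """Expected downtime per year, in seconds, for a given availability."""
    return (1.0 - avail) * SECONDS_PER_YEAR


if __name__ == "__main__":
    mtbf = 30 * 24 * 3600  # hypothetical: one failure every 30 days
    # Hypothetical recovery times: manual restart vs. automated failover.
    for label, mttr in (("manual recovery", 3600), ("automated failover", 30)):
        a = availability(mtbf, mttr)
        print(f"{label}: availability={a:.6f}, "
              f"downtime/year={yearly_downtime_s(a):.0f} s")
```

Under these assumed figures, reducing the recovery time is the only change between the two cases, so any gain in availability comes entirely from the shortened downtime interval.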

Acknowledgements

This work is sponsored by Tunghai University, The U-Care ICT Integration Platform for the Elderly, No. 102GREEnS004-2, Aug. 2013. This work was also supported in part by the National Science Council, Taiwan ROC, under grant numbers NSC102-2218-E-029-002 and NSC101-2218-E-029-004.

Author information

Authors and Affiliations

Department of Computer Science, Tunghai University, Taichung, 40704, Taiwan
Chao-Tung Yang, Jung-Chun Liu & Wei-Li Chou
Department of Computer Science and Information Engineering, Chung Hua University, Hsinchu, 30010, Taiwan
Ching-Hsien Hsu

Corresponding author

Correspondence to Chao-Tung Yang.

About this article

Cite this article

Yang, CT., Liu, JC., Hsu, CH. et al. On improvement of cloud virtual machine availability with virtualization fault tolerance mechanism. J Supercomput 69, 1103–1122 (2014). https://doi.org/10.1007/s11227-013-1045-1

Issue Date: September 2014
DOI: https://doi.org/10.1007/s11227-013-1045-1
