Abstract
Virtualization, particularly in cloud computing, is a common strategy for making better use of existing computing resources. Hadoop, one of the Apache projects, is designed to scale up from single servers to thousands of machines, each offering local computation and storage. However, guaranteeing both the stability and the reliability of virtualized systems has become an important problem. In this article, we address it with current open-source software and platforms, namely the Xen hypervisor virtualization technology and the OpenNebula virtual machine management tool. After extending the capabilities of these components, we developed a mechanism, called virtualization fault tolerance (VFT), that achieves high availability for Hadoop. We considered a practical problem, the single-point-of-failure issue that occurs frequently in virtualization systems, and the experimental results confirm that the downtime interval can be greatly shortened even when a failure occurs. As a result, VFT is useful not only for Hadoop applications but also for other cluster-based systems.
Acknowledgements
This work is sponsored by Tunghai University, The U-Care ICT Integration Platform for the Elderly, No. 102GREEnS004-2, Aug. 2013. This work was also supported in part by the National Science Council, Taiwan ROC, under grant numbers NSC102-2218-E-029-002 and NSC101-2218-E-029-004.
Cite this article
Yang, CT., Liu, JC., Hsu, CH. et al. On improvement of cloud virtual machine availability with virtualization fault tolerance mechanism. J Supercomput 69, 1103–1122 (2014). https://doi.org/10.1007/s11227-013-1045-1
DOI: https://doi.org/10.1007/s11227-013-1045-1