Design and Implementation of Virtual Hadoop Cluster on Private Cloud

  • Garima Singh
  • Anil Kumar Singh
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 799)


Virtualization has made it feasible to deploy Hadoop in a cloud environment. Virtualized Hadoop offers distinct benefits, such as rapid cluster setup, the flexibility to use a variety of storage hardware (SAN, NAS, DAS), and high availability. With companies such as Google, Microsoft, Rackspace, and IBM providing their own infrastructure for cloud services, more and more businesses are expected to move to the cloud in the near future. Apart from the public cloud, businesses can use private, community, or hybrid cloud deployment models. This paper focuses on private cloud deployment, which offers its own benefits, such as security, reduced cost, and greater control over resources. It discusses the design and implementation of a private cloud using the Xen 6.5 bare-metal hypervisor, and the deployment of Hadoop as a service on that cloud with the help of a shell script. For the experiments, 8 physical hosts are connected to a 60 TB SAN through a QLogic 20-port 8 Gb SAN switch module, which provides fiber connectivity to the storage. Finally, the performance of Hadoop on the cloud is evaluated.


Virtualization, SAN, Hadoop, MapReduce, High availability



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. Motilal Nehru National Institute of Technology, Allahabad, India
