Volume 98, Issue 1–2, pp 55–72

Enabling collaborative MapReduce on the Cloud with a single-sign-on mechanism

  • Jiaqi Zhao
  • Jie Tao
  • Achim Streit


Cloud computing introduces a novel computing paradigm that allows users to run their applications in a customized environment using on-demand resources. This paradigm is enabled by several technologies, including the Web, virtualization, distributed file systems, and parallel programming models. For parallel computing on the Cloud, MapReduce is currently the first choice of Cloud providers for delivering data analysis services, because the model is designed specifically for data-intensive applications and a Cloud centre is in practice also a data centre hosting huge amounts of data, usually at petascale. The current deployment of MapReduce on the Cloud, however, follows the traditional MapReduce execution model, which requires the support of a cluster manager. This means that the individual virtual machines created on the Cloud have to be organized into a cluster before they can run a MapReduce application. This is not only a burden for system management but also prevents inter-Cloud computing, in which the resources of different Clouds are combined to solve large problems with big or distributed data. We developed a software framework that lets individual virtual machines execute a MapReduce application in a parallel, collaborative way without installing middleware or a specific software package for system management. A focus of this research work is a Single-Sign-On (SSON) mechanism that enables remote access to the individual machines. We validated the SSON mechanism together with the entire MapReduce framework on a private Cloud. Experimental results demonstrate both the functionality and the feasibility of our approach.


Cloud computing, MapReduce framework, User authentication, Security model




Copyright information

© Springer-Verlag Wien 2014

Authors and Affiliations

  1. School of Basic Sciences, Changchun University of Technology, Changchun, People’s Republic of China
  2. Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Karlsruhe, Germany
