Improving the Map and Shuffle Phases in Hadoop MapReduce

  • J. V. N. Lakshmi
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 77)


Massive amounts of data must be processed, and analyzing them is becoming a challenging issue for network-centric data-management applications. Advanced tools are required to process and analyze such data sets. As an efficient parallel programming model, MapReduce, together with Hadoop, is widely employed for large-scale data analysis applications. However, MapReduce still suffers from performance problems, and it relies on a shuffle phase as a key element of its logical I/O strategy. The performance of the map phase needs improvement because its output serves as the input to the next phase. Since the map output determines overall efficiency, the map phase needs intermediate checkpoints that regularly monitor all the splits generated by the intermediate phases. The MapReduce model is designed so that processing must wait until all map tasks have completed. This barrier hinders effective resource utilization. This paper implements shuffle as a service component to decrease the overall execution time of jobs, monitor the map phase through skew handling, and increase resource utilization in a cluster.
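To make the data flow described above concrete, the following is a minimal Python sketch of the baseline map → shuffle → reduce pipeline for a word count, with a naive per-partition skew check. It is not the author's shuffle-as-a-service design. All function names (`map_word_count`, `partition`, `shuffle`, `skewed_partitions`, `reduce_sum`) and the skew threshold of 1.5 are hypothetical choices for illustration, and CRC32 is assumed here as a deterministic stand-in for Hadoop's hash-based partitioner.

```python
import zlib
from collections import defaultdict
from statistics import mean


def map_word_count(split):
    """Map task: emit a (word, 1) record for every word in one input split."""
    return [(word, 1) for word in split.split()]


def partition(key, num_reducers):
    """Assign a key to a reducer. CRC32 is an assumed deterministic hash,
    standing in for Hadoop's hashCode-based partitioning."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Shuffle: route every map record to its partition and group values by key."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for records in map_outputs:
        for key, value in records:
            partitions[partition(key, num_reducers)][key].append(value)
    return partitions


def skewed_partitions(partitions, threshold=1.5):
    """Naive skew check (hypothetical): return indices of partitions whose
    record count exceeds `threshold` times the mean partition size."""
    if not partitions:
        return []
    sizes = [sum(len(values) for values in p.values()) for p in partitions]
    avg = mean(sizes)
    return [i for i, size in enumerate(sizes) if avg > 0 and size > threshold * avg]


def reduce_sum(partitions):
    """Reduce: sum the grouped values for each key across all partitions."""
    result = {}
    for p in partitions:
        for key, values in p.items():
            result[key] = sum(values)
    return result


if __name__ == "__main__":
    splits = ["a b a", "a c a", "b a"]
    # Every map task finishes before the shuffle starts: the barrier
    # that the abstract identifies as limiting resource utilization.
    map_outputs = [map_word_count(s) for s in splits]
    parts = shuffle(map_outputs, num_reducers=2)
    print("skewed partitions:", skewed_partitions(parts))
    print("counts:", reduce_sum(parts))
```

For these three splits, `reduce_sum` returns `{"a": 5, "b": 2, "c": 1}` regardless of how the keys fall into partitions. Because the list comprehension completes every map task before `shuffle` runs, the sketch reproduces the waiting barrier; an approach that overlaps shuffling with still-running maps, or rebalances partitions flagged by a skew check, would have to change exactly this step.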


Keywords: MapReduce, Hadoop, Shuffle, Big data, Data analytics, HDFS



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  1. AIMS Institutes of Higher Education, Peenya, Bengaluru, India
