ComMapReduce: An Improvement of MapReduce with Lightweight Communication Mechanisms

  • Linlin Ding
  • Junchang Xin
  • Guoren Wang
  • Shan Huang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7239)

Abstract

As a parallel programming model, MapReduce processes scalable, parallel applications over huge amounts of data on large clusters. The MapReduce framework provides no communication mechanism among Mappers or among Reducers. When the final result is much smaller than the original data, time is wasted processing unpromising intermediate data objects that cannot contribute to it. We observe that simple communication mechanisms can avoid this waste. In this paper, we propose ComMapReduce, a framework that extends and improves MapReduce for efficient query processing of massive data in the cloud. With efficient lightweight communication mechanisms, ComMapReduce filters unpromising intermediate data objects in the Map phase, thereby reducing the input to the Reduce phase. Three communication strategies, Lazy, Eager and Hybrid, are proposed to filter the unpromising intermediate results of the Map phase. In addition, two optimization strategies, Prepositive and Postpositive, are presented to enhance query processing performance by filtering more candidate data objects. Our extensive experiments on different synthetic datasets demonstrate that the ComMapReduce framework outperforms the original MapReduce framework in all metrics without affecting its existing characteristics.
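
The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how Lazy, Eager or Hybrid exchange information, so the code below simply assumes a top-k query, a hypothetical `Coordinator` object that holds one shared filter value, and Mappers that run one after another in a single process. All names (`Coordinator`, `map_split`, `reduce_top_k`, `run_query`) are illustrative.

```python
import heapq


class Coordinator:
    """Hypothetical node holding a single shared filter value (an assumption)."""

    def __init__(self):
        self.filter_value = float("-inf")

    def report(self, candidate):
        # Keep the strictest (largest) threshold reported so far.
        self.filter_value = max(self.filter_value, candidate)


def map_split(split, k, coordinator=None):
    """Emit the local top-k of one split, pruned by the shared filter if any."""
    local_top = heapq.nlargest(k, split)
    if coordinator is None:
        return local_top  # no communication: emit the whole local top-k
    if len(local_top) == k:
        # The local k-th largest value is a valid lower bound for the
        # global k-th largest value, so it is safe to share as a filter.
        coordinator.report(local_top[-1])
    # Objects below the shared filter cannot appear in the final top-k.
    return [x for x in local_top if x >= coordinator.filter_value]


def reduce_top_k(intermediate, k):
    """Merge all intermediate objects into the final top-k."""
    return heapq.nlargest(k, intermediate)


def run_query(splits, k, communicate):
    """Return the top-k result and the number of intermediate objects."""
    coordinator = Coordinator() if communicate else None
    intermediate = []
    for split in splits:
        intermediate.extend(map_split(split, k, coordinator))
    return reduce_top_k(intermediate, k), len(intermediate)


if __name__ == "__main__":
    splits = [[4, 10, 0, 11], [8, 7, 2, 6], [5, 1, 9, 3]]
    print(run_query(splits, 2, communicate=False))  # ([11, 10], 6)
    print(run_query(splits, 2, communicate=True))   # ([11, 10], 2)
```

Both runs return the same answer, but the run with the shared filter passes fewer objects to the Reduce step. In this sketch the saving depends on the order in which splits report their values, which suggests why the timing of communication is worth treating as a design choice.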

Keywords

Data Object; Query Processing; Communication Strategy; Master Node; Slave Node
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Linlin Ding (1, 2)
  • Junchang Xin (1, 2)
  • Guoren Wang (1, 2)
  • Shan Huang (1, 2)

  1. Key Laboratory of Medical Image Computing (NEU), Ministry of Education, P.R. China
  2. College of Information Science & Engineering, Northeastern University, P.R. China
