ComMapReduce: An Improvement of MapReduce with Lightweight Communication Mechanisms

Ding, Linlin; Xin, Junchang; Wang, Guoren; Huang, Shan

doi:10.1007/978-3-642-29035-0_11

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7239)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

As a parallel programming model, MapReduce processes scalable and parallel applications with huge amounts of data on large clusters. The MapReduce framework provides no communication mechanisms among Mappers, nor among Reducers. When the final result set is much smaller than the original data, processing unpromising intermediate data objects wastes time. We observe that this waste can be avoided with simple communication mechanisms. In this paper, we propose ComMapReduce, a framework that extends and improves MapReduce for efficient query processing of massive data in the cloud. With efficient lightweight communication mechanisms, ComMapReduce can effectively filter the unpromising intermediate data objects in the Map phase, thereby decreasing the input of the Reduce phase. Three communication strategies, Lazy, Eager and Hybrid, are proposed to filter the unpromising intermediate results of the Map phase. In addition, two optimization strategies, Prepositive and Postpositive, are presented to enhance the performance of query processing by filtering more candidate data objects. Our extensive experiments on different synthetic datasets demonstrate that the ComMapReduce framework outperforms the original MapReduce framework in all metrics without affecting its existing characteristics.
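
The filtering idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and it does not model the Lazy, Eager or Hybrid strategies, whose details are not given here. It assumes a top-k query as the workload, runs the Mappers one after another instead of in parallel, and uses a hypothetical shared `threshold` as the communicated filter value. The names `local_top_k` and `run_top_k` are invented for illustration.

```python
import heapq


def local_top_k(records, k, threshold=None):
    """Map step (illustrative): keep the k largest values of one partition,
    dropping anything below the shared filter value first."""
    candidates = [r for r in records if threshold is None or r >= threshold]
    return heapq.nlargest(k, candidates)


def run_top_k(partitions, k, share_filter=True):
    """Simulate Mappers sequentially.

    Returns (final top-k list, number of intermediate objects sent to Reduce).
    With share_filter=False this behaves like plain MapReduce: every Mapper
    emits its own local top-k with no knowledge of the others.
    """
    threshold = None      # hypothetical shared filter value (assumption)
    intermediate = []     # everything the Reduce phase would receive
    for part in partitions:
        out = local_top_k(part, k, threshold if share_filter else None)
        intermediate.extend(out)
        if share_filter and len(out) == k:
            # k values >= out[-1] are known to exist, so any smaller value
            # cannot be part of the global top-k and is safe to filter.
            threshold = out[-1] if threshold is None else max(threshold, out[-1])
    # Reduce step: merge the surviving intermediate objects.
    return heapq.nlargest(k, intermediate), len(intermediate)


partitions = [[5, 9, 1, 7], [2, 3, 8, 4], [6, 0, 10, 1]]
with_filter, sent_with = run_top_k(partitions, k=2, share_filter=True)
without_filter, sent_without = run_top_k(partitions, k=2, share_filter=False)
print(with_filter, sent_with)
print(without_filter, sent_without)
```

In this toy run both variants return the same top-2 answer, but the variant with a shared filter value passes fewer intermediate objects to the Reduce step, which is the effect the abstract describes.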

Author information

Authors and Affiliations

Key Laboratory of Medical Image Computing (NEU), Ministry of Education, P.R. China
Linlin Ding, Junchang Xin, Guoren Wang & Shan Huang
College of Information Science & Engineering, Northeastern University, P.R. China
Linlin Ding, Junchang Xin, Guoren Wang & Shan Huang

Editor information

Editors and Affiliations

School of Computer Science and Engineering, Seoul National University, Gwanak-ro, Gwanak-gu, 151747, Seoul, South Korea
Sang-goo Lee
Computer School, Wuhan University, Luo-jia-shan, Wuchang, 430081, Wuhan, Hubei Province, China
Zhiyong Peng
School of Information Technology and Electrical Engineering, University of Queensland, 4072, Brisbane, QLD, Australia
Xiaofang Zhou
Department of Computer Science, Kangwon National University, 192-1, Hyoja2-Dong, 200701, Chuncheon, Kangwon, South Korea
Yang-Sae Moon
Institute for Computer Science and Business Information, University of Duisburg-Essen, Schützenbahn 70, 45117, Essen, Germany
Rainer Unland
School of Information and Communication Engineering, Chungbuk National University, 52 Naesudong-ro, Heungdeok-gu, 4072, Cheongju, Chungbuk, South Korea
Jaesoo Yoo

About this paper

Cite this paper

Ding, L., Xin, J., Wang, G., Huang, S. (2012). ComMapReduce: An Improvement of MapReduce with Lightweight Communication Mechanisms. In: Lee, Sg., Peng, Z., Zhou, X., Moon, YS., Unland, R., Yoo, J. (eds) Database Systems for Advanced Applications. DASFAA 2012. Lecture Notes in Computer Science, vol 7239. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29035-0_11

DOI: https://doi.org/10.1007/978-3-642-29035-0_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29034-3
Online ISBN: 978-3-642-29035-0
eBook Packages: Computer Science, Computer Science (R0)