Parallel Data Processing in Dynamic Hybrid Computing Environment Using MapReduce

Tang, Bing; He, Haiwu; Fedak, Gilles

doi:10.1007/978-3-319-11194-0_1

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8631)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

This paper proposes HybridMR, a novel MapReduce computation model for hybrid computing environments. With this model, high-performance cluster nodes and heterogeneous desktop PCs on the Internet or an intranet can be integrated into a single hybrid computing environment, so that the computation and storage capacity of large numbers of desktop PCs can be used to process large-scale datasets. HybridMR relies on a hybrid distributed file system called HybridDFS, which uses a time-out method to cope with the volatility of desktop PCs and a file replication mechanism to provide reliable storage. HybridMR also introduces a node priority-based fair scheduling (NPBFS) algorithm, which assigns each node a priority by quantifying its CPU speed, memory size and I/O bandwidth in order to balance both data storage and job assignment. Performance evaluation results show that the proposed hybrid computation model achieves reliable MapReduce computation, reduces task response time and improves MapReduce performance, while also reducing computation cost and achieving a greener computing mode.
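
As a rough illustration of the scheduling idea in the abstract, the following Python sketch shows one way a per-node priority could be derived from CPU speed, memory size and I/O bandwidth, and then used to split data blocks among nodes that have not timed out. The abstract does not give the actual NPBFS formula or the HybridDFS time-out rule, so the weighted-sum score, the weights, the 30-second timeout, the largest-remainder split and all names (Node, priority, alive, assign_blocks) are assumptions made for this example, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical description of a worker (cluster node or desktop PC)."""
    name: str
    cpu_ghz: float
    mem_gb: float
    io_mbps: float
    last_heartbeat: float  # time of the last heartbeat, in seconds


def priority(node, ref, weights=(0.5, 0.3, 0.2)):
    """Assumed scoring rule: weighted sum of the three metrics,
    each normalized by a reference node so the units cancel."""
    w_cpu, w_mem, w_io = weights
    return (w_cpu * node.cpu_ghz / ref.cpu_ghz
            + w_mem * node.mem_gb / ref.mem_gb
            + w_io * node.io_mbps / ref.io_mbps)


def alive(nodes, now, timeout=30.0):
    """Assumed time-out rule: keep nodes whose last heartbeat
    is no older than `timeout` seconds."""
    return [n for n in nodes if now - n.last_heartbeat <= timeout]


def assign_blocks(nodes, ref, n_blocks):
    """Split n_blocks among nodes in proportion to their priority,
    using the largest-remainder method so the shares sum to n_blocks."""
    if not nodes:
        return {}
    scores = {n.name: priority(n, ref) for n in nodes}
    total = sum(scores.values())
    quotas = {name: n_blocks * s / total for name, s in scores.items()}
    shares = {name: int(q) for name, q in quotas.items()}
    leftover = n_blocks - sum(shares.values())
    # Hand the remaining blocks to the nodes with the largest fractional part.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - shares[k],
                          reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares
```

In this sketch, calling alive() first and then assign_blocks() on the surviving nodes means a desktop PC that has gone silent receives no new blocks, while a cluster node with twice the reference hardware receives about twice the share of a reference-class PC.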

Author information

Authors and Affiliations

School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, 411201, China
Bing Tang
LIP Laboratory, UMR CNRS - ENS Lyon - INRIA - UCB Lyon 5668, University of Lyon, 46 allée d’Italie, 69364, Lyon Cedex 07, France
Haiwu He & Gilles Fedak

Editor information

Editors and Affiliations

Department of Computer Science, Illinois Institute of Technology, 60616-3793, Chicago, IL, USA
Xian-he Sun
School of Computer Science and Technology, Dalian Maritime University, 1 Linghai Road, 116026, Dalian, China
Wenyu Qu
SEECS, University of Ottawa, 8, King Edward Ave, K1N 6N5, Ottawa, ON, Canada
Ivan Stojmenovic
Deakin University, 221 Burwood Highway, 3125, Burwood, VIC, Australia
Wanlei Zhou
Dalian Maritime University, NO.1 Linhai Road Dailian, 116026, China
Zhiyang Li
BeiHang University, XueYuan Road No.37, HaiDian District, Beijing, China
Hua Guo
University of Bradford, BD7 1DP, Bradford, West Yorkshire, United Kingdom
Geyong Min
Dalian Maritime University, NO.1 Linhai Road Dailian, China, 116026
Tingting Yang
Computer Network Information Center, Chinese Academy of Sciences, 100190, Beijing, China
Yulei Wu
Shandong University, 27 Shanda Nanlu, 250100, Jinan City, Shandong Province, China
Lei Liu

About this paper

Cite this paper

Tang, B., He, H., Fedak, G. (2014). Parallel Data Processing in Dynamic Hybrid Computing Environment Using MapReduce. In: Sun, Xh., et al. Algorithms and Architectures for Parallel Processing. ICA3PP 2014. Lecture Notes in Computer Science, vol 8631. Springer, Cham. https://doi.org/10.1007/978-3-319-11194-0_1

DOI: https://doi.org/10.1007/978-3-319-11194-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11193-3
Online ISBN: 978-3-319-11194-0
eBook Packages: Computer Science (R0)