Pipelined Multi-GPU MapReduce for Big-Data Processing

  • Conference paper
Computer and Information Science

Part of the book series: Studies in Computational Intelligence (SCI, volume 493)

Abstract

MapReduce is a popular large-scale data-parallel processing model. Its success has stimulated several studies of implementing MapReduce on Graphics Processing Units (GPUs). However, these studies focus most of their efforts on single-GPU algorithms and cannot handle data sets that exceed GPU memory capacity. This paper describes an upgraded version of MGMR, a pipelined multi-GPU MapReduce system (PMGMR), which addresses the challenge of big data. PMGMR harnesses the power of multiple GPUs, improves GPU utilization through newer GPU features such as streams and Hyper-Q, and handles large data sets that exceed GPU and even CPU memory. Compared to MGMR, the newly proposed scheme achieves a 2.5-fold performance improvement and increases system scalability, while allowing users to write straightforward MapReduce code.
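
Only the abstract is available in this preview, but the pipelining idea it names, overlapping data transfers and map computation with CUDA streams so that inputs larger than GPU memory are processed chunk by chunk, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' PMGMR implementation; the kernel map_kernel, the chunk size, and the pipeline depth are placeholders.

// Minimal sketch of the pipelining idea from the abstract: process an input
// larger than GPU memory in chunks, using CUDA streams so host-to-device
// copies, map-style kernels, and device-to-host copies overlap. Illustration
// only, not the authors' PMGMR code; map_kernel, chunk size, and pipeline
// depth are placeholder choices.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void map_kernel(const int *in, int *out, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2 * in[i];                 // placeholder "map" function
}

int main() {
    const size_t total    = 1 << 24;               // total elements in the data set
    const size_t chunk    = 1 << 20;               // elements resident on the GPU at once
    const int    nStreams = 4;                     // pipeline depth

    int *h_in, *h_out;                             // pinned host buffers enable async copies
    cudaHostAlloc((void **)&h_in,  total * sizeof(int), cudaHostAllocDefault);
    cudaHostAlloc((void **)&h_out, total * sizeof(int), cudaHostAllocDefault);
    for (size_t i = 0; i < total; ++i) h_in[i] = (int)i;

    int *d_in[nStreams], *d_out[nStreams];
    cudaStream_t s[nStreams];
    for (int k = 0; k < nStreams; ++k) {
        cudaStreamCreate(&s[k]);
        cudaMalloc((void **)&d_in[k],  chunk * sizeof(int));
        cudaMalloc((void **)&d_out[k], chunk * sizeof(int));
    }

    // Round-robin the chunks over the streams: copy-in, kernel, and copy-out
    // for one chunk serialize within a stream, while different streams overlap.
    for (size_t off = 0, c = 0; off < total; off += chunk, ++c) {
        int    k = (int)(c % nStreams);
        size_t n = (total - off < chunk) ? (total - off) : chunk;
        cudaMemcpyAsync(d_in[k], h_in + off, n * sizeof(int),
                        cudaMemcpyHostToDevice, s[k]);
        map_kernel<<<(unsigned)((n + 255) / 256), 256, 0, s[k]>>>(d_in[k], d_out[k], n);
        cudaMemcpyAsync(h_out + off, d_out[k], n * sizeof(int),
                        cudaMemcpyDeviceToHost, s[k]);
    }
    cudaDeviceSynchronize();
    printf("h_out[1] = %d (expect 2)\n", h_out[1]);

    for (int k = 0; k < nStreams; ++k) {
        cudaStreamDestroy(s[k]);
        cudaFree(d_in[k]);
        cudaFree(d_out[k]);
    }
    cudaFreeHost(h_in);
    cudaFreeHost(h_out);
    return 0;
}

On Kepler-class GPUs, Hyper-Q gives independent streams separate hardware work queues, which is the feature the abstract credits, together with streams, for improved GPU utilization; PMGMR additionally distributes this pipeline across multiple GPUs and, per the abstract, handles data sets that exceed GPU and even CPU memory.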


Author information

Corresponding author

Correspondence to Yi Chen.


Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, Y., Qiao, Z., Davis, S., Jiang, H., Li, K.C. (2013). Pipelined Multi-GPU MapReduce for Big-Data Processing. In: Lee, R. (eds) Computer and Information Science. Studies in Computational Intelligence, vol 493. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00804-2_17

  • DOI: https://doi.org/10.1007/978-3-319-00804-2_17

  • Publisher Name: Springer, Heidelberg

  • Print ISBN: 978-3-319-00803-5

  • Online ISBN: 978-3-319-00804-2

  • eBook Packages: Engineering (R0)
