Pipelined Multi-GPU MapReduce for Big-Data Processing

  • Conference paper
Computer and Information Science

Part of the book series: Studies in Computational Intelligence (SCI, volume 493)

Abstract

MapReduce is a popular large-scale data-parallel processing model. Its success has stimulated several studies of implementing MapReduce on Graphics Processing Units (GPUs). However, these studies focus most of their efforts on single-GPU algorithms and cannot handle data sets that exceed GPU memory capacity. This paper describes an upgraded version of MGMR, a pipelined multi-GPU MapReduce system (PMGMR), which addresses the challenge of big data. PMGMR harnesses the power of multiple GPUs, improves GPU utilization through newer GPU features such as streams and Hyper-Q, and handles large data sets that exceed GPU and even CPU memory. Compared to MGMR, the newly proposed scheme achieves a 2.5-fold performance improvement and increases system scalability, while allowing users to write straightforward MapReduce code.
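
Only the abstract is available in this preview, but the pipelining idea it names, overlapping data transfers and map computation with CUDA streams so that inputs larger than GPU memory are processed chunk by chunk, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' PMGMR implementation; the kernel map_kernel, the chunk size, and the pipeline depth are placeholders.

// Minimal sketch of the pipelining idea from the abstract: process an input
// larger than GPU memory in chunks, using CUDA streams so host-to-device
// copies, map-style kernels, and device-to-host copies overlap. Illustration
// only, not the authors' PMGMR code; map_kernel, chunk size, and pipeline
// depth are placeholder choices.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void map_kernel(const int *in, int *out, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2 * in[i];                 // placeholder "map" function
}

int main() {
    const size_t total    = 1 << 24;               // total elements in the data set
    const size_t chunk    = 1 << 20;               // elements resident on the GPU at once
    const int    nStreams = 4;                     // pipeline depth

    int *h_in, *h_out;                             // pinned host buffers enable async copies
    cudaHostAlloc((void **)&h_in,  total * sizeof(int), cudaHostAllocDefault);
    cudaHostAlloc((void **)&h_out, total * sizeof(int), cudaHostAllocDefault);
    for (size_t i = 0; i < total; ++i) h_in[i] = (int)i;

    int *d_in[nStreams], *d_out[nStreams];
    cudaStream_t s[nStreams];
    for (int k = 0; k < nStreams; ++k) {
        cudaStreamCreate(&s[k]);
        cudaMalloc((void **)&d_in[k],  chunk * sizeof(int));
        cudaMalloc((void **)&d_out[k], chunk * sizeof(int));
    }

    // Round-robin the chunks over the streams: copy-in, kernel, and copy-out
    // for one chunk serialize within a stream, while different streams overlap.
    for (size_t off = 0, c = 0; off < total; off += chunk, ++c) {
        int    k = (int)(c % nStreams);
        size_t n = (total - off < chunk) ? (total - off) : chunk;
        cudaMemcpyAsync(d_in[k], h_in + off, n * sizeof(int),
                        cudaMemcpyHostToDevice, s[k]);
        map_kernel<<<(unsigned)((n + 255) / 256), 256, 0, s[k]>>>(d_in[k], d_out[k], n);
        cudaMemcpyAsync(h_out + off, d_out[k], n * sizeof(int),
                        cudaMemcpyDeviceToHost, s[k]);
    }
    cudaDeviceSynchronize();
    printf("h_out[1] = %d (expect 2)\n", h_out[1]);

    for (int k = 0; k < nStreams; ++k) {
        cudaStreamDestroy(s[k]);
        cudaFree(d_in[k]);
        cudaFree(d_out[k]);
    }
    cudaFreeHost(h_in);
    cudaFreeHost(h_out);
    return 0;
}

On Kepler-class GPUs, Hyper-Q gives independent streams separate hardware work queues, which is the feature the abstract credits, together with streams, for improved GPU utilization; PMGMR additionally distributes this pipeline across multiple GPUs and, per the abstract, handles data sets that exceed GPU and even CPU memory.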


Author information

Corresponding author

Correspondence to Yi Chen.


Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Chen, Y., Qiao, Z., Davis, S., Jiang, H., Li, K.C. (2013). Pipelined Multi-GPU MapReduce for Big-Data Processing. In: Lee, R. (eds) Computer and Information Science. Studies in Computational Intelligence, vol 493. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00804-2_17

  • DOI: https://doi.org/10.1007/978-3-319-00804-2_17

  • Publisher Name: Springer, Heidelberg

  • Print ISBN: 978-3-319-00803-5

  • Online ISBN: 978-3-319-00804-2

  • eBook Packages: Engineering (R0)
