GPU Computations on Hadoop Clusters for Massive Data Processing

  • Wenbo Chen
  • Shungou Xu
  • Hai Jiang
  • Tien-Hsiung Weng
  • Mario Donato Marino
  • Yi-Siang Chen
  • Kuan-Ching Li
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 345)

Abstract

Hadoop is a well-designed approach for handling massive amounts of data. Built around the Hadoop Distributed File System (HDFS) and MapReduce, it schedules processing across distributed servers while providing redundancy and fault tolerance. In terms of performance, however, Hadoop still falls short of high performance computing systems because of the limited parallelism of CPUs. GPU-accelerated computing uses a GPU together with a CPU to accelerate applications, and GPU clusters can process data with higher efficiency; however, GPU clusters offer limited data storage capacity. In this chapter, we exploit a hybrid model of GPU and Hadoop to make the best use of both, and present the design and implementation of an application using Hadoop and CUDA through two interfaces: Hadoop Streaming and Hadoop Pipes. Experimental results for the K-means algorithm are presented and their performance is discussed.
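To make the Hadoop Streaming interface concrete, the following is a minimal sketch of one K-means iteration written as a Streaming-style map step and reduce step. It is not the chapter's implementation: it runs on the CPU only, and the names `CENTROIDS`, `nearest`, `map_lines`, and `reduce_lines`, the two-dimensional sample centroids, and the comma-separated input format are all assumptions made for illustration. In a hybrid design of the kind described above, the per-point distance computation in the map step is the part that would plausibly be handed to a CUDA kernel.

```python
import sys
from itertools import groupby

# Hypothetical starting centroids; a real job would distribute the
# current centroids to every mapper before each iteration.
CENTROIDS = [(0.0, 0.0), (10.0, 10.0)]


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))


def map_lines(lines, centroids=CENTROIDS):
    """Map step: emit 'cluster_id<TAB>point' for every input point.

    The distance loop inside nearest() is the data-parallel part that a
    GPU kernel could replace in a Hadoop + CUDA design.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        point = tuple(float(v) for v in line.split(","))
        yield f"{nearest(point, centroids)}\t{line}"


def reduce_lines(lines):
    """Reduce step: average the points assigned to each cluster.

    Hadoop Streaming hands the reducer its input sorted by key, so
    consecutive grouping with groupby is sufficient.
    """
    parsed = (l.rstrip("\n").split("\t", 1) for l in lines if l.strip())
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        points = [tuple(float(v) for v in value.split(",")) for _, value in group]
        mean = [sum(dim) / len(points) for dim in zip(*points)]
        yield f"{key}\t{','.join(str(m) for m in mean)}"


# Run as 'script.py map' or 'script.py reduce' to use stdin/stdout,
# which is the contract Hadoop Streaming expects from its executables.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

Because Streaming only requires executables that read records from standard input and write key/value lines to standard output, the same structure would apply if the map step were a compiled program that launches a GPU kernel. The sketch can be checked locally by piping a file of points through the map step, sorting the output, and piping it through the reduce step.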

Keywords

Hadoop, GPU, HPC, Massive data processing

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Wenbo Chen (1)
  • Shungou Xu (1)
  • Hai Jiang (2)
  • Tien-Hsiung Weng (3)
  • Mario Donato Marino (4)
  • Yi-Siang Chen (3)
  • Kuan-Ching Li (3)

  1. School of Information and Technology of Lanzhou University, Lanzhou, China
  2. Department of Computer Science, Arkansas State University, Jonesboro, USA
  3. Department of Computer Science and Information Engineering, Providence University, Taichung, Taiwan
  4. Sanfatucchio, Italy