Hybridhadoop: CPU-GPU hybrid scheduling in hadoop

Oh, Chanyoung; Yi, Saehanseul; Seok, Jongkyu; Jung, Hyeonjin; Yoon, Illo; Yi, Youngmin

doi:10.1007/s10586-023-04178-5

Published: 21 November 2023

Cluster Computing (2023)

Chanyoung Oh¹, Saehanseul Yi², Jongkyu Seok³, Hyeonjin Jung³, Illo Yoon³ & Youngmin Yi³

Abstract

As GPUs have become an essential component of high-performance computing, many works have attempted to leverage GPU computing in Hadoop. However, few of them aim to fully utilize the GPU, and only a few study using the CPU and the GPU at the same time. In this paper, we propose CPU-GPU hybrid scheduling in Hadoop, in which both the CPUs and the GPUs in a node are exploited as much as possible in an adaptive manner. The technical barrier is that the optimal number of GPU tasks is not known in advance, and the total number of Containers in a node cannot be changed once a Hadoop job starts. In the proposed approach, we first determine the initial number of Containers and the hybrid execution mode; the proposed dynamic scheduler then adjusts the number of Containers assigned to the GPU and to the CPU during job execution, with the help of a GPU monitor. It also employs a load-balancing algorithm for the tail of the job. Experiments with various benchmarks show that the proposed CPU-GPU hybrid scheduling achieves an average speedup of 3.87\(\times\) over 12-core CPU-only Hadoop.
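
The constraint described in the abstract (a fixed total number of Containers per node, with the CPU/GPU split adjusted at run time) can be illustrated with a small simulation. This is only a sketch under stated assumptions and is not the scheduler from the paper: the names `SlotSplit`, `rebalance`, and `simulate`, the utilization thresholds, and the one-slot-per-interval rule are all hypothetical, and the utilization samples stand in for whatever a GPU monitor would report.

```python
from dataclasses import dataclass


@dataclass
class SlotSplit:
    """Division of a node's fixed pool of Container slots (hypothetical model)."""
    total: int  # fixed once the job starts
    gpu: int    # slots currently assigned to GPU tasks

    @property
    def cpu(self) -> int:
        # Every slot not used for GPU tasks runs a CPU task.
        return self.total - self.gpu


def rebalance(split: SlotSplit, gpu_util: float,
              low: float = 0.70, high: float = 0.95) -> SlotSplit:
    """Move at most one slot between CPU and GPU per monitoring interval.

    gpu_util is a utilization sample in [0, 1]. The thresholds are
    illustrative assumptions, not values taken from the paper.
    """
    if not 0.0 <= gpu_util <= 1.0:
        raise ValueError("gpu_util must be in [0, 1]")
    gpu = split.gpu
    if gpu_util < low and gpu < split.total:
        gpu += 1  # GPU has headroom: give it one more slot
    elif gpu_util > high and gpu > 0:
        gpu -= 1  # GPU looks saturated: hand one slot back to the CPU
    return SlotSplit(total=split.total, gpu=gpu)


def simulate(samples, total=12, initial_gpu=2):
    """Apply rebalance() to a sequence of utilization samples.

    Returns the (gpu, cpu) slot counts before the first sample and
    after each subsequent one.
    """
    split = SlotSplit(total=total, gpu=initial_gpu)
    history = [(split.gpu, split.cpu)]
    for util in samples:
        split = rebalance(split, util)
        history.append((split.gpu, split.cpu))
    return history
```

In this model the sum of GPU and CPU slots never changes, which mirrors the restriction that the total number of Containers in a node is fixed for the lifetime of the job; only the assignment of slots to device types moves.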

Data availability

Enquiries about data availability should be directed to the authors.

Funding

This work was supported by the 2023 sabbatical year research grant of the University of Seoul.

Author information

Authors and Affiliations

Department of Software, Kongju National University, Cheonan, Chungcheongnam-do, 31080, Republic of Korea
Chanyoung Oh
School of Information and Computer Sciences, University of California, Irvine, Irvine, CA, 92697, USA
Saehanseul Yi
School of Electrical and Computer Engineering, University of Seoul, Seoul, 02504, Republic of Korea
Jongkyu Seok, Hyeonjin Jung, Illo Yoon & Youngmin Yi

Corresponding author

Correspondence to Youngmin Yi.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Oh, C., Yi, S., Seok, J. et al. Hybridhadoop: CPU-GPU hybrid scheduling in hadoop. Cluster Comput (2023). https://doi.org/10.1007/s10586-023-04178-5

Received: 11 May 2023
Revised: 10 October 2023
Accepted: 14 October 2023
Published: 21 November 2023
DOI: https://doi.org/10.1007/s10586-023-04178-5
