Real-Time Big Data Stream Processing Using GPU with Spark Over Hadoop Ecosystem

Rathore, M. Mazhar; Son, Hojae; Ahmad, Awais; Paul, Anand; Jeon, Gwanggil

doi:10.1007/s10766-017-0513-2

Published: 27 June 2017

Volume 46, pages 630–646, (2018)

International Journal of Parallel Programming

M. Mazhar Rathore¹,
Hojae Son¹,
Awais Ahmad²,
Anand Paul¹ &
Gwanggil Jeon³

Abstract

In this technological era, people, authorities, entrepreneurs, businesses, and many of the things around us are connected to the internet, forming the Internet of Things (IoT). This generates a massive amount of diverse data at very high speed, termed Big Data. This data is a valuable asset that businesses, organizations, and authorities can use to predict future trends in various respects. However, efficiently processing Big Data while making real-time decisions is a challenging task. Tools such as Hadoop are used for processing large datasets, but they do not perform well for real-time, high-speed stream processing. Therefore, in this paper, we propose an efficient, real-time Big Data stream processing approach that maps a mechanism equivalent to Hadoop MapReduce onto graphics processing units (GPUs). We integrate the parallel and distributed environment of the Hadoop ecosystem and a real-time stream processing tool, Spark, with GPUs to make the system powerful enough to handle an overwhelming amount of high-speed streaming data. We design a MapReduce-equivalent algorithm for GPUs that calculates statistical parameters by dividing Big Data files into fixed-size blocks. Finally, the system is evaluated for efficiency (processing time and throughput) using (1) large city traffic video data captured by the cameras of static as well as moving vehicles, in which vehicles are identified, and (2) large text-based files, such as Twitter data files and structural data. Results show that the proposed system, with Spark on top and GPUs underneath in the parallel and distributed environment of the Hadoop ecosystem, is more efficient and better suited to real-time processing than the existing standalone CPU-based MapReduce implementation.
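
The abstract describes a MapReduce-equivalent calculation of statistical parameters over fixed-size blocks but gives no implementation details. The following is only a minimal CPU-side sketch of that general pattern in Python, not the authors' GPU algorithm. The function names (`split_into_blocks`, `map_block`, `combine`, `block_statistics`), the block size, and the choice of mean and population variance as the statistics are all assumptions made for illustration. In a GPU setting, the per-block map step would plausibly be the part offloaded to the device.

```python
from functools import reduce
from typing import Iterator, List, Tuple

# Partial result for one block: (count, sum, sum of squares).
Partial = Tuple[int, float, float]


def split_into_blocks(values: List[float], block_size: int) -> Iterator[List[float]]:
    """Divide the input into fixed-size blocks (the last block may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(values), block_size):
        yield values[start:start + block_size]


def map_block(block: List[float]) -> Partial:
    """Map step (hypothetical): compute partial statistics for one block.

    Blocks are independent, so this is the step that could run in parallel.
    """
    return (len(block), sum(block), sum(x * x for x in block))


def combine(a: Partial, b: Partial) -> Partial:
    """Reduce step: merge two partial results by element-wise addition."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def block_statistics(values: List[float], block_size: int) -> Tuple[float, float]:
    """Return (mean, population variance) computed block by block."""
    partials = map(map_block, split_into_blocks(values, block_size))
    count, total, total_sq = reduce(combine, partials, (0, 0.0, 0.0))
    if count == 0:
        raise ValueError("no data to summarize")
    mean = total / count
    variance = total_sq / count - mean * mean
    return mean, variance


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print(block_statistics(data, block_size=3))
```

Because `combine` is associative, partial results can be merged in any order, which is what lets the blocks be processed independently. The sum-of-squares formula for variance is used here for brevity; it can lose precision on large or poorly scaled inputs, so a production version would likely prefer a numerically stable merge.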

Acknowledgements

This study was supported by the Brain Korea 21 Plus project (SW Human Resource Development Program for Supporting Smart Life) funded by the Ministry of Education, School of Computer Science and Engineering, Kyungpook National University, Korea (21A20131600005).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Kyungpook National University, Daegu, Korea
M. Mazhar Rathore, Hojae Son & Anand Paul
Department of Information and Communication Engineering, Yeungnam University, Gyeongbuk, Korea
Awais Ahmad
Department of Embedded Systems Engineering, Incheon National University, Incheon, Korea
Gwanggil Jeon

Corresponding author

Correspondence to Anand Paul.

About this article

Cite this article

Rathore, M.M., Son, H., Ahmad, A. et al. Real-Time Big Data Stream Processing Using GPU with Spark Over Hadoop Ecosystem. Int J Parallel Prog 46, 630–646 (2018). https://doi.org/10.1007/s10766-017-0513-2

Received: 27 February 2017
Accepted: 15 June 2017
Published: 27 June 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s10766-017-0513-2
