Abstract
Big Data technologies are emerging day by day and are making drastic changes in various real-world applications. Traditional data mining tools were adequate for earlier data volumes, but the rapid growth of data over the past decades has made processing increasingly difficult. Because data flows continuously, data streams require more computational processing than traditional data. Big data stream processing must account for several features of data streams: heterogeneity, scalability, fault tolerance, and query optimization. Implementing these features efficiently in real-world applications using big data analytics is challenging during the data storage, processing, and analysis phases. The proposed model, FRTSPS, is therefore a generic architecture built on a distributed computing platform and influenced by the popular Lambda architecture for big data processing. The architecture uses the open-source platform Apache Flink for data processing. Flink is a popular platform that processes historical and streaming data flows together, in parallel. Its stateful streaming offers greater scalability and flexibility, with higher throughput and lower latency, than other stream processing programming models.
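To make the Lambda-style split between historical and streaming data concrete, the following is a minimal sketch in plain Python. It is not the FRTSPS implementation and does not use the Apache Flink API; the record shape (road segment, vehicle count), the function and class names, and the sample values are all assumptions chosen for illustration only. It shows a batch view computed from historical records, a speed layer that keeps running per-key state for newly arriving records, and a query that merges the two.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical record shape (an assumption): (road_segment_id, vehicle_count)
Record = Tuple[str, int]


def build_batch_view(historical: Iterable[Record]) -> Counter:
    """Batch layer: recompute per-segment totals from all historical records."""
    view: Counter = Counter()
    for segment, count in historical:
        view[segment] += count
    return view


class SpeedLayer:
    """Speed layer: keeps running per-segment state for records that
    arrived after the batch view was last computed."""

    def __init__(self) -> None:
        self.state: Counter = Counter()

    def on_event(self, record: Record) -> None:
        # Update the keyed state incrementally, one record at a time.
        segment, count = record
        self.state[segment] += count


def query(batch_view: Counter, speed: SpeedLayer, segment: str) -> int:
    """Serving step: merge the batch result with the real-time increment."""
    return batch_view[segment] + speed.state[segment]


# Illustrative sample data (invented for this sketch).
historical = [("A1", 120), ("B7", 45), ("A1", 80)]
live = [("A1", 5), ("C3", 12)]

batch_view = build_batch_view(historical)
speed = SpeedLayer()
for record in live:
    speed.on_event(record)

print(query(batch_view, speed, "A1"))  # 205
```

In this sketch the merge happens at query time, which is one common way to combine the two layers; a real deployment would hold the per-key state inside the stream processor rather than in an in-process dictionary.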
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Funding
The authors have not disclosed any funding.
Ethics declarations
Conflicts of interest
The authors declare that there is no conflict of interest for this paper.
About this article
Cite this article
Deepthi, B.G., Rani, K.S., Krishna, P.V. et al. An efficient architecture for processing real-time traffic data streams using apache flink. Multimed Tools Appl 83, 37369–37385 (2024). https://doi.org/10.1007/s11042-023-17151-6
DOI: https://doi.org/10.1007/s11042-023-17151-6