An Open-Source Framework Unifying Stream and Batch Processing

  • Conference paper
Inventive Computation and Information Technologies

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 336)

Abstract

Log monitoring and analysis play a critical role in identifying events and traces, in understanding system behaviour at a given point in time, and in enabling predictive and corrective actions when required. This research focuses on modelling an open-source framework for real-time and historical log analytics of the IT infrastructure of an educational institute, which consists of application servers hosted on the Internet and Intranet, peripheral firewalls and IoT devices. Modelling such a framework has not only improved the processing speed of real-time and historical logs, through stream processing and batch processing respectively, but has also enabled system administrators to monitor and analyse critical security incidents in near real time. It also supports forensic investigations, using batch processing, on the indexed historical logs stored after stream processing. The modelled framework provides an open-source, efficient, user-friendly, enterprise-ready, centralized platform for analysing heterogeneous logs, with fast search options. In this work, the open-source tools Apache Flume, Apache Kafka, the ELK Stack and Apache Spark are used for log ingestion, stream processing, real-time search and analytics, and batch processing, respectively. As a novel solution that unifies two big data processing paradigms, stream processing and batch processing, for log analytics, we propose an approach that can be extrapolated to a generalized log analytics system for large infrastructures generating voluminous heterogeneous logs.
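The abstract's core idea can be illustrated with a minimal pure-Python sketch. Logs are parsed once as they arrive (stream path), stored in an index, and later re-analysed from that stored history (batch path). This is not the authors' implementation. It deliberately avoids the Flume, Kafka, Elasticsearch and Spark APIs. The log format, the severity labels and the function names `parse`, `stream_process` and `batch_process` are hypothetical, and a plain Python list stands in for the search index.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional

# Hypothetical log format, for illustration only (not taken from the paper):
# "<timestamp> <host> <SEVERITY> <free-text message>"
LOG_PATTERN = re.compile(r"^(?P<ts>\S+) (?P<host>\S+) (?P<severity>[A-Z]+) (?P<msg>.*)$")


@dataclass(frozen=True)
class LogEvent:
    ts: str
    host: str
    severity: str
    msg: str


def parse(line: str) -> Optional[LogEvent]:
    """Shared parsing step used by both the streaming path and the batch path."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None  # malformed lines are simply skipped in this sketch
    return LogEvent(**match.groupdict())


def stream_process(
    lines: Iterable[str],
    index: List[LogEvent],
    alert_severities: frozenset = frozenset({"CRITICAL"}),
) -> Iterator[LogEvent]:
    """Streaming path: handle each line as it arrives.

    As the generator is consumed, every parsed event is appended to `index`
    (a plain list standing in for a search index), and events with an
    alerting severity are yielded immediately as near-real-time alerts.
    """
    for line in lines:
        event = parse(line)
        if event is None:
            continue
        index.append(event)
        if event.severity in alert_severities:
            yield event


def batch_process(index: Iterable[LogEvent]) -> Counter:
    """Batch path: re-scan the stored history, here counting events per (host, severity)."""
    return Counter((event.host, event.severity) for event in index)


if __name__ == "__main__":
    raw_lines = [
        "2021-03-01T10:00:00 fw01 INFO allowed tcp 10.0.0.5:443",
        "2021-03-01T10:00:01 fw01 CRITICAL blocked port scan from 203.0.113.7",
        "garbage line",
        "2021-03-01T10:00:02 web01 WARN slow response 2300ms",
    ]
    history: List[LogEvent] = []
    for alert in stream_process(raw_lines, history):  # alerts are emitted as lines arrive
        print("ALERT:", alert.host, alert.msg)
    print(batch_process(history))  # later, forensic-style aggregation over the stored history
```

The main point of the sketch is that both paths share a single `parse` step. As a result, near-real-time alerts and later batch queries work on identically structured events. In the framework described in the abstract, components would presumably map as follows:

- Flume and Kafka would replace the input iterator.
- Elasticsearch would replace the list.
- Spark would replace the `Counter` aggregation.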



Acknowledgements

I would like to express my gratitude to my guide, Prof. Madhuri Rao, for guiding me during this work. Lastly, I would like to thank my research center, Thadomal Shahani College of Engineering, for providing continuous support whenever required.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Deshpande, K., Rao, M. (2022). An Open-Source Framework Unifying Stream and Batch Processing. In: Smys, S., Balas, V.E., Palanisamy, R. (eds) Inventive Computation and Information Technologies. Lecture Notes in Networks and Systems, vol 336. Springer, Singapore. https://doi.org/10.1007/978-981-16-6723-7_45


  • DOI: https://doi.org/10.1007/978-981-16-6723-7_45

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-6722-0

  • Online ISBN: 978-981-16-6723-7

  • eBook Packages: Engineering, Engineering (R0)
