Abstract
Log monitoring and analysis play a critical role in identifying events and traces, in understanding system behaviour at a given point in time, and in enabling predictive and corrective actions where required. This research models an open-source framework for real-time and historical log analytics of the IT infrastructure of an educational institute, consisting of application servers hosted over the Internet and intranet, peripheral firewalls and IoT devices. The modelled framework has increased the processing speed of real-time and historical logs through stream processing and batch processing, respectively, and has enabled system administrators to monitor and analyse critical security incidents in near-real time. It also supports forensic investigation: historical logs are indexed and stored after stream processing and can then be examined with batch processing. The framework provides an open-source, efficient, user-friendly, enterprise-ready, centralized platform for heterogeneous log analysis with fast search. The open-source tools Apache Flume, Apache Kafka, the ELK Stack and Apache Spark are used for log ingestion, stream processing, real-time search and analytics, and batch processing, respectively. Having arrived at a solution that unifies the two big data processing paradigms, stream and batch processing, for log analytics, we propose an approach that can be extrapolated to a generalized log analytics system for a large infrastructure generating voluminous heterogeneous logs.
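The idea of unifying the two paradigms can be illustrated with a minimal Python sketch. It is not the authors' implementation: the log line format, the function names and the severity rule are all assumptions made for illustration, and plain in-memory structures stand in for Kafka (the stream), Elasticsearch (the index) and Spark (the batch job). The point it shows is that both paths share one parser and one store: the stream path indexes each record as it arrives and raises alerts immediately, while the batch path later aggregates over everything that was indexed.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, List, Optional

# Hypothetical log format (an assumption, not taken from the paper):
#   "<ISO timestamp> <source> <LEVEL> <free-text message>"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+) (?P<source>\S+) (?P<level>[A-Z]+) (?P<message>.*)$"
)

# Hypothetical rule: these levels are treated as security-relevant alerts.
ALERT_LEVELS = {"ERROR", "CRITICAL"}


def parse_line(line: str) -> Optional[dict]:
    """Parse one raw log line into a record, or return None if malformed."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def stream_process(lines: Iterable[str], index: List[dict]) -> Iterator[dict]:
    """Stream path: handle records one at a time as they arrive.

    Every valid record is appended to `index` (a stand-in for a search
    index), and alert-level records are yielded immediately.
    """
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip lines that do not match the assumed format
        index.append(record)
        if record["level"] in ALERT_LEVELS:
            yield record


def batch_process(index: List[dict]) -> Counter:
    """Batch path: aggregate over all historical records in the index."""
    return Counter((r["source"], r["level"]) for r in index)


if __name__ == "__main__":
    sample = [
        "2021-03-01T10:00:00Z firewall CRITICAL port scan detected",
        "2021-03-01T10:00:01Z appserver INFO user login ok",
        "malformed entry",
        "2021-03-01T10:00:02Z iot-sensor ERROR temperature read failed",
    ]
    index: List[dict] = []
    for alert in stream_process(sample, index):
        print("ALERT", alert["source"], alert["message"])
    for (source, level), count in sorted(batch_process(index).items()):
        print(source, level, count)
```

Because `stream_process` is a generator, alerts surface while records are still arriving, whereas `batch_process` runs only over what has already been stored; in a real deployment these roles would be taken by the stream and batch tools named in the abstract.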
Acknowledgements
I would like to express my gratitude to my guide, Prof. Madhuri Rao, for guiding me during this work. Lastly, I would like to thank my research center, Thadomal Shahani College of Engineering, for providing continuous support whenever required.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Deshpande, K., Rao, M. (2022). An Open-Source Framework Unifying Stream and Batch Processing. In: Smys, S., Balas, V.E., Palanisamy, R. (eds) Inventive Computation and Information Technologies. Lecture Notes in Networks and Systems, vol 336. Springer, Singapore. https://doi.org/10.1007/978-981-16-6723-7_45
DOI: https://doi.org/10.1007/978-981-16-6723-7_45
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6722-0
Online ISBN: 978-981-16-6723-7
eBook Packages: Engineering, Engineering (R0)