Improving Classification-Based Log Analysis Using Vectorization Techniques

Bhanage, Deepali Arun; Pawar, Ambika Vishal

doi:10.1007/978-981-19-9228-5_24

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 612)

Abstract

IT infrastructure components face a growing variety of anomalies and failures as they expand rapidly in scale and usage. Such failures can be identified from the system logs produced when logging statements execute. A recent and effective approach is to observe the system’s behavior and flag anomalous log entries so that corrective action can be taken. However, current methods focus on classifying logs and overlook the nature of the data. This paper proposes a log analysis system based on natural language processing (NLP) techniques that treats logs as natural language text. The model is trained using TF-IDF, polarity score, and Word2Vec as vectorization techniques together with conventional machine learning classifiers, which group records according to their assigned level. The efficacy of the proposed model was validated on logs from various IT infrastructure systems. Experimental results suggest that sentiment analysis is a promising technique for analyzing complex, large, and irregular system logs.
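To make the idea of vectorizing log messages and classifying them by level concrete, the following is a minimal sketch of one of the named techniques, TF-IDF, written in plain Python. It is not the authors' implementation: the log messages, the two levels, the smoothed IDF formula, and the nearest-centroid classifier (standing in for the "conventional machine learning classifiers") are all assumptions chosen for illustration, as are the function names `fit_idf`, `tfidf`, `train`, and `predict`.

```python
import math
from collections import Counter, defaultdict

# Hypothetical log messages with their logging levels (illustrative only,
# not taken from the datasets used in the paper).
TRAIN = [
    ("connection to database established", "INFO"),
    ("service started on port 8080", "INFO"),
    ("user session created successfully", "INFO"),
    ("disk failure detected on node 3", "ERROR"),
    ("failed to connect to database timeout", "ERROR"),
    ("kernel panic fatal exception detected", "ERROR"),
]


def tokenize(message):
    """Lowercase and split on whitespace; real logs usually need parsing first."""
    return message.lower().split()


def fit_idf(documents):
    """Smoothed inverse document frequency for every training token."""
    n = len(documents)
    df = Counter(tok for doc in documents for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def tfidf(tokens, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unseen tokens are dropped."""
    tf = Counter(tok for tok in tokens if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {tok: w / norm for tok, w in vec.items()} if norm else {}


def train(examples):
    """Return the IDF table and one mean TF-IDF vector (centroid) per level."""
    docs = [tokenize(message) for message, _ in examples]
    idf = fit_idf(docs)
    counts = Counter(level for _, level in examples)
    centroids = defaultdict(lambda: defaultdict(float))
    for doc, (_, level) in zip(docs, examples):
        for tok, weight in tfidf(doc, idf).items():
            centroids[level][tok] += weight / counts[level]
    return idf, {level: dict(vec) for level, vec in centroids.items()}


def predict(message, idf, centroids):
    """Assign the level whose centroid has the largest dot product with the message."""
    vec = tfidf(tokenize(message), idf)

    def score(level):
        centroid = centroids[level]
        return sum(w * centroid.get(tok, 0.0) for tok, w in vec.items())

    return max(sorted(centroids), key=score)


idf, centroids = train(TRAIN)
print(predict("disk failure detected", idf, centroids))
print(predict("service started successfully", idf, centroids))
```

On this toy data the first message shares tokens only with the ERROR examples and the second only with the INFO examples, so they should be labelled accordingly. A polarity score or Word2Vec embedding could replace `tfidf` as the feature step without changing the rest of the pipeline, which is the kind of comparison the abstract describes.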

Notes

1. GitHub, logpai/loghub: A large collection of system log datasets for AI-powered log analytics. [Online]. Available: https://github.com/logpai/loghub. Accessed: 27 June 2021.

Author information

Authors and Affiliations

Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, 412115, India
Deepali Arun Bhanage & Ambika Vishal Pawar
Department of Computer Engineering, Pimpri Chinchwad Education Trust’s, Pimpri Chinchwad College of Engineering, Pune, India
Deepali Arun Bhanage

Authors

Deepali Arun Bhanage
Ambika Vishal Pawar

Corresponding author

Correspondence to Deepali Arun Bhanage.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, VNR VJIET, Hyderabad, India
A. Brahmananda Reddy
Department of Computer Science and Engineering and Computer Science and Business Systems, VNR VJIET, Hyderabad, India
S. Nagini
Department of Automatics and Applied Software, “Aurel Vlaicu” University of Arad, Arad, Romania
Valentina E. Balas
Department of Computer Science and Engineering, CMR Technical Campus, Hyderabad, India
K. Srujan Raju

Cite this paper

Bhanage, D.A., Pawar, A.V. (2023). Improving Classification-Based Log Analysis Using Vectorization Techniques. In: Reddy, A.B., Nagini, S., Balas, V.E., Raju, K.S. (eds) Proceedings of Third International Conference on Advances in Computer Engineering and Communication Systems. Lecture Notes in Networks and Systems, vol 612. Springer, Singapore. https://doi.org/10.1007/978-981-19-9228-5_24

DOI: https://doi.org/10.1007/978-981-19-9228-5_24
Published: 18 March 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-9227-8
Online ISBN: 978-981-19-9228-5
eBook Packages: Intelligent Technologies and Robotics (R0)
