Abstract
IT infrastructure components are exposed to a variety of anomalies and failures as they grow rapidly in scale and usage. Failures can be identified from the system logs produced when logging statements execute. A recent and effective approach is to monitor the system's behavior and identify anomalous log entries so that corrective action can be taken. However, current methods focus on classifying logs and overlook the nature of the data. This paper proposes a log analysis system based on natural language processing (NLP) techniques that treats logs as natural language text. The model is trained using TF-IDF, polarity score, and Word2Vec as vectorization techniques together with conventional machine learning classifiers, and groups records according to their assigned level. The efficacy of the proposed model was validated on various IT infrastructure logs. Experimental results indicate that sentiment analysis is a promising technique for analyzing complex, large, and irregular system logs.
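To make the idea of vectorizing log messages and grouping them by level concrete, the following is a minimal sketch and not the authors' implementation. It covers only the TF-IDF step; the polarity score and Word2Vec features are omitted. The toy log lines, the function names, and the nearest-centroid classifier (standing in for the unspecified "conventional machine learning classifiers") are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy log messages with assigned levels (not from the paper's datasets).
TRAIN = [
    ("connection to database failed timeout", "ERROR"),
    ("disk write failed input output error", "ERROR"),
    ("user login succeeded", "INFO"),
    ("service started listening on port", "INFO"),
]


def tokenize(message):
    """Lowercase and split on whitespace; real logs would need richer parsing."""
    return message.lower().split()


def fit_idf(documents):
    """Smoothed inverse document frequency for every token in the corpus."""
    n_docs = len(documents)
    doc_freq = Counter(tok for doc in documents for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def tfidf(message, idf):
    """Sparse, L2-normalised TF-IDF vector; tokens unseen in training are ignored."""
    counts = Counter(tok for tok in tokenize(message) if tok in idf)
    vec = {tok: cnt * idf[tok] for tok, cnt in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {tok: w / norm for tok, w in vec.items()} if norm else {}


def fit_centroids(samples, idf):
    """Average the TF-IDF vectors of each level to get one centroid per level."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter()
    for message, level in samples:
        sizes[level] += 1
        for tok, weight in tfidf(message, idf).items():
            sums[level][tok] += weight
    return {level: {tok: w / sizes[level] for tok, w in vec.items()}
            for level, vec in sums.items()}


def predict(message, idf, centroids):
    """Return the level whose centroid has the largest dot product with the message."""
    vec = tfidf(message, idf)

    def score(level):
        return sum(w * centroids[level].get(tok, 0.0) for tok, w in vec.items())

    return max(centroids, key=score)


idf = fit_idf([message for message, _ in TRAIN])
centroids = fit_centroids(TRAIN, idf)
print(predict("database connection failed", idf, centroids))
```

Tokens that occur in fewer training messages receive a higher IDF weight, so rare, level-specific words such as "timeout" influence the prediction more than words shared across many messages.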
Notes
1. GitHub - logpai/loghub: A large collection of system log datasets for AI-powered log analytics. [Online]. Available: https://github.com/logpai/loghub. Accessed: 27 June 2021.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bhanage, D.A., Pawar, A.V. (2023). Improving Classification-Based Log Analysis Using Vectorization Techniques. In: Reddy, A.B., Nagini, S., Balas, V.E., Raju, K.S. (eds) Proceedings of Third International Conference on Advances in Computer Engineering and Communication Systems. Lecture Notes in Networks and Systems, vol 612. Springer, Singapore. https://doi.org/10.1007/978-981-19-9228-5_24
DOI: https://doi.org/10.1007/978-981-19-9228-5_24
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-9227-8
Online ISBN: 978-981-19-9228-5
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)