Abstract

IT infrastructure components are exposed to a wide range of anomalies and failures as they grow rapidly in scale and usage. Failures can be identified from the system logs produced when logging statements execute. A recent and effective approach is to observe the system’s behavior and identify anomalous log entries so that corrective action can be taken. However, current methods focus on classifying logs and overlook the nature of the data. This paper proposes a log analysis system based on natural language processing (NLP) techniques that treats logs as natural language text. The model is trained using TF-IDF, polarity score, and Word2Vec as vectorization techniques, together with conventional machine learning classifiers, to group records according to their assigned level. The efficacy of the proposed model was validated on various IT infrastructure logs. Experimental results indicate that sentiment analysis is a promising technique for analyzing complex, large, and irregular system logs.
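To make the vectorize-then-classify idea concrete, the following is a minimal sketch of only the TF-IDF step paired with a simple classifier. It is not the authors' implementation: the nearest-centroid classifier stands in for the unspecified "conventional machine learning classifiers", the polarity score and Word2Vec features are omitted, and the log lines, the two levels (`INFO`, `ERROR`), and all function names are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(line):
    # Lowercase and split on whitespace; real log preprocessing would be richer.
    return line.lower().split()


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for each training token."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}


def tfidf(line, idf):
    """Return an L2-normalised sparse TF-IDF vector (a dict) for one log line."""
    counts = Counter(tok for tok in tokenize(line) if tok in idf)
    vec = {tok: c * idf[tok] for tok, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}


def fit_centroids(docs, labels, idf):
    """Average the TF-IDF vectors of all training lines that share a level."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter(labels)
    for doc, label in zip(docs, labels):
        for tok, v in tfidf(doc, idf).items():
            sums[label][tok] += v
    return {
        label: {tok: v / sizes[label] for tok, v in vec.items()}
        for label, vec in sums.items()
    }


def predict(line, idf, centroids):
    """Assign the level whose centroid has the highest cosine similarity."""
    vec = tfidf(line, idf)

    def cosine(label):
        centroid = centroids[label]
        dot = sum(v * centroid.get(tok, 0.0) for tok, v in vec.items())
        norm = math.sqrt(sum(v * v for v in centroid.values()))
        return dot / norm if norm else 0.0

    # Sorting the labels makes tie-breaking deterministic.
    return max(sorted(centroids), key=cosine)


# Hypothetical training data: (log message, assigned level).
TRAIN_LOGS = [
    ("connection to database established", "INFO"),
    ("user session started successfully", "INFO"),
    ("scheduled backup completed successfully", "INFO"),
    ("disk read failed with fatal error", "ERROR"),
    ("kernel panic fatal exception detected", "ERROR"),
    ("connection refused fatal error on node", "ERROR"),
]

docs = [message for message, _ in TRAIN_LOGS]
labels = [level for _, level in TRAIN_LOGS]
idf = fit_idf(docs)
centroids = fit_centroids(docs, labels, idf)

print(predict("backup completed successfully", idf, centroids))
print(predict("fatal error on disk", idf, centroids))
```

In practice, a library vectorizer and a trained classifier would replace these hand-written helpers; the sketch only shows how treating log messages as text yields feature vectors that can be grouped by level.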


Notes

  1. GitHub: logpai/loghub, a large collection of system log datasets for AI-powered log analytics. [Online]. Available: https://github.com/logpai/loghub. Accessed: 27 June 2021.

Author information

Corresponding author

Correspondence to Deepali Arun Bhanage.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Bhanage, D.A., Pawar, A.V. (2023). Improving Classification-Based Log Analysis Using Vectorization Techniques. In: Reddy, A.B., Nagini, S., Balas, V.E., Raju, K.S. (eds) Proceedings of Third International Conference on Advances in Computer Engineering and Communication Systems. Lecture Notes in Networks and Systems, vol 612. Springer, Singapore. https://doi.org/10.1007/978-981-19-9228-5_24
