A Theoretical Framework for Understanding the Relationship Between Log Parsing and Anomaly Detection

  • Conference paper
Runtime Verification (RV 2021)

Abstract

Log-based anomaly detection identifies systems’ anomalous behaviors by analyzing system runtime information recorded in logs. While many approaches have been proposed, all of them have in common an essential pre-processing step called log parsing. This step is needed because automated log analysis requires structured input logs, whereas original logs contain semi-structured text printed by logging statements. Log parsing bridges this gap by converting the original logs into structured input logs fit for anomaly detection.

Despite the intrinsic dependency between log parsing and anomaly detection, no existing work has investigated the impact of the “quality” of log parsing results on anomaly detection. In particular, the concept of “ideal” log parsing results with respect to anomaly detection has not been formalized yet. This makes it difficult to determine, upon obtaining inaccurate results from anomaly detection, if (and why) the root cause for such results lies in the log parsing step.

In this short paper, we lay the theoretical foundations for defining the concept of “ideal” log parsing results for anomaly detection. Based on these foundations, we discuss two practical implications: identifying and localizing the root causes of inaccurate anomaly detection results, and identifying irrelevant log messages.

This work has received funding from the Celtic-Next project CRITISEC and NSERC of Canada under the Discovery and CRC programs. Donghwan Shin was partially supported by the Basic Science Research Programme through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2019R1A6A3A03033444).

Notes

  1. In general, logs may contain extra information, such as timestamps and logging levels (e.g., info, debug) for individual log messages. However, we omit such information since log parsing deals with log messages characterizing the states or events of the system.

  2. This is because each distinct \(\tau(m)\), for a message \(m\) that appears in \(L\), can lead to one or more dimensions. In ML, dimensionality reduction is an essential topic for improving predictive power [1].

  3. Though the length of logs can be reduced in a pre-processing step by omitting certain messages or events based on domain knowledge, this is independent from log parsing, which just abstracts messages.
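
Notes 1 and 2 can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the header layout (date, time, logging level), the rule that masks purely numeric tokens, and the use of per-template counts as features. The function `tau` is only a toy stand-in for a log parser.

```python
import re
from collections import Counter

# Hypothetical header layout: "<date> <time> <LEVEL> <message>".
HEADER = re.compile(r"^\S+ \S+ (?:INFO|DEBUG|WARN|ERROR) ")

def strip_header(line: str) -> str:
    """Drop the timestamp and logging level, keeping only the message (Note 1)."""
    return HEADER.sub("", line, count=1)

def tau(message: str) -> str:
    """Toy stand-in for a log parser: mask purely numeric tokens as <*>."""
    return " ".join("<*>" if tok.isdigit() else tok for tok in message.split())

def count_vector(log: list[str]) -> Counter:
    """One dimension per distinct template observed in the log (Note 2)."""
    return Counter(tau(strip_header(line)) for line in log)

# Made-up log lines, used only to exercise the functions above.
log = [
    "2021-10-11 09:00:01 INFO send 120 bytes to node 3",
    "2021-10-11 09:00:02 INFO send 64 bytes to node 7",
    "2021-10-11 09:00:03 DEBUG check memory OK",
]
features = count_vector(log)
```

Here the two `send` messages collapse into a single template, so the three messages yield two dimensions; a parser that abstracted nothing would yield three, one per raw message.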

References

  1. Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learning. The MIT Press (2012), https://dl.acm.org/doi/book/10.5555/3360093

  2. Aigner, M.: A characterization of the bell numbers. Discret. Math. 205(1), 207–210 (1999). https://doi.org/10.1016/S0012-365X(99)00108-9

  3. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 1–58 (2009). https://doi.org/10.1145/1541880.1541882

  4. Dai, H., Li, H., Chen, C.S., Shang, W., Chen, T.: Logram: efficient log parsing using n-gram dictionaries. IEEE Trans. Softw. Eng. 1 (2020). https://doi.org/10.1109/TSE.2020.3007554

  5. Du, M., Li, F.: Spell: streaming parsing of system event logs. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 859–864. IEEE, Barcelona, Spain (2016)

  6. El-Masri, D., Petrillo, F., Guéhéneuc, Y.G., Hamou-Lhadj, A., Bouziane, A.: A systematic literature review on automated log abstraction techniques. Inf. Softw. Technol. 122, 106276 (2020). https://doi.org/10.1016/j.infsof.2020.106276

  7. Hamooni, H., Debnath, B., Xu, J., Zhang, H., Jiang, G., Mueen, A.: Logmine: fast pattern recognition for log analytics. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, pp. 1573–1582. Association for Computing Machinery, Indianapolis, IN, USA (2016)

  8. He, P., Zhu, J., Zheng, Z., Lyu, M.R.: Drain: an online log parsing approach with fixed depth tree. In: 2017 IEEE International Conference on Web Services (ICWS), pp. 33–40. IEEE, Honolulu, HI, USA (2017)

  9. He, S., He, P., Chen, Z., Yang, T., Su, Y., Lyu, M.R.: A survey on automated log analysis for reliability engineering. CoRR abs/2009.07237 (2020). https://arxiv.org/abs/2009.07237

  10. He, S., Zhu, J., He, P., Lyu, M.R.: Loghub: a large collection of system log datasets towards automated log analytics (2020)

  11. Jiang, Z.M., Hassan, A.E., Flora, P., Hamann, G.: Abstracting execution logs to execution events for enterprise applications. In: 2008 The Eighth International Conference on Quality Software, pp. 181–186. IEEE, Oxford, UK (2008)

  12. Liu, Z., Xia, X., Lo, D., Xing, Z., Hassan, A.E., Li, S.: Which variables should I log? IEEE Trans. Softw. Eng. 47(9), 2012–2031 (2019). https://doi.org/10.1109/TSE.2019.2941943

  13. Luke, S.: Essentials of Metaheuristics. Lulu, second edn. (2013), available for free at http://cs.gmu.edu/~sean/book/metaheuristics/

  14. Makanju, A.A., Zincir-Heywood, A.N., Milios, E.E.: Clustering event logs using iterative partitioning. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1255–1264. Association for Computing Machinery, New York, NY, USA (2009)

  15. Messaoudi, S., Panichella, A., Bianculli, D., Briand, L., Sasnauskas, R.: A search-based approach for accurate identification of log message formats. In: 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC), pp. 167–16710. ACM, Association for Computing Machinery, Gothenburg, Sweden (2018)

  16. Mizutani, M.: Incremental mining of system log format. In: 2013 IEEE International Conference on Services Computing, pp. 595–602. IEEE, Santa Clara, CA, USA (2013)

  17. Nagappan, M., Vouk, M.A.: Abstracting log lines to log event types for mining software system logs. In: 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010), pp. 114–117. IEEE, Cape Town, South Africa (2010)

  18. Shima, K.: Length matters: clustering system log messages using length of words (2016)

  19. Tang, L., Li, T., Perng, C.S.: Logsig: generating system events from raw textual logs. In: Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 785–794. ACM, New York, NY, USA (2011)

  20. Vaarandi, R., Pihelgas, M.: Logcluster - a data clustering and pattern mining algorithm for event logs. In: 2015 11th International Conference on Network and Service Management (CNSM), pp. 1–7. IEEE, Barcelona, Spain (2015). https://doi.org/10.1109/CNSM.2015.7367331

  21. Vaarandi, R.: A data clustering algorithm for mining patterns from event logs. In: Proceedings of the 3rd IEEE Workshop on IP Operations & Management (IPOM 2003)(IEEE Cat. No. 03EX764), pp. 119–126. IEEE, Kansas City, MO, USA (2003)

  22. Yuan, D., Park, S., Huang, P., Liu, Y., Lee, M.M., Tang, X., Zhou, Y., Savage, S.: Be conservative: enhancing failure diagnosis with proactive logging. In: 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), pp. 293–306. USENIX Association, Hollywood, CA (October 2012). https://www.usenix.org/conference/osdi12/technical-sessions/presentation/yuan

  23. Yuan, D., Zheng, J., Park, S., Zhou, Y., Savage, S.: Improving software diagnosability via log enhancement. ACM Trans. Comput. Syst. 30(1), 1–28 (2012). https://doi.org/10.1145/2110356.2110360

  24. Zhao, X., Rodrigues, K., Luo, Y., Stumm, M., Yuan, D., Zhou, Y.: Log20: fully automated optimal placement of log printing statements under specified overhead threshold. In: Proceedings of the 26th Symposium on Operating Systems Principles, pp. 565–581. SOSP 2017, Association for Computing Machinery, New York, NY, USA (2017). https://doi.org/10.1145/3132747.3132778

  25. Zhu, J., He, S., Liu, J., He, P., Xie, Q., Zheng, Z., Lyu, M.R.: Tools and benchmarks for automated log parsing. In: 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), pp. 121–130. IEEE, Madrid, Spain (2019)

Author information

Corresponding author

Correspondence to Donghwan Shin.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Shin, D., Khan, Z.A., Bianculli, D., Briand, L. (2021). A Theoretical Framework for Understanding the Relationship Between Log Parsing and Anomaly Detection. In: Feng, L., Fisman, D. (eds) Runtime Verification. RV 2021. Lecture Notes in Computer Science, vol 12974. Springer, Cham. https://doi.org/10.1007/978-3-030-88494-9_16

  • DOI: https://doi.org/10.1007/978-3-030-88494-9_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88493-2

  • Online ISBN: 978-3-030-88494-9

  • eBook Packages: Computer Science, Computer Science (R0)
