Improved Software Reliability Through Failure Diagnosis Based on Clues from Test and Production Logs

Dobrowolski, Wojciech; Nikodem, Maciej; Zawistowski, Marek; Unold, Olgierd

doi:10.1007/978-3-031-06746-4_5

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 484)

Included in the following conference series:

International Conference on Dependability and Complex Systems

Abstract

Growing demand for software reliability requires developers to analyze many production logs under time pressure. Unfortunately, in large, complex systems some failures cannot be detected during the testing phase because they depend on the deployment, configuration parameters, non-deterministic system behaviour, or real-life user input. This article presents a novel, lightweight approach to failure diagnosis based on natural language processing techniques. The aim is to extract as much information as possible from the standard logs attached to a problem description. The approach uses unit test logs (test suites) to gather knowledge about the system. This knowledge is then used to analyze the production log and determine the test suites, and the corresponding code block, that most likely describe the runtime scenario. Experiments on Apache Hadoop HDFS and NOKIA systems show that the hints given by the framework help to locate the fault.
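
The matching step outlined in the abstract can be pictured as a text-retrieval problem: each test suite's log is a document, and the production log is the query. The sketch below is not the authors' implementation. It assumes plain TF-IDF weighting with cosine similarity, and the function names, suite names, and sample log lines are all hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(log_text):
    """Lowercase a log and keep alphabetic tokens only.

    Assumption: dropping digits removes timestamps, block IDs and ports,
    which change from run to run and say little about the scenario.
    """
    return re.findall(r"[a-z]+", log_text.lower())


def tfidf_vectors(docs):
    """Return one {token: weight} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF: a token found in every document keeps a small weight.
    idf = {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
           for tok, df in doc_freq.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({tok: n * idf[tok] for tok, n in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_test_suites(suite_logs, production_log):
    """Rank test suites by similarity of their logs to the production log."""
    names = list(suite_logs)
    docs = [tokenize(suite_logs[name]) for name in names]
    docs.append(tokenize(production_log))  # the query is the last document
    vectors = tfidf_vectors(docs)
    query = vectors[-1]
    scores = [(name, cosine(vec, query)) for name, vec in zip(names, vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: two made-up test suite logs and one made-up production line.
suite_logs = {
    "TestReplication": (
        "INFO replication monitor scheduling block for replication\n"
        "WARN block under replicated on datanode"
    ),
    "TestLeaseRecovery": (
        "INFO lease recovery started for file\n"
        "INFO lease holder released file"
    ),
}
production_log = (
    "2022-01-01 10:00:01 WARN block 1234 under replicated, "
    "scheduling replication on datanode 7"
)

ranking = rank_test_suites(suite_logs, production_log)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The top-ranked suite is only a hint about which scenario the production log resembles; mapping that suite to a code block, as the abstract describes, would need information from the test sources that this sketch does not model.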

Notes

1. https://github.com/dobrowol/defects4all

Acknowledgements

The authors appreciate the valuable comments provided by the anonymous reviewers. This work was supported by NOKIA and financed by the Polish Ministry of Education and Science. Funds were allocated from the “Implementation Doctorate” program.

Author information

Authors and Affiliations

Politechnika Wroclawska, Wroclaw, Poland
Wojciech Dobrowolski, Maciej Nikodem & Olgierd Unold
NOKIA, Wroclaw, Poland
Wojciech Dobrowolski & Marek Zawistowski

Corresponding author

Correspondence to Wojciech Dobrowolski.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Wojciech Zamojski
Wrocław University of Science and Technology, Wrocław, Poland
Jacek Mazurkiewicz
Wrocław University of Science and Technology, Wrocław, Poland
Jarosław Sugier
Wrocław University of Science and Technology, Wrocław, Poland
Tomasz Walkowiak
Polish Academy of Sciences, Systems Research Institute, Warsaw, Poland
Janusz Kacprzyk

Cite this paper

Dobrowolski, W., Nikodem, M., Zawistowski, M., Unold, O. (2022). Improved Software Reliability Through Failure Diagnosis Based on Clues from Test and Production Logs. In: Zamojski, W., Mazurkiewicz, J., Sugier, J., Walkowiak, T., Kacprzyk, J. (eds) New Advances in Dependability of Networks and Systems. DepCoS-RELCOMEX 2022. Lecture Notes in Networks and Systems, vol 484. Springer, Cham. https://doi.org/10.1007/978-3-031-06746-4_5

DOI: https://doi.org/10.1007/978-3-031-06746-4_5
Published: 27 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-06745-7
Online ISBN: 978-3-031-06746-4
eBook Packages: Intelligent Technologies and Robotics (R0)
