Empirical Software Engineering, Volume 23, Issue 6, pp 3248–3280

Studying and detecting log-related issues

  • Mehran Hassani
  • Weiyi Shang
  • Emad Shihab
  • Nikolaos Tsantalis

Abstract

Logs capture valuable information throughout the execution of software systems. Researchers and practitioners rely heavily on the knowledge conveyed in logs to perform various tasks in both software development and operation. Log-related issues, such as missing or outdated information, may have a large impact on the users who depend on these logs. In this paper, we first perform an empirical study of log-related issues in two large-scale, open source software systems. We find that files with log-related issues have undergone statistically significantly more frequent prior changes and bug fixes. We also find that the developers who fix these log-related issues are often neither the ones who introduced the logging statement nor the owners of the method containing it. Without clear experts, maintaining logs is more challenging. Finally, we find that most defective logging statements remain unreported for a long period (median 320 days), but once reported, the issues are fixed quickly (median five days). Our empirical findings suggest the need for automated tools that can detect log-related issues promptly. We conducted a manual study and identified seven root causes of log-related issues. Based on these root causes, we developed an automated tool that detects four evident types of log-related issues. Our tool detects 75 existing inappropriate logging statements reported in 40 log-related issues. We also reported new issues found by our tool to developers, and 38 previously unknown issues in the latest release of the subject systems were accepted by developers.

Keywords

Empirical study, Log, Software bug, Mining software repositories

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada
