Empirical Software Engineering, Volume 22, Issue 4, pp 1684–1716

Which log level should developers choose for a new logging statement?


Abstract

Logging statements are used to record valuable runtime information about applications. Each logging statement is assigned a log level such that users can disable some verbose log messages while allowing the printing of other important ones. However, prior research finds that developers often have difficulties when determining the appropriate level for their logging statements. In this paper, we propose an approach to help developers determine the appropriate log level when they add a new logging statement. We analyze the development history of four open source projects (Hadoop, Directory Server, Hama, and Qpid), and leverage ordinal regression models to automatically suggest the most appropriate level for each newly-added logging statement. First, we find that our ordinal regression model can accurately suggest the levels of logging statements with an AUC (area under the curve; the higher the better) of 0.75 to 0.81 and a Brier score (the lower the better) of 0.44 to 0.66, which is better than randomly guessing the appropriate log level (with an AUC of 0.50 and a Brier score of 0.80 to 0.83) or naively guessing the log level based on the proportional distribution of each log level (with an AUC of 0.50 and a Brier score of 0.65 to 0.76). Second, we find that the characteristics of the containing block of a newly-added logging statement, the existing logging statements in the containing source code file, and the content of the newly-added logging statement play important roles in determining the appropriate log level for that logging statement.
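
To make the abstract's modeling setup concrete, the following is a minimal sketch, not the paper's exact specification: the standard proportional-odds form of an ordinal regression model over ordered log levels, the multi-class Brier score, and one common multi-class generalization of the AUC. The predictors x_i, the coefficients, and the assumption of five or six ordered levels (e.g., trace through fatal) are illustrative assumptions based on common logging libraries, not details stated in the abstract.

% Proportional-odds ordinal regression: the probability that logging
% statement i receives a level no higher than the j-th ordered level.
% x_i collects the explanatory metrics (placeholders here), alpha_j are
% level-specific intercepts, and beta is shared across all levels.
\Pr(Y_i \le j \mid x_i) = \frac{1}{1 + \exp\!\bigl(-(\alpha_j - x_i^{\top}\beta)\bigr)}, \qquad j = 1, \dots, K-1

% Multi-class Brier score over N logging statements and K log levels:
% p_{ik} is the predicted probability of level k for statement i, and
% y_{ik} is 1 if statement i actually uses level k (0 otherwise).
% Lower scores indicate better-calibrated predictions.
\mathrm{BS} = \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} \bigl(p_{ik} - y_{ik}\bigr)^{2}

% One common multi-class generalization of the AUC averages the
% pairwise, one-vs-one AUCs over all K(K-1)/2 pairs of log levels.
M = \frac{2}{K(K-1)} \sum_{j < k} \hat{A}(j, k)

As a sanity check on these definitions, guessing uniformly at random among K = 6 levels assigns probability 1/6 to every level and yields a Brier score of 5/36 + 25/36 ≈ 0.83 per statement; with K = 5 levels the same calculation gives 0.80. These values are consistent with the random-guessing baseline of 0.80 to 0.83 reported in the abstract.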

Keywords

Logging statement, Log level, Ordinal regression model


Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Software Analysis and Intelligence Lab (SAIL), Queen’s University, Kingston, Canada
  2. Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada
