Empirical Software Engineering, Volume 23, Issue 1, pp 290–333

Examining the stability of logging statements

  • Suhas Kabinna
  • Cor-Paul Bezemer
  • Weiyi Shang
  • Mark D. Syer
  • Ahmed E. Hassan


Logging statements (embedded in the source code) produce logs that assist in understanding system behavior, monitoring choke-points and debugging. Prior work showcases the importance of logging statements in operating, understanding and improving software systems. The wide dependence on logs has led to a new market of log processing and management tools. However, logs are often unstable, i.e., the logging statements that generate logs are often changed without consideration of other stakeholders, causing sudden failures of log processing tools and increasing the maintenance costs of such tools. We examine the stability of logging statements in four open source applications, namely Liferay, ActiveMQ, Camel and CloudStack. We find that 20–45% of their logging statements change throughout their lifetime. The median number of days between the introduction of a logging statement and the first change to that statement is between 1 and 17 in our studied applications. These numbers show that in order to reduce maintenance effort, developers of log processing tools must be careful when selecting the logging statements on which their tools depend. In order to effectively mitigate the issues that are caused by unstable logging statements, we make an important first step towards determining whether a logging statement is likely to remain unchanged in the future. First, we use a random forest classifier to determine whether a just-introduced logging statement will change in the future, based solely on metrics that are calculated when it is introduced. Second, we examine whether a long-lived logging statement is likely to change based on its change history. We leverage Cox proportional hazards models (Cox models) to determine the change risk of long-lived logging statements in the source code. Through our case study on four open source applications, we show that our random forest classifier achieves an 83–91% precision, a 65–85% recall and a 0.95–0.96 AUC.
We find that file ownership, developer experience, log density and SLOC are important metrics in our studied projects for determining the stability of logging statements in both our random forest classifiers and Cox models. Developers can use our approach to determine the risk of a logging statement changing in their own projects, to construct more robust log processing tools, by ensuring that these tools depend on logs that are generated by more stable logging statements.
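To make the reported evaluation measures concrete, the following is a minimal sketch of how precision, recall and AUC can be computed for a binary "will this logging statement change?" classifier. It is not the authors' implementation: the labels and scores are hypothetical toy data, the 0.5 decision threshold is an assumption, and the function names are illustrative only.

```python
from typing import List, Tuple


def precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Precision and recall for the positive class (1 = statement changed)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive example is scored
    higher than a random negative one (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = the logging statement later changed, 0 = it did not.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical classifier scores (e.g. predicted probability of change).
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
# Assumed decision threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} auc={auc(y_true, scores):.4f}")
```

Precision and recall depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a classifier can report a high AUC alongside a more moderate recall.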


Keywords: Logging statements, Log file stability, Log processing tools



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Software Analysis and Intelligence Lab (SAIL), Queen’s University, Kingston, Canada
  2. Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada
