Examining the stability of logging statements

Kabinna, Suhas; Bezemer, Cor-Paul; Shang, Weiyi; Syer, Mark D.; Hassan, Ahmed E.

doi:10.1007/s10664-017-9518-0


Published: 15 June 2017

Volume 23, pages 290–333, (2018)

Empirical Software Engineering

Suhas Kabinna ORCID: orcid.org/0000-0003-0784-7725¹,
Cor-Paul Bezemer¹,
Weiyi Shang²,
Mark D. Syer¹ &
Ahmed E. Hassan¹


Abstract

Logging statements (embedded in the source code) produce logs that assist in understanding system behavior, monitoring choke-points and debugging. Prior work showcases the importance of logging statements in operating, understanding and improving software systems. The wide dependence on logs has led to a new market of log processing and management tools. However, logs are often unstable, i.e., the logging statements that generate logs are often changed without consideration of other stakeholders, causing sudden failures of log processing tools and increasing the maintenance costs of such tools. We examine the stability of logging statements in four open source applications, namely Liferay, ActiveMQ, Camel and CloudStack. We find that 20–45% of their logging statements change throughout their lifetime. The median number of days between the introduction of a logging statement and the first change to that statement is between 1 and 17 in our studied applications. These numbers show that, in order to reduce maintenance effort, developers of log processing tools must be careful when selecting the logging statements on which their tools depend. To effectively mitigate the issues caused by unstable logging statements, we make an important first step towards determining whether a logging statement is likely to remain unchanged in the future. First, we use a random forest classifier to determine whether a just-introduced logging statement will change in the future, based solely on metrics that are calculated when it is introduced. Second, we examine whether a long-lived logging statement is likely to change based on its change history. We leverage Cox proportional hazards models (Cox models) to determine the change risk of long-lived logging statements in the source code. Through our case study on four open source applications, we show that our random forest classifier achieves an 83–91% precision, a 65–85% recall and a 0.95–0.96 AUC.
We find that file ownership, developer experience, log density and SLOC are important metrics in our studied projects for determining the stability of logging statements in both our random forest classifiers and Cox models. Developers can use our approach to determine the risk of a logging statement changing in their own projects, and thereby construct more robust log processing tools that depend on logs generated by more stable logging statements.
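The abstract reports precision, recall and AUC for the random forest classifier. As a minimal sketch (not the authors' evaluation code), these three metrics can be computed from labels where 1 means "the logging statement changes" and from classifier scores; the example labels, scores and the 0.5 decision threshold are hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive class (1 = logging statement changes)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen changed statement is
    scored above a randomly chosen unchanged one (ties count as 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and classifier scores for six logging statements.
    y_true = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.7, 0.4, 0.2, 0.3, 0.6]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    p, r = precision_recall(y_true, y_pred)
    print(round(p, 3), round(r, 3), round(auc(y_true, scores), 3))  # prints: 0.667 0.667 0.778
```

The pairwise AUC is quadratic in the number of statements; it is used here only because it states the definition directly. A rank-based formula gives the same value faster on large data sets.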


Notes

http://logging.apache.org/log4j/2.x/
http://www.xpolog.com/
http://svn.apache.org/viewvc/hadoop/core/trunk/src/test/org/apache/hadoop/util/TestProcfsBasedProcessTree.java?r1=722760&r2=722759&pathrev=722760
https://issues.apache.org/jira/browse/HADOOP-4190
http://hadoop.apache.org/
https://issues.apache.org/jira/browse/HADOOP-4191
https://issues.apache.org/jira/browse/WICKET-3919
https://wicket.apache.org/
http://activemq.apache.org/ (last checked April 2016)
http://camel.apache.org/ (last checked April 2016)
https://cloudstack.apache.org/ (last checked April 2016)
http://www.liferay.com/ (last checked April 2016)
http://www.slf4j.org/ (last checked April 2016)
http://logback.qos.ch/ (last checked April 2016)

References

Bigliardi L, Lanza M, Bacchelli A, D’Ambros M, Mocci A (2014) Quantitatively exploring non-code software artifacts. In: 14th international conference on quality software (QSIC), 2014. IEEE, pp 286–295
Bird C, Rigby PC, Barr ET, Hamilton DJ, German DM, Devanbu P (2009) The promises and perils of mining git. In: 6th IEEE international working conference on mining software repositories, 2009. MSR’09. IEEE, pp 1–10
Boulon J, Konwinski A, Qi R, Rabkin A, Yang E, Yang M (2008) Chukwa, a large-scale monitoring system. In: Proceedings of cloud computing and its applications, vol 8, pp 1–5
Carasso D (2012) Exploring splunk. CITO Research, New York, USA. ISBN, p 978
Cohen J, Cohen P, West SG, Aiken LS (2013) Applied multiple regression/correlation analysis for the behavioral sciences. Routledge
Collett D (2015) Modelling survival data in medical research. CRC Press
Ding R, Zhou H, Lou J-G, Zhang H, Lin Q, Fu Q, Zhang D, Xie T (2015) Log2: a cost-aware logging mechanism for performance diagnosis. In: 2015 USENIX annual technical conference (USENIX ATC 15), pp 139–150
Elbers C, Ridder G (1982) True and spurious duration dependence: the identifiability of the proportional hazard model. Rev Econ Stud 49(3):403–409
Fisher LD, Lin DY (1999) Time-dependent covariates in the cox proportional-hazards regression model. Annu Rev Public Health 20(1):145–157
Fu Q, Zhu J, Hu W, Lou J-G, Ding R, Lin Q, Zhang D, Xie T (2014) Where do developers log? An empirical study on logging practices in industry. In: Proceedings of ICSE companion 2014: the 36th international conference on software engineering, pp 24–33
Fu Q, Lou J-G, Wang Y, Li J (2009) Execution anomaly detection in distributed systems through unstructured log analysis. In: Proceedings of the ICDM 2009, ninth IEEE international conference on data mining. IEEE, pp 149–158
Ghotra B, McIntosh S, Hassan AE (2015) Revisiting the impact of classification techniques on the performance of defect prediction models. In: Proceedings of the 37th international conference on software engineering, vol 1. IEEE Press, pp 789–800
Greenwood PE, Nikulin MS (1996) A guide to chi-squared testing, vol 280. Wiley
Harrell F (2015) Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. Springer
Hastie T, Tibshirani R, Friedman J, Franklin J (2005) The elements of statistical learning: data mining, inference and prediction. Math Intell 27(2):83–85
Hoaglin DC, Welsch RE (1978) The hat matrix in regression and anova. Am Stat 32(1):17–22
Hosmer DW Jr, Lemeshow S (1999) Applied survival analysis: regression modelling of time to event data
Hripcsak G, Rothschild AS (2005) Agreement, the f-measure, and reliability in information retrieval. J Am Med Inform Assoc 12(3):296–298
Ihaka R, Gentleman R (1996) R: a language for data analysis and graphics. J Comput Graph Stat 5(3):299–314
Kabinna S, Shang W, Bezemer C-P, Hassan AE (2016a) Examining the stability of logging statements. In: SANER 2016: Proceedings of IEEE international conference on the software analysis, evolution and re-engineering. IEEE
Kabinna S, Shang W, Bezemer C-P, Hassan AE (2016b) Logging library migrations: a case study for the apache software foundation projects. Mining Software Repositories. To appear
Kampenes VB, Dybå T, Hannay JE, Sjøberg DIK (2007) A systematic review of effect size in software engineering experiments. Inf Softw Technol 49(11):1073–1086
Kendall MG (1948) Rank correlation methods. Griffin, Oxford, England. http://psycnet.apa.org/psycinfo/1948-15040-000
Koru AG, Zhang D, Liu H (2007) Modeling the effect of size on defect proneness for open-source software. In: Proceedings of the third international workshop on predictor models in software engineering. IEEE Computer Society, p 10
Li H, Shang W, Hassan AE (2016a) Which log level should developers choose for a new logging statement? Empir Softw Eng. To appear
Li H, Shang W, Zou Y, Hassan AE (2016b) Towards just-in-time suggestions for log changes. Empir Softw Eng. To appear
Lou J-G, Fu Q, Yang S, Xu Y, Li J (2010) Mining invariants from console logs for system problem detection. In: Proceedings of the 2010 USENIX conference on USENIX annual technical conference, USENIXATC’10, Berkeley, CA, USA. USENIX Association, p 24
Malik H, Hemmati H, Hassan AE (2013) Automatic detection of performance deviations in the load testing of large scale systems. In: Proceedings of (ICSE) 2013, 35th international conference on software engineering, pp 1012–1021
McIntosh S, Kamei Y, Adams B, Hassan AE (2016) An empirical study of the impact of modern code review practices on software quality. Empir Softw Eng 21(5):2146–2189. doi:10.1007/s10664-015-9381-9
Mednis M, Aurich MK (2012) Application of string similarity ratio and edit distance in automatic metabolite reconciliation comparing reconstructions and models. Biosyst Info Technol 1(1):14–18
Miller RG Jr (2011) Survival analysis, vol 66. Wiley
Pecchia A, Cinque M, Carrozza G, Cotroneo D (2015) Industry practices and event logging: assessment of a critical software development process. In: Proceedings of the 37th international conference on software engineering, vol 2. IEEE Press, pp 169–178
Ren H, Tang X, Lee JJ, Feng L, Everett AD, Hong WK, Khuri FR, Mao L (2004) Expression of hepatoma-derived growth factor is a strong prognostic predictor for patients with early-stage non–small-cell lung cancer. J Clin Oncol 22(16):3230–3237
Van Rijsbergen CJ (1979) Information retrieval, 2nd edn. Butterworth-Heinemann, Newton, MA, USA. ISBN 0408709294
Serfling RJ (2009) Approximation theorems of mathematical statistics, vol 162. Wiley
Shang W (2012) Bridging the divide between software developers and operators using logs. In: Proceedings of the 34th international conference on software engineering. IEEE, pp 1583–1586
Shang W, Jiang ZM, Adams B, Hassan AE, Godfrey MW, Nasser M, Flora P (2014a) An exploratory study of the evolution of communicated information about the execution of large software systems. Journal of Software: Evolution and Process 26(1):3–26
Shang W, Nagappan M, Hassan AE, Jiang ZM (2014b) Understanding log lines using development knowledge. In: Proceedings of ICSME 2014, the international conference on software maintenance and evolution. IEEE, pp 21–30
Shang W, Nagappan M, Hassan AE (2015) Studying the relationship between logging characteristics and the code quality of platform software. Empir Softw Eng 20(1):1–27
Strobl C, Boulesteix A-L, Kneib T, Augustin T, Zeileis A (2008) Conditional variable importance for random forests. BMC Bioinforma 9(1):307
Syer M, Nagappan M, Adams B, Hassan AE (2015) Replicating and re-evaluating the theory of relative defect-proneness. IEEE Trans Softw Eng 41(2):176–197
Tan J, Pan X, Kavulya S, Gandhi R, Narasimhan P (2008) Salsa: Analyzing logs as state machines. In: WASL’08: Proceedings of the 1st USENIX conference on analysis of system logs. USENIX Association, p 6
Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2015) An empirical comparison of model validation techniques for defect prediction model. http://sailhome.cs.queensu.ca/replication/kla/model-validation.pdf. Under Review at Transactions on Software Engineering (TSE)
Therneau TM, Grambsch PM, Fleming TR (1990) Martingale-based residuals for survival models. Biometrika 77(1):147–160
Log4j. http://logging.apache.org/log4j/2.x/ (last visited March 2016)
Xpolog. http://www.xpolog.com/
Xu W, Huang L, Fox A, Patterson D, Jordan MI (2009) Detecting large-scale system problems by mining console logs. In: Proceedings of the ACM SOPS 2009, 22nd symposium on operating systems principle, pp 117–132
Xu X, Weber I, Bass L, Zhu L, Wada H, Teng F (2013) Detecting cloud provisioning errors using an annotated process model. In: Proceedings of MW4NG 2013, the 8th workshop on middleware for next generation internet computing. ACM, p 5
Yuan D, Park S, Huang P, Liu Y, Lee MM, Tang X, Zhou Y, Savage S (2012) Be conservative: enhancing failure diagnosis with proactive logging. In: OSDI 2012, USENIX Symposium on operating systems design and implementation, pp 293–306
Yuan D, Zheng J, Park S, Zhou Y, Savage S (2011) Improving software diagnosability via log enhancement. In: Proceedings of ASPLOS 2011, the 16th conference on architectural support for programming languages and operating systems, pp 3–14
Yuan D, Park S, Zhou Y (2012) Characterizing logging practices in open-source software. In: Proceedings of ICSE 2012, the 34th international conference on software engineering. IEEE Press, pp 102–112
Zhang D, El Emam K, Liu H (2009) An investigation into the functional form of the size-defect relationship for software modules. IEEE Trans Softw Eng 35(2):293–304
Zhu J, He P, Fu Q, Zhang H, Lyu MR, Zhang D (2015) Learning to log: helping developers make informed logging decisions. In: Proceedings of ICSE 2015, the 37th international conference on software engineering, vol 1. IEEE Press, Piscataway, NJ, USA, pp 415–425


Author information

Authors and Affiliations

Software Analysis and Intelligence Lab (SAIL), Queen’s University, Kingston, ON, Canada
Suhas Kabinna, Cor-Paul Bezemer, Mark D. Syer & Ahmed E. Hassan
Department of Computer Science and Software Engineering, Concordia University, Montreal, QC, Canada
Weiyi Shang


Corresponding author

Correspondence to Suhas Kabinna.

Additional information

Communicated by: Mark Grechanik

Appendix A: Background on Survival Analysis

Survival analysis comprises a set of statistical modeling techniques that model the time taken for an event to occur (Miller 2011). These techniques can be parametric, semi-parametric or non-parametric in form. However, they share the common goal of modeling the time between the start of an observation period (i.e., the introduction of a logging statement) and the occurrence of an event (i.e., a change to that logging statement); in other words, they model the survival time of a logging statement. Survival analysis also helps identify the metrics that most affect the survival time of a logging statement. The following section discusses two crucial aspects of survival analysis, as described in Syer et al. (2015): survival analysis data and measuring the time to event.
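For reference, the quantities behind these models can be written in standard textbook notation. The symbols are assumptions rather than notation taken from the paper: T is the survival time of a logging statement, x(t) is a vector of (possibly time-dependent) metrics, β holds the model coefficients and h_0 is an unspecified baseline hazard. The last line is the semi-parametric Cox proportional hazards model mentioned in the abstract, in which exp(β_j) is the hazard ratio associated with a one-unit increase in metric j.

```latex
\begin{aligned}
S(t) &= \Pr(T > t) \\
h(t) &= \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} \\
S(t) &= \exp\!\left(-\int_0^t h(u)\,du\right) \\
h\bigl(t \mid \mathbf{x}(t)\bigr) &= h_0(t)\,\exp\!\bigl(\boldsymbol{\beta}^{\top}\mathbf{x}(t)\bigr)
\end{aligned}
```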

1.1 Survival Analysis Data and Measuring Time to Event

Survival analysis uses data collected at specific time intervals to relate how a subject changes over time to the occurrence of an event of interest (e.g., whether a logging statement changes). We explain survival analysis using the stability of logging statements as an example. To model the time until a logging statement changes, we collect its content, context and developer metrics for each release (observation period) after the logging statement (subject) is introduced into the application. Each observation in the survival data contains the following fields:

1. UID: Unique identifier of each logging statement.
2. Start: Time of introduction of a logging statement.
3. Stop: Time at which the logging statement changes, or the end of the observation period if it does not change.
4. Event: 1 if the logging statement was changed, or 0 if it was not changed by the end of the observation period.
5. Metrics: The content, context and developer metrics.
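The fields listed above can be sketched as a record type, together with a helper that expands one logging statement's history into one row per release. This is a minimal sketch under assumptions: it uses the counting-process (start, stop] layout commonly used for time-dependent covariates (Fisher and Lin 1999), which may differ from the exact layout of Table 10, and `Observation`, `to_survival_rows`, `release_days` and `change_day` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    uid: int      # unique identifier of the logging statement
    start: int    # start of this observation interval (days since introduction)
    stop: int     # end of the interval: the change day, or the next release
    event: int    # 1 if the statement changed at `stop`, otherwise 0
    metrics: Dict[str, float] = field(default_factory=dict)  # content/context/developer metrics


def to_survival_rows(uid: int,
                     release_days: List[int],
                     metrics_per_release: List[Dict[str, float]],
                     change_day: Optional[int] = None) -> List[Observation]:
    """Expand one logging statement into one row per observation period.

    release_days[i] and release_days[i + 1] bound observation period i;
    metrics_per_release[i] holds the metrics measured at the start of period i.
    """
    if len(metrics_per_release) != len(release_days) - 1:
        raise ValueError("need exactly one metrics dict per observation period")
    rows: List[Observation] = []
    for i in range(len(release_days) - 1):
        start, stop = release_days[i], release_days[i + 1]
        if change_day is not None and start < change_day <= stop:
            # Terminating event: record it and stop tracking this statement.
            rows.append(Observation(uid, start, change_day, 1, metrics_per_release[i]))
            break
        # Censored period: the statement survived this release unchanged.
        rows.append(Observation(uid, start, stop, 0, metrics_per_release[i]))
    return rows
```

For example, with hypothetical releases on days 0, 90, 180 and 270 and a change on day 150, the helper yields two rows, (0, 90, event 0) and (90, 150, event 1), and no row for the last release because a changed statement is no longer tracked.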

Table 10 shows the survival data for a logging statement (Log-1), where observations are recorded at the beginning of each release. Once a logging statement is changed (i.e., the event occurs), it is no longer tracked and the study halts for that logging statement. However, some logging statements may never change, and tracking them indefinitely is impractical. Hence, logging statements are tracked for a fixed period of time (e.g., 3 years), during which they may or may not change.
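The fixed tracking window can be sketched as a function that turns a statement's introduction and (optional) change dates into a survival time and an event flag. This is a hedged sketch: the 3-year window mirrors the example in the text, while the `last_observed` cut-off for statements introduced near the end of the data is an added assumption, and the function name is hypothetical.

```python
from datetime import date
from typing import Optional, Tuple

# Hypothetical 3-year study window (the text uses 3 years as an example).
STUDY_WINDOW_DAYS = 3 * 365


def duration_and_event(introduced: date,
                       changed: Optional[date],
                       last_observed: date,
                       window_days: int = STUDY_WINDOW_DAYS) -> Tuple[int, int]:
    """Return (survival time in days, event flag) for one logging statement.

    event = 1 if the statement changed inside the tracking horizon;
    otherwise event = 0 and the statement is censored at the horizon.
    """
    # The horizon is the study window, cut short if the data ends earlier
    # (an assumption beyond the text, which only mentions the fixed window).
    horizon = min(window_days, (last_observed - introduced).days)
    if horizon < 0:
        raise ValueError("last_observed precedes the introduction date")
    if changed is not None:
        days_to_change = (changed - introduced).days
        if days_to_change < 0:
            raise ValueError("change date precedes the introduction date")
        if days_to_change <= horizon:
            return days_to_change, 1  # terminating event
    return horizon, 0  # censored: survival equals the tracked period
```

A change that happens after the window closes is treated exactly like no change at all: the statement is censored at 1,095 days.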

Table 10 Data for survival analysis


To conduct the survival analysis, we need to define how we measure the introducing event (i.e., the first release after the introduction of a logging statement), the censored events (i.e., the subsequent releases in which the logging statement is not changed) and the terminating event (i.e., the release in which the logging statement is changed). From Table 10, we find that the logging statement is changed in the second release, which makes that observation the terminating event. In the prior releases, the event of interest does not occur, which makes those observations censored events. In addition, when a logging statement is not changed during the period of study (i.e., 3 years), its survival time is considered equal to the period of study. We include both censored and terminating events in our survival analysis, since survival models can handle both kinds of events and produce unbiased survival estimates (Hosmer and Lemeshow 1999).
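To illustrate how censored observations still contribute information, the following is a minimal Kaplan–Meier estimator. It is a non-parametric technique swapped in for illustration, not the Cox models used in the paper: censored statements shrink the risk set without lowering the survival curve. The function name and example durations are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of S(t), returned at each time where a change occurs.

    durations[i] is the survival time of statement i; events[i] is 1 for a
    terminating event (the statement changed) and 0 for a censored one.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    changes = Counter(t for t, e in zip(durations, events) if e == 1)
    leaving = Counter(durations)  # every statement leaves the risk set at its duration
    at_risk = len(durations)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    for t in sorted(leaving):
        d = changes.get(t, 0)
        if d:
            # Statements censored at t are still counted as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve
```

With hypothetical durations [1, 17, 17, 400, 1095, 1095] (in days) and events [1, 1, 0, 1, 0, 0], the estimate drops to 5/6 at day 1, 2/3 at day 17 and 4/9 at day 400. The two statements censored at the 3-year limit keep the curve from falling further.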


About this article

Cite this article

Kabinna, S., Bezemer, CP., Shang, W. et al. Examining the stability of logging statements. Empir Software Eng 23, 290–333 (2018). https://doi.org/10.1007/s10664-017-9518-0


Published: 15 June 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s10664-017-9518-0
