Examining the stability of logging statements

Published in: Empirical Software Engineering

Abstract

Logging statements (embedded in the source code) produce logs that assist in understanding system behavior, monitoring choke-points and debugging. Prior work showcases the importance of logging statements in operating, understanding and improving software systems. The wide dependence on logs has led to a new market of log processing and management tools. However, logs are often unstable, i.e., the logging statements that generate logs are often changed without consideration of other stakeholders, causing sudden failures of log processing tools and increasing the maintenance costs of such tools. We examine the stability of logging statements in four open source applications, namely Liferay, ActiveMQ, Camel and CloudStack. We find that 20–45% of their logging statements change throughout their lifetime. The median number of days between the introduction of a logging statement and the first change to that statement is between 1 and 17 in our studied applications. These numbers show that, in order to reduce maintenance effort, developers of log processing tools must be careful when selecting the logging statements on which their tools depend. To effectively mitigate the issues that are caused by unstable logging statements, we make an important first step towards determining whether a logging statement is likely to remain unchanged in the future. First, we use a random forest classifier to determine whether a just-introduced logging statement will change in the future, based solely on metrics that are calculated when it is introduced. Second, we examine whether a long-lived logging statement is likely to change based on its change history. We leverage Cox proportional hazards models (Cox models) to determine the change risk of long-lived logging statements in the source code. Through our case study on four open source applications, we show that our random forest classifier achieves an 83–91% precision, a 65–85% recall and a 0.95–0.96 AUC. We find that file ownership, developer experience, log density and SLOC are important metrics in our studied projects for determining the stability of logging statements in both our random forest classifiers and Cox models. Developers can use our approach to determine the risk that a logging statement will change in their own projects, and to construct more robust log processing tools by ensuring that these tools depend on logs that are generated by more stable logging statements.

Notes

  1. http://logging.apache.org/log4j/2.x/

  2. http://www.xpolog.com/

  3. http://svn.apache.org/viewvc/hadoop/core/trunk/src/test/org/apache/hadoop/util/TestProcfsBasedProcessTree.java?r1=722760&r2=722759&pathrev=722760

  4. https://issues.apache.org/jira/browse/HADOOP-4190

  5. http://hadoop.apache.org/

  6. https://issues.apache.org/jira/browse/HADOOP-4191

  7. https://issues.apache.org/jira/browse/WICKET-3919

  8. https://wicket.apache.org/

  9. http://activemq.apache.org/ (last checked April 2016)

  10. http://camel.apache.org/ (last checked April 2016)

  11. https://cloudstack.apache.org/ (last checked April 2016)

  12. http://www.liferay.com/ (last checked April 2016)

  13. http://www.slf4j.org/ (last checked April 2016)

  14. http://logback.qos.ch/ (last checked April 2016)

References

  • Bigliardi L, Lanza M, Bacchelli A, D’Ambros M, Mocci A (2014) Quantitatively exploring non-code software artifacts. In: 14th international conference on quality software (QSIC), 2014. IEEE, pp 286– 295

  • Bird C, Rigby PC, Barr ET, Hamilton DJ, German DM, Devanbu P (2009) The promises and perils of mining git. In: 6th IEEE international working conference on mining software repositories, 2009. MSR’09. IEEE, pp 1–10

  • Boulon J, Konwinski A, Qi R, Rabkin A, Yang E, Yang M (2008) Chukwa, a large-scale monitoring system. In: Proceedings of cloud computing and its applications, vol 8, pp 1–5

  • Carasso D (2012) Exploring Splunk. CITO Research, New York, USA

  • Cohen J, Cohen P, West S G, Aiken L S (2013) Applied multiple regression/correlation analysis for the behavioral sciences. Routledge

  • Collett D (2015) Modelling survival data in medical research. CRC Press

  • Ding R, Zhou H, Lou J-G, Zhang H, Lin Q, Fu Q, Zhang D, Xie T (2015) Log2: a cost-aware logging mechanism for performance diagnosis. In: 2015 USENIX annual technical conference (USENIX ATC 15), pp 139–150

  • Elbers C, Ridder G (1982) True and spurious duration dependence: the identifiability of the proportional hazard model. Rev Econ Stud 49(3):403–409

  • Fisher L D, Lin D Y (1999) Time-dependent covariates in the Cox proportional-hazards regression model. Annu Rev Public Health 20(1):145–157

  • Fu Q, Zhu J, Hu W, Lou J-G, Ding R, Lin Q, Zhang D, Xie T (2014) Where do developers log? An empirical study on logging practices in industry. In: Proceedings of ICSE companion 2014: the 36th international conference on software engineering, pp 24–33

  • Fu Q, Lou J-G, Wang Y, Li J (2009) Execution anomaly detection in distributed systems through unstructured log analysis. In: Proceedings of the ICDM 2009, ninth IEEE international conference on data mining. IEEE, pp 149–158

  • Ghotra B, McIntosh S, Hassan AE (2015) Revisiting the impact of classification techniques on the performance of defect prediction models. In: Proceedings of the 37th international conference on software engineering, vol 1. IEEE Press, pp 789–800

  • Greenwood PE, Nikulin MS (1996) A guide to chi-squared testing, vol 280. Wiley

  • Harrell F (2015) Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. Springer

  • Hastie T, Tibshirani R, Friedman J, Franklin J (2005) The elements of statistical learning: data mining, inference and prediction. Math Intell 27(2):83–85

  • Hoaglin DC, Welsch RE (1978) The hat matrix in regression and anova. Am Stat 32(1):17–22

  • Hosmer DW Jr, Lemeshow S (1999) Applied survival analysis: regression modelling of time to event data

  • Hripcsak G, Rothschild AS (2005) Agreement, the f-measure, and reliability in information retrieval. J Am Med Inform Assoc 12(3):296–298

  • Ihaka R, Gentleman R (1996) R: a language for data analysis and graphics. J Comput Graph Stat 5(3):299–314

  • Kabinna S, Shang W, Bezemer C-P, Hassan AE (2016a) Examining the stability of logging statements. In: SANER 2016: Proceedings of IEEE international conference on the software analysis, evolution and re-engineering. IEEE

  • Kabinna S, Shang W, Bezemer C-P, Hassan AE (2016b) Logging library migrations: a case study for the Apache Software Foundation projects. Mining Software Repositories. To appear

  • Kampenes VB, Dybå T, Hannay JE, Sjøberg DIK (2007) A systematic review of effect size in software engineering experiments. Inf Softw Technol 49(11):1073–1086

  • Kendall MG (1948) Rank correlation methods. Griffin, Oxford, England. http://psycnet.apa.org/psycinfo/1948-15040-000

  • Koru AG, Zhang D, Liu H (2007) Modeling the effect of size on defect proneness for open-source software. In: Proceedings of the third international workshop on predictor models in software engineering. IEEE Computer Society, p 10

  • Li H, Shang W, Hassan AE (2016a) Which log level should developers choose for a new logging statement? Empir Softw Eng. To appear

  • Li H, Shang W, Zou Y, Hassan AE (2016b) Towards just-in-time suggestions for log changes. Empir Softw Eng. To appear

  • Lou J-G, Fu Q, Yang S, Xu Y, Li J (2010) Mining invariants from console logs for system problem detection. In: Proceedings of the 2010 USENIX conference on USENIX annual technical conference, USENIXATC’10, Berkeley, CA, USA. USENIX Association, p 24

  • Malik H, Hemmati H, Hassan AE (2013) Automatic detection of performance deviations in the load testing of large scale systems. In: Proceedings of (ICSE) 2013, 35th international conference on software engineering, pp 1012–1021

  • McIntosh S, Kamei Y, Adams B, Hassan AE (2016) An empirical study of the impact of modern code review practices on software quality. Empirical Softw Eng 21(5):2146–2189. doi:10.1007/s10664-015-9381-9

  • Mednis M, Aurich MK (2012) Application of string similarity ratio and edit distance in automatic metabolite reconciliation comparing reconstructions and models. Biosyst Info Technol 1(1):14–18

  • Miller RG Jr (2011) Survival analysis, vol 66. Wiley

  • Pecchia A, Cinque M, Carrozza G, Cotroneo D (2015) Industry practices and event logging: assessment of a critical software development process. In: Proceedings of the 37th international conference on software engineering, vol 2. IEEE Press, pp 169–178

  • Ren H, Tang X, Lee JJ, Feng L, Everett AD, Hong WK, Khuri FR, Mao L (2004) Expression of hepatoma-derived growth factor is a strong prognostic predictor for patients with early-stage non–small-cell lung cancer. J Clin Oncol 22(16):3230–3237

  • Van Rijsbergen CJ (1979) Information retrieval, 2nd edn. Butterworth-Heinemann, Newton, MA, USA. ISBN 0408709294

  • Serfling RJ (2009) Approximation theorems of mathematical statistics, vol 162. Wiley

  • Shang W (2012) Bridging the divide between software developers and operators using logs. In: Proceedings of the 34th international conference on software engineering. IEEE, pp 1583–1586

  • Shang W, Jiang ZM, Adams B, Hassan AE, Godfrey MW, Nasser M, Flora P (2014a) An exploratory study of the evolution of communicated information about the execution of large software systems. Journal of Software: Evolution and Process 26(1):3–26

  • Shang W, Nagappan M, Hassan AE, Jiang ZM (2014b) Understanding log lines using development knowledge. In: Proceedings of ICSME 2014, the international conference on software maintenance and evolution. IEEE, pp 21–30

  • Shang W, Nagappan M, Hassan AE (2015) Studying the relationship between logging characteristics and the code quality of platform software. Empir Softw Eng 20(1):1–27

  • Strobl C, Boulesteix A-L, Kneib T, Augustin T, Zeileis A (2008) Conditional variable importance for random forests. BMC Bioinforma 9(1):307

  • Syer M, Nagappan M, Adams B, Hassan AE (2015) Replicating and re-evaluating the theory of relative defect-proneness. IEEE Trans Softw Eng 41(2):176–197

  • Tan J, Pan X, Kavulya S, Gandhi R, Narasimhan P (2008) Salsa: Analyzing logs as state machines. In: WASL’08: Proceedings of the 1st USENIX conference on analysis of system logs. USENIX Association, p 6

  • Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2015) An empirical comparison of model validation techniques for defect prediction model. http://sailhome.cs.queensu.ca/replication/kla/model-validation.pdf. Under Review at Transactions on Software Engineering (TSE)

  • Therneau TM, Grambsch PM, Fleming TR (1990) Martingale-based residuals for survival models. Biometrika 77(1):147–160

  • Log4j. http://logging.apache.org/log4j/2.x/ (last visited March 2016)

  • Xpolog. http://www.xpolog.com/

  • Xu W, Huang L, Fox A, Patterson D, Jordan MI (2009) Detecting large-scale system problems by mining console logs. In: Proceedings of the ACM SOSP 2009, 22nd symposium on operating systems principles, pp 117–132

  • Xu X, Weber I, Bass L, Zhu L, Wada H, Teng F (2013) Detecting cloud provisioning errors using an annotated process model. In: Proceedings of MW4NG 2013, the 8th workshop on middleware for next generation internet computing. ACM, p 5

  • Yuan D, Park S, Huang P, Liu Y, Lee MM, Tang X, Zhou Y, Savage S (2012) Be conservative: enhancing failure diagnosis with proactive logging. In: OSDI 2012, USENIX Symposium on operating systems design and implementation, pp 293–306

  • Yuan D, Zheng J, Park S, Zhou Y, Savage S (2011) Improving software diagnosability via log enhancement. In: Proceedings of ASPLOS 2011, the 16th conference on architectural support for programming languages and operating systems, pp 3–14

  • Yuan D, Park S, Zhou Y (2012) Characterizing logging practices in open-source software. In: Proceedings of ICSE 2012, the 34th international conference on software engineering. IEEE Press, pp 102–112

  • Zhang D, El Emam K, Liu H (2009) An investigation into the functional form of the size-defect relationship for software modules. IEEE Trans Softw Eng 35(2):293–304

  • Zhu J, He P, Fu Q, Zhang H, Lyu MR, Zhang D (2015) Learning to log: helping developers make informed logging decisions. In: Proceedings of ICSE 2015, the 37th international conference on software engineering, vol 1. IEEE Press, Piscataway, NJ, USA, pp 415–425

Author information

Corresponding author

Correspondence to Suhas Kabinna.

Additional information

Communicated by: Mark Grechanik

Appendix A: Background on Survival Analysis

Survival analysis comprises a set of statistical modeling techniques that model the time taken for an event to occur (Miller 2011). These modeling techniques can be parametric, semi-parametric or non-parametric in form, but they share the common goal of modeling the time between the start of an observation period (i.e., the introduction of a logging statement) and the occurrence of an event (i.e., a change to that logging statement) — in other words, the survival time of a logging statement. Survival analysis also helps to identify the important metrics that affect the survival time of a logging statement. The following section discusses a crucial aspect of survival analysis as described by Syer et al. (2015): survival analysis data and measuring the time to an event.
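
The Cox proportional hazards models used in this study are the semi-parametric case. As background (standard notation, not taken from the paper), a Cox model expresses the hazard of a subject with covariates x_1, …, x_p as an unspecified baseline hazard scaled by the covariates:

```latex
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p)
```

Here h_0(t) is the baseline hazard, left unestimated, and each coefficient β_i captures how a metric (e.g., SLOC or log density) raises or lowers the instantaneous risk that a logging statement changes.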

1.1 Survival Analysis Data and Measuring Time to Event

Survival analysis uses data that is collected at specific time intervals to observe the relation between how a subject changes over time and the occurrence of an event of interest (e.g., whether a logging statement changes). We explain survival analysis using the stability of logging statements as an example. To model the time to change of a logging statement, we collect content, context and developer metrics for each release (observation period) after a logging statement (subject) is introduced into the application. Each observation in the survival data contains the following fields:

  1. UID: A unique identifier for each logging statement.

  2. Start: The time at which the logging statement was introduced.

  3. Stop: The time at which the logging statement was changed.

  4. Event: 1 if the logging statement was changed, or 0 if the logging statement was not changed at the end of the observation period.

  5. Metrics: The content, context and developer metrics.

Table 10 shows the survival data for a logging statement (Log-1), where the observations are recorded at the beginning of each release. If a logging statement is changed (i.e., the event occurs), the logging statement is no longer tracked and the study is halted for that particular logging statement. However, some logging statements may never be changed, and it is impractical to track such statements indefinitely. Hence, logging statements are tracked for a fixed period of time (e.g., 3 years), during which they may or may not be changed.

Table 10 Data for survival analysis
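
The tracking scheme above — one row per release, halting at the first change or at the end of the study window — can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' tooling; the release-indexed times and the `STUDY_END` constant are assumptions for the example.

```python
# Illustrative sketch of building survival records like those in Table 10.
# Times are release indices; STUDY_END models the fixed observation
# window (e.g., the release at the end of the 3-year study period).
STUDY_END = 6

def to_survival_records(uid, introduced_at, changed_at):
    """Emit one (uid, start, stop, event) row per release interval.

    event is 1 only for the interval in which the change occurred;
    earlier intervals, and every interval of a never-changed statement,
    are censored (event = 0).
    """
    end = changed_at if changed_at is not None else STUDY_END
    rows = []
    for t in range(introduced_at, end):
        event = 1 if (changed_at is not None and t + 1 == changed_at) else 0
        rows.append((uid, t, t + 1, event))
        if event:
            break  # the study halts once the statement changes
    return rows

# Log-1 is introduced at release 0 and first changed at release 2:
print(to_survival_records("Log-1", 0, 2))
# A statement that is never changed is censored at the study end:
print(to_survival_records("Log-2", 1, None))
```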

To conduct the survival analysis, we need to define how we measure the introducing event (i.e., the first release after the introduction of a logging statement), the censored events (i.e., the subsequent releases in which the logging statement is not changed) and the terminating event (i.e., the release in which the logging statement is changed). From Table 10, we find that the logging statement is changed in the second release, which makes that observation the terminating event. In the prior releases, the event of interest does not occur, which makes those observations censored events. In addition, when a logging statement is not changed during the period of study (i.e., 3 years), its survival time is considered equal to the period of study. We include both censored and terminating events in our survival analysis, as the models can handle both types of events and can produce effective survival models without bias (Hosmer and Lemeshow 1999).
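
The claim that the models use both censored and terminating events can be made concrete with the (Breslow) partial likelihood of a Cox model: censored intervals never contribute an event factor, but they do enter the risk set of every event that falls inside them. Below is a minimal pure-Python sketch over hypothetical counting-process records; it is not the paper's implementation, and the covariate values are invented for illustration.

```python
import math

# Hypothetical counting-process records: (start, stop, event, x), where
# x stands in for a single covariate (e.g., log density at interval start).
records = [
    (0, 1, 0, 0.5),  # censored interval of a statement changed later
    (1, 2, 1, 0.5),  # ...and its terminating interval (changed at t = 2)
    (0, 3, 0, 1.0),  # a statement that is never changed (censored)
    (0, 2, 1, 2.0),  # another statement, changed at t = 2
]

def cox_log_partial_likelihood(beta, records):
    """Breslow log partial likelihood for counting-process data.

    Only terminating intervals (event = 1) contribute a factor; the risk
    set at event time t is every record whose interval covers t
    (start < t <= stop), censored records included.
    """
    ll = 0.0
    for (_, t_i, e_i, x_i) in records:
        if not e_i:
            continue  # censored intervals only enter risk sets
        risk = [x for (s, t, _, x) in records if s < t_i <= t]
        ll += beta * x_i - math.log(sum(math.exp(beta * x) for x in risk))
    return ll

print(cox_log_partial_likelihood(0.0, records))
```

Fitting maximizes this quantity over beta; at beta = 0 each event simply contributes minus the log of its risk-set size, which makes the role of the censored records easy to verify by hand.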

About this article

Cite this article

Kabinna, S., Bezemer, CP., Shang, W. et al. Examining the stability of logging statements. Empir Software Eng 23, 290–333 (2018). https://doi.org/10.1007/s10664-017-9518-0
