Abstract
Logging statements (embedded in the source code) produce logs that assist in understanding system behavior, monitoring choke-points and debugging. Prior work showcases the importance of logging statements in operating, understanding and improving software systems. The wide dependence on logs has led to a new market of log processing and management tools. However, logs are often unstable, i.e., the logging statements that generate logs are often changed without the consideration of other stakeholders, causing sudden failures of log processing tools and increasing the maintenance costs of such tools. We examine the stability of logging statements in four open source applications, namely Liferay, ActiveMQ, Camel and CloudStack. We find that 20–45% of their logging statements change throughout their lifetime. The median number of days between the introduction of a logging statement and the first change to that statement is between 1 and 17 in our studied applications. These numbers show that in order to reduce maintenance effort, developers of log processing tools must be careful when selecting the logging statements on which their tools depend. In order to effectively mitigate the issues that are caused by unstable logging statements, we make an important first step towards determining whether a logging statement is likely to remain unchanged in the future. First, we use a random forest classifier to determine whether a just-introduced logging statement will change in the future, based solely on metrics that are calculated when it is introduced. Second, we examine whether a long-lived logging statement is likely to change based on its change history. We leverage Cox proportional hazards models (Cox models) to determine the change risk of long-lived logging statements in the source code. Through our case study on four open source applications, we show that our random forest classifier achieves an 83–91% precision, a 65–85% recall and a 0.95–0.96 AUC.
We find that file ownership, developer experience, log density and SLOC are important metrics in our studied projects for determining the stability of logging statements in both our random forest classifiers and Cox models. Developers can use our approach to determine the risk of a logging statement changing in their own projects, to construct more robust log processing tools, by ensuring that these tools depend on logs that are generated by more stable logging statements.
Notes
http://activemq.apache.org/ (last checked April 2016)
http://camel.apache.org/ (last checked April 2016)
https://cloudstack.apache.org/ (last checked April 2016)
http://www.liferay.com/ (last checked April 2016)
http://www.slf4j.org/ (last checked April 2016)
http://logback.qos.ch/ (last checked April 2016)
References
Bigliardi L, Lanza M, Bacchelli A, D’Ambros M, Mocci A (2014) Quantitatively exploring non-code software artifacts. In: 14th international conference on quality software (QSIC), 2014. IEEE, pp 286–295
Bird C, Rigby PC, Barr ET, Hamilton DJ, German DM, Devanbu P (2009) The promises and perils of mining git. In: 6th IEEE international working conference on mining software repositories, 2009. MSR’09. IEEE, pp 1–10
Boulon J, Konwinski A, Qi R, Rabkin A, Yang E, Yang M (2008) Chukwa, a large-scale monitoring system. In: Proceedings of cloud computing and its applications, vol 8, pp 1–5
Carasso D (2012) Exploring splunk. CITO Research, New York, USA. ISBN, p 978
Cohen J, Cohen P, West S G, Aiken L S (2013) Applied multiple regression/correlation analysis for the behavioral sciences. Routledge
Collett D (2015) Modelling survival data in medical research. CRC Press
Ding R, Zhou H, Lou J-G, Zhang H, Lin Q, Fu Q, Zhang D, Xie T (2015) Log2: a cost-aware logging mechanism for performance diagnosis. In: 2015 USENIX annual technical conference (USENIX ATC 15), pp 139–150
Elbers C, Ridder G (1982) True and spurious duration dependence: the identifiability of the proportional hazard model. Rev Econ Stud 49(3):403–409
Fisher L D, Lin D Y (1999) Time-dependent covariates in the cox proportional-hazards regression model. Annu Rev Public Health 20(1):145–157
Fu Q, Zhu J, Hu W, Lou J-G, Ding R, Lin Q, Zhang D, Xie T (2014) Where do developers log? An empirical study on logging practices in industry. In: Proceedings of ICSE companion 2014: the 36th international conference on software engineering, pp 24–33
Fu Q, Lou J-G, Wang Y, Li J (2009) Execution anomaly detection in distributed systems through unstructured log analysis. In: Proceedings of the ICDM 2009, ninth IEEE international conference on data mining. IEEE, pp 149–158
Ghotra B, McIntosh S, Hassan AE (2015) Revisiting the impact of classification techniques on the performance of defect prediction models. In: Proceedings of the 37th international conference on software engineering, vol 1. IEEE Press, pp 789–800
Greenwood PE, Nikulin MS (1996) A guide to chi-squared testing, vol 280. Wiley
Harrell F (2015) Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. Springer
Hastie T, Tibshirani R, Friedman J, Franklin J (2005) The elements of statistical learning: data mining, inference and prediction. Math Intell 27(2):83–85
Hoaglin DC, Welsch RE (1978) The hat matrix in regression and anova. Am Stat 32(1):17–22
Hosmer DW Jr, Lemeshow S (1999) Applied survival analysis: regression modelling of time to event data
Hripcsak G, Rothschild AS (2005) Agreement, the f-measure, and reliability in information retrieval. J Am Med Inform Assoc 12(3):296–298
Ihaka R, Gentleman R (1996) R: a language for data analysis and graphics. J Comput Graph Stat 5(3):299–314
Kabinna S, Shang W, Bezemer C-P, Hassan AE (2016a) Examining the stability of logging statements. In: SANER 2016: Proceedings of IEEE international conference on the software analysis, evolution and re-engineering. IEEE
Kabinna S, Shang W, Bezemer C-P, Hassan AE (2016b) Logging library migrations: a case study for the apache software foundation projects. Mining Software Repositories. page To appear
Kampenes VB, Dybå T, Hannay JE, Sjøberg DIK (2007) A systematic review of effect size in software engineering experiments. Inf Softw Technol 49(11):1073–1086
Kendall MG (1948) Rank correlation methods. Griffin, Oxford, England. http://psycnet.apa.org/psycinfo/1948-15040-000
Koru AG, Zhang D, Liu H (2007) Modeling the effect of size on defect proneness for open-source software. In: Proceedings of the third international workshop on predictor models in software engineering. IEEE Computer Society, p 10
Li H, Shang W, Hassan AE (2016a) Which log level should developers choose for a new logging statement? Empir Softw Eng. page To appear
Li H, Shang W, Zou Y, Hassan AE (2016b) Towards just-in-time suggestions for log changes. Empir Softw Eng. page To appear
Lou J-G, Fu Q, Yang S, Xu Y, Li J (2010) Mining invariants from console logs for system problem detection. In: Proceedings of the 2010 USENIX conference on USENIX annual technical conference, USENIXATC’10, Berkeley, CA, USA. USENIX Association, p 24
Malik H, Hemmati H, Hassan AE (2013) Automatic detection of performance deviations in the load testing of large scale systems. In: Proceedings of (ICSE) 2013, 35th international conference on software engineering, pp 1012–1021
McIntosh S, Kamei Y, Adams B, Hassan AE (2016) An empirical study of the impact of modern code review practices on software quality. Empirical Softw Eng 21(5):2146–2189. doi:10.1007/s10664-015-9381-9
Mednis M, Aurich MK (2012) Application of string similarity ratio and edit distance in automatic metabolite reconciliation comparing reconstructions and models. Biosyst Info Technol 1(1):14–18
Miller RG Jr (2011) Survival analysis, vol 66. Wiley
Pecchia A, Cinque M, Carrozza G, Cotroneo D (2015) Industry practices and event logging: assessment of a critical software development process. In: Proceedings of the 37th international conference on software engineering, vol 2. IEEE Press, pp 169–178
Ren H, Tang X, Lee JJ, Feng L, Everett AD, Hong WK, Khuri FR, Mao L (2004) Expression of hepatoma-derived growth factor is a strong prognostic predictor for patients with early-stage non–small-cell lung cancer. J Clin Oncol 22(16):3230–3237
Van Rijsbergen CJ (1979) Information retrieval, 2nd edn. Butterworth-Heinemann, Newton, MA, USA. ISBN 0408709294
Serfling RJ (2009) Approximation theorems of mathematical statistics, vol 162. Wiley
Shang W (2012) Bridging the divide between software developers and operators using logs. In: Proceedings of the 34th international conference on software engineering. IEEE, pp 1583–1586
Shang W, Jiang ZM, Adams B, Hassan AE, Godfrey MW, Nasser M, Flora P (2014a) An exploratory study of the evolution of communicated information about the execution of large software systems. Journal of Software: Evolution and Process 26(1):3–26
Shang W, Nagappan M, Hassan AE, Jiang ZM (2014b) Understanding log lines using development knowledge. In: Proceedings of ICSME 2014, the international conference on software maintenance and evolution. IEEE, pp 21–30
Shang W, Nagappan M, Hassan AE (2015) Studying the relationship between logging characteristics and the code quality of platform software. Empir Softw Eng 20(1):1–27
Strobl C, Boulesteix A-L, Kneib T, Augustin T, Zeileis A (2008) Conditional variable importance for random forests. BMC Bioinforma 9(1):307
Syer M, Nagappan M, Adams B, Hassan AE (2015) Replicating and re-evaluating the theory of relative defect-proneness. IEEE Trans Softw Eng 41(2):176–197
Tan J, Pan X, Kavulya S, Gandhi R, Narasimhan P (2008) Salsa: Analyzing logs as state machines. In: WASL’08: Proceedings of the 1st USENIX conference on analysis of system logs. USENIX Association, p 6
Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2015) An empirical comparison of model validation techniques for defect prediction model. http://sailhome.cs.queensu.ca/replication/kla/model-validation.pdf. Under Review at Transactions on Software Engineering (TSE)
Therneau TM, Grambsch PM, Fleming TR (1990) Martingale-based residuals for survival models. Biometrika 77(1):147–160
Log4j Last visited March’16. http://logging.apache.org/log4j/2.x/
Xpolog. http://www.xpolog.com/
Xu W, Huang L, Fox A, Patterson D, Jordan MI (2009) Detecting large-scale system problems by mining console logs. In: Proceedings of the ACM SOSP 2009, 22nd symposium on operating systems principles, pp 117–132
Xu X, Weber I, Bass L, Zhu L, Wada H, Teng F (2013) Detecting cloud provisioning errors using an annotated process model. In: Proceedings of MW4NG 2013, the 8th workshop on middleware for next generation internet computing. ACM, p 5
Yuan D, Park S, Huang P, Liu Y, Lee MM, Tang X, Zhou Y, Savage S (2012) Be conservative: enhancing failure diagnosis with proactive logging. In: OSDI 2012, USENIX Symposium on operating systems design and implementation, pp 293–306
Yuan D, Zheng J, Park S, Zhou Y, Savage S (2011) Improving software diagnosability via log enhancement. In: Proceedings of ASPLOS 2011, the 16th conference on architectural support for programming languages and operating systems, pp 3–14
Yuan D, Park S, Zhou Y (2012) Characterizing logging practices in open-source software. In: Proceedings of ICSE 2012, the 34th international conference on software engineering. IEEE Press, pp 102–112
Zhang D, El Emam K, Liu H (2009) An investigation into the functional form of the size-defect relationship for software modules. IEEE Trans Softw Eng 35(2):293–304
Zhu J, He P, Fu Q, Zhang H, Lyu MR, Zhang D (2015) Learning to log: helping developers make informed logging decisions. In: Proceedings of ICSE 2015, the 37th international conference on software engineering, vol 1. IEEE Press, Piscataway, NJ, USA, pp 415–425
Additional information
Communicated by: Mark Grechanik
Appendix A: Background on Survival Analysis
Survival analysis comprises a set of statistical modeling techniques that model the time taken for an event to occur (Miller 2011). These modeling techniques can be parametric, semi-parametric or non-parametric in form. However, they share the common goal of modeling the time between the start of an observation period (i.e., the introduction of a logging statement) and the occurrence of an event (i.e., a change to that logging statement). In other words, they model the survival time of a logging statement. Survival analysis also helps in identifying the important metrics that affect the survival time of a logging statement. The following section discusses the crucial aspects of survival analysis as described in Syer et al. (2015): survival analysis data and measuring time to event.
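As one concrete instance of the semi-parametric family, the Cox proportional hazards model mentioned in the abstract is conventionally written as shown below. The symbols are standard textbook notation and are an assumption here, not notation taken from this paper.

```latex
h(t \mid x_1, \dots, x_p) = h_0(t)\,\exp\!\left(\beta_1 x_1 + \dots + \beta_p x_p\right)
```

Here h_0(t) is an unspecified baseline hazard, x_1, ..., x_p are the metrics of a logging statement, and exp(β_i) is the factor by which the risk of change is multiplied when x_i increases by one unit while the other metrics are held fixed.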
1.1 Survival Analysis Data and Measuring Time to Event
Survival analysis uses data that is collected at specific time intervals to observe the relation between how a subject changes over time and the occurrence of an event of interest (e.g., whether a logging statement changes). We explain survival analysis using the stability of logging statements as an example. To model the time to change of a logging statement, we collect data about the content, context and developers (metrics) for each release (observation period) after a logging statement (subject) is introduced into the application. Each observation in the survival data contains the following fields:
1. UID: Unique identifier of each logging statement.
2. Start: The time of introduction of a logging statement.
3. Stop: The time at which the logging statement changes.
4. Event: (1) if the logging statement was changed or (0) if the logging statement was not changed at the end of the observation period.
5. Metrics: The content, context and developer metrics.
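The fields above can be sketched as a small record type. This is a minimal illustration, not the authors' tooling: the class name `Observation`, the helper `survival_time`, the choice to store one row per release, the metric keys and all values in `log_1` are hypothetical and do not reproduce Table 10.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Observation:
    """One survival-data row for a logging statement in one observation period."""
    uid: int    # unique identifier of the logging statement
    start: int  # start of the observation period (release index, assumed unit)
    stop: int   # end of the period, or the time at which the statement changed
    event: int  # 1 if the statement changed in this period, 0 otherwise
    metrics: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject rows that cannot be valid survival observations.
        if self.event not in (0, 1):
            raise ValueError("event must be 0 or 1")
        if self.stop <= self.start:
            raise ValueError("stop must be later than start")


def survival_time(rows: List[Observation]) -> Tuple[int, int]:
    """Collapse the rows of one statement into (duration, event of last row)."""
    ordered = sorted(rows, key=lambda r: r.start)
    return ordered[-1].stop - ordered[0].start, ordered[-1].event


# Hypothetical rows for one statement that is changed in its second release.
log_1 = [
    Observation(1, 0, 1, 0, {"sloc": 120.0, "log_density": 0.05}),
    Observation(1, 1, 2, 1, {"sloc": 135.0, "log_density": 0.04}),
]
duration, changed = survival_time(log_1)
```

Storing one row per release keeps the metrics time-dependent, at the cost of repeating the UID on every row.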
Table 10 shows the survival data for a logging statement (Log-1), where the observations are recorded at the beginning of a release. If a logging statement is changed (i.e., the event occurs), the logging statement is no longer tracked and the study halts for that particular logging statement. However, some logging statements may never be changed, and in such cases it is impractical to track them indefinitely. Hence, the logging statements are tracked for a certain period of time (e.g., 3 years), during which they may or may not be changed.
To conduct the survival analysis, we need to define how we measure the introducing event (i.e., the first release after the introduction of a logging statement), the censored event (i.e., the subsequent months in which the logging statement is not changed) and the terminating event (i.e., the month in which the logging statement is changed). From Table 10, we find that the logging statement is changed in the second release, which makes it the terminating event. In the prior releases, the event of interest does not occur, which makes those observations censored events. In addition, when a logging statement is not changed during the period of study (i.e., 3 years), its survival time is considered equal to the period of study. We include both censored and terminating events in our survival analysis, as the models can handle both and can produce effective survival models without bias (Hosmer and Lemeshow 1999).
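To illustrate how censored and terminating observations both contribute to a survival estimate, the sketch below derives a (duration, event) pair per logging statement and feeds the pairs to a Kaplan-Meier estimator. Kaplan-Meier is a non-parametric stand-in chosen for brevity; it is not the Cox model used in the paper. The day-based time unit, the function names and the sample values are assumptions.

```python
from typing import List, Optional, Tuple

STUDY_PERIOD_DAYS = 3 * 365  # assumed 3-year study window, in days


def to_duration(introduced_day: int, changed_day: Optional[int]) -> Tuple[int, int]:
    """Return (duration, event); event is 1 for a change, 0 for censoring."""
    if changed_day is not None:
        lifetime = changed_day - introduced_day
        if lifetime <= STUDY_PERIOD_DAYS:
            return lifetime, 1  # terminating event inside the window
    # Never changed within the window: survival time equals the study period.
    return STUDY_PERIOD_DAYS, 0


def kaplan_meier(samples: List[Tuple[int, int]]) -> List[Tuple[int, float]]:
    """Return [(time, S(time))] at each time where at least one change occurred."""
    survival = 1.0
    curve = []
    for t in sorted({d for d, e in samples if e == 1}):
        # Statements still unchanged (and still observed) just before time t.
        at_risk = sum(1 for d, _ in samples if d >= t)
        changed = sum(1 for d, e in samples if d == t and e == 1)
        survival *= 1.0 - changed / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: two statements change early, two are never changed.
samples = [
    to_duration(0, 10),
    to_duration(0, 40),
    to_duration(0, None),
    to_duration(0, None),
]
curve = kaplan_meier(samples)
```

The censored statements never trigger a drop in the curve, but they stay in the at-risk count, which is why discarding them would overstate how quickly logging statements change.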
Cite this article
Kabinna, S., Bezemer, CP., Shang, W. et al. Examining the stability of logging statements. Empir Software Eng 23, 290–333 (2018). https://doi.org/10.1007/s10664-017-9518-0
DOI: https://doi.org/10.1007/s10664-017-9518-0