Abstract
Logging is widely used in modern software development to record run-time information about software systems, and it plays a significant role in software testing. Although logging has attracted considerable research attention, little of it has addressed test logging (i.e., logging in test files). To fill this knowledge gap, we conduct an empirical study of test logging practice. The study examines 21 open-source subjects with \(\sim\)70K logging statements, of which \(\sim\)48K are production logging statements and \(\sim\)22K are test logging statements. We organize the study around four research questions. As a result, (1) we report five findings on the differences between test and production logging statements, (2) we report four findings on the differences between the maintenance efforts of test and production logging statements, (3) we identify four reasons why developers use test logging, and (4) we uncover the relationship between test logging and production logging. To the best of our knowledge, this is the first study that quantitatively and qualitatively analyzes logging practices in both test and production code, providing developers and researchers with insight into this topic.
Notes
Scripts and data files used in our research are available online at https://github.com/senseconcordia/TestLoggingPractice
In this work, we refer to test outputs as the log messages produced during the execution of the unit tests.
Additional information
Communicated by: Shaukat Ali
About this article
Cite this article
Zhang, H., Tang, Y., Lamothe, M. et al. Studying logging practice in test code. Empir Software Eng 27, 83 (2022). https://doi.org/10.1007/s10664-022-10139-0
DOI: https://doi.org/10.1007/s10664-022-10139-0