Characterizing logging practices in Java-based open source software projects – a replication study in Apache Software Foundation

Empirical Software Engineering

Abstract

Log messages, which are generated at runtime by the debug statements that developers insert into the code, contain rich information about the runtime behavior of software systems. Log messages are widely used for system monitoring, problem diagnosis, and legal compliance. Yuan et al. performed the first empirical study of logging practices in open source software systems. They studied the development history of four C/C++ server-side projects and derived ten interesting findings. In this paper, we perform a replication study to assess whether their findings apply to Java projects in the Apache Software Foundation. We examined 21 Java-based open source projects from three categories: server-side, client-side, and supporting-component. As in the original study, our results show that all projects contain logging code, which is actively maintained. However, contrary to the original study, bug reports containing log messages take longer to resolve than bug reports without log messages. A significantly higher portion of log updates are for enhancing the quality of logs (e.g., formatting and style changes and spelling/grammar fixes) rather than co-changes with feature implementations (e.g., updating variable names).

References

  • ASF Apache Software Foundation (2016) https://www.apache.org/. Accessed 8 April 2016

  • Basili VR, Shull F, Lanubile F (1999) Building knowledge through families of experiments. IEEE Trans Softw Eng 25(4):456–473

  • Beschastnikh I, Brun Y, Ernst MD, Krishnamurthy A (2014) Inferring models of concurrent systems from logs of their behavior with CSight. In: Proceedings of the 36th International Conference on Software Engineering (ICSE)

  • Beschastnikh I, Brun Y, Schneider S, Sloan M, Ernst MD (2011) Leveraging existing instrumentation to automatically infer invariant-constrained models. In: Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ESEC/FSE ’11

  • Bettenburg N, Just S, Schröter A, Weiss C, Premraj R, Zimmermann T (2008) What makes a good bug report? In: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE)

  • Bird C, Bachmann A, Aune E, Duffy J, Bernstein A, Filkov V, Devanbu P (2009) Fair and balanced?: bias in bug-fix datasets. In: Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering (ESEC/FSE)

  • BlackBerry Enterprise Server Logs Submission (2015). https://www.blackberry.com/beslog/. Accessed 10 May 2015

  • Ding R, Zhou H, Lou JG, Zhang H, Lin Q, Fu Q, Zhang D, Xie T (2015) Log2: A cost-aware logging mechanism for performance diagnosis. In: USENIX Annual Technical Conference

  • Dumps of the ASF Subversion repository (2015) http://svn-dump.apache.org/. Accessed 10 May 2015

  • Estimating the reproducibility of psychological science (2015) Open Science Collaboration

  • Fluri B, Wursch M, Pinzger M, Gall H (2007) Change distilling: tree differencing for fine-grained source code change extraction. IEEE Trans Softw Eng 33(11):725–743

  • Fu Q, Zhu J, Hu W, Lou JG, Ding R, Lin Q, Zhang D, Xie T (2014) Where do developers log? An empirical study on logging practices in industry. In: Companion Proceedings of the 36th International Conference on Software Engineering

  • Gall HC, Fluri B, Pinzger M (2009) Change analysis with Evolizer and ChangeDistiller. IEEE Softw 26(1)

  • Gartner (2014) SIEM Magic Quadrant Leadership Report. http://www.gartner.com/document/2780017. Last accessed 05/10/2015

  • Ghezzi G, Gall HC (2013) Replicating mining studies with SOFAS. In: Proceedings of the 10th working conference on mining software repositories

  • Greiler M, Herzig K, Czerwonka J (2015) Code ownership and software quality: a replication study. In: Proceedings of the 12th working conference on mining software repositories (MSR), pp 2–12. IEEE Press

  • Group TO (2014) Application Response Measurement - ARM. https://collaboration.opengroup.org/tech/management/arm/. Last accessed 24 November 2014

  • Han J (2005) Data mining: concepts and techniques. Morgan Kaufmann Publishers Inc., San Francisco

  • Hassan AE, Martin DJ, Flora P, Mansfield P, Dietz D (2008) An industrial case study of customizing operational profiles using log compression. In: Proceedings of the 30th International Conference on Software Engineering (ICSE)

  • JDT Java development tools (2015) https://eclipse.org/jdt/. Accessed 23 October 2015

  • Jiang ZM, Hassan AE, Hamann G, Flora P (2008) Automatic identification of load testing problems. In: Proceedings of the 24th IEEE international conference on software maintenance (ICSM)

  • Jiang ZM, Hassan AE, Hamann G, Flora P (2009) Automated performance analysis of load tests. In: Proceedings of the 25th IEEE international conference on software maintenance (ICSM)

  • Kampstra P (2008) Beanplot: a boxplot alternative for visual comparison of distributions. J Stat Softw Code Snippets 28(1)

  • logstash - open source log management (2015) http://logstash.net/. Accessed 18 April 2015

  • LOG4J a logging library for Java (2016) http://logging.apache.org/log4j/1.2/. Accessed 8 April 2016

  • Mockus A, Fielding RT, Herbsleb JD (2002) Two case studies of open source software development: Apache and Mozilla. ACM Trans Softw Eng Methodol 11(3):309–346

  • Nagappan N, Ball T (2005) Use of relative code churn measures to predict system defect density. Association for Computing Machinery, Inc.

  • Nagios Log Server - Monitor and Manage Your Log Data (2015) https://exchange.nagios.org/directory/Plugins/Log-Files. Accessed 10 May 2015

  • Oliner A, Ganapathi A, Xu W (2012) Advances and challenges in log analysis. Commun ACM 55(2):55–61

  • Premraj R, Herzig K (2011) Network versus code metrics to predict defects: A replication study. In: Proceedings of the 2011 international symposium on empirical software engineering and measurement (ESEM), pp. 215–224

  • Rahman F, Posnett D, Herraiz I, Devanbu P (2013) Sample size vs. bias in defect prediction. In: Proceedings of the 9th joint meeting on foundations of software engineering (ESEC/FSE)

  • Rajlich V (2014) Software Evolution and Maintenance. In: Proceedings of the future of software engineering (FOSE), pp 133–144. ACM

  • Rigby PC, German DM, Storey MA (2008) Open source software peer review practices: a case study of the Apache server. In: Proceedings of the 30th international conference on software engineering (ICSE), pp 541–550

  • Robles G (2010) Replicating MSR: A study of the potential replicability of papers published in the mining software repositories proceedings. In: Proceedings of the 7th IEEE Working Conference on Mining Software Repositories (MSR), pp 171–180

  • Romano J, Kromrey JD, Coraggio J, Skowronek J (2006) Appropriate statistics for ordinal level data: should we really be using t-test and Cohen’s d for evaluating group differences on the NSSE and other surveys? In: Annual meeting of the Florida Association of Institutional Research

  • Shang W, Jiang ZM, Adams B, Hassan A (2009) MapReduce as a general framework to support research in Mining Software Repositories (MSR). In: Proceedings of the 6th IEEE international working conference on mining software repositories

  • Shang W, Jiang ZM, Adams B, Hassan AE, Godfrey MW, Nasser M, Flora P (2014) An exploratory study of the evolution of communicated information about the execution of large software systems. Journal of Software: Evolution and Process 26(1):3–26

  • Shang W, Jiang ZM, Hemmati H, Adams B, Hassan AE, Martin P (2013) Assisting developers of big data analytics applications when deploying on Hadoop clouds. In: Proceedings of the 35th international conference on software engineering (ICSE)

  • Shang W, Nagappan M, Hassan AE (2015) Studying the relationship between logging characteristics and the code quality of platform software. Empir Softw Eng 20(1)

  • Splunk (2015) http://www.splunk.com/. Accessed 18 April 2015

  • Summary of Sarbanes-Oxley Act of 2002 (2015) http://www.soxlaw.com/. Accessed 10 May 2015

  • Syer MD, Jiang ZM, Nagappan M, Hassan AE, Nasser M, Flora P (2014) Continuous validation of load test suites. In: Proceedings of the 5th ACM/SPEC international conference on performance engineering (ICPE)

  • Syer MD, Nagappan M, Adams B, Hassan AE (2015) Replicating and re-evaluating the theory of relative defect-proneness. IEEE Trans Softw Eng 41(2):176–197

  • Tan L, Yuan D, Krishna G, Zhou Y (2007) /* iComment: Bugs or Bad Comments? */. In: Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP)

  • The AspectJ project (2015) https://eclipse.org/aspectj/. Accessed 10 May 2015

  • The replication package (2015) https://www.dropbox.com/s/tf5omwtaylffsbs/replication_package_major_revision.zip?dl=0. Accessed 23 October 2015

  • Wheeler D SLOCCOUNT source lines of code count. http://www.dwheeler.com/sloccount/

  • Woodside M, Franks G, Petriu DC (2007) The Future of Software Performance Engineering. In: Proceedings of the future of software engineering (FOSE) track, international conference on software engineering (ICSE)

  • Xu W, Huang L, Fox A, Patterson D, Jordan MI (2009) Detecting large-scale system problems by mining console logs. In: Proceedings of the ACM SIGOPS 22nd symposium on operating systems principles (SOSP)

  • Yuan D, Mai H, Xiong W, Tan L, Zhou Y, Pasupathy S (2010) Sherlog: Error diagnosis by connecting clues from run-time logs. In: Proceedings of the fifteenth edition of ASPLOS on architectural support for programming languages and operating systems (ASPLOS)

  • Yuan D, Park S, Zhou Y (2012) Characterizing logging practices in open-source software. In: Proceedings of the 34th international conference on software engineering, ICSE ’12. IEEE Press, Piscataway, pp 102–112

  • Yuan D, Zheng J, Park S, Zhou Y, Savage S (2011) Improving software diagnosability via log enhancement. In: Proceedings of the sixteenth international conference on architectural support for programming languages and operating systems (ASPLOS)

  • Zhu J, He P, Fu Q, Zhang H, Lyu MR, Zhang D (2015) Learning to log: Helping developers make informed logging decisions. In: Proceedings of the 37th international conference on software engineering

  • Zimmermann T, Premraj R, Bettenburg N, Just S, Schröter A, Weiss C (2010) What makes a good bug report? Transactions on Software Engineering (TSE)

Author information

Correspondence to Boyuan Chen or Zhen Ming (Jack) Jiang.

Additional information

Communicated by: David Lo

About this article

Cite this article

Chen, B., (Jack) Jiang, Z.M. Characterizing logging practices in Java-based open source software projects – a replication study in Apache Software Foundation. Empir Software Eng 22, 330–374 (2017). https://doi.org/10.1007/s10664-016-9429-5

  • DOI: https://doi.org/10.1007/s10664-016-9429-5
