
The sense of logging in the Linux kernel

Empirical Software Engineering

Abstract

Logging plays a crucial role in software engineering because it is key to performing various tasks, including debugging, performance analysis, and anomaly detection. Despite the importance of log data, the practice of logging still suffers from a lack of common guidelines and best practices. Recent studies have investigated logging in C/C++ and Java open-source systems. In this paper, we complement these studies by conducting the first empirical study of logging practices in the Linux kernel, one of the most elaborate open-source development projects in the computer industry. We analyze 22 Linux releases with a focus on three main aspects: the pervasiveness of logging in Linux, the types of changes made to logging statements, and the rationale behind these changes. Our findings show that logging code accounts for 3.73% of the total source code in the Linux kernel and is distributed across 72.36% of Linux files. We also found that the distribution of logging statements across Linux subsystems and their components varies significantly for no apparent reason, suggesting that developers use different criteria when logging. In addition, we observed a slow decrease in the use of logging, with a reduction of 9.27% between versions v4.3 and v5.3. The majority of changes in logging code are made to fix language issues, modify log levels, and upgrade logging code to use new logging libraries, with the overall goal of improving the precision and consistency of the log output. We derive many recommendations from our findings, such as the use of static analysis tools to detect log-related issues, the adoption of common writing styles to improve the quality of log messages, the development of conventions to guide developers when selecting log levels, and the establishment of review sessions dedicated to logging code. Our recommendations can serve as a basis for developing logging guidelines as well as better logging processes, tools, and techniques.
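The recommendation to use static analysis tools for log-related issues, together with the observation that many changes migrate logging code to newer logging APIs, can be illustrated with a small checker. The Python script below is a minimal sketch under stated assumptions, not the tooling used in the study: it flags printk() calls that omit a KERN_<LEVEL> prefix and suggests the corresponding pr_<level>() helper for calls that do specify one, loosely resembling some warnings emitted by the kernel's scripts/checkpatch.pl. The function name `scan`, the table `LEVEL_TO_HELPER`, and the message strings are hypothetical. The scan is line-based, so it does not handle comments, string contents, or calls split across lines.

```python
import re
from typing import List, Tuple

# Level macros from include/linux/kern_levels.h mapped to the pr_<level>()
# helpers declared in include/linux/printk.h. KERN_SOH and KERN_DEFAULT are
# deliberately left out, so calls using them are not reported.
LEVEL_TO_HELPER = {
    "KERN_EMERG": "pr_emerg",
    "KERN_ALERT": "pr_alert",
    "KERN_CRIT": "pr_crit",
    "KERN_ERR": "pr_err",
    "KERN_WARNING": "pr_warn",
    "KERN_NOTICE": "pr_notice",
    "KERN_INFO": "pr_info",
    "KERN_DEBUG": "pr_debug",  # caveat: pr_debug() is not always emitted
    "KERN_CONT": "pr_cont",
}

# "printk(" as a whole word (so vprintk and dev_printk do not match),
# optionally followed by a KERN_* macro as the first argument token.
PRINTK_RE = re.compile(r"\bprintk\s*\(\s*(KERN_[A-Z]+)?")


def scan(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for suspicious printk() calls.

    Hypothetical line-based heuristic: comments, string contents, and calls
    whose level macro sits on the following line are not handled.
    """
    findings: List[Tuple[int, str]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PRINTK_RE.finditer(line):
            level = match.group(1)
            if level is None:
                findings.append((lineno, "printk() without KERN_<LEVEL>"))
            elif level in LEVEL_TO_HELPER:
                helper = LEVEL_TO_HELPER[level]
                findings.append(
                    (lineno, f"prefer {helper}() over printk({level} ...)")
                )
    return findings


if __name__ == "__main__":
    # Hypothetical driver snippet used only to exercise the checker.
    sample = (
        "static int mydrv_probe(void)\n"
        "{\n"
        '\tprintk("mydrv: starting\\n");\n'
        '\tprintk(KERN_ERR "mydrv: probe failed: %d\\n", err);\n'
        '\tpr_info("mydrv: ready\\n");\n'
        "}\n"
    )
    for lineno, message in scan(sample):
        print(f"line {lineno}: {message}")
```

Running the script reports line 3 (no log level) and line 4 (a candidate for pr_err()) of the sample. One design caveat: pr_debug() is not a drop-in replacement for printk(KERN_DEBUG ...), because it only produces output when DEBUG is defined or when enabled through dynamic debug, so a real tool would need to treat that suggestion with care. Production checkers such as Coccinelle, sparse, or smatch (see notes 14, 24, and 25) analyze parsed code rather than matching regular expressions.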


Fig. 1
Listing 1
Listing 2
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Listing 3

Notes

  1. https://www.gnu.org/licenses/old-licenses/gpl-2.0.html

  2. https://github.com/gregkh/kernel-history/

  3. https://github.com/torvalds/linux/commit/6a13feb9

  4. https://github.com/torvalds/linux/commit/4d856f72

  5. https://github.com/torvalds/linux

  6. https://github.com/gregkh/kernel-history/blob/master/scripts/genstat.pl

  7. https://github.com/torvalds/linux/blob/v5.3/fs/afs/internal.h#L1449

  8. https://github.com/torvalds/linux/blob/v5.3/arch/x86/kvm/i8259.c#L37

  9. https://github.com/torvalds/linux/blob/v5.3/drivers/char/mwave/mwavedd.h#L79

  10. https://github.com/torvalds/linux/blob/v5.3/drivers/char/mwave/mwavedd.h#L89

  11. https://github.com/iamkeyur/linux-logging-2021

  12. https://repo.or.cz/davej-history.git?a=commit;h=aa66269c

  13. Note that the modified logging statement is not simply the original statement with the bug fixed; the two are not semantically equivalent. The goal of the change was to fix the NULL pointer issue, and the change was accepted as a fix for it.

  14. http://coccinelle.lip6.fr/rules/#null

  15. https://cwe.mitre.org/data/definitions/688.html

  16. https://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git/commit/?h=strings/rtc-no-func&id=762c5af234c5b816b7da3687a3e703cf8cdc2214

  17. https://www.kernel.org/doc/Documentation/printk-formats.txt

  18. https://github.com/torvalds/linux/commit/3fcb3c836ef413d3fc848288b308eb655e08d853

  19. https://cwe.mitre.org/data/definitions/200.html

  20. https://nvd.nist.gov/vuln/detail/CVE-2018-5995

  21. https://nvd.nist.gov/vuln/detail/CVE-2018-7273

  22. https://github.com/torvalds/linux/blob/v5.3/drivers/thermal/thermal_core.c#L1211

  23. https://github.com/torvalds/linux/blob/v5.3/drivers/usb/dwc2/gadget.c#L4846

  24. https://repo.or.cz/w/smatch.git

  25. https://sparse.wiki.kernel.org/

  26. https://github.com/ColinIanKing/kernelscan

Acknowledgements

Abdelwahab Hamou-Lhadj and Keyur Patel would like to thank the Ericsson Global Artificial Intelligence Accelerator (GAIA) group in Montreal and MITACS for supporting this project (Grant Number: IT15986). Ingrid Nunes would like to acknowledge CNPq grants ref. 313357/2018-8 and ref. 428157/2018-1, and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. João Faccin would like to acknowledge the support of the National Council for Scientific and Technological Development of Brazil (CNPq) (grant ref. 141840/2016-1) and the support of the Government of Canada through the Emerging Leaders in the Americas Program (ELAP).

Author information

Correspondence to Abdelwahab Hamou-Lhadj.

Additional information

Communicated by: Martin Monperrus

About this article

Cite this article

Patel, K., Faccin, J., Hamou-Lhadj, A. et al. The sense of logging in the Linux kernel. Empir Software Eng 27, 153 (2022). https://doi.org/10.1007/s10664-022-10136-3

  • DOI: https://doi.org/10.1007/s10664-022-10136-3
