Abstract
Logging plays a crucial role in software engineering because it is key to performing various tasks, including debugging, performance analysis, and anomaly detection. Despite the importance of log data, the practice of logging still suffers from a lack of common guidelines and best practices. Recent studies investigated logging in C/C++ and Java open-source systems. In this paper, we complement these studies by conducting the first empirical study on logging practices in the Linux kernel, one of the most elaborate open-source development projects in the computer industry. We analyze 22 Linux releases with a focus on three main aspects: the pervasiveness of logging in Linux, the types of changes made to logging statements, and the rationale behind these changes. Our findings show that logging code accounts for 3.73% of the total source code in the Linux kernel, distributed across 72.36% of Linux files. We also found that the distribution of logging statements across Linux subsystems and their components varies significantly with no apparent reason, suggesting that developers use different criteria when logging. In addition, we observed a slow decrease in the use of logging, with a reduction of 9.27% between versions v4.3 and v5.3. The majority of changes in logging code are made to fix language issues, modify log levels, and upgrade logging code to use new logging libraries, with the overall goal of improving the precision and consistency of the log output. We derive several recommendations from our findings, including the use of static analysis tools to detect log-related issues, the adoption of common writing styles to improve the quality of log messages, the development of conventions to guide developers when selecting log levels, and the establishment of review sessions for logging code. Our recommendations can serve as a basis for developing logging guidelines as well as better logging processes, tools, and techniques.
Notes
Note that the changed logging statement is not semantically equivalent to the original statement; it is the original with the bug fixed. The goal of the change was to fix the NULL pointer bug, and the change was accepted on that basis.
Acknowledgements
Abdelwahab Hamou-Lhadj and Keyur Patel would like to thank the Ericsson Global Artificial Intelligence Accelerator (GAIA) Group in Montreal and MITACS for supporting this project (Grant Number: IT15986). Ingrid Nunes would like to thank CNPq for grants ref. 313357/2018-8 and ref. 428157/2018-1, and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. João Faccin would like to acknowledge the support of the National Council for Scientific and Technological Development of Brazil (CNPq) (grant ref. 141840/2016-1), and the support of the Government of Canada through the Emerging Leaders in the Americas Program (ELAP).
Additional information
Communicated by: Martin Monperrus
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Patel, K., Faccin, J., Hamou-Lhadj, A. et al. The sense of logging in the Linux kernel. Empir Software Eng 27, 153 (2022). https://doi.org/10.1007/s10664-022-10136-3
DOI: https://doi.org/10.1007/s10664-022-10136-3