Towards an understanding of memory leak patterns: an empirical study in Python

  • Research
  • Software Quality Journal

Abstract

Memory leaks, an important and difficult issue in software development, occur when an object is inadvertently retained longer than necessary. Programming languages provide a variety of dynamic memory management methods to help programmers avoid introducing defects that cause memory leaks. However, it is not yet possible to free programmers completely from the work of memory management. Indeed, runtime leak detection is time-consuming and usually done after the fact, while manual code inspection requires extensive developer experience. Understanding the common patterns of memory leaks can help developers stay mindful of leaks or avoid them earlier in the development process, and may further inspire future research. Our case study identifies eight code patterns specific to memory leaks caused by circular references in Python. The observed patterns explain 91.64% of the memory leaks in the studied projects. Our work can guide important decisions about the feasibility of identifying memory leaks with static code analysis.
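To make the notion of a leak caused by a circular reference concrete, the following minimal sketch shows two objects that refer to each other and therefore survive after the last external reference is gone. It is an illustration only and not one of the eight patterns reported in the study; the `Node` class and `make_cycle` function are hypothetical names introduced here, and the behaviour shown assumes CPython, where reference counting alone cannot reclaim a cycle and the cyclic garbage collector must do so.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.partner = None


def make_cycle():
    """Build two nodes that reference each other and return a weak probe."""
    a = Node("a")
    b = Node("b")
    a.partner = b  # a -> b
    b.partner = a  # b -> a, closing the cycle
    # The weak reference lets us observe whether `a` is still alive
    # without keeping it alive ourselves.
    return weakref.ref(a)


gc.disable()  # switch off automatic cycle collection for the demonstration
try:
    probe = make_cycle()
    # Both local names are gone, yet each node still holds the other,
    # so reference counting alone never frees them (CPython behaviour).
    alive_after_return = probe() is not None
    gc.collect()  # an explicit collection pass reclaims the cycle
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_after_return, alive_after_collect)
```

Under these assumptions the cycle remains in memory until a collection pass runs, which is why code that creates many such cycles can show growing memory use between collections.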

Data availability

The authors confirm that the data supporting the findings of this study are available within the article and its supplementary materials.

Notes

  1. https://instagram-engineering.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172

  2. CPython provides a simple and effective alternative to ordinary (strong) references: weak references. A weak reference does not protect its referent from garbage collection. A Python programmer can create weak references to objects with the weakref module. See https://docs.python.org/3/library/weakref.html.

  3. https://pypi.org/project/properform/

  4. https://github.com/cython/cython/blob/master/Cython/Compiler/Symtab.py

  5. https://github.com/html5lib/html5lib-python/blob/master/html5lib/_tokenizer.py

  6. https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/axis.py, https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/ticker.py

  7. https://github.com/zopefoundation/ZODB/blob/master/src/ZODB/mvccadapter.py

  8. https://github.com/lxml/lxml/blob/master/src/lxml/tests/test_xmlschema.py

  9. https://github.com/micheles/decorator/blob/master/src/decorator.py

  10. https://github.com/python/cpython/blob/2.7/Lib/collections.py

  11. https://github.com/python/cpython/blob/main/Lib/collections/__init__.py

  12. https://github.com/pycrypto/pycrypto/blob/master/lib/Crypto/Cipher/PKCS1_OAEP.py

  13. https://github.com/python/cpython/blob/main/Lib/json/encoder.py
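Note 2 mentions weak references. As a minimal sketch of how a weak back-reference can avoid a cycle, consider a parent that holds its children strongly while each child refers back to the parent only weakly. The `Parent` and `Child` classes are hypothetical and are not taken from the studied projects; the immediate deallocation shown assumes CPython's reference counting.

```python
import gc
import weakref


class Child:
    """Hypothetical child object; `parent` holds a weak reference or None."""

    parent = None


class Parent:
    """Hypothetical parent that owns its children strongly."""

    def __init__(self):
        self.children = []

    def add(self, child):
        self.children.append(child)       # strong reference: parent -> child
        child.parent = weakref.ref(self)  # weak reference: child -> parent


gc.disable()  # show that no cycle collection is needed
try:
    parent = Parent()
    child = Child()
    parent.add(child)
    probe = weakref.ref(parent)

    # Calling the weak reference returns the referent while it is alive.
    back_ref_works = child.parent() is parent

    del parent  # drop the only strong reference to the parent
    # No cycle exists, so reference counting frees the parent at once.
    freed_without_gc = probe() is None
    dangling_is_none = child.parent() is None
finally:
    gc.enable()

print(back_ref_works, freed_without_gc, dangling_is_none)
```

The trade-off of this design is that code using `child.parent()` has to handle a `None` result once the parent has been freed.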

Funding

This work was supported by the National Science Foundation of China (No. 61702144) and the Zhejiang Provincial National Science Foundation of China (No. LQ17F020003).

Author information

Contributions

Jie Chen, Dongjin Yu, and Haiyang Hu contributed to the conception of the study. Jie Chen performed the data collection and experiment and drafted the manuscript. Dongjin Yu and Haiyang Hu helped perform the analysis with constructive discussions and made important modifications to the manuscript.

Corresponding author

Correspondence to Jie Chen.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

About this article

Cite this article

Chen, J., Yu, D. & Hu, H. Towards an understanding of memory leak patterns: an empirical study in Python. Software Qual J 31, 1303–1330 (2023). https://doi.org/10.1007/s11219-023-09641-5
