
Towards understanding bugs in Python interpreters

Empirical Software Engineering

Abstract

Python has been widely used to develop large-scale software systems, such as distributed systems, cloud computing platforms, artificial intelligence frameworks, and Web platforms, due to its flexibility and versatility. As a complex piece of software itself, the Python interpreter can also suffer from bugs, which fundamentally threaten the quality of all Python applications. Since the first release of Python, more than 30,000 bugs have been discovered in its interpreters. Modern interpreters consist of many modules, built-in libraries, extensions, etc., and can reach millions of lines of code. Their large size and high complexity bring substantial challenges to quality assurance.

To characterize interpreter bugs and provide empirical support, this paper conducts a large-scale empirical study on the two most popular Python interpreters, CPython and PyPy. We comprehensively investigated their maintenance logs and collected 30,069 fixed bugs and 20,334 confirmed revisions. We then manually characterized and taxonomized 1,200 bugs to investigate their representative symptoms and root causes in depth. Finally, by comprehensively analyzing bug locations, symptoms, root causes, and bug-revealing and fixing times, we identified nine findings. The key findings, which hold for both interpreters, include: (1) the library, object model, and interpreter back-end are the most bug-prone components; (2) unexpected behavior, crashes, and performance issues are the most common symptoms; (3) incorrect algorithm logic, configuration, and internal calls are the most common general root causes, while incorrect object design is the most common Python-specific root cause; and (4) some bug-triggering test programs are tiny (fewer than ten lines), and most bug fixes involve only slight modifications. Based on these findings, we discuss lessons learned and practical implications that can support research on interpreter testing, debugging, and improvement.
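The finding that tiny programs (fewer than ten lines) can trigger interpreter bugs motivates differential testing: running the same small program under multiple interpreters, such as CPython and PyPy, and flagging any disagreement in observable behavior. The sketch below is our own illustration, not code from the paper; the interpreter executable names in the comment are assumptions that depend on the local installation.

```python
import subprocess
import sys

# A tiny candidate test program; the study observes that some
# bug-triggering programs are under ten lines long.
TEST_PROGRAM = "x = 2 ** 64\nprint(x // 3, x % 3)\n"

def run_interpreter(executable, program):
    """Run `program` under the given interpreter executable and
    capture its observable behavior (exit code, stdout, stderr)."""
    result = subprocess.run(
        [executable, "-c", program],
        capture_output=True, text=True, timeout=30,
    )
    return result.returncode, result.stdout, result.stderr

def differential_test(interpreters, program):
    """Return (agree, outcomes): `agree` is False when any two
    interpreters disagree on the same program, which signals a
    potential interpreter bug (crash or divergent output)."""
    outcomes = {exe: run_interpreter(exe, program) for exe in interpreters}
    baseline = next(iter(outcomes.values()))
    agree = all(o == baseline for o in outcomes.values())
    return agree, outcomes

if __name__ == "__main__":
    # Comparing the current interpreter against itself as a stand-in;
    # in practice the list would name real installs, e.g. a CPython
    # and a PyPy executable (hypothetical paths).
    agree, outcomes = differential_test([sys.executable], TEST_PROGRAM)
    print("interpreters agree:", agree)
```

In a real campaign the program under test would be generated or mutated automatically, and disagreements would be triaged by bug location and symptom, mirroring the taxonomy the study builds.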


Notes

  1. https://github.com/python/cpython

  2. https://www.pypy.org

  3. https://www.jython.org

  4. https://ironpython.net

  5. https://bugs.python.org

  6. https://bitbucket.org/blog/sunsetting-mercurial-support-in-bitbucket

  7. https://doc.pypy.org/en/latest/faq.html?highlight=bug#why-doesn-t-pypy-use-git-and-move-to-github

  8. https://bugs.python.org/issue12144

  9. https://www.python.org/dev/peps/

  10. https://devguide.python.org/triaging/

  11. https://bugs.python.org/issue23911

  12. https://bugs.python.org/issue24245

  13. https://bugs.python.org/issue6007

  14. https://www.python.org/dev/peps/pep-0617/

  15. https://github.com/python/cpython/pull/16438

  16. https://bugs.python.org/issue33083

  17. https://bugs.python.org/issue33871

  18. https://foss.heptapod.net/pypy/pypy/-/issues/1784

  19. https://bugs.python.org/issue31673

  20. https://bugs.python.org/issue34987

  21. https://bugs.python.org/issue36374

  22. https://github.com/python/cpython/tree/master/Doc/library

  23. https://doc.pypy.org/en/latest/stackless.html?highlight=Coroutines#unimplemented-features

  24. https://bugs.python.org/issue41697

  25. https://bugs.python.org/issue37424

  26. https://bugs.python.org/issue38121

  27. https://rpython.readthedocs.io/en/latest/jit/overview.html#id4

  28. https://www.python.org/dev/peps/pep-0484/

  29. https://doc.pypy.org/en/latest/stm.html

  30. https://lcamtuf.coredump.cx/afl

  31. https://llvm.org/docs/LibFuzzer.html

  32. https://honggfuzz.dev/


Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (No.61932012, No.62172209), the Science, Technology and Innovation Commission of Shenzhen Municipality (No.CJGJZD20200617103001003), the Fundamental Research Funds for the Central Universities (No. 2022300295), and the Cooperation Fund of Huawei-Nanjing University Next Generation Programming Innovation Lab (No.YBN2019105178SW37).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yang Feng.

Ethics declarations

Conflict of Interests

To the best of our knowledge, none of the named authors has any conflict of interest, financial or otherwise.

Additional information

Communicated by: Justyna Petke

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, D., Feng, Y., Yan, Y. et al. Towards understanding bugs in Python interpreters. Empir Software Eng 28, 19 (2023). https://doi.org/10.1007/s10664-022-10239-x

