
Utilizing source code syntax patterns to detect bug inducing commits using machine learning models

Published in Software Quality Journal.

Abstract

Detecting Bug Inducing Commits (BIC), also known as Just in Time (JIT) defect prediction, with Machine Learning (ML) models requires tabulated feature values extracted from the source code or historical maintenance data of a software system. Existing studies have used meta-data from source code repositories (which we term GitHub Statistics, or GS), n-gram-based source code text processing, and developer information (e.g., a developer's experience) as feature values in ML-based bug detection models. However, these feature values do not capture the source code syntax styles or patterns that a developer might prefer over the valid alternatives a programming language provides. This investigation proposes a method that represents software commits by features extracted from their source code syntax patterns and examines whether those features help detect bug proneness in software systems. We use six manually and two automatically labeled datasets from eight open-source software projects written in Java, C++, and Python. Our datasets contain 642 manually labeled and 4014 automatically labeled buggy and non-buggy commits from six and two subject systems, respectively. The subject systems contain diverse numbers of revisions and come from various application domains. Our investigation shows that including the proposed features improves the performance of detecting buggy and non-buggy software commits with five different ML classification models. The proposed features also detect buggy commits better than the features generated by a Deep Belief Network used with its classification model. We further applied a state-of-the-art explainability tool to compare predictions made with our proposed and traditional features, and found that the proposed features provide better reasoning about buggy commit detection. Continuing this study can help enhance software effectiveness by identifying, minimizing, and fixing software bugs during maintenance and evolution.
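The feature-extraction idea above can be sketched in a few lines: count how often selected syntax constructs appear in the lines a commit adds, yielding a tabulated feature vector that a classifier can consume alongside traditional features. This is a minimal illustrative sketch; the keyword list, the `syntax_features` helper, and the tokenization are our own assumptions, not the authors' exact feature set (the paper extracts syntax patterns with srcML).

```python
from collections import Counter
import re

# Illustrative (hypothetical) set of syntax constructs to count.
# The actual study derives a richer pattern set from srcML parse output.
SYNTAX_PATTERNS = ["for", "while", "if", "else", "switch", "try", "catch", "return"]

def syntax_features(added_lines):
    """Build a feature vector of syntax-pattern counts for a commit's added lines."""
    tokens = re.findall(r"[A-Za-z_]+", " ".join(added_lines))
    counts = Counter(t for t in tokens if t in SYNTAX_PATTERNS)
    return [counts[p] for p in SYNTAX_PATTERNS]

# Example: the added lines of a (hypothetical) Java commit.
commit = [
    "for (int i = 0; i < n; i++) {",
    "    if (values[i] == null) return;",
    "}",
]
print(syntax_features(commit))  # → [1, 0, 1, 0, 0, 0, 0, 1]
```

In the study, tabulated vectors like this are combined with traditional features (e.g., GS meta-data) before training the classification models.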


Data availability

The datasets and source files generated during and/or analyzed during this study are available in our GitHub repository (https://github.com/mnadims/bicDetectionSF/) for readers to investigate and facilitate any replication study.

Notes

  1. https://github.com/mnadims/bicDetectionSF

  2. https://github.com/justinwm/InduceBenchmark

  3. https://github.com/qt/

  4. https://github.com/openstack/

  5. https://www.srcml.org/


Funding

This research is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery grants, and by an NSERC Collaborative Research and Training Experience (CREATE) grant.

Author information


Corresponding author

Correspondence to Md Nadim.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Nadim, M., Roy, B. Utilizing source code syntax patterns to detect bug inducing commits using machine learning models. Software Qual J 31, 775–807 (2023). https://doi.org/10.1007/s11219-022-09611-3

