
Evaluating the impact of falsely detected performance bug-inducing changes in JIT models


Abstract

Performance bugs bear a heavy cost on both software developers and end-users. Tools that reduce the occurrence, impact, and repair time of performance bugs can therefore provide key assistance for software developers racing to fix these bugs. Classification models that identify defect-prone commits, an approach referred to as Just-In-Time (JIT) Quality Assurance, are known to be useful in allowing developers to review risky commits. These commits can be reviewed while they are still fresh in developers’ minds, reducing the cost of developing high-quality software. JIT models, however, rely on the SZZ approach to determine whether or not a change is bug-inducing. The fixes to performance bugs may be scattered across the source code, far from their bug-inducing locations, and this nature of performance bugs may make SZZ a sub-optimal approach for identifying their bug-inducing commits. Yet, prior studies that leverage or evaluate the SZZ approach do not distinguish performance bugs from other bugs, leading to potential bias in their results. In this paper, we conduct an empirical study of JIT defect prediction for performance bugs. We concentrate on SZZ’s ability to identify the bug-inducing commits of performance bugs in two open-source projects, Cassandra and Hadoop. We verify whether the commits identified by SZZ are truly bug-inducing by manually examining them, cross-referencing fix commits with their JIRA bug reports. We then evaluate JIT model performance by using the models to identify bug-inducing code commits for performance-related bugs. Our findings show that JIT defect prediction classifies non-performance bug-inducing commits better than performance bug-inducing commits, i.e., the SZZ approach does introduce errors when identifying bug-inducing commits. However, we find that manually correcting these errors in the training data only slightly improves the models. In the absence of a large number of correctly labelled performance bug-inducing commits, our findings show that combining all available training data (i.e., truly performance bug-inducing commits, non-performance bug-inducing commits, and non-bug-inducing commits) yields the best classification results.
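
For readers unfamiliar with the SZZ approach the abstract refers to, the sketch below illustrates its core idea: diff a bug-fixing commit against its parent, then blame the lines the fix deleted to find the commits that last modified them, which SZZ flags as candidate bug-inducing changes. This is a minimal illustration under stated assumptions, not the implementation evaluated in the paper; it assumes a local git clone, a `git` binary on the PATH, and Python, and the function name is ours.

```python
import re
import subprocess


def git(repo, *args):
    """Run a git command inside `repo` and return its stdout as text."""
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout


def candidate_bug_inducing_commits(repo, fix_commit):
    """SZZ core step: blame the lines deleted by `fix_commit` in its parent."""
    parent = fix_commit + "^"
    inducing = set()
    path = None
    # -U0 drops context lines, so each hunk covers only changed lines.
    for line in git(repo, "diff", "-U0", parent, fix_commit).splitlines():
        if line.startswith("--- a/"):
            path = line[len("--- a/"):]          # file as it existed in the parent
        elif line.startswith("--- /dev/null"):
            path = None                          # newly added file: nothing to blame
        else:
            hunk = re.match(r"^@@ -(\d+)(?:,(\d+))? \+", line)
            if hunk and path:
                start, count = int(hunk.group(1)), int(hunk.group(2) or "1")
                if count == 0:
                    continue                     # pure insertion: no deleted lines
                blame = git(repo, "blame", "-l", "-L",
                            f"{start},{start + count - 1}", parent, "--", path)
                # First token of each blame line is the commit hash
                # ("^" marks boundary commits and is stripped).
                inducing.update(l.split()[0].lstrip("^")
                                for l in blame.splitlines())
    return inducing
```

Calling, say, `candidate_bug_inducing_commits("/path/to/cassandra", "<fix-sha>")` (placeholder path and hash) returns the hashes SZZ would flag. The paper’s concern is visible in this step: for performance bugs, the fix often does not touch the lines that introduced the slowdown, so blaming the deleted lines can point at the wrong commits.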



Notes

  1. The data files and scripts used in this study are publicly available at: https://github.com/senseconcordia/Perf-JIT-Models
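
The repository above contains the study’s actual pipeline. As a rough, self-contained illustration of what a JIT defect prediction model looks like, the sketch below trains a random forest on commit-level change metrics and evaluates it with AUC. The CSV file and column names are hypothetical stand-ins for the change metrics typically used in JIT work (e.g., lines added and deleted, files touched, developer experience); this is not the authors’ code.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical file of commit-level change metrics, one row per commit.
commits = pd.read_csv("commit_metrics.csv")

# Combine all available training data, as the study's findings suggest:
# performance bug-inducing, non-performance bug-inducing, and clean commits
# all stay in the training set under a single "bug-inducing or not" label.
features = ["lines_added", "lines_deleted", "files_changed", "dev_experience"]
X = commits[features]
y = commits["bug_inducing"]          # 1 = bug-inducing, 0 = clean

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC on held-out commits: {auc:.3f}")
```

Note that, per the abstract’s conclusion, the training set here deliberately mixes all labelled commits rather than training on performance bug-inducing commits alone, which the study found too scarce to support a dedicated model.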


Acknowledgements

This research was partially supported by JSPS KAKENHI Japan (Grant Numbers: 21H04877, JP18H03222) and JSPS International Joint Research Program with SNSF (Project “SENSOR”).

Author information

Corresponding author

Correspondence to Sophia Quach.

Additional information

Communicated by: Nachiappan Nagappan



About this article


Cite this article

Quach, S., Lamothe, M., Adams, B. et al. Evaluating the impact of falsely detected performance bug-inducing changes in JIT models. Empir Software Eng 26, 97 (2021). https://doi.org/10.1007/s10664-021-10004-6

