
Evaluating the impact of falsely detected performance bug-inducing changes in JIT models


Abstract

Performance bugs bear a heavy cost on both software developers and end-users. Tools that reduce the occurrence, impact, and repair time of performance bugs can therefore provide key assistance for software developers racing to fix these bugs. Classification models that identify defect-prone commits, an approach referred to as Just-In-Time (JIT) Quality Assurance, are known to be useful in allowing developers to review risky commits. These commits can be reviewed while they are still fresh in developers’ minds, reducing the cost of developing high-quality software. JIT models, however, rely on the SZZ approach to determine whether or not a change is bug-inducing. The fixes to performance bugs may be scattered across the source code, far from their bug-inducing locations, and this nature of performance bugs may make SZZ a sub-optimal approach for identifying their bug-inducing commits. Yet, prior studies that leverage or evaluate the SZZ approach do not distinguish performance bugs from other bugs, leading to potential bias in their results. In this paper, we conduct an empirical study of JIT defect prediction for performance bugs. We concentrate on SZZ’s ability to identify the bug-inducing commits of performance bugs in two open-source projects, Cassandra and Hadoop. We verify whether the commits identified by SZZ are truly bug-inducing by manually examining them, cross-referencing fix commits with their JIRA bug reports. We then evaluate JIT model performance by using the models to identify bug-inducing code commits for performance-related bugs. Our findings show that JIT defect prediction classifies non-performance bug-inducing commits better than performance bug-inducing commits, i.e., the SZZ approach does introduce errors when identifying bug-inducing commits. However, we find that manually correcting these errors in the training data only slightly improves the models. In the absence of a large number of correctly labelled performance bug-inducing commits, our findings show that combining all available training data (i.e., truly performance bug-inducing commits, non-performance bug-inducing commits, and non-bug-inducing commits) yields the best classification results.
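
For readers unfamiliar with the SZZ approach the abstract refers to, the sketch below illustrates its core idea: diff a bug-fixing commit against its parent, then blame the lines the fix deleted to find the commits that last modified them, which SZZ flags as candidate bug-inducing changes. This is a minimal illustration under stated assumptions, not the implementation evaluated in the paper; it assumes a local git clone, a `git` binary on the PATH, and Python, and the function name is ours.

```python
import re
import subprocess


def git(repo, *args):
    """Run a git command inside `repo` and return its stdout as text."""
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout


def candidate_bug_inducing_commits(repo, fix_commit):
    """SZZ core step: blame the lines deleted by `fix_commit` in its parent."""
    parent = fix_commit + "^"
    inducing = set()
    path = None
    # -U0 drops context lines, so each hunk covers only changed lines.
    for line in git(repo, "diff", "-U0", parent, fix_commit).splitlines():
        if line.startswith("--- a/"):
            path = line[len("--- a/"):]          # file as it existed in the parent
        elif line.startswith("--- /dev/null"):
            path = None                          # newly added file: nothing to blame
        else:
            hunk = re.match(r"^@@ -(\d+)(?:,(\d+))? \+", line)
            if hunk and path:
                start, count = int(hunk.group(1)), int(hunk.group(2) or "1")
                if count == 0:
                    continue                     # pure insertion: no deleted lines
                blame = git(repo, "blame", "-l", "-L",
                            f"{start},{start + count - 1}", parent, "--", path)
                # First token of each blame line is the commit hash
                # ("^" marks boundary commits and is stripped).
                inducing.update(l.split()[0].lstrip("^")
                                for l in blame.splitlines())
    return inducing
```

Calling, say, `candidate_bug_inducing_commits("/path/to/cassandra", "<fix-sha>")` (placeholder path and hash) returns the hashes SZZ would flag. The paper’s concern is visible in this step: for performance bugs, the fix often does not touch the lines that introduced the slowdown, so blaming the deleted lines can point at the wrong commits.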



Notes

  1. The data files and scripts used in this study are publicly available at: https://github.com/senseconcordia/Perf-JIT-Models
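
The repository above contains the study’s actual pipeline. As a rough, self-contained illustration of what a JIT defect prediction model looks like, the sketch below trains a random forest on commit-level change metrics and evaluates it with AUC. The CSV file and column names are hypothetical stand-ins for the change metrics typically used in JIT work (e.g., lines added and deleted, files touched, developer experience); this is not the authors’ code.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical file of commit-level change metrics, one row per commit.
commits = pd.read_csv("commit_metrics.csv")

# Combine all available training data, as the study's findings suggest:
# performance bug-inducing, non-performance bug-inducing, and clean commits
# all stay in the training set under a single "bug-inducing or not" label.
features = ["lines_added", "lines_deleted", "files_changed", "dev_experience"]
X = commits[features]
y = commits["bug_inducing"]          # 1 = bug-inducing, 0 = clean

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC on held-out commits: {auc:.3f}")
```

Note that, per the abstract’s conclusion, the training set here deliberately mixes all labelled commits rather than training on performance bug-inducing commits alone, which the study found too scarce to support a dedicated model.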


Acknowledgements

This research was partially supported by JSPS KAKENHI Japan (Grant Numbers: 21H04877, JP18H03222) and JSPS International Joint Research Program with SNSF (Project “SENSOR”).

Author information

Corresponding author

Correspondence to Sophia Quach.

Additional information

Communicated by: Nachiappan Nagappan



About this article


Cite this article

Quach, S., Lamothe, M., Adams, B. et al. Evaluating the impact of falsely detected performance bug-inducing changes in JIT models. Empir Software Eng 26, 97 (2021). https://doi.org/10.1007/s10664-021-10004-6

