On the Effectiveness of Bisection in Performance Regression Localization

Empirical Software Engineering

Abstract

Performance regressions can have a drastic impact on the usability of a software application. The crucial task of localizing such regressions can be carried out using bisection, which attempts to find the bug-introducing commit via binary search. This approach is used extensively by many development teams, but when applied to performance regressions it is inherently heuristic and therefore carries no correctness guarantees. Unfortunately, bisection is also time-consuming, which implies the need to assess its effectiveness before running it. To this end, the goal of this study is to analyze the effectiveness of bisection for performance regressions. This goal is achieved by first formulating a metric that quantifies the probability of a successful bisection and extracting a list of input parameters (the contributing properties) that potentially impact its value; a sensitivity analysis is then conducted on these properties to understand the extent of their impact. Furthermore, an empirical study of 310 bug reports describing performance regressions in 17 real-world applications is conducted to better understand what these contributing properties look like in practice. The results show that while bisection can be highly effective in localizing real-world performance regressions, this effectiveness is sensitive to the contributing properties, especially the choice of baseline and the measurement distributions at each commit. The results also reveal that most bug reports do not provide sufficient information to help developers properly choose values and metrics that maximize this effectiveness, which implies the need for measures to fill this information gap.
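The bisection procedure the paper evaluates can be illustrated with a minimal sketch. All names, numbers, and thresholds below are hypothetical for illustration, not taken from the study; the sketch assumes a single persistent transition from good to bad performance, and each comparison relies on one noisy measurement:

```python
import random

def bisect_regression(commits, is_regressed):
    """Binary search for the first regressed commit, assuming commits[0]
    is known-good and commits[-1] is known-bad. With noisy performance
    measurements this is only a heuristic: one misclassification at any
    step sends the search into the wrong half."""
    lo, hi = 0, len(commits) - 1
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if is_regressed(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]

# Hypothetical setup: commit 7 introduces a ~20% slowdown, and each
# measurement carries Gaussian noise.
random.seed(0)

def measure(commit):
    base = 100.0 if commit < 7 else 120.0
    return base + random.gauss(0, 2.0)

BASELINE = 100.0  # the study finds the choice of baseline matters greatly

def is_regressed(commit):
    # One noisy sample compared against a fixed 10% threshold: this is
    # the step whose per-commit reliability drives bisection's success odds.
    return measure(commit) > BASELINE * 1.1

print(bisect_regression(list(range(16)), is_regressed))  # 7 with this seed
```

Because each comparison can misclassify a commit, repeating measurements or widening the threshold trades extra time for a higher probability that the search converges on the true bug-introducing commit, which is the trade-off the paper's effectiveness metric quantifies.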


Notes

  1. For simplicity, we will also refer to software performance regressions simply as performance regressions.

  2. https://people.ece.ubc.ca/frolino/projects/perf-bisect-effectiveness/

  3. In fact, this turns out to be the main reason the effectiveness measure tends to be low when the transition index is somewhere in the middle, for all three distribution types.


Acknowledgements

Many thanks go out to Jacques Buchholz and the rest of the performance and reliability team at SAP Vancouver for supporting this project. Special thanks to Parry Fung, Roger Zhao, Mahbubur Shihab, and the anonymous EMSE reviewers for their feedback, which has helped improve the quality of the paper.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Frolin S. Ocariza, Jr.

Additional information

Communicated by: Philipp Leitner

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ocariza, Jr., F.S. On the Effectiveness of Bisection in Performance Regression Localization. Empir Software Eng 27, 95 (2022). https://doi.org/10.1007/s10664-022-10152-3

