On the Effectiveness of Bisection in Performance Regression Localization

Empirical Software Engineering

Abstract

Performance regressions can have a drastic impact on the usability of a software application. The crucial task of localizing such regressions can be carried out using bisection, which attempts to find the bug-introducing commit via binary search. This approach is used extensively by many development teams, but when applied to performance regressions it is inherently heuristic and therefore carries no correctness guarantees. Unfortunately, bisection is also time-consuming, which implies the need to assess its effectiveness before running it. To this end, the goal of this study is to analyze the effectiveness of bisection for performance regressions. This goal is achieved by first formulating a metric that quantifies the probability of a successful bisection and extracting a list of input parameters (the contributing properties) that potentially impact its value; a sensitivity analysis is then conducted on these properties to understand the extent of their impact. Furthermore, an empirical study of 310 bug reports describing performance regressions in 17 real-world applications is conducted to better understand what these contributing properties look like in practice. The results show that while bisection can be highly effective in localizing real-world performance regressions, this effectiveness is sensitive to the contributing properties, especially the choice of baseline and the measurement distributions at each commit. The results also reveal that most bug reports do not provide sufficient information to help developers properly choose values and metrics that maximize this effectiveness, which implies the need for measures to fill this information gap.
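The bisection procedure the paper evaluates can be illustrated with a minimal sketch. All names, numbers, and thresholds below are hypothetical for illustration, not taken from the study; the sketch assumes a single persistent transition from good to bad performance, and each comparison relies on one noisy measurement:

```python
import random

def bisect_regression(commits, is_regressed):
    """Binary search for the first regressed commit, assuming commits[0]
    is known-good and commits[-1] is known-bad. With noisy performance
    measurements this is only a heuristic: one misclassification at any
    step sends the search into the wrong half."""
    lo, hi = 0, len(commits) - 1
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if is_regressed(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]

# Hypothetical setup: commit 7 introduces a ~20% slowdown, and each
# measurement carries Gaussian noise.
random.seed(0)

def measure(commit):
    base = 100.0 if commit < 7 else 120.0
    return base + random.gauss(0, 2.0)

BASELINE = 100.0  # the study finds the choice of baseline matters greatly

def is_regressed(commit):
    # One noisy sample compared against a fixed 10% threshold: this is
    # the step whose per-commit reliability drives bisection's success odds.
    return measure(commit) > BASELINE * 1.1

print(bisect_regression(list(range(16)), is_regressed))  # 7 with this seed
```

Because each comparison can misclassify a commit, repeating measurements or widening the threshold trades extra time for a higher probability that the search converges on the true bug-introducing commit, which is the trade-off the paper's effectiveness metric quantifies.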


Notes

  1. For simplicity, we will also refer to software performance regressions simply as performance regressions.

  2. https://people.ece.ubc.ca/frolino/projects/perf-bisect-effectiveness/

  3. In fact, this turns out to be the main reason the effectiveness measure tends to be low when the transition index is somewhere in the middle, for all three distribution types.


Acknowledgements

Many thanks go out to Jacques Buchholz and the rest of the performance and reliability team at SAP Vancouver for supporting this project. Special thanks to Parry Fung, Roger Zhao, Mahbubur Shihab, and the anonymous EMSE reviewers for their feedback, which has helped improve the quality of the paper.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Frolin S. Ocariza, Jr.

Additional information

Communicated by: Philipp Leitner

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ocariza, Jr., F.S. On the Effectiveness of Bisection in Performance Regression Localization. Empir Software Eng 27, 95 (2022). https://doi.org/10.1007/s10664-022-10152-3

