Pinpointing and repairing performance bottlenecks in concurrent programs

Empirical Software Engineering

Abstract

Developing concurrent software that is both correct and efficient is challenging. Past research has proposed various techniques that support developers in finding, understanding, and repairing concurrency-related correctness problems, such as missing or incorrect synchronization. In contrast, existing work provides little support for dealing with concurrency-related performance problems, such as unnecessary or inefficient synchronization. This paper presents SyncProf, a profiling approach that helps in identifying, localizing, and repairing performance bottlenecks in concurrent programs. The approach consists of a sequence of dynamic analyses that reason about relevant code locations with increasing precision while narrowing down performance problems and gathering data for avoiding them. A key novelty is a graph-based representation of relations between critical sections, which is the basis for computing the performance impact of a critical section and for identifying the root cause of a bottleneck. Once a bottleneck is identified, SyncProf searches for a suitable optimization strategy to avoid the problem, which increases the level of automation in repairing performance bottlenecks compared to a traditional, manual approach. We evaluate SyncProf on 25 versions of eleven C/C++ projects with both known and previously unknown synchronization bottlenecks. The results show that SyncProf effectively localizes the root causes of these bottlenecks with higher precision than a state-of-the-art lock contention profiler, and that it suggests valuable strategies to repair the bottlenecks.
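
To make the idea of relating critical sections through a graph more concrete, the following is a minimal sketch and not SyncProf's actual representation or algorithm. It assumes a hypothetical trace format in which each record names a waiting critical section, the critical section that blocked it, and the wait time. All names (`TRACE`, `build_wait_graph`, `impact`, `rank_sections`) and all numbers are invented for illustration. The sketch builds a weighted directed graph from holder to waiter and ranks critical sections by the total delay they impose on others.

```python
from collections import defaultdict

# Hypothetical trace records: (waiting_section, holding_section, wait_seconds).
# Each record says that a thread trying to enter `waiting_section` was blocked
# for `wait_seconds` because another thread was inside `holding_section`.
# The section names and durations are made up for this illustration.
TRACE = [
    ("cache_lookup", "cache_insert", 0.40),
    ("cache_lookup", "cache_insert", 0.35),
    ("log_write", "cache_insert", 0.10),
    ("cache_insert", "log_write", 0.05),
]


def build_wait_graph(trace):
    """Return {holder: {waiter: total_wait}} as a weighted directed graph."""
    graph = defaultdict(lambda: defaultdict(float))
    for waiter, holder, seconds in trace:
        graph[holder][waiter] += seconds
    return graph


def impact(graph):
    """Total time each critical section kept other sections waiting."""
    return {holder: sum(waiters.values()) for holder, waiters in graph.items()}


def rank_sections(trace):
    """Critical sections sorted by the delay they cause, largest first."""
    scores = impact(build_wait_graph(trace))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for section, blocked in rank_sections(TRACE):
        print(f"{section}: blocked others for {blocked:.2f}s")
```

In this simplified view, the section at the top of the ranking is a candidate root cause, whereas a plain contention profiler would typically report the sections that wait the longest. The paper's analysis additionally reasons about the critical path and about suitable optimization strategies, which this sketch does not attempt.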

Notes

  1. A path that directly impacts the completion time of a program.

  2. ANOVA tests the significance of differences between two or more groups.

  3. http://www.grammatech.com/products/codesurfer/overview.html.

  4. http://www.jonahharris.com/osdb/mysql/mysql-performance-whitepaper.pdf.

Acknowledgments

This research is supported in part by NSF grants CCF-1464032 and CCF-1652149, by the German Research Foundation within the Emmy Noether project “ConcSys”, and by the German Federal Ministry of Education and Research and the Hessian Ministry of Science and the Arts within “CRISP”.

Author information

Corresponding author

Correspondence to Tingting Yu.

Additional information

Communicated by: Martin Monperrus and Westley Weimer

About this article

Cite this article

Yu, T., Pradel, M. Pinpointing and repairing performance bottlenecks in concurrent programs. Empir Software Eng 23, 3034–3071 (2018). https://doi.org/10.1007/s10664-017-9578-1

  • DOI: https://doi.org/10.1007/s10664-017-9578-1
