Abstract
Developing concurrent software that is both correct and efficient is challenging. Past research has proposed various techniques that support developers in finding, understanding, and repairing concurrency-related correctness problems, such as missing or incorrect synchronization. In contrast, existing work provides little support for dealing with concurrency-related performance problems, such as unnecessary or inefficient synchronization. This paper presents SyncProf, a profiling approach that helps in identifying, localizing, and repairing performance bottlenecks in concurrent programs. The approach consists of a sequence of dynamic analyses that reason about relevant code locations with increasing precision while narrowing down performance problems and gathering data for avoiding them. A key novelty is a graph-based representation of relations between critical sections, which is the basis for computing the performance impact of a critical section and for identifying the root cause of a bottleneck. Once a bottleneck is identified, SyncProf searches for a suitable optimization strategy to avoid the problem, which raises the level of automation in repairing performance bottlenecks compared with a traditional, manual approach. We evaluate SyncProf on 25 versions of eleven C/C++ projects with both known and previously unknown synchronization bottlenecks. The results show that SyncProf localizes the root causes of these bottlenecks with higher precision than a state-of-the-art lock contention profiler, and that it suggests valuable strategies to repair the bottlenecks.
Notes
A path that directly impacts the completion time of a program.
ANOVA tests whether the differences between the means of two or more groups are statistically significant.
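To make the first note concrete, the following is a minimal sketch of how a critical path can be computed for a small dependency graph of tasks. It is an illustration only and not SyncProf's analysis: the function name `critical_path`, the task names, and the durations are all hypothetical, and the graph is assumed to be acyclic.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, deps):
    """Return (total_time, path) for the longest-duration dependency chain.

    durations: maps each task to its running time (hypothetical units).
    deps: maps a task to the set of tasks that must finish before it starts.
    """
    finish = {}  # earliest finish time of each task
    prev = {}    # predecessor that determines when a task can start
    for task in TopologicalSorter(deps).static_order():
        # A task starts once its slowest predecessor has finished.
        slowest = max(deps.get(task, ()), key=lambda d: finish[d], default=None)
        start = finish[slowest] if slowest is not None else 0
        finish[task] = start + durations[task]
        prev[task] = slowest

    # Walk back from the task that finishes last.
    task = max(finish, key=finish.get)
    total = finish[task]
    path = []
    while task is not None:
        path.append(task)
        task = prev[task]
    return total, path[::-1]


# Hypothetical program: two workers run in parallel between init and join.
durations = {"init": 2, "worker_a": 5, "worker_b": 3, "join": 1}
deps = {
    "worker_a": {"init"},
    "worker_b": {"init"},
    "join": {"worker_a", "worker_b"},
}
total, path = critical_path(durations, deps)
print(total, path)
```

In this example the path through `worker_a` determines the completion time. Speeding up `worker_b` leaves the total unchanged, whereas any speedup of `worker_a` shortens it, which is the sense in which a critical path "directly impacts" completion time.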
Acknowledgments
This research is supported in part by the NSF grants CCF-1464032 and CCF-1652149, by the German Research Foundation within the Emmy Noether project “ConcSys” and by the German Federal Ministry of Education and Research and the Hessian Ministry of Science and the Arts within “CRISP”.
Additional information
Communicated by: Martin Monperrus and Westley Weimer
About this article
Cite this article
Yu, T., Pradel, M. Pinpointing and repairing performance bottlenecks in concurrent programs. Empir Software Eng 23, 3034–3071 (2018). https://doi.org/10.1007/s10664-017-9578-1
DOI: https://doi.org/10.1007/s10664-017-9578-1