Empirical Software Engineering, Volume 23, Issue 5, pp 3034–3071

Pinpointing and repairing performance bottlenecks in concurrent programs

  • Tingting Yu
  • Michael Pradel

Abstract

Developing concurrent software that is both correct and efficient is challenging. Past research has proposed various techniques that support developers in finding, understanding, and repairing concurrency-related correctness problems, such as missing or incorrect synchronization. In contrast, existing work provides little support for dealing with concurrency-related performance problems, such as unnecessary or inefficient synchronization. This paper presents SyncProf, a profiling approach that helps in identifying, localizing, and repairing performance bottlenecks in concurrent programs. The approach consists of a sequence of dynamic analyses that reason about relevant code locations with increasing precision, narrowing down performance problems and gathering the data needed to avoid them. A key novelty is a graph-based representation of relations between critical sections, which is the basis for computing the performance impact of a critical section and for identifying the root cause of a bottleneck. Once a bottleneck is identified, SyncProf searches for a suitable optimization strategy to avoid the problem, which increases the level of automation compared to a traditional, manual repair process. We evaluate SyncProf on 25 versions of eleven C/C++ projects with both known and previously unknown synchronization bottlenecks. The results show that SyncProf localizes the root causes of these bottlenecks with higher precision than a state-of-the-art lock contention profiler, and that it suggests valuable strategies to repair the bottlenecks.
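To make the idea of a graph over critical sections more concrete, the following is a minimal illustrative sketch. It is not SyncProf's algorithm, which the abstract does not specify. It assumes a hypothetical lock trace in which each record carries a code location, a lock name, and request/acquire/release timestamps; the names `Execution`, `build_wait_graph`, and `rank_by_caused_wait` are invented for this example. An edge from section A to section B is weighted by the time executions of B spent waiting while A held the same lock.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Execution:
    """One dynamic execution of a critical section (hypothetical trace record)."""
    section: str      # static code location, e.g. "update"
    lock: str         # name of the lock protecting the section
    requested: float  # time the thread asked for the lock
    acquired: float   # time the lock was granted
    released: float   # time the lock was released


def build_wait_graph(trace):
    """Map (holder, waiter) to the time waiter executions were blocked by holder."""
    graph = defaultdict(float)
    for waiter in trace:
        for holder in trace:
            if holder is waiter or holder.lock != waiter.lock:
                continue
            # Overlap of the waiter's waiting interval with the holder's holding interval.
            start = max(waiter.requested, holder.acquired)
            end = min(waiter.acquired, holder.released)
            if end > start:
                graph[(holder.section, waiter.section)] += end - start
    return dict(graph)


def rank_by_caused_wait(graph):
    """Sort sections by the total waiting time they caused in other executions."""
    caused = defaultdict(float)
    for (holder, _waiter), seconds in graph.items():
        caused[holder] += seconds
    return sorted(caused.items(), key=lambda item: item[1], reverse=True)


# Assumed example data: one long "update" blocks two short "lookup" executions.
trace = [
    Execution("update", "L", requested=0.0, acquired=0.0, released=5.0),
    Execution("lookup", "L", requested=1.0, acquired=5.0, released=6.0),
    Execution("lookup", "L", requested=2.0, acquired=6.0, released=7.0),
]

graph = build_wait_graph(trace)
ranking = rank_by_caused_wait(graph)
print(graph)
print(ranking)
```

In this toy trace the long `update` section accounts for most of the waiting, so it ranks first. A real profiler would also have to handle nested locks, condition variables, and measurement overhead, none of which this sketch attempts.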

Keywords

Testing, Concurrency, Performance bottlenecks

Notes

Acknowledgments

This research is supported in part by the NSF grants CCF-1464032 and CCF-1652149, by the German Research Foundation within the Emmy Noether project “ConcSys”, and by the German Federal Ministry of Education and Research and the Hessian Ministry of Science and the Arts within “CRISP”.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Department of Computer Science, University of Kentucky, Lexington, USA
  2. Department of Computer Science, TU Darmstadt, Darmstadt, Germany
