Annotation Guided Collection of Context-Sensitive Parallel Execution Profiles

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10548)

Abstract

Studying the relative behavior of an application’s threads is critical to identifying performance bottlenecks and understanding their root causes. We present context-sensitive parallel (CSP) execution profiles, which capture the relative behavior of threads in terms of the user-selected code regions they execute. CSPs can be analyzed to compute the execution time spent by the application in behavior states of interest. To capture execution context, code regions of interest can be given static and dynamic names using a versatile set of annotations. The CSP divides the execution time of a multithreaded application into a sequence of time intervals called frames, during which no thread transitions between code regions. By appropriately selecting and naming code regions, the user can obtain a CSP that captures all occurrences of arbitrary behavior states. We provide the user with a powerful query language to facilitate the analysis of CSPs. Our implementation for collecting CSPs of C++ programs has low overhead and high accuracy. Collecting CSPs for full executions of 12 PARSEC programs incurred an execution-time overhead of at most 7%. The accuracy of CSPs was validated in the context of common performance problems such as load imbalance in pipeline stages and the presence of straggler threads.
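
The paper’s annotation interface is not reproduced on this page. As a rough illustration of the idea only, the C++ sketch below uses a hypothetical CSP_REGION macro with RegionGuard and record_event helpers (all of these names are assumptions, not the authors’ API) to mark a user-selected code region with a static name plus a dynamic component, logging per-thread enter/exit events so an offline tool could cut the timeline into frames during which no thread changes region.

```cpp
// Minimal, hypothetical sketch of annotation-guided region marking.
// CSP_REGION, RegionGuard, and record_event are illustrative assumptions;
// they are not the annotation API described in the paper.
#include <chrono>
#include <cstdio>
#include <functional>
#include <string>
#include <thread>

namespace csp_sketch {

// Record one region-boundary event for the calling thread. A real
// collector would buffer these per thread to keep overhead low.
inline void record_event(const char* kind, const std::string& region) {
  auto now = std::chrono::steady_clock::now().time_since_epoch().count();
  std::printf("tid=%zu %s %s @%lld\n",
              std::hash<std::thread::id>{}(std::this_thread::get_id()),
              kind, region.c_str(), static_cast<long long>(now));
}

// RAII guard: entering/leaving the scope marks a region transition, so the
// timeline can later be split into frames (intervals with no transitions).
class RegionGuard {
 public:
  explicit RegionGuard(std::string name) : name_(std::move(name)) {
    record_event("enter", name_);
  }
  ~RegionGuard() { record_event("exit", name_); }
 private:
  std::string name_;
};

}  // namespace csp_sketch

// Static name (e.g., "pipeline_stage2") plus a dynamic component (item id).
#define CSP_REGION(static_name, dynamic_part) \
  csp_sketch::RegionGuard csp_guard_(         \
      std::string(static_name) + ":" + (dynamic_part))

int main() {
  for (int item = 0; item < 3; ++item) {
    CSP_REGION("pipeline_stage2", std::to_string(item));
    // ... work for this pipeline item ...
  }
  return 0;
}
```

The RAII guard is one way to guarantee a matching exit event even on early returns or exceptions, which keeps the recorded region transitions, and hence the frame boundaries derived from them, well formed.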

Keywords

Parallel behaviors · Contention · Load imbalance · Stragglers · Overhead


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. University of California, Riverside, USA
  2. Purdue University, West Lafayette, USA
