Abstract
Call graphs generated by profiling tools are critical to dissecting the performance of parallel programs. Although many mature and sophisticated profiling tools record call graph data, the tools differ in their runtime overheads, memory consumption, and the output data they generate. In this work, we perform a comparative evaluation study of the call graph data generation capabilities of several popular profiling tools: Caliper, HPCToolkit, TAU, and Score-P. We evaluate their runtime overheads, memory consumption, and generated call graph data (size and quality). We perform this comparison empirically by executing three proxy applications (AMG, LULESH, and Quicksilver) on a parallel cluster. Our results show which tool incurs the lowest overheads and produces the most meaningful call graph data under different conditions.
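The call graph data these tools record can be thought of as a calling-context tree: call paths aggregated into nodes annotated with metrics. The following is a minimal, tool-agnostic sketch of that aggregation step; the node structure, function names, and timing samples are illustrative and do not correspond to any specific tool's output format.

```python
class CallGraphNode:
    """A node in a calling-context tree: one function plus an aggregated metric."""
    def __init__(self, name):
        self.name = name
        self.time = 0.0        # total time attributed to this call path prefix
        self.children = {}     # callee name -> CallGraphNode

def build_call_graph(samples):
    """Aggregate (call_path, time) samples into a tree rooted at 'main'."""
    root = CallGraphNode("main")
    for path, t in samples:
        node = root
        node.time += t
        for func in path:      # descend the call path, creating nodes as needed
            node = node.children.setdefault(func, CallGraphNode(func))
            node.time += t
    return root

# Hypothetical samples a profiler might record for LULESH-like code
samples = [
    (["LagrangeLeapFrog", "CalcForceForNodes"], 3.0),
    (["LagrangeLeapFrog", "CalcForceForNodes"], 2.0),
    (["LagrangeLeapFrog", "UpdateVolumes"], 1.0),
]
root = build_call_graph(samples)
print(root.time)  # 6.0: all samples roll up to the root
print(root.children["LagrangeLeapFrog"].children["CalcForceForNodes"].time)  # 5.0
```

Identical call paths collapse into one node with summed metrics, which is why the size and fidelity of the resulting data vary across tools and measurement settings.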
Acknowledgments
This work was supported by funding provided by the University of Maryland College Park Foundation.
© 2022 Springer Nature Switzerland AG
Cite this paper
Cankur, O., Bhatele, A. (2022). Comparative Evaluation of Call Graph Generation by Profiling Tools. In: Varbanescu, A.L., Bhatele, A., Luszczek, P., Marc, B. (eds.) High Performance Computing. ISC High Performance 2022. Lecture Notes in Computer Science, vol. 13289. Springer, Cham. https://doi.org/10.1007/978-3-031-07312-0_11
Print ISBN: 978-3-031-07311-3
Online ISBN: 978-3-031-07312-0