Abstract
Modern computational systems and parallel applications are often highly complex, which makes their interaction difficult to analyze. A wide variety of profiling tools can help find bottlenecks and inefficient parts of programs running on high-performance clusters; however, these tools introduce additional overhead. This overhead might be partially avoided by analysis methods that work at the network layer. In this article, we describe the development of a new tool for visually exploring and analyzing the behavior of various MPI parallel programs. The tool builds on an existing method of collecting traffic data from the InfiniBand network of the Lomonosov supercomputer. The implementation constructs communication matrices of MPI processes and displays various parts of the application timeline through these matrices; it also plots communication graphs and message distribution graphs built on several parameters of InfiniBand packets. As demonstrated on a few NAS Parallel Benchmarks (NPB) tests, the resulting visual representation of parallel application traffic may enable the analysis of such applications without inspecting their code directly.
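To make the idea of a communication matrix concrete, here is a minimal sketch of how captured packets might be aggregated into one. It assumes each packet has already been reduced to four fields: a timestamp, a source and destination LID, and a payload size. It also assumes a known mapping from LID to MPI rank with one process per LID. The `Packet` record, `lid_to_rank`, `communication_matrix`, and `size_distribution` are hypothetical names for illustration, not the tool's actual interface.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal record of one captured InfiniBand packet."""
    time: float   # capture timestamp in seconds
    src_lid: int  # source local identifier (LID)
    dst_lid: int  # destination LID
    size: int     # payload size in bytes


def communication_matrix(
    packets: Iterable[Packet],
    lid_to_rank: Dict[int, int],
    nprocs: int,
    t_start: Optional[float] = None,
    t_end: Optional[float] = None,
) -> List[List[int]]:
    """Return bytes sent from rank i to rank j, optionally only for
    packets captured in the half-open window [t_start, t_end)."""
    matrix = [[0] * nprocs for _ in range(nprocs)]
    for p in packets:
        if t_start is not None and p.time < t_start:
            continue
        if t_end is not None and p.time >= t_end:
            continue
        src = lid_to_rank.get(p.src_lid)
        dst = lid_to_rank.get(p.dst_lid)
        if src is None or dst is None:
            continue  # skip traffic to or from nodes outside the job
        matrix[src][dst] += p.size
    return matrix


def size_distribution(packets: Iterable[Packet], bucket: int = 1024) -> Counter:
    """Count packets per payload-size bucket: 0-1023, 1024-2047, ... bytes."""
    return Counter((p.size // bucket) * bucket for p in packets)


# Example with three hypothetical LIDs, one MPI rank per LID.
packets = [
    Packet(0.10, 11, 12, 2048),
    Packet(0.15, 12, 11, 512),
    Packet(0.90, 11, 13, 4096),
    Packet(1.20, 13, 11, 100),
]
lid_to_rank = {11: 0, 12: 1, 13: 2}
print(communication_matrix(packets, lid_to_rank, 3))
# [[0, 2048, 4096], [512, 0, 0], [100, 0, 0]]
print(communication_matrix(packets, lid_to_rank, 3, t_start=0.0, t_end=0.5))
# [[0, 2048, 0], [512, 0, 0], [0, 0, 0]]
```

Setting `t_start` and `t_end` yields one matrix per slice of the timeline, which is one simple way to view different parts of a run through such matrices. The bucketed counter is likewise one basic form of a message-size distribution. When several ranks share a node, LIDs alone cannot tell processes apart, so a real mapping would presumably need more information than the LID.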
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Domracheva, D., Stefanov, K. (2021). Detecting Changes in Communication Properties of Parallel Programs by InfiniBand Traffic Analysis. In: Sokolinsky, L., Zymbler, M. (eds) Parallel Computational Technologies. PCT 2021. Communications in Computer and Information Science, vol 1437. Springer, Cham. https://doi.org/10.1007/978-3-030-81691-9_3
DOI: https://doi.org/10.1007/978-3-030-81691-9_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81690-2
Online ISBN: 978-3-030-81691-9
eBook Packages: Computer Science, Computer Science (R0)