The Analysis of Cluster Interconnect with the Network_Tests2 Toolkit
This article discusses MPI-2 tools for benchmarking the interconnect of HPC clusters and extracting information about its features. The authors have developed a toolkit named “network_tests2”. The toolkit reveals a cluster’s hidden topology, exposes the so-called “jump points” in latency during message transfer, lets the user search for defective cluster nodes, and so on. The toolkit consists of several programs. The first is an MPI program that transfers messages in several modes, either to generate a particular pattern of communication activity or to benchmark a chosen MPI function, and collects statistics. The output of this program is a set of communication matrices stored as a NetCDF file. The toolkit also includes programs that cluster the data and provide a GUI for visualising and comparing results obtained from different clusters. The article covers some results obtained on Russian supercomputers such as the Lomonosov T500 system. We also present data on Infiniband Mellanox and Blue Gene/P interconnect technologies.
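To make the two analyses named above concrete, the following is a minimal Python sketch, not the toolkit's actual code. It assumes a latency matrix and a latency-versus-message-size series have already been read from the measurement output. The function names `find_jump_points` and `group_nodes`, the ratio and threshold values, and all numbers are hypothetical, and simple threshold grouping stands in for whatever clustering method network_tests2 really uses.

```python
def find_jump_points(sizes, latencies, ratio=1.5):
    """Return message sizes at which latency grows by more than `ratio`
    relative to the previous measured size (assumed jump criterion)."""
    jumps = []
    for i in range(1, len(sizes)):
        if latencies[i - 1] > 0 and latencies[i] / latencies[i - 1] > ratio:
            jumps.append(sizes[i])
    return jumps


def group_nodes(matrix, threshold):
    """Group nodes whose pairwise latency (in both directions) is below
    `threshold`, using union-find. Nodes in one group are likely close
    to each other in the interconnect topology."""
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if max(matrix[i][j], matrix[j][i]) < threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up measurements, in microseconds, for illustration only.
sizes = [1024, 2048, 4096, 8192, 16384]
latencies_us = [2.0, 2.2, 2.5, 6.0, 6.8]
matrix_us = [
    [0.0, 1.5, 4.0, 4.1],
    [1.6, 0.0, 4.2, 4.0],
    [4.1, 4.0, 0.0, 1.4],
    [4.0, 4.2, 1.5, 0.0],
]

jumps = find_jump_points(sizes, latencies_us)
groups = group_nodes(matrix_us, threshold=2.0)
print(jumps)   # [8192]
print(groups)  # [[0, 1], [2, 3]]
```

With this made-up data, the latency jump at 8192 bytes would correspond to a “jump point”, and the two groups would suggest two pairs of nodes that each share a nearby switch. A node that falls into no group, or whose row shows uniformly high latency, would be a candidate for the defective-node search.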