
Lobachevskii Journal of Mathematics, Volume 39, Issue 9, pp. 1188–1198

Optimization of MPI-Process Mapping for Clusters with Angara Interconnect

  • M. R. Khalilov
  • A. V. Timofeev
Part 1. Special issue “High Performance Data Intensive Computing.” Editors: V. V. Voevodin, A. S. Simonov, and A. V. Lapin

Abstract

An algorithm for optimizing the mapping of MPI processes is adapted for supercomputers with the Angara interconnect. The mapping algorithm is based on partitioning the communication pattern of a parallel program: processes that exchange data most intensively are assigned to nodes and processors connected by the highest-bandwidth links. The algorithm finds a near-optimal distribution of processes over processor cores that minimizes the total time spent on exchanges between MPI processes. Results of process placement optimized with the proposed method on small supercomputers are analyzed, as is the dependence of MPI program execution time on supercomputer and task parameters. A theoretical model is proposed for estimating the effect of mapping optimization on execution time for several types of supercomputer topologies. The prospects of using the implemented optimization library on large-scale supercomputers with the Angara interconnect are discussed.
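
To illustrate the general idea of communication-aware mapping, the sketch below groups heavily communicating MPI ranks onto the same node, given a symmetric matrix of exchanged data volumes. This is not the authors' implementation: the function name, data layout, and the greedy grouping heuristic are assumptions made for the example, and the paper's algorithm is based on partitioning of the communication pattern, for which this greedy grouping is only a rough stand-in.

```python
# Illustrative sketch (not the authors' implementation): a greedy heuristic
# that places heavily communicating MPI ranks on the same node, assuming the
# communication pattern is given as a symmetric matrix of exchanged bytes.

def greedy_mapping(comm_matrix, n_nodes, cores_per_node):
    """Return mapping[rank] = node index."""
    n_ranks = len(comm_matrix)
    assert n_ranks <= n_nodes * cores_per_node
    unassigned = set(range(n_ranks))
    mapping = [None] * n_ranks

    for node in range(n_nodes):
        if not unassigned:
            break
        # Seed the node with the unassigned rank that communicates the most overall.
        seed = max(unassigned, key=lambda r: sum(comm_matrix[r]))
        group = [seed]
        unassigned.remove(seed)
        # Fill the node with the ranks that exchange the most data with the group.
        while len(group) < cores_per_node and unassigned:
            best = max(unassigned,
                       key=lambda r: sum(comm_matrix[r][g] for g in group))
            group.append(best)
            unassigned.remove(best)
        for rank in group:
            mapping[rank] = node
    return mapping


if __name__ == "__main__":
    # Toy 4-rank pattern: ranks (0,1) and (2,3) exchange heavily.
    comm = [[0, 90, 1, 1],
            [90, 0, 1, 1],
            [1, 1, 0, 80],
            [1, 1, 80, 0]]
    print(greedy_mapping(comm, n_nodes=2, cores_per_node=2))  # e.g. [0, 0, 1, 1]
```

A production mapper would instead rely on graph partitioning of the communication pattern and on knowledge of the interconnect topology, so that inter-node traffic is both minimized and routed over the highest-bandwidth links.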

Keywords and phrases

parallel programming, process mapping, MPI, Angara interconnect

Copyright information

© Pleiades Publishing, Ltd. 2018

Authors and Affiliations

  1. National Research University Higher School of Economics, Moscow, Russia
  2. Joint Institute for High Temperatures of the Russian Academy of Sciences, Moscow, Russia
