Abstract
Algorithms are proposed to improve the efficiency of parallel program execution on high-performance computer systems, in particular when solving problems of modeling physical processes. The developed algorithms focus on optimizing the performance of MPI collective operations on multiprocessor SMP/NUMA nodes. The proposed read-write locking algorithms synchronize access to shared memory more efficiently than the algorithms used in the Open PMIx library.
Funding
The work was carried out within the framework of the state assignment of the Rzhanov Institute of Semiconductor Physics, Siberian Branch, Russian Academy of Sciences (no. 0242-2021-0011), and with the support of the Russian Foundation for Basic Research (grant no. 20-07-00039).
Ethics declarations
The authors declare that they have no conflicts of interest.
Additional information
Translated by T. N. Sokolova
Cite this article
Pavsky, K.V., Kurnosov, M.G., Efimov, A.V. et al. Algorithms for Optimizing the Execution of Parallel Programs on High-Performance Systems When Solving Problems of Modeling Physical Processes. Optoelectron. Instrum. Proc. 57, 552–560 (2021). https://doi.org/10.3103/S8756699021050113