
Algorithms for Optimizing the Execution of Parallel Programs on High-Performance Systems When Solving Problems of Modeling Physical Processes

Published in Optoelectronics, Instrumentation and Data Processing

Abstract

Algorithms are proposed for improving the efficiency of parallel program execution on high-performance computer systems, in particular when solving problems of modeling physical processes. The developed algorithms are aimed at optimizing the performance of collective operations on multiprocessor SMP/NUMA nodes within the MPI standard. The proposed read-write locking algorithms make synchronization of access to shared memory more efficient than the algorithms used in the Open PMIx library.
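
To illustrate the kind of shared-memory synchronization the abstract refers to, the following minimal C sketch places a process-shared reader-writer lock in a POSIX shared-memory segment (shm_open/mmap), so that one process can publish data under the write lock while other processes on the same node read it concurrently under read locks. This is only an illustration of the general primitive, not the algorithms proposed in the paper and not the implementation used in Open PMIx; the segment name /demo_rwlock, the payload layout, and the single-process flow are assumptions made for the example.

/*
 * Minimal sketch (illustrative only): a process-shared reader-writer
 * lock placed in a POSIX shared-memory segment. Compile with
 * -pthread (and -lrt on older glibc).
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    pthread_rwlock_t lock;   /* protects the shared payload below            */
    char payload[256];       /* data published by one process, read by others */
} shm_region_t;

int main(void)
{
    /* Create (or open) a named shared-memory object visible to all
       processes on the node; "/demo_rwlock" is a hypothetical name. */
    int fd = shm_open("/demo_rwlock", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    if (ftruncate(fd, sizeof(shm_region_t)) != 0) { perror("ftruncate"); return 1; }

    shm_region_t *region = mmap(NULL, sizeof(*region),
                                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) { perror("mmap"); return 1; }

    /* Initialize the lock as PROCESS_SHARED so processes mapping the
       same segment can use it (done once, e.g. by the creator). */
    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);
    pthread_rwlockattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_rwlock_init(&region->lock, &attr);
    pthread_rwlockattr_destroy(&attr);

    /* Writer side: publish data under the exclusive write lock. */
    pthread_rwlock_wrlock(&region->lock);
    strncpy(region->payload, "key=value", sizeof(region->payload) - 1);
    pthread_rwlock_unlock(&region->lock);

    /* Reader side: many processes may hold the read lock concurrently. */
    pthread_rwlock_rdlock(&region->lock);
    printf("read: %s\n", region->payload);
    pthread_rwlock_unlock(&region->lock);

    munmap(region, sizeof(*region));
    close(fd);
    shm_unlink("/demo_rwlock");
    return 0;
}

A reader-writer lock fits this setting because shared data is typically written once and then read by many processes, so concurrent readers should not block one another.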


Funding

The work was carried out under the state assignment of the Rzhanov Institute of Semiconductor Physics, Siberian Branch, Russian Academy of Sciences (project no. 0242-2021-0011), and was supported by the Russian Foundation for Basic Research (grant no. 20-07-00039).

Author information

Corresponding author

Correspondence to K. V. Pavsky.

Ethics declarations

The authors declare that they have no conflicts of interest.

Additional information

Translated by T. N. Sokolova

About this article

Cite this article

Pavsky, K.V., Kurnosov, M.G., Efimov, A.V. et al. Algorithms for Optimizing the Execution of Parallel Programs on High-Performance Systems When Solving Problems of Modeling Physical Processes. Optoelectron. Instrument. Proc. 57, 552–560 (2021). https://doi.org/10.3103/S8756699021050113
