Approaches for Memory-Efficient Communication Library and Runtime Communication Optimization

  • Takeshi Nanri
Chapter

Abstract

This article summarizes the work carried out in the Advanced Communication for Exa (ACE) project. The project was motivated primarily by the pressing demand for scalable communication in exa-scale computing. To meet that demand, we built a PGAS-based communication library, Advanced Communication Primitives (ACP). Its fundamental communication model is one-sided, following the PGAS model, so that the library's internal memory footprint stays as small as possible. Based on this model, several applications, including magnetohydrodynamic, molecular orbital, and particle simulations, were tuned to achieve higher scalability. In addition, several communication optimization techniques were investigated, in particular tuning methods for collective communication such as message ordering, algorithm selection, and overlapping. The project also developed a network simulator, NSIM-ACE, which simulates the behavior of packets in one-sided communication to study the effects of congestion on interconnects.


Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Kyushu University, Fukuoka, Japan
