Implementation and Performance Analysis of 2.5D-PDGEMM on the K Computer

Mukunoki, Daichi; Imamura, Toshiyuki

doi:10.1007/978-3-319-78024-5_31

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10777)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

In this study, we propose a 2D-compatible implementation of 2.5D parallel matrix multiplication (2.5D-PDGEMM), designed to multiply 2D-distributed matrices on a 2D process grid. We evaluated the performance of our implementation using 16384 nodes (131072 cores) of the K computer, a highly parallel computer. The results show that our 2.5D implementation outperforms conventional 2D implementations, including the ScaLAPACK PDGEMM routine, in terms of strong scaling, even when the cost of redistributing the matrices between the 2D and 2.5D distributions is included. We discuss the performance of our implementation by providing a performance breakdown and describing a performance model of the implementation.
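For readers unfamiliar with the 2.5D approach, the following serial sketch illustrates the general idea behind it as described in the 2.5D literature: the process grid is extended by c layers, each layer works on a copy of the inputs but multiplies only its own 1/c share of the inner dimension, and the c partial products are summed at the end. It is not the authors' implementation. The function names `matmul` and `matmul_25d_sketch` and the parameter `c` are hypothetical, and the 2D distribution within each layer, the communication steps, and the 2D-to-2.5D redistribution are deliberately left out.

```python
def matmul(a, b):
    """Plain product of two list-of-lists matrices (reference result)."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_25d_sketch(a, b, c):
    """Serially emulate the layer structure of a 2.5D product.

    Assumption: c is the number of replicated layers. Each layer multiplies
    only its own slice of the inner dimension; the partial results are then
    summed, which stands in for the reduction across layers.
    """
    n, k, m = len(a), len(b), len(b[0])
    if k % c != 0:
        raise ValueError("this sketch needs the inner dimension divisible by c")
    width = k // c
    result = [[0.0] * m for _ in range(n)]
    for layer in range(c):
        lo, hi = layer * width, (layer + 1) * width
        a_slice = [row[lo:hi] for row in a]  # columns lo..hi-1 of A
        b_slice = b[lo:hi]                   # rows lo..hi-1 of B
        partial = matmul(a_slice, b_slice)   # this layer's partial product
        for i in range(n):
            for j in range(m):
                result[i][j] += partial[i][j]  # emulated cross-layer reduction
    return result
```

On a real machine, each layer would run a 2D algorithm over its own process grid, so each layer performs only 1/c of the 2D communication steps, at the price of holding c copies of the matrices in memory.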

Acknowledgment

The results were obtained using the K computer at the RIKEN Advanced Institute for Computational Science (project number: ra000022). This study is a part of the Flagship2020 project. We thank Akiyoshi Kuroda (RIKEN Advanced Institute for Computational Science), Eiji Yamanaka, and Naoki Sueyasu (Fujitsu Limited) for their helpful suggestions and discussions.

Author information

Authors and Affiliations

RIKEN Advanced Institute for Computational Science, 7-1-26 Minatojima-minami-machi, Chuo-ku, Kobe, Hyogo, 650-0047, Japan
Daichi Mukunoki & Toshiyuki Imamura

Corresponding author

Correspondence to Daichi Mukunoki.

Editor information

Editors and Affiliations

Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
University of Tennessee, Knoxville, Tennessee, USA
Jack Dongarra
University of Southern California, Marina Del Rey, California, USA
Ewa Deelman
Czestochowa University of Technology, Czestochowa, Poland
Konrad Karczewski

About this paper

Cite this paper

Mukunoki, D., Imamura, T. (2018). Implementation and Performance Analysis of 2.5D-PDGEMM on the K Computer. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol 10777. Springer, Cham. https://doi.org/10.1007/978-3-319-78024-5_31

DOI: https://doi.org/10.1007/978-3-319-78024-5_31
Published: 23 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78023-8
Online ISBN: 978-3-319-78024-5
eBook Packages: Computer Science, Computer Science (R0)