Abstract
In this paper, we report the acceleration of multi-component binary64 multiple precision block and Strassen matrix multiplications with AVX2. We target double-double (DD), triple-double (TD), and quad-double (QD) precision arithmetic built on error-free transformation (EFT) operations. We implement SIMDized EFT functions that operate on four binary64 numbers simultaneously in an x86_64 computing environment, and with their help we develop SIMDized DD, TD, and QD additions and multiplications. In addition, we adopt AVX2 load/store functions to speed up reading and storing matrix elements from/to memory. Owing to these combined techniques, our multiple precision matrix multiplications run more than three times faster than the non-accelerated ones. The accelerated matrix multiplication also benefits from parallelization with OpenMP.
Supported by JSPS KAKENHI (Grant Number JP20K11843) and Shizuoka Institute of Science and Technology.
© 2021 Springer Nature Switzerland AG
Kouya, T. (2021). Acceleration of Multiple Precision Matrix Multiplication Based on Multi-component Floating-Point Arithmetic Using AVX2. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2021. Lecture Notes in Computer Science, vol 12953. Springer, Cham. https://doi.org/10.1007/978-3-030-86976-2_14
Print ISBN: 978-3-030-86975-5
Online ISBN: 978-3-030-86976-2