Abstract
Parallel implementations of Krylov subspace algorithms often help to accelerate the solution of a linear system. On the other hand, such parallelization, coupled with asynchronous and out-of-order execution, often amplifies the effects of the non-associativity of floating-point operations. This leads to non-reproducible results across executions in the same or different settings. This paper proposes a general framework for deriving reproducible and accurate variants of a Krylov subspace algorithm. The proposed algorithmic strategies are reinforced by programmability suggestions to ensure deterministic and accurate executions. The framework is illustrated on the preconditioned BiCGStab (PBiCGStab) method for the solution of non-symmetric linear systems in a message-passing setting. Finally, we verify the two reproducible variants of PBiCGStab on a set of matrices from the SuiteSparse Matrix Collection and a 3D Poisson's equation.
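The effect of non-associativity on a reduction can be seen in a minimal sketch. It sums the same four numbers in two orders, as two different reduction schedules might deliver them, and compares a plain left-to-right accumulation with Python's `math.fsum`. Note that `math.fsum` is only a stand-in for an accurate, order-independent summation; it is an assumption made for illustration and not the technique used in the paper, whose variants are not described in this excerpt. The input values and the helper `naive_sum` are hypothetical.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation in double precision (hypothetical helper)."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same four numbers in two different orders.
order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1e16, -1e16, 1.0, 1.0]

# In order_a the first 1.0 is absorbed by 1e16 before the cancellation;
# in order_b the cancellation happens first, so both 1.0 terms survive.
naive_a = naive_sum(order_a)
naive_b = naive_sum(order_b)

# math.fsum tracks exact partial sums, so its result does not depend on
# the order of the inputs (stand-in for an accurate summation).
exact_a = math.fsum(order_a)
exact_b = math.fsum(order_b)

print("naive sums agree:", naive_a == naive_b)
print("fsum sums agree: ", exact_a == exact_b)
```

In a distributed dot product, the order in which partial sums are combined plays the role of the two orderings here, which is why a plain reduction may not be bit-wise reproducible from run to run.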
Notes
1. Reproducibility is the ability to obtain a bit-wise identical and accurate result for multiple executions on the same data in various parallel environments.
2. ExBLAS repository: https://github.com/riakymch/exblas.
3. Certainly, there are better alternatives for banded or similarly structured sparse matrices, but using MPI_Allgatherv is a simple solution for unstructured sparse matrices.
Acknowledgment
This research was partially supported by the EU H2020 MSCA-IF Robust project (No. 842528) and the French ANR InterFLOP project (No. ANR-20-CE46-0009). The research from Universitat Jaume I was funded by the project PID2020-113656RB-C21 via MCIN/AEI/10.13039/501100011033.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Iakymchuk, R., Graillat, S., Aliaga, J.I. (2023). General Framework for Deriving Reproducible Krylov Subspace Algorithms: BiCGStab Case. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2022. Lecture Notes in Computer Science, vol 13826. Springer, Cham. https://doi.org/10.1007/978-3-031-30442-2_2
DOI: https://doi.org/10.1007/978-3-031-30442-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30441-5
Online ISBN: 978-3-031-30442-2
eBook Packages: Computer Science, Computer Science (R0)