ddRingAllreduce: a high-precision RingAllreduce algorithm

Lei, Xiaojun; Gu, Tongxiang; Xu, Xiaowen

doi:10.1007/s42514-023-00150-2

Regular Paper
Published: 05 July 2023

Volume 5, pages 245–257, (2023)

CCF Transactions on High Performance Computing

Xiaojun Lei¹,
Tongxiang Gu² &
Xiaowen Xu²,³

Abstract

Parallel computing is almost the only practical way to solve complex problems in scientific computing, and global reduction is one of its most frequently used operations. Because of floating-point rounding errors, existing global reduction algorithms may produce results that are inaccurate or that differ between two runs, which makes it difficult to meet the needs of complex applications. Since the communication cost of RingAllreduce is a constant, independent of the number of processes, it is an effective algorithm when a large amount of data needs to be communicated. However, it faces the same problem as general global reduction operations, so a high-precision RingAllreduce algorithm is needed. In this paper, by combining double-double arithmetic with the RingAllreduce algorithm, we propose a high-precision RingAllreduce algorithm, called ddRingAllreduce. The theoretical error of the proposed algorithm is analyzed and compact error bounds are derived. We have carried out a large number of parallel numerical experiments and obtained numerical results consistent with the theoretical analysis; ddRingAllreduce remains accurate in cases where RingAllreduce is inaccurate or miscalculates. We also analyze, through experiments, the relationship between the problem size and the cost of using double-double arithmetic: at a small scale, the ddRingAllreduce algorithm achieves higher accuracy with relatively little time overhead.
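The idea of pairing double-double arithmetic with a ring reduction can be illustrated with a small single-process simulation. The sketch below is not the authors' implementation, which is not reproduced on this page; it assumes the classical TwoSum error-free transformation and a simple double-double addition, and the names `two_sum`, `dd_add`, `ring_allreduce` and `demo` are hypothetical. Each simulated process holds one value per chunk, and the same ring schedule (reduce-scatter followed by allgather) is run once with plain double addition and once with double-double addition.

```python
def two_sum(a, b):
    """Error-free transformation: returns (s, err) with s = fl(a + b)
    and a + b = s + err exactly (barring overflow)."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def dd_add(x, y):
    """Add two double-double numbers given as (hi, lo) pairs.
    This is a simple variant chosen for illustration."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)  # renormalize into (hi, lo)


def ring_allreduce(data, add):
    """Simulate a ring allreduce in one process.
    data[r][c] is the value of chunk c held by simulated process r;
    add is the reduction operator. Returns the buffers of all processes."""
    p = len(data)
    buf = [list(row) for row in data]
    # Reduce-scatter: in step s, process r sends chunk (r - s) mod p
    # to its right neighbour, which accumulates it.
    for s in range(p - 1):
        sends = [(r, (r - s) % p, buf[r][(r - s) % p]) for r in range(p)]
        for r, c, val in sends:
            dst = (r + 1) % p
            buf[dst][c] = add(buf[dst][c], val)
    # Process r now owns the fully reduced chunk (r + 1) mod p.
    # Allgather: circulate the reduced chunks around the ring.
    for s in range(p - 1):
        sends = [(r, (r + 1 - s) % p, buf[r][(r + 1 - s) % p]) for r in range(p)]
        for r, c, val in sends:
            buf[(r + 1) % p][c] = val
    return buf


def demo():
    # Illustrative ill-conditioned input: the exact sum per chunk is 2.0.
    vals = [1e16, 1.0, -1e16, 1.0]
    p = len(vals)
    plain_in = [[v] * p for v in vals]
    dd_in = [[(v, 0.0)] * p for v in vals]
    plain = ring_allreduce(plain_in, lambda a, b: a + b)
    dd = ring_allreduce(dd_in, dd_add)
    # Round each double-double result back to a single double.
    dd_rounded = [[hi + lo for hi, lo in row] for row in dd]
    return plain, dd_rounded
```

Calling `demo()` returns the plain and double-double results for every simulated process. For this input, the double-double variant recovers the exact sum 2.0 in every chunk, whereas plain double addition loses the small terms in at least one chunk because each chunk is reduced in a different order around the ring.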

Acknowledgements

The second author was supported by the Foundation of the Key Laboratory of Computational Physics, China. The third author was financially supported by the National Natural Science Foundation of China (62032023).

Author information

Authors and Affiliations

Graduate School of Chinese Academy of Engineering Physics, 6 Huayuan Rd, Beijing, 100193, China
Xiaojun Lei
Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, 6 Huayuan Rd, Beijing, 100088, China
Tongxiang Gu & Xiaowen Xu
CAEP Software Center for Numerical Simulation, 6 Huayuan Rd, Beijing, 100088, China
Xiaowen Xu

Corresponding author

Correspondence to Tongxiang Gu.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lei, X., Gu, T. & Xu, X. ddRingAllreduce: a high-precision RingAllreduce algorithm. CCF Trans. HPC 5, 245–257 (2023). https://doi.org/10.1007/s42514-023-00150-2

Received: 08 March 2023
Accepted: 24 April 2023
Published: 05 July 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s42514-023-00150-2
