ddRingAllreduce: a high-precision RingAllreduce algorithm

  • Regular Paper
CCF Transactions on High Performance Computing

Abstract

For complex problems in scientific computing, parallel computing is almost the only practical route to a solution, and global reduction is one of its most frequently used operations. Because of floating-point rounding errors, existing global reduction algorithms may produce results that are inaccurate or that differ between two runs, which makes it difficult to meet the needs of complex applications. Because the per-process communication cost of RingAllreduce is essentially constant, independent of the number of processes, it is an effective algorithm when a large amount of data must be communicated. However, it suffers from the same rounding problem as general global reduction operations, so a high-precision RingAllreduce algorithm is needed. In this paper, we combine double-double arithmetic with the RingAllreduce algorithm to propose a high-precision variant, called ddRingAllreduce. We analyze the theoretical error of the proposed algorithm and derive compact error bounds. Extensive parallel numerical experiments give results consistent with the theoretical analysis, and ddRingAllreduce remains accurate in cases where RingAllreduce is inaccurate or produces wrong results. We also examine experimentally the relationship between problem size and the cost of double-double arithmetic: at small scale, ddRingAllreduce achieves higher accuracy with relatively little time overhead.
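
To make the idea of combining double-double arithmetic with a ring reduction concrete, the following is a minimal single-process sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (two_sum, dd_add, ring_allreduce, dd_ring_allreduce) are hypothetical, the ranks are simulated with plain lists rather than MPI, and each rank's vector is assumed to hold exactly one element per chunk. The error-free addition is Knuth's TwoSum, and the double-double add is a simple textbook variant.

```python
def two_sum(a, b):
    """Knuth's TwoSum: returns (s, e) with s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x, y):
    """Add two double-double numbers, each stored as a (hi, lo) pair."""
    s, e = two_sum(x[0], y[0])   # exact sum of the high parts
    e += x[1] + y[1]             # fold in the low parts
    return two_sum(s, e)         # renormalize into (hi, lo)


def ring_allreduce(vectors, add):
    """Simulate a ring allreduce over p ranks (assumption: no real MPI).

    vectors[r] is rank r's input and must have exactly p entries,
    one per chunk. Returns the final buffer of every rank.
    """
    p = len(vectors)
    buf = [list(v) for v in vectors]
    # Phase 1, reduce-scatter: after p - 1 steps, rank r holds the
    # fully reduced chunk (r + 1) % p.
    for step in range(p - 1):
        msgs = [(r, (r - step) % p) for r in range(p)]
        sent = [buf[r][c] for r, c in msgs]      # all sends happen "at once"
        for (r, c), val in zip(msgs, sent):
            dst = (r + 1) % p
            buf[dst][c] = add(buf[dst][c], val)
    # Phase 2, allgather: circulate the reduced chunks around the ring.
    for step in range(p - 1):
        msgs = [(r, (r + 1 - step) % p) for r in range(p)]
        sent = [buf[r][c] for r, c in msgs]
        for (r, c), val in zip(msgs, sent):
            buf[(r + 1) % p][c] = val
    return buf


def dd_ring_allreduce(vectors):
    """Ring allreduce that accumulates in double-double and rounds at the end."""
    packed = [[(x, 0.0) for x in v] for v in vectors]
    reduced = ring_allreduce(packed, dd_add)
    return [[hi + lo for hi, lo in v] for v in reduced]


if __name__ == "__main__":
    # Three simulated ranks; the exact sum in every slot is 1.0.
    vecs = [[1e16] * 3, [1.0] * 3, [-1e16] * 3]
    print("plain :", ring_allreduce(vecs, lambda a, b: a + b)[0])
    print("dd    :", dd_ring_allreduce(vecs)[0])
```

In this toy input the exact sum in every slot is 1.0. The double-double version recovers it on every rank, whereas the plain double-precision ring loses the 1.0 in at least one slot because each chunk is accumulated in a different rank order. A real implementation would exchange (hi, lo) pairs between processes, which doubles the message size relative to plain doubles.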

Acknowledgements

The second author was supported by the Foundation of the Key Laboratory of Computational Physics, China. The third author was financially supported by the National Natural Science Foundation of China (62032023).

Author information

Corresponding author

Correspondence to Tongxiang Gu.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lei, X., Gu, T. & Xu, X. ddRingAllreduce: a high-precision RingAllreduce algorithm. CCF Trans. HPC 5, 245–257 (2023). https://doi.org/10.1007/s42514-023-00150-2
