
Precision-aware deterministic and probabilistic error bounds for floating point summation

Published in Numerische Mathematik

Abstract

We analyze the forward error in the floating point summation of real numbers, for computations in low precision or extreme-scale problem dimensions that push the limits of the precision. We present a systematic recurrence for a martingale on a computational tree, which leads to explicit and interpretable bounds with nonlinear terms controlled explicitly rather than by big-O terms. Two probability parameters strengthen the precision-awareness of our bounds: one parameter controls the first order terms in the summation error, while the second one is designed for controlling higher order terms in low precision or extreme-scale problem dimensions. Our systematic approach yields new deterministic and probabilistic error bounds for three classes of mono-precision algorithms: general summation, shifted general summation, and compensated (sequential) summation. Extension of our systematic error analysis to mixed-precision summation algorithms that allow any number of precisions yields the first probabilistic bounds for the mixed-precision FABsum algorithm. Numerical experiments illustrate that the probabilistic bounds are accurate, and that among the three classes of mono-precision algorithms, compensated summation is generally the most accurate. As for mixed precision algorithms, our recommendation is to minimize the magnitude of intermediate partial sums relative to the precision in which they are computed.
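The compensated (sequential) summation referred to here is classical Kahan summation [20]. As an illustration only (the function name is ours; Python floats are IEEE binary64), a minimal sketch of the correction step:

```python
def compensated_sum(x):
    """Sketch of Kahan compensated summation [20], assuming IEEE binary64
    arithmetic (Python floats). The running compensation c recovers the
    roundoff of each addition and feeds it back into the next term."""
    s = 0.0  # partial sum
    c = 0.0  # compensation: (minus) the rounding error of the last addition
    for xk in x:
        y = xk - c        # correct the incoming term
        t = s + y         # add; low-order bits of y may be lost here
        c = (t - s) - y   # algebraically zero; in floating point, the lost bits
        s = t
    return s

# Classic example: 0.1 is not exactly representable in binary64, so plain
# recursive summation of ten copies gives 0.9999999999999999, while
# compensated summation returns 1.0 exactly.
```

The correction `c = (t - s) - y` is exactly the step whose first-order error contribution the paper's analysis shows to be independent of the problem dimension.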

Algorithms 2.1, 3.1, 4.1 and 5.1 and Figures 1–5 appear in the full article.

Notes

  1. For simplicity, the conditioning also includes those \(\delta _{\ell }\), \(1\le \ell \le k-1\), that are not descendants in the partial order. With stochastic rounding such \(\delta _\ell \) are fully independent of \(\delta _k\).

  2. If n does not admit an exact floating point representation, then we could append an additional vertex for the artificial ‘addition’ \(n+0\), to induce the rounding of n.

  3. The dots do not refer to differentiation!

  4. An especially alert reviewer discovered that the typo had already been found in March 2007, as recorded in the earliest errata for [22], dated January 2011.

  5. Although the quantities depend on n and \(\eta \), we omit the subscripts, and simply write \(S_k\) instead of \(S_{k, n, \eta }\).

  6. Our simulation of half-precision ignores the range restriction realmax = 65504.

References

  1. Abdelfattah, A., Anzt, H., Boman, E.G., Carson, E., Cojean, T., Dongarra, J., Fox, A., Gates, M., Higham, N.J., Li, X.S., et al.: A survey of numerical linear algebra methods utilizing mixed-precision arithmetic. Int. J. High Perform. Comput. Appl. 35(4), 344–369 (2021)


  2. Blanchard, P., Higham, N.J., Mary, T.: A class of fast and accurate summation algorithms. SIAM J. Sci. Comput. 42(3), A1541–A1557 (2020)


  3. Chung, F., Lu, L.: Concentration inequalities and martingale inequalities: a survey. Internet Math. 3(1), 79–127 (2006)


  4. Connolly, M.P., Higham, N.J., Mary, T.: Stochastic rounding and its probabilistic backward error analysis. SIAM J. Sci. Comput. 43(1), A566–A585 (2021)


  5. Constantinides, G., Dahlqvist, F., Rakamaric, Z., Salvia, R.: Rigorous roundoff error analysis of probabilistic floating-point computations (2021). arXiv:2105.13217

  6. Dahlqvist, F., Salvia, R., Constantinides, G.A.: A probabilistic approach to floating-point arithmetic (2019). arXiv:1912.00867

  7. Demmel, J., Hida, Y.: Accurate and efficient floating point summation. SIAM J. Sci. Comput. 25(4), 1214–1248 (2003/04)

  8. El Arar, E.M., Sohier, D., de Oliveira Castro, P., Petit, E.: Bounds on non-linear errors for variance computation with stochastic rounding (2023). arXiv:2304.05177

  9. Goldberg, D.: What every computer scientist should know about floating-point arithmetic. ACM Comput. Surv. 23(1), 5–48 (1991)


  10. Hallman, E.: A refined probabilistic error bound for sums (2021). arXiv:2104.06531

  11. Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. SIAM, Philadelphia (2002)


  12. Higham, N.J., Mary, T.: A new approach to probabilistic rounding error analysis. SIAM J. Sci. Comput. 41(5), A2815–A2835 (2019)


  13. Higham, N.J., Mary, T.: Sharper probabilistic backward error analysis for basic linear algebra kernels with random data. SIAM J. Sci. Comput. 42(5), A3427–A3446 (2020)


  14. Higham, N.J., Mary, T.: Mixed precision algorithms in numerical linear algebra. Acta Numer. 31, 347–414 (2022)


  15. Higham, N.J., Pranesh, S.: Simulating low precision floating-point arithmetic. SIAM J. Sci. Comput. 41(5), C585–C602 (2019)


  16. IEEE Computer Society: IEEE Standard for Floating-Point Arithmetic, IEEE Standard 754-2008 (2019). http://ieeexplore.ieee.org/document/4610935

  17. Ipsen, I.C.F., Zhou, H.: Probabilistic error analysis for inner products. SIAM J. Matrix Anal. Appl. 41(4), 1726–1741 (2020)


  18. Jeannerod, C.P., Rump, S.M.: Improved error bounds for inner products in floating-point arithmetic. SIAM J. Matrix Anal. Appl. 34(2), 338–344 (2013)


  19. Jeannerod, C.P., Rump, S.M.: On relative errors of floating-point operations: optimal bounds and applications. Math. Comput. 87(310), 803–819 (2018)


  20. Kahan, W.: Further remarks on reducing truncation errors. Commun. ACM 8(1), 40 (1965)


  21. Kahan, W.: Implementation of algorithms (lecture notes by W. S. Haugeland and D. Hough). Tech. Rep. 20, Department of Computer Science, University of California, Berkeley, CA 94720 (1973)

  22. Knuth, D.: The Art of Computer Programming, 3rd edn. Addison-Wesley, Reading, MA (1998)


  23. Lange, M., Rump, S.: Sharp estimates for perturbation errors in summations. Math. Comput. 88(315), 349–368 (2019)


  24. Lohar, D., Prokop, M., Darulova, E.: Sound probabilistic numerical error analysis. In: Intern. Conf. Integrated Formal Methods, pp. 322–340. Springer, Cham (2019)


  25. Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomization and Probabilistic Techniques in Algorithms and Data Analysis. Cambridge University Press, Cambridge (2005)


  26. Roch, S.: Modern discrete probability: an essential toolkit. University Lecture (2015)

  27. Rump, S.M.: Error estimation of floating-point summation and dot product. BIT Numer. Math. 52(1), 201–220 (2012)



Acknowledgements

We are greatly indebted to Claude-Pierre Jeannerod for his many helpful suggestions that improved the paper, and to the two reviewers for their unusually careful and constructive reading of the paper. We also thank Johnathan Rhyne for helpful discussions.


Corresponding author

Correspondence to Eric Hallman.


This research was supported in part by grants DMS-1745654 and DMS-1760374 from the National Science Foundation, and grant DE-SC0022085 from the Department of Energy.

A Proof of Lemma 8


Define \(\beta \equiv u(1+u)^2\) and

$$\begin{aligned} \omega _k \equiv |s_k| + |x_k| + S_k, \qquad 2\le k \le n-1. \end{aligned}$$
(A.1)

By assumption, \(\beta < 1\). Lemma 7 implies

$$\begin{aligned} Z_k&= u\omega _k +(1+u)Y_k = u\omega _k + (1+u)^2C_{k-1},\qquad 3\le k\le n-1 \end{aligned}$$
(A.2)
$$\begin{aligned} C_k&= u\omega _k + uZ_k = u(1+u)\omega _k + \beta C_{k-1}, \end{aligned}$$
(A.3)

where \(Z_2\le u\omega _2\) and \(C_2\le u(1+u)\omega _2\). For \(3\le k \le n\), define the vectors

$$\begin{aligned} {\textbf {c}}_k \equiv \begin{bmatrix} C_{k-1}&\cdots&C_2\end{bmatrix}^T, \quad {\textbf {z}}_k \equiv \begin{bmatrix} Z_{k-1}&\cdots&Z_2 \end{bmatrix}^T, \quad {\textbf {w}}_k \equiv \begin{bmatrix}\omega _{k-1}&\cdots&\omega _2\end{bmatrix}^T. \end{aligned}$$

From (A.3) follows the componentwise inequality

$$\begin{aligned} {\textbf {c}}_k \le u(1+u){\textbf {w}}_k + \beta {\textbf {U}}\,{\textbf {c}}_k, \end{aligned}$$

where \({\textbf {U}}\) is the upper shift matrix with ones on the first superdiagonal. Since \({\textbf {I}}-\beta {\textbf {U}}\) is unit upper triangular and its inverse \(({\textbf {I}}-\beta {\textbf {U}})^{-1}=\sum _{j\ge 0}\beta ^j{\textbf {U}}^j\) has nonnegative entries, solving for \({\textbf {c}}_k\) gives another componentwise inequality,

$$\begin{aligned} {\textbf {c}}_k \le u(1+u)({\textbf {I}}-\beta {\textbf {U}})^{-1}{\textbf {w}}_k, \end{aligned}$$

and a bound

$$\begin{aligned} \Vert {\textbf {c}}_k\Vert _2 \le u(1+u)\Vert ({\textbf {I}}-\beta {\textbf {U}})^{-1}{\textbf {w}}_k\Vert _2 \le \tfrac{u(1+u)}{1-\beta }\Vert {\textbf {w}}_k\Vert _2. \end{aligned}$$
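As an illustration (not part of the proof), the last inequality can be checked numerically. The sketch below uses hypothetical values for the precision \(u\) and the vector length \(m\), builds the upper shift matrix \({\textbf {U}}\), solves the triangular system for the extremal case of equality in (A.3), and confirms the norm bound:

```python
import numpy as np

# Illustrative check of the norm bound above: with U the upper shift matrix
# and beta = u(1+u)^2 < 1, the componentwise solution
#   c = u(1+u) (I - beta*U)^{-1} w
# satisfies ||c||_2 <= u(1+u)/(1-beta) * ||w||_2, because
# ||(I - beta*U)^{-1}||_2 <= sum_j beta^j < 1/(1-beta).
# The precision u and dimension m are hypothetical choices for this demo.
rng = np.random.default_rng(0)
u = 2.0**-11                       # unit roundoff of IEEE half precision
beta = u * (1.0 + u)**2
m = 50                             # length of the vectors c_k, w_k
U = np.diag(np.ones(m - 1), k=1)   # upper shift matrix (ones on superdiagonal)
w = rng.random(m)                  # nonnegative weights omega_k
c = u * (1.0 + u) * np.linalg.solve(np.eye(m) - beta * U, w)
bound = u * (1.0 + u) / (1.0 - beta) * np.linalg.norm(w)
assert np.linalg.norm(c) <= bound
```

Because the inverse has nonnegative entries, the solved \({\textbf {c}}\) is itself nonnegative, which is what lets the componentwise inequality pass through the norm.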

The bound for \(\Vert {\textbf {z}}_k\Vert _2\) follows from (A.2) and the definition of \(\beta \),

$$\begin{aligned} \Vert {\textbf {z}}_k\Vert _2 \le u\Vert {\textbf {w}}_k\Vert _2 + (1+u)^2\Vert {\textbf {c}}_k\Vert _2 \le \tfrac{u(2+2u+u^2)}{1-\beta }\Vert {\textbf {w}}_k\Vert _2. \end{aligned}$$

Finally, from \(Y_k = (1+u)C_{k-1}\) follows the Frobenius norm bound

$$\begin{aligned} \left( \sum _{j=3}^k\left( Y_j^2 + C_{j-1}^2 + Z_{j-1}^2\right) \right) ^{1/2} = \left\| \begin{bmatrix}(1+u){\textbf {c}}_k&{\textbf {c}}_k&{\textbf {z}}_k \end{bmatrix}\right\| _F \le \alpha u\Vert {\textbf {w}}_k\Vert _2, \end{aligned}$$

where the higher order terms in \(\alpha \) follow from the Taylor series expansion \((1-\beta )^{-2}=1 +2u +\mathcal {O}(u^2)\),

$$\begin{aligned} \alpha ^2 = \frac{1+3(1+u)^2 + 2(1+u)^4}{(1-\beta )^2} = 6 + 26u + \mathcal {O}(u^2). \end{aligned}$$
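A quick numerical sanity check of this expansion (again illustrative only): evaluating the closed form for the unit roundoffs of IEEE single and half precision agrees with the first-order model \(6+26u\) up to an \(\mathcal {O}(u^2)\) remainder.

```python
# Illustrative check of the expansion alpha^2 = 6 + 26u + O(u^2): evaluate
# the closed form for the unit roundoffs of IEEE single and half precision
# (hypothetical choices for this demo) against the first-order model.
def alpha_squared(u):
    beta = u * (1.0 + u)**2
    return (1.0 + 3.0 * (1.0 + u)**2 + 2.0 * (1.0 + u)**4) / (1.0 - beta)**2

for u in (2.0**-24, 2.0**-11):     # single, half precision unit roundoffs
    remainder = alpha_squared(u) - (6.0 + 26.0 * u)
    # Expanding one order further gives a leading remainder term 85*u^2,
    # so a 100*u^2 envelope comfortably covers it.
    assert 0.0 < remainder < 100.0 * u**2
```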


About this article


Cite this article

Hallman, E., Ipsen, I.C.F. Precision-aware deterministic and probabilistic error bounds for floating point summation. Numer. Math. 155, 83–119 (2023). https://doi.org/10.1007/s00211-023-01370-y
