
Improved error bounds for floating-point products and Horner’s scheme

BIT Numerical Mathematics

Abstract

Let \(\mathbf{u}\) denote the relative rounding error of some floating-point format. Recently it has been shown that for a number of standard Wilkinson-type bounds the typical factors \(\gamma _k:=k\mathbf{u}/(1-k\mathbf{u})\) can be improved to \(k\mathbf{u}\), and that the bounds are valid without restriction on \(k\). The problems covered include summation, dot products and thus matrix multiplication, residual bounds for \(LU\)- and Cholesky-decomposition, and triangular system solving by substitution. In this note we show a similar result for the product \(\prod _{i=0}^k{x_i}\) of real and/or floating-point numbers \(x_i\), for computation in any order, and for any base \(\beta \geqslant 2\). The derived error bounds are valid under a mandatory restriction on \(k\). Moreover, we prove a similar bound for Horner’s polynomial evaluation scheme.

Fig. 1

Notes

  1. In what follows, adjacent means child or parent node.

  2. Thus, for the classical case \(\beta =2\) a contradiction to \(\{i,j\}\subseteq I\) is already obtained.

  3. In fact, (2.29) is applied to \(|x_0|, |x_1|\) because the proof of Theorem 1.2 assumes positive factors.

  4. In [1] long sequences \(x_i\in \mathbb {F}\) with \({\text {fl}}((\ldots ({\text {fl}}(x_0x_1)x_2)\ldots )x_k) = x_0\) are constructed for some precisions.

References

  1. Graillat, S., Lefèvre, V., Muller, J.-M.: On the maximum relative error when computing integer powers by iterated multiplications in floating-point arithmetic. Numer. Algorithms (2015, to appear). doi:10.1007/s11075-015-9967-8

  2. Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. SIAM Publications, Philadelphia (2002)

  3. ANSI/IEEE 754-2008: IEEE Standard for Floating-Point Arithmetic. IEEE, New York (2008)

  4. Jeannerod, C.-P., Rump, S.M.: Improved error bounds for inner products in floating-point arithmetic. SIAM J. Matrix Anal. Appl. 34(2), 338–344 (2013)

  5. Jeannerod, C.-P., Rump, S.M.: On relative errors of floating-point operations: optimal bounds and applications, January 2014 (preprint)

  6. Knuth, D.E.: The Art of Computer Programming. Seminumerical Algorithms, vol. 2, 3rd edn. Addison-Wesley, Reading (1998)

  7. MATLAB: User’s Guide, Version 2013b, the MathWorks Inc. (2013)

  8. Muller, J.-M.: On the maximum relative error when computing iterated integer powers in floating-point arithmetic. In: INVA Conference, Tokyo (2014)

  9. Rump, S.M., Jeannerod, C.-P.: Improved error bounds for LU and Cholesky factorizations. SIAM J. Matrix Anal. Appl. 35(2), 699–724 (2014)

  10. Rump, S.M., Ogita, T., Oishi, S.: Accurate floating-point summation, part I: faithful rounding. SIAM J. Sci. Comput. 31(1), 189–224 (2008)

Acknowledgments

The authors wish to thank Marko Lange, Vincent Lefèvre, and Jean-Michel Muller for their fruitful comments on a preliminary version of this note. Moreover, our thanks to two anonymous referees for their thoughtful suggestions.

Author information

Correspondence to Siegfried M. Rump.

Additional information

Communicated by Axel Ruhe.

Appendix

The goal of this appendix is to prove that for \(\beta =2\) and \(p\geqslant 4\) the constraint \(k < \mathbf{u}^{-1/2}\) in Theorem 1.2 cannot be replaced by \(k < 12\mathbf{u}^{-1/2}\). To do that (see Note 4) we construct \(x_0,x_1,x_2\in \mathbb {F}\) for given precision \(p\) such that \(x_1x_2 < 1\) and \({\text {fl}}({\text {fl}}(x_0x_1)x_2) = x_0\). Subsequent multiplications by \(x_1x_2\) produce an exponential growth of the rounding error, eventually exceeding \(k\mathbf{u}\).

Define \(s:=\lfloor \mathbf{u}^{-1/2}\rfloor \in \mathbb {N}\), so that \(s=\mathbf{u}^{-1/2}-\delta \) with \(0\leqslant \delta < 1\). We henceforth assume \(p \geqslant 15\) and treat the case \(p\leqslant 14\) later. Note that \(\beta =2\) and \(p \geqslant 15\) imply \(s\geqslant 181\). We distinguish two cases.

First, assume \(s\) is odd. Set

$$\begin{aligned} x_0:=1+(2s+8)\mathbf{u},\quad x_1:=1-(s-4)\mathbf{u}, \quad \hbox {and}\quad x_2:=1+(s-5)\mathbf{u}, \end{aligned}$$

so that \(x_i\in \mathbb {F}\). Then, \(x_0x_1=1+(s+10)\mathbf{u}+\mu _1\mathbf{u}\) with \(\mu _1:=4\delta \sqrt{\mathbf{u}}+(32-2\delta ^2)\mathbf{u}\), so that \(0<\mu _1<1\) and \(s\) odd imply \({\text {fl}}(x_0x_1)=1+(s+11)\mathbf{u}\). Moreover, \({\text {fl}}(x_0x_1)x_2 = 1+(2s+7)\mathbf{u}+\mu _2\mathbf{u}\) with

$$\begin{aligned} \mu _2 := \sqrt{\mathbf{u}}\left( 6-55\sqrt{\mathbf{u}}+\Phi \delta \right) \quad \hbox {with }\Phi :=(\delta -6)\sqrt{\mathbf{u}}-2. \end{aligned}$$

Now \(\Phi <0\) for any value of \(\delta \), so that \(0<4\sqrt{\mathbf{u}}-60\mathbf{u}\leqslant \mu _2\leqslant 6\sqrt{\mathbf{u}}-55\mathbf{u}<1\). Thus,

$$\begin{aligned} {\text {fl}}({\text {fl}}(x_0x_1)x_2) = x_0. \end{aligned}$$
(4.1)
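This fixed point can be checked directly in IEEE 754 binary64 arithmetic (\(\beta =2\), \(p=53\), \(\mathbf{u}=2^{-53}\)), for which \(s=\lfloor \mathbf{u}^{-1/2}\rfloor =94906265\) is odd. A minimal check in Python (an illustration, not part of the proof):

```python
import math

u = 2.0 ** -53              # unit roundoff of binary64
s = math.isqrt(2 ** 53)     # s = floor(u^(-1/2)) = 94906265, odd here

x0 = 1 + (2 * s + 8) * u    # all three numbers are exactly representable
x1 = 1 - (s - 4) * u        # in binary64, so these right-hand sides
x2 = 1 + (s - 5) * u        # commit no rounding error

# one cycle of multiplications returns exactly to x0
assert (x0 * x1) * x2 == x0
```

Here each `*` is a correctly rounded binary64 multiplication, so the assertion is exactly the statement \({\text {fl}}({\text {fl}}(x_0x_1)x_2) = x_0\) of (4.1).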

Define a vector \(X:=[x_0 \; x \; x \ldots x]\in \mathbb {F}^{2m+1}\), where the row vector \(x = [x_1 \, x_2] \in \mathbb {F}^2\) is repeated \(m\) times. Denoting \({\widehat{r}}_0:=x_0\) and \({\widehat{r}}_i:={\text {fl}}({\widehat{r}}_{i-1} X_i)\) for \(i\geqslant 1\), (4.1) yields \({\widehat{r}}_2=x_0\) and, by induction, \({\widehat{r}}_{2m}=x_0\). Then, abbreviating \(\pi :=x_1x_2\) gives

$$\begin{aligned} {\widehat{r}}_{2m}-\prod _{i=0}^{2m}{X_i} = x_0 - x_0\pi ^m = (\pi ^{-m}-1)\prod _{i=0}^{2m}{X_i}\quad \hbox {for }1\leqslant m\in \mathbb {N}. \end{aligned}$$
(4.2)
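The stagnation \({\widehat{r}}_{2m}=x_0\) behind (4.2) can be watched directly in binary64. The sketch below (an illustration only) tracks the exact product with rational arithmetic; the relative error equals \(\pi ^{-m}-1\) and already grows roughly like \(2m\mathbf{u}\) for modest \(m\):

```python
import math
from fractions import Fraction

u = 2.0 ** -53
s = math.isqrt(2 ** 53)              # floor(u^(-1/2)), odd for binary64
x0 = 1 + (2 * s + 8) * u
x1 = 1 - (s - 4) * u
x2 = 1 + (s - 5) * u

r = x0                               # computed product r_hat
exact = Fraction(x0)                 # exact product x0 * pi^m
m = 100
for _ in range(m):
    r = (r * x1) * x2                # stays put: fl(fl(r*x1)*x2) == r
    exact *= Fraction(x1) * Fraction(x2)

assert r == x0                            # r_hat_{2m} = x0, as in (4.2)
rel_err = (Fraction(r) - exact) / exact   # equals pi^(-m) - 1 exactly
print(float(rel_err / Fraction(u)))       # just below 2*m = 200
```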

Now,

$$\begin{aligned} \pi = 1 - \left( 2 - (9+2\delta )\sqrt{\mathbf{u}}\right) \mathbf{u}- (20+9\delta +\delta ^2)\mathbf{u}^2 < 1 - \left( 2 - 11\sqrt{\mathbf{u}}\right) \mathbf{u}=: 1-\gamma \mathbf{u}, \end{aligned}$$

and for \(m\in \mathbb {N}\),

$$\begin{aligned} \pi ^{-m} > 1+m\gamma \mathbf{u}+ \frac{m(m-1)}{2}\gamma ^2\mathbf{u}^2 = 1+2m\mathbf{u}+\frac{m\mathbf{u}\sqrt{\mathbf{u}}}{2}\left[ (m-1)\gamma ^2\sqrt{\mathbf{u}}-22\right] . \end{aligned}$$

The assumption \(p\geqslant 15\) implies

$$\begin{aligned} (6-2\sqrt{\mathbf{u}})\gamma ^2-22 = 2 - 272 \sqrt{\mathbf{u}} + (814-242\sqrt{\mathbf{u}})\mathbf{u}> 2 - 272 \sqrt{\mathbf{u}} > 0, \end{aligned}$$

and therefore

$$\begin{aligned} m \geqslant 6 \mathbf{u}^{-1/2} - 1 \quad \Rightarrow \quad \pi ^{-m} > 1+2m\mathbf{u}. \end{aligned}$$
(4.3)

Combining this with (4.2) shows that the error bound in (1.6) is not satisfied for \(k=2\left\lceil 6\mathbf{u}^{-1/2}-1\right\rceil < 12 \mathbf{u}^{-1/2}\), and that finishes the first part.
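Implication (4.3) can be confirmed numerically for binary64 (\(p=53\)). The key point in the sketch below (an illustration only) is to form \(\pi -1\) exactly as a rational number: computing \(x_1x_2\) in floating point would round away precisely the small terms of order \(\mathbf{u}\sqrt{\mathbf{u}}\) that drive the growth.

```python
import math
from fractions import Fraction

u = 2.0 ** -53
s = math.isqrt(2 ** 53)               # floor(u^(-1/2)), odd for binary64
x1 = 1 - (s - 4) * u
x2 = 1 + (s - 5) * u

# pi - 1 is about -2u; form it exactly, then use log1p/expm1 so that
# no accuracy is lost near 1
pi_minus_1 = Fraction(x1) * Fraction(x2) - 1
log_pi = math.log1p(float(pi_minus_1))

m = math.ceil(6 / math.sqrt(u) - 1)   # smallest m covered by (4.3)
lhs = math.expm1(-m * log_pi)         # pi^(-m) - 1
assert lhs > 2 * m * u                # so the bound k*u with k = 2m fails
```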

Second, assume \(s\) is even and define as before

$$\begin{aligned} y_0:=1+(2s+6)\mathbf{u},\quad y_1:=1-(s-3)\mathbf{u}, \quad \hbox {and}\quad y_2:=1+(s-4)\mathbf{u}. \end{aligned}$$
(4.4)

Then, \(y_i\in \mathbb {F}\). Furthermore, \(y_0y_1=1+(s+7)\mathbf{u}+\mu _1\mathbf{u}\) with \(\mu _1:=4\delta \sqrt{\mathbf{u}}+(18-2\delta ^2)\mathbf{u}\), so that \(0<\mu _1<1\) and \(s\) even imply \({\text {fl}}(y_0y_1)=1+(s+8)\mathbf{u}\). Moreover, \({\text {fl}}(y_0y_1)y_2 = 1+(2s+5)\mathbf{u}+\mu _2\mathbf{u}\) with

$$\begin{aligned} \mu _2 := \sqrt{\mathbf{u}}\left( 4-32\sqrt{\mathbf{u}}+\Phi \delta \right) \quad \hbox {with }\Phi :=(\delta -4)\sqrt{\mathbf{u}}-2. \end{aligned}$$

As before, \(\Phi <0\) for any value of \(\delta \). Thus, \(0<2\sqrt{\mathbf{u}}-35\mathbf{u}\leqslant \mu _2\leqslant 4\sqrt{\mathbf{u}}-32\mathbf{u}<1\). Hence, similar to (4.1), \({\text {fl}}({\text {fl}}(y_0y_1)y_2) = y_0\) is again true. Now for the values \(y_1, y_2\) in (4.4) we obtain

$$\begin{aligned} y_1y_2 = (1-(s-3)\mathbf{u})(1+(s-4)\mathbf{u}) < x_1x_2, \end{aligned}$$

and the result follows as before. Finally, for the cases \(4\leqslant p\leqslant 14\), consider

\(p\) | \(m_0\) | \(m_1\) | \(m_2\) | \(F\)
----|--------|--------|--------|------
4 | 2 | -4 | 4 | 9.6
5 | 20 | -3 | 2 | 8.9
6 | 32 | -14 | 16 | 5.8
7 | 28 | -9 | 8 | 6.8
8 | 52 | -39 | 44 | 5.8
9 | 48 | -21 | 20 | 4.6
10 | 140 | -117 | 130 | 5.2
11 | 94 | -43 | 42 | 5.8
12 | 186 | -154 | 158 | 4.0
13 | 184 | -89 | 88 | 4.1
14 | 262 | -125 | 124 | 7.2
For precision \(p\) define \(x_i:=1+m_i\mathbf{u}\). Then (4.1) is satisfied, and the error bound in (1.6) fails for some \(k<F\mathbf{u}^{-1/2}\). This finishes the proof. \(\square \)
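Both the table entries and the even-case construction (4.4) can be verified mechanically by simulating a correctly rounded \(p\)-bit binary arithmetic over exact rationals. A sketch (an illustration only; round to nearest, ties to even is assumed throughout):

```python
from fractions import Fraction

def fl(x, p):
    """Round a positive rational x to the nearest p-bit binary float,
    ties to even."""
    e = 0
    while x >= 2:
        x /= 2; e += 1
    while x < 1:
        x *= 2; e -= 1
    m = x * 2 ** (p - 1)                  # significand in [2^(p-1), 2^p)
    lo, rem = int(m), m - int(m)
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and lo % 2 == 1):
        lo += 1
    return Fraction(lo, 2 ** (p - 1)) * Fraction(2) ** e

# rows (p, m0, m1, m2) of the table above
rows = [(4, 2, -4, 4), (5, 20, -3, 2), (6, 32, -14, 16), (7, 28, -9, 8),
        (8, 52, -39, 44), (9, 48, -21, 20), (10, 140, -117, 130),
        (11, 94, -43, 42), (12, 186, -154, 158), (13, 184, -89, 88),
        (14, 262, -125, 124)]
for p, m0, m1, m2 in rows:
    u = Fraction(1, 2 ** p)
    x0, x1, x2 = 1 + m0 * u, 1 + m1 * u, 1 + m2 * u
    assert all(fl(x, p) == x for x in (x0, x1, x2))   # representable
    assert fl(fl(x0 * x1, p) * x2, p) == x0           # fixed point (4.1)
    assert x1 * x2 < 1                                # true product shrinks

# the even case (4.4) at p = 16, where s = 2^8 = 256 is even
p, s = 16, 256
u = Fraction(1, 2 ** p)
y0, y1, y2 = 1 + (2*s + 6) * u, 1 - (s - 3) * u, 1 + (s - 4) * u
assert fl(fl(y0 * y1, p) * y2, p) == y0
```

Note that several of the intermediate products above are exact ties, so the ties-to-even rule of IEEE 754 is essential to reproduce the table.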

We finally mention that it is easy to see that, if \(1\leqslant p\leqslant 2\), then the error bound in (1.6) is satisfied for all \(k\in \mathbb {N}\), and if \(p=3\), then the minimum value of \(k\) for which it is not satisfied is \(k=72\approx 25\mathbf{u}^{-1/2}\).

Summary

In previous papers, the factor \(\gamma _k\) has been replaced by \(k\mathbf{u}\) in a number of classical error estimates in numerical analysis, together with removing the restriction on \(k\). We proved that \(k\mathbf{u}\) can be used for general products and for the Horner scheme as well, albeit with a mandatory restriction on \(k\) of the order \(k\lesssim \mathbf{u}^{-1/2}\).

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Rump, S.M., Bünger, F. & Jeannerod, C.-P. Improved error bounds for floating-point products and Horner’s scheme. BIT Numer Math 56, 293–307 (2016). https://doi.org/10.1007/s10543-015-0555-z
