Abstract
Let \(\mathbf{u}\) denote the relative rounding error of some floating-point format. Recently it has been shown that for a number of standard Wilkinson-type bounds the typical factors \(\gamma _k:=k\mathbf{u}/(1-k\mathbf{u})\) can be improved to \(k\mathbf{u}\), and that the bounds are valid without restriction on \(k\). Problems include summation, dot products and thus matrix multiplication, residual bounds for \(LU\)- and Cholesky-decomposition, and triangular system solving by substitution. In this note we show a similar result for the product \(\prod _{i=0}^k{x_i}\) of real and/or floating-point numbers \(x_i\), for computation in any order, and for any base \(\beta \geqslant 2\). The derived error bounds are valid under a mandatory restriction on \(k\). Moreover, we prove a similar bound for Horner's polynomial evaluation scheme.
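The improved bound for products can be checked numerically. The following sketch (our own illustration, not from the paper) compares the observed relative error of a chain of \(k\) rounded multiplications in IEEE binary64 against the new bound \(k\mathbf{u}\) and the classical bound \(\gamma_k\), using exact rational arithmetic as the reference:

```python
from fractions import Fraction
import math
import random

u = 2.0 ** -53  # unit roundoff of IEEE binary64

random.seed(0)
k = 20
xs = [random.uniform(0.5, 2.0) for _ in range(k + 1)]  # k rounded multiplications

# Floating-point product, evaluated left to right.
p_hat = xs[0]
for x in xs[1:]:
    p_hat *= x

# Exact product over the rationals (binary64 values are exact rationals).
p_exact = math.prod(Fraction(x) for x in xs)

rel_err = abs(Fraction(p_hat) - p_exact) / abs(p_exact)
print(float(rel_err))            # observed relative error
print(k * u)                     # improved bound k*u
print(k * u / (1 - k * u))       # classical bound gamma_k
```

Since here \(k=20\) is far below \(\mathbf{u}^{-1/2}\approx 9.5\cdot 10^{7}\), the observed error stays below \(k\mathbf{u}\).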
Notes
In what follows, adjacent means child or parent node.
Thus, for the classical case \(\beta =2\) a contradiction to \(\{i,j\}\subseteq I\) is already obtained.
In [1] long sequences \(x_i\in \mathbb {F}\) with \({\text {fl}}((\ldots ({\text {fl}}(x_0x_1)x_2)\ldots )x_k) = x_0\) are constructed for some precisions.
References
Graillat, S., Lefèvre, V., Muller, J.-M.: On the maximum relative error when computing integer powers by iterated multiplications in floating-point arithmetic. Numer. Algorithms (2015, to appear). doi:10.1007/s11075-015-9967-8
Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. SIAM Publications, Philadelphia (2002)
ANSI/IEEE 754-2008: IEEE Standard for Floating-Point Arithmetic. IEEE, New York (2008)
Jeannerod, C.-P., Rump, S.M.: Improved error bounds for inner products in floating-point arithmetic. SIAM J. Matrix Anal. Appl. 34(2), 338–344 (2013)
Jeannerod, C.-P., Rump, S.M.: On relative errors of floating-point operations: optimal bounds and applications, January 2014 (preprint)
Knuth, D.E.: The Art of Computer Programming. Seminumerical Algorithms, vol. 2, 3rd edn. Addison-Wesley, Reading (1998)
MATLAB: User’s Guide, Version 2013b, the MathWorks Inc. (2013)
Muller, J.-M.: On the maximum relative error when computing iterated integer powers in floating-point arithmetic. In: INVA Conference, Tokyo (2014)
Rump, S.M., Jeannerod, C.-P.: Improved error bounds for LU and Cholesky factorizations. SIAM J. Matrix Anal. Appl. 35(2), 699–724 (2014)
Rump, S.M., Ogita, T., Oishi, S.: Accurate floating-point summation, part I: faithful rounding. SIAM J. Sci. Comput. 31(1), 189–224 (2008)
Acknowledgments
The authors wish to thank Marko Lange, Vincent Lefèvre, and Jean-Michel Muller for their fruitful comments on a preliminary version of this note. Moreover, our thanks to two anonymous referees for their thoughtful suggestions.
Communicated by Axel Ruhe.
Appendices
Appendix
The goal of this appendix is to prove that for \(\beta =2\) and \(p\geqslant 4\) the constraint \(k < \mathbf{u}^{-1/2}\) in Theorem 1.2 cannot be replaced by \(k < 12\mathbf{u}^{-1/2}\). To do that we construct \(x_0,x_1,x_2\in \mathbb {F}\) for given precision \(p\) such that \(x_1x_2 < 1\) and \({\text {fl}}({\text {fl}}(x_0x_1)x_2) = x_0\). Subsequent multiplications by \(x_1x_2\) produce an exponential growth of the rounding error, eventually exceeding \(k\mathbf{u}\).
Define \(s:=\lfloor \mathbf{u}^{-1/2}\rfloor \in \mathbb {N}\), so that \(s=\mathbf{u}^{-1/2}-\delta \) with \(0\leqslant \delta < 1\). We henceforth assume \(p \geqslant 15\) and treat the case \(p\leqslant 14\) later. Note that \(\beta =2\) and \(p \geqslant 15\) imply \(s\geqslant 181\). We distinguish two cases.
First, assume \(s\) is odd. Set
so that \(x_i\in \mathbb {F}\). Then, \(x_0x_1=1+(s+10)\mathbf{u}+\mu _1\mathbf{u}\) with \(\mu _1:=4\delta \sqrt{\mathbf{u}}+(32-2\delta ^2)\mathbf{u}\), so that \(0<\mu _1<1\) and \(s\) odd imply \({\text {fl}}(x_0x_1)=1+(s+11)\mathbf{u}\). Moreover, \({\text {fl}}(x_0x_1)x_2 = 1+(2s+7)\mathbf{u}+\mu _2\mathbf{u}\) with
Now \(\Phi <0\) for any value of \(\delta \), so that \(0<4\sqrt{\mathbf{u}}-60\mathbf{u}\leqslant \mu _2\leqslant 6\sqrt{\mathbf{u}}-55\mathbf{u}<1\). Thus,
Define a vector \(X:=[x_0 \; x \; x \ldots x]\in \mathbb {F}^{2m+1}\) obtained by repeating the row vector \(x = [x_1 \, x_2] \in \mathbb {F}^2\) \(m\) times. Denoting \({\widehat{r}}_0:=x_0\) and \({\widehat{r}}_i:={\text {fl}}({\widehat{r}}_{i-1} X_i)\) for \(i\geqslant 1\) yields \({\widehat{r}}_2=x_0\). Then, abbreviating \(\pi :=x_1x_2\) and using \({\widehat{r}}_{2m}=x_0\) (by induction from \({\widehat{r}}_2=x_0\), since the pair \(x_1,x_2\) repeats) gives
Now,
and for \(m\in \mathbb {N}\),
The assumption \(p\geqslant 15\) implies
and therefore
Combining this with (4.2) shows that the error bound in (1.6) is not satisfied for \(k=2\left\lceil 6\mathbf{u}^{-1/2}-1\right\rceil < 12 \mathbf{u}^{-1/2}\), and that finishes the first part.
Second, assume \(s\) is even and define as before
Then, \(y_i\in \mathbb {F}\). Furthermore, \(y_0y_1=1+(s+7)\mathbf{u}+\mu _1\mathbf{u}\) with \(\mu _1:=4\delta \sqrt{\mathbf{u}}+(18-2\delta ^2)\mathbf{u}\), so that \(0<\mu _1<1\) and \(s\) even imply \({\text {fl}}(y_0y_1)=1+(s+8)\mathbf{u}\). Moreover, \({\text {fl}}(y_0y_1)y_2 = 1+(2s+5)\mathbf{u}+\mu _2\mathbf{u}\) with
As before, \(\Phi <0\) for any value of \(\delta \). Thus, \(0<2\sqrt{\mathbf{u}}-35\mathbf{u}\leqslant \mu _2\leqslant 4\sqrt{\mathbf{u}}-32\mathbf{u}<1\). Hence, similar to (4.1), \({\text {fl}}({\text {fl}}(y_0y_1)y_2) = y_0\) is again true. Now for the values \(y_1, y_2\) in (4.4) we obtain
and the result follows as before. Finally, for the cases \(4\leqslant p\leqslant 14\), consider
p | \(m_0\) | \(m_1\) | \(m_2\) | \(F\)
---|---|---|---|---
4 | 2 | \(-4\) | 4 | 9.6
5 | 20 | \(-3\) | 2 | 8.9
6 | 32 | \(-14\) | 16 | 5.8
7 | 28 | \(-9\) | 8 | 6.8
8 | 52 | \(-39\) | 44 | 5.8
9 | 48 | \(-21\) | 20 | 4.6
10 | 140 | \(-117\) | 130 | 5.2
11 | 94 | \(-43\) | 42 | 5.8
12 | 186 | \(-154\) | 158 | 4.0
13 | 184 | \(-89\) | 88 | 4.1
14 | 262 | \(-125\) | 124 | 7.2
For precision \(p\) define \(x_i:=1+m_i\mathbf{u}\). Then, (4.1) is satisfied, and the error bound in (1.6) is not true for \(k<F\mathbf{u}^{-1/2}\). This finishes the proof. \(\square \)
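The table entries can be verified mechanically. The following sketch (our own check, with a hypothetical helper `fl` implementing round-to-nearest, ties-to-even, in an assumed precision-\(p\) binary format) confirms that the \(x_i:=1+m_i\mathbf{u}\) are representable and that \({\text {fl}}({\text {fl}}(x_0x_1)x_2) = x_0\) holds for every row:

```python
from fractions import Fraction

def fl(v, p):
    """Round a positive rational v to the nearest binary floating-point
    number with a p-bit significand, ties to even (no over/underflow)."""
    e = 0
    while v >= 2:
        v /= 2
        e += 1
    while v < 1:
        v *= 2
        e -= 1
    # Python's round() on a Fraction rounds half to even.
    m = round(v * 2 ** (p - 1))
    return Fraction(m, 2 ** (p - 1)) * Fraction(2) ** e

# Rows (p, m0, m1, m2) from the table; x_i := 1 + m_i*u with u = 2^-p.
TABLE = [(4, 2, -4, 4), (5, 20, -3, 2), (6, 32, -14, 16), (7, 28, -9, 8),
         (8, 52, -39, 44), (9, 48, -21, 20), (10, 140, -117, 130),
         (11, 94, -43, 42), (12, 186, -154, 158), (13, 184, -89, 88),
         (14, 262, -125, 124)]

for p, m0, m1, m2 in TABLE:
    u = Fraction(1, 2 ** p)
    x0, x1, x2 = 1 + m0 * u, 1 + m1 * u, 1 + m2 * u
    assert all(fl(x, p) == x for x in (x0, x1, x2))  # x_i representable
    assert fl(fl(x0 * x1, p) * x2, p) == x0          # (4.1) holds
print("all rows verified")
```

For instance, for \(p=4\): \(x_0x_1 = 1.125\cdot 0.75 = 0.84375\) is a tie, rounded to even gives \(0.875\), and \(0.875\cdot 1.25 = 1.09375\) rounds back to \(1.125 = x_0\).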
We finally mention that it is easy to see that, if \(1\leqslant p\leqslant 2\), then the error bound in (1.6) is satisfied for all \(k\in \mathbb {N}\), and if \(p=3\), then the minimum value of \(k\) for which it is not satisfied is \(k=72\approx 25\mathbf{u}^{-1/2}\).
Summary
In previous papers, the factor \(\gamma _k\) has been replaced by \(k\mathbf{u}\) in a number of classical error estimates in numerical analysis, together with removing the restriction on \(k\). We proved that \(k\mathbf{u}\) can also be used for general products and for Horner's scheme, however with a mandatory restriction on \(k\) of the order \(k\lesssim \mathbf{u}^{-1/2}\).
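For Horner's scheme, the corresponding bound can be illustrated as follows (an illustrative check of our own, not the paper's code): a degree-\(n\) evaluation uses \(2n\) rounded operations, and the absolute error is compared against \(2n\mathbf{u}\sum _i |a_i||t|^i\), using exact rational arithmetic as reference:

```python
from fractions import Fraction
import random

u = 2.0 ** -53  # unit roundoff of IEEE binary64

def horner(coeffs, t):
    """Evaluate a_n*t^n + ... + a_0 by Horner's scheme (coeffs high to low)."""
    r = coeffs[0]
    for a in coeffs[1:]:
        r = r * t + a
    return r

random.seed(1)
n = 30                                          # degree: 2n rounded operations
coeffs = [random.uniform(-1.0, 1.0) for _ in range(n + 1)]
t = 0.7

# Exact evaluation over the rationals, and the condition sum of |a_i||t|^i.
p_exact = horner([Fraction(a) for a in coeffs], Fraction(t))
cond = horner([abs(Fraction(a)) for a in coeffs], Fraction(abs(t)))

err = abs(Fraction(horner(coeffs, t)) - p_exact)
print(float(err))                               # observed absolute error
print(float(2 * n * Fraction(u) * cond))        # bound 2nu * sum |a_i||t|^i
```

Here \(2n = 60\) is far below \(\mathbf{u}^{-1/2}\), so the observed error stays well within the bound.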
Cite this article
Rump, S.M., Bünger, F. & Jeannerod, CP. Improved error bounds for floating-point products and Horner’s scheme. Bit Numer Math 56, 293–307 (2016). https://doi.org/10.1007/s10543-015-0555-z