Abstract
Let \(\mathbf{u}\) denote the relative rounding error of some floating-point format. Recently it has been shown that for a number of standard Wilkinson-type bounds the typical factors \(\gamma _k:=k\mathbf{u}/(1-k\mathbf{u})\) can be improved to \(k\mathbf{u}\), and that the bounds are valid without restriction on \(k\). Problems include summation, dot products and thus matrix multiplication, residual bounds for \(LU\)- and Cholesky-decomposition, and triangular system solving by substitution. In this note we show a similar result for the product \(\prod _{i=0}^k{x_i}\) of real and/or floating-point numbers \(x_i\), for computation in any order, and for any base \(\beta \geqslant 2\). The derived error bounds are valid under a mandatory restriction on \(k\). Moreover, we prove a similar bound for Horner's polynomial evaluation scheme.
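The improved bound for products can be checked numerically. The following sketch (our own illustration, not from the paper) compares the observed relative error of a chain of \(k\) rounded multiplications in IEEE binary64 against the new bound \(k\mathbf{u}\) and the classical bound \(\gamma_k\), using exact rational arithmetic as the reference:

```python
from fractions import Fraction
import math
import random

u = 2.0 ** -53  # unit roundoff of IEEE binary64

random.seed(0)
k = 20
xs = [random.uniform(0.5, 2.0) for _ in range(k + 1)]  # k rounded multiplications

# Floating-point product, evaluated left to right.
p_hat = xs[0]
for x in xs[1:]:
    p_hat *= x

# Exact product over the rationals (binary64 values are exact rationals).
p_exact = math.prod(Fraction(x) for x in xs)

rel_err = abs(Fraction(p_hat) - p_exact) / abs(p_exact)
print(float(rel_err))            # observed relative error
print(k * u)                     # improved bound k*u
print(k * u / (1 - k * u))       # classical bound gamma_k
```

Since here \(k=20\) is far below \(\mathbf{u}^{-1/2}\approx 9.5\cdot 10^{7}\), the observed error stays below \(k\mathbf{u}\).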
Notes
In what follows, adjacent means child or parent node.
Thus, for the classical case \(\beta =2\) a contradiction to \(\{i,j\}\subseteq I\) is already obtained.
In [1] long sequences \(x_i\in \mathbb {F}\) with \({\text {fl}}((\ldots ({\text {fl}}(x_0x_1)x_2)\ldots )x_k) = x_0\) are constructed for some precisions.
References
Graillat, S., Lefèvre, V., Muller, J.-M.: On the maximum relative error when computing integer powers by iterated multiplications in floating-point arithmetic. Numer. Algorithms (2015, to appear). doi:10.1007/s11075-015-9967-8
Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. SIAM Publications, Philadelphia (2002)
ANSI/IEEE 754-2008: IEEE Standard for Floating-Point Arithmetic. IEEE, New York (2008)
Jeannerod, C.-P., Rump, S.M.: Improved error bounds for inner products in floating-point arithmetic. SIAM J. Matrix Anal. Appl. 34(2), 338–344 (2013)
Jeannerod, C.-P., Rump, S.M.: On relative errors of floating-point operations: optimal bounds and applications, January 2014 (preprint)
Knuth, D.E.: The Art of Computer Programming. Seminumerical Algorithms, vol. 2, 3rd edn. Addison-Wesley, Reading (1998)
MATLAB: User’s Guide, Version 2013b, the MathWorks Inc. (2013)
Muller, J.-M.: On the maximum relative error when computing iterated integer powers in floating-point arithmetic. In: INVA Conference, Tokyo (2014)
Rump, S.M., Jeannerod, C.-P.: Improved error bounds for LU and Cholesky factorizations. SIAM J. Matrix Anal. Appl. 35(2), 699–724 (2014)
Rump, S.M., Ogita, T., Oishi, S.: Accurate floating-point summation, part I: faithful rounding. SIAM J. Sci. Comput. 31(1), 189–224 (2008)
Acknowledgments
The authors wish to thank Marko Lange, Vincent Lefèvre, and Jean-Michel Muller for their fruitful comments on a preliminary version of this note. Moreover, our thanks to two anonymous referees for their thoughtful suggestions.
Communicated by Axel Ruhe.
Appendices
Appendix
The goal of this appendix is to prove that for \(\beta =2\) and \(p\geqslant 4\) the constraint \(k < \mathbf{u}^{-1/2}\) in Theorem 1.2 cannot be replaced by \(k < 12\mathbf{u}^{-1/2}\). To do that we construct \(x_0,x_1,x_2\in \mathbb {F}\) for given precision \(p\) such that \(x_1x_2 < 1\) and \({\text {fl}}({\text {fl}}(x_0x_1)x_2) = x_0\). Subsequent multiplications by \(x_1x_2\) produce an exponential growth of the rounding error, eventually exceeding \(k\mathbf{u}\).
Define \(s:=\lfloor \mathbf{u}^{-1/2}\rfloor \in \mathbb {N}\), so that \(s=\mathbf{u}^{-1/2}-\delta \) with \(0\leqslant \delta < 1\). We henceforth assume \(p \geqslant 15\) and treat the case \(p\leqslant 14\) later. Note that \(\beta =2\) and \(p \geqslant 15\) imply \(s\geqslant 181\). We distinguish two cases.
First, assume \(s\) is odd. Set
so that \(x_i\in \mathbb {F}\). Then, \(x_0x_1=1+(s+10)\mathbf{u}+\mu _1\mathbf{u}\) with \(\mu _1:=4\delta \sqrt{\mathbf{u}}+(32-2\delta ^2)\mathbf{u}\), so that \(0<\mu _1<1\) and \(s\) odd imply \({\text {fl}}(x_0x_1)=1+(s+11)\mathbf{u}\). Moreover, \({\text {fl}}(x_0x_1)x_2 = 1+(2s+7)\mathbf{u}+\mu _2\mathbf{u}\) with
Now \(\Phi <0\) for any value of \(\delta \), so that \(0<4\sqrt{\mathbf{u}}-60\mathbf{u}\leqslant \mu _2\leqslant 6\sqrt{\mathbf{u}}-55\mathbf{u}<1\). Thus,
Define a vector \(X:=[x_0 \; x \; x \ldots x]\in \mathbb {F}^{2m+1}\) obtained by repeating the row vector \(x = [x_1 \, x_2] \in \mathbb {F}^2\) \(m\) times. Denoting \({\widehat{r}}_0:=x_0\) and \({\widehat{r}}_i:={\text {fl}}({\widehat{r}}_{i-1} X_i)\) for \(i\geqslant 1\) yields \({\widehat{r}}_2=x_0\). Then, abbreviating \(\pi :=x_1x_2\) and using \({\widehat{r}}_{2m}=x_0\) (by induction from \({\widehat{r}}_2=x_0\), since the pair \(x_1,x_2\) repeats) gives
Now,
and for \(m\in \mathbb {N}\),
The assumption \(p\geqslant 15\) implies
and therefore
Combining this with (4.2) shows that the error bound in (1.6) is not satisfied for \(k=2\left\lceil 6\mathbf{u}^{-1/2}-1\right\rceil < 12 \mathbf{u}^{-1/2}\), and that finishes the first part.
Second, assume \(s\) is even and define as before
Then, \(y_i\in \mathbb {F}\). Furthermore, \(y_0y_1=1+(s+7)\mathbf{u}+\mu _1\mathbf{u}\) with \(\mu _1:=4\delta \sqrt{\mathbf{u}}+(18-2\delta ^2)\mathbf{u}\), so that \(0<\mu _1<1\) and \(s\) even imply \({\text {fl}}(y_0y_1)=1+(s+8)\mathbf{u}\). Moreover, \({\text {fl}}(y_0y_1)y_2 = 1+(2s+5)\mathbf{u}+\mu _2\mathbf{u}\) with
As before, \(\Phi <0\) for any value of \(\delta \). Thus, \(0<2\sqrt{\mathbf{u}}-35\mathbf{u}\leqslant \mu _2\leqslant 4\sqrt{\mathbf{u}}-32\mathbf{u}<1\). Hence, similar to (4.1), \({\text {fl}}({\text {fl}}(y_0y_1)y_2) = y_0\) is again true. Now for the values \(y_1, y_2\) in (4.4) we obtain
and the result follows as before. Finally, for the cases \(4\leqslant p\leqslant 14\), consider
p | \(m_0\) | \(m_1\) | \(m_2\) | \(F\)
---|---|---|---|---
4 | 2 | \(-4\) | 4 | 9.6
5 | 20 | \(-3\) | 2 | 8.9
6 | 32 | \(-14\) | 16 | 5.8
7 | 28 | \(-9\) | 8 | 6.8
8 | 52 | \(-39\) | 44 | 5.8
9 | 48 | \(-21\) | 20 | 4.6
10 | 140 | \(-117\) | 130 | 5.2
11 | 94 | \(-43\) | 42 | 5.8
12 | 186 | \(-154\) | 158 | 4.0
13 | 184 | \(-89\) | 88 | 4.1
14 | 262 | \(-125\) | 124 | 7.2
For precision \(p\) define \(x_i:=1+m_i\mathbf{u}\). Then, (4.1) is satisfied, and the error bound in (1.6) is not true for \(k<F\mathbf{u}^{-1/2}\). This finishes the proof. \(\square \)
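The table entries can be verified mechanically. The following sketch (our own check, with a hypothetical helper `fl` implementing round-to-nearest, ties-to-even, in an assumed precision-\(p\) binary format) confirms that the \(x_i:=1+m_i\mathbf{u}\) are representable and that \({\text {fl}}({\text {fl}}(x_0x_1)x_2) = x_0\) holds for every row:

```python
from fractions import Fraction

def fl(v, p):
    """Round a positive rational v to the nearest binary floating-point
    number with a p-bit significand, ties to even (no over/underflow)."""
    e = 0
    while v >= 2:
        v /= 2
        e += 1
    while v < 1:
        v *= 2
        e -= 1
    # Python's round() on a Fraction rounds half to even.
    m = round(v * 2 ** (p - 1))
    return Fraction(m, 2 ** (p - 1)) * Fraction(2) ** e

# Rows (p, m0, m1, m2) from the table; x_i := 1 + m_i*u with u = 2^-p.
TABLE = [(4, 2, -4, 4), (5, 20, -3, 2), (6, 32, -14, 16), (7, 28, -9, 8),
         (8, 52, -39, 44), (9, 48, -21, 20), (10, 140, -117, 130),
         (11, 94, -43, 42), (12, 186, -154, 158), (13, 184, -89, 88),
         (14, 262, -125, 124)]

for p, m0, m1, m2 in TABLE:
    u = Fraction(1, 2 ** p)
    x0, x1, x2 = 1 + m0 * u, 1 + m1 * u, 1 + m2 * u
    assert all(fl(x, p) == x for x in (x0, x1, x2))  # x_i representable
    assert fl(fl(x0 * x1, p) * x2, p) == x0          # (4.1) holds
print("all rows verified")
```

For instance, for \(p=4\): \(x_0x_1 = 1.125\cdot 0.75 = 0.84375\) is a tie, rounded to even gives \(0.875\), and \(0.875\cdot 1.25 = 1.09375\) rounds back to \(1.125 = x_0\).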
We finally mention that it is easy to see that, if \(1\leqslant p\leqslant 2\), then the error bound in (1.6) is satisfied for all \(k\in \mathbb {N}\), and if \(p=3\), then the minimum value of \(k\) for which it is not satisfied is \(k=72\approx 25\mathbf{u}^{-1/2}\).
Summary
In previous papers, the factor \(\gamma _k\) has been replaced by \(k\mathbf{u}\) in a number of classical error estimates in numerical analysis, together with removing the restriction on \(k\). We proved that \(k\mathbf{u}\) can also be used for general products and for Horner's scheme, however with a mandatory restriction on \(k\) of the order \(k\lesssim \mathbf{u}^{-1/2}\).
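For Horner's scheme, the corresponding bound can be illustrated as follows (an illustrative check of our own, not the paper's code): a degree-\(n\) evaluation uses \(2n\) rounded operations, and the absolute error is compared against \(2n\mathbf{u}\sum _i |a_i||t|^i\), using exact rational arithmetic as reference:

```python
from fractions import Fraction
import random

u = 2.0 ** -53  # unit roundoff of IEEE binary64

def horner(coeffs, t):
    """Evaluate a_n*t^n + ... + a_0 by Horner's scheme (coeffs high to low)."""
    r = coeffs[0]
    for a in coeffs[1:]:
        r = r * t + a
    return r

random.seed(1)
n = 30                                          # degree: 2n rounded operations
coeffs = [random.uniform(-1.0, 1.0) for _ in range(n + 1)]
t = 0.7

# Exact evaluation over the rationals, and the condition sum of |a_i||t|^i.
p_exact = horner([Fraction(a) for a in coeffs], Fraction(t))
cond = horner([abs(Fraction(a)) for a in coeffs], Fraction(abs(t)))

err = abs(Fraction(horner(coeffs, t)) - p_exact)
print(float(err))                               # observed absolute error
print(float(2 * n * Fraction(u) * cond))        # bound 2nu * sum |a_i||t|^i
```

Here \(2n = 60\) is far below \(\mathbf{u}^{-1/2}\), so the observed error stays well within the bound.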
Cite this article
Rump, S.M., Bünger, F. & Jeannerod, CP. Improved error bounds for floating-point products and Horner’s scheme. Bit Numer Math 56, 293–307 (2016). https://doi.org/10.1007/s10543-015-0555-z