Exploiting Structure in Floating-Point Arithmetic

  • Claude-Pierre Jeannerod
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9582)


The analysis of algorithms in IEEE floating-point arithmetic is most often carried out via repeated applications of the so-called standard model, which bounds the relative error of each basic operation by a common epsilon depending only on the format. While this approach has been eminently useful for establishing many accuracy and stability results, it fails to capture most of the low-level features that make floating-point arithmetic so highly structured. In this paper, we survey some of those properties and how to exploit them in rounding error analysis. In particular, we review some recent improvements of several classical, Wilkinson-style error bounds from linear algebra and complex arithmetic that all rely on such structure properties.
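As a small illustration of the gap between the standard model and the underlying structure, the sketch below checks two facts for binary64 addition with rounding to nearest. The first is the standard-model bound: the relative error of a computed sum is at most the unit roundoff u = 2^-53. The second is a structural property the model does not express: the rounding error of an addition is itself a floating-point number, which the classical TwoSum sequence recovers exactly. The sketch also checks the slightly sharper bound u/(1+u), which is attained for 1 + 2^-53. The function names, the sampling range, and the choice of Python are assumptions made for illustration and are not taken from the paper; the code further assumes that Python floats are IEEE binary64 evaluated without extended intermediate precision.

```python
from fractions import Fraction
import random

# Unit roundoff of binary64 with rounding to nearest (assumed format).
U = Fraction(1, 2**53)


def relative_error(x, y):
    """Exact relative error of the computed binary64 sum x + y."""
    exact = Fraction(x) + Fraction(y)   # exact rational sum
    computed = Fraction(x + y)          # rounded floating-point sum
    if exact == 0:
        return Fraction(0)
    return abs(computed - exact) / abs(exact)


def two_sum(a, b):
    """Classical TwoSum: s = fl(a + b) and e with a + b = s + e exactly.

    Assumes rounding to nearest and no overflow.
    """
    s = a + b
    a_virtual = s - b
    b_virtual = s - a_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def check(trials=1000, seed=0):
    """Check both properties on random inputs (hypothetical test range)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.uniform(-1e6, 1e6)
        y = rng.uniform(-1e6, 1e6)
        err = relative_error(x, y)
        assert err <= U                 # standard model
        assert err <= U / (1 + U)       # slightly sharper bound
        s, e = two_sum(x, y)
        # The rounding error e is representable, so s + e is the exact sum.
        assert Fraction(s) + Fraction(e) == Fraction(x) + Fraction(y)
    return True


if __name__ == "__main__":
    print(check())
    # 1 + 2^-53 is a tie that rounds to 1.0, so the error equals u/(1+u).
    print(relative_error(1.0, 2.0**-53) == U / (1 + U))
```

Exact rational arithmetic via `fractions.Fraction` is used here so that the comparisons are not themselves affected by rounding; every binary64 value converts to a `Fraction` without error.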


Floating-point arithmetic; IEEE standard 754-2008; Rounding error analysis; High relative accuracy



I am grateful to Ilias Kotsireas, Siegfried M. Rump, and Chee Yap for giving me the opportunity to write this survey. This work was supported in part by the French National Research Agency, under grant ANR-13-INSE-0007 (MetaLibm).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Inria, Laboratoire LIP (U. Lyon, CNRS, ENSL, Inria, UCBL), Lyon Cedex 07, France
