BIT Numerical Mathematics, Volume 56, Issue 1, pp 309–317

On the definition of unit roundoff

  • Siegfried M. Rump
  • Marko Lange

Abstract

The result of a floating-point operation is usually defined to be the floating-point number nearest to the exact real result together with a tie-breaking rule. This is called the first standard model of floating-point arithmetic, and the analysis of numerical algorithms is often based solely on it. In addition, a second standard model is used specifying the maximum relative error with respect to the computed result. In this note we take a more general perspective. For an arbitrary finite set of real numbers we identify the rounding that minimizes the relative error in the first or the second standard model. The optimal “switching points” are the arithmetic or the harmonic means of adjacent floating-point numbers, respectively. Moreover, the maximum relative error of both models is minimized by taking the geometric mean. If the maximum relative error in one model is \(\alpha \), then \(\alpha /(1-\alpha )\) is the maximum relative error in the other model. Those maximal errors, that is, the unit roundoff, are characteristic constants of a given finite set of reals: the floating-point model to be optimized identifies the rounding and the unit roundoff.
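
The relations stated in the abstract can be checked numerically on a single pair of adjacent numbers. The following is only a minimal sketch, not the authors' construction: it assumes two positive neighbours a < b, a switching point s such that every x in [a, s] rounds to a and every x in (s, b] rounds to b, and it measures the error relative to x (first model) or relative to the rounded value (second model). The function names and the sample values 1.0 and 1.25 are hypothetical and chosen for illustration.

```python
from math import sqrt


def max_rel_errors(a, b, s):
    """Worst-case relative errors on [a, b] for switching point s.

    Assumes 0 < a < s < b, with x <= s rounded to a and x > s rounded to b.
    Returns (e1, e2): e1 is relative to the exact value x (first model),
    e2 is relative to the rounded value (second model).
    Both worst cases are attained (or approached) at x = s.
    """
    e1 = max((s - a) / s, (b - s) / s)
    e2 = max((s - a) / a, (b - s) / b)
    return e1, e2


def switching_points(a, b):
    """Candidate switching points between adjacent numbers 0 < a < b."""
    return {
        "arithmetic": (a + b) / 2,
        "harmonic": 2 * a * b / (a + b),
        "geometric": sqrt(a * b),
    }


if __name__ == "__main__":
    a, b = 1.0, 1.25  # hypothetical adjacent numbers of a toy number set
    for name, s in switching_points(a, b).items():
        e1, e2 = max_rel_errors(a, b, s)
        print(f"{name:10s} s={s:.6f}  model 1: {e1:.6f}  model 2: {e2:.6f}")
```

For this pair, the arithmetic mean gives the smaller error in the first model and the harmonic mean gives the same value in the second model. In each case the error in the other model equals \(\alpha /(1-\alpha )\), and the geometric mean makes the two errors equal at a value between the two.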

Keywords

Floating-point number, IEEE 754, Rounding, Tie

Mathematics Subject Classification

65G50 

Notes

Acknowledgments

Our dearest thanks go to Claude-Pierre Jeannerod from Lyon for his many detailed comments and for very helpful discussions and suggestions. Moreover, many thanks to the anonymous referees for their valuable and constructive comments.

Copyright information

© Springer Science+Business Media Dordrecht 2015

Authors and Affiliations

  1. Institute for Reliable Computing, Hamburg University of Technology, Hamburg, Germany
  2. Faculty of Science and Engineering, Waseda University, Tokyo, Japan