Some issues related to double rounding

Martin-Dorel, Érik; Melquiond, Guillaume; Muller, Jean-Michel

doi:10.1007/s10543-013-0436-2

Published: 05 July 2013

BIT Numerical Mathematics, volume 53, pages 897–924 (2013)

Abstract

Double rounding is a phenomenon that may occur when different floating-point precisions are available on the same system. Although double rounding is, in general, innocuous, it may change the behavior of some useful small floating-point algorithms. We analyze the potential influence of double rounding on the Fast2Sum and 2Sum algorithms, on some summation algorithms, and on Veltkamp’s splitting.
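
As an illustration of the phenomenon described in the abstract, here is a minimal Python sketch. It is not taken from the article. The helper round_to_precision is a hypothetical name: it simulates round-to-nearest, ties-to-even at a chosen significand width using exact rationals, with an unbounded exponent range. The widths 64 and 53 are assumptions chosen to mimic an extended-precision intermediate format followed by binary64. The functions fast_two_sum and two_sum follow the usual textbook formulations of Fast2Sum and 2Sum, and they assume that Python floats are binary64 and that each operation is rounded once.

```python
from fractions import Fraction


def round_to_precision(x: Fraction, p: int) -> Fraction:
    """Round x to the nearest value with a p-bit significand, ties to even.

    Hypothetical helper: the exponent range is unbounded (no overflow,
    no underflow), so only the significand width matters.
    """
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    a = abs(x)
    # Initial guess for e with 2**(p-1) <= a / 2**e < 2**p, then adjust.
    e = a.numerator.bit_length() - a.denominator.bit_length() - p
    while a / Fraction(2) ** e >= 2 ** p:
        e += 1
    while a / Fraction(2) ** e < 2 ** (p - 1):
        e -= 1
    ulp = Fraction(2) ** e
    # round() on a Fraction rounds half to even and returns an int.
    return sign * round(a / ulp) * ulp


def fast_two_sum(a: float, b: float) -> tuple[float, float]:
    """Fast2Sum, usual formulation; assumes |a| >= |b|."""
    s = a + b
    z = s - a
    t = b - z
    return s, t


def two_sum(a: float, b: float) -> tuple[float, float]:
    """2Sum, usual formulation; no assumption on the magnitudes."""
    s = a + b
    a_prime = s - b
    b_prime = s - a_prime
    delta_a = a - a_prime
    delta_b = b - b_prime
    return s, delta_a + delta_b


# A value just above the midpoint between 1 and the next 53-bit number.
x = 1 + Fraction(1, 2 ** 53) + Fraction(1, 2 ** 64)

# One rounding, directly to 53 bits.
direct = round_to_precision(x, 53)

# Two roundings: first to 64 bits, then to 53 bits.
twice = round_to_precision(round_to_precision(x, 64), 53)

print("single rounding:", float(direct))
print("double rounding:", float(twice))
print("results differ: ", direct != twice)

# Without double rounding, s + t equals a + b exactly for this input.
s, t = two_sum(1.0, 2.0 ** -60)
print("2Sum exact:     ", Fraction(s) + Fraction(t) == 1 + Fraction(1, 2 ** 60))
```

In this example the first rounding to 64 bits lands exactly on a 53-bit midpoint, so the second rounding breaks the tie toward the even neighbour and the final result differs from the directly rounded one. The sketch only shows that the two results can differ; the conditions under which Fast2Sum and 2Sum remain reliable in that setting are the subject of the article itself.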

Notes

The FMA instruction evaluates expressions of the form xy+z with one final rounding only.
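
The difference between one final rounding and two successive roundings can be seen with a small Python sketch. It is an illustration under stated assumptions, not code from the article: fma_reference is a hypothetical helper that computes xy+z exactly with rationals and then converts to a float once, relying on CPython's correctly rounded integer division for that conversion. Python floats are assumed to be binary64.

```python
from fractions import Fraction


def fma_reference(x: float, y: float, z: float) -> float:
    """Emulate a fused multiply-add: exact x*y + z, then a single rounding.

    Hypothetical helper for illustration; it is slow and ignores
    infinities and NaNs.
    """
    exact = Fraction(x) * Fraction(y) + Fraction(z)
    # Conversion to float rounds the exact rational once (CPython).
    return float(exact)


x = 1.0 + 2.0 ** -27
y = 1.0 - 2.0 ** -27
z = -1.0

# Two roundings: the product is rounded, then the sum is rounded.
two_roundings = x * y + z

# One rounding: the exact value x*y + z = -2**-54 is representable.
one_rounding = fma_reference(x, y, z)

print("x*y + z with two roundings:", two_roundings)
print("x*y + z with one rounding: ", one_rounding)
```

Here the exact product is 1 - 2^-54, which lies halfway between two binary64 numbers and rounds to 1, so the separately rounded expression returns zero while the single-rounding evaluation keeps the small nonzero result.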

Acknowledgements

We are extremely grateful to the anonymous referees, whose suggestions have been very helpful for revising this paper. In particular, one of them suggested a drastic simplification of the proof of Theorem 4.1.

Author information

Authors and Affiliations

Inria Sophia Antipolis - Méditerranée, Marelle team, 2004 route des Lucioles, BP 93, 06902, Sophia Antipolis Cedex, France
Érik Martin-Dorel
Inria Saclay–Île-de-France, Toccata team, LRI Lab., CNRS, Bât. 650, Univ. Paris Sud, 91405, Orsay Cedex, France
Guillaume Melquiond
CNRS, lab. LIP, Inria Aric team, Université de Lyon, 46 Allée d’Italie, 69364, Lyon Cedex 07, France
Jean-Michel Muller

Corresponding author

Correspondence to Jean-Michel Muller.

Additional information

Communicated by Axel Ruhe.

This work is partly supported by the TaMaDi project of the French Agence Nationale de la Recherche.

About this article

Cite this article

Martin-Dorel, É., Melquiond, G. & Muller, JM. Some issues related to double rounding. Bit Numer Math 53, 897–924 (2013). https://doi.org/10.1007/s10543-013-0436-2

Received: 09 February 2012
Accepted: 14 June 2013
Published: 05 July 2013
Issue Date: December 2013
DOI: https://doi.org/10.1007/s10543-013-0436-2
