Abstract
Double rounding is a phenomenon that may occur when different floating-point precisions are available on the same system. Although double rounding is, in general, innocuous, it may change the behavior of some useful small floating-point algorithms. We analyze the potential influence of double rounding on the Fast2Sum and 2Sum algorithms, on some summation algorithms, and on Veltkamp’s splitting.
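As an illustration of the notions named above, the following Python sketch first reproduces a double rounding by hand and then gives the classical formulations of Fast2Sum and 2Sum. The helper `round_to_precision` and the function names `fast_two_sum` and `two_sum` are hypothetical and are not taken from the paper. The sketch assumes round-to-nearest, ties-to-even, an unbounded exponent range in the helper, a 64-bit intermediate significand and a 53-bit target significand, and Python floats that are binary64 with a single rounding per operation.

```python
from fractions import Fraction


def round_to_precision(x: Fraction, p: int) -> Fraction:
    """Round x to the nearest value with a p-bit significand, ties to even.

    Hypothetical helper: the exponent range is treated as unbounded.
    """
    if x == 0:
        return x
    sign = 1 if x > 0 else -1
    m = abs(x)
    # Initial guess for e such that 2**(p-1) <= m / 2**e < 2**p.
    e = m.numerator.bit_length() - m.denominator.bit_length() - (p - 1)
    scaled = m / Fraction(2) ** e
    while scaled >= 2 ** p:
        scaled /= 2
        e += 1
    while scaled < 2 ** (p - 1):
        scaled *= 2
        e -= 1
    # round() on a Fraction rounds half to even and returns an int.
    return sign * round(scaled) * Fraction(2) ** e


# A value lying slightly above the midpoint of two consecutive 53-bit numbers.
x = 1 + Fraction(1, 2 ** 53) + Fraction(1, 2 ** 65)
direct = round_to_precision(x, 53)                          # one rounding
twice = round_to_precision(round_to_precision(x, 64), 53)   # double rounding


def fast_two_sum(a: float, b: float) -> tuple[float, float]:
    """Fast2Sum: assumes |a| >= |b|; returns (s, t) with s = RN(a + b)."""
    s = a + b
    z = s - a
    t = b - z
    return s, t


def two_sum(a: float, b: float) -> tuple[float, float]:
    """2Sum: no ordering assumption on a and b; returns (s, t)."""
    s = a + b
    a_virtual = s - b
    b_virtual = s - a_virtual
    t = (a - a_virtual) + (b - b_virtual)
    return s, t


s, t = two_sum(0.1, 0.2)
print(direct == twice)                                      # the two roundings disagree
print(Fraction(s) + Fraction(t) == Fraction(0.1) + Fraction(0.2))
```

In this example the intermediate 64-bit rounding lands exactly on a 53-bit midpoint, so the second rounding resolves a tie that the original value never presented. For Fast2Sum and 2Sum, the final check relies on the well-known property that, without double rounding and barring overflow, `s + t` equals `a + b` exactly.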
Notes
The fused multiply-add (FMA) instruction evaluates expressions of the form xy + z with a single final rounding.
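The difference between a single final rounding and two successive roundings can be sketched in Python. The function `fma_emulated` below is a hypothetical helper, not a hardware FMA: it computes xy + z exactly with rational arithmetic and then rounds once to binary64, assuming that converting a `Fraction` to `float` rounds to nearest.

```python
from fractions import Fraction


def fma_emulated(x: float, y: float, z: float) -> float:
    """Emulate x*y + z with one final rounding (hypothetical helper)."""
    exact = Fraction(x) * Fraction(y) + Fraction(z)  # no rounding here
    return float(exact)                              # single rounding


x = 1.0 + 2.0 ** -27
y = 1.0 - 2.0 ** -27
z = -1.0

two_roundings = x * y + z            # the product is rounded, then the sum
one_rounding = fma_emulated(x, y, z)

print(two_roundings, one_rounding)
```

Here the exact product is 1 - 2^-54, which lies halfway between two binary64 numbers and rounds to 1, so the separately rounded expression loses the whole result, whereas the single-rounding evaluation returns -2^-54.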
Acknowledgements
We are extremely grateful to the anonymous referees, whose suggestions have been very helpful for revising this paper. In particular, one of them suggested a drastic simplification of the proof of Theorem 4.1.
Additional information
Communicated by Axel Ruhe.
This work is partly supported by the TaMaDi project of the French Agence Nationale de la Recherche.
About this article
Cite this article
Martin-Dorel, É., Melquiond, G. & Muller, JM. Some issues related to double rounding. Bit Numer Math 53, 897–924 (2013). https://doi.org/10.1007/s10543-013-0436-2
DOI: https://doi.org/10.1007/s10543-013-0436-2