Fast and Accurate Floating Point Summation with Application to Computational Geometry
 James Demmel,
 Yozo Hida
Abstract
We present several simple algorithms for accurately computing the sum of n floating point numbers using a wider accumulator. Let f and F be the number of significant bits in the summands and the accumulator, respectively. Then assuming gradual underflow, no overflow, and round-to-nearest arithmetic, up to ⌊2^{F−f}/(1−2^{−f})⌋+1 numbers can be accurately added by just summing the terms in decreasing order of exponents, yielding a sum correct to within about 1.5 units in the last place. In particular, if the sum is zero, it is computed exactly. We apply this result to the floating point formats in the IEEE floating point standard, and investigate its performance. Our results show that in the absence of massive cancellation (the most common case) the cost of guaranteed accuracy is about 30–40% more than the straightforward summation. If massive cancellation does occur, the cost of computing the accurate sum is about a factor of ten. Finally, we apply our algorithm in computing a robust geometric predicate (used in computational geometry), where our accurate summation algorithm improves the existing algorithm by a factor of two on a nearly coplanar set of points.
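The core idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes IEEE single precision summands (f = 24) and an IEEE double precision accumulator (F = 53), and the helper names `to_single`, `max_terms`, `sum_by_decreasing_exponent`, and `naive_single_sum` are hypothetical. The bound ⌊2^{F−f}/(1−2^{−f})⌋+1 is evaluated in exact integer arithmetic as ⌊2^F/(2^f−1)⌋+1.

```python
import math
import struct


def to_single(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def max_terms(f: int, F: int) -> int:
    """Bound from the abstract, floor(2^(F-f) / (1 - 2^(-f))) + 1.

    Rewritten as floor(2^F / (2^f - 1)) + 1 so it uses only integers.
    """
    return (1 << F) // ((1 << f) - 1) + 1


def sum_by_decreasing_exponent(xs, f: int = 24, F: int = 53) -> float:
    """Add binary32-valued inputs in a binary64 accumulator,
    visiting the terms in decreasing order of exponent."""
    if len(xs) > max_terms(f, F):
        raise ValueError("too many terms for the stated bound")
    # math.frexp(x)[1] is the binary exponent of x; it is 0 for x == 0.0,
    # which is harmless here because adding zero is exact.
    ordered = sorted(xs, key=lambda x: math.frexp(x)[1], reverse=True)
    acc = 0.0  # Python floats are binary64, so F = 53
    for x in ordered:
        acc += x
    return acc


def naive_single_sum(xs) -> float:
    """Baseline for comparison: accumulate in binary32, in the given order."""
    acc = 0.0
    for x in xs:
        acc = to_single(acc + x)
    return acc


if __name__ == "__main__":
    data = [to_single(v) for v in (1e8, 1.0, -1e8)]
    print(max_terms(24, 53))
    print(naive_single_sum(data))
    print(sum_by_decreasing_exponent(data))
```

In this small example the single precision accumulator loses the 1.0 when it is added to 1e8, whereas the wider accumulator retains it. The length check only mirrors the term-count condition in the abstract; the error guarantees themselves come from the paper's analysis, not from this sketch.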
 Journal

Numerical Algorithms
Volume 37, Issue 1-4, pp 101-112
 Cover Date
 2004-12-01
 DOI
 10.1023/B:NUMA.0000049458.99541.38
 Print ISSN
 1017-1398
 Online ISSN
 1572-9265
 Publisher
 Kluwer Academic Publishers
 Keywords

 floating point summation
 rounding error analysis
 computational geometry
 robust geometric predicate
 Authors

 James Demmel ^{(1)}
 Yozo Hida ^{(2)}
 Author Affiliations

 1. Computer Science Division and Mathematics Department, University of California, Berkeley, CA, 94720, USA
 2. Computer Science Division, University of California, Berkeley, CA, 94720, USA