Final Rounding

Abstract

This chapter is devoted to the problems of preserving monotonicity and always getting correctly rounded results when implementing the elementary functions in floating-point arithmetic.

Notes

  1.

    Rounding functions are increasing functions; therefore, for any rounding function \(\circ {}(\cdot )\), if the “exact function” f is monotonic and correct rounding is provided, then the “computed function” \(\circ {}(f)\) is monotonic too.
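A quick numerical illustration of this note (a sketch of mine, not from the book): IEEE 754 requires square root to be correctly rounded, and on typical platforms Python's `math.sqrt` uses that hardware operation, so stepping through consecutive doubles must give a nondecreasing sequence of computed square roots.

```python
import math

# math.sqrt is correctly rounded (IEEE 754 requires it of sqrt), so the
# computed function inherits the monotonicity of the exact square root.
xs = [1.0]
for _ in range(1000):
    xs.append(math.nextafter(xs[-1], math.inf))  # next representable double

ys = [math.sqrt(x) for x in xs]
assert all(a <= b for a, b in zip(ys, ys[1:]))  # nondecreasing
```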

  2.

    Correct rounding preserves symmetry if we round to the nearest or toward zero, that is, if the rounding function itself is symmetrical.

  3.

    Computed with a precision somewhat larger than the “target precision.”

  4.

    The probability of a failure is about one in a million with \(m_0 = p+20\).
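The strategy behind these two notes (evaluate with a few extra bits, test whether the result rounds unambiguously to the target format, and retry with more precision when the test fails) can be sketched as follows. This is my own minimal illustration using Python's mpmath; the error bound and the precision schedule are assumptions of the sketch, not the book's algorithm.

```python
import mpmath

def round_with_retries(f, x, start_prec=60, max_prec=240):
    """Return f(x) correctly rounded to double, by retrying at higher precision."""
    prec = start_prec
    while prec <= max_prec:
        with mpmath.workprec(prec):
            y = f(mpmath.mpf(x))
            # Assume f is accurate to within a couple of ulps at the working
            # precision, so pad the result by a relative 2^(2-prec).
            err = abs(y) * mpmath.mpf(2) ** (2 - prec)
            lo, hi = float(y - err), float(y + err)
        if lo == hi:        # both interval endpoints round to the same double
            return lo
        prec *= 2           # rounding test failed: retry with more precision
    raise RuntimeError("rounding test still failing at max_prec")

# e.g. round_with_retries(mpmath.exp, 0.5) returns exp(0.5) correctly rounded
```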

  5.

    An algebraic number is a root of a nonzero polynomial with integer coefficients.

  6.

    This should not be viewed as a problem since this will allow us to easily return correctly rounded results for small arguments (see Tables 12.4 and 12.5).

  7.

    See Section 2.1.4 for a definition.

  8.

    That number \(n_e\) is not necessarily the number of possible exponents of the considered floating-point format. It is the number of different exponents of the set of the input values for which we want to estimate what largest value of k will appear. For instance, if f is the cosine function and we only want to know what will be the largest value of k for input values between 0 and \(\pi /2\), we will not consider exponents larger than 1.

  9.

    The exponential of a large number overflows, whereas the exponential of a very small number, when rounded to the nearest, is 1. Thus there is no need to consider many different exponents for the exponential function. Concerning the trigonometric functions, I think that a general-purpose implementation should ideally provide correctly rounded results whenever the function is mathematically defined. And yet, many may argue that the sine, cosine, or tangent of a huge number is meaningless, and I must recognize that in most cases it does not make much sense to evaluate a trigonometric function of a number with a large exponent, unless we know for some reason that that number is exact. Maybe one day this problem will be solved by attaching an “exact” bit to the representation of the numbers themselves, as suggested for instance by Gustafson [211].

  10.

    For the IEEE-754 binary64/double-precision format, we have obtained the bounds for many functions and domains (see Section 12.8.4). Getting tight bounds for much higher precisions seems out of reach. And yet, recent results tend to show that getting loose (yet still of interest) bounds for “quad”/binary128 precision might be feasible.

  11.

    In fact, we know them: see Table 12.6.

  12.

    Here, \(e=2.718\cdots {}\) is the base of the natural logarithm.

  13.

    If one prefers to think in terms of relative error, one can use the following well-known properties: in radix-2 floating-point arithmetic, if the significand distance between y and \(y^*\) is less than \(\epsilon \), then their relative distance \(|y-y^*|/|y|\) is less than \(\epsilon \). If the relative distance between y and \(y^*\) is less than \(\epsilon _r\), then their significand distance is less than \(2\epsilon _r\).
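These two inequalities can be checked numerically. This is a sketch of mine; I take “significand distance” to mean \(|y-y^*|\) divided by the power of 2 that scales y's significand into [1, 2), so the helper below is my reading of that definition.

```python
import math
import random

def significand_distance(y, ystar):
    # Scale |y - y*| by 2**(e-1), where |y| = m * 2**(e-1) with m in [1, 2).
    _, e = math.frexp(abs(y))            # |y| = f * 2**e with f in [0.5, 1)
    return abs(y - ystar) / 2.0 ** (e - 1)

random.seed(1)
for _ in range(10000):
    y = random.uniform(1e-3, 1e3)
    ystar = y * (1 + random.uniform(-1e-6, 1e-6))
    s = significand_distance(y, ystar)
    r = abs(y - ystar) / abs(y)
    assert r <= s        # significand distance bounds the relative distance
    assert s <= 2 * r    # and exceeds it by less than a factor of 2
```

Both bounds follow from writing \(|y| = m\cdot 2^{e-1}\) with \(1 \le m < 2\): the relative distance equals the significand distance divided by m.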

  14.

    But not that much: de Dinechin et al. achieve a worst-case overhead within a factor of 2 to 10 of the best libms [131].

  15.

    Still findable on the Internet.

  16.

    See https://lipforge.ens-lyon.fr/projects/crlibm/.

  17.

    See http://lipforge.ens-lyon.fr/www/metalibm/.

Author information

Correspondence to Jean-Michel Muller.

Copyright information

© 2016 Springer Science+Business Media New York

About this chapter

Cite this chapter

Muller, JM. (2016). Final Rounding. In: Elementary Functions. Birkhäuser, Boston, MA. https://doi.org/10.1007/978-1-4899-7983-4_12
