1 Notation and main result

Let \({\mathbb {F}}_{{\mathcal {N}}}\) denote the set of normalized precision-p base-\(\beta \) floating-point numbers

$$\begin{aligned} {\mathbb {F}}_{{\mathcal {N}}}:= \{\pm m \beta ^e \;\;\text{ with }\;\; \beta ^{p-1} \leqslant m \leqslant \beta ^p-1 \;\;\text{ and }\;\; E_{\min } \leqslant e \leqslant E_{\max } \}, \end{aligned}$$
(1.1)

and denote by

$$\begin{aligned} {\mathbb {F}}_{{\mathcal {D}}}:= \{\pm m \beta ^{E_{\min }} \;\;\text{ with }\;\; 1 \leqslant m < \beta ^{p-1} \} \end{aligned}$$
(1.2)

the set of denormalized numbers. Then \({\mathbb {F}}:= {\mathbb {F}}_{{\mathcal {N}}} \cup {\mathbb {F}}_{{\mathcal {D}}} \cup \{0\}\) is the set of all precision-p base-\(\beta \) floating-point numbers. Set \({\mathbb {F}}^*:= {\mathbb {F}}\cup \{-\infty ,\infty \}\) and let an arithmetic on \({\mathbb {F}}^*\) following the IEEE 754 standard [7, 8] be given. That means in particular that in RoundToNearest all floating-point operations have minimal error, bounded by the relative rounding error unit \(\textbf{u}:= \frac{1}{2} \beta ^{1-p}\). Moreover, different rounding modes are available, also with best possible result.

In [19] we introduced the “unit in the first place” (ufp) which is defined by

$$\begin{aligned} 0 \ne r \in {\mathbb {R}}\quad \Rightarrow \quad \textrm{ufp}(r):= \beta ^{\lfloor \log _{\beta }|r| \rfloor } \end{aligned}$$

and \(\textrm{ufp}(0):=0\). For all real \(r \in {\mathbb {R}}\) it is the place value of the left-most nonzero digit in the base-\(\beta \) representation of r.
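For illustration, the definition can be evaluated exactly with rational arithmetic, avoiding the pitfalls of a floating-point logarithm near powers of \(\beta \); the following Python sketch (function name ours) scales by \(\beta \) instead of computing \(\lfloor \log _{\beta }|r| \rfloor \) directly:

```python
from fractions import Fraction

def ufp(r, beta=10):
    """ufp(r) = beta**floor(log_beta |r|) for r != 0, ufp(0) = 0,
    evaluated by exact scaling instead of a floating-point logarithm."""
    if r == 0:
        return Fraction(0)
    x = abs(Fraction(r))
    u = Fraction(1)
    while x >= u * beta:   # increase the unit while beta^(e+1) <= |r|
        u *= beta
    while x < u:           # decrease the unit while |r| < beta^e
        u /= beta
    return u

print(ufp(Fraction(42)))        # 10
print(ufp(Fraction(-3, 1000)))  # 1/1000
print(ufp(Fraction(5), beta=2)) # 4
```

Note that ufp is independent of any precision p; only the base \(\beta \) enters.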

In contrast, the often used “unit in the last place” (ulp) depends on the precision of the floating-point format in use. For a nonzero finite base-\(\beta \) string it is the magnitude of its least significant digit, or in other words, the distance between the floating-point number and the next floating-point number of greater magnitude [6]. There are several other definitions of the unit in the last place, in particular for real \(r \notin {\mathbb {F}}\), cf. [2, 12, 13]. We use the definition above, namely \(\textrm{ulp}(r) = \beta ^e\) for \(r \in {\mathbb {F}}_{{\mathcal {N}}}\) according to (1.1), \(\textrm{ulp}(r) = \beta ^{E_{\min }}\) for \(r \in {\mathbb {F}}_{{\mathcal {D}}}\), and \(\textrm{ulp}(0)=0\). All definitions have in common that they depend not only on the base \(\beta \) but also on the precision of the floating-point format in use.

We invented the unit in the first place in [19] because it proved very helpful, if not mandatory, in formulating the involved correctness proofs of our new floating-point algorithms for accurate summation and dot products. We developed a small collection of rules based on ufp, so that no further knowledge of the many properties of IEEE 754 floating-point arithmetic is necessary to follow the proofs.

The main difference between the definitions of \(\textrm{ulp}\) and \(\textrm{ufp}\) is the separation of base and precision: first, ufp is defined for a general real number and depends only on the base \(\beta \); second, precision-related results use ufp together with the relative rounding error unit, i.e., the precision p. That separation was useful for formulating our proofs in [19] and subsequent papers.

There are simple algorithms to compute the predecessor, successor, unit in the first place, unit in the last place etc. in binary arithmetic [3, 5, 10, 11, 13, 14], but apparently no method is known to compute the unit in the first place in a base-\(\beta \) arithmetic. Jean-Michel Muller [15] proposed a method based on the results in [9], however, it needs up to \(\log _2(\beta )\) iterations. We are interested in a flat, loop-free algorithm with few operations.
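In the binary case such flat, loop-free computations are indeed readily available; for binary64 (\(\beta =2\), \(p=53\)), Python's standard library exposes them directly (math.ulp and math.nextafter require Python 3.9+), a small sketch:

```python
import math

x = 42.0
# ufp: frexp returns (m, e) with x = m * 2**e and 0.5 <= |m| < 1,
# so the unit in the first place is 2**(e-1)
m, e = math.frexp(x)
ufp_x = math.ldexp(1.0, e - 1)        # 2**5 = 32

# ulp of a binary64 number (precision p = 53)
ulp_x = math.ulp(x)                   # 2**(5 - 52) = 2**-47

# predecessor / successor: one rounding step with nextafter
pred_x = math.nextafter(x, -math.inf)
succ_x = math.nextafter(x, math.inf)

print(ufp_x, ulp_x, x - pred_x, succ_x - x)
```

Since 42 is not a power of 2, the gaps to predecessor and successor both equal the ulp here.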

Recently we wrote a toolbox for an IEEE 754 precision-p base-\(\beta \) arithmetic with specifiable exponent range [18] as part of INTLAB [16], the Matlab/Octave toolbox for reliable computing. As part of this we present in this note a simple algorithm to compute the unit in the first place for a precision-p base-\(\beta \) arithmetic with \(p \geqslant 1\) and \(\beta \geqslant 2\). The algorithm works correctly in the underflow range, where numbers close to overflow are treated by scaling. That algorithm requires a directed rounding, i.e., RoundToZero, RoundUp or RoundDown; we could not construct a simple algorithm in RoundToNearest.

In addition, as a reply to suggestions by the referees, we present some additional algorithms to compute ufp and ulp. Those require a specific directed rounding mode and/or access to the predecessor/successor of a floating-point number. Since these algorithms are fairly obvious and the proofs of correctness are trivial, we relegated them to the appendix.

We formulate our algorithm to compute the unit in the first place in the rounding mode RoundToZero and call the corresponding mapping \(\textrm{fl}_{\diamond }: {\mathbb {R}}\rightarrow {\mathbb {F}}\). It follows that the result of a floating-point operation with positive real result x is \(\max \{f \in {\mathbb {F}}: f \leqslant x\}\), and that operations cannot cause overflow.
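The behavior of \(\textrm{fl}_{\diamond }\) can be illustrated with Python's decimal module, which emulates a precision-p base-10 arithmetic with a selectable rounding mode (this is merely an illustration, not the flbeta toolbox of [18]):

```python
from decimal import Decimal, Context, ROUND_DOWN

# A precision-3 base-10 context; ROUND_DOWN chops toward zero (RoundToZero)
ctx = Context(prec=3, rounding=ROUND_DOWN)

# fl(x) for positive x is max{f in F : f <= x}: 2.347 is chopped to 2.34
print(ctx.plus(Decimal("2.347")))                      # 2.34
# the exact product 1.23 * 4.56 = 5.6088 is chopped to 5.60
print(ctx.multiply(Decimal("1.23"), Decimal("4.56")))  # 5.60
# toward zero, not downward: a negative result is chopped upward
print(ctx.plus(Decimal("-2.347")))                     # -2.34
```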

The predecessor and successor of \(x \in {\mathbb {R}}\) in \({\mathbb {F}}^*\) is defined by

$$\begin{aligned} \textrm{pred}(x)&:= \max \{ g \in {\mathbb {F}}^*: g < x \}, \\ \textrm{succ}(x)&:= \min \{ g \in {\mathbb {F}}^*: x < g \}, \end{aligned}$$

respectively. In precision-p base-\(\beta \) arithmetic we have

$$\begin{aligned} E_{\min } < k \leqslant E_{\max }&\;\Rightarrow \;&\textrm{pred}(\beta ^k) = (1-\beta ^{-p}) \beta ^k \end{aligned}$$
(1.3)
$$\begin{aligned} 0 <f \in {\mathbb {F}}_{{\mathcal {N}}} \;\text{ and }\; f \ne \textrm{ufp}(f)&\;\Rightarrow \;&\textrm{pred}(f) = f - \beta ^{1-p} \textrm{ufp}(f) \end{aligned}$$
(1.4)
$$\begin{aligned} 0 <f \in {\mathbb {F}}_{{\mathcal {N}}}&\;\Rightarrow \;&\textrm{succ}(f) = f + \beta ^{1-p} \textrm{ufp}(f) \end{aligned}$$
(1.5)

Note that (1.5) includes the case \(p=1\), \(\beta =2\) for which \({\mathbb {F}}\) is the set of powers of 2, i.e., \(f = \textrm{ufp}(f)\) for all \(f \in {\mathbb {F}}\), and \(\textrm{succ}(f) = f + \beta ^{1-p}\textrm{ufp}(f) = 2f\). Among the properties of ufp [19] is

$$\begin{aligned} 0 \ne f \in {\mathbb {F}}&\;\Rightarrow \;&\textrm{ufp}(f) \leqslant |f| \leqslant \beta (1-\beta ^{-p}) \cdot \textrm{ufp}(f). \end{aligned}$$
(1.6)
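For binary64 (\(\beta =2\), \(p=53\)) the relations (1.3)–(1.5) can be checked directly; a small Python check using math.nextafter (Python 3.9+) for predecessor and successor:

```python
import math

# (1.3)-(1.5) checked in IEEE 754 binary64, i.e. beta = 2, p = 53
p = 53
f = 1.0                                     # a power of beta: f = ufp(f)
assert math.nextafter(f, -math.inf) == (1 - 2.0 ** -p) * f        # (1.3)
assert math.nextafter(f, math.inf) == f + 2.0 ** (1 - p) * f      # (1.5)

g = 1.5                                     # g != ufp(g) = 1
assert math.nextafter(g, -math.inf) == g - 2.0 ** (1 - p) * 1.0   # (1.4)
assert math.nextafter(g, math.inf) == g + 2.0 ** (1 - p) * 1.0    # (1.5)
print("(1.3)-(1.5) verified for binary64")
```

All products and sums above are exactly representable, so the comparisons are not affected by rounding.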

Next we present in Fig. 1 our algorithm to compute \(\textrm{ufp}(f)\) for \(f \in {\mathbb {F}}\) in precision-p base-\(\beta \) arithmetic and RoundToZero or RoundDown. It is obvious how to adapt the algorithm for RoundUp. We assume that subrealmin, the smallest positive denormalized floating-point number equal to \(\beta ^{E_{\min }}\), is available. Overflow is easily avoided by proper scaling, but we omit that technical detail. Note that in a practical implementation, the constants p1 and phi in lines 2 and 3 of Algorithm ufp would be stored rather than calculated, and the extra input parameters p and beta would be omitted.

Fig. 1

Algorithm ufp in RoundToZero or RoundDown
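The proof below determines the steps of Algorithm ufp: line 2 computes \(p_1 = \textrm{pred}(1)\) (plausibly as \(\textrm{fl}_{\diamond }(1-\texttt {subrealmin})\)), line 3 computes \(\varphi = \beta ^{p-1}+1\), then \(q = \textrm{fl}_{\diamond }(\varphi |f|)\), \(r = \textrm{fl}_{\diamond }(p_1 q)\) and \(S = \textrm{fl}_{\diamond }(q-r)\). The following Python sketch simulates a precision-p base-\(\beta \) RoundToZero arithmetic with exact rationals (unbounded \(E_{\max }\), so the overflow scaling is omitted) and exercises this reconstruction; all names are ours:

```python
from fractions import Fraction

def fl_down(x, p, beta, Emin):
    """RoundToZero for x >= 0: the largest precision-p base-beta float <= x,
    with gradual underflow (spacing beta**Emin below the normalized range).
    Emax is taken as unbounded, so the overflow scaling of the note is omitted."""
    if x == 0:
        return Fraction(0)
    e = 0                                       # find e with beta^(e-1) <= x < beta^e
    while Fraction(beta) ** e <= x:
        e += 1
    while Fraction(beta) ** (e - 1) > x:
        e -= 1
    step = Fraction(beta) ** max(e - p, Emin)   # unit in the last place at x
    return (x // step) * step                   # chopping = rounding toward zero

def ufp_alg(f, p, beta, Emin):
    """Sketch of Algorithm ufp as reconstructed from the proof of Theorem 1.1:
    p1 = pred(1), phi = beta^(p-1)+1, q = fl(phi*|f|), r = fl(p1*q), S = fl(q-r)."""
    fl = lambda x: fl_down(x, p, beta, Emin)
    p1 = fl(1 - Fraction(beta) ** Emin)      # line 2: 1 - subrealmin -> pred(1)
    phi = fl(Fraction(beta) ** (p - 1) + 1)  # line 3: exact, since <= beta^p
    q = fl(phi * abs(Fraction(f)))           # line 4
    r = fl(p1 * q)                           # line 5
    return fl(q - r)                         # line 6

# precision-3 decimal: ufp(42) = 10; a denormalized input also works
assert ufp_alg(Fraction(42), 3, 10, -20) == 10
assert ufp_alg(Fraction(5, 10**20), 3, 10, -20) == Fraction(1, 10**20)
```

In this simulation the case \(p=1\), \(\beta =2\) of Theorem 1.1 can be tried as well, e.g. ufp_alg(Fraction(5), 1, 2, -10) yields 4.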

Theorem 1.1

Let S be the result of Algorithm ufp applied to \(f \in {\mathbb {F}}\), where \(E_{\min } \leqslant -1 < p \leqslant E_{\max }\). Suppose that all operations are executed in precision-p base-\(\beta \) floating-point arithmetic following the IEEE 754 standard with \(p \geqslant 1\) and \(\beta \geqslant 2\) in RoundToZero or RoundDown, and that \(|f| < \beta ^{E_{\max }-p+1}\). Then S is equal to \(\textrm{ufp}(f)\).

Remark 1.1

The usual problems in the denormalized range are avoided because \(q \in {\mathbb {F}}_{{\mathcal {N}}}\), so that the multiplication in line 5 is in the normalized range. The result of the final subtraction may be in the denormalized range but is error-free because of Sterbenz’ lemma [21].

Proof

The result is correct for \(f=0\), so henceforth we assume \(f \ne 0\). We first verify that the used constants p1 and phi are in \({\mathbb {F}}\). The rounding RoundToZero or RoundDown implies that \(p_1\) in line 2 is the predecessor of 1, and (1.3) and \(E_{\min } \leqslant -1\) yield \(p_1 = 1-\beta ^{-p}\). Moreover, \(\varphi \in {\mathbb {F}}\) follows by \(\beta ^{p-1}+1 \leqslant \beta ^p \leqslant \beta ^{E_{\max }}\). Note that this includes the case \(\varphi =2\) for \(p=1\).

The input f is used only in line 4, and since \(\textrm{ufp}(f)=\textrm{ufp}(|f|)\) we may henceforth assume without loss of generality that \(f > 0\). The monotonicity of the rounding, (1.6) and (1.5) imply

$$\begin{aligned} \varphi f&\leqslant (\beta ^{p-1}+1)\beta (1-\beta ^{-p})\cdot \textrm{ufp}(f) = ( \beta ^p + \beta - 1 - \beta ^{1-p}) \cdot \textrm{ufp}(f) \\ &< (1+\beta ^{1-p})\beta ^p \textrm{ufp}(f) = \textrm{succ}(\beta ^p\textrm{ufp}(f)), \end{aligned}$$

so that the rounding mode implies \(q = \textrm{fl}_{\diamond }(\varphi f) \leqslant \beta ^p\textrm{ufp}(f)\). Therefore,

$$\begin{aligned} \beta ^{p-1}\textrm{ufp}(f) \leqslant \textrm{ufp}(q) \leqslant \beta ^p\textrm{ufp}(f). \end{aligned}$$
(1.7)

Hence q is always in the normalized range \({\mathbb {F}}_{{\mathcal {N}}}\) and \(f < \beta ^{E_{\max }-p+1}\) yields \(\textrm{ufp}(f) \leqslant \beta ^{E_{\max }-p}\) and \(q \leqslant \beta ^p\textrm{ufp}(f) \leqslant \beta ^{E_{\max }}\).

We distinguish two cases. First, assume \(\textrm{ufp}(q) = \beta ^p\textrm{ufp}(f)\), which implies that \(q = \beta ^p\textrm{ufp}(f)\) is a power of \(\beta \). Then \(q \geqslant \beta ^p \beta ^{E_{\min }} > \beta ^{E_{\min }}\) and (1.3) yield

$$\begin{aligned} r:= \textrm{fl}_{\diamond }((1-\beta ^{-p}) q) = \textrm{pred}(q) = (1-\beta ^{-p}) q \end{aligned}$$

and therefore \(S = \textrm{fl}_{\diamond }(q-r) = \textrm{fl}_{\diamond }(\beta ^{-p}q) = \textrm{fl}_{\diamond }(\textrm{ufp}(f)) = \textrm{ufp}(f)\). According to (1.7), it remains to consider the second case

$$\begin{aligned} \textrm{ufp}(q) = \textrm{ufp}( \textrm{fl}_{\diamond }((\beta ^{p-1}+1) f)) = \beta ^{p-1}\textrm{ufp}(f). \end{aligned}$$
(1.8)

Note that the case \(p=1\), \(\beta =2\) belongs to the first case \(\textrm{ufp}(q) = \beta ^p\textrm{ufp}(f)\). Next, \(\beta ^{p-1}f \in {\mathbb {F}}_{{\mathcal {N}}} \) and (1.5) give

$$\begin{aligned} q&= \textrm{fl}_{\diamond }((\beta ^{p-1}+1)f) = \textrm{fl}_{\diamond }((1+\beta ^{1-p})\beta ^{p-1}f) \\ &\geqslant \textrm{fl}_{\diamond }(\beta ^{p-1}f + \beta ^{1-p}\textrm{ufp}(\beta ^{p-1}f)) = \textrm{succ}(\beta ^{p-1}f)\\ &\geqslant \textrm{succ}(\beta ^{p-1}\textrm{ufp}(f)) = \textrm{succ}(\textrm{ufp}(q)). \end{aligned}$$

The monotonicity of the rounding, (1.6), \(q > \textrm{ufp}(q)\) and (1.4) yield

$$\begin{aligned} q&= \textrm{fl}_{\diamond }(q) > \textrm{fl}_{\diamond }((1-\beta ^{-p})q) =: r \\ &\geqslant \textrm{fl}_{\diamond }(q - \beta ^{1-p}(1-\beta ^{-p})\textrm{ufp}(q)) \geqslant \textrm{fl}_{\diamond }(q - \beta ^{1-p}\textrm{ufp}(q)) \\ &= \textrm{pred}(q), \end{aligned}$$

and therefore \(r = \textrm{pred}(q) = q - \beta ^{1-p}\textrm{ufp}(q) = q - \textrm{ufp}(f)\). Hence \(S = \textrm{fl}_{\diamond }(q-r) = \textrm{fl}_{\diamond }(\textrm{ufp}(f)) = \textrm{ufp}(f)\). The theorem is proved. \(\square \)

Algorithm ufp will be part of the flbeta toolbox in INTLAB. Executable INTLAB code, which is almost identical to the one given in Fig. 1, is shown in Fig. 2.

Fig. 2

Algorithm ufp in executable INTLAB code

Here flbeta is a user-defined data type, where the precision \(p \geqslant 1\), the base \(\beta \geqslant 2\) as well as the exponent range \((E_{\min },E_{\max })\) can be specified through initialization by flbetainit. As in every operator concept, an operation is executed in flbeta-arithmetic if at least one of the operands is of type flbeta. The flbeta toolbox respects the rounding mode; in line 2 it is switched to RoundToZero using the internal Matlab command feature.

Called as in line 3, without input and with one output argument, p = flbetainit returns the precision p in use. The constructor flbeta(m,e) generates the flbeta constant \(m\beta ^e\). Otherwise the code is self-explanatory.

Finally we want to mention that the flbeta toolbox was very useful for testing with different precisions p, different bases \(\beta \) and exponent ranges \(E_{\min },E_{\max }\). Frankly speaking, we found Algorithm ufp experimentally when playing around with different possibilities. However, we did not find a simple algorithm in the nearest rounding RoundTiesToEven.

We close the main part of this note with some open problems. As mentioned, we did not succeed in finding a simple algorithm to compute ufp solely in rounding to nearest. Here “simple” means few operations and no loop.

Problem 1.1

Given a precision-p base-\(\beta \) arithmetic following IEEE 754, find a simple algorithm to compute the unit in the first place (ufp) in rounding to nearest.

The problem is solved [17] in binary for \(p \geqslant 1\).

Problem 1.2

Given a precision-p base-\(\beta \) arithmetic following IEEE 754, find a simple algorithm to compute the unit in the last place (ulp) in rounding to nearest.

Concerning units of a floating-point number, there is a third quantity of interest, namely, the magnitude of the least nonzero digit in a finite base-\(\beta \) representation. Historically [15], Shewchuk [20] used this quantity implicitly to define his “nonoverlapping expansions”; it appears in [4] with the notation \(\omega (f)\), and in [1] the notation \(\textrm{uls}(f)\) (unit in the least significant place) is used. For example, in a precision-3 decimal arithmetic and \(f = 42\) we have \(\textrm{ufp}(f) = 10\), \(\textrm{ulp}(f) = 0.1\) and \(\textrm{uls}(f) = 1\).
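All three units can be computed exactly from a mantissa-exponent pair; a Python sketch with integer arithmetic (exponent range ignored, function name ours) reproducing the example above:

```python
from fractions import Fraction

def units(m, e, p, beta=10):
    """For f = m * beta**e with integer 0 < m < beta**p, return
    (ufp, ulp, uls) as exact rationals; the exponent range is ignored."""
    assert 0 < m < beta ** p
    # ufp: shift the mantissa down to a single leading digit
    d, E = m, e
    while d >= beta:
        d //= beta; E += 1
    ufp = Fraction(beta) ** E
    # ulp: normalize f = M * beta**q with beta**(p-1) <= M <= beta**p - 1
    M, q = m, e
    while M < beta ** (p - 1):
        M *= beta; q -= 1
    ulp = Fraction(beta) ** q
    # uls: strip trailing zero digits of the mantissa
    s, u = m, e
    while s % beta == 0:
        s //= beta; u += 1
    uls = Fraction(beta) ** u
    return ufp, ulp, uls

# the example from the text: precision-3 decimal, f = 42
assert units(42, 0, p=3) == (10, Fraction(1, 10), 1)
```

For 42 written as 4.20e1, the trailing zero is not significant for uls, which is why the mantissa, not the normalized string, is stripped.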

Problem 1.3

Given a precision-p base-\(\beta \) arithmetic following IEEE 754, find a simple algorithm to compute the unit in the least significant place (uls) in any rounding mode.