Rigorous Roundoff Error Analysis of Probabilistic Floating-Point Computations

We present a detailed study of roundoff errors in probabilistic floating-point computations. We derive closed-form expressions for the distribution of roundoff errors associated with a random variable, and we prove that roundoff errors are generally close to being uncorrelated with their generating distribution. Based on these theoretical advances, we propose a model of IEEE floating-point arithmetic for numerical expressions with probabilistic inputs and an algorithm for evaluating this model. Our algorithm provides rigorous bounds to the output and error distributions of arithmetic expressions over random variables, evaluated in the presence of roundoff errors. It keeps track of complex dependencies between random variables using an SMT solver, and is capable of providing sound but tight probabilistic bounds to roundoff errors using symbolic affine arithmetic. We implemented the algorithm in the PAF tool, and evaluated it on FPBench, a standard benchmark suite for the analysis of roundoff errors. Our evaluation shows that PAF computes tighter bounds than current state-of-the-art on almost all benchmarks.


Introduction
There are two common sources of randomness in a numerical computation (a straight-line program). First, the computation might be using inherently noisy data, for example from analog sensors in cyber-physical systems such as robots, autonomous vehicles, and drones. A prime example is data from GPS sensors, whose error distribution can be described very precisely [2] and which we study in some detail in § 2. Second, the computation itself might sample from random number generators. Such probabilistic numerical routines, known as Monte-Carlo methods, are used in a wide variety of tasks, such as integration [39,30], optimization [40], finance [21], fluid dynamics [28], and computer graphics [26]. We call numerical computations whose input values are sampled from some probability distributions probabilistic computations. Probabilistic computations are typically implemented using floating-point arithmetic, which leads to roundoff errors being introduced in the computation. To strike the right balance between the performance and energy consumption versus the quality of the computed result, expert programmers rely on either a manual or automated floating-point error analysis to guide their design decisions. However, the current state-of-the-art approaches in this space have primary focused on worst-case roundoff error analysis of deterministic computations. So what can we say about floating-point roundoff errors in a probabilistic context? Is it possible to probabilistically quantify them by computing confidence intervals? Can we, for example, say with 99% confidence that the roundoff error of the computed result is smaller than some chosen constant? What is the distribution of outputs when roundoff errors are taken into account? In this paper, we explore these and similar questions. To answer them, we propose a rigorous -that is to say sound -approach to quantifying roundoff errors in probabilistic computations. Based on this approach, we develop an automatic tool that efficiently computes an overapproximate probabilistic profile of roundoff errors.
As an example, consider the floating-point arithmetic expression (X +Y )÷Y , where X and Y are random inputs described formally as independent random variables (usually defined over some intervals). In § 4, we first show how the computation in finite-precision of a single arithmetic operation such as X + Y can be modelled as (X + Y )(1 + ε), where ε is also a random variable. We then show how this random variable can be computed from first principles and why it makes sense to view (X +Y ) and (1+ε) as independent expressions, which in turn allows us to easily compute the distribution of (X + Y )(1 + ε). The distribution of ε depends on that of X + Y , and we therefore need to evaluate arithmetic operations between random variables. When the operands are independent -as in X + Y -this is standard [44], but when the operands are dependent -as in the case of the division in (X + Y ) ÷ Y -this is a hard problem. To solve it, we adopt and improve a technique for soundly bounding these distributions described in [3]. Our improvement comes from the use of an SMT solver to reason about the dependency between (X + Y ) and Y and remove regions of the state-space with zero probability. We describe this in § 6.
We can thus soundly bound the output distribution of any probabilistic computation, such as (X + Y ) ÷ Y , performed in finite-precision floating-point arithmetic. This gives us the ability to perform probabilistic range analysis and prove rigorous assertions like: 99% of the outputs of a floating-point computation are smaller than a given constant bound. In order to perform probabilistic roundoff error analysis we develop symbolic affine arithmetic in § 5. This technique is combined with probabilistic range analysis to compute conditional roundoff errors. Specifically, we over-approximate the maximal error conditioned on the output landing in the 99% range computed by the probabilistic range analysis, i.e. conditioned on the computations not returning an outlier. This allows us to quantify very precisely the idea of the maximal error of a 'typical' computation.
Here we need to make a comment about how to understand the notion of sound or rigorous error bounds in a probabilistic context. The usual meaning of these words is anchored in a non-deterministic model of computation where inputs are known to belong to some intervals and soundly bounding the error amounts to bounding the error of all possible computations. If the inputs are drawn from a distribution, this understanding of soundness corresponds to bounding errors with 100% confidence, and thereby ignoring the probabilistic information completely. This is why the state-of-the-art refers to this nondeterministic model as worst-case analysis, because it does not account for the distribution of the inputs. On the other hand, in a probabilistic model of computation, we will say that we are giving a sound (or rigorous) probabilistic guarantee if there is 100% certainty about the probabilistic guarantee. For example when saying 99% of outputs will land in the interval [a, b] we are making a sound probabilistic statement based on some analytical description of the output distribution, and there is therefore no uncertainty about this statement. An example of unsound probabilistic statement is given by making probabilistic statements based on Monte-Carlo sampling. When saying based on 1 billion runs, 99% of outputs will land in the interval [a, b] we are making an unsound probabilistic statement which may fail in another run of the experiment. Another example of unsound probabilistic statement would be with 95% certainly, 99% of outputs will land in the interval [a, b], as is done in e.g. PAC learning [47]. By working with 'p-boxes' [3] -structures which upper-and lower-bound cumulative distribution functions (see §3 and §4) -we will be able to give probabilistic guarantees which are all sound. We implemented our model and algorithms in a tool called PAF (for Probabilistic Analysis of Floating-point errors). We evaluated PAF on the standard floating-point benchmark suite FPBench [7], and compared its range and error analysis with the worst-case roundoff error analyzer FPTaylor [43,42] and the probabilistic roundoff error analyzer PrAn [32]. We present the results in § 7, and show that FPTaylor's worst-case analysis is often overly pessimistic in the probabilistic setting, while PAF also generates tighter probabilistic error bounds than PrAn on all but one benchmark.
We summarize our contributions as follows: (i) We derive several closed-form expressions for the distribution of roundoff errors associated with a random variable. We prove that roundoff errors are generally close to being uncorrelated with their input distribution.
(ii) Based on these results we propose a model of IEEE754 floating-point arithmetic for numerical expressions with probabilistic inputs.
(iii) We evaluate this model by developing a new algorithm for rigorously bounding the output range and roundoff error distributions of floating-point arithmetic expressions with probabilistic inputs.
(iv) We implement this model in the PAF tool, and perform probabilistic range and roundoff error analysis on a standard benchmark suite. Our comparison with the current state-of-the-art shows the advantages of our approach in terms of computing tighter, and yet still rigorous, probabilistic bounds.

Overview Through an Application
GPS sensors provide latitude and longitude coordinates that are inherently noisy. As shown in detail in [1], the conditional probability distribution of the true longitude and latitude coordinates (TrueLat, TrueLong), given a GPS sensor reading (ReadLat, ReadLong), is distributed according to a Rayleigh distribution where ε is a hardware dependent measure of accuracy of the GPS reading known as horizontal accuracy. Interestingly and somewhat surprisingly, since the density of any Rayleigh distribution is always zero at x = 0, it is extremely unlikely that the true coordinates lie in a small neighbourhood of those given by the GPS reading. This leads to errors described in [1] and [2] and to the suggestion that the coordinates given by a GPS sensor should be corrected by adding a probabilistic error term which will, on average, shift the observed coordinates into an area of high probability for the true coordinates. Fig. 1 gives the correction of [2], where the value of the variable radius is drawn from the Rayleigh distribution with parameter ε √ ln 400 , the value of the variable angle is draw from the uniform distribution on [0, 2π], and DEGREES PER METER is a constant (≈ 0.000009009) used to convert the radius variable (expressed in meters) into degrees.
A developer trying to strike the right balance between resources, such as energy consumption or execution time, versus the accuracy of the computation, particularly in the context of mobile and embedded devices, might want to run a rigorous worst-case floating-point analysis tool to decide which floating-point format is accurate enough to process GPS signals. This is a mandatory decision if the developer requires rigorous error bounds holding with 100% certainty. The problem such a developer would face when analyzing a piece of code involving the probabilistic correction above, is that the Rayleigh distribution -that is to say the value of the variable radius in Fig. 1 -has a semi-infinite support (i.e. [0, ∞)), and any rigorous (i.e. sound) worst-case roundoff error analysis would return an infinite error bound in the presence of unbounded variables. This is because, in the worst-case scenario, the unbounded variable assumes exactly the infinite value, thus resulting in an infinite roundoff error. In order to get a meaningful (numeric) error bound from worst-case roundoff error tools, the range of radius will therefore have to be truncated. The main consequence of truncation is that soundness -which is the purpose of these tools -is lost, because regardless of where you decide to truncate the distribution, you are going to discard a part of its support with non-zero probability. To mitigate this TrueLat = ReadLat + ((radius * sin(angle)) * DEGREES_PER_METER) TrueLong = ReadLong + ((radius * cos(angle)) * DEGREES_PER_METER) Fig. 1: Probabilistic correction of GPS coordinates from [2]: radius is Rayleigh distributed, angle is uniformly distributed.
issue and be 'as sound as possible', the natural truncation is given by the interval [0, max (p,e) ] where max (p,e) is the largest representable number not causing an overflow, in a generic floating-point format with p bits for the mantissa and e bits for the exponent representations. As we show next, the main issue with this natural truncation is that all the useful probabilistic information is lost, and regions of high probability are treated in the same way as regions which would never be explored even if we sampled for billions of years.
As an example, let us consider the latitude correction of Fig. 1: and set ReadLat to 51.4769 • , the latitude of the Greenwich observatory, and DE-GREES PER METER to 0.000009009. In table 1, we report our detailed roundoff error analysis when (1) is implemented in the well-known IEEE-754 double-, single-, and half-precision formats. With each implementation format (reported in the Format column), we associate the range [0, max (p,e) ], with max (p,e) reported in the Max column, of the Rayleigh distribution truncated in the natural manner described above. In double-precision, for example, we truncate the Rayleigh distribution to the range [0, 10 307 ] since 10 307 ≈ max (53,11) . We computed worst-case roundoff error bounds for (1) in each precision format with the state-of-the-art error analyzer FPTaylor [43], with the variable radius constrained to the corresponding truncated range. We show the results in the column FPTaylor. Note that in double-precision, FPTaylor returns a roundoff error bound of 4.3e+286. The reason for such an enormous error is the fact that FP-Taylor does not take into consideration how radius is distributed in the range, i.e. how rare the values contributing to such a large error are. We also computed worst-case roundoff errors with our tool PAF by setting the confidence interval from which conditional roundoff errors are computed to 100%. We report the results in the PAF 100% column; these results are identical to the ones from FPTaylor, thereby confirming empirically that worst-case roundoff error is synonymous with conditional roundoff error based on a 100% confidence interval. Finally, the PAF 99.9999% column gives the 99.9999% conditional roundoff error computed by PAF. This value is an upper bound to the roundoff error conditioned on the computation having landed in an interval capturing (at least) 99.9999% of all possible outputs. The column Absolute gives the 99.9999% conditional roundoff error of the latitude correction (1), which is expressed in degrees, and the column Meters converts this value into meters, based on the fact that 1 • of latitude is roughly equal to 111km. Our probabilistic error analysis in PAF with 99.9999% confidence interval reduces the 100% (i.e., worst-case) error bound by 300 orders of magnitude in the case of double-precision, and by 30 orders of magnitude in the case of single-precision. The reason behind such dramatic reduction is that PAF runs a sound probabilistic range analysis (described in § 4 and § 6.2) in order to over-approximate the 99.9999% confidence interval when (1) is implemented in floating-point arithmetic. This 99.9999% interval is then used to compute the conditional roundoff error (described in § 6.3). Specifically, using symbolic affine arithmetic (see § 5), PAF will maximize the possible roundoff error for those computations that land in the 99.9999% confidence interval. In other words, PAF over-approximates the roundoff error of all possible computations except the 0.0001% of outliers that lead to the most abnormal outputs. Without our probabilistic error analysis, the developer might erroneously conclude that half-precision format is the most appropriate to implement (1) because it results in the smallest error bound. However, with the information provided by the 99.9999% conditional roundoff error, the developer can see that the average error, without the extremely rare outliers, is many orders of magnitude smaller than the worst-case scenarios. Furthermore, the developer can also observe that in the case of the half-precision format (and in this case alone), the worst-case errors and the probabilistic errors are almost identical, which means that the average roundoff error is of the same order of magnitude as the worstcase error. This is because in half-precision (and lower), the roundoff error of (1) is governed by the rounding errors of the constants, and is thus completely independent from the chosen confidence interval.
Armed with this information, the developer can conclude that with a roundoff error of roughly 40cm (4.1e-1 meters) when correcting 99.9999% of GPS latitude readings, working in single-precision is an adequate compromise between efficiency and accuracy of the computation, since double-precision provides unnecessary accuracy (roundoff error of 4.5e-10 meters) and half-precision is always overly imprecise (roundoff error of 2667 meters). As a quantitative intuition of the 99.9999% confidence level, it is not hard to show (by using the Poisson approximation of the binomial distribution) that the probability of encountering at least one outlier after 700,000 latitude corrections given by (1) is about 50%. Assuming one GPS reading every 5 seconds, this corresponds to at least 50% chance that the roundoff error bound is never breached after 40 days of continual use. Of course, we can run PAF at higher levels of confidence if required.
This demonstrates the innovative concept of probabilistic precision tuning, supported by this paper, in order to determine which floating-point format is the most appropriate to implement (1). In the current worst-case precision tuners [4,8], the developer provides the tuner with an algebraic expression f (x) (like (1)) where the arguments x are properly bounded, together with a numerical roundoff error bound . The output from the tuner is the minimal (in terms of bits) floating-point format required when implementing the expression f (x) that guarantees that the implementation will never violate the given er-ror bound. Clearly, worst-case precision tuning suffers from the same limitations we discussed above, meaning we have no information on how rare the inputs violating the given error bound might be.
As an example, let us do a precision tuning exercise for the latitude correction computation (1). We bound the variable radius in the interval [0, 10 307 ] and we set the error bound to 1e-5 (roughly 1m). We manually perform worst-case precision tuning using FPTaylor to determine that the minimal floating-point format not violating the given error bound consists of 1022 bits for the mantissa and 11 bits for the exponent. Such custom floating-point format is prohibitively expensive, in particular for devices with an intensive usage of GPS readings, like smart-phones or smart-watches. Thus, worst-case error bounds are not only very pessimistic, but are also of little practical use from an implementation prospective in some cases.
On the other hand, when we manually performed probabilistic precision tuning using PAF with a confidence interval set to 99.9999%, we determine that the minimal required floating-point format consists of 22 bits for the mantissa and 11 bits for the exponent. Thanks to PAF, the confidence interval has now become an additional input parameter of the probabilistic precision tuning routine. Thus, in addition to the expression f (x) and the error bound , the developer can also provide a custom confidence interval of interest and ignore the extremaly unlikely corner cases like the ones we described for (1).

Floating-Point Arithmetic
Given a precision p ∈ N and an exponent range [e min , e max ] {n | n ∈ N ∧ e min ≤ n ≤ e max }, we define F(p, e min , e max ), or simply F if there is no ambiguity, as the set of extended real numbers Elements z = z(s, e, k) ∈ F will be called floating-point representable numbers (for the given precision p and exponent range [e min , e max ]) and we will use the variable z to represent them. The variable s will be called the sign, the variable e the exponent, and the variable k the significand of z(s, e, k). Next, we introduce a rounding map Round : R → F that rounds to nearest (or to −∞/∞ for values smaller/greater than the smallest/largest finite element of F) and follows any of the IEEE754 rounding modes in case of a tie. We will not worry about which choice is made since the set of mid-points will always have probability zero for the distributions we will be working with. All choices are thus equivalent, probabilistically speaking, and what happens in a tie can therefore be left unspecified. We will denote the extended real line by R R ∪ {−∞, ∞}. The (signed) absolute error function err abs : R → R is defined as: err abs (x) = x−Round(x). We define the sets z, z = {y ∈ R | Round(y) = Round(z)}. Thus if z ∈ F, then z, z is the collection of all reals rounding to z. As the reader will see, the basic result of § 4 (Eq. (6)) is expressed entirely using the notation z, z which is parametric in the choice of the Round function. It follows that our results apply to rounding modes other that round-to-nearest with minimal changes. The relative error function err rel : The quantity 2 −(p+1) is usually called the unit roundoff and will be denoted by u.
For z 1 , z 2 ∈ F and op ∈ {+, −, ×, ÷} an (infinite-precision) arithmetic operation, the traditional model of IEEE754 floating-point arithmetic [36] [22] states that the finite-precision implementation op m of op must satisfy We leave dealing with subnormal floating-point numbers to future work. The model given by Eq.
(2) stipulates that the implementation of an arithmetic operation can induce a relative error of magnitude at most u. The exact size of the error is, however, not specified and Eq. (2) is therefore a non-deterministic model of computation. It follows that numerical analyses based on Eq. (2) must consider all possible relative errors δ and are fundamentally worst-case analyses.
Since the output of such a program might be the input of another, one should also consider non-deterministic inputs, and this is indeed what happens with automated tools for roundoff error analysis, such as Daisy [8] or FPTaylor [43,42], which require for each variable of the program a (bounded) range of possible values in order to perform a worst-case analysis (cf. GPS example in § 2).
In this paper, we study a model formally similar to Eq. (2), namely The difference is that δ is now distributed according to dist, a probability distribution whose support is [−u, u]. In other words, we move from a non-deterministic to a probabilistic model of roundoff errors. This is similar to the 'Monte Carlo arithmetic' of [38], but whilst op. cit. postulates that dist is the uniform distribution on [−u, u], we compute dist from first principles in § 4.

Probability Theory
To fix the notation and be self-contained, we present some basic notions of probability theory which are essential to what follows.

Cumulative Distribution Functions and Probability Density Functions.
We assume that the reader is (at least intuitively) familiar with the notion of a (real) random variable. Given a random variable X we define its Cumulative Distribution Function (CDF) as the function c(t) P [X ≤ t]. If there exists a non-negative integrable function d : R → R such that then we call d(t) the Probability Density Function (PDF) of X. If it exists, then it can be recovered from the CDF by differentiation d(t) = ∂ ∂t c(t) by the fundamental theorem of calculus.
Not all random variables have a PDF: consider the random variable which takes value 0 with probability 1 /2 and value 1 with probability 1 /2. For this random variable it is impossible to write P [X ≤ t] = d(t) dt. Instead, we will write the distribution of such a variable using the so-called Dirac delta measure at 0 and 1 as 1 /2δ 0 + 1 /2δ 1 . It is possible for a random variable to have a PDF covering part of its distribution -this will be its continuous part -and a sum of Dirac deltas covering the rest of its distribution -this will be its discrete part. We will encounter examples of such random variables in § 4. Finally, note that if X is a random variable and f : R → R is a (measurable) function, then f (X) is a random variable. In particular err rel (X) is a random variable, which we will describe in detail in § 4. Arithmetic on Random Variables.Suppose X, Y are independent random variables with PDFs f X and f Y , respectively. Using the arithmetic operations we can form new random variables X + Y, X − Y, X × Y, X ÷ Y . The PDFs of these new random variables can be expressed as operations on f X and f Y [44]: (4) It is important to note that these formulas are only valid if X and Y are assumed to be independent. When an arithmetic expression containing variable repetitions is given a random variable interpretation, this independence can no longer be assumed. In the expression (X + Y ) ÷ Y the sub-term (X + Y ) can be interpreted by using Eqs. (4) if X, Y are independent. However, the sub-terms X + Y and Y clearly cannot be interpreted as independent random variables, and the division operation can therefore not be evaluated using Eqs. (4). Soundly Bounding Probabilities. The constraint that the distribution of a random variable must integrate to 1 makes it impossible to order random variables in the 'natural' way: . This means that we cannot quantify our probabilistic uncertainty about a random variable by sandwiching it between two other random variables as one would do with reals or real-valued functions.
One solution is to restrict the sets used in the comparison, i.e. declare that for A ranging over a given set of 'test subsets'.
Such an order can be defined by taking the 'test subsets' to be all the intervals (−∞, x] [41]. This order is now known as the stochastic order. It follows from the definition of the CDF that this order can most elegantly be defined by saying that where c X and c Y are the CDFs of X and Y , respectively. If it is possible to sandwich an unknown random variable X between known lower and upper bounds X lower ≤ X ≤ X upper using the stochastic order then it becomes possible to give sound bounds to the quantities P-Boxes and DS-Structures. As mentioned above, giving a random variable interpretation to an arithmetic expression containing variable repetitions cannot be done using Eqs. (4). In fact, these interpretations are in general analytically intractable. Hence, a common approach is to give up on soundness and approximate such distributions using Monte-Carlo simulations. We use this approach in our experiments to assess the quality of our sound results. However, we will also provide sound under-and over-approximations of the distribution of arithmetic expressions over random variables using the stochastic order discussed above. Since X lower ≤ X ≤ X upper is equivalent to saying that , the fundamental approximating structure will be a pair of CDFs satisfying c 1 (x) ≤ c 2 (x). Such a structure is known in the literature as a p-box [16], and has already been used in the context of probabilistic roundoff errors in related work [3,32]. The data of a p-box is equivalent to a pair of sandwiching distributions for the stochastic order.
A Dempster-Shafer structure (DS-structure) of size N is a set of interval- The intervals in the collection might overlap. One can always convert a DSstructure to a p-box and back again [16], but arithmetic operations are much easier to perform on DS-structures than on p-boxes ( [3]), which is why we will use DS-structures in the algorithm described in § 6.

Distribution of Floating-Point Roundoff Errors
We briefly described in the GPS example of § 2 how our tool PAF computes probabilistic roundoff errors by conditioning the optimization of symbolic affine form (presented in the next section) on the output of the computation landing in a confidence interval. The purpose of this section is to provide the necessary probabilistic tools to compute these intervals. Specifically, we will show how the probabilistic model of Eq. (3) can be used to compute (or at least bound) the output distribution of a probabilistic computation in finite-precision. In other words, this section provides the foundations of probabilistic range analysis.
Our first task is to specify the distribution dist of the random variable δ in the probabilistic model given by Eq. (3). Following [6], we use the fact that dist can be computed explicitly from first principles and expressed in closed form (Eq. (6)). However, this expression can only be computed in practice at very low precision. We then show that roundoff error distributions in high precision (say, single-precision and higher) can be approximated by another closed-form expression Eq. (7) with an error that can be explicitly bounded, allowing us to remain sound. Moreover, the quality of this approximation increases with the working (i.e. benchmark) precision. In the case where all mantissas are equiprobable, it can be shown that as the working precision increases, the roundoff error distribution always converges to a unique distribution given by Eq. (8), which we call the typical distribution. For this class of input distributions, the distribution of errors is therefore asymptotically independent of the way the inputs are distributed. More generally, we show that the covariance between the roundoff errors of Eq. (7) and their generating input distribution is extremely small, that is to say the two processes are almost completely uncorrelated. This will have an important practical application when evaluating our probabilistic model of IEEE 754 arithmetic presented in §6.1. All proofs can be found in the Appendix.

Derivation of the Distribution of Rounding Errors
Let us return to the probabilistic model of IEEE 754 arithmetic given by Eq. (3) where op is the infinite-precision arithmetic operation and op m is its finiteprecision implementation: Let us also assume that z 1 , z 2 are random variables with known distributions. Then z 1 op z 2 is also a random variable which can (in principle) be computed.
Since the IEEE 754 standard states that z 1 op m z 2 should be given by rounding the infinite precision operation z 1 op z 2 , it is a completely natural consequence of the standard to require that δ should simply be given by Thus dist should simply be the distribution of the random variable err rel (z 1 op z 2 ). More generally, if X is a random variable with know distribution, we will show how to compute the distribution dist of the random variable We choose to express the distribution dist of relative errors in multiples of the unit roundoff u. This choice is arbitrary, but it allows us to work with a distribution on the conceptually and numerically convenient interval [−1, 1], since the absolute value of the relative error is strictly bounded by u (see § 3.1), rather than the interval [−u, u].
To compute the density function of dist, we proceed as described in § 3.2 by first computing the CDF c(t) and then taking its derivative. Recall first from In other words, the probability measure corresponding to err rel has three discrete components at {−∞}, {1}, and {∞}, which cannot be accounted for by a PDF (see § 3.2). It follows that the probability measure dist is given by where dist c is a continuous measure that is not quite a probability measure since its total mass is 1 In general, dist c integrates to 1 in machine precision since P [X ∈ 0, 0 ] is of the order of the smallest positive floating-point representable number, and the PDF of X rounds to 0 way before it reaches the smallest/largest floating-point representable number. For example, Python's scipy implementation of the PDF of the standard Gaussian rounds to 0 from x = 39. However in order to be sound, we must in general include these three discrete components in our computations. The density dist c can be computed explicitly and is given by the following result whose proof can essentially be found in [6].
Theorem 1. Let X be a real random variable with PDF f . The continuous part dist c of the distribution of err rel (X) has a PDF given by where 1 A (x) is the indicator function which returns 1 if x ∈ A and 0 otherwise.  6) applied to the distribution Unif(2, 4), first in very low precision (3 bit exponent, 4 bit significand) and then in half-precision. The theoretical density is plotted alongside a histogram of the relative error incurred when rounding 100,000 samples to low precision (computed in double-precision). The reported statistic is the K-S (Kolmogorov-Smirnov) test which measures the likelihood that a collection of samples were drawn from a given distribution. This test reports that we cannot reject the hypothesis that the samples are drawn from the corresponding density. Note how in low precision the term in 1 (1−tu) 2 induces a visible asymmetry on the central section of the distribution. This effect is much less pronounced in half-precision.

High-Precision Case
As the working precision increases, a regime changes occurs: on the one hand it becomes practically impossible to enumerate all floating-point representable numbers as done in Eq. (6), but on the other hand sufficiently well-behaved density functions are numerically close to being constant at the scale of an interval between two floating-point representable numbers. We exploit this smoothness to overcome the combinatorial limit imposed by Eq. (6).
Theorem 2. Let X be a real random variable with PDF f . The continuous part dist c of the distribution of err rel (X) has a PDF given by d and R(t) is an error whose total contribution |R| Note how Eq. (7) reduces the sum over all floating-point representable numbers in Eq. (6) to a sum over the exponents by exploiting the regularity of f . This can often be reduced further since one only needs to consider the exponent on the support of f . Note also that since f is a PDF, it usually decreases very quickly away from 0, and its derivative decreases even quicker (or vanishes altogether for uniform distributions) and |R| thus tends to be very small. This is the case for all benchmarks in § 7. As an example, in single-precision for X ∼ Norm(0, 1) we get |R| < 3.2e − 7, for X ∼ Unif(−2, 2) we get |R| < 1.2e − 7; in both cases very close to the smallest floating-point representable increment to 1. Moreover, |R| → 0 as the precision p → ∞.
Eq. (7) is easy to implement, and we present some results in Fig. 2 where we have chosen as input: (i) a distribution Unif (7,8) where large significands are more likely, (ii) a distribution Unif (4,5) where small significands are more likely, (iii) a distribution Unif (4,32) where all significands are equally likely, and (iv) a distribution Norm(0, 1) with infinite support. The graphs show the density function given by Eq. (7) in single-precision versus a histogram of the relative error incurred when rounding 1,000,000 samples to single-precision (computed in double-precision). The K-S test reports that we cannot reject the hypothesis that the samples are drawn from the corresponding distributions.

Typical Distribution
The distributions depicted in graphs (ii), (v) and (vi) of Fig. 2 are extremely similar, despite being computed from very different input distributions. What they have in common is that their input distributions have the property that that all significands in their supports are equally likely. In fact, we show that under this assumption, the distribution of roundoff errors given by Eq. (6) converges to a unique density as the precision increases, irrespective of the input distribution! Since all significands are in practice often equiprobable (it is the case for a third of the benchmarks presented in § 7), this density is of great practical importance. If one had to choose 'the' canonical distribution for roundoff errors, we claim that the density given below should be this distribution, and we therefore call it the typical distribution; we depict it in Fig. 3 and formalize it with the following theorem, which makes precise ideas from [6].
Theorem 3. If X is a random variable such that P [Round(X) = z(s, e, k 0 )] = 1 2 p for any significand k 0 , then where d(t) is the exact density given by Eq. (6).

Covariance Structure
The result above can be interpreted as saying that if X is such that all mantissas are equiprobable, then X and err rel (X) are asymptotically independent (as p → ∞). Much more generally, we now show that if a random variable X has a sufficiently regular PDF, it is close to being uncorrelated from err rel (X). Formally, we prove that the covariance is small, specifically of the order of u. Note that the expectation in the first summand above is taken w.r.t. the joint distribution of X and err rel (X). The main technical obstacles to proving that the expression above is small are that E [err rel (X)] turns out to be difficult to compute (we only manage to bound it) and that the joint distribution P [X ∈ A ∧ err rel (X) ∈ B] does not have a PDF since it is not continuous w.r.t. the Lebesgue measure on R 2 . Indeed, it is supported by the graph of the function err rel which has a Lebesgue measure of 0. This does not mean that it is impossible to compute the expectation E [X.err rel (X)] = R 2 xut dP (10) but it is necessary to use some more advanced probability theory. We will make the simplifying assumption that the density of X is constant at the level of each interval z, z in order to keep the proof manageable. In practice this is an extremely good approximation. Without this assumption, we would need to add an error term similar to that of Theorem 2 to the expression below. This is not conceptually difficult, but is is rather messy, and it would distract from the main aim of the following theorem which is to bound E [err rel (X)], compute E [X.err rel (X)], and show that the covariance between X and err rel (X) is typically of the order of u.
Theorem 4. If the density of X is piecewise constant on intervals z, z , then If the distribution of X is centered (i.e., E [X] = 0) then L is the exact value of the covariance, and it is worth noting that L is fundamentally an artefact of the floating-point representation and is due to the fact that the intervals 2 e , 2 e are not symmetric. More generally, for E [X] of the order of, say, 2, the covariance will be small (of the order of u) as K ≤ 1 (since |x| ≤ 2 e+1 in each summand).
For very large values of E [X] it is worth noting that there is a high chance that L is also be very large, partially canceling E [X]. An illustration of this is given by the doppler benchmark examined in §7, an outlier as it has an input variable with range [20, 20000]. Nevertheless, even for this benchmark the bounds of Theorem 4 still give a small covariance of the order of 0.001.

Error Terms and P-Boxes
In low-precision we can use the exact formula Eq. (6) to compute the error distribution. However, in high-precision, approximations (typically extremely good) like Eqs. (7) and (8) must be used. In order to remain sound in the implementation of our model (see § 6) we must account for the error made by this approximation. We have not got the space to discuss the error made by Eq. (8), but taking the term |R| of Theorem 2 as an illustration, we can use the notion of p-box described in § 3.2 to create an object which soundly approximates the error distribution. We proceed as follows: since |R| bounds the total error accumulated over all t ∈ [−1, 1], we can soundly bound the CDF c(t) of the error distribution given by Eq. (7)  This p-box is used as the (independent) roundoff error term for the probabilistic model of IEEE arithmetic given by Eq. (12), which we describe in § 6.

Symbolic Affine Arithmetic
In this section, we introduce symbolic affine arithmetic, which we employ to generate the symbolic form for the roundoff error that we use in § 6.3. Affine arithmetic [5] is a model for range analysis that extends classic interval arithmetic [37] with information about linear correlations between operands. Symbolic affine arithmetic extends standard affine arithmetic by keeping the coefficients of the noise terms symbolic. We define a symbolic affine form aŝ We call x 0 the central symbol of the affine form, while x i are the symbolic coefficients for the noise terms i . We can always convert a symbolic affine form to its corresponding interval representation. This can be done using interval arithmetic or, to avoid precision loss, using a global optimizer.
Affine operations between symbolic forms follow the usual rules, such as Non-linear operations cannot be represented exactly using an affine form. Hence, we approximate them like in standard affine arithmetic [45]. Sound Error Analysis with Symbolic Affine Arithmetic. We now show how the roundoff errors get propagated through the four basic arithmetic operation. We apply these propagation rules to an arithmetic expression to accurately keep track of the roundoff errors. Since the (absolute) roundoff error directly depends on the range of a computation, we describe range and error together as a pair (range: Symbol, err: Symbolic Affine Form). Here, range represents the infinite-precision range of the computation, while err is the symbolic affine form for the roundoff error in floating-point precision. Unary operators (e.g., rounding) take as input a (range, error form) pair, and return a new output pair; binary operators take as input two pairs, one per operand. For linear operators, the ranges and errors get propagated using the standard rules of affine arithmetic.
For the multiplication, we distribute each term in the first operand to every term in the second operand: (x, err x ) * (y, err y ) = (x*y, x * err y + y * err x + err x * err y ) The output range is the product of the input ranges and the remaining terms contribute to the error. Only the last (quadratic) expression cannot be represented exactly in symbolic affine arithmetic; we bound such non-linearities using a global optimizer. The division is computed as the term-wise multiplication of the numerator with the inverse of the denominator. Hence, we need the inverse of the denominator error form, and then we can proceed as for multiplication. To compute the inverse, we leverage the symbolic expansion used in FPTaylor [42].
Finally, after every operation we apply the unary rounding operator from Eq. (2). The infinite-precision range is not affected by rounding. The rounding operator appends a fresh noise term to the symbolic error form. The coefficient for the new noise term is the (symbolic) floating-point range given by the sum of the input range with the input error form.

Algorithm and Implementation
In this section, we describe our probabilistic model of floating-point arithmetic and how we implement it in a prototype named PAF (for Probabilistic Analysis of Floating-point errors). Fig. 4 shows the toolflow of PAF. captured with this simple grammar: Following [27], we interpret each computation t given by the grammar as a random variable. We define the interpretation map − over the computation tree inductively. The base case is given by z(s, e, k) (−1) s 2 e (1 + k2 −p ) and x i X i , where the real numbers z(s, e, k) are understood as constant random variables and each X i is a random input variable with a user-specified distribution. Currently, PAF supports several well-known distributions out-ofthe-box (e.g., uniform, normal, exponential, beta), and a user can also define custom distributions as piecewise functions. For the inductive case t 1 op m t 2 , we put the lessons from § 4 to work. Recall first the probabilistic model Eq. (3): In § 4.1, we showed that dist should be taken as the distribution of the actual roundoff errors of the random elements (x op y). We therefore define: To evaluate the model of Eq. (12), we first use the appropriate closed-form expression Eqs. (6) to (8) derived in § 4 to evaluate the (distribution of the) random variable err rel ( t 1 op t 2 ) -or the corresponding p-box as described in §4.5. We then use Theorem 4 to justify evaluating the multiplication operation in Eq. (12) independently -that is to say by using Eq. (4) -since the roundoff process is very close to being uncorrelated to the process generating it. The validity of this assumption is also confirmed experimentally by the remarkable agreement of Monte-Carlo simulations with this analytical model (see § 7). We now introduce the algorithm for evaluating the model given in Eq. (12). The evaluation performs an in-order (LNR) traversal of the Abstract Syntax Tree (AST) of a computation given by our grammar, and it feeds the results to the parent level along the way. At each node, it computes the probabilistic range of the intermediate result using the probabilistic ranges computed for its children nodes (i.e., operands). We first determine whether the operands are independent or not (Ind? branch in the toolflow), and we either apply a cheaper (i.e., no SMT solver invocations) algorithm if they are independent (see below) or a more involved one (see § 6.2) if they are not. We describe our methodology at a generic intermediate computation in the AST of the expression.
We consider two distributions X and Y discretized into DS-structures DS X and DS Y ( § 3.2), and we want to derive the DS-structure DS Z for Z = X op Y , op ∈ {+, −, ×, ÷}. Together with the DS-structures of the operands, we also need the traces trace X and trace Y containing the history of the operations performed so far, one for each operand. A trace is constructed at each leaf of the AST with the input distributions and their range. It is then propagated to the parent level and populated at each node with the current operation. Such history traces are critical when dealing with dependent operations since they allow us to interrogate an SMT solver about the feasibility of the current operation, as we describe in the next section. When the operands are independent, we simply use the arithmetic operations on independent DS-structures [3] (see Appendix B).

Computing Probabilistic Ranges for Dependent Operands
When the operands are dependent, we start by assuming that the dependency is unknown. This assumption is sound because the dependency of the operation is included in the set of unknown dependencies, while the result of the operation is no longer a single distribution but a p-box. Due to this "unknown assumption", the CDFs of the output p-box are a very pessimistic over-approximation of the operation, i.e., they are far from each other. Our key insight is to then leverage an SMT solver to prune infeasible combinations of intervals from the input DSstructures, which prunes regions of zero probability from the output p-box. This probabilistic pruning using a solver squeezes together the CDFs of the output p-box, often resulting in a much more accurate over-approximation. Thanks to using a solver, we move from an unknown to a partially known dependency between the operands. Currently, PAF supports the Z3 [13] and dReal [19] SMT solvers.
Algorithm 1 shows the pseudocode of our algorithm for computing the probabilistic output range (i.e., DS-structure) for dependent operands. When dealing with dependent operands, interval arithmetic (line 5) might not be as precise as in the independent case. Hence, we use an SMT solver to prune away any over-approximations introduced by interval arithmetic when computing with dependent ranges (line 6); this use of the solver is orthogonal to the one dealing with probabilities. On line 7, we check with an SMT solver whether the current combination of ranges [x 1 , x 2 ] and [y 1 , y 2 ] is compatible with the traces of the operands. If the query is satisfiable, the probability is strictly greater than zero but currently unknown (line 8). If the query is unsatisfiable, we assign a probability of zero to the range in DS Z (line 10). Finally, we append a new range to the DS-structure DS Z (line 11). Note that the loops are independent, and hence in our prototype implementation we run them in parallel. After this algorithm terminates, we still need to assign probability values to all the unknown-probability ranges in DS Z . Since we cannot assign an exact value, we compute a range of potential values [p zmin , p zmax ] instead. This computation can be encoded as a linear programming routine exactly as described in previous work [3] (see Appendix B.1).

Computing Conditional Roundoff Error
The final step of our toolflow computes the conditional roundoff error by combining the symbolic affine arithmetic error form of the computation (see §5) with the probabilistic range analysis described above. The symbolic error form gets maximized conditioned on the results of all the intermediate operations landing in the given confidence interval (e.g., 99%) of their respective ranges (computed as described in the previous section). Note that conditioning only on the last operation of the computation tree (i.e., the AST root) would lead to extremely pessimistic over-approximation since all the outliers in the intermediate operations would be part of the maximization routine. This would lead to our tool PAF computing pessimistic error bounds typical of worst-case analyzers.
Algorithm 2 shows the pseudocode of the roundoff error computation algorithm. The algorithm takes as input a list DSS of DS-structures (one for each intermediate result range in the computation), the generated symbolic error form, and a confidence interval. It iterates over all the intermediate DS-structures (line 3), and for each it determines the ranges needed to support the chosen confidence intervals (lines 4-12). In each iteration, it first sorts the list of range-probability pairs (i.e., focal elements) of the current DS-structure by their probability value in a descending order (line 4). This is a heuristic that prioritizes the focal elements with most of the probability mass (aka typicals) to avoid including the unlikely outliers (with low probability) that cause large roundoff errors into the final roundoff error computation. With the help of an accumulator (line 8), we keep collecting focal elements (line 9) until the accumulated probability satisfies the confidence interval (line 10). Finally, we maximize the error form condi-Algorithm 2 Conditional Roundoff Error Computation 1: function cond err(DSS, errorF orm, conf idence) 2: allRanges = list() 3: for all DSi ∈ DSS do 4: f ocals = sorted(DSi, key = prob, order = descending) 5: accumulator = 0 6: ranges = Ø 7: for all ([x1, x2], px) ∈ f ocals do 8: accumulator = accumulator + px 9: ranges = ranges ∪ [x1, x2] 10: if accumulator ≥ conf idence then 11: allRanges.append(ranges) 12:

Experimental Evaluation
We evaluate PAF on the standard FPBench benchmark suite [7,17] that uses the four basic operations we currently support {+, −, ×, ÷}. Many of these benchmarks were also used in recent related work [32] that we compare against. The benchmarks come from a variety of domains: embedded software (bsplines), linear classifications (classids), physics computations (dopplers), filters (filters), controllers (traincars, rigidBody), polynomial approximations of functions (sine, sqrt), solving equations (solvecubic), and global optimizations (trids). Since FP-Bench has been primarily used for worst-case roundoff error analysis, the benchmarks come with ranges for input variables, but they do not specify input distributions. We instantiate the benchmarks with three well-known distributions for all the inputs: uniform, standard normal distribution, and double exponential (i.e. Laplace) distribution with σ = 0.01 which we will call 'exp'. The normal and exp distributions get truncated to the given range. We assume single-precision floating-point format for all operands and operations, but it is straightforward to extend PAF to mixed-precision computations.
To assess the accuracy and performance of PAF, we compare it with PrAn [32], the current state-of-the-art tool for automated analysis of probabilistic roundoff errors. PrAn currently supports only uniform and normal input distributions. It offers six different tool configurations; for each benchmark, we run all of them and report the best result. We fix the number of intervals in each discretization to 50 to match PrAn. We choose 99% as the confidence interval for the computation of our conditional roundoff error ( § 6.3) and of PrAn's probabilistic error. We also compare our probabilistic error bounds against FPTaylor which performs worst-case roundoff error analysis, and hence it does not take into account Table 2: Roundoff error bounds reported by PAF, PrAn, and FPTaylor given uniform (uni), normal (norm), and Laplace (exp) input distributions. We set the confidence interval to 99% for PAF and PrAn, and mark the smallest reported roundoff errors for each benchmark in bold. Asterisk (*) highlights a difference of more than one order of magnitude between PAF and FPTaylor.
In Table 2, of particular interest are benchmarks (6 for normal and 10 for exp) where the error bounds generated by PAF for the 99% confidence interval are at least an order of magnitude tighter than the worst-case bounds generated by FPTaylor. For such a benchmark and input distribution, PAF's results inform a user that there is an opportunity to optimize the benchmark (e.g., by reducing precision of floating-point operations) if their use-case can handle at most 1% of inputs generating roundoff errors that exceed a user-provided bound. FPTaylor's results, on the other hand, do not allow for a user to explore such fine-grained trade-offs since they are worst-case and do not take probabilities into account.
In general, we see a gradual reduction of the errors transitioning from uniform to normal to exp. When the input distributions are uniform, there is a significant chance of generating a roundoff error of the same order of magnitude as the worstcase error, because all the inputs, including the extrema, are equally likely. The standard normal distribution concentrates more than 99% of probability mass in the interval [−3, 3], thus resulting in the long tail phenomenon, where less than 0.5% of mass spreads in the interval [3, ∞]. When the normal distribution gets truncated in a neighborhood of zero (e.g., [0, 1] for bsplines and filters) nothing changes with respect to the uniform case -there is still a high chance of committing errors close to the worst-case. However, when the normal distribution gets truncated in a wider range (e.g., [−100, 100] for trids) and we do have long tails, then the outliers causing large errors are very rare events, not included in the 99% confidence interval. The exponential distribution further compresses the 99% probability mass in the tiny interval [−0.01, 0.01], so the long tails effect is common among all the benchmarks.
The runtimes of PAF vary between 10 minutes for small benchmarks, such as bsplines, to several hours for benchmarks with more than 30 operations, such as trid4 ; they are always less than two hours, except for trids with 11 hours and filters with 6 hours. The runtime of PAF is usually dominated by Z3 invocations, and the long runtimes are caused by numerous Z3 timeouts that the respective benchmarks induce. The runtimes of PrAn are comparable to PAF since they are always less than two hours, except for trids with 3 hours, sqrt with 3 hours, and sine with 11 hours. Note that neither PAF nor PrAn are memory intensive, and hence memory consumption is not an issue.
To assess the quality of our rigorous (i.e., sound) results, we implement Monte Carlo sampling to generate both roundoff error and output range distributions. The procedure consists of randomly sampling from the provided input distributions, evaluating the floating-point computation in both the specified and highprecision (e.g., double-precision) floating-point regimes to measure the roundoff error, and finally partitioning the computed errors into bins to get an approximation (i.e., histogram) of the PDF. Of course, Monte Carlo sampling does not provide rigorous bounds, but is a useful tool to assess how far the rigorous bounds computed statically by PAF are from an empirical measure of the error. Fig.5 shows the effects of the input distributions on the output and roundoff error ranges of the traincars3 benchmark. In the error graphs (right column), we show the Monte Carlo sampling evaluation (yellow line) together with the error bounds from PAF with 99% confidence interval (red plus symbol) and FPTaylor's worst-case bounds (green crossmark). In the range graphs (left column), we also plot PAF's p-box over-approximations. We can observe that in the case of uniform inputs the computed p-boxes overlap at the extrema of the output range. This phenomenon makes it impossible to distinguish between 99% and 100% confidence intervals, and hence as expected the bound reported by PAF is almost identical to FPTaylor's. This is not the case for normal and exponential distributions, where we can observe the long tail phenomenon, and PAF can significantly improve both the output range and error bounds over FPTaylor. Hence, we again illustrate how pessimistic the bounds from worst-case tools can be when the information about the input distributions is not taken into account. Finally, the graphs illustrate how the rigorous p-boxes and error bounds from PAF follow their respective empirical estimations, showing that PAF adjusts them based on the shape of the input distribution.

Related Work
Our work draws inspiration from the seminal probabilistic affine arithmetic approach of Bouissou et al. [3]. The fundamental object in this approach is the probabilistic affine form, which is an affine form whose error symbols are associated with p-boxes (see § 3.2). The authors propose an innovative technique to perform dependent operations between random variables based on affine arithmetic. However, their approach can only detect affine dependencies between the operands. On the other hand, in PAF, we detect dependencies between operands using an SMT solver. Our approach allows us to handle not only polynomial dependencies, but also any non-linearities the solver supports. Finally, their approach only computes output ranges, while we went a step further and can compute roundoff error bounds as well, which is a non-trivial extension.
The most similar approach to our work is the recent static probabilistic roundoff error analysis called PrAn [32]; to the best of our knowledge, this is also the only work apart from this paper that presents a rigorous and general probabilistic roundoff error analysis. PrAn builds on the probabilistic affine arithmetic of Bouissou et al. [3], and inherits the same limitations in dealing with dependent operations we described above. Like us, their method hinges on a discretization scheme that builds p-boxes for both the input and error distributions and propagates them through the computation. The question of how these p-boxes are chosen is left open. In contrast, we take the input variables to be random variables that the user specifies and show how the distribution of each error terms can be computed directly and exactly from the random variable generating it (see § 4). Furthermore, unlike PrAn, PAF leverages the noncorrelation between random variables and the corresponding error distribution we described in § 4.4. The immediate consequence is that PAF performs the rounding in Eq. (3) as an independent operation, thus without any uncertainty. Putting these together leads to PAF computing tighter probabilistic roundoff error bounds than PrAn, as our experimental results show (see § 7).
The idea of using a probabilistic model of rounding errors to analyze deterministic computations can be traced back to [48]. Parker's so-called 'Monte Carlo arithmetic' [38] is probably the most detailed description of this approach. We, however, consider probabilistic computations, and are therefore able to derive the error distribution from first principles. For this reason, the famous critique of the probabilistic approach to roundoff errors [25] does not apply to this work, since it only considers deterministic computations. When dealing with probabilistic computations however, everything -including rounding errors -has a probabilistic behavior and the probabilistic approach is therefore a necessity.
More recently, probabilistic roundoff error models have been investigated by Higham and Mary [23] as well as Ipsen and Zhou [24]. Interestingly, because these authors are interested in large-dimensional problems, neither approach needs to explicitly specify the distribution of errors. However, this means that these approaches are only applicable to large-dimensional problems, and are completely unsuited for the domains (e.g., control systems, filters, cyber-physical systems) captured by the FPBench benchmark suite. Finally, unlike us, neither approach is sound since they rely on concentration of measure inequalities.
Unlike probabilistic roundoff error analysis, worst-case analysis of roundoff errors has been an active research area with numerous published approaches [12,31,9,14,34,46,8,43,42,10,33,29,18,11]. As expected, none of them can deal with probabilistic inputs. Affine arithmetic [5] is a well-known technique that extends interval arithmetic [37] with information about linear correlations between operands [45]. Rigorous affine arithmetic [35] takes into consideration numerical errors stemming from the use of floating-point arithmetic when computing with affine forms, and it has been implemented in Rosa [10] to perform rigorous worst-case roundoff error analysis. Our symbolic affine arithmetic (see § 5) used in PAF evolved from rigorous affine arithmetic by keeping the coefficients of the noise terms symbolic, which often leads to improved precision. These symbolic terms are very similar to the first-order Taylor approximations of the roundoff error expressions used in FPTaylor [43,42]; our experimental results also support this since the bounds generated by PAF for the 100% confidence interval are almost always equal to the worst-case bounds computed by FPTaylor.

Conclusions and Future Work
In this paper, we presented our approach to computing rigorous (i.e., sound) probabilistic roundoff error bounds for arithmetic floating-point computations involving random variables. First, we showed how to explicitly compute the error distribution of floating-point arithmetic operations directly from the distributions of the input random variables. Then, we demonstrated how the random variable and the corresponding error distribution are close to being uncorrelated. We leverage this to be able to rigorously compute operations between random floating-point variables, and thus perform rigorous probabilistic range analysis. We showed how to do this even in the complex case of dependent operands using our novel combination of p-boxes and SMT solving. We use our probabilistic range analysis to compute conditional roundoff errors where, as opposed to the worst-case approach, the (symbolic) error expression gets maximized constrained to the output landing in a given confidence interval of interest (e.g., 99%). We implemented our approach in a prototype tool named PAF, and we compared PAF on a popular benchmark suite with state-of-the-art tools for probabilistic as well as worst-case error analysis. Our results show that PAF almost always outperforms the state-of-the-art probabilistic analysis tool PrAn in terms of generating tighter, but still rigorous, error bounds. Moreover, we observe that worst-case roundoff errors can be very pessimistic in some cases, and that PAF can reduce such error bounds by several orders of magnitude.
As future work, we plan to explore the potential use of our probabilistic range analysis to perform probabilistic overflow detection. Together with a warning about a potential overflow, a user would also get from PAF a rigorous bound on the probability of the overflow happening. This would allow users to perform a more detailed risk analysis in order to decide whether a mitigation effort is needed. We envision a similar use case for the probabilistic division-by-zero detection. Finally, in our motivating example (see § 2), we performed manual probabilistic precision tuning to further drive the point that probabilistic error analysis is needed in many settings. In fact, PAF can be used as a back-end roundoff error analyzer as a part of an existing precision tuner (such as FP-Tuner [4]). In such a setup, the confidence interval becomes a new key parameter in the precision tuning process, thereby allowing for programmers to explore a richer space of trade-offs. We plan to explore this line of work in the future.
Using the σ-additivity of measures (see §3.2) we get the following density for dist c : where F + 0,∞ (resp. F − 0,−∞ ) denotes the strictly positive (resp. negative) finite floating-point representable numbers. Since X is described by a probability density function f : R → R, we get: Using this lemma we can prove a result relating the length τ (z) z − z of the interval corresponding to a representable number z ∈ F with its absolute value |z|. Note that z − z is always a positive quantity, hence the relation with |z|. The following result will be used heavily in the proof of Theorem 2 . Again, the proof can be obtained by direct computation.
Finally, we will need the following technical lemma which can also be shown by direct computation.
Lemma 3. If z = z(s, e, k) with e = e min , e max then otherwise Note in particular that z 1−tu ∈ z, z whenever |t| ≤ 1 2 . We are now ready to prove the theorem of § 4.2. Proof of Theorem 2: We start by plugging the coefficients of Prop. 2 into Eq. (6): We start by considering the case where |t| ≤ since |t| ≤ 1.
necessary, but it make the proof a lot less cumbersome) we get: where S(k, p, t) is an error term quantifying the error made by the numerical integration scheme evaluating P [Round(X) = z(s, e, k)]) as the sum in the first step of the derivation above. It can be quantified in the same way as in the proof of Theorem 2. Crucially lim p→∞ S(k, p, t) = 0 for every k, t. We thus get that for |t| < 1 /2 lim p→∞ d(t) = lim As shown by Lemma 3, when 1 2 < t ≤ 1 not all mantissas in the sum of Eq. (6) are possible,. Using the explicit characterisation of Lemma 3, we get that for 1 /2 < |t| ≤ 1 (16) becomes The case where −1 ≤ t < − 1 2 is treated in the same way and yields the same asymptotic distribution. Combining 16 and 17 we get Proofs for § 4.4.
We assume throughout this section that err rel (X) is distributed according to d hp in Eq. (7). We start by bounding E [err rel (X)]. The proof of the following Proposition is a little technical but not conceptually difficult and can be found in the Appendix. In particular E [X.err rel (X)] = 0 if f is symmetric. More generally, the value of E [X.err rel (X)] is determined by the competing terms 2 2e and u 2 . A quick calculation shows that as long as the distribution f assigns most of its mass to values below 2 p

B.1 Linear Programming Routine
We can create a linear programming (LP) routine to delineate upper bound and lower bounds for the p-box. In the following we report the pseudo-code for the evaluation of the upper bound of a p-box. The lower bound is similar. The probabilities in the DS structures of the operands (p yi , p xi ) are called the marginals. The focal elements we use to populate DS Z are called the insiders (p zi,j ). The constraints in the LP program force the insiders to have a probability between 0 and 1, and they relate the insiders with the marginals. The insiders have to sum up to the marginals. This is very similar to a so called joint table.
The LP program takes in input an evaluation point (evp) and returns a probability value. In order to construct the DS Z exactly we should pick one evaluation point per focal element. This has quadratic complexity. A good tradeoff between accuracy and execution time consists in picking only N out of the N 2 evaluation points (e.g. using some heuristics), at the price of a slightly overapproximation. We run the linear programming routines in parallel, because the analysis of a single evaluation point is completely independent from the others. Table 3: Comparison between PAF, given uniform (uni), normal (norm) and exponential (exp) input distributions, and FPTaylor. The PAF columns report 99% of the support of the output range distribution. The FPTaylor columns report the worst-case output ranges. The asterisk (*) highlights a difference of more than one order of magnitude between PAF and FPTaylor.