1 Introduction

Basic geometric predicates, such as computing the orientation of a triangle or testing if a point is inside a circle, are at the core of many computational geometry algorithms such as convex hull, Delaunay triangulation and mesh generation [5]. Interestingly, those predicates also appear in geospatial computations such as topological spatial relations that determine the relationship among geometries. Those operations are fundamental in many Geographic Information System (GIS) applications. If evaluated with floating-point arithmetic, these computations can incur round-off errors that can, due to the ill-conditioning of discrete decisions, lead to incorrect results and inconsistencies, causing computations to fail [20].

Among other applications, Delaunay triangulations are important for the construction of triangular meshes [19, 31] and Triangulated Irregular Networks (TIN) [21]. Predicate failures in the underlying Delaunay triangulation may lead to suboptimal mesh quality and cause invalid triangulations or termination failure [30].

Robust geometric predicates can also be used in spatial predicates to guarantee correct results for floating-point geometries. Spatial predicates determine the relationship between geometries and have applications in spatial databases and GIS applications. Examples of such predicates include intersects, crosses, touches, or within. With non-robust spatial predicates, for example, a point that lies close to the shared edge of two triangles can be found to lie within both triangles or within neither, which is not only incorrect but also inconsistent and violates basic assumptions on partitioned spaces.

Exact computations can guarantee correct results for floating-point input but are too slow for many practical purposes. Since predicates are usually ill-conditioned only on a set of measure zero and extremely well-conditioned everywhere else, an adaptive evaluation can improve average performance by using exact arithmetic only if an a priori error estimate cannot guarantee correctness of the faster, approximate computation. In other words, the expensive computations are filtered out by using those error estimates.

Now, the main question is how difficult it is to compute those error estimates. There are several approaches, each providing a different trade-off between efficiency and accuracy of error estimation. The three main types of filters are static, semi-static and dynamic.

In the first case, the error bound is pre-computed very efficiently using a priori bounds on the input, but it typically attains only low accuracy. In semi-static filters, the error estimation depends on the input. They are somewhat slower than static filters but improve on the accuracy and require no a priori bounds on the input. The slowest and most accurate are dynamic filters, which use floating-point interval arithmetic to better control the error and achieve fewer filter failures.

Previous work. Many techniques have been proposed in the past for efficient and robust arithmetic. In his seminal paper [30], Shewchuk introduced robust, adaptive implementations of orientation, incircle and insphere predicates that can be used, for example, in the construction of Delaunay triangulations. They use a sequence of semi-static filters of ever-increasing accuracy. The phases are attempted in order, each phase building on the result of the previous one until the correct sign is obtained. Efficient dynamic filters, on the other hand, are proposed in [6]. For Delaunay triangulations, the authors of [11] propose a set of efficient static and semi-static filters and experimentally compare them with several alternatives including [30]. Meyer and Pion develop FPG [24], a general-purpose code analyzer and generator for static filtered predicates. The generated filters, however, include multiple branch instructions, which was found in [26] to cause suboptimal performance.

Nanevski et al. extend Shewchuk’s method to arbitrary polynomial expressions and implement an expression compiler that takes a function and produces a predicate, consisting of semi-static filters and an exact stage that computes the sign of the source function at any given floating point arguments [25]. Their filters, however, are not robust with respect to overflow or underflow.

In [8], Burnikel et al. present EXPCOMP, a C++ wrapper class and an expression compiler that generate fast semi-static filters for predicates involving the operations \(+, -, \cdot , /, \sqrt{\cdot }\); the supported expressions include arbitrary polynomials, and the generated filters handle all kinds of floating-point exceptions. In their benchmarks, they found a 25-30% runtime overhead of their C++ wrapper class compared with their expression compiler, and their error bound constants are comparatively pessimistic (see Subsection 4.3 for an example).

More recently, Ozaki et al. developed an improved static filter as well as a new semi-static filter for the 2D orientation predicate, where the latter also handles floating-point exceptions such as overflow and underflow [26]. This approach yields a close-to-optimal error bound constant; however, it is not designed for arbitrary polynomial predicates.

Regarding non-linear geometries, there is work on filters for circular arcs [10]. Moreover, robust predicates could be extended to provide robust constructions such as points of intersection of linestrings [2]. Recently, GPU implementations of robust predicates have been presented, providing a constant (3 to 4 times) speedup over standard CPU implementations [9, 27].

In [13], the authors employ dynamic determinant computations to speed up the computation of sequences of determinants that appear in high-dimensional (typically more than 6) geometric algorithms such as convex hull and volume computation.

In [3], the authors present a C++ metaprogramming framework that produces fast, robust predicates and illustrate how GIS applications can benefit from it.

Our contribution. The contribution of this paper is three-fold. First, we present an algorithm that generates semi-static or static filters for robust predicates based on arbitrary polynomials. These filters are shown to be valid for all input numbers, regardless of range issues such as overflow or underflow. They also require only a single comparison and can therefore be evaluated encountering only a single, easy-to-predict branch. To the best of our knowledge, this is the first filter design combining generality, range robustness, and branch efficiency.

Second, we present a new implementation based on C++ meta-programming techniques that produces fast, robust code at compile-time for predicates. It is extensible, based on the C++ library Boost.Geometry [14] and publicly available at [4].

The main advantage of our implementation is the ability to automatically generate filters for arbitrary polynomial predicates without relying on external code generation tools. In addition, it can be complemented seamlessly with manual handcrafted filters, as illustrated by the use of our axis-aligned filter for the incircle predicate (see example 8).

Last, we perform an experimental analysis of the generated filters as well as a comparison with the state of the art. We perform benchmarks for 2D Delaunay triangulation, 3D polygon mesh processing and 3D mesh refinement. The algorithms tested in the benchmarks make use of four geometric predicates of different complexity. We show that our predicates outperform the state-of-the-art libraries [7, 29] in all benchmark cases, which include both synthetic and real data. Unlike Burnikel et al. [8], we find no performance penalty for our C++ implementation compared to generated code.

2 Robust geometric predicates

In this section we review the basic concepts and notation necessary for presenting our filter design in Sect. 3 and implementation approach in Sect. 4.1.

2.1 Geometric predicates and robustness issues

In the context of this paper, we define geometric predicates to be functions that return discrete answers to geometric questions based on evaluating the sign of a polynomial. One example is the planar orientation predicate. Given three points \(a,b,c \in \mathbb {R}^2\), it determines the location of c with respect to the straight line going through a and b by evaluating the sign of

$$\begin{aligned} p_{\text {orientation}\_2}\left( a,b,c\right) :=\begin{vmatrix} a_{x}-c_{x}&a_{y}-c_{y}\\ b_{x}-c_{x}&b_{y}-c_{y} \end{vmatrix} \end{aligned}$$
(1)

For this definition of the orientation predicate, positive, zero, and negative determinants correspond to the locations left of the line, on the line and right of the line, respectively. This geometric predicate has applications in the construction of Delaunay triangulations, convex hulls, and in spatial predicates such as within for 2D points, lines or polygons.
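For concreteness, a direct, non-robust C++ translation of this predicate in double precision might look as follows (a minimal sketch; the function name and sign convention are ours):

```cpp
// Naive double-precision evaluation of the orientation determinant (1).
// Returns +1 if c lies left of the line through a and b, 0 if the points
// appear collinear and -1 if c lies right of the line.
int orientation2d_naive(double ax, double ay, double bx, double by,
                        double cx, double cy) {
    double det = (ax - cx) * (by - cy) - (ay - cy) * (bx - cx);
    return (det > 0.0) - (det < 0.0);
}
```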

While expression (1) always gives the correct answer in real arithmetic, this is not necessarily the case for floating-point arithmetic.

Definition 1

(Floating-Point Number System) For a given precision \(p \in \mathbb {N}_{\ge 2}\) and minimum and maximum exponents \(e_{\min }, e_{\max }\in \mathbb {Z}\) we define by

$$\begin{aligned} N_{p,e_{\min },e_{\max }} :=\{ \left( {-}1\right) ^\sigma \left( 1 {+} \sum _{i=1}^{p-1} b_i 2^{-i} \right) 2^e \mid \sigma , b_1,\ldots ,b_{p-1} \in \{ 0, 1 \}, e_{\min } {\le } e {\le } e_{\max } \} \end{aligned}$$

the set of normalised binary floating-point numbers and by

$$\begin{aligned} S_{p,e_{\min }} :=\{ \left( -1\right) ^\sigma \left( \sum _{i=1}^{p-1} b_i 2^{-i} \right) 2^{e_{\min }} \mid \sigma , b_1,\ldots ,b_{p-1} \in \{ 0, 1 \} \} \end{aligned}$$

the set of subnormal binary floating-point numbers.

For the remainder we will drop the parameters in the subscript. A binary Floating-Point Number system (FPN) is defined by \(F :=N \cup S \cup \{ -\infty ,\infty ,\text {NaN} \}\). For a number \(a \in F\) given in the representation

$$\begin{aligned} \left( -1\right) ^\sigma \left( 1 + \sum _{i=1}^{p-1} b_i 2^{-i} \right) 2^e \text { or } \left( -1\right) ^\sigma \left( \sum _{i=1}^{p-1} b_i 2^{-i} \right) 2^{e_{\min }}, \end{aligned}$$

we call the tuple \(\left( b_1, \ldots , b_{p-1}\right) \) the significand. It is sometimes called mantissa in the literature. The significand is called even if \(b_{p-1} = 0\).

Definition 2

(Rounding function) For a given FPN F we define the rounding-function \(\text {rd}:\mathbb {R}\rightarrow F\) as follows

$$\begin{aligned} \text {rd}\left( a\right) :={\left\{ \begin{array}{ll} -\infty &{} a\le -2^{e_{\max }}\left( 2-2^{-p}\right) \\ \text {closest number to a in F} &{} -2^{e_{\max }}\left( 2-2^{-p}\right)<a<2^{e_{\max }}\left( 2-2^{-p}\right) \\ \infty &{} a\ge 2^{e_{\max }}\left( 2-2^{-p}\right) \\ \end{array}\right. } \end{aligned}$$

If there are two nearest numbers in F, the one with an even significand is chosen.

Remark 1

The above definition of subnormal numbers includes zero while zero is neither a normal nor subnormal number in the IEEE standard 754-2008 [18]. This deviation from the standard does not affect the rounding error analysis in this paper.

Next, we define some special quantities. By \(\varepsilon :=2^{-p}\) we denote the machine epsilon, which is half of the difference between 1.0 and the next number in F, by \(u_{N}:=2^{e_{\min }}\) the smallest positive normalized number in F and by \(u_{S}:=2^{e_{\min }-p+1}=2\cdot \varepsilon \cdot u_{N}\) the smallest positive subnormal number in F.

Definition 3

(Floating-point operations) For a given FPN F and \(a, b, c \in F \cap \mathbb {R}\) we define the floating-point operator \(\circledcirc : F\times F \rightarrow F\) for each \(\circ \in \{+,-,\cdot \}\) as

$$\begin{aligned} a \circledcirc b :=\textrm{rd}\left( a \circ b\right) . \end{aligned}$$

and the Fused Multiply-Add (FMA) operator

$$\begin{aligned} \textrm{FMA}\left( a,b,c\right) :=\textrm{rd}\left( ab+c\right) \end{aligned}$$

If a floating-point multiplication of two non-zero numbers or an FMA operation with \(ab+c\ne 0\) produces a zero, or if any operation produces a subnormal number, this is called an underflow. If the result of an operation is \(\infty \) or \(-\infty \), this is called an overflow. These definitions are extended to arguments with NaN by setting the result to NaN and to infinities in the natural way, with the following special cases set to NaN: \(\infty \oplus -\infty \), \(-\infty \ominus -\infty \), \(\infty \ominus \infty \), \(\pm \infty \odot 0\).

This definition is consistent with the IEEE standard 754-2008 [18] with the default rounding mode “roundTiesToEven”. In [28], a number of error estimates for floating-point operations are given. In the following, we use, for \(a,b,c \in F \cap \mathbb {R}\), the unit in the first place,

$$\begin{aligned} \textrm{ufp}\left( a\right) :={\left\{ \begin{array}{ll} 0 &{} a=0\\ 2^{\left\lfloor \log _{2}\left| a\right| \right\rfloor }, &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

It represents the value of the first digit in the significand of a number in floating-point representation (it can be defined the same way for numbers in \(\mathbb {R}\)). If no overflow occurs, it holds that

$$\begin{aligned} \left|a \circledcirc b - a \circ b\right|\le \varepsilon \cdot \textrm{ufp}\left( a \circledcirc b \right) \le \varepsilon \left|a \circledcirc b \right|. \end{aligned}$$
(2)

In the case of underflow, addition and subtraction are exact. For multiplication, assuming no overflow occurs, [28] gives the error bound

$$\begin{aligned} \left| a\odot b-ab\right| =\hat{\varepsilon }\cdot \textrm{ufp}\left( a\odot b\right) +\eta \end{aligned}$$

for some \(\hat{\varepsilon }, \eta \in \mathbb {R}\) such that \(\hat{\varepsilon }\le \varepsilon \), \(\eta \le \frac{1}{2}u_S\) and \(\hat{\varepsilon }\eta =0\). If no underflow occurs, i.e. \(\left|a\odot b\right| \ge u_N=\frac{1}{2}\varepsilon ^{-1}u_S\), then this implies

$$\begin{aligned} \left|a\odot b-ab\right|\le \varepsilon \left|a\odot b\right|, \end{aligned}$$

otherwise (if underflow occurs)

$$\begin{aligned} \left| a\odot b-ab\right| \le \frac{1}{2} u_S, \end{aligned}$$

and regardless of underflow

$$\begin{aligned} \left|a\odot b-ab\right|\le \varepsilon \left( \left|a\odot b\right|\oplus u_N\right) . \end{aligned}$$
(3)

We will use similar error bounds for FMA. Assuming no overflow or underflow it holds that

$$\begin{aligned} \left|\textrm{FMA}\left( a,b,c\right) -\left( ab+c\right) \right|\le \varepsilon \left|\textrm{FMA}\left( a,b,c\right) \right|, \end{aligned}$$
(4)

if underflow occurs then

$$\begin{aligned} \left|\textrm{FMA}\left( a,b,c\right) -\left( ab+c\right) \right|\le \frac{1}{2} u_S, \end{aligned}$$

and regardless of whether underflow occurs

$$\begin{aligned} \left|\textrm{FMA}\left( a,b,c\right) -\left( ab+c\right) \right|\le \varepsilon \left( \left|\textrm{FMA}\left( a,b,c\right) \right|\oplus u_N\right) . \end{aligned}$$
(5)

Common examples of floating-point number systems include the binary FPN with \(p = 24, e_{\min } = -126, e_{\max } = 127\), called single-precision or FP32, and the binary FPN with \(p = 53, e_{\min } = -1022, e_{\max } = 1023\), called double-precision or FP64.
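These quantities can be cross-checked against the constants exposed by the C++ standard library; a small sketch for FP64 (naming is ours):

```cpp
#include <cassert>
#include <cmath>
#include <limits>

int main() {
    // FP64: p = 53, e_min = -1022, e_max = 1023.
    const double eps = std::ldexp(1.0, -53);    // machine epsilon 2^{-p}
    const double u_N = std::ldexp(1.0, -1022);  // smallest positive normalized number
    const double u_S = std::ldexp(1.0, -1074);  // smallest positive subnormal number

    // std::numeric_limits<double>::epsilon() is the distance from 1.0 to the
    // next number in F, i.e. twice the machine epsilon as defined above.
    assert(2.0 * eps == std::numeric_limits<double>::epsilon());
    assert(u_N == std::numeric_limits<double>::min());
    assert(u_S == std::numeric_limits<double>::denorm_min());
    assert(u_S == 2.0 * eps * u_N);  // the identity u_S = 2 * eps * u_N
}
```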

Remark 2

The requirements of the previous definitions are met by IEEE 754-conforming binary floating-point number systems which include the native single- and double-precision floating-point types of the architectures x86, x86-64, current ARM, common virtual machines running WebAssembly and current CUDA processors. The machine epsilon is sometimes defined as the difference between 1.0 and the next number in F.

We call

$$\begin{aligned} \tilde{p}_{\text {orientation}\_2}\left( a,b,c\right) :=&\left( a_{x}\ominus c_{x}\right) \odot \left( b_{y}\ominus c_{y}\right) \ominus \nonumber \\&\left( a_{y}\ominus c_{y}\right) \odot \left( b_{x}\ominus c_{x}\right) \end{aligned}$$
(6)

a floating-point realisation of (1). Due to rounding errors, this realisation can produce incorrect results.

As an example, consider the points \(a=\left( -0.01, -0.59\right) \), \(b=\left( 0.01, 0.57\right) \), \(c=\left( 0,-0.01\right) \). In real arithmetic, c lies on the straight line through a and b. Their closest approximations in \(F_{53,-1022,1023}\) (IEEE 754-2008 binary64 [18], or FP64 for short), \(\tilde{a}, \tilde{b}, \tilde{c}\), however, are only very close to collinear, which makes the case sensitive to rounding errors.

As a second example, let us evaluate the spatial relationship between the point c and the closed triangles \(\tilde{t}_1 :=\{\left( -1, 0 \right) , \tilde{a}, \tilde{b} \}\) and \(\tilde{t}_2 :=\{ \left( 1, 0 \right) , \tilde{b}, \tilde{a} \}\) using the winding-number algorithm [32].

Table 1 Relationships of point \(c=\left( 0,-0.01\right) \) to polygon \(\tilde{t}_1 :=\{\left( -1, 0 \right) , \tilde{a}, \tilde{b} \}\) and \(\tilde{t}_2 :=\{ \left( 1, 0 \right) , \tilde{b}, \tilde{a} \}\), where \(a=\left( -0.01, -0.59\right) \), \(b=\left( 0.01, 0.57\right) \)

Table 1 summarizes the results, all compiled with GCC 11.1 and optimization level O2. The first row is particularly noteworthy because the results are not only incorrect but also mutually contradictory. The final row can be obtained using any implementation of the orientation predicate that guarantees correct results, such as the implementation of Shewchuk [29] or CGAL’s kernels epick or epeck [7].

Remark 3

The difference between the architectures is due to GCC producing assembly code that uses the FMA instruction for evaluating (1). This instruction causes a loss of antisymmetry of the difference: \(a \odot b\ominus c \odot d = - (c \odot d\ominus a \odot b)\) holds if no range errors occur, but \(\text {FMA}\left( a,b,-c \odot d\right) = -\text {FMA}\left( c,d,-a \odot b\right) \) is not necessarily true. When inserted into the orientation predicate, this can lead to situations in which swapping two input points does not reverse the sign of the result.

Inconsistencies can occur without FMA as well. Consider \(\tilde{a}, \tilde{b}, \tilde{d} :=\left( \text {rd}\left( 0.15\right) , \text {rd}\left( 8.69 \right) \right) \) and \(\tilde{e} :=\left( \text {rd}\left( 0.07\right) , \text {rd}\left( 4.05 \right) \right) \). The floating-point realisation (6), compiled without FMA-optimizations, will determine \(\tilde{a}, \tilde{b}, \tilde{e}\) and \(\tilde{b}, \tilde{d}, \tilde{e}\) to be collinear but not \(\tilde{a}, \tilde{b}, \tilde{d}\), which is a contradiction.

Besides rounding errors, incorrect predicate results can also be caused by overflow or underflow. It can be easily checked that, in the FP64 number system,

$$\begin{aligned} \tilde{p}_{\text {orientation}\_2}\left( \left( 2^{-801}, 2^{-801}\right) ,\left( 2^{-800}, 2^{-800}\right) ,\left( 2^{-801}, 2^{-800}\right) \right) = 0, \end{aligned}$$

due to underflow, and

$$\begin{aligned} \tilde{p}_{\text {orientation}\_2}\left( \left( 2^{800}, 2^{800}\right) ,\left( 2^{800}, 2^{800}\right) ,\left( 0, 0\right) \right) = \text {NaN}, \end{aligned}$$

due to overflow.
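Both failure modes are easy to reproduce; a minimal sketch (ours), to be compiled with floating-point contraction disabled (e.g. -ffp-contract=off) so that (6) is evaluated literally:

```cpp
#include <cmath>
#include <cstdio>

// Floating-point realisation (6) of the 2D orientation determinant.
double orient2d_fl(double ax, double ay, double bx, double by,
                   double cx, double cy) {
    return (ax - cx) * (by - cy) - (ay - cy) * (bx - cx);
}

int main() {
    const double s = std::ldexp(1.0, -801);  // 2^{-801}
    const double t = std::ldexp(1.0, 800);   // 2^{800}
    // Underflow: the exact determinant is 2^{-1602} > 0, but 0 is printed.
    std::printf("%g\n", orient2d_fl(s, s, 2 * s, 2 * s, s, 2 * s));
    // Overflow: both products round to infinity and inf - inf gives NaN.
    std::printf("%g\n", orient2d_fl(t, t, t, t, 0.0, 0.0));
}
```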

Different approaches have been developed to obtain consistent results. We briefly discuss arbitrary precision arithmetic and floating-point filters in the following sections.

2.2 Exact arithmetic

A natural idea to solve the precision issues of floating-point arithmetic would be to perform the computations at higher precision. There are a number of arbitrary-precision libraries that implement number types with increased precision in software, such as GMP [15], the CGAL Number Types package [17] or Boost Multiprecision [23].

In combination with filters, such arbitrary-precision number types are used for exact geometric predicates in the CGAL 2D and 3D kernels, which were documented in [7]. A drawback of software-implemented number types is that basic operations can be orders of magnitude slower than hardware-implemented operations for native number types such as single- or double-precision floating-point operations on most modern processor architectures.

An approach for arbitrary-precision arithmetic that makes use of hardware acceleration is expansion arithmetic. A floating-point expansion is a tuple of multiple floating-point numbers that can represent a single number as an unevaluated sum with greater precision than a single floating-point number. Because the operations on floating-point expansions are implemented in terms of hardware-accelerated operations on the components, they can be faster than more general techniques for arbitrary precision arithmetic. The use of floating-point expansions for exact geometric predicates has been described in [30].
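The basic building block of expansion arithmetic is an error-free transformation such as the classical TwoSum algorithm, which splits a floating-point sum into the rounded result and the exact rounding error; a minimal sketch, assuming round-to-nearest and no overflow:

```cpp
#include <utility>

// TwoSum error-free transformation: returns (s, e) with s = a (+) b and
// s + e == a + b exactly (in round-to-nearest, barring overflow).
std::pair<double, double> two_sum(double a, double b) {
    double s = a + b;
    double bv = s - a;   // the part of b that made it into s
    double av = s - bv;  // the part of a that made it into s
    double e = (a - av) + (b - bv);
    return {s, e};
}
// The pair (e, s) is a two-component expansion representing a + b as an
// unevaluated sum; longer expansions are built by chaining such steps.
```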

2.3 Floating-point filters

We call an implementation a robust floating-point predicate if it is guaranteed to produce correct results. With expansion arithmetic, we can produce a robust predicate from a floating-point realisation by replacing all rounding floating-point operators \(\oplus ,\ominus \) and \(\odot \) with the respective exact algorithms on floating-point expansions. The sign of the resulting expansion is then equal to the sign of its most significant (i.e. largest non-zero) component.

The issue with this naive approach is that even simple predicates become computationally expensive. To mitigate this issue, we resort to expansion arithmetic only in the rare case that the straightforward floating-point implementation is not guaranteed to produce the correct result. This decision is made by filters.

Definition 4

(Filter) For a predicate \(\text {sign}\left( p\left( x_{1},\ldots ,x_{n}\right) \right) \) and an FPN system F, we call \(f:M\subseteq F^{n}\rightarrow \left\{ -1,0,1,\text {uncertain}\right\} \) a floating-point filter. f is called valid for p on M if for each \(\left( x_{1},\ldots ,x_{n}\right) \in M\) either \(f\left( x_{1},\ldots ,x_{n}\right) =\text {sign}\left( p\left( x_{1},\ldots ,x_{n}\right) \right) \) or \(f\left( x_{1},\ldots ,x_{n}\right) =\text {uncertain}\) holds. The latter case is referred to as filter failure.

Adopting the terminology used in [11], we call filters dynamic if they require the computation of an error at every step of the computation, static if they use a global error bound that does not depend on the inputs for each call of the predicate, and semi-static if their error bound has a static component and a component that depends on the input. A variation of static filters, which require a priori restrictions on the inputs to compute global error bounds, are almost static filters, which start with an error bound based on initial bounds on the input and update their error bound whenever the inputs exceed the previous bounds.

Example 1

(Shewchuk’s Stage A orientation predicate) Consider the predicate \(\textrm{sign}(p)\) based on (1) and its floating-point realisation \(\textrm{sign}(\tilde{p})\) (6). Then,

$$\begin{aligned} f\left( a_{x},\ldots ,c_{y}\right) :={\left\{ \begin{array}{ll} \text {sign}\left( \tilde{p}\right) , &{} \text {if } \left|\tilde{p}\right|\ge e\left( a_{x},\ldots ,c_{y}\right) \\ \text {uncertain}, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

with the error bound

$$\begin{aligned} e\left( a_{x},\ldots ,c_{y}\right) :=&\left( 3\varepsilon +16\varepsilon ^{2}\right) \odot (\left|\left( a_{x}\ominus c_{x}\right) \odot \left( b_{y}\ominus c_{y}\right) \right|\oplus \\&\left|\left( a_{y}\ominus c_{y}\right) \odot \left( b_{x}\ominus c_{x}\right) \right|), \end{aligned}$$

where \(\tilde{p}:=\tilde{p}\left( a_{x},\ldots ,c_{y}\right) \) and \(\varepsilon \) is the machine-epsilon of the FPN, is a valid filter for all inputs that do not cause underflow [30].

If underflow occurs, however, validity is not guaranteed. Consider the example

$$\begin{aligned} a:=&\left( \begin{array}{c} 0\\ 0 \end{array}\right) \\ b:=&\left( \begin{array}{c} 2^{e_{\min }}\\ 0 \end{array}\right) \\ c:=&\left( \begin{array}{c} 2^{e_{\min }}\\ 2^{e_{\min }} \end{array}\right) . \end{aligned}$$

Clearly the points are not collinear; however, \(e\left( a_{x},\ldots ,c_{y}\right) \) and \(\tilde{p}\) will evaluate to zero due to underflow, which shows that the filter can certify incorrect signs.

Remark 4

The term “error bound” is used in Example 1 for the quantity \(e\left( a_{x},\ldots ,c_{y}\right) \) somewhat loosely. It is only proven to be larger than the absolute error of the floating-point result in cases that might produce incorrect signs (and if no underflow occurs), which is sufficient for the validity of the filter. The term “error bound” is similarly used for the bounds in Example 3 and Theorem 1.

This filter can be considered semi-static, with its static component being \(3\varepsilon +16\varepsilon ^{2}\). The error bound is obtained mostly by applying standard forward-error analysis to the floating-point realisation. Shewchuk also described similar filters for the 2D incircle predicate, as well as the 3D orientation and insphere predicates.
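In code, the stage A filter of Example 1 amounts to a single error-bound comparison; a sketch for FP64, transcribing the bound from Example 1 rather than Shewchuk's original source:

```cpp
#include <cmath>

// Stage A of the 2D orientation predicate (Example 1). Returns the certified
// sign of the determinant, or 2 for "uncertain". Not valid under underflow,
// as the counterexample above shows.
int orient2d_stage_a(double ax, double ay, double bx, double by,
                     double cx, double cy) {
    const double eps = std::ldexp(1.0, -53);        // 2^{-p} for FP64
    const double coeff = 3 * eps + 16 * eps * eps;  // 3e + 16e^2
    double l = (ax - cx) * (by - cy);
    double r = (ay - cy) * (bx - cx);
    double det = l - r;
    double err = coeff * (std::fabs(l) + std::fabs(r));
    if (std::fabs(det) >= err) return (det > 0.0) - (det < 0.0);
    return 2;  // filter failure: continue with a more precise stage
}
```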

Example 2

(FPG orientation filter [24]) Consider predicate (1) and its floating-point realisation (6). Let

$$\begin{aligned} m_{x}&:=\max \left\{ \left| a_{x}-c_{x}\right| ,\left| b_{x}-c_{x}\right| \right\} \ \\ m_{y}&:=\max \left\{ \left| a_{y}-c_{y}\right| ,\left| b_{y}-c_{y}\right| \right\} . \end{aligned}$$

If

$$\begin{aligned}&\max {\{m_{x}, m_{y}\}}> 2^{509},\\&0 \ne \min {\{m_{x}, m_{y}\}} \le 2^{-485} \end{aligned}$$

or

$$\begin{aligned} \left|\tilde{p}\right|\le 8.887\,205\,737\,259\,27 \times 10^{-16} \odot m_{x} \odot m_{y} \ne 0, \end{aligned}$$

then “uncertain” is returned, otherwise the sign of \(\tilde{p}\) is returned. The filter is valid with FP64 arithmetic for all FP64 inputs. It is also semi-static, with the static component of the error bound being \(8.887\,205\,737\,259\,27\times 10^{-16}\) (roughly \(8\varepsilon \)).

A static version of this filter can be obtained if global bounds for \(m_x\) and \(m_y\) are known a priori. The first two conditions are range checks that guard against overflow and underflow. Apart from these conditions, the filter is based on an error bound similar to the previous example. The program FPG can generate such filters for arbitrary homogeneous polynomials if group annotations for the input variables are provided. In this context, group annotations are lists of grouped variables that are supplied to FPG as part of its input; they help the code generator with the choice of the scaling factors \(m_{x}\) and \(m_{y}\). In the example above, the group annotations specified that \(a_x, b_x\) and \(c_x\) as well as \(a_y, b_y\) and \(c_y\) form a group.

Example 3

(2D orientation filter by Ozaki et al. [26]) Consider again predicate (1) and its floating-point realisation (6). Let

$$\begin{aligned} f\left( a_{x},\ldots ,c_{y}\right) :={\left\{ \begin{array}{ll} \text {sign}\left( \tilde{p}\right) , &{} \text {if } \left|\tilde{p}\right|> e\left( a_{x},\ldots ,c_{y}\right) \\ \text {uncertain}, &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

with the error bound

$$\begin{aligned} e\left( a_{x},\ldots ,c_{y}\right) :=&\, \theta \odot (|\left( a_{x}\ominus c_{x}\right) \odot \left( b_{y}\ominus c_{y}\right) \oplus \left( a_{y}\ominus c_{y}\right) \odot \left( b_{x}\ominus c_{x}\right) |\oplus u_N), \end{aligned}$$

where \(\tilde{p}:=\tilde{p}\left( a_{x},\ldots ,c_{y}\right) \), \(\varepsilon \) is the machine-epsilon of the FPN, \(u_N\) is the smallest, positive normalized number in the floating-point system and

$$\begin{aligned} \theta :=3 \varepsilon - \left( 2\left\lfloor \frac{-1+\sqrt{\varepsilon ^{-1}+45}}{4} \right\rfloor - 22 \right) \varepsilon ^2 \in F. \end{aligned}$$

Then f is a valid filter for all inputs.

Unlike Example 1, this filter cannot produce incorrect results for inputs that cause underflow, and unlike Example 2, which evaluates multiple inequalities at which it can branch, this filter contains only a single branch. The static constant \(\theta \), which is better than in the other two filters, has been obtained in [26] by combining three ingredients: a model of floating-point arithmetic that bounds the rounding error in terms of the unit in the first place, as introduced in [28], which is smaller than the machine epsilon unless the significand of the result is exactly 1; a more careful analysis of the accumulated error of the entire predicate expression, rather than just propagating the maximum possible error of each subexpression; and a case analysis of inputs for which the sign can be guaranteed to be correct.

A disadvantage of the filter in this example is that, unlike the previous two filters, it returns “uncertain” for common, simple degeneracies, such as three points that share the same x-coordinate or the same y-coordinate, or inputs that contain duplicate points.
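A sketch of this filter for FP64, transcribing \(\theta \) from the formula above (computed here in double at initialisation; [26] derives the constant exactly):

```cpp
#include <cmath>

// Single-branch 2D orientation filter of Example 3 for FP64. Returns the
// certified sign or 2 for "uncertain"; in particular, every exact zero of
// the determinant falls into the "uncertain" case.
int orient2d_single_branch(double ax, double ay, double bx, double by,
                           double cx, double cy) {
    const double eps = std::ldexp(1.0, -53);    // 2^{-p}
    const double u_n = std::ldexp(1.0, -1022);  // smallest normalized number
    const double theta =
        3 * eps -
        (2 * std::floor((-1 + std::sqrt(1 / eps + 45)) / 4) - 22) * eps * eps;
    double l = (ax - cx) * (by - cy);
    double r = (ay - cy) * (bx - cx);
    double det = l - r;
    double err = theta * (std::fabs(l + r) + u_n);
    if (std::fabs(det) > err) return (det > 0.0) - (det < 0.0);
    return 2;  // uncertain
}
```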

The next example is not strictly an error bound filter.

Example 4

(Shewchuk’s stage B orientation predicate) Consider the predicate (1) and its floating-point realisation (6). Let \(d_{ax} :=a_{x}\ominus c_{x}, d_{bx} :=b_{x} \ominus c_{x}\) and analogously for y. If the computations of these values incurred round-off errors, return uncertain. Otherwise compute \(d_{ax} \cdot d_{by} - d_{ay} \cdot d_{bx} \) exactly, using expansion arithmetic, and return the sign. This filter is described as stage B in [30] and is valid for all inputs that do not produce overflow or underflow. The full version in [30] also includes a check with an error bound on the order of \(\varepsilon ^2\) that can prevent a filter failure if the no-round-off test fails.
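The no-round-off test in this example can be implemented with the standard two-difference tail computation; a sketch, assuming round-to-nearest and no range errors:

```cpp
// Tail of the floating-point difference x = a (-) b: returns t such that
// x + t == a - b exactly. The difference was computed without round-off
// if and only if the tail is zero.
double two_diff_tail(double a, double b, double x) {
    double bv = a - x;
    double av = x + bv;
    double br = bv - b;
    double ar = a - av;
    return ar + br;
}
// Stage B continues with exact expansion arithmetic only if the tails of
// d_ax, d_ay, d_bx and d_by are all zero.
```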

Similar filters were presented by Shewchuk for other predicates. This filter is particularly effective for input points that are closer to each other than to \(\left( 0, 0\right) \), because the difference of two floating-point numbers that are within a factor of two of each other does not incur a round-off error. In the context of Shewchuk’s multi-staged predicates, this filter also has the advantage that it can reuse computations from stage A and that its interim results can be reused for more precise stages in case of filter failure. As a final example, we present a dynamic filter.

Example 5

(Interval arithmetic filter) Consider a predicate and one of its floating-point realisations. Given the inputs, compute for each floating-point operation \(\oplus , \ominus , \odot \) the lower and the upper bound of the result, including the rounding error, using interval arithmetic. If the final resulting interval contains numbers of different signs, return uncertain. Otherwise, return the shared sign of all numbers in the result interval. This approach is presented in [6].
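A dynamic filter in this spirit can be sketched without switching rounding modes by widening every computed endpoint outward by one ulp, which is conservative but valid under round-to-nearest; directed rounding as in [6] yields tighter intervals. All names below are ours:

```cpp
#include <algorithm>
#include <cmath>

// A crude interval-arithmetic filter in the spirit of Example 5.
struct Interval {
    double lo, hi;
};

static Interval widen(double lo, double hi) {
    return {std::nextafter(lo, -HUGE_VAL), std::nextafter(hi, HUGE_VAL)};
}
static Interval sub(Interval a, Interval b) {
    return widen(a.lo - b.hi, a.hi - b.lo);
}
static Interval mul(Interval a, Interval b) {
    double p[] = {a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi};
    return widen(*std::min_element(p, p + 4), *std::max_element(p, p + 4));
}

// Interval version of realisation (6): certified sign, or 2 for "uncertain".
int orient2d_interval(double ax, double ay, double bx, double by,
                      double cx, double cy) {
    auto pt = [](double v) { return Interval{v, v}; };  // exact point inputs
    Interval det = sub(mul(sub(pt(ax), pt(cx)), sub(pt(by), pt(cy))),
                       mul(sub(pt(ay), pt(cy)), sub(pt(bx), pt(cx))));
    if (det.lo > 0) return 1;
    if (det.hi < 0) return -1;
    return 2;  // the interval contains zero or is invalid: uncertain
}
```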

In [11, 26] and [30] failure probabilities and performance experiments for various sequences of filters, types of inputs and algorithms are presented. We will present our own experiments in Sect. 4.2.

3 Semi-static filters

In this section, we will define a set of rules that allow us to derive error bounds for arbitrary floating-point polynomials. These error bounds will then be used to define semi-static filters. We start with establishing some properties of floating-point operations that will be used in the proof of the validity of our error bounds.

Lemma 1

Let \(a,b\in F\) be floating-point numbers.

  1.

    If either a or b is in \(\left\{ -\infty ,\infty ,\textrm{NaN}\right\} \), then

    $$\begin{aligned} a\circledcirc b\in \left\{ -\infty ,\infty ,\textrm{NaN}\right\} \end{aligned}$$

    and

    $$\begin{aligned} \left|a\circledcirc b\right|\in \left\{ \infty ,\textrm{NaN}\right\} \end{aligned}$$

    for every \(\circledcirc \in \left\{ \oplus ,\ominus ,\odot \right\} \). Consequently, the same holds for all floating-point expressions using the operators \(\oplus ,\ominus ,\odot \) and \(\left|\cdot \right|\) that contain a subexpression that evaluates to \(-\infty ,\infty \) or \(\textrm{NaN}\).

  2.

    If an underflow occurs in the computation of \(a \oplus b\) or \(a \ominus b\), then the result is exact.

The first statement follows directly from Definition 3 and the second statement is given as Theorem 3.4.1 in [16].

Let \(p:\mathbb {R}^{m}\rightarrow \mathbb {R}\) be a polynomial in m variables, denoted as \(p\in \mathbb {R}\left[ x_1,\ldots ,x_m\right] \). Let \(\tilde{p}:F^{m}\rightarrow F\) be a floating-point realisation of p, i.e. a function on \(F^{m}\) involving only the floating-point operations \(\oplus ,\ominus \) and \(\odot \) that would be equivalent to p if the floating-point operations were replaced by the corresponding exact operations. Note that \(\tilde{p}\) is not unique, e.g. \(\left( x_1 \oplus x_2 \right) \oplus x_3\) is different from \(x_1 \oplus \left( x_2 \oplus x_3 \right) \), but both are floating-point realisations of the real polynomial \(x_1 + x_2 + x_3\). We denote by \(F\left[ x_1,\ldots ,x_m\right] \) the set of floating-point realisations of polynomials in m variables. The subexpressions of \(\tilde{p}\) will be denoted by \(\tilde{p}_{1},\ldots ,\tilde{p}_{k}\).

We will present a recursive scheme that allows the derivation of error bound expressions for semi-static, almost static and static floating-point filters. We will assume that the final operation of \(\tilde{p}\) is a sum or difference, so that \(\tilde{p}=\tilde{p}_{1}\circledcirc \tilde{p}_{2}\) with \(\circledcirc \in \left\{ \oplus ,\ominus \right\} \); if it were a multiplication, the signs of the two factors could be determined independently. Like the filters in [26], these filters will require only one branch, and they will not certify incorrect values for inputs that cause overflow nor, optionally, for inputs that cause underflow.

3.1 Error bounds

As a reminder, semi-static error bounds are partially computed at compile-time and partially computed from the input values at runtime. The static component of our error bounds is a polynomial in the machine epsilon \(\varepsilon \), so an element of \(\mathbb {R}\left[ \varepsilon \right] \), with integer coefficients. The runtime component of our semi-static error bounds is an expression in input values \(x_1,\ldots ,x_m\) and constants with the operators \(\oplus , \ominus , \odot \) and \(\left|\cdot \right|\). We will call the set of such expressions \(F'\left[ x_1,\ldots ,x_m\right] \). We will define two error bound maps E and \(E_\textrm{UFP}\) for all subexpressions \(\tilde{q}\) of \(\tilde{p}\) of the form

$$\begin{aligned} E,E_{\textrm{UFP}}: F\left[ x_1,\ldots ,x_m\right] \rightarrow \mathbb {R}\left[ \varepsilon \right] \times F'\left[ x_1,\ldots ,x_m\right] , \quad \tilde{q}\mapsto \left( a,m\right) , \end{aligned}$$

such that the following invariants hold:

Either

$$\begin{aligned} m\in \left\{ \infty ,\textrm{NaN}\right\} \end{aligned}$$
(I1)

or both

$$\begin{aligned} \left|\tilde{q}\left( x_{1},\ldots ,x_{m}\right) \right|\le m \end{aligned}$$
(I2.1)

and

$$\begin{aligned} \tilde{q}\left( x_{1},\ldots ,x_{m}\right) \in q\left( x_{1},\ldots ,x_{m}\right) \pm a\left( \varepsilon \right) \cdot m, \end{aligned}$$
(I2.2)

where q denotes the real polynomial realised by \(\tilde{q}\).

\(E_{\textrm{UFP}}\), where UFP signifies underflow protection, will be constructed such that these invariants hold regardless of whether underflow occurs during any of the computations. For E this will not be guaranteed. The value of the static component \(a\left( \varepsilon \right) \) of an error bound is in \(\mathbb {R}\) but not necessarily representable in F. In an implementation, it can be represented as a list of integer coefficients. Because the polynomial \(a\left( \cdot \right) \) will only be evaluated in \(\varepsilon \), we will omit the argument and will use the polynomial and its value in \(\varepsilon \) interchangeably. The error bound maps are defined through a list of recursive error bound rules,

$$\begin{aligned} R_{i\left( , \textrm{UFP} \right) }: F\left[ x_1,\ldots ,x_m\right] \rightarrow \mathbb {R}\left[ \varepsilon \right] \times F'\left[ x_1,\ldots ,x_m\right] \end{aligned}$$

for \(1\le i \le 9\) as follows:

Definition 5

(Error Bound Rules, Error Bound Map) Let \(\tilde{q}:F^{m}\rightarrow F\) be a subexpression of a floating-point polynomial \(\tilde{p}:F^{m}\rightarrow F\). We define the following error bound rules:

  1.

    For a \(\tilde{q}\) of the form \(\tilde{q}\left( x_{1},\ldots ,x_{m}\right) =c\) for some \(c\in F\), we set

    $$\begin{aligned} R_{1}\left( \tilde{q}\right) :=\left( 0,\left|c\right|\right) . \end{aligned}$$
  2.

    For a \(\tilde{q}\) of the form \(\tilde{q}\left( x_{1},\ldots ,x_{m}\right) =x_{i}\) for some \(1\le i\le m\), we set

    $$\begin{aligned} R_{2}\left( \tilde{q}\right) :=\left( 0,\left|x_{i}\right|\right) . \end{aligned}$$
  3.

    For a \(\tilde{q}\) of the form \(\tilde{q}\left( x_{1},\ldots ,x_{m}\right) =x_{i}\circledcirc x_{j}\) for some \(1\le i,j\le m\) and \(\circledcirc \in \left\{ \oplus ,\ominus \right\} \), we set

    $$\begin{aligned} R_{3}\left( \tilde{q}\right) :=\left( \varepsilon ,\left|x_{i}\circledcirc x_{j}\right|\right) . \end{aligned}$$
  4.

    For a \(\tilde{q}\) of the form \(\tilde{q}\left( x_{1},\ldots ,x_{m}\right) =x_{i}\odot x_{j}\) for some \(1\le i,j\le m\), we set

    $$\begin{aligned} R_{4}\left( \tilde{q}\right) :=\left( \varepsilon ,\left|x_{i}\odot x_{j}\right|\right) . \end{aligned}$$

    and

    $$\begin{aligned} R_{4,\textrm{UFP}}\left( \tilde{q}\right) :=\left( \varepsilon ,\left|x_{i}\odot x_{j}\right|\oplus u_{N}\right) . \end{aligned}$$
  5.

    For a \(\tilde{q}\) of the form \(\tilde{q}\left( x_{1},\ldots ,x_{m}\right) =\left( x_{i}\circledcirc _{1}x_{j}\right) \odot \left( x_{h}\circledcirc _{2}x_{g}\right) \) for some \(1\le g,h,i,j\le m\) and \(\circledcirc _{1},\circledcirc _{2}\in \left\{ \oplus ,\ominus \right\} \), we set

    $$\begin{aligned} R_{5}\left( \tilde{q}\right) :=\left( 3\varepsilon -\left( \phi -14\right) \varepsilon ^{2},\left|\left( x_{i}\circledcirc _{1}x_{j}\right) \odot \left( x_{h}\circledcirc _{2}x_{g}\right) \right|\right) \end{aligned}$$

    and

    $$\begin{aligned} R_{5,\textrm{UFP}}\left( \tilde{q}\right) :=\left( 3\varepsilon -\left( \phi -14\right) \varepsilon ^{2},\left|\left( x_{i}\circledcirc _{1}x_{j}\right) \odot \left( x_{h}\circledcirc _{2}x_{g}\right) \right|\oplus u_{N}\right) \end{aligned}$$

    with

    $$\begin{aligned} \phi :=2\left\lfloor \frac{-1+\sqrt{4\varepsilon ^{-1}+45}}{4}\right\rfloor . \end{aligned}$$
  6.

    For a \(\tilde{q}\) of the form \(\tilde{q}\left( x_{1},\ldots ,x_{m}\right) =\tilde{q}_{1}\left( x_{1},\ldots ,x_{m}\right) \circledcirc \tilde{q}_{2}\left( x_{1},\ldots ,x_{m}\right) \) with \(\circledcirc \in \left\{ \oplus ,\ominus \right\} \), we set

    $$\begin{aligned} R_{6}\left( \tilde{q}\right) :=\left( \left( 1+\varepsilon \right) \max \left( a_{1},a_{2}\right) +\varepsilon ,m_{1}\oplus m_{2}\right) \end{aligned}$$

    and

    $$\begin{aligned} R_{6,\textrm{UFP}}\left( \tilde{q}\right) :=\left( \left( 1+\varepsilon \right) \max \left( a_{1},a_{2}\right) +\varepsilon ,m_{1}\oplus m_{2}\right) \end{aligned}$$

    with \(\left( a_{i},m_{i}\right) :=E\left( \tilde{q}_{i}\right) \) and \(\left( a_{i},m_{i}\right) :=E_{\textrm{UFP}}\left( \tilde{q}_{i}\right) \) respectively for \(i=1,2\).

  7.

    For a \(\tilde{q}\) of the form \(\tilde{q}\left( x_{1},\ldots ,x_{m}\right) =\tilde{q}_{1}\left( x_{1},\ldots ,x_{m}\right) \odot \tilde{q}_{2}\left( x_{1},\ldots ,x_{m}\right) \), we set

    $$\begin{aligned} R_{7}\left( \tilde{q}\right) :=\left( \left( 1+\varepsilon \right) \left( a_{1}+a_{2}+a_{1}a_{2}\right) +\varepsilon ,m_{1}\odot m_{2}\right) \end{aligned}$$

    and

    $$\begin{aligned} R_{7,\textrm{UFP}}\left( \tilde{q}\right) :=\left( \left( 1+\varepsilon \right) \left( a_{1}+a_{2}+a_{1}a_{2}\right) +\varepsilon ,m_{1}\odot m_{2}\oplus u_{N}\right) \end{aligned}$$

    with \(\left( a_{i},m_{i}\right) :=E\left( \tilde{q}_{i}\right) \) and \(\left( a_{i},m_{i}\right) :=E_{\textrm{UFP}}\left( \tilde{q}_{i}\right) \) respectively for \(i=1,2\).

  8.

    For a \(\tilde{q}\) of the form \(\tilde{q}\left( x_1,\ldots ,x_m\right) =\textrm{FMA}\left( x_h,x_i,x_j\right) \) for some \(1\le h,i,j \le m\) we set

    $$\begin{aligned} R_{8}\left( \tilde{q}\right) :=\left( \varepsilon , \left|\textrm{FMA}\left( x_h,x_i,x_j\right) \right|\right) \end{aligned}$$

    and

    $$\begin{aligned} R_{8,\textrm{UFP}}\left( \tilde{q}\right) :=\left( \varepsilon , \left|\textrm{FMA} \left( x_h,x_i,x_j\right) \right|\oplus u_N \right) \end{aligned}$$
  9.

    For a \(\tilde{q}\) of the form \(\tilde{q}\left( x_1,\ldots ,x_m\right) {=}\textrm{FMA}\left( \tilde{q}_1 \left( x_1,\ldots ,x_m\right) ,\ldots ,\tilde{q}_3 \left( x_1,\ldots ,x_m\right) \right) \) we set

    $$\begin{aligned} R_{9}\left( \tilde{q}\right) :=\left( a, \left|\textrm{FMA}\left( m_1,m_2,m_3\right) \right|\right) \end{aligned}$$

    and

    $$\begin{aligned} R_{9,\textrm{UFP}}\left( \tilde{q}\right) :=\left( a, \left|\textrm{FMA}\left( m_1,m_2,m_3\right) \right|\oplus u_N\right) \end{aligned}$$

    with

    $$\begin{aligned} a:=\max \left( \left( a_{1}+a_{2}+a_{1}a_{2}\right) \left( 1+\varepsilon \right) ,a_{3}\right) \left( 1+\varepsilon \right) +\varepsilon \end{aligned}$$

We define \(E\left( \tilde{q}\right) \) to be the first applicable map out of \(R_{1},\ldots ,R_{9}\) and analogously \(E_{\textrm{UFP}}\left( \tilde{q}\right) \) with the respective UFP-variations of the rules.

It is straightforward to see that E and \(E_{\textrm{UFP}}\) are well-defined: the rules are exhaustive in the sense that there is no subexpression of a floating-point polynomial to which no rule is applicable, and any recursion through \(R_{6}\), \(R_{7}\) and \(R_{9}\) or their UFP-variations terminates at the level of individual variables and constants.
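The recursive structure of Definition 5 maps directly to code. The following runtime sketch (ours) covers only the rules \(R_{2}\), \(R_{6}\) and \(R_{7}\) and omits underflow protection; a production implementation would evaluate the static component at compile time and in exact or upward-rounded arithmetic (cf. Remark 5 and Sect. 4):

```cpp
#include <cmath>

// Runtime sketch of the error bound map E (Definition 5) on an expression
// tree, restricted to rules R2, R6 and R7. All names are ours.
struct Expr {
    enum Kind { Var, Add, Sub, Mul } kind;
    double value;           // used if kind == Var
    const Expr *lhs, *rhs;  // used otherwise
};

struct Bound {
    double a;  // static component a(eps)
    double m;  // runtime component m(x_1, ..., x_m)
};

constexpr double EPS = 0x1p-53;  // machine epsilon of FP64

Bound error_bound(const Expr &q) {
    if (q.kind == Expr::Var) return {0.0, std::fabs(q.value)};  // rule R2
    Bound b1 = error_bound(*q.lhs), b2 = error_bound(*q.rhs);
    if (q.kind == Expr::Mul)  // rule R7: a = (1+e)(a1+a2+a1*a2)+e, m = m1*m2
        return {(1 + EPS) * (b1.a + b2.a + b1.a * b2.a) + EPS, b1.m * b2.m};
    // rule R6: a = (1+e)*max(a1,a2)+e, m = m1+m2 (covers Add and Sub)
    return {(1 + EPS) * std::fmax(b1.a, b2.a) + EPS, b1.m + b2.m};
}
```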

Lemma 2

Let \(\tilde{p}\) be an arbitrary floating-point polynomial. Then the invariants for \(E\left( \tilde{q}\right) \) hold for every subexpression \(\tilde{q}\) of \(\tilde{p}\) and for every choice of floating-point inputs \(x_{1},\ldots ,x_{m}\in F\) such that no underflow occurs in the evaluation of any subexpression of \(\tilde{q}\).

Following [30], we introduce the following convenient notation that will be used in the proof. We extend the arithmetic operations \(\circ \) to sets \(A,B\subset \mathbb {R}\) by \(A\circ B:=\{a\circ b \mid a\in A, b\in B\}\), identify \(a\in \mathbb {R}\) with \(\{a\}\) for \(\circ \in \{+,-,\cdot \}\), and set \(A\pm a:= A + [-a,a]\).

Proof

For any subexpression \(\tilde{q}\) to which \(R_{1}\) or \(R_{2}\) applies, the statement is obvious. For subexpressions for which \(R_{3}\), \(R_{4}\) or \(R_{8}\) is the first applicable rule and no overflow occurs, (I2.1) holds by the definition of m and (I2.2) follows from the error bounds (2) and (4), respectively. If overflow occurs, m is infinity and (I1) holds. For subexpressions to which \(R_{5}\) applies, either (I1) holds if overflow occurs or, if no overflow occurs, (I2.1) holds by definition and (I2.2) is proven in Lemma 3.1 of [26].

If \(R_{6}\) is the first applicable rule, we assume that the invariant (I1) or the invariants (I2.1) and (I2.2) hold for \(\left( a_{1},m_{1}\right) :=E\left( \tilde{q}_{1}\right) \) and \(\left( a_{2},m_{2}\right) :=E\left( \tilde{q}_{2}\right) \), and we consider the case \(\tilde{q}=\tilde{q}_{1}\oplus \tilde{q}_{2}\). If \(\tilde{q}_{1}\) or \(\tilde{q}_{2}\) is \(\pm \infty \) or NaN, then by the assumption so is \(m_{1}\) or \(m_{2}\) and consequently \(m_{1}\oplus m_{2}\), so (I1) holds. If no overflow occurs, we see that

$$\begin{aligned} \left|\tilde{q}\right|&=\left|\tilde{q}_{1}\oplus \tilde{q}_{2}\right|\\&\le \left|\tilde{q}_{1}\right|\oplus \left|\tilde{q}_{2}\right|\\&\le m_{1}\oplus m_{2} \end{aligned}$$

and

$$\begin{aligned} \tilde{q}&=\tilde{q}_{1}\oplus \tilde{q}_{2}\\&\in \tilde{q}_{1}+\tilde{q}_{2}\pm \varepsilon \left|\tilde{q}_{1}\oplus \tilde{q}_{2}\right|\\&\subseteq \tilde{q}_{1}+\tilde{q}_{2}\pm \varepsilon \left( m_{1}\oplus m_{2}\right) \\&\subseteq q_{1}\pm a_{1}\left( \varepsilon \right) m_{1}+q_{2}\pm a_{2}\left( \varepsilon \right) m_{2}\pm \varepsilon \left( m_{1}\oplus m_{2}\right) \\&\subseteq q\pm \max \left( a_{1}\left( \varepsilon \right) ,a_{2}\left( \varepsilon \right) \right) \left( m_{1}+m_{2}\right) \pm \varepsilon \left( m_{1}\oplus m_{2}\right) \\&\subseteq q\pm \left( \max \left( a_{1}\left( \varepsilon \right) ,a_{2}\left( \varepsilon \right) \right) \left( 1+\varepsilon \right) +\varepsilon \right) \left( m_{1}\oplus m_{2}\right) , \end{aligned}$$

where we used the assumption that the invariant holds for the two subexpressions and standard floating-point rounding error estimates. The proof for \(\tilde{q}=\tilde{q}_{1}\ominus \tilde{q}_{2}\) is analogous.

If \(R_{7}\) is the first applicable rule, we assume that the invariant holds for \(\left( a_{1},m_{1}\right) :=E\left( \tilde{q}_{1}\right) \) and \(\left( a_{2},m_{2}\right) :=E\left( \tilde{q}_{2}\right) \). Analogous to above, the case of overflow is trivial, so we consider the case that no overflow occurs. Again it holds that

$$\begin{aligned} \left|\tilde{q}\right|&=\left|\tilde{q}_{1}\odot \tilde{q}_{2}\right|\\&\le m_{1}\odot m_{2}. \end{aligned}$$

and

$$\begin{aligned} \tilde{q}&=\tilde{q}_{1}\odot \tilde{q}_{2}\\&\in \tilde{q}_{1}\cdot \tilde{q}_{2}\pm \varepsilon \left|\tilde{q}_{1}\odot \tilde{q}_{2}\right|\\&\subseteq \tilde{q}_{1}\cdot \tilde{q}_{2}\pm \varepsilon \left( m_{1}\odot m_{2}\right) \\&\subseteq \left( q_{1}\pm a_{1}m_{1}\right) \cdot \left( q_{2}\pm a_{2}m_{2}\right) \pm \varepsilon \left( m_{1}\odot m_{2}\right) \\&\subseteq q_{1}q_{2}\pm a_{2}m_{2}q_{1}\pm a_{1}m_{1}q_{2}\pm a_{1}a_{2}m_{1}m_{2}\pm \varepsilon \left( m_{1}\odot m_{2}\right) \\&\subseteq q_{1}q_{2}\pm a_{2}\left( 1+\varepsilon \right) \left( m_{1}\odot m_{2}\right) \pm a_{1}\left( 1+\varepsilon \right) \left( m_{1}\odot m_{2}\right) \\&\quad \pm a_{1}a_{2}\left( 1+\varepsilon \right) \left( m_{1}\odot m_{2}\right) \pm \varepsilon \left( m_{1}\odot m_{2}\right) \\&\subseteq q_{1}q_{2}\pm \left( \left( 1+\varepsilon \right) \left( a_{1}+a_{2}+a_{1}a_{2}\right) +\varepsilon \right) \left( m_{1}\odot m_{2}\right) . \end{aligned}$$

The proof for \(R_{9}\) combines the steps of the proofs for \(R_6\) and \(R_{7}\) and uses that the multiplication in the FMA can be assumed to be error-free. Because any recursion eventually terminates at a non-recursive rule (rules 1–5 and 8), the claims for \(E\left( \tilde{q}_{1}\right) \) and \(E\left( \tilde{q}_{2}\right) \) hold. \(\square \)

Lemma 3

Let \(\tilde{p}\) be an arbitrary floating-point polynomial. Then the invariants for \(E_\mathrm{{UFP}}\left( \tilde{q}\right) \) hold for every subexpression \(\tilde{q}\) of \(\tilde{p}\) and for every choice of inputs \(x_{1},\ldots ,x_{m}\in F\).

Proof

Because it is useful for the parts of the proof that apply to the recursive rules \(R_6\) and \(R_7\), we will also prove for each rule applied to a subexpression \(\tilde{q}\) that

$$\begin{aligned} m\ge u_{N}\quad \text {or}\quad \tilde{q}\left( x_{1},\ldots ,x_{m}\right) =q\left( x_{1},\ldots ,x_{m}\right) \end{aligned}$$
(I3)

holds.

For \(R_{1}\) and \(R_{2}\) there is nothing to prove.

For \(R_{3}\), the reasoning given in the proof of Lemma 2 still applies because the assumption that no underflow occurs was not used. If underflow occurs, then \(\tilde{q}\) is evaluated exactly, i.e. \(\tilde{q} = q\), and if no underflow occurs, then \(\tilde{q}\) is not subnormal and hence \(m \ge u_N\).

For \(R_{4,\textrm{UFP}}\) and \(R_{8,\textrm{UFP}}\), we first note that m is always non-zero and not smaller than either \(\left|\tilde{q}\right|\) or \(u_N\), so invariants (I2.1) and (I3) hold. Invariant (I2.2) then follows directly from \(2\varepsilon u_{N}=u_{S}\) and the bounds (3) and (5), respectively.

(I2.2) for \(R_{5,\textrm{UFP}}\) was proven as Lemma 3.1 in [26]. For (I2.1) and (I3), the same reasoning as for \(R_{4,\textrm{UFP}}\) applies.

For the recursive rules \(R_{6,\textrm{UFP}}\) and \(R_{7,\textrm{UFP}}\), we assume that all invariants hold for the respective subexpressions \(\tilde{q}_1\) and \(\tilde{q}_2\); this is again justified because every recursion terminates in one of the rules \(R_{1}\) to \(R_{5,\textrm{UFP}}\), for which the invariants were already proven to hold, or in further applications of \(R_{6,\textrm{UFP}}\) and \(R_{7,\textrm{UFP}}\).

For \(R_{6,\textrm{UFP}}\), as in Lemma 2, it is obvious that invariant (I2.1) holds. If either \(m_{1}\) or \(m_{2}\) is equal to or greater than \(u_{N}\), then no underflow can occur in the evaluation of m, the invariant (I2.2) holds as proven in Lemma 2, and m is greater than or equal to \(u_{N}\). If both \(m_1\) and \(m_2\) are smaller than \(u_{N}\), then \(\tilde{q}_{1}\) and \(\tilde{q}_{2}\) are evaluated error-free and \(\tilde{q}\) is error-free if underflow occurs. If no underflow occurs in the evaluation of \(\tilde{q}\), the invariant also holds as proven in Lemma 2, and in this case m is equal to or greater than \(u_{N}\).

For \(R_{7,\textrm{UFP}}\), we first note that, as in \(R_{4,\textrm{UFP}}\), m is greater than or equal to both \(\left|\tilde{q}\right|\) and \(u_N\), so invariants (I2.1) and (I3) hold. To show that (I2.2) holds, we use (3) to obtain

$$\begin{aligned} \tilde{q}&=\tilde{q}_{1}\odot \tilde{q}_{2}\\&\in \tilde{q}_{1}\cdot \tilde{q}_{2}\pm \varepsilon \left( \left|\tilde{q}_{1}\odot \tilde{q}_{2}\right|\oplus u_{N}\right) \\&\subseteq \left( q_{1}\pm a_{1}m_{1}\right) \cdot \left( q_{2}\pm a_{2}m_{2}\right) \pm \varepsilon \left( m_{1}\odot m_{2}\oplus u_{N}\right) \\&\subseteq q_{1}q_{2}\pm a_{2}m_{2}q_{1}\pm a_{1}m_{1}q_{2}\pm a_{1}a_{2}m_{1}m_{2}\pm \varepsilon \left( m_{1}\odot m_{2}\oplus u_{N}\right) \\&\subseteq q_{1}q_{2}\pm a_{2}\left( 1+\varepsilon \right) \left( m_{1}\odot m_{2}\right) \pm a_{1}\left( 1+\varepsilon \right) \left( m_{1}\odot m_{2}\right) \\&\pm a_{1}a_{2}\left( 1+\varepsilon \right) \left( m_{1}\odot m_{2}\right) \pm \varepsilon \left( m_{1}\odot m_{2}\oplus u_{N}\right) \\&\subseteq q_{1}q_{2}\pm \left( \left( 1+\varepsilon \right) \left( a_{1}+a_{2}+a_{1}a_{2}\right) +\varepsilon \right) \left( m_{1}\odot m_{2} \oplus u_{N}\right) . \end{aligned}$$

The proof for \(R_{9,\textrm{UFP}}\) combines the steps of the proofs for \(R_6\) and \(R_{7,\textrm{UFP}}\) and uses that the multiplication in the FMA can be assumed to be error-free. \(\square \)

3.2 Floating-point filters

The following result provides two semi-static filters for floating-point predicates that evaluate the sign of a polynomial. It is only stated for floating-point realisations of polynomials that are sums or differences. For products, the signs of each factor could be obtained individually and then multiplied.

Theorem 1

Let \(p\in \mathbb {R}\left[ x_{1},\ldots ,x_{m}\right] \) be a polynomial and \(\tilde{p}\in F\left[ x_{1},\ldots ,x_{m}\right] \) be some floating-point realisation of p.

  1.

    Let \(\tilde{p}\) be of the form \(\tilde{p}=\tilde{p}_{1}\oplus \tilde{p}_{2}\) or \(\tilde{p}=\tilde{p}_{1}\ominus \tilde{p}_{2}\), \(\left( a_{1},m_{1}\right) :=E\left( \tilde{p}_{1}\right) \) and \(\left( a_{2},m_{2}\right) :=E\left( \tilde{p}_{2}\right) \). Moreover, let constants \(a_3,a_4 \in F\) satisfy

    $$\begin{aligned} a_3 > \frac{\max \left( a_{1},a_{2}\right) }{1-\varepsilon }, \qquad a_4 \ge a_{3}\left( 1+\varepsilon \right) ^{2}, \end{aligned}$$

    and define

    $$\begin{aligned} e\left( x_{1},\ldots ,x_{m}\right) :=a_{4}\odot \left( m_{1}\left( x_{1},\ldots ,x_{m}\right) \oplus m_{2}\left( x_{1},\ldots ,x_{m}\right) \right) . \end{aligned}$$

    Then, for every choice of \(x_{1},\ldots ,x_{m}\in F\backslash \left\{ \textrm{NaN},\infty ,-\infty \right\} \) such that no underflow occurs in the evaluation of \(\tilde{p}\) or e

    $$\begin{aligned} f\left( x_{1},\ldots ,x_{m}\right) :={\left\{ \begin{array}{ll} \textrm{sign}\left( \tilde{p}\left( x_{1},\ldots ,x_{m}\right) \right) &{} \left|\tilde{p}\right|>e\vee e=0\\ \textrm{uncertain} &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

    is a valid filter.

  2.

    Let \(\tilde{p}\) be of the form \(\tilde{p}=\tilde{p}_{1}\oplus \tilde{p}_{2}\) or \(\tilde{p}=\tilde{p}_{1}\ominus \tilde{p}_{2}\), \(\left( a_{1},m_{1}\right) :=E_{\textrm{UFP}}\left( \tilde{p}_{1}\right) \) and \(\left( a_{2},m_{2}\right) :=E_{\textrm{UFP}}\left( \tilde{p}_{2}\right) \). We set \(a_{3}\) and \(a_{4}\) as in 1. and

    $$\begin{aligned} e\left( x_{1},\ldots ,x_{m}\right) :=a_{4}\odot \left( m_{1}\left( x_{1},\ldots ,x_{m}\right) \oplus m_{2}\left( x_{1},\ldots ,x_{m}\right) \right) \oplus u_{\textrm{S}}. \end{aligned}$$

    Then for every choice of \(x_{1},\ldots ,x_{m}\in F\backslash \left\{ \textrm{NaN},\infty ,-\infty \right\} , \)

    $$\begin{aligned} f\left( x_{1},\ldots ,x_{m}\right) :={\left\{ \begin{array}{ll} \textrm{sign}\left( \tilde{p}\left( x_{1},\ldots ,x_{m}\right) \right) &{} \left|\tilde{p}\right|>e\\ \textrm{uncertain} &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

    is a valid filter.

  3.

    Let \(\tilde{p}\) be of the form \(\tilde{p}=\textrm{FMA}\left( \tilde{p}_{1}, \tilde{p}_{2}, \tilde{p}_{3}\right) \), \(\left( a_{1},m_{1}\right) :=E\left( \tilde{p}_{1}\right) \), \(\left( a_{2},m_{2}\right) :=E\left( \tilde{p}_{2}\right) \) and \(\left( a_{3},m_{3}\right) :=E\left( \tilde{p}_{3}\right) \). Moreover, let constants \(a_4,a_5 \in F\) satisfy

    $$\begin{aligned} a_4 > \frac{\max \left( a_{1}+a_{2}+a_1 a_2, a_3\right) }{1-\varepsilon }, \qquad a_5 \ge a_{4}\left( 1+\varepsilon \right) ^{2}, \end{aligned}$$

    and define

    $$\begin{aligned} e\left( x_{1},\ldots ,x_{m}\right) :=a_{5}\odot \textrm{FMA}\left( m_{1}\left( x_{1},\ldots ,x_{m}\right) ,\ldots , m_{3}\left( x_{1},\ldots ,x_{m}\right) \right) . \end{aligned}$$

    Then, for every choice of \(x_{1},\ldots ,x_{m}\in F\backslash \left\{ \textrm{NaN},\infty ,-\infty \right\} \) such that no underflow occurs in the evaluation of \(\tilde{p}\) or e, the function f as defined in 1. is a valid filter.

  4.

    Let \(\tilde{p}\) be of the form \(\tilde{p}=\textrm{FMA}\left( \tilde{p}_{1}, \tilde{p}_{2}, \tilde{p}_{3}\right) \), \(\left( a_{1},m_{1}\right) :=E_{\textrm{UFP}}\left( \tilde{p}_{1}\right) \), \(\left( a_{2},m_{2}\right) :=E_{\textrm{UFP}}\left( \tilde{p}_{2}\right) \) and \(\left( a_{3},m_{3}\right) :=E_{\textrm{UFP}}\left( \tilde{p}_{3}\right) \). We set \(a_{4}\) and \(a_{5}\) as in 3. and

    $$\begin{aligned} e\left( x_{1},\ldots ,x_{m}\right) :=a_{5}\odot \textrm{FMA}\left( m_{1}\left( x_{1},\ldots ,x_{m}\right) ,\ldots , m_{3}\left( x_{1},\ldots ,x_{m}\right) \right) \oplus u_{\textrm{S}}. \end{aligned}$$

    Then for every choice of \(x_{1},\ldots ,x_{m}\in F\backslash \left\{ \textrm{NaN},\infty ,-\infty \right\} ,\) f as defined in 2. is a valid filter.

Note that \(\left|\tilde{p}\right|>e\) always evaluates as false if e is \(\infty \) or \(\textrm{NaN}\).

Proof

We first prove 1. and we assume without loss of generality that \(\tilde{p}=\tilde{p}_{1}\oplus \tilde{p}_{2}\). Using Lemma 1 and Lemma 2, it holds that

$$\begin{aligned} \tilde{p}&=\tilde{p}_{1}\oplus \tilde{p}_{2}\\&\in \tilde{p}_{1}+\tilde{p}_{2}\pm \varepsilon \left|\tilde{p}\right|\\&\subseteq p\pm \varepsilon \left|\tilde{p}\right|\pm a_{1}m_{1}\pm a_{2}m_{2} \end{aligned}$$

and equivalently

$$\begin{aligned} p&\in \tilde{p}\pm \varepsilon \left|\tilde{p}\right|\pm a_{1}m_{1}\pm a_{2}m_{2}\\&\subseteq \tilde{p}\pm \varepsilon \left|\tilde{p}\right|\pm \max \left( a_{1},a_{2}\right) \left( m_{1}+m_{2}\right) . \end{aligned}$$

From this it follows that the signs of p and \(\tilde{p}\) are equal if

$$\begin{aligned} \left( 1-\varepsilon \right) \left|\tilde{p}\right|>\max \left( a_{1},a_{2}\right) \left( m_{1}+m_{2}\right) . \end{aligned}$$
(7)

The inequality

$$\begin{aligned} \left|\tilde{p}\right|>a_{3}\left( m_{1}+m_{2}\right) \end{aligned}$$
(8)

is equivalent to

$$\begin{aligned} \left( 1-\varepsilon \right) \left|\tilde{p}\right|>a_{3}\left( 1-\varepsilon \right) \left( m_{1}+m_{2}\right) , \end{aligned}$$

which implies (7), so (8) is also a sufficient condition. Lastly, we see that

$$\begin{aligned} a_{3}\left( m_{1}+m_{2}\right) \le&a_{3}\left( 1+\varepsilon \right) \left( m_{1}\oplus m_{2}\right) \\ \le&a_{4}\odot \left( m_{1}\oplus m_{2}\right) , \end{aligned}$$

where the second step uses the no-underflow assumption. Hence,

$$\begin{aligned} \left|\tilde{p}\right|>a_{4}\odot \left( m_{1}\oplus m_{2}\right) =:e \end{aligned}$$

is a sufficient condition for the signs of p and \(\tilde{p}\) being equal. It remains to consider the case \(e=0\). If \(a_{1}\) and \(a_{2}\) are both zero, then \(\tilde{p}\) is a simple expression and its sign is trivially correct, which makes f always valid. If either \(a_{1}\) or \(a_{2}\) is non-zero, then \(a_{4}\) is easily seen to be non-zero too and e can only be zero if both \(m_{1}\) and \(m_{2}\) are zero, since we assumed that no underflow occurs. If \(m_{1}=m_{2}=0\), then by Lemma 2, \(\tilde{p}_{1}=\tilde{p}_{2}=p_{1}=p_{2}=0\), and \(\tilde{p}=p=0\).

The proof for 2. is analogous, except that the constant \(u_{\textrm{S}}\) is added to e in place of the assumption that no underflow occurs, the case \(e=0\) is omitted because it cannot occur in 2., and Lemma 3 is used in place of Lemma 2.

The proofs for 3. and 4. are analogous to the proofs for 1. and 2. \(\square \)

Remark 5

Note that the constants \(a_{3}\) and \(a_{4}\) do not depend on the input but only on the expression \(\tilde{p}\), so in practice they can be computed at compile-time in floating-point or exact arithmetic.

A detailed example for the construction of a filter based on this approach can be found in a Jupyter notebook using the Cling-Kernel [33] in [4].

This filter differs from the semi-static filters (called stage A) in [30], which do not guarantee valid results in cases of underflow. It also differs from the semi-static filters generated by FPG [24] because only a single condition is evaluated, rather than three conditions, which means that most predicate calls for non-degenerate inputs can be decided on a code path with a single, well-predictable branch.

With these two properties, a single branch on the filter success code path and validity for inputs that can cause underflow, this procedure for constructing semi-static filters can be seen as a generalisation of the semi-static 2D orientation filter presented in [26]. For the 2D orientation predicate in particular, our approach produces a more pessimistic error bound than [26], which can be considered the price to pay for using a more general method.

The semi-static error bound e can be turned into a static error bound by evaluating \(m_{1}\oplus m_{2}\) not at specific input values \(x_{1},\ldots ,x_{m}\) but over bounds on these values, \(\left[ \underline{x}_{1},\overline{x}_{1}\right] ,\ldots ,\left[ \underline{x}_{m},\overline{x}_{m}\right] \), using interval arithmetic, or by obtaining its maximum over some more general domain in \(F^{m}\). This yields a static or almost static filter.
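As a minimal sketch of the static variant for the 2D orientation magnitude (the function and parameter names are ours), a bound on \(m_{1}\oplus m_{2}\) can be precomputed from bounds on the absolute coordinate values; since round-to-nearest is monotone, evaluating the bound itself in double precision suffices:

```cpp
// Bound the magnitude m1 (+) m2 of the 2D orientation predicate using
// xmax >= |a_x|,|b_x|,|c_x| and ymax >= |a_y|,|b_y|,|c_y|: each difference
// is bounded by 2*xmax or 2*ymax, each product by their product, and the
// sum of the two products by twice that.
double static_orientation_magnitude(double xmax, double ymax) {
    return 2.0 * ((2.0 * xmax) * (2.0 * ymax));
}
// A static filter then compares |p~| against
// a4 * static_orientation_magnitude(xmax, ymax), precomputed once.
```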

3.3 Zero-filter

With underflow protection, the right-hand side of our semi-static filter condition is never zero, so the filter will always fail, i.e. return “uncertain”, if the true sign of p is 0. For inputs in F that approximate a uniform distribution on an interval in \(\mathbb {R}\), \(p=0\) is extremely unlikely, but in some practical input data it may be more common.

Example 6

Consider the 2D orientation predicate with the floating-point realisation

$$\begin{aligned} \tilde{p}=\left( a_{x}\ominus c_{x}\right) \odot \left( b_{y}\ominus c_{y}\right) \ominus \left( a_{y}\ominus c_{y}\right) \odot \left( b_{x}\ominus c_{x}\right) . \end{aligned}$$

It is easy to check that if either point a or b coincides with point c or if all points share the same x or y coordinate and no overflow occurs, then \(\tilde{p}\) evaluates to zero. Such cases can be common degeneracies in real-world data.

It can also be checked that the error bound e from our semi-static filter without underflow protection would be zero in either of these cases, so such degeneracies can be decided quickly by the filter. The error bound of the \(\textrm{UFP}\)-variation of our filter, though, would not be zero, because non-zero terms are introduced in the error bounds of the multiplications and in the definition of e itself.
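To make the degeneracies concrete, the following self-contained check (the function name is ours) confirms that \(\tilde{p}\) evaluates to exactly zero in both cases:

```cpp
#include <cassert>

// Floating-point realisation of the 2D orientation predicate from Example 6.
double orient2d(double ax, double ay, double bx, double by,
                double cx, double cy) {
    return (ax - cx) * (by - cy) - (ay - cy) * (bx - cx);
}

int main() {
    // a coincides with c: both products are exactly 0.0, so p~ == 0.0.
    assert(orient2d(3.5, 3.5, 18.9, 18.9, 3.5, 3.5) == 0.0);
    // All points share the same y coordinate: both factors (.. - cy) are
    // exactly zero, so both products and their difference are zero.
    assert(orient2d(0.1, 2.0, 7.3, 2.0, -4.2, 2.0) == 0.0);
}
```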

In such cases, a simple filter that can certify common zero-producing inputs can be useful. Such a filter can be constructed using the following rules.

Definition 6

(Zero-Filter) Let \(\tilde{p}\) be a floating-point realisation of a polynomial and let \(x_{1},\ldots ,x_{m}\in F\) be a given set of input values. We define the following rules.

  1.

    For a subexpression \(\tilde{q}\) of the form \(\tilde{q}=c\) for some constant \(c\in F\) or input value \(x_{i}\) for \(i\in \left\{ 1,\ldots ,m\right\} \), we define

    $$\begin{aligned} Z_{1}\left( \tilde{q};x_{1},\ldots ,x_{m}\right) ={\left\{ \begin{array}{ll} \text {true}, &{} \tilde{q}=0\\ \text {false}, &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
  2.

    For a subexpression \(\tilde{q}\) of the form \(\tilde{q}=x_{i}\circledcirc x_{j}\) for \(1\le i,j\le m\) and \(\circledcirc \in \left\{ \oplus ,\ominus \right\} \), we define

    $$\begin{aligned} Z_{2}\left( \tilde{q};x_{1},\ldots ,x_{m}\right) ={\left\{ \begin{array}{ll} \text {true}, &{} \tilde{q}=0\\ \text {false}, &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
  3.

    For a subexpression \(\tilde{q}\) of the form \(\tilde{q}=\tilde{q}_{1}\circledcirc \tilde{q}_{2}\) with \(\circledcirc \in \left\{ \oplus ,\ominus \right\} \), we define

    $$\begin{aligned} Z_{3}\left( \tilde{q};x_{1},\ldots ,x_{m}\right) = Z\left( \tilde{q}_{1};x_{1},\ldots ,x_{m}\right) \wedge Z\left( \tilde{q}_{2};x_{1},\ldots ,x_{m}\right) . \end{aligned}$$
  4.

    For a subexpression \(\tilde{q}\) of the form \(\tilde{q}=\tilde{q}_{1}\odot \tilde{q}_{2}\), we define

    $$\begin{aligned} Z_{4}\left( \tilde{q};x_{1},\ldots ,x_{m}\right) = Z\left( \tilde{q}_{1};x_{1},\ldots ,x_{m}\right) \vee Z\left( \tilde{q}_{2};x_{1},\ldots ,x_{m}\right) . \end{aligned}$$

We define \(Z\left( \tilde{q};x_{1},\ldots ,x_{m}\right) \) to be the result of the first applicable rule out of \(Z_{1}\), \(Z_{2}\), \(Z_{3}\), and \(Z_{4}\).

The zero-filter returns the sign 0 if \(Z\left( \tilde{p};x_{1},\ldots ,x_{m}\right) \) is true and “uncertain” otherwise. It is easy to verify that this filter is valid for all inputs regardless of range issues such as overflow or underflow.

Example 7

Consider again the 2D orientation predicate as in Example 6. Applying Definition 6 to \(\tilde{p}\), we first apply \(Z_3\) to \(\tilde{p}\), which gives us the condition

$$\begin{aligned} Z\left( \left( a_{x}\ominus c_{x}\right) \odot \left( b_{y}\ominus c_{y}\right) \right) \wedge Z\left( \left( a_{y}\ominus c_{y}\right) \odot \left( b_{x}\ominus c_{x}\right) \right) \end{aligned}$$

For each subexpression, we can apply \(Z_4\) to obtain

$$\begin{aligned} \left( Z\left( a_{x}\ominus c_{x}\right) \vee Z\left( b_{y}\ominus c_{y}\right) \right) \wedge \left( Z\left( a_{y}\ominus c_{y}\right) \vee Z\left( b_{x}\ominus c_{x}\right) \right) \end{aligned}$$

and finally, applying \(Z_2\) for each difference

$$\begin{aligned} \left( \left( a_{x}\ominus c_{x} = 0\right) \vee \left( b_{y}\ominus c_{y} = 0\right) \right) \wedge \left( \left( a_{y}\ominus c_{y} = 0\right) \vee \left( b_{x}\ominus c_{x} = 0\right) \right) , \end{aligned}$$

which is a sufficient condition for the sign of p being 0 for the given inputs. It can be used as a second filter stage after a filter with underflow protection, which cannot decide simple degenerate cases such as all points sharing the same x- or y-coordinate.
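Transcribed directly into code (the function name is ours), the derived condition reads:

```cpp
// Zero-filter for the 2D orientation predicate as derived in Example 7.
// `true` certifies that the sign of p is 0; `false` means "uncertain",
// not "non-zero".
bool orient2d_zero_filter(double ax, double ay, double bx, double by,
                          double cx, double cy) {
    return ((ax - cx) == 0.0 || (by - cy) == 0.0)
        && ((ay - cy) == 0.0 || (bx - cx) == 0.0);
}
```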

4 Numerical results

The exact predicates derived in the previous section are designed to be fast, applicable to general polynomial expressions, and simple to use. These goals must be reflected in their implementation, which is briefly covered before benchmark results are presented.

4.1 C++ implementation

Our implementation of exact predicates is based on C++ template and constexpr metaprogramming, making use of the Boost.Mp11 library [12]. The main design goal is to avoid runtime overhead such as that observed in the C++ wrapper implementation in [8], because geometric predicates lie on performance-critical code paths in geometric algorithms and can make up a large proportion of the overall runtime, as the benchmarks in Sect. 4.2 show. Further design goals include flexibility and extensibility with regard to the choice and order of filters, as well as expressivity and simplicity in the definition of predicate expressions.

Exact predicates are implemented as variadic class templates for staged predicates that hold a tuple of zero or more stages implementing filters or the exact arithmetic evaluation. If all filters are semi-static, the instantiated class is stateless and can be constructed without arguments at no runtime cost. The static parts of semi-static error bounds are computed at compile-time from the predicate expression and static type information for the calculation type. If almost static or static filters are included, input bounds need to be provided at construction for the computation of error bounds. For almost static filters, an update member function is provided to update error bounds. The exact predicate is called through a variadic function that takes a variable but compile-time static number of inputs in the calculation type and returns an integer in \(\{-1,0,1\}\) that represents the resulting sign.
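The following sketch illustrates this design; the names (`staged_predicate`, `uncertain`) and the fold-based dispatch are our illustration under the stated assumptions, not the library's actual API:

```cpp
#include <tuple>

constexpr int uncertain = 2;  // sentinel distinct from the signs -1, 0, 1

// Stages are tried in order; the first certain sign wins and later
// stages are skipped. Stages are assumed to be const-callable.
template <typename... Stages>
class staged_predicate {
    std::tuple<Stages...> stages_;  // stateless stages cost nothing to hold
public:
    template <typename... Args>
    int operator()(Args... args) const {
        int result = uncertain;
        std::apply([&](auto const&... stage) {
            // Comma-fold: evaluate each stage until a certain sign is found.
            ((result == uncertain ? (result = stage(args...)) : 0), ...);
        }, stages_);
        return result;
    }
};
```

A user could then compose, for example, `staged_predicate<zero_filter, semi_static_filter, exact_stage>` (with hypothetical stage types) and call it like a plain function object.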

The individual stages are expected to follow the same basic interface. Each stage provides at least a member function that is called with input values and returns an integer that represents either the result sign or a constant that indicates uncertainty. For stages that require the computation of runtime constants, e.g. static and almost static filters, constructor and update members need to be implemented as well. Otherwise, the stages are default constructed at no runtime cost. This general interface allows users of the library to extend exact predicates with custom filters beyond those provided by our implementation to better suit their algorithms and data sets, such as the filter shown in Example 8.

figure e

Example 8

The 2D incircle predicate on four 2D points \(p_{1},\ldots ,p_{4}\) decides whether \(p_{4}\) lies inside, on or outside of the oriented circle passing through \(p_{1},p_{2}\) and \(p_{3}\), assuming the points do not lie on a line. A pattern of degenerate inputs are four points that form a rectangle. For this input, \(p_{4}\) clearly lies on the circle (indicated by a sign of 0) but a forward error bound filter could classify the case as undecidable and forward it to a computationally expensive exact stage. The following listing illustrates a custom filter that conforms to the previously described interface and could be used with our implementation of staged predicates.

figure f
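A minimal sketch of such a custom stage, conforming to the interface described above (all names are ours, not the library's actual API), could certify axis-aligned rectangles, where every comparison is exact and the stage itself is free of rounding error:

```cpp
constexpr int uncertain = 2;  // sentinel as in the staged-predicate sketch

// Custom filter in the spirit of Example 8: if the four points form the
// axis-aligned rectangle a=(x0,y0), b=(x1,y0), c=(x1,y1), d=(x0,y1),
// then d lies exactly on the circle through a, b and c.
struct rectangle_incircle_filter {
    int operator()(double ax, double ay, double bx, double by,
                   double cx, double cy, double dx, double dy) const {
        if (ay == by && bx == cx && cy == dy && dx == ax
            && ax != bx && ay != cy)
            return 0;      // d lies exactly on the circle through a, b, c
        return uncertain;  // defer to the next stage
    }
};
```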

At the core of the implementation is the compile-time processing of polynomial expressions for the derivation of error bound expressions. Arithmetic expressions are represented in the C++ type system using expression templates, a technique described in [34]. The most basic expressions in our implementation are types representing the leaves of expression trees. Those leaves are either compile-time constants (indexed with zero) or input values (indexed with a positive number). More complex expressions can be built from these placeholders using the elementary operators +, − and *.
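A simplified sketch of this representation (reduced relative to the actual implementation; all names are illustrative):

```cpp
#include <type_traits>

// Marker base so the operators below only match expression types.
struct expr {};
template <int I> struct leaf : expr {};  // I == 0: constant, I > 0: input
template <class L, class R> struct sum : expr {};
template <class L, class R> struct difference : expr {};
template <class L, class R> struct product : expr {};

template <class L, class R,
          std::enable_if_t<std::is_base_of_v<expr, L>
                        && std::is_base_of_v<expr, R>, int> = 0>
constexpr sum<L, R> operator+(L, R) { return {}; }

template <class L, class R,
          std::enable_if_t<std::is_base_of_v<expr, L>
                        && std::is_base_of_v<expr, R>, int> = 0>
constexpr difference<L, R> operator-(L, R) { return {}; }

template <class L, class R,
          std::enable_if_t<std::is_base_of_v<expr, L>
                        && std::is_base_of_v<expr, R>, int> = 0>
constexpr product<L, R> operator*(L, R) { return {}; }

// 2D orientation with inputs 1..6 = ax, ay, bx, by, cx, cy. The type of
// orient2d_expr encodes the whole expression tree at compile time, so
// error bound rules can be applied to it recursively.
constexpr auto orient2d_expr =
    (leaf<1>{} - leaf<5>{}) * (leaf<4>{} - leaf<6>{})
  - (leaf<2>{} - leaf<6>{}) * (leaf<3>{} - leaf<5>{});
```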

Forward error bound expression types are deduced at compile-time based on a list of rule class templates. The interface of each rule class template requires a constexpr function that expects an expression template and returns a bool indicating whether the rule is applicable to the expression, and a class template for the error bound based on the rule. Error bounds are implemented in the form of constexpr integer arrays that represent the coefficients of the polynomial in \(\varepsilon \) and a magnitude expression template. The rules can be extended through custom rules that conform to the interface.

figure g

Example 9

Consider a 2D orientation problem for points whose coordinates are not binary floating-point numbers, e.g. because the input is given in a decimal or rational format. The rules given in Definition 5 are not designed for this problem but with a custom error bound rule, our implementation can be extended to generate a filter for inputs that are rounded to the nearest floating-point number. Such a filter could be used before going into a more computationally expensive stage operating on decimal or rational numbers.

figure h
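A minimal, self-contained sketch of what such a rule could look like (the interface names here are assumptions, not the library's actual API):

```cpp
#include <type_traits>

template <int I> struct leaf {};  // expression leaf as in the earlier sketch

// Trait: true only for leaves with positive index, i.e. input values.
template <class E> struct is_input_leaf : std::false_type {};
template <int I> struct is_input_leaf<leaf<I>>
    : std::bool_constant<(I > 0)> {};

// Rule in the spirit of Example 9: inputs rounded to the nearest
// floating-point number carry a relative error of at most eps, so the
// associated error bound is (eps, |x_i|).
struct rounded_input_rule {
    template <class Expr>
    static constexpr bool applicable() { return is_input_leaf<Expr>::value; }

    template <class Expr>
    struct bound {
        // Coefficients of the error polynomial in eps: 0 + 1 * eps.
        static constexpr int coefficients[2] = {0, 1};
        using magnitude = Expr;  // magnitude expression |x_i|: the input itself
    };
};
```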

The listing illustrates a custom rule. It is only applicable to expressions that are input values, i.e. expressions of the form \(\tilde{q}\left( x_{1},\ldots ,x_{m}\right) =x_{i}\). In the context of our implementation, these expressions are leaves of the expression tree with a positive index, and the error bound is \(R\left( \tilde{q}\right) =\left( \varepsilon ,\left|x_{i}\right|\right) \). Using a rule set consisting of this custom rule, \(R_{6,0}\) and \(R_{7,0}\) on the 2D orientation predicate yields the semi-static error bound

$$\begin{aligned} \left( 5\varepsilon \oplus 32\varepsilon ^{2}\right) \left( \left( \left|a_{x}\right|\oplus \left|c_{x}\right|\right) \odot \left( \left|b_{y}\right|\oplus \left|c_{y}\right|\right) \oplus \left( \left|a_{y}\right|\oplus \left|c_{y}\right|\right) \odot \left( \left|b_{x}\right|\oplus \left|c_{x}\right|\right) \right) . \end{aligned}$$

Besides forward error bound based filters discussed in this paper, our implementation also contains templates for filters and exact stages based on the same principles as the stages B and D in [30].

4.2 Benchmarks

To test the performance of our approach and implementation, we measured timings for a number of benchmarks that are provided by the CGAL library. The design of the 2D and 3D geometry kernel concepts in CGAL, as documented in [7], provides a simple way to test our predicates in CGAL algorithms by deriving from the Simple_cartesian<double> kernel and overriding all predicate objects that may suffer from rounding errors with predicates generated by our implementation.

The performance with the resulting custom kernel is then compared to the performance of CGAL’s Exact_predicates_inexact_constructions_kernel, which follows a similar paradigm of filtered, exact predicates.

All benchmarks were run on a GNU/Linux workstation with an Intel Core i7-6700HQ CPU using the performance scaling governor, no optional mitigations against CPU vulnerabilities such as Spectre or Meltdown, and turbo boost disabled for consistency. All code was compiled with GCC 11.1 and the flags “-O3 -march=native”. The installed versions of relevant libraries were CGAL 5.4, GMP 6.2.1, MPFR 4.1.0, and Boost 1.79. The code for all benchmarks, with instructions on how to replicate them, can be found in [4].

4.2.1 2D Delaunay triangulation

The 2D Delaunay triangulation algorithm provided by the CGAL library makes use of the 2D orientation and incircle predicates, which compute the signs of the following expressions:

$$\begin{aligned} \begin{aligned}p_{\text {orientation}\_2}&=\begin{vmatrix} a_{x}-c_{x}&a_{y}-c_{y}\\ b_{x}-c_{x}&b_{y}-c_{y} \end{vmatrix}\\ p_{\text {incircle}\_2}&=\begin{vmatrix} a_{x}-d_{x}&a_{y}-d_{y}&\left( a_{x}-d_{x}\right) ^{2}+\left( a_{y}-d_{y}\right) ^{2}\\ b_{x}-d_{x}&b_{y}-d_{y}&\left( b_{x}-d_{x}\right) ^{2}+\left( b_{y}-d_{y}\right) ^{2}\\ c_{x}-d_{x}&c_{y}-d_{y}&\left( c_{x}-d_{x}\right) ^{2}+\left( c_{y}-d_{y}\right) ^{2} \end{vmatrix}. \end{aligned} \end{aligned}$$

2D Delaunay triangulations were computed for two data sets of 1,000,000 randomly generated points each. The coordinates were sampled either from a continuous uniform distribution (using CGAL’s Random_points_in_square_2 generator) or from an equidistant grid (using CGAL’s points_on_square_grid_2 generator) and then shuffled. For the continuous distribution, we found a 4.2% performance penalty for the use of underflow guards, which can be explained by the slightly more expensive error expressions. With or without underflow guards, our implementation performed faster than the CGAL predicates, see (a) in Fig. 1. This is expected because all calls can be decided on a code path with a single, well-predictable branch.

Fig. 1

This chart shows the relative runtime for the construction of a Delaunay triangulation of 1,000,000 points with coordinates sampled from (a) a continuous uniform distribution and (b) an equidistant grid, as well as for the CGAL (c) polygon mesh processing and (d) mesh refinement benchmarks. For each benchmark, our filters with and without underflow protection are compared to the predicates implemented in CGAL

For the points sampled from the equidistant grid, the triangulations were, in general, much slower, which is also expected because the input is designed to be degenerate and trigger edge cases. Our predicates with underflow protection and the predicates in CGAL show very similar performance (roughly 0.2% difference), while our filter without underflow protection is significantly faster, see (b) in Fig. 1.

By construction, our semi-static filter with underflow protection fails for all cases in which the true sign is zero; most of these cases can, however, be decided by the zero-filter. Table 2 shows the number of filter failures in the first filter stage for each predicate. For a graphical comparison of the precision of 2D orientation filters, see Fig. 2.

Table 2 Number of filter failures for the 2D orientation and 2D incircle predicate with various semi-static filters when constructing the Delaunay triangulation of 1,000,000 points sampled from an equidistant grid
Fig. 2

This figure shows the result of calls to the non-robust 2D orientation predicate and to three 2D orientation filters, respectively, for the points (20.1, 20.1), (18.9, 18.9) and a small neighbourhood of the point (3.5, 3.5), such that neighbouring pixels represent points with neighbouring floating-point coordinates. The point (3.5, 3.5) is marked with a black circle. The dimensions of the neighbourhood are roughly \(6\times 10^{-13}\) in width and \(3\times 10^{-13}\) in height. The colours represent left side (red), collinear (green), right side (blue) and uncertain (yellow). The pattern of green points in (a) shows that the naive predicate produces many incorrect results. Our filter (b) is more precise than FPG (c) but less precise than the significantly slower interval filter (d). Ozaki’s filter produces exactly the same image as our filter (colour figure online)

4.2.2 3D polygon mesh processing

The next benchmark is taken from the polygon mesh processing benchmark in CGAL. First, a 3D mesh is loaded and a polyhedral envelope at distance \(\delta \) is constructed around it. The polyhedral envelope is an approximation of the Minkowski sum of the mesh with a sphere, also known as a buffer. Then, three points are repeatedly chosen in a loop, and if they form a non-degenerate triangle, it is tested whether that triangle is contained in the polyhedral envelope or not. As input, we use the file pig.off, which is provided as a sample in the CGAL source tree, and for \(\delta \) we chose 0.1. This is described in more detail in [22].

The algorithm makes use of the 3D orientation predicate defined as the sign of

$$\begin{aligned} p_{\text {orientation}\_3}=\begin{vmatrix} a_{x}-d_{x}&a_{y}-d_{y}&a_{z}-d_{z}\\ b_{x}-d_{x}&b_{y}-d_{y}&b_{z}-d_{z}\\ c_{x}-d_{x}&c_{y}-d_{y}&c_{z}-d_{z} \end{vmatrix}. \end{aligned}$$

No filter failures were recorded for either implementation, and no performance penalty was measured for the underflow protection. The predicates provided by CGAL incurred around 28% additional runtime compared to our implementation, see (c) in Fig. 1.

4.2.3 3D mesh refinement

As the last benchmark, we measure the runtime of 3D mesh refinement with CGAL. The algorithm and its parameters are explained in [1]. The predicates used in this benchmark are the 3D orientation predicate and the power side of oriented power sphere predicate, which is defined as the sign of the following expression:

$$\begin{aligned} p&=\begin{vmatrix} a_{x}-e_{x}&a_{y}-e_{y}&a_{z}-e_{z}&\left( a_{x}-e_{x}\right) ^{2}+\left( a_{y}-e_{y}\right) ^{2}+\left( a_{z}-e_{z}\right) ^{2}+\left( e_{w}-a_{w}\right) \\ b_{x}-e_{x}&b_{y}-e_{y}&b_{z}-e_{z}&\left( b_{x}-e_{x}\right) ^{2}+\left( b_{y}-e_{y}\right) ^{2}+\left( b_{z}-e_{z}\right) ^{2}+\left( e_{w}-b_{w}\right) \\ c_{x}-e_{x}&c_{y}-e_{y}&c_{z}-e_{z}&\left( c_{x}-e_{x}\right) ^{2}+\left( c_{y}-e_{y}\right) ^{2}+\left( c_{z}-e_{z}\right) ^{2}+\left( e_{w}-c_{w}\right) \\ d_{x}-e_{x}&d_{y}-e_{y}&d_{z}-e_{z}&\left( d_{x}-e_{x}\right) ^{2}+\left( d_{y}-e_{y}\right) ^{2}+\left( d_{z}-e_{z}\right) ^{2}+\left( e_{w}-d_{w}\right) \end{vmatrix}\!, \end{aligned}$$

which, with \(d=5\), has the highest degree of all predicates used in our benchmarks and is based on a non-homogeneous polynomial. As input file, we used elephant.off, which is provided as a sample in the CGAL source tree, with a facet approximation error of 0.0068, a maximum facet size of 0.003 and a maximum tetrahedron size of 0.006.

The underflow guard came with a slight performance penalty of around 1%, and the CGAL predicates were about 3.4% slower, see (d) in Fig. 1. There was a non-zero but negligible number of filter failures of around 0.1% for each of the predicates.

4.3 Error bound comparison

The following table compares error constants and error bounds of various semi-static filtering approaches for the 2D orientation predicate in the double-precision floating-point system. Underflow guards are omitted and error constants are rounded for readability.

Filter | Static constant | Variable component without underflow guards
[30] | \(3.3306690739\times 10^{-16}\) | \(\left| \left( a_{x}\ominus c_{x}\right) \odot \left( b_{y}\ominus c_{y}\right) \right| \oplus \left| \left( a_{y}\ominus c_{y}\right) \odot \left( b_{x}\ominus c_{x}\right) \right| \)
[26] | \(3.3306690622\times 10^{-16}\) | \(\left| \left( a_{x}\ominus c_{x}\right) \odot \left( b_{y}\ominus c_{y}\right) \oplus \left( a_{y}\ominus c_{y}\right) \odot \left( b_{x}\ominus c_{x}\right) \right| \)
Ours | \(3.3306690622\times 10^{-16}\) | \(\left| \left( a_{x}\ominus c_{x}\right) \odot \left( b_{y}\ominus c_{y}\right) \right| \oplus \left| \left( a_{y}\ominus c_{y}\right) \odot \left( b_{x}\ominus c_{x}\right) \right| \)
[24] | \(8.8872057373\times 10^{-16}\) | \(\max \left( \left| a_{x}\ominus c_{x}\right| ,\left| b_{x}\ominus c_{x}\right| \right) \cdot \max \left( \left| a_{y}\ominus c_{y}\right| ,\left| b_{y}\ominus c_{y}\right| \right) \)
[8] | \(8.8817841970\times 10^{-16}\) | \(\left( \left| a_{x}\right| \oplus \left| c_{x}\right| \right) \odot \left( \left| b_{y}\right| \oplus \left| c_{y}\right| \right) \oplus \left( \left| a_{y}\right| \oplus \left| c_{y}\right| \right) \odot \left( \left| b_{x}\right| \oplus \left| c_{x}\right| \right) \)

Since the variable component in FPG omits the addition, its error bound constant should be halved for comparison with the first three filters in the table. Still, it can be seen that the approaches in FPG [24] and by Burnikel et al. [8] produce more pessimistic error bound constants than the other filters. The approach by Ozaki et al. [26] obtains a slight improvement over the error bound constant by Shewchuk [30]; we use their bound directly in rule \(R_5\) in Definition 5 to obtain a similar constant.

With regard to the input-dependent component, we generally obtain the same expressions as Shewchuk. In comparison, the expression by Ozaki et al. produces smaller error bounds when the products have opposite signs, but in these cases there is no cancellation in the determinant computation anyway and the filter would not fail, so this mainly saves one instruction for the computation of the absolute value. The expression generated by the approach of Burnikel et al. does not exploit the fact that the error of the initial differences, e.g. \(a_x \ominus c_x\), can be bounded in terms of their results, e.g. \(\left|a_x \ominus c_x \right|\), and will produce much larger values for points that are relatively close to each other.

The expressions generated by FPG are very different because they are based on the idea of computing the rounding error of the polynomial under the assumption that all inputs are scaled to 1, and then rescaling using the maxima of each group of coordinates. A disadvantage of this expression is that it loses much of the polynomial’s original structure and, for example, produces more pessimistic estimates than the first three expressions when a and c, or b and c, are equal or very close.

5 Conclusion

We have presented a recursive scheme for the derivation of (semi-)static filters for geometric predicates. The approach is branch-efficient, sufficiently general to handle rounding errors, overflow and underflow, and applicable to arbitrary polynomials.

Our C++-metaprogramming-based implementation is user-friendly insofar as it requires no code generation tools, no additional annotations for variables and no manual tuning. This is achieved without the additional runtime overhead of previous C++-wrapper-based implementations, and our measurements show that our approach is competitive with the state of the art and even outperforms it in some cases.

Future work could include generalisations toward non-polynomial predicates and robust predicates on implicit points that occur as results or intermediate results of geometric constructions and may not be explicitly representable with floating-point coordinates. The implementation may also be extended in the future to include further filtering stages to improve the performance for common cases of degenerate inputs.