1 Introduction

Deductive verification has been successful in providing functional verification for programs written in popular programming languages such as Java [2, 21, 40, 48], Python [27], Rust [4], C [23, 53], and Ada [17, 49]. Deductive verifiers allow a user to annotate methods in a program with pre- and postconditions, from which verification conditions (VCs) are automatically generated. These are then either proven directly by the verifier itself, or discharged with external tools such as automated Satisfiability Modulo Theories (SMT) solvers or interactive proof assistants.

While deductive verifiers fully implement many sophisticated data representations (including heap data structures, objects, and ownership), support for floating-point numbers remains rather limited—only Frama-C and SPARK offer automated support for floating-point arithmetic, in C and Ada, respectively [30]. This state of affairs is at least partially a result of previous limitations in floating-point support in SMT solvers. Consequently, deductive verification has been used for floating-point programs only by experts investing considerable manual effort [13, 30]. This is unfortunate, as it makes deductive verification unavailable for a large number of programs across many domains, including embedded systems, machine learning, and scientific computing. With the increasing need for parallelization in code, scientific computing in particular has recently faced algorithmic challenges to whose solution formal methods may contribute [8, 56].

One of the main challenges of floating-point arithmetic is its unintuitive behavior and the special values that the IEEE 754 standard [38] introduces. For instance, an overflow or a division by zero results in the special value (positive or negative) infinity, and not a runtime exception. Similarly, invalid operations like sqrt(-1.0) result in a Not a Number (NaN) value. These special values are problematic because seemingly straightforward identities do not hold (e.g., x == x fails for NaN, and x * 0.0 == 0.0 fails for infinities). In addition, every operation on floating-point numbers potentially involves rounding, which compromises familiar rules like associativity and distributivity. Hence, reasoning support for writing correct floating-point programs is indispensable.
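The following plain-Java lines (our illustration, not from the article's benchmarks) exhibit all three effects:

    double inf = 1.0e308 * 10.0;          // overflow yields Infinity, not an exception
    System.out.println(Math.sqrt(-1.0));  // NaN: invalid operation
    System.out.println(inf * 0.0);        // NaN: x * 0.0 == 0.0 does not hold
    System.out.println((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)); // false: rounding breaks associativity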

Abstract interpretation-based tools can prove the absence of runtime errors and special values [18, 42] and bound roundoff errors due to floating-point’s finite precision [9, 19, 24, 35, 58]. SMT decision procedures [15] or SAT-based model checking [22, 56], on the other hand, can prove intricate properties requiring bit-precise reasoning. However, these techniques and tools largely support only purely floating-point programs or program snippets, or analyze programs only up to a predefined depth of the call stack. General reasoning about real-world object-oriented programs, however, requires support for more than floating-point arithmetic. Many realistic programs mix floating-point types with other primitive types such as integers, including (implicit) type casts, and moreover use object-oriented concepts such as the (unbounded) object heap. This necessitates different analyses, which need to be integrated with floating-point reasoning.

Handling floating-point numbers in a deductive verifier has unique advantages. First, the deductive verification approach already comes with the infrastructure for reasoning about complex control and data structures (like exception handling and heaps). Second, it allows one to flexibly combine the verifier’s symbolic execution reasoning with external decision procedures. Third, depending on the theory support, the verifier or external solver may also generate counterexamples to a property and thus help program debugging—something an abstract interpretation-based approach fundamentally cannot provide.

We report on adding floating-point support to the KeY deductive verifier, providing the first automated deductive floating-point support for the Java programming language. Among other properties, we verify the absence of the special values infinity and NaN. While these values are helpful in particular circumstances (like carefully designed implementations of numeric analysis algorithms), for most applications they indicate an error. Hence, showing their absence is a prerequisite for further (functional) reasoning. Moreover, our extension makes it possible to express and discharge functional properties relating pre- and poststates, including bounds on roundoff errors and bounds on differences between two similar floating-point programs, which we also demonstrate.

We exploit both KeY’s symbolic execution and its external SMT support. On the one hand, we handle arithmetic operations by relying on a combination of KeY’s symbolic execution to handle the heap and SMT-based decision procedures to handle the floating-point parts of the VCs. On the other hand, we support transcendental functions via axiomatization in the KeY prover itself.

Transcendental functions such as sine are a common feature in engineering applications, but are not supported by floating-point decision procedures. We explore two ways of supporting them soundly but approximately: encoding them as axiomatized uninterpreted function symbols either directly in the SMT queries, or in additional calculus rules in KeY. Our evaluation shows that even though such reasoning is approximate, it is nonetheless sufficient to prove the absence of special values in many interesting programs.

We evaluate KeY’s floating-point support on a number of real-world floating-point Java programs, verifying the absence of special values as well as functional correctness, including programs with loops (with the help of loop invariants). Our benchmark set allows us to evaluate recent progress in SMT floating-point support in Z3 [26], CVC4 [6], and MathSAT [20] on previously unseen benchmarks. For instance, we observe that quantifiers are challenging even if they do not affect satisfiability of SMT queries. Our benchmarks are openly available, and we expect our insights to be useful for further solver development.

Contributions In summary, we make the following contributions:

  • we implement and evaluate the first automated deductive verification of floating-point Java programs by combining the strength of rule-based and SMT-based deduction;

  • we develop novel automated support for reasoning about transcendental functions in a deductive verifier;

  • we add support for reasoning about (potentially rounding) type conversions;

  • we collect a new set of challenging real-world floating-point benchmarks in Java;

  • we compare different SMT solvers for discharging floating-point VCs on this new set of benchmarks.

This article extends a previous conference paper [1] with additional background on floating-point arithmetic, a presentation of all rules for transcendental functions and sqrt, new rules—and corresponding experiments—for (potentially rounding) casts between integers and floats, new experiments on the verification of floating-point loop invariants, new experiments on the sensitivity of SMT solvers to numerous problem modifications, and extended reporting of the experiments presented in [1].

2 Background

2.1 Introduction to KeY

KeY [2] is a platform for deductive verification of Java programs, working at the source code level. The input is a Java program annotated in the Java Modeling Language (JML) [44], encouraging a Design by Contract [45, 50] approach to software development. The user specifies the expected behavior of Java classes with class invariants that the program has to maintain at critical points. Methods are specified with method contracts, consisting mainly of pre- and postconditions, with the understanding that if the precondition holds when the method is called, the postcondition has to hold after the method returns.

After loading an annotated program, KeY translates it to a formula in Java Dynamic Logic [2] (JavaDL), an instance of Dynamic Logic [36] which enables logical reasoning about Java programs. Logical rules are provided for the translation of programs into first-order logic, and for closing the resulting goals, or proof obligations. KeY is semi-interactive in that it allows manual rule application, while also offering powerful built-in automation and macros.

The rules are written in KeY as taclets, calculus rule schemata implementing rewrite rules. One example taclet, which rewrites any expression matching \(x+0\) to x, is shown below:
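The original article renders this taclet as an image; the following sketch, in the spirit of KeY’s taclet notation (the rule name and the heuristics annotation are our illustration), conveys its content:

    addZeroRight {
      \find( x + 0 )
      \replacewith( x )
      \heuristics( simplify )
    }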

In general, taclets have the following form:

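The general form is rendered as an image in the original; schematically (our rendering, using \find, \replacewith, and \add as in KeY’s taclet language, with \show as the schematic side-condition clause described below):

    ruleName {
      \find( t_find )            // term or formula matched in the current goal
      \replacewith( t_replace )  // replaces the matched occurrence
      \add( phi_add )            // optional: adds new assumptions to the goal
      \show( phi_show )          // optional: opens a new goal for a side condition
    }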

It consists of a schematic find term \(t_{find}\), which has to match an expression or formula in the current goal, and a term \(t_{replace}\), which replaces the matched expression or formula. New formulas may be introduced as assumptions on the proof goal using add. If the rule has a side condition \(\varphi_{show}\) that needs to be established, it can optionally be specified using show, which opens a new proof goal for \(\varphi_{show}\). Each of the add and replace clauses may be empty or omitted. The taclet language supports typed meta-variables and heuristics for KeY’s automation.

In addition to the application of taclets, KeY also supports the translation of open goals into the common SMT input format SMT-LIB [7] and the calling of an external SMT solver. For specific theories, SMT solvers can be much more efficient than KeY’s own rule-based reasoning, while other goals are dealt with more effectively by KeY than by SMT solvers. KeY therefore allows the user to discharge some goals with SMT solvers and others with KeY’s own rule-based proof engine.

2.2 Floating-point arithmetic in Java

Among the primitive (i.e., non-reference) types of Java, there are two which represent floating-point numbers, namely float and double. They are associated with the 32-bit and 64-bit format, respectively, as specified in the IEEE 754 Standard for Floating-Point Arithmetic. More precisely, Java implements a subset of that standard. In the following, we summarize some central characteristics of Java floating-point numbers, loosely following Muller et al. [52]. Most of this is not specific to Java, but applies more generally to IEEE 754. Note that the Java Virtual Machine (JVM) only supports floating-point numbers with base 2, even though the Java language syntax also supports base 10; parsing and input/output routines translate back and forth between the bases.

Each (base 2) floating-point number x (except the special values \(+\infty \), \(-\infty \), and NaN, see below) with precision p can be represented as a triplet \((s, m, e)\), such that \(x = (-1)^s * m * 2^e\), where \(s \in \{0,1\}\) is the sign, m (called significand) is a binary fixed-point number with one digit before the radix point and \(p-1\) digits after the radix point (note that \(0 \le m < 2\)), and e (exponent) is an integer such that \(e_{min} \le e \le e_{max}\). Java supports two floating-point formats (both in base 2): float with \(p = 24\), \(e_{min} = -126\), \(e_{max} = 127\), and double with \(p = 53\), \(e_{min} = -1022\), \(e_{max} = 1023\).

Whenever the result of a computation cannot be exactly represented with the given precision, it is rounded. IEEE 754 defines various rounding modes, of which Java only supports round to nearest, ties to even. Rounding is exact in the sense that the result is as if the ideal real number were computed first and rounded afterwards. Note that rounding may occur even when the exact mathematical result of a computation corresponds to an integer number: the single-precision type float has a significand of 24 bits and can therefore not exactly represent all integers of the 32-bit type int. However, integers whose absolute value is smaller than \(2^{24}\) can be represented exactly in the type float.
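For example (our illustration), the smallest positive integer that is not float-representable is \(2^{24}+1\); both the cast and an equivalent addition round to the same even neighbor:

    int n = 16_777_217;                      // 2^24 + 1
    System.out.println((float) n);           // 1.6777216E7: rounded by the cast
    System.out.println(16_777_216f + 1.0f);  // 1.6777216E7: exact result 2^24 + 1 ties to even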

A number x could potentially be represented by different \((s, m, e)\) triples. However, restricting the floating-point numbers to a normal form has many computational advantages. Therefore, m is always chosen such that \(1 \le m < 2\) wherever possible (these are called the normal numbers) and \(0 \le m < 1\) only where necessary (these are called the subnormal numbers). Note that, for subnormal numbers, \(e = e_{min}\). Also note that, for subnormal numbers, the relative rounding error can be much worse than for normal numbers. On the other hand, the existence of subnormals guarantees the equivalence \((x>y)\leftrightarrow (x-y > 0)\). Without subnormals, this would not hold, because the difference between two normal floating-point numbers can be smaller than any normal floating-point number.

The triple representation gives us two zeros, \(+0\) and \(-0\), represented by (0, 0, 0) and (1, 0, 0), respectively. The Boolean expressions \(+0.0\)==\(-0.0\) and \(-0.0\)==\(+0.0\) both return true. If the absolute value of the ideal result of a computation is too small to be representable as a floating-point number of the given format, the resulting floating-point number is \(+0\) or \(-0\). In addition, there are three special values, \(+\infty \), \(-\infty \), and NaN (Not a Number). If the absolute value of the ideal result of a computation is too big to be representable as a floating-point number of the given format, the result is \(+\infty \) or \(-\infty \). Also, division by zero gives an infinite result (e.g., \(7.3/+0=+\infty \)). Computing further with infinity may give an infinite result (e.g., \(+\infty + +\infty = +\infty \)), but may also result in the additional ‘error value’ NaN (e.g., \(+\infty - +\infty = \text {NaN}\)). The comparison predicates (<, <=, >, >=, and ==) return false as soon as one operand evaluates to NaN, whereas the predicate != returns true as soon as one operand evaluates to NaN. In particular, NaN==NaN returns false, and NaN!=NaN returns true. Due to the presence of infinities and NaN, floating-point operations do not throw Java exceptions.
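The following lines (our illustration in plain Java) exercise these rules:

    System.out.println(0.0 == -0.0);               // true: the two zeros compare equal
    System.out.println(1.0 / 0.0);                 // Infinity: no exception is thrown
    double inf = Double.POSITIVE_INFINITY;
    System.out.println(inf + inf);                 // Infinity
    System.out.println(inf - inf);                 // NaN
    System.out.println(Double.NaN == Double.NaN);  // false
    System.out.println(Double.NaN != Double.NaN);  // true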

By default, the Java virtual machine is allowed to make use of higher-precision formats provided by the hardware. This can make computation more accurate, but it also leads to platform-dependent behavior. This can be avoided with the strictfp modifier, which ensures that floating-point computations inside methods or classes carrying the modifier comply with the precision defined in the IEEE 754 Standard, even for intermediate results; the modifier thus ensures portability. Additionally, one can set the Assume strictfp option in KeY, under which all computations are assumed to be within the scope of a strictfp modifier, even if strictfp is not explicit in the code.

3 Floating-point support in KeY

3.1 Arithmetics

To specify and verify programs containing floating-point numbers, we made several extensions to the KeY tool. We added the float and double types to KeY’s type system, introduced function and predicate symbols to formalize arithmetic operations (+, *, ...) and comparisons (<, ==, ...) on floating-point expressions, and added cast operations among floating-point and integer types ((double), (float), (int)). The translation supports code both with and without the strictfp modifier. However, since the actual precision of non-strictfp operations is not known, the corresponding function symbols remain uninterpreted. We assume that the strictfp modifier is set for all our benchmarks. We extended KeY’s parser to correctly handle programs and annotations containing floating-point numbers and added logic rules for translating floating-point expressions from Java or JML to JavaDL.

As an example, Listing 1 shows a JML specification of our Rectangle benchmark that contains floating-point literals and makes use of the fp_nan and fp_nice predicates: fp_nan states that a floating-point expression is NaN, and fp_nice states that a floating-point expression is neither NaN nor infinity. The scale method has two contracts that are checked separately, each ensuring, under different preconditions, that the class fields of a scaled rectangle object are not NaN. For the first contract, the SMT solver produces a counterexample. In the second, we bound the inputs by concrete ranges that we picked arbitrarily, and the contract is proven valid. In practice, such ranges would come from the context, e.g., from the kind of rectangles that appear in an application, or from known ranges of sensor values.

Concerning discharging the resulting proof obligations, there are two main options. One is to create a floating-point theory within KeY by adding axioms and deduction rules, so that the desired properties can be proven in KeY’s calculus. The other is to translate the proof obligations from JavaDL to SMT-LIB and call an external SMT solver. While the KeY approach traditionally favors conducting proofs with KeY’s own rule-based reasoning engine, we partially deviate from this approach in order to harness the greater efficiency of SMT solvers when it comes to the combinatorially heavy reasoning about floating-point arithmetic. Our approach attempts to get the best of both worlds by distinguishing between basic floating-point arithmetic, i.e., elementary operations and comparisons, and more complex functions which either have no SMT-LIB equivalent (e.g., the transcendental functions) or for which rule-based reasoning is currently more effective than SMT solvers (see Sect. 3.2.2).

Listing 1 The Rectangle benchmark with two JML contracts for the scale method (rendered as an image in the original)
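The listing itself is not recoverable from the extraction; the following sketch reconstructs its plausible shape from the description above. The class structure, field names, concrete bounds, and the exact JML syntax of the predicates are our assumptions:

    public class Rectangle {
        double width, height;

        /*@ normal_behavior
          @   requires !\fp_nan(width) && !\fp_nan(height) && !\fp_nan(factor);
          @   ensures  !\fp_nan(width) && !\fp_nan(height);   // invalid: counterexample found
          @ also normal_behavior
          @   requires 1.0 <= width  && width  <= 100.0
          @         && 1.0 <= height && height <= 100.0
          @         && 0.5 <= factor && factor <= 2.0;        // arbitrarily picked ranges
          @   ensures  !\fp_nan(width) && !\fp_nan(height);   // valid
          @*/
        void scale(double factor) {
            width  = width * factor;
            height = height * factor;
        }
    }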

Elementary operations and comparisons get translated to the corresponding SMT-LIB functions. In SMT-LIB, all floating-point computations conform to the IEEE 754 Standard. Therefore, only Java programs with the strictfp modifier can be directly translated to SMT-LIB without loss of correctness.
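For instance (our illustration, using the standard SMT-LIB floating-point theory), the Java double expression a + b < c under strictfp corresponds to the following query fragment:

    ; Float64 abbreviates (_ FloatingPoint 11 53); RNE is round to nearest, ties to even
    (declare-const a Float64)
    (declare-const b Float64)
    (declare-const c Float64)
    (assert (fp.lt (fp.add RNE a b) c))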

We developed a translation from KeY’s floating-point theory to SMT-LIB. In order to integrate it into KeY, we also overhauled the existing translation from JavaDL to SMT-LIB to create a new, more modular framework, which supports all the features of the original translation (e.g., heaps and integer arithmetic) alongside floating-point expressions.

Floating-point intricacies sometimes require extra caution. For example, there are two different notions of equality for floats: bitwise equality and IEEE 754 equality. Our implementation ensures that these are distinguished correctly and that the specification language remains intuitive for a developer to use.
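The two notions disagree exactly on zeros and NaNs, as this illustration (ours) shows:

    System.out.println(0.0 == -0.0);                       // true:  IEEE 754 equality
    System.out.println(Double.doubleToRawLongBits(0.0)
                    == Double.doubleToRawLongBits(-0.0));  // false: different bit patterns
    double nan = Double.NaN;
    System.out.println(nan == nan);                        // false: IEEE 754 equality
    System.out.println(Double.doubleToRawLongBits(nan)
                    == Double.doubleToRawLongBits(nan));   // true:  identical bit patterns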

Using the translation to SMT-LIB, we can specify and prove two classes of properties in KeY, the absence of special values and functional properties. The absence of special values is specified using the fp_nan and fp_infinite predicates (or the fp_nice equivalent). Furthermore, one can specify functional properties (including loop invariants) that are expressible in floating-point arithmetic, e.g., one can compare the result of a computation against the result of a different program which is known to produce a good result or a reference value.

3.2 Transcendental functions

Floating-point decision procedures in SMT solvers successfully handle programs consisting of arithmetic and square-root operations. Many numerical real-world programs, however, include transcendental functions such as sin and cos. In Java programs, these functions are implemented as static library functions in the class java.lang.Math.

Unlike for arithmetic operations, transcendental functions are specified much more loosely by the IEEE 754 Standard—only an upper bound on the roundoff error is given. Libraries are thus free to provide different implementations, possibly with even tighter error bounds. Exact reasoning in the same spirit as for floating-point arithmetic would thus have to encode a specific implementation. Given that these implementations are highly optimized, this approach would arguably be complex. We observe, however, that such exact reasoning about transcendental functions is often not necessary; a sound approximate approach is sufficient and efficient.

In this section, we introduce an axiomatic approach for reasoning about programs containing transcendental functions. We observe that, given the flexibility of deductive verification and of KeY itself, we can instantiate it in two different ways: we encode transcendental functions as uninterpreted functions and axiomatize them in the SMT queries, or, alternatively, we encode these axioms in KeY as logical inference rules. In the following, we explain each of these solutions in more detail; we evaluate them on a set of benchmarks later.

3.2.1 Axiomatization in SMT

We encode library functions as uninterpreted functions and include a set of axioms in the SMT-LIB translation for each method that is called in a benchmark. That is, we extended KeY such that when a transcendental function appears in the proof obligation, its declaration alongside all the axioms for that function is added to the translation.

For the axiomatization of transcendentals, we did not add rules that expand to a definition or allow a repeated approximation of the function value (like expansion into a Taylor series). Instead, we added a number of lemmata encoding interesting properties related to special values. For instance, the following axiom states that if the input to the sin function is not a NaN or infinity, then the returned value of sin is between \(-1.0\) and 1.0:

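The original shows this axiom as an image; a sketch of what it looks like in SMT-LIB (the symbol name sinDouble and the exact formulation are our assumptions):

    (declare-fun sinDouble (Float64) Float64)
    (assert (forall ((x Float64))
      (=> (and (not (fp.isNaN x)) (not (fp.isInfinite x)))
          (and (fp.leq ((_ to_fp 11 53) RNE (- 1.0)) (sinDouble x))
               (fp.leq (sinDouble x) ((_ to_fp 11 53) RNE 1.0))))))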

Note that this implies that the result is neither NaN nor infinity. The other axioms are similar in spirit, so we do not list them here (see Fig. 1 for the full set).

These axioms are expressed as quantified floating-point formulas and capture high-level properties of library functions complying with the specifications in the IEEE 754 Standard. Clearly, since we do not have the actual implementations of these functions, we are not able to prove arbitrary properties. However, such an axiomatization is often sufficient to check for (the absence of) special values, i.e., NaN and infinity, as our experiments in Sect. 4.4 show.

3.2.2 Taclets in KeY

Reasoning about quantified formulas in SMT is a long-standing challenge [32]. We have also observed in our experiments with only arithmetic operations (Sect. 4.3) that SMT solvers struggle with quantifiers in combination with floating-point numbers. We have therefore implemented an alternative approach that encodes the axioms not in the SMT queries, but as deductive inference rules (called ‘taclets’) in KeY.

The rules encode the same logical information as the universally quantified assertions that we add in SMT-LIB (where we leave the choice of instantiations entirely to the SMT/SAT solver). With our taclet approach, we instantiate a quantifier (only) to one’s needs. For instance, the counterpart of the SMT axiom from Sect. 3.2.1 is the following taclet:

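The taclet is rendered as an image in the original; in the spirit of the schema from Sect. 2.1, it can be sketched as follows (names and syntax details are our illustration):

    sinRange {
      \find( sinDouble(arg) )
      \add( (!fpIsNaN(arg) & !fpIsInfinite(arg))
            -> (-1.0 <= sinDouble(arg) & sinDouble(arg) <= 1.0) )
    }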

We note that this is an incomplete treatment of the formalized operations. Considering only some and not all possible quantifier instantiations buys us more closed proofs and shorter running times in some cases. However, it may also lead to spurious counterexamples (false positives) reported by the SMT solver in other cases.

A heuristic strategy applies the rules automatically, using occurrences of transcendentals as instantiation triggers. However, instantiating the axioms too eagerly considerably increases the number of open goals, which is why we let the user select the axioms to apply manually (as we did in the experiments). After the application, the proof obligation can either be closed, i.e., proven, by KeY automatically, or be given to the SMT solver as before for final solving. Currently, the set of axioms (in the SMT-LIB translation and as taclets in KeY) covers only the transcendental functions occurring in our benchmarks, namely \(\sin \), \(\cos \), and \(\text {atan}\). So far we have 10 axioms; however, adding more axioms (also for further transcendentals like exponentiation or logarithm) is straightforward.

The full set of axioms is listed in Fig. 1 (stated in natural language for presentation purposes).

Fig. 1 List of axioms added to KeY for transcendental functions

3.3 Interaction of floats with other primitive types

For the formal verification of Java programs with floats, one often has to take into consideration not only floating-point numbers, but also heap data structures, arrays, and primitive values of data types other than float or double. Our SMT translation in KeY does not only target the floating-point theory, but also encodes other aspects of proof obligations into theories like linear integer arithmetic for integers, or arrays representing heap data structures. In the evaluation, the state-of-the-art SMT solvers show their ability to reason about these theories when they occur simultaneously within the same proof obligation. As long as the different theories act on disjoint domains, this combined translation works well in most cases. For instance, programs using float arrays or class attributes of type double can be handled as long as floating-point values and other types are not arithmetically combined within an expression.

However, there are relevant cases in which arithmetic expressions in Java programs have to contain both integer and floating-point variables, their domains thus becoming entangled. It is, for instance, a legal Java expression to add an integer value to a float value, yielding a float value. Such cases of intersecting value domains require special attention.

We call float-representable those 32-bit int values which can be represented in float without loss of precision. In particular, all integers between \(-2^{24}\) and \(2^{24}\) are float-representable. According to the Java Language Specification (JLS) [33], an implicit widening cast called numeric promotion is applied when different primitive types are combined in an arithmetic operation. The addition iv + fv of an integer variable iv and a float variable fv is hence equivalent to the expression (float)iv + fv, in which the numeric promotion has been made explicit.
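The promotion itself may already round, as this illustration (ours) shows:

    int iv = 16_777_217;            // 2^24 + 1: not float-representable
    float fv = 1.0f;
    float r = iv + fv;              // evaluated as (float)iv + fv
    System.out.println((float) iv); // 1.6777216E7: the cast rounds iv
    System.out.println(r);          // 1.6777216E7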

The cast operator connects the domains of the two primitive types and maps integer values to floats by applying the same rounding operation that is applied to the results of arithmetic operations on floats. Since this family of cast operators on primitive types is translated from JML into JavaDL as uninterpreted functions, we can build a bridge between the integer and floating-point domains by adding theorems/axioms about the numeric promotion to KeY in the form of taclets.

In KeY, we add this knowledge about the cast operator to the verification engine in the form of taclet rules: the new taclets are conditional rewrite rules introducing new constraints on the cast operator ‘\((\texttt{float})\)’.

A central property of float-representable integers is that the cast operation can be inverted. This is captured by the following taclet in KeY, formalizing that casting an integer expression i first to a float and then back to an int is the identity if i stays within the range of float-representable values.
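The taclet itself is shown as an image in the original; schematically (our sketch, writing the range guard as a \show side condition in the schema of Sect. 2.1):

    castFloatIntIdentity {
      \find( (int)(float) i )
      \replacewith( i )
      \show( -16777216 <= i & i <= 16777216 )  // i is float-representable
    }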

Since the JLS prescribes the same rounding mode for float operations and for numeric promotion, it does not matter, for float-representable values, whether they are first added and then cast, or the other way around:

Theorem 1   Let \(a, b\) be float-representable integer values such that \(a+b\) does not int-overflow. Then, \(\mathtt {(float)}(a+b) = \mathtt {(float)}a + \mathtt {(float)}b\).

Proof We introduce an auxiliary injection function \(R: \texttt{float}\cup \texttt{int}\rightarrow \mathbb {R}\) canonically mapping float and integer values to their real-valued counterparts. The function \( rnd : \mathbb {R}\rightarrow \texttt{float}\) is the rounding function mapping real values to floats. According to the JLS, the same rounding function is used after floating-point addition (\(f_1 +_\texttt{float}f_2 = rnd (R(f_1) +_\mathbb {R}R(f_2))\) for floats \(f_1, f_2\)) and when converting integers to floats (\((\texttt{float})i = rnd (R(i))\) for an integer i).

With \(a, b\) as required by the theorem, we have \(R( rnd (R(a)))=R(a)\), \(R( rnd (R(b)))=R(b)\), and \(R(a+_\texttt{int}b) = R(a) +_\mathbb {R}R(b)\). Hence,

$$\begin{aligned}&(\texttt{float})(a+_\texttt{int}b) \\ {}={}&rnd (R(a +_\texttt{int}b)) {}={} rnd (R(a) +_\mathbb {R}R(b)) \\ {}={}&rnd (R( rnd (R(a))) +_\mathbb {R}R( rnd (R(b)))) \\ {}={}&(\texttt{float})a +_\texttt{float}(\texttt{float})b \end{aligned}$$

Note that Theorem 1 does not require \(a+b\) to be float-representable. Indeed, \(a+b\) may need to be rounded when cast to float on the left-hand side of the equation. At the same time, the float addition on the right-hand side of the equation may also require rounding. The insight from the proof above is that the IEEE 754 Standard and the JLS force the rounding on both sides to be the same!

Theorem 1 can also be formulated for subtraction and multiplication (with essentially the same correctness arguments) instead of addition. A taclet that implements this theorem for the float-representable numbers in the aforementioned range is the following:

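The taclet is rendered as an image in the original; schematically (our sketch, with the guard bounds following the range \([-2^{24}, 2^{24}]\)):

    floatAddPromotion {
      \find( (float)(a + b) )
      \replacewith( (float)a + (float)b )
      \show( -16777216 <= a & a <= 16777216
           & -16777216 <= b & b <= 16777216 )
    }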

Note that the taclet does not cover all float-representable integers, but only the interval in which all integers are float-representable. In this range, the sum \(a+b\) cannot int-overflow. The taclet can also be formulated for subtraction. The taclet for the case of multiplication, however, must additionally guard against overflow of the product:

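Schematically (our sketch; the extra conjunct checks that the mathematical product stays within the int range):

    floatMulPromotion {
      \find( (float)(a * b) )
      \replacewith( (float)a * (float)b )
      \show( -16777216 <= a & a <= 16777216
           & -16777216 <= b & b <= 16777216
           & -2147483648 <= a * b & a * b <= 2147483647 )
    }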

Note that KeY internally operates on mathematical integers (instead of 32-bit integers) such that the overflow check can be implemented in this fashion.

Another important rule for the cast operator is the conversion of literals: any expression \(\mathtt {(float)}c\), where c is a numerical integer literal, can be rewritten to the corresponding numerical literal on the float side, like the following:
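The rewrite is shown as an image in the original; an illustrative instance (our sketch; in the implementation, a meta-operation presumably computes the rounded float literal for an arbitrary literal c):

    castIntLiteral {
      \find( (float) 42 )
      \replacewith( 42.0f )
    }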

Similar theorems and rules can be formulated for other combinations of primitive types, in particular involving double and long.

An example program, which makes use of numeric promotion in Java, is shown in Listing 2, where the loop invariant and the postcondition contain a numeric promotion (made explicit in the listing). In order to prove that the loop invariant is inductive, one has to show that \((\texttt{float})(i+1) = (\texttt{float})i + 1.f\), which can be proven using the newly introduced taclet mentioned above.

Listing 2 A loop whose invariant and postcondition contain an explicit numeric promotion (rendered as an image in the original)
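A plausible reconstruction from the description above (variable names, the bound on n, and the JML details are our assumptions):

    /*@ requires 0 <= n && n <= 16777216;
      @ ensures \result == (float) n;
      @*/
    static float count(int n) {
        float f = 0.0f;
        /*@ loop_invariant 0 <= i && i <= n && f == (float) i;
          @ decreases n - i;
          @*/
        for (int i = 0; i < n; i++) {
            f = f + 1.0f;   // inductive step: (float)(i+1) == (float)i + 1.0f
        }
        return f;
    }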

4 Evaluation

4.1 Benchmark programs

Table 1 Benchmark details and KeY automode statistics (time measured in seconds)

We collected a set of existing floating-point Java programs representing real-world applications in order to evaluate the feasibility and performance of KeY ’s floating-point support.

The left half of Table 1 provides an overview of our benchmarks. Each benchmark consists of one method, which is composed of arithmetic operations and method calls, potentially to other classes. Invocations of methods from java.lang.Math (e.g., Math.abs) are marked by “+1” in Table 1; these are resolved by inlining the method implementation. For benchmarks that contain calls to transcendental functions and square root, the called functions are listed; these are handled by our axiomatization. We include sqrt in this list, as we have observed that exact support can be expensive, so it may be advantageous to handle sqrt axiomatically. We also include benchmarks with loops and loop invariants used for the evaluation of the Pine tool [39], denoted Pine.*. The benchmarks Rectangle, Circuit, Matrix3, Rotation, and Pine.pendulum-approx are partially shown in Listings 1, 6, 3, 4, and 5, respectively.

Table 2 Summary of valid / invalid goals correctly decided and average running times of each solver for the SMT translations with and without quantified axioms

Each benchmark also includes a JML contract that is to be checked. For some methods, we specify two contracts (marked by “(2)” in the first column of Table 1), each serving as an independent benchmark. The contracts for some of these benchmarks check that the methods do not return a special value, i.e., infinity and/or NaN, the preconditions being that the variables are not themselves special values and possibly are bounded in a given range. For the Matrix, FPLoop, Rotate and Pine.* benchmarks, we check a functional property (see Sect. 4.3). The Pine.* benchmarks and FPLoop, which has three contracts, additionally show how to specify floating-point loop behavior using loop invariants.

4.2 Proof obligation generation

To reason about the contract of a selected benchmark, we apply KeY, which generates proof obligations or ‘goals’. Some of these goals (heap-related) are closed by KeY automatically. The remaining open goals are closed by either SMT solvers with floating-point support directly (Sects. 3.1 and 3.2.1), or with a combination of transcendental KeY taclets and floating-point SMT solving (Sect. 3.2.2).

Columns 6 and 7 in Table 1 show the number of proof obligations closed by KeY directly and to be discharged by external solvers, respectively. The next two columns show the number of taclet rules that KeY applied in order to close its goals, and the time this takes. For benchmarks with two contracts, we show the respective values separated by ‘/’.

We run our experiments on a server with 1.5 TB of memory and 4x12 CPU cores at 3 GHz. However, KeY runs single-threaded and does not use more than 8 GB of memory.

For our set of benchmarks, the symbolic execution process is fully automated. Note that the machinery can deal with loop invariants, if they are provided. Automated loop invariant generation is, however, particularly challenging for floating-points due to roundoff errors [25, 39], and a research topic in itself.

4.3 Evaluation of SMT floating-point support

Previous work [30] reported that SMT support for floating-point arithmetic is rather limited. However, given recent advances [15], we evaluate the situation anew. Most benchmarks used to evaluate SMT solvers’ decision procedures [55] aim to check (individual) specialized (corner-case) properties of floating-point arithmetic. The proof obligations generated from our set of benchmarks are complementary in that they are more arithmetic-heavy, while nonetheless relying on accurate reasoning about special values and functional properties.

For each open goal not automatically closed, KeY generates one SMT-LIB file that is fed to the solvers for validation. We compare the performance of the three major SMT solvers with floating-point support: CVC4 [6] (version 1.8, with the SymFPU library [15] enabled), Z3 (4.8.9) [26], and MathSAT (5.6.3) [20]. We set a timeout of 300 s for each proof obligation. While KeY is able to discharge proof obligations in parallel, we do so sequentially in our experiments to maintain comparability.

Fig. 2 Runtimes for valid goals with SMT translations with quantifiers

KeY’s default translation to SMT introduces axioms which include quantifiers. These quantifications are not related to floating-point arithmetic, but logically encode important properties of the Java memory model, like the type hierarchy and the absence of dangling references on any valid Java heap. If we reason about floating-point problems in isolation, they are not needed; but if we want to consider Java verification more holistically, with questions combining aspects of heap and floating-point reasoning, they become essential. We manually checked that the proof obligations without our axiomatized treatment of transcendental functions do not depend on these properties, and we investigate quantifier support by including these axioms in, or removing them from, the SMT translations. We do not report results with quantifiers for MathSAT, since it does not support them.

Fig. 3 Runtimes for valid goals with SMT translations without quantifiers

Table 3 Summary of valid goals proved and running times of each solver for the SMT translations with quantified axioms

Table 2 summarizes the results of our experiments; the first column lists the corresponding table with detailed results. Column 4 shows the number of expected valid or invalid goals for all benchmarks. For each solver, we show the number of goals that each solver can validate or invalidate, together with the average time (in seconds) needed. The goals resulting in timeout were excluded from the computation of the average time. Column 3 shows whether the SMT queries include quantifiers or not. The rows summarize our experiments with valid contracts, invalid contracts, axiomatizations of transcendental functions and square root, and loop invariants, respectively.

Figures 2 and 3 show a more detailed view of the solvers’ running times for the valid benchmarks. The x-axis shows the number of open goals that are discharged by the SMT solvers, sorted by running time for each solver individually. The k-th point of a graph shows the minimum running time the solver needs to close each of its k fastest goals; note that different solvers may have different goals among their k fastest. The y-axis shows the time on a logarithmic scale; the maximum value of 300 indicates the timeout.

Listing 3 The Matrix3 benchmark (rendered as an image in the original)
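The listing itself is not recoverable here; the following hedged sketch reconstructs its plausible shape from the description in this section (the signatures, the transpose helper, and the contract form are our assumptions):

    static double determinant(double[][] m) {   // 3x3 matrix
        return m[0][0] * (m[1][1]*m[2][2] - m[1][2]*m[2][1])
             - m[0][1] * (m[1][0]*m[2][2] - m[1][2]*m[2][0])
             + m[0][2] * (m[1][0]*m[2][1] - m[1][1]*m[2][0]);
    }

    /*@ ensures \result; @*/   // invalid for determinant: rounding breaks the equality
    static boolean transposedEq(double[][] m) {
        return determinant(m) == determinant(transpose(m));
    }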

We conclude that, in the presence of quantified axioms and floating-point arithmetic, the solvers’ performance deteriorates for both valid and invalid goals. In particular, none of the solvers is able to find counterexamples for any of the invalid goals. However, when the quantified axioms are removed from the SMT translations, their performance improves. For valid contracts, CVC4 and MathSAT perform better than Z3, in terms of both the number of goals validated and the running time per goal. In particular, MathSAT is able to prove all goals, although CVC4’s running times are better than MathSAT’s. For invalid contracts, the solvers are able to produce the expected counterexamples at least partially. In particular, MathSAT performs better than CVC4 and Z3 in terms of both running time and the number of proof obligations for which it can produce counterexamples.

Table 4 Summary of valid goals proved and running times of each solver for the SMT translations without quantified axioms
Table 5 Summary of invalid goals proved and running times of each solver for the SMT translations with and without quantified axioms

Proving functional properties Listings 3 and 4 show examples of functional properties that are concerned with floating-point computations. The verification results are included in Tables 3 and 4.

For Matrix, we check that the determinants of a matrix and its transpose are equal. Note that this property holds trivially under real arithmetic, but not necessarily under floating-point arithmetic. After feeding transposedEq (which uses the determinant method) and its contract to KeY, increasing the default timeout sufficiently, and discharging the created goal, CVC4 generates a counterexample in 170.2 s and MathSAT in 16.2 s; Z3 times out after 30 minutes. Feeding transposedEqV2 (which uses the determinantNew method) to KeY instead, CVC4 validates the contract in 1.1 s, MathSAT in 3.9 s, and Z3 times out again. It is worth noting that the way programs are written can greatly influence the computational effort needed to reject or verify a contract: slightly modifying the order of operations (using determinantNew instead) substantially reduces the verification time and changes the verification result for MathSAT and CVC4.

For Rotate, we check that the difference between an original vector and the one rotated four times by 90 degrees is not larger than 1.0E-15. We also verified the same bound for the relative difference (using another method and contract) for this benchmark. The constant cos90 in Listing 4 is not precisely 0.0, to account for rounding effects in the computation of the cosine. FPLoop includes three loops, for which the contracts check that the return value is greater than a given constant.

Listing 4 The Rotation benchmark (rendered as an image in the original)
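A hedged sketch of the benchmark’s likely shape (names and the contract rendering are our assumptions; the value of cos90 corresponds to Math.cos(Math.PI / 2) in double precision):

    static final double cos90 = 6.123233995736766E-17; // not exactly 0.0
    static final double sin90 = 1.0;

    static double[] rotate90(double[] v) {
        return new double[] { v[0] * cos90 - v[1] * sin90,
                              v[0] * sin90 + v[1] * cos90 };
    }
    // contract: each component of rotate90 applied four times to v
    // differs from the corresponding component of v by at most 1.0E-15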

Though not always very fast, these examples show that verification of functional floating-point properties is viable. Furthermore, we investigate KeY’s capability of proving floating-point loop invariants in more detail in the next experiment.

Proving loop invariants We conducted an experiment to assess KeY in proving floating-point loop invariants for a set of benchmarks. As part of this experiment, we verified some of the floating-point loop invariants generated by the Pine tool [39] (denoted Pine.* in Table 1). Listing 5 shows the nonlinear Pine.pendulum-approx benchmark and the loop invariant that Pine has generated. This benchmark simulates a simple pendulum and uses a Taylor approximation of the sine function. The condition of the while loop is a placeholder for an arbitrary condition, meaning that we prove the loop invariant regardless of how many iterations the loop takes. We thus specify diverges true, which means that the method is (unconditionally) allowed to not terminate.
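Listing 5 itself is rendered as an image in the original (it appears later in the layout); the following hedged sketch conveys the loop’s shape, where the step size, the constants, and the interval invariant are our illustration, not Pine’s actual output:

    // the method contract includes: diverges true;  (the loop need not terminate)
    /*@ loop_invariant -1.0f <= x && x <= 1.0f
      @             && -1.0f <= y && y <= 1.0f;     // shape of a Pine-generated invariant
      @*/
    while (cond) {                                  // placeholder condition
        float xn = x + 0.01f * y;                   // explicit Euler step
        float yn = y + 0.01f * (-9.81f * (x - (x*x*x) / 6.0f)); // Taylor approximation of sin
        x = xn;
        y = yn;
    }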

Table 6 Summary of goals proved and running times of each solver for benchmarks with floating-point loop invariants without quantified axioms

Row 10 in Table 2 summarizes the results of this experiment, and Table 6 shows the detailed results. As shown, KeY is able to prove that all but one of the considered invariants are in fact inductive. We used a timeout of two hours for this experiment and observed that MathSAT performs better than the other solvers in proving the invariants.

Note that not all invariants generated by Pine are necessarily verifiable by KeY (e.g., Ex2 in Table 6), because the semantics of Pine and KeY differ subtly: in Pine, the generated invariant and bounds are considered to be real-valued, whereas in KeY they are evaluated under floating-point semantics.

Table 7 SMT solvers summary statistics for various versions of the Rectangle benchmark with quantified axioms in the SMT translations

Sensitivity to contract variations We conducted an experiment on our Rectangle.scale benchmark to assess the solvers’ sensitivity to various changes applied to the benchmark’s contract or its implementation.

Specifically, we considered the following versions of the benchmark:

  • v0: the original version of the benchmark (Listing 1, using the second contract) and our baseline;

  • v1: reduces the number of classes involved to two, while keeping the same functionality;

  • v2: reduces the number of classes involved to one, while keeping the same functionality;

  • v3: modifies v2 such that the variable bounds in the precondition become more “complicated” in terms of longer fractional parts (e.g., the bounds for arg2 become [3.0000001, -6.4000000003] instead of [3.0001, -6.4000003]);

  • v4: simplifies the mathematical expression of v2 (fewer arithmetic operations);

  • v5: modifies v3 such that arg2 has a tighter bound, i.e., the interval width is smaller;

  • v6: modifies v2 such that arg2 has a larger bound, i.e., the interval width is larger;

  • v7: modifies v2 such that only arg2 has a “complicated” bound;

  • v8: modifies v0 such that arg2 has a tighter bound.

Table 7 summarizes the results for this experiment. With the quantified formulas included in the SMT translation, both CVC4 and Z3 are able to prove more goals when the number of classes is reduced, and also when the number of arithmetic operations is reduced. Z3 further seems to be sensitive to whether the variable bounds are “complicated”, whereas CVC4 is not. We obtain a somewhat surprising result when arg2 has a tighter bound: while Z3’s performance improves, CVC4 validates two goals fewer. On the other hand, increasing the bounds on arg2 does not seem to make a difference.

It seems that arg2 is the bottleneck for this benchmark; when only arg2 has a “complicated” input interval, CVC4 proves fewer goals. Finally, constraining arg2 in the original benchmark more tightly allows CVC4 to validate all goals, while Z3’s performance remains unaffected.

With the SMT-LIB translation that does not introduce quantified axioms, CVC4’s results, in terms of the number of goals validated, are the same as before, while Z3 performs much better than before. MathSAT is able to validate all goals of all versions.

In summary, the solvers’ performance is sensitive to slight, innocuous-looking changes, such as the number of classes involved and the variable bounds. For example, constraining arg2 in the original benchmark more tightly allows CVC4 to validate all goals (one more). This behavior could potentially be exploited by, e.g., relaxing a variable’s bounds.

4.4 Transcendental functions in KeY

Table 8 Summary statistics with axioms in SMT-LIB translations and as taclet rules in KeY

We evaluated the two approaches from Sects. 3.2.1 and 3.2.2 on our set of benchmarks; rows 5, 6, and 7 in Table 2 summarize the results. The detailed results of these experiments are included in Table 8. Note that both approaches are fully automated.

We conclude that the SMT solvers perform better when the axiomatization is applied at the KeY level. When the axioms for transcendental functions are added to the SMT-LIB translation directly, Z3 validates 4 out of 10 goals. With the axiomatization at the KeY level (and quantified formulas removed from the SMT translations), the solvers are able to validate more goals: Z3 validates 5 goals and CVC4 all of them. It is therefore preferable to apply the axioms on the KeY side via taclet rules.

Listing 5 The Pine.pendulum-approx benchmark with its Pine-generated loop invariant (rendered as an image in the original)

All the solvers we have used in this work comply with the IEEE 754 standard and therefore have bit-precise support for the square-root function. They provide bit-precise reasoning by effectively encoding the behavior of floating-point circuits over bitvectors (which is naturally expensive), together with different heuristics and abstractions to speed up solving time. However, depending on the property, we do not always need bit-precise reasoning, so we propose handling the square-root function with the same taclet-based axiomatization as introduced in Sect. 3.2.2.

To this end, we conducted an experiment on the benchmarks containing sqrt, comparing the approach from Sect. 3.2.2 (adding the necessary axioms, resp. taclet rules) to using the square root implemented in SMT solvers (fp.sqrt). We chose to include only axioms specified in or inferred from the IEEE 754 standard. The axioms used are as follows (an SMT-LIB rendering of one of them follows the list):

  • If arg is NaN or less than zero, then sqrt(arg) is NaN.

  • If arg is positive infinity, then sqrt(arg) is positive infinity.

  • If arg is positive zero or negative zero, then sqrt(arg) is the same as arg.

  • If arg is not NaN and greater than or equal to zero, then sqrt(arg) is not NaN.

  • If arg is not infinity and is greater than one, then sqrt(arg) < arg.
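For illustration (ours), the fourth axiom, encoded in the style of Sect. 3.2.1 with an uninterpreted symbol sqrtDouble (the name is our assumption):

    (declare-fun sqrtDouble (Float64) Float64)
    (assert (forall ((x Float64))
      (=> (and (not (fp.isNaN x))
               (fp.geq x ((_ to_fp 11 53) RNE 0.0)))
          (not (fp.isNaN (sqrtDouble x))))))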

Rows 8 and 9 in Table 2 summarize the results for this experiment; the detailed results are included in Table 9.

We observed that for two of the three benchmarks, the average running time of all solvers decreases with the axiomatized square root. Furthermore, Z3 is able to reason about more proof obligations with the axiomatized version. However, the success of this approach depends on the axioms added to KeY and may not always work if suitable axioms are missing. For example, for the Circuit.instantCurrent benchmark, which computes the instantaneous current of an RL circuit (Listing 6), CVC4 is not able to validate the contract using the axiomatized square root, but with fp.sqrt the contract is validated. The reason our approach is unsuccessful on this benchmark is that the square-root function appears early in the computation, followed by atan and cos, resulting in complex expressions. To prove the corresponding proof obligations, stronger axioms are needed in this case.

Table 9 Summary statistics for benchmarks containing the square-root function, with quantified formulas removed from the SMT-LIB translation
Listing 6 The Circuit benchmark (instantCurrent; rendered as an image in the original)

In summary, treating sqrt axiomatically can result in shorter solving times than performing bit-precise reasoning, but the approach may not always succeed when the axioms are not sufficient to prove a particular property.

4.5 Discussion and insights

In our set of experiments, we used benchmarks taken from real code to examine the feasibility of our approach and the solvers’ support for the floating-point theory. We can conclude that, for our set of benchmarks, all the solvers generally perform better with an SMT translation that does not introduce quantified formulas. Especially if the contract of a benchmark is invalid, the solvers are not able to produce counterexamples when quantifiers are present. Another observation is that the solvers’ performance is affected by the number of heap operations performed and by the complexity of the variable input ranges. From our experiment with programs containing loops, we observed that KeY is able to prove the provided invariants, and the solvers are mostly able to prove the generated goals. Finally, based on our results, we can confirm that the support for the floating-point theory seems promising in all the solvers we examined. However, in terms of scalability and the size and type of problems they can handle, there is still room for improvement.

From our experiments with programs containing transcendental functions (and sqrt), we observed that handling these functions as uninterpreted functions with an appropriate axiomatization at the SMT-LIB or the taclet level is a viable approach, with an interesting trade-off between verifier performance and the properties that can be proven. Clearly, properties that require exact semantics for transcendental functions will not be provable with our approach, but our experiments show that reasoning about the absence of special values is indeed possible. We further observe that both our approaches of adding axioms, to the SMT queries or as taclet rules, work to some extent; however, applying axioms as taclets in KeY directly has several advantages. Using taclets avoids quantified axioms in the SMT query, which in turn improves performance. While the additional assertions do not compromise the theoretical decidability of the theory (since the quantified domains are finite), they add considerably to the complexity of the encoding for bit-blasting SAT-based decision procedures, such that running times may increase exponentially. The experiments have thus exposed a weakness of the SAT-based verification approach: if symbols outside the canonical arithmetic operations with fixed semantics are used within a program or specification, providing semantics for these symbols within the SMT-LIB translation is not efficient.

Concretely, applying the axiomatization as taclets (and removing other quantifiers from the SMT translation) allows us to use MathSAT as a solver. Furthermore, we have observed that quantifiers in the SMT queries result in poor performance (unknown results or timeouts) when a query is invalid; applying axioms as taclets lets us avoid this issue. The rule-based sequent calculus, on which the KeY reasoning engine is based, deals with universally quantified symbol axiomatizations very successfully in many domains. The taclet mechanism and the implemented automatic strategy allow one to control the treatment of a symbol depending on a variety of side conditions, e.g., applying a lemma only if another relevant formula is also present in the proof obligation. Furthermore, taclets can also be applied manually, which allows the verifying user to control the calculus in a very fine-grained manner. Another advantage of having taclet rules is that we can create taclets with different formats for a single axiom and thereby sometimes even reduce the size of the proof obligation: instead of adding an axiom to the proof obligation, we may replace a term with another one and thus avoid enlarging the proof obligation. That said, an axiomatization at the KeY level may result in spurious counterexamples if a rule application for an uninterpreted symbol, which would make the sequent valid, has not been applied yet. However, such a spurious counterexample can be straightforwardly identified by simply executing the method in question on it.

To summarize, the experiments show that highly automated floating-point program verification is viable for relevant properties (handling of special values and some functional properties), up to a certain level of complexity (determined by the SMT solvers). The choices of which parts of a proof obligation are delegated to SMT solvers, and how they are translated to SMT-LIB, are crucial for achieving effective and efficient program verification. Arithmetic operations proved to be dealt with more efficiently by delegation to SMT, whereas for transcendental functions, axiomatization and rule-based treatment in the theorem prover, outside the SMT solver, perform clearly better.

5 Related work

Our implementation uses the floating-point SMT-LIB theory [16], which, however, does not handle transcendental functions, as their semantics is (library) implementation-dependent. Some real-valued automated solvers do handle transcendental functions [3, 31], but to the best of our knowledge, the combination of floating-point numbers and reals in SMT solvers is still severely limited.

None of the existing deductive verifiers supports floating-point transcendental functions automatically. The Why3 deductive verification framework [28] has support for floating-point arithmetic, with front-ends for the C and Ada programming languages through Frama-C [23] and SPARK [17, 30], respectively. Why3 has back-end support for different SMT solvers, as well as for interactive proof assistants like Coq. Until recently, Why3 would still discharge many interesting floating-point problems with the help of Coq, relying on significant user interaction. In later work [30] (in the context of floating-point verification for Ada programs), Why3 achieves a higher degree of automation. Note, however, that the user is still required to add code assertions as well as ‘ghost code’ to a significant extent.

The Boogie intermediate verification language [46] also supports floating-point expressions and targets Z3 for discharging proof obligations. In the Boogie community, it was observed that writing a specification in Boogie leads to decreased SMT solver performance compared to writing the goal in SMT-LIB directly, probably due to an inherent mixing of theories when using Boogie [57]. This matches our own experience, and separating theories should be considered an important task for the further development of floating-point verification.

Other deductive verifiers for Java have only rudimentary support for floating-point numbers: VeriFast [40] treats floating-point operations as if they were operations on real numbers, and OpenJML [21] parses programs with floating-point operations, but essentially treats float and double as uninterpreted sorts.

The Java category of the software verification competition SV-COMP [10] contains a number of benchmarks that make use of floating-point variables. However, the focus of these benchmarks is usually not on arithmetic properties of expressions, but on the completeness of the Java language support. Among the participants of SV-COMP 2020, Symbolic (Java) Pathfinder (SPF) [54] (and various extensions) and the Java Bounded Model Checker (JBMC) [22] support floating-point arithmetic. Besides being limited to exploring the state space up to a bounded depth, their constraint languages support neither quantifiers nor the abstraction of method calls—features that we use in this work.

Floating-point arithmetic has also been formalized in several interactive theorem provers [14, 29, 41]. While one can prove intricate properties about floating-point programs [12, 13, 37], proofs using interactive provers are to a large part manual and require significant expertise.

Abstract interpretation-based techniques can show the absence of special values in floating-point code fully automatically, and several abstract domains that are sound with respect to floating-point arithmetic exist [18, 42]. While the analysis itself is fully automated, applying it successfully to real-world programs generally requires end-users to adapt it to each analyzed program, e.g., by selecting suitable abstract domains or widening thresholds [11].

Besides showing the absence of special values, recent research has developed static analyses to bound floating-point roundoff errors [24, 34, 47, 51, 58]. These analyses currently work only for small arithmetic kernels and the tools in particular do not accept programs with objects.

Dynamic analyses generally scale well on real-world programs, but can only identify bugs (when given failure-triggering inputs), rather than proving correctness for all possible inputs. Executing a floating-point program together with a higher-precision version allows one to find inputs which cause large roundoff errors [9, 19, 43]. Ariadne [5] uses a combination of symbolic execution, real-valued SMT solving, and testing to find inputs that trigger floating-point exceptions, including overflow and invalid operations. Our work subsumes this approach, as the SMT solvers that we use can directly generate counterexamples; more importantly, KeY is able to prove the absence of such exceptions.

6 Conclusion

In this work, we set out to enable efficient verification of programs which feature floating-point computations in combination with the other features of a fully fledged, widely used programming language, here Java. This is a different problem from verifying floating-point computations in isolation. To achieve it, we extend the verification tool KeY, which prior to this work supported full sequential Java except for floating-point types. The core of KeY is a prover applying proof rules that capture an axiomatic semantics of the target language (Java), with the option to export proof goals to SMT solver plug-ins. This gave us the freedom to decide which features, and under which circumstances, should be dealt with by proof rules within the KeY prover, and which by exporting goals to an SMT solver.

By joining the complementary strengths of SAT-based SMT solving and rule-based deduction, we presented the first working floating-point support in a deductive verification tool for Java. At the same time, we close a remaining gap in KeY, which now supports full sequential Java. Our evaluation shows that for specifications dealing with the absence of NaN and infinity, as well as with value ranges, our approach can verify realistic programs automatically within a reasonable time frame. This includes programs using transcendental functions, as well as programs with loops. We observe that MathSAT’s and CVC4’s floating-point support scales sufficiently for our benchmarks, as long as the queries do not include any quantifiers. On the other hand, our axiomatized approach for transcendental functions is best realized using calculus rules in KeY’s internal reasoning engine, rather than in SMT solvers. We also presented rules for handling potentially rounding casts from integers to floating-point types.

While our work is implemented within the KeY verifier, we expect the insights from this work to be portable to other verifiers.