Deductive Verification of Floating-Point Java Programs in KeY

Deductive verification has been successful in verifying interesting properties of real-world programs. One notable gap is the limited support for floating-point reasoning. This is unfortunate, as floating-point arithmetic is particularly unintuitive to reason about due to rounding as well as the presence of the special values infinity and ‘Not a Number’ (NaN). In this paper, we present the first floating-point support in a deductive verification tool for the Java programming language. Our support in the KeY verifier handles arithmetic via floating-point decision procedures inside SMT solvers and transcendental functions via axiomatization. We evaluate this integration on new benchmarks, and show that this approach is powerful enough to prove the absence of floating-point special values—often a prerequisite for further reasoning about numerical computations—as well as certain functional properties for realistic benchmarks.


Introduction
Deductive verification has been successful in providing functional verification for programs written in popular programming languages such as Java [3,22,40,48], Python [28], Rust [5], C [24,53], and Ada [18,49].Deductive verifiers allow a user to annotate methods in a program with pre-and postconditions, from which they automatically generate verification conditions (VCs).These are then either proven directly by the verifier itself, or discharged with external tools such as automated (SMT) solvers or interactive proof assistants.
While deductive verifiers fully implement many sophisticated data representations (including heap data structures, objects, and ownership), support for floating-point numbers remains rather limited -solely Frama-C and SPARK offer automated support for floating-point arithmetic in C and Ada [31].This state of affairs is at least partially a result of previous limitations in floating-point support in SMT solvers.Consequently, deductive verification has been used for floating-point programs only by experts with considerable manual effort [14,31].This is unfortunate as it makes deductive verification unavailable for a large number of programs across many domains including embedded systems, machine learning, and scientific computing.With the increasing need for parallelization in code, scientific computing specifically has recently experienced algorithmic challenges for which formal methods may contribute to a solution [9,55].
One of the main challenges of floating-point arithmetic is its unintuitive behavior and the special values that the IEEE 754 standard [38] introduces.For instance, an overflow or a division by zero results in the special value (positive or negative) infinity, and not a runtime exception.Similarly, invalid operations like sqrt(-1.0)result in a Not a Number (NaN) value.These special values are problematic as seemingly straight-forward identities do not hold (x == x or x * 0.0 == 0.0).In addition, every operation on floating-point numbers potentially involves rounding, which compromises familiar rules like associativity and distributivity.Hence, reasoning support for writing correct floating-point programs is indispensable.
Abstract interpretation-based tools can prove the absence of runtime errors and special values [19,42], and bound roundoff errors due to floating-point's finite precision [10,20,25,35,56].SMT decision procedures [17] or SAT-based model-checking [23,55], on the other hand, can prove intricate properties requiring bit-precise reasoning.However, these techniques and tools largely support only purely floating-point programs or program snippets, or analyze programs only up to a predefined depth of the call stack.General reasoning about real-world object-oriented programs, however, also requires support for features such as the (unbounded) heap, necessitating different analyses which need to be combined with floating-point reasoning.
Handling floating-points in a deductive verifier has unique advantages.First, the deductive verification approach already comes with the infrastructure for reasoning about complex control and data structures (like exception handling and heap).Second, it allows one to flexibly combine the verifier's symbolic execution reasoning with external decision procedures.Third, depending on the theory support, the verifier or external solver may also generate counterexamples of a property and thus help program debugging -something an abstract interpretationbased approach fundamentally cannot provide.
We report on adding floating-point support to the KeY deductive verifier, providing the first automated deductive floating-point support for the Java programming language.We focus mainly on proving the absence of the special values infinity and NaN.While these are helpful in certain circumstances, for most applications they signal an error.Hence, showing their absence is a prerequisite for further (functional) reasoning.That said, our extension also allows one to express and discharge arbitrary functional properties expressible in floating-point arithmetic, including bounds on roundoff errors for certain programs, and bounds on differences between two similar floating-point programs We exploit both KeY's symbolic execution and external SMT support.On the one hand, we handle arithmetic operations by relying on a combination of KeY's symbolic execution to handle the heap and SMT based decision procedures to handle the floating-point part of the VCs.On the other hand, we support transcendental functions via axiomatization in the KeY prover itself.
Transcendental functions such as sine are a common feature in numerical programs, but are not supported by floating-point decision procedures.We explore two ways of supporting them soundly but approximately, by encoding them as axiomatized uninterpreted function symbols once directly in the SMT queries, and once in additional calculus rules in KeY.Our evaluation shows that even though such reasoning is approximate, it is nonetheless sufficient to prove the absence of special values in many interesting programs.
We evaluate KeY's floating-point support on a number of real-world floatingpoint Java programs.Our benchmark set allows us to evaluate recent progress in SMT floating-point support in Z3 [27], CVC4 [7] and MathSAT [21] on yet unseen benchmarks.For instance, we observe that quantifiers are challenging even if they do not affect satisfiability of SMT queries.Our benchmarks are openly available, and we expect our insights to be useful for further solver development.
Contributions In summary, we make the following contributions: we implement and evaluate the first automated deductive verification of floating-point Java programs by combining the strength of rule based and SMT based deduction; we collect a new set of challenging real-world floating-point benchmarks in Java (available at https://gitlab.mpi-sws.org/AVA/key-float-benchmarks/);we compare different SMT solvers for discharging floating-point VCs on this new set of benchmarks; and we develop novel automated support for reasoning about transcendental functions in a deductive verifier.

Introduction to KeY
KeY [3] is a platform for deductive verification of Java programs, working at a source code level.The input is a Java program annotated in the Java Modeling Language (JML) [44], encouraging a Design by Contract ( [45,50]) approach to software development.The user specifies the expected behavior of Java classes with class invariants that the program has to maintain at critical points.Methods are specified with method contracts, consisting mainly of pre-and postconditions, with the understanding that if the precondition holds when the method is called, the postcondition has to hold after the method returns.
After loading an annotated program, KeY translates it to a formula in Java Dynamic Logic [3] (JavaDL), an instance of Dynamic Logic [36] which enables logical reasoning about Java programs.Logical rules are provided for the translation of programs into first-order logic, and for closing the resulting goals, or proof obligations.KeY is semi-interactive in that it allows manual rule application, while also offering powerful built-in automation and macros.In addition, it is also possible to translate an open goal into SMT-LIB format [8] and call an external SMT solver.For specific theories, SMT solvers can be much more efficient than KeY's own automation.This makes it possible to prove some goals, which depend on SMT supported theories, by using an SMT solver, while others are proved internally, using KeY's own automation.

Floating-Point Arithmetic in Java
In the following, we summarize some central characteristics of Java floating-point numbers, loosely following [52].Each normal floating-point number x can be represented as a triplet (s, m, e), such that x = (−1) s * m * 2 e , where s ∈ {0, 1} is the sign, m (called significand ) is a binary fixed-point number with one digit before the radix point and p − 1 digits after the radix point (note that 0 ≤ m < 2), and e (exponent) is an integer such that e min ≤ e ≤ e max .Java supports two floating-point formats (both in base 2): float ('single') precision with p = 24, and minimal and maximal exponent e min = −126, e max = 127 and double precision with p = 53, e min = −1022, e max = 1023.
Whenever the result of a computation cannot be exactly represented with the given precision, it is rounded.IEEE 754 defines various rounding modes, of which Java only supports round to nearest, ties to even.Rounding is exact, as if one would first compute the ideal real number, and round afterwards.
The triple representation gives us two zeros, +0 and −0, represented by (0, 0, 0) and (1, 0, 0), respectively.If the absolute value of the ideal result of a computation is too small to be representable as a floating-point number of the given format, the resulting floating point number is +0 or −0.In addition, there are three special values, +∞, −∞, and NaN (Not a Number).If the absolute value of the ideal result of a computation is too big to be representable as a floating-point number of the given format, the result is +∞ or −∞.Also, division by zero will give an infinite result (e.g., 7.13/ + 0 = +∞).Computing further with infinity may give an infinite result (e.g., +∞ + +∞ = +∞), but may also result in the additional 'error value' NaN (e.g., +∞ − +∞ = NaN).Due to the presence of infinities and NaN, floating-point operations do not throw Java exceptions.
By default, the Java virtual machine is allowed to make use of higher-precision formats provided by the hardware.This can make computation more accurate, but it also leads to platform dependent behaviour.This can be avoided by using the strictfp modifier, ensuring that only the single and double precision types are used.This modifier ensures portability.
3 Floating-Point Support in KeY

Arithmetics
In order to be able to specify and verify programs containing floating-point numbers, we made several extensions to the KeY tool.First, we added the float Listing 1.1:The Rectangle.scalebenchmark We further introduced functions and predicate symbols to formalize operations (+, * , . . . ) and comparisons (<, ==, . . . ) on floating-point expressions.The translation supports both code with and without the strictfp modifier.However, since the actual precision of non-strictfp operations is not known, the function symbols remain uninterpreted.We extended KeY's parser to correctly handle programs and annotations containing floating-point numbers, and added logic rules for translating floating-point expressions from Java or JML to JavaDL.
As an example, Listing 1.1 shows JML specifications of our Rectangle benchmark that contains floating-point literals and makes use of the fp _ nan and fp _ nice predicates.fp _ nan states that a floating-point expression is NaN and fp _ nice, which is shorthand for "not infinity and not NaN", states that a floating-point expression is not NaN or infinity.The scale method contains two contracts that are checked separately, ensuring that the class fields of a scaled rectangle object are not NaN, considering different preconditions.For the first contract, the SMT solver produces a counterexample.In the second, we bound inputs by concrete ranges that we picked arbitrarily and get the valid result.In practice, such ranges would come from the context, e.g. from the kind of rectangles that appear in an application, or from known ranges of sensor values.
Concerning discharging the resulting proof obligations, there were two main ways to consider.One is to create a floating-point theory within KeY by adding axioms and deduction rules, so that the desired properties can be proven in KeY's sequent calculus.The other way is to translate the proof obligations from JavaDL to SMT-LIB and call an external SMT solver.While the KeY approach traditionally favors conducting proofs within KeY, for this work, we partially deviated from this way in order to harness the greater experience and efficiency of SMT solvers when it comes to floating-point arithmetic.Our approach attempts to get the best of both worlds by distinguishing between basic floating-point arithmetic, i. e., elementary operations and comparisons, and more complex functions which do not have an SMT-LIB equivalent (e. g., the transcendental functions), or where the SMT-LIB function is not usefully implemented by current SMT solvers (see Section 3.2.B).
Elementary operations and comparisons get translated to the corresponding SMT-LIB functions.In SMT-LIB, all floating-point computations conform to the IEEE 754 Standard.Therefore, only Java programs with the strictfp modifier can be directly translated to SMT-LIB without loss of correctness.
We developed a translation from KeY's floating-point theory to SMT-LIB.In order to integrate it into KeY, we also overhauled the existing translation from JavaDL to SMT-LIB to create a new, more modular framework, which now supports all the features of the original translation, e. g., heaps and integer arithmetic, but also floating-point expressions at the same time.
Floating-point intricacies sometimes require extra caution.For example, there are two different notions of equality for floats: bitwise equality and IEEE754 equality.Our implementation ensures these are distinguished correctly, and that the specification language remains intuitive for a developer to use.
Using the translation to SMT-LIB, we can specify and prove two classes of properties in KeY: The absence of special values is specified using the fp _ nan and fp _ infinite predicates (or the fp _ nice equivalent).Furthermore, one can specify functional properties that are expressible in floating-point arithmetic, e.g. one can compare the result of a computation against the result of a different program which is known to produce a good result or a reference value.

Transcendental Functions
Floating-point decision procedures in SMT solvers successfully handle programs consisting of arithmetic and square root operations.Many numerical real-world programs, however, include transcendental functions such as sin and cos.In Java programs, these functions are implemented as static library functions in the class java.lang.Math.
Unlike arithmetic operations, transcendental functions are much more loosely specified by the IEEE 754 Standard-only an upper bound on the roundoff error is given.Libraries are thus free to provide different implementations, and even tighter error bounds.Exact reasoning in the same spirit as floating-point arithmetic would thus have to encode a specific implementation.Given that these implementations are highly optimized, this approach would be arguably complex.
We observe, however, that such exact reasoning about transcendental functions is often not necessary and a sound approximate approach is sufficient and efficient.
In this section, we introduce an axiomatic approach for reasoning about programs containing transcendental functions.We observe that with the flexibility of deductive verification and KeY itself, we can instantiate it in two different ways.We encode transcendental functions as uninterpreted functions and axiomatize them in the SMT queries.Alternatively, we encode these axioms in KeY as logical inference rules.
(A) Axiomatization in SMT We encode library functions as uninterpreted functions and include a set of axioms in the SMT-LIB translation for each method that is called in a benchmark.That is, we extended KeY such that when a transcendental function exists in the proof obligation, its definition alongside all the axioms for that function are added to the translation.
For the axiomatization of transcendentals, we did not add rules that expand to a definition or allow a repeated approximation of the function value (like expansion into a Taylor series).Instead, we added a number of lemmata encoding interesting properties related to special values.For instance, the following axiom states that if the input to the sin function is not a NaN or infinity, then the returned value of sin is between −1.0 and 1.0:Note that this implies that the result is not a NaN or infinity.The other axioms are similar in spirit, so we do not list them.
These axioms are expressed as quantified floating-point formulas and capture high-level properties of library functions complying with the specifications in the IEEE 754 Standard.Clearly, since we do not have the actual implementations of these functions, we are not able to prove arbitrary properties.However, such an axiomatization is often sufficient to check for the (absence of) special values, i.e.NaN and infinity, as our experiments in Section 4.4 show.
(B) Taclets in KeY Reasoning about quantified formulas in SMT is a longlasting challenge [33].We have also observed in our experiments with only arithmetic operations (Section 4.3) that SMT solvers struggle with quantifiers in combination with floating-points.We have therefore implemented an alternative approach encoding the axioms not in the SMT queries, but instead as deductive inference rules (so-called taclets) in KeY.
The rules encode the same logical information as the universally quantified assertions that we add in SMT-LIB (and where we leave the choice of instantiations entirely to the SMT/SAT solver).With our taclet approach, we instantiate a quantifier (only) to one's needs.We note that for proving a property correct, this results in a correct (under)approximation.However, the prize for achieving A heuristic strategy applies the rules automatically using the occurrences of transcendentals as instantiation triggers.However, instantiating the axioms too eagerly, considerably increases the number of open goals, which is why we assume that the user selects the axioms to apply manually (and did so in the experiments).After the application the proof obligation can either be closed, i.e proven, by KeY automatically, or be given to the SMT solver as before for final solving.
Currently, the set of axioms (in the SMT-LIB translation and as taclets in KeY) only contains axioms for the transcendental functions occurring in our benchmarks.So far we have 10 axioms; however, adding more axioms (also for further transcendentals like exponentiation or logarithm) is straightforward.The full set of axioms is included in the Appendix of the technical report.

Benchmark Programs
We collected a set of existing floating-point Java programs representing realworld applications in order to evaluate the feasibility and performance of KeY's floating-point support.
The left half of Table 1 provides an overview of our benchmarks.Each benchmark consists of one method, which is composed of arithmetic operations

}}
and method calls to potentially other classes.The invocations of methods from java.lang.Math (e.g.Math.abs) are marked by "+1" in Table 1; these are resolved by inlining the method implementation.For benchmarks that contain calls to transcendental functions and square root, the called functions are listed; these are handled by our axiomatization.We include sqrt in this list, as we have observed that exact support can be expensive, so it may be advantageous to handle sqrt axiomatically.Benchmarks Rectangle, Circuit, Matrix3 and Rotation are partially shown in Listings 1.1, 1.2, 1.3 and 1.4 respectively.
Each benchmark also includes a JML contract that is to be checked.For some methods, we specify two contracts (marked by "(2)" in the first column of Table 1), each serving as an independent benchmark.The contracts for most of these benchmarks check that the methods do not return a special value i.e infinity and/or NaN, the preconditions being that the variables are not themselves special values and possibly are bounded in a given range.For the Matrix, FPLoop and Rotate benchmarks, we check a functional property (see Section 4.3).FPLoop, which has three contracts, additionally shows how to specify floating-point loop behavior using loop invariants.

Proof Obligation Generation
To reason about the contract of a selected benchmark, we apply KeY, which generates proof obligations or 'goals'.Some of these goals (heap-related) are closed by KeY automatically.The remaining open goals are closed by either SMT solvers with floating-point support directly (Section 3.1 and Section 3.2.A), or with a combination of transcendental KeY taclets and floating-point SMT solving (Section 3.2.B).
Columns 6 and 7 in Table 1 show the number of proof obligations closed by KeY directly and to be discharged by external solvers, respectively.The next two columns show the number of taclet rules that KeY applied in order to close its goals, and the time this takes.For benchmarks with two contracts we show the respective values separated by '/'.
We run our experiments on a server with 1.5 TB memory and 4x12 CPU cores at 3 GHz.However, KeY runs single-threadedly and does not use more than 8GB of memory.
For our set of benchmarks, the symbolic execution process is fully automated.Note that the machinery can deal with loop invariants, if they are provided.Loop invariant generation is, however, particularly challenging for floating-points due to roundoff errors [26,39], and a research topic in itself.

Evaluation of SMT Floating-Point Support
Previous work [31] reported that SMT support for floating-point arithmetic is rather limited.However, with recent advances [17], we evaluate the situation again.Most benchmarks used to evaluate SMT solvers' decision procedures [1] aim to check (individual) specialized (corner case) properties of floating-point arithmetic.The proof obligations generated from our set of benchmarks are complementary in that they are more arithmetic heavy, while nonetheless relying on accurate reasoning about special values and functional properties.
For each open goal not automatically closed, KeY generates one SMT-LIB file that is fed to the solvers for validation.We compare the performance of the three major SMT solvers with floating-point support CVC4 [7] (version 1.8, with the SymFPU library [17] enabled), Z3 (4.8.9) [27] and MathSAT (5.6.3)[21].For this we set a timeout of 300s for each proof obligation.While KeY is able to discharge proof obligations in parallel, for our experiments, we do so sequentially to maintain comparability.
KeY's default translation to SMT includes quantifiers.These quantifications are not related to floating-point arithmetic, but are used to logically encode important properties of the Java memory model, like the type hierarchy and the absence of dangling references on any valid Java heap.If we reason about floating-point problems in isolation, they are not needed, but if we want to consider Java verification more holistically with questions combining aspects of heap and floating point reasoning, they become essential.We manually inspected that the proof obligations without our axiomatized treatment of transcendental functions do not depend on these properties and investigate the quantifier support by including or removing them from the SMT translations.We do not report results with quantifiers for MathSAT, since it does not support them.
Table 2 2 show the results for benchmarks with valid contracts.This experiment thus represents the common behavior of KeY, whose main goal is to prove contracts correct.Rows 3 and 4 of Table 2 demonstrate the results for benchmarks with invalid contracts, i.e. for those we expect a counterexample for at least one of the goals.The Appendix (Section A) contains the detailed results for each experiment separated by benchmark.Figure 1 and Figure 2 show a more detailed view of the solvers' running time for the valid benchmarks.The x-axis shows the number of open goals that are discharged by the SMT solvers, sorted by running time for each solver individually.The k-th point of one graph shows the minimum running time needed by the solver to close each of the k fastest goals.Note that each solver may have different goals which are its k fastest.The y-axis shows the time on a logarithmic scale.
We conclude that in the presence of quantified axioms and floating-point arithmetic solvers' performance deteriorate for both valid and invalid goals.In particular, none of the solvers is able to find counterexamples for any of the invalid goals.However, when the quantified axioms are removed from the SMT translations, their performance improves.For valid contracts, CVC4 and MathSAT perform better than Z3, in terms of both number of goals validated and the running time per goal.In particular, MathSAT is able to prove all goals.However, the running time performance of CVC4 is better than MathSAT's.For invalid contracts, solvers are able to produce the expected counterexamples at least partially.Particularly, MathSAT has a better performance than CVC4 and Z3 in terms of both running time and the number of proof obligations for which it can produce counterexamples.
We conducted another experiment on our Rectangle.scalebenchmark to assess the solvers' sensitivity to various changes, applied to the benchmark's contract or its implementation.We considered modifications such as reducing the number of classes while keeping the same functionality, having tighter and larger bounds for variables, reducing the number of arithmetic operations etc.The details of this experiment can be found in the Appendix of the technical report.In summary, solvers' performance seems to be sensitive to slight innocuous looking changes such as the number of classes involved and variable bounds.For example, constraining arg2 in the original benchmark more tightly allows CVC4 to validate all goals (1 more).This behavior could be potentially exploited by e.g.relaxing a variable's bounds.
Proving Functional Properties Listings 1.3 and 1. 4 show examples of functional properties that are expressible in floating-point arithmetic and that KeY can handle.The verification results are included in rows 1 and 2 of Table 2, for more details see the Appendix of the technical report.
For Matrix, we check that the determinants of a matrix and its transpose are equal.Note that this property holds trivially under real arithmetic, but not necessarily under floating-points.After feeding transposedEq (which uses the determinant method) and its contract to KeY, increasing the default timeout sufficiently and discharging the created goal, CVC4 generates a counterexample in 170.2s seconds and MathSAT in 16.2s.Z3 times out after 30 minutes.By feeding transposedEqV2 (which uses the determinantNew method) to KeY, CVC4 validates the contract in 1.1s, MathSAT in 3.9s and Z3 times out again.One thing worth noting is that the way programs are written can greatly influence the computational complexity needed to reject or verify the contract.This is evident from the fact that slightly modifying the order of operations (using determinantNew instead) substantially reduces verification time and changes the verification result for MathSAT and CVC4.
For Rotate, we check that the difference between an original vector and the one that is rotated four times by 90 degrees, must not be larger than 1.0E-15.We also verified the same bound for the relative difference (by exploiting another method and contract) for this benchmark.The constant cos90 in Listing 1.4 is not precisely 0.0 to account for rounding effects in the computation of the cosine.FPLoop includes three loops, for which the contracts check that the return value is bigger than a given constant.
Though not always very fast, these examples show that verification of functional floating-point properties is viable.

Evaluation of Support for Transcendental Functions in KeY
We evaluated the two approaches from Section 3.2.A on our set of benchmarks; rows 5 and 6 in Table 2 summarize the results.(The detailed results of these experiments are included in the Appendix of the technical report.)Note that both approaches are fully automated.
We conclude that the SMT solvers perform better when the axiomatization is applied at the KeY level.When axioms for transcendental functions are added to the SMT-LIB translation directly Z3 validates 4 out of 10 goals.With the axiomatization at the KeY level, solvers are able to validate more goals (with quantified formulas removed from the SMT translations), e.g.Z3 is able to validate 5 goals and CVC4 can validate all.Therefore, it is preferable to apply them on the KeY side via taclet rules.
All the solvers we have used in this work comply with the IEEE 754 standard and therefore have bit-precise support for the square root function.They provide bit-precise reasoning by effectively encoding the behavior of floating-point circuits over bitvectors (which is naturally expensive), together with different heuristics and abstractions to speed up solving time.However, depending on the property, we do not always need bit-precise reasoning, so we propose handling the square root function with the same taclet-based axiomatization as introduced in Section 3.2.B.
To this end, we conducted an experiment on the benchmarks containing sqrt, comparing the approach from Section 3.2.B (adding the necessary axioms, resp.taclet rules) to using the square root implemented in SMT solvers (fp.sqrt).We chose to include only axioms specified in or inferred from the IEEE 754 standard (e.g. if the argument of the square root function is NaN or less than zero, then the square root results in NaN).The full set of axioms that we used is included in the Appendix of the technical report.
Rows 7 and 8 in Table 2 summarize the results for this experiment; the detailed results are included in the Appendix of the technical report.We observed that for two out of the three benchmarks, the average running time of all solvers decreases using the axiomatized square root.Furthermore, Z3 is able to reason about more proof obligations with the axiomatized version.However, the success of this approach depends on the axioms added to KeY and may not always work if we do not have suitable axioms.For example, for the Circuit.instantCurrentbenchmark (Listing 1.2), using the axiomatized square root, CVC4 is not able to validate the contract, but with fp.sqrt the contract is validated.
In summary, treating sqrt axiomatically can result in shorter solving times than performing bit-precise reasoning, but the approach may not always succeed when the axioms are not sufficient to prove a particular property.

Discussion and insights
The experiments show that highly automated floating point program verification is viable for relevant properties (handling of special values and some functional properties), up to a certain level of complexity (given by the SMT solvers).The choices of which parts of a proof obligation are delegated to SMT, and how they are translated to SMT, are crucial for achieving effective and efficient program verification.Arithmetic operations proved to be more efficiently dealt with by delegation to SMT, whereas for transcendental functions, axiomatization and rule based treatment in the theorem prover, outside the SMT solver, performs clearly better.

Related Work
Our implementation uses the floating-point SMT-LIB theory [16], which however does not handle transcendental functions, as their semantics is (library) implementation dependent.Some real-valued automated solvers do handle transcendental functions [4,32], but to the best of our knowledge, the combination of floating-points and reals in SMT solvers is still severely limited.
None of the existing deductive verifiers support floating-point transcendental functions automatically.The Why3 deductive verification framework [29] has support for floating-point arithmetic, with front-ends for the C and Ada programming languages through Frama-C [24] and SPARK [18,31], respectively.Why3 has back-end support for different SMT solvers, as well as interactive proof assistants like Coq.Until recently, Why3 would discharge still many interesting floating-point problems with help of Coq, relying on significant user interaction.In later work [31] (in the context with floating-point verification for Ada programs), Why3 can achieve a higher degree of automation.Note, however, that the user is still required to add code assertions as well as 'ghost code' to a significant extent.
The Boogie intermediate verification language [46] also supports floatingpoint expressions, and targets Z3 for discharging proof obligations.In the Boogie community, it was observed that writing a specification in Boogie leads to decreases in SMT solver performance when compared to writing the goal in SMT-LIB directly, probably due to an inherent mixing of theories when using Boogie [2].This matches our own experiences, and separation of theories should be considered an important task for the further development of floating-point verification.
Other deductive verifiers for Java have only rudimentary support for floatingpoints.Verifast [40] treats floating-point operations as if they were real values, and OpenJML [22] parses programs with floating-point operations, but essentially treats float and double as uninterpreted sorts.
The Java category of verification competition SV-COMP [11] contains a number of benchmarks that make use of floating-point variables.However, the focus of these benchmarks is usually not on arithmetical properties of expressions, but on the completeness of the Java language support.Amongst the participants of SV-COMP 2020, the Symbolic (Java) Pathfinder (SPF) [54] (and various extensions) and the Java Bounded Model Checker (JBMC) [23] support floating-point arithmetic.Besides being limited to exploring the state space up to a bounded depth, their constraint languages do not support quantifiers and abstracting of method calls-which are features that we have used in this work.
Floating-point arithmetic has also been formalized in several interactive theorem provers [15,30,41].While one can prove intricate properties about floating-point programs [13,14,37], proofs using interactive provers are to a large part manual and require significant expertise.
Abstract interpretation based techniques can show the absence of special values in floating-point code fully automatically, and several abstract domains which are sound with respect to floating-point arithmetic exist [19,42].While the analysis itself is fully automated, applying it successfully to real-world programs in general requires adaptation to each program analyzed by end-users, e.g. the selection of suitable abstract domains or widening thresholds [12].
Besides showing the absence of special values, recent research has developed static analyses to bound floating-point roundoff errors [25,34,47,51,56].These analyses currently work only for small arithmetic kernels and the tools in particular do not accept programs with objects.
Dynamic analyses generally scale well on real-world programs, but can only identify bugs (when given failure-triggering input), rather than proving correctness for all possible inputs.Executing a floating-point program together with a higherprecision one allows one to find inputs which cause large roundoff errors [10,20,43].Ariadne [6] uses a combination of symbolic execution, real-valued SMT solving and testing to find inputs that trigger floating-point exceptions, including overflow and invalid operations.Our work subsumes this approach as the SMT solvers that we use can directly generate counterexamples, but more importantly, KeY is able to prove the absence of such exceptions.

Conclusion
By joining the forces of rule-based deduction and SAT-based SMT solving, we presented the first working floating-point support in a deductive verification tool for Java and by that close a remaining gap in KeY to now support full sequential Java.Our evaluation shows that for specifications dealing with value ranges and absence of NaN and infinity, our approach can verify realistic programs within a reasonable time frame.We observe that the MathSAT and CVC4 solver's floatingpoint support scales sufficiently for our benchmarks, as long as the queries do not include any quantifiers, and that our axiomatized approach for handling transcendental functions is best realized using calculus rules in KeY's internal reasoning engine.While our work is implemented within the KeY verifier, we expect our approach to be portable to other verifiers.

A.1 Axioms for Transcendental Functions in KeY
Here we present the axioms that we implemented to prove properties for benchmarks with transcendental functions: -If arg is NaN or an infinity, then sin(arg) is NaN.
-If arg is zero, then the result of sin(arg) is a zero with the same sign as arg.
if arg is not NaN or infinity, then the returned value of sin is between −1.0 and 1.0.if arg is not NaN or infinity, then the returned value of sin is not NaN.
-If arg is NaN or an infinity, then cos(arg) is NaN.
if arg is not NaN or infinity, then the returned value of cos is between −1.0 and 1.0.if arg is not NaN or infinity, then the returned value of cos is not NaN.
-If arg is NaN or an infinity, then atan(arg) is NaN.
-If arg is zero, then the result of atan(arg) is a zero with the same sign as arg.
if arg is not NaN, then the returned value of atan is between −π/2 and π/2.
In our Evaluation we showed that handling square root axiomatically can improve performance.Here is the list of axioms we used for this function: -If arg is NaN or less than zero, then sqrt(arg) is NaN.
-If arg is positive infinity, then sqrt(arg) is positive infinity.
-If arg is positive zero or negative zero, then sqrt(arg) is the same as arg.
-If arg is not NaN and greater or equal to zero, then sqrt(arg) is not NaN.
-If arg is not infinity and is greater than one then sqrt(arg) < arg.

A.2 Detailed Evaluation Results
Here we present the tables that did not fit in the main body of the paper and contain detailed results of our experiments.In each table we show the number of goals per benchmark that each solver can validate or invalidate, together with the average and maximum time (in seconds) needed.'TO' in the maximum column denotes that at least one goal timed out.The goals resulting in timeout were excluded from the computation of the average time.
Table 3 shows the results for benchmarks with valid contracts with the quantified formulas included in the SMT translations.We have summarized this table in row 1 of Table 2.
Table 4 demonstrates the results for the same benchmarks when the quantified axioms are removed form the SMT translations which is summarized in row 2 of Table 2.
Table 5 shows the detailed results of the experiments with benchmarks with invalid contracts, when the quantified formulas are included in and removed form the SMT translations.This results are summarized in rows 3 and 4 of Table 2   The first two sections of Table 7 show the results from applying the two approaches for handling transcendental functions in sections 3.2.A and 3.2.B, using the default SMT translation in KeY.The last section of the table depicts the results of applying the approach in Section 3.2.B, while the quantified formulas are removed from the SMT translations.This table is summarized in rows 5 and 6 of Table 2 Table 6 shows the detailed results of conducting the experiment on the benchmarks containing sqrt, comparing the approach from Section 3.2.B (adding the necessary axioms, resp.taclet rules) to using the square root implemented in SMT solvers (fp.sqrt), when the quantified formulas are removed from the SMT translations.We have summarized the results of these experiments in rows 7 and 8 of Table 2 Sensitivity to Contract Variations We conducted an experiment on our Rectangle.scalebenchmark to assess the solver's sensitivity to various changes, applied to the benchmark's contract or its implementation.We considered the following modifications: -v0: is the original version of the benchmark (Listing 1.1 using the second contract) and our baseline; -v1: reduces the number of classes involved to two, while keeping the same functionality; -v2: reduces the number of classes involved to one, while keeping the same functionality; -v3: modifies v2 such that variable bounds in the precondition become more "complicated" in terms of longer fractional parts (e.g. the bounds for arg2 become [3.0000001,-6.4000000003] instead of [3.0001, -6.4000003]); -v4: simplifies the mathematical expression of v2 (less arithmetic operations) -v5: modifies v3 such that arg2 has a tighter bound, i.e. the interval width is smaller -v6: modifies v2 such that arg2 has a larger bound, i.e. the interval width is larger -v7: modifies v2 such that only arg2 has a "complicated" bound -v8: modifies v0 such that arg2 has a tighter bound  when the number of classes is reduced, and also when the number of arithmetic operations is reduced.Z3 further seems to be sensitive to whether variable bounds are "complicated" or not, whereas CVC4 is not.We obtain a somewhat surprising result when arg2 has a tighter bound.While Z3's performance improves, CVC4 validates two goals less.On the other hand, increasing the bounds on arg2 does not seem to make a difference.It seems that arg2 is the bottleneck for this benchmark; when only arg2 has a "complicated" input interval, CVC4 proves less goals.Finally, constraining arg2 in the original benchmark more tightly allows CVC4 to validate all goals but Z3's performance remains unaffected.
With the quantified formulas removed from the SMT-LIB translation, we can see that CVC4's results, in terms of number of goals validated, are the same as before, and Z3 performs much better than before.MathSAT is able to validate all goals of all versions.
In summary, solvers' performance seems to be sensitive to slight innocuous looking changes such as the number of classes involved and variable bounds.This behavior could be potentially exploited by e.g.relaxing a variable's bounds.

Table 1 :
Benchmark details and KeY automode statistics, time is measured in seconds more closed proofs and shorter running times is that for disproving a property, not considering all possible quantifier instantiations may lead to spurious counterexamples, i.e., false positives.

Table 2 :
summarizes the results of our experiments.Column 4 shows the number of expected valid or invalid goals for all benchmarks.For each solver we show the number of goals that each solver can validate or invalidate, together with the average time (in seconds) needed.The goals resulting in timeout were Summary of valid / invalid goals correctly decided and average running times of each solver for the SMT translations with and without quantified axioms .

Table 3 :
Summary of valid goals proved and running times of each solver for the SMT translations with quantified axioms

Table 4 :
Summary of valid goals proved and running times of each solver for the SMT translations without quantified axioms

Table 6 :
Summary statistics for benchmarks containing the square root function, with quantified formulas removed from the SMT-LIB translation

Table 7 :
. Summary statistics with axioms in SMT-LIB translations and as taclet rules in KeY

Table 8
summarizes the results for this experiment.With the quantified forulas included in the SMT translation, Both CVC4 and Z3 are able to prove more goals

Table 8 :
SMT solvers summary statistics for various versions of the Rectangle benchmark with quantified axioms in the SMT translations