Icing: Supporting Fast-Math Style Optimizations in a Verified Compiler
Abstract
Verified compilers like CompCert and CakeML offer increasingly sophisticated optimizations. However, their deterministic source semantics and strict IEEE 754 compliance prevent the verification of “fast-math” style floating-point optimizations. Developers often selectively use these optimizations in mainstream compilers like GCC and LLVM to improve the performance of computations over noisy inputs or for heuristics by allowing the compiler to perform intuitive but IEEE 754-unsound rewrites.
We designed, formalized, implemented, and verified a compiler for Icing, a new language which supports selectively applying fast-math style optimizations in a verified compiler. Icing’s semantics provides the first formalization of fast-math in a verified compiler. We show how the Icing compiler can be connected to the existing verified CakeML compiler and verify the end-to-end translation by a sequence of refinement proofs from Icing to the translated CakeML. We evaluated Icing by incorporating several of GCC’s fast-math rewrites. While Icing targets CakeML’s source language, the techniques we developed are general and could also be incorporated in lower-level intermediate representations.
Keywords
Compiler verification, Floating-point arithmetic, Optimization
1 Introduction
Verified compilers formally guarantee that compiled machine code behaves according to the specification given by the source program’s semantics. This stringent requirement makes verifying “end-to-end” compilers for mainstream languages challenging, especially when proving sophisticated optimizations that developers rely on. Recent verified compilers like CakeML [38] for ML and CompCert [24] for C have been steadily verifying more of these important optimizations [39, 40, 41]. While the gap between verified compilers and mainstream alternatives like GCC and LLVM has been shrinking, so-called “fast-math” floating-point optimizations remain absent in verified compilers.
Fast-math optimizations allow a compiler to perform rewrites that are often intuitive when interpreted as real-valued identities, but which may not preserve strict IEEE 754 floating-point behavior. Developers selectively enable fast-math optimizations when implementing heuristics, computations over noisy inputs, or error-robust applications like neural networks, typically at the granularity of individual source files. The IEEE 754-unsound rewrites used in fast-math optimizations allow compilers to perform strength reductions, reorder code to enable other optimizations, and remove some error checking [1, 2]. Together these optimizations can provide significant savings and are widely used in performance-critical applications [12].
Unfortunately, strict IEEE 754 source semantics prevents proving fast-math optimizations correct in verified compilers like CakeML and CompCert. Simple strength-reducing rewrites like fusing the expression \(x * y + z\) into a faster and locally more accurate fused multiply-add (fma) instruction cannot be included in such verified compilers today. This is because fma avoids an intermediate rounding and thus may not produce exactly the same bit-for-bit result as the unoptimized code. More sophisticated optimizations like vectorization and loop invariant code motion depend on reordering operations to make expressions available, but these cannot be verified since floating-point arithmetic is not associative. Even simple reductions like rewriting \(x - x\) to 0 cannot be verified since the result can actually be NaN (“not a number”) if x is NaN. Each of these cases represents a rewrite that developers would often, in principle, be willing to apply manually to improve performance but which can be more conveniently handled by the compiler. Verified compilers’ strict IEEE 754 source semantics similarly hinders composing their guarantees with recent tools designed to improve accuracy of a source program [14, 16, 32], as these tools change program behavior to reduce rounding error. In short, developers today are forced to choose between verified compilers and useful tools based on floating-point rewrites.
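The three obstacles named above can be observed directly on IEEE 754 doubles. The following Python sketch is only an illustration and is not part of Icing: the helper `fused_multiply_add` is a hypothetical stand-in that emulates an fma for finite inputs by computing the exact rational result and rounding once, and the input values are chosen by hand to expose the differences.

```python
import math
from fractions import Fraction


def fused_multiply_add(x: float, y: float, z: float) -> float:
    """Emulate fma for finite inputs: compute x*y+z exactly, then round once."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))


# 1. fma introduction: the unfused product x*y is rounded to 1.0 before the
#    addition, while the fused version keeps the exact product 1 - 2^-60.
x, y, z = 1.0 + 2.0**-30, 1.0 - 2.0**-30, -1.0
unfused = x * y + z
fused = fused_multiply_add(x, y, z)

# 2. Reassociation: floating-point addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

# 3. Rewriting x - x to 0 is wrong when x is NaN.
nan = float("nan")
difference = nan - nan

print(unfused, fused)          # the two results differ
print(left == right)           # False
print(math.isnan(difference))  # True
```

None of these rewrites preserves the bit-for-bit result, which is why a strict IEEE 754 source semantics rules them out.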
The crux of the mismatch between verified compilers and fast-math lies in the source semantics: verified compilers implement strict IEEE 754 semantics while developers are intuitively programming against a looser specification of floating-point closer to the reals. Developers currently indicate this perspective by passing compiler flags like -ffast-math for the parts of their code written against this looser semantics, enabling mainstream compilers to aggressively optimize those components. Ideally, verified compilers will eventually support such loosened semantics by providing an “approximate real” data type and let the developer specify error bounds under which the compiler could freely apply any optimization that stays within bounds. A good interface to tools for analyzing finite-precision computations [11, 16] could even allow independently established formal accuracy guarantees to be composed with compiler correctness.
As an initial step toward this goal, we present a pragmatic and flexible approach to supporting fast-math optimizations in verified compilers. Our approach follows the implicit design of existing mainstream compilers by providing two complementary features. First, our approach provides fine-grained control over which parts of a program the compiler may optimize under extended floating-point semantics. Second, our approach provides flexible extensions to the floating-point semantics specified by a set of high-level rewrites which can be specialized to different parts of a program. The result is a new nondeterministic source semantics which grants the compiler freedom to optimize floating-point code within clearly defined bounds.
Under such extended semantics, we verify a set of common fast-math optimizations with the simulation-based proof techniques already used in verified compilers like CakeML and CompCert, and integrate our approach with the existing compilation pipeline of the CakeML compiler. To enable these proofs, we provide various local lemmas that a developer can prove about their rewrites to ensure global correctness of the verified fast-math optimizer. Several challenges arise in the design of this decomposition, including how to handle “duplicating rewrites” like distributivity that introduce multiple copies of a subexpression and how to connect context-dependent rewrites to other analyses (e.g., from accuracy-verification tools) via rewrite preconditions. Our approach thus provides a rigorous formalization of the intuitive fast-math semantics developers already use, provides an interface for dispatching proof obligations to formal numerical analysis tools via rewrite preconditions, and enables bringing fast-math optimizations to verified compilers.

In summary, our contributions are:

We introduce an extensible, nondeterministic semantics for floating-point computations which allows for fast-math style compiler optimizations with flexible, yet fine-grained control in a language we call Icing.

We implement three optimizers based on Icing: a baseline strict optimizer which provably preserves IEEE 754 semantics, a greedy optimizer which applies any available optimization, and a conditional optimizer which applies an optimization whenever an (optimization-specific) precondition is satisfied. The code is available at https://gitlab.mpisws.org/AVA/Icing.

We formalize Icing and verify our three different optimizers in HOL4.

We connect Icing to CakeML via a translation from Icing to CakeML source and verify its correctness via a sequence of refinement proofs.
2 The Icing Language
In this section we define the Icing language and its semantics to support fast-math style optimizations in a verified compiler. Icing is a prototype language whose semantics is designed to be extensible and widely applicable instead of focusing on a particular implementation of fast-math optimizations. This allows us to provide a stable interface as the implementation of the compiler changes, as well as supporting different optimization choices in the semantics, depending on the compilation target.
2.1 Syntax
We use the map and fold list primitives to show that Icing can be used to express programs beyond arithmetic, while keeping the language simple. Language features like function definitions or general loops do not affect floating-point computations with respect to fast-math optimizations and are thus orthogonal.
The opt: scoping annotation implements one of the key features of Icing: floating-point semantics are relaxed only for expressions under an opt: scope. In this way, opt: provides fine-grained control both for expressions and conditional guards.
2.2 Optimizations as Rewrites

performance and precision improving strength reduction which fuses \(x * y + z\) into an fma instruction (Rewrite 1)

reordering based on real-valued identities, here commutativity and associativity of \(+, *\), double negation, and distributivity of \(*\) (Rewrites 2–5)

simplifying computation based on (assumed) real-valued behavior for computations by removing NaN error checks (Rewrite 6)
A key feature of Icing’s design is that each rewrite can be guarded by a rewrite precondition. We distinguish compiler rewrite preconditions as those that must be true for the rewrite to be correct with respect to Icing semantics. Removing a NaN check, for example, can change the runtime behavior of a floating-point program: a previously crashing program may terminate or vice versa. Thus a NaN check can only be removed if the value can never be a NaN.
In contrast, an application rewrite precondition guards a rewrite that can always be proven correct against the Icing semantics, but where a user may still want finer-grained control. By restricting the context where Icing may fire these rewrites, a user can establish end-to-end properties of their application, e.g., worst-case roundoff error. The crucial difference is that the compiler preconditions must be discharged before the rewrite can be proven correct against the Icing semantics, whereas the application precondition is an additional restriction limiting where the rewrite is applied for a specific application.
A key benefit of this design is that rewrite preconditions can serve as an interface to external tools to determine where optimizations may be conditionally applied. This feature enables Icing to address limitations that have prevented previous work from proving fast-math optimizations in verified compilers [5] since “The only way to exploit these [floating-point] simplifications while preserving semantics would be to apply them conditionally, based on the results of a static analysis (such as FP interval analysis) that can exclude the problematic cases.” [5] In our setting, a static analysis tool can be used to establish an application rewrite precondition, while compiler rewrite preconditions can be discharged during (or potentially after) compilation via static analysis or manual proof.
This design choice essentially decouples the floating-point static analyzer from the general-purpose compiler. One motivation is that the compiler may perform hardware-specific rewrites, which source-code-based static analyzers would generally not be aware of. Furthermore, integrating end-to-end verification of these rewrites into a compiler would require it to always run a global static analysis. For this reason, we propose an interface which communicates only the necessary information.
Rewrites which duplicate matched subexpressions, e.g., distributing multiplication over addition, required careful design in Icing. Such rewrites can lead to unexpected results if different copies of the duplicated expression are optimized differently; this also complicates the Icing correctness proof. We show how preconditions additionally enabled us to address this challenge in Sect. 4.
Rewrites currently supported in Icing (\(\circ \in \{+, *\}\))
#  Name  Rewrite  Precondition
1  fma introduction  \(x * y + z \rightarrow \mathit{fma}(x, y, z)\)  application precond.
2  \(\circ \) associative  \((x \circ y) \circ z \rightarrow x \circ (y \circ z)\)  application precond.
3  \(\circ \) commutative  \(x \circ y \rightarrow y \circ x\)  application precond.
4  double negation  \(-(-x) \rightarrow x\)  x well-typed
5  \(*\) distributive  \(x * (y + z) \rightarrow x * y + x * z\)  no control dependency on optimization result
6  NaN check removal  isNaN \(x \rightarrow \) false  x is not a NaN
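A rewrite \(s \rightarrow t\) of this kind can be treated as plain data: a left-hand pattern and a right-hand pattern over pattern variables. The Python sketch below is one possible encoding, not the HOL4 development. The tuple representation of expressions, the names `match`, `instantiate`, `apply_at_root`, and `REWRITES`, and the use of bare strings as pattern variables are all assumptions made for illustration.

```python
# Expressions are nested tuples such as ("add", e1, e2) or ("var", "a").
# In patterns, a bare string such as "x" is a pattern variable.

def match(pattern, expr, subst=None):
    """Return a substitution that makes `pattern` equal to `expr`, or None."""
    subst = dict(subst or {})
    if isinstance(pattern, str):  # pattern variable: bind it or check the binding
        if pattern in subst and subst[pattern] != expr:
            return None
        subst[pattern] = expr
        return subst
    if (not isinstance(expr, tuple) or len(expr) != len(pattern)
            or expr[0] != pattern[0]):
        return None
    for sub_pattern, sub_expr in zip(pattern[1:], expr[1:]):
        subst = match(sub_pattern, sub_expr, subst)
        if subst is None:
            return None
    return subst


def instantiate(pattern, subst):
    """Replace every pattern variable in `pattern` by its binding."""
    if isinstance(pattern, str):
        return subst[pattern]
    return (pattern[0],) + tuple(instantiate(p, subst) for p in pattern[1:])


REWRITES = {
    "fma introduction": (("add", ("mul", "x", "y"), "z"), ("fma", "x", "y", "z")),
    "+ commutative": (("add", "x", "y"), ("add", "y", "x")),
    "double negation": (("neg", ("neg", "x")), "x"),
    "* distributive": (("mul", "x", ("add", "y", "z")),
                       ("add", ("mul", "x", "y"), ("mul", "x", "z"))),
}


def apply_at_root(rewrite, expr):
    """Apply `rewrite` at the root of `expr`; leave `expr` unchanged on no match."""
    lhs, rhs = rewrite
    subst = match(lhs, expr)
    return expr if subst is None else instantiate(rhs, subst)


a, b, c = ("var", "a"), ("var", "b"), ("var", "c")
print(apply_at_root(REWRITES["fma introduction"], ("add", ("mul", a, b), c)))
```

Keeping rewrites as data rather than as hard-coded passes mirrors the idea that the set of allowed optimizations is a parameter of the semantics.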
2.3 Semantics of Icing
We define the semantics of Icing programs in Fig. 2 as a big-step judgment of the form \(( cfg , E, e) \rightarrow {} v\). \( cfg \) is a configuration carrying a list of rewrites (\(s \rightarrow t\)) representing allowed optimizations, and a flag (OptOk) tracking whether optimizations are allowed in the current program fragment, i.e., whether it is under an opt: scope. E is the (runtime) execution environment mapping free variables to values and e an Icing expression. The value v is the result of evaluating e under E using optimizations from \( cfg \).
The second key idea of our semantics is that it nondeterministically applies rewrites from the configuration \( cfg \) while evaluating expression e instead of just returning its value tree. In the semantics, we model the nondeterministic choice of an optimization result for a particular value tree v with a rewriting relation between \(( cfg ,v)\) and a result value tree r. The relation holds if either the configuration \( cfg \) allows for optimizations to be applied, and value tree v can be rewritten into value tree r using rewrites from the configuration \( cfg \); or the configuration does not allow for rewrites to be applied, and \(v = r\). Rewriting on value trees reuses several definitions from Sect. 2.2. We add the nondeterminism on top of the existing functions by making the rewriting relation pick a subset of the rewrites from the configuration \( cfg \) which are applied to value tree v.
Icing’s semantics allows optimizations to be applied for arithmetic and comparison operations. The rules Unary, Binary, fma, isNaN, and Compare first evaluate argument expressions into value trees. The final result is then nondeterministically chosen from the rewriting relation for the obtained value tree and the current configuration. Evaluation of map, fold, and let-bindings follows standard textbook evaluation semantics and does not apply optimizations.
Rule Scope models the fine-grained control over where optimizations are applied in the semantics. We store in the current configuration \( cfg \) that optimizations are allowed in the (sub)expression e (\( cfg \) with OptOk := true).
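The nondeterministic choice of a result can be pictured as a set of candidate value trees. The Python sketch below is a simplified model under stated assumptions, not the HOL4 semantics: value trees are nested tuples with float leaves, the configuration is a dictionary with the hypothetical keys `opt_ok` and `rewrites`, and the candidate set is taken to be everything reachable by repeatedly applying rewrites at any position.

```python
def commute_add(tree):
    """x + y -> y + x at the root, or None if the rewrite does not match."""
    if isinstance(tree, tuple) and tree[0] == "add":
        return ("add", tree[2], tree[1])
    return None


def intro_fma(tree):
    """x * y + z -> fma(x, y, z) at the root, or None if it does not match."""
    if (isinstance(tree, tuple) and tree[0] == "add"
            and isinstance(tree[1], tuple) and tree[1][0] == "mul"):
        return ("fma", tree[1][1], tree[1][2], tree[2])
    return None


def one_step(tree, rewrites):
    """All trees obtained by applying one rewrite at one position of `tree`."""
    results = set()
    for rewrite in rewrites:
        out = rewrite(tree)
        if out is not None:
            results.add(out)
    if isinstance(tree, tuple):
        for i in range(1, len(tree)):
            for sub in one_step(tree[i], rewrites):
                results.add(tree[:i] + (sub,) + tree[i + 1:])
    return results


def possible_results(cfg, tree):
    """Set of value trees that the semantics may choose for `tree`."""
    if not cfg["opt_ok"]:  # outside an opt: scope the tree is returned unchanged
        return {tree}
    seen, frontier = {tree}, [tree]
    while frontier:
        current = frontier.pop()
        for candidate in one_step(current, cfg["rewrites"]):
            if candidate not in seen:
                seen.add(candidate)
                frontier.append(candidate)
    return seen


tree = ("add", ("mul", 2.0, 3.0), 1.0)
relaxed = {"opt_ok": True, "rewrites": [commute_add, intro_fma]}
strict = {"opt_ok": False, "rewrites": [commute_add, intro_fma]}
print(possible_results(relaxed, tree))
print(possible_results(strict, tree))
```

The search terminates here because the two example rewrites only generate finitely many trees; a rewrite set that keeps growing terms would need a bound.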
Example. We illustrate Icing semantics and how optimizations are applied both in syntax and semantics with the example in Fig. 3. The example first translates the input list by 3.0 using a map, and then computes the norm of the translated list with a fold and a square root.
We want to apply \(x+y \rightarrow y+x\) (commutativity of \(+\)) and fma introduction (\(x * y + z \rightarrow \mathit{fma}(x, y, z)\)) to our example program. Depending on their order, syntactic rewriting will produce different results.
If we first apply commutativity of \(+\), and then fma introduction, all \(+\) operations in our example will be commuted, but no fma is introduced, as fma introduction syntactically relies on the expression having the structure \( x * y + z\) where x, y, z can be arbitrary. In contrast, if we use the opposite order of rewrites, the second line will be rewritten to use an fma and commutativity is only applied in the first line.
To illustrate how the semantics applies optimizations, we run the program on the 2D unit vector in a configuration that contains both rewrites. The fold can then produce several different value trees. One is the result of evaluating the initial program without any rewrites, a second corresponds to syntactically optimizing with commutativity of \(+\) and then fma introduction, and a third corresponds to using the opposite order syntactically. Further results can only arise from semantic optimizations, where commutativity and fma introduction are applied to some intermediate results of the fold, but not all. There is no syntactic application of commutativity and fma introduction leading to such results.
3 Modelling Existing Compilers in Icing
Having defined the syntax and semantics of Icing, we next implement and prove correct functions which model the behavior of previous verified compilers, like CompCert or CakeML, and the behavior of unverified compilers, like GCC or Clang, respectively. For the former, we first define a translator of Icing expressions which preserves the IEEE 754 strict meaning of its input and does not allow for any further optimizations. Then we give a greedy optimizer that unconditionally optimizes expressions, as observed in GCC and Clang.
3.1 An IEEE 754 Preserving Translator
The Icing semantics nondeterministically applies optimizations if they are added to the configuration. However, when compiling safety-critical code or after applying some syntactic optimizations, one might want to preserve the strict IEEE 754 meaning of an expression.
To make sure that the behavior of an expression cannot be further changed and thus the expression exhibits strict IEEE 754 compliant behavior, we have implemented a translation function which essentially disallows optimizations by replacing all optimizable expressions opt: e’ with non-optimizable expressions e’. Correctness of this function shows that (a) no optimizations can be applied after the function has been applied, and (b) evaluation is deterministic. We have proven these properties as separate theorems.
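Replacing every opt: e’ by e’ amounts to a single recursive traversal. The following Python sketch illustrates the idea on a nested-tuple encoding of expressions; the encoding, the node tag `"opt"`, and the function names `strip_opt_scopes` and `contains_opt` are assumptions for illustration and do not come from the Icing sources.

```python
def strip_opt_scopes(expr):
    """Remove every ("opt", e) wrapper so that no subexpression is optimizable."""
    if not isinstance(expr, tuple):
        return expr  # operator tags, variable names, and constants
    if expr[0] == "opt":
        return strip_opt_scopes(expr[1])
    return (expr[0],) + tuple(strip_opt_scopes(arg) for arg in expr[1:])


def contains_opt(expr):
    """Check whether any opt: scope remains in `expr`."""
    if not isinstance(expr, tuple):
        return False
    return expr[0] == "opt" or any(contains_opt(arg) for arg in expr[1:])


program = ("add",
           ("opt", ("mul", ("var", "x"), ("opt", ("var", "y")))),
           ("const", 1.0))
strict_program = strip_opt_scopes(program)
print(strict_program)
print(contains_opt(strict_program))  # False
```

Property (a) above corresponds to `contains_opt` returning false on the output; property (b) is a statement about the semantics and is not captured by this syntactic sketch.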
3.2 A Greedy Optimizer
Next, we implement and prove correct an optimizer that mimics the (observed) behavior of GCC and Clang as closely as possible. The optimizer applies fma introduction, associativity and commutativity greedily. All these rewrites only have an application rewrite precondition which we instantiate to True to apply the rewrites unconstrained.
Greedy optimization is implemented as a function which applies a given list of rewrites in a bottom-up traversal to an expression. In combination with the greedy optimizer, our fine-grained control (using opt: annotations) allows the end-user to control where optimizations can be applied.
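A bottom-up greedy traversal that respects opt: scopes can be sketched in a few lines of Python. This is an illustrative model only: the tuple encoding, the names `rewrite_bottom_up` and `optimize_greedy`, and the choice to apply each rewrite in one full pass before moving to the next are assumptions, not the verified HOL4 function.

```python
def commute_add(expr):
    """x + y -> y + x at the root, or None if the rewrite does not match."""
    if expr[0] == "add":
        return ("add", expr[2], expr[1])
    return None


def intro_fma(expr):
    """x * y + z -> fma(x, y, z) at the root, or None if it does not match."""
    if expr[0] == "add" and expr[1][0] == "mul":
        return ("fma", expr[1][1], expr[1][2], expr[2])
    return None


def rewrite_bottom_up(rewrite, expr, opt_ok=False):
    """Rewrite the children first, then the node itself if inside an opt: scope."""
    if expr[0] in ("var", "const"):
        return expr
    inside = opt_ok or expr[0] == "opt"
    expr = (expr[0],) + tuple(rewrite_bottom_up(rewrite, arg, inside)
                              for arg in expr[1:])
    if not opt_ok:
        return expr  # outside any opt: scope, keep the strict expression
    out = rewrite(expr)
    return expr if out is None else out


def optimize_greedy(rewrites, expr):
    """Apply each rewrite of the list, in order, wherever it matches."""
    for rewrite in rewrites:
        expr = rewrite_bottom_up(rewrite, expr)
    return expr


x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
scoped = ("opt", ("add", ("mul", x, y), z))
print(optimize_greedy([commute_add, intro_fma], scoped))  # no fma is introduced
print(optimize_greedy([intro_fma, commute_add], scoped))  # an fma is introduced
```

The two calls reproduce the order dependence discussed in the example of Sect. 2.3: commuting first destroys the \(x * y + z\) shape that fma introduction needs.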
We have shown correctness of the greedy optimizer with respect to Icing semantics, i.e., we have shown that optimizing greedily gives the same result as applying the greedy rewrites in the semantics:^{1}
Theorem 1
The greedy optimizer is correct
Let E be an environment, v a value tree, \( cfg \) a configuration, \( rws \) a list of rewrites, and \(e'\) the result of greedily optimizing an expression e with \( rws \).
If \(( cfg , E, e') \rightarrow {} v\) then \(( cfg ', E, e) \rightarrow {} v\), where \( cfg '\) is \( cfg \) with its list of rewrites set to \( rws \).
Proving Theorem 1 without any additional lemmas is tedious as it requires showing correctness of a single optimization in the presence of other optimizations and dealing with the bottom-up traversal applying the optimization at the same time. Thus we reduce the proof of Theorem 1 to proving each rewrite separately and then chaining together these correctness proofs. Lemma 1 shows that applications of individual rewrites can be chained together in the semantics. This also means that adding, removing, or reordering optimizations simply requires changing the list of rewrites, thus making Icing easy to extend.
Lemma 1
Rewrite application is compositional
Let e be an expression, v a value tree, \(s \rightarrow t\) a rewrite, and \( rws \) a list of rewrites.
If the rewrite \(s \rightarrow t\) can be correctly simulated in the semantics, and the list \( rws \) can be correctly simulated in the semantics, then the list of rewrites \((s \rightarrow t)\, {:}{:}\, rws \) can be correctly simulated in the semantics.
4 A Conditional Optimizer
We have implemented an IEEE 754 optimizer which has the same behavior as CompCert and CakeML, and a greedy optimizer with the (observed) behavior of GCC and Clang. The fine-grained control of where optimizations are applied is essential for the usability of the greedy optimizer. However, in this section we explain that the control provided by the opt: annotation is often not enough. We show how preconditions can be used to provide additional constraints on where rewrites can be applied, and sketch how preconditions serve as an interface between the compiler and external tools, which can and should discharge them.
We observe that in many cases, whether an optimization is acceptable or not can be captured with a precondition on the optimization itself, and not on every arithmetic operation separately. One example for such an optimization is removal of NaN checks as a check for a NaN should only be removed if the check never succeeds.
We argue that both application and compiler rewrite preconditions should be discharged by external tools. Many interesting preconditions for a rewrite depend on a global analysis. Running a global analysis as part of a compiler is infeasible, as maintaining separate analyses for each rewrite is not likely to scale. We thus propose to expose an interface to external tools in the form of preconditions.
We implement this idea in the conditional optimizer optimizeCond that supports three different applications of fast-math optimizations: applying optimizations \( rws \) unconstrained (uncond \( rws \)), applying optimizations if precondition P is true (cond P \( rws \)), and applying optimizations under the assumptions generated by a function A which should be discharged externally (assume A \( rws \)). When applying cond, optimizeCond checks whether precondition P is true before optimizing, whereas for assume the propositions returned by A are assumed, and should then be discharged separately by a static analysis or a manual proof.
Correctness of optimizeCond relates syntactic optimizations to applying optimizations in the semantics. Similar to the greedy optimizer, we designed the proof modularly such that it suffices to prove correct each rewrite individually.
Our optimizer optimizeCond takes as arguments first a list of rewrite applications using uncond, cond, and assume, then an expression e. If the list is empty, we have optimizeCond ([], e) = e. Otherwise the rewrite is applied in a bottom-up traversal to e and optimization continues recursively. For uncond, the rewrites are applied if they match; for cond, the precondition P is checked for the expression being optimized and the rewrites \( rws \) are applied if P is true; for assume, the function A is evaluated on the expression being optimized. If execution of A fails, no optimization is applied. Otherwise, A returns a list of assumptions which are logged by the compiler and the rewrites are applied.
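The dispatch on uncond, cond, and assume described above can be sketched as follows. This Python model is a simplification under several assumptions: expressions are nested tuples, opt: scopes are ignored for brevity, preconditions are evaluated once on the whole expression, a failing assumption generator is modelled by returning `None`, and all function names are hypothetical.

```python
def intro_fma(expr):
    """x * y + z -> fma(x, y, z) at the root, or None if it does not match."""
    if expr[0] == "add" and expr[1][0] == "mul":
        return ("fma", expr[1][1], expr[1][2], expr[2])
    return None


def remove_nan_check(expr):
    """isNaN x -> false at the root, or None if it does not match."""
    return ("const", False) if expr[0] == "isnan" else None


def assume_no_nan(expr):
    """Return one 'never NaN' assumption per isNaN check found in `expr`."""
    if not isinstance(expr, tuple):
        return []
    found = [("never_nan", expr[1])] if expr[0] == "isnan" else []
    for arg in expr[1:]:
        found.extend(assume_no_nan(arg))
    return found


def rewrite_bottom_up(rewrite, expr):
    """Rewrite the children first, then try the rewrite at the node itself."""
    if expr[0] in ("var", "const"):
        return expr
    expr = (expr[0],) + tuple(rewrite_bottom_up(rewrite, a) for a in expr[1:])
    out = rewrite(expr)
    return expr if out is None else out


def optimize_cond(plan, expr):
    """Run a list of (kind, guard, rewrites) entries; return (expr, assumptions)."""
    log = []
    for kind, guard, rewrites in plan:
        if kind == "cond" and not guard(expr):
            continue  # precondition is false: skip these rewrites
        if kind == "assume":
            assumptions = guard(expr)
            if assumptions is None:
                continue  # the assumption generator failed: do not optimize
            log.extend(assumptions)  # logged, to be discharged externally
        for rewrite in rewrites:
            expr = rewrite_bottom_up(rewrite, expr)
    return expr, log


x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
program = ("if", ("isnan", x), ("const", 0.0), ("add", ("mul", x, y), z))
plan = [("assume", assume_no_nan, [remove_nan_check]),
        ("cond", lambda e: False, [intro_fma]),
        ("uncond", None, [intro_fma])]
optimized, assumptions = optimize_cond(plan, program)
print(optimized)
print(assumptions)
```

Returning the logged assumptions alongside the optimized expression keeps the proof obligations visible to whichever external tool or user is expected to discharge them.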
Using the interface provided by preconditions, one can prove external theorems showing additional properties of a compiler run using application rewrite preconditions, and external theorems showing how to discharge compiler rewrite preconditions with static analysis tools or a manual proof. We will call such external theorems meta theorems.
In the following we discuss two possible meta theorems, highlighting key steps required for implementing (and proving) them. A complete implementation consists of two connections: (1) from the compiler to rewrite preconditions and (2) from rewrite preconditions to external tools. We implement (1) independently of any particular tool. A complete implementation of (2) is out of scope of this paper; meta theorems generally depend on global analyses which are orthogonal to designing Icing, but several external tools already provide functionality that is a close match to our interface and we sketch possible connections below. We note that for these meta theorems, the conditional optimizer should track the context in which an assumption is made and use the context to express assumptions as local program properties. Our current implementation does not collect this contextual information yet, as this information at least partially depends on the particular meta theorems desired.
4.1 A Logging Compiler for NaN Special Value Checks
We show how a meta theorem can be used to discharge a compiler rewrite precondition on the example of removing a NaN check. Removing a NaN check, in general, can be unsound if the check could have succeeded. Inferring statically whether a value can be a NaN special value or not requires either a global static analysis, or a manual proof on all possible executions.
Preconditions are our interface to external tools. For NaN check removal, we implement a function that returns the assumption that no NaN special value can be the result of evaluating the argument expression e. This function could then be used as part of an assume rule for optimizeCond. We prove a strengthened correctness theorem for NaN check removal, showing that if the assumption returned by this function is discharged externally (i.e., by the end-user or via static analysis), then we can simulate applying NaN check removal syntactically in Icing semantics without additional side-conditions.
The assumption is additionally returned as the result of optimizeCond since it is faithfully assumed when optimizing. Such assumptions can be discharged by static analyzers like Verasco [22], or Gappa [17].
4.2 Proving Roundoff Error Improvement
Rewrites like associativity and distributivity change the results of floating-point programs. One way of capturing this behavior for a single expression is to compute the roundoff error, i.e., the difference between an idealized real-valued and a floating-point execution of the expression.
To compute an upper bound on the roundoff error, various formally verified tools have been implemented [3, 17, 30, 37]. A possible meta theorem is thus to show that applying a particular list of optimizations does not increase the roundoff error of the optimized expression but only decreases or preserves it. The meta theorem for this example would show that (a) all the applied syntactic rewrites can be simulated in the semantics and (b) the worst-case roundoff error of the optimized expression is smaller than or equal to the error of the input expression. Our development already proves (a) and we sketch the steps necessary to show (b) below.
We can leverage these roundoff error analysis tools as application preconditions in a cond rule, checking whether a rewrite should be applied or not in optimizeCond. For a particular expression e, such an application precondition would return true if applying rewrite \(s \rightarrow t\) does not increase the roundoff error of e.
Theorem 2
optimizeCond decreases roundoff error
Let \(e'\) be the result of applying optimizeCond to expression e with the rewrites \( rws \) guarded by the roundoff error precondition, and let \( cfg '\) be \( cfg \) with its list of rewrites set to \( rws \). Then
\(( cfg , E, e') \rightarrow v \Longrightarrow ( cfg ', E, e) \rightarrow v\ \wedge \) the worst-case roundoff error of \(e'\) is smaller than or equal to that of e.
Implementing the precondition requires computing a roundoff error for expression e and one for e rewritten with \(s \rightarrow t\), and returning True if and only if the roundoff error has not increased by applying the rewrite. Proving the theorem would require giving a real-valued semantics for Icing, connecting Icing’s semantics to the semantics of the roundoff error analysis tool, and a global range analysis on the Icing programs, which can be provided by Verasco or Gappa.
4.3 Supporting Distributivity in optimizeCond
The rewrites considered up to this point do not duplicate any subexpressions in the optimized output. In this section, we consider rewrites which do introduce additional occurrences of subexpressions, which we dub duplicative rewrites. Common duplicative rewrites are distributivity of \(*\) with \(+\) (\(x * (y + z) \leftrightarrow x * y + x * z\)) and rewriting a single multiplication into multiple additions (\(x * n \leftrightarrow \sum _{i=1}^{n} x\)). Here we consider distributivity as an example. A compiler might want to use this optimization to apply further strength reductions or fma introduction.
The main issue with duplicative rewrites is that they add new occurrences of a matched subexpression. Applying (\(x * (y + z) \rightarrow x * y + x * z\)) to \(e_1 * (e_2 + e_3)\) returns \(e_1 * e_2 + e_1 * e_3\). The values for the two occurrences of \(e_1\) may differ because of further optimizations applied to only one of its occurrences.
Any correctness proof for such a duplicative rewrite must match up the two (potentially different) executions of \(e_1\) in the optimized expression (\(e_1 * e_2 + e_1 * e_3\)) with the execution of \(e_1\) in the initial expression (\(e_1 * (e_2 + e_3)\)). This can only be achieved by finding a common intermediate optimization (resp. evaluation) result shared by both copies of \(e_1\).
In general, existence of such an intermediate result can only be proven for expressions that do not depend on “eager” evaluation, i.e., which consist of let-bindings and arithmetic. We illustrate the problem using a conditional with guard c. In Icing semantics, the guard c is first evaluated to a value tree \(v_c\). Next, the semantics evaluates \(v_c\) to a boolean value b. Computing b from \(v_c\) loses the structural information of value tree \(v_c\) by computing the results of previously delayed arithmetic operations. This loss of information means that rewrites that previously matched the structure of \(v_c\) may no longer apply to b.
This is not a bug in the Icing semantics. On the contrary, our semantics makes this issue explicit, while in other compilers it can lead to unexpected behavior (e.g., in GCC’s support for distributivity under fast-math). CakeML, for example, also eagerly evaluates conditionals and similarly loses structural information about optimizations that otherwise may have been applied. Having lazy conditionals in general would only “postpone” the issue until eager evaluation of the conditional expression for a loop is necessary.
An intuitive compiler precondition that enables proving duplicative rewrites is to forbid any control dependencies on the expression being optimized. However, this approach may be unsatisfactory as it disallows branching on the results of optimized expressions and requires a verified dependency analysis that must be rerun or incrementally updated after every rewrite, and thus could become a bottleneck for fast-math optimizers. Instead, in Icing we restrict duplicative rewrites to only fire when pattern variables are matched against program variables, e.g., pattern variables a, b, c only match against program variables such as x, y, z. This restriction to only matching let-bound variables is more scalable, as it can easily be checked syntactically, and allows us to loosen the restriction on control-flow dependence by simply let-binding subexpressions as needed.
5 Connecting to CakeML
We have shown how to apply optimizations in Icing and how to use it to preserve IEEE 754 semantics. Next, we describe how we connected Icing to an existing verified compiler by implementing a translation from Icing source to CakeML source and showing an equivalence theorem.^{2} The translation function maps Icing syntax to CakeML syntax. We highlight the most interesting cases. The translations of the list operations each relate an Icing execution to a predefined function from the CakeML standard library. We show separate theorems relating executions of list operations in Icing to CakeML closures of library functions. The NaN-check predicate is implemented by comparing its argument with itself. The predicate is true in Icing semantics if and only if its argument is a NaN special value. Recall that floating-point NaN values are incomparable (even to themselves), and thus we implement the predicate with an equality check.
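The self-comparison trick can be sketched in Python, whose `float` follows IEEE 754 double-precision comparison rules. The function name `is_nan` is an assumption for this illustration and is not the name used in Icing or CakeML.

```python
import math


def is_nan(x: float) -> bool:
    # Under IEEE 754, NaN is the only value that compares unequal to itself.
    return x != x


nan = float("inf") - float("inf")  # an invalid operation that produces NaN
checks = [is_nan(v) == math.isnan(v) for v in (nan, 0.0, -0.0, 1.5, float("inf"))]
```

All entries of `checks` are true, i.e., the equality-based test agrees with the library predicate `math.isnan` on these values.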
6 Related Work
Verified Compilation of Floating-Point Programs. CompCert [25] uses a constructive formalization of IEEE 754 arithmetic [6] based on Flocq [7] which allows for verified constant propagation and strength reduction optimizations for divisions by powers of 2 and replacing \(x \times 2\) by \(x + x\). The situation is similar for CakeML [38] whose floating-point semantics is based on HOL’s [19, 20]. With Icing, we propose a semantics which allows important floating-point rewrites in a verified compiler by allowing users to specify a larger set of possible behaviors for their source programs. The precondition mechanism serves as an interface to external tools. While Icing is implemented in HOL, our techniques are not specific to higher-order logic or the details of CakeML and we believe that an analog of our “verified fast-math” approach could easily be ported to CompCert.
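The difference between the two kinds of rewrite can be observed directly on IEEE 754 doubles. The Python sketch below is an illustration under the assumption that reassociation of addition is a representative fast-math rewrite; the sample values are chosen arbitrarily.

```python
# Replacing x * 2 by x + x preserves IEEE 754 results: both operations
# compute the exactly rounded value of 2x, including overflow to infinity.
samples = [0.1, -3.75, 5e-324, 1e308, float("inf")]
strength_reduction_exact = all(x * 2.0 == x + x for x in samples)

# Reassociating a sum is a real-valued identity, but the two groupings
# round differently in double precision.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
reassociation_exact = left == right
```

Here `strength_reduction_exact` is true while `reassociation_exact` is false, which is why the first rewrite can be verified against a strict IEEE 754 semantics and the second requires a relaxed semantics such as the one Icing proposes.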
The Alive framework [27] has been extended to verify floating-point peephole optimizations [29, 31]. While these tools relax some exceptional (NaN) cases, most optimizations still need to preserve “bit-for-bit” IEEE 754 behavior, which precludes valuable rewrites like the fma introductions Icing supports.
Optimization of Floating-Point Programs. ‘Mixed-precision tuning’ can increase performance by decreasing precision at the expense of accuracy, for instance from double to single floating-point precision. Current tools [11, 13, 16, 35] ensure that a user-provided error bound is satisfied either through dynamic or static analysis. In this work, we consider only uniform 64-bit floating-point precision, but Icing’s optimizations are equally applicable to other precisions. Optimizations such as mixed-precision tuning are, however, out of scope of a compiler setting, as they require error bound annotations for kernel functions.
Spiral [33] uses real-valued linear algebra identities for rewriting at the algorithmic level to choose a layout which provides the best performance for a particular platform, but due to operation reordering it is not IEEE 754 semantics preserving. Herbie [32] optimizes for accuracy, and not for performance, by applying rewrites which are mostly based on real-valued identities. The optimizations performed by Spiral and Herbie go beyond what traditional compilers perform, but they fit our view that it is sometimes beneficial to relax the strict IEEE 754 specification, and could be considered in an extended implementation of Icing. On the other hand, STOKE’s floating-point superoptimizer [36] for x86 binaries does not preserve real-valued semantics, and only provides approximate correctness using dynamic analysis.
Analysis and Verification of Floating-Point Programs. Static analyses for bounding roundoff errors of finite-precision computations w.r.t. a real-valued semantics [15, 17, 18, 28, 30, 37] (some with formal certificates in Coq or HOL) are currently limited to short, mostly straight-line functions and require fine-grained domain annotations at the function level. Whole program accuracy can be formally verified w.r.t. a real-valued implementation with substantial user interaction and expertise [34]. Verification of elementary function implementations has also recently been automated, but requires substantial compute resources [23].
On the other hand, static analyses aiming to verify the absence of runtime exceptions like division by zero [4, 10, 21, 22] scale to realistic programs. We believe that such tools can be used to satisfy preconditions and thus Icing would serve as an interface between the compiler and such specialized verification techniques.
The KLEE symbolic execution engine [9] has support for floating-point programs [26] through an interface to Z3’s floating-point theory [8]. This theory is also based on IEEE 754 and will thus not be able to verify the kind of optimizations that Icing supports.
7 Conclusion
We have proposed a novel semantics for IEEE 754-unsound floating-point compiler optimizations which allows them to be applied in a verified compiler setting and which captures the intuitive semantics developers often use today when reasoning about their floating-point code. Our semantics is nondeterministic in order to provide the compiler the freedom to apply optimizations where they are useful for a particular application and platform, but within clearly defined bounds. The semantics is flexible from the developer’s perspective, as it provides fine-grained control over which optimizations are available and where in a program they can be applied. We have presented a formalization in HOL4, implemented three prototype optimizers, and connected them to the CakeML verified compiler frontend. For our most general optimizer, we have explained how it can be used to obtain meta-theorems for its results by exposing a well-defined interface in the form of preconditions. We believe that our semantics can be integrated fully with different verified compilers in the future, and bridge the gap between compiler optimizations and floating-point verification techniques.
Footnotes
1. As in many verified compilers, Icing’s proofs closely follow the structure of optimizations. Achieving this required careful design and many iterations; we consider the simplicity of Icing’s proofs to be a strength of this work.
2. We also extended the CakeML source semantics with an fma operation, as CakeML’s compilation currently does not support mapping fma’s to hardware instructions.
References
1. LLVM language reference manual - fast-math flags (2019). https://llvm.org/docs/LangRef.html#fast-math-flags
2. Semantics of floating point math in GCC (2019). https://gcc.gnu.org/wiki/FloatingPointMath
3. Becker, H., Zyuzin, N., Monat, R., Darulova, E., Myreen, M.O., Fox, A.: A verified certificate checker for finite-precision error bounds in Coq and HOL4. In: 2018 Formal Methods in Computer Aided Design (FMCAD), pp. 1–10. IEEE (2018)
4. Blanchet, B., et al.: A static analyzer for large safety-critical software. In: PLDI (2003)
5. Boldo, S., Jourdan, J.-H., Leroy, X., Melquiond, G.: A formally-verified C compiler supporting floating-point arithmetic. In: 2013 21st IEEE Symposium on Computer Arithmetic (ARITH), pp. 107–115. IEEE (2013)
6. Boldo, S., Jourdan, J.-H., Leroy, X., Melquiond, G.: Verified compilation of floating-point computations. J. Autom. Reasoning 54(2), 135–163 (2015)
7. Boldo, S., Melquiond, G.: Flocq: a unified library for proving floating-point algorithms in Coq. In: 19th IEEE International Symposium on Computer Arithmetic, ARITH, pp. 243–252 (2011). https://doi.org/10.1109/ARITH.2011.40
8. Brain, M., Tinelli, C., Ruemmer, P., Wahl, T.: An automatable formal semantics for IEEE-754 floating-point arithmetic. Technical report (2015). http://smt-lib.org/papers/BTRW15.pdf
9. Cadar, C., Dunbar, D., Engler, D.: KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In: OSDI (2008)
10. Chen, L., Miné, A., Cousot, P.: A sound floating-point polyhedra abstract domain. In: Ramalingam, G. (ed.) APLAS 2008. LNCS, vol. 5356, pp. 3–18. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-89330-1_2
11. Chiang, W.-F., Baranowski, M., Briggs, I., Solovyev, A., Gopalakrishnan, G., Rakamarić, Z.: Rigorous floating-point mixed-precision tuning. In: Symposium on Principles of Programming Languages (POPL), pp. 300–315. ACM (2017)
12. Corden, M., Kreitzer, D.: Consistency of floating-point results using the Intel compiler. Technical report, Intel Corporation (2010)
13. Damouche, N., Martel, M.: Mixed precision tuning with salsa. In: PECCS, pp. 185–194. SciTePress (2018)
14. Damouche, N., Martel, M., Chapoutot, A.: Intra-procedural optimization of the numerical accuracy of programs. In: Núñez, M., Güdemann, M. (eds.) FMICS 2015. LNCS, vol. 9128, pp. 31–46. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19458-5_3
15. Darulova, E., Izycheva, A., Nasir, F., Ritter, F., Becker, H., Bastian, R.: Daisy - framework for analysis and optimization of numerical programs (tool paper). In: Beyer, D., Huisman, M. (eds.) TACAS 2018. LNCS, vol. 10805, pp. 270–287. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-89960-2_15
16. Darulova, E., Sharma, S., Horn, E.: Sound mixed-precision optimization with rewriting. In: ICCPS (2018)
17. De Dinechin, F., Lauter, C.Q., Melquiond, G.: Assisted verification of elementary functions using Gappa. In: ACM Symposium on Applied Computing, pp. 1318–1322. ACM (2006)
18. Goubault, E., Putot, S.: Static analysis of finite precision computations. In: Jhala, R., Schmidt, D. (eds.) VMCAI 2011. LNCS, vol. 6538, pp. 232–247. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-18275-4_17
19. Harrison, J.: Floating point verification in HOL. In: Thomas Schubert, E., Windley, P.J., Alves-Foss, J. (eds.) TPHOLs 1995. LNCS, vol. 971, pp. 186–199. Springer, Heidelberg (1995). https://doi.org/10.1007/3-540-60275-5_65
20. Harrison, J.: Floating-point verification. In: Fitzgerald, J., Hayes, I.J., Tarlecki, A. (eds.) FM 2005. LNCS, vol. 3582, pp. 529–532. Springer, Heidelberg (2005). https://doi.org/10.1007/11526841_35
21. Jeannet, B., Miné, A.: Apron: a library of numerical abstract domains for static analysis. In: Bouajjani, A., Maler, O. (eds.) CAV 2009. LNCS, vol. 5643, pp. 661–667. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02658-4_52
22. Jourdan, J.-H.: Verasco: a formally verified C static analyzer. Ph.D. thesis, Université Paris Diderot (Paris 7), May 2016
23. Lee, W., Sharma, R., Aiken, A.: On automatically proving the correctness of math.h implementations. In: POPL (2018)
24. Leroy, X.: Formal certification of a compiler back-end, or: programming a compiler with a proof assistant. In: 33rd ACM Symposium on Principles of Programming Languages, pp. 42–54. ACM Press (2006)
25. Leroy, X.: A formally verified compiler back-end. J. Autom. Reasoning 43(4), 363–446 (2009). http://xavierleroy.org/publi/compcert-backend.pdf
26. Liew, D., Schemmel, D., Cadar, C., Donaldson, A.F., Zähl, R., Wehrle, K.: Floating-point symbolic execution: a case study in N-version programming. In: Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering. IEEE Press (2017)
27. Lopes, N.P., Menendez, D., Nagarakatte, S., Regehr, J.: Provably correct peephole optimizations with Alive. In: PLDI (2015)
28. Magron, V., Constantinides, G., Donaldson, A.: Certified roundoff error bounds using semidefinite programming. ACM Trans. Math. Softw. 43(4), 1–34 (2017)
29. Menendez, D., Nagarakatte, S., Gupta, A.: Alive-FP: automated verification of floating point based peephole optimizations in LLVM. In: Rival, X. (ed.) SAS 2016. LNCS, vol. 9837, pp. 317–337. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53413-7_16
30. Moscato, M., Titolo, L., Dutle, A., Muñoz, C.A.: Automatic estimation of verified floating-point round-off errors via static analysis. In: Tonetta, S., Schoitsch, E., Bitsch, F. (eds.) SAFECOMP 2017. LNCS, vol. 10488, pp. 213–229. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66266-4_14
31. Nötzli, A., Brown, F.: LifeJacket: verifying precise floating-point optimizations in LLVM. In: Proceedings of the 5th ACM SIGPLAN International Workshop on State of the Art in Program Analysis, pp. 24–29. ACM (2016)
32. Panchekha, P., Sanchez-Stern, A., Wilcox, J.R., Tatlock, Z.: Automatically improving accuracy for floating point expressions. In: Conference on Programming Language Design and Implementation (PLDI) (2015)
33. Püschel, M., et al.: SPIRAL - a generator for platform-adapted libraries of signal processing algorithms. IJHPCA 18(1), 21–45 (2004)
34. Ramananandro, T., Mountcastle, P., Meister, B., Lethin, R.: A unified Coq framework for verifying C programs with floating-point computations. In: Certified Programs and Proofs (CPP) (2016)
35. Rubio-González, C., et al.: Precimonious: tuning assistant for floating-point precision. In: SC (2013)
36. Schkufza, E., Sharma, R., Aiken, A.: Stochastic optimization of floating-point programs with tunable precision. In: PLDI (2014)
37. Solovyev, A., Jacobsen, C., Rakamarić, Z., Gopalakrishnan, G.: Rigorous estimation of floating-point round-off errors with Symbolic Taylor Expansions. In: Bjørner, N., de Boer, F. (eds.) FM 2015. LNCS, vol. 9109, pp. 532–550. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19249-9_33
38. Tan, Y.K., Myreen, M.O., Kumar, R., Fox, A., Owens, S., Norrish, M.: The verified CakeML compiler backend. J. Funct. Program. 29 (2019)
39. Tristan, J.-B., Leroy, X.: Formal verification of translation validators: a case study on instruction scheduling optimizations. In: Proceedings of the 35th ACM Symposium on Principles of Programming Languages (POPL 2008), pp. 17–27. ACM Press, January 2008
40. Tristan, J.-B., Leroy, X.: Verified validation of lazy code motion. In: Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2009), pp. 316–326 (2009)
41. Tristan, J.-B., Leroy, X.: A simple, verified validator for software pipelining. In: Proceedings of the 37th ACM Symposium on Principles of Programming Languages (POPL 2010), pp. 83–92. ACM Press (2010)
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.