1 Introduction

Verifying compiler optimizations is important to ensure reliability of the software ecosystem. Various frameworks have been proposed to verify optimizations of industrial compilers. Among them, Alive [12] is a tool for verifying peephole optimizations of LLVM that has been successfully adopted by compiler developers. Since it was released, Alive has helped developers find dozens of bugs.

Figure 1 shows the structure of Alive. An optimization pattern of interest written in a domain-specific language is given as input. Alive parses the input, and encodes the behavior of the source and target programs into logic formulas in the theory of quantified bit-vectors and arrays. Finally, several proof obligations are created from the encoded behavior, and then checked by an SMT solver.

Alive’s trust base has three parts: first, the semantics of LLVM’s intermediate representation and of SMT expressions; second, Alive’s verification condition generator; third, the SMT solver used to discharge proof obligations. None of these is formally verified, so an error in any of them may result in an incorrect answer.

Fig. 1. The structure of Alive and AliveInLean

To address this problem, we introduce AliveInLean, a formally verified peephole optimization verifier for LLVM. AliveInLean is written in Lean [14], an interactive theorem proving language. Its semantics of LLVM IR (Intermediate Representation) and SMT expressions are rigorously tested using Lean’s metaprogramming language [5] and system library. AliveInLean’s verification condition generator is formally verified in Lean.

Using AliveInLean requires less human effort than proving optimizations directly in a formal framework, thanks to the automation provided by SMT solvers. For example, verifying the correctness of a peephole optimization in a formal framework requires more than a hundred lines of proof [15]. However, the correctness of AliveInLean relies on the correctness of the SMT solver used. To mitigate this dependency, proof obligations can be cross-checked with multiple SMT solvers. Moreover, there is substantial work towards making SMT solvers generate proof certificates [2, 3, 6, 7].

AliveInLean is a proof of concept. It does not yet support every operation that Alive does; memory-related operations, for example, are missing. However, AliveInLean supports all integer peephole optimizations, which is already useful in practice because most bugs found by Alive were in integer optimizations [12].

2 Overview

We give an overview of AliveInLean’s features from a user’s perspective.

Verifying Optimizations. AliveInLean reads optimization(s) from a file and checks their correctness. A user writes an optimization of interest in a DSL with similar syntax to that of LLVM IR:

figure b

This example transformation corresponds to rewriting (%a & %b) + (%a | %b) to %a + %b, given 4-bit integers %a and %b. The last variable %r, or root variable, is assumed to be the return value of the programs. AliveInLean encodes the behavior of each program and generates verification conditions (VCs). Finally, AliveInLean calls Z3 to discharge the VCs.
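At a width of only 4 bits, the same claim can also be checked exhaustively without a solver. The following Python sketch is not AliveInLean's DSL or its SMT encoding; the names `source`, `target`, and `find_counterexample` are illustrative assumptions, and poison and UB are ignored.

```python
from itertools import product

BITS = 4
MASK = (1 << BITS) - 1  # keeps every result within 4 bits


def source(a: int, b: int) -> int:
    """(%a & %b) + (%a | %b) with wrap-around 4-bit addition."""
    return ((a & b) + (a | b)) & MASK


def target(a: int, b: int) -> int:
    """%a + %b with wrap-around 4-bit addition."""
    return (a + b) & MASK


def find_counterexample():
    """Return the first (a, b) on which the programs differ, or None."""
    for a, b in product(range(1 << BITS), repeat=2):
        if source(a, b) != target(a, b):
            return (a, b)
    return None
```

Enumeration covers only 256 input pairs here and grows exponentially with the bit width, which is one reason a verifier of this kind delegates the check to an SMT solver.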

Proving Useful Properties. AliveInLean can be used as a formal framework to prove lemmas using interactive theorem proving. This is helpful when a user wants to show a property of a program which is hard to represent as a transformation.

For example, one may want to prove that the divisor of udiv (unsigned division) is never poison (Footnote 1) if the instruction did not raise undefined behavior (UB). The lemma below states this in Lean: the divisor val is never poison if the state st’ after executing the udiv instruction (step) has no UB.

figure c

Testing Specifications. AliveInLean supports random testing of AliveInLean’s specifications (for which no verification is possible). For example, the step function in the above example implements a specification of the LLVM IR, and it can be tested with respect to the behavior of the LLVM compiler. Another trust-base is the specification of SMT expressions, which defines a relation between expressions (with no free variable) and their corresponding concrete values.

These tests help build confidence in the validity of VC generation. Running tests is helpful when a user wants to use a different version of LLVM or modify AliveInLean’s specifications (e.g., adding a new instruction to IR).

3 Verifying Optimizations

In this section we introduce the different components of AliveInLean that work together to verify an optimization.

3.1 Semantics Encoder

Given a program and an initial state, the semantics encoder produces the final state of the program as a set of SMT expressions. The IR interpreter is similar, but works over concrete values rather than symbolic ones. The semantics encoder and the IR interpreter share the same codebase (essentially the LLVM IR semantics). The code is parametric on the type of the program state. For example, the type of undefined behavior can be either initialized as the bool type of Lean or the Bool SMT expression type. Given the type, Lean can automatically resolve which operations to use to update the state using typeclass resolution.
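The idea of one semantics shared by the interpreter and the encoder can be sketched in Python, with an explicit record of operations standing in for Lean's typeclass resolution. Everything here is an assumption made for illustration: `Ops`, `step_udiv`, `CONCRETE`, and `SYMBOLIC` are hypothetical names, symbolic expressions are plain nested tuples, and poison is left out.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

V = TypeVar("V")  # integer values: int (concrete) or expression tree (symbolic)
B = TypeVar("B")  # flags: bool (concrete) or expression tree (symbolic)


@dataclass(frozen=True)
class Ops(Generic[V, B]):
    """Hypothetical stand-in for a typeclass instance."""
    is_zero: Callable[[V], B]
    udiv: Callable[[V, V], V]
    or_: Callable[[B, B], B]


def step_udiv(ops, ub, x, y):
    """Shared semantics of `udiv x, y`: a zero divisor raises UB."""
    new_ub = ops.or_(ub, ops.is_zero(y))
    return new_ub, ops.udiv(x, y)


# Concrete instance: Python ints and bools (result is a dummy 0 under UB).
CONCRETE = Ops(
    is_zero=lambda y: y == 0,
    udiv=lambda x, y: x // y if y != 0 else 0,
    or_=lambda a, b: a or b,
)

# Symbolic instance: builds expression trees instead of computing values.
SYMBOLIC = Ops(
    is_zero=lambda y: ("=", y, 0),
    udiv=lambda x, y: ("bvudiv", x, y),
    or_=lambda a, b: ("or", a, b),
)
```

Calling `step_udiv(CONCRETE, False, 6, 3)` computes a value, while `step_udiv(SYMBOLIC, "ub", "x", "y")` returns expression trees built by the same function body.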

3.2 Refinement Encoder

Given a source program, a transformed program, and an initial state, the refinement encoder emits an SMT expression that encodes the refinement check between the final states of the two programs. To obtain the final states, the semantics encoder is used.

The refinement check proves that (1) the transformed program only triggers UB when the original program does (i.e., UB can only be removed), (2) the root variable of the transformed program is only poison when it is also poison in the original program, and (3) variables’ values in the final states of the two programs are the same when no UB is triggered and the original value is not poison.

3.3 Parser and Z3 Backend

The parser for Alive’s DSL is implemented using Lean’s parser monad and file I/O library. SMT expressions are processed with Z3 using Lean’s SMT interface.

4 Correctness of AliveInLean

We describe how the correctness of AliveInLean is proved. First, we explain the correctness proof of the semantics encoder and the refinement encoder. We show that if the SMT expression encoded by refinement encoder is valid, the optimization is indeed correct. Next, we explain how the trust-base is tested.

4.1 Semantics Encoding

Given an IR interpreter \(\texttt {run}\), a semantics encoder \(\texttt {encoder}\) is correct with respect to \(\texttt {run}\) if, for any IR program and input state, the final program state generated by \(\texttt {run}\) and the symbolic state encoded by \(\texttt {encoder}\) are equivalent.

To formally define its correctness, an equivalence relation between SMT expressions and concrete values is defined. We say that an SMT expression e and a Lean value \(\nu \) are equivalent, or \(e \sim \nu \), if e has no free variables and it evaluates to \(\nu \). The equivalence relation is inductively defined with respect to the structure of an SMT expression. To deal with free variables, an environment \(\eta \) is defined, which is a set of pairs \((x, \nu )\) where x is a variable and \(\nu \) is a concrete value. \(\eta [\![ e ]\!]\) is an expression with all free variables x replaced with \(\nu \) if \((x,\nu ) \in \eta \).

Next, we define a program state. A state s is defined as \((u, r)\) where u is an undefined behavior flag and r is a register file. r is a list of \((x, v)\) where x is a variable and v is a value. v is defined as \((sz, i, p)\) where sz is its size in bits, i is an integer value, and p is a poison flag.

There are two kinds of states: symbolic and concrete. A symbolic state \(s_s\) is a state whose u, i, p are SMT expressions. A concrete state \(s_c\) is a state all of whose attributes are concrete values. We say that \(s_s\) and \(s_c\) are equivalent, or \(s_s \sim s_c\), if \(s_s\) has no free variable in its attributes and the corresponding attributes of the two states are equivalent. \(\eta [\![ s_s ]\!]\) is a symbolic state with the environment \(\eta \) applied to u, i, p.

Now, the correctness of \(\texttt {encoder}\) with respect to \(\texttt {run}\) is defined as follows. It states that the result of \(\texttt {encoder}\) is equivalent to the result of \(\texttt {run}\).

Theorem 1

For all initial states \(s_s\), \(s_c\), program p, and environment \(\eta \) s.t. \(\eta [\![ s_s ]\!] \sim s_c\), we have that \(\eta [\![\texttt {encoder}(p, s_s)]\!] \sim \texttt {run}(p, s_c)\).
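The shape of this statement can be illustrated with a toy model in Python. This is a sketch under strong assumptions, not AliveInLean's definitions: programs contain only `add` and `and` on 4-bit values, UB and poison are omitted, expressions are nested tuples, and the symbolic initial state is fixed to map `%a` and `%b` to the free variables `a` and `b`. The function `evaluate` plays the role of applying \(\eta \) and then evaluating.

```python
BITS = 4
MASK = (1 << BITS) - 1


def run(prog, regs):
    """Concrete interpreter: regs maps register names to ints."""
    regs = dict(regs)
    for dst, op, x, y in prog:
        a, b = regs[x], regs[y]
        regs[dst] = (a + b) & MASK if op == "add" else a & b
    return regs


def encoder(prog, regs):
    """Symbolic encoder: regs maps register names to expression trees."""
    regs = dict(regs)
    for dst, op, x, y in prog:
        regs[dst] = (op, regs[x], regs[y])
    return regs


def evaluate(expr, env):
    """Substitute env for the free variables of expr, then evaluate."""
    if isinstance(expr, str):
        return env[expr]
    op, lhs, rhs = expr
    a, b = evaluate(lhs, env), evaluate(rhs, env)
    return (a + b) & MASK if op == "add" else a & b


def theorem1_holds(prog, env):
    """Check the statement for one program and one environment."""
    sym = {"%a": "a", "%b": "b"}                      # symbolic initial state
    conc = {reg: env[var] for reg, var in sym.items()}  # equivalent concrete state
    enc, out = encoder(prog, sym), run(prog, conc)
    return all(evaluate(enc[r], env) == out[r] for r in out)
```

Running `theorem1_holds` over many environments only tests the property; the theorem itself is a machine-checked proof over all programs and states.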

4.2 Refinement Encoding

Function \(\texttt {check}(p_{src}, p_{tgt}, s_s)\) generates an SMT expression that encodes refinement between the source and target programs, \(p_{src}\) and \(p_{tgt}\) respectively.

We first define refinement between two concrete states. As Alive does, AliveInLean only checks the value of the root variable of a program. Given a root variable r, a concrete state \(s_c'\) refines \(s_c\), or \(s_c' \sqsubseteq s_c\), if (1) \(s_c\) has undefined behavior, or (2) both \(s_c\) and \(s_c'\) have values assigned to r, say v and \(v'\), and \(v = \texttt {poison} \vee v' = v\). A target program \(p_{tgt}\) refines a source program \(p_{src}\) if \(\texttt {run}(p_{tgt}, s_c) \sqsubseteq \texttt {run}(p_{src}, s_c)\) holds for any initial concrete state \(s_c\).
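The definition of refinement on concrete states can be written down as a small Python predicate. This is a sketch with hypothetical names (`Value`, `State`, `refines`); the register file is a dictionary rather than a list of pairs, and the rejection of a target state with UB when the source has none is an assumption taken from condition (1) of Section 3.2 rather than from the definition itself.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Value:
    sz: int        # size in bits
    i: int         # integer value
    poison: bool   # poison flag


@dataclass(frozen=True)
class State:
    ub: bool                  # undefined behavior flag
    regs: Dict[str, Value]    # register file


def refines(tgt: State, src: State, root: str) -> bool:
    """Return True if the target state refines the source state."""
    if src.ub:
        return True           # (1) source UB permits any target behavior
    if tgt.ub:
        return False          # assumption: the target may not introduce UB
    v, v2 = src.regs.get(root), tgt.regs.get(root)
    if v is None or v2 is None:
        return False          # both states must assign the root variable
    return v.poison or v2 == v  # (2) v = poison, or v' = v
```

For example, a source state whose root value is poison is refined by a target state holding any value, whereas a poison target does not refine a source with a defined value.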

The correctness of check is stated as follows.

Theorem 2

Given an initial symbolic state \(s_s\), if \(\eta _0[\![\texttt {check}(p_{src}, p_{tgt}, s_s)]\!] \sim \textit{true}\) for any \(\eta _0\), then for any environment \(\eta \) and initial state \(s_c\) s.t. \(\eta [\![ s_s ]\!] \sim s_c\), we have that \(\texttt {run}(p_{tgt}, s_c) \sqsubseteq \texttt {run}(p_{src}, s_c)\).

This theorem says that if the returned expression of check evaluates to true in any environment, program \(p_{tgt}\) refines program \(p_{src}\).

4.3 Validity of Trust-Base

Testing Specification of SMT Expressions. Specifications of SMT expressions are traversed using Lean’s metaprogramming language and tested. The testing we have done is different from QuickChick [4] because QuickChick evaluates expressions in Coq. The approach cannot be used here because SMT expressions need to be evaluated in an SMT solver (e.g., Z3). Example spec:

figure d

This spec says that if SMT expressions s1, s2 of a bit-vector type (sbitvec) are equivalent to two concrete bit-vector values b1, b2 in Lean (bitvector), an add expression of s1, s2 is equivalent to the result of adding b1 and b2. Function bitvector.add must be called in Lean, so its operands (b1, b2) are assigned random values in Lean. sbitvec.add is translated to SMT’s bvadd expression, and s1 and s2 are initialized as BitVec variables in an SMT solver. The testing function generates an SMT expression with random inputs like the following:

figure e

The size of bitvector (sz) is initialized to 4, and b1, b2 were randomly initialized to 10 (#xA) and 2 (#x2). A specification is incorrect if the generated SMT expression is not valid.
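A rough Python sketch of how such a test query could be produced is given below. It does not reproduce the expression generated by AliveInLean; the function names and the exact SMT-LIB 2 text are assumptions. The concrete side computes the sum in Python, and the query asks a solver whether `bvadd` can disagree with it, so an `unsat` answer means the instance of the specification is valid.

```python
import random


def bitvector_add(sz: int, b1: int, b2: int) -> int:
    """Concrete side of the spec: addition modulo 2**sz."""
    return (b1 + b2) % (1 << sz)


def bv_literal(sz: int, n: int) -> str:
    """SMT-LIB 2 bit-vector literal of width sz with value n."""
    return f"(_ bv{n} {sz})"


def add_spec_query(sz: int, b1: int, b2: int) -> str:
    """Query that is unsatisfiable iff bvadd agrees with bitvector_add."""
    expected = bitvector_add(sz, b1, b2)
    claim = (f"(= (bvadd {bv_literal(sz, b1)} {bv_literal(sz, b2)}) "
             f"{bv_literal(sz, expected)})")
    return f"(assert (not {claim}))\n(check-sat)"


def random_add_spec_query(sz: int, rng: random.Random) -> str:
    """Pick random operands in Python, as the testing procedure does."""
    return add_spec_query(sz, rng.randrange(1 << sz), rng.randrange(1 << sz))
```

With the values from the text, `add_spec_query(4, 10, 2)` asserts the negation of `bvadd` of 10 and 2 being equal to 12; the resulting text can be passed to any SMT-LIB 2 compliant solver.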

Testing Specification of LLVM IR. Specification of LLVM IR is tested using randomly generated IR programs. IR programs of 5–10 randomly chosen instructions are generated, compiled with LLVM, and run. The result of the execution of the program is compared with the result of AliveInLean’s IR interpreter.

5 Evaluation

For the evaluation, we used a computer with an Intel Core i5-6600 CPU and 8 GB of RAM, and Z3 [13] for SMT solving. To test whether AliveInLean and Alive give the same result, we used all of the 150 integer optimizations from Alive’s test suite that are supported by AliveInLean. No mismatches were observed.

To test the SMT specification, we randomly generated 10,000 tests for each of the operations (18 bit-vector and 15 boolean). This test took 3 CPU hours.

The LLVM IR specification was tested by running 1,000,000 random IR programs in our interpreter and comparing the output with that of LLVM. This comparison needs to take into account that some programs may trigger UB or yield a poison value, which gives freedom to LLVM to produce a variety of results. These tests took 10 CPU hours overall. Four admitted arithmetic lemmas were tested as well. As a side-effect of the testing, we found several miscompilation bugs in LLVM (Footnote 2).

AliveInLean (Footnote 3) consists of 11.9K lines of code. The optimization verifier consists of 2.2K LoC, the specification tester is 1.5K, and the proof has 8.1K lines. It took 3 person-months to implement the tool and prove its correctness.

6 Related Work

We introduce previous work on compiler verification and validation and compare it with AliveInLean. We also give an overview of previous work on the semantics of compiler intermediate representations (IRs).

6.1 Compiler Verification

Proving Correctness on Formal Semantics. The correctness of compilation can be proved on a formal semantics of a language that is written in a theorem proving language such as Coq. Vellvm [26] is a Coq formalization of the semantics of LLVM IR. CompCert [11] is a verified C compiler written in Coq, and its compilation to assembly languages, including x86 and PowerPC, is proved correct.

However, it is hard to apply this approach to existing industrial compilers because proving correctness of optimizations requires non-trivial effort. Peek [15] is a framework for implementing and verifying peephole optimizations for x86 on CompCert. They implemented 28 peephole optimizations which required 3.3k lines of code and 6.6k lines of proofs (\(\sim \)350 LoC each). Even if this is small compared to the size of CompCert, the burden is non-trivial considering that LLVM has more than 1,000 peephole optimizations [12].

Another problem with this approach is that changing the semantics requires modification of the proof. The semantics of LLVM’s poison and undef values are currently inconsistent, which triggers miscompilation of some programs [10]. Therefore, compiler developers regularly test various undef semantics with existing optimizations, which would be a non-trivial task if correctness proofs had to be manually updated.

Translation Validation and Credible Compilation. In translation validation [18], a pair of an original program and an optimized program is given to a validation tool at compile time to check the correctness of the optimization. Several such tools exist for LLVM [20, 22, 25]. Translation validation is, however, slow, and it cannot tell whether an optimization is correct in general. Consider this optimization:

figure f

If C is a constant, -C can be computed at compile time. However, this optimization is wrong only if C is INT_MIN. To show that compilation is fully correct, translation validation would need to be run for every combination of inputs.
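The corner case stems from two's-complement arithmetic, where negating INT_MIN wraps back to INT_MIN. The Python sketch below illustrates only this fact at a width of 4 bits and does not reproduce the optimization itself; `wrapping_neg` and `bad_constants` are hypothetical names.

```python
def wrapping_neg(c: int, bits: int) -> int:
    """Two's-complement negation of a signed value, wrapped to `bits` bits."""
    mask = (1 << bits) - 1
    raw = (-c) & mask                      # unsigned bit pattern of -c
    sign_set = raw >> (bits - 1)           # 1 if the sign bit is set
    return raw - (1 << bits) if sign_set else raw


def bad_constants(bits: int):
    """Signed constants whose wrapped negation differs from the true -c."""
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    return [c for c in range(lo, hi) if wrapping_neg(c, bits) != -c]
```

A validation run that happens to compile the pattern with any other constant sees no discrepancy, which is why a single run says nothing about the optimization in general.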

Credible compilation [19], or witnessing compiler [16, 17], is an approach to improve translation validation by accepting witnesses generated by a compiler. Crellvm [8] is a credible compilation framework for LLVM. It requires modifications to the compiler, which makes it harder to apply and maintain.

6.2 Solver-Aided Programming Languages

Proving correctness of optimizations can be represented as a search problem that finds a counter-example for the optimization. Tools like Z3 and CVC4 can be used to solve the search problem. Translation of a high-level search problem to the external solver’s input has been considered bug-prone, and frameworks like Rosette [21] and Smten [23] address this issue by providing higher-level languages for describing the search problem. SpaceSearch [24] helps programmers prove the correctness of the description by supporting Coq and Rosette backends from a single specification. AliveInLean provides a stronger guarantee of correctness because translation to SMT expressions is also written in Lean, leaving Lean as the sole trust-base.

6.3 Semantics of Compiler IR

Correctly encoding semantics of compiler IR is important for the validity of a tool. LLVM IR is an SSA-based intermediate representation which is used to represent a program being compiled. LLVM LangRef [1] has an informal definition of the LLVM IR, but there are a few known problems. [10] shows that the semantics of poison and undef values are inconsistent. [9] shows that the semantics of pointer\(\leftrightarrow \)integer casting is inconsistent. AliveInLean supports poison but not undef, following the suggestion from [10]. AliveInLean does not support memory-related operations including load, store, and pointer \(\leftrightarrow \) integer casting.

7 Discussion

AliveInLean has several limitations. As discussed before, AliveInLean does not support memory operations. Correctly encoding the memory model of LLVM IR is challenging because the memory model of LLVM IR is more complex than either a byte array or a set of memory objects [9]. Supporting branch instructions and floating point would help developers prove interesting optimizations. Supporting branches is a challenging job especially when loops are involved.

Maintainability of AliveInLean highly relies on one’s proficiency in Lean. Changing the semantics of an IR instruction breaks the proof, and updating it requires proficiency in Lean. However, we believe that only relevant parts in the proof need to be updated as the proof is modularized.

Alive has features that are absent in AliveInLean. Alive supports defining a precondition for an optimization, inferring types of variables if not given, and showing counter-examples if the optimization is wrong. We leave these as future work.

8 Conclusion

AliveInLean is a formally verified compiler optimization verifier. Its verification condition generator is formally verified with a machine-checked proof. Using AliveInLean, developers can easily check the correctness of compiler optimizations with high reliability. Also, they can use AliveInLean as a formal framework like Vellvm to prove properties of interest in limited cases. The extensive random testing did not find problems in the trust base, increasing its trustworthiness. Moreover, as a side-effect of the IR semantics testing, we found several bugs in LLVM.