Formal Verification of Optimizing Compilers

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10722)


Formally verifying that a compiler, especially an optimizing one, maintains the semantics of its input has been a challenging problem. This paper surveys several of the main efforts in the area and describes recent efforts that target the LLVM compiler infrastructure while taking a novel viewpoint on the problem.

1 Introduction

Formal verification attempts to prove that a system satisfies its formal specification. It is well known that formal verification is most effective at high levels of abstraction. The verified high-level models, however, have to be transformed into executable code. Hence, a “good” verification effort should cover all phases of the translation.

Today’s compilers are often optimizing and perform many modifications to their input. Consequently, it is virtually impossible to create a mapping between the high-level system and the code they produce. Yet it is often vital, as in mission-critical code, that every step be formally proven.

Active research into the formal verification of optimizing compilers has been going on almost since the introduction of high-level languages and compilation. Generally speaking, a compiler receives high-level code, translates it into some intermediate language (which we call IR, for Intermediate Representation), and performs a sequence of IR-to-IR transformations, say IR\(_1\) to IR\(_n\). The final IR, IR\(_n\), is then translated into low-level code, often machine language. Here we focus on verification of the IR sequence, namely, formally establishing that IR\(_n\) satisfies the specifications described by IR\(_1\). We thus ignore efforts to show that IR\(_1\) implements the high-level code, and that the machine code implements IR\(_n\). These, while extremely important, are of a different nature since they deal with two different languages.

We survey the main efforts toward accomplishing such a proof. Roughly speaking, two directions have been pursued. One can view the compiler as a translator from one IR\(_i\) into the next. The first approach is to directly verify the translator itself, that is, to formally verify that for every input of the compiler, its output preserves the semantics of the input. While this seems daunting (and it is), many modern compilers are modular, so rather than verifying a formidable monolithic code base, one can verify each module separately, making the task more manageable. Of course, whenever a module is modified, it must be re-verified.

The second approach was first proposed in [24] in the context of the equivalence of LISP code and its translation into an assembly language, and then in [21] in the context of translating Signal code into ADA. Rather than verifying the “translator” (the compiler itself), one verifies that the code produced by each run (each “translation”) implements the input code. That is, each run of the compiler is verified separately. In the notation above, for every \(i=2,\ldots ,n\), one shows that IR\(_i\) implements IR\(_{i-1}\). This approach was termed Translation Validation (TV). At first glance this seems impossible: after all, equivalence of even “just” context-free languages is undecidable. At second glance it seems inefficient, in that the overhead incurred may not be worth the freedom of not having to verify the whole compiler. Both points turn out to be easily addressed. For the first, note that the transformations a compiler performs on the code are simple, and one needs only to establish some trivial properties, such as “if \(x=4\) then after \(x:=x+3\) is performed, \(x=7\).” For the second, the overhead turned out to be minimal and well worth the effort.

The TV approach has a couple of other advantages. Not only does it eliminate the need to verify a frequently modified, moving-target compiler, it can also accommodate closed-source compilers, such as those whose code is proprietary. Moreover, it generates verification conditions (VCs) that can be independently checked by numerous theorem provers. This, in turn, allows for certification of the compiled code.

After the survey we describe our current efforts in applying TV to LLVM. LLVM is a relatively new compiler platform that is being adopted by many operating systems. It is open source and, unlike many other compilers, each of its “passes” (the code that moves from IR\(_i\) into IR\(_{i+1}\)) is independent of the others. This simplifies TV and allows VCs to be created with only scant knowledge and understanding of the code.

As stated above, the transformations that a compiler performs are rather simple. This is partially due to the known “rule” that the analysis the compiler performs has to be extremely efficient, at most linear (the GCC wiki page, e.g., has as rule 1: “Do not add algorithms with quadratic or worse behavior, ever.”) However, if one cares more about runtime than about compile time, this rule no longer applies. After all, one may be willing to wait hours or days to optimize a program that is to run frequently (for example, a Domain Name Server). For such programs, we experimented with using external, possibly slow, but more precise static analyzers. Our results show that, when combining TV with such external tools, we can push optimizations further and obtain better runtime results.

2 A Survey

Compilers, optimizing ones included, are rather buggy. The work in [26] describes a randomized test-case generator that produces C programs to trigger deep compiler bugs; as expected, many (325) bugs were found in numerous (11) compilers. Proteus [12] uses [26] to perform randomized link-time optimization testing and uncovered 37 bugs in GCC and LLVM, more than 75% of which produced mis-compiled code. Those bugs were all reported and fixed.

One approach to verification of optimizing compilers is to verify the “translator” itself, that is, the compiler. This can be done by constructing a manual, machine-checkable proof. We describe several examples of this approach, some targeted at LLVM. The other approach is TV, and we describe some of the efforts in this direction.

2.1 Compiler Verification

Perhaps the earliest work in compiler verification dates to 1967, when John McCarthy and James Painter proved the correctness of a compiler that translates arithmetic expressions into machine language [17]. Another early work (starting in the late 1980s) is described in [11], where the entire chain from high-level code to machine code was verified using the theorem prover ACL2. According to [5], optimizing compilers became a research target several decades later, in 2000, and the early work on them employed TV.

Cobalt [14]: Cobalt is a domain-specific language for implementing optimizations as guarded rewrite rules. Its generator of proof obligations is based on temporal logic, which it uses to reason about data-flow analysis (this type of reasoning was proposed in [25]). The idea behind Cobalt is to verify simple transformations that can later be used as building blocks when establishing more complex ones.

Consider, for example, the transformation used for constant propagation. Roughly speaking, when a variable, say x, holds a constant, constant propagation replaces every reference to x by that constant. This can reduce the number of allocated registers and expose unreachable code. The transformation can be expressed by the following temporal expression, whose meaning is explained below:
$$ Stmt (x :=C) ;~ \lnot MayDef (x) ~\mathbf {Until}~ (y:= expr ~\Rightarrow y:= expr [x \leftarrow C]) $$
Generally, in an assignment of the form \(y := expr (x_1, \ldots , x_n)\), the variable on the left-hand side (y) is defined, and the variables on the right-hand side (\(x_1,\ldots , x_n\)) are used. The expression above, written in a Cobalt-like syntax, states that if x is defined as the constant C, and x is not redefined until it is used in an expression that defines y as an \( expr \) depending on x, then in the definition of y every reference to x can be safely replaced by the constant C.
Consider the example in Fig. 1 where x is defined in statement 1, is not re-defined in statement 2, and is used in statement 3.
Fig. 1. Code before and after constant propagation

Using the expression above, at statement 3 we have:
$$ \{\mathtt {stmt 1:}(x :=3)\};~ \lnot MayDef (x)~\mathbf {Until}~ (z:= x+4 ~\Rightarrow z:=7) $$
which justifies the correctness of the transformation in the figure.
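As an illustration of the rule, the following Python sketch performs constant propagation with constant folding over straight-line three-address code. The instruction format `(dest, op, arg1, arg2)`, the operator table, and the function name `const_prop` are hypothetical and are not Cobalt's syntax. On straight-line code, the \(\lnot MayDef (x)\) condition reduces to checking that the constant recorded for x has not been invalidated by a later definition of x.

```python
from typing import Dict, List, Optional, Tuple, Union

Operand = Optional[Union[str, int]]  # variable name, integer literal, or None (unused)
# Hypothetical three-address instruction: (dest, op, arg1, arg2);
# op "=" is a plain copy of arg1, other ops are binary.
Instr = Tuple[str, str, Operand, Operand]

OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def const_prop(code: List[Instr]) -> List[Instr]:
    """Propagate known constants into uses and fold fully constant expressions."""
    known: Dict[str, int] = {}  # variables currently known to hold a constant
    out: List[Instr] = []
    for dest, op, a, b in code:
        # Replace a use of x by C only while x still holds C (no intervening definition).
        a = known.get(a, a) if isinstance(a, str) else a
        b = known.get(b, b) if isinstance(b, str) else b
        if op != "=" and isinstance(a, int) and isinstance(b, int):
            op, a, b = "=", OPS[op](a, b), None  # fold, e.g. z := 3 + 4 becomes z := 7
        out.append((dest, op, a, b))
        # This statement defines dest: record its new constant or forget the old one.
        if op == "=" and isinstance(a, int):
            known[dest] = a
        else:
            known.pop(dest, None)
    return out


print(const_prop([("x", "=", 3, None), ("y", "=", 5, None), ("z", "+", "x", 4)]))
# [('x', '=', 3, None), ('y', '=', 5, None), ('z', '=', 7, None)]
```

A redefinition such as x := p + 1 removes x from `known`, so later uses of x are left untouched, which plays the role of the Until clause.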

CompCert [15]: CompCert is a verified compiler developed by Xavier Leroy and his colleagues. The goal of CompCert is to formally verify an optimizing compiler whose input is Clight (a subset of C) and whose output is PowerPC assembly code. Each optimization is verified in Coq [1]. In the terminology above, the code that translates each IR\(_i\) into IR\(_{i+1}\) is verified.

Most optimizations in CompCert are verified by defining a simulation relation, match_state, between the (symbolic) states of the code before and after the optimization. The online repository of CompCert provides numerous examples of verified optimizations, including constant propagation.

LLVM (Low-Level Virtual Machine) is a relatively new, widely used, open-source compiler infrastructure. There has been considerable effort in verifying its optimizations. Like GCC, LLVM’s IR language uses Static Single Assignment (SSA) form [2]. Unlike GCC, LLVM does not centralize static analysis; rather, each optimization is in charge of performing the static analysis it needs. We now review several projects whose goal is to verify LLVM optimizations.

Vellvm [28]: Vellvm (verified LLVM) is a framework that is specific to LLVM’s IR. It includes a formal semantics for the IR language and a framework for reasoning about IR-to-IR transformations, all in Coq. Vellvm covers a wide spectrum of LLVM’s IR, including heap operations and procedure calls. Vellvm was used in [29] to verify a variant of the LLVM mem2reg transformation, which translates code into its initial SSA form and performs some register allocation. (Interestingly, Vellvm failed to cover the original mem2reg. We return to this point when discussing witnesses.)

Project Vellvm verifies LLVM optimizations using a method similar to CompCert’s. It handles optimizations in a way that is reminiscent of Cobalt: each optimization is divided into several micro-steps, such as the removal of a single instruction. Using program refinement, each micro-step is proved to be well-formed and to preserve the semantics of the source program. One then composes the proofs of the micro-steps to obtain a proof of correctness for the full transformation. (The next section contains formal definitions of the refinement relation and of compositionality.) Using a clever pipelining mechanism, Vellvm allows proofs to be reused; yet these proofs still have to be constructed manually in Coq.

Alive [16]: Alive is a domain-specific language for writing program optimizations, and it automatically verifies the transformations written in it. Based on prior work ([26]), the authors of Alive found that InstCombine, the LLVM pass that combines instructions, has numerous bugs (it is a rather tricky pass with many sub-cases and corner cases). Transformations from InstCombine were translated into Alive, and the tool detected numerous bugs, which were then shown to be true bugs in InstCombine (as opposed to modeling mistakes). Alive creates VCs, which it sends to the Z3 SMT solver.

Figure 2 shows an example of an InstCombine optimization in the Alive syntax. The first two lines are the source code, and the last line is its optimized target code. There, a 32-bit integer x is shifted left, then (logically) right, by 29 positions. The target replaces the two shifts by a bitwise and with the decimal constant 7, which keeps exactly the three low-order bits of x.
Fig. 2. An InstCombine transformation written in Alive

Figure 3 shows the resulting query to Z3, which attempts to find an assignment to x such that shifting x left, then right, by 29 positions does not yield its bitwise and with the decimal constant 7. If Z3 finds no such x, it reports that the query is unsatisfiable, which establishes the correctness of the optimization; otherwise it returns some x that is a counterexample to the correctness of the optimization.
Fig. 3. Z3 query generated by Alive for the example optimization
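Z3 decides such a query symbolically for all \(2^{32}\) values of x. Purely as an illustration, the Python sketch below performs the same counterexample search by brute force at a reduced bit width w, with shift amount w − 3. The width reduction is our assumption, made so that exhaustive enumeration is cheap; it is not how Alive works. A deliberately wrong variant that uses an arithmetic right shift (`ashr`) shows what a counterexample looks like.

```python
from typing import Callable, Optional


def shl(x: int, k: int, w: int) -> int:
    """Shift left, truncated to w bits."""
    return (x << k) & ((1 << w) - 1)


def lshr(x: int, k: int, w: int) -> int:
    """Logical shift right of an unsigned w-bit value (zeros shifted in); w is unused."""
    return x >> k


def ashr(x: int, k: int, w: int) -> int:
    """Arithmetic shift right: reinterpret x as two's complement, then shift."""
    signed = x - (1 << w) if x >> (w - 1) else x
    return (signed >> k) & ((1 << w) - 1)


def src_ok(x: int, w: int) -> int:  # %1 = shl %x, w-3 ; %2 = lshr %1, w-3
    return lshr(shl(x, w - 3, w), w - 3, w)


def src_bad(x: int, w: int) -> int:  # same, but with ashr instead of lshr
    return ashr(shl(x, w - 3, w), w - 3, w)


def tgt(x: int, w: int) -> int:  # %2 = and %x, 7
    return x & 7


def find_counterexample(src: Callable[[int, int], int],
                        target: Callable[[int, int], int], w: int) -> Optional[int]:
    """Return some x with src(x) != target(x), or None if none exists at width w."""
    for x in range(1 << w):
        if src(x, w) != target(x, w):
            return x
    return None


print(find_counterexample(src_ok, tgt, 16))  # None: the rewrite holds at width 16
print(find_counterexample(src_bad, tgt, 8))  # 4: 4 << 5 sets the sign bit
```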

2.2 Translation Validation

Recall that a compiler receives a source program written in some high-level language, translates it into an Intermediate Representation (IR), and then applies a series of optimizations to the program – starting with classical architecture-independent global optimizations, and then architecture-dependent ones such as instruction scheduling. Typically, these optimizations are performed in several passes where each pass applies a certain type of optimization.

In order to prove that one code correctly translates the other, we introduce a formal model for the system defined by a code, so as to give both a common semantics. Here we follow the terminology in [31] and use the formalism of Transition Systems (TS’s). The notion of a target code T being a correct implementation of a source code S is then defined in terms of refinement, stating that every computation of T corresponds to some computation of S with matching values of the corresponding variables.

The intermediate code is a three-address code, most often (in modern compilers) given in SSA form. It is described by a flow graph, which is a graph representation of the three-address code. Each node in the flow graph represents a basic block, that is, a sequence of statements that is executed in its entirety and contains no branches. The edges of the graph represent the flow of control.

Transition Systems: In order to present the formal semantics of source and intermediate code we introduce transition systems, TS’s, a variant of the transition systems of [21]. A Transition System \(S = \langle V,\Omega ,\Theta ,\rho \rangle \) is a state machine consisting of:
  • V a set of state variables,

  • \({\Omega \subseteq V}\) a set of observable variables,

  • \({\Theta }\) an initial condition characterizing the initial states of the system, and

  • \({\rho }\) a transition relation, relating a state to its possible successors.

The variables are typed, and a state of a TS is a type-consistent interpretation of the variables. For a state s and a variable \({x\in V}\), we denote by s[x] the value that s assigns to x. The transition relation refers to both unprimed and primed versions of the variables, where the primed versions refer to the values of the variables in the successor states, while unprimed versions of variables refer to their value in the pre-transition state. Thus, e.g., the transition relation may include “\(y' = y + 1\)” to denote that the value of the variable y in the successor state is greater by one than its value in the old (pre-transition) state.

The observable variables are the variables we care about. When comparing two systems, we require that the observable variables in the two systems match. We require that all variables whose values are printed by the program be identified as observable variables. If desired, we can also include among the observables the history of external procedure calls for a selected set of procedures.

A computation of a TS is a maximal finite or infinite sequence of states \(\sigma :s_0, s_1, \ldots \ \) starting with a state that satisfies the initial condition such that every two consecutive states are related by the transition relation.

A transition system is deterministic when the observable part of the initial condition uniquely determines the rest of the computation. We restrict our attention to deterministic transition systems and the programs that generate such systems. Thus, to simplify the presentation, we do not consider here programs whose behavior may depend on additional inputs that the program reads throughout the computation. It is straightforward to extend the theory and methods to such input-driven programs.

Let \({P_{_S}= \langle V_{_S},\Omega _{_S},\Theta _{_S},\rho _{_S}\rangle }\) and \({P_{_T}= \langle V_{_T},\Omega _{_T},\Theta _{_T},\rho _{_T}\rangle }\) be two TS’s, to which we refer as the source and target TS’s, respectively. Such two systems are called comparable if there exists a one-to-one correspondence between the observables of \({P_{_S}}\) and those of \({P_{_T}}\). To simplify the notation, we denote by \({X\in \Omega _{_S}}\) and \({x\in \Omega _{_T}}\) the corresponding observables in the two systems. A source state s is defined to be compatible with the target state t if s and t agree on their observable parts, that is, \({s[X]=t[x]}\) for every \({x\in \Omega _{_T}}\). We say that \({P_{_T}}\) is a correct translation (refinement) of \({P_{_S}}\) if they are comparable and, for every computation \({\sigma _{_T}: t_0,t_1,\ldots }\) of \({P_{_T}}\) and every computation \({\sigma _{_S}: s_0,s_1,\ldots }\) of \({P_{_S}}\) such that \({s_0}\) is compatible with \({t_0}\), \({\sigma _{_T}}\) is terminating (finite) iff \({\sigma _{_S}}\) is and, in the case of termination, their final states are compatible. Note that here the notion of compatible states implies agreeing on the values of all observable variables.
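The definition quantifies over all computations, which cannot be enumerated in general. The Python sketch below merely tests it on finitely many inputs. It assumes deterministic systems encoded by a hypothetical `TS` record: an initial-state builder standing for \(\Theta\), a step function standing for \(\rho\) that returns `None` at termination, and an ordered list of observables so that the i-th source observable corresponds to the i-th target observable. Non-termination is approximated by a step budget, so this is a testing aid, not a proof of refinement.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

State = Dict[str, int]


@dataclass
class TS:
    init: Callable[[int], State]              # Theta: initial state from the observable input
    step: Callable[[State], Optional[State]]  # deterministic rho; None once terminated
    observables: List[str]                    # i-th entry corresponds across systems


def run(ts: TS, inp: int, fuel: int) -> Optional[State]:
    """Final state, or None if the computation did not terminate within `fuel` steps."""
    s = ts.init(inp)
    for _ in range(fuel):
        nxt = ts.step(s)
        if nxt is None:
            return s
        s = nxt
    return None


def refines(tgt: TS, src: TS, inputs: Iterable[int], fuel: int = 10_000) -> bool:
    """On the given inputs: tgt terminates iff src does, with compatible final states."""
    for inp in inputs:
        s_fin, t_fin = run(src, inp, fuel), run(tgt, inp, fuel)
        if (s_fin is None) != (t_fin is None):
            return False
        if s_fin is not None and t_fin is not None and any(
                s_fin[X] != t_fin[x] for X, x in zip(src.observables, tgt.observables)):
            return False
    return True


def src_step(s: State) -> Optional[State]:
    """Source: S := 1 + 2 + ... + N computed by a loop."""
    if s["I"] > s["N"]:
        return None
    return {**s, "S": s["S"] + s["I"], "I": s["I"] + 1}


def tgt_step(s: State) -> Optional[State]:
    """Target: the loop replaced by a closed form; pc records whether it has run."""
    if s["pc"] == 1:
        return None
    return {**s, "s": s["n"] * (s["n"] + 1) // 2, "pc": 1}


source_ts = TS(lambda n: {"N": n, "I": 1, "S": 0}, src_step, ["N", "S"])
target_ts = TS(lambda n: {"n": n, "s": 0, "pc": 0}, tgt_step, ["n", "s"])
print(refines(target_ts, source_ts, range(100)))  # True
```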

The definition above seems to imply that observable variables should only match at the end of a computation. In fact, we want the outputs (assuming the same inputs), and at times procedure calls, to match as well, whether or not computations terminate. We therefore define a “stopping point” to be the prefix of any computation that ends with either a true termination, an output (write call), or possibly a procedure call (for the latter, see the discussion below). At each such point, we regard the termination, output, or procedure call as outputting the values of all observables, and require that the target be compatible with the source at those output points.

As for procedure calls, we have some latitude. If a procedure call in the target appears in the source, then all observables (which include the parameters) must match. However, at times (e.g., when inlining a procedure) the target may not include a procedure call. In such cases, one has to choose the observables and create artificial “termination points” so as to be able to check the equality between values of observables. While this may seem tricky, in practice it is not, since one usually cares about the final values of variables rather than intermediate ones. To see this, consider, for example, an observable variable X that stores a counter whose final value is \(N^2\) for some input N, computed by a loop that adds successive values \(2\cdot i + 1\) to X, for \(i=0, \ldots , N-1\). If X is output after every iteration, then the only target that matches the source has to do the same, and is therefore similar to the source (but for replacing the computation of \(2\cdot i + 1\) by adding 2 to the last increment). If we only care about the final value of X, then many optimizations may occur, and the only thing that matters is the last value of X. So, while X is observable, we only care about its value once it is set to (presumably) \(N^2\), rather than about all its intermediate values.
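The counter example can be played out concretely. In the Python sketch below (all function names are hypothetical), the source adds \(2\cdot i+1\) for \(i=0,\ldots ,N-1\), the range for which the final value is \(N^2\). One target strength-reduces the loop by adding 2 to the previous increment; another replaces the loop by the single assignment \(X := N\cdot N\). If every intermediate value of X is observed, only the strength-reduced target matches; if only the final value is observed, both do.

```python
from typing import List


def source_trace(n: int) -> List[int]:
    """Values of X after each iteration of X += 2*i + 1, for i = 0..n-1."""
    x, trace = 0, []
    for i in range(n):
        x += 2 * i + 1
        trace.append(x)
    return trace


def strength_reduced_trace(n: int) -> List[int]:
    """Same loop, but each increment is the previous increment plus 2."""
    x, inc, trace = 0, 1, []
    for _ in range(n):
        x += inc
        inc += 2
        trace.append(x)
    return trace


def closed_form_trace(n: int) -> List[int]:
    """Loop replaced by the single assignment X := n * n."""
    return [n * n]


def final_value(trace: List[int]) -> int:
    return trace[-1] if trace else 0  # X starts at 0


print(source_trace(4), strength_reduced_trace(4), closed_form_trace(4))
# [1, 4, 9, 16] [1, 4, 9, 16] [16]
```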

TVI [20]: Translation Validation Infrastructure (TVI) was the first project to implement translation validation for optimizing compilers. TVI is provided with a simulation relation of the form \((\mathsf {PC}_{_S}, \mathsf {pc}_{_T}, \alpha )\), where \(\mathsf {PC}_{_S}\) is a source location (basic block), \(\mathsf {pc}_{_T}\) is a target location, and \(\alpha \) is a conjunction of equalities that describe relations between variables in \(V_{_T}\) and \(V_{_S}\).

Going back to the example of Fig. 1, we may view all the statements as belonging to a single basic block, say \(\mathtt{B1}\). There is a single exit from the block; assume it leads to \(\mathtt{B2}\). The simulation relation may then include
$$\bigl ( \mathtt{B1}, \mathtt{B1}, \bigwedge _{v\in \{x, y, z\}} v=V \bigr )$$
(where the lower case variables are target ones and upper case are source ones) as well as
$$\bigl (\mathtt{B2}, \mathtt{B2}, \bigwedge _{v\in \{x, y, z\}} v=V \ \wedge \ x=3 \ \wedge \ y=5 \ \wedge \ z=7 \bigr )$$
TVI checks that if the simulation relation holds at the beginning of some simple path, it holds at its end.

TVI was implemented on GCC and was successful in validating numerous programs. It was the first true implementation of TV for a real-life optimizing compiler. It has two apparent weaknesses. First, the simulation relations are generated manually. Second, the lack of invariants (which we will see in TVOC) restricts TVI’s power to optimizations that are completely order preserving. In particular, it cannot handle any code motion, including loop-invariant code motion (LICM).

TVOC [30]: TVOC (Translation Validation of Optimizing Compilers) is a project that originated at NYU in the early 2000s and was headed by Benjamin Goldberg, Amir Pnueli, and Lenore Zuck; Clark Barrett later joined the team and facilitated a direct connection from the VCs (Verification Conditions) produced by the tool to the theorem prover CVC [3]. Yi Fang was the chief architect of the project, and many other students contributed (including Ying Hu, Ittai Balaban, and Ganna Zaks).

TVOC followed a succession of open-source compilers that were eventually closed-sourced; the last in the chain was Intel ORC. The philosophy of TVOC (well justified by this history of initially open-source compilers eventually becoming closed-source) was that the tool does not have access to the compiler’s internals. This allowed the tool to initially depend on information from the compiler (static analysis, information about the optimizations performed) and eventually to remove this dependence.

The main part of TVOC deals with global optimizations that are, more or less, structure preserving. Roughly speaking, these are optimizations that do not drastically change the ordering of statements. They allow, for example, a statement to move in the code (as in LICM), but not so that its execution is moved back or forth by more than a constant number of steps that is independent of the values of variables. The latter occurs when, for example, loops are interchanged or reversed.

Let \({P_{_S}= \langle V_{_S}, \Omega _{_S}, \Theta _{_S}, \rho _{_S}\rangle }\) and \({P_{_T}= \langle V_{_T}, \Omega _{_T}, \Theta _{_T}, \rho _{_T}\rangle }\) be comparable TS’s, where \({P_{_S}}\) is the source and \({P_{_T}}\) is the target. To establish that \({P_{_T}}\) is a correct translation of \({P_{_S}}\) in cases where the structure of \({P_{_T}}\) does not radically differ from that of \({P_{_S}}\), a proof rule, Validate, is applied [31]. Rule Validate is inspired by the computational induction approach ([7]), originally introduced for proving properties of a single program; it provides a proof methodology by which one can prove that one program refines another. This is achieved by establishing a control mapping from target to source locations and a data abstraction mapping from source to target variables, and by proving that these abstractions are maintained along basic execution paths of the target program.

The proof rule assumes each TS has a cut-point set \({\mathsf{CP}}\). This is a set of blocks that includes the initial and terminal blocks, as well as at least one block from each cycle in the program’s control flow graph. A simple path is a path connecting two cut-points and containing no other cut-point as an intermediate node. We assume that there is at most one simple path between every two cut-points. For each simple path leading from \({\mathtt{Bi}}\) to \({\mathtt{Bj}}\), \({\rho _{ij}}\) describes the transition relation between blocks \({\mathtt{Bi}}\) and \({\mathtt{Bj}}\). Typically, such a transition relation contains the condition that enables this path to be traversed, and the data transformation effected by the path. Note that when the path from Bi to Bj passes through blocks that are not in the cut-point set, \({\rho _{ij}}\) is a compressed transition relation that can be computed by composing the intermediate transition relations on the path from Bi to Bj.
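Computing the simple paths is a plain graph search. The Python sketch below assumes the CFG is given as a hypothetical adjacency dictionary of block names; from each cut-point it follows successors until the next cut-point is reached, so no cut-point appears strictly inside a path. Because every cycle contains a cut-point, the search terminates; the sketch raises an error if that assumption is violated.

```python
from typing import Dict, List, Set


def simple_paths(cfg: Dict[str, List[str]], cut_points: Set[str]) -> List[List[str]]:
    """All paths from a cut-point to a cut-point with no cut-point strictly inside."""
    paths: List[List[str]] = []

    def extend(path: List[str]) -> None:
        for nxt in cfg.get(path[-1], []):
            if nxt in cut_points:
                paths.append(path + [nxt])  # reached the next cut-point: path is complete
            elif nxt in path:
                raise ValueError(f"cycle through {nxt} contains no cut-point")
            else:
                extend(path + [nxt])  # non-cut-point block: keep walking

    for c in sorted(cut_points):
        extend([c])
    return paths


# A loop B1 -> B2 -> B3 -> B1 with entry B0 and exit B4.
cfg = {"B0": ["B1"], "B1": ["B2", "B4"], "B2": ["B3"], "B3": ["B1"], "B4": []}
print(simple_paths(cfg, {"B0", "B1", "B4"}))
# [['B0', 'B1'], ['B1', 'B2', 'B3', 'B1'], ['B1', 'B4']]
```

Each returned path would then be summarized by the composed relation \(\rho _{ij}\) of the blocks along it.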

The main proof rule of TVOC (which can be found in [31] with a soundness proof) calls for:
  1. A control abstraction \(\kappa \) that maps the target’s control points to source ones, such that the initial and terminal blocks of the target map into the corresponding ones of the source;

  2. An invariant \(\varphi _i\) over target variables for each basic block \(\mathtt{Bi}\) of the target;

  3. A data abstraction \(\alpha \) which is a conjunction of (1) a statement stating that the source location is at the \(\kappa \)-corresponding location of the target, and (2) guarded expressions of the form \(p\rightarrow V=e\), where p is a condition, V is a source variable, and e is an expression over target variables. It is required that for every initial target block \(\mathtt{Bi}\), \(\Theta _{_T}\wedge \Theta _{_S}\rightarrow \alpha \wedge \varphi _i\), that is, that the conjunction of the initial conditions of the source and target implies \(\alpha \) as well as the invariant at \(\mathtt{Bi}\), and, similarly, that for every observable variable \(V\in \Omega _{_S}\) whose target counterpart is v and every terminal target block \(\mathtt{B}\), \(\alpha \) implies that \(V=v\);

  4. For each pair of target basic blocks \(\mathtt{Bi}\) and \(\mathtt{Bj}\) such that there is a simple target path from \(\mathtt{Bi}\) to \(\mathtt{Bj}\) (with no cut-point on it other than its endpoints), a verification condition \(C_{ij}\) asserting that if the assertion \({\varphi _i}\) and the data abstraction \(\alpha \) hold before the transition, and the transition takes place, then after the transition there exist new source variables that reflect the corresponding transition in the source, and the data abstraction and the assertion \({\varphi _j}\) hold in the new state. Hence, \({\varphi _i}\) is used as a hypothesis in the antecedent of the implication \(C_{ij}\). In return, the validator also has to establish that \({\varphi _j}\) holds after the transition. Thus, as part of the verification effort, TVOC confirms that the proposed assertions are indeed inductive and hold whenever the corresponding block is visited.


Following the generation of the verification conditions whose validity implies that the target T is a correct translation of the source program S, it only remains to check that these implications are indeed valid.

Consider again the example of Fig. 1, with \(\mathtt{B1}\) and \(\mathtt{B2}\) as before. The control abstraction maps, for each \(i=1,2\), the target \(\mathtt{Bi}\) into the source \(\mathtt{Bi}\). There is no invariant at the entry to \(\mathtt{B1}\), that is, \(\varphi _1=\mathsf {true}\). (There is, however, an invariant \(\varphi _2\), namely \((x=3) \ \wedge \ (y=5) \ \wedge \ (z=7)\).) Following the TVOC literature, we denote source variables by upper-case letters and target ones by lower-case letters, and obtain the data mapping
$$\alpha :~ \mathsf {PC}=\kappa (\mathsf {pc}) \ \wedge \ (\mathsf {pc}= 2 ~\rightarrow ~(X=x \ \wedge \ Y=y \ \wedge \ Z=z))$$
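where \(\mathsf {PC}\) (resp. \(\mathsf {pc}\)) is the source (resp. target) program counter. The verification condition for the path from \(\mathtt{B1}\) to \(\mathtt{B2}\) is
$$C_{12}:~ \varphi _1 \wedge \alpha \wedge \rho ^{T}_{12} ~\rightarrow ~ \exists \mathsf {PC}', X', Y', Z' :~ \rho ^{S}_{12} \wedge \alpha ' \wedge \varphi _2'$$
where \(\rho ^{T}_{12}\) and \(\rho ^{S}_{12}\) are the target and source transition relations of the path, and primed formulas refer to post-transition values. This condition is trivially true: both paths set the three variables to 3, 5, and 7, so the choice \(X'=x'\), \(Y'=y'\), \(Z'=z'\) satisfies \(\alpha '\), and \(\varphi _2'\) follows from \(\rho ^{T}_{12}\).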
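The existential quantifier in \(C_{ij}\) becomes easy when the source path is deterministic: the only candidate for the new source values is the state the source path produces. The Python sketch below exploits this to test the B1-to-B2 condition of the example on a finite sample of pre-states. It assumes that Fig. 1 reads x := 3; y := 5; z := x + 4 before the transformation and z := 7 after it (an assumption consistent with the values 3, 5, and 7 used above); the block functions are hypothetical, and sampling is only a test, not the proof that TVOC obtains from a theorem prover.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

State = Dict[str, int]


def src_b1(_: State) -> State:
    """Hypothetical source B1: x := 3; y := 5; z := x + 4 (input values are overwritten)."""
    x = 3
    return {"X": x, "Y": 5, "Z": x + 4}


def tgt_b1(_: State) -> State:
    """Hypothetical target B1 after constant propagation: x := 3; y := 5; z := 7."""
    return {"x": 3, "y": 5, "z": 7}


def post(S: State, s: State) -> bool:
    """alpha' at B2 (equal variables) together with the invariant phi_2'."""
    alpha = S["X"] == s["x"] and S["Y"] == s["y"] and S["Z"] == s["z"]
    phi2 = (s["x"], s["y"], s["z"]) == (3, 5, 7)
    return alpha and phi2


def check_vc(src_path: Callable[[State], State], tgt_path: Callable[[State], State],
             post_cond: Callable[[State, State], bool],
             pre_states: Iterable[Tuple[State, State]]) -> bool:
    """Test C_12 on sample pre-states (phi_1 and alpha impose no data constraint at B1)."""
    for S, s in pre_states:
        # The source path is deterministic, so its result is the only candidate
        # for the existentially quantified new source values.
        if not post_cond(src_path(S), tgt_path(s)):
            return False
    return True


pre = [({"X": a, "Y": b, "Z": c}, {"x": d, "y": e, "z": f})
       for a, b, c, d, e, f in product(range(-2, 3), repeat=6)]
print(check_vc(src_b1, tgt_b1, post, pre))  # True
```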

The approach makes sense only if this validation (as well as the preceding steps of the conditions’ generation) can be done in a fully automatic manner with no user intervention. Indeed, as shown in [6], by performing its own static analysis, TVOC can often compute all that is needed (control mapping, invariants, data mapping, and verification conditions). At times there are several candidates for \(\kappa \) and \(\alpha \); the tool then uses heuristics to choose one, and if it fails, it may try others.

TVOC has a separate part to validate loop optimizations such as loop interchange, loop reversal, and tiling. Initially, this validation relied on a file (*.l) that ORC seemed to keep for debugging purposes. Later, the dependence on this file was replaced by heuristics that guessed which loop optimizations were applied [9], using only the fact that (in ORC) loop optimizations followed global optimizations.

One should note that TVOC’s invariants gave it additional power over previous methods. These invariants allowed TVOC to deal with what we referred to above as “minor reordering”, such as LICM. In fact, these invariants play a major role in the LLVM project, which is the topic of the next section. In essence, they allow information to be carried between basic blocks. The more precise the invariants are, the more precise the static analysis is, which, in turn, allows for more aggressive optimizations.

TVOC, and similar tools developed at the time, dealt with neither pointer analysis (in particular, aliasing) nor inter-procedural optimizations (such as tail-recursion elimination, inter-procedural constant propagation, or inlining). Later [22], the creators of TVOC developed a framework for dealing with a certain type of inter-procedural optimization. Yet the implementation [27] was performed not on ORC (which, ironically, was no longer open source at the time) but rather on LLVM.

3 TV for LLVM: Witnessing

As before, a program is described by a transition system \(S = \langle V,\Omega ,\Theta ,\rho \rangle \). We assume that the CFG has a unique basic block \(\mathsf {B}\) with no incoming edges such that \(\Theta \rightarrow \mathsf {B}\), and a unique basic block \(\mathsf {E}\) with no outgoing edges. (Even if a program has several termination nodes, one can connect them all to a single \(\mathsf {E}\) basic block, so we lose no generality in assuming that there is a single \(\mathsf {E}\) basic block.) All other basic blocks are intermediate. We assume that a program has no direct transition from \(\mathsf {B}\) to \(\mathsf {E}\).

As in TVOC, one associates with each basic block a generalized transition relation describing the effect of executing the block. Here it is assumed that the transition relation of a program is complete; that is, for every non-final state s, there is a state \(s'\) such that \(\rho (s,s')\) holds. We also assume that the transition relation is location-deterministic, in that there is at most one transition between any two locations. Formally, \([(\rho (s,s') \ \wedge \ \rho (s,s'') \ \wedge \ s'[\mathsf {pc}] = s''[\mathsf {pc}]) \;\Rightarrow \;s'=s'']\) (where \(\mathsf {pc}\) is the location variable). This still allows non-determinism in the sense of Dijkstra’s if-fi and do-od constructs, where multiple guards may be true at a state, since the corresponding successor states have different locations.

The notion of correct implementation (“program T (target) implements program S (source)”) is just like before, only expressed directly as a simulation relation. More formally, fix programs \(S=\langle V_{_S}, \Omega _{_S},\Theta _{_S}, \rho _{_S}\rangle \) and \(T=\langle V_{_T}, \Omega _{_T},\Theta _{_T}, \rho _{_T}\rangle \) and a relation \(\preceq \) between T’s and S’s states. A T-state t matches an S-state s if \(t\preceq s\). The definitions of path matching and system matching follow. As before, we require non-terminating computations of T to be matched to non-terminating computations of S, which rules out pathological “implementations” where T does not terminate on any input.

One nice feature of the implementation notion is that it is compositional, that is, if T implements S and U implements T, then U implements S. This allows a sequence of transformations to be composed seamlessly.

In practice, just like in TVOC, the matching relation is often a conjunction of equalities of the form \(v_S=\mathcal{E}(V_T)\), where \(v_S\) is a source variable and \(\mathcal{E}(V_T)\) is an expression over the target variables. When T is derived from S by a set of global optimizations, it often suffices to define \(\preceq \) only for program counters at the beginning of a basic block and to reason only about simple paths (paths that contain no cycles). This is often insufficient for other transformations (for example, inter-procedural optimizations or loop optimizations), for which other methods have to be used.
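A matching relation of this shape is easy to represent directly. In the minimal sketch below, which is an illustration rather than the LLVM implementation, a witness is built from a hypothetical mapping from each source variable \(v_S\) to a Python function computing \(\mathcal{E}(V_T)\) from a target state. The example assumes a target in which the source variable Y is no longer stored and is recomputed as x + 1.

```python
from typing import Callable, Dict

State = Dict[str, int]

def make_witness(equalities: Dict[str, Callable[[State], int]]):
    """Return the relation t ⪯ s: the conjunction of v_S == E(V_T) over all entries."""
    def matches(t: State, s: State) -> bool:
        return all(s[v] == expr(t) for v, expr in equalities.items())
    return matches

# Upper case: source variables; lower case: target variables.
witness = make_witness({"X": lambda t: t["x"], "Y": lambda t: t["x"] + 1})
print(witness({"x": 3}, {"X": 3, "Y": 4}))   # True
print(witness({"x": 3}, {"X": 3, "Y": 5}))   # False
```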

We refer to a “good” \(\preceq \), one that allows one to prove an implementation relation, as a witness. We outfit LLVM so that each optimization pass, with a source S and a target T, produces its own witness to the correctness (implementation relation) of the optimization. By compositionality, if each pass has a witness, then so does the whole compilation.

3.1 Examples of Witnesses

Consider our example of constant propagation as described in Fig. 1, with \(\mathtt{B1}\) and \(\mathtt{B2}\) being the basic blocks as in the TVOC example. There, the transition relations for the source and target programs are the same, \(\mathsf {pc}=1 \ \wedge \ \mathsf {pc}'=2 \ \wedge \ x'=3 \ \wedge \ y'=5 \ \wedge \ z'=7\), and the witness is the trivial \(x=X \ \wedge \ y=Y \ \wedge \ z=Z\) (where we follow the convention that upper-case letters denote source variables and lower-case letters denote target variables).

A slightly less trivial example is given by the sequence of programs in Fig. 4. Each program is shown with its CFG, in which its \(\mathsf {B}\) and \(\mathsf {E}\) blocks are marked.
Fig. 4. A sequence of transformations

The first, (a), is the source program. The second, (b), is the result of constant propagation followed by elimination of the resulting dead branch: since \(z=50\) and \(y=100\), we have \(3\cdot z = 150 > 100 = y\), so the condition on the left branch evaluates to \(\mathsf {false}\) while the condition on the right branch evaluates to \(\mathsf {true}\). Hence the left branch is never taken; it becomes unreachable code, which is then removed by dead code elimination. Since basic blocks are only constrained to be single-entry single-exit, the basic blocks B1, B3, and B4 can now be merged into a single basic block (block merge), as shown in (c). Finally, the first assignments to y and z are never used, which renders them dead, and they can be removed (dead store elimination), yielding (d).

Each sub-step ((a) to (b), (b) to (c), and (c) to (d)) can be assigned a witness for each target location (basic block) in the obvious way. When composed we get a witness for the (a) to (d) transformation. E.g., at location \(\mathsf {E}\) the witness is:
$$ \mathsf {pc}=\mathsf {PC}=\mathsf {E}\ \ \wedge \ X=x \ \wedge \ Y=y \ \wedge \ Z=z \ \wedge \ x=10 \ \wedge \ y=102 \ \wedge \ z=112$$

3.2 Witnesses vs. TVOC

The original goal of TV is to determine the equivalence of S and T. The methodology for doing so, which relies on heuristics, has become sufficiently complex as to merit its own verification. In fact, all known implementations of the methodology require much ingenuity and skill. The witness approach instead requires instrumenting each optimization pass, and this instrumentation is rather simple. Ideally, it would be provided by the designer of the optimization, but in some cases it is possible to craft the instrumentation without a deep understanding of the optimization. The instrumentation takes the form of small “footprints,” from which a witness can be constructed fully automatically. The instrumentation of all the global optimizations, as well as of the “simplify-CFG” ones, was successfully carried out by Master’s-level students with little experience in compilers. A notable example (not carried out by a student) is the instrumentation of \(\mathtt {mem2reg}\). This optimization pass performs both translation to SSA and some register allocation. A validation of a variant of this transformation took over 18 man-months and 15K lines of Coq [4], while, using the witness theory, the creation of a witness for the original transformation took three man-months (most of which was spent on understanding the code) and about 300–500 LOC in OCaml and C++ [18]. Validating this transformation is well beyond the capabilities of TVOC.

Of course, TVOC and similar tools avoid instrumenting the compiler. Given that compilers were rarely open source at the time, this was probably the right approach in the early 2000s. Today, however, compilers are often open source, and instrumenting them to ease validation is no longer a faux pas.

3.3 Implementation

The witness checking infrastructure for LLVM consists of two parts: witness generation and refinement checking. Once an optimization pass is instrumented (hopefully by its author, but usually by graduate students), a witness can be generated for the optimization. The current implementation assumes that both source and target code are deterministic, that is, every state has a unique successor. Consider a source S (some IR\(_i\)) and a target T (IR\(_{i+1}\)) with a witness relation \(\preceq \). Under the determinism assumption, the verification condition implied by \(\preceq \) is:
$$t \preceq s \wedge \rho _{_T}(t,t^{\prime }) \ \wedge \ \rho _{_S}(s, s^{\prime }) ~\Longrightarrow \quad t^{\prime }\preceq s^{\prime }$$
While several tools verify LLVM code against high-level specifications (see, e.g., [10, 23]), as far as we know there are no tools that can check that one LLVM IR program refines another.
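The verification condition can be made concrete with a bounded, brute-force check. The sketch below is only an illustration, not the Smack/Boogie pipeline: it assumes deterministic programs whose basic blocks are modeled as Python functions returning the unique successor state (or `None` at a final state), and it enumerates small finite sets of states instead of reasoning symbolically. The example reuses the constant-propagation transition relation and witness from the Fig. 1 discussion, with an added equality on the locations; treating a pair in which only one side terminates as a violation is an extra assumption of the sketch, and `block_bad` is a hypothetical miscompiled target.

```python
from itertools import product

def check_vc(witness, step_t, step_s, t_states, s_states):
    """Search for (t, s) violating  t ⪯ s ∧ ρ_T(t,t') ∧ ρ_S(s,s') ⇒ t' ⪯ s'.

    step_t / step_s return the unique successor, or None at a final state.
    Returns a counterexample pair, or None if the check passes on the given states.
    """
    for t, s in product(t_states, s_states):
        if not witness(t, s):
            continue                      # antecedent false: nothing to check
        t_next, s_next = step_t(t), step_s(s)
        if t_next is None and s_next is None:
            continue                      # both final
        if t_next is None or s_next is None or not witness(t_next, s_next):
            return t, s                   # counterexample to the VC
    return None

def block_t(t):                           # target: pc=1 -> pc=2, x'=3, y'=5, z'=7
    return {"pc": 2, "x": 3, "y": 5, "z": 7} if t["pc"] == 1 else None

def block_s(s):                           # source: the same relation over X, Y, Z
    return {"PC": 2, "X": 3, "Y": 5, "Z": 7} if s["PC"] == 1 else None

def block_bad(t):                         # hypothetical miscompiled target: z'=8
    return {"pc": 2, "x": 3, "y": 5, "z": 8} if t["pc"] == 1 else None

def witness(t, s):                        # pc = PC ∧ x = X ∧ y = Y ∧ z = Z
    return (t["pc"] == s["PC"] and t["x"] == s["X"]
            and t["y"] == s["Y"] and t["z"] == s["Z"])

vals = range(-1, 2)
t_states = [{"pc": p, "x": x, "y": y, "z": z}
            for p, x, y, z in product((1, 2), vals, vals, vals)]
s_states = [{"PC": p, "X": x, "Y": y, "Z": z}
            for p, x, y, z in product((1, 2), vals, vals, vals)]
print(check_vc(witness, block_t, block_s, t_states, s_states))                # None
print(check_vc(witness, block_bad, block_s, t_states, s_states) is not None)  # True
```

A bounded enumeration can only find violations, never prove their absence in general; the described infrastructure instead discharges the condition symbolically through Boogie and Z3.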

We chain two existing tools to accomplish this check: Smack and Boogie. Smack [23] is an ongoing project whose goal is to verify LLVM IR, and part of it is a translation into Boogie. Boogie [13] is a verification language that uses Z3 as its back end. We input the source and target code into Smack and obtain Boogie programs. These Boogie programs, together with the witness relation \(\preceq \), are then composed, with proper variable renaming to guarantee disjoint memory spaces. The composition is such that executions of matching simple paths of source and target are interleaved. Boogie then generates and checks the verification condition implied by the witness.
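The renaming-and-interleaving step can be illustrated on straight-line code. In the hypothetical Python sketch below, which does not reproduce Smack's or Boogie's actual encodings, a simple path is a list of assignments over a tiny expression language. Source and target variables are renamed into the disjoint namespaces `S.` and `T.`, so running the renamed source path and then the renamed target path is one interleaving of the two; because no variable is shared, every interleaving yields the same final state. The witness is then checked on the combined state. The example paths are assumptions chosen to mimic constant propagation.

```python
def rename_expr(e, prefix):
    """Expressions: ("const", n), ("var", name), ("add", e1, e2)."""
    if e[0] == "const":
        return e
    if e[0] == "var":
        return ("var", prefix + e[1])
    return (e[0],) + tuple(rename_expr(sub, prefix) for sub in e[1:])

def eval_expr(e, state):
    if e[0] == "const":
        return e[1]
    if e[0] == "var":
        return state[e[1]]
    if e[0] == "add":
        return eval_expr(e[1], state) + eval_expr(e[2], state)
    raise ValueError(f"unknown expression {e!r}")

def rename_path(path, prefix):
    return [(prefix + var, rename_expr(e, prefix)) for var, e in path]

def product_path(src_path, tgt_path):
    # Disjoint namespaces: neither path reads or writes the other's variables.
    return rename_path(src_path, "S.") + rename_path(tgt_path, "T.")

def run(path, state):
    state = dict(state)
    for var, e in path:
        state[var] = eval_expr(e, state)
    return state

# Source: x := 3; y := x + 2    Target after constant propagation: x := 3; y := 5
src = [("x", ("const", 3)), ("y", ("add", ("var", "x"), ("const", 2)))]
tgt = [("x", ("const", 3)), ("y", ("const", 5))]

final = run(product_path(src, tgt), {})
print(final)   # {'S.x': 3, 'S.y': 5, 'T.x': 3, 'T.y': 5}
print(all(final["S." + v] == final["T." + v] for v in ("x", "y")))   # True
```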

4 Conclusion

There is a growing awareness, both in industry and academia, of the crucial role of formally proving the correctness of safety-critical portions of systems. Most verification methods deal with the high-level specification of the system. However, if one is to prove that the high-level specification is correctly implemented at the lower level, one needs to verify the compiler which performs the translations. Verifying the correctness of modern optimizing compilers is challenging due to the complexity and reconfigurability of the target architectures and the sophisticated analysis and optimization algorithms used in the compilers.

The paper surveys some of the work of the past two decades in verifying optimizing compilers. The first direction is to verify the compiler (translator) itself. The most successful effort in this direction is CompCert, which combines the development of the compiler with its verification.

Most compilers, however, are a given and not developed from scratch. Formally verifying a full-fledged optimizing compiler, as one would verify any other large program, is often infeasible due to its size and its evolution over time. Translation validation offers an alternative to the verification of translators in general and of compilers in particular. According to the translation validation approach, rather than verifying the compiler itself, one constructs a validating tool which, after every run of the compiler, formally confirms that the target code produced is a correct translation of the source program. In addition to providing a proof that the target code of the compiler implements the source code, the translation validation approach also offers means to certify that the code produced is true to its source. All TV methodologies output verification conditions (VCs), which can be checked by independent theorem provers (or, as is often the case, SMT solvers); this provides an additional degree of confidence. As an anecdote, SNECMA (currently Safran Aircraft Engines) used to employ hundreds of highly skilled people whose sole job was to manually check that optimized code correctly translated the source code. With TV there is no need for such manual verification.

The paper surveys some of the past work in translation validation and describes a current effort to provide translation validation for LLVM. While we focused only on the global-optimization fragment of this effort, it has also been applied to loop optimizations [19]. We are currently applying a novel technique, which combines TV with rewriting rules, to validate inter-procedural optimizations. Another part of the witness theory that this paper omits for space reasons is witness propagation. This is similar to the invariants of TVOC, except that the propagation mechanism allows one to carry any known information from one transformation to the next, and to update this information continually as optimization passes are executed. The idea of propagation can be used in numerous ways. To date, we have augmented LLVM with external program analysis tools (whose runtime is far from linear!) and propagated the resulting witnesses so as to make runtime checks, such as those for buffer and integer overflows, more efficient. The results are very promising: they allow what used to be “unscalable” runtime checks to be highly scalable [8].

It should be noted that all methodologies described here, in spite of presumably attempting to decide an undecidable problem, accomplish their task in practice and incur a very small overhead.




We thank DARPA and NSF for funding this project. Thanks are also due to our numerous collaborators on this project throughout the years, especially to Amir Pnueli, who introduced Lenore Zuck to the area, and to Kedar Namjoshi and Venkat Venkatakrishnan, who have been her close collaborators on the LLVM project.


  1. Coq development team: The Coq proof assistant.
  2. Alpern, B., Wegman, M.N., Zadeck, F.K.: Detecting equality of variables in programs. In: POPL 1988, pp. 1–11. ACM, New York (1988)
  3. Barrett, C., Berezin, S.: CVC Lite: a new implementation of the cooperating validity checker. In: Alur, R., Peled, D.A. (eds.) CAV 2004. LNCS, vol. 3114, pp. 515–518. Springer, Heidelberg (2004)
  4. Barthe, G., Demange, D., Pichardie, D.: Formal verification of an SSA-based middle-end for CompCert. TOPLAS 36(1), 4:1–4:35 (2014)
  5. Dave, M.A.: Compiler verification: a bibliography. SIGSOFT SEN 28(6), 2 (2003)
  6. Fang, Y., Zuck, L.D.: Improved invariant generation for TVOC. ENTCS 176(3), 21–35 (2007)
  7. Floyd, R.: Assigning meanings to programs. Proc. Symp. Appl. Math. 19, 19–32 (1967)
  8. Gjomemo, R., Namjoshi, K.S., Phung, P.H., Venkatakrishnan, V.N., Zuck, L.D.: From verification to optimizations. In: D’Souza, D., Lal, A., Larsen, K.G. (eds.) VMCAI 2015. LNCS, vol. 8931, pp. 300–317. Springer, Heidelberg (2015)
  9. Goldberg, B., Zuck, L., Barrett, C.: Into the loops: practical issues in translation validation for optimizing compilers. ENTCS 132(1), 53–71 (2005)
  10. Gurfinkel, A., Kahsai, T., Komuravelli, A., Navas, J.A.: The SeaHorn verification framework. In: CAV 2015, pp. 343–361 (2015)
  11. Hunt Jr., W.A., Kaufmann, M., Moore, J.S., Slobodova, A.: Industrial hardware and software verification with ACL2. Philos. Trans. R. Soc. 375, 40 (2017). (Article Number 20150399)
  12. Le, V., Sun, C., Su, Z.: Randomized stress-testing of link-time optimizers. In: ISSTA 2015, pp. 327–337. ACM (2015)
  13. Leino, K.R.M.: This is Boogie 2. Manuscript KRML 178, 131 (2008)
  14. Lerner, S., Millstein, T., Chambers, C.: Automatically proving the correctness of compiler optimizations. ACM SIGPLAN Not. 38(5), 220–231 (2003)
  15. Leroy, X.: Formal verification of a realistic compiler. Commun. ACM 52(7), 107–115 (2009)
  16. Lopes, N.P., Menendez, D., Nagarakatte, S., Regehr, J.: Provably correct peephole optimizations with Alive. ACM SIGPLAN Not. 50(6), 22–32 (2015)
  17. McCarthy, J., Painter, J.: Correctness of a compiler for arithmetic expressions. Math. Aspects Comput. Sci. 1, 219–222 (1967)
  18. Namjoshi, K.S.: Witnessing an SSA transformation. In: VeriSure Workshop and Personal Communication, CAV 2014 (2014)
  19. Namjoshi, K.S., Singhania, N.: Loopy: programmable and formally verified loop transformations. In: Rival, X. (ed.) SAS 2016. LNCS, vol. 9837, pp. 383–402. Springer, Heidelberg (2016)
  20. Necula, G.C.: Translation validation for an optimizing compiler. ACM SIGPLAN Not. 35(5), 83–94 (2000)
  21. Pnueli, A., Siegel, M., Singerman, E.: Translation validation. In: Steffen, B. (ed.) TACAS 1998. LNCS, vol. 1384, pp. 151–166. Springer, Heidelberg (1998)
  22. Pnueli, A., Zaks, A.: Translation validation of interprocedural optimizations. In: International Workshop on Software Verification and Validation (2006)
  23. Rakamarić, Z., Emmi, M.: SMACK: decoupling source language details from verifier implementations. In: Biere, A., Bloem, R. (eds.) CAV 2014. LNCS, vol. 8559, pp. 106–113. Springer, Cham (2014)
  24. Samet, H.: Automatically proving the correctness of translations involving optimized code. Ph.D. thesis, Stanford University (1975)
  25. Schmidt, D.A.: Data flow analysis is model checking of abstract interpretations. In: POPL 1998, pp. 38–48. ACM (1998)
  26. Yang, X., Chen, Y., Eide, E., Regehr, J.: Finding and understanding bugs in C compilers. ACM SIGPLAN Not. 46(6), 283–294 (2011)
  27. Zaks, G.: Ensuring correctness of compiled code. Ph.D. thesis, New York University (2009)
  28. Zhao, J., Nagarakatte, S., Martin, M.M.K., Zdancewic, S.: Formalizing the LLVM intermediate representation for verified program transformations. In: ACM SIGPLAN Notices, pp. 427–440. ACM (2012)
  29. Zhao, J., Nagarakatte, S., Martin, M.M.K., Zdancewic, S.: Formal verification of SSA-based optimizations for LLVM. ACM SIGPLAN Not. 48(6), 175–186 (2013)
  30. Zuck, L., Pnueli, A., Goldberg, B., Barrett, C., Fang, Y., Hu, Y.: Translation and run-time validation of loop transformations. FMSD 27(3), 335–360 (2005)
  31. Zuck, L.D., Pnueli, A., Goldberg, B.: VOC: a methodology for the translation validation of optimizing compilers. J. UCS 9(3), 223–247 (2003)

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Department of Computer Science, University of Illinois at Chicago, Chicago, USA
