Angelic Verification: Precise Verification Modulo Unknowns
Abstract
Verification of open programs can be challenging in the presence of an unconstrained environment. Verifying properties that depend on the environment yields a large class of uninteresting false alarms. Using a verifier on a program thus requires extensive initial investment in modeling the environment of the program. We propose a technique called angelic verification for verification of open programs, where we constrain a verifier to report warnings only when no acceptable environment specification exists to prove the assertion. Our framework is parametric in a vocabulary and a set of angelic assertions that allows a user to configure the tool. We describe a few instantiations of the framework and an evaluation on a set of real-world benchmarks to show that our technique is competitive with industrial-strength tools even without models of the environment.
Keywords
Conjunctive Normal Form, Satisfiability Modulo Theory, Quantifier Elimination, Program Verifier, Abductive Inference
1 Introduction
Scalable software verifiers offer the potential to find defects early in the development cycle. The user of such a tool can specify a property (e.g. correct usage of kernel/security APIs) using some specification language and the tool validates that the property holds on all feasible executions of the program. There has been significant progress in the area of software verification, leveraging ideas from model checking [13], theorem proving [34] and invariant inference algorithms [16, 22, 33]. Tools based on these principles (e.g. SDV [3], FSoft [24]) have found numerous bugs in production software.
Example 1
Consider the example program (written in the Boogie language [4]) in Fig. 1. The program has four procedures \({\mathsf {Foo}}, {\mathsf {Bar}}, {\mathsf {Baz}}, {\mathsf {FooBar}}\) and two external library procedures \({\mathsf {Lib1}}, {\mathsf {Lib2}}\). The variables in the programs can be scalars (of type \(\mathsf {int}\)) or arrays (e.g. \({\mathsf {m}}\)) that map \(\mathsf {int}\) to \(\mathsf {int}\). The Boogie program is an encoding of a C program [15]: pointers and values are uniformly modeled as integers (e.g. parameter \({\mathsf {x}}\) of \({\mathsf {Bar}}\), or the return value of \({\mathsf {Lib1}}\)), and memory dereference is modeled as array lookup (e.g. \({\mathsf {m[x]}}\)). The procedures have assertions marked using \({\mathsf {assert}}\) statements. The entry procedure for this program is \({\mathsf {Foo}}\).
There are several sources of unknowns or unconstrained values in the program: the parameter \({\mathsf {z}}\) to \({\mathsf {Foo}}\), the global variable \({\mathsf {m}}\) representing the heap, and the return values of library procedures \({\mathsf {Lib1}}\) and \({\mathsf {Lib2}}\). Even a precise verifier is bound to return assertion failures for each of the assertions in the program. This is due to the fact that all the assertions, except the one in \({\mathsf {Baz}}\) (the only definite bug in the program) are assertions over unknowns in the program and (sound) verifiers tend to be conservative (overapproximate) in the face of unknowns. Such demonic nature of verifiers will result in several false alarms.
Overview. Our goal is to push back on the demonic nature of the verifier by prioritizing alarms with higher evidence. In addition to the warning in \({\mathsf {Baz}}\), the assertion in \({\mathsf {Bar}}\) is suspicious as the only way to avoid the bug is to make the “else” branch unreachable in \({\mathsf {Bar}}\). For the remaining assertions, relatively simple constraints on the unknown values suffice to explain the correctness of these assertions. For example, it is reasonable to assume that calls to library methods do not return \({\mathsf {NULL}}\), their dereferences (\({\mathsf {m[x]}}\)) store nonnull values and calls to two different library methods do not return aliased pointers. We tone down the demonic nature of verifiers by posing a more angelic decision problem for the verifier (also termed as abductive inference [10, 20]):
For a given assertion, does there exist an acceptable specification over the unknowns such that the assertion holds?
This forces the verifier to work harder to exhaust the space of acceptable specifications before showing a warning for a given assertion. Of course, this makes the verification problem less defined as it is parameterized by what constitutes “acceptable” to the end user of the tool. At the same time, it allows a user to be able to configure the demonic nature of the tool by specifying a vocabulary of acceptable specifications.
In this paper, we provide a user a few dimensions to specify a vocabulary \( Vocab_{}\) that constitutes a specification (details can be found in Sect. 4). The vocabulary can indicate a template for the atomic formulas, or the Boolean and quantifier structure. Given a vocabulary \( Vocab_{}\), we characterize an acceptable specification by how (a) concise and (b) permissive the specification is. Conciseness is important for the resulting specifications to be understandable by the user. Permissiveness ensures that the specification is not overly strong, thus masking out true bugs. The failure in \({\mathsf {Bar}}\) is an example, where a specification \({\mathsf {x}} \ne {\mathsf {NULL}}\) is not permissive as it gives rise to dead code in the “else” branch before the assertion. To specify desired permissiveness, we allow the users to augment the program with a set of angelic assertions \(\hat{{\mathcal A}}\). The assertions in \(\hat{{\mathcal A}}\) should not be provable in the presence of any inferred specification over the unknowns. An angelic assertion \(\mathsf {assert}\ e \in \hat{{\mathcal A}}\) at a program location l indicates that the user expects at least one state to reach l and satisfy \(\lnot e\). For \({\mathsf {Bar}}\) one can add two assertions \(\mathsf {assert}\ \mathsf {false}\) inside each of the branches. The precondition \({\mathsf {x}} \ne {\mathsf {NULL}}\) would be able to prove that \(\mathsf {assert}\ \mathsf {false}\) in the “else” branch is unreachable (and thus provable), which prevents it from being permissive. We describe a few such useful instances of angelic assertions in Sect. 3.1.
We have implemented the angelic verification framework in a tool called AngelicVerifier for Boogie programs. Given a Boogie program with a set S of entrypoints, AngelicVerifier invokes each of the procedures in S with unknown input states. In the absence of any user-provided information, we assume that S is the set of all procedures in the program. Further, the library procedures are assigned a body that assigns a nondeterministic value to the return variables and adds an assume statement with a predicate \({\mathsf {unknown\_i}}\) (Fig. 2). This predicate will be used to constrain the return values of a procedure for all possible call sites (Sect. 4) within an entrypoint.

For a trace that starts at \({\mathsf {Bar}}\) and fails the assert on line 6, we conjecture a specification \({\mathsf {x \not = NULL}}\) but discover that it is not permissive. The line with “ANGELIC_WARNING” is a warning shown to the user.

For the trace that starts at \({\mathsf {Baz}}\) and fails the assert on line 11, we block the assertion failure by installing the constraint \({\mathsf {y \not = NULL}}\). The code of \({\mathsf {Baz}}\) does not have any indication that it expects to see \({\mathsf {NULL}}\) as input.

For the three traces that start at \({\mathsf {FooBar}}\) and fail an assertion inside it, we block them using constraints on the return values of library calls. Notice that the return values are not in scope at the entry to \({\mathsf {FooBar}}\); they get constrained indirectly using the \({\mathsf {unknown\_i}}\) predicates. The most interesting block is for the final assertion which involves assuming that (a) the returns from the two library calls are never aliased, and (b) the value of the array \({\mathsf {m}}\) at the value returned by \({\mathsf {Lib2}}\) is nonnull. (See Sect. 4)

The trace starting at \({\mathsf {Foo}}\) that calls \({\mathsf {Baz}}\) and fails on line 11 cannot be blocked (other than by using the nonpermissive specification false), and is reported to the user.
2 Programming Language
Semantics. A program state \({\sigma _{}}\) is a type-consistent valuation of variables in scope in the program. The set of all states is denoted by \({\Sigma }\cup \{{ Err}\}\), where \({ Err}\) is a special state to indicate an assertion failure. For a given state \({\sigma _{}} \in {\Sigma }{}\) and an expression (or formula) \({ {e}}\), \({{ {e}}}_{{\sigma _{}}}\) denotes the evaluation of \({ {e}}\) in the state. For a formula \({\phi _{}} \in { Formula}{}\), \({{\sigma _{}}} \models {{\phi _{}}}\) holds if \({{\phi _{}}}_{{\sigma _{}}}\) evaluates to \(\mathsf {true}\). The semantics of a program is a set of execution traces, where a trace corresponds to a sequence of program states. We refer the readers to earlier works for details of the semantics [4]. Intuitively, an execution trace for a block \({ BL_{}}\) corresponds to the sequence of states obtained by executing the body, and extending the terminating sequences with the traces of the successor blocks (if any). A sequence of states for a block does not terminate if it either executes an \(\mathsf {assume}\ \phi \) or an \(\mathsf {assert}\ \phi \) statement in a state \({\sigma _{}} \in {\Sigma }\) such that \({{\sigma _{}}} \not \models {\phi }\). In the latter case, the successor state is \({ Err}\). The set of traces of a program is the set of traces for the start block \({ Start}\). Let \({\mathcal T}({ P_{}})\) be the set of all traces of a program \({ P_{}}\). A program \({ P_{}}\) is correct (denoted as \({} \models {{ P_{}}}\)) if \({\mathcal T}({ P_{}})\) does not contain a trace that ends in the state \({ Err}\). For a program \({ P_{}}\) that is not correct, we define a failure trace as a trace \({\tau _{}}\) that starts at \({ Start}\) and ends in the state \({ Err}\).
3 Angelic Verification

The user can provide a vocabulary \( Vocab_{}\) of acceptable specifications, along with a checker that can test membership of a formula \(\phi \) in \( Vocab_{}\). We show instances of \( Vocab_{}\) in Sect. 4.

The user can augment \({ P_{}}\) with a set of angelic assertions \(\hat{{\mathcal A}}\) at specific locations, with the expectation that any specification should not prove an assertion \(\mathsf {assert}\ { {e}} \in \hat{{\mathcal A}}\).
We term the resulting verification problem angelic as the verifier cooperates with the user (as opposed to playing an adversary) to find specifications that can prove the program. This can be seen as a particular mechanism to allow an expert user to customize the abductive inference problem tailored to their needs [20]. If no such specification can be found, it indicates that the verification failure of \({ P_{}}\) cannot be categorized into previously known buckets of false alarms.
We make these ideas more precise in the next few sections. In Sect. 3.1, we describe the notion of angelic correctness given \({ P_{}}\), \( Vocab_{}\) and \(\hat{{\mathcal A}}{}\). In Sect. 3.2, we describe an algorithm to prove angelic correctness using existing program verifiers.
3.1 Problem Formulation
Let \(\phi \in { Formula}\) be a well-scoped formula at the block \({ Start}\) of a program \({ P_{}}\). We say that a program \({ P_{}}\) is correct under \(\phi \) (denoted as \({\phi } \models {{ P_{}}}\)), if the augmented program \({ Start}_0 \ : \ \mathsf {assume}\ \phi \ ; \ \mathsf {goto}\ { Start}\) with “Start” block as \({ Start}_0\) is correct. In other words, the program \({ P_{}}\) is correct with a precondition \(\phi \).

Normal assertions \(A_1 \subseteq \mathcal A\) that constitute a (possibly empty) subset of the original assertions present in \({ P_{}}\), and

Angelic assertions \(A_2 \subseteq \hat{{\mathcal A}}\) that constitute a (possibly empty) subset of set of additional user supplied assertions.
Definition 1
(Permissive Precondition). For a program \({ P_{}}_{\mathcal A,\hat{{\mathcal A}}}\) and formula \(\phi \), \( Permissive({ P_{}}_{\mathcal A,\hat{{\mathcal A}}},\phi )\) holds if for every assertion \({ {s}} \in \hat{{\mathcal A}}\), if \({\phi } \models {{ P_{}}_{\emptyset ,\{s\}}}\), then \({\mathsf {true}} \models {{ P_{}}_{\emptyset ,\{s\}}}\).
In other words, a specification \(\phi \) is not allowed to prove any assertion \(s \in \hat{{\mathcal A}}\) that was not provable under the unconstrained specification \(\mathsf {true}\).
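Definition 1 can be illustrated on a finite toy model. The following is a minimal sketch, not the tool's implementation: `bar_reaches` is a hypothetical stand-in for a procedure with one branch on its input, the input domain is truncated to four values, and an angelic `assert false` at a branch is treated as provable when no admitted input reaches that branch.

```python
NULL = 0
INPUTS = range(0, 4)  # small finite domain standing in for all int inputs (assumption)

def bar_reaches(x):
    """Hypothetical procedure: returns the set of branch labels reached on input x."""
    return {"then"} if x != NULL else {"else"}

def provable(pre, label):
    """`assert false` at `label` is provable under `pre` iff no admitted input reaches it."""
    return all(label not in bar_reaches(x) for x in INPUTS if pre(x))

def permissive(pre, angelic_labels):
    """Definition 1: `pre` may prove an angelic assertion only if `true` already proves it."""
    return all(provable(lambda x: True, label)
               for label in angelic_labels
               if provable(pre, label))

angelic = ["then", "else"]                               # `assert false` in each branch
true_is_permissive = permissive(lambda x: True, angelic)
nonnull_is_permissive = permissive(lambda x: x != NULL, angelic)
```

In this model the precondition `x != NULL` proves the `assert false` in the "else" branch, which `true` does not, so it is rejected as not permissive.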
Definition 2
(Angelic Correctness). Given (i) a program \({ P_{}}\) with a set of normal assertions \(\mathcal A\), (ii) an angelic set of assertions \(\hat{{\mathcal A}}\), and (iii) a vocabulary \( Vocab_{}\) constraining a set of formulas at \({ Start}\), \({ P_{}}\) is angelically correct under (\( Vocab_{},\hat{{\mathcal A}}\)) if there exists a formula \(\phi \in Vocab_{}\) such that: (i) \({\phi } \models {{ P_{}}_{\mathcal A,\emptyset }}\), and (ii) \( Permissive({ P_{}}_{\emptyset ,\hat{{\mathcal A}}},\phi )\) holds.
If no such specification \(\phi \) exists, then we say that \({ P_{}}\) has an angelic bug with respect to \(( Vocab_{},\hat{{\mathcal A}}\)). In this case, we try to ensure the angelic correctness of \({ P_{}}\) with respect to a subset of the assertions in \({ P_{}}\); the rest of the assertions are flagged as angelic warnings.
Examples of Angelic Assertions \(\varvec{\hat{{\mathcal A}}.}\) If one provides \(\mathsf {assert}\ \mathsf {false}\) at \({ Start}\) to be part of \(\hat{{\mathcal A}}\), it disallows preconditions that are inconsistent with other preconditions of the program [20]. If we add \(\mathsf {assert}\ \mathsf {false}\) at the end of every basic block, it prevents us from creating preconditions that create dead code in the program. This has the effect of detecting semantic inconsistency or doomed bugs [19, 21, 23, 36]. Further, we can allow checking such assertions interprocedurally and at only a subset of locations (e.g. exclude defensive checks in callees). Finally, one can encode other domain knowledge using such assertions. For example, consider checking the correct lock usage for \({\mathsf {if(*) \{L1: \ assert \ \lnot locked(l1); \ lock(l1);\} \ else \ \{L2: \ assert \ locked(l2); \ unlock(l2);\}}}\). If the user expects an execution where \({\mathsf {l1}} = {\mathsf {l2}}\) at \({\mathsf {L2}}\), the angelic assertion \({\mathsf {assert \ l1 \not = l2}} \in \hat{{\mathcal A}}\) precludes the precondition \({\mathsf {\lnot locked(l1) \wedge locked(l2)}}\), and reveals a warning for at least one of the two locations. As another example, if the user has observed a runtime value v for variable \({\mathtt{{x}}}\) at a program location l, she can add an assertion \(\mathsf {assert}\ {\mathtt{{x}}} \ne v \in \hat{{\mathcal A}}\) at l to ensure that a specification does not preclude a known feasible behavior; further, the idea can be extended from feasible values to feasible intraprocedural path conditions.
3.2 Finding Angelic Bugs
Algorithm 1 describes a (semi) algorithm for proving angelic correctness of a program. In addition to the program, it takes as inputs the set of angelic assertions \(\hat{{\mathcal A}}\), and a vocabulary \( Vocab_{}\). On termination, the procedure returns a specification \({ E_{}}\) and a subset \({\mathcal A_1}\subseteq \mathcal A{}\) for which the resultant program is angelically correct under \({ E_{}}\). Lines 1 and 2 initialize the variables \({ E_{}}\) and \({\mathcal A_1}\), respectively. The loop in lines 3 to 16 performs the main act of blocking failure traces in \({ P_{}}\). First, we verify the assertions \({\mathcal A_1}\) over \({ P_{}}\). The routine tries to establish \({{ E_{}}} \models {{ P_{}}}\) using a sound and complete program verifier; the program verifier itself may never terminate. We return in line 6 if verification succeeds and \({ P_{}}\) contains no failure traces (\(\mathtt{NO\_TRACE}\)). In the event a failure trace \({\tau _{}}\) is present, we query a procedure \({ ExplainError}\) (see Sect. 4) to find a specification \(\phi \) that can prove that none of the executions along \({\tau _{}}\) fail an assertion. Line 10 checks if the addition of the new constraint \(\phi \) still ensures that the resulting specification \({ E_{1}}\) is permissive. If not, then it suppresses the assertion a that failed in \({\tau _{}}\) (by removing it from \({\mathcal A_1}\)) and outputs the trace \({\tau _{}}\) to the user. Otherwise, it adds \(\phi \) to the set of constraints collected so far. The loop repeats until verification succeeds in line 4. The procedure may fail to terminate if either the call to \( Verify\) does not terminate, or the loop in line 3 does not terminate due to an unbounded number of failure traces.
Theorem 1
On termination, Algorithm 1 returns a pair of precondition \({ E_{}}\) and a subset \({\mathcal A_1}\subseteq \mathcal A\) such that (i) \({{ E_{}}} \models {{ P_{}}}\) when only assertions in \({\mathcal A_1}\) are enabled, and (ii) \( Permissive({ P_{}}_{\mathcal A,\hat{{\mathcal A}}},E)\).
The proof follows directly from the check in line 4 that establishes (i), and line 10 that ensures permissiveness.
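The loop described for Algorithm 1 can be sketched on a finite toy program. Everything below is illustrative and assumed: the program `run`, the assertion names `a1` and `a2`, the two-formula vocabulary, and the brute-force `verify`, which stands in for a real verifier such as the one the tool uses. A result of `None` from `explain_error` plays the role of \(\mathsf {false}\).

```python
from itertools import product

NULL = 0
DOMAIN = list(product(range(3), repeat=2))     # all (x, y) inputs of the toy program

def run(x, y, enabled):
    """Hypothetical toy program: returns (path, failed_assertion_or_None)."""
    if x != NULL:
        path = ("then",)
        if "a1" in enabled and y == NULL:      # a1: assert y != NULL
            return path, "a1"
    else:
        path = ("else",)
        if "a2" in enabled:                    # a2: assert x != NULL
            return path, "a2"
    return path, None

VOCAB = {"x != NULL": lambda x, y: x != NULL,
         "y != NULL": lambda x, y: y != NULL}
ANGELIC = ["then", "else"]                     # `assert false` placed in each branch

def holds(spec, inp):
    return all(VOCAB[name](*inp) for name in spec)

def verify(spec, enabled):
    """Brute-force stand-in for Verify: a failing (input, path, assertion) or None."""
    for inp in DOMAIN:
        if holds(spec, inp):
            path, failed = run(*inp, enabled)
            if failed:
                return inp, path, failed
    return None

def explain_error(trace, enabled):
    """First vocabulary formula that blocks every failure along the trace's path."""
    _, path, failed = trace
    for name, pred in VOCAB.items():
        if not any(pred(*i) and run(*i, enabled) == (path, failed) for i in DOMAIN):
            return name
    return None

def permissive(spec):
    def provable(s, label):
        return not any(holds(s, i) and label in run(*i, set())[0] for i in DOMAIN)
    return all(provable([], label) for label in ANGELIC if provable(spec, label))

def angelic_verify():
    spec, enabled, warnings = [], {"a1", "a2"}, []
    while True:
        trace = verify(spec, enabled)
        if trace is None:                      # no failure trace left
            return spec, enabled, warnings
        phi = explain_error(trace, enabled)
        if phi is None or not permissive(spec + [phi]):
            enabled = enabled - {trace[2]}     # suppress the assertion, report the trace
            warnings.append(trace[2])
        else:
            spec = spec + [phi]                # block the trace with the new constraint

spec, enabled, warnings = angelic_verify()
```

On this toy input the loop reports `a2` as an angelic warning (blocking it would make the "else" branch dead code) and blocks the failure of `a1` with `y != NULL`.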
4 ExplainError
Problem. Given a program \({ P_{}}\) that is not correct, let \({\tau _{}}\) be a failure trace of \({ P_{}}\). Since a trace can be represented as a valid program (\({ Program}\)) in our language (with a single block containing the sequence of statements ending in an \(\mathsf {assert}\) statement), we will treat \({\tau _{}}\) as a program with a single control flow path.
Informally, the goal of ExplainError is to return a precondition \(\phi \) from a given vocabulary \( Vocab_{}\) such that \({\phi } \models {{\tau _{}}}\), or \(\mathsf {false}\) if no such precondition exists. ExplainError takes as input the following: (a) a program \({ P_{}}\), (b) a failure trace \({\tau _{}}\) in \({ P_{}}\) represented as a program and (c) a vocabulary \( Vocab_{}\) that specifies syntactic restrictions on formulas to search over. It returns a formula \(\phi \) such that \({\phi } \models {{\tau _{}}}\) and \(\phi \in Vocab_{} \cup \{\mathsf {false}\}\). It returns \(\mathsf {false}\) either when (a) the vocabulary does not contain any formula \(\phi \) for which \({\phi } \models {{\tau _{}}}\), or (b) the search does not terminate (say due to a timeout).
Note that the weakest liberal precondition (\({ wlp}\)) of the trace [18] is guaranteed to be the weakest possible blocking constraint; however, it is usually very specific to the trace and may require enumerating all the concrete failing traces inside Algorithm 1. Moreover, the resulting formulas for long traces are often not suitable for human consumption. When ExplainError returns a formula other than \(\mathsf {false}\), one may expect \(\phi \) to be the weakest (most permissive) constraint in \( Vocab_{}\) that blocks the failure path. However, this is not possible for several reasons: (a) efficiency concerns preclude searching for the weakest constraint, and (b) \( Vocab_{}\) may not be closed under disjunction and therefore the weakest constraint may not be defined. Thus the primary goals of ExplainError are (a) to be scalable (so that it can be invoked in the main loop in Algorithm 1), and (b) to produce constraints that are concise even if not the weakest over \( Vocab_{}\).

\( Vocab_{}.{ Atoms}\): a template for the set of atomic formulas that can appear in a blocking constraint. This can range over equalities (\(e_1 = e_2\)), difference constraints (\(e_1 \le e_2 + c\)), or some other syntactic pattern.

\( Vocab_{}.{ Bool}\): the complexity of the Boolean structure of the blocking constraint. One may choose a clausal formula (\(\bigvee _i e_i\)), a cube formula (\(\bigwedge _i e_i\)), or an arbitrary conjunctive normal form (CNF) formula (\(\bigwedge _j (\bigvee _i e_i)\)) over atomic formulas \(e_i\).
Initially, we assume that we do not have internal nondeterminism in the form of \(\mathsf {havoc}{}\) or calls to external libraries in the trace \({\tau _{}}\) – we will describe this extension later in this section.
Let \({ wlp}(s, \phi )\) be the weakest liberal precondition transformer for a \(s \in { Stmt}\) and \(\phi \in { Formula}\) [18]. \({ wlp}(s, \phi )\) is the weakest formula representing states from which executing s does not lead to assertion failure and on termination satisfies \(\phi \). It is defined as follows on the structure of statements: \({ wlp}(\mathsf {skip}, \phi ) = \phi \), \({ wlp}({\mathtt{{x}}} := e, \phi ) = \phi [e/{\mathtt{{x}}}]\) (where \(\phi [e/{\mathtt{{x}}}]\) denotes substituting e for all free occurrences of \({\mathtt{{x}}}\)), \({ wlp}(\mathsf {assume}\ \psi , \phi ) = \psi \Rightarrow \phi \), \({ wlp}(\mathsf {assert}\ \psi , \phi ) = \psi \wedge \phi \), and \({ wlp}({s};{t}, \phi ) = { wlp}(s, { wlp}(t, \phi ))\). Thus \({ wlp}({\tau _{}}, \mathsf {true})\) will ensure that no assertion fails along \({\tau _{}}\). Our current algorithm (Algorithm 2) provides various options to create predicate (under) covers of \({ wlp}({\tau _{}}, \mathsf {true})\) [22], formulas that imply \({ wlp}({\tau _{}}, \mathsf {true})\). Such formulas are guaranteed to block the trace \({\tau _{}}\) from failing.
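The structural definition of \({ wlp}\) translates almost line by line into code. The sketch below is illustrative only: it represents formulas semantically, as Python predicates over a state dictionary, rather than as syntactic formulas, and the tuple encoding of statements is an assumption made for this example.

```python
def wlp(stmt, post):
    """Weakest liberal precondition of `stmt` w.r.t. `post`, both as state predicates."""
    kind = stmt[0]
    if kind == "skip":
        return post
    if kind == "assign":                       # wlp(x := e, post) = post[e/x]
        _, var, expr = stmt
        return lambda s: post({**s, var: expr(s)})
    if kind == "assume":                       # wlp(assume psi, post) = psi => post
        psi = stmt[1]
        return lambda s: (not psi(s)) or post(s)
    if kind == "assert":                       # wlp(assert psi, post) = psi /\ post
        psi = stmt[1]
        return lambda s: psi(s) and post(s)
    if kind == "seq":                          # wlp(s; t, post) = wlp(s, wlp(t, post))
        _, first, second = stmt
        return wlp(first, wlp(second, post))
    raise ValueError(f"unknown statement kind: {kind}")

# Trace: y := x - 1; assert y >= 0
trace = ("seq",
         ("assign", "y", lambda s: s["x"] - 1),
         ("assert", lambda s: s["y"] >= 0))
pre = wlp(trace, lambda s: True)

# Same trace guarded by `assume x > 0`
guarded = ("seq", ("assume", lambda s: s["x"] > 0), trace)
guarded_pre = wlp(guarded, lambda s: True)
```

Here `pre` holds exactly on states with `x >= 1`, while `guarded_pre` holds on every state because the `assume` already rules out the failing inputs.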
The first step \({ ControlSlice}\) performs an optimization to prune conditionals from \({\tau _{}}\) that do not control dominate the failing assertion, by performing a variant of the path slicing approach [25]. Line 2 performs the \({ wlp}\) computation on the resulting trace \({\tau _{1}}\). At this point, \(\phi _1\) is a Boolean combination of literals from arithmetic, equalities and array theories in satisfiability modulo theories (SMT) [34]. \({ EliminateMapUpdates}\) (in line 3) eliminates any occurrence of \( write\) from the formula using rewrite rules such as \( read( write(e_1, e_2, e_3), e_4) \rightarrow e_2 = e_4 \ ? \ e_3 : read(e_1, e_4)\). This rule introduces new equality (aliasing) constraints in the resulting formula that are not present directly in \({\tau _{}}\). Line 4 chooses a set of atomic formulas from \(\phi _2\) that match the vocabulary. Finally, the conditional in Line 5 determines the Boolean structure in the resulting expression.
The \(\mathtt{MONOMIAL}\) option specifies that the block expression is a disjunction of atoms from \({ atoms_{1}}\). Line 7 collects the set of atoms in \({ atoms_{1}}\) that imply \(\phi _2\), which in turn implies \({ wlp}({\tau _{}}, \mathsf {true})\). We return the clause representing the disjunction of such atoms, which in turn implies \({ wlp}({\tau _{}}, \mathsf {true})\). The more expensive \({ ProjectAtoms}(\phi _2, { atoms_{1}})\) returns a formula \(\phi _3\) that is a CNF expression over \({ atoms_{1}}\), such that \(\phi _3 \Rightarrow \phi _2\), by performing Boolean quantifier elimination of the atoms not present in \({ atoms_{1}}\). We first transform the formula \(\phi _2\) into conjunctive normal form (CNF) by repeatedly applying rewrite rules such as \(\phi _1 \vee (\phi _2 \wedge \phi _3) \rightarrow (\phi _1 \vee \phi _2) \wedge (\phi _1 \vee \phi _3)\). We employ a theorem prover at each step to try to simplify intermediate expressions to \(\mathsf {true}\) or \(\mathsf {false}\). Finally, for each clause c in the CNF, we remove any literal in c that is not present in the set of atoms \({ atoms_{1}}\).
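The two Boolean-structure options can be sketched propositionally. This is a simplified illustration under stated assumptions: atoms are opaque strings treated as independent propositional variables (no theory reasoning and no negated literals), implication is checked by truth table instead of a theorem prover, and the atom `k > 0` is a made-up example of an atom outside the vocabulary.

```python
from itertools import product

def atoms_of(f):
    return {f} if isinstance(f, str) else atoms_of(f[1]) | atoms_of(f[2])

def evaluate(f, env):
    if isinstance(f, str):
        return env[f]
    op, a, b = f
    if op == "and":
        return evaluate(a, env) and evaluate(b, env)
    return evaluate(a, env) or evaluate(b, env)

def atom_implies(atom, f):
    """Truth-table check that `atom` implies `f` (stand-in for a prover query)."""
    names = sorted(atoms_of(f) | {atom})
    envs = (dict(zip(names, vals)) for vals in product([False, True], repeat=len(names)))
    return all(evaluate(f, env) for env in envs if env[atom])

def monomial(f, vocab_atoms):
    """Clause (a set read as a disjunction) of vocabulary atoms that each imply f."""
    return {a for a in vocab_atoms if atom_implies(a, f)}   # empty set means `false`

def to_cnf(f):
    """Distribute `or` over `and`; the result is a list of clauses (frozensets)."""
    if isinstance(f, str):
        return [frozenset([f])]
    op, a, b = f
    if op == "and":
        return to_cnf(a) + to_cnf(b)
    return [ca | cb for ca in to_cnf(a) for cb in to_cnf(b)]

def project_atoms(f, vocab_atoms):
    """Drop every literal outside the vocabulary from each CNF clause (strengthens f)."""
    return [clause & vocab_atoms for clause in to_cnf(f)]

vocab = frozenset({"x1 != x2", "m[x2] != NULL"})
phi2 = ("or", ("and", "x1 != x2", "m[x2] != NULL"), "k > 0")
clause = monomial(phi2, vocab)
cnf = project_atoms(phi2, vocab)
```

For this `phi2` no single vocabulary atom implies the formula, so the monomial option yields the empty clause, whereas projection yields the conjunction of the two disequalities. An empty clause in the projected result would likewise stand for \(\mathsf {false}\).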
Example. Consider the example \({\mathsf {FooBar}}\) in Fig. 1, and the trace \({\tau _{}}\) that corresponds to violation of \({\mathsf {assert \ w \ne NULL}}\). The trace is a sequential composition of the following statements: \({\mathsf {z := x\_1}}\), \({\mathsf {m[z] := NULL}}\), \({\mathsf {x := x\_2}}\), \({\mathsf {w := m[x]}}\), \({\mathsf {assert \ w \ne NULL}}\), where we have replaced calls to \({\mathsf {Lib1}}\) and \({\mathsf {Lib2}}\) with \({\mathsf {x\_1}}\) and \({\mathsf {x\_2}}\) respectively. \({ wlp}({\tau _{}}, \mathsf {true})\) is \({\mathsf {read(write(m,x\_1,NULL),x\_2) \ \ne \ NULL}}\), which after applying \({ EliminateMapUpdates}\) would result in the expression \( \left( {\mathsf {x\_1 \ne x\_2 \ \wedge \ m[x\_2] \ne NULL}}\right) \). Notice that this is nearly identical to the blocking clause (except the quantifiers and triggers) returned while analyzing \({\mathsf {FooBar}}\) in Fig. 3. Let us allow any disequality \(e_1 \ne e_2\) atoms in \( Vocab_{}\). If we only allow \(\mathtt{MONOMIAL}\) Boolean structure, there does not exist any clause over these atoms (weaker than \(\mathsf {false}\)) that suppresses the trace.
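The rewrite step in this example can be reproduced with a small term rewriter. The sketch below is an illustration under assumptions, not the tool's code: terms are nested tuples, only the single \( read\)-over-\( write\) rule is implemented, and the evaluator `ev` (maps as Python dicts, `NULL` as 0) exists only to check that the rewrite preserves meaning.

```python
def eliminate_map_updates(t):
    """Rewrite read(write(m, i, v), j) into ite(i = j, v, read(m, j)), bottom-up."""
    if isinstance(t, str):
        return t
    t = (t[0],) + tuple(eliminate_map_updates(a) for a in t[1:])
    if t[0] == "read" and isinstance(t[1], tuple) and t[1][0] == "write":
        _, (_, m, i, v), j = t
        return ("ite", ("eq", i, j), v, eliminate_map_updates(("read", m, j)))
    return t

def ev(t, env):
    """Evaluate a term in an environment; used only to sanity-check the rewrite."""
    if isinstance(t, str):
        return env[t]
    op, *args = t
    vals = [ev(a, env) for a in args]
    if op == "read":
        return vals[0][vals[1]]
    if op == "write":
        return {**vals[0], vals[1]: vals[2]}
    if op == "ite":
        return vals[1] if vals[0] else vals[2]
    if op == "eq":
        return vals[0] == vals[1]
    if op == "neq":
        return vals[0] != vals[1]
    raise ValueError(op)

# wlp of the FooBar trace: read(write(m, x_1, NULL), x_2) != NULL
wlp_term = ("neq", ("read", ("write", "m", "x_1", "NULL"), "x_2"), "NULL")
rewritten = eliminate_map_updates(wlp_term)
```

The rewritten term is `ite(x_1 = x_2, NULL, m[x_2]) != NULL`; since the "then" case compares `NULL` with itself, it simplifies to the conjunction \({\mathsf {x\_1 \ne x\_2 \ \wedge \ m[x\_2] \ne NULL}}\) given in the example.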
Internal Nondeterminism. In the presence of only input nondeterminism (parameters and globals), the \({ wlp}({\tau _{}}, \mathsf {true})\) is a well-scoped expression at entry in terms of parameters and globals. In the presence of internal nondeterminism (due to \(\mathsf {havoc}{}\) statements either present explicitly or implicitly for nondeterministic initialization of local variables), the target of a \(\mathsf {havoc}\) is universally quantified away (\({ wlp}(\mathsf {havoc}\ {\mathtt{{x}}}, \phi ) = \forall {u}: \ \phi [u/{\mathtt{{x}}}]\)). However, this is unsatisfactory for two reasons: (a) one has to introduce a fresh quantified variable for different call sites of a function (say \({\mathsf {Lib1}}\) in Fig. 1), and (b) the quantified formula does not have a good trigger [17] to instantiate the universally quantified variables u. For a quantified formula, a trigger is a set of subexpressions containing all the bound variables. To address both these issues, we introduce a distinct predicate \({\mathsf {unknown\_i}}\) after the \(\mathtt{i}\)th syntactic call to \(\mathsf {havoc}{}\) and introduce an assume statement after the \(\mathsf {havoc}\) (Fig. 2): \({\mathsf {assume \ unknown\_i(x)}}\). The \({ wlp}\) rules for \(\mathsf {assume}\) and \(\mathsf {havoc}{}\) ensure that the quantifiers are more well-behaved as the resultant formulas have \({\mathsf {unknown\_i(x)}}\) as a trigger (see Fig. 3).
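As a worked step, combining the \({ wlp}\) rules for sequencing, \(\mathsf {assume}\) and \(\mathsf {havoc}\) stated above, the statement pair introduced for the \(\mathtt{i}\)th \(\mathsf {havoc}\) gives:

\[ { wlp}(\mathsf {havoc}\ {\mathtt{{x}}};\ \mathsf {assume}\ {\mathsf {unknown\_i}}({\mathtt{{x}}}),\ \phi ) = { wlp}(\mathsf {havoc}\ {\mathtt{{x}}},\ {\mathsf {unknown\_i}}({\mathtt{{x}}}) \Rightarrow \phi ) = \forall u:\ {\mathsf {unknown\_i}}(u) \Rightarrow \phi [u/{\mathtt{{x}}}] \]

The subexpression \({\mathsf {unknown\_i}}(u)\) contains the bound variable and can therefore serve as a trigger. As an illustration only, assuming the returns of \({\mathsf {Lib1}}\) and \({\mathsf {Lib2}}\) in the \({\mathsf {FooBar}}\) trace are modeled this way with predicates numbered 1 and 2 (the numbering is an assumption here), the expression from the earlier example would take the shape

\[ \forall u_1:\ {\mathsf {unknown\_1}}(u_1) \Rightarrow \forall u_2:\ {\mathsf {unknown\_2}}(u_2) \Rightarrow \left( u_1 \ne u_2 \ \wedge \ {\mathsf {m}}[u_2] \ne {\mathsf {NULL}}\right) \]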
5 Evaluation
We have implemented the ideas described in this paper (Algorithms 1 and 2) in a tool called AngelicVerifier, available with sources.^{1} AngelicVerifier uses the Corral verifier [31] as a black box to implement the check \( Verify{}\) used in Algorithm 1. Corral performs interprocedural analysis of programs written in the Boogie language; the Boogie program can be generated from either C [15], .NET [5] or Java programs [1]. As an optimization, while running ExplainError, AngelicVerifier first tries the \(\mathtt{MONOMIAL}\) option and falls back to \({ ProjectAtoms}\) when the former returns \(\mathsf {false}\).
5.1 Comparison with SDV
SDV is a tool offered by Microsoft to third-party driver developers. It checks for typestate properties (e.g., locks are acquired and released in strict alternation) on Windows device drivers. SDV checks these properties by introducing monitors in the program in the form of global variables, and instrumenting the property as assertions in the program. We chose a subset of benchmarks and properties from SDV’s verification suite that correspond to drivers distributed in the Windows Driver Kit (WDK); their characteristics are mentioned in Fig. 5. We picked a total of 18 driver-property pairs, in which SDV reports a defect on 13 of them. Figure 5 shows the range for the number of procedures, lines of code (contained in C files) and the total time taken by SDV (in 1000s of seconds) on all of the buggy or correct instances.

default: The vocabulary includes aliasing constraints (\(e_1 \ne e_2\)) as well as arbitrary expressions over monitor variables.

noTS: The vocabulary only includes aliasing constraints.

noAlias: The vocabulary only includes expressions over the monitor variables.

noEE: The vocabulary is empty. In this case, all traces returned by Corral are treated as bugs without running ExplainError. This option simulates a demonic environment.

default+harness: This is the same as default, but the input program includes a stripped version of the harness used by SDV. This harness initializes the monitor variables and calls specific procedures in the driver. (The actual harness used by SDV is several times bigger and includes initializations of various data structures and flags as well.)

The assertion in Fig. 6(a) will be reported as a bug by noTS but not default because \({\mathsf {LockDepth > 1}}\) is not a valid atom for noTS.

The assertion in Fig. 6(c) will be reported as a bug by noAlias but not default because it requires a specification that constrains aliasing in the environment. For instance, default constrains the environment by imposing \((x \not = {\mathsf {irp}} \wedge y \not = {\mathsf {irp}}) \vee (z \not = {\mathsf {irp}} \wedge y \not = {\mathsf {irp}})\), where x is \({\mathsf {devobj\rightarrow }}\) \({\mathsf {DeviceExtension\rightarrow FlushIrp}}\), y is \({\mathsf {devobj\rightarrow DeviceExtension\rightarrow LockIrp}}\) and z is \({\mathsf {devobj\rightarrow DeviceExtension\rightarrow BlockIrp}}\).

The procedures called \({\mathsf {Harness}}\) in Fig. 6 are only available under the setting default+harness. The assertion in Fig. 6(a) will not be reported by default as it is always possible (irrespective of the number of calls to \({\mathsf {KeAcquireSpinLock}}\) and \({\mathsf {KeReleaseSpinLock}}\)) to construct an initial value of \({\mathsf {LockDepth}}\) that suppresses the assertion. When the (stripped) harness is present, this assertion will be reported. Note that the assertion failure in Fig. 6(b) will be caught by both default and default+harness.
The results on SDV benchmarks are summarized in Table 1. For each \({ AngelicVerifier}\) configuration, we report the cumulative running time in thousands of seconds (CPU), the numbers of bugs reported (B), and the number of false positives (FP) and false negatives (FN). The experiments were run (sequentially, single-threaded) on a server-class machine with two Intel(R) Xeon(R) processors (16 logical cores) executing at 2.4 GHz with 32 GB RAM.
noEE reports a large number of false positives, confirming that a demonic environment leads to spurious warnings. The default configuration, on the other hand, reports no false positives! It is overly optimistic in some cases, resulting in missed defects. It is clear that the out-of-the-box experience, i.e., before environment models have been written, of AngelicVerifier (low false positives, few false negatives) is far superior to a demonic verifier (very high false positives, few false negatives).
Table 1. Results on SDV benchmarks
Configuration  Bench  CPU(Ks)  B  FP  FN
default  Correct  9.97  0  0  0
default  Buggy  3.19  9  0  4
default+harness  Correct  16.8  0  0  0
default+harness  Buggy  3.52  13  0  0
noEE  Correct  0.28  12  12  0
noEE  Buggy  0.47  21  13  5
noTS  Correct  4.20  2  2  0
noTS  Buggy  2.58  14  3  2
noAlias  Correct  15.1  0  0  0
noAlias  Buggy  1.42  10  3  6
Table 2. Comparison against PREfix on checking for null-pointer dereferences
Bench  Procs  KLOC  PREfix B  default CPU(Ks)  default B  default PM  default FP  default FN  default PREFP  default PREFN  defaultAA CPU(Ks)  defaultAA B
Mod 1  453  37.2  14  2.7  26  14  4  0  0  1  1.8  26 
Mod 2  64  6.5  3  0.2  0  0  0  3  0  0  0.2  0 
Mod 3  479  56.6  5  5.8  11  3  4  2  0  1  1.7  6 
Mod 4  382  37.8  4  1.8  3  0  0  0  4  3  1.1  2 
Mod 5  284  30.9  6  0.8  12  6  1  0  0  0  0.4  11 
Mod 6  37  8.4  7  0.1  10  7  0  0  0  0  0.1  10 
Mod 7  184  20.9  10  0.6  11  10  0  0  0  1  0.4  11 
Mod 8  400  43.8  5  2.9  15  5  1  0  0  1  1.0  15 
Mod 9  40  3.2  7  0.1  8  7  0  0  0  0  0.1  8 
Mod 10  998  76.5  7  24.9  8  3  1  4  0  4  16.0  4 
total  –  321  68  39.9  104  54  11  9  4  11  22.8  93 
5.2 Comparison Against PREfix
PREfix is a production tool used internally within Microsoft. It checks for several kinds of programming errors, including checking for null-pointer dereferences, on the Windows code base. We targeted AngelicVerifier to find null-pointer exceptions and compared against PREfix on 10 modules selected randomly, such that PREfix reported at least one defect in the module. Table 2 reports the sizes of these modules. (The names are hidden for proprietary reasons.)
We used two AngelicVerifier configurations: defaultAA uses a vocabulary of only aliasing constraints. default uses the same vocabulary along with angelic assertions: an \(\mathsf {assert}\ \mathsf {false}\) is injected after any statement of the form \({\mathsf {\mathsf {assume}\ e \text { == } null}}\). This enforces that if the programmer anticipated an expression being null at some point in the program, then AngelicVerifier should not impose an environment specification that makes this check redundant.
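The injection of angelic assertions described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the program is available as a list of Boogie-like statement strings; the regular expression `NULL_ASSUME` and the function `inject_angelic_assertions` are hypothetical names, and an actual tool would most likely work on the program's syntax tree rather than on text.

```python
import re

# Hypothetical textual pattern for a null check of the form `assume e == null;`.
# This is an assumption of the sketch: a real tool would match on the AST.
NULL_ASSUME = re.compile(r"^\s*assume\s+(.+?)\s*==\s*null\s*;\s*$")


def inject_angelic_assertions(statements):
    """Return a copy of `statements` with `assert false;` after each null-check assume."""
    out = []
    for stmt in statements:
        out.append(stmt)
        if NULL_ASSUME.match(stmt):
            # If an environment specification makes this point unreachable,
            # it has made the programmer's null check redundant.
            out.append("assert false;  // angelic assertion")
    return out


program = [
    "call p := Lib1();",
    "assume p == null;",
    "return;",
]
instrumented = inject_angelic_assertions(program)
for line in instrumented:
    print(line)
```

In this sketch the injected assertion is one that should remain violable: a candidate environment specification that proves it (for example, by assuming that `Lib1` never returns null) would be considered too strong.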
Scalability. This set of benchmarks was several times harder than the SDV benchmarks for our tool chain. This is because of the larger codebase, but also because checking nullness requires tracking of pointers in the heap, whereas SDV’s typestate properties are mostly control-flow based and require minimal tracking of pointers. To address the scale, we use two standard tricks. First, we use a cheap alias analysis to prove many of the dereferences safe and only focus AngelicVerifier on the rest. Second, AngelicVerifier explores different entry points of the program in parallel. We used the same machine as for the previous experiment, and limited parallelism to 16 threads (one per available core). Further, we optimized ExplainError to avoid looking at \(\mathsf {assume}\) statements along the trace, i.e., it can only block the failing assertion. This can result in ExplainError returning a stronger-than-necessary condition but improves the convergence time of AngelicVerifier. This is a limitation that we are planning to address in future work.
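The two scaling tricks can be sketched as follows. This is only an illustration under stated assumptions: `filter_unproved`, `verify_in_parallel`, and `toy_verify` are hypothetical names, the set of dereferences proved safe is taken as given, and the per-entry-point verifier is replaced by a toy stand-in. In Python, a thread pool yields real parallelism only if each task waits on an external verifier process.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_unproved(dereferences, proved_safe):
    """Keep only the dereferences that the cheap alias analysis could not prove safe."""
    return [d for d in dereferences if d not in proved_safe]


def verify_in_parallel(entry_points, verify_one, max_workers=16):
    """Run `verify_one` on every entry point using a bounded thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with entry_points.
        return dict(zip(entry_points, pool.map(verify_one, entry_points)))


# Toy data (hypothetical): two of four dereferences are already proved safe.
remaining = filter_unproved(["d1", "d2", "d3", "d4"], proved_safe={"d1", "d3"})


def toy_verify(entry_point):
    """Stand-in for an expensive verifier run; returns the length of the name."""
    return len(entry_point)


results = verify_in_parallel(["Foo", "FooBar"], toy_verify)
```

Bounding the pool at 16 workers mirrors the one-thread-per-core limit used in the experiment.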
Table 2 shows the comparison between PREfix and AngelicVerifier. In each case, the number of bug reports is indicated as B and the running time as CPU (in thousands of seconds). We found AngelicVerifier to be more verbose than PREfix, producing a higher number of reports (104 to 68). However, this was mostly because AngelicVerifier reported multiple failures with the same cause. For instance, \({\mathsf {x = null; if(...) \{ *x = ... \} else \{ *x = ... \}}}\) would be flagged as two buggy traces by AngelicVerifier but only one by PREfix. Thus, there is potential for post-processing AngelicVerifier’s output, but this is orthogonal to the goals of this paper.
We report the number of PREfix traces matched by some trace of AngelicVerifier as PM. To save effort, we consider all such traces as true positives. We manually examined the rest of the traces. We classified traces reported by AngelicVerifier but not by PREfix as either false positives of AngelicVerifier (FP) or as false negatives of PREfix (PREFN). The columns FN and PREFP are the duals, for traces reported by PREfix but not by AngelicVerifier.
PREfix is not a desktop application; one can only invoke it as a background service that runs on a dedicated cluster. Consequently, we do not have the running times of PREfix. AngelicVerifier takes 11 hours to consume all benchmarks, totaling 321 KLOC, which is very reasonable (for, say, overnight testing on a single machine).
Most importantly, AngelicVerifier is able to find most (80%) of the bugs caught by PREfix, without any environment modeling! We verified that under a demonic environment, the Corral verifier reports 396 traces, most of which are false positives.
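As a quick check of the headline figures, the short calculation below recomputes the match rate and the total running time from the totals row of Table 2 as reported. The variable names are ours; the numbers are copied from the table.

```python
# Totals taken from the last row of Table 2, as reported.
prefix_reports = 68      # B for PREfix
matched_reports = 54     # PM: PREfix traces matched by some AngelicVerifier trace
cpu_kiloseconds = 39.9   # CPU(Ks) for the default configuration

match_rate = matched_reports / prefix_reports
cpu_hours = cpu_kiloseconds * 1000 / 3600

print(f"matched: {match_rate:.1%}")      # matched: 79.4%
print(f"CPU time: {cpu_hours:.1f} h")    # CPU time: 11.1 h
```

Both values agree with the rounded figures quoted in the text (about 80% and about 11 hours).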
AngelicVerifier has 11 false positives; 5 of these are due to missing stubs (e.g., a call to the \({\mathsf {KeBugCheck}}\) routine does not return, but AngelicVerifier, in the absence of its implementation, does not consider this to be a valid specification). All of these 5 were suppressed when we added a model of the missing stubs. The other 6 reports turned out to stem from a bug in our compiler front-end, which produced the wrong IR for certain features of C. (Thus, they are not issues with AngelicVerifier.) AngelicVerifier has 9 false negatives. Out of these, 1 is due to a missing stub (where it was valid for it to return a null pointer), 4 due to Corral timing out, and 5 due to our front-end issues.
Interestingly, PREfix misses 11 valid defects that AngelicVerifier reports. Out of these, 6 are reported by AngelicVerifier because it finds an inconsistency with an angelic assertion; we believe PREfix does not look for inconsistencies. We are unsure of the reason why PREfix misses the other 5. We have reported these new defects to the product teams and are awaiting a reply. We also found 4 false positives in PREfix’s results (due to infeasible path conditions).
A comparison between default and defaultAA reveals that 11 traces were found because of an inconsistency with an angelic assertion. We have already mentioned that 6 of these are valid defects. The other 5 are again due to front-end issues.
In summary, AngelicVerifier matched 80% of PREfix’s reports, found new defects, and reported very few false positives.
6 Related Work
Our work is closely related to previous work on abductive reasoning [7, 10, 11, 20] in program verification. Dillig et al. [20] perform abductive reasoning based on quantifier elimination of variables in \({ wlp}\) that do not appear in the minimum satisfying assignment of \(\lnot { wlp}\). The method requires quantifier elimination, which is difficult in the presence of richer theories such as quantifiers and uninterpreted functions. Our method \({ ProjectAtoms}\) can be seen as a (lightweight) method for performing Boolean quantifier elimination (without interpreting the theory predicates) that we have found to be effective in practice. It can be shown that the specifications obtained by the two methods can be incomparable, even for arithmetic programs. Calcagno et al. use bi-abductive reasoning to perform bottom-up shape analysis [10] of programs, but only in the context of intraprocedural reasoning. In comparison with this work, we provide configurability by being able to control parts of the vocabulary and the check for permissiveness using \(\hat{{\mathcal A}}\). The work on almost-correct specifications [7] provides a method for minimally weakening the \({ wlp}\) over a set of predicates to construct specifications that disallow dead code. However, the method is expensive and can only be applied intraprocedurally.
Several program verification techniques have been proposed to detect semantic inconsistency bugs [21] in recent years [19, 23, 36]. Our work can be instantiated to detect this class of bugs (even interprocedurally); however, it may not be the most scalable approach to perform the checks. The work on angelic nondeterminism [8] allows for checking whether the nondeterministic operations can be replaced with deterministic code so that the assertions succeed. Although similar in principle, our end goal is bug finding with high confidence, as opposed to program synthesis. The work on angelic debugging [12] and BugAssist [26] similarly looks for relevant expressions to relax in order to fix a failing test case. The difference is that the focus there is more on debugging failing test cases and repairing a program.
The work on ranking static analysis warnings using statistical measures is orthogonal and perhaps complementary to our technique [28]. Since these techniques do not exploit program semantics, they can only be used as a post-processing step (thus offering little control to users of a tool). Finally, work on differential static analysis [2] can be leveraged to suppress a class of warnings with respect to another program that can serve as a specification [29, 32]. Our work does not require any additional program as a specification and therefore can be more readily applied to standard verification tasks. The work on CBUGS [27] leverages sequential interleavings as a specification while checking concurrent programs.
7 Conclusions
We presented the angelic verification framework that constrains a verifier to search for warnings that cannot be precluded with acceptable specifications over unknowns from the environment. Our framework is parameterized to allow a user to choose different instantiations to fit the precision-recall tradeoff. Preliminary experiments indicate that such a tool can indeed be competitive with industrial tools, even without any modeling effort. With subsequent modeling (e.g. adding a harness), the same tool can find more interesting warnings.
Footnotes
1. At http://corral.codeplex.com, project \({\mathsf {AddOns\backslash AngelicVerifierNull}}\).
References
1. Arlt, S., Schäf, M.: Joogie: infeasible code detection for Java. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 767–773. Springer, Heidelberg (2012)
2. Lahiri, S.K., Vaswani, K., Hoare, C.A.R.: Differential static analysis: opportunities, applications, and challenges. In: Proceedings of the Workshop on Future of Software Engineering Research, FoSER 2010, at the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering, November 7–11, 2010, pp. 201–204, Santa Fe, NM, USA (2010)
3. Ball, T., Levin, V., Rajamani, S.K.: A decade of software model checking with SLAM. Commun. ACM 54(7), 68–76 (2011)
4. Barnett, M., Leino, K.R.M.: Weakest-precondition of unstructured programs. In: Program Analysis For Software Tools and Engineering (PASTE 2005), pp. 82–87 (2005)
5. Barnett, M., Qadeer, S.: BCT: a translator from MSIL to Boogie. In: Seventh Workshop on Bytecode Semantics, Verification, Analysis and Transformation (2012)
6. Bessey, A., Block, K., Chelf, B., Chou, A., Fulton, B., Hallem, S., Henri-Gros, C., Kamsky, A., McPeak, S., Engler, D.: A few billion lines of code later: using static analysis to find bugs in the real world. Commun. ACM 53(2), 66–75 (2010)
7. Blackshear, S., Lahiri, S.K.: Almost-correct specifications: a modular semantic framework for assigning confidence to warnings. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2013, pp. 209–218, Seattle, WA, USA, 16–19 Jun 2013
8. Bodík, R., Chandra, S., Galenson, J., Kimelman, D., Tung, N., Barman, S., Rodarmor, C.: Programming with angelic nondeterminism. In: Principles of Programming Languages (POPL 2010), pp. 339–352 (2010)
9. Bush, W.R., Pincus, J.D., Sielaff, D.J.: A static analyzer for finding dynamic programming errors. Softw. Pract. Exper. 30(7), 775–802 (2000)
10. Calcagno, C., Distefano, D., O’Hearn, P.W., Yang, H.: Compositional shape analysis by means of bi-abduction. In: Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2009, pp. 289–300, Savannah, GA, USA, 21–23 Jan 2009
11. Chandra, S., Fink, S.J., Sridharan, M.: Snugglebug: a powerful approach to weakest preconditions. In: Programming Language Design and Implementation (PLDI 2009), pp. 363–374 (2009)
12. Chandra, S., Torlak, E., Barman, S., Bodik, R.: Angelic debugging. In: Proceedings of the 33rd International Conference on Software Engineering, ICSE 2011, pp. 121–130. ACM, New York, NY, USA (2011)
13. Clarke, E.M., Grumberg, O., Peled, D.A.: Model Checking. MIT Press, Cambridge (2000)
14. Clarke, E.M., Kroening, D., Yorav, K.: Behavioral consistency of C and Verilog programs using bounded model checking. In: Proceedings of the 40th Design Automation Conference, DAC 2003, pp. 368–371, Anaheim, CA, USA, 2–6 Jun 2003
15. Condit, J., Hackett, B., Lahiri, S.K., Qadeer, S.: Unifying type checking and property checking for low-level code. In: Principles of Programming Languages (POPL 2009), pp. 302–314 (2009)
16. Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for the static analysis of programs by construction or approximation of fixpoints. In: Symposium on Principles of Programming Languages (POPL 1977). ACM Press (1977)
17. Detlefs, D., Nelson, G., Saxe, J.B.: Simplify: a theorem prover for program checking. J. ACM 52(3), 365–473 (2005)
18. Dijkstra, E.W.: Guarded commands, nondeterminacy and formal derivation of programs. Commun. ACM 18(8), 453–457 (1975)
19. Dillig, I., Dillig, T., Aiken, A.: Static error detection using semantic inconsistency inference. In: Programming Language Design and Implementation (PLDI 2007), pp. 435–445 (2007)
20. Dillig, I., Dillig, T., Aiken, A.: Automated error diagnosis using abductive inference. In: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2012, pp. 181–192. ACM, New York, NY, USA (2012)
21. Engler, D.R., Chen, D.Y., Chou, A.: Bugs as inconsistent behavior: a general approach to inferring errors in systems code. In: Symposium on Operating Systems Principles (SOSP 2001), pp. 57–72 (2001)
22. Graf, S., Saïdi, H.: Construction of abstract state graphs with PVS. In: Grumberg, O. (ed.) CAV 1997. LNCS, vol. 1254, pp. 72–83. Springer, Heidelberg (1997)
23. Hoenicke, J., Leino, K.R.M., Podelski, A., Schäf, M., Wies, T.: Doomed program points. Form. Meth. Syst. Des. 37(2–3), 171–199 (2010)
24. Ivančić, F., Yang, Z., Ganai, M.K., Gupta, A., Shlyakhter, I., Ashar, P.: F-Soft: software verification platform. In: Etessami, K., Rajamani, S.K. (eds.) CAV 2005. LNCS, vol. 3576, pp. 301–306. Springer, Heidelberg (2005)
25. Jhala, R., Majumdar, R.: Path slicing. In: Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, pp. 38–47, Chicago, IL, USA, 12–15 Jun 2005
26. Jose, M., Majumdar, R.: Cause clue clauses: error localization using maximum satisfiability. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, pp. 437–446, San Jose, CA, USA, 4–8 Jun 2011
27. Joshi, S., Lahiri, S.K., Lal, A.: Underspecified harnesses and interleaved bugs. In: Principles of Programming Languages (POPL 2012), pp. 19–30. ACM (2012)
28. Kremenek, T., Engler, D.R.: Z-ranking: using statistical analysis to counter the impact of static analysis approximations. In: Cousot, R. (ed.) SAS 2003. LNCS, vol. 2694, pp. 295–315. Springer, Heidelberg (2003)
29. Lahiri, S.K., McMillan, K.L., Sharma, R., Hawblitzel, C.: Differential assertion checking. In: Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2013, pp. 345–355, Saint Petersburg, Russian Federation, 18–26 Aug 2013
30. Lahiri, S.K., Qadeer, S., Galeotti, J.P., Voung, J.W., Wies, T.: Intra-module inference. In: Bouajjani, A., Maler, O. (eds.) CAV 2009. LNCS, vol. 5643, pp. 493–508. Springer, Heidelberg (2009)
31. Lal, A., Qadeer, S., Lahiri, S.K.: A solver for reachability modulo theories. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 427–443. Springer, Heidelberg (2012)
32. Logozzo, F., Lahiri, S.K., Fähndrich, M., Blackshear, S.: Verification modulo versions: towards usable verification. In: ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2014, p. 32, Edinburgh, United Kingdom, 09–11 Jun 2014
33. McMillan, K.L.: An interpolating theorem prover. In: Jensen, K., Podelski, A. (eds.) TACAS 2004. LNCS, vol. 2988, pp. 16–30. Springer, Heidelberg (2004)
34. Satisfiability modulo theories library (SMT-LIB). http://goedel.cs.uiowa.edu/smtlib/
35. Stump, A., Barrett, C.W., Dill, D.L., Levitt, J.R.: A decision procedure for an extensional theory of arrays. In: IEEE Symposium of Logic in Computer Science (LICS 2001) (2001)
36. Tomb, A., Flanagan, C.: Detecting inconsistencies via universal reachability analysis. In: International Symposium on Software Testing and Analysis (ISSTA 2012) (2012)