Verifying Whiley Programs with Boogie

The quest to develop increasingly sophisticated verification systems continues unabated. Tools such as Dafny, Spec#, ESC/Java, SPARK Ada and Whiley attempt to seamlessly integrate specification and verification into a programming language, in a similar way to type checking. A common integration approach is to generate verification conditions that are handed off to an automated theorem prover. This provides a nice separation of concerns and allows different theorem provers to be used interchangeably. However, generating verification conditions is still a difficult undertaking and the use of more “high-level” intermediate verification languages has become commonplace. In particular, Boogie provides a widely used and understood intermediate verification language. A common difficulty is the potential for an impedance mismatch between the source language and the intermediate verification language. In this paper, we explore the use of Boogie as an intermediate verification language for verifying programs in Whiley. This is noteworthy because the Whiley language has (amongst other things) a rich type system with considerable potential for an impedance mismatch. We provide a comprehensive account of translating Whiley to Boogie which demonstrates that it is possible to model most aspects of the Whiley language. Key challenges posed by the Whiley language included: the encoding of Whiley’s expressive type system and support for flow typing and generics; the implicit assumption that expressions in specifications are well defined; the ability to invoke methods from within expressions; the ability to return multiple values from a function or method; the presence of unrestricted lambda functions; and the limited syntax for framing. We demonstrate that the resulting verification tool can verify significantly more programs than the native Whiley verifier which was custom-built for Whiley verification. 
Furthermore, our work provides evidence that Boogie is (for the most part) sufficiently general to act as an intermediate language for a wide range of source languages.


Introduction
The idea of verifying that a program meets a given specification for all possible inputs has been studied for a long time. Part of the appeal of software verification is that it can ensure theoretical correctness of a software module for all possible usages. This is complementary to testing which, by acting at a more concrete level, may detect resource or hardware errors that are typically outside the scope of software verification [43].
Verifying compilers often target an intermediate verification language, such as Boogie [14], WhyML [28,65] or Viper [125], as these provide a nice separation of concerns and allow different theorem provers to be used interchangeably. SMT-LIB [21] provides another standard readily accepted by modern automated theorem provers, although it is often considered rather low level [28]. One issue faced by intermediate verification languages is the potential for an impedance mismatch [139] (see Sect. 5). This arises when constructs in the source language cannot be easily translated into those of the intermediate verification language (and vice versa).
Whiley is a programming language with first-class support for software specifications that is designed to simplify verification [43, 134, 135, 137-140, 159, 168, 169]. An important goal was to develop a system which is as accessible as possible and which one could imagine being used in a day-to-day setting. As such, Whiley superficially resembles a modern imperative language and employs flow typing [77,133,160] to eliminate unnecessary casts (which also aids specification). The ultimate aim is that all programs written in Whiley will be verified at compile time to ensure their specifications hold which, for example, has obvious application in safety-critical systems [40,134].
In this paper, we explore Boogie as an intermediate verification language for Whiley. Our motivation is the desire to improve the verification capability of Whiley by leveraging the significant resources already invested in the development of Boogie (and Z3). A particular concern is the potential for an impedance mismatch arising, such as from Whiley's type system (e.g. which supports union types and flow typing). The contributions of this paper include:
- (Translation) A comprehensive account of our encoding of Whiley programs into Boogie for the purpose of verification. Whilst in many cases the translation is straightforward, a number of challenges had to be overcome arising from Whiley's design, including: the encoding of Whiley's expressive type system and support for flow typing and generics; Whiley's implicit assumption that expressions in specifications are well defined; the ability to invoke methods from within expressions; the ability to return multiple values from a function or method; the presence of unrestricted lambda functions; and Whiley's limited syntax for framing.
- (Evaluation) An empirical comparison between Boogie/Z3 and the native Whiley verifier using the existing suite of 1100+ tests provided for the Whiley compiler. The results confirm that Boogie/Z3 significantly outperforms the Whiley native verifier in terms of the number of tests passing.
- (Case Studies) A report into the use of Boogie/Z3 to verify a number of larger Whiley programs, including a web-based implementation of Conway's Game of Life and a number of challenges from the VerifyThis 2019 competition [59]. From these case studies we identify several areas in which the Whiley language or libraries could be improved to better exploit Boogie.
We note also that our work provides further evidence of Boogie's utility as a general-purpose intermediate verification language. In particular, compared with Dafny or Spec#, Whiley was developed entirely independently from Boogie and includes various design choices that are not necessarily a natural fit. As such, it was unclear from the outset of this project whether or not Boogie would be sufficiently general for this task. Compared with our earlier paper [164], this paper represents a significant evolution and improvement of our translation. We also provide a much more detailed account which covers almost the entire language, including generics, lambdas, references and the handling of various soundness issues. Our evaluation now includes a number of larger case studies, and we have expanded the related work discussion.
Organisation. The remainder of this paper is organised as follows: Sect. 2 provides an introduction to Whiley and Boogie; Sect. 3 provides a detailed description of our Whiley-to-Boogie translator and discusses the various challenges encountered; Sect. 4 presents our evaluation using the existing Whiley compiler test suite and various case studies; Sect. 5 examines the related work; and finally, Sect. 6 concludes. For reference, "Appendix A" illustrates our verified version of Conway's Game of Life.

Background
We begin with an overview of Whiley and then a brief discussion of Boogie.

Whiley
The Whiley programming language has been developed to enable compile time verification of programs and, furthermore, to make this accessible to everyday programmers [139,159]. The Whiley Compiler (WyC) attempts to ensure that all functions and methods in a program meet their specifications. When this succeeds, we know that: (i) all function/method postconditions are met (assuming their preconditions held on entry); (ii) all invocations meet the respective function or method precondition; (iii) runtime errors such as divide-by-zero, out-of-bounds accesses and null-pointer dereferences cannot occur. Notwithstanding, such programs may still loop indefinitely and/or exhaust available resources (e.g. stack or heap).
This counts the number of nodes in a list. Here, we see flow typing in action as list is automatically retyped to Node<T> on the false branch [132,133]. Flow typing turns out to be particularly useful when specifying programs. Specifically, in (x is T) ==> e it follows that x has type T within the expression e . This helps, for example, when writing postconditions (as we'll see shortly).

Value Semantics
The semantics of Whiley diverge from many mainstream languages (e.g. Java) in the treatment of compound data types, such as arrays. Specifically, arrays and records in Whiley have value semantics. This means they are passed and returned by value (as in Pascal, MATLAB [98] or most functional languages). But, unlike functional languages (and like Pascal), values of compound types can be updated in place [129,151]. This latter point serves to give Whiley the appearance of an imperative language when, in fact, Whiley has a functional core. The following illustrates: Despite appearances, the above is a pure function which has no side effects. This contrasts with languages like Java, where arrays are references and updating them has unavoidable side effects. The following attempts to clarify this further: In a language like Java, the assertion xs[0] == 1 would fail because xs and ys would alias each other. However, since this is not the case in Whiley, the above verifies without problem. We can think of arrays and records in Whiley as being immutable, so that updating them effectively means cloning them. The reason this semantics is adopted in Whiley is to facilitate their use in specification. Indeed, without a fundamental immutable collection type, verification is inherently challenging [99].
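The absence of aliasing between array values can also be sketched in Boogie (introduced in Sect. 2.2), whose maps are likewise values. This is only an illustration: the procedure name valueSemantics and the variables xs and ys are hypothetical choices, and the code is not output of the translator.

```boogie
// Hypothetical sketch: Boogie maps are values, so copying a map and then
// updating the copy leaves the original unchanged.
procedure valueSemantics()
{
  var xs : [int]int;
  var ys : [int]int;
  xs[0] := 1;          // sugar for xs := xs[0 := 1]
  ys := xs;            // copies the value; no alias is created
  ys[0] := 2;          // only ys changes
  assert xs[0] == 1;   // would fail if xs and ys were aliases
  assert ys[0] == 2;
}
```

Both assertions should be discharged directly from the select/store reasoning over maps, which is the property that makes value types convenient in specifications.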

Side Effects
A function in Whiley is pure and cannot have side effects. In contrast, a method is impure and may have side effects, such as mutating the global heap or performing I/O. Whiley provides reference types which are allocated from a single global heap. For example, &int is a reference to an integer variable. The following illustrates the syntax:
    &int p = new 1
    &int q = p
    *p = 2
    assert *p == *q
Here, the assignment through p affects q (because they are aliases), and hence, the final assertion holds. We note that, at the time of writing, Whiley supports allocation but not deallocation (and, hence, currently relies on garbage collection).
Statements which mutate the heap must appear within the body of a method and, for example, are not permitted within a function. To illustrate a more complete example, here is the classical algorithm for reversing a linked list [6]:
    type LinkedList<T> is null | &{T data, LinkedList<T> next}
    method reverse<T>(LinkedList<T> v) -> (LinkedList<T> r):
We note that the above is not yet fully specified, and this would be necessary before its behaviour could be fully verified (more on this later).

Packaging
Whiley currently supports a relatively limited form of packages and package management. For example, the standard library, STD.wy, can be added as a dependency and compiled against. The following illustrates a simple example:
    import std::ascii
    import append from std::array

    function to_string(int[] items) -> (ascii::string str):
        ascii::string r = "["
        // Convert each element to an ascii string
        for i in 0..|items|:
            // Add comma (when necessary)
            if i != 0:
                r = append(r,",")
            // Add element as string
            r = append(r,ascii::to_string(items[i]))
        return append(r,"]")
The above illustrates a simple function for converting an integer array into a string. This employs standard library functions from the modules std::ascii and std::array.

Specification and Verification
We now consider those features of Whiley provided for specifying and verifying programs. Figure 1 provides an initial example to illustrate the salient features:
- Properties are used to specify things of interest, particularly to help with verification. They are interpreted meaning that, during verification, they can be expanded/unrolled as necessary. To facilitate this, they have a restricted form allowing them to be substituted in place for their body. In contrast, functions are uninterpreted which helps ensure verification remains (mostly) modular [78]. This means that, during verification, their actual implementation is ignored at call sites (more on this below).
- Preconditions are given by requires clauses and postconditions by ensures clauses. Multiple clauses are simply conjoined together. We have found that allowing multiple requires and/or ensures clauses can help readability, and note that JML [48], Spec# [16] and Dafny [104] also permit this.
- Loop invariants are given by where clauses. Figure 1 illustrates an inductive loop invariant covering indices from zero to i (exclusive). Similarly, type invariants arise from where clauses. For example, type nat has an invariant and is used for variable i to avoid the need for a loop invariant of the form i >= 0. We consider good use of type invariants as critical to improving the readability of function specifications.
- Assertions must be statically checked during verification, thus providing a useful debugging tool. For example, if during verification we are struggling to understand why a given postcondition is not met, assertions can be added to check our beliefs at a given point. In contrast, assumptions are not statically checked and, instead, are simply assumed to hold during verification. As such, they are a useful tool for overriding the verifier in cases where it cannot establish something we know to be true.
- Flow typing simplifies postconditions (amongst other things) by ensuring that casts need not be given. For example, without flow typing, the first ensures clause from Figure 1 would require a cast for r on the right-hand side.
Being uninterpreted means a function's implementation can change arbitrarily without affecting callers provided it still meets its specification. However, it also means that functions need to be properly specified before they can be used, which is sometimes problematic (e.g. when several functions are developed in tandem). For example, consider the following: Here, the first assert follows from the specification of swap(&int,&int) . In contrast, the second follows because the state referred to by z is not reachable from any parameter passed to swap(&int,&int) and, hence, could not be modified by it.

Boogie
Boogie [14] is an intermediate verification language developed by Microsoft Research as part of the Spec# project [16]. Boogie is intended as a back end for other programming language and verification systems [106], and has found use in various tools, such as Dafny [104], VCC [45], and others (e.g. [25]). Boogie is both a specification language (which shares some similarity with Dijkstra's language of guarded commands [57]) and a tool for checking that Boogie "programs" are correct. The original Boogie language was "somewhat like a high-level assembly language in that the control flow is unstructured but the notions of statically scoped locals and procedural abstraction are retained" [14]. However, later versions support structured if and while statements to improve readability. Nevertheless, a nondeterministic goto statement is retained for encoding arbitrary control flow, which permits multiple target labels with non-deterministic choice. Boogie provides various primitive types including bool , int and map types, which can be used to model arrays and records. Concepts such as a "program heap" can also be modelled using a map from references to values.
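To make the remark about modelling a program heap concrete, the following is a minimal sketch in which the heap is a global map from references to values. The names Ref, HEAP and alias are assumptions made for illustration, and the heap holds plain integers here rather than the richer values needed in a full encoding.

```boogie
// Hypothetical sketch: a heap modelled as a map from references to integers.
type Ref;
var HEAP : [Ref]int;

procedure alias(p : Ref)
  modifies HEAP;
{
  var q : Ref;
  q := p;               // q now aliases p
  HEAP[p] := 2;         // write through p
  assert HEAP[p] == HEAP[q];  // visible through q, since p == q
}
```

Allocation is deliberately not modelled; the sketch only shows that aliasing falls out of ordinary map reasoning once references index a shared map.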
Boogie supports function and procedure declarations which have an important distinction. In general, functions are pure and can be used within the Boogie logic, such as in axioms and specifications. In contrast, procedures are potentially impure and are intended to model methods in the source language. A procedure can be given a specification composed of requires and ensures clauses, and also a modifies clause indicating non-local state that can be modified. Most importantly, a procedure can be given an implementation , and the tool will attempt to ensure this implementation meets the given specification. The requires and ensures for procedures demarcate proof obligations, for which Boogie emits verification conditions in first-order logic to be discharged by Z3. In addition, the implementation of a procedure may include assert and assume statements. The former lead to proof obligations, whilst the latter give properties which the underlying theorem prover can exploit.
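The distinction between functions and procedures can be sketched as follows. The names abs and Abs are hypothetical; the function is a pure definition usable inside specifications, whilst the procedure carries a specification and an implementation to be checked against it.

```boogie
// A pure function: may appear in axioms and specifications.
function abs(x : int) returns (int) { if x >= 0 then x else -x }

// A procedure: its implementation is checked against its specification.
procedure Abs(x : int) returns (r : int)
  ensures r >= 0;
  ensures r == abs(x);   // the function is used within the logic
{
  if (x >= 0) {
    r := x;
  } else {
    r := -x;
  }
}
```

Each ensures clause gives rise to a proof obligation at every exit of the implementation.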
To illustrate Boogie, Figure 2 provides an example encoding of the indexOf() function into Boogie. Note that the example encodings used in this section are a little different to the more sophisticated encoding used later in the paper. At first glance, it is perhaps surprising how close to an actual programming language Boogie has become. Various features of the language are demonstrated with this example. Firstly, an array length operator is encoded using an uninterpreted function len() , and accompanying axiom . Secondly, the input array is modelled using the map [int]int , which is a total mapping from arbitrary integers to arbitrary integers. For example, xs[-1] identifies a valid element of the map despite -1 not normally being a valid array index (e.g. in Whiley). We can refine this to something closer to an array through additional constraints, as shown in the next section.
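For readers without the figure to hand, the following is a sketch of what such an encoding might look like. It is an assumption-laden reconstruction rather than a copy of Figure 2: the exact pre-/postconditions, the axiom chosen for len() and the variable names are illustrative only.

```boogie
// Array length as an uninterpreted function with an accompanying axiom.
function len(xs : [int]int) returns (int);
axiom (forall xs : [int]int :: len(xs) >= 0);

procedure indexOf(xs : [int]int, x : int) returns (r : int)
  // If an index is returned, it is in bounds and holds x.
  ensures r >= 0 ==> (r < len(xs) && xs[r] == x);
  // Otherwise, x does not occur within the bounds of xs.
  ensures r < 0 ==> (forall k : int :: 0 <= k && k < len(xs) ==> xs[k] != x);
{
  var i : int;
  i := 0;
  while (i < len(xs))
    invariant 0 <= i;
    invariant (forall k : int :: 0 <= k && k < i ==> xs[k] != x);
  {
    if (xs[i] == x) {
      r := i;
      return;
    }
    i := i + 1;
  }
  r := -1;
}
```

Note that nothing prevents xs[-1] from being read in this encoding; the bounds only appear in the specification, which is the limitation the surrounding text describes.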
Whilst the structured form of Boogie is preferred, where possible, it is also useful to consider the unstructured form, which we use for a few Whiley constructs such as switch (Sect. 3.4.1). Figure 3 provides an unstructured encoding of the indexOf() function from Figure 2. In this version, the while loop is decomposed using a non-deterministic goto statement: the goto LOOP_BODY, LOOP_EXIT statement allows flow of control to jump to either label, but the assume statements after those labels block progress if their condition is false. Likewise, in this unstructured encoding, the loop condition and invariant are explicitly assumed (lines 8,9,12) and asserted (lines 15,16), rather than being done implicitly by the tool (as in Figure 2). The havoc statement "assigns an arbitrary value to each indicated variable" [14], so is used here to indicate that variable i contains an arbitrary integer value at this point.
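The decomposition of a loop into goto, havoc, assume and assert can be sketched on a smaller example than indexOf(). The procedure countUp below is hypothetical and is not Figure 3; it only follows the same pattern, with the invariant 0 <= i && i <= n asserted on entry and after the body, and assumed after the havoc.

```boogie
// Hypothetical sketch of an unstructured loop encoding.
procedure countUp(n : int) returns (i : int)
  requires 0 <= n;
  ensures i == n;
{
  i := 0;
  // Invariant must hold on entry to the loop.
  assert 0 <= i && i <= n;
  // Forget the value of i, retaining only what the invariant states.
  havoc i;
  assume 0 <= i && i <= n;
  goto LOOP_BODY, LOOP_EXIT;

LOOP_BODY:
  assume i < n;              // loop condition holds on this path
  i := i + 1;
  // Invariant must be re-established by the body.
  assert 0 <= i && i <= n;
  assume false;              // path ends here; nothing further to check
  return;

LOOP_EXIT:
  assume !(i < n);           // loop condition fails on this path
  return;
}
```

On the exit path, the assumed invariant together with the negated condition gives i == n, which is what the postcondition requires.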
Finally, we note that Boogie allows one to designate preconditions, postconditions and loop invariants as free. This allows Boogie to assume these conditions hold without checking them, thereby (potentially) reducing overall verification time [103].
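The effect of a free precondition can be sketched as follows, using the hypothetical procedures inc and caller. The condition is assumed inside the body of inc but is not checked at the call site, so the caller is trusted to respect it.

```boogie
// Hypothetical sketch of a free precondition.
procedure inc(x : int) returns (r : int)
  free requires x >= 0;   // assumed in the body, not checked at call sites
  ensures r > 0;
{
  r := x + 1;
}

procedure caller()
{
  var y : int;
  call y := inc(-5);      // accepted: the free precondition is not checked
  assert y > 0;           // follows from the (assumed) postcondition of inc
}
```

The final assertion verifies even though the call violates the intended precondition, which illustrates why free conditions trade checking effort for trust.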

Modelling Whiley in Boogie
Fig. 3 An unstructured encoding of the indexOf() function from Figure 2 (the pre-/postconditions are omitted as they are unchanged from above, and likewise for len())
Our goal is to model as much of the Whiley language as possible in Boogie, so that we can utilise Boogie for verifying Whiley programs. Indeed, the motivation for this project was the hope that Boogie would offer significantly better verification capability than the existing (and relatively ad hoc) native verifier used in Whiley (and, as Sect. 4 shows, this is the case). At a superficial level, Whiley's native verifier is not so different from Boogie/Z3. In particular, it employs an intermediate assertion language in which verification conditions are encoded and then discharged using a purpose-built SMT solver [139]. A key advantage is that the generated verification conditions resemble the Whiley source language much more closely. Nevertheless, whilst this toolchain has potential, it remains relatively immature compared with Boogie/Z3 and the considerable resources invested in their development [16]. However, this transition is not without challenges as, despite their obvious similarities, there remain significant differences between Whiley and Boogie:
- Types. Whiley has a relatively rich (structural) type system which includes: union, record, array, reference and lambda types. Furthermore, there is support for type polymorphism through generics.
- Flow Typing. Whiley's support for flow typing is also problematic, as a given variable may have different types at different program points and there is a need to support runtime type tests [133].
To understand the definedness issue, consider a precondition that contains an array reference, like requires a[i] == 0. In a language like Dafny, one would additionally need to explicitly specify i >= 0 && i < |a| to avoid the verifier reporting an out-of-bounds error. Such preconditions are implicit in Whiley, so must be (automatically) extracted by our translator and made explicit in the generated Boogie.
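As a rough sketch of the definedness point, the implicit bounds requirement behind requires a[i] == 0 might be made explicit in Boogie along the following lines. The procedure get, the function len and the placement of the extracted condition as a separate requires clause are assumptions for illustration; the form actually emitted by the translator may differ.

```boogie
function len(a : [int]int) returns (int);
axiom (forall a : [int]int :: 0 <= len(a));

// Hypothetical sketch: the bounds condition is implicit in the Whiley source
// and is written out explicitly here.
procedure get(a : [int]int, i : int) returns (r : int)
  requires 0 <= i && i < len(a);   // extracted well-definedness condition
  requires a[i] == 0;              // the precondition as written by the user
  ensures r == 0;
{
  r := a[i];
}
```

Ordering the extracted condition first means callers are asked to establish that the access is in bounds before the user-written condition is considered.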
We now present the main contribution of this paper, namely a mechanism for translating Whiley programs into Boogie, which is implemented in our translator program, called Wy2B.

Types
Finding an appropriate representation of Whiley types is a challenge. We begin by considering the straightforward (i.e. naive) shallow translation of Whiley types into Boogie, and highlight why this fails. Then, we present a more sophisticated approach which corresponds more closely with a deep embedding of types.

Primitives
Integers. The mapping functions for the Whiley int type of unbounded integers are as follows (recall int is also the Boogie name for integers).
Coercions. In order to utilise our deep embedding, values must be coerced to / from primitive Boogie types. Consider an assignment x = 0 where x has type int|null. Since union types in Whiley are encoded as type Any in Boogie, we must coerce the value 0 (of Boogie type int) into its embedded form via Int#box(). Such an assignment is thus translated as x := Int#box(0);. In general, our translation attempts to minimise the amount of boxing/unboxing. For example, generated expressions of the form Int#unbox(Int#box(x)) are automatically reduced to x, etc. Amongst other things, this helps to simplify debugging!
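The boxing scheme can be sketched as a pair of uninterpreted functions related by axioms. The names Int#box and Int#unbox come from the text above; the type test Int#is, the particular axioms and the procedure assign are assumptions made for this illustration and may not match the actual embedding.

```boogie
type Any;

function Int#box(int) returns (Any);     // injection into Any
function Int#unbox(Any) returns (int);   // extraction from Any
function Int#is(Any) returns (bool);     // assumed type test

// Boxed integers are recognised as integers.
axiom (forall i : int :: Int#is(Int#box(i)));
// Unboxing a boxed integer yields the original value.
axiom (forall i : int :: Int#unbox(Int#box(i)) == i);
// Re-boxing an unboxed integer yields the original boxed value.
axiom (forall a : Any :: Int#is(a) ==> Int#box(Int#unbox(a)) == a);

procedure assign()
{
  var x : Any;             // models a Whiley variable of type int|null
  x := Int#box(0);         // translation of x = 0
  assert Int#is(x);
  assert Int#unbox(x) == 0;
}
```

The second axiom is the logical counterpart of the syntactic simplification of Int#unbox(Int#box(x)) to x mentioned above.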

Arrays
Whiley arrays are fixed-length sequences of values whose length can be queried at runtime (recall from Sect. 2.1.3 they have value semantics). We model Whiley arrays using: (1) a Boogie map [int]Any from integers to Any values; and (2) an uninterpreted function returning the length. The embedding requires a number of additional axioms, as follows. As before, we provide extraction/injection functions as follows: A key aspect of our embedding is the treatment of indices which are out-of-bounds. The primary issue is that Boogie maps (e.g. [int]Any) are infinite structures with no concept of bounds. Elements which have not been explicitly defined always exist with some arbitrary value. This presents a problem for equality of arrays, as illustrated in Figure 4. To resolve this we fix all out-of-bounds indices to the special Void value, and enforce this throughout the axioms that follow.
Array Length. We employ the following function for extracting the length of an array:
    // Extraction for array length
    function Array#Length([int]Any) returns (int);
    // Length of an array is non-negative
    axiom (forall a:[int]Any :: 0 <= Array#Length(a));
    // Updates don't affect array length
    axiom (forall a:[int]Any,i:int,v:Any :: (v != Void && Array#in(a,i)) ==> (Array#Length(a) == Array#Length(a[i:=v])));
In the above, we take steps to ensure the axioms remain consistent. To understand this, consider the last axiom above which holds the array length invariant across an update. The value v being assigned cannot be Void as, otherwise, we could artificially reduce an array's length (e.g. by assigning Void to the last element). Finally, whilst our encoding of arrays here may appear somewhat elaborate, it does allow us to exploit Boogie's internal notion of equality. An alternative, however, would be to define a bespoke equality operator for arrays (though this is complicated by the presence of unions and recursive types).
Array Initialisers. Array values in Whiley can be constructed using the array literal syntax (e.g. [0,4,3], etc.). This creates an array containing the given values (zero-indexed). To translate this we employ a constructor, Array#Empty(int), as follows: The intuition is that Array#Empty(n) constructs an uninitialised array of size n, whose elements must then be initialised individually.
For example, the array literal [6,3] is translated into Array#Empty(2) followed by one update per element, starting with [0:=Int#box(6)]. Similar axioms constrain the array generator function Array#Generator(v,l):
    // Fix out-of-bounds indices for array generator
    axiom (forall v:Any,l:int,i:int :: (i < 0 || l <= i) ==> (Array#Generator(v,l)[i] == Void));
    // Array length must match length of array generator
    axiom (forall a:[int]Any,v:Any,l:int :: (0<=l && Array#Generator(v,l)==a) ==> Array#Length(a)==l);
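The way the length axioms are used can be sketched with a small client. The declarations of Array#Length and its two axioms follow the text; the definition given for Array#in, the axiom that boxed integers differ from Void, and the procedure update are assumptions introduced so that the example is self-contained.

```boogie
type Any;
const unique Void : Any;

function Int#box(int) returns (Any);
// Assumed: boxed integers are never the Void value.
axiom (forall i : int :: Int#box(i) != Void);

function Array#Length([int]Any) returns (int);
axiom (forall a : [int]Any :: 0 <= Array#Length(a));

// Assumed definition: index i lies within the bounds of a.
function Array#in(a : [int]Any, i : int) returns (bool)
{ 0 <= i && i < Array#Length(a) }

// Updates don't affect array length.
axiom (forall a : [int]Any, i : int, v : Any ::
  (v != Void && Array#in(a,i)) ==> (Array#Length(a) == Array#Length(a[i:=v])));

procedure update(xs : [int]Any, i : int) returns (ys : [int]Any)
  requires Array#in(xs,i);
  ensures Array#Length(ys) == Array#Length(xs);
{
  ys := xs[i := Int#box(0)];   // in-bounds update with a non-Void value
}
```

Dropping either the bounds requirement or the non-Void axiom should leave the postcondition unprovable, mirroring the consistency argument given for the update axiom.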

Records
Records are encoded using maps, [Field]Any, where Field characterises field names. For every field name used within the program, a unique constant is created. For example, if the type {int x, int y} is used then the following constants are generated:
    type Field; // Set of all field names
    const unique $x : Field;
    const unique $y : Field;
These constants are then used as indices for the map encoding of the record (and any other record type containing a field x or y). The constants are marked unique to ensure they are disjoint. Thus, the number of constants generated depends on exactly what types are used within the target program. As for arrays, care must be taken when encoding a given record to ensure that all other fields are mapped to Void. Again, various functions and axioms are provided to allow records to be embedded within other compound types.
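The role of the unique field constants can be sketched as follows. The Field declarations follow the text; the boxing functions, their axiom and the procedure point are assumptions for illustration, and the fixing of unused fields to Void is omitted for brevity.

```boogie
type Any;
type Field;                  // set of all field names
const unique $x : Field;
const unique $y : Field;

function Int#box(int) returns (Any);
function Int#unbox(Any) returns (int);
axiom (forall i : int :: Int#unbox(Int#box(i)) == i);

// Hypothetical sketch: building and reading a record of type {int x, int y}.
procedure point()
{
  var r : [Field]Any;
  r := r[$x := Int#box(1)];
  r := r[$y := Int#box(2)];
  // Relies on $x != $y, which follows from the constants being unique.
  assert Int#unbox(r[$x]) == 1;
  assert Int#unbox(r[$y]) == 2;
}
```

Without the unique marker, the update of $y could not be shown to leave the $x field untouched, and the first assertion would not be provable.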

Generics
Type polymorphism in Whiley presents a number of challenges when translating to Boogie. Roughly speaking, we translate generic types (e.g. T ) into Boogie's Any type. We will return to discuss this in more detail later (see §3.4).

Lambdas
The ability to pass around first-class functions and methods as lambdas also presents some challenges, since lambdas in Boogie are relatively restricted. We return to discuss this in more detail later (see §3).
The above is useful in various situations where there is no logical heap (more on this later). In particular, since it does not provide any guarantee about its contents, it cannot be relied upon at all.

User-Defined Types
Our treatment of user-defined types follows naturally from our embedding of types discussed above. Roughly speaking, we can consider that every user-defined type in Whiley consists of two parts: firstly, its base or underlying type; secondly, its invariants (if any). For example, consider the following Whiley declaration: The underlying type of nat is int, and it enforces a single invariant x >= 0. In our translation to Boogie, this declaration would produce the following: This allows for several different use cases. For example, if we have a variable of type nat and wish to assume or assert its invariant, then nat#inv() can be applied directly. Alternatively, if we are reading such a variable from a boxed position (e.g. out of an array or record), then nat#is() can be applied. Observe also that, for uniformity, such methods always accept a HEAP parameter even if (as in this case) this is not used. This parameter is necessary for user-defined types which are, or contain, references. For example, consider this declaration which builds upon the definition of nat: Here, pNat#is() enforces nat#is() upon the element in HEAP referred to by p. Thus it becomes clear that the embedding of a reference type only makes sense in the context of a given HEAP.
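A sketch of how nat#inv() and nat#is() might be defined is given below. The names nat#inv, nat#is and HEAP come from the text; the parameter order, the type of HEAP, the helper Int#is and the procedure demo are assumptions, so this should not be read as the translator's actual output.

```boogie
type Any;
type Ref;

function Int#box(int) returns (Any);
function Int#unbox(Any) returns (int);
function Int#is(Any) returns (bool);     // assumed type test
axiom (forall i : int :: Int#is(Int#box(i)));
axiom (forall i : int :: Int#unbox(Int#box(i)) == i);

// Invariant of nat, applied to an unboxed value (HEAP unused here).
function nat#inv(HEAP : [Ref]Any, x : int) returns (bool) { x >= 0 }

// Type test for nat, applied to a boxed value.
function nat#is(HEAP : [Ref]Any, v : Any) returns (bool)
{ Int#is(v) && nat#inv(HEAP, Int#unbox(v)) }

procedure demo(HEAP : [Ref]Any)
{
  var v : Any;
  v := Int#box(3);
  assert nat#inv(HEAP, 3);   // invariant applied directly
  assert nat#is(HEAP, v);    // invariant applied via a boxed position
}
```

The two assertions correspond to the two use cases described above: checking a variable of type nat directly, and checking a value read from a boxed position.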

Constants
Global constants in Whiley require care to ensure a safe translation. A well-known issue with Boogie arises when specifications written by the user (i.e. in Whiley) are translated into unguarded Boogie axioms. In such cases, the user can be considered as maliciously injecting problematic (though rarely useful) code. For example, a user can (perhaps accidentally) insert axiom false; (or some equivalent thereof) into the generated Boogie file. Unfortunately, the presence of such a declaration allows Boogie to immediately verify all assertions in the file (i.e. regardless of whether they are correct or not) [102]. More importantly, Boogie does not report this as an error, and hence, it happens silently without the user being made aware. To see how this applies to constants in Whiley, consider the following (recall the definition of nat above): The challenge here is to ensure the value being assigned adheres to any type invariant(s) required of x. One approach is to generate a typing axiom, such as axiom nat#is(x), for this. Whilst this is sufficient for the above example, a problem arises if the value assigned was -1 instead of 0. In such case, the translation leads to the following: Unfortunately, these axioms conflict as they imply both x == -1 and x >= 0 (which is equivalent to axiom false). To protect against this, we stratify our translation into two levels: the first establishes global constants are correctly initialised, and the second verifies functions and methods assuming they are correctly initialised. Following the approach taken in Dafny [104], this is done using a special constant Context#Level. The following illustrates the translation of our example above: The above verifies without trouble. However, were x to be initialised with -1, Boogie would now correctly report a failed proof obligation inside the x#check() method.
Furthermore, note that all procedure bodies generated from functions or methods in Whiley require Context#Level > 1 to ensure access to x 's invariant (see Figure 6 below).
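The stratification can be sketched as follows. The names Context#Level and x#check come from the text; the exact axioms, the procedure f and the use of x >= 0 in place of nat's type test are assumptions, so this is an illustration of the idea rather than the generated code.

```boogie
const Context#Level : int;
const x : int;

// Value of the constant, as given by its initialiser.
axiom x == 0;
// Its invariant is only available above level 1.
axiom Context#Level > 1 ==> x >= 0;

// Level 1: check the initialiser establishes the invariant.
procedure x#check()
{
  assert x >= 0;
}

// Level 2: code generated from functions/methods may rely on the invariant.
procedure f() returns (r : int)
  requires Context#Level > 1;
  ensures r >= 0;
{
  r := x;
}
```

Were the first axiom changed to x == -1, the two axioms would no longer be contradictory (they would only imply Context#Level <= 1), and the assertion in x#check() should fail, so the faulty initialiser is reported rather than silently making everything provable.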

Properties
Properties in Whiley are straightforward as they can be translated directly as Boogie functions. For example, consider the following property in Whiley: As for types, properties always accept a HEAP parameter for uniformity even when not needed. A key observation is that Boogie functions are strictly more expressive than properties in Whiley, and we will return later to consider the impact of this (see Sect. 4.4).

Functions
Recall that functions in Whiley are pure, have bodies comprised of statement blocks and may have multiple return values. This differs from functions in Boogie, whose bodies are made up of a single expression and can only return a single value. This presents challenges: firstly, the body of a Whiley function corresponds more closely with a Boogie procedure; but, secondly, functions in Whiley can be called from specification elements (e.g. pre-/postconditions) whereas Boogie procedures cannot. As such we provide a two-pronged translation (similar to that found in Dafny [102]) comprising: a prototype implemented as a Boogie function which can be invoked from a specification element; and a body, implemented as a Boogie procedure, which can be invoked directly from the body of other functions or methods. Figure 5 illustrates a simple function written in Whiley which we adopt as a running example, whilst the generated Boogie for this is shown in Figure 6. We will endeavour to fully clarify all aspects of this figure over the coming pages, but for now we focus on the procedure's specification. Here, additional requires clauses are included to enforce the type of xs and, likewise, additional ensures clauses for the type of rs . Whilst the soundness assumption for constants was discussed above, we will return to discuss the purpose of the function prototype and linkage later. We note also that, whilst functions in Whiley cannot modify the heap, they can manipulate references as simple values (though cannot mutate through them). Generics. Since Whiley supports type polymorphism, we might like to upgrade our fill() function as follows: As discussed in Sect. 3.1.4, we translate the Whiley type T as Boogie type Any . In terms of verifying the above function in isolation, this presents no problems. However, in most cases, call sites of this function would expect to receive an array of the same type they put in. 
Fig. 6 Illustrating the generated Boogie code for the fill() example. Note that this is somewhat simplified as various details related to name mangling and parameter shadowing are omitted
For example, consider this: In this case, Boogie must be able to determine that the return from fill() is an array of integers. Thus, a mechanism is required to enable our translation to state meta properties about the relationships between variable types (e.g. that they are the same). To do this, we introduce meta types as follows: Here, we see the generic type T is now passed as an argument to procedure fill() and using this we can, for example, make statements about the return value. For example, the postcondition now tells us at a given call site that all elements in the returned array have the same type as those elements in the input array. To make this work, we still need one additional piece. Specifically, for every type which can be used to instantiate a type variable (e.g. int in fill<int>()) we construct a unique meta type constant. For example, the meta type constant for int is declared as follows: ...

The key here is that size(HEAP,s) refers to the function prototype of size() , rather than its procedure. Furthermore, since size(HEAP,s) == r is ensured in the postcondition of procedure size() , we can verify statements such as the following: Correctness. An important limitation of Whiley is that it cannot ensure termination. For example, there is no equivalent syntax to decreasing as found in Dafny. As a result, non-terminating recursive functions can be verified with almost any postcondition.
The following illustrates such an example: Observe that the above function will never violate its postcondition and, hence, is correct up to non-termination. In the future, we expect Whiley to be extended with support for variant expressions such that a well-founded ordering over recursive calls can be specified to ensure termination.

Statements and Expressions
Translating most Whiley statements and expressions into Boogie is straightforward (see the similarities between Figures 1 and 2). Here, we describe only the interesting cases that present specific challenges. Variable Scoping.
Boogie requires all local variables to be declared at the start of a procedure body whereas, like most modern languages, Whiley allows variables to be declared with block scopes. Whilst, in most cases, this is relatively trivial to manage, there are cases where name clashes arise. The following illustrates: In this case, the same variable is declared twice with different types. This is a problem because they have incompatible types, and hence, we cannot declare a single Boogie variable to cover both. Instead, we apply name mangling to ensure variables in different scopes have unique names.
Well-Definedness. As highlighted already, Whiley's treatment of expressions (especially when used in specification elements such as pre-/postconditions) differs from other comparable systems (e.g. Dafny). In fact, handling this is straightforward and has been covered reasonably extensively elsewhere [102]. Essentially, when translating a Whiley expression, care must be taken to insert checks as necessary to ensure expressions are well defined. The following illustrates a simple example: Here, there is an implicit assumption that i >= 0 and i < |xs|. Of course, this may not actually be the case and we employ assert statements to check such preconditions. As such, the above is translated roughly as follows:
In this case, we must establish that x >= 0 holds after x is initialised, and also after it is subsequently reassigned. To do this, the above is translated as follows: Looking at Figure 6 provides further insight into this process. No assertion for invariant preservation is generated for i := i + 1 because the type of variable i is unconstrained. In other words, since the check would correspond to assert true; we simply optimise it away. However, such optimisation remains relatively simplistic, as checks are still produced unnecessarily for the xs := xs[i:=Int#box(x)] assignment.
Invocation.
Translating function invocations into Boogie presents something of a challenge, since functions can be invoked from arbitrary expressions (including specification elements discussed previously in §3.4). However, Boogie does not permit procedure invocations from within an expression, and provides only a simple statement form for calling procedures (e.g. call x := f(y);). In short, this means function invocations must be extracted from expressions. Consider the following snippet in Whiley: The above is translated into the following Boogie sequence:

  call f#114 := f(x);
  y := f#114 + 1;

Here a temporary variable, f#114, is introduced to hold the value returned from f(x). Thus, the order of evaluation for expressions is exposed by the order in which the calls are made prior to the final expression. In general, this approach works fine, but there are challenges. Short-circuit semantics presents the first challenge. For example, consider the following: In this case, we cannot just extract the function invocation and execute it before the if statement. Such a translation would model f(x) being executed every time the if statement is executed, which is not the case. Instead, we must carefully preserve short-circuit semantics using unstructured branching as necessary. For example, we can translate the above as follows: ... falseLab:
We can see that, whilst this gives a faithful rendition of the original program, it is quite low level and harder to comprehend. This issue is further compounded with loops, whose unstructured representation is far more verbose (recall Figure 2 versus Figure 3).
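The extraction of invocations into temporaries can be sketched in a few lines. The following is a minimal Python model rather than the Wy2B implementation: the tuple-based expression encoding, the helper names extract_calls and translate_assign, and the f#0, f#1 numbering of temporaries are all assumptions made for illustration, and short-circuit operators are deliberately left out.

```python
from itertools import count

def extract_calls(expr, counter=None, stmts=None):
    """Hoist calls out of `expr`, returning (statements, residual expression).

    Expressions are modelled as nested tuples (an assumption of this sketch):
      ("var", name) | ("int", value) | ("call", fn, [args]) | ("bin", op, lhs, rhs)
    """
    counter = counter if counter is not None else count()
    stmts = stmts if stmts is not None else []
    kind = expr[0]
    if kind == "var":
        return stmts, expr[1]
    if kind == "int":
        return stmts, str(expr[1])
    if kind == "bin":
        _, op, lhs, rhs = expr
        # Left operand first, so emitted calls follow evaluation order.
        _, left = extract_calls(lhs, counter, stmts)
        _, right = extract_calls(rhs, counter, stmts)
        return stmts, f"({left} {op} {right})"
    if kind == "call":
        _, fn, args = expr
        # Arguments are flattened before the call that consumes them.
        flat = [extract_calls(a, counter, stmts)[1] for a in args]
        tmp = f"{fn}#{next(counter)}"  # hypothetical temporary naming scheme
        stmts.append(f"call {tmp} := {fn}({', '.join(flat)});")
        return stmts, tmp
    raise ValueError(f"unknown expression kind: {kind}")

def translate_assign(target, expr):
    """Translate `target = expr` into a list of Boogie-like statement strings."""
    stmts, residual = extract_calls(expr)
    return stmts + [f"{target} := {residual};"]
```

Applied to an encoding of y = f(x) + 1, this yields one call statement followed by an assignment that mentions only the temporary. Nested calls such as g(f(x)) are emitted innermost first, which is how the order of the call statements exposes the order of evaluation.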

Here, l1 corresponds with case 0,1 whilst l2 corresponds with the default case. Note also that cases do not fall through by default in Whiley. Furthermore, if there are nested break / continue statements these are translated into gotos as well.
Loops. Loops are also relatively easy to translate. Since Boogie supports only while loops, all other looping forms found in Whiley must be translated using this. Furthermore, since Boogie has no break or continue statement, we translate these using gotos as for switch statements. We note also that, for a do-while loop in Whiley, the loop invariant need not hold before the first iteration (which makes some proofs easier). Furthermore (if desired) one can always check the invariant on entry using an explicit assert statement.
One challenge faced in translating loops is the handling of types for variables which are modified in a loop. For example, in our translation of fill() our translator inserted additional loop invariants to preserve the type of variable xs (recall Figure 6). This is necessary because the postcondition for fill() restates that rs is an array of integers and this is not expressed explicitly in the type [int]Any . Indeed, this is stated for xs in the function's precondition but, since xs is modified in the loop, this information is lost within and after the loop (because Boogie sends its value to havoc). To resolve this, we must reassert this type information as a loop invariant. Furthermore, this is done for any variables modified in the loop.
A related issue, which our translator does not currently address, is that of preserving immutable properties of variables. Consider again the fill() example from Figure 5. In fact, this example does not verify as is with our translator! Again, key information about xs is lost within and after the loop. In this case, the information that needs to be preserved is that the length of xs is unchanged by the loop. In principle, our translator could be extended with a static analysis to infer this and add it implicitly as a loop invariant (but this remains future work). We note that this extends to records, as the following illustrates: Perhaps surprisingly, this also does not verify because the property b.len == 0 is not preserved across the loop. This can be fixed by performing the assignment to b.len after the loop. Or, we could add a loop invariant to ensure b.len == 0 is preserved.
Lambdas. Boogie provides syntax (e.g. (lambda y:int :: y + 1)) for lambdas (with map type [int]int in this case). They are comparable with Boogie functions and cannot, for example, call procedures, etc. As such, they are insufficient for representing lambdas in Whiley which can have side effects. Instead, we translate them into named Boogie procedures. Mostly this is straightforward, but a few challenges arise with captured variables. For example, consider the following:

  type Pred<T> is function(T)->(bool)
  function isBelow(int n) -> Pred<int>:
      return &(int v -> v < n)
Translating the lambda into a standalone procedure requires identifying captured variables (n in this case) and adding them as parameters. The following illustrates:
Here, the procedure contains the body of the lambda, which will include any necessary checks on the lambda itself. Likewise, the constant lambda#131 is generated to represent this particular lambda. When translating an indirect invocation, we automatically generate a suitable prototype to invoke. Here, the function f_apply() is generated (in practice, with a suitable mangling) to represent the anonymous function being invoked. It accepts the lambda as a parameter, thus allowing one to exploit the fact that the same lambda returns the same value(s) when given the same parameter. Finally, we note that work remains to improve our translation of lambdas. In particular, information known about captured variables is not currently transferred to the generated procedure. Thus, the following fails to verify:

  function isBelow(int[] xs, int i) -> Pred<int>
  // index i within bounds
  requires i >= 0 && i < |xs|:
      // Return lambda
      return &(int v -> v < xs[i])

The generated procedure accepts the captured variables i and xs, but does not include a corresponding precondition. Whilst, in this case, it would be relatively easy to fix, in other cases it is more challenging (e.g. when a parameter has been modified prior to being captured). One approach, for example, would be to apply the Weakest Precondition transformer [17,57,101] to the body of the lambda (which should be relatively straightforward since this is just an expression).
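The treatment of captured variables resembles classic lambda lifting, and can be modelled outside Boogie. The following Python sketch is illustrative only: the names Closure, lambda_131 and apply are hypothetical (loosely echoing lambda#131 and f_apply() from the text), and the model says nothing about how the generated Boogie procedure is verified.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass(frozen=True)
class Closure:
    """A lambda value: the lifted procedure plus the values it captured."""
    code: Callable[..., bool]
    captured: Tuple[int, ...]

def lambda_131(n: int, v: int) -> bool:
    # Lifted body of `&(int v -> v < n)`: the captured variable n has become
    # an ordinary leading parameter of a standalone procedure.
    return v < n

def is_below(n: int) -> Closure:
    # Models `return &(int v -> v < n)`: n is captured when the lambda is created.
    return Closure(lambda_131, (n,))

def apply(f: Closure, *args: int) -> bool:
    # Indirect invocation: captured values are passed ahead of the arguments.
    return f.code(*f.captured, *args)
```

For instance, apply(is_below(10), 3) evaluates the lifted body with n = 10 and v = 3. Because a Closure is just a pair of code and captured values, two lambdas built from the same captured values compare equal and necessarily return the same result for the same argument.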

Methods and Framing
Recall from Sect. 2.1.4 that methods in Whiley are permitted to have side effects and, for example, manipulate heap-allocated data through references. As such, Whiley methods correspond closely with procedures in Boogie. However, methods in Whiley can be called from expressions used in statements (though not from specification elements, such as pre-/postconditions or loop invariants). In many ways, the translation of methods follows that for functions, but with some important differences which we now consider.
Framing. Whilst the Whiley language provides relatively limited support for describing the effect a method has on the heap, a lot of machinery is nevertheless required to manage what can be expressed. As highlighted before, we adopt a relatively standard approach to modelling the heap. Specifically, a global variable HEAP of type [Ref]Any is provided to model this. For example, consider the following Whiley method:

  method swap(&int p, &int q)
  ensures *p == old(*q) && *q == old(*p):
      ...
Our translation produces both a procedure prototype and implementation in Boogie. The prototype for the above method looks roughly as follows:
Observe that the modifies clause is provided as we must conservatively assume methods may modify the heap. Note also that old() in Whiley is translated directly using Boogie's old() syntax. There are two essential issues here: typing and framing. The former simply makes explicit guarantees on the shape of the heap provided by Whiley's type system. For example, it guarantees that for an integer reference p there is indeed an integer value at HEAP[p]. The latter aspect of framing is perhaps more interesting. We divide framing into two separate conditions (both of which are marked free since Whiley's type system guarantees them). These conditions rely on a simple predicate for determining whether a reference is reachable from (or within) the frame of a given variable (see Figure 7). The first frame condition enforces self-framing [92] by ensuring that only locations within the method's frame can be modified: There are three parts of the condition as follows:
1. (Mutable) This identifies which locations could be modified by the method and, for these, does not provide a connection between the heap beforehand with that after.
2. (Immutable) For locations which could not be modified by the method, an explicit connection is made between the heap beforehand and that after to ensure they are unchanged.
3. (Allocated) As a special case, heap locations which did not exist prior to the method (i.e. were mapped to Void) can have arbitrary values afterwards.
In essence, the footprint of a method (i.e. those locations it could write) is conservatively tied with its frame (i.e. those locations it could read). This provides a straightforward and extensible basis for reasoning about how methods modify the heap. For example, if syntax for describing the old heap in postconditions was added to Whiley, this would easily layer on top. The key is that, in the absence of more expressive syntax for restricting the locations a method may modify, we must adopt a worst-case assumption that any reachable location could be modified.

In essence, this ensures that any reference reachable from parameters p or q after the method was either freshly allocated, or was reachable from them beforehand. Note that, whilst for this particular method, these conditions are trivial they are required in general (e.g. for handling linked structures).
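The two frame conditions can be explored with a small executable model. The sketch below is a simplification under stated assumptions, not the Boogie encoding of Figure 7: the heap is a Python dict from integer locations to values, None stands in for Void, references are ("ref", location) tuples, records are dicts, and the function names are hypothetical.

```python
VOID = None  # models an unallocated location (an assumption of this sketch)

def refs_in(value):
    """All locations directly referenced by a stored value."""
    if isinstance(value, tuple) and len(value) == 2 and value[0] == "ref":
        return [value[1]]
    if isinstance(value, dict):  # a record: look inside every field
        return [r for field in value.values() for r in refs_in(field)]
    return []

def reachable(heap, roots):
    """Locations reachable from `roots` by following stored references."""
    seen, work = set(), list(roots)
    while work:
        loc = work.pop()
        if loc in seen:
            continue
        seen.add(loc)
        work.extend(refs_in(heap.get(loc, VOID)))
    return seen

def self_framed(old, new, params):
    """First condition: only locations within the frame may change."""
    frame = reachable(old, params)
    for loc in set(old) | set(new):
        mutable = loc in frame                                  # part 1
        immutable = new.get(loc, VOID) == old.get(loc, VOID)    # part 2
        allocated = old.get(loc, VOID) is VOID                  # part 3
        if not (mutable or immutable or allocated):
            return False
    return True

def frame_preserved(old, new, params):
    """Second condition: whatever is reachable afterwards was either
    reachable beforehand or freshly allocated."""
    before = reachable(old, params)
    return all(loc in before or old.get(loc, VOID) is VOID
               for loc in reachable(new, params))
```

With a two-node list rooted at location 1 and an unrelated node at location 3, a final heap that rewrites location 3 violates self_framed. A final heap that links location 2 to location 3 still passes self_framed (location 2 is in the frame) but violates frame_preserved, since location 3 has become reachable without being freshly allocated.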
As a further example to illustrate the challenges addressed by the frame conditions, consider the following:
Establishing that l2 is not modified by the calls to clear(l1) above requires both frame conditions (something which is not immediately obvious at first glance). It is clear that the first frame condition (self-framing) allows us to establish that l2 is not modified by the first call. One might then conclude the first condition is sufficient to establish this across both calls, but that is not the case! The challenge is that clear(l1) ensures l2 is not modified, but allows l1 to be modified. Without the second frame condition, the verifier might then consider that l2 was within l1 after the first call (e.g. that l1->next == l2). And, in such a case, it would then rightly conclude that l2 could be modified by the second call. As such, we see how the second frame condition helps to ensure that disjoint frames remain disjoint.
Finally, we note that our encoding makes heavy use of a recursive predicate (see Figure 7) which (as we have observed) can lead to the butterfly effect [111], where the verifier loops indefinitely, fruitlessly unrolling predicates. In our experience, this typically happens when the condition being checked is invalid, and hence, the verifier cannot quickly find a proof by contradiction.
Allocation. Since data can be allocated on the heap in Whiley methods using the new operator, a translation of this operator is required. To this end, we employ the following:
This simply returns an arbitrary location which was not previously allocated, and ensures it now holds the requested value. Recall that, at the time of writing, Whiley does not support explicit memory deallocation, and hence, no counterpart for this is required. Finally we note that, since allocations result in calls to Ref#new, they must be extracted from expressions as for method invocations above.
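The behaviour described for allocation can be mimicked with a small model. This is a sketch under assumptions, not the Boogie declaration of Ref#new: the heap is a Python dict, None models an unallocated location, and the hypothetical ref_new picks the smallest free location, whereas the translation only promises some previously unallocated one.

```python
def ref_new(heap, value):
    """Return (location, new_heap): the location was unallocated in `heap`
    and holds `value` in `new_heap`; all other locations are unchanged."""
    loc = 0
    while heap.get(loc) is not None:  # None models Void (unallocated)
        loc += 1
    new_heap = dict(heap)             # the old heap is left untouched
    new_heap[loc] = value
    return loc, new_heap
```

Returning a fresh heap rather than mutating in place mirrors the way the old and new heaps are related in a procedure's postcondition.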

Experimental Results
In this section, we compare our Wy2B translator against the Whiley native verifier using the existing compiler test suite which consists of 1100+ (mostly) small Whiley programs. In particular, we are concerned with the number of tests that Wy2B can pass correctly, and note that the existing Whiley native verifier does not pass all the tests (e.g. because of outstanding bugs, etc.). In addition, we discuss our experiences using the new Wy2B toolchain on several larger case studies.

Micro-test Statistics
The Whiley compiler system includes a comprehensive suite of "micro"-test programs, which are small Whiley programs intended to methodically test all Whiley language features, including the Whiley native verifier. At the time of this evaluation (May 2021), this test suite included 731 "valid" micro-test case programs that should be verifiable, as well as 461 "invalid" micro-test case programs that should generate compiler errors or verification failures (to ensure that the compiler correctly catches them). Our first step in evaluating the correctness and usefulness of our new verifier is to apply it to this test suite. We use Boogie v2.8.26 and Z3 v4.8.10 for these evaluations.
When we applied our new Wy2B verifier to the invalid programs, ignoring 7 programs that are marked as IGNORE due to current limitations of the compiler front end, we found that all 454 of the remaining programs failed as expected. This confirms that the Boogie back end is correctly detecting verification issues in programs that should not be verifiable. For completeness, we illustrate one such example:

  function f(int[] xs) -> int[]:
      xs[0] = 1
      return xs

The above "invalid" program is used to test that the verifier correctly reports a potential out-of-bounds access on line 2. Both the native verifier and our Wy2B verifier pass this test.
The valid micro-test programs are small Whiley programs (ranging from 3 to 250 lines of code with an average length of 18 lines) that each contain several (2.2 on average) function and method definitions, some with specifications and some without. Around one third of the programs have functions or methods with requires/ensures specifications, one third use arrays (which generate array bound proof obligations), and 21% have loops with invariants. On average, our Wy2B translator generates 6.0 explicit proof obligations per micro-test program (to check array bounds, function call preconditions, etc.). This is on top of any explicit assert statements in the Whiley program and also in addition to the main proof obligations of Boogie, which are that each function or method body correctly implements its specification, and that every loop invariant is correctly preserved. Again, for completeness we illustrate one such example:
The above "valid" program is expected to pass verification without raising any errors. This means that, amongst other things, the verifier must prove that the body of f satisfies its specification, and within test must establish the precondition for the call f(xs) and that the final assert holds. Again, both the native verifier and our Wy2B verifier pass this test.
Figure 8 compares the percentages of these "valid" micro-tests that the native Whiley verifier and the Wy2B Boogie-based verifier can verify respectively. The leftmost bar on each row corresponds to the programs that both verifiers can verify (604 programs, or 82.6%). The middle bars show that the Whiley native verifier can verify an extra 7 programs (1.0%), whereas the Boogie verifier can verify an additional 102 programs (14.0%).
So in total, the Whiley native verifier can verify 83.6% of the programs, whilst the Boogie verifier can verify a total of 96.6%.
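The reported percentages follow directly from the program counts, as the short Python check below shows. The constant and function names are ours; the counts are those stated above for the 731 valid micro-tests.

```python
VALID_TOTAL = 731   # "valid" micro-test programs
BOTH = 604          # verified by both verifiers
NATIVE_ONLY = 7     # verified only by the Whiley native verifier
BOOGIE_ONLY = 102   # verified only by Wy2B+Boogie

def pct(n, total=VALID_TOTAL):
    """Percentage of the valid suite, rounded to one decimal place."""
    return round(100.0 * n / total, 1)

native_total = BOTH + NATIVE_ONLY   # programs the native verifier can verify
boogie_total = BOTH + BOOGIE_ONLY   # programs the Boogie verifier can verify
```

Here native_total and boogie_total are 611 and 706 programs respectively, and pct() of each count reproduces the figures quoted in the text.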
We investigated the 7 programs that the Whiley native verifier could verify but Boogie could not, and found that 4 of them are verifiable by a later version of Boogie (v2.9.6.0) and Z3 (v4.8.12). The remaining three are due to outstanding issues with the translation to Boogie related to lambda functions that return union types (Issue #59 in the Whiley2Boogie repository) and to proving the type invariants of cyclic data structures (Issue #61).
The larger number of programs that are verifiable by Boogie but not by the Whiley native verifier is largely because several Whiley language features are not supported by the Whiley native verifier, such as:
- heap updates;
- reasoning about the results of calls to lambda functions;
- some kinds of generic types.
The Wy2B+Boogie toolchain takes 15:30 minutes (930 seconds) to translate and verify just the 706 test programs that it can verify, on a Dell Precision 5520 laptop with an Intel i7-7820HQ CPU @ 2.90GHz and 32 GB RAM, and a 60 second timeout for Boogie. This is 1.3 seconds on average for each small valid test program, which is acceptable performance for real-world usage. When run on all 731 programs with a timeout of 60 seconds, the whole test run takes around 17:55 minutes, because some of the more difficult programs hit the 60 second timeout and fail. This is around 1.5 seconds on average for each test, with a maximum of 60 seconds for those that time out, which is still reasonable.
Another interesting performance issue is that we run Boogie with the -useArrayTheory flag by default. This uses the built-in SMT theory of arrays within Z3, which handles large arrays better, usually gives better performance, and enables more programs to be verified (without this flag, Boogie can verify only 665/731 = 91% of the valid test suite). However, there are a few programs (e.g. While_Valid_71.whiley) where performance becomes dramatically worse with this flag: it takes 4.5 minutes to report 5 unverifiable proof obligations with the flag, but less than one second to finish and report 7 unverifiable proof obligations without the flag.
The Whiley native verifier takes only four minutes to process the 600+ test programs that it can verify (around 2.5 programs/sec), which is significantly faster than the Boogie verifier, but takes 18:32 minutes to process all the 731 valid tests (around 1.5 secs/test on average). However, it is difficult to compare the actual proof times, because the Whiley verifier runs within a single Java JVM process, whereas the Wy2B+Boogie toolchain creates several separate processes and intermediate files for each test program.

Case Study: Conway Game of Life
The first case study we discuss is an interactive web page for playing the Game of Life by Conway [70]. This consists of a small index.html file to load the game, plus three Whiley modules: The Whiley compiler compiles these three modules and generates JavaScript as output, which can then run in a standard web browser (see Figure 9). We focussed on verifying just the model component, since the others are just the view and controller components whose correct functioning is generally obvious by the visual updates of the canvas. We aimed to specify and verify as much of the functional behaviour of the model as possible, to try to explore the limits of the Boogie verification path. Figure 10 shows the main data structure that represents the board, plus a Whiley function that counts the number of neighbouring cells that are alive. In addition to adding specifications to model.whiley, we made some small changes to the code to make specification or verification easier:
- The original board init function took width and height inputs as arbitrary pixel sizes, but we changed these to be cell counts rather than pixels (since the size in pixels is just a GUI display issue) and required them to be greater than zero to avoid empty board cases that are not interesting in practice;
- We moved the cell-update code out of a doubly nested loop into a separate function, for better modularity and easier specification;
- Whiley currently supports only one-dimensional arrays, so the code implemented the 2D board as a one-dimensional array, where each (x, y) location was translated into an index x + y*state.width. We respected this data representation choice, but initially had some difficulty with Boogie struggling to verify in-range assertions about these indexes, due to the nonlinear multiplication (state.width is initialised at the start of each game, so is not a static constant).
Frequently, Boogie would go into an infinite loop trying to prove these assertions (or terminate with a timeout error if we set a time limit). Eventually we found that upgrading Z3 from version 4.8.9 to 4.8.10 solved most of these problems, and Boogie was then able to prove most of the required assertions, or give a quick failure result for those it could not prove. Even then, we found that it was sometimes necessary to try several different ways of specifying indexes and bounds before finding one that Boogie could verify. For example, it was much easier to verify the count_living(...) function when it took a single index parameter rather than separate x and y parameters; this is why in our final version the count_living function re-derives the x and y coordinates from the index. This meant that only one variable needed to be quantified in the update loop invariant, instead of both the x and y coordinates. Typically, we found that cases where Boogie did not terminate were due to array accesses that it could not prove were within bounds, and that adding redundant constraints to the specification to make it clear that they were in bounds would fix that problem. This process was rather frustrating, but reflects a limitation of SMT solvers (because nonlinear arithmetic is not decidable) rather than of the Whiley-to-Boogie translation.

Fig. 10 Snippets from the Game of Life case study: the State data structure with its invariants, and the count_living function that counts how many neighbouring cells are alive. As explained in the text, the index input of count_living is given a cell location (x, y) as x+y*width. Note that uint is defined in the standard library and has the same definition as nat (recall Figure 1)
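The index arithmetic behind this design choice can be illustrated with a short Python sketch. It is not the Whiley code of Figure 10: the function names to_index and to_coords, the Boolean cell encoding, and the assumption that the board does not wrap around at its edges are ours.

```python
def to_index(x, y, width):
    """Flatten a cell location (x, y) into a one-dimensional array index."""
    return x + y * width

def to_coords(index, width):
    """Re-derive (x, y) from a flat index; the inverse of to_index."""
    return index % width, index // width

def count_living(cells, width, height, index):
    """Count the live neighbours of the cell at `index` (no wrap-around)."""
    assert len(cells) == width * height and 0 <= index < len(cells)
    x, y = to_coords(index, width)
    living = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            # Skip the cell itself and any neighbour that falls off the board.
            if (dx, dy) != (0, 0) and 0 <= nx < width and 0 <= ny < height:
                if cells[to_index(nx, ny, width)]:
                    living += 1
    return living
```

Taking a single index parameter means a caller iterating over the board needs only one bounded variable, 0 <= index < width * height, rather than separate bounds on x and y. The multiplication by a non-constant width still appears inside to_index, which is the kind of nonlinear term the text identifies as difficult for the solver.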
After these changes, Boogie (v2.8.26) can easily verify all the functions in this program in 2.2 seconds, plus 2.8 seconds for the translation from Whiley to Boogie.

VerifyThis 2019 Competition Challenges
In this section, we briefly discuss our experience of translating and verifying several of the Dafny and JML solutions to the 'VerifyThis 2019' verification challenge [59]. These challenges involve quite sophisticated algorithms, with full specifications of the functional behaviour, so are reasonably challenging verification tasks.
Dafny and Boogie were designed to work together, whereas Whiley was independently designed, and originally used several generations of custom-built "native" provers to discharge proof obligations. It is only recently that we have developed the Boogie back end as an alternative verifier. So a useful way to evaluate the usability of the Whiley+Boogie verifier is to take verification solutions that are written in Dafny, translate them into Whiley and see how well the verification works in comparison with Dafny+Boogie. This can help us to understand how various language features of Whiley help or hinder the verification process and how well Whiley translates into the underlying Boogie verifier, which is a common back end for both languages.
We translated and verified the following challenge solutions using Boogie v2.9.6.0 and Z3 v4.8.12; the resulting Whiley solutions can be seen on GitHub.
Challenge 1A: Monotonic Segments. This challenge takes an array and cuts it into monotonic segments, which are either increasing or decreasing. The Dafny solution used the built-in extensible sequences to specify some of the operations, but Whiley has only fixed size arrays. So to replicate Dafny's sequence append operator, we defined in Whiley an append() function that adds an element to the end of an array. To replicate the functionality of Dafny's sequence slicing, we added start and end parameters to each of the properties used where necessary, as these were the only uses of sequence slicing in the Dafny version. Interestingly, the lemmas found in the Dafny version were not necessary, as they were needed to prove properties of Dafny's sequence manipulations that were not relevant to Whiley. Some assertions found in the Dafny version were also not needed in the Whiley code, as they were only needed to prove properties of the sequence manipulation in Dafny. The Dafny solution was 72 lines of non-comment specification and source code (excluding curly braces), while the Whiley solution was slightly shorter at 56 lines (including 11 lines for append<T>()) and took roughly 30s to verify without the -useArrayTheory flag. We note that, with that flag enabled, it would not verify the program within 20 minutes.
Challenge 1B: GHC Sort. This challenge was to verify a sorting algorithm used by the GHC Haskell compiler, which takes the monotonic segments from the previous challenge, reverses the decreasing ones, and then pairwise merges the segments into a sorted result. For this challenge, we added the same append<T>() function as above.
As 01_ghc_sort builds upon 01_findcuts, the same start-and-end modifications to the properties were made, but a separate slice function was also added, as Dafny's sequence slicing was used more extensively than in the cutpoints solution. The lemmas from the Dafny code were not needed in the Whiley code, and neither were any of the assertions. New assertions had to be added to the Whiley code in the merge_pair and monotonic_segments functions to demonstrate properties of the implemented slice function. The ghc_sort function was simplified, as the Dafny solution uses an extra while loop to copy its output sequence into an array, which is not necessary in Whiley as arrays are used throughout. The reverse function was re-implemented slightly to avoid an append() on every iteration. The Dafny language includes multi-sets (bags) and the Dafny solution used these to prove that one sequence is a permutation of the other. However, the authors comment that the "specification (and hence proofs) that the output is a permutation of the input is incomplete". Whiley does not have built-in support for multi-sets, and it is difficult to recreate this using uninterpreted functions. As such, we also did not establish the permutation property in the Whiley version. Overall, the Dafny solution was 137 lines of non-comment specification and source code (excluding curly braces), and the Whiley solution was slightly longer at 152 lines. The Wy2B+Boogie verifier takes roughly 20 seconds to verify this program, and again failed to verify within 20 minutes with the Boogie -useArrayTheory flag.
Challenge 2A: Cartesian Trees. This challenge was to verify a stack-based algorithm for finding the nearest smaller value for each item in an array. There was no Dafny solution, so we started from the OpenJML solution, which has a single function with a doubly nested loop.
For this challenge, it was only necessary to add the loop invariants |left| == |s| and |stack| == |s| to the outer loop, as Whiley is not able to automatically determine that the sizes of the arrays are unmodified when the loop body only updates valid indexes in the array, whereas OpenJML can infer this invariant. The OpenJML solution was 38 lines of code and specifications, and the Whiley solution is 35 lines. The Wy2B+Boogie verifier takes 1.8 seconds to verify this program, or 5 seconds if we add the Boogie -useArrayTheory flag.

Discussion
From our case studies and our micro-test results, we have observed that using Boogie to verify Whiley programs has significantly increased the verification abilities of Whiley. This is partly due to Boogie making it easier to provide proof support for a wider range of Whiley language features and partly due to the maturity and power of the underlying proof tools (the decades of careful proof engineering that have gone into Z3). However, the use of Boogie and Z3 is not yet perfect. The Boogie -useArrayTheory flag is necessary in some case studies to handle large arrays, but in other case studies it can lead to vastly increased proof times or even non-termination. Also, we have observed that Boogie can often make effective use of recursive predicates to prove a valid proof obligation, but can go into an infinite unfolding loop if that proof obligation is difficult or unprovable.
On the Whiley side, we found that when reasoning about arrays it is helpful to define various supporting properties, such as taking slices of an array, appending two arrays, and counting the occurrences of a given element. It would be useful to develop a Whiley library of these supporting properties and this would be easier if Whiley properties could return arbitrary values, rather than being limited to Boolean results. This would allow them to be used as specification-only functions, which would make it easier for Boogie to reason about the domain-specific concepts that are captured by those functions.

Extended Static Checkers
The Extended Static Checker for Java (ESC/Java) [68] and its later successor (ESC/Java2) is perhaps one of the most influential tools in the area of verifying compilers [38,48]. The tool essentially provides a verifying compiler for Java programs whose specifications are given as annotations in a subset of the Java Modelling Language (JML) [38,39,99]. JML provides a standard notation for expressing contracts in Java, and the following illustrates a simple method in JML which ESC/Java verifies as correct:

  /*@ requires n >= 0;
    @ ensures \result >= 0;
    @*/
  public static int method(int n) {
      int i = 0;
      /*@ maintaining i >= \old(i); */
      while(i < n) { i = i + 1; }
      return i;
  }

Here, we can see pre- and postconditions are given for the method, along with an appropriate loop invariant. Since \old(i) refers to i on entry to the loop, we have \old(i)==0 in this case. Despite some unsoundness (e.g. ignoring arithmetic overflow and unrolling loops a fixed number of times), the tool has been demonstrated in real-world settings. For example, Cataño and Huisman [37] used it to check specifications given for an independently developed implementation of an electronic purse. In addition to ESC/Java, a Runtime Assertion Checker (RAC) was developed for JML [34,39,99] as well as various utilities for specification-based testing [32,42,170,171]. Likewise, Krakatoa [67] provided an alternative to ESC/Java for statically verifying Java programs based on the original Why platform. Finally, whilst the development of JML and its associated tooling stagnated somewhat over the last decade, we note more recent efforts through the OpenJML initiative [29,46,47,149].
The approach taken to generating verification conditions in an earlier tool, ESC/Modula-3, was also adopted in ESC/Java [56]. In fact, ESC/Modula-3 was one of the earliest tools to use an intermediate verification language (based on Dijkstra's language of guarded commands [57]) and, in many ways, is Boogie's predecessor. Such a language typically includes assignment, assume and assert statements and non-deterministic choice. It is notable that the guarded command language used in ESC/Modula-3 lacked type information and used an encoding of types similar to ours, although Modula-3 has a simpler type system than Whiley. For example, a predicate isT was defined for each type to determine whether a given variable was in the type T. A similar approach was also taken in Leino's Ecstatic tool, where the subtyping relation was encoded using a subtype() predicate [100]. Again, every type was given a membership predicate, together with specific axioms stating their non-intersection. These were contained in what Leino refers to as the background predicate, which was included with each generated verification condition. A key difference from ESC/Modula-3 is that ESC/Java employed a multi-stage process allowing "high-level" guarded command programs to be desugared into a lower-level form. Further refinements were also made with "passive form", which reduced the size of generated verification conditions and supported unstructured control flow [17].
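To make the flavour of such a guarded command language concrete, the following Python sketch computes weakest preconditions for assignment, assume, assert, non-deterministic choice and sequencing. It is only an illustration under our own assumptions: the function names (assign, assume, assert_, choice, seq, is_nat) are hypothetical, predicates are modelled as Python functions over a state dictionary, and the verification condition is checked by brute force over a few inputs rather than by a theorem prover. It does not reproduce the encoding used by ESC/Modula-3, ESC/Java or Boogie.

```python
# Predicates and expressions are Python functions over a state, which is a
# dict from variable names to values. All names here are hypothetical.

def assign(var, expr):
    # wp(x := e, Q) = Q with x replaced by e
    return lambda Q: (lambda s: Q({**s, var: expr(s)}))

def assume(P):
    # wp(assume P, Q) = P ==> Q
    return lambda Q: (lambda s: (not P(s)) or Q(s))

def assert_(P):
    # wp(assert P, Q) = P && Q
    return lambda Q: (lambda s: P(s) and Q(s))

def choice(a, b):
    # wp(a [] b, Q) = wp(a, Q) && wp(b, Q)
    return lambda Q: (lambda s: a(Q)(s) and b(Q)(s))

def seq(*cmds):
    # wp(a; b, Q) = wp(a, wp(b, Q)), applied right to left
    def transformer(Q):
        for c in reversed(cmds):
            Q = c(Q)
        return Q
    return transformer

def is_nat(v):
    # Membership predicate for a hypothetical type "nat", in the spirit of isT
    return isinstance(v, int) and v >= 0

# "if (x > 0) { y := x - 1 } else { y := 0 }" desugared into guarded commands,
# followed by a check that y remains within the type nat.
program = seq(
    assume(lambda s: is_nat(s["x"])),
    choice(
        seq(assume(lambda s: s["x"] > 0), assign("y", lambda s: s["x"] - 1)),
        seq(assume(lambda s: not s["x"] > 0), assign("y", lambda s: 0)),
    ),
    assert_(lambda s: is_nat(s["y"])),
)

TRUE = lambda s: True
vc = program(TRUE)

# Brute-force check of the verification condition over a small range of inputs.
verified = all(vc({"x": x, "y": 0}) for x in range(-5, 6))
print(verified)
```

Note how the conditional disappears entirely: each branch becomes an assume followed by the branch body, and the type of y is checked with an ordinary assert over a membership predicate.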

Spec#
The Spec# system followed ESC/Java and benefited from many of the insights gained in that project. Spec# added proper support for handling loop invariants [16], for handling safe object initialisation [64] and allowing temporary violations of object invariants through the expose keyword [109]. The latter is necessary to address the so-called packing problem which was essentially ignored by ESC/Java [15]. Two further improvements meant Spec# was capable of verifying a wider range of programs than ESC/Java: firstly, Spec# incorporated the new Z3 automated theorem prover (as opposed to Simplify) [53]; secondly, Spec# refined the language of guarded commands used in ESC/Java to form Boogie. Boogie was described as an "effective intermediate language for verification condition generation of object-oriented programs because it lacks the complexities of a full-featured object-oriented programming language" [14]. In essence, Boogie was a version of the guarded command language from ESC/Java which also supported a textual syntax, type checking, and static analysis for inferring loop invariants. Other important innovations included the ability to specify triggers to help guide quantifier instantiation, and the use of trace semantics to formalise the meaning of Boogie [107].
Leino and Schulte [113] provide an excellent account of how Spec# programs are encoded in Boogie, and there is much similarity with that presented here. For example, the heap is modelled using a global variable of type [ref,name]any where a special field, alloc, tracks whether a location is allocated. Like Whiley, Spec# permits method calls within expressions, and hence, a similar mechanism for safely extracting them is employed. Furthermore, key challenges arise in preserving class invariants across inheritance and ownership relationships. The approach adopted was based on packing/unpacking [15], which identifies code regions where class invariants are not required to hold.
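As a rough intuition for a heap of type [ref,name]any, the following Python sketch models the heap as a map from (reference, field name) pairs to arbitrary values, with a distinguished alloc field. The helper names (is_allocated, allocate, write) and the use of integers as references are our own assumptions for illustration; they are not taken from the Spec# encoding.

```python
# Hypothetical model of a heap of type [ref, name]any: a dict from
# (reference, field name) pairs to arbitrary values.
ALLOC = "alloc"  # special field recording whether a reference is allocated

def is_allocated(heap, ref):
    # Unmapped locations are treated as unallocated.
    return heap.get((ref, ALLOC), False)

def allocate(heap, fields):
    # Pick the smallest unallocated reference; return the new heap and the reference.
    ref = 0
    while is_allocated(heap, ref):
        ref += 1
    new_heap = dict(heap)
    new_heap[(ref, ALLOC)] = True
    for name, value in fields.items():
        new_heap[(ref, name)] = value
    return new_heap, ref

def write(heap, ref, name, value):
    # Functional update: the old heap is left unchanged, so it can still be
    # referred to afterwards (e.g. when relating pre- and post-states).
    assert is_allocated(heap, ref), "write to unallocated location"
    new_heap = dict(heap)
    new_heap[(ref, name)] = value
    return new_heap

h0 = {}
h1, p = allocate(h0, {"x": 1, "y": 2})
h2, q = allocate(h1, {"x": 10, "y": 20})
h3 = write(h2, p, "x", 5)
print(h3[(p, "x")], h3[(q, "x")], h2[(p, "x")])
```

Because every update produces a new heap value, a write through p visibly leaves the fields of q untouched, which is the kind of fact a verifier must be able to derive from the heap model.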

Dafny
Dafny [104,105] is perhaps the most comparable related work to Whiley, and was developed independently at roughly the same time. That said, the goals of the Dafny project are somewhat different. In particular, the primary goal of Dafny is to provide a proof assistant for verifying algorithms rather than, for example, generating efficient executable code (though it does compile to C#). In contrast, Whiley aims to generate code which is, for example, suitable for embedded systems [134,156]. Dafny is an imperative language with simple support for objects and classes without inheritance and, more recently, traits [1]. Like Whiley, Dafny employs unbounded arithmetic and distinguishes between pure and impure functions. Dafny provides algebraic data types (which are similar to Whiley's recursive data types) and supports immutable collection types with value semantics that are primarily used for ghost fields to enable specification of pointer-based programs. Dynamic memory allocation is possible in Dafny, but no explicit deallocation mechanism is given and presumably any implementation would require a garbage collector.
Leino [102] provides a detailed description of how Dafny programs are translated into Boogie, much of which has already been touched upon earlier in this paper. Dafny also supports generic types and, unlike Whiley, dynamic frames [91]. As discussed in §2.1.6, the latter provides a suitable mechanism for reasoning about pointer-based programs. For example, Dafny has been used successfully to verify the Schorr-Waite algorithm for marking reachable nodes in a graph [104]. Finally, Dafny has been used to successfully verify benchmarks from the VSTTE'08 [108], VSCOMP'10 [93] and VerifyThis'12 [83] challenges (and more).
Leino and Pit-Claudel [111] characterise the "Butterfly Effect" where minor changes to the program source cause significant instabilities in verification time. The authors argue one reason for this is so-called matching loops, where the SMT solver repeatedly instantiates quantifiers or recursive predicates without making actual progress towards either a proof or a contradiction. Their approach is prototyped in Dafny and moves responsibility for trigger selection out of the SMT solver. This enables trigger selection to occur before quantifiers are rewritten into lower-level forms (i.e. as necessary for the SMT solver) where important triggers are obscured. Furthermore, whilst the authors do not expect Dafny users to write triggers themselves, users are expected to understand them in order to diagnose verification performance problems.
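The idea of a matching loop can be illustrated with a toy model of trigger-based instantiation. The Python sketch below is an assumption-laden simplification and not how Z3 or Dafny actually work: terms are nested tuples, the single (hypothetical) axiom "forall x :: f(x) == f(g(x)) + 1" is represented only by the terms its body mentions, and the function instantiate and both trigger functions are names we introduce for illustration.

```python
def instantiate(terms, match, body_terms, max_rounds=10):
    """Repeatedly instantiate one quantified axiom against a set of ground terms.

    match(term) returns the binding for x if the trigger matches, else None.
    body_terms(x) returns the ground terms mentioned by the instantiated body.
    Returns (number of instantiations, whether a fixed point was reached).
    """
    terms = set(terms)
    done = set()
    for _ in range(max_rounds):
        bindings = {match(t) for t in terms} - {None} - done
        if not bindings:
            return len(done), True
        for x in bindings:
            done.add(x)
            terms |= set(body_terms(x))
    return len(done), False

# Terms mentioned by the body of the hypothetical axiom: f(x) and f(g(x)).
def body_terms(x):
    return [("f", x), ("f", ("g", x))]

# Trigger {f(x)}: matches any term of the form f(t), binding x to t.
def trigger_f(t):
    if isinstance(t, tuple) and t[0] == "f":
        return t[1]
    return None

# Trigger {f(g(x))}: matches only terms of the form f(g(t)), binding x to t.
def trigger_f_g(t):
    if isinstance(t, tuple) and t[0] == "f":
        inner = t[1]
        if isinstance(inner, tuple) and inner[0] == "g":
            return inner[1]
    return None

loops = instantiate({("f", "a")}, trigger_f, body_terms)
stable = instantiate({("f", ("g", "a"))}, trigger_f_g, body_terms)
print(loops, stable)
```

With the trigger f(x), every instantiation introduces a fresh term f(g(...)) that matches the trigger again, so the process only stops when the round limit is hit. With the more specific trigger f(g(x)), the one instantiation produces no new matching terms and a fixed point is reached immediately.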

Why3
In addition to Boogie, the other main intermediate verification language in use is WhyML [28,65]. This is part of the Why3 verification platform which is intended to enable a range of different theorem provers to be used in proving correctness, depending on the nature of the program being verified. For example, a short but extremely intricate C program for solving the N-Queens problem has been fully verified with the aid of Why3 [66]. This was achieved by abstracting the original program into WhyML, and the proof required the use of three distinct theorem provers to discharge 41 verification conditions. Of these, 35 were discharged automatically by Alt-Ergo [158] or CVC3 [19], whilst the remainder were discharged manually using Coq [24]. Indeed, the authors of Why3 state [28]: "The Why3 platform can be used by itself, as some kind of standalone 'meta' theorem prover, but the main purpose of Why3 is to be used as an intermediate language."
WhyML is a first-order language with polymorphic types, pattern matching, inductive predicates, records and type invariants. It has also been used in the verification of C, Java and Ada programs (amongst others). Like Boogie, WhyML provides structured statements (e.g. while and if statements). In addition, a standard library is included which provides support for different theories (e.g. integer and real arithmetic, sets and maps).
Of note here is the Boogie to WhyML translation developed by Ameri and Furia [4] which, although largely successful, did expose some important mismatches between them. Their primary motivation was the wide support for alternative (even interactive) provers with Why3. The structured nature of WhyML presented some problems in handling Boogie's unstructured branching, and aspects of Boogie's polymorphic maps and bitvectors were problematic. They showed that Why3 could verify 83% of the translated programs with the same outcome as Boogie. However, they also identified three simple Boogie programs which Boogie either did not verify or incorrectly verified. Why3, on the other hand, handles these cases by virtue of its ability to use a wider range of provers. One of the cases, for example, failed to verify because of the way Z3 handles quantifier instantiation through triggers.
As another example, SPARK/Ada is a commercially developed verifying compiler building upon Why3 which has seen good industrial uptake [13,87]. For example, it has been used for (amongst other things) space-control systems [33], aviation systems [40], automobile systems [81] and railway systems [58]. Finally, given our success here using Boogie to verify Whiley programs, we note it would be interesting future work to explore a WhyML back end for verifying Whiley programs as well.

Viper
Müller et al. [125] observed that existing intermediate verification languages (e.g. Boogie or WhyML) do not support separation logics and related permission-based logics. They identify that such systems have a more "higher-order nature" than typical software verification problems, and make extensive use of recursive predicates (which Boogie/Z3 does not support well). They developed an alternative intermediate verification language (Viper) which offers more precise handling of recursive predicates and protects against "infinite unrolling" using a least fixed-point semantics. The tool also supports two back ends, one of which generates an encoding in Boogie. This builds on earlier work looking at the encoding of abstract predicates and abstraction functions in the context of permission-based logics [79]. Here, abstract predicates describe the (potentially infinite) set of access permissions a given object has, but this is problematic for an SMT solver which cannot arbitrarily unroll them. To handle this, an encoding is employed which "versions" predicates to prevent arbitrary unrolling, along with various tactics to prevent unlimited matching loops.
An example of work utilising Viper is that of Ter-Gabrielyan et al. who argue that SMT solvers typically provide limited support for graph reachability problems, which is prohibitive for reasoning about mutable data structures that admit sharing in various forms [157]. By restricting themselves to problems involving acyclic structures of bounded outdegree, they obtained an encoding amenable to first-order theorem provers which they demonstrated in the context of Viper. Finally, we note that Viper currently acts as the intermediate verification language for Chalice [110,114], Prusti [8], Nagini [63], VerCors [5,27] and more.

VeriFast
VeriFast is a modular program verifier for concurrent and sequential programs written in C and Java, which employs separation logic and fractional permissions to ensure memory safety [85,86]. The tool comes from a line of work exploring the use of dynamic frames in the context of verification [152,153,154]. VeriFast is unusual in eschewing the use of quantifiers within specifications. Instead, inductive predicates are provided to model properties that would otherwise be expressed using quantified formulae. VeriFast supports algebraic data types to allow specifications to reason about locations contained in linked structures. Finally, VeriFast has been used to reason about memory safety in JavaCard programs and Linux device drivers [85], and also in the verification of FreeRTOS [124].

Frama-C
Frama-C [52] provides a set of sound software analyses for the industrial analysis of ISO C99 source code. The system uses the ACSL specification language as a platform on which different solver plugins can operate. For example, different plugins may use different approaches to checking functions meet their specifications, such as abstract interpretation or deductive verification. The ACSL specification language is based loosely upon JML and supports a variant of separation logic through the \separated command. An unusual feature of Frama-C (e.g. compared with Dafny or Whiley) is that multiple loop invariants may be specified at different positions within the loop [22].
Volkov et al. [165] developed an extension for lemma functions (similar to those in Dafny) which enables a more "interactive" style of verification, and applied this to various functions from the Linux kernel. We note that, whilst Whiley lacks specific support for lemmas, a similar effect can be achieved using a function with a void return. Kosmatov and Signoles illustrate runtime assertion checking with Frama-C, which they argue provides a useful stepping stone prior to static verification [94]. We note similar findings in the context of an automated testing tool for Whiley [43].
Finally, Frama-C has been applied to a range of real-world problems. For example, it has been used in the context of Air Traffic Management systems to reason about floating point operations and establish bounds on rounding errors [73]. Similarly, Airbus has investigated the use of Frama-C within the context of the DO-178B standard for software in airborne systems and equipment [155]. Frama-C has also been used in the context of IoT devices employing AES encryption [26], for verifying components of Contiki, an open-source operating system for IoT [119], and the Xen hypervisor [143]. It has also been applied to verifying railway software [142] and combined with CBMC [97] for test case generation in the context of automotive controllers [127].

AutoProof
Eiffel [122] is an influential and widely used language that promotes the idea of "Design by Contract" as a lightweight alternative to formal specification [123].
Tschannen et al. characterise the AutoProof verifier for Eiffel as being auto-active, meaning it lies somewhere between fully automatic and manual (i.e. interactive) [163]. Here, an automated theorem prover is used (as for Dafny or Whiley) in conjunction with appropriate annotations (e.g. pre-/postconditions, loop invariants, etc.). AutoProof translates Eiffel programs into Boogie which, for example, allows strengthening of postconditions and weakening of preconditions in subclasses [162]. Of relevance here is the approach to framing. Whilst Eiffel has no specific syntax for framing, a modifies clause was implemented as a pragma. A default frame condition is employed which assumes only references mentioned in the postcondition can be modified. This utilises a similar rule to that in §3.5 for state preservation across method calls.
Finally, we note AutoProof has been used in various settings, such as for teaching a graduate course on software verification [69].

Other
Aside from various descriptions of Boogie's syntax and semantics [14,103,112], several works focus on the usability of Boogie as an intermediate verification language. For example, Chen and Furia were concerned with the "brittleness" of verification tools [41]. Specifically, a verifier is brittle if small (inconsequential) changes can have major impacts on the outcome (e.g. it no longer verifies). They investigated this in the context of Boogie by mutating various (verified) programs in ways that preserved correctness and found, perhaps surprisingly, several issues. For example, the ordering of declarations in a Boogie program affected the chance of success. Indeed, for one program consisting of five (independent) statements, Boogie only managed to verify half of the 120 possible orderings. In a similar vein, the Boogie verification debugger (BVD) addresses the disconnect between counterexamples generated at the Boogie level and the source language above [74]. The tool employs plugins to convert Boogie counterexamples into a form recognisable in the source language, with plugins provided for VCC and Dafny. Likewise, Boogaloo attempts to improve the process of debugging failed verification attempts by generating concrete inputs (e.g. arguments) that illustrate the failing trace [141]. A key challenge was in providing a runtime semantics for Boogie, and for input generation, a mixture of symbolic execution and constraint solving with Z3 was applied.
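The figure of 120 orderings is simply 5! for five independent statements. A brittleness experiment of this kind can be sketched in Python as follows, where count_verified_orderings and toy_verifies are hypothetical names of our own. In particular, toy_verifies is a deliberately order-sensitive stand-in for a real verifier run and is not intended to reflect why Boogie succeeded or failed in the study above.

```python
from itertools import permutations

def count_verified_orderings(statements, verifies):
    """Try every ordering of `statements`; count those accepted by `verifies`."""
    orderings = list(permutations(statements))
    passed = sum(1 for ordering in orderings if verifies(ordering))
    return passed, len(orderings)

# Hypothetical stand-in for invoking a verifier on a reordered program. It is
# order-sensitive on purpose: it accepts only if s1 appears before s2.
def toy_verifies(ordering):
    return ordering.index("s1") < ordering.index("s2")

statements = ["s1", "s2", "s3", "s4", "s5"]
passed, total = count_verified_orderings(statements, toy_verifies)
print(passed, total)
```

In a real experiment, the verifies callback would write out the reordered program and invoke the verifier, with a timeout to account for runs that do not terminate.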
Segal and Chalin [150] attempted a systematic comparison of Boogie and Pilar. Here, Pilar is a component of the open-source Sireum framework and is similar in many ways to Boogie. They stated that it is "not trivial to define a common intermediate language that can still support the syntax and semantics of many source languages". Their research method was to develop translations from Ruby into both Boogie and Pilar, and then compare. Various aspects of Ruby proved challenging for Boogie, including its dynamically typed nature and arrays. Their solution bears similarity to ours, as they defined an abstract Boogie type as the root of all Ruby values. Overall, they concluded that Boogie's type system makes it "more flexible for languages with non-traditional type systems" whereas Pilar is more suitable for traditional Object-Oriented languages.
Arlt et al. [7] presented a translation from SOOT's intermediate bytecode language (Jimple) to Boogie, with an aim of identifying unreachable code. As such, an important aspect of the translation was the preservation of feasible execution paths. Overall, they found many aspects of the translation straightforward. For example, Java's instanceof operator was modelled using an uninterpreted function. Another interesting aspect of their translation was the use of multiple typed heaps (a Burstall-Bornat heap [30]) to model the Java heap. However, some aspects of impedance mismatch were present and they had difficulty with monitor bytecodes, exceptions, certain chains of if-else statements and finally blocks.
On a related note, Cook et al. [50] focus on the impedance mismatch, arguing that "existing theorem provers, such as Simplify, lack precise support for important programming language constructs such as pointers, structures and unions". For example, they observe that integer types are almost never unbounded in practice, though verification tools often assume this. Likewise, they observe that the lack of support for nonlinear arithmetic is often a problem (though we note useful advances have been made in the intervening years [23,44,51,88,96]). Their tool, Cogent, "implements an eager and accurate translation of ANSI-C expressions (including features such as bitvectors, structures, unions, pointers and pointer arithmetic) into propositional logic". It is in essence a layer that sits above tools like Boogie and encodes ANSI-C data types using bitvectors, and we note the obvious similarity with the more recent tool, Frama-C, discussed above.
Rust provides another interesting perspective as there has also been growing interest in exploiting its safety guarantees for program verification. For example, RustHorn translates Rust programs into Constrained Horn Clauses (CHC) which can then be discharged by a specialised CHC solver [120]. Likewise, Astrauskas et al. leverage Rust's type system to simplify the specification and verification of systems software [8]. Their tool, Prusti, extends Rust with a specification language embedded using annotations and statically checked using Viper [125]. The SMACK verifier, which translates LLVM IR to Boogie/Z3 [14,53], was also extended to Rust [11]. The CRUST tool [161] enables unsafe code to be checked using the C Bounded Model Checker (CBMC) [97]. This employs a custom C code generator for rustc, and correctly identified bugs arising during development of Rust's standard library. The widely used symbolic execution tool, Klee [35], was also extended for Rust allowing assertions to be checked statically [115,116]. Finally, we note ongoing work to formalise subsets of the Rust language which could assist the development of verification tools [89,90,136,166,167].
Finally, Jahob supports multiple provers and is concerned with recursive data structures (e.g. trees, etc.) and their encoding in first-order logic [31,147]. Bannwart and Müller presented a Hoare-style logic for a sequential bytecode language similar to JVM Bytecode or MSIL [10]. As expected, the unstructured nature of bytecode languages presented a key challenge here. In a similar fashion, Barnett and Leino consider the problem of translating MSIL bytecode into a form suitable for Boogie, in particular by turning unstructured loops into quantified expressions [18].

Conclusion
Using Boogie as an intermediate verification language eases the development of a verifying compiler, particularly as it handles verification condition generation, and offers high-level structures such as while loops and procedures with specifications. However, as with any intermediate language, there is potential for an impedance mismatch when Boogie structures do not exactly match the source language. Fortunately, this impedance mismatch can be circumvented in a variety of ways, such as translating to lower-level Boogie statements (e.g. with unstructured control flow). Furthermore, Boogie provides a good level of flexibility to define the "background theory" of a source language, such as its type system, its object structure, and support for heaps. This background theory is at a similar level of abstraction in Boogie as it would be in SMT-LIB so, whilst Boogie offers no major advantages in this area, it also has no disadvantages.
Our work provides a comprehensive account of the encoding of an independently developed non-trivial source language (Whiley) into Boogie. In doing this, we faced many challenges in figuring out a good encoding and, unfortunately, encountered many dead ends along the way. As such, we hope this work can offer guidance to researchers when developing verifying compilers for other languages. Indeed, it would be beneficial to have a repository of knowledge about different ways of encoding various language constructs. Some alternatives (particularly for various heap encoding techniques and procedure framing axioms) are discussed in the published Boogie papers, but there is no central repository of techniques or publications comparing encoding techniques. A major benefit of Boogie is, of course, its easy access to Z3. We have shown that the Wy2B/Boogie/Z3 stack offers significant advantages over the native Whiley verifier in terms of the percentage of programs that can be verified automatically. We note, however, that whilst Boogie/Z3 offers tangible benefits, it is not without its own challenges. For example, understanding why Boogie/Z3 cannot verify a particular program, or loops indefinitely, still requires considerable expertise. We have also shown that a number of non-trivial case studies written in Whiley can be successfully verified with Boogie. This has also helped us identify areas in which the Whiley language itself could be improved to better exploit Boogie.
Finally, interesting future work would be to explore translating Boogie's counterexample models back into Whiley-like notation to improve error reporting. We would also like to extend Whiley's support for framing and, consequently, our Boogie back end. Another interesting path would be a more detailed comparison against the Whiley verifier. As discussed in §3, this has a layered design based on an intermediate assertion language and an underlying SMT solver. Hence, one could swap out the SMT solver for Z3 to provide a more accurate comparison with Boogie itself.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.