1 Introduction

Standard libraries of programming languages provide efficient implementations for common data containers. The details of these implementations are abstracted away by generic interfaces which are specified in terms of well understood mathematical structures such as sets, multisets, sequences, and partial functions. The intensive use of container libraries makes important their formal verification.

However, the functional correctness of these libraries is challenging to verify for several reasons. Firstly, their implementation is highly optimized: it employs complex data structures and manages the memory directly through pointers/references or specific memory allocators. Secondly, the specification of containers is rarely formal. Notable exceptions are, e.g., Eiffel [28] and SPARK [11]; recently, [1] provided a specification of the Ada 2012 container library. The formal specifications are very important when the library employs constructs that are out of the scope of the underlying mathematical structure. A typical example of such constructs are iterators. For example, Java iterators are generic and can exist independently of the container; Ada 2012 iterators, called cursors, are part of the container. Thirdly, the specification of the link between the low level implementation and the mathematical specification requires hybrid logics that are able to capture both low level and high level specifications of the container. For verification purposes, these logics shall be supported by efficient solvers.

This work focuses on the functional verification of the bounded doubly linked lists container, which is a GNAT implementation [12] of the doubly linked lists container in the standard library of Ada 2012 [1]. This container is currently used by client programs [2] written in SPARK [22], a subset of Ada targeted at safety- and security-critical applications. The lists have bounded capacity, fixed at the list creation, and thus avoid dynamic memory allocation during the container use. This feature is required in critical code, where it is necessary to supply formal guarantees on the maximal amount of memory used by the running code.

The container implementation is original compared with other implementations of linked lists inside arrays. It employs an array of fixed size in which it manages (i) the occupied array cells inside a doubly linked list representing the content of the container and (ii) a singly linked list of free array cells. The operations provided are classic for lists. The amortized constant time complexity is preserved by the implementation of insert and delete operations. The list elements are designated using (bi-directional) cursors, also used to traverse the list. In conclusion, the code of this container was designed to ensure efficiency of operations and not its verification, and therefore it provides a realistic test for the automated verification.

Thanks to the introduction of formal contracts in Ada 2012, the container has been fully specified recently based on a previous specification in Why3 by Dross et al. [11]. The specification given is “meant to facilitate the formal verification of code using this container” [12], and it is presently used to prove the clients written in SPARK. The container is specified in terms of a model representing a functional implementation of bounded vectors, also written in Ada. This kind of specification is a substitute for the algebraic data types, not supported by Ada. It has the advantage of being executable, which enables the run-time verification of the implementation. An important feature of these contracts is their completeness [27] with respect to the models considered for the container and the cursors. This aspect is a challenge for the state of the art verification tools. The formal verification of these contracts can not be done by GNATprove, the deductive verification environment for SPARK, because the code employs language constructs out of its scope.

The goal of our study is to apply on-the-shelf verification tools to prove the full functional correctness and the memory safety for this implementation, without simplifying the code or the specification. To open the case study to more verification platforms, we choose to write this library in C, because C may capture all the features of the container implementation, except the strong typing and the generic types of Ada. The C implementation mimics the Ada code. The functional specification of the C code translates the container contracts from Ada based on (i) representation predicates that relate heap regions with algebraic models using inductively defined predicates, (ii) algebraic lists and maps, and (iii) inductively defined predicates and functions on the algebraic models. The logic including these features is undecidable in general. Therefore, we have to help the prover to obtain push button verification. The auto-active verification [20] environments are helpful in such tasks.

The invariant properties of the implementation and the features exhibited by the specification guided us towards deductive verification platforms that support Separation Logic [31] (SL) and algebraic data types. Consequently, we choose the VeriFast [15] auto-active verification tool, which provides means for (a) the specification of representation predicates in the style introduced for SL by O’Hearn et al. [25], (b) the definition of polymorphic algebraic data types, predicates and functions, and (c) the definition of user lemmas to help verification. Using these features, we employ a verification methodology based on the refinement of the original specification. The refined specification not only captures accurately the contracts, but also eases the deductive verification process, i.e., the writing of lemmas. For example, we employ a style of writing representation predicates in SL that leads to simpler lemmas for list segment composition.

To summarize, we verified the C implementation of a bounded doubly linked list container against its functional specification. In addition, we verified the safety of memory accesses using Separation Logic. For this, we annotated the C code and we extended the library for algebraic polymorphic lists of VeriFast with new predicates and lemmas. These logic development may be used in other verification tools based on induction.

The paper begins by presenting the case study in Sect. 2. Then, we highlight in Sect. 3 the main ingredients of the verification approach used and the challenges we faced. Section 4 presents the experimental results. We compare this work with other approaches for verification of containers and complex data structures in Sect. 5.

2 Dynamic Bounded Doubly-Linked Lists

This section presents the container code and its functional specification.

2.1 Overview

Implementation: The code is written in a very simple fragment of C, which may be easily translated to most imperative programming languages. It uses records and pointers to records, dynamic memory allocation, classic accesses to record fields and array elements, basic integer type and its operations. Like in the original code, the container does not support concurrency and has been written to obtain efficient operations and not to ease the verification. The container elements are designated through cursors, which represent valid positions in the list; they may be moved forward and backward in the list. The container interface includes 30 operations including classic operations (creation, copy, size access, clearing and deallocation, equality test, searching) and a rich set of utilities (inserting or deleting bunches of elements at some position, searching from the end, merging lists, swapping elements or links, reversing in place, sorting).

Specification: The functional specification is model based [28]. Two mathematical models are used: algebraic lists (i.e., finite sequences) to represent the content of the list and finite partial maps to model the set of valid cursors (see Sect. 2.3 for details). The contracts employ operations on these mathematical models that are beyond their classic usage. For example, the test of inclusion between the set of elements of two sequences, or the test that the domain of a partial mapping has been truncated from a given value. For this reason, we enriched the library of mathematical models provided by our prover with such operations and the corresponding axiomatizations (see Sect. 3.2).

An important feature of our functional specification is the usage of a refined abstraction for the list to ease the proof that the operations satisfy their contracts. We introduce a precise model for the list, which is an algebraic list of abstract cells, storing container values together with the links between the cells. This precise model is mapped to the abstract model (sequence of values) using a catamorphic mapping [35], called . Moreover, the precise model is used to compute the (abstract) model of cursors, based on a catamorphic mapping, called . The use of the precise model facilitates the verification effort for proving that implementations of operations satisfy their contracts (see Sect. 3.1).

The functional specification is complete in the sense given by [27]: the post-condition of each operation uniquely defines its result and the side effect on the model of the container and of its cursors. However, it does not check for memory overflow at the container creation.

For the syntax of specifications, we employ in the following the specification language of VeriFast, which extends the normalized specification language for C, ACSL [3], with shorthand notations and operators for Separation Logic. Therefore, we employ ‘’ to introduce existentially quantified variables, ‘’ for both classic conjunction and the separating one, ‘’ for the points-to operator that defines the content (right operand) of an allocated memory cell (left operand), and for the empty heap. Algebraic lists of VeriFast have type and are polymorphic; the operations on lists have classic names. The definition of new logic types (and functions) is introduced by the keyword (resp. ).

Fig. 1.
figure 1

Logic definitions for the BDLL container

2.2 List Container

List Elements: The data stored in the list container is typed by an abstract type , defined as an alias to the integer type in our code. This coding is sound for the proof of the functional correctness of the container implementation because the container assumes only that values of may be compared for equality.

List Cell: Also called node in the following, the list cell encapsulates the container element together with links to the next and previous cell in the list. A node is also an element of the array allocated for the container.

figure a

The values of the C type are abstracted by the ghost type , defined at line 1 in Fig. 1, which records the values of node fields. The logic functions , , and to access first, second, resp. third component of a value.

The predicate (line 6 in Fig. 1) relates a node allocated in the heap with its model . The allocation property is expressed by the predefined predicate . The values of the fields are bound to existentially quantified variables and used to build the model of the node. The predicate constrains the fields and to be indexes in an array starting at index 0 and ending at index .

There are two kinds of nodes in the array managed by the container: nodes occupied by list elements and nodes not yet used in the list, i.e., free. Free nodes have the field at \(-1\) and the field is irrelevant. They are specified by the predicate (line 14 in Fig. 1), which also constraints the parameter to be equal to the value of the field . Occupied nodes have the field set to a non-negative integer and the field is relevant. The predicate (line 19 in Fig. 1) relates the node with its abstract model.

Acyclic Doubly Linked List: The container stores the doubly linked list (BDLL) into an array of fixed capacity, which is given at the container creation. The number of elements stored in the list can not exceed the container capacity. The nodes of the BDLL are stored starting from the index 1; index 0 plays the role of the null reference. The type of the list container is given by the following record:

figure b

The length of the list is given by the field . The first and the last cells of the lists are stored at indexes resp. . Field denotes the start of the list registering the free nodes. The operation creating the container allocates the array and sets at free all nodes in the array. The fields denoting the size and the extreme cells of the doubly linked list are set to 0. The initialization of the field is detailed in the next paragraph.

The representation predicate of the BDLL formed by the occupied nodes, , is defined at line 22 in Fig. 1 as a doubly linked list segment starting by the node at index , ending by the node at index ; the starting node stores as previous node , and the ending node stores as next node . The predicate definition is classic in Separation Logic [24], except the bound constraint on the node indexes (locations). If the source and target indexes are equal, the list is empty; otherwise an occupied node is present at index and it is linked to the previous node and the remainder of the list. Notice the use of pointer arithmetics to access the node at index . The predicate relates the heap specification with the mathematical model of the list, i.e., the sequence of abstract nodes. We employ the polymorphic algebraic type available in VeriFast mathematical library and we instantiate it with the logic type . This precise model of the list content is mapped by the inductively defined ghost function to the abstract model, sequence of values of stored.

Fig. 2.
figure 2

Two doubly linked lists of capacity 4 and length 2

Acyclic List of Free Nodes: The free nodes are organized in a singly linked list, called the free-list. The start of this list is given by the field of the type . If is negative, the list is built from all nodes stored between and included; this permits a fast initialization of the free-list at the container creation. If is positive, the free-list starts at index , uses as successor relation the field, and ends at index 0. Figure 2 illustrates the two kinds of free list. The representation predicate (line 31 in Fig. 1) is used when is negative. It collects in the parameter the sequence of the indexes of free nodes. For the second case, we define the predicate (line 38 in Fig. 1). The two kinds of free-list are combined in the predicate (line 45 of Fig. 1) that calls the correct predicate depending on the sign of . Notice that the constraints required by this predicates (the relation between container capacity, BDLL and free-list sizes) are all present in the Ada 2012 specification [12].

Representation Predicate: The invariants of the container are collected in the predicate (line 52 in Fig. 1) which mainly specifies that the container is allocated in the heap (predefined predicate), and its field is also allocated as a block containing \(+1\) records of type . The first node of this array (at address ) has its and fields set to \(-1\) resp. 0. The set of remaining nodes is split between the lists specified by the and predicates due to the separating conjunction. The size of the BDLL is exactly the one of its model and stored in the field .

Examples of Container Contracts: We illustrate the usage of representation predicates defined above by presenting some contracts specifying container operations. For example, the contract of the constructor is:

figure c

It states that the resulting container (denoted by the ghost variable ) is a well formed but empty bounded doubly linked list (its abstract model is the empty list) with free nodes. As said before, the above contract (like in Ada 2012 specification), does not consider the case of memory shortage.

The contract of illustrates how the catamorphism is used to obtain the abstract contract on the sequence of values from the precise models (given by variables and for each list parameter):

figure d

The operation frees all occupied nodes. Its contract only constrains the content of the doubly linked list and leaves unspecified the free list.

figure e

2.3 Cursors

Following the Ada 2012 semantics [1], “a cursor designates a particular node within a list (...). A cursor keeps designating the same node (...) as long as the node is part of the container, even if the node is moved in the container. [...] If [a cursor] is not otherwise initialized, it is initialized to [...] .” Therefore, a cursor is a record storing an array index. The special cursor is defined as a global constant storing the index 0, indeed invalid for a list node (recall that valid nodes are stored from index 1).

figure f

The logic type abstracts the cursor implementation (line 1 in Fig. 3). The representation predicate for cursors, (line 3 in Fig. 3), checks that the cursor content, , corresponds to an occupied node in the list using the precise model of the BDLL (see line 13). Moreover, the predicate computes from and , the BDLL starting index, the segments and , into which the cursor splits .

Given a BDLL container, the model of valid cursors for this container is defined (following Ada 2012 specification) as the finite bijection between the set of abstract cursors and the positions (from 1 to the size) in the list. We encode the mathematical type map by an association list, using the polymorphic type provided in the libraries of VeriFast. The cursor model is computed by the logic function (line 19 in Fig. 3) from the container model, the index of the first node in the BDLL, and the first position in the list.

Fig. 3.
figure 3

Logic definitions for cursors

Notice that this manner of specifying cursor model is coherent with the sequence model of the container, because the access to the elements of a sequence is based on positions. However, this specification choice does not combine well with inductive reasoning and induces additional work for the proof (see Sect. 3). We have to enrich the inductive list model with operations using positions. For example, we define the operation which returns the th element of the list . We also defined operations and on association lists to test if an abstract cursor is in the domain of the map resp. to obtain the value to which it is bound.

An example of a contract using cursors is the operation , which returns the value stored at the position in the list given by the cursor :

figure g

Contracts of functions changing the positions in the list (e.g., insert or delete) are complete with respect to the model of cursors. For example, consider the post-condition of operation given below, which deletes first elements of the list. It uses a conditional expression (syntax like in C) to specify two contract cases. The first case corresponds to an input container with size less than . In the second case, the container preserved its content after position (predicate ) and the positions of valid cursors in the new container (of model ) are shifted by (predicate ) with respect to the old container.

figure h

3 Verification Approach

We employ an auto-active verification approach [20], supported by the tool VeriFast [15]. The auto-active approach provides more automation of the verification process based on the ability given to the user to help the prover by adding annotations and lemmas and the efficient use of back-end solvers. This section highlights the methodology applied to conduct auto-active verification for this case study. This methodology is independent of the specific tool used. We also comment on the advantages and difficulties encountered with the tool used. Notice that we did not have prior experience with VeriFast.

3.1 Model-Based Specification for Verification

The contracts provided for our container are in a first order logic over sequences and maps, which employs recursive logic functions. This theory is undecidable so we have to provide lemma to help the prover to tackle verification conditions.

Usage of a precise model is the solution we found to ease the writing of lemmas. It consists in refining the abstract model used for the container specification into a model that captures more details on the container organization. The abstract model is obtained from the refined one using a catamorphic mapping. This methodology is required by the gap between the abstract model and the lower level implementation of the container.

Let us explain why this methodology leads to efficient verification in our case. Consider the specification where (i) the model for the container is the sequence of the values stored and (ii) the model for the cursors is the mapping of occupied nodes to list positions. To capture these models with the representation predicate for the heap, i.e., the predicate defined at line 22 in Fig. 1, we have to replace the model by the sequence of values and the map of cursors . The verification of iterative operations on the list requires to provide a lemma that allows to compose “well linked” list segments into a new list segment, i.e.,

figure i

This lemma employs an operation , that concatenates two models of counters and such that the positions associated with counters in are shifted by the size of the domain of . This operation is more difficult to axiomatize than list concatenation. Moreover, all invariant proofs require to keep together the two loosely related models (sequence and map) which leads to less modular proofs. Our solution to this problem is to employ the precise model of the list segment represented by the predicate, as has been presented in Sect. 2.2. The composition lemma for predicate is simpler because it avoids the reasoning on the model of cursors.

The catamorphism mappings used to obtain the abstract model of the container and the model of valid cursors have good inductive definitions and enable efficient decision procedures [35]. However, these decision procedures are not available in our verification tool; this work may be a motivation to add them.

Specification of user types by representation predicates mapping them to inductive types is classical in Separation Logic. We encode the invariant of the BDLL data structure in the predicate . The adoption of C for the implementation keeps us away from the problems of verifying object-related properties, for example. However, this choice leads to an overburden in annotations because we have to specify that parameters of type ‘’ satisfy the invariant.

Additional annotation have been supplied to axiomatize global constants (like the record in Fig. 1) and arrays of user-defined structures (like in ).

Contract cases are intensively used in the considered GNAT library. We got around the absence of contract cases in VeriFast using conditional expressions and logic predicate and functions that relate two models (old and new). We do not observe any expressivity or performance problems with this method of encoding contracts.

3.2 Support for Specification Types

Specification of model types is done based on the mathematical models sequence (or inductive polymorphic list) and map (or inductive polymorphic association list). The VeriFast libraries including these models (mainly list.*) contain 9 predicates and 20 lemmas, and are not enough for the operations on models required in our specifications. We added tens of lemma and predicates. They are useful not only for the container proof but also for the verification of client programs with inductive back-end solvers. (Nowadays, these proofs are done by GNATprove by calling SMT solvers with quantifiers support.)

More problematic was the lack of support for finite maps and automation of inductive reasoning. VeriFast does not provide sets and finite maps as primitives. The encoding of cursor models by association lists renders more complex the lemmas needed on cursor models. For example, the map inclusion property is defined as follows:

figure j

This definition is not as easy to reason about as we might expect. In particular, some properties of this definition of inclusion such as reflexivity are only provable under the additional assumption that the keys are distinct.

We proved that the models of cursors fulfill the constraint (defined also in VeriFast libraries) because keys are index positions in the array used to denote separated cells.

figure k

Notice that these proofs are not necessary for provers with support for finite maps and sets. Although VeriFast supports as back-end solver Z3 [9], it does not use it for such theories. The inductive theories are supported by other back-end solvers, e.g., CVC4 [30] that are not connected to VeriFast.

3.3 Annotations Load

The annotations required by the proof of our library belong to two categories: (1) mandatory annotations including function contracts and predicates employed by these contracts and (2) auxiliary annotations including loop invariants, open/close of predicates, definitions and calls of lemmas.

The prover VeriFast includes all this annotation burden, since we can not direct the prover in the usage of these annotations. VeriFast provides two mechanisms to limit the burden of the auxiliary annotations: (i) lemmas can be marked as automated which means they will be given to the backend solver on all problems, (ii) inductive predicate definitions can be automatically folded and unfolded when used with computed parameters.

We introduce few automated lemmas and call them in order to lighten the prover load. We don’t observe performance problems by including all these annotations and despite the absence of modular proofs. The frame reasoning rule of Separation Logic seems to play an important role in this good behavior.

We found useful the two ways of specifying inductive predicates in VeriFast: by case on the model or by case over the aliasing of heap locations. We started with the first style, but finally chose the second to bring advantages of automatic folding and unfolding of computed predicates.

3.4 Challenges Dealt

To resume, we faced the following challenges during the verification process:

  • We considered a functional specification which is already in use in client code. Therefore, we can not adapt this specification to ease the verification. Instead, we propose a method based on a refined specification based on a precise model of the container that eases the verification and allows to obtain the initial specification with minimal cost.

  • The specification we received is complete with respect to the model of containers and cursors. This requires to specify logic functions and predicates that are more complex than the usual ones.

  • The code has been designed to obtain efficient container implementation and does not focus on verification. Therefore, the verification task has been more difficult compared with previous work verifying functional specification of container libraries [28, 39] designed with verification in mind.

  • Only specifications of contracts for public operations on the container were provided. We had to annotate the code and the internal operations. This implied an additional cost in annotations because some internal operations break the data structure invariants.

  • Having in mind the extension of this verification effort to other bounded container libraries (for sets or maps), we propose reusable logic libraries and suggest some improvements for the auto-active verification tool in use.

4 Verification Results

Bugs Found: We did not find spectacular bugs in the code, which is normal for a library that has been used for years. We only detected a potential arithmetic overflow in the computation of the memory to be allocated and a potential memory shortage. The last problem is in fact dealt for the SPARK clients using tools that measure the memory allocated by the program.

Complete Specifications: We also fix some minor completeness problems with the original specifications. Our verification effort leads to a complete functional specifications for all operations, including non public operations.

Table 1. Statistics on the proof

Specification Load: We have coded, specified, and verified 27 functions out of the 39 provided by the container library including equality and emptiness tests, clear, assign and copy, getting and setting one element, manipulating the cursors, inserting and deleting at some cursor, finding an element before and after a cursor. Most of the 12 remaining functions deal with sorting. The size of our development is given in Table 1. To obtain a specification close to the Ada 2012 one, we wrote two files of logic definitions for models (vfseq.gh and vfmap.gh) extending the VeriFast libraries. Additional fixpoint functions and lemmas required on VeriFast lists are written in file vflist.gh. The rate between source code and annotations is about 1 to 9. The required annotations (i.e., data structure invariant, pre/post conditions, and logic predicate and function used directly in them) represent a quarter of all annotations (including also loop invariants and lemmas). In Ada 2012 container, the rate between source code and contracts is already of about 1 to 3.

Verification Performance: We run VeriFast on a machine with 16 GB RAM, Intel core i5, and 2.70 GHz, installed with Linux. The back-end solver of VeriFast was redux. The verification takes 1.3 s for the full container.

5 Related Work

The verification of individual data structures has received special attention. General safety properties (i.e., absence of out of array bounds accesses, null dereferences, division by zero, arithmetic overflow) may be verified automatically with low load of annotations using static analysis methods, e.g. [13, 17, 19, 21]. More complex properties like reachability of locations in the heap and shape of the data structures could also be proved with static analysis methods based on shape analysis, e.g., [4, 6, 10, 32]. These automatic techniques have been applied to linked lists coded in arrays [34]. These methods concern limited properties and may be used in the early stages of the library development to infer internal invariant properties. Extension of fully automatic techniques to cover functional specification abstractions like sets or bags are based either on shape analysis, e.g.,  [7, 14] or on logic fragments supported by SMT decision procedures [16, 18, 37, 38]. These functional specifications capture essential mathematical properties of the data structure but do not deal with properties of iterators over them.

At the opposite end of the spectrum of verification techniques, interactive provers have been used to obtain detailed specifications about data structures based on powerful theories, e.g., [8, 23, 29], but they require expertise and great amount of proof scripting.

At the intermediate level of automation, functional verification tools have been used to tackle the verification of specific data structures (e.g., Dafny [33], GRASShoper [26], VeriFast  [15], or Why3 [36]) but we are not aware of any experiment on bounded lists.

The full functional correctness of container libraries has been considered in [28, 39]. They consider complex data structures in imperative and object oriented languages that require to verify special properties and may benefit from modular verification thanks to inheritance. In both cited works, a special effort has been deployed to improve the prover to call solvers for different theories or to generate verification conditions that may be dealt with efficiently. This efforts lead to a low annotation overhead, especially in [28]. We use an on-the-shelf auto-active verification tool but improve its performances by employing a refinement method which leads to more automation but a more important annotation overhead. None of these works consider the container of bounded list.

6 Conclusion

We apply auto-active verification provided by the VeriFast tool to prove the functional specifications of the bounded doubly linked list container. The implementation we consider is in C, but it mimics the GNAT library [12], which is used in SPARK client programs. The functional specification is model-based and uses sequence and map mathematical models in a specific way to model the content of the list and its valid cursors. Our main contributions are (i) the improvement of the logic libraries of VeriFast to deal with such specific models and (ii) the use of a refinement based methodology to ease the proof automation.

This case study provides a motivation for the development of inductive solvers and their connection with auto-active provers like VeriFast. This experiment is another demonstration of the known fact (see [28]) that proving functional specifications of real world containers is more difficult than proving functional specification of data structures. The support for automation of these proofs is of an utmost importance to scale the verification to a full library of containers.