1 Introduction

Separation logic (SL) [20, 37] is well established for reasoning about programs that manipulate heap-based data structures (such as linked lists and trees). Often, SL is used in combination with inductive predicates to precisely specify the data structures manipulated by a program. In the last decade, a large number of SL-based verification systems have been developed [1, 3, 6, 8, 13, 18, 19, 24, 29, 33, 36]. In these systems, SL is typically used to express assertions about program states. The problem of validating these assertions can be reduced to the entailment problem in SL, i.e., given two SL formulas \(\varDelta _a\) and \(\varDelta _c\), to check whether \(\varDelta _a ~\models ~ \varDelta _c\) holds. Moreover, SL provides the frame rule [20], a prominent feature that enables compositional (a.k.a. modular) reasoning in the presence of the heap:

$$ \frac{\{P\}~ c ~\{Q\}}{\{P {*} F\}~ c ~\{Q {*} F\}} $$

where \(c\) is a program, \(P\), \(Q\) and \(F\) are SL formulas, and \(*\) is the separating conjunction in SL. Intuitively, \(P {*} F\) states that \(P\) and \(F\) hold in disjoint heaps. This conjunction allows the frame rule to guarantee that \(F\) is unchanged under the action of \(c\) (provided \(c\) does not modify the free variables of \(F\)). This feature of SL is essential for scalability [6, 21, 44] as it allows the proof of a program to be decomposed into smaller (and reusable) proofs, e.g., proofs of procedures. To automate the application of the frame rule, SL-based proof systems rely on a generalized form of entailment, referred to as frame inference [1, 8, 12, 33, 39]. That is, given \(\varDelta _a\) and \(\varDelta _c\), to check whether \(\varDelta _a\) entails \(\varDelta _c\) and simultaneously generate the residual heap, i.e., a satisfiable frame \(\varDelta _f\) capturing the properties of the memory in \(\varDelta _a\) that are not covered by \(\varDelta _c\). This problem, especially when \(\varDelta _a\) and \(\varDelta _c\) contain general inductive predicates, is highly non-trivial as it may require inductive reasoning. Existing approaches [1, 33] are limited to specific predicates, e.g., linked lists and trees. The systems reported in [8, 12, 39] do not adequately support the frame inference problem for inductive entailments in separation logic with predicate definitions and arithmetic.

In this work, we propose a sound approach for frame inference which aims to enhance modular verification in an expressive SL fragment with general inductive predicates and Presburger arithmetic. Intuitively, given an entailment \(\varDelta _a ~{\models }~ \varDelta _c\), our goal is to infer a satisfiable frame axiom \(\varDelta _f\) such that \(\varDelta _a ~{\models }~ \varDelta _c *\varDelta _f\) holds. Our approach works as follows. We first augment the entailment check with an unknown second-order variable \({\mathtt{{U_f}}}(\bar{t})\) as a placeholder for the frame, where \(\bar{t}\) is a set of pointer-typed variables common to \(\varDelta _{a}\) and \(\varDelta _{c}\). That is, the entailment check becomes \(\varDelta _a ~\models ~ \varDelta _c*{\mathtt{{U_f}}}(\bar{t})\). Afterwards, the following two steps are conducted. Firstly, we invoke a novel proof system to derive a cyclic proof for \(\varDelta _a ~\models ~ \varDelta _c *{\mathtt{{U_f}}}(\bar{t})\) whilst inferring a predicate which \({\mathtt{{U_f}}}\) must satisfy for the entailment to be valid. We show that the cyclic proof is valid if this predicate is satisfiable. Secondly, we strengthen the inferred frame with shape normalization and arithmetic inference.

For the first step, we design a new cyclic proof system (based on [2, 3]) with an automated cut rule so as to effectively infer the predicate on \({\mathtt{{U_f}}}\). A cyclic proof is a derivation tree whose root is the given entailment check and whose edges are constructed by applying SL proof rules. A derivation tree of a cyclic proof may contain virtual back-links, each of which links a (leaf) node back to an ancestor. Intuitively, a back-link from a node l to an internal node i means that the proof obligation at l is induced by that at i. Furthermore, to rule out unsound cycles (e.g., self-cycles), a global soundness condition must be imposed upon these derivations to qualify them as genuine proofs. In this work, we develop a sequent-based cyclic proof system with a cyclic cut rule so as to form back-links effectively and check the soundness condition eagerly. Furthermore, we show how to extract lemmas from completed cyclic proofs and reuse them through lemma application, making the proof system more efficient. These synthesized lemmas act as dynamic cuts in the proposed proof system.

For the second step, we strengthen the inferred predicate on the frame \({\mathtt{{U_f}}}(\bar{t})\) so that it becomes more powerful in establishing the correctness of certain programs. In particular, the inferred frame is strengthened with predicate normalization and arithmetic inference. The normalization includes predicate splitting (i.e., to expose the spatial separation of the inferred frame) and predicate equivalence (i.e., to relate the inferred frame to user-supplied predicates). The arithmetic inference discovers predicates on pure properties (size, sum, height, content and bag) to support programs which require inductive reasoning on both shape and data properties.

Lastly, we have implemented the proposal and integrated it into a modular verification engine. Our experiments show that our approach infers strong frames which enhance the verification of heap-manipulating programs.

2 Preliminaries

In this section, we present the fragment of SL which is used as the assertion language in this work. This fragment, described in Fig. 1, is expressive enough for specifying and verifying properties of a variety of data structures [24,25,26, 35, 41]. We use \(\bar{t}\) to denote a sequence of terms and occasionally use a sequence (i.e., \(\bar{t}\)) to denote a set when there is no ambiguity. A formula \(\varPhi \) in our language is a disjunction of clauses \(\varDelta \), each of which is a conjunction of a spatial predicate \(\kappa \) and a pure (non-heap) constraint \(\pi \). The spatial predicate \(\kappa \) captures properties of the heap, whereas \(\pi \) captures properties of the data. \(\kappa \) can be an empty heap \(\mathtt{emp}\), a points-to predicate \({r}{{\mapsto }}c(\bar{v})\) where \(c\) is a data structure, a user-defined predicate \({\mathtt{{P}}}(\bar{t})\), or a spatial conjunction \(\kappa _1{*}\kappa _2\). \(\mathtt{null}\) is a special heap location. A pure constraint \(\pi \) is built from (dis)equalities \(\alpha \) (on pointers) and Presburger arithmetic \(\phi \). We write \(v_1\,{\ne }\,v_2\) and \(v\,{\ne }\,\mathtt{null}\) for \(\lnot (v_1\,{=}\,v_2)\) and \(\lnot (v\,{=}\,\mathtt{null})\), respectively. We often omit the pure part of a formula \(\varPhi \) when it is \({\mathtt{{true}}}\,\). To standardize notation, we use uppercase letters (e.g., \({\mathtt{{P}}}(\bar{t})\)) for unknown (to-be-inferred) predicates and lowercase letters (e.g., \(p(\bar{t})\)) for known predicates.

Fig. 1. Syntax

A user-defined (inductive) predicate \({\mathtt{{P}}}(\bar{v})\) with parameters \(\bar{v}\) is defined as a disjunction, i.e., \(\mathtt{pred}~{\mathtt{{P}}}(\bar{v})\,{\equiv }\,\varPhi \), where each disjunct of \(\varPhi \) is referred to as a branch. In each branch, variables that are not in \(\bar{v}\) are implicitly existentially quantified. We use the function \(\mathtt{unfold}({\mathtt{{P}}}(\bar{t}))\) to replace an occurrence of an inductive predicate by the disjuncts of the definition of \(\mathtt {P}\), renaming formal parameters to the actual ones. For example, given the data structure \(\mathtt {node\{{int ~val;~ node ~ next};\}}\), the following predicates \(\mathtt {lseg}\) and \(\mathtt {lsegn}\) express list segments in which every node contains the value 1.

$$ \begin{array}{l} {\mathtt{pred}~ {\mathtt{{lseg}}}(\mathtt{root}{,}l) {\equiv }} {\mathtt{emp}{\wedge } {\mathtt{{\mathtt{root}}}}{=}l ~ {\vee }~\exists ~q {\cdot } {\mathtt{{\mathtt{root}}}}{\mapsto }{\mathtt{{node}}}(1{,}q) {*} {\mathtt{{lseg}}}(q{,}l) ;} \\ {\mathtt{pred}~ {\mathtt{{lsegn}}}(\mathtt{root}{,}l{,}n) {\equiv }} {\mathtt{emp}{\wedge } {\mathtt{{\mathtt{root}}}}{=}l {\wedge }n{=}0} {~{\vee }~\exists ~q {\cdot }~ {\mathtt{{\mathtt{root}}}}{\mapsto }{\mathtt{{node}}}(1{,}q) {*} {\mathtt{{lsegn}}}(q{,}l{,}n{-}1) ; } \\ \end{array} $$

where \(\mathtt{root}\) is the head, \(l\) the end of the segment and \(n\) the length of the segment.
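
For instance, unfolding an occurrence of \(\mathtt {lseg}\) simply replaces it by the (renamed) disjuncts of its definition:

$$ \mathtt{unfold}({\mathtt{{lseg}}}(x{,}l)) ~=~ (\mathtt{emp}{\wedge }x{=}l) ~{\vee }~ \exists ~q {\cdot } x{\mapsto }{\mathtt{{node}}}(1{,}q) {*} {\mathtt{{lseg}}}(q{,}l) $$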

In our framework, we may use lemmas to assist program verification. A lemma \(\iota \) has the form \(\varDelta _l ~{\rightarrow }~ \varDelta _r\), meaning that the entailment \(\varDelta _l\,\models \,\varDelta _r\) holds. We write \(A\,{\leftrightarrow }\,B\), short for \(A\,{\rightarrow }\,B\) and \(B\,{\rightarrow }\,A\), to denote a two-way lemma; if \(A\,{\leftrightarrow }\,B\), then \(A\) is semantically equivalent to \(B\). We use \(E\) and \(F\) to denote entailment problems.

In the following, we discuss the semantics of the SL fragment. Concrete heap models assume a fixed finite collection Node (of data structure names), a fixed finite collection Fields (of field names), a disjoint set Loc of locations (i.e., heap addresses), and a set of non-address values Val such that \(\mathtt{null}\,{\in }\,{\textit{Val}}\) and Val \(\cap \) Loc = \(\emptyset \). The semantics is given by a satisfaction relation \(s{,}h\,{\models }\,\varPhi \), which holds when the stack \(s\) and the heap \(h\) satisfy the constraint \(\varPhi \), where \(h\in {\textit{Heaps}} \), \(s\,{\in }\,{\textit{Stacks}}\), and \(\varPhi \) is a formula. \(\textit{Heaps}\) and \(\textit{Stacks}\) are defined as follows.
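
A standard formulation is the following sketch (the precise definitions are those of [25]):

$$ {\textit{Stacks}} ~{\equiv }~ {\textit{Var}} {\rightarrow } {\textit{Val}}\,{\cup }\,{\textit{Loc}} \qquad \quad {\textit{Heaps}} ~{\equiv }~ {\textit{Loc}} {\rightharpoonup }_{\textit{fin}} ({\textit{Node}} {\times } ({\textit{Fields}} {\rightarrow } {\textit{Val}}\,{\cup }\,{\textit{Loc}})) $$

That is, a stack maps program variables to values or locations, and a heap is a finite partial function mapping locations to data objects.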

The details of the semantics of this SL fragment follow [25].

Fig. 2. Code of \(\mathtt {append}\).
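
As the figure itself is not reproduced here, the following C sketch reconstructs \(\mathtt {append}\) and its auxiliary procedure \(\mathtt {last}\) from the description in Sect. 3; the exact code, specification syntax and line numbering of the original figure may differ.

```c
#include <stddef.h>

struct node { int data; struct node* next; };

/* requires lln(x,n) & n>0 ; ensures ll_last(x,res,n)   -- assumed spec */
struct node* last(struct node* x) {            /* lines 8-12            */
    if (x->next == NULL) return x;             /* x is the last node    */
    return last(x->next);
}

/* requires lln(x,i)*lln(y,j) & i>0 ; ensures lln(res,i+j) */
struct node* append(struct node* x, struct node* y) {
    struct node* t = last(x); /* line 5: ll_last(x,t,i)*lln(y,j) & i>0  */
    t->next = y;              /* line 6: state alpha holds here         */
    return x;
}
```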

3 Illustrative Example

In the following, we first discuss the limitations of existing entailment procedures [1, 8] with respect to the frame inference problem. Given an entailment, these procedures reduce it until a subgoal of the form \(\varDelta _a ~{{\vdash }}~ \mathtt{emp}\wedge {\mathtt{{true}}}\,\) is obtained. Then, they conclude that \(\varDelta _a\) is the residual frame. However, these approaches provide limited support for inductive proofs. While [1] provides inference rules that encode inductive reasoning for hardwired lists and trees, our previous work [8] supports inductive proofs only via user-supplied lemmas [30]. Hence, it is very hard for these procedures to automatically infer the frame for entailments that require proofs by induction.

We illustrate our approach via the verification of the \(\mathtt {append}\) method shown in Fig. 2, which appends a singly-linked list referred to by \(\mathtt {y}\) to the end of the singly-linked list referred to by \(\mathtt {x}\). It uses the auxiliary procedure \(\mathtt {last}\) (lines 8–12) to obtain the pointer referring to the last node in the list. Each node object x has a data value \(\mathtt {x{\texttt {-\!\!>}}data}\) and a next pointer \(\mathtt {x {\texttt {-\!\!>}} next}\). For simplicity, we assume that every node in the \(\mathtt {x}\) list and the \(\mathtt {y}\) list has data value 1. The correctness of \(\mathtt {append}\) and \(\mathtt {last}\) is specified using our fragment of SL with a pre-condition (\(\mathtt{requires}\)) and a post-condition (\(\mathtt{ensures}\)). The auxiliary variable \(\mathtt{res}\) denotes the return value of the procedure. Note that these specifications refer to the user-provided predicates \({\mathtt{{lln}}}\) and \({\mathtt{{ll\_last}}}\), which are defined as follows.

$$ \begin{array}{l} \mathtt{pred}~ {\mathtt{{lln}}}(\mathtt{root}{,}n) ~{\equiv }~ \mathtt{emp}{\wedge } {\mathtt{{\mathtt{root}}}}{=}\mathtt{null}{\wedge }n{=}0 ~{\vee }~\exists ~q {\cdot } {\mathtt{{\mathtt{root}}}}{\mapsto }{\mathtt{{node}}}(1{,}q) {*} {\mathtt{{lln}}}(q{,}n{-}1) ;\\ \mathtt{pred}~{\mathtt{{ll\_last}}}(\mathtt{root}{,}l{,}n) ~{\equiv }~ {\mathtt{{l}}}{\mapsto }{\mathtt{{node}}}(1{,}\mathtt{null}){\wedge }{\mathtt{{\mathtt{root}}}}{=}l {\wedge } n{=}1 \\ \qquad ~{\vee } ~\exists q {\cdot }~ {\mathtt{{\mathtt{root}}}}{\mapsto }{\mathtt{{node}}}(1{,}q) {*} {\mathtt{{ll\_last}}}(q{,}l{,}n{-}1) ; \end{array} $$

Intuitively, the predicate \({\mathtt{{lln(\mathtt{root}{,}n)}}}\) is satisfied if \(\mathtt{root}\) points to a singly-linked list with \(\mathtt {n}\) nodes. The predicate \({\mathtt{{ll\_last}}}(t{,}p{,}n)\) is satisfied if \(\mathtt {t}\) points to a list segment with last element \(\mathtt {p}\) and length \(\mathtt {n}\). In our framework, we provide a library of commonly used inductive predicates (and the corresponding lemmas), including, for example, the definitions of the list segments \(\mathtt {lseg}\) and \(\mathtt {lsegn}\) introduced earlier. Given these specifications, we automatically deduce predicates on the intermediate program states (using existing approaches [8]), shown as comments in Fig. 2, as well as the following three entailment checks that must be established in order to verify the absence of memory errors and the correctness of the method \(\mathtt {append}\).

$$ \begin{array}{l} {\mathtt{{E_1{:}}}} {\mathtt{{lln}}}(x{,}i){*}{\mathtt{{lln}}}(y{,}j) {\wedge } i{>}0 {~{\vdash }}~ \exists ~n_1{\cdot }{\mathtt{{lln}}}(x{,}n_1) {\wedge } n_1{>}0 \\ {\mathtt{{E_2{:}}}} {\mathtt{{ll\_last}}}(x{,}t{,}i) {*}{\mathtt{{lln}}}(y{,}j){\wedge }i{>}0 ~~{{\vdash }}~~ \exists ~q{,}v{\cdot }t{\mapsto }{\mathtt{{node}}}({v}{,}q) \\ {\mathtt{{E_3{:}}}} {\mathtt{{lsegn}}}(\mathtt{res}{,}t{,}i{-}1) {*}t{\mapsto }{\mathtt{{{\mathtt{{node}}}}}}(1{,}y){*} {\mathtt{{lln}}}(y{,}j){\wedge }i{>}0 ~~{{\vdash }}~~ {\mathtt{{lln}}}(\mathtt{res}{,}i{+}j) \end{array} $$

\(\mathtt {E_1}\) aims to establish a local specification at line 5, which we generate automatically. \(\mathtt {E_2}\) must be satisfied so that no null-dereference error occurs for the assignment to \(\mathtt {t{\texttt {-\!\!>}}next}\) at line 6. \(\mathtt {E_3}\) aims to establish that the postcondition is met. Frame inference is necessary in order to verify the program. In particular, frame inference for \(\mathtt {E_2}\) is crucial to construct a precise heap state after line 6, i.e., the state \(\alpha \) in the figure, which is necessary to establish \(\mathtt {E_3}\). Furthermore, the frame of \(\mathtt {E_3}\) (which is inferred as \(\mathtt {emp}\)) helps to show that this program does not leak memory. As the entailment checks \(\mathtt {E_2}\) and \(\mathtt {E_3}\) require both inductive reasoning and frame inference, they are challenging for existing SL proof systems [3, 8, 9, 12, 15, 31, 36, 40]. In what follows, we illustrate how our system establishes a cyclic proof with frame inference for \(\mathtt {E_2}\).

Frame Inference. Our frame inference starts with introducing an unknown predicate (a second-order variable) \({\mathtt{{U_1}}}(x{,}t{,}q{,}v{,}y)\) as the initial frame, which is a placeholder for a heap predicate over the variables \(x\), \(t\), \(q\) and \(y\) (i.e., the pointer variables referred to in \(\mathtt {E_2}\)). That is, \(\mathtt {E_2}\) is transformed into the following entailment checking problem:

$$ \begin{array}{ll} {\mathtt{{F_2{:}}}}&{\mathtt{{ll\_last}}}(x{,}t{,}i){*} {\mathtt{{lln}}}(y{,}j) {\wedge }i{>}0 {~{\vdash }_{{L}_0}}~ \exists q{,}v{\cdot }t{\mapsto }{\mathtt{{node}}}(v{,}q) {*} {\mathtt{{U_1}}}(x{,}t{,}q{,}v{,}y) \end{array} $$

where \({L}_0\) is a set of induction hypotheses and sound lemmas. This set is accumulated automatically during the proof search and used for constructing cyclic proofs and for lemma application. If a hypothesis is proven, it becomes a lemma and may be applied later during the proof search. In this example, initially \({L}_0\,{=}\,\emptyset \). The proposed proof system derives a cyclic proof for the entailment problem and, at the same time, infers a set of constraints \(\mathcal {R}\) for \({\mathtt{{U_1}}}(x{,}t{,}q{,}v{,}y)\) such that the proof is valid if the system \(\mathcal {R}\) is satisfiable. Each constraint in \(\mathcal {R}\) has the form of a logical implication \(\varDelta _{b} ~{\Rightarrow }~ {\mathtt{{U}}}(\bar{v})\), where \(\varDelta _{b}\) is the body and \({\mathtt{{U}}}(\bar{v})\) is the head (a second-order variable). For \(\mathtt {F_2}\), the following two constraints, denoted \(\sigma _1\) and \(\sigma _2\), are inferred.

$$ \begin{array}{l} \sigma _1{:}~ {\mathtt{{lln}}}(y{,}j){\wedge }t{=}x {\wedge }q{=}\mathtt{null}{\wedge }v{=}1~ ~{\Rightarrow }~{{\mathtt{{U_1}}}(x{,}t{,}q{,}v{,}y)} \\ \sigma _2{:}~x_2{\mapsto }{\mathtt{{node}}}(1{,}x){*}{\mathtt{{U_1}}}(x{,}t{,}q{,}v{,}y) ~{\Rightarrow }~ {\mathtt{{U_1}}}(x_2{,}t{,}q{,}v{,}y) \end{array} $$

We then use a decision procedure (e.g., \(\mathtt {S2SAT_{SL}}\) [25, 26] or [4]) to check the satisfiability of \(\sigma _1{\wedge }\sigma _2\). Note that we write a satisfiable definition of \((\varDelta _1 {\Rightarrow } {\mathtt{{U}}}(\bar{v})) {\wedge } (\varDelta _2 {\Rightarrow } {\mathtt{{U}}}(\bar{v}))\) in the equivalent form \({\mathtt{{U}}}(\bar{v})\,{\equiv }\,\varDelta _1{\vee } \varDelta _2\). For instance, the above constraints are written as:

$$ \begin{array}{l} {\mathtt{{U_1}}}(\mathtt{root}{,}t{,}q{,}v{,}y) ~{\equiv }~ {\mathtt{{lln}}}(y{,}j){\wedge }{\mathtt{{\mathtt{root}}}}{=}t {\wedge }q{=}\mathtt{null}{\wedge } v{=}1 \\ \qquad {\vee }~\exists q_1 {\cdot } {\mathtt{{\mathtt{root}}}}{\mapsto }{\mathtt{{node}}}(1{,}q_1) {*} {\mathtt{{U_1}}}(q_1{,}t{,}q{,}v{,}y); \\ \end{array} $$

Note that, in the above definition of \(\mathtt {U_1}\), the separation of the heap-lets referred to by \(\mathtt{root}\), \(y\) and \(q\) is not explicitly captured. In addition, relations over the sizes are missing. Such information is necessary in order to establish the left-hand side of \(\mathtt {E_3}\). The successful verification of \(\mathtt {E_3}\) in turn establishes the postcondition of the method \(\mathtt {append}\). In the following, we show how to strengthen the inferred frame.

Frame Strengthening. We strengthen \(\mathtt {U_1}\) with spatial separation constraints on the pointer variables \(\mathtt{root}\), \(y\) and \(q\). To explicate the spatial separation among these pointers, our system generates the following equivalence lemma, which splits \(\mathtt {U_1}\) into two disjoint heap regions (combined with the separating conjunction):

$$ {\mathtt{{U_1}}}(\mathtt{root}{,}t{,}q{,}v{,}y) ~{\leftrightarrow }~ {\mathtt{{U_2}}}(\mathtt{root}{,}t) {*} {\mathtt{{lln}}}(y{,}j) {\wedge } q{=}\mathtt{null}{\wedge }v{=}1 $$

where \(\mathtt {U_2}\) is a new auxiliary predicate with an inferred definition:

$$ {\mathtt{{U_2}}}(\mathtt{root}{,}t) ~{\equiv }~ \mathtt{emp}{\wedge }{\mathtt{{\mathtt{root}}}}{=}t ~ {\vee }~ \exists ~q_1 {\cdot }~ {\mathtt{{\mathtt{root}}}}{\mapsto }{\mathtt{{node}}}(1{,}q_1) {*} {\mathtt{{U_2}}}(q_1{,}t) $$

Next, our system detects that \(\mathtt {U_2}\) is equivalent to the user-defined predicate \(\mathtt {lseg}\), and generates the lemma: \({\mathtt{{U_2}}}(\mathtt{root}{,}t) {\leftrightarrow } {\mathtt{{lseg}}}(\mathtt{root}{,}t)\). Relating \(\mathtt {U_2}\) to \(\mathtt {lseg}\) enhances the understanding of the inferred predicates. Furthermore, as shown in [9], this relation helps to reduce the need for inductive reasoning over equivalent inductive predicates with different names. Substituting \(\mathtt {U_2}\) with the equivalent \(\mathtt {lseg}\), \(\mathtt {U_1}\) becomes:

$$ \begin{array}{l} {\mathtt{{U_1}}}(\mathtt{root}{,}t{,}q{,}v{,}y)\equiv {\mathtt{{lseg}}}(\mathtt{root}{,}t){*}{\mathtt{{lln}}}(y{,}j){\wedge }q{=}\mathtt{null}{\wedge }v{=}1 \end{array} $$

This definition states that frame \(\mathtt {U_1}\) holds in two disjoint heaps: a list segment pointed to by \(\mathtt{root}\) and a list pointed to by \(y\). After substitution, the entailment \(\mathtt {F_2}\) becomes:

$$ \begin{array}{l} {\mathtt{{ll\_last}}}(x{,}t{,}i){*} {\mathtt{{lln}}}(y{,}j){\wedge }i{>}0 ~ {{\vdash }}_{{L}_0}~ {\mathtt{{t}}}{\mapsto }{\mathtt{{node}}}(1{,}\mathtt{null}) {*} {\mathtt{{lseg}}}(x{,}t){*} {\mathtt{{lln}}}(y{,}j) \end{array} $$
Fig. 3. Basic inference rules for the entailment procedure (where gsc is the global soundness condition)

Next, we further strengthen the frame with pure properties, which is necessary to successfully establish the left-hand side of \(\mathtt {E_3}\). In particular, we generate constraints capturing that the numbers of allocated heap cells on the left-hand side and the right-hand side of \(\mathtt {F_2}\) are identical. Our system obtains these constraints in two phases. First, it automatically augments each inductive predicate in \(\mathtt {F_2}\) with an argument capturing its size property. Concretely, it detects that while the predicates \(\mathtt {ll\_last}\) and \(\mathtt {lln}\) already have such a size argument, the shape-based frame \(\mathtt {lseg}\) does not. As such, it extends \({\mathtt{{lseg}}}(\mathtt{root}{,}t)\) to the predicate \({\mathtt{{lsegn}}}(\mathtt{root}{,}t{,}m)\) in which the size property is captured by the parameter \(m\). Now, we substitute \(\mathtt {lsegn}\) into \(\mathtt {F_2}\) to obtain:

$$ \begin{array}{l} {\mathtt{{ll\_last}}}(x{,}t{,}i){*} {\mathtt{{lln}}}(y{,}j) {\wedge }i{>}0 ~{{\vdash }_{{L}_0}}~ \exists k{\cdot }{\mathtt{{lsegn}}}(x{,}t{,}k){*} t{\mapsto }{\mathtt{{node}}}(1{,}\mathtt{null}) {*} {\mathtt{{lln}}}(y{,}j) \end{array} $$

After that, we apply the same three steps of frame inference to generate the size constraint: constructing unknown predicates, proving the entailment while inferring a set of constraints, and checking satisfiability. For the first step, the above entailment is enriched with one unknown (pure) predicate \({\mathtt{{P_1}}}(i{,}j{,}k)\), which is the placeholder for arithmetic constraints among the size variables \(i\), \(j\) and \(k\). The augmented entailment check is:

$$ \begin{array}{l} {\mathtt{{ll\_last}}}(x{,}t{,}i){*} {\mathtt{{lln}}}(y{,}j) {\wedge }i{>}0 \\ \quad {{\vdash }_{{L}_0}} ~\exists k{\cdot }{\mathtt{{lsegn}}}(x{,}t{,}k){*} t{\mapsto }{\mathtt{{node}}}(1{,}\mathtt{null}) {*} {\mathtt{{lln}}}(y{,}j){\wedge } {\mathtt{{P_1}}}(i{,}j{,}k) \end{array} $$

Secondly, our system successfully derives a proof for the above entailment under the condition that the following set of two constraints is satisfiable.

$$ \begin{array}{ll} \sigma _3{:}&{} i{=}1{\wedge }k{=}0 ~{\Rightarrow }~{{\mathtt{{P_1}}}(i{,}j{,}k)}\\ \sigma _4{:}&{} i_1{=}i{-}1 {\wedge } k_1{=}k{-}1{\wedge }i{>}0 {\wedge }{\mathtt{{P_1}}}(i_1{,}j{,}k_1) ~{\Rightarrow }~ {{\mathtt{{P_1}}}(i{,}j{,}k)}\\ \end{array} $$

Lastly, to check whether \(\sigma _3 {\wedge } \sigma _4\) is satisfiable, we automatically compute the closure form of \(\sigma _3 {\wedge } \sigma _4\) as \({{\mathtt{{P_1}}}(i{,}j{,}k)}~{\equiv }~ k{=}i{-}1{\wedge }i{>}0\). This formula is satisfiable and is substituted into the frame, which becomes \({\mathtt{{lsegn}}}(x{,}t{,}k){*}{\mathtt{{lln}}}(y{,}j){\wedge }q{=}\mathtt{null}{\wedge }v{=}1{\wedge }k{=}i{-}1{\wedge }i{>}0\).
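
For intuition, the closure form can be obtained by iterating \(\sigma _4\) from the base case \(\sigma _3\):

$$ \begin{array}{l} \sigma _3{:}~ i{=}1{\wedge }k{=}0; \qquad \text {after } m \text { applications of } \sigma _4{:}~ i{=}m{+}1{\wedge }k{=}m \\ \text {hence}~ {{\mathtt{{P_1}}}(i{,}j{,}k)}~{\equiv }~ k{=}i{-}1{\wedge }i{>}0 \end{array} $$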

4 Frame Inference

In this section, we present our approach for frame inference in detail. Given an entailment \(\varDelta _a ~{{\vdash }}~ \varDelta _c\), where \(\varDelta _{a}\) is the antecedent (LHS) and \(\varDelta _{c}\) is the consequent (RHS), our system attempts to infer a frame \(\varDelta _{f}\) such that, when a frame is successfully inferred, the validity of the entailment \(\varDelta _a ~{{\vdash }}~ \varDelta _c {*}\varDelta _f\) is established at the same time.

Our approach has three main steps. Firstly, we enrich the RHS with an unknown predicate of the form \({\mathtt{{{\mathtt{{U}}}}}}(\bar{v})\) to form the entailment \(\varDelta _a ~{\vdash }_{{L}}~ \varDelta _c{*}{\mathtt{{{\mathtt{{U}}}}}}(\bar{v})\), where \(\bar{v}\) includes all free pointer-typed variables of \(\varDelta _a\) and \(\varDelta _c\), and \({L}\) is the union of a set of user-supplied lemmas and a set of induction hypotheses (initially \(\emptyset \)). The parameters are annotated with \({{{\scriptstyle \#}}}\) following the principle that instantiation (and subtraction) must be done before inference. The details are as follows: (i) all common variables of \(\varDelta _a\) and \(\varDelta _c\) are \({{{\scriptstyle \#}}}\)-annotated; (ii) points-to pointers of \(\varDelta _c\) are \({{{\scriptstyle \#}}}\)-annotated; (iii) the remaining pointers are not \({{{\scriptstyle \#}}}\)-annotated. In the implementation, inference of frame predicates is performed incrementally such that shape predicates are inferred prior to pure ones. Secondly, we construct a proof of the entailment and infer a set of constraints \(\mathcal {R}\) for \({\mathtt{{{\mathtt{{U}}}}}}(\bar{v})\). Thirdly, we check the satisfiability of \(\mathcal {R}\) using the decision procedure in [25, 26].

In the following, we present our entailment checking procedure with a set of proof rules shown in Figs. 3 and 4. For each rule, the obligation is at the bottom and its reduced form is on the top. In particular, the rules in Fig. 3 are used for entailment proving (i.e., to establish a cyclic proof) and the rules in Fig. 4 are used for predicate inference.

Fig. 4. Inference rules with predicate synthesis.

Given an entailment check of the form \(\varDelta _a\,{{\vdash }}_{{L}}\,\varDelta _c\), the rules shown in Fig. 3 are designed to subtract the heap (via the rules \([\underline{\mathbf{\scriptstyle M}}]\) and \([\underline{\mathbf{\scriptstyle PRED-M}}]\)) on both sides until both heaps are empty. After that, the validity of the implication between the two remaining pure formulas is checked using an SMT solver, such as Z3 [27], as shown in rule \([\underline{\mathbf{\scriptstyle EMP}}]\). Algorithmically, this entailment check is performed as follows.

  • Matching. The rules \([\underline{\mathbf{\scriptstyle M}}]\) and \([\underline{\mathbf{\scriptstyle PRED-M}}]\) are used to match up identified heap chains. Starting from identified root pointers, the procedure keeps matching all their reachable heaps. It unifies corresponding fields of matched roots by using the following auxiliary function \(\text {freeEQ}(\rho )\): \(\text {freeEQ}([u_i/v_i]^n_{i=1}) = \bigwedge ^n_{i=1} \{ u_i = v_i \}\).

  • Unfolding. The rules \([\underline{\mathbf{\scriptstyle LU}}]\) and \([\underline{\mathbf{\scriptstyle RU}}]\) are used to derive alternative heap chains. While rule \([\underline{\mathbf{\scriptstyle LU}}]\) performs the unfolding in the antecedent, \([\underline{\mathbf{\scriptstyle RU}}]\) does so in the consequent.

  • Applying Lemma. Rule \([\underline{\mathbf{\scriptstyle CCUT}}]\) derives yet other alternative heap chains. For an LHS containing at least one user-defined predicate, we attempt to apply a lemma as an alternative search step using the \([\underline{\mathbf{\scriptstyle CCUT}}]\) rule. We note that since a user-supplied lemma is assumed to be valid, applying it does not require the global soundness condition. (A sketch of the overall search loop is given below.)
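
The following C skeleton sketches how these steps can be combined into a proof-search loop. The helper functions are hypothetical stand-ins for illustration only, not the actual \(\mathtt {S2ENT}\) API.

```c
#include <stddef.h>

/* A sequent Delta_a |-_L Delta_c; all helpers below are assumed stubs. */
typedef struct entail Entail;

extern int      heaps_empty(Entail *e);     /* both heaps reduced to emp?  */
extern int      smt_valid(Entail *e);       /* [EMP]: pure implication, Z3 */
extern int      try_ccut(Entail *e);        /* [CCUT]: back-link or lemma  */
extern Entail  *try_match(Entail *e);       /* [M]/[PRED-M]: subtract heap */
extern void     push_hypothesis(Entail *e); /* record e in L as companion  */
extern Entail **unfold_cases(Entail *e);    /* [LU]/[RU]: NULL-terminated  */

int prove(Entail *e) {
    if (heaps_empty(e)) return smt_valid(e);
    if (try_ccut(e))    return 1;           /* close the branch via a cycle */
    Entail *m = try_match(e);
    if (m != NULL)      return prove(m);
    push_hypothesis(e);                     /* e may serve as a companion   */
    Entail **cases = unfold_cases(e);
    for (int i = 0; cases[i] != NULL; i++)  /* every unfolding branch       */
        if (!prove(cases[i])) return 0;     /* must be closed               */
    return 1;
}
```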

Cyclic Proof. The proof rules in Fig. 3 are designed to establish cyclic proofs. In the following, we briefly describe a cyclic proof technique enhancing the proposal in [2].

Definition 1

(Pre-proof). A pre-proof of an entailment \(E\) is a pair (\(\mathcal{T}_{i}\), \(\mathcal L\)) where \(\mathcal{T}_{i}\) is a derivation tree and \(\mathcal L\) is a back-link function such that: the root of \(\mathcal{T}_{i}\) is \(E\); for every edge from \(E_i\) to \(E_j\) in \(\mathcal{T}_{i}\), \(E_i\) is the conclusion of an inference rule with a premise \(E_{j}\); there is a back-link from \(E_l\) to \(E_c\) whenever \(\mathcal L\)(\(E_l\)) = \(E_c\) (i.e., \(E_{c}=E_{l}\,\theta \) for some substitution \(\theta \)); and every leaf \(E_l\) without a back-link is the conclusion of an axiom rule (a rule without premises).

If \(\mathcal L\)(\(E_l\)) = \(E_c\), then \(E_{l}\) (resp. \(E_{c}\)) is referred to as a bud (resp. a companion).

Definition 2

(Trace). Let (\(\mathcal{T}_{i}\), \(\mathcal L\)) be a pre-proof of \(\varDelta _{a}\,{{\vdash }_{{L}}}\,\varDelta _{c}\), and let \(({\varDelta _{a_i}\,{{\vdash }_{{L}_i}}\,\varDelta _{c_i} })_{i{\ge }0}\) be a path of \(\mathcal{T}_{i}\). A trace following \(({\varDelta _{a_i}\,{{\vdash }_{{L}_i}}\,\varDelta _{c_i} })_{i{\ge }0}\) is a sequence \((\alpha _i)_{i{\ge }0}\) such that each \(\alpha _i\) is an instance of some predicate \({\mathtt{{P}}}(\bar{t})\) in the formula \({\varDelta _{a_i}}\), and either:

  • \(\alpha _{i{+}1}\) is the subformula containing the corresponding instance of \({\mathtt{{P}}}(\bar{t})\) in \(\varDelta _{a_{i+1}}\);

  • or \({\varDelta _{a_i} {{\vdash }_{{L}_i}} \varDelta _{c_i} }\) is the conclusion of an unfolding rule, \(\alpha _i\) is the unfolded instance of \({\mathtt{{P}}}(\bar{t})\) in \(\varDelta _{a_i}\), and \(\alpha _{i+1}\) is a subformula of \(\varDelta \)[\(\bar{t}\)/\(\bar{v}\)] for some definition branch \(\varDelta \) of the inductive predicate \({\mathtt{{P}}}(\bar{v})\). In this case, \(i\) is a progressing point of the trace.

To ensure that a pre-proof is sound, a global soundness condition must be imposed to guarantee well-foundedness.

Definition 3

(Cyclic proof). A pre-proof (\(\mathcal{T}_{i}\), \(\mathcal L\)) of \(\varDelta _{a}\,{{\vdash }_{{L}}}\,\varDelta _{c}\) is a cyclic proof if, for every infinite path \((\varDelta _{a_i}\,{{\vdash }_{{L}_i}}\,\varDelta _{c_i})_{i{\ge }0}\) of \(\mathcal{T}_{i}\), there is a tail of the path \(p{=}(\varDelta _{a_i}\,{{\vdash }_{{L}_i}}\,\varDelta _{c_i})_{i{\ge }n}\) such that there is a trace following \(p\) which has infinitely many progressing points.

Brotherston et al. proved [2] that \(\varDelta _{a} ~{\vdash }~ \varDelta _{c}\) holds if there is a cyclic proof of \(\varDelta _{a} ~{{\vdash }}_{\emptyset }~ \varDelta _{c}\) where \(\varDelta _{a}\) and \(\varDelta _{c}\) do not contain any unknown predicate.

In the following, we explain how cyclic proofs are constructed using the proof rules shown in Fig. 3. \([\underline{\mathbf{\scriptstyle LU}}]\) and \([\underline{\mathbf{\scriptstyle CCUT}}]\) are the most important rules for forming back-links and thus for pre-proof construction. While rule \([\underline{\mathbf{\scriptstyle LU}}]\) accumulates possible companions and stores them in the historical sequents \({L}\), \([\underline{\mathbf{\scriptstyle CCUT}}]\) links a bud with a companion using some substitution and checks the global soundness condition eagerly. Different from the original cyclic system [3], our back-link function only considers companions selected from the set of historical sequents \({L}\). Particularly, \(\varDelta _l {\rightarrow }\varDelta _r \in {L}\) is used as an intelligent cut as follows. During proof search, a subgoal (i.e., \({\varDelta _{a_1}{*}\varDelta _{a_2}} ~{{\vdash }_{{L}}}~{\varDelta _c}\)) may be matched with this historical sequent to form a cycle and close the proof branch using the following principle. First, \(\varDelta _l ~{{\vdash }}~\varDelta _r\) is used as an induction hypothesis. As such, we have \(\varDelta _l\rho {*}\varDelta _{a_2} ~{\models }~\varDelta _r\rho {*}\varDelta _{a_2}\) where \(\rho \) is a substitution, including renamings to avoid clashes of variables between \(\varDelta _r\) and \(\varDelta _{a_2}\). If both \({\varDelta _{a_1}{*}\varDelta _{a_2}} ~{{\vdash }_{{L}}}~ {\varDelta _l\rho {*}\varDelta _{a_2}}\) and \({\varDelta _r\rho {*}\varDelta _{a_2}} ~{{\vdash }_{{L}}}~ {\varDelta _c}\) are proven, then we have:

$$ {\varDelta _{a_1}{*}\varDelta _{a_2}} {\implies } {\varDelta _l\rho {*}\varDelta _{a_2}} ~{\implies }~{\varDelta _r\rho {*}\varDelta _{a_2}} {\implies } {\varDelta _c}. $$

Thus, the subgoal \({\varDelta _{a_1}{*}\varDelta _{a_2}} ~{{\vdash }_{{L}}}~{\varDelta _c}\) holds. We remark that if a hypothesis is proven, it can be applied as a valid lemma subsequently.

In our system, a lemma often includes universally quantified variables. We thus present a new mechanism to instantiate lemmas that include universally quantified variables. We denote constraints with universal variables as universal guards \(\forall G\). A universal guard \(\forall G\) is equivalent to an infinite conjunction \(\bigwedge _{\rho }G[\rho ]\). Linking a leaf with universal guards back to a companion is not straightforward. For illustration, let us consider the following bud \(\mathtt {B_0}\) and the universally quantified companion/lemma \({\mathtt{{C_0}}}\in {L}\).

$$ \begin{array}{l} \!\!\!{\mathtt{{B_0{:}}}}{\mathtt{{lsegn}}}(\mathtt{root}{,}\mathtt{null},n) {\wedge } n{=}10 ~{{\vdash }}_{{L}}~ {\exists }r {\cdot } {\mathtt{{lsegn}}}(\mathtt{root}{,}r{,}3) {*} {\mathtt{{lsegn}}}(r{,}\mathtt{null},7) \\ \!\!\!{\mathtt{{C_0{:}}}} \forall a{,}b{\cdot }{\mathtt{{lsegn}}}(\mathtt{root}{,}\mathtt{null},n) {\wedge } n{=}a{+}b{\wedge }a{\ge }0{\wedge }b{\ge }0 \\ \qquad ~{\rightarrow }~ {\exists }r {\cdot } {\mathtt{{lsegn}}}(\mathtt{root}{,}r{,}a) {*} {\mathtt{{lsegn}}}(r{,}\mathtt{null},b) \end{array} $$

As shown in rule \([\underline{\mathbf{\scriptstyle CCUT}}]\), to link \(\mathtt {B_0}\) back to \(\mathtt {C_0}\), the LHS of \(\mathtt {B_0}\) must imply the LHS of \(\mathtt {C_0}\) under some substitution. To achieve that, we propose lemma instantiation, a sound solution for universal lemma application. Based on the constraints in the LHS of the bud, our technique instantiates the universally quantified guard (of the selected companion/lemma) before linking back. Concretely, we replace the universal guard by a finite set of its instances; an instantiation of a formula \(\forall \bar{v}{\cdot }G(\bar{t})\) is \(G(\bar{t})[\bar{w}/\bar{v}]\) for some vector of terms \(\bar{w}\). These instances are derived from the constraints in both the LHS and RHS of the corresponding bud, e.g., \(n\,{=}\,10 \wedge a\,{=}\,3 \wedge b\,{=}\,7\) for \(\mathtt {B_0}\).
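
For the bud \(\mathtt {B_0}\) above, instantiating the guard of \(\mathtt {C_0}\) with \([3/a{,}7/b]\) yields the instance

$$ {\mathtt{{lsegn}}}(\mathtt{root}{,}\mathtt{null}{,}n) {\wedge } n{=}3{+}7{\wedge }3{\ge }0{\wedge }7{\ge }0 ~{\rightarrow }~ {\exists }r {\cdot } {\mathtt{{lsegn}}}(\mathtt{root}{,}r{,}3) {*} {\mathtt{{lsegn}}}(r{,}\mathtt{null}{,}7) $$

whose LHS is implied by the LHS of \(\mathtt {B_0}\) (since \(n{=}10\) implies \(n{=}3{+}7\)); the back-link can then be formed.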

Frame Inference. The two inference rules shown in Fig. 4 are designed specifically to infer constraints for the frame. In these rules, \({\bigtriangledown }(\bar{w},\pi )\) is an auxiliary function that existentially quantifies the free variables in \(\pi \) that are not in the set \(\bar{w}\); it extracts the relevant arithmetic constraints that define the data contents of the unknown predicates. \(\mathtt{R}(r{,}\bar{t})\) is either \(r{\mapsto }c(\bar{t})\), a known (defined) predicate \({\mathtt{{P}}}(r{,}\bar{t})\), or an unknown predicate \({\mathtt{{U'}}}(r{,}\bar{t}{,}\bar{w}{{{\scriptstyle \#}}})\). The \({{{\scriptstyle \#}}}\) annotation in unknown predicates is used to guide inference and proof search: we only infer on pointers without a \({{{\scriptstyle \#}}}\) annotation. \({\mathtt{{{\mathtt{{U}}}_{f}}}}(\bar{w},\bar{t'})\) is another unknown predicate which is generated to infer the shape of the pointers \(\bar{w}\). Inferred pointers are annotated with \({{{\scriptstyle \#}}}\) to avoid double inference. A new unknown predicate \(\mathtt {{\mathtt{{U}}}_{f}}\) is generated only if at least one parameter is not \({{{\scriptstyle \#}}}\)-annotated (i.e., \(\bar{w}\cup \bar{t'} {\ne } \emptyset \)). To avoid conflicts between the inference rules and the other rules (e.g., unfolding and matching), root pointers of a heap formula must be \({{{\scriptstyle \#}}}\)-annotated in unknown predicates. For example, in our system \(x{\mapsto }{\mathtt{{c_1}}}(y){*}{\mathtt{{U_1}}}(x{{{\scriptstyle \#}}}{,}y)\) is legal whereas \(x{\mapsto }{\mathtt{{c_1}}}(y){*}{\mathtt{{U_1}}}(x{,}y)\) is illegal; the annotation ensures that our system applies subtraction on the heap pointed to by \(x\) rather than inference.

Soundness. The soundness of the inference rules in Fig. 3 has been shown for unfold-and-match systems with general inductive predicates [3, 8]. In the following, we present the soundness of the inference rules in Fig. 4. We introduce the notation \(\mathcal{R}(\varGamma )\) to denote that a set of predicate definitions \(\varGamma {\,=\,}\{{\mathtt{{U}}}_1(\bar{v}_1)\,{\equiv }\,\varPhi _1,..,{\mathtt{{U}}}_n(\bar{v}_n)\,{\equiv }\,\varPhi _n\}\) satisfies the set of constraints \(\mathcal{R}\). That is, for every constraint \({\varDelta _l\Rightarrow \varDelta _r}\in \mathcal{R}\): (i) \(\varGamma \) contains a predicate definition for each unknown predicate appearing in \(\varDelta _l\) and \(\varDelta _r\); and (ii) when all unknown predicates are interpreted according to \(\varGamma \), \(\varDelta _l\) implies \(\varDelta _r\) (i.e., for all \(s\) and \(h\), \(s, h\,\models \,\varDelta _l\) implies \(s, h\models \varDelta _r\)).

Lemma 1

Given the entailment judgement \( \varDelta _{a}~{\vdash _{\emptyset }}~\varDelta _{c}\,{\leadsto }\,\mathcal {R} \), if there is \({\varGamma }\) such that \(\mathcal{R}(\varGamma )\) holds, then the entailment \({\varGamma }{:}\, \varDelta _{a}\,{{\vdash }}\, \varDelta _{c}\) holds.

The soundness of the predicate synthesis requires that if definitions generated for unknown predicates are satisfiable, then the entailment is valid.

Theorem 1

Given the entailment judgement \(\varDelta _{a}\,{\vdash }_{\emptyset }\,\varDelta _{c} {\leadsto } \mathcal {R}\), the entailment \(\varDelta _a(\varGamma )\,{{\vdash }}\,\varDelta _c(\varGamma )\) holds if there exists a solution \(\varGamma \) of \(\mathcal {R}\).

Theorem 1 follows from the soundness of the rules in Fig. 3 and Lemma 1.

5 Extensions

In this section, we present two ways to strengthen the inferred frame, by inferring pure properties and by normalizing inductive predicates.

Pure Constraint Inference. The inferred frame is strengthened with pure constraints as follows. We first enrich the shape-based frame with pure properties such as size, height, sum, set of addresses/values, and their combinations. After that, we apply the same three steps as in Sect. 4 to infer relational assumptions over the new pure properties. Lastly, we check the satisfiability of these assumptions using FixCalc [34].

In the following, we describe how to infer size properties given a set of dependent predicates. Properties on heights, sets of addresses, and sets of values can be inferred similarly. We first extend an inductive predicate with a size function to capture size properties. That is, given an inductive predicate \({\mathtt{{P}}}(\bar{v})\,{\equiv }\,\bigvee \varDelta _i\), we generate a new predicate \(\mathtt {Pn}\) with a new size parameter \(n\) as \({\mathtt{{Pn}}}(\bar{v},n)\,{\equiv }\,\bigvee (\varDelta _i {\wedge } n\,{=}\,sizeF(\varDelta _i) )\), where the function \(sizeF\) is defined inductively as follows.

$$ \begin{array}{l} \begin{array}{ll} sizeF({r}{{\mapsto }}c(\bar{t})){=}1 &{}\quad sizeF(\exists \bar{v}{\cdot }~\kappa {\wedge }\pi ) {=} sizeF(\kappa ) \\ sizeF(\mathtt{emp}){=}0 &{}~~ sizeF(\kappa _1{*}\kappa _2) {=} sizeF(\kappa _1) {+} sizeF({\kappa _2}) \\ \end{array} \\ sizeF({\mathtt{{P}}}(\bar{t})) {=} t_s \text { where } t_s {\in } \bar{t} \text{ and } t_s \text{ is } \text{ a } \text{ size } \text{ parameter } \end{array} $$
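
For example, applying this transformation to the predicate \(\mathtt {lseg}\) of Sect. 2 yields a size-extended predicate that coincides (up to renaming of the existential size variable) with \(\mathtt {lsegn}\):

$$ {\mathtt{{lsegn}}}(\mathtt{root}{,}l{,}n) ~{\equiv }~ \mathtt{emp}{\wedge }{\mathtt{{\mathtt{root}}}}{=}l{\wedge }n{=}0 ~{\vee }~ \exists ~q{,}m {\cdot } {\mathtt{{\mathtt{root}}}}{\mapsto }{\mathtt{{node}}}(1{,}q) {*} {\mathtt{{lsegn}}}(q{,}l{,}m) {\wedge } n{=}1{+}m $$

where the base branch contributes \(n\,{=}\,sizeF(\mathtt{emp})\,{=}\,0\) and the inductive branch contributes \(n\,{=}\,sizeF({\mathtt{{\mathtt{root}}}}{\mapsto }{\mathtt{{node}}}(1{,}q) {*} {\mathtt{{lsegn}}}(q{,}l{,}m))\,{=}\,1{+}m\).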

To support pure properties, we extend the proposed cyclic proof system with bi-abduction for pure constraints which was presented in [43]. In particular, we adopt the abduction rules to generate relational assumptions over the pure properties in LHS and RHS. These rules are applied exhaustively until no more unknown predicates occur.

Normalization. We aim to relate the inferred frame to existing user-provided predicates where possible, as well as to explicate heap separation (a.k.a. pointer non-aliasing) that may be implicitly constrained through predicates. In particular, we present a lemma synthesis mechanism to explore relations between inductive predicates. Our system processes each inductive predicate in four steps. First, it generates heap-only conjectures (with quantifiers). Second, it enriches these conjectures with unknown predicates. Third, it invokes the proposed entailment procedure to prove these conjectures, infer definitions for the unknown predicates, and synthesize the lemmas. Lastly, it strengthens the inferred lemmas with pure inference.

In the following, we present two types of normalization. The first type generates equivalence lemmas. This normalization matches a newly generated predicate to an existing predicate in a given predicate library. Under the assumption that a library of predicates is provided together with advanced knowledge (e.g., the lemmas in [1]) to enhance completeness, this normalization helps to reuse this knowledge for the newly synthesized predicates, and potentially enhances the completeness of the proof system. Intuitively, given a set \(\mathtt {S}\) of inductive predicates and another inductive predicate \(\mathtt {P}\) (which is not in \(\mathtt {S}\)), we identify all predicates in \(\mathtt {S}\) which are equivalent to \(\mathtt {P}\). A heap-only conjecture is generated to explore the equivalence relation between two predicates, e.g., in the case of \({\mathtt{{P}}}(x,\bar{v})\) and \({\mathtt{{Q}}}(x,\bar{w})\): \(\forall \bar{v} {\cdot } {\mathtt{{P}}}(\mathtt{root},\bar{v}) {\rightarrow } {\exists } \bar{w}{\cdot } {\mathtt{{Q}}}(\mathtt{root},\bar{w}) \). The shared root parameter \(x\) is identified by examining all permutations of the root parameters of the two predicates. Moreover, our system synthesizes lemmas incrementally for the combined domains of shape and pure properties. For example, with \(\mathtt {lln}\) and \(\mathtt {lsegn}\), our system generates the following lemma: \({\mathtt{{lsegn}}}(\mathtt{root}{,}\mathtt{null}{,}n) {\leftrightarrow } {\mathtt{{lln}}}(\mathtt{root}{,}n)\).

The other type of normalization generates separating lemmas, which aim to expose hidden separation of heaps in inductive definitions. Here we explore parallel and consequence separation relations over the parameters of inductive predicates. Two parameters of a predicate are parallel separating if they are both root parameters, e.g., \(r_1\) and \(r_2\) of the predicate \(\mathtt {zip2}\) below.

$$ \begin{array}{l} {\mathtt{{zip2}}}(r_1{,}r_2{,}n) ~{\equiv }~ \mathtt{emp} {\wedge } r_1{=}\mathtt{null}{\wedge } r_2{=}\mathtt{null}{\wedge }n{=}0\\ \qquad {\vee }~ r_1{\mapsto }c_1(q_1) {*} r_2{\mapsto }c_1(q_2) {*} {\mathtt{{zip2}}}(q_1{,}q_2{,}n{-}2) ; \end{array} $$

Two arguments of a predicate are consequence separating if one is a root parameter and the other is reachable from the root in all base formulas derived by unfolding the predicate (e.g., those of the predicate \(\mathtt {ll\_last}\)). We generate these separating lemmas to explicate separation globally. As a result, the separation of actual parameters is externally visible to analyses. This visible separation enables strong updates in a modular heap analysis and frame inference in modular verification. Supposing \(r_1\), \(r_2\) are consequence or parallel parameters in \({\mathtt{{Q}}}(r_1,r_2,\bar{w})\), a heap conjecture is generated as:

$$ {\mathtt{{Q}}}(r_1,r_2,\bar{w}) \rightarrow {\mathtt{{Q_1}}}(r_1) {*} {\mathtt{{Q_2}}}(r_2){*}{\mathtt{{Q_3}}}(\bar{w}) $$

This technique can also be applied to synthesize split/join lemmas that transform predicates into the fragment of linearly compositional predicates [14, 15]. For example, our system splits the predicate \(\mathtt {zip2}\) into two separate singly-linked lists through the following equivalence lemma: \({\mathtt{{zip2}}}(\mathtt{root}{,}r_2{,}n) ~{\leftrightarrow } ~ {\mathtt{{lln}}}(\mathtt{root}{,}n) {*} {\mathtt{{lln}}}(r_2{,}n)\).
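
As another instance, the consequence-separating parameters of \(\mathtt {ll\_last}\) give rise to the following lemma, consistent with the frame inferred for \(\mathtt {F_2}\) in Sect. 3:

$$ {\mathtt{{ll\_last}}}(\mathtt{root}{,}l{,}n) ~{\leftrightarrow }~ {\mathtt{{lsegn}}}(\mathtt{root}{,}l{,}n{-}1) {*} {\mathtt{{l}}}{\mapsto }{\mathtt{{node}}}(1{,}\mathtt{null}) $$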

6 Implementation and Experiments

We have implemented the proposed ideas in a procedure called \(\mathtt {S2ENT}\) for entailment checking and frame inference, built on top of \(\mathtt {SLEEK}\) [8]. \(\mathtt {S2ENT}\) relies on the SMT solver Z3 [27] to check the satisfiability of arithmetic formulas. We have also integrated \(\mathtt {S2ENT}\) into the verifier \(\mathtt {S2}\) [24]. We have conducted two sets of experiments to evaluate the effectiveness and efficiency of \(\mathtt {S2ENT}\). The first set of experiments is conducted on a set of inductive entailment checking problems gathered from previous publications [1, 5, 9]. We compare \(\mathtt {S2ENT}\) with state-of-the-art tools to see how many of these problems can be solved. In the second set of experiments, we apply \(\mathtt {S2ENT}\) to conduct modular verification of a set of non-trivial programs. The experiments are conducted on a machine with an Intel i3-M370 (2.4 GHz) processor and 3 GB of RAM.

Table 1. Inductive entailment checks

Entailment Proving. In Table 1, we evaluate \(\mathtt {S2ENT}\) on a set of 36 valid entailment problems that require inductive reasoning techniques. In particular, Ent 1–5 were taken from Smallfoot [1], Ent 6–19 from [3, 5], Ent 20–28 from [9], and Ent 29–36 were generated by us. We evaluate \(\mathtt {S2ENT}\) against existing proof systems for user-defined predicates. While the tools reported in [8, 12, 36] could handle a subset of these benchmarks if users provided auxiliary lemmas/axioms, [15] was designed neither for the inductive predicates in Ent 6–28 nor for the frame problems in Ent 29–36. The only two tools with which we can compare \(\mathtt {S2ENT}\) are \(\mathtt {Cyclist}\) [3] and \(\mathtt {songbird}\) [40].

The experimental results are presented in Table 1. The second column shows the entailment problems. Column \(bl\) gives the number of back-links in the cyclic proofs generated by \(\mathtt {S2ENT}\). We observe that most problems require only one back-link in their cyclic proofs, except that Ent 4 requires two back-links and Ent 13–15, on mutually inductive odd/even singly-linked lists, require three back-links. The last three columns show the results of \(\mathtt {Cyclist}\), \(\mathtt {songbird}\) and \(\mathtt {S2ENT}\), respectively. Each cell in these columns shows either the CPU time (in seconds) if the tool proves the entailment, TO if the tool runs longer than 30 s, X if the tool returns a false positive, or NA if the entailment is beyond the capability of the tool. In summary, out of the 36 problems, \(\mathtt {Cyclist}\) solves 18 (with one TO, Ent 4); \(\mathtt {songbird}\) solves 25 (with two false positives, Ent 17 and 27, and one TO, Ent 23); and \(\mathtt {S2ENT}\) solves all 36 problems.

In Table 1, each entailment check in Ent 1–19 has \(\mathtt{emp}\) as its frame axiom (the LHS and RHS have the same heaps). Hence, they may be handled by existing inductive proof systems like [3, 9, 15, 40]. In particular, Ent 1–19 involve shape-only predicates. The results show that \(\mathtt {Cyclist}\) and \(\mathtt {songbird}\) ran a bit faster than \(\mathtt {S2ENT}\) in most of their successful cases. This is expected, as \(\mathtt {S2ENT}\) performs additional steps for frame inference. Each entailment check in Ent 20–28 includes inductive predicates with pure properties (e.g., size and sortedness). While \(\mathtt {Cyclist}\) can provide inductive reasoning for the arithmetic and heap domains separately [5], no system has been proposed for cyclic proofs in the combined domain. Hence, these problems are beyond the capability of \(\mathtt {Cyclist}\). Ent 20, which requires mutual inductive reasoning, is the motivating example of \(\mathtt {songbird}\) (augmented with the size property) [40]. In particular, \(\mathtt {sortll}\) represents a sorted list with smallest value \(\mathtt {min}\), and \(\mathtt {tll}\) is a binary tree whose nodes point to their parents and whose leaves are linked in a linked list [19, 24]. \(\mathtt {S2ENT}\) solves each entailment incrementally: the shape-based frame first and then the pure properties. The results show that \(\mathtt {S2ENT}\) was more effective and efficient than \(\mathtt {songbird}\).

Each entailment check in Ent 29–36 requires both inductive reasoning and frame inference. These checks are beyond the capability of all existing entailment procedures for SL. \(\mathtt {S2ENT}\) generates frame axioms for inductive reasoning. The experiments show that the proposed proof system can support efficient and effective reasoning on both shape and numeric domains as well as inductive proofs and frame inference.

Table 2. Experiments on Glib library

Modular Verification for Memory Safety. We enhance the existing program verifier \(\mathtt {S2}\) [24] with \(\mathtt {S2ENT}\) to automatically verify a range of heap-manipulating programs. We evaluate the enhanced \(\mathtt {S2}\) on the open-source C library Glib [16], which includes non-GUI code from the GTK+ toolkit and the GNOME desktop environment. We conduct experiments on the heap-manipulating files, i.e., singly-linked lists (gslist.c), doubly-linked lists (glist.c), balanced binary trees (gtree.c) and N-ary trees (gnode.c). These files contain fairly complex algorithms (e.g., sorting), and the data structures used in gtree.c and gnode.c are very complex. Some procedures of gslist.c and glist.c were evaluated by the tools presented in [9, 31, 36], where the user had to manually provide a large number of lemmas to support the tool. Furthermore, the verification in [9] is semi-automatic, i.e., verification conditions were manually generated. Apart from the tool in [9], the tools in [31, 36] were no longer available for comparison.

In Table 2 we show, for each file, the number of lines of code (excluding comments) LOC and the number of procedures #Pr. We remark that these procedures include tail-recursive procedures which are translated from loops. The columns \(\#\surd \) (and sec.) show the number of procedures (and the time in seconds) for which \(\mathtt {S2}\) can verify memory safety without (wo.) and with (w.) \(\mathtt {S2ENT}\). Column #syn shows the number of lemmas synthesized using the technique in Sect. 5. With lemma synthesis, the number of procedures that can be successfully verified increases from 168 (81%) to 182 (88%), with a time overhead of 28% (157 s vs. 123 s).

A closer look shows that with \(\mathtt {S2ENT}\) we are able to verify a number of challenging methods in gslist.c and glist.c. By generating separating lemmas, \(\mathtt {S2ENT}\) successfully infers shape specifications of methods manipulating the last element of lists (i.e., \(\mathtt {g\_slist\_concat}\) in \(\mathtt {gslist.c}\) and \(\mathtt {g\_list\_append}\) in \(\mathtt {glist.c}\)). By generating equivalence lemmas, matching a newly-inferred inductive predicate with the predefined predicates in \(\mathtt {S2}\) is now extended beyond the shape-only domain. Moreover, the experimental results also show that the enhanced \(\mathtt {S2}\) was able to verify 41/52 procedures in gslist.c and 39/51 procedures in glist.c. In comparison, the tool in [9] could semi-automatically verify 11 procedures in gslist.c and 6 procedures in glist.c, while with user-supplied lemmas the tool in [31] could verify 22 procedures in gslist.c and 10 procedures in glist.c.

7 Related Work and Conclusion

This work is related to three groups of work. The first group comprises entailment procedures in SL. Initial proof systems in SL mainly focus on decidable fragments combining linked lists (and trees) [1, 7, 11, 13, 14, 17, 22, 29, 32, 33]. Recently, Iosif et al. extended the decidable fragment to restricted inductive predicates [19]. Antonopoulos et al. [42] present a comprehensive summary of the computational complexity of entailment in SL with inductive predicates. Smallfoot [1] and GRASShopper [33] provide systematic approaches for frame inference, but with limited support for (general) inductive predicates. Extending these approaches to support general inductive predicates is non-trivial. GRASShopper is limited to a GRASS-reducible class of inductive predicates. While the Smallfoot system was designed to allow the use of general inductive predicates, its inference rules are hardwired for list predicates only, and a set of new rules must be developed for a proof system targeting general inductive predicates. SLEEK [8] and jStar [12] support frame inference with a soundness guarantee for general inductive predicates. However, they provide limited support for induction, relying on user-supplied lemmas [12, 30]. Our work, like [8, 36], targets an undecidable SL fragment including (arbitrary) inductive predicates and numerical constraints; we trade completeness for expressiveness. In addition to what is supported in [8, 36], we support frame inference with inductive reasoning in SL by providing a system of cyclic proofs.

The second group is work on inductive reasoning. Lemmas are used to enhance inductive reasoning for heap-based programs [5, 12, 30]. They are used as alternative unfoldings beyond predicates' definitions [5, 30], as external inference rules [12], or as intelligent generalizations to support inductive reasoning [3]. Unfortunately, the mechanisms in these systems require users to supply the additional lemmas that might be needed during a proof. SPEN [15] synthesizes lemmas to enhance inductive reasoning for some inductive predicates with bags of values. However, it is designed to support specific classes of inductive predicates, and it is difficult to extend it to cater for general inductive predicates. As a solution to inductive reasoning in SL, Smallfoot [1, 3, 5] presents subtraction rules that are consequences of a set of lemmas about lists and trees. Brotherston et al. propose a cyclic proof system for the entailment problem [2, 3]. Similarly, a circularity rule has been introduced in matching logic [38], Constraint Logic Programming [9], and separation logic combined with predicate definitions and arithmetic [40]. Furthermore, the work in [39] supports frame inference based on an ad-hoc mechanism, using simple unfolding and matching. Like [3, 9, 40], our system also uses historical sequents at case-split steps as induction hypotheses. Beyond these systems [3, 9, 15, 40], \(\mathtt {S2ENT}\) infers frames for inductive proofs systematically, and thus gives better support for modular verification of heap-manipulating programs. Moreover, we show how to incrementally support inductive reasoning for the combination of heap and pure domains. In contrast, there is no formalized discussion in [5, 9, 40] of inductive reasoning for the combined domains; while [5] supports these domains separately, [9, 40] only demonstrate their support through experimental results.

The third group is on lemma synthesis. In inductive reasoning, auxiliary lemmas are generated to discover theorems (e.g., [10, 23, 28]). The key elements of these techniques are the heuristics used to generate equivalence lemmas for sets of given functions, constants and datatypes. In our work, we introduce lemma synthesis to strengthen the inferred frames. To support theorem discovery, we synthesize equivalence and separating lemmas. This mechanism can be extended with other heuristics to enhance the completeness of modular verification.

Conclusion. We have presented a novel approach to frame inference for inductive entailments in SL with inductive predicates and arithmetic. The core of our proposal is a system of lemma synthesis through cyclic proofs in which back-links are formed using the cut rule. Moreover, we have presented two extensions to strengthen the inferred frames. Our evaluation indicates that our system is able to infer frame axioms for inductive entailment checks that are beyond the capability of existing systems.