figure a

1 Introduction

Separation logic (SL) [12, 18] is a popular formalism for Hoare-style verification of imperative, heap-manipulating programs. In particular, the symbolic heap separation logic fragment has received a lot of attention: Symbolic heaps serve as the basis of various automated verification tools, such as Infer [6], Sleek [7], Songbird [19], GRASSHopper [17], VCDryad [16], VeriFast [13], SLS [20], and Spen [9]. Many of the aforementioned tools rely on systems of inductive predicate definitions (SID) that serve as specifications of dynamic data structures, e.g., linked lists and trees.

At the heart of every Hoare-style verification procedure based on separation logic lies the entailment problem: Given two SL formulas, say \(\varphi \) and \(\psi \), is every model of \(\varphi \) also a model of \(\psi \)? While the entailment problem is undecidable in general [2], there are various approaches to decide entailments between symbolic heaps ranging from complete methods for fixed SIDs [3], over decision procedures for restricted classes of SIDs [10, 11], to incomplete approaches, such as fold/unfold reasoning [7] or cyclic proofs [5].

Among the largest decidable fragments of symbolic heaps with inductive definitions is the fragment of symbolic heaps with bounded tree-width (\(\text {SL}_{\text {btw}}\)) [10]. This fragment supports a rich class of data structures in SID specifications, such as doubly-linked lists and binary trees with linked leaves. \(\text {SL}_{\text {btw}}\) introduces three syntactic conditions on SIDs—progress, connectivity, and establishment—that enable a reduction from the entailment problem for \(\text {SL}_{\text {btw}}\) to the (decidable) satisfiability problem for monadic second-order logic (MSO) over graphs of bounded tree width. This gives rise to a decision procedure of non-elementary complexity—at least without an in-depth analysis of the quantifier alternations involved in the reduction. The reduction to MSO is also technically involved and has—to the best of our knowledge—never been implemented. The authors remark in the follow-up paper [11] that “the method from [10] causes a blowup of several exponentials in the size of the input problem and is unlikely to produce an effective decision procedure.”

Contributions. We give a new proof for the decidability of the entailment problem for the \(\text {SL}_{\text {btw}}\) fragment. In contrast to [10], we circumvent the reduction to MSO and give a direct model-theoretic construction with elementary complexity. This yields an easy-to-implement decision procedure for entailments in the full \(\text {SL}_{\text {btw}}\) fragment. We implemented our approach in the Harrsh analyzer and report on promising results for challenging examples (Sect. 6). In particular, we show that Harrsh can decide the entailment problem for data structure definitions for which no previous decision procedures have been implemented.

Fig. 1.
figure 1

An SID \({\varPhi }\) with three predicates for binary trees with parent pointers.

Fig. 2.
figure 2

treep

A challenging example. To highlight the challenges faced when developing and implementing decision procedures for entailments in \(\text {SL}_{\text {btw}}\), consider the SID \({\varPhi }\) consisting of the rules in Fig. 1.Footnote 1 There are three predicates, namely \(\mathtt {tree}\), \(\mathtt {rtree}\), and \(\mathtt {ltree}\), that specify binary trees with parent pointers (treep for short). The predicate \(\mathtt {tree}\) takes two parameters representing the root of the tree and its parent. Predicates \(\mathtt {rtree}\) and \(\mathtt {ltree}\) both have the leftmost leaf of the tree as an additional parameter. Such a parameter may, for example, be required to specify tree segments for an automated program analysis. Although both \(\mathtt {rtree}\) and \(\mathtt {ltree}\) describe treeps, they take radically different approaches: Predicate \(\mathtt {rtree}\) defines a treep starting at the root, i.e., it specifies the root of the treepand then states that both subtrees are treeps (the parameter representing the leftmost leaf is additionally passed to the left subtree). In contrast, predicate \(\mathtt {ltree}\) specifies treeps starting at the leftmost leaf and moving up to the root. Consequently, the models of these predicates are derived in completely different ways, which is a challenge for commonly applied approaches, such as fold/unfold (cf. [7]) or inductive reasoning (cf. [5, 19, 20]). In fact, the entailment \(\mathtt {ltree}(x_1,x_2,x_3) \models \mathtt {rtree}(x_2,x_3,x_1)\) holds, whereas the entailment \(\mathtt {rtree}(x_2,x_3,x_1) \models \mathtt {ltree}(x_1,x_2,x_3)\) is violated: Intuitively, \(\mathtt {rtree}\) admits models in which all shortest paths from the root to the leftmost leaf have length one. In contrast, for \(\mathtt {ltree}\), the minimal length of all shortest paths is two. Thus, the heap illustrated in Fig. 2 is a model of \(\mathtt {rtree}\), but not of \(\mathtt {ltree}\). In fact, if we rule out this model, \(\mathtt {rtree}\) and \(\mathtt {ltree}\) entail each other. That is, the entailment below and its converse are both valid:

figure b

Harrsh solved the entailment \((\clubsuit )\) from above in less than a second. The only other tool capable of successfully solving \((\clubsuit )\) is Slide [11], which is based on tree automata. However, the approach in [11] is not complete for \(\text {SL}_{\text {btw}}\).

Overview of our approach. We first present an algebra à la Courcelle [8] to systematically construct models of separation logic formulas (Sect. 2). This algebra enables us to conveniently formalize the semantics of separation logic (Sect. 3). To decide entailments, we then develop an abstraction mechanism for models with the following properties (Sect. 4):

  1. 1.

    The abstraction is compositional, i.e., we can perform our algebraic operations on abstractions instead of models (Theorem 2).

  2. 2.

    The abstraction is finite, i.e., each model of a predicate is abstracted to one of finitely many abstractions (Lemma 3).

  3. 3.

    The abstraction refines the predicate satisfaction relation, i.e., models with the same abstraction entail the same predicates among those relevant for the entailment (Lemma 2).

  4. 4.

    The abstraction is effective, i.e., for a given abstraction, one can determine which predicates are entailed (Theorem 3).

How do we obtain a decision procedure from these properties for an entailment, say \(\mathsf {pred}_1(\mathbf {x_1})\,\models _{{\varPhi }}\,\mathsf {pred}_2(\mathbf {x_2})\)? We iteratively compute all abstractions corresponding to models of \(\mathsf {pred}_1(\mathbf {x_1})\). Due to compositionality (1), this can be achieved by applying the same operations used to construct models on previously computed abstractions until a fixed point is reached. Finiteness of the abstraction (2) ensures termination. We then exploit that the abstraction is well-defined (3) and effective (4) to decide the entailment: \(\mathsf {pred}_1(\mathbf {x_1})\,\models _{{\varPhi }}\,\mathsf {pred}_2(\mathbf {x_2})\) holds iff all computed abstractions of models of \(\mathsf {pred}_1(\mathbf {x_1})\) yield that they are also models of \(\mathsf {pred}_2(\mathbf {x_2})\) (Sect. 5).

Due to space constraints, all proofs are in the supplementary material [1].

Notation. The set of all (non-empty) finite sequences over a set S is \(S^{*}\) (\(S^{+}\)). Bold letters denote sequences, e.g., \(\mathbf {x} = \left\langle x_1,\ldots ,x_k\right\rangle \). \(\mathbf {x}[i]\) refers to the i-th element of \(\mathbf {x}\). We often treat sequences as sets, i.e. we write \(y \in \mathbf {x}\) if y occurs in \(\mathbf {x}\), \(\mathbf {x} \cup \mathbf {z}\) for the set of all elements in \(\mathbf {x}\) or \(\mathbf {z}\), etc. \(f = \{x_1 \mapsto y_1,\ldots ,x_n \mapsto y_n\}\) is the function given by \(f(x_i) = y_i\) for \(i \in [1,n]\), \(n \ge 0\). Moreover, functions \(f:X \rightarrow Y\) are lifted to functions on sequences \(f:X^{*} \rightarrow Y^{*}\) by pointwise application.

Fig. 3.
figure 3

A heap graph modeling a list segment of length at least 5 from \(x_1\) to \(x_2\).

2 Heap Graphs

Separation logic is typically interpreted in terms of stack-heap pairs consisting of a stack, i.e., an evaluation of variables, and a heap, i.e., a finite mapping from memory locations to values. In our setting, however, it is more convenient to abstract from locations and consider labeled graphs.

Formally, let \(\mathbf {Var}\) be a set of variables containing a special variable \(\mathbf{null }\in \mathbf {Var}\). Moreover, let \(\mathbf {Preds}\) be a set of predicate identifiers; each predicate \(\mathsf {pred}\in \mathbf {Preds}\) is equipped with an arity \(\mathrm {ar}(\mathsf {pred}) \in \mathbb {N}\). \(\mathsf {pred}(\mathbf {x})\) is a predicate call if the length of sequence \(\mathbf {x} \in \mathbf {Var}^{*}\) is \(\mathrm {ar}(\mathsf {pred})\).

Definition 1

(Heap Graph). A heap graph \(\mathcal {M}= \left\langle \mathsf {Ptr}, \mathsf {FV}, \mathsf {calls}\right\rangle \) is a graph whose nodes are a finite set of variables in \(\mathbf {Var}\). The edges of \(\mathcal {M}\) are given by a partial points-to function \(\mathsf {Ptr}:\mathbf {Var}\setminus \{\mathbf{null}\} \rightharpoonup _{\textit{finite}}\mathbf {Var}^{+}\) mapping variables to finite tuples of variables. Moreover, \(\mathsf {FV}\subseteq \mathbf {Var}\) is a finite set of free variables and \(\mathsf {calls}\) is a finite set of predicate calls. A heap graph is concrete if \(\mathsf {calls}= \emptyset \). We collect all variables in \(\mathsf {Ptr}\), \(\mathsf {FV}\), and \(\mathsf {calls}\) in \(\mathsf {vars}(\mathcal {M})\). Finally, we write \(\mathsf {Ptr}_\mathcal {M}\), \(\mathsf {FV}_\mathcal {M}\), and \(\mathsf {calls}_\mathcal {M}\) to refer to the individual components of heap graph \(\mathcal {M}\).    \(\triangle \)

Example 1

Figure 3 depicts a heap graph modeling a singly-linked list of length at least five with head \(x_1\) and tail \(x_2\) (assuming the predicate call \(\mathtt {sll}(d,x_2)\) stands for non-empty lists segments from d to \(x_2\); see the left part of Fig. 5). In our graphical notation, every node corresponds to the variable it is labeled with. Gray nodes correspond to the free variables in \(\mathsf {FV}\). For each variable, say x, the pointers \(\mathsf {Ptr}(x) = \left\langle y_1,\ldots ,y_k\right\rangle \) are represented by directed edges—labeled with the position \(1,2,\ldots ,k\)—from the node labeled with x to nodes labeled with \(y_1,\ldots ,y_k\), respectively. We usually omit the edge labels if each node has at most one outgoing edge. Finally, a predicate call is drawn as a box labeled with the predicate call and connected to the nodes representing the variables occurring in the call’s parameters. Formally, the heap graph in Fig. 3 is given by \(\mathcal {M}= \left\langle \mathsf {Ptr}, \mathsf {FV}, \mathsf {calls}\right\rangle \) with points-to mapping \(\mathsf {Ptr}= \{ x_1 \mapsto a, a \mapsto b, b \mapsto c, c \mapsto d \}\), free variables \(\mathsf {FV}= \left\{ x_1,x_2\right\} \) and predicate calls \(\mathsf {calls}= \{ \mathtt {sll}(d,x_2) \}\).    \(\triangle \)

Heap graphs are an abstraction of the classical stack-heap model. To reason about separation logic with heap graphs (and their abstractions), we need a few operations for their systematic construction: Let \(f:\mathbf {Var}\rightarrow \mathbf {Var}\) be a partial function and \(f(\mathcal {M})\) its application to every variable in every component of \(\mathcal {M}\).

Isomorphic heap graphs. We call a variable \(x \in \mathbf {Var}\) an auxiliary variable of heap graph \(\mathcal {M}\) if x is not a free variable of \(\mathcal {M}\). Throughout this article, we do not distinguish between isomorphic heap graphs, i.e., heap graphs that are identical up to renaming of auxiliary variables. Formally, two heap graphs \(\mathcal {M}_1\) and \(\mathcal {M}_2\) are isomorphic, written \(\mathcal {M}_1 \,\cong \,\mathcal {M}_2\), if there exists a bijective function \(f:\mathsf {vars}(\mathcal {M}_1) \rightarrow \mathsf {vars}(\mathcal {M}_2)\) such that (1) \(\mathsf {FV}_{\mathcal {M}_1} = \mathsf {FV}_{\mathcal {M}_2}\), (2) \(f(x) = x\) for all \(x \in \mathsf {FV}_{\mathcal {M}_1}\), and (3) \(f(\mathcal {M}_1) = \mathcal {M}_2\).

Renaming heap graphs. Our first operation enables renaming of free variables. Formally, let \(\mathcal {M}\) be a heap graph and \(\mathbf {x} \in \mathsf {FV}_{\mathcal {M}}^{*}\), \(\mathbf {y} \in \mathbf {Var}^{*}\) be repetition free sequences of variables of the same length. Then the renaming of \(\mathbf {x}\) to \(\mathbf {y}\) in \(\mathcal {M}\) is given by \(\mathsf {rename}_{\mathbf {x},\mathbf {y}}(\mathcal {M}) = f(\mathcal {M})\), where

$$\begin{aligned} f:\mathbf {Var}\rightarrow \mathbf {Var},\quad z \mapsto {\left\{ \begin{array}{ll} \mathbf {y}[i] &{} ~\text {if}~ \mathbf {x}[i] = z \\ z &{} ~\text {otherwise}. \end{array}\right. } \end{aligned}$$

Composition. Our next operation allows composing heap graphs by “gluing” them together at their common free variables. Formally, let \(\mathcal {M}_1,\mathcal {M}_2\) be heap graphs such that (1) \(\mathsf {vars}(\mathcal {M}_1) \cap \mathsf {vars}(\mathcal {M}_2) \subseteq \mathsf {FV}_{\mathcal {M}_1} \cap \mathsf {FV}_{\mathcal {M}_2}\) and (2) \(\mathsf {Ptr}_{\mathcal {M}_1}\) and \(\mathsf {Ptr}_{\mathcal {M}_1}\) are domain disjoint, i.e., \({{\,\mathrm{dom}\,}}(\mathsf {Ptr}_{\mathcal {M}_1}) \cap {{\,\mathrm{dom}\,}}(\mathsf {Ptr}_{\mathcal {M}_2}) = \emptyset \). Then the componentwise union \(\mathcal {M}_1 \cup \mathcal {M}_2\) of \(\mathcal {M}_1\) and \(\mathcal {M}_2\) is \(\langle \mathsf {Ptr}_{\mathcal {M}_1} \cup \mathsf {Ptr}_{\mathcal {M}_2}, \mathsf {FV}_{\mathcal {M}_1} \cup \mathsf {FV}_{\mathcal {M}_2}, \mathsf {calls}_{\mathcal {M}_1} \cup \mathsf {calls}_{\mathcal {M}_2} \rangle \). Otherwise, \(\mathcal {M}_1 \cup \mathcal {M}_2\) is undefined. We then define the composition \(\mathcal {M}_1 \,\bullet \,\mathcal {M}_2\) of heap graphs \(\mathcal {M}_1,\mathcal {M}_2\) as

$$\begin{aligned} \mathcal {M}_1 \,\bullet \,\mathcal {M}_2 = {\left\{ \begin{array}{ll} \mathcal {M}_1 \cup \mathcal {M}&{} ~\text {where}~\mathcal {M}\,\cong \,\mathcal {M}_2 ~\text {and}~ \mathcal {M}_1 \cup \mathcal {M}~\text {is defined} \\ \text {undefined} &{} ~\text {otherwise}. \end{array}\right. } \end{aligned}$$
Fig. 4.
figure 4

Illustration of composition of two heap graphs.

Example 2

Figure 4 depicts the composition of two heap graphs representing lists of length two. Since both heap graphs share a variable \(a \notin \mathsf {FV}\), we first compute an isomorphic heap graph in which variable a is substituted by c in the second graph. Both heap graphs are then merged at their common free variable b. This results in a heap graph modeling a list of length four.    \(\triangle \)

Forgetting free variables. To construct larger heap graphs from smaller ones, we often need additional free variables to glue the right nodes together, e.g., the variable b in Example 2. Consequently, we need a mechanism for subsequent removal of these variables from the set of free variables. To this end, for every heap graph \(\mathcal {M}\) and sequence of free variables \(\mathbf {x} \in \mathsf {FV}_{\mathcal {M}}^{*}\), we define the operation \(\mathsf {forget}_{\mathbf {x}}(\mathcal {M}) = \left\langle \mathsf {Ptr}_{\mathcal {M}}, \mathsf {FV}_{\mathcal {M}} \setminus \mathbf {x}, \mathsf {calls}_{\mathcal {M}}\right\rangle \).

Single allocations. The simplest non-empty heap graph is a single variable, say x with pointers to a sequence \(\mathbf {y}\) of finitely many other variables. We write \(x \rightarrowtail \mathbf {y}\) to denote this single-allocation heap graph \(\left\langle \left\{ x \mapsto \mathbf {y}\right\} ,\left\{ x\right\} \cup \mathbf {y}, \emptyset \right\rangle \).

Theorem 1

([8]). Every non-empty heap graph of tree width at most k can be constructed from heap graphs \(x \rightarrowtail \mathbf {y}\), renaming, composition, and forgetting using at most \(k+1\) free variables.

3 Symbolic Heap Separation Logic

We consider the symbolic heap fragment of separation logic with user-defined inductive predicate definitions. We omit pure formulas to simplify the presentation. Notice, however, that our implementation supports reasoning about symbolic heaps with pure formulas.

Syntax. The syntax of our simplified symbolic heap fragment is then given by the following context-free grammar:

$$\begin{aligned} \varphi \,{::=}\,\mathbf {emp}~|~ x \mapsto \mathbf {y} ~|~ \mathsf {pred}(\mathbf {y}) ~|~ \exists x:\varphi ~|~ \varphi * \varphi , \end{aligned}$$

where \(x \in \mathbf {Var}\setminus \{\mathbf{null }\}\) is a variable, \(\mathbf {y} \in \mathbf {Var}^{+}\) is a sequence of variables, and \(\mathsf {pred}(\mathbf {y})\) is a predicate call. Here, \(\mathbf {emp}\) is the empty heap, \(x \mapsto \mathbf {y}\) asserts that x points-to the locations captured by \(\mathbf {y}\), \(\exists x:\varphi \) is existential quantification, and \(*\) is the separating conjunction. Because \(*\) is commutative and associative and because existential quantifiers can always be moved to the front, we will always consider symbolic heaps to be of form \(\exists \mathbf {y}:(x_1 \mapsto \mathbf {y}_1) * \cdots * (x_m \mapsto \mathbf {y}_m) * \mathsf {pred}_1(\mathbf {z}_1) * \cdots * \mathsf {pred}_n(\mathbf {z}_n)\).

Inductive definitions. Before we assign formal semantics to symbolic heaps, we clarify how custom predicates are specified. To this end, a system of inductive definitions (SID) is a finite set \({\varPhi }\) of rules of the form \(\mathsf {pred}~\Leftarrow ~\varphi \), where \(\mathsf {pred}\in \mathbf {Preds}\) is a predicate symbol and \(\varphi \) is a symbolic heap. We assume that all symbolic heaps of rules with head \(\mathsf {pred}\) have the same sequence of free variables \((x_1,\ldots ,x_{\mathrm {ar}(\mathsf {pred})})\)Footnote 2 and collect these variables in the set \(\mathsf {fv}(\mathsf {pred})\). Moreover, we collect all predicates that occur in SID \({\varPhi }\) in the set \(\mathbf {Preds}({\varPhi })\) and all rules of SID \({\varPhi }\) in the set \(\mathbf {Rules}({\varPhi })\). Examples of SIDs are found in Figs. 1 and 5.

Semantics. We define the semantics of symbolic heaps \(\varphi \) for a given SID \({\varPhi }\) in terms of a force relation \(\models _{{\varPhi }}\), which determines whether a heap graph \(\mathcal {M}\) satisfies \(\varphi \). To this end, let \(\varphi [\mathbf {x} / \mathbf {y}]\) denote the symbolic heap \(\varphi \) in which every free occurrence of variable \(\mathbf {x}[i]\) is substituted by variable \(\mathbf {y}[i]\), where \(1 \le i \le |\mathbf {x}| = |\mathbf {y}|\). Then the relation \(\models _{{\varPhi }}\) is defined inductively on the syntax of symbolic heaps:

$$\begin{aligned} \mathcal {M}\,\models _{{\varPhi }}\,\mathbf {emp}\,\,\,\text {iff}\,\,\,&\text {ex.}~\mathbf {x} \in \mathbf {Var}^{*} ~\text {s.t.}~ \mathcal {M}= \left\langle \emptyset , \mathbf {x}, \emptyset \right\rangle \\ \mathcal {M}\,\models _{{\varPhi }}\,x \mapsto \mathbf {y}\,\,\,\text {iff}\,\,\,&\text {ex.}~\mathbf {z} \supseteq \{x\} \cup \mathbf {y} ~\text {s.t.}~ \mathcal {M}= \left\langle \{ x \mapsto \mathbf {y} \}, \mathbf {z}, \emptyset \right\rangle \\ \mathcal {M}\,\models _{{\varPhi }}\,\mathsf {pred}(\mathbf {y})\,\,\,\text {iff}\,\,\,&\text {ex.}~ \mathbf {z}\supseteq \mathbf {y} ~\text {s.t.}~ \mathcal {M}\,\cong \,\left\langle \emptyset ,\mathbf {z},\{\mathsf {pred}(\mathbf {y})\}\right\rangle \\&\text {or ex.}~(\mathsf {pred}~\Leftarrow ~\psi ) \in \mathbf {Rules}({\varPhi }) ~\text {s.t.}~ \mathcal {M}\,\models _{{\varPhi }}\,\psi [\mathsf {fv}(\mathsf {pred}) / \mathbf {y}]\\ \mathcal {M}\,\models _{{\varPhi }}\,\exists x:\varphi \,\,\,\text {iff}\,\,\,&\text {ex.}~y \in \mathbf {Var}~\text {s.t.}~ \left\langle \mathsf {Ptr}_{\mathcal {M}},\mathsf {FV}_{\mathcal {M}}\cup \{y\},\mathsf {calls}_{\mathcal {M}}\right\rangle \,\models _{{\varPhi }}\,\varphi [x/y]\\ \mathcal {M}\,\models _{{\varPhi }}\,\varphi _1 * \varphi _2\,\,\,\text {iff}\,\,\,&\text {ex.}~\mathcal {M}_1,\mathcal {M}_2 ~\text {s.t.}~ \mathcal {M}\,\cong \,\mathcal {M}_1 \,\bullet \,\mathcal {M}_2 \\&\text {and}~ \mathcal {M}_1\,\models _{{\varPhi }}\,\varphi _1 ~\text {and}~ \mathcal {M}_2\,\models _{{\varPhi }}\,\varphi _2 \end{aligned}$$

The above semantics coincides with the standard least fixed-point semantics of symbolic heaps (cf. [4]) for stack-heap pairs if we restrict ourselves to concrete heap graphs. Moreover, there is a strong relationship between our SL semantics and the operations on heap graphs defined in Sect. 2.

Lemma 1

Let \(\varphi = \exists \mathbf {y}:(x_1 \mapsto \mathbf {y}_1) * \cdots * (x_m \mapsto \mathbf {y}_m) * \mathsf {pred}_1(\mathbf {z}_1) * \cdots * \mathsf {pred}_n(\mathbf {z}_n)\) be a symbolic heap. \(\mathcal {M}\,\models _{{\varPhi }}\,\varphi \) iff there exist \(\mathcal {M}_1,\ldots ,\mathcal {M}_{m+n}\) such that (1) \(\mathcal {M}_i\,\models _{{\varPhi }}\,x \mapsto \mathbf {y}_i\) for \(1 \le i \le m\), (2) \(\mathcal {M}_{m+j}\,\models _{{\varPhi }}\,\mathsf {pred}_{j}(\mathsf {fv}(\mathsf {pred}_{j}))\) for \(1 \le j \le n\), and (3) \(\mathcal {M}= \mathsf {forget}_{\mathbf {y}}( \mathcal {M}_1 \,\bullet \,\cdots \,\bullet \,\mathcal {M}_m\,\bullet \,\mathsf {rename}_{\mathsf {fv}(\mathsf {pred}_1),\mathbf {z}_1}(\mathcal {M}_{m+1}) \,\bullet \,\cdots \,\bullet \,\mathsf {rename}_{\mathsf {fv}(\mathsf {pred}_n),\mathbf {z}_n}(\mathcal {M}_{m+n}))\).

Symbolic heaps with bounded tree-width. Our goal is to develop a decision procedure for symbolic heaps with inductive definitions in the bounded tree-width fragment developed by Iosif et al. [10]. This fragment imposes three conditions on SIDs, which we assume for all SIDs \({\varPhi }\) considered in the following:

  1. 1.

    Progress: Every rule allocates exactly one variable x, i.e. every rule contains exactly one points-to assertion \(x \mapsto \mathbf {y}\).

  2. 2.

    Connectivity: Every predicate call \(\mathsf {pred}(\mathbf {z})\) of a rule has a parameter \(\mathbf {z}[i]\) that is referenced by the rule’s allocated variable, i.e., \(\mathbf {z}[i] \in \mathbf {y}\). Moreover, the i-th free variable of predicate \(\mathsf {pred}\) must be allocated in all rules \(\mathsf {pred}~\Leftarrow ~\varphi \) of \({\varPhi }\).

  3. 3.

    Establishment: All existentially quantified variables are eventually allocated.

Assumptions. We make two further assumptions for all SIDs throughout this paper: (1) Predicates are called with pairwise different parameters. (2) Unfolding predicates (iteratively substituting predicate calls \(\mathsf {pred}(\mathbf {y})\) with the right-hand sides \(\varphi [\mathsf {fv}(\mathsf {pred})/\mathbf {y}]\) of rules \(\mathsf {pred}~\Leftarrow ~\varphi \)) always yields satisfiable symbolic heaps. SIDs can be transformed automatically to satisfy (1) and (2) before applying our decision procedure (cf. [1, 14]). The SIDs in Figs. 1 and 5 satisfy all assumptions.

4 Profiles: An Abstraction for Concrete Heap Graphs

Entailment problem. We present our approach for entailments \(\mathsf {pred}_1(\mathbf {x})\,\models _{{\varPhi }}\,\mathsf {pred}_2(\mathbf {y})\) between predicate calls \(\mathsf {pred}_1(\mathbf {x})\), and \(\mathsf {pred}_2(\mathbf {y})\) of an SID \({\varPhi }\). We discuss the treatment of more general entailments at the end of Sect. 5. Formally, the entailment \(\mathsf {pred}_1(\mathbf {x})\,\models _{{\varPhi }}\,\mathsf {pred}_2(\mathbf {y})\) holds iff for all concrete heap graphs \(\mathcal {M}\), we have \(\mathcal {M}\,\models _{{\varPhi }}\,\mathsf {pred}_1(\mathbf {x})\) implies \(\mathcal {M}\,\models _{{\varPhi }}\,\mathsf {pred}_2(\mathbf {y})\).

Model reconstruction. Recall from Lemma 1 that \(\mathcal {M}\,\models _{{\varPhi }}\,\mathsf {pred}_1(\mathbf {x})\) can be interpreted as being able to construct \(\mathcal {M}\) as a model of \(\mathsf {pred}_1(\mathbf {x})\) using the rules of SID \({\varPhi }\) and our operations on heap graphs introduced in Sect. 2. To prove the entailment \(\mathsf {pred}_1(\mathbf {x})\,\models _{{\varPhi }}\,\mathsf {pred}_2(\mathbf {y})\), we then have to “reconstruct” any such \(\mathcal {M}\) as a model of \(\mathsf {pred}_2(\mathbf {y})\). Since infinitely many model reconstructions might be required—after all there might be infinitely many \(\mathcal {M}\) with \(\mathcal {M}\,\models _{{\varPhi }}\,\mathsf {pred}_1(\mathbf {x})\)—we now develop an abstraction of heap graphs such that finitely many abstract model reconstructions suffice to cover all models of \(\mathsf {pred}_1(\mathbf {x})\).

Running example. To sharpen our intuition, we present the technical details of our abstraction together with a running example: Fig. 5 shows an SID \({\varPhi }_{\text {lists}}\) specifying predicates for various singly-linked list segments. The predicate \(\mathtt {sll}(x_1,x_2)\) specifies non-empty singly-linked list segments with head \(x_1\) and tail \(x_2\). Similarly, the predicates \(\mathtt {odd}(x_1,x_2)\) and \(\mathtt {even}(x_1,x_2)\) restrict such list segments to odd and even length, respectively. In the remainder of this and the next section, we will use our abstraction to show that the entailment \(\mathtt {sll}(x_1,x_2) \models _{{\varPhi }_{\text {lists}}}\mathtt {odd}(x_1,x_2)\) does not hold.

Fig. 5.
figure 5

SIDs \({\varPhi }_{\text {sll}}\) (left) and \({\varPhi }_{\text {o/e}}\) (right) specifying singly-linked list segments with head \(x_1\) and tail \(x_2\). Moreover, we define \({\varPhi }_{\text {lists}}= {\varPhi }_{\text {sll}}\cup {\varPhi }_{\text {o/e}}\).

4.1 Context Profiles as an Abstract Domain

Contexts. Our proposed abstraction is based on contexts. Intuitively, every context describes an extension of a concrete heap graph by predicate calls such that the resulting graph satisfies a fixed predicate call. Thus, contexts reveal what is missing in a concrete heap graph to reconstruct models of predicate calls.

Definition 2

(Context). A triple \(\mathcal {C}= \left\langle \mathsf {V},\mathsf {pred}(\mathbf {x}),\mathsf {calls}\right\rangle \) is a context of a concrete heap graph \(\mathcal {M}\) w.r.t. SID \({\varPhi }\) if (1) \(\mathsf {V} = \mathsf {FV}_\mathcal {M}\), (2) \(\left\langle \mathsf {Ptr}_{\mathcal {M}}, \mathbf {x}, \mathsf {calls}\right\rangle \,\models _{{\varPhi }}\,\mathsf {pred}(\mathbf {x})\), and (3) neither \(\mathbf {x}\) nor \(\mathsf {calls}\) contain auxiliary variables of \(\mathcal {M}\). Moreover, we define the set of free variables of context \(\mathcal {C}\) as \(\mathsf {fv}(\mathcal {C}) := \mathsf {V}\). We call variables in \(\mathbf {x}\) or \(\mathsf {calls}\), but not in \(\mathsf {fv}(\mathcal {C})\), the auxiliary variables of \(\mathcal {C}\).    \(\triangle \)

Example 3

Figure 6 shows contexts for two concrete heap graphs \(\mathcal {M}_{\text {odd}}\) and \(\mathcal {M}_{\text {even}}\) of odd and even length (without dashes), respectively. The extension by calls from the contexts is illustrated by dashed lines. Intuitively, context \(\mathcal {C}_1\) states that no extension of \(\mathcal {M}_{\text {odd}}\) is needed to obtain a model of predicate \(\mathtt {odd}(x_1,x_2)\). Context \(\mathcal {C}_2\) states that—in order to obtain an odd list segment from \(x_1\) to a, where a is an additional free variable—we have to add an even list segment from \(x_2\) to a. Similarly, we obtain an even list segment from \(x_1\) to some fresh variable a by adding an odd list segment from \(x_2\) to a. The interpretation of contexts \(\mathcal {C}_4\), \(\mathcal {C}_5\), and \(\mathcal {C}_6\) of \(\mathcal {M}_{\text {even}}\) is analogous.    \(\triangle \)

Contexts decompositions. A context of heap graph \(\mathcal {M}\) stores the free variables of \(\mathcal {M}\). These variables are important, because additional free variables might allow to split a heap graph into several smaller ones. For example, the additional free variable b in Fig. 4 (read from right to left) allows to decompose a list into two lists. Since our goal is to develop a compositional abstraction, we have to take contexts of decompositions of heap graphs into account. In general, these decompositions are relevant for entailment when considering more complicated SIDs, e.g., doubly-linked binary trees or trees with linked leaves. We thus have to compute decompositions \(\mathcal {M}_1 \,\bullet \,\ldots \,\bullet \,\mathcal {M}_k\), \(k \ge 1\), of a concrete heap graph \(\mathcal {M}\) and then consider a context for each component.

Definition 3

(Context decomposition). A context decomposition of a concrete heap graph \(\mathcal {M}\) w.r.t. SID \({\varPhi }\) is a set \(\mathcal {E}= \{\mathcal {C}_1,\ldots ,\mathcal {C}_k\}\) such that \(\mathcal {M}= \mathcal {M}_1 \,\bullet \,\ldots \,\bullet \,\mathcal {M}_k\), \(k \ge 1\), is a decomposition of \(\mathcal {M}\) and \(\mathcal {C}_1,\ldots ,\mathcal {C}_k\) are contexts of the concrete heap graphs \(\mathcal {M}_1,\ldots ,\mathcal {M}_k\) w.r.t. \({\varPhi }\), respectively. Moreover, we define the set of free variables of context decomposition \(\mathcal {E}\) as \(\mathsf {fv}(\mathcal {E}) := \bigcup _{\mathcal {C}\in \mathcal {E}} \mathsf {fv}(\mathcal {C})\).    \(\triangle \)

Fig. 6.
figure 6

Contexts of concrete heap graphs \(\mathcal {M}_{\text {odd}}\) (first graph) and \(\mathcal {M}_{\text {even}}\) (fourth graph). The extensions by a context are drawn in dashed lines.

Example 4

The concrete heap graph \(\mathcal {M}_{\text {odd}}\) in Fig. 6 cannot be decomposed into smaller graphs due to a lack of free variables. Hence, context decompositions of \(\mathcal {M}_{\text {odd}}\) are singletons consisting of \(\mathcal {C}_1\), \(\mathcal {C}_2\), and \(\mathcal {C}_3\) in Fig. 6, respectively.    \(\triangle \)

Profiles. As the above example shows, concrete heap graphs may have multiple context decompositions. We thus abstract a concrete heap graph \(\mathcal {M}\) by the set of all context decompositions of \(\mathcal {M}\):

Definition 4

(Profiles). The profile \(\mathsf {profile}_{\varPhi }(\mathcal {M})\) of a concrete heap graph \(\mathcal {M}\) w.r.t. SID \({\varPhi }\) is the set of all context decompositions of \(\mathcal {M}\) w.r.t. \({\varPhi }\). Moreover, since all \(\mathcal {E}\in \mathcal {P}\) have the same free variables, we define the free variables of \(\mathcal {P}\) as \(\mathsf {fv}(\mathcal {P}) := \mathsf {fv}(\mathcal {E})\) for some \(\mathcal {E}\in \mathcal {P}\).    \(\triangle \)

Refinement property. We propose profiles as a suitable abstraction for deciding entailments. We will argue that they comply with the four essential correctness properties discussed in Sect. 1: refinement, finiteness, compositionality, and effectiveness. Refinement means that two concrete heap graphs with the same profiles entail the same SID predicates. Hence, for each profile and predicate \(\mathsf {pred}\), it suffices to find a single model of \(\mathsf {pred}\) with that profile. Formally,

Lemma 2

Let \(\mathcal {M},\mathcal {M}'\) be concrete heap graphs with \(\mathsf {profile}_{\varPhi }(\mathcal {M}) = \mathsf {profile}_{\varPhi }(\mathcal {M}')\). Then, for all \(\mathsf {pred}\in \mathbf {Preds}({\varPhi })\), we have \(\mathcal {M}\,\models _{{\varPhi }}\,\mathsf {pred}(\mathbf {x})\) iff \(\mathcal {M}'\,\models _{{\varPhi }}\,\mathsf {pred}(\mathbf {x})\).

Finiteness. In general, the set of profiles of concrete heap graphs is infinite due to different names for additional free variables, e.g., variable a in Fig. 6. To obtain a finite set of profiles, we thus (a) limit the total number of free variables, (b) consider profiles up to renaming of additional free variables, and (c) exploit the connectivity condition. Notice that condition (a) is not a restriction, because the number of free variables for every SID and thus every entailment query is bounded. For condition (b), we have to lift the notion of isomorphism from heap graphs to profiles. Formally, contexts \(\mathcal {C}_1 = \left\langle \mathbf {z}_1,\mathsf {pred}_1(\mathbf {x}_1),\mathsf {calls}_1\right\rangle \) and \(\mathcal {C}_2 = \left\langle \mathbf {z}_2,\mathsf {pred}_2(\mathbf {x}_2),\mathsf {calls}_2\right\rangle \) are isomorphic iff \(\mathbf {z}_1 = \mathbf {z_2}\), \(\mathsf {pred}_1 = \mathsf {pred}_2\) and there exists a bijective function \(f:\mathbf {Var}\rightarrow \mathbf {Var}\) such that (1) for all \(z \in \mathbf {z}_1\), \(f(z) = z\), (2) \(f(\mathbf {x}_1) = \mathbf {x_2}\), and (3) \(\mathsf {calls}_2 = \left\{ \mathsf {pred}(f(\mathbf {y})) \mid \mathsf {pred}(\mathbf {y}) \in \mathsf {calls}_1\right\} \). Moreover, two context decompositions \(\mathcal {E}_1\), \(\mathcal {E}_2\) are isomorphic iff for all \(i \in \left\{ 1,2\right\} \) and contexts \(\mathcal {C}\in \mathcal {E}_{i}\) there is a context \(\mathcal {C}' \in \mathcal {E}_{3-i}\) that is isomorphic to \(\mathcal {C}\). Analogously, two profiles \(\mathcal {P}_1\), \(\mathcal {P}_2\) are isomorphic iff for all \(i \in \left\{ 1,2\right\} \) and context decompositions \(\mathcal {E}\in \mathcal {P}_i\) there exists a context decomposition \(\mathcal {E}' \in \mathcal {P}_{3-i}\) that is isomorphic to \(\mathcal {E}_i\).

Throughout this paper, we do not distinguish between isomorphic contexts, context decompositions, or profiles.

Lemma 3

For every SID \({\varPhi }\) and variable sequence \(\mathbf {x}\in \mathbf {Var}^{*}\), the set of profiles is finite up to profile isomorphism.

Example 5

Recall from Fig. 5 the SID \({\varPhi }_{\text {o/e}}\). Moreover, recall from Fig. 6 the concrete heap graphs \(\mathcal {M}_{\text {odd}}\) and \(\mathcal {M}_{\text {even}}\) and their contexts \(\mathcal {C}_1\), \(\mathcal {C}_2\), \(\mathcal {C}_3\) and \(\mathcal {C}_4\), \(\mathcal {C}_5\), \(\mathcal {C}_6\), respectively. Then the profiles of \(\mathcal {M}_{\text {odd}}\) and \(\mathcal {M}_{\text {even}}\) w.r.t. \({\varPhi }_{\text {o/e}}\) are (up to isomorphism) \(\mathsf {profile}_{{\varPhi }_{\text {o/e}}}(\mathcal {M}_{\text {odd}}) = \left\{ \{\mathcal {C}_1\},\{\mathcal {C}_2\},\{\mathcal {C}_2\}\right\} \) and \(\mathsf {profile}_{{\varPhi }_{\text {o/e}}}(\mathcal {M}_{\text {even}}) = \left\{ \{\mathcal {C}_4\},\{\mathcal {C}_5\},\{\mathcal {C}_6\}\right\} \). In fact, the profile of every singly-linked list segment from \(x_1\) to \(x_2\) of odd (even) length is isomorphic to \(\mathsf {profile}_{{\varPhi }_{\text {o/e}}}(\mathcal {M}_{\text {odd}})\) (\(\mathsf {profile}_{{\varPhi }_{\text {o/e}}}(\mathcal {M}_{\text {even}})\)). Hence, the profile of every model of the singly-linked list predicate \(\mathtt {sll}(x_1,x_2)\) is either \(\mathsf {profile}_{{\varPhi }_{\text {o/e}}}(\mathcal {M}_{\text {odd}})\) or \(\mathsf {profile}_{{\varPhi }_{\text {o/e}}}(\mathcal {M}_{\text {even}})\).    \(\triangle \)

4.2 Computation of Profiles

Due to Lemmas 2 and 3, we can decide an entailment \(\mathsf {pred}_1(\mathbf {x})\,\models _{{\varPhi }}\,\mathsf {pred}_2(\mathbf {x})\), once the profiles of all models of \(\mathsf {pred}_1(\mathbf {x})\) with respect to the rules relevant for \(\mathsf {pred}_2(\mathbf {x})\) are known. The key insight underlying our entailment checker is that profiles can be computed automatically in a compositional manner. To this end, recall from Theorem 1 that every concrete heap graph can be constructed from single-allocation heap graphs \(x \rightarrowtail \mathbf {y}\) by means of renaming, forgetting, and composition. We exploit this by (1) devising an algorithm to compute \(\mathsf {profile}_{\varPhi }(x \rightarrowtail \mathbf {y})\) and (2) lifting the operations \(\mathsf {rename}_{\mathbf {x},\mathbf {y}}\), \(\mathsf {forget}_{\mathbf {x}}\), and \(\,\bullet \,\) for renaming, forgetting, and composition of heap graphs to operations \(\overline{\mathsf {rename}}_{\mathbf {x},\mathbf {y}}\), \(\overline{\mathsf {forget}}_{\mathbf {x}}\), and \(\,\overline{\bullet }\,\) on profiles.

Profiles of single allocations. Since single allocations \(x \rightarrowtail \mathbf {y}\) cannot be further decomposed, every context decomposition of \(x \rightarrowtail \mathbf {y}\) w.r.t. an SID \({\varPhi }\) is a singleton. Due to the progress condition, every rule of \({\varPhi }\) contains exactly one points-to assertion. For each SID rule \(\mathsf {pred}~\Leftarrow ~\exists \mathbf {z}:x' \mapsto \mathbf {y}' * \mathsf {pred}_1(\mathbf {y}_1)*\cdots *\mathsf {pred}_k(\mathbf {y}_k)\), the corresponding context \(\left\langle \{x'\} \cup \mathbf {y}', \mathsf {pred}(\mathbf {x}), \{\mathsf {pred}_1(\mathbf {y}_1),\ldots ,\mathsf {pred}_k(\mathbf {y}_k)\}\right\rangle \) must be in the profile of \(x \rightarrowtail \mathbf {y}\) iff \(x \rightarrowtail \mathbf {y}\) is a model of \(\exists \mathbf {z}:x' \mapsto \mathbf {y}'\). Hence:

Lemma 4

Profiles of single allocations, i.e., \(\mathsf {profile}_{\varPhi }(x \rightarrowtail \mathbf {y})\), are computable.

Rename for profiles. We lift the operation \(\mathsf {rename}_{\mathbf {x},\mathbf {y}}\), which renames each variable in \(\mathbf {x}\) to the corresponding variable in \(\mathbf {y}\) according to their position, from heap graphs to contexts, context decompositions, and profiles by componentwise application. That is, for a context \(\mathcal {C}= \left\langle \mathbf {z}, \mathsf {pred}(\mathbf {u}),\mathsf {calls}\right\rangle \), a context decomposition \(\mathcal {E}\), and a profile \(\mathcal {P}\), we define:

$$\begin{aligned} \mathsf {rename}_{\mathbf {x},\mathbf {y}}(\mathcal {C}) \,:=\,&\langle \mathsf {rename}_{\mathbf {x},\mathbf {y}}(\mathbf {z}), \mathsf {pred}(\mathsf {rename}_{\mathbf {x},\mathbf {y}}(\mathbf {u})), \\&\qquad \left\{ \mathsf {pred}'(\mathsf {rename}_{\mathbf {x},\mathbf {y}}(\mathbf {v}) \mid \mathsf {pred}'(\mathbf {v}) \in \mathsf {calls}\right\} \rangle \\ \mathsf {rename}_{\mathbf {x},\mathbf {y}}(\mathcal {E}) \,:=\,&\left\{ \mathsf {rename}_{\mathbf {x},\mathbf {y}}(\mathcal {C}) ~|~ \mathcal {C}\in \mathcal {E}\right\} \\ \overline{\mathsf {rename}}_{\mathbf {x},\mathbf {y}}(\mathcal {P}) \,:=\,&\{ \mathsf {rename}_{\mathbf {x},\mathbf {y}}(\mathcal {E}) ~|~ \mathcal {E}\in \mathcal {P}\} \end{aligned}$$

Forget for profiles. Next, we lift the operation \(\mathsf {forget}_{\mathbf {x}}\), which removes variables in \(\mathbf {x}\) from the set of free variables, to contexts, context decompositions, and profiles. For a profile, forgetting a free variable means that some of its constituting context decompositions do not have to be considered anymore, because the composition of their underlying models is no longer defined. Hence, these decompositions are removed. Formally, for a context \(\mathcal {C}= \left\langle \mathbf {z}, \mathsf {pred}(\mathbf {u}),\mathsf {calls}\right\rangle \), a context decomposition \(\mathcal {E}\), and a profile \(\mathcal {P}\), we define:

$$\begin{aligned} \mathsf {forget}_{\mathbf {x}}(\mathcal {C}) \,:=\,&\left\langle \mathbf {z} \setminus \mathbf {x}, \mathsf {pred}(\mathbf {u}), \mathsf {calls}\right\rangle \qquad \mathsf {forget}_{\mathbf {x}}(\mathcal {E}) \,:=\, \left\{ \mathsf {forget}_{\mathbf {x}}(\mathcal {C}) \mid \mathcal {C}\in \mathcal {E}\right\} \\ \overline{\mathsf {forget}}_{\mathbf {x}}(\mathcal {P}) \,:=\,&\left\{ \mathsf {forget}_{\mathbf {x}}(\mathcal {E}) \mid \mathcal {E}\in \mathcal {P}~\text {and}~\mathbf {x} \cap \mathsf {usedvs}(\mathcal {E}) = \emptyset \right\} \\ \mathsf {usedvs}(\mathcal {E}) \,:=\,&\bigcup _{\mathcal {C}\in \mathcal {E}} \mathsf {usedvs}(\mathcal {C}) ~\quad \qquad \qquad \mathsf {usedvs}(\mathcal {C}) \,:=\, \mathbf {u} \,\cup \, \bigcup _{\mathsf {pred}'(\mathbf {y}) \in \mathsf {calls}} \mathbf {y} \end{aligned}$$

Composition for profiles. It remains to lift heap graph composition to profiles. This is formalized as substituting predicate calls of contexts by other contexts:

Definition 5

(Context substitution). Let \(\mathcal {C}_1 = \left\langle \mathbf {x}_1, \mathsf {pred}_1(\mathbf {z}_1),\mathsf {calls}_1\right\rangle \) and \(\mathcal {C}_2 = \left\langle \mathbf {x}_2,\mathsf {pred}_2(\mathbf {z}_2),\mathsf {calls}_2\right\rangle \) be contexts such that (1) \(\mathsf {pred}_1(\mathbf {z}_1) \in \mathsf {calls}_2\) and (2) no auxiliary variable of \(\mathcal {C}_2\) is a free variable of \(\mathcal {C}_1\) and vice versa. Then the substitution of \(\mathsf {pred}_1(\mathbf {z})\) in \(\mathcal {C}_2\) by \(\mathcal {C}_1\) is given by

figure c

To compose profiles, we attempt to substitute the underlying contexts with each other in all possible ways. Formally, a context decomposition \(\mathcal {E}_1\) derives a context decomposition \(\mathcal {E}_2\), written \(\mathcal {E}_1 \triangleright \mathcal {E}_2\), iff there exist contexts \(\mathcal {C}_1,\mathcal {C}_2 \in \mathcal {E}_1\) such that \(\mathcal {E}_2 = \left( \mathcal {E}_1 \setminus \{\mathcal {C}_1,\mathcal {C}_2\}\right) \cup \{ \mathcal {C}_2\left[ \mathcal {C}_1\right] \}\).Footnote 3 We denote by \(\triangleright ^{*}\) the reflexive-transitive closure of the derivation relation \(\triangleright \). The composition of two profiles then consists of all context decompositions derivable from some decompositions of both profiles:

Definition 6

(Composition of profiles). Let \(\mathcal {P}_1\) and \(\mathcal {P}_2\) be profiles w.r.t. \({\varPhi }\). Then the composition \(\mathcal {P}_1 \,\,\overline{\bullet }\,\, \mathcal {P}_2\) of \(\mathcal {P}_1\) and \(\mathcal {P}_2\) is defined as

figure d

Compositionality. Our lifted heap graph operations satisfy the compositionality property mentioned in Sect. 1. That is,

Theorem 2

For all concrete heap graphs \(\mathcal {M}\), \(\mathcal {M}'\) and every SID \({\varPhi }\), we have

$$\begin{aligned} \overline{\mathsf {rename}}_{\mathbf {x},\mathbf {y}}(\mathsf {profile}_{\varPhi }(\mathcal {M})) ~=~&\mathsf {profile}_{\varPhi }(\mathsf {rename}_{\mathbf {x},\mathbf {y}}(\mathcal {M})) \\ \overline{\mathsf {forget}}_{\mathbf {x}}(\mathsf {profile}_{\varPhi }(\mathcal {M})) ~=~&\mathsf {profile}_{\varPhi }(\mathsf {forget}_{\mathbf {x}}(\mathcal {M})) \\ \mathsf {profile}_{\varPhi }(\mathcal {M}) \,\,\overline{\bullet }\,\, \mathsf {profile}_{\varPhi }(\mathcal {M}') ~=~&\mathsf {profile}_{\varPhi }(\mathcal {M}\,\bullet \,\mathcal {M}') \end{aligned}$$

provided that \(\mathsf {rename}_{\mathbf {x},\mathbf {y}}(\mathcal {M})\), \(\mathsf {forget}_{\mathbf {x}}(\mathcal {M})\), and \(\mathcal {M}\,\bullet \,\mathcal {M}'\) are defined, respectively.

Example 6

Recall from Fig. 6 the heap graphs \(\mathcal {M}_{\text {odd}}\) and \(\mathcal {M}_{\text {even}}\) whose profiles w.r.t. \({\varPhi }_{\text {o/e}}\) capture all singly-linked lists. We can construct a concrete heap graph \(\mathcal {M}\) representing a list of length five from \(x_1\) to \(x_2\) by computing

$$\begin{aligned} \mathcal {M}:= \mathsf {rename}_{v,x_2}\left( \mathsf {forget}_{x_2}\left( \mathcal {M}_{\text {odd}}\,\bullet \,\mathsf {rename}_{(x_1,x_2),(x_2,v)}(\mathcal {M}_{\text {even}}) \right) \right) . \end{aligned}$$

Then, by Theorem 2, the corresponding profile \(\mathsf {profile}_{{\varPhi }_{\text {o/e}}}(M)\) is given by:

$$\begin{aligned} \overline{\mathsf {rename}}_{v,x_2}\!\!\left( \overline{\mathsf {forget}}_{x_2}\!\!\left( \mathsf {profile}_{{\varPhi }_{\text {o/e}}}(\mathcal {M}_{\text {odd}}) \,\overline{\bullet }\,\overline{\mathsf {rename}}_{(x_1,x_2),(x_2,v)}(\mathsf {profile}_{{\varPhi }_{\text {o/e}}}(\mathcal {M}_{\text {even}}) \right) \right) \end{aligned}$$

This profile, in turn, coincides with the profile of \(\mathcal {M}_{\text {odd}}\), i.e., we have

$$\begin{aligned} \mathsf {profile}_{{\varPhi }_{\text {o/e}}}(\mathcal {M}) \,=\, \mathsf {profile}_{{\varPhi }_{\text {o/e}}}(\mathcal {M}_{\text {odd}}). \end{aligned}$$

In particular, notice that without the forget statement, we would obtain a heap graph \(\mathcal {M}'\) with an additional free variable. The additional free variable would also influence the profile of \(\mathcal {M}'\), because there exist more decompositions of \(\mathcal {M}'\) into heap graphs \(\mathcal {M}_1 \,\bullet \,\mathcal {M}_2\). Consequently, there are also more context decompositions of \(\mathcal {M}'\) and thus \(\mathcal {M}'\) has a larger profile.    \(\triangle \)

5 An Effective Decision Procedure for Entailment

Profile analysis. We now exploit our abstract domain to develop a decision procedure for entailments of the form \(\mathsf {pred}_1(\mathbf {a})\,\models _{{\varPhi }}\,\mathsf {pred}_2(\mathbf {b})\). Let us first consider the case in which the parameters \(\mathbf {a}\) and \(\mathbf {b}\) coincide with the free variables in the rules of the SID, i.e., \(\mathbf {a} = \mathsf {fv}(\mathsf {pred}_1) =: \mathbf {x_1}\) and \(\mathbf {b} = \mathsf {fv}(\mathsf {pred}_2) =: \mathbf {x_2}\). Our key observation is then that analyzing profiles of the entailment’s left-hand side suffices to discharge it: The entailment \(\mathsf {pred}_1(\mathbf {x_1})\,\models _{{\varPhi }}\,\mathsf {pred}_2(\mathbf {x_2})\) holds iff the profile of every model \(\mathcal {M}\) of \(\mathsf {pred}_1(\mathbf {x_1})\) contains a context decomposition stating that a model of \(\mathsf {pred}_2(\mathbf {x_2})\) can be reconstructed from \(\mathcal {M}\). Formally,

Theorem 3

The entailment \(\mathsf {pred}_1(\mathbf {x_1})\,\models _{{\varPhi }}\,\mathsf {pred}_2(\mathbf {x_2})\) holds iff for all concrete heap graphs \(\mathcal {M}\) with \(\mathcal {M}\,\models \,\mathsf {pred}_1(\mathbf {x_1})\), \(\left\{ \left\langle \mathsf {FV}_{\mathcal {M}}, \mathsf {pred}_2(\mathbf {x_2}), \emptyset \right\rangle \right\} \in \mathsf {profile}_{\varPhi }(\mathcal {M})\).

Example 7

Recall the profiles \(\mathsf {profile}_{{\varPhi }_{\text {o/e}}}(\mathcal {M}_{\text {even}})\) and \(\mathsf {profile}_{{\varPhi }_{\text {o/e}}}(\mathcal {M}_{\text {odd}})\) from Example 5 computed for models of \(\mathtt {sll}(x_1,x_2)\) w.r.t. SID \({\varPhi }_{\text {o/e}}\) (Fig. 5). We now use these profiles to disprove the entailment \(\mathtt {sll}(x_1,x_2) \models _{{\varPhi }_{\text {lists}}}\mathtt {odd}(x_1,x_2)\): First, observe that all predicates relevant for constructing models of \(\mathtt {odd}(x_1,x_2)\) belong to \({\varPhi }_{\text {o/e}}\subseteq {\varPhi }_{\text {lists}}\). Second, the \(\mathsf {profile}_{{\varPhi }_{\text {o/e}}}(\mathcal {M}_{\text {even}})\) does not contain a context decomposition \(\{\left\langle \{x_1,x_2\},\mathtt {odd}(x_1,x_2),\emptyset \right\rangle \} \). Hence, by Theorem 3, the entailment does not hold as we cannot reconstruct \(\mathcal {M}_{\text {even}}\) as a model of predicate \(\mathtt {odd}(x_1,x_2)\).    \(\triangle \)

figure e

Computing profiles. By Theorem 3, to decide whether \(\mathsf {pred}_1(\mathbf {x_1})\,\models _{{\varPhi }}\,\mathsf {pred}_2(\mathbf {x_2})\) holds, it suffices to compute the finite (by Lemma 3) set of all profiles of models of \(\mathsf {pred}_1(\mathbf {x_1})\). This is performed by the procedure \(\mathsf {abstractSID}({\varPhi })\) shown in Algorithm 1. To understand how the algorithm works, recall how predicates can be unrolled to compute a model: We select an SID rule and replace all of its predicate calls with previously computed models. By Lemma 1, this amounts to performing heap graph operations. That is, we first rename the free variables of previously computed models to match the parameters of predicate calls. After that, the resulting models and the single allocation (due to the progress condition) of the rule are composed into a single heap graph. Finally, we apply a forget operation to remove free variables that have been existentially quantified.

Algorithm 1 behaves analogously. However, instead of applying operations on heap graphs, it applies our abstract operations on profiles (cf. Theorem 2): We select an SID rule \(\mathsf {pred}~\Leftarrow ~\varphi \) in line 5. By Lemma 4, we can compute the profile of the single allocation in \(\varphi \). (l. 6). We then select previously computed profiles for the predicate rules and rename their free variables to match the parameters of the predicate calls in \(\varphi \) (l. 7–9). Finally, the selected profiles are composed and added to the computed profiles of predicate \(\mathsf {pred}\) (l. 10, 11). The algorithm then proceeds by computing profiles until a fixed point is reached (l. 12).

Correctness. Algorithm 1 is guaranteed to terminate due to the finiteness of our abstract domain (Lemma 3). Moreover, it computes the desired set of profiles:

Theorem 4

\(\mathsf {abstractSID}({\varPhi })(\mathsf {pred}) = \{ \mathsf {profile}_{\varPhi }(\mathcal {M}) \mid \mathcal {M}\,\models _{{\varPhi }}\,\mathsf {pred}(\mathsf {fv}(\mathsf {pred})) \text { and} \mathsf {FV}_\mathcal {M}\subseteq \mathsf {fv}(\mathsf {pred})\}\).

To check entailments \(\mathsf {pred}_1(\mathbf {a})\,\models _{{\varPhi }}\,\mathsf {pred}_2(\mathbf {b})\), where \(\mathbf {a}\) and \(\mathbf {b}\) do not coincide with the free variables of \(\mathsf {pred}_1\) and \(\mathsf {pred}_2\) in the rules of \({\varPhi }\), it suffices to apply an additional rename operation. Hence, by combining Theorems 3 and 4, we obtain a constructive decidability proof for entailments between predicate calls. Moreover, a close inspection of the size of the set of profiles and the runtime of Algorithm 1 reveals that our decision procedure runs in time doubly exponential in the size of a given SID. A detailed analysis is found in [1, Sect. 7.4].

Corollary 1

It is decidable in doubly exponential time whether the entailment \(\mathsf {pred}_1(\mathbf {a})\,\models _{{\varPhi }}\,\mathsf {pred}_2(\mathbf {b})\) holds.

Generalizations. Several of our assumptions about SIDs and entailments have been made purely to simplify the presentation. In fact, Corollary 1 can be generalized to (1) decide entailments \(\varphi \,\models _{{\varPhi }}\,\psi \) for symbolic heaps \(\varphi , \psi \) (instead of predicate calls) and (2) SIDs with pure formulas. Both extensions are supported by our implementation. Further details are found in [1].

6 Experiments

We implemented our decision procedure for entailment in the separation logic prover Harrsh [1, 15], which is written in Scala. Harrsh supports the full \(\text {SL}_{\text {btw}}\) fragment, including pure formulas, parameter repetitions, and entailments between symbolic heaps (as opposed to single predicate calls). Table 1 summarizes the results of our evaluation for a selection of entailments and SIDs. Our full collection of 101 benchmarks and all experimental results are available online [1].

Methodology. We compared Harrsh against Songbird [19], the winner of the SID entailment category of this year’s separation logic competition, SL-COMP’18; and against Slide [11], the tool that is most closely related to our approach but that is complete only for a subclass of \(\text {SL}_{\text {btw}}\). Experiments were conducted using the popular benchmarking harness jmh on an Intel® Core™ i7-7500U CPU running at 2.70 GHz with a memory limit of 4 GB. We report the average run times obtained by running Jmh on each benchmark for 100 s.

Table 1. The performance of Harrsh (HRS), Songbird (SB) and Slide (SLD) on a variety of SIDs; and the size of the abstraction computed by Harrsh. The timeout (TO) was 180,000 ms. Termination before the timeout but without result is denoted (U). Wrong results/crashes are marked (X).
Fig. 7.
figure 7

dlgrid

Benchmarks. Besides the running example (with \(\mathtt {sll}\), \(\mathtt {even}\) and \(\mathtt {odd}\) as in Fig. 5) and the entailments for doubly-linked trees discussed in the introduction (with \(\mathtt {ltree}\), \(\mathtt {rtree}\) as defined in Fig. 1), we show results on standard data-structure specifications from the SL literature: Several variants of trees with linked leaves (\(\mathtt {tll}\) [10], \(\mathtt {atll}\), \(\mathtt {tll}^{lin}\)) and doubly-linked lists (\(\mathtt {dllht}\) [18] defining lists from head to tail, \(\mathtt {dllth}\) from tail to head). Beyond lists and trees, we checked an entailment between doubly-linked 2-grid segments (see Fig. 7) defined forwards \(\mathtt {dlgridr}\) and backwards \(\mathtt {dlgridl}\).Footnote 4

Size of the abstraction. Beside the run times, we report the size of the abstraction computed by Harrsh. More specifically, we report (1) the total number of profiles in the fixed point of \(\mathsf {abstractSID}\) (#P), (2) the total number of context decompositions across all profiles (#D), and (3) the total number of contexts across all decompositions of all profiles (#C). This shows that even though the abstract domain \(\mathbf {Profiles}^{\mathbf {x}}({\varPhi })\) is very large in general, Harrsh typically only needs to explore a small portion of it to decide an entailment.

Results. Table 1 reveals that our decision procedure—being the first implemented decision procedure that is complete for the entire SL fragment \(\text {SL}_{\text {btw}}\)—is not only of theoretical interest, but can also solve challenging entailment problems efficiently in practice. While Slide was faster on some benchmarks that fall into the fragment defined in [11], as well as on some SIDs outside of that fragment, Harrsh was able to solve several benchmarks on which Slide failed. Two benchmarks led to errors: One wrong result and one program crash (the first and the second entries marked by (X) in Table 1, respectively). We are unsure whether the timeouts encountered on the TLL benchmarks are caused by a bug in Slide, as Slide is quite efficient on other TLL variants (see [11, Table 1]). Furthermore, note that Harrsh significantly outperformed Songbird, providing further evidence of the effectiveness of our profile-based abstraction.

7 Conclusion

We presented an alternative proof for decidability of entailment in separation logic with bounded tree width [10]. In contrast to the original proof, we give a direct model theoretic construction. We implemented the resulting decision procedure in the tool Harrsh and obtained promising experimental results. For future work, we plan to extend our approach to the bi-abduction problem.