
1 Introduction

In the last decade, separation logic (SL) [15, 30] has become one of the most popular formalisms for reasoning about programs working with dynamically-allocated memory, including approaches based on deductive verification [32], abstract interpretation [34], symbolic execution [31], or bi-abductive analysis [6, 12, 18]. The key ingredients of SL used in these approaches include the separating conjunction \(*\), which allows modular reasoning by stating that the program heap can be decomposed into disjoint parts satisfying operands of the separating conjunction, along with inductive predicates describing shapes of data structures, such as lists, trees, or their various combinations.

The high expressive power of SL comes at the price of high complexity and even undecidability when several of its features are combined. The existing decision procedures are usually limited to the so-called symbolic heap fragment that disallows any boolean structure of spatial assertions.

In this paper, we present a novel decision procedure for a fragment of SL that we call boolean separation logic (BSL). The fragment allows arbitrary nesting of separating conjunctions and boolean connectives of conjunction, disjunction, and a limited form of negation of the form \(\varphi \wedge \lnot \psi \) called guarded negation. To the best of our knowledge, no existing, practically applicable decision procedure supports a fragment with such a rich boolean structure and at least basic inductive predicates. The decision procedure for SL in cvc5 [29] supports arbitrary nesting of boolean connectives (including even unguarded negation, which is considered very expensive in the context of SL) but no inductive predicates. Support for conjunctions and disjunctions under separating conjunctions is available in the backend solver of the GRASShopper verifier [27, 28], though it is not described in the papers. In our experimental evaluation, we outperform both of these approaches on some benchmarks (and can decide some formulae beyond the capabilities of both of them). We further show that adding guarded negations to BSL makes its satisfiability problem \(\textsf{PSPACE}\)-hard.

To motivate the usefulness of the fragment we consider, we now give several examples where SL formulae with a rich boolean structure are useful. First, in symbolic execution of heap-manipulating programs, one usually needs to consider functions that involve some non-determinism. Typically, at least the malloc statement has the non-deterministic contract \(\{\textsf{emp}\} \;\texttt {x = malloc()}\; \{x \mapsto f \vee (x = \textsf{nil}\wedge \textsf{emp})\}\) (where f is a fresh variable) stating that when the statement is started in the empty heap, then once it finishes, either x is allocated, or the allocation has failed and the heap is empty. Such contracts typically need a dedicated (and usually incomplete) treatment when no support of disjunctions is available.\(^{1}\) Further, the guarded negation \(\varphi \wedge \lnot \psi \) semantically represents the set of counterexamples of the entailment \(\varphi \,\models \,\psi \), and hence allows one to reduce entailment queries to UNSAT checking. Guarded negation can also be used when one needs to obtain several models of a formula \(\varphi \) by joining formulae representing the already obtained models to \(\varphi \) using guarded negations. One can also use the guarded negation to express interesting properties such as the fact that given a list \(\textsf{sls}(x,y)\) and a pointer \(y \mapsto z\), the pointer does not point back somewhere into the list closing a lasso. This can be expressed through the formula \(\bigl (\textsf{sls}(x,y) \wedge \lnot \bigl (\textsf{sls}(x, z) *\textsf{sls}(z, y) \bigr )\bigr ) *y \mapsto z\). Finally, boolean connectives can be introduced by translating quantitative separation logic into the classical SL [2].

In this work, we consider BSL with three fixed, built-in inductive predicates representing the most common variants of lists: singly-linked (SLL), doubly-linked (DLL), and nested singly-linked (NLL). Our results can be easily extended to their variations such as nested doubly-linked lists of singly-linked lists and the like, but at the price of manually defining their semantics in the SMT encoding. We do, however, believe that our approach of bounding the sizes of models and instantiations of the individual predicates can be lifted to more complex inductive definitions and can serve as a starting point for allowing integration of SL with inductive definitions into SMT.

Contributions. Our approach to deciding BSL formulae is inspired by previous works on translation of SL to SMT. The early works [27] and [28] translate SL to intermediate theories first. Our approach is closer to the more recent approach of [16], which builds on small-model properties and axiomatizes reachability through pointer links directly. We extend the SL fragment considered in [16] by going beyond the so-called unique footprint property (under which it is much easier to obtain an efficient translation). Further, we define a more precise way to obtain global bounds on models of entire formulae, and, most importantly, we modify the translation of inductive predicates in a way that allows us to encode them succinctly by computing local bounds on their instantiations. According to our experiments, this makes the decision procedure efficient and competitive with the state-of-the-art approaches on the symbolic heap fragment (despite the increased decisive power). The claims we make in this paper are proven in [9].

Related work. In [3], a proof system for deciding entailments of symbolic heaps with lists was proposed. This problem was later shown to be solvable in polynomial time in [8] via graph homomorphism checking. A superposition-based calculus for the fragment was presented in [23], and a model-based approach enhancing SMT solvers was proposed in [24]. In [24], a combination of SL with SMT theories is considered but still limited to the symbolic heap fragment. A more expressive boolean structure and integration with SMT theories was developed in [27] for lists and extended for trees in [28] but still without a support for guarded negations.

Other decision procedures focus on more general, user-defined inductive predicates (usually of some restricted form). They are based, e.g., on cyclic proof systems (Cyclist [5], S2S [19, 20]); lemma synthesis (Songbird [33]); or automata: tree automata are used in the tools Slide [13] and Spen [11], and a specialised type of automata, called heap automata, is used in Harrsh [17]. These procedures do not, however, support nested use of boolean connectives and separating conjunctions.

There also exist works on deciding much more expressive fragments of SL such as [10, 14, 21, 26] but they do not lead to practically implementable decision procedures.

2 Preliminaries

Partial functions. We write \(f : X \rightharpoonup Y\) to denote a partial function from X to Y. For a partial function f, \(\textsf{dom}(f)\) and \(\textsf{img}(f)\) denote its domain and image, respectively; \(|f| = |\textsf{dom}(f)|\) denotes its size, and \(f(x) = \bot \) denotes that f is undefined for x. A restriction \(f|_A\) of f to \(A \subseteq X\) is defined as f(x) for \(x \in A\) and undefined otherwise. To represent a finite partial function f, we often use the set notation \(f = \{x_1 \mapsto y_1, \ldots , x_n \mapsto y_n\}\) meaning that f maps each \(x_i\) to \(y_i\), and is undefined for other values. We call partial functions \(f_1\) and \(f_2\) disjoint if \(\textsf{dom}(f_1) \cap \textsf{dom}(f_2) = \emptyset \) and define their disjoint union \(f_1 \uplus f_2\) as \(f_1 \cup f_2\), which is otherwise undefined.
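The notions above map directly onto finite dictionaries. The following is a minimal sketch, assuming that a finite partial function is represented as a Python `dict`; the helper names `restrict` and `disjoint_union` are hypothetical and chosen only for illustration.

```python
def restrict(f, A):
    """Restriction f|_A: keep only the entries whose argument lies in A."""
    return {x: y for x, y in f.items() if x in A}


def disjoint_union(f1, f2):
    """Return f1 ⊎ f2, or None when the domains overlap (union undefined)."""
    if f1.keys() & f2.keys():
        return None
    return {**f1, **f2}


f = {"a": 1, "b": 2}
g = {"c": 3}
print(restrict(f, {"a", "z"}))        # {'a': 1}
print(disjoint_union(f, g))           # {'a': 1, 'b': 2, 'c': 3}
print(disjoint_union(f, {"a": 9}))    # None
```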

Graphs and paths. Let \(G = (V, \xrightarrow {}_1, \ldots , \xrightarrow {}_m)\) be a directed graph with vertices V and edges \(\xrightarrow {}= \xrightarrow {}_1 \cup \cdots \cup \xrightarrow {}_m\). For \(1 \le \textsf{f}\le m\), a sequence \(\sigma = \langle v_0, v_1, \ldots , v_n \rangle \in V^{+}\) is a path from \(v_0\) to \(v_n\) via \(\xrightarrow {}_\textsf{f}\) in G, denoted as \(\sigma : v_0 {\;\leadsto }_{\textsf{f}}\; v_n\), if all elements of \(\sigma \) are distinct, and for all \(0 \le i < n\), it holds that \(v_i \xrightarrow {}_\textsf{f}v_{i+1}\). By the definition, paths cannot be cyclic. The domain of the path \(\sigma \) is the set \(\textsf{dom}(\sigma ) = \{v_0, v_1, ..., v_{n-1}\}\), and the length of the path is defined as \(|\sigma | = |\textsf{dom}(\sigma )| = n\).
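When the edge relation of a field is a partial function (as it is for heaps later on), a path can be found by simply following edges. The sketch below assumes that the edges of one field are given as a `dict`; the function name `find_path` is hypothetical.

```python
def find_path(edge, start, end):
    """Return the path <v0, ..., vn> from start to end, or None if none exists.

    `edge` maps each vertex to its unique successor, so the walk is
    deterministic; revisiting a vertex means that `end` is unreachable.
    """
    path = [start]
    seen = {start}
    current = start
    while current != end:
        nxt = edge.get(current)
        if nxt is None or nxt in seen:
            return None
        path.append(nxt)
        seen.add(nxt)
        current = nxt
    return path


edge = {1: 2, 2: 3, 3: 1}
sigma = find_path(edge, 1, 3)
print(sigma)             # [1, 2, 3]
print(set(sigma[:-1]))   # domain of the path; its size is the path length
print(find_path(edge, 1, 4))   # None
```

Note that the last vertex does not belong to the domain, so the path from a vertex to itself has length zero.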

Formulae. For a first-order formula \(\varphi \), we denote by \(\varphi [t / x]\) the formula obtained by simultaneously replacing all free occurrences of the variable x in \(\varphi \) with the term t. For a first-order model \(\mathcal {M}\) and a term t, we write \(t^{\mathcal {M}}\) to denote the evaluation of t in \(\mathcal {M}\) defined as usual.

3 Separation Logic

Syntax. Let \(\textsf{Vars}\) be a countably infinite set of sorted variables. We denote by \(x^S\) a variable x of a sort \(S \in \textsf{Sort}= \{\mathbb {S}, \mathbb {D}, \mathbb {N}\}\) representing a location in an SLL, DLL, or NLL, respectively. We omit the sorts when they are not relevant or clear from the context. We further assume that there exists a distinguished, unsorted variable \(\textsf{nil}\). We write \(\textsf{vars}(\varphi )\) to denote the set of all variables in \(\varphi \) plus \(\textsf{nil}\) (even when it does not appear in \(\varphi \)). Analogously, \(\textsf{vars}_S(\varphi )\) stands for all variables of the sort S plus \(\textsf{nil}\).

The syntax of our fragment is given by the following grammar:

figure c

The points-to predicate \(x \mapsto \langle \textsf{f}_1 : f_1, \ldots , \textsf{f}_n: f_n\rangle \) denotes that x is a structure whose fields \(\textsf{f}_i\) point to values \(f_i\). We often write \(x \mapsto n\) instead of \(x \mapsto \langle \textsf{n}\!: n \rangle \) and \(x \mapsto \underline{}\) if the right-hand side is not relevant. We call x the root of the points-to predicate. If \(\pi \) is an inductive predicate \(\textsf{sls}(x,y)\), \(\textsf{dls}(x,y,x',y')\), or \(\textsf{nls}(x,y,z)\), we again call x the root of \(\pi \), y is the sink of \(\pi \), and we write \(\pi (x, y)\) to denote the root and the sink. We define the sort of the predicate \(\pi \), denoted as \(S_\pi \), as the sort of its root. Then, there is a one-to-one correspondence of predicates and sorts, which we often implicitly use.

Memory model. Let \(\textsf{Loc}\) be a countably infinite set of memory locations, and let \(\textsf{Field}= \{ \textsf{n}, \textsf{p}, \textsf{t}\}\) be the set of fields. A stack is a finite partial function \(s: \textsf{Vars}\rightharpoonup \textsf{Loc}\). A heap is a finite partial function \(h : \textsf{Loc}\rightharpoonup (\textsf{Field}\rightharpoonup \textsf{Loc})\). For succinctness, we write \(h(\ell , \textsf{f})\) instead of \(h(\ell )(\textsf{f})\). To represent heap elements in a readable way, we write functions \(\textsf{Field}\rightharpoonup \textsf{Loc}\) as vectors with labels, i.e., \(h(\ell ) = \langle \textsf{f}: h(\ell , \textsf{f}) \;|\; \textsf{f}\in \textsf{Field}\;\wedge \; h(\ell , \textsf{f}) \ne \bot \rangle \) and we write \(\textsf{img}(h)\) for \(\{\ell \in \textsf{Loc}\;|\; \exists \ell ', \textsf{f}.\; h(\ell ', \textsf{f}) = \ell \}\). Moreover, we use \(h(\ell ) = n\) when \(h(\ell ) = \langle \textsf{n}: n \rangle \). A stack-heap model is a pair (s, h) where s is a stack and h is a heap such that \(s(\textsf{nil}) \ne \bot \) and \(h(s(\textsf{nil})) = \bot \). We define the set of locations of the model (s, h) as \(\textsf{locs}{(s,h)} = \textsf{img}(s) \cup \textsf{dom}(h) \cup \textsf{img}(h)\).
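As a concrete illustration of the memory model, the sketch below represents a stack as a `dict` from variable names to integer locations and a heap as a `dict` from locations to field records. The integer encoding of locations and the function names are assumptions made only for this example.

```python
def img(heap):
    """img(h): all locations that some allocated cell points to."""
    return {target for fields in heap.values() for target in fields.values()}


def locs(stack, heap):
    """locs(s, h) = img(s) ∪ dom(h) ∪ img(h)."""
    return set(stack.values()) | set(heap) | img(heap)


def is_stack_heap_model(stack, heap):
    """nil must be defined in the stack and must not be allocated."""
    return "nil" in stack and stack["nil"] not in heap


stack = {"nil": 0, "x": 1, "y": 2}
heap = {1: {"n": 2}, 2: {"n": 3}}        # x ↦ y  and  y ↦ (location 3)
print(is_stack_heap_model(stack, heap))  # True
print(sorted(locs(stack, heap)))         # [0, 1, 2, 3]
```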

Fig. 1. The semantics of the separation logic. The existential quantifier is used for the definition of the semantics of inductive predicates and it is not a part of our fragment.

Semantics. The semantics of our SL over stack-heap models is given in Fig. 1. For pure formulae, we use the so-called precise semantics, which additionally requires that the heap must be empty.\(^{2}\) The semantics of pointer assertions, boolean connectives, and separating conjunctions is as usual. The intuition behind the semantics of the inductive predicates is as follows. An SLL segment \(\textsf{sls}(x,y)\) is either empty or represents an acyclic sequence of allocated locations starting from x and leading via the \(\textsf{n}\) field to y, which is not allocated. A DLL segment \(\textsf{dls}(x, y, x', y')\) is either empty with \(x = y\) and \(x' = y'\), or it represents an acyclic sequence that is doubly-linked via the \(\textsf{n}\) and \(\textsf{p}\) fields and leads from the first allocated location x of the segment to its last allocated location \(x'\) (x and \(x'\) may coincide) with y/\(y'\) being the \(\textsf{n}\)/\(\textsf{p}\)-successors of \(x'\)/x, respectively. Neither y nor \(y'\) is allocated. An NLL segment \(\textsf{nls}(x, y, z)\) is a (possibly empty) acyclic sequence of locations starting from x and leading to y via the \(\textsf{t}\) (top) field in which the \(\textsf{n}\)-successor of each location starts a disjoint inner SLL leading to z via \(\textsf{n}\).
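The intuition for \(\textsf{sls}(x,y)\) can be made concrete by a small model checker. This is only a sketch based on the informal description above (Fig. 1 remains the authoritative definition); it assumes integer locations and dictionary-based stacks and heaps, and the name `sat_sls` is hypothetical.

```python
def sat_sls(stack, heap, x, y):
    """Check (s, h) |= sls(x, y) following the informal description.

    The heap must consist exactly of an acyclic n-path from s(x) to s(y);
    under the precise semantics, the empty segment requires the empty heap.
    """
    current, target = stack[x], stack[y]
    visited = set()
    while current != target:
        # Each cell on the path must be allocated once and carry only field n.
        if current in visited or current not in heap:
            return False
        if set(heap[current]) != {"n"}:
            return False
        visited.add(current)
        current = heap[current]["n"]
    # No allocated location may lie outside the path.
    return visited == set(heap)


stack = {"nil": 0, "x": 1, "y": 3}
print(sat_sls(stack, {1: {"n": 2}, 2: {"n": 3}}, "x", "y"))   # True
print(sat_sls(stack, {1: {"n": 2}, 2: {"n": 1}}, "x", "y"))   # False (cycle)
print(sat_sls(stack, {1: {"n": 3}, 5: {"n": 0}}, "x", "y"))   # False (extra cell)
print(sat_sls(stack, {}, "x", "x"))                           # True (empty segment)
```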

Stack-heap graphs. We frequently identify stack-heap models with their graph representation. A stack-heap model (s, h) defines a graph \(G[(s,h)] = (V, (\xrightarrow {}_\textsf{f})_{\textsf{f}\in \textsf{Field}})\) where \(V = \textsf{locs}(s,h)\) and \(u \xrightarrow {}_\textsf{f}v\) iff \(h(u, \textsf{f}) = v\). We frequently use the fact that if there exists a path \(\sigma : x {\;\leadsto }_{\textsf{f}}\; y\) in a stack-heap graph, then it is uniquely determined because \(\textsf{f}\)-edges are given by a partial function.

4 Small-Model Property

Small-model properties, which state that each satisfiable formula has a model of bounded size, are frequently used for various fragments of SL to prove their decidability [7] or to design decision procedures [16, 26, 29]. The latter is also the case of our translation-based decision procedure which will heavily rely on enumeration over all locations, and, for its efficiency, it is therefore necessary to obtain location bounds that are as small as possible.

The way we obtain our small-model property is inspired by the approach of [16] and by insights from the so-called strong-separation logic [26]. The main idea is to define a satisfiability-preserving reduction \(\downarrow ^{s}{\!h}\) which takes a heap h (referenced from a stack s), decomposes it into basic sub-heaps (which we call chunks), and reduces it per the sub-heaps in such a way that its size can be easily bounded by a linear expression. To define the reduction, we first need to introduce some auxiliary notions related to stack-heap models.

We say that a model (s, h) is positive if there exists \(\varphi \) with \((s,h)\,\models \,\varphi \). A positive model (s, h) is atomic if it is non-empty, and for all positive models \((s, h_1)\) and \((s, h_2)\), \(h = h_1 \uplus h_2\) implies that \(h_1 = \emptyset \) or \(h_2 = \emptyset \). In other words, atomic models cannot be decomposed into two non-empty positive models. Several examples of atomic models are shown in Fig. 2. Observe that the models of \(\textsf{dls}\) (Fig. 2b) and \(\textsf{nls}\) (Fig. 2c) are indeed atomic as none of their decompositions, in particular the split at the location u, gives two positive models.

Fig. 2. An illustration of reductions of atomic models of inductive predicates. Removed heap locations are red, removed edges are dotted, and added edges are highlighted.

A sub-heap \(c \subseteq h\) is a chunk of a model (s, h) if c is a maximal sub-heap of h such that (s, c) is an atomic positive model. Notice that the way the definition of chunks is constructed excludes the possibility of using as a chunk a sub-heap of a heap that itself forms an atomic model. The reason is that otherwise the remaining part of the larger atomic model could not be described by the available predicates. For example, in nested lists as shown in Fig. 2c, one cannot take as a chunk a part of some inner list (e.g., the pointer \(u \mapsto z\)) as the heap shown in the figure itself forms an atomic model. Indeed, if \(u \mapsto z\) was removed, one would need a more general version of the NLL predicate to cover the remaining heap by atomic models.

Lemma 1 (Chunk decomposition)

A positive model (s, h) can be uniquely decomposed into the set of its chunks, denoted \(\textsf{chunks}(s,h)\), i.e., \(h=\biguplus \textsf{chunks}(s,h)\).

Minimal atomic models of inductive predicates. The key reason why the small-model property that we are going to state holds is that our fragment of SL cannot distinguish atomic models of the considered predicates beyond certain small sizes—namely, two for \(\textsf{sls}\) and \(\textsf{nls}\), and three for \(\textsf{dls}\). For further use, we will now state predicates describing exactly the sets of the indistinguishable lists of the different kinds.

We start with SLLs and use a disequality to exclude empty lists: \(\textsf{sls}_{\ge 1}(x,y) \triangleq \textsf{sls}(x,y)*x \ne y\), and a guarded negation to exclude lists of length one consisting of a single pointer only: \(\textsf{sls}_{\ge 2}(x,y) \triangleq \textsf{sls}_{\ge 1}(x,y) \wedge \lnot (x \mapsto y)\). A similar predicate can be defined for NLLs too: \(\textsf{nls}_{\ge 2}(x,y,z) \triangleq \bigl (\textsf{nls}(x,y,z) *x \ne y\bigr ) \wedge \lnot (x \mapsto \langle \textsf{n}\!:z, \textsf{t}\!:y \rangle )\).

For DLLs, we define \(\textsf{dls}_{\ge 2}(x, y, x', y') \triangleq \textsf{dls}(x,y,x',y')*x \ne y *x \ne x'\) to exclude models that are either empty or consist of a single pointer; and \(\textsf{dls}_{\ge 3}(x, y, x', y') \triangleq \textsf{dls}_{\ge 2}(x, y, x', y') \wedge \lnot (x \mapsto \langle \textsf{n}\!:x', \textsf{p}\!:y' \rangle *x' \mapsto \langle \textsf{n}\!:y, \textsf{p}\!:x \rangle )\) to also exclude models consisting of exactly two pointers.

It holds that atomic models, and consequently also chunks, are precisely either models of single pointers or of the above predicates.

Lemma 2

For an atomic model (s, h), exactly one of the following conditions holds.

  1. \((s,h)\,\models \,x \mapsto \underline{}\;\) for some x. (pointer-atom)

  2. \((s,h)\,\models \,\textsf{sls}_{\ge 2}(x,y)\) for some x and y. (\(\textsf{sls}\)-atom)

  3. \((s,h)\,\models \,\textsf{dls}_{\ge 3}(x, y, x', y')\) for some x, y, \(x'\), and \(y'\). (\(\textsf{dls}\)-atom)

  4. \((s,h)\,\models \,\textsf{nls}_{\ge 2}(x, y, z)\) for some x, y, and z. (\(\textsf{nls}\)-atom)

We can now define the reduction in the way we have already sketched.

Definition 1

The heap of a positive model (s, h) reduces to \(\downarrow ^{s}{\!h}= \biguplus _{c \in \textsf{chunks}(s,h)} \downarrow ^{s}{\!c}\) where the reduction of a chunk c with a root x is defined as follows:

  • \(\downarrow ^{s}{\!c} = c\) if \((s,c)\,\models \,x \mapsto \underline{}\;\).

  • \(\downarrow ^{s}{\!c} = \{s(x) \mapsto \ell , \ell \mapsto s(y) \}\) where \(\ell = c(s(x),\textsf{n})\) if \((s,c)\,\models \,\textsf{sls}_{\ge 2}(x, y)\) for some y.

  • \(\downarrow ^{s}{\!c} = \{ s(x) \mapsto \langle \textsf{n}\!:\!\ell ,\) \(\textsf{p}\!:\!s(y') \rangle , \ell \mapsto \langle \textsf{n}\!:\!s(x'), \textsf{p}\!:\!s(x) \rangle , s(x') \mapsto \langle \textsf{n}\!:\!s(y), \textsf{p}\!:\! \ell \rangle \}\) where \(\ell = c(s(x),\textsf{n})\) if \((s,c)\,\models \,\textsf{dls}_{\ge 3}(x, y, x', y')\) for some \(x'\), \(y'\) and y.

  • \(\downarrow ^{s}{\!c} = \{s(x) \mapsto \langle \textsf{t}: \ell , \textsf{n}: s(z) \rangle , \ell \mapsto \langle \textsf{t}: s(y), \textsf{n}: s(z)\rangle \}\) where \(\ell = c(s(x),\textsf{t})\) if \((s,c)\,\models \,\textsf{nls}_{\ge 2}(x, y, z)\) for some y and z.
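To illustrate the reduction, the following sketch implements only the \(\textsf{sls}\)-atom case of the definition on a dictionary-based heap. The integer locations and the name `reduce_sls_chunk` are assumptions of this example; the function does not check that its input really is an \(\textsf{sls}\)-atom.

```python
def reduce_sls_chunk(stack, chunk, x, y):
    """Reduce a chunk satisfying sls>=2(x, y) to exactly two cells.

    The first cell keeps its original successor l = c(s(x), n), and l is
    redirected straight to the sink s(y); all other cells are dropped.
    """
    root, sink = stack[x], stack[y]
    ell = chunk[root]["n"]
    return {root: {"n": ell}, ell: {"n": sink}}


stack = {"nil": 0, "x": 1, "y": 5}
chunk = {1: {"n": 2}, 2: {"n": 3}, 3: {"n": 4}, 4: {"n": 5}}   # list of length 4
print(reduce_sls_chunk(stack, chunk, "x", "y"))
# {1: {'n': 2}, 2: {'n': 5}}
```

A list of length exactly two is left unchanged by this case, which matches the observation that lists of length two or more are indistinguishable in the fragment.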

We lift the reduction to stack-heap models as \(\downarrow ^{X}\!(s,h) = (s', \downarrow ^{s'}{\!h})\) where \(s' = s|_X\) for some set of variables X and show that it preserves satisfiability when \(X = \textsf{vars}(\varphi )\).

Theorem 1

For a positive model (s, h), it holds that \((s,h)\,\models \,\varphi \) iff \(\;\downarrow ^{\textsf{vars}(\varphi )}\!(s,h)\,\models \,\varphi \).

The final step to show our small-model property is to find an upper bound on the size of the reduced models. We define the size of a variable \(x^S\), \(||x^S ||\), which represents its contribution to the location bound, and is defined as 2 if \(S \in \{\mathbb {S}, \mathbb {N}\}\) and 1.5 if \(S = \mathbb {D}\) (this corresponds to the size of a reduced chunk of sort S divided by the number of variables which are allocated in it). We further define \(||\textsf{nil} || = 0\). The location bound of \(\varphi \) is then given as \(\textsf{bound}(\varphi ) = 1 + \lfloor \sum _{x \in \textsf{vars}(\varphi )} ||x || \rfloor \) (the additional location is for \(\textsf{nil}\)). Analogously, the location bound for a sort S is \(\textsf{bound}_S(\varphi ) = \lfloor \sum _{x \in \textsf{vars}_S(\varphi )} ||x || \rfloor \).
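The bounds are simple arithmetic over the sorted variables of the formula. A minimal sketch follows, assuming that the variables of \(\varphi \) other than \(\textsf{nil}\) are given as a dictionary from names to sort tags `"S"`, `"D"`, `"N"` (a hypothetical encoding); \(\textsf{nil}\) is omitted because its size is 0.

```python
import math

# Size ||x^S|| of a variable by sort, as defined in the text.
SIZE = {"S": 2.0, "D": 1.5, "N": 2.0}


def bound(var_sorts):
    """bound(phi) = 1 + floor(sum of sizes); the extra location is for nil."""
    return 1 + math.floor(sum(SIZE[s] for s in var_sorts.values()))


def bound_sort(var_sorts, sort):
    """bound_S(phi): floor of the summed sizes of the variables of sort S."""
    return math.floor(sum(SIZE[s] for s in var_sorts.values() if s == sort))


# Two SLL variables and three DLL variables: 2 + 2 + 1.5 + 1.5 + 1.5 = 8.5.
var_sorts = {"x": "S", "y": "S", "a": "D", "b": "D", "c": "D"}
print(bound(var_sorts))             # 9
print(bound_sort(var_sorts, "S"))   # 4
print(bound_sort(var_sorts, "D"))   # 4
print(bound_sort(var_sorts, "N"))   # 0
```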

Theorem 2 (Small-model property)

If a formula \(\varphi \) is satisfiable, then there exists a model \((s, h)\,\models \,\varphi \) such that \(|\textsf{locs}(s,h)| \le \textsf{bound}(\varphi )\).

We conjecture that the bound can be further improved, e.g., by showing that each model can be transformed to an equivalent one (indistinguishable by BSL formulae) such that the number of its chunks is bounded by the number of roots of spatial predicates in \(\varphi \). We demonstrate this on the formula \(\textsf{sls}(x, y) *y \mapsto z\) and its model in which y points back into the middle of the list segment (thus splitting it into two chunks). Clearly, this model can be transformed by redirecting z outside of the list domain.

5 Translation-Based Decision Procedure

In this section, we present our translation of SL to SMT. We first present an SMT encoding of our memory model and a translation of basic predicates and boolean connectives. Then we discuss methods for efficient translation of separating conjunctions and inductive predicates with the focus on avoiding quantifiers by replacing them by small enumerations of their instantiations.

We fix an input formula \(\varphi \) and let \(n_S = \textsf{bound}_S(\varphi )\) for each sort \(S \in \textsf{Sort}\).

5.1 Encoding the Memory Model in SMT

To encode the heap, we use a classical approach which encodes its mapping and domain separately [16, 27, 29]. Namely, we use arrays to encode mappings and sets to encode domains. We also use the theory of datatypes to represent a finite sort of locations by a datatype \(\textsf{L}\triangleq \; \textsf{loc}^{\textsf{nil}} \,\,|\,\, \textsf{loc}^{\mathbb {S}}_1 \;|\; \ldots \;|\; \textsf{loc}^{\mathbb {S}}_{n_\mathbb {S}} \,\,|\,\, \textsf{loc}^{\mathbb {D}}_1 \;|\; \ldots \;|\; \textsf{loc}^{\mathbb {D}}_{n_\mathbb {D}} \,\,|\,\, \textsf{loc}^{\mathbb {N}}_1 \;|\; \ldots \;|\; \textsf{loc}^{\mathbb {N}}_{n_\mathbb {N}}. \)

Now, we define the signature of the translation’s language over the sort \(\textsf{L}\). For each \(x~\in ~\textsf{vars}(\varphi )\), we introduce a constant x of the same name; its interpretation represents the stack image s(x). To represent the heap, we introduce a set symbol D representing the domain and an array symbol \(h_\textsf{f}\) for each field \(\textsf{f}\in \textsf{Field}\) which represents the mapping of the partial function \(\lambda \ell .\; h(\ell , \textsf{f})\). To distinguish sorts of locations, we further introduce a set symbol \(D_S\) for each sort \(S \in \textsf{Sort}\). We define the meaning of these symbols by showing how a stack-heap model can be reconstructed from a first-order model.

Definition 2 (Inverse translation)

Let \(\mathcal {M}\) be a first-order model. We define its inverse translation \(\textsf{T}_{\varphi }^{-1}(\mathcal {M}) = (s,h)\) where s(x) = \(x^\mathcal {M}\) if \(x \in \textsf{vars}(\varphi )\) and

$$\begin{aligned} h(\ell ) &= {\left\{ \begin{array}{ll} \langle \textsf{n}\!: h_\textsf{n}[\ell ]^\mathcal {M} \rangle &{}\text { if }\ell \in (D \cap D_\mathbb {S})^{\mathcal {M}}\\ \langle \textsf{n}\!:h_\textsf{n}[\ell ]^\mathcal {M}, \textsf{p}\!:h_\textsf{p}[\ell ]^\mathcal {M} \rangle &{}\text { if }\ell \in (D \cap D_\mathbb {D})^{\mathcal {M}}\\ \langle \textsf{n}\!:h_\textsf{n}[\ell ]^\mathcal {M}, \textsf{t}\!:h_\textsf{t}[\ell ]^\mathcal {M} \rangle &{}\text { if }\ell \in (D \cap D_\mathbb {N})^{\mathcal {M}}. \end{array}\right. } \end{aligned}$$

To ensure consistency of the translation with the memory model used, we define the following axioms that a result of translation needs to satisfy:

$$\begin{aligned} \mathcal {A}_\varphi \triangleq \textsf{nil}= \textsf{loc}^{\textsf{nil}} \wedge \; \textsf{nil}\not \in D \;\wedge \; \!\!\!\!\bigwedge _{S \in \textsf{Sort}}\!\! \bigl (D_S = \{\textsf{loc}^{\textsf{nil}}, \textsf{loc}^{S}_1, \ldots , \textsf{loc}^{S}_{n_S}\} \;\wedge \; \!\!\!\!\!\!\!\!\!\bigwedge _{x \in \textsf{vars}_S(\varphi )}\!\!\!\!\!\! x \in D_S\bigr ). \end{aligned}$$

The axioms ensure that \(\textsf{nil}\) is never allocated, that each variable is interpreted as a location of the corresponding sort and they fix the interpretation of the sets \(D_\mathbb {S}, D_\mathbb {D}, D_\mathbb {N}\), which we will later use in the translation to assign sorts to locations.

5.2 Translation of SL to SMT

We define the translation as a function \(\textsf{T}(\varphi ) = \mathcal {A}_\varphi \wedge \textsf{T}(\varphi , D)\) where \(\mathcal {A}_\varphi \) are the above defined axioms and \(\textsf{T}(\varphi , D)\) is a recursive translation function of the formula \(\varphi \) with the domain symbol D. The translation \(\textsf{T}(\cdot )\) together with the inverse translation of models \(\textsf{T}^{-1}_\varphi (\cdot )\) are linked by the following correctness theorem.

Theorem 3 (Translation correctness)

An SL formula \(\varphi \) is satisfiable iff its translation \(\textsf{T}(\varphi )\) is satisfiable. Moreover, if \(\mathcal {M}\,\models \,\textsf{T}(\varphi )\), then \(\textsf{T}_{\varphi }^{-1}(\mathcal {M})\,\models \,\varphi \).

The translation of non-inductive predicates and boolean connectives is defined as:

figure e

The translation of boolean connectives follows the boolean structure and propagates the domain symbol F to the operands. The translation of pointer assertions postulates content of memory cells represented by arrays and also requires the domain F to be \(\{x\}\).

Translation of separating conjunctions. The semantics of separating conjunctions involves a quantification over sets (heap domains). The most direct way of translation is to use quantifiers over sets leading to decidable formulae due to the bounded location domain. This approach combined with a counterexample-guided quantifier instantiation is used in the decision procedure for a fragment of SL supported in cvc5 [29]. In some fragments, however, separating conjunctions can be translated in a way that completely avoids quantifiers. An example is the fragment of boolean combinations of symbolic heaps which has the so-called unique footprint property (UFP) [16, 27]: a formula \(\psi \) has a (unique) footprint in a model (s, h) with \((s,h)\,\models \,\psi *\textsf{true}\)\(^{3}\) if there exists a (unique) set F such that \((s, h|{_{F}})\,\models \,\psi \). The UFP-based approaches of [16, 27] axiomatize the footprints during translation and check operands of separating conjunctions just on the sub-heaps induced by their footprints.

However, UFP does not hold for BSL because of disjunctions. As an example, take the formula \(\psi \triangleq x \mapsto y \vee \textsf{emp}\) and the heap \(h = \{x \mapsto y\}\). Both \((s,h|_{\{s(x)\}})\,\models \, \psi \) and \((s,h|_{\emptyset })\,\models \,\psi \) hold. The sets \(\{s(x)\}\) and \(\emptyset \) are, however, the only footprints of \(\psi \) in (s, h), and this observation can be used to generalise the idea of footprints beyond the fragment in which they are unique.
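The counterexample to UFP can be reproduced by brute force: enumerate every subset F of the heap domain and keep those on which \(\psi \) holds. The sketch below hard-codes a checker for \(\psi \triangleq x \mapsto y \vee \textsf{emp}\) only; the integer locations and the names `footprints` and `sat_psi` are assumptions of this example.

```python
from itertools import combinations


def restrict(heap, F):
    """Sub-heap h|_F induced by the set of locations F."""
    return {loc: heap[loc] for loc in F}


def footprints(stack, heap, sat):
    """All sets F ⊆ dom(h) with (s, h|_F) |= psi, smallest sets first."""
    dom = sorted(heap)
    result = []
    for k in range(len(dom) + 1):
        for F in combinations(dom, k):
            if sat(stack, restrict(heap, F)):
                result.append(set(F))
    return result


def sat_psi(stack, heap):
    """psi = (x ↦ y) ∨ emp under the precise semantics."""
    return heap == {} or heap == {stack["x"]: {"n": stack["y"]}}


stack = {"nil": 0, "x": 1, "y": 2}
heap = {1: {"n": 2}}
print(footprints(stack, heap, sat_psi))   # [set(), {1}]
```

The exhaustive enumeration is exponential in the heap size and is shown here only to make the notion of a footprint concrete.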

Instead of axiomatizing the footprints, our translation builds a set of footprint terms for operands of separating conjunctions. This change can be also seen as a simplification of the former translations as it eliminates the need to deal with two kinds of formulae (the actual translation and footprint axioms), which must be treated differently during the translation. However, the precise computation of the set of all footprints of \(\psi \) in (s, h), denoted as \(\textsf{FP}_{(s,h)}(\psi )\), is as hard as satisfiability: when the set of footprints is non-empty, the formula \(\psi \) is satisfiable. Therefore, we compute just an over-approximation denoted as \(\textsf{FP}^{\#}(\psi )\). This is justified by the following lemma which gives an equivalent semantics of the separating conjunction in terms of footprints.

Lemma 3

Let \(\varphi \triangleq \psi _1 *\psi _2\) and let (s, h) be a model. Let \(\mathcal {F}_1\) and \(\mathcal {F}_2\) be sets of locations such that \(\textsf{FP}_{(s,h)}(\psi _i) \subseteq \mathcal {F}_i\). Then \((s,h) \,\models \,\psi _1 *\psi _2\) iff

$$\begin{aligned} \bigvee _{F_1 \in \mathcal {F}_1} \; \bigvee _{F_2 \in \mathcal {F}_2} \; \bigwedge _{i = 1, 2} (s, h|_{F_i})\,\models \,\psi _i \;\wedge \; F_1 \cap F_2 = \emptyset \;\wedge \; F_1 \cup F_2 = \textsf{dom}(h). \end{aligned}$$

Intuitively, to check whether a separating conjunction holds in a model, it is not necessary to check all possible splits of the heap, but only the splits induced by (possibly over-approximated) footprints of its operands. The lemma is therefore a generalisation of UFP and leads to the following definition of the translation \(\textsf{T}(\psi _1 *\psi _2, F)\):

$$\begin{aligned} \exists F_1 \in \mathcal {F}_1.\; \exists F_2 \in \mathcal {F}_2.\;\;\; \textsf{T}(\psi _1, F_1) \wedge \textsf{T}(\psi _2, F_2) \wedge F_1 \cap F_2 = \emptyset \wedge F = F_1 \cup F_2. \end{aligned}$$

Here, we use a quantifier expression of the form \(\exists x \in X.\; \psi \) as a placeholder that helps us to define two methods which the translation can use for separating conjunctions:

  • The method \(\texttt{SatEnum}\) computes sets of footprints \(\mathcal {F}_i\) as \(\textsf{FP}^{\#}(\psi _i)\) (the computation is described below) and replaces expressions \(\exists x \in X.\; \psi \) with \(\bigvee _{x' \in X} \psi [x'/x]\) as in Lemma 3. This strategy is quite efficient in many practical cases when we can compute small sets of footprints \(\mathcal {F}_1\) and \(\mathcal {F}_2\).

  • The method \(\texttt{SatQuantif}\) does not compute sets \(\mathcal {F}_i\) at all and replaces \(\exists x \in X.\; \psi \) simply with \(\exists x.\; \psi \). This strategy is better when the existential quantifier can be later eliminated by Skolemization or when the set of footprints would be too large.

We now show how to compute the set of footprint terms \(\textsf{FP}^{\#}(\psi )\). We again postpone inductive predicates to Section 5.3. We just note that their footprints are unique. The cases of pure formulae and pointer assertions follow directly from the definition of their semantics, which requires the heap to be empty and a single pointer, respectively.

$$\begin{aligned} \textsf{FP}^{\#}(x \bowtie y) = \{\emptyset \} \;\text { for } \bowtie \; \in \{=, \ne \} {} & {} \textsf{FP}^{\#}(x \mapsto \underline{}) = \{\{x\}\} \end{aligned}$$

For the boolean conjunction, we can select from the footprint sets of its operands the one with the smaller cardinality. Since negations have many footprints (consider, e.g., \(\lnot \textsf{emp}\)), we define the case of the guarded negation by taking footprints of its guard. The disjunction is the only case which brings non-uniqueness as we need to consider footprints of both of its operands.

figure f

Finally, we define footprints of the separating conjunction by taking the union \(F_1 \cup F_2\) for each pair \((F_1, F_2)\) of footprints of its operands. Notice that here \(F_1 \cup F_2\) represents an SMT term, therefore we cannot replace it with a disjoint union which is not available in the classical set theories in SMT. We can, however, use heuristics and filter out terms for which we can statically determine that interpretations of \(F_1\) and \(F_2\) are not disjoint.

$$\begin{aligned} \textsf{FP}^{\#}(\psi _1 *\psi _2) = \{F_1 \cup F_2 \;|\; F_1 \in \textsf{FP}^{\#}(\psi _1) \text { and } F_2 \in \textsf{FP}^{\#}(\psi _2)\} \end{aligned}$$

We state the correctness of the footprint computation in the following lemma.

Lemma 4

Let \(\mathcal {M}\) be a first-order model with \(\mathcal {M}\,\models \,\textsf{T}(\varphi )\) and let \((s,h) = \textsf{T}_{\varphi }^{-1}(\mathcal {M})\). Then we have \(\textsf{FP}_{(s,h)}(\varphi ) \subseteq \{F^\mathcal {M}\;|\; F \in \textsf{FP}^{\#}(\varphi )\}\).
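The footprint rules of this section can be read as a recursive function over the structure of a formula. The following Python sketch is only illustrative: the tuple-based formula encoding, the string representation of footprint terms, and the names `footprints` and `union_term` are assumptions of the sketch and are not taken from Astral. Inductive predicates are left out, and the static disjointness filter is not modelled.

```python
from itertools import product

# Hypothetical formula encoding (not Astral's AST):
#   ("pure", x, op, y)   x = y or x != y
#   ("pto", x)           x |-> _
#   ("and", p, q)        boolean conjunction
#   ("gneg", p, q)       guarded negation, p and not q
#   ("or", p, q)         disjunction
#   ("star", p, q)       separating conjunction

def union_term(f1, f2):
    """Build the (symbolic) SMT term for the union of two footprint terms."""
    return f"union({f1}, {f2})"

def footprints(phi):
    """Return the set of footprint terms of phi, following the rules above."""
    kind = phi[0]
    if kind == "pure":
        return {"empty"}
    if kind == "pto":
        return {"{" + phi[1] + "}"}
    if kind == "and":
        left, right = footprints(phi[1]), footprints(phi[2])
        # Keep the smaller of the two footprint sets.
        return left if len(left) <= len(right) else right
    if kind == "gneg":
        # Only the guard contributes footprints.
        return footprints(phi[1])
    if kind == "or":
        return footprints(phi[1]) | footprints(phi[2])
    if kind == "star":
        pairs = product(footprints(phi[1]), footprints(phi[2]))
        return {union_term(f1, f2) for f1, f2 in pairs}
    raise ValueError(f"unsupported formula: {phi!r}")

example = ("star",
           ("or", ("pto", "x"), ("pto", "y")),
           ("gneg", ("pto", "z"), ("pto", "w")))
```

For `example`, the disjunction contributes two footprint terms and the guarded negation one, so the separating conjunction has two footprint terms.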

5.3 Translation of Inductive Predicates

To translate inductive predicates, we express them in terms of reachability and paths in the heaps. While unbounded reachability cannot be expressed in first-order logic, we can efficiently express bounded linear reachability in our encoding. The linearity means that each path uses only a single field (which is not the case, e.g., for paths in trees). All predicates in this section are parametrised with an interval \([m, n]\) which bounds the length of the considered paths. When we do not state the bounds explicitly, we assume conservative bounds \([0, \textsf{bound}_S(\varphi )]\) for a path starting from a root of a sort S. We show how to compute more precise bounds in Section 6. We start with the translation of reachability:

$$\begin{aligned} \textsf{reach}^{=n}(h, x, y) \triangleq h^n[x] = y {} & {} \textsf{reach}^{[m,n]}(h, x,y) \triangleq \bigvee \!_{m \le i \le n} \textsf{reach}^{=i}(h, x, y) \end{aligned}$$

Here, the predicate \(\textsf{reach}^{=n}(h, x, y)\) expresses that x can reach y via a field represented by the array h in exactly n steps. Similarly, \(\textsf{reach}^{[m,n]}\) expresses reachability in m to n steps. Besides reachability, we will need a macro \(\textsf{path}_C(h, x, y)\) expressing the domain of a path from x to y, or the empty set if such a path does not exist:

$$\begin{aligned} \textsf{path}^{=n}_C(h, x,y) \triangleq \;& \bigcup \;_{0\le i < n} \; C(h^i[x])\\ \textsf{path}^{[m,n]}_C(h, x,y) \triangleq \;& \textsf{if} \;(\textsf{reach}^{=m}(h, x, y))\; \textsf{then} \;(\textsf{path}^{=m}_C(h, x, y))\\ & \cdots \; \mathsf {else\;if} \;(\textsf{reach}^{=n}(h, x, y))\; \textsf{then} \;(\textsf{path}^{=n}_C(h, x, y)) \;\textsf{else} \;(\emptyset ) \end{aligned}$$

The additional parameter C is a function applied to each element of the path that can be used to define nested paths. We define a simple path \(\textsf{path}^{[m,n]}_S(h, x, y) \triangleq \textsf{path}^{[m,n]}_C(h, x, y)\) with \(C \triangleq \lambda \ell .\; \{\ell \}\) and a nested path as \(\textsf{path}^{[m,n]}_N(h_1, h_2, x, y, z) \triangleq \textsf{path}^{[m,n]}_C(h_1, x, y)\) with \(C \triangleq \lambda \ell .\; \textsf{path}_S(h_2, \ell , z)\). In the case of the nested path, the array \(h_1\) represents the top-level path from x to y, and \(h_2\) represents nested paths terminating in the common location z. Now we can define footprints of inductive predicates using \(\textsf{path}\) terms as follows:

$$\begin{aligned} \textsf{FP}^{\#}(\pi (x,y)) &= \{ \textsf{path}_S(h_\textsf{n}, x, y) \} {} & {} \text {for }\pi \in \{\textsf{sls}, \textsf{dls}\}\\ \textsf{FP}^{\#}(\textsf{nls}(x,y,z)) &= \{ \textsf{path}_N(h_\textsf{t}, h_\textsf{n}, x, y, z) \} \end{aligned}$$
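To make the \(\textsf{reach}\) and \(\textsf{path}\) macros concrete, the following Python sketch evaluates them on explicit heaps instead of building SMT terms. It is an illustration under assumptions: arrays are modelled as Python dictionaries (with `None` standing for a read outside the dictionary), the nested path reuses the outer bounds for the inner paths, and all function names are hypothetical.

```python
def iterate(h, x, n):
    """Return h^n[x], or None if the chain leaves the dictionary h."""
    for _ in range(n):
        if x not in h:
            return None
        x = h[x]
    return x

def reach_eq(h, x, y, n):
    """reach^{=n}(h, x, y): x reaches y in exactly n steps."""
    return iterate(h, x, n) == y

def reach(h, x, y, m, n):
    """reach^{[m,n]}(h, x, y): x reaches y in m to n steps."""
    return any(reach_eq(h, x, y, i) for i in range(m, n + 1))

def path_eq(h, x, n, C):
    """path^{=n}_C: union of C(h^i[x]) for 0 <= i < n."""
    result = set()
    for i in range(n):
        result |= C(iterate(h, x, i))
    return result

def path(h, x, y, m, n, C):
    """path^{[m,n]}_C: the if-then-else chain over the lengths m..n."""
    for i in range(m, n + 1):
        if reach_eq(h, x, y, i):
            return path_eq(h, x, i, C)
    return set()

def path_simple(h, x, y, m, n):
    """path_S: every location of the path contributes itself."""
    return path(h, x, y, m, n, lambda loc: {loc})

def path_nested(h1, h2, x, y, z, m, n):
    """path_N: every top-level location contributes its inner path to z."""
    return path(h1, x, y, m, n, lambda loc: path_simple(h2, loc, z, m, n))

h_n = {1: 2, 2: 3}                    # list 1 -> 2 -> 3
h_top = {10: 11, 11: 12}              # top-level list 10 -> 11 -> 12
h_inner = {10: 20, 20: 0, 11: 0}      # inner lists ending in location 0
```

Here `path_simple(h_n, 1, 3, 0, 3)` collects the locations 1 and 2, and `path_nested(h_top, h_inner, 10, 12, 0, 0, 3)` collects the top-level locations 10 and 11 together with the inner location 20.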

The common part of the translation \(\textsf{T}(\pi (x,y), F)\) postulates the existence of a top-level path from x to y and a domain F based on this path (formalised in the formula \(\mathsf {main\_path}\) below); and ensures that all locations have the correct sort (through the formula \(\textsf{typing}\)). For DLLs, we add an invariant which ensures that its locations are correctly doubly-linked (the \(\mathsf {back\_links}\) formula), and we further need a special treatment of the cases when the list is empty as well as a special treatment for its roots and sinks (cf. the formula \(\textsf{boundaries}\)). For NLLs, we add an invariant stating that an inner list starts from each location in its top-level path (the \(\mathsf {inner\_lists}\) formula) and that those inner paths are disjoint (the \(\textsf{disjoint}\) formula; see Footnote 4).

  • \(\textsf{T}(\textsf{sls}(x,y), F) \;\triangleq \; \mathsf {main\_path} \wedge \textsf{typing}\) where

    $$\begin{aligned} \mathsf {main\_path} \triangleq \textsf{reach}(h_\textsf{n}, x, y) \wedge F = \textsf{path}_S(h_\textsf{n}, x, y) \text { and } \textsf{typing} \triangleq F \subseteq D_\mathbb {S}. \end{aligned}$$
  • \(\textsf{T}(\textsf{dls}(x,y,x',y'), F) \;\triangleq \; \textsf{empty} \vee \textsf{nonempty}\) where

    $$\begin{aligned} \textsf{empty} &\triangleq x = y \wedge x' = y' \wedge F = \emptyset ,\\ \textsf{nonempty} &\triangleq x \ne y \wedge x' \ne y' \wedge \mathsf {main\_path} \wedge \textsf{boundaries} \wedge \textsf{typing} \wedge \mathsf {back\_links},\\ \mathsf {main\_path} &\triangleq \textsf{reach}(h_\textsf{n}, x, y) \wedge F = \textsf{path}_S(h_\textsf{n}, x, y),\\ \textsf{boundaries} &\triangleq h_\textsf{p}[x] = y' \wedge h_\textsf{n}[x'] = y \wedge x' \in F \wedge y' \not \in F,\\ \textsf{typing} &\triangleq F \subseteq D_\mathbb {D},\\ \mathsf {back\_links} &\triangleq \forall \ell .\; (\ell \in F \wedge \ell \ne x') \xrightarrow {}h_\textsf{p}[h_\textsf{n}[\ell ]] = \ell . \end{aligned}$$
  • \(\textsf{T}(\textsf{nls}(x,y,z), F) \;\triangleq \; \mathsf {main\_path} \wedge \textsf{typing} \wedge \mathsf {inner\_lists} \wedge \textsf{disjoint}\) where

    $$\begin{aligned} \mathsf {main\_path} &\triangleq \textsf{reach}(h_\textsf{t}, x, y) \wedge F = \textsf{path}_N(h_\textsf{t},h_\textsf{n}, x, y, z),\\ \textsf{typing} &\triangleq \textsf{path}_S(h_\textsf{t}, x, y) \subseteq D_\mathbb {N}\wedge F \setminus \textsf{path}_S(h_\textsf{t}, x, y) \subseteq D_\mathbb {S},\\ \mathsf {inner\_lists} &\triangleq \forall \ell .\; \ell \in F \cap D_\mathbb {N}\xrightarrow {}\textsf{reach}(h_\textsf{n}, h[\ell ], z),\\ \textsf{disjoint} &\triangleq \forall \ell _1, \ell _2. \bigl (\{\ell _1, \ell _2\} \subseteq F \wedge \ell _1 \ne \ell _2 \wedge h_\textsf{n}[\ell _1] = h_\textsf{n}[\ell _2]\bigl ) \xrightarrow {}h_\textsf{n}[\ell _1] \not \in F. \end{aligned}$$
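As an illustration of how the conjuncts of the DLL translation fit together, the following Python sketch checks them on a concrete pair of arrays and a concrete candidate footprint F. It is not the SMT encoding itself: arrays are dictionaries, the \(\textsf{typing}\) conjunct is omitted (a single sort is assumed), and the function names and the explicit `bound` parameter are hypothetical.

```python
def path_simple(h, x, y, bound):
    """Locations on the h-path from x to y of length <= bound, or None if there is none."""
    seen, cur = [], x
    for _ in range(bound + 1):
        if cur == y:
            return set(seen)
        if cur not in h:
            return None
        seen.append(cur)
        cur = h[cur]
    return None

def dls_nonempty(h_n, h_p, x, y, xp, yp, F, bound):
    """The 'nonempty' disjunct of T(dls(x, y, x', y'), F), without typing."""
    main = path_simple(h_n, x, y, bound)
    main_path = main is not None and F == main
    boundaries = (h_p.get(x) == yp and h_n.get(xp) == y
                  and xp in F and yp not in F)
    # Every location except the last one is the predecessor of its successor.
    back_links = all(h_p.get(h_n.get(loc)) == loc for loc in F if loc != xp)
    return x != y and xp != yp and main_path and boundaries and back_links

def dls(h_n, h_p, x, y, xp, yp, F, bound):
    """T(dls(x, y, x', y'), F) as the disjunction of the two cases."""
    empty = x == y and xp == yp and F == set()
    return empty or dls_nonempty(h_n, h_p, x, y, xp, yp, F, bound)

# Doubly-linked list 1 <-> 2 <-> 3 with successor 4 and predecessor 9.
h_n = {1: 2, 2: 3, 3: 4}
h_p = {1: 9, 2: 1, 3: 2}
```

With these arrays, `dls(h_n, h_p, 1, 4, 3, 9, {1, 2, 3}, 3)` holds, while redirecting the back pointer of location 3 violates \(\mathsf {back\_links}\).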

Path quantifiers. Invariants of paths are naturally expressed using universal quantifiers. For quantifiers, however, we cannot directly take advantage of bounds on path lengths. Therefore, similarly to separating conjunctions, we replace quantifiers by small enumerations of their instances, which is efficient when we can compute small enough bounds on the paths. For example, if we know that the length of an \(\textsf{f}\)-path with a root x is at most two, it is enough to instantiate its invariant for x, \(h_\textsf{f}[x]\), and \(h^2_\textsf{f}[x]\). This idea is formalised using expressions \(\mathbb {P}^{\le n}_{(h, x)}\;\ell .\; \psi \), which we call path quantifiers and which state that \(\psi \) holds for all locations of the path of length at most n starting from x via the array h:

$$\mathbb {P}^{\le n}_{(h, x)}\;\ell .\; \psi \;\triangleq \; \bigwedge \!_{0 \le i \le n} \;\; \psi [h^i[x]/\ell ].$$
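The expansion of a path quantifier into a finite conjunction can be sketched in a few lines of Python. The sketch works on strings only; representing the body \(\psi \) as a Python function from a location term to a formula string, and the names `iterate_term` and `path_quantifier`, are assumptions made for illustration.

```python
def iterate_term(h, x, i):
    """Render h^i[x] as nested array reads, e.g. h[h[x]] for i = 2."""
    term = x
    for _ in range(i):
        term = f"{h}[{term}]"
    return term

def path_quantifier(h, x, n, psi):
    """Expand the path quantifier with bound n into the conjuncts psi[h^i[x] / l], 0 <= i <= n."""
    return [psi(iterate_term(h, x, i)) for i in range(n + 1)]

# A back-link style invariant instantiated along a path of length at most two.
conjuncts = path_quantifier("hn", "x", 2, lambda l: f"hp[hn[{l}]] = {l}")
formula = " and ".join(conjuncts)
```

For a bound of two, the invariant is instantiated for exactly three terms: `x`, `hn[x]`, and `hn[hn[x]]`.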

If we need to quantify over nested paths, we need to use two path quantifiers (one for the top-level path and one for the nested paths). The quantifiers in the last conjunct of the NLL translation can be rewritten as \(\mathbb {P}_{(h_\textsf{t}, x)}\;\ell '_1.\; \mathbb {P}_{(h_\textsf{t}, x)}\;\ell '_2.\; \mathbb {P}_{(h_\textsf{n}, \ell '_1)}\;\ell _1.\; \mathbb {P}_{(h_\textsf{n}, \ell '_2)}\;\ell _2\). In this expression, \(\ell '_1\) and \(\ell '_2\) range over locations in the top-level list, and \(\ell _1\) and \(\ell _2\) range over locations in the nested paths starting from \(\ell '_1\) and \(\ell '_2\), respectively.

5.4 Complexity

This section briefly discusses the complexity of the proposed decision procedure as well as the complexity lower bound for the satisfiability problem in the considered fragment of SL. We will use \(\textsf{SAT}(\omega _1, \ldots , \omega _n)\) to denote the satisfiability problem for a sub-fragment constructed of atomic formulae and the connectives \(\omega _i\) and \(\textsf{SAT}(\overline{\omega _1, \ldots , \omega _n})\) to denote the fragment where none of the connectives \(\omega _i\) appear.

Theorem 4

The procedure \(\texttt{SatQuantif}\) produces a formula of polynomial size, and, for \(\textsf{SAT}(\overline{\wedge \lnot })\), it runs in \(\textsf{NP}\). The procedure \(\texttt{SatEnum}\) runs in \(\textsf{NP}\) for \(\textsf{SAT}(\overline{\vee })\).

Proof (sketch)

When not considering the instantiation of quantifiers over footprints, both \(\texttt{SatQuantif}\) and \(\texttt{SatEnum}\) produce a formula \(\textsf{T}(\varphi )\) of a polynomial size dominated by the translation of inductive predicates. For the variant of the translation of inductive predicates using universal quantifiers over locations, the size is \(\mathcal {O}(n^3)\) for SLLs and DLLs (dominated by the \(\mathcal {O}(n^3)\) size of the \(\textsf{path}_S\) term), and \(\mathcal {O}(n^5)\) for NLLs (dominated by \(\textsf{path}_N\)). If the input formula does not contain guarded negations, then all quantifiers can be eliminated using Skolemization. The translated formulae are then in a theory decidable in \(\textsf{NP}\) (e.g., when sets are encoded as extended arrays [22]).

The procedure \(\texttt{SatEnum}\) can produce exponentially large formulae because of the footprint enumeration. This can be prevented if the input formula does not contain disjunctions, in which case the footprints of all sub-formulae are unique, i.e., singleton sets. The translated formulae are then again in a theory decidable in \(\textsf{NP}\).    \(\square \)
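A small worked example of the blow-up, given here only as an illustration and assuming pairwise distinct variables \(x_i, y_i\): each disjunction contributes two footprint terms, and the rule for the separating conjunction combines them pairwise, so the number of footprint terms doubles with every conjunct.

$$\begin{aligned} \psi _k &\triangleq (x_1 \mapsto \_ \vee y_1 \mapsto \_) *\cdots *(x_k \mapsto \_ \vee y_k \mapsto \_),\\ \textsf{FP}^{\#}(x_i \mapsto \_ \vee y_i \mapsto \_) &= \{\{x_i\}, \{y_i\}\},\\ |\textsf{FP}^{\#}(\psi _k)| &= 2^k. \end{aligned}$$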

Theorem 5

\(\textsf{SAT}(\mapsto , \wedge \lnot , \wedge , \vee , *)\) is \(\textsf{PSPACE}\)-complete.

Proof (sketch)

Membership in \(\textsf{PSPACE}\) was proved in [26] for a more expressive fragment. For the hardness part, we build on the reduction from QBF used in [7]. In this reduction, the boolean value of a variable is represented by the corresponding SL variable being allocated (always pointing to \(\textsf{nil}\) for simplicity). The fact that x is false is expressed using a negative points-to predicate stating that x is not allocated. The existential quantifier is expressed using the separating conjunction, and the universal quantifier is obtained using the (unguarded) negation. (For details, see [7].)

We show that this reduction can be done without the unguarded negation and the negative points-to assertion, using the guarded negation instead. The key observation is that, for a QBF formula with variables X, we can express that all variables in X can have arbitrary boolean values as \(\textsf{arbitrary}[X] \triangleq \mathop {*}_{x \in X} (x \mapsto \textsf{nil} \vee \textsf{emp})\). In the context of variables X, we can then express negation as \(\lnot F \triangleq \textsf{arbitrary}[X] \wedge \lnot F\) and the truth values of a variable x as \(\lnot x \triangleq \textsf{arbitrary}[X \setminus \{x\}]\) and \(x \triangleq \textsf{arbitrary}[X] *x \mapsto \textsf{nil}\). The rest of the reduction then easily follows [7].    \(\square \)

6 Optimised Bound Computation

In many practical cases, the main source of complexity is the translation of inductive predicates, which heavily depends on the possible lengths of paths between locations. We now propose how to bound the length of these paths based on the so-called SL-graphs, which are graph representations of constraints imposed by SL formulae. SL-graphs were originally used for representing and deciding symbolic heaps with lists in [8]. Here, we use their generalised form which captures must-relations holding in all models of a given formula. Note that the nodes of the graphs are implicitly given by the domains of the involved relations, which themselves can be viewed as edges.

Definition 3

An SL-graph of \(\varphi \) is a tuple where:

  • is an equivalence relation called must-equality,

  • is a symmetric relation called must-disequality,

  • is a must-\(\textsf{f}\)-pointer relation,

  • is an irreflexive must-\(\textsf{f}\)-path relation,

  • is a symmetric relation called must-\(\textsf{f}\)-path-disjointness.

Except , the components of \(G[\varphi ]\) represent atomic formulae—equalities, disequalities, pointers, and paths (i.e., list segments)—holding within all models of \(\varphi \). The fact that states that, in all models of \(\varphi \), the domains of \(\textsf{f}\)-paths from \(x_1\) to \(y_1\) and from \(x_2\) to \(y_2\) are disjoint.

To compute the SL-graph \(G[\varphi ]\), we define some auxiliary notation. We define \(G_\emptyset \) to be an SL-graph where all the relations are empty. We write \(G \lhd \{x_i \bowtie _i y_i\}_{i \in I}\) to denote the SL-graph \(G'\) which is the same as G with the elements \(x_i \bowtie _i y_i\) for \(i \in I\) added to the corresponding relations. We use \(\sqcup \) and \(\sqcap \) as a component-wise union and intersection of SL-graphs, respectively. We define the disjoint union of SL-graphs as:

figure p

Here, \(\textsf{paths}_\textsf{f}(G)\) is defined as , and the set of must-allocated variables is (\(\textsf{nil}\) is added for technical reasons). We further assume that all operations on SL-graphs (\(\lhd \), \(\sqcup \), \(\sqcap \), and  ) preserve relational properties (symmetry, transitivity, etc.) of the components of SL-graphs by computing the corresponding closures after the operation is performed. We compute the SL-graph \(G[\varphi ]\) as follows.

figure t

Observe that we only approximate \(\textsf{dls}\) and \(\textsf{nls}\). After the construction is finished, we apply the following rules for matching of pointers and for detection of inconsistencies.

figure u

Tighter location bounds. Using SL-graphs, we can slightly improve the location bound from Section 4 by considering equivalence classes of instead of individual variables (this can also be used to refine the path bound computation described later) and by defining \(||x || =1\) if x is the source of a must-pointer, i.e., for some \(\textsf{f}\) and y.

Path bounds. We now fix an \(\textsf{f}\)-path \(\sigma \) from \(x^S\) to y and show how to compute an interval \([\ell , u]\) that gives bounds on its length. The computation of the path bounds runs in two steps. In the first step, we compute an initial bound \([\ell _{e}^0, u_{e}^0]\) for each edge \(e \in \textsf{paths}_\textsf{f}(G)\). If e is a pointer edge, its bound is given as [1, 1]. For a path edge \(e = (a,b)\), we define \(\ell _e^0 = 1\) if and 0 otherwise; while \(u_e^0\) is defined as \(\textsf{bound}_S(\varphi ) - \sum _{v \in V} ||v ||\) where

This way, we exclude from the computation of the initial upper bound the source v of each path disjoint with \(\sigma \) and all locations possibly allocated in a chunk with the root v. Note that the actual size of this chunk may be smaller than \(||v ||\), but this means that we were too conservative when computing the global location bound and can decrease the path bound by the same number anyway.

In the second phase, we compute the bounds of the path \(\sigma \) using initial bounds from the first step. The computation is based on two weighted directed graphs derived from the SL-graph G: \(G^{\textrm{u}}_\sigma \) for the upper bound and \(G^{\mathrm {\ell }}_\sigma \) for the lower bound (in both cases, the vertices are implicitly given as \(\textsf{vars}(\varphi )\), and the edge weight of an edge e is given by \(u_e^0\) and \(\ell _e^0\) computed in the previous step, respectively):

figure z

Here, the condition \(\textsf{nonempty}(y, w)\) states that a directed SL-graph edge \((y, w)\) is non-empty, which holds if either .

Intuitively, the upper bound u is computed as the length of the shortest path from x to y in \(G^{\textrm{u}}_\sigma \). Since \(\textsf{f}\)-paths are uniquely determined, we know that no path can be longer than the shortest one, and thus u is indeed a correct upper bound. The lower bound \(\ell \) is computed as the length of the longest path starting from x (ending anywhere) in \(G^{\mathrm {\ell }}_\sigma \). By construction, \(G^{\mathrm {\ell }}_\sigma \) contains only those edges for which one can prove that they cannot contain y in their domains. A path from x of length \(\ell \) therefore implies that x cannot reach y in fewer than \(\ell \) steps, and thus \(\ell \) is indeed a correct lower bound.
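The second phase boils down to two standard graph computations, which the following Python sketch illustrates. It is a sketch under assumptions: the two weighted graphs are given explicitly as dictionaries rather than derived from an SL-graph, the edge sets below are hand-picked to match the situation of Fig. 3, the longest path is searched over simple paths by exhaustive depth-first search (adequate only for small graphs), and all names are hypothetical.

```python
import heapq

def adjacency(edges):
    """Turn {(u, v): weight} into {u: [(v, weight), ...]}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
    return adj

def shortest_path(edges, src, dst):
    """Dijkstra's algorithm; returns None if dst is unreachable from src."""
    adj = adjacency(edges)
    dist = {src: 0}
    queue = [(0, src)]
    while queue:
        d, u = heapq.heappop(queue)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(queue, (d + w, v))
    return None

def longest_path_from(edges, src):
    """Weight of the longest simple path starting at src (ending anywhere)."""
    adj = adjacency(edges)

    def go(u, visited):
        best = 0
        for v, w in adj.get(u, []):
            if v not in visited:
                best = max(best, w + go(v, visited | {v}))
        return best

    return go(src, {src})

# Assumed graphs for the path from a to c: weights are the initial
# upper bounds (upper graph) and initial lower bounds (lower graph).
upper_edges = {("a", "b"): 2, ("b", "c"): 1, ("c", "d"): 1, ("d", "a"): 2}
lower_edges = {("a", "b"): 0, ("b", "c"): 1}

bounds = (longest_path_from(lower_edges, "a"), shortest_path(upper_edges, "a", "c"))
```

On these assumed graphs, the shortest path from a to c has weight 2 + 1 and the longest path from a in the lower-bound graph has weight 0 + 1, which corresponds to the interval [1, 3] of the example.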

Fig. 3. An illustration of the bound computation for the path \(\sigma \) from a to c on a fragment of SL-graph of \(\varphi \triangleq \bigl (\textsf{sls}(a, b) *b \mapsto c *c \mapsto d *\textsf{sls}(d, a)\bigr ) \wedge \lnot \bigl (\textsf{sls}(a,c) *\textsf{sls}(c,a)\bigr )\). The highlighted edges denote the paths used to determine the bound [1, 3].

Example. We demonstrate the path bound computation in Fig. 3, which shows a fragment of the SL-graph of a formula \(\varphi \) (it shows only those edges that are relevant in our example) and the graphs \(G^{\mathrm {\ell }}_\sigma \) and \(G^{\textrm{u}}_\sigma \) for the path \(\sigma \) from a to c. We have that \(||b || = ||c || = 1\) and \(||a || = ||d || = 2\). This gives us the location bound, which is 6. In the first phase, we compute the initial bound [0, 2] for paths of the predicates \(\textsf{sls}(a, b)\) and \(\textsf{sls}(d, a)\) because both of them are disjoint with all the other paths in \(G[\varphi ]\). In the second phase, we get the bound for \(\sigma \) equal to [1, 3] instead of the default bound [0, 6].
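The arithmetic behind these numbers can be spelled out as follows. This is a reading of the example under assumptions: the location bound is taken to be the sum of the values \(||\cdot ||\), and the set V for a path edge is taken to consist of the sources of the other three edges of the formula (b, c, d for \(\textsf{sls}(a, b)\) and a, b, c for \(\textsf{sls}(d, a)\)).

$$\begin{aligned} \textsf{bound}_S(\varphi ) &= ||a || + ||b || + ||c || + ||d || = 2 + 1 + 1 + 2 = 6,\\ u^0_{(a,b)} &= 6 - (||b || + ||c || + ||d ||) = 6 - 4 = 2,\\ u^0_{(d,a)} &= 6 - (||a || + ||b || + ||c ||) = 6 - 4 = 2,\\ u &= u^0_{(a,b)} + 1 = 3, \qquad \ell = \ell ^0_{(a,b)} + 1 = 0 + 1 = 1. \end{aligned}$$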

7 Experimental Evaluation

We have implemented the proposed decision procedure in a new solver called Astral (see Footnote 5). Astral is written in OCaml and can use multiple backend SMT solvers. With the encoding presented in Section 5, it can use either cvc5 supporting set theory directly [1] or Z3 supporting it by a reduction to the extended theory of arrays [22]. We have also developed an alternative encoding in which both locations and location sets are represented as bitvectors. The bitvector encoding differs only in expressing set operations on the level of bitvectors with additional axioms ensuring that all locations “can fit” into sets encoded by the bitvectors (for details, see [9]). With the bitvector encoding, a backend solver only needs to support theories of bitvectors and arrays, which are both standard and supported by many other SMT solvers. Another advantage is that the quantification on bitvectors seems to perform significantly better than on sets.

In our experiments, if we do not say explicitly which encoding and solver are used, we use the bitvector encoding and Bitwuzla [25] as the backend solver, which we found to be the best performing combination. We set a limit for the method \(\texttt{SatEnum}\) to 64 footprints. If this limit is exceeded, we dynamically switch to \(\texttt{SatQuantif}\). We use path quantifiers when the path bound is at most half of the domain bound. These are design choices that can be revisited in the future.
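The two switching rules described above can be summarised by the following Python sketch. It only mirrors the thresholds stated in the text; the function names, the string-based formulae, and the representation of the quantified body as a Python function are assumptions and do not reflect Astral's actual code.

```python
FOOTPRINT_LIMIT = 64  # limit for SatEnum stated in the text

def translate_exists(footprint_terms, body, limit=FOOTPRINT_LIMIT):
    """Translate 'exists F in X. body(F)' for a set X of footprint terms.

    Within the limit, enumerate the instances as a disjunction (SatEnum);
    otherwise fall back to a plain existential quantifier (SatQuantif).
    """
    terms = sorted(footprint_terms)
    if len(terms) <= limit:
        return " or ".join(f"({body(t)})" for t in terms)
    return f"exists F. ({body('F')})"

def use_path_quantifiers(path_bound, domain_bound):
    """Path quantifiers are used when the path bound is at most half of the domain bound."""
    return 2 * path_bound <= domain_bound

enumerated = translate_exists({"{x}", "{y}"}, lambda f: f"{f} subset D")
quantified = translate_exists({"{x}", "{y}"}, lambda f: f"{f} subset D", limit=1)
```

Passing a smaller `limit` makes the fallback easy to observe on tiny inputs, as in the last line.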

All experiments were run on a machine with a 2.5 GHz Intel Core i5-7300HQ CPU and 16 GiB RAM, running Ubuntu 18.04. The timeout was set to 60 s and the memory limit to 1 GB. Our experiments were conducted using BenchExec [4], a framework for reliable benchmarking.

Table 1. Experimental results for formulae from SL-COMP. The columns are: solved instances (OK), out of time/memory (RO), instances on which Astral wins, i.e., Astral solves the instance and the other solver does not, or Astral solves it faster (WIN), instances solved in the time limits of 0.1 s and 1 s, and the total time for solved instances in seconds.

7.1 Entailments of Symbolic Heaps

In the first part of our evaluation, we focus on formulae from the symbolic heap fragment which is frequently used by verification tools and for which there exist many dedicated solvers. We therefore do not expect to outperform the best existing tools but rather to obtain a comparison with other translation-based decision procedures.

In Table 1a, we provide results for the category QF_SHLS_ENTL (entailments with SLLs). We divide the category into two subsets: verification conditions (which are simpler) and the more complex artificially generated formulae “bolognesa” and “clones” from [23]. During the experiments, we found that several “cloned” entailments contain root variables on the right-hand side of the entailment that do not appear on the left-hand side, making the entailment trivially invalid when its left-hand side is satisfiable. For a few hard clone instances, this poses a problem for Astral because it cannot use the path bound computation, as such roots do not appear in the SL-graph. We have therefore implemented a heuristic that detects entailments \(\varphi \,\models \,\psi \) that can be reduced to satisfiability of \(\varphi \). Since this is a benchmark-specific heuristic, we also present the version without it (Astral \(^*\)) in Table 1a. The optimised version of Astral is able to solve all the formulae and is faster than the other translation-based solvers, GRASShopper (see Footnote 6) and Sloth. For illustration, the table further contains the second best solver in the latest edition of SL-COMP, S2S (see Footnote 7).

In Table 1b, we provide results for a subset of the category QF_SHLID_ENTL (entailments with linear inductive definitions from which we selected DLLs and NLLs) for Astral and the three best-performing solvers competing in the latest edition of SL-COMP: S2S, Songbird (in the version with automated lemma synthesis called SLS), and Harrsh. We also include GRASShopper, which supports DLLs only. Apart from S2S, which solves almost all formulae virtually immediately, Astral is the only solver able to solve all the formulae in the given time limit.

7.2 Experiments on Formulae Outside of the Symbolic Heap Fragment

For formulae outside of the symbolic heap fragment and its top-level boolean closure, there are currently no existing benchmarks. For now, we therefore limit ourselves to randomly generated but extensive sets of formulae. In the future, we would like to develop a program analyser using symbolic execution over BSL and make more careful experiments on realistic formulae.

We first focus on the fragment with guarded negations but without inductive predicates, on which we can compare Astral with cvc5. We have prepared a set of 1000 entailments of the form \(\varphi \,\models \,\psi \) which are generated as random binary trees with depth 8 over 8 variables with the only atoms being pointer assertions. To reduce the number of trivial instances, we only generated formulae for which \(\textsf{vars}(\psi ) \subseteq \textsf{vars}(\varphi )\) and Astral cannot deduce contradiction from their SL-graphs. To avoid any suspicion that the difference is caused by better performance of the backend solver rather than the design of our translation, we used Astral with the \(\textsc {cvc5} \) backend and direct set encoding (with Bitwuzla and bitvector encoding, our results would be even better). The results are given in Fig. 4a and suggest that our treatment of guarded negations indeed yields better performance: Astral can solve all the instances, almost all of them in under 10 s. On the other hand, cvc5 timed out in 61 cases and is usually slower than Astral, in particular on satisfiable formulae which represent invalid entailments.

In the second experiment, we compared our solver with GRASShopper on the fragment which it supports, i.e., arbitrary nesting of conjunctions and disjunctions. We again generated 1000 entailments, this time with depth 6, 6 variables, and with atoms being singly-linked lists (with 20 % probability) or pointer assertions. The results are given in Fig. 4b. Astral ran out of memory in 5 cases, and GRASShopper timed out in 10 cases. In summary, Astral is faster on more than 80 % of the formulae, with an almost three times lower running time.

Finally, to illustrate that Astral can indeed handle formulae out of the fragments of all the other mentioned tools, we apply it on an entailment query that involves the formula mentioned at the end of the introduction: \(((\textsf{sls}(x,y) \wedge \lnot (\textsf{sls}(x,z) *\textsf{sls}(z,y))) *y \mapsto z)\,\models \,\textsf{sls}(x,z),\) converted to an unsatisfiability query. Astral resolves the query in 0.12 s. Note that without the requirement \(\lnot (\textsf{sls}(x,z) *\textsf{sls}(z,y))\), the entailment does not hold as a cycle may be closed in the heap.
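The role of the guarded negation in this query can be checked by hand on a single two-cell heap. The following Python sketch does so by brute force; it assumes the standard semantics of acyclic list segments (with \(\textsf{sls}(x,x)\) holding exactly on the empty heap), models heaps as dictionaries, and uses hypothetical helper names. It exhibits one countermodel and does not prove the entailment.

```python
from itertools import combinations

def sls(heap, x, y):
    """Is the heap exactly an acyclic list segment from x to y?"""
    dom, cur = set(), x
    while cur != y:
        if cur not in heap or cur in dom:
            return False
        dom.add(cur)
        cur = heap[cur]
    return dom == set(heap)

def splits(heap):
    """Enumerate all splits of a heap into two disjoint sub-heaps."""
    keys = list(heap)
    for size in range(len(keys) + 1):
        for left in combinations(keys, size):
            h1 = {k: heap[k] for k in left}
            h2 = {k: heap[k] for k in keys if k not in left}
            yield h1, h2

def star(p, q):
    """Separating conjunction of two heap predicates."""
    return lambda heap: any(p(h1) and q(h2) for h1, h2 in splits(heap))

# Stack with z aliased to x, and a heap in which y |-> z closes a cycle.
x, y, z = 1, 2, 1
heap = {1: 2, 2: 1}

points_to = lambda h: h == {y: z}
guard = star(lambda h: sls(h, x, z), lambda h: sls(h, z, y))
lhs_unguarded = star(lambda h: sls(h, x, y), points_to)
lhs_guarded = star(lambda h: sls(h, x, y) and not guard(h), points_to)
rhs = lambda h: sls(h, x, z)
```

The heap satisfies the left-hand side without the guard but not the right-hand side, so it is a countermodel of the unguarded entailment; the guarded left-hand side rejects it.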

Fig. 4. A comparison of Astral with cvc5 and GRASShopper on randomly generated formulae. Times are in seconds, axes are logarithmic. The timeout was set to 60 s.

8 Conclusions and Future Work

We have presented a novel decision procedure based on a small-model property and translation to SMT. Our experiments have shown very promising results, especially for formulae with rich boolean structure for which our decision procedure outperforms other approaches (apart from being able to solve more formulae).

In the future, we would like to extend our approach with some class of user-defined inductive predicates, with more complex spatial connectives such as septractions and/or magic wands, consider a lazy and/or interactive translation instead of the current eager approach, and try Astral within some SL-based program analyser.