1 Introduction

Programs written in modern languages today are rife with higher-order functions [3, 35], but specifying and verifying them remains challenging, especially if they contain imperative effects. Consider the \( foldr \) function from OCaml. Here is a good specification for it in Iris [19], a state-of-the-art framework for higher-order concurrent separation logic that is built using the Coq proof assistant.

figure a

While this specification is conventional in weakest-precondition calculi like Iris, one might argue that it is not the best possible specification for \( foldr \), since it requires two abstract properties \( Inv \) and \( P \) to summarize the behavior of \(f\). Moreover, the input list \(l\) is required to be immutable, as enforced by the same \( isList \) predicate appearing in both the pre- and postcondition. (If mutation of the list is allowed, a more complex \( Inv \) with an extra parameter for the mutated list is required.)

These abstract properties must be instantiated for each instance of \(f\); unfortunately, some usage scenarios of \( foldr \) (highlighted later in Sect. 2.2) cannot be captured by this particular pre/post specification, however well-designed it is. Thus, the conventional pre/post approach to specifying higher-order functions suffers from a possible loss of precision, since the presence of these abstract properties implicitly strengthens the preconditions of higher-order imperative methods.

This paper proposes a new logic, Higher-Order Staged Specification Logic (HSSL), for specifying and verifying higher-order imperative methods. It is designed for automated verification via SMT and uses separation logic as its core stateful logic, aiming at more precise specifications of heap-based changes. While we have adopted separation logic to support heap-based mutations, HSSL may also be used with other base logics, such as those using dynamic frames [25]. We next provide an overview of our methodology by example, before giving formal details and an experimental evaluation of our proposal.

2 Illustrative Examples

We provide three examples to highlight the key features of our methodology.

2.1 A Simple Example

Fig. 1. A Simple Example

We introduce the specification logic using a simple example (Fig. 1), to highlight a key challenge we hope to solve, namely, how to specify the behavior of \( hello \) without pre-committing to some abstract property of \( f \). To do that, we can model \( f \) using an uninterpreted relation. We use an uninterpreted relation rather than a function here in order to model both over-approximation and possible side effects. Since \( f \) is effectful and may modify arbitrary state, including the references \( x \) and \( y \), a modular specification of \( hello \) must express the ordering of the call to \( f \) with respect to the other statements in it, so that the caller of \( hello \) may reason precisely about its effects. Therefore, a first approximation is the following specification. We adopt standard separation logic pre/post assertions and extend them with sequential composition and uninterpreted relations. A final parameter (named \( res \) here) is added to denote the result of each staged specification’s relation (\( hello \) here), a convention we follow henceforth.

figure b

We can summarize the imperative behavior of \( hello \) before the call to \( f \) with a read from \( x \), followed by a write to \( x \), as captured by Stages 1–2. The same applies to the portion after the call to \( f \) (lines 4–6), but here we only consider the scenario when \( x \) and \( y \) are disjoint (Footnote 1). Stages 4 and 5 state that memory location \( x \) is being read while \( y \) will be correspondingly updated.
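Since Fig. 1 is not reproduced here, the following OCaml sketch illustrates a function matching the stage description above; the exact constants, the result value, and the line layout are assumptions rather than the paper's original code.

    let hello (f : unit -> unit) (x : int ref) (y : int ref) : unit =
      let a = !x in            (* Stage 1: read from x *)
      x := a + 1;              (* Stage 2: write to x *)
      f ();                    (* Stage 3: call to the unknown f *)
      let b = !x in            (* Stage 4: read from x, whose value f may have changed *)
      y := b + 1               (* Stage 5: write to y *)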

The ordering of the unknown \( f \) call with respect to the parts before and after does matter, so the call can be seen as stratifying the temporal behavior of the function into stages. Should a specification for \( f \) become known, usually at a call site, its instantiation may lead to a staged formula with only req/ens stages, which can always be compacted into a single req/ens pair. We detail a normalization procedure to do this in Sect. 3.2.

As mentioned before, \( f \) can modify \( x \) despite not having direct access to it via an argument, as it could capture \( x \) from the environment of the caller of \( hello \). To model this, we make worst-case assumptions on the footprints of unknown functions, resulting in the precondition \({{x}}\,{\mapsto }\,{ }{{b}}\) in stage 4.

2.2 Pre/Post Vs Staged Specifications via \( foldr \)

We now specify \( foldr \) and compare it with the Iris specification from Sect. 1.

figure c

We model \( foldr \) as a recursive predicate whose body is a staged formula. The top-level disjunction represents the two possible paths that result from pattern matching. In the base case, when \( l \) is the empty list, the result of \( foldr \) is \( a \). In the recursive case, when \( l \) is nonempty, the specification expresses that the behavior of \( foldr \) is given by a recursive call to \( foldr \) on the tail of \( l \) to produce a result \( r \), followed by a call to \( f \) with \( r \) to produce the value \( rr \). Crucially, we are able to represent the call to the unknown function \( f \) directly in the specification, without being forced to impose a stronger precondition on \( f \).
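For reference, the staged predicate mirrors the standard right fold, written here in the let-bound style of the paper's core language; this textbook definition is assumed to coincide with the paper's \( foldr \).

    let rec foldr f l a =
      match l with
      | [] -> a
      | x :: l' ->
        let r = foldr f l' a in  (* recursive call on the tail, producing r *)
        f x r                    (* call to the (possibly unknown) f, producing rr *)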

\( foldr \)’s specification is actually very precise, to the point of mirroring the \( foldr \) program. Nevertheless, abstraction may readily be recovered by proving that this predicate entails a weaker formula, and a convenient point for this is when the unknown function-typed parameter is instantiated at each of \( foldr \)’s call sites; we discuss an example of this shortly. The point of specifying \( foldr \) this way is that the precision of stages enables us to avoid committing to an abstraction prematurely. We should, of course, summarize as early as is appropriate to keep our proving process tractable.

Recursive staged formulae are needed mainly to specify higher-order functions with unknown function-typed parameters. Otherwise, our preference is to apply summarization to obtain non-recursive staged formulae whenever unknown function-typed parameters have been suitably instantiated. Under this scenario, we may still use recursive pure predicates or recursive shape predicates in order to obtain the best possible modular specifications for our program code.

Now, we show how the staged specification for \( foldr \) can be used by proving that we can sum a list by folding it. \( sum \) can be specified in a similar way to \( foldr \), but since it is a pure function that can additionally be checked for termination, we can automatically convert it into a pure predicate (without any stages or imperative side effects) to be used in (the pure fragment of) our specification logic. Termination of pure predicates is required for them to be safely used in specifications. (Techniques to check for purity and termination are well-known and thus omitted.) Also, each pure predicate may be used as either a staged predicate or a pure predicate. In case a pure predicate \( p(v^*, res ) \) is used as a staged predicate, its staged definition is:

$$\begin{aligned} p (v^*, res ) =~ & {\textbf {req}}{\,}{{ emp }}{\wedge }pre(v^*);{\textbf {ens}}[\_]{\,}{{ emp }}{\wedge } p (v^*, res ) \end{aligned}$$

where \( pre(v^*) \) denotes the precondition that guarantees termination and the absence of exceptions. Note that \( p (v^*, res )\) is overloaded to be used as either a staged predicate or a pure predicate; the intended reading is unambiguous from the context of its use.
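A minimal sketch of the pure ghost function \( sum \) follows, assuming the usual recursive definition; once checked pure and terminating, it can be lifted to the pure predicate \( sum (xs,r)\) used below.

    let rec sum xs =
      match xs with
      | [] -> 0
      | x :: xs' -> x + sum xs'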

figure d

We can now re-summarize an imperative use of \( foldr \) with the help of \( sum \).

figure e
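Since figure e is not reproduced here, the following is a hedged OCaml sketch of such an imperative use (built on the \( foldr \) sketch above); the names and the closure body are assumptions, chosen to be consistent with the subsumed specification that follows (the cell \( x \) is incremented by \( sum(xs) \), and the result is \( sum(xs) + init \)).

    let foldr_sum_state (x : int ref) (xs : int list) (init : int) : int =
      (* g captures the heap cell x: it adds each element to x
         and also threads the functional accumulator *)
      let g y acc = (x := !x + y; y + acc) in
      foldr g xs init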

This summarization gives rise to the following entailment:

$$\begin{aligned} \quad &\forall \, m , xs , init , res .\, foldr (g,xs,init, res ) \\ \mathrel {\sqsubseteq }\quad & {\exists }\,{i,r}{\,{.}\,}{\textbf {req}}{\,}{{x}}\,{\mapsto }\,{ }{{i}};{\textbf {ens}}[ res ]{\,}{{x}}\,{\mapsto }\,{ }{{i{+}r}}{\wedge } res {=}r{+}init {\wedge } sum (xs,r) \end{aligned}$$

We have implemented a proof system for subsumption (denoted by \(\mathrel {\sqsubseteq }\)) between staged formulae in our verifier, called Heifer [13]. This particular entailment can be proved automatically by induction on \( xs \). While Iris’s earlier pre/post specification for \( foldr \) can handle this example through a suitable instantiation of \(( Inv ~\_~\_)\), it is unable to handle the following three other call instances.

figure f

The first example cannot be handled since Iris’s current specification for \( foldr \) expects its input list \( l \) to be immutable. The second example fails since the required precondition cannot be expressed using just the abstract property \((P~x)\). The last example fails because the abstract property \(( Inv ~(x\,{::}\,ys)~r)\) used in the postcondition of \( f \) expects its method calls to return normally. In contrast, using our approach via staged specifications, we can re-summarize the above three call instances to use the following subsumed specifications.

figure g

Note that the first example utilizes a recursive spatial \( List (l,xs)\) predicate, while the last example uses \( Exc() \) as a relation to model an exception as a stage in our specification. The three pure predicates and one spatial predicate used above can be formally defined as shown below.

figure h

We emphasize that our proposal for staged logics is strictly more expressive than traditional two-stage pre/post specifications, since the latter can be viewed as an instance of staged logics. As an example, the earlier two-stage specification for \( foldr \) can be modelled non-recursively in our staged logics as:

figure i

2.3 Inferrable Vs User-Provided Specifications via \( map \)

Fig. 2. Implementation of \( map\_incr \) with a Summarized Specification from \( map \)

Our methodology for higher-order functions is further explicated by the \( map \) method, shown in Fig. 2. Specifications there are typeset in two styles: those that must be user-supplied, and those marked with a small circle, which may be automated or inferred (using the rules of Sect. 4). Like \( sum \) before, \( length \) and \( incrg \) may be viewed as ghost functions, written only so that their specifications can be used to describe behavior. These specifications are also routine and can be mechanically derived; we elide them here and provide them in Appendix A [15]. The method \( map\_incr \) describes the scenario we are interested in, where the state of the closure affects the result of \( map \). Its specification states that the pointer \( x \) must have its value incremented by the length of \( xs \). Moreover, the contents of the resulting list are captured by another pure function \( incrg \), which builds a list of as many increasing values as there are elements in its input list.
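As Fig. 2 is not reproduced here, the following is a hedged OCaml sketch consistent with the description above; the closure's exact body is an assumption (it increments the captured cell \( x \) once per element, so \( x \) grows by the length of \( xs \) and the output is the increasing sequence described by \( incrg \)).

    let rec map f l =
      match l with
      | [] -> []
      | y :: l' ->
        let r = f y in
        r :: map f l'

    let map_incr (x : int ref) (xs : int list) : int list =
      map (fun _ -> x := !x + 1; !x) xs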

These examples illustrate the methodology involved with staged specifications. They inherit the modular verification and biabduction-based [4] specification inference of separation logic, adding to the mix the ability to describe imperative behavior using function stages; biabduction then doubles as a means to normalize and compact stages. There is an emphasis on specification inference and proof automation, and proofs are built out of simple lemmas, which help summarize behavior and the shapes of data, and either remove recursion or move it into a pure ghost function where it is easier to comprehend.

In summary, staged logic for specifying imperative higher-order functions represents a fundamentally new approach that is more general and yet can be more precise than what is currently possible via state-of-the-art pre/post specification logics for imperative higher-order methods. Our main technical contributions to support this new approach include:

1. Higher-Order Staged Specification Logic (HSSL): we design a novel program logic to specify the behaviors of imperative higher-order methods and give its formal semantics.

2. Biabduction-based Normalization: we propose a normalization procedure for HSSL that serves two purposes: (i) it allows us to produce succinct staged formulae for programs automatically, and (ii) it helps structure entailment proof obligations, allowing them to be discharged via SMT.

3. Entailment: we develop a proof system to solve subsumption entailments between normalized HSSL formulae, prove its soundness, and implement an automated prover based on it.

4. Evaluation: we report on initial experimental results, and present various case studies highlighting HSSL’s capabilities.

3 Language and Specification Logic

We target a minimal OCaml-like imperative language with higher-order functions and state. The syntax is given in Fig. 3. Expressions are in ANF (A-normal form); sequencing and control over evaluation order may be achieved using let-bindings, which define immutable variables. Mutation may occur through heap-allocated \( ref \)s. Functions are defined by naming lambda expressions, which may be annotated with a specification \(\Phi \) (covered below). For simplicity, they are always in tupled form and their calls are always fully applied. Pattern matching is encoded using recognizer functions (e.g., \(is\_cons\)) and \( if \) statements. \( assert \) allows proofs of program properties to be carried out at arbitrary points.
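As a hedged illustration of this encoding (the recognizer and selector names are assumptions, following the text's \(is\_cons\) example), a list-length function in the core language's let-bound style might be written as:

    (* recognizer and selector, encoding pattern matching *)
    let is_cons l = match l with [] -> false | _ :: _ -> true
    let tail l = match l with [] -> [] | _ :: t -> t

    let rec length l =
      let c = is_cons l in
      if c then
        let t = tail l in
        let n = length t in
        n + 1
      else 0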

Fig. 3. Syntax of the Core Language and Staged Logics

Program behavior is specified using staged formulae \({\mathrm {\Phi }}\), which are disjunctions and/or sequences of stages \(E\). A stage is an assertion about program state at a specific point. Each stage takes one of three forms: a precondition \({\textbf {req}}{\,}D\), a postcondition \({\textbf {ens}}[r]{\,}D\) with a named result \( r \), or a function stage \( f (v^*, r)\), representing the specification of a (possibly-unknown) function call. For brevity, we use a context notation \( \Phi [r] \) where \( r \) explicitly identifies the final result of specification \( \Phi \). Program states \(D\) are described using separation logic formulae from the symbolic heap fragment [4], without recursive spatial predicates (for simplicity of presentation). Most values of the core language are, as usual, also terms of the (pure) logic; a notable exception is the lambda expression, which occurs in the logic without its body. Subsumption assertions between two staged formulae (Sect. 5) are denoted by \( {\mathrm {\Phi }}_1{\mathrel {\sqsubseteq }}{\mathrm {\Phi }}_2 \).

3.1 Semantics of Staged Formulae

From Triples to Stages. Staged formulae generalize standard Hoare triples. The standard partial-correctness interpretation of the separation logic Hoare triple \(\{~P(v^*,x^*)~\}~e~\{~\exists \,{y^*}{\,{.}\,}Q(v^*,x^*,y^*, res )~\}\) where \( v^* \) denote valid program variables and \( x^* \) denote specification variables (e.g., ghost variables) is that for all states \( st \) satisfying \(P(v^*,x^*)\), given a reduction \(e, st \leadsto ^* v, st '\), if \(e, st \not \leadsto ^* fault \), then \( st '\) satisfies \(\exists \,{y^*}{\,{.}\,}Q(v^*,x^*,y^*, res )\). The staged equivalent is \(\{~{\mathrm {\Phi }}~\}~e~\{~{\mathrm {\Phi }};{\exists }\,{x^*}{\,{.}\,}{\textbf {req}}{\,}P(v^*,x^*);{\exists }\,{y^*}{\,{.}\,}{\textbf {ens}}[\_]{\,}Q(v^*,x^*,y^*, res )~\}\). Apart from mentioning the history \({\mathrm {\Phi }}\), which remains unchanged, its meaning is identical. Consider, then, \(\{~{\mathrm {\Phi }}~\}~e~\{~{\mathrm {\Phi }};{\textbf {req}}{\,}P_1;{\textbf {ens}}[\_]{\,}Q_1;{\textbf {req}}{\,}P_2;{\textbf {ens}}[\_]{\,}Q_2~\}\) – an intuitive extension of the semantics of triples is that given \(e, st \leadsto ^*e_1, st _1\), where \( st _1\) satisfies \(Q_1\), the extended judgment holds if \( st _1\) further satisfies \(P_2\), and reduction from there, \(e_1, st _1\leadsto ^*e_2, st _2\), results in a state \( st _2\) that satisfies \(Q_2\).

While heap formulae are satisfied by program states, staged formulae, like triples, are satisfied by traces which begin and end at particular states. Uninterpreted function stages further allow specifications to describe the intermediate states of programs – a useful ability in the presence of unknown higher-order imperative functions, as we illustrate in Sect. 2 and Appendix C [15]. To formalize all this, we give a semantics for staged formulae next.

Formal Semantics. We first recall the standard semantics for separation logic formulae in Fig. 4, which provides a useful starting point.

Fig. 4. Semantics of Separation Logic Formulae

Let \( var \) be the set of program variables, \( val \) the set of primitive values, and \( loc \subset val \) the set of heap locations; \(\ell \) is a metavariable ranging over locations. The models are program states, comprising a store of variables S, a partial mapping from a finite set of variables to values (\( var \rightharpoonup val \)), and a heap h, a partial mapping from locations to values (\( loc \rightharpoonup val \)). \(\llbracket {\pi }\rrbracket _{S}\) denotes the valuation of pure formula \({\pi }\) under store S, and \( dom (h) \) denotes the domain of heap \( h \). \( h_1 {\circ } h_2 {=} h \) denotes the disjoint union of heaps: if \( dom (h_1) {\cap } dom (h_2) = \{\} \), then \( h_1 {\cup } h_2 = h \). We write \(h_1{\subseteq }h_2\) to denote that \(h_1\) is a subheap of \(h_2\), i.e., \({\exists }\,{h_3}{\,{.}\,} h_1 {\circ } h_3 {=} h_2\). \(s[x{:=}v]\) and the analogous heap update notation stand for store/heap updates and removal of keys.

Fig. 5. Semantics of Staged Formulae

We define the semantics of HSSL formulae in Fig. 5. Let \(S, h \models {\mathrm {\Phi }} \rightsquigarrow S_1, h_1, R\) denote the models relation, i.e., starting from the program state with store \( S \) and heap \( h \), the formula \({\mathrm {\Phi }}\) transforms the state into \( S_1, h_1 \), with an intermediate result \( R \). \( R \) is either \( Norm(r) \) for partial correctness, \( { {Err}} \) for precondition failure, or \( \top \) for possible precondition failure in one of its execution paths.

When \({\mathrm {\Phi }}\) is of the form \({\textbf {req}}{\,}{{ \sigma }}{\wedge }{{ \pi }}\), the heap h is split into a heaplet \(h_1\) satisfying \({{ \sigma }}{\wedge }{{ \pi }}\), which is consumed, and a frame \(h_2\), which is left as the new heap. Read-only heap assertions \(({{ \sigma }}{\wedge }{{ \pi }})@R\) under \({\textbf {req}}\) check but do not change the heap.

When \({\mathrm {\Phi }}\) is of the form \({\textbf {ens}}[\_]{\,}{{ \sigma }}{\wedge }{{ \pi }}\), \({{ \sigma }}\) describes locations which are to be added to the current heap. The semantics allows some concrete heaplet \(h_1\) that satisfies \({{ \sigma }}{\wedge }{{ \pi }}\) (containing new or updated locations) to be (re-)added to the heap h.

When \({\mathrm {\Phi }}\) is a function stage \( f (x^*, r)\), its semantics depends on the specification of f. A staged existential causes the store to be extended with a binding from \( x \) to an existential value \( v \). Sequential composition \( {\mathrm {\Phi }}_1{;}{\mathrm {\Phi }}_2 \) results in a failure \( \top \) if \( {\mathrm {\Phi }}_1 \) does, while disjunction \( {\mathrm {\Phi }}_1{\vee }{\mathrm {\Phi }}_2 \) requires both branches not to fail.
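As a small worked instance of these definitions (a sketch, assuming the natural treatment of the result binding): if \(S(x) = \ell \) and \(h = \{\ell \mapsto 0\}\), then the flow \({\textbf {req}}{\,}{{x}}\,{\mapsto }\,{{0}};{\textbf {ens}}[r]{\,}{{x}}\,{\mapsto }\,{{1}}\) first consumes the heaplet \(\{\ell \mapsto 0\}\), leaving the empty frame, and then adds \(\{\ell \mapsto 1\}\), so the trace ends in the heap \(\{\ell \mapsto 1\}\) with result \( Norm(r) \). Starting instead from \(h = \{\ell \mapsto 2\}\), the \({\textbf {req}}\) cannot be satisfied, and the result is \( {Err} \).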

3.2 Compaction

Staged formulae subsume separation logic triples, but triples suffice for many verification tasks, particularly those without calls to unknown functions, and we would like to recover their succinctness in cases where intermediate states are not required. This motivates a compaction or normalization procedure for staged formulae, written \({\mathrm {\Phi }}~{=}\!{=}\!{>}~ {\mathrm {\Phi }}\) (Fig. 6). Compaction is also useful for aligning staged formulae, allowing entailment proofs to be carried out stage by stage; we elaborate on this use in Sect. 5.

Fig. 6. Select compaction rules

The three rules on the left simplify flows. A false postcondition (\({\textbf {ens}}~{{ \sigma }}{\wedge } false \)) models an unreachable or nonterminating program state, so the rest of a flow may be safely ignored. \({{ emp }}\) in the next two rules is either \(({\textbf {req}}~{{ emp }}{\wedge }true)\) or \(({\textbf {ens}}~{{ emp }}{\wedge }true)\); either may serve as an identity for flows. The first two rules on the right merge consecutive pre- and postconditions. Intuitively, they are sound because symbolic heaps separated by sequential composition must be disjoint to be meaningful – this follows from the use of disjoint union in Fig. 5. The last rule allows a precondition \({\textbf {req}}~D_2\) to be transposed with a preceding postcondition \({\textbf {ens}}~D_1\). This is done using biabduction [4], which computes a pair of an antiframe \(D_A\) and a frame \(D_F\) such that the antiframe is the new precondition required, and the frame is what remains after proving the known precondition. The given rule assumes that \(D_1\) and \(D_2\) are disjoint (Footnote 2). A read-only \(@R\) heap assertion under \({\textbf {req}}\) would be handled by matching but not removing from \(D_F\) (see [7]).
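As a hedged instance of the last rule (assuming the standard biabduction judgment \(D_1 \,{{ * }}\, D_A \,{\vdash }\, D_2 \,{{ * }}\, D_F\)), take \(D_1 = {{y}}\,{\mapsto }\,{{1}}\) and \(D_2 = {{x}}\,{\mapsto }\,{{a}}\,{{ * }}\,{{y}}\,{\mapsto }\,{{b}}\); the antiframe is \(D_A = {{x}}\,{\mapsto }\,{{a}}\) and the frame is \(D_F = {{ emp }}{\wedge }b{=}1\), so that

$$\begin{aligned} {\textbf {ens}}[\_]{\,}{{y}}\,{\mapsto }\,{{1}};\ {\textbf {req}}{\,}{{x}}\,{\mapsto }\,{{a}}\,{{ * }}\,{{y}}\,{\mapsto }\,{{b}} \quad {=}\!{=}\!{>}\quad {\textbf {req}}{\,}{{x}}\,{\mapsto }\,{{a}};\ {\textbf {ens}}[\_]{\,}{{ emp }}{\wedge }b{=}1 \end{aligned}$$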

Thus staged formulae can always be compacted into the following form, consisting of a disjunction of flows \(\mathrm {\theta }\) (a disjunction-free staged formula; Footnote 3), each consisting of a prefix of function stages (each preceded by a description of the intermediate state at that point), followed by a final pre- and postcondition, capturing any behavior remaining after calling unknown functions.

$$\begin{aligned} {\mathrm {\Phi }} & \,::\,= \mathrm {\theta } \ \mid \ {\mathrm {\Phi }}\vee {\mathrm {\Phi }} \\ \mathrm {\theta } & \,::\,= \big ({\exists }\,{x^*}{\,{.}\,}{\textbf {req}}{\,}D;\,{\exists }\,{x^*}{\,{.}\,}{\textbf {ens}}[\_]{\,}D;\, f (v^*, r)\,;\big )^*\ \ {\exists }\,{x^*}{\,{.}\,}{\textbf {req}}{\,}D;\,{\exists }\,{x^*}{\,{.}\,}{\textbf {ens}}[\_]{\,}D \end{aligned}$$

An example of compaction is given below (Fig. 7, left). We start at the first two stages of the flow and solve a biabduction problem (shown on the right, with solution immediately below) to infer a precondition for the whole flow, or, more operationally, to “push” the req to the left. We will later be able to rely on the new precondition to know that \(a=1\) when proving properties of the rest of the flow. Finally, we may combine the two ens stages because sequential composition guarantees disjointness. Normalization is sound in the sense that it transforms staged formulae without changing their meaning.

Fig. 7. An example of compaction

Theorem 1

(Soundness of Normalization). Given \({\mathrm {\Phi }}_1{=}\!{=}\!{>}{\mathrm {\Phi }}_2\), if \(S, h \models {\mathrm {\Phi }}_1 \rightsquigarrow S_1, h_1, R\), then \(S, h \models {\mathrm {\Phi }}_2 \rightsquigarrow S_1, h_1, R\).

Proof

By case analysis on the derivation of \({\mathrm {\Phi }}_1{=}\!{=}\!{>}{\mathrm {\Phi }}_2\). See Appendix I.2 [15].

4 Forward Rules for Staged Logics

To verify that a program satisfies a given specification \({\mathrm {\Phi }}_s\), we utilize a set of rules (presented in Fig. 8) to compute an abstraction or summary \({\mathrm {\Phi }}_p\) of the program, then discharge the proof obligation \({\mathrm {\Phi }}_p\mathrel {\sqsubseteq }{\mathrm {\Phi }}_s\) (covered in Sect. 5), in a manner similar to strongest-postcondition calculations.

Fig. 8. Forward Reasoning Hoare Rules with Staged Logics

We make use of the following notations. \(\_\) denotes an anonymous existentially quantified variable. \([x{:=}v]{\mathrm {\Phi }}\) denotes the substitution of x with v in \({\mathrm {\Phi }}\), giving priority to the most recently bound variables. We lift sequencing from flows to disjunctive staged formulae in the natural way: \({\mathrm {\Phi }}_1\,{;}\,{\mathrm {\Phi }}_2 \triangleq \bigvee \{ \mathrm {\theta }_1\,{;}\,\mathrm {\theta }_2 \mid \mathrm {\theta }_1 \in {\mathrm {\Phi }}_1, \mathrm {\theta }_2 \in {\mathrm {\Phi }}_2 \}\).

The first two rules in Fig. 8 are structural. The \(\mathbf{\scriptstyle Conseq}\) rule uses specification subsumption (detailed in Sect. 5) in place of implication – a form of behavioral subtyping. The \(\mathbf{\scriptstyle Frame}\) rule has both a temporal interpretation, which is that the reasoning rules are compositional with respect to the history of the current flow, and a spatial interpretation, consistent with the usual one from separation logic, if one uses the normalization rules (Sect. 3.2) to move untouched p from the final states of \({\mathrm {\Phi }}_1\) and \({\mathrm {\Phi }}_2\) into the frame \({\mathrm {\Phi }}\).

The \(\mathbf{\scriptstyle Var}\) and \(\mathbf{\scriptstyle Val}\) rules illustrate how the results of pure expressions are tracked via named ens results. The \(\mathbf{\scriptstyle Ref}\) rule results in a new, existentially-quantified location being added to the current state. The \(\mathbf{\scriptstyle Deref}\) and \(\mathbf{\scriptstyle Assign}\) rules are similar, both requiring proof that a named location exists with a value, then respectively either returning the value of the location and leaving it unchanged, or changing the location and returning the unit value. \(\mathbf{\scriptstyle Assert}\) checks the current heap state without modifying it using the \(@R\) read-only annotation. \(\mathbf{\scriptstyle If}\) introduces disjunction. \(\mathbf{\scriptstyle Let}\) sequences expressions, renaming the intermediate result of \(e_1\) accordingly; the scope of x in \(e_2\) is represented by the scope of the introduced existential in the conclusion of the rule.
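As a small worked example (a sketch under the rules above; the exact constants of the figures in Sect. 2.1 are not reproduced here): for \({\textbf {let}}~a = {!x}~{\textbf {in}}~x := a{+}1\), the \(\mathbf{\scriptstyle Deref}\), \(\mathbf{\scriptstyle Assign}\) and \(\mathbf{\scriptstyle Let}\) rules derive a staged formula that compacts (Sect. 3.2) to \({\exists }\,{a}{\,{.}\,}{\textbf {req}}{\,}{{x}}\,{\mapsto }\,{{a}};{\textbf {ens}}[\_]{\,}{{x}}\,{\mapsto }\,{{a{+}1}}\), in the spirit of Stages 1–2 of the \( hello \) example.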

The \(\mathbf{\scriptstyle Lambda}\) rule handles function definitions annotated with a given specification \({\mathrm {\Phi }}_s\). The body of the lambda is summarized into \({\mathrm {\Phi }}_p\), starting from pure information \( {{ Pure({\mathrm {\Phi }}) }} \) from its program context. Its behavior must be subsumed by the given specification. The result is then the lambda expression itself.

The \(\mathbf{\scriptstyle Call}\) rule is completely trivial, yet perhaps the most illuminating as to the design of HSSL. A standard modular verifier would utilize this rule to look up the specification associated with f, prove its precondition, then assume its postcondition. In our setting, however, there is the possibility that f is higher-order, unknown, and/or unspecified. Moreover, there is no need to prove the precondition of f immediately, due to the use of flows for describing program behaviors. Both of these point to the simple use of a function stage, which stands for a possibly-unknown function call. Utilizing the specification of f, if it is provided, is deferred to the unfolding done in the entailment procedure.

We prove soundness of these rules, which is to say that derived specifications faithfully overapproximate the programs they are derived from. In the following theorem, \(e, h, S \leadsto R, h_1, S_1\) is a standard big-step reduction relation whose definition we leave to Appendix I.1 [15]. Termination is also considered in Appendix I.5 [15]. However, completeness is yet to be established.

Theorem 2

(Soundness of Forward Rules). Given \(\{~{{ emp }}~\}~e~\{~{\mathrm {\Phi }}~\}\), then \(S, h \models {\mathrm {\Phi }} \rightsquigarrow S_2, h_1, Norm (v) \Rightarrow \exists S_1 \,{.}\,e, h, S \leadsto Norm (v),h_1, S_1\) and \(S_1 \subseteq S_2\) and \(S_1(r) = v\).

Proof

By induction on the derivation of \(e, h, S \leadsto R_1,h_1, S_1\). See Appendix I.3 [15].

5 Staged Entailment Checking and Its Soundness

In this section, we outline how entailments of the form \(F\vdash {\mathrm {\Phi }}_p\mathrel {\sqsubseteq }{\mathrm {\Phi }}_s\) may be automatically checked. F denotes heap and pure frames that are propagated by our staged logics entailment rules. Our entailment is always conducted over the compacted form, where non-recursive staged predicate definitions are unfolded, while unknown predicates are matched exactly. Lemmas are also used to try to re-summarize each instantiation of recursive staged predicates to simpler forms, where feasible. As staged entailment ensures that all execution traces that satisfy \({\mathrm {\Phi }}_p\) must also satisfy \({\mathrm {\Phi }}_s\), we rely on the theory of behavioral subtyping [20] to relate them. Specifically, we check that contravariance holds for precondition entailment, while covariance holds for postcondition entailment, as follows:

figure s
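Schematically (a hedged rendering; the precise frame bookkeeping is given in Appendix G [15]): to prove \({\textbf {req}}{\,}D_1;{\mathrm {\Phi }}_1 \mathrel {\sqsubseteq } {\textbf {req}}{\,}D_2;{\mathrm {\Phi }}_2\), the preconditions are related contravariantly via \(D_2 \,{\vdash }\,D_1\,{{ * }}\,F_r\), and the residue \(F_r\) is carried into the remaining proof of \({\mathrm {\Phi }}_1 \mathrel {\sqsubseteq } {\mathrm {\Phi }}_2\); dually, to prove \({\textbf {ens}}[\_]{\,}D_1;{\mathrm {\Phi }}_1 \mathrel {\sqsubseteq } {\textbf {ens}}[\_]{\,}D_2;{\mathrm {\Phi }}_2\), the postconditions are related covariantly via \(D_1 \,{{ * }}\,F \,{\vdash }\,D_2\,{{ * }}\,F_r\), where \(F\) is the incoming frame.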

More details of the staged entailment rules are given in Appendix G [15]. Note that we use another entailment over separation logic, \(D_1 \,{\vdash }\,D_2\,{{ * }}\,F_r\), that can propagate a residual frame \( F_r \). Lastly, we outline the soundness of staged entailment against the semantics of staged formulae, ensuring that all derivations are valid.

Theorem 3

(Soundness of Entailment). Given \( {\mathrm {\Phi }}_1\mathrel {\sqsubseteq }{\mathrm {\Phi }}_2\) and \(S, h \models {\mathrm {\Phi }}_1 \rightsquigarrow S_1, h_1, R\), then there exists \(h_2\) such that \(S, h \models {\mathrm {\Phi }}_2 \rightsquigarrow S_1, h_2, R\), where \(h_2 \subseteq h_1\). (Here, \(h_1 \subseteq h_2\) denotes that  \(\exists \,h_3\,{.}\, h_1 \circ h_3 = h_2\).)

Proof

By induction on the derivation of \({\mathrm {\Phi }}_1\mathrel {\sqsubseteq }{\mathrm {\Phi }}_2\). See Appendix I.4 [15].

Table 1. A Comparison with Cameleer and Prusti. (Programs that are natively inexpressible are marked with “✗”. Programs that cannot be reproduced from Prusti’s artifact [1] are marked with “-” denoting incomparable. We use T to denote the total verification time (in seconds) and \(T_{P}\) to record the time spent on the external provers.)

6 Implementation and Initial Results

We prototyped our verification methodology in a tool named Heifer [13]. Our tool takes input programs written in a subset of OCaml annotated with user-provided specifications. It analyzes input programs to produce normalized staged formulae (Sect. 3.2, Sect. 4), which it then translates to first-order verification conditions (Sect. 5) suitable for an off-the-shelf SMT solver. Here, our prototype targets SMT encodings via Why3 [11]. As an optimization, it uses Z3 [8] directly for queries which do not require Why3’s added features.

We have verified a suite of programs [14] involving higher-order functions and closures (Table 1). As the focus of our work is to explore a new program logic and subsumption-based verification methodology (rather than to verify existing programs), the benchmarks are small in size, and are meant to illustrate the style of specification and give a flavor of the potential for automation.

Table 1 provides an overview of the benchmark suite. The first two sub-columns show the size of each program (LoC) and the number of lines of user-provided specifications (LoS) required. The next two give the total wall-clock time taken (in seconds) to verify all functions in each program against the provided specifications, and the amount of time spent in external provers.

The next column shows the same programs verified using Cameleer [23, 26], a state-of-the-art deductive verifier. Cameleer serves as a good baseline for several reasons: it is representative of the dominant paradigm of pre/post specifications and, like Heifer, targets (a subset of) OCaml. It supports higher-order functions in both programs and specifications [27]. The most significant differences between Cameleer and Heifer are that Cameleer does not support effectful higher-order functions and is intended to be used via the Why3 IDE in a semi-interactive way (allowing tactic-like proof transformations, used in the above programs).

The last column shows results for Prusti [32]. Although Prusti targets Rust, whose ownership type system differs from OCaml’s, we compare against it because of its state-of-the-art support for mutable closures, highlighting the differences below. While we were able to reproduce the claims made in Prusti’s OOPSLA 2021 artifact [1], we were not able to verify many of our own benchmark programs for two technical reasons: lacking support for Rust’s impl Trait (to return closures), and for ML-like cons lists (which caused timeouts and crashes). Support for closures is also not yet in mainline Prusti [2]. Nevertheless, we verified the programs we could with the artifact, the results of which are shown in Table 1. All experiments were performed on macOS using a 2.3 GHz Quad-Core Intel Core i7 CPU with 16 GB of RAM. Why3 1.7.0 was used, with SMT solvers Z3 4.12.2, CVC4 1.8, and Alt-Ergo 2.5.2. The Prusti artifact, a Docker image, was run using Moby 25.0.1.

User annotations required. Significantly less specification than code is required in Heifer, with an average LoS/LoC ratio of 0.37. This is helped by two things: the use of function stages in specifications, and the use of biabduction-based normalization, which allows the specifications of functions to be mostly automated, requiring only properties and auxiliary lemmas to be provided. In contrast, Cameleer’s ratio is 2.49, due to the need to adequately summarize the behaviors of the function arguments and accompany these summaries with invariants and auxiliary lemmas. Two examples illustrating this are detailed in Appendix F [15]. Prusti’s ratio is 0.73, but a caveat is that its programs exercise only closure reasoning, without lemmas or summarization.

Expressiveness. Heifer is able to express many programs that Cameleer cannot, particularly closure-manipulating ones. This accounts for the ✗ rows in Table 1. While some of these can be verified with Prusti, unlike stages, Prusti’s call descriptions do not capture ordering [1, 10]; an explicit limitation, as shown by the ✗ rows in Prusti’s column. Prusti is able to use history invariants and the ownership guarantees of the Rust type system, but this difference is more than mitigated in Heifer by the adoption of an expressive staged logic with spatial heap state, which is more appropriate for the weaker (but more general) type system of OCaml.

7 Related Work

The use of sequential composition in specifications goes back to classic theories of program refinement, such as Morgan’s refinement calculus [21] and Hoare and He’s Unifying Theories [17], as well as session types [9] and logics [6]. It has also been used to structure verification conditions and give users control over the order in which they are given to provers [16], allowing more reliable proof automation. We extend both lines of work, developing the use of sequential composition as a precise specification mechanism for higher-order imperative functions, and using it to guide entailment proofs of staged formulae.

Higher-order imperative functions were classically specified in program logics using evaluation formulae [18] and reference-reachability predicates [34]. The advent of separation logic has allowed for simpler specifications using invariants and nested triples (Sect. 1). These techniques are common in higher-order separation logics, such as HTT [22], CFML [5], Iris [19] and Steel/Pulse [30], which are encoded in proof assistants (e.g., Coq, F\(\star \) [29]) that do not natively support closures or heap reasoning. While the resulting object logics are highly expressive, they are much more complex (owing to highly nontrivial encodings) and consequently less automated than systems that discharge obligations via SMT. We push the boundaries in this area by proposing stages as a new, precise specification mechanism which is compatible with automated verification.

The guarantees of an expressive type system can significantly simplify how higher-order state is specified and managed. Prusti [32] exploits this with call descriptions (an alternative to function stages, as pure assertions saying that a call has taken place with a given pre/post) and history invariants, which rely on the ownership of mutable locations that closures have in Rust. Creusot [10] uses a prophetic mutable value semantics to achieve a similar goal with pre/post specifications of closures. Our solution is not dependent on an ownership type system, applying more generally to languages with unrestricted mutation.

Defunctionalization [24] is another promising means of reasoning about higher-order effectful programs [27], pioneered by the Why3-based Cameleer [23]. This approach currently does not support closures.

Our approach to automated verification is currently based on strict evaluation. It would be interesting to see how staged specifications can be extended to support verification of lazy programs, as had been explored in [31, 33].

8 Conclusion

We have explored how best to modularly specify and verify higher-order imperative programs. Our contributions are manifold: we propose a new staged specification logic, rules for deriving staged formulae from programs and normalizing them using biabduction, and an entailment proof system. This forms the basis of a new verification methodology, which we validate with our prototype Heifer.

To the best of the authors’ knowledge, this work is the first to introduce a fundamental staged specification mechanism for verifying higher-order imperative programs without any presumptions: it is more concise (without the need to specify abstract properties) and more precise (without imposing preconditions on function-typed parameters) when compared to existing solutions.