1 Introduction

SMT-based program analysis and verification have advanced dramatically in the past two decades. These advances have been partly fuelled by major improvements in SAT and SMT solving techniques, as well as their implementations in state-of-the-art solvers such as Z3 [22] and cvc5 [2]. Building on these advances, many program analysis and verification tools rely on SMT solvers, including Dafny [17], Why3 [12] and Viper [24].

Such tools must translate a wide range of problem features into SMT queries that model these domain-specific concerns. While some theories relevant to problem features (e.g. linear arithmetic [22]) are natively supported by SMT solvers, most problem features must be modelled by axiomatisation.

Axiomatising problem features involves introducing uninterpreted sorts, uninterpreted functions on these sorts, and (crucially) quantifiers (Footnote 1) that define the intended meaning of these features. For instance, one can model sets of integers by introducing a sort \(\textit{Set}\) for sets, uninterpreted functions \(\textit{member}\) and \(\textit{diff}\) to represent set membership and set difference respectively, and quantifiers such as \(\forall s_{1},s_{2}:\textit{Set},\,x:\textit{Int}.\;\textit{member}(x,s_{2})\rightarrow \lnot \textit{member}(x,\textit{diff}(s_{1},s_{2}))\).

Such modelling is expressive, but makes heavy use of quantifiers that must be instantiated during SMT solving. Quantifier instantiation is notoriously challenging: it can cause slow performance and even non-termination, as well as unexpectedly failing proofs [4, 19]. Worse still, latent quantifier instantiation issues may not surface on all runs, but cause a “butterfly effect” [16]: seemingly unrelated changes to an input problem may lead to substantial changes in solver behaviour.

To manage these issues, solvers allow quantifiers to be annotated with instantiation triggers (a.k.a. instantiation patterns). Triggers specify (possibly multiple) shapes of ground terms that must be known (occur in the current proof context, modulo known equalities) to enable a quantifier instantiation. This method of guiding quantifier instantiation is referred to as E-matching [8, 25] and is supported by virtually all modern SMT solvers.

However, selecting appropriate triggers is an art. The choice requires expertise in managing a fine balance: not too restrictive, to avoid insufficient quantifier instantiations for proofs, and not too permissive, to prevent excessive instantiations. Subtle issues can easily lead to the same hard-to-debug problems even for the most talented of SMT artists [16, 19], and even when successful it is unclear how one can know that the chosen triggers are guaranteed to work in the future.

The ideal aim is to achieve both instantiation completeness and instantiation termination. Instantiation completeness means that all necessary quantifier instantiations for a proof can be made by the solver. Instantiation termination means that the solver will never endlessly explore infinitely many quantifier instantiations. In this paper, we focus on instantiation termination (Footnote 2).

Failures of instantiation termination stem from matching loops: the problematic scenario of a quantifier instantiation (possibly indirectly) leading to learning new terms that cause further instantiations of the same quantifier, potentially creating an endless loop. Matching loops can cause non-termination, but (problematically, for debugging) may only do so on some runs (in case heuristics in the solver arrive at the facts necessary to complete a proof “in time”).

Our paper enables proving that matching loops have been avoided altogether. We present a high-level formal model of E-matching-based quantifier instantiation that suffices to prove once and for all that a given set of trigger-annotated quantifiers, when combined with any possible ground facts, guarantees instantiation termination, thereby ensuring the absence of matching loops. Our model is designed to be broadly applicable because it models the core E-matching rules common to most solvers, but abstracts over implementation details where individual solvers make different choices. Our model enables formal termination proofs based on familiar concepts from program reasoning, with manageable complexity, allowing axiomatisation practitioners to independently construct these proofs and confidently seek terminating responses to ground theory queries.

Our main technical contributions are as follows:

  1. We develop a formal model for reasoning about instantiation termination in E-matching-based axiomatisations. The model abstracts from solver implementation details but accounts for the essential features necessary for rigorous instantiation termination proofs.

  2. We validate the practical utility of our formal model by using it to prove instantiation termination of a challenging set theory axiomatisation adapted from the cores of those used in the Dafny and Viper verifiers.

  3. We outline a methodology for constructing instantiation termination proofs using our model. Our methodology involves classifying quantifiers according to certain characteristics, using these to incrementally define and refine a progress measure that eventually supports the whole axiomatisation.

Our research draws inspiration from Dross et al.’s [11] prior formalism for quantifier instantiation via E-matching. To the best of our knowledge, their work represents the sole formal attempt in this space before ours. However, we find their formalism incompatible with our goals: we elaborate on this point in Sect. 5.

Full details and supporting proofs are available in our technical report (TR hereafter) [13].

2 Problem Statement

We begin with a basic grounding in E-matching, and use this to lay out the most important challenges a formal model needs to address to be useful in practice.

2.1 Quantifier Instantiation via E-matching

Quantifiers are crucial for effectively modelling external problem features as an SMT problem. However, when determining whether such a first-order problem is satisfiable, an SMT solver must contend with quantifiers ranging over infinite sorts. A successful proof will (and need) only involve finitely many instantiations of the quantifiers, but selecting these is in general undecidable. Most solvers provide E-matching as the main means of guiding instantiation.

E-matching requires each quantifier to be associated with instantiation triggers (a.k.a. instantiation patterns). Triggers consist of terms containing the quantified variables, and prescribe that instantiations should only be made when ground terms of matching shape(s) arise in the current proof search.

During a proof search, SMT solvers maintain and update the currently-known ground terms and (dis)equalities on them in an efficient congruence-closure data structure called an E-graph. This information enables E-matching [21, 25]—matching modulo currently-known equalities—of known terms against quantifier triggers, which enables new instantiations, and of potential instantiations against previous ones, which prevents redundant instantiations.

Example 1

Consider the set theory axiom presented early in Sect. 1, now annotated with triggers (written comma-separated inside square brackets; see Footnote 3):

$$ \forall s_{1},s_{2},x.\left[ \textit{diff}(s_{1},s_{2}),\textit{member}(x,s_{2})\right] \textit{member}(x,s_{2})\rightarrow \lnot \textit{member}(x,\textit{diff}(s_{1},s_{2})) $$

The trigger consists of two terms, \(\textit{diff}(s_{1},s_{2})\) and \(\textit{member}(x,s_{2})\); a multi-term trigger prescribes that terms matching all (here, both) patterns must be known for some instantiation of the quantified variables. If so, the corresponding instantiation of the quantifier itself will be made: the instantiated quantifier body (Footnote 4) will be treated as a newly-derived fact (typically, a clause), and the solver will also record that this instantiation has been made (to avoid doing so again).

Suppose that an E-graph represents the congruence closure of the facts: \(\textit{member}(t,a){=}\top \), \(\textit{diff}(b,c){\ne }b\) and \(a{=}c\). E-matching will find a successful match against the trigger above; although it might seem that there is no consistent pair of terms here, the equality \(a=c\) means that (modulo equalities) we can consider the terms \(\textit{member}(t,a)\) and \(\textit{diff}(b,a)\) as known in the E-graph, which match the triggers under the instantiation \(s_{1}{\mapsto }b\), \(s_{2}{\mapsto }a\) and \(x{\mapsto }t\). The corresponding instantiation of the quantifier body yields \(\lnot \textit{member}(t,a)\vee \lnot \textit{member}(t,\textit{diff}(b,a))\). Subsequently, the same quantifier cannot be instantiated with e.g. \(s_{1}{\mapsto }b\), \(s_{2}{\mapsto }c\) and \(x{\mapsto }t\) since, again modulo equalities, this is an equivalent instantiation.
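As an informal illustration only (not part of our formal model), the match in Example 1 can be replayed with a small Python sketch. It assumes a deliberately simplified setting: terms are nested tuples, pattern variables are strings starting with `?`, and known equalities relate constants only, so a ground term is normalised by rewriting each constant to its class representative. All names (`find`, `canon`, `match`, `ematch`) are hypothetical helpers; a real E-graph performs full congruence closure.

```python
def find(parent, x):
    """Representative of constant x in a tiny union-find over constants."""
    while parent.get(x, x) != x:
        x = parent[x]
    return x

def canon(parent, term):
    """Rewrite every constant of a ground term to its class representative."""
    if isinstance(term, str):
        return find(parent, term)
    return (term[0],) + tuple(canon(parent, a) for a in term[1:])

def match(pattern, term, subst):
    """Syntactic matching of one pattern against one (canonical) ground term."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(pattern, str) or isinstance(term, str):
        return subst if pattern == term else None
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

def ematch(trigger_set, known_terms, parent):
    """All substitutions under which every pattern of the trigger set is known."""
    known = {canon(parent, t) for t in known_terms}
    substs = [{}]
    for pattern in trigger_set:
        substs = [s2 for s in substs for t in known
                  if (s2 := match(pattern, t, s)) is not None]
    return substs

# Trigger [diff(s1, s2), member(x, s2)] and the known terms of Example 1.
TRIGGER = [("diff", "?s1", "?s2"), ("member", "?x", "?s2")]
KNOWN = [("member", "t", "a"), ("diff", "b", "c")]

with_eq = ematch(TRIGGER, KNOWN, {"c": "a"})  # the equality a = c is known
without_eq = ematch(TRIGGER, KNOWN, {})       # no equalities known
```

Under these assumptions, `with_eq` contains the single instantiation \(s_{1}{\mapsto }b\), \(s_{2}{\mapsto }a\), \(x{\mapsto }t\), while `without_eq` is empty: the match exists only modulo the equality \(a=c\).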

Example 2

Consider a variant of the previous quantifier, modified with a different trigger, and in the context of a different E-graph that represents instead the congruence closure of the facts: \(\textit{member}(t,a){=}\top \) and \(\textit{member}(t,b){=}\top \).

$$ \begin{array}{l} \forall s_{1},s_{2},x.\left[ \textit{member}(x,s_{1}),\textit{member}(x,s_{2})\right] \\ \quad \quad \quad \textit{member}(x,s_{2})\rightarrow \lnot \textit{member}(x,\textit{diff}(s_{1},s_{2})) \end{array} $$

Now four instantiations are enabled: one for each pair of \(\textit{member}\) applications in our current model (and E-graph): e.g. instantiating \(s_{1}{\mapsto }a\), \(s_{2}{\mapsto }b\) and \(x{\mapsto }t\) or \(s_{1}{\mapsto }b\), \(s_{2}{\mapsto }a\) and \(x{\mapsto }t\). All four will be made: they are different choices since we don’t know that \(a=b\). The second, for example, causes the new clause (rewritten as a disjunction) \(\lnot \textit{member}(t,a)\vee \lnot \textit{member}(t,\textit{diff}(b,a))\) to be assumed. This doesn’t change the E-graph (which is populated only by assumed literals); clauses are kept separately in the prover state. However, case-splitting on this clause may lead to the literal \(\lnot \textit{member}(t,\textit{diff}(b,a))\) being added. At this point, five new quantifier instantiations will be enabled; the number of pairs of \(\textit{member}\) applications has increased. In fact, by alternately instantiating this quantifier and case-splitting on newly-learned clauses, we can uncover new instantiations indefinitely, in a so-called matching loop.
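The counting argument in Example 2 can be checked with a short Python sketch. This is an illustration under simplifying assumptions: no equalities are known (so syntactically different terms are distinct), each known \(\textit{member}\) application is stored as a pair `(x, s)`, and the helper names `enabled` and `worst_case_counts` are hypothetical. The second function models an adversarial solver that always case-splits on the disjunct introducing a new \(\textit{member}\) term.

```python
from itertools import product

def enabled(members, history):
    """Instantiations (s1, s2, x) matching [member(x, s1), member(x, s2)]
    that have not been made before."""
    return {(s1, s2, x)
            for (x, s1), (y, s2) in product(members, repeat=2)
            if x == y} - history

# Initial E-graph of Example 2: member(t, a) and member(t, b).
members = {("t", "a"), ("t", "b")}
history = set()

first = enabled(members, history)       # the four initial instantiations
history |= first

# Case-splitting on the clause from s1 -> b, s2 -> a assumes the literal
# not member(t, diff(b, a)), so the term member(t, diff(b, a)) becomes known.
members.add(("t", ("diff", "b", "a")))
second = enabled(members, history)      # instantiations newly enabled

def worst_case_counts(rounds):
    """Enabled-instantiation counts if every round assumes all new diff literals."""
    members, history, counts = {("t", "a"), ("t", "b")}, set(), []
    for _ in range(rounds):
        new = enabled(members, history)
        counts.append(len(new))
        history |= new
        members |= {(x, ("diff", s1, s2)) for (s1, s2, x) in new}
    return counts
```

Here `len(first)` is 4 and `len(second)` is 5, matching the numbers in the example, and the sequence returned by `worst_case_counts` keeps growing from round to round, which is the matching loop.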

These first examples show that the choice of triggers affects instantiation behaviour, and that modelling instantiations requires considering not only initial terms, but also facts learned during proof search and case-splitting choices.

Example 3

Consider the following “subset elimination” axiom (from the set theory axiomatisation we tackle later) with nested quantifiers:

$$ \begin{array}{l} \forall s_{1},s_{2}.\left[ \textit{subset}(s_{1},s_{2})\right] \;\textit{subset}(s_{1},s_{2})\rightarrow \\ \quad \quad \left( \forall x.[\textit{member}(x,s_{1})][\textit{member}(x,s_{2})]\;\textit{member}(x,s_{1})\rightarrow \textit{member}(x,s_{2})\right) \end{array} $$

The inner quantifier has two triggers, defining alternative conditions for instantiation (a term of either shape is sufficient). Note that these triggers depend on the outer-quantified variables \(s_1\) and \(s_2\), and thus their instantiations.

Instantiating an outer quantifier expands the set of current quantifiers available for instantiation. In this example, instantiating the outer quantifier (\(\forall s_1, s_2.\dots \)) results in a clause that includes a copy of the inner quantifier (\(\forall x.\dots \)); case-splitting on this clause can cause the copy to be assumed, effectively adding one more quantifier for future potential instantiations. As such, the instantiation of outer quantifiers dynamically introduces new quantifiers, which complicates termination arguments: one must be able to identify and predict the quantifiers (and their instantiations) that will be dynamically introduced.

2.2 Objectives for a Formal Model of E-matching

Given the difficulty of choosing quantifier triggers and of knowing that their instantiations can never continue forever, our objective is to provide a formal and usable means of constructing E-matching termination proofs once and for all. Rather than attempt to capture the precise behaviour of a specific solver and its configuration, we want a model that abstracts over the behaviours of any reasonable implementation of E-matching, while still being sufficiently precise for the proofs to work and be reasonable to construct in practice.

The design of a model for E-matching must address multiple challenges:

  1. How should (intermediate) solver states and the transitions between them be modelled, avoiding over-fitting to specific solver choices while retaining clear and pertinent information suitable for understandable proofs?

  2. How should equality-related information and reasoning be captured, given their central nature (for defining enabled E-matches) but the complexities of the data structures employed in real implementations?

  3. How can nested quantifiers (cf. Example 3), when instantiations can introduce new quantifiers on the fly, be supported?

  4. How can we make the model extensible to more-complex future applications (e.g. axiomatisations whose termination depends on theory reasoning)?

  5. How can a formal model enable formal proofs with manageable complexity?

We present our model, designed to address these challenges, in the next section; we demonstrate its applicability for termination proofs in Sect. 4.

3 An Operational Semantics for E-matching

We develop our formal model in the style of a small-step operational semantics, a popular choice for programming languages. In this operational style, states represent intermediate points of a proof search, while transitions represent solver steps; non-determinism abstracts over choices specific solvers make. With this design, our desired notion of instantiation termination can be recast as a familiar style of termination proof, albeit against a semantics with novel core details.

3.1 Preliminaries

Our syntax for formulas is based around a generalisation of conjunctive normal form, used internally in SMT algorithms; we assume all formulas are pre-converted to this form (existential quantifiers are eliminated by Skolemisation).

Definition 1

(Formula Syntax). We assume a pre-defined set of atoms (Footnote 5), including equalities on terms \(t_{1}=t_{2}\). A (simple) literal l is either an atom or its negation. The grammars of extended literals \(\phi \), extended clauses C and extended conjunctive normal form (ECNF) formulas A are as follows:

$$ \begin{array}{rrlrrlrrl} \phi & :\,\!:=& l \mid (\forall \overrightarrow{x}.\overrightarrow{\left[ T\right] }A)^{\sharp \alpha } & \quad \quad C & :\,\!:=& \phi \mid C\vee C & \quad \quad A & :\,\!:=& C \mid A\wedge A \end{array} $$

Here, \((\forall \overrightarrow{x}.\overrightarrow{\left[ T\right] }A)^{\sharp \alpha }\) denotes a tagged quantifier: the (possibly-multiple) variables \(\overrightarrow{x}\) are bound, the (possibly-multiple) trigger sets \(\overrightarrow{T}\) are each marked with square brackets and positioned before the quantifier body A, and \(\sharp \alpha \) is a tag used to uniquely identify this particular quantifier (see also Sect. 3.6).

As presented in Example 1, a trigger set T is a (non-empty) set of terms, written comma-separated. There are additional requirements: each trigger set must contain each quantified variable at least once, and each term must contain at least one quantified variable. Furthermore, each term must contain at least one uninterpreted function application and no interpreted function symbols such as equalities. These restrictions are common for SMT solvers.
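The trigger-set restrictions above can be phrased as a simple syntactic check. The following Python sketch is only an illustration: the term encoding (nested tuples, variables as strings starting with `?`), the function names, and the choice of `=` as the sole interpreted symbol are assumptions made for the example.

```python
def variables(term):
    """Pattern variables occurring in a term."""
    if isinstance(term, str):
        return {term} if term.startswith("?") else set()
    return set().union(*(variables(a) for a in term[1:]))

def heads(term):
    """Function symbols applied anywhere in a term."""
    if isinstance(term, str):
        return set()
    return {term[0]}.union(*(heads(a) for a in term[1:]))

def well_formed(trigger_set, bound_vars, interpreted=frozenset({"="})):
    """Check the restrictions on a single trigger set."""
    bound = set(bound_vars)
    if not trigger_set:                      # a trigger set is non-empty
        return False
    covered = set()
    for term in trigger_set:
        vs, hs = variables(term), heads(term)
        if not vs & bound:                   # each term mentions a quantified variable
            return False
        if not hs or hs & set(interpreted):  # some function application, none interpreted
            return False
        covered |= vs
    return bound <= covered                  # the set mentions every quantified variable

BOUND = ["?s1", "?s2", "?x"]
ok = well_formed([("diff", "?s1", "?s2"), ("member", "?x", "?s2")], BOUND)
missing_var = well_formed([("member", "?x", "?s2")], BOUND)
uses_equality = well_formed([("=", "?x", "?s1"), ("diff", "?s1", "?s2")], BOUND)
bare_variable = well_formed(["?x", ("diff", "?s1", "?s2")], BOUND)
```

With this encoding, the trigger set of Example 1 is accepted, while the other three candidates are rejected: one omits \(s_{1}\), one uses an equality, and one contains a term with no function application.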

When quantifier tags are not relevant, we omit them for brevity.

3.2 States

As illustrated in Examples 1 and 2, both case-splitting and quantifier instantiation steps are crucial to our problem; we define our semantics around these two kinds of transitions. Furthermore, we must abstractly capture information relevant for deciding E-matching questions, tracking in particular which terms and equalities are known (modulo currently known equalities), and which quantifier instantiations have already been made.

Definition 2

(States). States \(s\in \textsc {State}\) are defined as follows:

$$ s:\,\!:=\left\langle W,A,E\right\rangle \mid \lozenge \mid \bot $$

where \(\lozenge \) and \(\bot \) are distinguished symbols for saturated and inconsistent states, W (the current quantifiers) is a set of tagged quantifiers, A (the current clauses) is a set of extended clauses, and E (the current E-state) is explained below.

For simple applications of our semantics, the set of current quantifiers remains fixed, but for problems with nested quantifiers (e.g. Example 3), it may grow as a solver runs. As we show, which instantiations are immediately enabled is definable in terms of both the current quantifiers and the current E-state. The current clauses, on the other hand, feed both of these components: case-splitting on them adds new quantifiers to the current quantifiers and new literals to the E-state. New extended clauses may in turn be added as a consequence of quantifier instantiations.

The inconsistent and saturated states represent two different termination conditions for traces in our semantics: the former due to logical inconsistency, and the latter due to all quantifier instantiations having been exhausted.

3.3 E-interfaces

Each solver maintains its own implementation of E-graphs to efficiently represent and query the currently-known ground terms modulo congruences and known equalities. Rather than formalising such an implementation, we devise an abstraction called an E-interface, capturing the operations and expected mathematical properties of E-graph implementations.

Definition 3

(E-interface Judgements). An E-interface \(E^{\textrm{I}}\) is a set of equalities and disequalities on terms (Footnote 6). We write \(E^{\textrm{I}}\Vdash _{\textrm{kn}}t\) to express that the ground term t is known in the E-interface \(E^{\textrm{I}}\); we write \(E^{\textrm{I}}\Vdash t_{1}\sim t_{2}\) to express that the ground terms \(t_1\) and \(t_2\) are known equal in \(E^{\textrm{I}}\). These two judgements are (mutually recursively) defined by (the least fixed-point of) the derivation rules:

[Figure a: derivation rules defining the judgements \(E^{\textrm{I}}\Vdash _{\textrm{kn}}t\) and \(E^{\textrm{I}}\Vdash t_{1}\sim t_{2}\).]

The judgement \(E^{\textrm{I}}\Vdash t_{1}\not \sim t_{2}\) represents \(t_1\) and \(t_2\) being known disequal in \(E^{\textrm{I}}\); the judgement \(E^{\textrm{I}}\Vdash \bot \) represents that \(E^{\textrm{I}}\) is inconsistent (in the logical sense); cf. App. A of the TR.

E-interfaces are equivalent if they agree on these judgements in all cases. When a proof step adds new literals, we must be able to extend our E-interfaces.

Definition 4

(E-interface Extension). For a set of equality and disequality literals L, the update of an E-interface \(E^{\textrm{I}}\) with L, denoted \(E^{\textrm{I}}\triangleleft L\), is a minimal E-interface which satisfies all E-interface judgements that \(E^{\textrm{I}}\) does, while also satisfying \((E^{\textrm{I}}\triangleleft L)\Vdash l\) for all \(l\in L\).

We call a set of terms a basis of \(E^{\textrm{I}}\) if each element is a representative of a different equivalence class (Footnote 7) induced by the \(E^{\textrm{I}}\Vdash t_{1}\sim t_{2}\) relation on the terms known in \(E^{\textrm{I}}\). As we shall see in the next subsection, equivalence classes are relevant for defining which quantifier instantiations can be made after which.

3.4 E-histories, E-states, E-matching

As illustrated in Example 1, E-matching against triggers does not suffice to determine whether a quantifier instantiation should be considered enabled; we must also determine whether the instantiation is considered redundant given previous ones. We record previous instantiations using our next formal ingredient:

Definition 5

(E-histories and E-states). An E-history \(E^{\textrm{H}}\) is a set of pairs (each denoted \((\sharp \alpha :\overrightarrow{r})\)) in our formalism: the first element is a tag (identifying a quantifier), and the second is a vector of ground terms (representing an instantiation of the corresponding quantifier).

An E-state (cf. Definition 2) E is a pair \((E^{\textrm{I}}, E^{\textrm{H}})\) of E-interface and E-history.

Recall that E-states are a part of the states in our formalism. E-states consist of an E-interface component, which captures the current known terms and equality information, and an E-history component, which records the history of instantiations, in particular representing sufficient information to reject redundant instantiations.

Definition 6

(History-Enabled E-matches). Given a candidate match pair \(\left( \sharp \alpha :\overrightarrow{r}\right) \) (of tag \(\sharp \alpha \) and vector of terms \(\overrightarrow{r}\)), the E-state E enables \(\left( \sharp \alpha :\overrightarrow{r}\right) \), written \(E\Vdash _{\textrm{hist}}\left( \sharp \alpha :\overrightarrow{r}\right) \), if: for every instantiation pair \((\sharp \alpha :\overrightarrow{r^{\prime }})\in E^{\textrm{H}}\), at least one of the pointwise equalities \(\overrightarrow{r_i\sim r_i^\prime }\) is not known in \(E^{\textrm{I}}\).

Example 4

Revisiting Example 1, suppose the tag of the quantifier is \(\sharp \tau \), E is the E-state whose E-interface component contains the example literals. The first instantiation \(s_{1}{\mapsto }b\), \(s_{2}{\mapsto }a\) and \(x{\mapsto }t\) is represented in our formal model by adding \(\left( \sharp \tau :\left( b,a,t\right) \right) \) to the E-history, resulting in a new E-state, say \(E'\). The second candidate match \(s_{1}{\mapsto }b\), \(s_{2}{\mapsto }c\) and \(x{\mapsto }t\) is not enabled in \(E'\) since the three pointwise equalities between instantiated terms are all known in \(E'\).
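Definition 6 and Example 4 can be mirrored by a few lines of Python. This is a sketch under stated assumptions: instantiation terms are constants, the E-interface is approximated by a parent map `parent` over constants (a hypothetical stand-in for the \(\sim \) judgement), and the tag is a plain string, ignoring the tag parameters of Sect. 3.6.

```python
def find(parent, x):
    """Representative of constant x under the known equalities."""
    while parent.get(x, x) != x:
        x = parent[x]
    return x

def hist_enabled(parent, history, tag, r):
    """E enables (tag : r) iff every earlier instantiation with the same tag
    differs from r in at least one position, modulo known equalities."""
    for prev_tag, prev_r in history:
        if prev_tag != tag:
            continue
        if all(find(parent, a) == find(parent, b) for a, b in zip(r, prev_r)):
            return False  # all pointwise equalities are known: redundant
    return True

parent = {"c": "a"}                    # the known equality a = c
history = {("tau", ("b", "a", "t"))}   # first instantiation of Example 4

redundant = hist_enabled(parent, history, "tau", ("b", "c", "t"))
fresh = hist_enabled(parent, history, "tau", ("a", "b", "t"))
other_tag = hist_enabled(parent, history, "sigma", ("b", "a", "t"))
```

Under these assumptions `redundant` is `False` (the second candidate of Example 4 is rejected), whereas `fresh` and `other_tag` are `True`: a candidate is only blocked by an earlier instantiation of the same quantifier with pointwise-equal terms.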

With the help of the above ingredients, we formally characterise E-matching:

Definition 7

(E-matching). For a given state \(\left\langle W,A,E\right\rangle \), the judgement \(\left\langle W,A,E\right\rangle \vdash _{\textrm{match}}(\forall \overrightarrow{x}.\overrightarrow{\left[ T\right] }A^{\prime })^{\sharp \alpha }\sphericalangle \overrightarrow{r}\) defines which instantiations (using terms \(\overrightarrow{r}\)) of which quantifiers \((\forall \overrightarrow{x}.\overrightarrow{\left[ T\right] }A^{\prime })^{\sharp \alpha }\) are enabled by E-matching rules, as follows:

$$ \dfrac{\begin{array}{c} \begin{array}{cc} (\forall \overrightarrow{x}.\overrightarrow{\left[ T\right] }A^{\prime })^{\sharp \alpha }\in W \quad &{} \quad \overrightarrow{t} \text { is one trigger set of } \overrightarrow{[T]} \\ E^{\textrm{I}}\Vdash _{\textrm{kn}}\overrightarrow{t}\left[ \overrightarrow{r}/\overrightarrow{x}\right] \quad &{} \quad E\Vdash _{\textrm{hist}}\left( \sharp \alpha :\overrightarrow{r}\right) \end{array} \end{array}}{\left\langle W,A,E\right\rangle \vdash _{\textrm{match}}(\forall \overrightarrow{x}.\overrightarrow{\left[ T\right] }A^{\prime })^{\sharp \alpha }\sphericalangle \overrightarrow{r}} $$

We write \(\left\langle W,A,E\right\rangle \not \vdash _{\textrm{match}}\) to mean no instantiations are enabled in this state.

E-matching \(\vdash _{\textrm{match}}\) requires (1) a quantifier in the current state, (2) a trigger set \(\overrightarrow{t}\) whose terms, with the replacement terms \(\overrightarrow{r}\) substituted for the quantified variables \(\overrightarrow{x}\), are known in \(E^{\textrm{I}}\), and (3) that this potential match is enabled by the E-state E. Note that (2) implies that the terms \(\overrightarrow{r}\) themselves are known in the current E-interface \(E^{\textrm{I}}\), since each quantified variable occurs in every trigger set (cf. Sect. 3.1).

3.5 State Transitions

The last main ingredient of our formal model is the definition of state transitions.

Definition 8

(State Transitions). The (single step) state transition relation \(\longrightarrow \,\subseteq \textsc {State}\times \textsc {State}\) is defined by the union of the following cases:

$$ \dfrac{\begin{array}{c} \emptyset \subset \varPhi \subseteq \left\{ \phi _{i}\mid C\in A;\;W_{1},E_{1}^{\textrm{I}}\not \Vdash _{\textrm{sat}}C;\;C\text { is }\cdots \vee \phi _{i}\vee \cdots \right\} \\[2pt] W_{2}={W_{1}}\cup {\textrm{filter}_{\forall }\left( \varPhi \right) } \quad E_{2}^{\textrm{I}}=E_{1}^{\textrm{I}}\triangleleft \textrm{filter}_{\textrm{lit}}\left( \varPhi \right) \quad E_{2}^{\textrm{H}}=E_{1}^{\textrm{H}} \end{array}}{\left\langle W_{1},A,E_{1}\right\rangle \longrightarrow \left\langle W_{2},A,E_{2}\right\rangle }\textsc {(split)} $$
$$ \dfrac{E^{\textrm{I}}\Vdash \bot }{\left\langle W,A,E\right\rangle \longrightarrow \bot }\textsc {(bot)} $$
$$ \dfrac{\begin{array}{ccc} E^{\textrm{I}}\not \Vdash \bot \quad & \quad W,E^{\textrm{I}}\Vdash _{\textrm{sat}}C\text { for every }C\in A\quad & \quad \left\langle W,A,E\right\rangle \not \vdash _{\textrm{match}}\end{array}}{\left\langle W,A,E\right\rangle \longrightarrow \lozenge }\textsc {(sat)} $$
$$ \dfrac{\begin{array}{c} \left\langle W_{1},A_{1},E_{1}\right\rangle \vdash _{\textrm{match}}(\forall \overrightarrow{x}.\overrightarrow{\left[ T\right] }A_{11})^{\sharp \alpha }\sphericalangle \overrightarrow{r}\\ \begin{array}{cc} A_{12}=A_{11}\left[ \overrightarrow{r}/\overrightarrow{x}\right] &{} A_{12}^{\prime }=\textrm{filter}_{\forall }\left( A_{12}\right) \cup \textrm{filter}_{\textrm{lit}}\left( A_{12}\right) \\ A_{2}=A_{1}\cup \left( A_{12}\backslash A_{12}^{\prime }\right) &{} W_{2}=W_{1}\cup \textrm{filter}_{\forall }\left( A_{12}\right) \\ E_{2}^{\textrm{I}}=E_{1}^{\textrm{I}}\triangleleft \textrm{filter}_{\textrm{lit}}\left( A_{12}\right) &{} E_{2}^{\textrm{H}}=E_{1}^{\textrm{H}}\triangleleft \left( \sharp \alpha :\overrightarrow{r}\right) \end{array} \end{array}}{\left\langle W_{1},A_{1},E_{1}\right\rangle \longrightarrow \left\langle W_{2},A_{2},E_{2}\right\rangle }\textsc {(inst)} $$

where the overloaded operators \(\textrm{filter}_{\forall }\) and \(\textrm{filter}_{\textrm{lit}}\) select quantifiers and simple literals, respectively, from any provided set of extended literals, or from unit clauses of any provided set of extended clauses; the judgement \(W,E^{\textrm{I}}\Vdash _{\textrm{sat}}C\) holds if: for some disjunct \(\phi _{i}\) of C, either \(\phi _{i}\) is a tagged quantifier from W, or \(\phi _{i}\) is a simple literal that \(E^{\textrm{I}}\) knows.

Our state transition relation \(\longrightarrow \) consists of case-splitting steps, steps that deduce the inconsistent state, steps that deduce the saturated state, and quantifier instantiation steps, corresponding to the rules (split), (bot), (sat) and (inst) respectively.

We allow a case-splitting transition to non-deterministically select any non-empty subset of the disjuncts in the unsatisfied current clauses (those that have not yet been made true in the current state). A case-splitting transition must make progress towards satisfying the clauses. We do not impose restrictions on the order in which unsatisfied current clauses are chosen, nor on the number of disjuncts assumed within a clause, provided that progress is being made (Footnote 8).

We model case-splitting as non-deterministic. Recall Example 2, where the clause \(\lnot \textit{member}(t,a)\vee \lnot \textit{member}(t,\textit{diff}(b,a))\) is learnt. Subsequently, the solver can choose to assume either one or both of the disjuncts; generally, it can choose to assume neither disjunct as long as it selects at least one disjunct from some other unsatisfied clause. Here, the disjuncts are ground simple literals (which are added to the E-state); in general, some could be new quantifiers to record.

Our \(\Vdash _{\textrm{sat}}\) judgement checks if a provided clause is satisfied (i.e. at least one disjunct is assumed in the current state). If all current clauses are satisfied, and the E-interface is not inconsistent, and there are no enabled instantiations, the (sat) rule applies and transitions to the saturated state (\(\lozenge \)). Conversely, if the current E-interface is inconsistent, the (bot) rule transitions to the inconsistent state (\(\bot \)); if there are enabled instantiations, the (inst) rule applies.
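The side conditions of the (split), (bot) and (sat) rules can be approximated in Python as follows. The sketch makes strong simplifying assumptions that are not part of our semantics: a simple literal is a pair `(atom, polarity)` with the atom given as a string, "known in the E-interface" is plain set membership (no equality reasoning), quantifier disjuncts are represented by their tags, and the enabled matches are passed in as a list. All function names are hypothetical.

```python
def satisfied(W, known, clause):
    """Approximation of W, E |-sat C: some disjunct is a current
    quantifier tag or a known simple literal."""
    return any(d in W or d in known for d in clause)

def inconsistent(known):
    """Approximation of E |- bot: an atom is known with both polarities."""
    return any((atom, not pol) in known for (atom, pol) in known)

def split_candidates(W, known, clauses):
    """Disjuncts from which (split) may pick a non-empty subset."""
    return {d for c in clauses if not satisfied(W, known, c) for d in c}

def can_saturate(W, known, clauses, enabled_matches):
    """Premises of (sat): consistent, all clauses satisfied, nothing to instantiate."""
    return (not inconsistent(known)
            and all(satisfied(W, known, c) for c in clauses)
            and not enabled_matches)

# The clause learnt in Example 2, with the two initially known literals.
known = {("member(t,a)", True), ("member(t,b)", True)}
clause = (("member(t,a)", False), ("member(t,diff(b,a))", False))

before = split_candidates(set(), known, [clause])
good_split = known | {("member(t,diff(b,a))", False)}
bad_split = known | {("member(t,a)", False)}
after = split_candidates(set(), good_split, [clause])
```

In this toy setting, `before` contains both disjuncts, `after` is empty because the clause has become satisfied, and `bad_split` is inconsistent, so only the (bot) rule would make sense from there.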

The instantiation rule (inst) relies on the \(\vdash _{\textrm{match}}\) judgement to select an instantiation enabled by E-matching rules. The effect of an instantiation transition involves adding quantifiers and simple literals occurring as unit clauses in the quantifier body to the current quantifiers \(W_1\) and E-interface \(E_1^{\textrm{I}}\), respectively; any remaining non-unit clauses are added to the current clauses \(A_{1}\). Finally, the E-history \(E_1^{\textrm{H}}\) is updated to record this instantiation.

In practice, common SMT solvers such as cvc5 [2] perform quantifier instantiation both (1) up-front and (2) in phases interleaved with other solver steps. In particular, the latter is essential for many applications: most quantifier instantiations lead to e.g. clauses requiring context-aware case-splitting via DPLL/CDCL. Our model effectively captures both processes through its unrestricted interleavings of quantifier instantiation and case-splitting steps.

In retrospect, Sects. 3.2 to 3.5 have tackled design challenges #1 and #2 (cf. Sect. 2.2). We address #3 and #4 in the next two subsections, respectively.

3.6 Nested Quantifiers

Example 3 demonstrates that instantiating outer quantifiers in nested structures of quantifiers can introduce new quantifiers on the fly. To effectively argue for termination regarding these instantiations (as will be discussed in Sect. 4), one must be able to identify and predict these dynamically introduced quantifiers. To facilitate this, we employ a tagging system that is capable of handling nested structures (cf. App. A of the TR for details). Each quantifier in an axiomatisation is labelled with a distinct tag. The tag for any non-nested quantifier (including the outermost quantifier in any nested structure of quantifiers) is not parameterised. A nested quantifier has its tag parameterised by all of its outer-quantified variables. Instantiating an outer quantifier produces a copy of the quantifier body in which (among other changes) tags of all inner quantifiers that are parameterised by this outer-quantifier are updated to reflect this instantiation. In Example 3, we label the outer and inner quantifiers with tags \(\sharp \text {subset-elim}\) and \(\sharp \text {subset-elim}(s_1, s_2)\), respectively. Instantiating the outer quantifier with \(s_{1}{\mapsto }a\) and \(s_{2}{\mapsto }b\) introduces a copy of the quantifier body in which the inner quantifier is tagged with \(\sharp \text {subset-elim}(a, b)\).

To further mitigate redundancy in quantifier instantiation, our semantics supports two additional optimisations. First, a quantifier is only permitted to join the current quantifiers W if its tag is known to be distinct from the tags of existing quantifiers in W, modulo equivalence on the parameters of the tags, as assessed in the current E-interface. This criterion prevents adding redundant quantifiers into W. Second, the relation of history-enabled E-matches \(\Vdash _{\textrm{hist}}\) leverages the current E-interface to verify the uniqueness of tags—once again, modulo equivalence on tag parameters—before enabling an E-match. An E-match is enabled only if no quantifier with an equivalent tag has been instantiated with an equivalent match previously. (cf. App. A of the TR for related definitions.)
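The tagging scheme and the first optimisation can be sketched in Python as follows. This is an illustration only: a tag is modelled as a pair of a name and a tuple of parameters, the tag name `"inner"` is a placeholder, the current E-interface is approximated by a flat map `rep` from constants to class representatives, and "distinct" is read as "not known equal". The precise definitions are those in App. A of the TR.

```python
def instantiate_tag(tag, subst):
    """Replace outer-quantified variables in the tag parameters."""
    name, params = tag
    return (name, tuple(subst.get(p, p) for p in params))

def same_tag(t1, t2, rep):
    """Tags coincide modulo known equalities on their parameters."""
    (n1, p1), (n2, p2) = t1, t2
    return (n1 == n2 and len(p1) == len(p2)
            and all(rep.get(a, a) == rep.get(b, b) for a, b in zip(p1, p2)))

def add_quantifier(W, tag, rep):
    """Add a quantifier (identified by its tag) to W unless it is redundant."""
    if any(same_tag(tag, other, rep) for other in W):
        return W
    return W | {tag}

inner = ("inner", ("s1", "s2"))   # inner quantifier tag, parameterised by s1, s2
rep = {"c": "a"}                  # the equality a = c is known

tag_ab = instantiate_tag(inner, {"s1": "a", "s2": "b"})
W1 = add_quantifier(frozenset(), tag_ab, rep)
W2 = add_quantifier(W1, instantiate_tag(inner, {"s1": "c", "s2": "b"}), rep)
W3 = add_quantifier(W1, instantiate_tag(inner, {"s1": "b", "s2": "a"}), rep)
```

Here `W2` equals `W1`, because the copy tagged with parameters \((c, b)\) is redundant given \(a=c\), whereas `W3` contains two quantifiers.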

3.7 Theory-Specific Reasoning

Although our rules do not yet account for (interpreted) theory reasoning (as performed by theory solvers in a typical SMT solver design), our small-step semantics is intentionally chosen to easily accommodate future extensions: “hot-plugging” new kinds of primitive transitions is straightforward, and will not disturb the existing formal rules (e.g. for quantifier instantiations or case-splitting). Similarly to our use of E-interfaces to abstract over E-graph details, we plan to do this in a way which abstracts over the effects of theory deduction steps, without exposing the solver-specific internals. For example, we can add deduction steps which extend the E-interface with new terms and/or (dis)equalities, based on a valid deduction within, say, an integer theory.

Just as for quantifier instantiations, it may be necessary for some applications to guarantee that theory reasoning is performed under some fairness conditions (e.g. that inconsistencies detectable by a theory solver are not infinitely postponed). Imposing custom fairness constraints on the traces of our semantics for specific examples can be achieved in a standard way for small-step semantics.

While we expect extensions to theory solving to be straightforward, we choose as the case study for this paper a complex and practically relevant axiomatisation which nonetheless does not rely on external theory solvers.

4 Proving Instantiation Termination for E-matching

We now apply our model to prove instantiation termination for a practical E-matching-based axiomatisation. First, we briefly present our set theory axiomatisation, adapted from Dafny and Viper. We then demonstrate our methodology for constructing instantiation termination proofs using our model.

4.1 Axiomatisation for Set Theory

To assess our formal model, we tackle formal proofs of instantiation termination for axiomatisations currently employed by state-of-the-art verification tools, specifically targeting set theory in this paper. Set theory, despite the known challenges associated with its quantifier instantiation, is extensively used in verifiers.

Drawing from the axioms used by Dafny [18] and Viper [27], we aim to construct an axiomatisation that (1) faithfully models the core of set theory, (2) supports the various encodings of set theory used by verifiers, and (3) balances its triggers to ensure instantiation termination without harming instantiation completeness.

Our axiomatisation involves 12 uninterpreted functions, representing a wider range of set operations than the counterparts in Dafny and Viper. Cardinality operators are, however, removed due to their dependency on external linear arithmetic solvers (cf. Sect. 3.7 for explanation). Refer to App. C.1 and C.2 of the TR for a full presentation of our axiomatisation and comparison with theirs.

Dafny and Viper typically use complex “iff” formulas to define set operations, restricting trigger flexibility as they must apply in both directions of the “iff”. Inspired by proof systems for formal logic, we redefine set operations using analogues of introduction and elimination axioms, introducing independent triggers for each implication direction and thereby enhancing trigger flexibility.

Example 5

Below is our elimination rule for set union, named (union-elim), allowing more alternative triggers than the counterparts from Dafny and Viper.

$$ \begin{array}{l} \forall s_{1},s_{2},x.\left[ \textit{member}(x,\textit{union}(s_{1},s_{2}))\right] \\ \left[ \textit{union}(s_{1},s_{2}),\textit{member}(x,s_{1})\right] \left[ \textit{union}(s_{1},s_{2}),\textit{member}(x,s_{2})\right] \\ \;\textit{member}\left( x,\textit{union}\left( s_{1},s_{2}\right) \right) \rightarrow \textit{member}(x,s_{1})\vee \textit{member}(x,s_{2}) \end{array} $$

Our axiomatisation overall has more permissive triggers, which provides more flexibility for instantiation, but also increases the risk of non-termination. That instantiation termination holds for our axiomatisation means that Dafny and Viper’s more restrictive triggers are not necessary to ensure termination.
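To make the effect of the three alternative trigger sets of (union-elim) concrete, the following Python sketch enumerates the matches \((s_{1},s_{2},x)\) they admit over a set of ground terms. It is an illustration only: terms are encoded as nested tuples, matching is purely syntactic (real E-matching works modulo known equalities), and the function names are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple, Union

Term = Union[str, tuple]   # a name, or an application such as ("union", "a", "b")


def subterms(t: Term) -> Iterator[Term]:
    """Yield a term and all of its subterms."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)


def union_elim_matches(ground: Iterable[Term]) -> Set[Tuple[Term, Term, Term]]:
    """All (s1, s2, x) matched by at least one trigger set of (union-elim)."""
    known = {s for t in ground for s in subterms(t)}
    unions = [t for t in known if isinstance(t, tuple) and t[0] == "union"]
    members = [t for t in known if isinstance(t, tuple) and t[0] == "member"]
    found = set()
    for _, x, s in members:
        # Trigger set 1: member(x, union(s1, s2)).
        if isinstance(s, tuple) and s[0] == "union":
            found.add((s[1], s[2], x))
        # Trigger sets 2 and 3: union(s1, s2) together with
        # member(x, s1) or member(x, s2).
        for _, s1, s2 in unions:
            if s == s1 or s == s2:
                found.add((s1, s2, x))
    return found


# Only union(a, b) and member(t, a) are known: the first trigger set does not
# fire, but the second one does.
print(union_elim_matches({("union", "a", "b"), ("member", "t", "a")}))
```

With only the first trigger set, the ground terms in this example would produce no instantiation at all; the additional trigger sets are what make the axiom more permissive, and hence what the termination argument has to account for.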

4.2 Progress Measure

To prove instantiation termination for an axiomatisation, it suffices to prove that querying any set of ground literals on the axiomatisation cannot lead to an infinite trace in our formal semantics. The proof argument is parametric with respect to the ground literals in the initial state.Footnote 9 Drawing inspiration from program reasoning [7, 26], we identify a suitable measure on solver states and then establish its decrease at appropriate steps in a well-founded manner.

This method leverages the specific features of the axioms under consideration. We analyse our set theory axioms and classify them by two criteria: (1) whether instantiating the axiom would potentially generate new quantifiers or new equivalence classes of terms, i.e. new terms modulo equalities, and (2) whether the axiom contains nested quantifiers.

Non-generative Quantifiers. We call a quantifier non-generative if its instantiations yield neither new quantifiers nor new equivalence classes of terms. The majority of our set theory axioms are non-generative.

For instance, the (union-elim) axiom from Example 5, when instantiated with \(s_{1}{\mapsto }a\), \(s_{2}{\mapsto }b\) and \(x{\mapsto }t\), yields \(\lnot \textit{member}\left( t,\textit{union}\left( a,b\right) \right) \vee \textit{member}(t,a) \vee \textit{member}(t,b)\), without the potential (via case-splitting) to introduce new quantifiers or new equivalence classes of terms. The absence of new terms is because all of t, a, b and \(\textit{union}(a,b)\) are subterms of the matched trigger and hence known. \(\textit{Bool}\)-sorted terms never add new equivalence classes (cf. Definition 3).

Instantiating a non-generative quantifier reduces the number of enabled E-matches by at least one: on the one hand, history-enabled E-matches prevent instantiating the same quantifier with equivalent matches; on the other hand, instantiating a non-generative quantifier introduces neither new quantifiers nor new equivalence classes, and so does not expand the match pool. This suggests:

Idea 1

Define the progress measure in terms of the number of enabled E-matches.

Generative Quantifiers. A quantifier is generative if its instantiations may introduce new quantifiers or new equivalence classes of terms. Among our set theory axioms without nested quantifiers, four are generative, with each potentially creating new applications of Skolem functions upon instantiation.

For instance, the following (subset-intro) axiom, when instantiated, may create a new term \(\textit{Sk}_{\textit{ss}}(s_{1},s_{2})\) for some sets \(s_1\) and \(s_2\):

$$ \begin{array}{r} \forall s_{1},s_{2}.\left[ \textit{subset}(s_{1},s_{2})\right] \left( \textit{subset}(s_{1},s_{2})\vee \textit{member}(\textit{Sk}_{\textit{ss}}(s_{1},s_{2}),s_{1})\right) \wedge \\ \left( \textit{subset}(s_{1},s_{2})\vee \lnot \textit{member}(\textit{Sk}_{\textit{ss}}(s_{1},s_{2}),s_{2})\right) \end{array} $$

Similarly, the axioms for introducing extensional equality on sets, set disjointness, and set emptiness, namely (equal-sets-intro), (disjoint-intro), and (isEmpty-intro-1), can each produce new applications of Skolem functions: \(\textit{Sk}_{\textit{eq}}(s_{1},s_{2})\), \(\textit{Sk}_{\textit{dj}}(s_{1},s_{2})\), and \(\textit{Sk}_{\textit{ie}}(s)\), respectively (cf. App. C.1 of the TR).

Generative quantifiers, by introducing new equivalence classes of terms, may expand the pool of E-matches, including the enabled ones. We therefore suggest:

Idea 2

Predict new equivalence classes of terms introduced by instantiating generative quantifiers; incorporate these forecasts to estimate enabled E-matches.

Set theory axioms with nested quantifiers are all generative because their instantiations can potentially create new quantifiers. Such axioms include (subset-elim) from Example 3, and axioms (disjoint-elim) and (isEmpty-elim-1) for eliminating set disjointness and emptiness, respectively (cf. App. C.1 of the TR).

Instantiating these three axioms does not introduce new equivalence classes of ground terms. However, since they contain nested quantifiers, their instantiations can create new quantifiers, each with its own set of enabled E-matches, which raises the total number of enabled E-matches. We therefore propose:

Idea 3

Incorporate predicted effects from instantiating generative quantifiers with nested quantifier structures to refine estimates of enabled E-matches.

In practice, provided that these ideas are respected, one can often define simpler termination measures by overapproximating these candidate instantiations, as long as the overapproximation remains finite and decreasing.

Formalising a Practical Progress Measure. A basis of an E-interface is a representation of the known equivalence classes. We define its overapproximation to include potential new equivalence classes introduced by generative quantifiers.

Definition 9

(Overapproximation of Basis for Set Theory). Suppose B is a basis of an E-interface. The functions \(O_{1}(B)\) and \(O_{2}(B)\) denote overapproximations for the \(\textit{Set}(T)\)-sorted and T-sorted elements within basis B, respectively, to accommodate new expected equivalence classes of terms.

$$\begin{aligned} O_{1}(B) & =\textrm{filter}_{Set(T)}(B)\\ O_{2}(B) & =\textrm{filter}_{T}(B)\cup \widehat{\textit{Sk}_{\textit{ss}}}(O_{1}(B),O_{1}(B))\cup \widehat{\textit{Sk}_{\textit{eq}}}(O_{1}(B),O_{1}(B))\\ & \quad \cup \widehat{\textit{Sk}_{\textit{dj}}}(O_{1}(B),O_{1}(B))\cup \widehat{\textit{Sk}_{\textit{ie}}}(O_{1}(B)) \end{aligned}$$

Here \(\textrm{filter}_{Set(T)}\) and \(\textrm{filter}_{T}\) take a basis and select its \(\textit{Set}(T)\)-sorted and T-sorted elements, respectively; each \(\widehat{\textit{Sk}}\) is lifted from the corresponding \(\textit{Sk}\) to support sets.

The potential new terms introduced by generative quantifiers are all T-sorted Skolem terms. Thus predictions are solely performed by \(O_{2}(B)\), not by \(O_{1}(B)\).

Note that the results of these two overapproximations are guaranteed to be finite. E-interface bases always remain finite: elements are added (at most) for the new terms introduced in a step. Since our construction only applies operations such as filtering these finite sets and mapping Skolem functions over them, its results are finite as well. Leveraging this overapproximation of equivalence classes, we estimate enabled E-matches.
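The construction of Definition 9 can be sketched in Python as follows. The encoding is an assumption of this sketch: a basis is given as a list of `(representative, sort)` pairs with the sort strings `"Set"` and `"T"`, and Skolem applications are tuples such as `("Sk_ss", "a", "b")`. The function name `overapproximate` is hypothetical.

```python
from itertools import product
from typing import Iterable, Set, Tuple, Union

Term = Union[str, Tuple[str, ...]]


def overapproximate(basis: Iterable[Tuple[str, str]]) -> Tuple[Set[Term], Set[Term]]:
    """Compute (O1, O2) from a basis given as (representative, sort) pairs."""
    basis = list(basis)
    o1: Set[Term] = {t for t, sort in basis if sort == "Set"}  # filter_{Set(T)}(B)
    o2: Set[Term] = {t for t, sort in basis if sort == "T"}    # filter_T(B)
    for sk in ("Sk_ss", "Sk_eq", "Sk_dj"):
        # Lifted binary Skolem functions, applied to every pair drawn from O1.
        o2 |= {(sk, s1, s2) for s1, s2 in product(o1, repeat=2)}
    # Lifted unary Skolem function for set emptiness.
    o2 |= {("Sk_ie", s) for s in o1}
    return o1, o2


o1, o2 = overapproximate([("a", "Set"), ("b", "Set"), ("t", "T")])
print(len(o1), len(o2))  # 2 15
```

In this encoding the size of \(O_{2}(B)\) is at most \(\Vert \textrm{filter}_{T}(B)\Vert + 3\,\Vert O_{1}(B)\Vert ^{2} + \Vert O_{1}(B)\Vert \), which makes the finiteness argument explicit: with two set-sorted and one T-sorted element, the bound is \(1 + 12 + 2 = 15\).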

Definition 10

(Overestimation of Enabled E-matches for Set Theory). Consider an arbitrary state \(s = \left\langle W,A,E\right\rangle \). Let B be a basis of the E-interface \(E^{\textrm{I}}\). Define an overestimation of the enabled E-matches for s from B as follows:

$$P(\left\langle W,A,E\right\rangle , B)=\{ \dots p_{\sharp \tau _{i}},\dots ,p_{\sharp \tau _{j}(\overrightarrow{r})},\dots \}$$

where \(p_{\sharp \tau _{i}}\) and \(p_{\sharp \tau _{j}(\overrightarrow{r})}\) each denote a set of tuples that overapproximate the enabled E-matches from the basis B to the quantifiers with tags \(\sharp \tau _{i}\) and \(\sharp \tau _{j}(\overrightarrow{r})\), respectively; each tag \(\sharp \tau _{i}\) identifies an original quantifier from W, and each \(\sharp \tau _{j}(\overrightarrow{r})\) identifies a quantifier introduced by instantiating an original quantifier \(\sharp \tau _{j}\) from W with terms \(\overrightarrow{r}\) from approximations \(O_{1}(B)\) or \(O_{2}(B)\). Original quantifiers from W are those from the axiomatisation, not those introduced at runtime.

To clarify, we present an example for each category below; the remaining quantifiers follow the same pattern.

  • An (original) non-generative quantifier:

    \(\begin{array}{ll} p_{\sharp \text {union-elim}}=\{(s_{1},s_{2},x)\;\vert \; &{} s_{1},s_{2}\in O_{1}(B),x\in O_{2}(B),\\ &{} E\Vdash _{\textrm{hist}}\left( \sharp \text {union-elim}:(s_{1},s_{2},x)\right) \} \end{array}\)

  • An (original) generative quantifier without nested quantifiers:

    \(p_{\sharp \text {subset-intro}}=\left\{ \left( s_{1},s_{2}\right) \; \left| \; s_{1},s_{2}\in O_{1}(B)\right. ;\; E\Vdash _{\textrm{hist}}\left( \sharp \text {subset-intro}:\left( s_{1},s_{2}\right) \right) \right\} \)

  • An (original) generative quantifier with nested quantifiers:

    \(p_{\sharp \text {subset-elim}}=\left\{ \left( s_{1},s_{2}\right) \; \left| \; s_{1},s_{2}\in O_{1}(B)\right. ;\; E\Vdash _{\textrm{hist}}\left( \sharp \text {subset-elim}:\left( s_{1},s_{2}\right) \right) \right\} \)

  • A quantifier introduced by instantiating an (original) generative quantifier:

    \(p_{\sharp \text {subset-elim}(a,b)}=\left\{ x\;\left| \; x\in O_{2}(B)\right. ;\; E\Vdash _{\textrm{hist}}\left( \sharp \text {subset-elim}(a,b):x\right) \right\} \)

    where \(a,b\in O_{1}(B)\).
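The four categories above can be sketched in Python as follows. The sketch is deliberately coarser than Definition 10: it takes \(O_{1}(B)\) and \(O_{2}(B)\) as plain sets of names, identifies tags by strings, and filters candidates only by syntactic membership in a history set, whereas \(\Vdash _{\textrm{hist}}\) works modulo equalities and additionally requires an actual E-match. The names `overestimate` and `total` are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, Set, Tuple

Match = Tuple[str, ...]


def overestimate(o1: Set[str], o2: Set[str],
                 history: Set[Tuple[str, Match]]) -> Dict[str, Set[Match]]:
    """Candidate matches per tag, minus those already recorded in the history."""

    def fresh(tag: str, candidates: Iterable[Match]) -> Set[Match]:
        return {m for m in candidates if (tag, m) not in history}

    p = {
        # Original non-generative quantifier: matches (s1, s2, x).
        "union-elim": fresh("union-elim", product(o1, o1, o2)),
        # Original generative quantifiers: matches (s1, s2).
        "subset-intro": fresh("subset-intro", product(o1, repeat=2)),
        "subset-elim": fresh("subset-elim", product(o1, repeat=2)),
    }
    # One predicted inner quantifier per instantiation (a, b) of (subset-elim).
    for a, b in product(o1, repeat=2):
        tag = f"subset-elim({a},{b})"
        p[tag] = fresh(tag, ((x,) for x in o2))
    return p


def total(p: Dict[str, Set[Match]]) -> int:
    """Sum of the sizes of all candidate sets."""
    return sum(len(ms) for ms in p.values())


o1, o2 = {"a", "b"}, {"t"}
before = total(overestimate(o1, o2, set()))
after = total(overestimate(o1, o2, {("union-elim", ("a", "b", "t"))}))
print(before, after)  # 16 15
```

Recording one instantiation of (union-elim) in the history removes exactly one candidate, which is the kind of strict decrease the progress measure relies on.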

We now define a progress measure for our set theory axiomatisation. Its first and foremost ingredient is an overestimation of the number of enabled E-matches. We anticipate that this overestimation strictly decreases after each instantiation step and does not increase after each case-splitting step. The second ingredient is the number of unsatisfied current clauses, which we expect to decrease by at least one after each case-splitting step. The result of the progress measure is a lexicographically ordered pair of these two ingredients.

Definition 11

(Progress Measure for Set Theory). We define the progress measure \(M : \textsc {State} \longrightarrow (\mathbb {N}\cup \{-1\})^{2}\), as follows, where \(\left\| \cdot \right\| \) denotes cardinality.

[Figure b: defining equation of the progress measure \(M\).]

Inconsistent or saturated states are assigned (the smallest) measures \((-1, -1)\). The order on \((\mathbb {N}\cup \{-1\})\) is the natural extension of that on \(\mathbb {N}\).
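The lexicographic ordering can be illustrated with a small Python sketch. It assumes only what the text states: the measure pairs the overestimated number of enabled E-matches with the number of unsatisfied current clauses, and terminal (inconsistent or saturated) states receive \((-1,-1)\). The function `measure` and the concrete numbers are hypothetical; Python's built-in tuple comparison happens to be lexicographic.

```python
from typing import Tuple


def measure(terminal: bool, enabled_overestimate: int,
            unsatisfied_clauses: int) -> Tuple[int, int]:
    """Pair of the two ingredients; (-1, -1) for inconsistent or saturated states."""
    if terminal:
        return (-1, -1)
    return (enabled_overestimate, unsatisfied_clauses)


start = measure(False, 5, 2)
# An instantiation step: the first component strictly decreases, even though
# the second component may grow because a new clause is added.
after_instantiation = measure(False, 4, 7)
# A case-splitting step: the first component does not increase and the second
# one decreases by at least one.
after_split = measure(False, 4, 6)
finished = measure(True, 0, 0)

print(finished < after_split < after_instantiation < start)  # True
```

Because every component ranges over \(\mathbb {N}\cup \{-1\}\), this lexicographic order is well-founded, so a measure that strictly decreases at every step rules out infinite traces.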

4.3 Invariants and Termination Theorem

Drawing on program reasoning, we anticipate that classical techniques such as inductive invariants can be employed in termination proofs. We maintain two kinds of inductive invariants: general-purpose and problem-specific.

General-purpose invariants uphold the integrity of our formal semantics, remaining valid across all applications. For example, the E-history \(E^{\textrm{H}}\) of an arbitrary state \(s=\left\langle W,A,E\right\rangle \) must be up to date w.r.t. the current quantifiers W and E-interface \(E^{\textrm{I}}\). That is, for every pair \(\left( \sharp \tau :\overrightarrow{r}\right) \) from \(E^{\textrm{H}}\), there exists a quantifier \(\forall \overrightarrow{x}.\overrightarrow{\left[ T\right] }A\) from W whose tag is \(\sharp \tau \), the dimension of \(\overrightarrow{x}\) is equal to that of \(\overrightarrow{r}\), \(E^{\textrm{I}}\Vdash _{\textrm{kn}}\overrightarrow{r}\), and \(E^{\textrm{I}}\Vdash _{\textrm{kn}}\overrightarrow{t}\left[ \overrightarrow{r}/\overrightarrow{x}\right] \) for some trigger set \(\overrightarrow{t}\) from \(\overrightarrow{[T]}\). (cf. App. A of the TR for more invariants.)

Problem-specific invariants are tailored to the distinct features of each problem, focusing on properties of solver states reachable from specified initial states, and tracing the origins of terms in intermediate states. For example, consider an arbitrary intermediate state \(\left\langle W,A,E\right\rangle \): for each extended clause in A of the form \(\lnot \textit{member}\left( t,\textit{union}\left( a,b\right) \right) \vee \textit{member}(t,a)\vee \textit{member}(t,b)\), \(\left( \sharp \text {union-elim}:(a,b,t)\right) \in E^{\textrm{H}}\) holds; the tag being for the axiom (union-elim) discussed in Example 5. This invariant concerns the origins of the extended clauses in the current clauses A. Case-splitting on a current clause (e.g. the one above) may seem to introduce a new term, but this invariant indicates that this term is not new—it is equal to a known term that triggered a prior instantiation, as tracked by the E-history \(E^{\textrm{H}}\). This ensures a traceable lineage for each clause, linking it back to a specific quantifier in the E-history. (cf. App. B of the TR for more invariants.)
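The problem-specific invariant for (union-elim) can be phrased as an executable check, sketched below in Python. The encoding is an assumption of this sketch: clauses are tuples of literals, literals and terms are nested tuples, the E-history is a set of `(tag, match)` pairs, and membership is tested syntactically rather than modulo equalities. The function names are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple


def union_elim_shape(clause: tuple) -> Optional[Tuple[str, str, str]]:
    """Return (a, b, t) if the clause is
    not member(t, union(a, b)) \\/ member(t, a) \\/ member(t, b); otherwise None."""
    if len(clause) != 3:
        return None
    neg, left, right = clause
    if neg[0] != "not":
        return None
    m = neg[1]
    if m[0] != "member" or not isinstance(m[2], tuple) or m[2][0] != "union":
        return None
    t, (_, a, b) = m[1], m[2]
    if left == ("member", t, a) and right == ("member", t, b):
        return (a, b, t)
    return None


def origin_invariant(clauses: Iterable[tuple],
                     history: Set[Tuple[str, tuple]]) -> bool:
    """Every clause of the (union-elim) shape has a matching history entry."""
    for clause in clauses:
        shape = union_elim_shape(clause)
        if shape is not None and ("union-elim", shape) not in history:
            return False
    return True


clause = (
    ("not", ("member", "t", ("union", "a", "b"))),
    ("member", "t", "a"),
    ("member", "t", "b"),
)
print(origin_invariant([clause], {("union-elim", ("a", "b", "t"))}))
print(origin_invariant([clause], set()))
```

The first call succeeds because the clause can be traced back to a recorded instantiation; the second fails, modelling a state that would violate the invariant and hence could not be reached from a valid initial state.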

Finally, we state the instantiation termination theorem for our set theory axiomatisation, which is proven by induction on traces, leveraging both general-purpose and set-theory-specific invariants. Note that termination is proved against an arbitrary set of ground literals; this works because our progress measure and invariants are defined parametrically in the current state. Given the right invariants and termination measure, the proof is straightforward (cf. App. B of the TR). This theorem guarantees the absence of matching loops in this axiomatisation; users of this axiomatisation can hence confidently seek terminating answers to ground theory queries.

Theorem 1

(Instantiation Termination for Set Theory). Suppose L is an arbitrary set of ground literals. The initial state is \(s_{0}=\left\langle W_{0},A_{0},E_{0}\right\rangle \), where \(W_{0}\) is our axiomatisation for set theory with tags, \(A_{0}=\emptyset \), \(E_{0}^{\textrm{I}} = \emptyset \triangleleft L\), and \(E_{0}^{\textrm{H}}=\emptyset \). Any sequence of transitions from the initial state \(s_{0}\), where \(\longrightarrow \) defined in Sect. 3.5 represents the transition relation, has a finite length.

5 Related Work

For the purpose of program verification, where SMT solvers are used to prove unsatisfiability, E-matching is widely used to handle quantifiers. The idea of E-matching dates back to Nelson [25] and was first put into practice in Simplify [8]. Since then, efficient handling of E-matching-based quantifier instantiation has been studied by, e.g., de Moura and Bjørner [21] for Z3, Ge et al. [14] for CVC3, Bansal et al. [1] for Z3 and CVC4, and Moskal et al. [20] for Fx7. When satisfiable results and their models are of interest, model-based quantifier instantiation (MBQI) [15] can be used to handle quantifiers.

Dross et al. [9, 10, 11] formally define and reason about instantiation termination in a similar context. They define a novel logic with first-class triggers, introduce instantiation trees as algebraic objects to help define termination, and provide an ingenious technique for showing, for their implementation in Alt-Ergo, that finding a single finite instantiation tree is sufficient for termination.

Although instantiation trees are a powerful tool for numerous deep meta-theoretic results [9], we believe that applying a formal inductive construction of instantiation trees to larger examples would be complex in practice: existing examples focus instead on bounds for the sets of terms that can ever be generated by a solver run. These arguments closely relate to our inductive termination proofs over traces. Our work enables detailed formal proofs based directly on such familiar notions from program reasoning, including inductive invariants and well-founded measures.

The approach of this prior work also requires restrictions on solver behaviour, including fairness of quantifier instantiation, and eager application of theory deductions (via entailments in their custom logic)Footnote 10. Our operational model and termination proofs do not require or build in such assumptions. Still, restricting our traces (e.g. with fairness constraints) would be simple to do if desired for specific applications. Our weak assumptions make our approach (extended with appropriate theory deduction steps) applicable to SMT solvers broadly; solvers such as Z3 [22] and cvc5 [2] commonly interleave theory reasoning and quantifier instantiation in (bounded or exhaustive) rounds of multiple steps.

The Axiom Profiler [4] leverages Z3 log files to provide comprehensive support for analysing quantifier instantiations. The tool focuses on helping users effectively understand and debug problematic solver runs, rather than proving their absence. It was validated by empirical evidence rather than formal proofs.

Existing works on the termination of SMT transition systems [3, 5, 6, 23] demonstrate that divergence is prevented by ensuring all new terms derive from a finite basis. In contrast, in our work, a finite basis does not imply termination—the basis can grow. At a high level these works prove that certain solver aspects always terminate. However, E-matching cannot have this property; instead it places the onus on the author of an axiomatisation to achieve termination through careful selection of axioms and triggers, motivating a user-facing model.

6 Conclusion and Future Work

We have presented a novel model for E-matching as widely employed in SMT solvers, which abstracts over solver details while enabling detailed and formal proofs of instantiation termination. Our model has been shown to apply directly and rigorously to the kinds of axiomatisations used in practical verification tools.

In future work, we would like to explore axiomatisations that rely on more-restricted characteristics of a solver, such as fairness of instantiation selection or theory reasoning steps. Similarly to our E-interfaces, we will investigate suitable abstractions over theory solver interactions incorporated into a proof search.

While instantiation termination is a much sought-after property, the complementary problem of guaranteed instantiation completeness is a natural next target to investigate with our novel operational model, which may require us to also explore various fairness restrictions of our model’s transition relation.