This section gives an overview of recent techniques used by SMT solvers for quantifier instantiation, and comments on their relative strengths and weaknesses. We focus on enumerative quantifier instantiation, a technique that has received little attention in recent work but has several compelling advantages over current techniques.
Definition 1 (Instantiation Strategy). An instantiation strategy takes as input:
1. A \(\mathscr {T}\)-satisfiable set of ground literals \(\mathsf {E}\), and
2. A quantified formula \(\forall \bar{x}.\>\varphi \).
It outputs a set of substitutions \(\{ \sigma _1, \ldots , \sigma _n \}\) where \(\mathsf {dom}( \sigma _i ) = \bar{x}\) for each \(i = 1,\ldots ,n\).
Figure 2 gives four instantiation strategies used by modern SMT solvers, each of which has the interface given in Definition 1. The first three have been described in detail in previous works (see [25] for a recent overview). We briefly review these techniques in this section. The fourth, enumerative quantifier instantiation, is the subject of this paper.
Conflict-based instantiation (\(\mathbf{c}\)) was introduced in [28] as a technique for improving the performance of SMT solvers for unsatisfiable problems. In this strategy, we return a substitution \(\sigma \) such that \(\varphi \sigma \) together with \(\mathsf {E}\) is unsatisfiable. We refer to \(\varphi \sigma \) as a conflicting instance (for \(\mathsf {E}\)). Typical implementations of this strategy do not insist that a conflicting instance be returned if one exists, and hence the strategy may choose to return the empty set of substitutions. Recent work [4, 5] gives a strategy for conflict-based instantiation that has refutational completeness guarantees for the empty theory with equality; that is, when a conflicting instance exists for a quantified formula in this theory, the strategy is guaranteed to return it.
E-matching instantiation (\(\mathbf{e}\)) is the most commonly used strategy for quantifier instantiation in modern SMT solvers [13, 15, 18]. In this strategy, we first heuristically choose a set of triggers for a quantified formula \(\forall \bar{x}.\> \varphi \), where a trigger is a tuple of terms whose free variables are \(\bar{x}\). In practice, triggers can be selected using user-provided annotations, or selected automatically by the SMT solver. For each trigger \(\bar{t}_i\), we select a set of substitutions \(S_i\) such that for each \(\sigma \) in this set, \(\mathsf {E}\) entails that \(\bar{t}_i \sigma \) is equal to a tuple of ground terms \(g_i\) in \(\mathsf {E}\). We return the union of these sets \(S_i\) for each selected trigger. E-matching instantiation is generally incomplete, but works well in practice for unsatisfiable problems, and hence is a key component of most SMT solvers that support quantified formulas.
Model-based quantifier instantiation (\(\mathbf{m}\)) was introduced in [19], and has also been used for improving the performance of finite model finding [29]. In this strategy, we first construct a model \(\mathscr {M}\) for the quantifier-free portion of our input \(\mathsf {E}\), where typically the interpretations of functions for values not constrained by \(\mathsf {E}\) are chosen heuristically. Notice that \(\mathscr {M}\) does not necessarily satisfy the quantified formula \(\forall \bar{x}.\> \varphi \). If it does not, we return a single substitution \(\sigma \) for which \(\mathscr {M}\) does not satisfy \(\varphi \sigma \), where typically \(\sigma \) maps variables from \(\bar{x}\) to terms that occur in \(\mathsf {E}\). Compared with conflict-based and E-matching instantiation, model-based quantifier instantiation has the advantage that it is model sound: when it returns \(\emptyset \), then \(\mathsf {E}\cup \{ \forall \bar{x}.\> \varphi \}\) is satisfiable.
This paper revisits enumerative quantifier instantiation (\(\mathbf{u}\)) as a viable alternative to model-based quantifier instantiation. In this strategy, we assume an ordering \(\preceq \) on quantifier-free terms. This ordering is not related to the usual term ordering one generally uses for saturation theorem proving, but rather determines which instance will be generated first. The strategy returns the substitution \(\{ \bar{x} \mapsto \bar{t} \}\), where \(\bar{t}\) is the minimal tuple of terms with respect to \(\preceq \) among the terms of \(\mathsf {E}\) such that \(\varphi \{ \bar{x} \mapsto \bar{t} \}\) is not entailed by \(\mathsf {E}\). We refer to this strategy as enumerative instantiation since in the worst case it generates instantiations by enumerating tuples of all terms of the proper sort from \(\mathsf {E}\), according to the ordering \(\preceq \). In practice, the number of instantiations produced by this strategy is kept small by interleaving it with other strategies like \(\mathbf{c}\) or \(\mathbf{e}\), or because a small number of instances may already allow the SMT solver to conclude the input is unsatisfiable. Moreover, thanks to the results in Sect. 3, this strategy is refutationally complete and model sound for quantified formulas in the empty theory with equality.
Example 1
Consider the set of ground literals \(\mathsf {E}= \{ \lnot P( a ), \lnot P( b ), P( c ), \lnot R( b ) \}\). For the input \(( \mathsf {E}, \forall x.\>P( x ) \vee R( x ) )\), the strategies in this section will do the following.
1. Conflict-based: Since \(\mathsf {E},\, P( b ) \vee R( b )\,\models \,\bot \), this strategy will return \(\{ \{ x \mapsto b \} \}\).
2. E-matching: This strategy may choose the singleton set of triggers \(\{ ( P( x ) ) \}\). Based on this trigger, since \(\mathsf {E}\,\models \,P( x ) \{ x \mapsto t \} \approx P( t )\) where \(P( t )\) is a ground term occurring in \(\mathsf {E}\) for \(t = a, b, c\), this strategy may return \(\{ \{ x \mapsto a \},\, \{ x \mapsto b \},\, \{ x \mapsto c \} \}\).
3. Model-based: This strategy will construct a model \(\mathscr {M}\) for \(\mathsf {E}\), where we assume that \(P^\mathscr {M}= \lambda x.\> \mathsf {ite}( x \approx c,\, \top ,\, \bot )\) and \(R^\mathscr {M}= \lambda x.\> \bot \). Since \(\mathscr {M}\) does not satisfy \(P( a ) \vee R( a )\), this strategy may return \(\{ \{ x \mapsto a \} \}\).
4. Enumerative instantiation: This strategy chooses an ordering on tuples of terms, say the lexicographic extension of \(\preceq \) where \(a \prec b \prec c\). Since \(\mathsf {E}\) does not entail \(P( a ) \vee R( a )\), this strategy returns \(\{ \{ x \mapsto a \} \}\). \(\square \)
In the previous example, clearly \(\{ x \mapsto b \}\) is the most useful substitution, since it leads to an instance \(P( b ) \vee R( b )\) which together with \(\mathsf {E}\) is unsatisfiable. The substitution \(\{ x \mapsto c \}\) is definitely not a useful substitution, since its instance \(P( c ) \vee R( c )\) is already entailed by \(P( c ) \in \mathsf {E}\). The substitution \(\{ x \mapsto a \}\) is potentially useful since it forces the solver to satisfy \(P( a ) \vee R( a )\). Here, we point out that the effect of enumerative instantiation and model-based instantiation is essentially the same, as both return an instance that is not entailed by \(\mathsf {E}\). However, the substitutions produced by enumerative instantiation often have advantages with respect to model-based instantiation on unsatisfiable problems.
Example 2
Consider the set of ground literals \(\mathsf {E}= \{ \lnot P( a ),\, R( b ),\, S( c ) \}\) and the quantified clauses \(\mathsf {Q}= \{ \forall x.\>R( x ) \vee S( x ),\, \forall x.\>\lnot R( x ) \vee P( x ),\, \forall x.\>\lnot S( x ) \vee P( x ) \}\) in a mono-sorted signature. Notice that \(\mathsf {E}\cup \mathsf {Q}\) is unsatisfiable: it suffices to consider the instances of the three quantified formulas in \(\mathsf {Q}\) with \(x \mapsto a\). On such an input, model-based instantiation will first construct a model for \(\mathsf {E}\). Assume this model \(\mathscr {M}\) is such that \(P^{\mathscr {M}} = \lambda x.\> \bot \), \(R^{\mathscr {M}} = \lambda x.\> \mathsf {ite}( x \approx b,\, \top ,\, \bot )\), and \(S^{\mathscr {M}} = \lambda x.\> \mathsf {ite}( x \approx c,\, \top ,\, \bot )\). Assume further that enumerative instantiation chooses the lexicographic extension of a term ordering \(\preceq \) where \(a \prec b \prec c\). The following table summarizes the result of running the two strategies.
$$\begin{array}{l|cc|cc} \varphi & x \text{ considered by } \mathbf{m} & x \text{ considered by } \mathbf{u} & \text{selected by } \mathbf{m} & \text{selected by } \mathbf{u} \\ \hline R( x ) \vee S( x ) & a & a & a & a \\ \lnot R( x ) \vee P( x ) & b & a, b, c & b & a \\ \lnot S( x ) \vee P( x ) & c & a, b, c & c & a \end{array}$$
The second and third columns show the sets of possible values of x that are considered by model-based and enumerative instantiation respectively, and the fourth and fifth columns show one possible selection for each. The instances corresponding to the three substitutions returned by enumerative instantiation, \(R( a ) \vee S( a )\), \(\lnot R( a ) \vee P( a )\) and \(\lnot S( a ) \vee P( a )\), are unsatisfiable when conjoined with \(\lnot P( a )\) from \(\mathsf {E}\), whereas the instances produced by model-based instantiation do not suffice to show that \(\mathsf {E}\cup \mathsf {Q}\) is unsatisfiable. Hence, the latter will consider an extension of \(\mathsf {E}\) that satisfies the instances \(R( a ) \vee S( a )\), \(\lnot R( b ) \vee P( b )\) and \(\lnot S( c ) \vee P( c )\) and guess another model for this extension. \(\square \)
A key observation is that useful instantiations can be obscured by guesses made when constructing models \(\mathscr {M}\). Here, since we decided \(R( a )^\mathscr {M}= \bot \), the substitution \(\{ x \mapsto a \}\) was not considered when applying model-based instantiation to the second quantified formula, and since \(S( a )^\mathscr {M}= \bot \), the substitution \(\{ x \mapsto a \}\) was not considered when applying it to the third. In implementations of model-based instantiation, certain values in models are chosen heuristically, leading to this behavior. This is done out of necessity, since determining whether there exists a model that satisfies quantified formulas, even for a fixed context, is a challenging problem.
On the other hand, the range of substitutions considered by enumerative instantiation in the previous example includes all terms that correspond to instances that are not entailed by \(\mathsf {E}\). The substitutions it considers are “minimally diverse”, that is, in the previous example they introduce new predicates on term a only, whereas model-based instantiation introduces new predicates on a, b and c. Reducing the number of new terms introduced by instantiations can have a significant positive impact on performance in practice. Furthermore, enumerative instantiation has the advantage that a term ordering allows fine-grained heuristics better suited for unsatisfiable problems, which we comment on in Sect. 4.1.
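The comparison in Example 2 can be reproduced with a small script. This is a sketch under stated assumptions, not the procedure of any solver: literals are hypothetical `(sign, predicate, argument)` triples, entailment by \(\mathsf {E}\) is approximated by the syntactic test "some literal of the instance is in \(\mathsf {E}\)" (adequate here because \(\mathsf {E}\) contains no equalities), the model \(\mathscr {M}\) is the one assumed in the example, and ground satisfiability is decided by brute force.

```python
from itertools import product

TERMS = ["a", "b", "c"]  # listed in the assumed order a < b < c
# E = {~P(a), R(b), S(c)}; a literal is (sign, predicate, argument).
E = {(False, "P", "a"), (True, "R", "b"), (True, "S", "c")}
# Bodies of the clauses in Q, as (sign, predicate) pairs over the variable x.
Q = {
    "R(x) | S(x)": [(True, "R"), (True, "S")],
    "~R(x) | P(x)": [(False, "R"), (True, "P")],
    "~S(x) | P(x)": [(False, "S"), (True, "P")],
}
M_TRUE = {("R", "b"), ("S", "c")}  # atoms true in the guessed model M

def instance(body, t):
    return [(sign, pred, t) for sign, pred in body]

def falsified_by_model(clause):
    return not any(((pred, t) in M_TRUE) == sign for sign, pred, t in clause)

def entailed_by_E(clause):
    return any(lit in E for lit in clause)

def candidates_m(body):  # values of x that model-based instantiation considers
    return [t for t in TERMS if falsified_by_model(instance(body, t))]

def candidates_u(body):  # values of x that enumerative instantiation considers
    return [t for t in TERMS if not entailed_by_E(instance(body, t))]

def satisfiable(clauses):
    """Brute-force propositional check over the ground atoms that occur."""
    atoms = sorted({(pred, t) for clause in clauses for _, pred, t in clause})
    for values in product([False, True], repeat=len(atoms)):
        val = dict(zip(atoms, values))
        if all(any(val[(p, t)] == s for s, p, t in c) for c in clauses):
            return True
    return False

ground_E = [[lit] for lit in E]
picked_m = [instance(b, candidates_m(b)[0]) for b in Q.values()]
picked_u = [instance(b, candidates_u(b)[0]) for b in Q.values()]
for name, body in Q.items():
    print(name, candidates_m(body), candidates_u(body))
print(satisfiable(ground_E + picked_m), satisfiable(ground_E + picked_u))
```

Picking the first candidate in each list mirrors the selection discussed in the example: the three enumerative instances are jointly unsatisfiable with \(\mathsf {E}\), while the three model-based instances are not.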
Example 3
Consider the sets \(\mathsf {E}= \{ a \not \approx b,\, b \not \approx c,\, a \not \approx c \}\) and \(\mathsf {Q}= \{ \forall x.\>P( x ) \}\). For the input \(( \mathsf {E},\, \forall x.\> P( x ) )\), model-based quantifier instantiation will first construct a model \(\mathscr {M}\) for \(\mathsf {E}\), where we assume that \(P^\mathscr {M}= \lambda x.\> \top \). It is easy to see that \(\mathscr {M}\,\models \,\varphi \{ x \mapsto t \}\) for every term \(t\) occurring in \(\mathsf {E}\), and hence it returns the empty set of substitutions, indicating that \(\mathsf {E}\cup \mathsf {Q}\) is satisfiable. On the other hand, assume enumerative instantiation chooses the lexicographic extension of a term ordering \(\preceq \) where \(a \prec b \prec c\). Since \(\mathsf {E}\) does not entail \(P( a )\) and a is smaller than b and c according to \(\preceq \), \(\mathbf{u}( \mathsf {E},\, \forall x.\> P( x ) )\) returns the set containing \(\{ x \mapsto a \}\). Subsequently, and for similar reasons, the strategy is invoked for two more iterations, producing the instances P(b) and P(c), before it terminates with the empty set. \(\square \)
In this example, model-based instantiation was able to terminate on the first iteration, since it guessed the correct interpretation for P, whereas enumerative instantiation considered substitutions mapping x to each ground term a, b, c from \(\mathsf {E}\). For this reason, model-based instantiation is typically better suited for satisfiable problems.
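The iteration in Example 3 can be traced with a short loop. The sketch below assumes, as in the example, that a, b and c are pairwise distinct, so an already asserted instance P(t) never entails P(s) for a different term s; the function names and the encoding of atoms as pairs are hypothetical.

```python
def enumerative(asserted, terms):
    """Return the substitution for the least term t whose instance P(t) is
    not yet entailed, or the empty list if every instance is entailed."""
    for t in terms:  # terms are listed in increasing order
        if ("P", t) not in asserted:
            return [{"x": t}]
    return []

def run(terms):
    """Invoke the strategy repeatedly, asserting each returned instance."""
    asserted = set()  # the disequalities in E entail no atom P(t)
    rounds = []
    while True:
        substitutions = enumerative(asserted, terms)
        if not substitutions:
            return rounds
        t = substitutions[0]["x"]
        rounds.append(t)
        asserted.add(("P", t))

print(run(["a", "b", "c"]))
```

The loop needs three productive rounds before the strategy returns the empty set, whereas the model-based strategy in the example stops immediately.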
4.1 Implementing Enumerative Instantiation
We comment on several important details concerning the implementation of enumerative quantifier instantiation in the SMT solver CVC4.
Term Ordering. Given a term ordering \(\preceq \), CVC4 considers the extension to tuples of terms such that:
$$\begin{aligned} \begin{array}{lcr} ( t_1, \ldots , t_n ) \prec ( s_1, \ldots , s_n ) &{} \text { if } &{} {\left\{ \begin{array}{ll} {\mathop {\max }\nolimits _{i=1}^n} t_i \prec {\mathop {\max }\nolimits _{i=1}^n} s_i, \text { or } \\ {\mathop {\max }\nolimits _{i=1}^n} t_i = {\mathop {\max }\nolimits _{i=1}^n} s_i \text { and } ( t_1, \ldots , t_n ) \prec _{\text {lex}} ( s_1, \ldots , s_n ) \end{array}\right. } \end{array} \end{aligned}$$
where \(\prec _{\text {lex}}\) is the lexicographic extension of \(\prec \). For example, if \(a \prec b \prec c\), then we have that \(( a, a ) \prec ( a, b ) \prec ( b, a ) \prec ( b, b ) \prec ( a, c ) \prec ( c, b ) \prec ( c, c )\). By this ordering, we consider substitutions involving c only after all combinations of substitutions involving a and b are considered. This choice is important since it leads to instantiations that introduce fewer terms, and are thus more likely to lead to conflicts at the ground level.
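The tuple ordering above amounts to sorting first by the largest component and then lexicographically. A minimal Python sketch follows; the helper names are illustrative and the rank of a term is simply its position in the given list.

```python
from itertools import product

def tuple_key(tup, rank):
    """Sort key: maximum component rank first, then lexicographic order."""
    ranks = tuple(rank[t] for t in tup)
    return (max(ranks), ranks)

def enumerate_tuples(terms, n):
    """All n-tuples over terms, ordered by the extension described above.
    The list terms is assumed to be given in increasing order."""
    rank = {t: i for i, t in enumerate(terms)}
    return sorted(product(terms, repeat=n), key=lambda tup: tuple_key(tup, rank))

for tup in enumerate_tuples(["a", "b", "c"], 2):
    print(tup)
```

For the terms a, b, c this enumerates the four pairs over a and b first, and only then the five pairs that mention c, namely (a, c), (b, c), (c, a), (c, b), (c, c).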
The underlying term ordering is determined dynamically based on the current set of assertions \(\mathsf {E}\). At all times, we maintain a finite list of quantifier-free terms such that we have fixed the ordering \(t_1 \prec \ldots \prec t_n\). Then, if all combinations of instantiations for \(t_1, \ldots , t_n\) are currently entailed by \(\mathsf {E}\), we choose a term \(t\) occurring in \(\mathsf {E}\) such that \(\mathsf {E}\) does not entail \(t \approx t_i\) for \(i = 1, \ldots , n\), if one exists, and append it to our ordering so that \(t_n \prec t\). The particular choice of t beyond this criterion is arbitrary. An experimental evaluation of more sophisticated term orderings, such as those inspired by first-order automated theorem proving [2], is the subject of future work.
Entailment Checks. For a set of ground equalities and disequalities \(\mathsf {E}\), quantified formula \(\forall \bar{x}.\> \varphi \) and substitution \(\{ \bar{x} \mapsto \bar{t} \}\), CVC4 implements a two-layered method for checking whether the entailment \(\mathsf {E}\,\models \,\varphi \{ \bar{x} \mapsto \bar{t} \}\) holds. First, we maintain a cache of instantiations that have already been returned on previous iterations. Hence if \(\mathsf {E}\) satisfies a set of formulas containing \(\varphi \{ \bar{x} \mapsto \bar{s} \}\), where \(\mathsf {E}\,\models \,\bar{t} \approx \bar{s}\), then the entailment clearly holds.
Second, we use a fast but incomplete method for inferring when an entailment holds. We first compute from \(\mathsf {E}\) congruence classes over the terms occurring in \(\mathsf {E}\). For each such term t, let \([ t ]\) be the representative of t in this equivalence relation. For each function f, we use a term index data structure \(\mathscr {I}_{f}\) that stores an entry of the form \(( [ t_1 ], \ldots , [ t_n ] ) \rightarrow [ f( t_1, \ldots , t_n ) ] \in \mathscr {I}_{f}\) for each uninterpreted function application \(f( t_1, \ldots , t_n )\) occurring in \(\mathsf {E}\). To check the entailment \(\mathsf {E}\,\models \,\ell \) where \(\ell \) is a literal, we update \(\ell \) by applying the following steps until a fixed point is reached:
1. Replace each constant t in \(\ell \) with \([ t ]\).
2. Replace each function term \(f( t_1, \ldots , t_n )\) in \(\ell \) with s if \(( t_1, \ldots , t_n ) \rightarrow s \in \mathscr {I}_{f}\).
3. If \(\ell \) is \(t \approx t\), replace it by \(\top \).
4. If \(\ell \) is \(t \not \approx s\) and \(t' \not \approx s' \in \mathsf {E}\) where \([ t' ] = t\) and \([ s' ] = s\), replace it by \(\top \).
If the resulting \(\ell \) is \(\top \), then the entailment holds. Although not shown here, the above process extends in a straightforward way to handle Boolean structure, and to other background theories by incorporating theory-specific rewriting steps.
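The four steps above can be sketched as follows. This is an illustrative sketch under several assumptions and is not the CVC4 implementation: terms are encoded as strings (constants) or nested tuples `(f, arg1, ..., argn)`, congruence classes are computed by a naive quadratic closure, a single dictionary keyed by the function symbol plays the role of the per-function indices \(\mathscr {I}_{f}\), and the bottom-up normalization performs steps 1 and 2 in one pass. The class and method names are hypothetical.

```python
def subterms(t):
    """Yield t and all of its subterms."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)

class FastEntailment:
    def __init__(self, equalities, disequalities):
        self.parent = {}
        for s, t in list(equalities) + list(disequalities):
            for u in list(subterms(s)) + list(subterms(t)):
                self.parent.setdefault(u, u)
        for s, t in equalities:
            self._union(s, t)
        self._close()
        # Term index: (f, ([t1], ..., [tn])) -> [f(t1, ..., tn)].
        self.index = {(t[0], tuple(self.find(a) for a in t[1:])): self.find(t)
                      for t in self.parent if isinstance(t, tuple)}
        self.diseqs = {frozenset((self.find(s), self.find(t)))
                       for s, t in disequalities}

    def find(self, t):
        while self.parent[t] != t:
            t = self.parent[t]
        return t

    def _union(self, s, t):
        rs, rt = self.find(s), self.find(t)
        if rs != rt:
            self.parent[rs] = rt
            return True
        return False

    def _close(self):
        """Naive congruence closure: merge applications with equal arguments."""
        apps = [t for t in self.parent if isinstance(t, tuple)]
        changed = True
        while changed:
            changed = False
            for s in apps:
                for t in apps:
                    if (s[0] == t[0] and len(s) == len(t) and all(
                            self.find(a) == self.find(b)
                            for a, b in zip(s[1:], t[1:]))):
                        changed |= self._union(s, t)

    def normalize(self, t):
        if not isinstance(t, tuple):  # step 1: constants
            return self.find(t) if t in self.parent else t
        args = tuple(self.normalize(a) for a in t[1:])
        return self.index.get((t[0], args), (t[0],) + args)  # step 2

    def entails(self, literal):
        """True if the entailment is inferred; False means 'unknown'."""
        kind, s, t = literal
        s, t = self.normalize(s), self.normalize(t)
        if kind == "eq":
            return s == t  # step 3
        return frozenset((s, t)) in self.diseqs  # step 4

check = FastEntailment([("a", "b"), (("f", "a"), "c")], [("c", "d")])
print(check.entails(("eq", ("f", "b"), "c")))
print(check.entails(("neq", ("f", "b"), "d")))
```

Because the method is incomplete, a negative answer only means that the entailment could not be inferred cheaply, in which case the instance is treated as not entailed and may be returned.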
Restricting Enumeration Space. Enumerative instantiation can be refined further by noticing that only a subset of the set of terms occurring in \(\mathsf {E}\) will ever be relevant for showing unsatisfiability of a quantified formula. An approach in this spirit was used by Ge and de Moura [19], where decidable fragments were identified by noticing that the relevant domains of quantified formulas in these fragments are guaranteed to be finite. In that work, the relevant domain of a quantified formula \(\forall \bar{x}.\> \psi \) is computed based on the terms in \(\mathsf {E}\) and the structure of its body \(\psi \). For example, t is in the relevant domain of function f for all ground terms f(t), the relevant domain of x for a quantified formula containing the term f(x) is equal to the relevant domain of f, and so on. A related approach is to use sort inference [8, 9, 22] to compute more precise sort information and thus decrease the number of possible instantiations.
Example 4
Say \(\mathsf {E}\cup \mathsf {Q}= \{ a \not \approx b, f( a ) \approx c \} \cup \{ \forall x.\>P( f( x ) ) \}\), where a, b, c, x are of sort \(\tau \), f is a unary function \(\tau \rightarrow \tau \), and P is a predicate on \(\tau \). It can be shown that \(\mathsf {E}\cup \mathsf {Q}\) is equivalent to \(\mathsf {E}^s \cup \mathsf {Q}^s =\) \(\{ a_1 \not \approx b_1, f_{12}( a_1 ) \approx c_2 \} \cup \{ \forall x_1.\>P_2( f_{12}( x_1 ) ) \}\), where \(a_1\), \(b_1\), \(x_1\) are of sort \(\tau _1\), \(c_2\) is of sort \(\tau _2\), \(f_{12}\) is of sort \(\tau _1 \rightarrow \tau _2\), and \(P_2\) is a predicate on \(\tau _2\). \(\square \)
Sorts can be inferred in this manner using a linear traversal of the input formula (for details, see for instance Sect. 4 of [22]). This technique narrows the set of terms considered by enumerative instantiation. In the above example, whereas enumerative instantiation for \(\mathsf {E}\cup \mathsf {Q}\) might consider the substitutions \(\{ x \mapsto c \}\) or \(\{ x \mapsto f( c ) \}\), for \(\mathsf {E}^s \cup \mathsf {Q}^s\) it would not consider \(\{ x_1 \mapsto c_2 \}\) since their sorts are different, nor would it consider \(\{ x_1 \mapsto f_{12}( c_2 ) \}\) since \(f_{12}( c_2 )\) is not a well-sorted term. Moreover, the Herbrand universe of an inferred subsort may be finite when the universe of its parent sort is infinite. In the above example, the Herbrand universe of \(\tau _1\) is \(\{ a_1,b_1 \}\) and that of \(\tau _2\) is \(\{ f_{12}( a_1 ), f_{12}( b_1 ), c_2 \}\), whereas the Herbrand universe of \(\tau \) is infinite.
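One way to picture the traversal is as a union-find over "sort slots", one for the result of each symbol and one for each argument position. The sketch below is an assumption-laden illustration and not the algorithm of [22]: atoms are encoded as `("eq", s, t)`, `("neq", s, t)` or a predicate application `(P, t1, ..., tn)`, terms are strings or nested tuples, and variables are treated like constants.

```python
def infer_sorts(atoms):
    """Unify sort slots along one traversal; return the 'find' function."""
    parent = {}

    def find(slot):
        parent.setdefault(slot, slot)
        while parent[slot] != slot:
            slot = parent[slot]
        return slot

    def union(s, t):
        parent[find(s)] = find(t)

    def visit(term):
        """Return the result slot of term, unifying argument slots below it."""
        head, args = (term, ()) if isinstance(term, str) else (term[0], term[1:])
        for i, arg in enumerate(args):
            union(("arg", head, i), visit(arg))
        find(("ret", head))  # make sure the slot is registered
        return ("ret", head)

    for atom in atoms:
        if atom[0] in ("eq", "neq"):
            union(visit(atom[1]), visit(atom[2]))  # both sides share a sort
        else:
            visit(atom)  # predicate application: only arguments are unified
    return find

# Example 4: a != b, f(a) = c, and the body P(f(x)) of the quantified formula.
find = infer_sorts([("neq", "a", "b"), ("eq", ("f", "a"), "c"), ("P", ("f", "x"))])
print(find(("ret", "a")) == find(("ret", "x")))  # a and x share a sort
print(find(("ret", "a")) == find(("ret", "c")))  # a and c do not
```

On Example 4 the slots for a, b, x and the argument of f end up in one class (the role of \(\tau _1\)), while c, the result of f and the argument of P end up in another (the role of \(\tau _2\)).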
Compound Strategies. Since the instantiation strategies from this section have their respective strengths and weaknesses, it is valuable to combine them. We consider two ways of combining strategies, which we refer to as priority instantiation and interleaved instantiation. For base strategies \(\mathbf{s_1}\) and \(\mathbf{s_2}\), priority instantiation (\(\mathbf{s_1};\mathbf{s_2}\)) first invokes \(\mathbf{s_1}\). If this strategy returns a non-empty set of substitutions, it returns that set; otherwise it returns the substitutions returned by \(\mathbf{s_2}\). On the other hand, interleaved instantiation (\(\mathbf {s_1}\)+\(\mathbf {s_2}\)) returns the union of the substitutions returned by the two strategies.
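Both combinators are easy to express over the interface of Definition 1. In the sketch below a strategy is assumed to be any Python callable taking \(\mathsf {E}\) and a quantified formula and returning a list of substitutions; the stub strategies `c`, `e` and `u` are placeholders with made-up outputs, and reading c;e+u as c;(e+u) is an assumption about precedence.

```python
from typing import Callable, Dict, List

Substitution = Dict[str, str]
Strategy = Callable[[object, object], List[Substitution]]

def priority(s1: Strategy, s2: Strategy) -> Strategy:
    """s1;s2: use s1's substitutions if there are any, otherwise s2's."""
    def combined(E, formula):
        result = s1(E, formula)
        return result if result else s2(E, formula)
    return combined

def interleave(s1: Strategy, s2: Strategy) -> Strategy:
    """s1+s2: the union of the substitutions returned by both strategies."""
    def combined(E, formula):
        result = list(s1(E, formula))
        result += [sub for sub in s2(E, formula) if sub not in result]
        return result
    return combined

# Placeholder strategies with fixed, made-up outputs (for illustration only).
def c(E, formula): return []
def e(E, formula): return [{"x": "a"}]
def u(E, formula): return [{"x": "b"}]

c_e_u = priority(c, priority(e, u))       # c;e;u
c_e_plus_u = priority(c, interleave(e, u))  # c;e+u, read as c;(e+u)
print(c_e_u(None, None))
print(c_e_plus_u(None, None))
```

With these stubs, c;e;u never consults u because e already returns a substitution, while c;e+u returns the substitutions of both e and u.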
Enumerative instantiation is most effective when used as a complement to heuristic strategies. In particular, we will see in the next section that the strategies \(\mathbf{c};\mathbf{e};\mathbf{u}\) and \(\mathbf{c};\mathbf{e}\)+\(\mathbf{u}\) are the most effective strategies for unsatisfiable problems in CVC4.