1 Introduction

Applications of automated reasoning generate formulas involving both quantifiers and symbols defined in background theories. For example, software verification needs reasoners that decide the satisfiability of quantified formulas modulo theories such as data structures and arithmetic (e.g., [20]). Therefore, endowing SMT solvers with quantifier reasoning (e.g., [3, 9, 11,12,13,14, 22]), enriching first-order theorem provers with built-in theories (e.g., [1, 2, 19]), and integrating provers and solvers [7], are major research objectives.

If there is a single background theory \(\mathcal {T}\), the \(\mathcal {T}\)-satisfiability of quantified formulas can be reduced to that of quantifier-free formulas if \(\mathcal {T}\) admits quantifier elimination (QE): for every formula \(\varphi \) there exists a quantifier-free formula F that is \(\mathcal {T}\)-equivalent to \(\varphi \). Since computing F can be prohibitively expensive (e.g., exponential in linear rational arithmetic (\(\textsf{LRA}\)) and doubly exponential in linear integer arithmetic (\(\textsf{LIA}\)) [8]), QE is not a practical solution.

In this paper we propose a practical solution in the form of a new algorithm called \(\textsf{QSMA}\). In \(\textsf{QSMA}\) the computation of quantifier-free model-based under-approximations (MBU) and model-based over-approximations (MBO) of quantified formulas embodies a lazy approach to QE, which is tailored for \(\mathcal {T}\)-satisfiability. MBU generates a quantifier-free implicant of the given formula that is true in the given model. Model(-guided) generalization for linear [12] and nonlinear real arithmetic (\(\textsf{NRA}\)) [17] is an instance of MBU. MBO generates a quantifier-free implied formula that is false in the given model. Model interpolation for \(\textsf{NRA}\) [17] is an instance of MBO.

The \(\textsf{QSMA}\) algorithm assumes that the theory \(\mathcal {T}\) is complete. By its recursive nature, \(\textsf{QSMA}\) solves a generalized form of the satisfiability problem, called quantified SMA (satisfiability modulo theory and assignment): given a formula \(\varphi \) with arbitrary quantification, and an initial assignment to Boolean or first-order subterms of \(\varphi \), find a theory model of \(\varphi \) that extends the initial assignment, or report that none exists. In addition to \(\textsf{QSMA}\) and its total correctness, we present an optimized variant named \(\textsf{OptiQSMA}\), which preserves total correctness and is implemented in the YicesQS solver built on top of Yices 2. A report on experimental results from the 2022 SMT competition and a discussion complete the paper. We begin with a high-level view of \(\textsf{QSMA}\).

1.1 High-Level View of the \(\textsf{QSMA}\) Algorithm

The \(\textsf{QSMA}\) algorithm works by progressively instantiating quantified variables. Consider a formula \(\varphi \) of the form \(\exists \bar{x}_1 . \forall \bar{x}_2 . \exists \bar{x}_3 \ldots F[\bar{x}_1,\bar{x}_2,\bar{x}_3,\ldots ]\) where F is quantifier-free. For example, suppose the theory is \(\textsf{LRA}\), \(\varphi = \exists x . \forall y . \exists z. F\) and \(F = z \ge 0 \wedge x \ge 0 \wedge y + z \ge 0\). Say that \(\textsf{QSMA}\) assigns \(x {\leftarrow }_{{}}0\). Whatever value is chosen for y, the algorithm can show that \(\varphi \) is true in \(\textsf{LRA}\) by assigning \(z {\leftarrow }_{{}}max(0,-y)\). If \(F = z \ge 0 \wedge x \ge 0 \wedge y + z \le 0\), no matter which (non-negative) value \(\textsf{QSMA}\) chooses for x, it can show that \(\varphi \) is false in \(\textsf{LRA}\) by picking \(y {\leftarrow }_{{}}1\), because there is no value for z that satisfies \(z \ge 0 \wedge z \le -1\).

For an example that is not in prenex normal form, consider a formula \(\varphi \) of the form \(\exists x . ((\forall y_1 . F_1[x,y_1]) \Rightarrow (\forall y_2 . F_2[x,y_2]))\), where \(F_1\) and \(F_2\) are quantifier-free. \(\textsf{QSMA}\) sees the formula as \(\exists x . ((\exists y_1 . \lnot F_1[x,y_1]) \vee (\lnot \exists y_2 . \lnot F_2[x,y_2]))\), and then as \(\exists x . (p_1 \vee \lnot p_2)\), where \(p_1\) and \(p_2\) are proxy Boolean variables for the quantified subformulas. \(\textsf{QSMA}\) assigns values to x, \(p_1\), and \(p_2\). If \(p_1\) is assigned \(\textsf{true}\), the algorithm tries to extend the assignment with a value for \(y_1\) that satisfies \(\lnot F_1[x,y_1]\). If \(p_2\) is assigned \(\textsf{false}\), the algorithm tries to show that there is no value for \(y_2\) that satisfies \(\lnot F_2[x,y_2]\).

Without loss of generality (\(\lnot \lnot \) converts \(\forall \) into \(\lnot \exists \lnot \)), we consider formulas

$$ \varphi = \exists \bar{x}. F[\bar{z},\bar{x},\bar{p}] \{p_i {\leftarrow }_{{}}\exists \bar{y}_i. G_i[\bar{z},\bar{x},\bar{y}_i]\}_{i=1}^k. $$

\(F[\bar{z},\bar{x},\bar{p}]\) denotes a quantifier-free formula where the variables \(\bar{z}\), \(\bar{x}\), and \(\bar{p}\) occur. Tuples \(\bar{z}\) and \(\bar{x}\) contain the first-order variables occurring free in F. Formula F is quantifier-free because the quantified subformulas \(\varphi _i = \exists \bar{y}_i. G_i[\bar{z},\bar{x},\bar{y}_i]\) are replaced by proxy Boolean variables \(\bar{p} = p_1,\ldots p_k\). Given an initial assignment to the free variables \(\bar{z}\), we construct a \(\textsf{QSMA}\)-tree for \(\varphi \). \(\textsf{QSMA}\) starts trying to satisfy \(F[\bar{z},\bar{x},\bar{p}]\). If it fails, it means that \(\varphi \) is false under the initial assignment. If it succeeds, there are two cases. If \(k = 0\), formula \(\varphi \) is true under the initial assignment. If \(k > 0\), the algorithm descends recursively to consider the \(\textsf{QSMA}\)-subtrees for the \(\varphi _i\) subformulas (\(1\le i\le k\)). If \(\textsf{QSMA}\) assigned \(\textsf{true}\) to \(p_i\), it tries to show that \(\varphi _i\) is true. If \(\textsf{QSMA}\) assigned \(\textsf{false}\) to \(p_i\), it tries to show that \(\varphi _i\) is false. If it succeeds for all \(\textsf{QSMA}\)-subtrees, formula \(\varphi \) is true under the initial assignment. For this, the model built by \(\textsf{QSMA}\) should satisfy \(F[\bar{z},\bar{x},\bar{p}] \wedge \bigwedge _{i=1}^n (p_i \Leftrightarrow \varphi _i)\). Otherwise, formula \(\varphi \) is false under the initial assignment.

2 Preliminaries

A signature \(\varSigma \) is given by a set S of sorts and a set of sorted symbols. Given a class \(\mathcal V^{}= (\mathcal V^{s})_{s\in S}\) of disjoint sets of sorted variables, \({\varSigma }[\mathcal V^{}]\)-formulas, \(\varSigma \)-sentences, and \({\varSigma }[\mathcal V^{}]\)-interpretations are defined as usual. A \(\varSigma \)-structure is a \({\varSigma }[\emptyset ]\)-interpretation. We use x, y, z for first-order variables, p for Boolean ones, and \(\bar{x}\), \(\bar{y}\), \(\bar{z}\), and \(\bar{p}\) for tuples of such variables. We also use \(\varphi \) and \(\psi \) for formulas, F and G for quantifier-free formulas, \(\mathcal {M}\) for interpretations, \(\models \) for satisfaction and entailment, \(=\) for identity, \(\uplus \) for disjoint union, and \(\setminus \) for set difference. \( FV (\varphi )\) is the set of the variables occurring free in \(\varphi \). Slightly abusing the notation, \( FV (\varphi )\) is also treated as a tuple. Implication is written \(\Rightarrow \) and logical equivalence is written \(\Leftrightarrow \). If \(\mathcal V^{}_1\subseteq \mathcal V^{}_2\) (i.e., \(\mathcal V^{s}_1\subseteq \mathcal V^{s}_2\) for all \(s\in S\)), a \({\varSigma }[\mathcal V^{}_2]\)-interpretation \(\mathcal {M}_2\) is an extension of a \({\varSigma }[\mathcal V^{}_1]\)-interpretation \(\mathcal {M}_1\) to \(\mathcal V^{}_2\), if \(\mathcal {M}_2\) interprets the variables in \(\mathcal V^{s}_2\setminus \mathcal V^{s}_1\) for all \(s\in S\) and is otherwise identical to \(\mathcal {M}_1\).

A theory \(\mathcal {T}\) is defined by a signature \(\varSigma \) and a set of \(\varSigma \)-sentences called \(\mathcal {T}\)-axioms. A model of \(\mathcal {T}\), or \(\mathcal {T}\)-model, is a \(\varSigma \)-structure that satisfies the \(\mathcal {T}\)-axioms. A \({\mathcal T}[\mathcal V^{}]\)-model is a \({\varSigma }[\mathcal V^{}]\)-interpretation that is a \(\mathcal T\)-model when the interpretation of variables is ignored. A theory \(\mathcal {T}\) is complete, if it is consistent, and for all \(\varSigma \)-sentences F, either F or \(\lnot F\) is provable from the \(\mathcal {T}\)-axioms. In this paper we deal with a single theory \(\mathcal {T}\) that has a unique \(\mathcal {T}\)-model \(\mathcal {M}_0\), so that the interpretation of everything except variables is fixed. Therefore \(\mathcal {T}\) is complete, for \(\varSigma \)-sentences \(\mathcal {T}\)-validity, \(\mathcal {T}\)-satisfiability, and truth in \(\mathcal {M}_0\) coincide, all \({\mathcal T}[\mathcal V^{}]\)-models are extensions of \(\mathcal {M}_0\), and a \(\mathcal {T}\)-satisfiability procedure is concerned only with assignments to variables. Since there are one theory and one signature, we write formula for \({\varSigma }[\mathcal V^{}]\)-formula and model for \(\mathcal T\)-model or \({\mathcal T}[\mathcal V^{}]\)-model. A conservative theory extension \(\mathcal {T}^+\) of \(\mathcal {T}\) adds to \(\varSigma \) special constants, called values, to name elements in the domain of \(\mathcal {M}_0\) as needed. Conservative means that a \(\mathcal {T}\)-satisfiable formula is also \(\mathcal {T}^+\)-satisfiable.

The quantified SMA problem for theory \(\mathcal {T}\) asks whether \(\mathcal {M}_0 \models \varphi \) for an arbitrary formula \(\varphi \) and an initial assignment of values to the variables in \( FV (\varphi )\). Formulas have the form \(\varphi = \exists \bar{x}. F[\bar{z},\bar{x},\bar{p}] \{p_i {\leftarrow }_{{}}\exists \bar{y}_i. G_i[\bar{z},\bar{x},\bar{y}_i]\}_{i=1}^k\) described in the introduction, where \( FV (\varphi ) = \bar{z}\) and quantified variables are standardized apart. If \( FV (\varphi ) = \emptyset \), we still have SMA problems when considering subformulas under an assignment to existentially quantified variables.

3 The \(\textsf{QSMA}\) Framework

The \(\textsf{QSMA}\) algorithm works with a tree representation of a formula \(\varphi \). A node n in the tree is labeled with a pair \((\bar{x},F)\), where \(\bar{x}\) is a tuple of first-order variables, called the local variables of n, and F is a quantifier-free formula. The local variables are implicitly existentially quantified: they are existentially quantified variables whose quantifers have been stripped, so that they are locally free, so to speak, and can be assigned by the algorithm. An arc from a node n to a child node b is labeled with a Boolean variable p. This Boolean variable stands as a proxy for the quantified subformula represented by the subtree rooted at node b. Therefore, the Boolean variable p is also considered a proxy of b itself.

A formula \(\varphi \) may have free variables \( FV (\varphi ) = \bar{z}\), whose assignment is given initially as part of the SMA problem instance. These variables are called rigid, because their assignments do not change during the tree traversal. As the algorithm traverses the tree, the local variables of a node n are rigid from the point of view of a child node b: their assignments do not change during the traversal of the subtree rooted at b. Therefore, we represent a formula \(\varphi \) as a pair formed by a tuple of rigid variables and a labeled tree. Slightly abusing the terminology, we call this pair a \(\textsf{QSMA}\)-tree. The root of a tree T is denoted \( root (T)\).

Definition 1

(\(\textsf{QSMA}\)-tree). Given \(\varphi = \exists \bar{x}. F[\bar{z},\bar{x},\bar{p}] \{p_i {\leftarrow }_{{}}\exists \bar{y}_i. G_i[\bar{z},\bar{x},\bar{y}_i]\}_{i=1}^k\), where \( FV (\varphi ) = \bar{z}\) and \(\varphi _i = \exists \bar{y}_i. G_i[\bar{z},\bar{x},\bar{y}_i]\), \(1\le i \le k\), the \(\textsf{QSMA}\)-tree for \(\varphi \) is the pair \(\mathcal {G}= (\bar{z},T)\), where \(\bar{z}\) is called the tuple of the rigid variables of \(\mathcal {G}\), and T is a labeled tree defined inductively as follows:

  • If \(k = 0\), T consists of a single node r labeled \((\bar{x},F[\bar{z},\bar{x}])\);

  • If \(k > 0\), for all i, \(1\le i \le k\), let \(\mathcal {G}_i = ((\bar{z},\bar{x}),T_i)\) be the \(\textsf{QSMA}\)-tree for \(\varphi _i\), where \( root (T_i)\) is a node \(b_i\) labeled \((\bar{y}_i, G_i[\bar{z},\bar{x},\bar{y}_i])\). Then T is the tree with a new node r labeled \((\bar{x},F[\bar{z},\bar{x},\bar{p}])\) as root, k outgoing arcs labeled \(p_1,\ldots ,p_k\), and \(b_1,\ldots ,b_k\) as children.

If subformula \(\varphi _i\) occurs more than once in \(\varphi \), the same proxy variable \(p_i\) is used for all occurrences. The ancestors of a node n in T are the nodes on the unique path from \( root (T)\) to n excluding n itself. If node n in T is labeled \((\bar{x},F)\), its k outgoing arcs are labeled \(p_1,\ldots ,p_k\), and \(\bar{x}_1,\ldots ,\bar{x}_m\) are the local variables of the ancestors of n, then \( FV (F) \subseteq \{\bar{z},\bar{x}_1,\ldots ,\bar{x}_m,\bar{x},p_1,\ldots ,p_k\}\). The set of the assignable variables at node n is \( Var (n) = \bar{x} \uplus \{p_1,\ldots ,p_k\}\). The set of the rigid variables at node n is \( Rigid (n) = \bar{z} \uplus \bar{x}_1 \uplus \ldots \uplus \bar{x}_m\). Thus, \( FV (F) \subseteq Rigid (n) \cup Var (n)\), \( Rigid ( root (T)) = \bar{z}\), and the \(\textsf{QSMA}\)-subtree rooted at node n is \(\mathcal {G}_n = ( Rigid (n),T_n)\). For a node n with label \((\bar{x},F)\), the components of the label are denoted \(n.\bar{x}\) and n.F. The label of the arc from n to a child b is denoted b.p.

Example 1

Given \(\exists x . ((\forall y_1 . F_1[x,y_1]) \Rightarrow (\forall y_2 . F_2[x,y_2]))\) from Sect. 1.1, let \(\varphi = \exists x . ((\exists y_1 . \lnot F_1[x,y_1]) \vee (\lnot \exists y_2 . \lnot F_2[x,y_2])) = \exists x . (p_1 \vee \lnot p_2)\{p_i\leftarrow \exists y_i . \lnot F_i[x,y_i]\}_{i=1}^2\). The \(\textsf{QSMA}\)-tree for \(\varphi \) has root r labeled \((x, p_1 \vee \lnot p_2)\) with left child \(b_1\) labeled \((y_1, \lnot F_1[x,y_1])\), right child \(b_2\) labeled \((y_2, \lnot F_2[x,y_2])\), and arcs from r to \(b_1\) and from r to \(b_2\) labeled \(p_1\) and \(p_2\), respectively. Note how \( FV (r.F) \subseteq \{x, p_1, p_2\}\), \( Var (r) = \{x, p_1, p_2\}\), and \( Rigid (r) = \emptyset \). Also, \( FV (b_1.F) \subseteq \{x, y_1\}\), \( FV (b_2.F) \subseteq \{x, y_2\}\), \( Var (b_1) = \{y_1\}\), \( Var (b_2) = \{y_2\}\), and \( Rigid (b_1) = Rigid (b_2) =\{x\}\).

Example 2

Consider \(\forall x . ((\exists y_1 . (x \simeq 2{\cdot }y_1)) \Rightarrow (\exists y_2 . (3{\cdot }x \simeq 2{\cdot }y_2)))\). A double negation eliminates the \(\forall \), yielding \(\lnot (\exists x . ((\exists y_1 . (x \simeq 2{\cdot }y_1)) \wedge (\forall y_2 . (3{\cdot }x \not \simeq 2{\cdot }y_2))))\). Again, a double negation eliminates the \(\forall \), producing \(\lnot (\exists x . ((\exists y_1 . (x \simeq 2{\cdot }y_1)) \wedge (\lnot (\exists y_2 . (3{\cdot }x \simeq 2{\cdot }y_2)))))\). Let \(\varphi = \exists x . ((\exists y_1 . (x \simeq 2{\cdot }y_1)) \wedge (\lnot (\exists y_2 . (3{\cdot }x \simeq 2{\cdot }y_2)))) = \exists x . (p_1 \wedge \lnot p_2) \{p_1\leftarrow \exists y_1 . (x \simeq 2{\cdot }y_1),\ p_2\leftarrow \exists y_2 . (3{\cdot }x \simeq 2{\cdot }y_2)\}\). The original formula is true in \(\textsf{LRA}\) iff \(\varphi \) is false in \(\textsf{LRA}\). The \(\textsf{QSMA}\)-tree for \(\varphi \) has root r labeled \((x, p_1 \wedge \lnot p_2)\) with left child \(b_1\) labeled \((y_1, x \simeq 2{\cdot }y_1)\), right child \(b_2\) labeled \((y_2, 3{\cdot }x \simeq 2{\cdot }y_2)\), and arcs from r to \(b_1\) and from r to \(b_2\) labeled \(p_1\) and \(p_2\), respectively. The variable sets of this tree are as in Example 1.

Conversely, given a \(\textsf{QSMA}\)-tree \(\mathcal {G}= (\bar{z},T)\), we can associate a formula \(n.\psi \) to any node n in T and hence to the \(\textsf{QSMA}\)-subtree \(\mathcal {G}_n = ( Rigid (n),T_n)\).

Definition 2

(Formula at a node). Given a \(\textsf{QSMA}\)-tree \(\mathcal {G}= (\bar{z},T)\), for all nodes n of T, the formula \(n.\psi \) at node n is defined inductively as follows:

  • If n is a leaf labeled \((\bar{x},F[\bar{z},\bar{x}])\), then \(n.\psi = \exists \bar{x} . F[\bar{z},\bar{x}]\);

  • If n has label \((\bar{x},F[\bar{z},\bar{x},\bar{p}])\) and outgoing arcs labeled \(p_1,\ldots ,p_k\), \(k > 0\), connecting n to children \(b_1,\ldots ,b_k\), let \(b_1.\psi ,\ldots ,b_k.\psi \) be the formulas at \(b_1,\ldots ,b_k\). Then \(n.\psi = \exists \bar{x} . F[\bar{z},\bar{x},\bar{p}]\{p_i \leftarrow b_i.\psi \}_{i=1}^k\).

If \(\mathcal {G}= (\bar{z},T)\) is the \(\textsf{QSMA}\)-tree for \(\varphi \) and \(r = root (T)\), then \(r.\psi = \varphi \).

Example 3

For the \(\textsf{QSMA}\)-tree in Example 2, \(b_1.\psi = \exists y_1 . (x \simeq 2{\cdot }y_1)\), \(b_2.\psi = \exists y_2 . (3{\cdot }x \simeq 2{\cdot }y_2)\), and \(r.\psi = \exists x . (p_1 \wedge \lnot p_2) \{p_1\leftarrow \exists y_1 . (x \simeq 2{\cdot }y_1),\ p_2\leftarrow \exists y_2 . (3{\cdot }x \simeq 2{\cdot }y_2)\} = \exists x . ((\exists y_1 . (x \simeq 2{\cdot }y_1)) \wedge \lnot (\exists y_2 . (3{\cdot }x \simeq 2{\cdot }y_2))) = \varphi \).

Since the input formula \(\varphi \) is represented as a \(\textsf{QSMA}\)-tree \(\mathcal {G}= (\bar{z},T)\), the problem of satisfying \(\varphi \) becomes the problem of satisfying \(\mathcal {G}\). Therefore, we define satisfaction of a \(\textsf{QSMA}\)-tree next. Slightly abusing the notation, we use \(\models \) also for satisfaction of \(\textsf{QSMA}\)-trees.

Definition 3

(Satisfaction of a \(\textsf{QSMA}\)-tree). Given a \(\textsf{QSMA}\)-tree \(\mathcal {G}= (\bar{z},T)\) with \(r = root (T)\), and an extension \(\mathcal {M}\) of \(\mathcal {M}_0\) to \( Rigid (r) = \bar{z}\), \(\mathcal {M}\models \mathcal {G}\) if there exists an extension \(\mathcal {M}^\prime \) of \(\mathcal {M}\) to \( Var (r)\) such that (i) \(\mathcal {M}^\prime \models r.F\), and (ii) for all children b of r, \(\mathcal {M}^\prime (b.p) = \textsf{true}\) iff \(\mathcal {M}^\prime \models \mathcal {G}_b\).

The \(\textsf{QSMA}\) algorithm works by traversing the \(\textsf{QSMA}\)-tree \(\mathcal {G}= (\bar{z},T)\), and at each node n in T it assigns the assignable variables in \( Var (n) = \bar{x} \uplus \{p_1,\ldots ,p_k\}\). This assignment corresponds to the extension \(\mathcal {M}^\prime \) in Definition 3. Let b be a child of n: the Boolean variable b.p labeling the arc from n to b is a proxy for the quantified subformula \(b.\psi \) of the formula \(n.\psi \). If \(\mathcal {M}^\prime (b.p) = \textsf{true}\), the aim of the algorithm is to show that \(b.\psi \) is true, and if \(\mathcal {M}^\prime (b.p) = \textsf{false}\), the aim is to show that \(b.\psi \) is false. Therefore Condition (ii) in Definition 3 says \(\mathcal {M}^\prime \models \mathcal {G}_b\) if \(\mathcal {M}^\prime (b.p) = \textsf{true}\) and \(\mathcal {M}^\prime \not \models \mathcal {G}_b\) if \(\mathcal {M}^\prime (b.p) = \textsf{false}\). The next theorem shows that satisfying a formula \(\varphi \) and satisfying the \(\textsf{QSMA}\)-tree for \(\varphi \) correspond.

Theorem 1

For all formulas \(\varphi \) with \( FV (\varphi ) = \bar{z}\), for all models \(\mathcal {M}\) extending \(\mathcal {M}_0\) to \(\bar{z}\), if \(\mathcal {G}\) is the \(\textsf{QSMA}\)-tree for \(\varphi \) then \(\mathcal {M}\models \mathcal {G}\) iff \(\mathcal {M}\models \varphi \).

Checking whether \(\mathcal {M}\models \mathcal {G}\) by testing all possible extensions \(\mathcal {M}^\prime \) would not do, because for most theories (e.g., \(\textsf{LRA}\)) there is an infinite number of extensions. We need a way to weed out large parts of the space of candidate models. Let \(\llbracket \varphi \rrbracket \) denote the set of \(\varphi \)’s models. We introduce under-approximations and over-approximations of \(\varphi \) in order to under-approximate and over-approximate \(\llbracket \varphi \rrbracket \).

Definition 4

(Under- and over-approximation). Let \(\varphi \) be a formula with \( FV (\varphi ) = \bar{z}\). Quantifier-free formulas U and O with \( FV (U) = FV (O) = \bar{z}\) are, respectively, an under-approximation and an over-approximation of \(\varphi \), if for all extensions \(\mathcal {M}\) of \(\mathcal {M}_0\) to \(\bar{z}\), \(\mathcal {M}\models U\) implies \(\mathcal {M}\models \varphi \) and \(\mathcal {M}\models \varphi \) implies \(\mathcal {M}\models O\).

It follows that \(\llbracket U\rrbracket \subseteq \llbracket \varphi \rrbracket \subseteq \llbracket O\rrbracket \). Let \(\mathcal {G}= (\bar{z},T)\) be the \(\textsf{QSMA}\)-tree for \(\varphi \), and U and O under- and over-approximations of \(\varphi \), respectively. Then, \(\mathcal {M}\models U\) implies \(\mathcal {M}\models \varphi \) which implies \(\mathcal {M}\models \mathcal {G}\) by Theorem 1. Thus, satisfying an under-approximation is a sufficient condition to have a solution. On the other hand, \(\mathcal {M}\models \lnot O\) implies \(\mathcal {M}\not \models \varphi \) which implies \(\mathcal {M}\not \models \mathcal {G}\) by Theorem 1. By the contrapositive, if \(\mathcal {M}\models \mathcal {G}\) then \(\mathcal {M}\not \models \lnot O\), that is, \(\mathcal {M}\models O\). Thus, satisfying an over-approximation is a necessary condition to have a solution. In order to construct such approximations, we assume to have a solver for theory \(\mathcal {T}\) (and model \(\mathcal {M}_0\)) offering:

  • Model extension: A function \(\textsf{SMA}\) such that for all formulas \(\exists \bar{x} . F[\bar{z},\bar{x}]\), where \(F[\bar{z},\bar{x}]\) is quantifier-free, and all extensions \(\mathcal {M}\) of \(\mathcal {M}_0\) to \(\bar{z}\), \(\textsf{SMA}(F[\bar{z},\bar{x}],\mathcal {M})\) returns either an extension \(\mathcal {M}^\prime \) of \(\mathcal {M}\) to \(\bar{x}\) such that \(\mathcal {M}^\prime \models F[\bar{z},\bar{x}]\), or nil if there is no such extension.

  • Model-based under-approximation: A function \(\textsf{MBU}\) such that for all formulas \(\exists \bar{x} . F[\bar{z},\bar{x}]\), where \(F[\bar{z},\bar{x}]\) is quantifier-free, and all extensions \(\mathcal {M}\) of \(\mathcal {M}_0\) to \(\bar{z}\) such that \(\mathcal {M}\models \exists \bar{x} . F[\bar{z},\bar{x}]\), \(\textsf{MBU}(F[\bar{z},\bar{x}],\bar{x},\mathcal {M})\) returns a quantifier-free formula \(U[\bar{z}]\) such that \(\mathcal {M}\models U[\bar{z}]\) and \(\mathcal {T}\models U[\bar{z}] \Rightarrow (\exists \bar{x} . F[\bar{z},\bar{x}])\).

  • Model-based over-approximation: A function \(\textsf{MBO}\) such that for all formulas \(\exists \bar{x} . F[\bar{z},\bar{x}]\), where \(F[\bar{z},\bar{x}]\) is quantifier-free, and all extensions \(\mathcal {M}\) of \(\mathcal {M}_0\) to \(\bar{z}\) such that \(\mathcal {M}\not \models \exists \bar{x} . F[\bar{z},\bar{x}]\), \(\textsf{MBO}(F[\bar{z},\bar{x}],\bar{x},\mathcal {M})\) returns a quantifier-free formula \(O[\bar{z}]\) such that \(\mathcal {M}\not \models O[\bar{z}]\) and \(\mathcal {T}\models (\exists \bar{x} . F[\bar{z},\bar{x}]) \Rightarrow O[\bar{z}]\).

\(\textsf{MBU}\) and \(\textsf{MBO}\) produce, respectively, an under-approximation and an over-approximation. Formula \(U[\bar{z}]\) is true in model \(\mathcal {M}\) and implies \(\exists \bar{x} . F[\bar{z},\bar{x}]\), and hence can be seen as an interpolant between model and formula. It was called model generalization [12, 17], because \(U[\bar{z}]\) may have other models in addition to \(\mathcal {M}\). Formula \(O[\bar{z}]\) follows from \(\exists \bar{x} . F[\bar{z},\bar{x}]\) and is false in \(\mathcal {M}\), and hence can be seen as a reverse interpolant between formula and model, called model interpolant [17].

4 The \(\textsf{QSMA}\) Algorithm and Its Total Correctness

Let \(\mathcal {G}= (\bar{z},T)\) be the \(\textsf{QSMA}\)-tree for input formula \(\varphi \) with \( FV (\varphi ) = \bar{z}\). Given a model \(\mathcal {M}\) extending \(\mathcal {M}_0\) to \(\bar{z}\), the \(\textsf{QSMA}\) algorithm determines whether \(\mathcal {M}\models \mathcal {G}\). Suppose that U and O are under- and over-approximations of \(\varphi \), respectively. Picture \(\llbracket U\rrbracket \), \(\llbracket \varphi \rrbracket \), and \(\llbracket O\rrbracket \) as bubbles. The \(\llbracket U\rrbracket \) bubble is inside the \(\llbracket \varphi \rrbracket \) bubble, which is inside the \(\llbracket O\rrbracket \) bubble. The idea of the algorithm is to zoom in on a model of \(\varphi \), by progressively weakening U, so that the \(\llbracket U\rrbracket \) bubble inflates, and progressively strengthening O, so that the \(\llbracket O\rrbracket \) bubble deflates. The algorithm operates in this manner for all subformulas of \(\varphi \): for all nodes n of T it maintains under and over-approximations n.U and n.O of \(n.\psi \), progressively weakening n.U and strengthening n.O. The weakening of n.U is done by introducing a disjunction with an MBU. The strengthening of n.O is done by introducing a conjunction with an MBO. The goal is that \(\mathcal {M}\) satisfies \(n.U \vee \lnot n.O\). As soon as \(\mathcal {M}\) satisfies n.U, we know that \(\mathcal {M}\models \mathcal {G}_n\). As soon as \(\mathcal {M}\) satisfies \(\lnot n.O\), we know that \(\mathcal {M}\not \models \mathcal {G}_n\).

Fig. 1.
figure 1

Pseudocode of the main function of the QSMA algorithm

The main function \(\textsf{QSMA}\) (Fig. 1) initializes n.U to \(\perp \) (under-approximation of all formulas and identity for disjunction) and n.O to \(\top \) (over-approximation of all formulas and identity for conjunction) for all nodes n of T. Then \(\textsf{QSMA}\) calls the function subtreeIsSolved (Fig. 2) with arguments \( root (T)\) and \(\mathcal {M}\).

Fig. 2.
figure 2

Pseudocode of the auxiliary functions of the QSMA algorithm

Function subtreeIsSolved takes a node n and a model \(\mathcal {M}\) extending \(\mathcal {M}_0\) to \( Rigid (n)\) and determines whether \(\mathcal {M}\models \mathcal {G}_n\). If \(\mathcal {M}\models n.U\) it returns \( true \); if \(\mathcal {M}\models \lnot n.O\) it returns \( false \) (lines 3–5 in Fig. 2). Otherwise (i.e., \(\mathcal {M}\models \lnot n.U \wedge n.O\)), it enters a loop whose body contains the following steps:

  1. 1.

    labelstep1 Build a formula L as the conjunction of n.F and a formula for every child b of n, denoted \(n \rightarrow b\) (line 7 in Fig. 2). The shape of the formula for b is better explained by considering a model of L and hence in the next step.

  2. 2.

    Invoke the \(\textsf{SMA}\) function to search for an extension \(\mathcal {M}^\prime \) of \(\mathcal {M}\) to \( Var (n)\) such that \(\mathcal {M}^\prime \models L\) (line  8). For all children b of n, \(b.p\in Var (n)\) and \(\mathcal {M}^\prime \) assigns a Boolean value to b.p. If \(\mathcal {M}^\prime (b.p) = \textsf{true}\), the subformula for b in L reduces to b.O, so that \(\mathcal {M}^\prime \models L\) implies \(\mathcal {M}^\prime \models b.O\). Since \(\textsf{QSMA}\) seeks to satisfy \(b.\psi \) and \(\llbracket b.\psi \rrbracket \subseteq \llbracket b.O\rrbracket \), it starts at least from a model of b.O. If \(\mathcal {M}^\prime (b.p) = \textsf{false}\), the subformula for b in L reduces to \(\lnot b.U\), so that \(\mathcal {M}^\prime \models L\) implies \(\mathcal {M}^\prime \models \lnot b.U\). Since \(\textsf{QSMA}\) seeks to falsify \(b.\psi \) and \(\llbracket b.U\rrbracket \subseteq \llbracket b.\psi \rrbracket \), it starts at least from a model of \(\lnot b.U\). The proof of partial correctnessFootnote 1 of subtreeIsSolved shows that the existence of an \(\mathcal {M}^\prime \) such that \(\mathcal {M}^\prime \models L\) is necessary for \(\mathcal {M}\models \mathcal {G}_n\).

  3. 3.

    If \(\textsf{SMA}\) returns nil, then \(\mathcal {M}\not \models \mathcal {G}_n\); subtreeIsSolved updates n.O to its conjunction with \(\textsf{MBO}(L, FV (L)\setminus Rigid (n), \mathcal {M})\) (line 10). Since \(\mathcal {M}\not \models L\), by \(\textsf{MBO}\)’s specification we know that \(\mathcal {M}\not \models \textsf{MBO}(L, FV (L)\setminus Rigid (n), \mathcal {M})\). This update ensures that \(\mathcal {M}\not \models n.O\), so that \(\mathcal {M}\models \lnot n.O\). Then subtreeIsSolved returns \( false \) (line 11).

  4. 4.

    Otherwise, we have an extension \(\mathcal {M}^\prime \) that satisfies L and hence n.F, so that there is potential for \(\mathcal {M}\models \mathcal {G}_n\). Function solutionForallChildren is invoked to determine whether this is the case.

  5. 5.

    The function solutionForallChildren calls subtreeIsSolved for every child b of n. As soon as it finds a child b such that \(\mathcal {M}(b.p) = \textsf{true}\) and the call subtreeIsSolved(b,\(\mathcal {M}\)) returns \( false \), or \(\mathcal {M}(b.p) = \textsf{false}\) and the call subtreeIsSolved(b,\(\mathcal {M}\)) returns \( true \), it returns \( false \), because it found a \(\textsf{QSMA}\)-subtree where candidate model \(\mathcal {M}\) fails. If this does not happen, solutionForallChildren returns \( true \).

  6. 6.

    If solutionForallChildren returns \( true \), subtreeIsSolved builds a formula \(L^\prime \) as the conjunction of n.F and a formula for every child b of n (line 14). If \(\mathcal {M}^\prime (b.p) = \textsf{true}\), the subformula for b in \(L^\prime \) reduces to b.U. If \(\mathcal {M}^\prime (b.p) = \textsf{false}\), the subformula for b in \(L^\prime \) reduces to \(\lnot b.O\). The proof of partial correctness of subtreeIsSolved shows that \(\mathcal {M}^\prime \models L^\prime \) and that \(\mathcal {M}^\prime \models L^\prime \) is a sufficient condition for \(\mathcal {M}\models \mathcal {G}_n\). Then subtreeIsSolved updates n.U to its disjunction with \(\textsf{MBU}(L^\prime , FV (L^\prime )\setminus Rigid (n),\mathcal {M})\) (line 15). Since \(\mathcal {M}^\prime \models L^\prime \), by \(\textsf{MBU}\)’s specification we know that \(\mathcal {M}^\prime \models \textsf{MBU}(L^\prime , FV (L^\prime )\setminus Rigid (n),\mathcal {M})\). This update ensures that \(\mathcal {M}^\prime \models n.U\). Then subtreeIsSolved returns \( true \) (line 16).

  7. 7.

    If solutionForallChildren returns \( false \), the control returns to line 7. Suppose that solutionForallChildren returned \( false \), because it found a child b of n such that \(\mathcal {M}(b.p) = \textsf{true}\) and subtreeIsSolved(b,\(\mathcal {M}\)) returned \( false \). Then the call subtreeIsSolved(b,\(\mathcal {M}\)) updated the formula b.O (line 10). Suppose that solutionForallChildren returned \( false \), because it found a child b of n such that \(\mathcal {M}(b.p) = \textsf{false}\) and subtreeIsSolved(b,\(\mathcal {M}\)) returned \( true \). Then the call subtreeIsSolved(b,\(\mathcal {M}\)) updated the formula b.U (line 15). Either way the state has changed, variable L gets a new formula on line 7, and the subsequent call to \(\textsf{SMA}\) will not produce the same model.

Example 4

Apply subtreeIsSolved to the root of the \(\textsf{QSMA}\)-tree in Example 1. Formula L gets \(p_1 \vee \lnot p_2\). \(\textsf{SMA}\) produces an \(\mathcal {M}^\prime \) that assigns values to x, \(p_1\), and \(p_2\). Suppose that \(\mathcal {M}^\prime \) satisfies \(p_1 \vee \lnot p_2\) by assigning \(\textsf{true}\) to \(p_1\). In the recursive call on \(b_1\), formula L gets \(\lnot F_1[x,y_1]\). If \(\textsf{SMA}\) produces an \(\mathcal {M}^{\prime \prime }\) that extends \(\mathcal {M}^\prime \) with an assignment to \(y_1\) such that \(\mathcal {M}^{\prime \prime }\models \lnot F_1[x,y_1]\), we have a model. Suppose that \(\mathcal {M}^\prime \) satisfies \(p_1 \vee \lnot p_2\) by assigning \(\textsf{false}\) to \(p_2\). In the recursive call on \(b_2\), formula L gets \(\lnot F_2[x,y_2]\). If \(\textsf{SMA}\) fails to produces an \(\mathcal {M}^{\prime \prime }\) that extends \(\mathcal {M}^\prime \) with an assignment to \(y_2\) such that \(\mathcal {M}^{\prime \prime }\models \lnot F_2[x,y_2]\), we have a model.

Theorem 2

The function subtreeIsSolved is partially correct: if the preconditions hold and the function halts, then the postconditions hold.

For termination, we begin with the \(\textsf{MBU}\) and \(\textsf{MBO}\) functions. Let \(\mathcal {T}\) be \(\textsf{LRA}\) with a theory extension \(\textsf{LRA}^+\) that adds constant symbols \(\tilde{q}\) for all rational numbers q. Consider an \(\textsf{MBU}\) function such that \(\textsf{MBU}(F[\bar{z},x],x,\mathcal {M}) = F[\bar{z},x]\{x{\leftarrow }_{{}}\tilde{q}\}\) and \(\mathcal {M}\models F[\bar{z},\tilde{q}]\). This kind of MBU is called generalization-by-substitution [12]. While \(F[\bar{z},\tilde{q}]\) is an under-approximation of \(\exists x. F[\bar{z},x]\), this \(\textsf{MBU}\) is not a good choice for termination. By applying \(\textsf{MBU}\) repeatedly with an infinite enumeration of rational constants, the \(\textsf{QSMA}\) algorithm could build an infinite sequence of under-approximations \((\bigvee _{i=1}^n F[\bar{z},x]\{x{\leftarrow }_{{}}\tilde{q}_i\})_{n\in \mathbb N}\) none of which is \(\textsf{LRA}\)-equivalent to \(\exists x. F[\bar{z},x]\). The next definition excludes such \(\textsf{MBU}\) functions, by requiring that for a given formula and variable tuple (that depends on the formula), \(\textsf{MBU}\) can generate only finitely many formulas.

Definition 5

(Finite basis). An \(\textsf{MBU}\) function has finite basis if the set \(\{\textsf{MBU}(F[\bar{z},\bar{x}],\bar{x},\mathcal {M}) \mid \mathcal {M}:\text{ extension } \text{ of } \mathcal {M}_0 \text{ to } \bar{z} \text{ such } \text{ that } \mathcal {M}\models \exists \bar{x} . F[\bar{z},\bar{x}]\}\) is finite for all quantifier-free formulas \(F[\bar{z},\bar{x}]\) and tuples \(\bar{x}\).

The notion of an \(\textsf{MBO}\) function having a finite basis is defined in the same way with \(\not \models \) in place of \(\models \).

Lemma 1

If \(\textsf{MBU}\) and \(\textsf{MBO}\) have finite basis, for all (possibly infinite) series of calls \(\{\texttt{subtreeIsSolved}(n,\mathcal {M}_i)\}_i\), all satisfying the preconditions and all terminating, formulas n.U and n.O are updated only a finite number of times.

Once nontermination due to \(\textsf{MBU}\) or \(\textsf{MBO}\) is excluded even for an infinite series of halting calls, termination is proved by induction on the \(\textsf{QSMA}\)-tree.

Theorem 3

If the \(\textsf{MBU}\) and \(\textsf{MBO}\) functions have finite basis, whenever the preconditions are satisfied the function subtreeIsSolved halts.

Example 5

Apply subtreeIsSolved to the root of the \(\textsf{QSMA}\)-tree in Example 2. Formula L gets \(p_1 \wedge \lnot p_2\). \(\textsf{SMA}\) produces an \(\mathcal {M}^\prime \) that assigns values to x, \(p_1\), and \(p_2\). Suppose that \(\mathcal {M}^\prime \) assigns 1 to x, while it must assign \(\textsf{true}\) to \(p_1\) and \(\textsf{false}\) to \(p_2\). In the recursive call on \(b_1\), formula L gets \(x \simeq 2{\cdot }y_1\). If \(\textsf{SMA}\) produces an \(\mathcal {M}^{\prime \prime }\) that extends \(\mathcal {M}^\prime \) with \(y_1{\leftarrow }_{{}}\frac{1}{2}\), we have a model of \(\mathcal {G}_{b_1}\). In the recursive call on \(b_2\), formula L gets \(3{\cdot }x \simeq 2{\cdot }y_2\). If \(\textsf{SMA}\) produces an \(\mathcal {M}^{\prime \prime }\) that extends \(\mathcal {M}^\prime \) with \(y_2{\leftarrow }_{{}}\frac{3}{2}\), we have a model of \(\mathcal {G}_{b_2}\), but because \(\mathcal {M}^\prime (p_2) = \textsf{false}\), there is no model of \(\mathcal {G}\). Indeed, formula \(\varphi \) of Example 2 is false as the original formula is true.

5 The \(\textsf{OptiQSMA}\) Algorithm and Its Total Correctness

YicesQS implements an optimized variant of \(\textsf{QSMA}\), called \(\textsf{OptiQSMA}\), that reduces the number of recursive calls to subtreeIsSolved by entrusting more work to each call to \(\textsf{SMA}\). Reconsider the behavior of \(\textsf{QSMA}\) in Example 4. We can avoid a recursive call to subtreeIsSolved by asking \(\textsf{SMA}\) to satisfy \((p_1 \vee \lnot p_2) \wedge (p_1 \Rightarrow \lnot F_1[x,y_1])\) in lieu of \(p_1 \vee \lnot p_2\). This way, if the candidate model returned by \(\textsf{SMA}\) assigns \(\textsf{true}\) to \(p_1\), it also assigns to x and \(y_1\) values that satisfy \(\lnot F_1[x,y_1]\). This means that \(\exists y_1 . \lnot F_1[x,y_1]\) is found true without recursion. On the other hand, if \(p_2\) is assigned \(\textsf{false}\), the algorithm still has to make the recursive call to see if it can satisfy \(\exists y_2 . \lnot F_2[x,y_2]\).

The idea of \(\textsf{OptiQSMA}\) is to do a look-ahead on a path in the \(\textsf{QSMA}\)-tree, doing the work in one shot rather then through recursive calls on all the nodes in the path. The look-ahead applies to a path such that the Boolean labels of all the arcs in the path are assigned \(\textsf{true}\) by the candidate model. The following definition builds a formula to allow the look-ahead.

Definition 6

(Look-ahead formula). Given a \(\textsf{QSMA}\)-tree \(\mathcal {G}= (\bar{z},T)\), for all nodes n of T the look-ahead formula of n is \( LF (n) = n.F \wedge \bigwedge _{n \rightarrow b} (b.p \Rightarrow LF (b))\).

The next definition distinguishes the nodes that are handled together in one shot without recursion and those where recursion is still needed. Nodes of the first kind are called no alternation nodes, because such nodes are on a path as described above, where all Boolean labels are assigned \(\textsf{true}\) and hence there is no alternation between \(\textsf{true}\) and \(\textsf{false}\). Nodes of the second kind are called first alternation nodes, because they are the nodes reached by the first arc whose Boolean label is assigned \(\textsf{false}\).

Definition 7

(No alternation nodes and first alternation nodes). Given a \(\textsf{QSMA}\)-tree \(\mathcal {G}= (\bar{z},T)\) for all nodes n of T and extensions \(\mathcal {M}\) of \(\mathcal {M}_0\) to \( FV ( LF (n))\), the set \(\textsf{NAN}(n,\mathcal {M})\) of the no-alternation nodes from n according to \(\mathcal {M}\) (resp. the set \(\textsf{FAN}(n,\mathcal {M})\) of the first-alternation nodes from n according to \(\mathcal {M}\)) contains all and only the nodes b such that: (i) b is a descendant of n through a path \(n \rightarrow n_1 \rightarrow \ldots \rightarrow n_q \rightarrow b\) (\(q\ge 0\)), (ii) \(\forall i\), \(1\le i\le q\), \(\mathcal {M}(n_i.p) = \textsf{true}\), and (iii) \(\mathcal {M}(b.p) = \textsf{true}\) (resp. \(\mathcal {M}(b.p) = \textsf{false}\)).

A node \(b\in \textsf{FAN}(n,\mathcal {M})\) such that \(q = 0\) in Condition (i) of Definition 7 is a child of n: for a child there is no optimization. The \(\textsf{OptiQSMA}\) algorithm seeks a candidate model \(\mathcal {M}\) that satisfies \( LF (n)\) and recurses only on the nodes in \(\textsf{FAN}(n,\mathcal {M})\). Therefore, the definition of satisfaction with look-ahead, denoted \(\models _{la}\), follows the pattern of Definition 3, replacing r.F with \( LF (r)\) and Condition (ii) of Definition 3 with a condition for the nodes in the \(\textsf{FAN}\) set.

Definition 8

(Satisfaction with look-ahead). Given a \(\textsf{QSMA}\)-tree \(\mathcal {G}= (\bar{z},T)\) with \(r = root (T)\) and an extension \(\mathcal {M}\) of \(\mathcal {M}_0\) to \( Rigid (r) = \bar{z}\), \(\mathcal {M}\models _{la} \mathcal {G}\) if there exists an extension \(\mathcal {M}^\prime \) of \(\mathcal {M}\) to \( FV ( LF (r))\) such that (i) \(\mathcal {M}^\prime \models LF (r)\) and (ii) for all nodes \(b \in \textsf{FAN}(r,\mathcal {M}^\prime )\), \(\mathcal {M}^\prime \not \models _{la} \mathcal {G}_b\).

Since for the nodes \(b \in \textsf{FAN}(r,\mathcal {M}^\prime )\) it is \(\mathcal {M}^\prime (b.p) = \textsf{false}\), the \(\models _{la}\) relation is negated in Condition (ii). The next theorem shows that the optimization does not change the problem.

Theorem 4

Given a \(\textsf{QSMA}\)-tree \(\mathcal {G}= (\bar{z},T)\) and an extension \(\mathcal {M}\) of \(\mathcal {M}_0\) to \(\bar{z}\), \(\mathcal {M}\models \mathcal {G}\) if and only if \(\mathcal {M}\models _{la} \mathcal {G}\).

Fig. 3.
figure 3

Pseudocode of the main function of the \(\textsf{OptiQSMA}\) algorithm

Fig. 4.
figure 4

Pseudocode of the auxiliary functions of the optiQSMA algorithm

The \(\textsf{OptiQSMA}\) algorithm maintains under-approximations n.U of \(n.\psi \) for all nodes n, but not over-approximations. Accordingly, the main function \(\textsf{OptiQSMA}\) (Fig. 3) initializes only n.U for all nodes n, and then calls optiSubtreeIsSolved (Fig. 4). This function returns \(\textsf{SAT}(U)\) if \(\mathcal {M}\models _{la} \mathcal {G}\) and \(\textsf{UNSAT}(O)\) if \(\mathcal {M}\not \models _{la} \mathcal {G}\). The formula U is an under-approximation of \(r.\psi \) (\(r = root (T)\)) such that \(\mathcal {M}\models U\). The formula O is an over-approximation of \(r.\psi \) such that \(\mathcal {M}\not \models O\). The main function \(\textsf{OptiQSMA}\) has no usage for U and O and merely returns \( true \) or \( false \) accordingly. Function optisubtreeIsSolved builds and returns under-approximations and over-approximations recursively. The reason for saving only under-approximations is practical, and will become clear after the illustration of optisubtreeIsSolved. This function takes a node n and a model \(\mathcal {M}\) extending \(\mathcal {M}_0\) to \( Rigid (n)\) and determines whether \(\mathcal {M}\models _{la} \mathcal {G}_n\), by executing a loop whose body contains the following steps:

  1. 1.

    Build a formula L (line 3 in Fig. 4) as the conjunction of the look-ahead formula \( LF (n)\) (in lieu of n.F in line 7 of Fig. 2) and a formula for every descendant b of n, denoted \(n \rightarrow ^+ b\) (in lieu of child as in Fig. 2).

  2. 2.

    Invoke the \(\textsf{SMA}\) function to search for an extension \(\mathcal {M}^\prime \) of \(\mathcal {M}\) to \( Var (n)\) such that \(\mathcal {M}^\prime \models L\). For those descendants b for which \(\mathcal {M}^\prime (b.p) = \textsf{false}\), the subformula for b in L reduces to \(\lnot b.U\) as in Step 2 of the description of subtreeIsSolved. For those descendants b for which \(\mathcal {M}^\prime (b.p) = \textsf{true}\), the subformula for b in L reduces to \(\textsf{true}\), in agreement with the fact that over-approximations are not kept.

  3. 3.

    If \(\textsf{SMA}\) returns nil, optiSubtreeIsSolved returns \(\textsf{UNSAT}(O)\), where O is simply the outcome of applying \(\textsf{MBO}\) to L and \(\mathcal {M}\), as over-approximations are not kept. Otherwise, there is potential for satisfaction with look-ahead. Function optiSubtreeIsSolved initializes the formula reasons to \(\top \) and invokes solutionForallDescendants passing reasons by reference.

  4. 4.

    Function solutionForallDescendants considers first all descendants b in \(\textsf{FAN}(n,\mathcal {M})\), and calls optiSubtreeIsSolved \((b,\mathcal {M})\) for each of them. If this call returns \(\textsf{SAT}(U)\), it means that \(\mathcal {M}\models _{la} \mathcal {G}_b\); solutionForallDescendants weakens b.U by disjunction with U and returns \( false \). If optiSubtreeIsSolved \((b,\mathcal {M})\) returns \(\textsf{UNSAT}(O)\), it means that \(\mathcal {M}\not \models _{la} \mathcal {G}_b\), and we move on to the next descendant in \(\textsf{FAN}(n,\mathcal {M})\). Prior to that, reasons is strengthened by conjunction with \(\lnot b.p \Rightarrow \lnot O\). For all descendants b in \(\textsf{NAN}(n,\mathcal {M})\), solutionForallDescendants strengthens reasons by conjunction with b.p.

  5. 5.

    If solutionForallDescendants returns \( true \), optiSubtreeIsSolved builds formula \(L^\prime \) as \( LF (n) \wedge \texttt{reasons}\), and returns \(\textsf{SAT}(U)\), where U is the outcome of the application of \(\textsf{MBU}\) to \(L^\prime \) and \(\mathcal {M}\). Otherwise, the control returns to line 3. Since solutionForallDescendants returned \( false \), it means that it found a node b in \(\textsf{FAN}(n,\mathcal {M})\) for which optiSubtreeIsSolved(b,\(\mathcal {M}\)) returned \(\textsf{SAT}(U)\) and the formula b.U was updated (line 17). Therefore the state has changed, variable L gets a new formula on line 3, and the subsequent call to \(\textsf{SMA}\) will not produce the same model.

In the experiments it turned out that storing over-approximations for all nodes is less efficient than using them to compute \(L^\prime \) and then forget them. Thus, the over-approximation O encapsulated in the \(\textsf{UNSAT}(O)\) value returned by a recursive call to optiSubtreeIsSolved is used to build the temporary formula reasons, but it is not saved, and reasons is used to compute \(L^\prime \).

Theorem 5

The function optiSubtreeIsSolved is partially correct: if the preconditions hold and the function halts, then the postconditions hold.

The proof of partial correctness of optiSubtreeIsSolved shows that every model that satisfies \(L^\prime = ( LF (n) \wedge \texttt{reasons})\) fulfills Definition 8. In this sense, reasons is an explanation of why a model is found with look-ahead.

Theorem 6

If the \(\textsf{MBU}\) and \(\textsf{MBO}\) functions have finite basis, whenever the preconditions are satisfied the function optiSubtreeIsSolved halts.

6 The YicesQS Solver and Experimental Results

The \(\textsf{OptiQSMA}\) algorithm is implemented in YicesQS to equip Yices 2 with support for quantifiers for complete theories (unrelated to Yices 2 support for quantifiers in \(\textsf{UF}\)).Footnote 2 MBO is available as model interpolation from Yices’s MCSAT [10] solver for quantifier-free formulas, including theory-specific techniques for bitvectors (\(\textsf{BV}\)) [15] and arithmetic. The latter are based on NLSAT [16] and ultimately on Cylindrical Algebraic Decomposition (CAD). Basic MBU is done as generalization-by-substitution [12] and improved with model-based projection (e.g., [18]) for arithmetic, and invertibility conditions [21], including \(\epsilon \)-terms, for \(\textsf{BV}\). In YicesQS model-based projection also is based on CAD.

Fig. 5.
figure 5

Plot for BV.

In the 2022 SMT competition, YicesQS entered the single-query, non-incremental tracks of \(\textsf{BV}\), \(\textsf{LRA}\), \(\textsf{LIA}\), \(\textsf{NRA}\), and \(\textsf{NIA}\) (nonlinear integer arithmetic). The experiments were run on the Starexec cluster with a 20 min timeout per benchmark and 60GB of memory. The benchmarks were a subset of the SMT-LIB collection. The results presented below were computed by running the competition script join.sh on the raw data from StarExec,Footnote 3 sorting the data, and producing the plots that are available online.Footnote 4 A description of the participating solvers can be found on the competition website.Footnote 5

Fig. 6.
figure 6

Plots for the four arithmetics.

Figure 5 shows the results for \(\textsf{BV}\), where YicesQS solved quickly a high number of benchmarks (compared for example with CVC5), but was not outstanding, possibly because YicesQS 2022 makes a limited use of invertibility conditions for model interpolation. Figure 6 shows the results for the four arithmetics. The columns on the left list number of solved instances and time to solve them for each logic and solver. In the plot on the right, each color corresponds to a solver and point (xy) of that color means that the \(x^{th}\) fastest-solved benchmark was solved by that solver in time y (log scale). 2021 Z3 is included because in some of these logics it performed slightly better than 2022 Z3. The logic where YicesQS performed best is \(\textsf{LRA}\): it was the only solver to solve all 1,003 benchmarks. Z3 2021 was second best, solving 948 benchmarks with a total runtime about 100 times higher. YicesQS has neither a special treatment (e.g., simplex-based) of linear problems, nor integer-specific techniques: it relies on CAD-based techniques for MBU and MBO also for integer problems. Thus, it is somewhat average on \(\textsf{LIA}\) and \(\textsf{NIA}\). These two theories are undecidable (\(\textsf{NRA}\) due to division by 0) and hence they lie outside of the theoretical framework of \(\textsf{QSMA}\). YicesQS answers should still be correct, but termination can be lost. With Z3 being a non-competing participant in the SMT 2022 competition, YicesQS came second for Largest Contribution (single queries), because of its overall performance in the four arithmetics, where it also came first for satisfiable instances and in the 24 sec timeout setup (instead of 20 min).

7 Discussion: Related Work and Future Work

Quantified SMT was approached by a procedure with an \(\exists \)-solver and a \(\forall \)-solver for prenex normal form formulas with \(\exists \forall \) prefix [12]. A formulation as a game between an \(\exists \)-player and a \(\forall \)-player appeared with the QSAT algorithm [3] for prenex normal form formulas with \((\exists \forall )^+\) prefix. \(\textsf{QSMA}\) accepts arbitrary formulas with quantifiers in arbitrary positions.

Both QSAT and \(\textsf{QSMA}\) work for a generic theory \(\mathcal {T}\) over basic \(\mathcal {T}\)-specific components. QSAT uses model-based projection [3, 18] and a solver for quantifier-free satisfiability that supports UNSAT cores. Model-based projection is an instance of MBU. An UNSAT core (as a conjunction) is an MBO in the special case where the input assignment is Boolean. While MBO can produce UNSAT cores, MBO generalizes the concept of UNSAT core with theory-specific reasoning when there are non-Boolean input assignments, as it is the case in \(\textsf{QSMA}\). It is unclear whether the combination of UNSAT cores and theory-specific MBU can emulate MBO or provide the same benefits. QSAT is implemented in Z3 and it is the default solver for \(\textsf{LIA}\), \(\textsf{LRA}\), and \(\textsf{NRA}\).

YicesQS is a recent implementation that only participated in the SMT competition in 2021 and 2022. Directions for further development include augmenting integer reasoning, and improving model interpolation in \(\textsf{BV}\) by a better usage of invertibility conditions. Another lead for future work is to compose \(\textsf{QSMA}\) within the CDSAT framework for conflict-driven reasoning in unions of theories [4,5,6]. For this, one may need to drop the assumption that there is a unique model \(\mathcal {M}_0\) and only its extensions need to be considered, which will be a generalization also in the single theory case. As most known MBU and MBO functions are for single theories, one may have to study how to get MBU and MBO functions for a union of theories from such functions for the component theories. Another issue is the interplay between \(\textsf{QSMA}\)’s recursive descent over the \(\textsf{QSMA}\)-tree for the formula and CDSAT’s conflict-driven search.