Program Synthesis in Saturation

We present an automated reasoning framework for synthesizing recursion-free programs using saturation-based theorem proving. Given a functional specification encoded as a first-order logical formula, we use a first-order theorem prover to both establish validity of this formula and discover program fragments satisfying the specification. As a result, when deriving a proof of program correctness, we also synthesize a program that is correct with respect to the given specification. We describe properties of the calculus that a saturation-based prover capable of synthesis should employ, and extend the superposition calculus in a corresponding way. We implemented our work in the first-order prover Vampire, extending the successful applicability of first-order proving to program synthesis. This is an extended version of an Automated Deduction -- CADE 29 paper with the same title and the same authors.


Introduction
Program synthesis constructs code from a given specification. In this work we focus on synthesis using functional specifications summarized by valid first-order formulas [13,1], ensuring that our programs are provably correct. While being a powerful alternative to formal verification [19], synthesis faces intrinsic computational challenges. One of these challenges is posed to the reasoning backend used for handling program specifications, as the latter typically include first-order quantifier alternations and interpreted theory symbols. As such, efficient reasoning with both theories and quantifiers is imperative for any effort towards program synthesis.
In this paper we address this demand for recursion-free programs. We advocate the use of first-order theorem proving for extracting code from correctness proofs of functional specifications given as first-order formulas ∀x.∃y.F[x, y]. These formulas state that "for all (program) inputs x there exists an output y such that the input-output relation (program computation) F[x, y] is valid".
Given such a specification, we synthesize a recursion-free program while also deriving a proof certifying that the program satisfies the specification.
The programs we synthesize are built using first-order theory terms extended with if−then−else constructors. To ensure that our programs yield computational models, i.e., that they can be evaluated for given values of input variables x, we restrict the programs we synthesize to only contain computable symbols.
Our approach in a nutshell. In order to synthesize a recursion-free program, we prove its functional specification using saturation-based theorem proving [14,10]. We extend saturation-based proof search with answer literals [5], allowing us to track substitutions into the output variable y of the specification. These substitutions correspond to the sought program fragments and are conditioned on clauses they are associated with in the proof. When we derive a clause corresponding to a program branch if C then r, where C is a condition and r a term and both C, r are computable, we store it and continue proof search assuming that ¬C holds; we refer to such conditions C as (program) branch conditions. The saturation process for both proof search and code construction terminates when the conjunction of negations of the collected branch conditions becomes unsatisfiable. Then we synthesize the final program satisfying the given (and proved) specification by assembling the recorded program branches. The main challenges of making our approach effective come with (i) integrating the construction of the programs with if−then−else into the proof search, thus turning proof search into program search/synthesis, and (ii) guiding program synthesis to only computable branch conditions and programs.
Contributions. We make the following contributions, addressing the above challenges:
• We formalize the semantics for clauses with answer literals and introduce a saturation-based algorithm for program synthesis based on this semantics. We prove that, given a sound inference system, our saturation algorithm derives correct and computable programs (Section 4).
• We define properties of a sound inference calculus in order to make the calculus suitable for our saturation-based algorithm for program synthesis. We accordingly extend the superposition calculus and define a class of substitutions to be used within the extended calculus; we refer to these substitutions as computable unifiers (Section 5).
• We extend a first-order unification algorithm to find computable unifiers (Section 6) to be further used in saturation-based program synthesis.
• We implement our work in the Vampire prover [10] and evaluate our synthesis approach on a number of examples, complementing other techniques in the area (Section 7). For example, our results demonstrate the applicability of our work on synthesizing programs for specifications that cannot even be encoded in the SyGuS syntax [15].

Preliminaries
We assume familiarity with standard multi-sorted first-order logic with equality. We denote variables by x, y, terms by s, t, atoms by A, literals by L, clauses by C, D, formulas by F, G, all possibly with indices. Further, we write σ for Skolem constants. We reserve the symbol □ for the empty clause, which is logically equivalent to ⊥. Formulas and clauses with free variables are considered implicitly universally quantified (i.e. we consider closed formulas). By ≃ we denote the equality predicate and write t ≄ s as a shorthand for ¬t ≃ s. We use a distinguished integer sort, denoted by Z. When we use standard integer predicates <, ≤, >, ≥, functions +, −, ... and constants 0, 1, ..., we assume that they denote the corresponding interpreted integer predicates and functions with their standard interpretations. Additionally, we include a conditional term constructor if−then−else in the language, as follows: given a formula F and terms s, t of the same sort, we write if F then s else t to denote the term s if F is valid and t otherwise.
An expression is a term, literal, clause or formula. We write E[t] to denote that the expression E contains the term t. For simplicity, E[s] denotes the expression E where all occurrences of t are replaced by the term s. A substitution θ is a mapping from variables to terms. A substitution θ is a unifier of two expressions E and E′ if Eθ = E′θ, and is a most general unifier (mgu) if for every unifier η of E and E′, there exists a substitution µ such that η = θµ. We denote the mgu of E and E′ with mgu(E, E′). We write valid, and extend the notation also to validity modulo a theory T. Symbols occurring in a theory T are interpreted and all other symbols are uninterpreted.

Computable Symbols and Programs
We distinguish between computable and uncomputable symbols in the signature. The set of computable symbols is given as part of the specification. Intuitively, a symbol is computable if it can be evaluated and hence is allowed to occur in a synthesized program. A term or a literal is computable if all symbols it contains are computable. A symbol, term or literal is uncomputable if it is not computable.
A functional specification, or simply just a specification, is a formula

∀x.∃y.F[x, y]. (1)

The variables x of a specification (1) are called input variables. Note that while we use specifications with a single variable y, our work can analogously be used with a tuple of variables y in (1). Let σ denote a tuple of Skolem constants. Consider a computable term r[σ] such that the instance F[σ, r[σ]] of (1) holds. Since σ are fresh Skolem constants, the formula ∀x.F[x, r[x]] also holds; we call such r[x] a program for (1) and say that the program r[x] computes a witness of (1).

[Figure 1: The superposition calculus Sup. Side conditions of the superposition rules:] where θ := mgu(s, s′); tθ ⋡ sθ; (first rule only) L[s′] is not an equality literal; and (second and third rules only) u′θ ⋡ u[s′]θ.

We write ⟨r[x], ⋀_{i=1}^{n} F_i⟩ to refer to a program with conditions F_1, ..., F_n for (1). In the sequel, we refer to (parts of) programs with conditions also as conditional branches. In Section 4 we show how to build programs for (1) by composing programs with conditions for (1) (see Corollary 3).

Saturation and Superposition
Saturation-based proof search implements proving by refutation [10]: to prove validity of F, a saturation algorithm establishes unsatisfiability of ¬F. First-order theorem provers work with clauses, rather than with arbitrary formulas. To prove a formula F, first-order provers negate F, which is further skolemized and converted to clausal normal form (CNF). The CNF of ¬F is denoted by cnf(¬F) and represents a set S of initial clauses. First-order provers then saturate S by computing logical consequences of S with respect to a sound inference system I. The saturated set of S is called the closure of S and the process of computing the closure of S is called saturation. If the closure of S contains the empty clause □, the original set S of clauses is unsatisfiable, and hence the formula F is valid. We may extend the set S of initial clauses with additional clauses C_1, ..., C_n. If C is derived by saturating this extended set, we say C is derived from S under additional assumptions C_1, ..., C_n.
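The saturation loop described above can be sketched on a much simpler calculus. The following minimal example saturates a set of propositional clauses under binary resolution until the empty clause appears or nothing new can be derived. It is neither the superposition calculus nor the algorithm implemented in Vampire; the integer encoding of literals and the names `resolvents` and `refutes` are assumptions made purely for illustration.

```python
def resolvents(c, d):
    """Yield all binary resolvents of the propositional clauses c and d.

    A clause is a frozenset of literals; a literal is a non-zero int,
    and -n denotes the negation of n (illustrative encoding).
    """
    for lit in c:
        if -lit in d:
            yield (c - {lit}) | (d - {-lit})


def refutes(clauses):
    """Saturate the clause set; return True iff the empty clause is derived."""
    closure = {frozenset(c) for c in clauses}
    while frozenset() not in closure:
        new = {r for c in closure for d in closure for r in resolvents(c, d)}
        if new <= closure:  # saturated without deriving the empty clause
            return False
        closure |= new
    return True


# F = ((p -> q) and p) -> q, with p = 1 and q = 2.
# cnf(not F) = {not p or q, p, not q}; a refutation shows that F is valid.
print(refutes([{-1, 2}, {1}, {-2}]))  # True
```

The loop terminates because only finitely many clauses exist over a finite set of literals; first-order saturation has no such guarantee, which is why provers rely on orderings, selection and redundancy elimination.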
The superposition calculus, denoted as Sup and given in Figure 1, is the most common inference system used by saturation-based provers for first-order logic with equality [14]. The Sup calculus is parametrized by a simplification ordering ≻ on terms and a selection function, which selects in each non-empty clause a non-empty subset of literals (possibly also positive literals). We denote selected literals by underlining them. An inference rule can be applied on the given premise(s) if the literals that are underlined in the rule are also selected in the premise(s). For a certain class of selection functions, the superposition calculus Sup is sound (if □ is derived from F, then F is unsatisfiable) and refutationally complete (if F is unsatisfiable, then □ can be derived from it).

Answer Literals
Answer literals [5] provide a question answering technique for tracking substitutions into given variables throughout the proof. Suppose we want to find a witness for the validity of the formula

∃y.F[y]. (2)

Within saturation-based proving, we first derive the skolemized negation of (2) and add an answer literal using a fresh predicate ans with argument y, yielding

∀y.(¬F[y] ∨ ans(y)). (3)

We then saturate the CNF of (3), while ensuring that answer literals are not selected for performing inferences.

Answer literals with if−then−else. The derivation of disjunctive answers can be avoided by modifying the inference rules to only derive clauses containing at most one answer literal. One such modification is given within the A(R)-calculus for binary resolution [21], where R is a so-called strongly liftable term restriction. The A(R)-calculus replaces the binary resolution rule when both premises contain an answer literal by the following A-resolution rule:

A ∨ C ∨ ans(r)    ¬A′ ∨ C′ ∨ ans(r′)
-------------------------------------- (4)
(C ∨ C′ ∨ ans(if A then r′ else r))θ

where θ := mgu(A, A′) and the restriction R(if A then r′ else r) holds.
In our work we go beyond the A-resolution rule and modify both the superposition calculus and the saturation algorithm to reason not only about answer literals but also about their use of if−then−else terms (see Sections 4-5).

Illustrative Example
Let us illustrate our approach to program synthesis. We use answer literals in saturation to construct programs with conditions while proving specifications (1).
By adding an answer literal to the skolemized negation of (1), we obtain

∀y.(¬F[σ, y] ∨ ans(y)).

Example 1. Consider the group theory axioms (A1)-(A3) and the specification

∀x.∃y. x * y ≃ e. (5)

In this example we assume that all symbols are computable. To synthesize a program for (5), we add an answer literal to the skolemized negation of (5) and convert the resulting formula to CNF (preprocessing). We consider the set S of clauses containing the obtained CNF and the axioms (A1)-(A3). We saturate S using Sup and obtain the following derivation:

1. σ * y ≄ e ∨ ans(y)
...
5. ... [Sup 4., A1]
6. ans(i(σ)) [BR 5., 1.]

Using the above derivation, we construct a program for the functional specification (5) as follows: we replace σ in the definite answer i(σ) by x, yielding the program i(x). Note that for each input x, our synthesized program computes the inverse i(x) of x as an output. In other words, our synthesized program for (5) ensures that each group element x has a right inverse i(x).
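As a sanity check of what the synthesized program i(x) promises, the sketch below evaluates it in one concrete group. The choice of model (the integers modulo 5 under addition) and all names in the code are assumptions made for illustration only; the saturation proof establishes the property for every group, whereas this check covers a single finite one.

```python
N = 5  # hypothetical model: the integers modulo 5 under addition


def op(a, b):
    """Interpretation of the group operation *."""
    return (a + b) % N


E = 0  # interpretation of the identity element e


def inv(a):
    """Interpretation of the inverse function i."""
    return (-a) % N


def program(x):
    """The synthesized program i(x), evaluated in this model."""
    return inv(x)


def satisfies_spec():
    """Check that x * program(x) = e holds for every input x of the model."""
    return all(op(x, program(x)) == E for x in range(N))


print(satisfies_spec())  # True
```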
While Example 1 yields a definite answer within saturation-based proof search, our work supports the synthesis of more complex recursion-free programs (see Examples 2-3) by composing program fragments derived in the program search (Section 4) as well as by using answer literals with if−then−else to effectively handle disjunctive answers (Section 5).

Program Synthesis with Answer Literals
We now introduce our approach to saturation-based program synthesis using answer literals (Algorithm 1). We focus on recursion-free program synthesis and present our work in a more general setting. Namely, we consider functional specifications whose validity may depend on additional assumptions (e.g. additional program requirements) A_1, ..., A_n, where each A_i is a closed formula:

A_1 ∧ ... ∧ A_n → ∀x.∃y.F[x, y]. (6)

Note that specification (1) is a special case of (6). However, since A_1, ..., A_n are closed formulas, (6) is equivalent to ∀x.∃y.(A_1 ∧ ... ∧ A_n → F[x, y]), which is a special case of (1).

Given a functional specification (6), we use answer literals to synthesize programs with conditions (Section 4.1) and extend saturation-based proof search to reason about answer literals (Section 4.2). For doing so, we add the answer literal ans(y) to the skolemized negation of (6) and obtain

A_1 ∧ ... ∧ A_n ∧ ∀y.(¬F[σ, y] ∨ ans(y)). (7)

We saturate the CNF of (7), while ensuring that answer literals are not selected within the inference rules used in saturation. We That is, under the assumptions C_1, ..., C_m, ¬C, the computable term r[σ] provides a definite answer to (6).
We further use Theorem 1 to synthesize programs with conditions for (6). Corollary is a program with conditions for (6).
Note that a program with conditions r[x], to finally synthesize a program for (6).To this end, we use Corollary 2 to derive programs with conditions, and once their conditions cover all possible cases given the initial assumptions A 1 , . . ., A n , we compose them into a program for (6).
Corollary 3 [From Programs with Conditions to Programs for (6)] Let , be programs with conditions for (6), such that given by . . .
is a program for (6).
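The composition of conditional branches into a single program can be sketched as follows. This is a minimal illustration under stated assumptions: each recorded branch is modelled as a pair of Python callables (a guard standing for the negated branch condition, and a term), the helper name `compose` is hypothetical, and the two branches used for the maximum of two integers are chosen by hand to match the program if x1 < x2 then x2 else x1 reported in the experiments; they are not taken from an actual derivation.

```python
def compose(branches):
    """Build one program from [(guard_1, term_1), ..., (guard_k, term_k)].

    guard_i stands for the negated branch condition of branch i. The guard
    of the last branch is ignored: the guards are assumed to cover all
    inputs, so the last term serves as the final else-case.
    """
    *guarded, (_, default) = branches

    def program(*inputs):
        for guard, term in guarded:
            if guard(*inputs):
                return term(*inputs)
        return default(*inputs)

    return program


# Hypothetical branches for the maximum of two integers.
max2 = compose([
    (lambda x1, x2: x1 < x2, lambda x1, x2: x2),
    (lambda x1, x2: x1 >= x2, lambda x1, x2: x1),
])

print(max2(3, 5), max2(5, 3), max2(4, 4))  # 5 5 4
```

With a single branch the composed program is just that branch's term, which mirrors the case k = 1.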
Note that since the conditional branches of (8) cover all possible cases to be considered over x, we do not need the condition if ¬C_k. In particular, if k = 1, i.e.

Saturation-Based Program Synthesis
Our program synthesis results from Theorem 1, Corollary 2 and Corollary 3 rely upon a saturation algorithm using a sound (but not necessarily complete) inference system I. In this section, we present our modifications to extend state-of-the-art saturation algorithms with answer literal reasoning, allowing us to derive clauses C[σ] ∨ ans(r[σ]), where both C[σ] and r[σ] are computable. In Sections 5-6 we then describe modifications of the inference system I to implement rules over clauses with answer literals.

Remark 1. Compared to [21], where potentially large programs (with conditions) are tracked in answer literals, Algorithm 1 removes answer literals from clauses and constructs the final program only after saturation has found a refutation of the negated (6). Our approach has two advantages: first, we do not have to keep track of potentially many large terms using if−then−else, which might slow down saturation-based program synthesis. Second, our work can naturally be integrated with clause splitting techniques within saturation (see Section 7).

Superposition with Answer Literals
We note that our saturation-based program synthesis approach is not restricted to a specific calculus. Algorithm 1 can thus be used with any sound set of inference rules, including theory-specific inference rules, e.g. [9], as long as the rules allow derivation of clauses of the form C ∨ ans(r), where C, r are computable and C is ground. That is, the rules should only derive clauses with at most one answer literal, and should not introduce uncomputable symbols into answer literals.
In this section we present changes tailored to the superposition calculus Sup, without changing the underlying saturation process of Algorithm 1. We first introduce the notion of an abstract unifier [16] and define a computable unifier, a mechanism for dealing with the uncomputable symbols in the reasoning instead of introducing them into the programs. We then explain the use of such a unifier in any sound calculus, with particular focus on the Sup calculus.
Definition 1 (Abstract unifier [16]). An abstract unifier of two expressions E_1, E_2 is a pair (θ, D) such that:
1. θ is a substitution and D is a (possibly empty) disjunction of disequalities,
2. (D ∨ E_1 ≃ E_2)θ is valid.
Intuitively speaking, an abstract unifier combines disequality constraints D with a substitution θ such that the substitution is a unifier of E_1, E_2 if the constraints D are not satisfied.

Definition 2 (Computable unifier).
A computable unifier of two expressions E 1 , E 2 with respect to an expression E 3 is an abstract unifier (θ, D) of E 1 , E 2 such that the expression E 3 θ is computable.
For example, let f be computable and g uncomputable. Then ({y ↦ f(z)}, z ≄ g(x)) is a computable unifier of the terms f(g(x)), y with respect to f(y). Further, ({y ↦ f(g(x))}, ∅) is an abstract unifier of the same terms, but not a computable unifier with respect to f(y).
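The example above can be reproduced with a small sketch of unification that refuses to bind a variable of the answer term to an uncomputable term, in the spirit of the procedure the paper later describes as Algorithm 2. This is a simplified reading of that description and not the implementation in Vampire: the term encoding (variables as strings, applications as tuples), the helper names `subst`, `occurs`, `computable` and `mgu_comp`, and the fresh variable names `_v0`, `_v1`, ... are all assumptions made for illustration.

```python
from itertools import count


def is_var(t):
    """A variable is a str; an application is a tuple (symbol, arg1, ..., argn)."""
    return isinstance(t, str)


def subst(theta, t):
    """Apply the (triangular) substitution theta to the term t."""
    if is_var(t):
        return subst(theta, theta[t]) if t in theta else t
    return (t[0],) + tuple(subst(theta, a) for a in t[1:])


def occurs(x, t):
    return x == t if is_var(t) else any(occurs(x, a) for a in t[1:])


def computable(t, uncomputable):
    """A term is computable if none of its symbols is uncomputable."""
    if is_var(t):
        return True
    return t[0] not in uncomputable and all(computable(a, uncomputable) for a in t[1:])


def mgu_comp(e1, e2, e3, uncomputable):
    """Return (theta, D) with e3*theta computable, or None on failure.

    D is a list of pairs (s, t), each read as the disequality constraint s != t.
    """
    if not computable(e3, uncomputable):
        return None
    fresh = (f"_v{i}" for i in count())  # assumed not to clash with user names
    theta, constraints, todo = {}, [], [(e1, e2)]
    while todo:
        s, t = todo.pop()
        s, t = subst(theta, s), subst(theta, t)
        if s == t:
            continue
        if not is_var(s) and is_var(t):
            s, t = t, s
        if is_var(s):
            if occurs(s, t):
                return None
            if not occurs(s, subst(theta, e3)) or computable(t, uncomputable):
                theta[s] = t  # ordinary binding
            elif t[0] not in uncomputable:
                # computable head, uncomputable argument: bind to f(x1..xn), recurse
                xs = [next(fresh) for _ in t[1:]]
                theta[s] = (t[0], *xs)
                todo.extend(zip(xs, t[1:]))
            else:
                constraints.append((s, t))  # abstraction: record s != t
        elif s[0] == t[0] and len(s) == len(t):
            todo.extend(zip(s[1:], t[1:]))
        else:
            return None
    return theta, constraints


# f computable, g uncomputable: unify f(g(x)) with y w.r.t. f(y).
print(mgu_comp(("f", ("g", "x")), "y", ("f", "y"), {"g"}))
```

On this input the sketch binds y to f(_v0) and records the constraint _v0 ≠ g(x), which matches the computable unifier of the example up to renaming z to _v0.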
Ensuring computability of answer literal arguments. We modify the rules of a sound inference system I to use computable unifiers with respect to the answer literal argument instead of unifiers. Since a computable unifier may entail disequality constraints D, we add D to the conclusions of the inference rules. That is, for an inference rule of I of the form

C_1  ...  C_n
------------- (9)
Cθ

where θ is a substitution such that Eθ ≃ E′θ holds for some expressions E, E′, we extend I with the following n inference rules with computable unifiers, one for each i ∈ {1, ..., n}:

C_1  ...  C_i ∨ ans(r)  ...  C_n
-------------------------------- (10)
(D ∨ C ∨ ans(r))θ′

where (θ′, D) is a computable unifier of E, E′ with respect to r and none of C_1, ..., C_n contains an answer literal. We obtain the following result.
[Figure 3: The extended superposition calculus Sup with answer literals. Side conditions of the rules:]

Superposition (Sup): where (θ, D) is a computable unifier of s, s′ w.r.t. the argument of the answer literal in the rule conclusion (i.e. if s ≃ t then r′ else r for the left-column rules, and r for the others); (rules on the first line only) L[s′] is not an equality literal; and (rules on the second and third line only) u′θ ⋡ u[s′]θ.

Binary resolution (BR): where (θ, D) is a computable unifier of A, A′ w.r.t. (first rule) if A then r′ else r or (second rule) r.

Equality factoring (EF): where (θ, D) is a computable unifier of s, s′ w.r.t. r; tθ ⋡ sθ; and t′θ ⋡ tθ.

We note that we keep the original rule (9) in I, but impose that none of its premises C_1, ..., C_n contains an answer literal. Clearly, neither the thus modified rule (9) nor the new rules (10) introduce uncomputable symbols into answer literals. Rather, these rules add disequality constraints D into their conclusions and immediately select D for further applications of inference rules. Such a selection guides the saturation process in Algorithm 1 to first discharge the constraints D containing uncomputable symbols with the aim of deriving a clause C′ ∨ ans(r′) where C′ is computable. The clause C′ ∨ ans(r′) is then converted into a program with conditions using Corollary 2.

Superposition with answer literals. We make the inference rule modifications (9), together with the addition of new rules (10), for each inference rule of the Sup calculus from Figure 1. Further, we also ensure that rules with multiple premises, when applied on several premises containing answer literals, derive clauses with at most one answer literal. We therefore introduce the following two rule modifications. (i) We use the if−then−else constructor to combine answer literals of premises, by adapting the use of if−then−else within binary resolution [12,13,21] to superposition rules. (ii) We use an answer literal from only one of the rule premises in the rule conclusion and add a new disequality constraint r ≄ r′ between the premises' answer literal arguments, similar to the constraints D of the computable unifier. Analogously to the computable unifier constraints, we immediately select this disequality constraint r ≄ r′.
The resulting extension of the Sup calculus with answer literals is given in Figure 3. In addition to the rules of Figure 3, the extended calculus contains rules constructed as (10) for superposition and binary resolution rules of Figure 1. Using Lemma 4, we conclude the following.
Lemma 5 [Soundness of Sup with Answer Literals] The inference rules from Figure 3 of the extended Sup calculus with answer literals are sound.
By the soundness results of Lemmas 4-5, Corollaries 2-3 imply that, when applying the calculus of Figure 3 in the saturation-based program synthesis approach of Algorithm 1, we construct correct programs.
Example 2. We illustrate the use of Algorithm 1 with the extended Sup calculus of Figure 3, strengthening our motivation from Section 3 with if−then−else reasoning. To this end, consider the functional specification over group theory: asserting that, if the group is not commutative, there is an element whose square is not e. In addition to the axioms (A1)-(A3) of Figure 2, we also use the right identity axiom (A2') ∀x. x * e ≃ x. Based on Algorithm 1, we obtain the following derivation of the program for (11):

11. [answer literal removal 11. (Algorithm 1, line 10)]

The programs with conditions collected during saturation-based program synthesis, in particular corresponding to steps 3. and 11. above, are:

describing the inverse element z of i(x) * i(y). We annotate the inverse i(•) as uncomputable to disallow the trivial solution i(i(x) * i(y)). Using computable unifiers, we synthesize the program y * x; that is, a program computing y * x as the inverse of i(x) * i(y).
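The claim that y * x is an inverse of i(x) * i(y) can be checked on a concrete non-commutative group. The sketch below does so for the permutations of three elements under composition; the choice of this model and all names in the code are assumptions made for illustration, and the check covers only this one group rather than replacing the proof. It also confirms that the order of the factors matters: x * y is not an inverse for every pair of inputs.

```python
from itertools import permutations

# Hypothetical model: all permutations of {0, 1, 2} under composition.
S3 = list(permutations(range(3)))
E = (0, 1, 2)  # the identity permutation


def op(p, q):
    """Group operation: (p * q)(k) = p(q(k))."""
    return tuple(p[q[k]] for k in range(3))


def inv(p):
    """Inverse permutation, interpreting the symbol i."""
    r = [0] * 3
    for k, v in enumerate(p):
        r[v] = k
    return tuple(r)


def program(x, y):
    """The synthesized program y * x; note that it does not use inv."""
    return op(y, x)


def is_inverse_everywhere():
    """Check that program(x, y) is a two-sided inverse of i(x) * i(y)."""
    return all(
        op(op(inv(x), inv(y)), program(x, y)) == E
        and op(program(x, y), op(inv(x), inv(y))) == E
        for x in S3
        for y in S3
    )


def swapped_order_fails_somewhere():
    """x * y is not an inverse of i(x) * i(y) for all inputs of this model."""
    return any(op(op(inv(x), inv(y)), op(x, y)) != E for x in S3 for y in S3)


print(is_inverse_everywhere(), swapped_order_fails_somewhere())  # True True
```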

Computable Unification with Abstraction
When compared to the Sup calculus of Figure 1, our extended Sup calculus with answer literals from Figure 3 uses computable unifiers instead of mgus. To find computable unifiers, we introduce Algorithm 2 by extending a standard unification algorithm [17,7] and an algorithm for unification with abstraction of [16]. Algorithm 2 combines computable unifiers with mgu computation, resulting in the computable unifier θ := mgu_comp(E_1, E_2, E_3) to be further used in Figure 3. Algorithm 2 modifies a standard unification algorithm to ensure computability of E_3θ. Changes compared to a standard unification algorithm are highlighted. Algorithm 2 does not add s ↦ t to θ if s is a variable in E_3 and t is uncomputable. Instead, if t is f(t_1, ..., t_n) where f is computable but not all t_1, ..., t_n are computable, we extend θ by s ↦ f(x_1, ..., x_n) and then add equations x_1 = t_1, ..., x_n = t_n to the set of equations E to be processed. Otherwise, f is uncomputable and we perform an abstraction: we consider s and t to be unified under the condition that s ≃ t holds. Therefore we add a constraint s ≄ t to the set of literals D which will be added to any clause invoking the computable unifier. To discharge the literal s ≄ t, one must prove s ≃ t. While s can later be replaced by other terms, as long as we use mgu_comp, s will never be replaced by an uncomputable term. Thus, we conclude the following result.

, and replace the clause by C[σ] ∨ ⋁_{i=1}^{m} ¬C_i[σ] (see lines 7-10 of Algorithm 1), which may be then further split by AVATAR.
Finally, our implementation simplifies the programs we synthesize. If during Algorithm 1 we record a program ⟨z, F⟩ where z is a variable, we do not use this program in the final program construction (line 12 of Algorithm 1) even if F occurs in the derivation of □ (see Example 2).

Examples and experimental setup. The goal of our experimental evaluation is to showcase the benefits of our approach on problems that are deemed to be hard, even unsolvable, by state-of-the-art synthesis techniques. We therefore focused on first-order theory reasoning and evaluated our work on the group theory problems of Examples 1-3, as well as on integer arithmetic problems.
As the SMT-LIB2 format can easily be translated into the SyGuS 2.1 syntax [15], we compared our results to cvc5 1.0.4 [3], supporting SyGuS-based synthesis [2]. Our experiments were run on an AMD Epyc 7502, 2.5 GHz CPU with 1 TB RAM, using a 5 minutes time limit per example. Our benchmarks as well as the configurations for our experiments are available at: https://github.com/vprover/vampirebenchmarks/tree/master/synthesis

Experimental results with group theory properties. Vampire synthesizes the solutions of Examples 1-3 in 0.01, 13, and 0.03 seconds, respectively. Since these examples use uninterpreted functions, they cannot be encoded in the SyGuS 2.1 syntax, showcasing the limits of other synthesis tools.

Experimental results with maximum of n ≥ 2 integers. For the maximum of 2 integers, the specification is , and the program we synthesize is if x_1 < x_2 then x_2 else x_1. Both our work and cvc5 are able to synthesize programs choosing the maximal value for up to n = 23 input variables, as summarized below. For n > 23, both Vampire and cvc5 time out.

Experimental results with polynomial equations. Vampire can synthesize the solution of polynomial equations; for example, for 2 ), we synthesize x_1 + x_2. Vampire finds the corresponding program in 26 seconds using simple first-order reasoning, while cvc5 fails in our setup.

Related Work
Our work builds upon deductive synthesis [13] adapted for the resolution calculus [12,21].We extend this line of work with saturation-based program synthesis, by using adjustments of the superposition calculus.
Component-based synthesis of recursion-free programs [20] from logical specifications is addressed in [20,6,23].The work of [20] uses first-order theorem proving to prove specifications and extract programs from proofs.In [6,23], ∃∀ formulas are produced to capture specifications over component properties and SMT solving is applied to find a term satisfying the formula, corresponding to a straight-line program.We complement [20] with saturation-based superposition proving and avoid template-based SMT solving from [6,23].
A prominent line of research comes with syntax guided synthesis (SyGuS) [1], where functional specifications are given using a context-free grammar.This grammar yields program templates to be synthesized via an enumerative search procedure based on SMT solving [3,8].We believe our work is complementary to SyGuS, by strengthening first-order reasoning for program synthesis, as evidenced by Examples 1-3.
The sketching technique [18,24] synthesizes program assignments to variables, using an alternative framework to the program synthesis setting we rely upon.In particular, sketching addresses domains that do not involve input logical formulas as functional specifications, such as example-guided synthesis [22].

Conclusions
We extend saturation-based proof search to saturation-based program synthesis, aiming to derive recursion-free programs from specifications. We integrate answer literals with saturation, and modify the superposition calculus and unification to synthesize computable programs. Our initial experiments show that a first-order theorem prover becomes an efficient program synthesizer, potentially opening up interesting avenues toward recursive program synthesis, for example using saturation-based proving with induction.

assumptions C_1[σ], ..., C_m[σ], by using saturation based on a sound inference system I. Then, is a program with conditions for (6).

Proof. From Theorem 1 follows that holds. Since σ are fresh uninterpreted constants, we obtain that ] is valid as well, and that is equivalent to , be programs with conditions for (6), such that given by . . .
is a program for (6).
Proof.For any interpretation I and any variable assignment v, let p be the smallest index such that ¬C p [x] holds in I under v, but all ¬C j [x], where 1 ≤ j < p, do not hold in I under v. Since is unsatisfiable, under the assumptions A 1 , . . ., A n such a p has to exist.Then in I under v and under the assumptions A 1 , . . ., A n , the interpretation of P [x] is the same as the interpretation of r p [x].
Further, since is the condition for P_p[x], from the definition of a program with conditions we obtain that Finally, since this argument holds for any I and v, and since all A_1, ..., A_n are closed formulas, also

Lemma 4 [Soundness of Inferences with Answer Literals] If the rule (9) is sound, the rules (10) are sound as well.

Proof. For clarity we repeat the original rule:

C_1  ...  C_n
------------- (9)
Cθ

where θ is a substitution such that Eθ ≃ E′θ holds for some expressions E, E′. We will prove the soundness of the new rule

C_1 ∨ ans(r)  C_2  ...  C_n
--------------------------- (10')
(D ∨ C ∨ ans(r))θ′

where (θ′, D) is a computable unifier of E, E′ with respect to r, and none of C_1, ..., C_n contains an answer literal. The proof of soundness of the other new rules of (10) is analogous.
Assume interpretation I to be a model of the universal closures of the premises of (10'), but not a model of the universal closure of its conclusion.Then Dθ ′ , Cθ ′ and ans(r)θ ′ are false in I. From Dθ ′ being false in I and from (θ ′ , D) being an abstract unifier follows that Eθ ′ ≃ E ′ θ ′ holds.We can therefore set θ := θ ′ .From the soundness of ( 9) and Cθ ′ being false in I then follows that some of C 1 , . . ., C n is false in I.However, none of C 2 , . . ., C n can be false in I, because we assumed all premises of (10') to be true in I. Hence, C 1 is false in I. Further, from ans(r)θ ′ being false in I follows that ans(r) is false in I.However, that means that C 1 ∨ ans(r) is false in I, which contradicts the assumption that the universal closures of all premises of rule (10') are true in I.
Lemma 5 [Soundness of Sup with Answer Literals] The inference rules from Figure 3 of the extended Sup calculus with answer literals are sound.
Proof. Soundness of the factoring, equality factoring and equality resolution rules follows from Lemma 4. We will prove soundness for the first superposition rule and the second binary resolution rule. The proofs for other superposition and binary resolution rules are analogous.
For clarity we repeat the first superposition rule of Figure 3: Assume interpretation I to be a model of the universal closures of the premises of the rule, but not a model of the universal closure of its conclusion.Then there is some variable assignment v such that (D ∨ L a variable assignment that assigns to each variable x the value that xθ has in I under v. Then: Since Dθ is false in I under v, from (θ, D) being an abstract unifier of s, s ′ follows that sθ ≃ s ′ θ is true in I under v, and therefore s, s ′ have the same interpretation in I under v ′ .Then consider two cases: (a) s ≃ t is true in I under v ′ and sθ ≃ tθ is true in I under v. Then from ans(if s ≃ t then r ′ else r)θ being false in I under v follows that ans(r ′ )θ is false in I under v and therefore ans(r ′ ) is false in I under v ′ .Also from s ≃ t being true in I under v ′ , 1., and 2. follows that L[s ′ ] is false in I under v ′ .Then the whole second premise of the rule is false in I under v ′ , which is a contradiction with the assumption that I is a model of its universal closure.(b) s ≃ t is false in I under v ′ and sθ ≃ tθ is false in I under v.This case leads similarly to the first premise being false, in contradiction with the assumption.
Therefore the first superposition rule is sound.
For clarity we repeat the second binary resolution rule of Figure 3: Assume interpretation I to be a model of the universal closures of the premises of the rule, but not a model of the universal closure of its conclusion.Then there is some variable assignment v such that (D ∨ r ≃ r ′ ∨ C ∨ C ′ ∨ ans(r))θ is false in I under v.Let v ′ be a variable assignment that assigns to each variable x the value that xθ has in I under v. Then: 1. From rθ ≃ r ′ θ, Cθ, C ′ θ being false in I under v follows that r ≃ r ′ , C, C ′ are false in I under v ′ .Therefore r, r ′ have the same interpretation in I under v ′ .2. Since ans(r)θ is false in I under v, also ans(r) is false in I under v ′ .Then from 1. follows that ans(r ′ ) is also false in I under v ′ .3. Since Dθ is false in I under v, from (θ, D) being an abstract unifier of A, A ′ follows that Aθ, A ′ θ have the same interpretation in I under v, and therefore A, A ′ have the same interpretation in I under v ′ .Therefore, only one of A, ¬A ′ is true in I under v ′ , which together with C, C ′ , ans(r), ans(r ′ ) being false in I under v ′ forms a contradiction with the assumption that I is a model of both premises of the rule.
Therefore the second binary resolution rule is sound as well.
Proof. We will denote the subexpression of the expression E at position p by E|p.
We first prove that (θ, D) is an abstract unifier of E1, E2. If E1θ|p′ and E2θ|p′ differ, there has to be a position p, where p′ is a prefix of p, such that the top-level symbol of E1θ|p and E2θ|p differs. From the construction of θ follows that for any position p, the subexpressions E1θ|p, E2θ|p differ in their top-level symbol only if E1|p = s and E2|p = f(t1, ..., tn) (or, symmetrically, E1|p = f(t1, ..., tn) and E2|p = s) where s is a variable and f is uncomputable. However, in this case s ≃ f(t1, ..., tn) occurs in D. Therefore, for any interpretation I, any variable assignment v, and any position p′, the interpretations of E1θ|p′, E2θ|p′ in I under v will either be the same, or sθ ≃ f(t1, ..., tn)θ will be true in I under v. Hence, (D ∨ E1 ≃ E2)θ is valid, and therefore (θ, D) is an abstract unifier of E1, E2.
Next, we prove that E3θ is computable. Since the algorithm successfully terminated, E3 must have been computable (otherwise it would fail). Further, the algorithm only extends the substitution θ by s → t where t is uncomputable if s does not occur in E3. Thus, E3θ is computable, and hence (θ, D) is a computable unifier.

B Example
describing the inverse element of i(x) * i(y). The trivial program derivation for this specification would only have three steps:

C Splitting and AVATAR
One of the keys to the efficiency of saturation-based theorem proving is clause splitting, with the leading approach being the AVATAR architecture [25]. The main idea of splitting is as follows. Let S be a set of clauses and C1 ∨ C2 a clause such that C1, C2 have disjoint sets of variables. We call such clauses C1, C2 the components of C1 ∨ C2. Then S ∪ {C1 ∨ C2} is unsatisfiable iff both S ∪ {C1} and S ∪ {C2} are unsatisfiable. Therefore, instead of checking satisfiability of a set of large clauses, we check the satisfiability of multiple sets of smaller clauses.
AVATAR implements this idea by using an interplay between a saturation-based first-order theorem prover and a SAT/SMT solver. The SAT/SMT solver finds a set of clause components, satisfiability of which implies satisfiability of all split clauses. These components, called assertions, are then used by the theorem prover for further derivations in saturation. All clauses derived using assertions C1, ..., Cn are called clauses with assertion C1, ..., Cn.
However, when using Algorithm 1 for program synthesis with (standard) AVATAR to saturate a preprocessed specification (7) x] is a program with conditions. However, if not all of the assertions C1[σ], ..., Cm[σ] are computable and ground, then Algorithm 1 should continue reasoning with these assertions with the aim of reducing them to computable and ground literals. This, however, is not directly possible in the AVATAR framework. Hence, we modify AVATAR as described in Section 7.
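The splitting step described above can be illustrated with a minimal sketch that partitions a clause into variable-disjoint components. This is not how Vampire or AVATAR implements splitting; the clause representation (a list of pairs of a literal name and its set of variables) and the function name split_components are assumptions made only for this illustration.

```python
from collections import defaultdict


def split_components(clause):
    """Partition a clause into components with pairwise disjoint variables.

    `clause` is a list of (literal, variables) pairs. This representation is
    hypothetical and chosen only for the sketch. Literals that share a
    variable, directly or through a chain of other literals, end up in the
    same component; each ground literal forms a component of its own.
    """
    parent = list(range(len(clause)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as representative so that components
            # come out ordered by their first literal.
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # variable -> index of the first literal containing it
    for idx, (_, variables) in enumerate(clause):
        for var in variables:
            if var in first_seen:
                union(first_seen[var], idx)
            else:
                first_seen[var] = idx

    groups = defaultdict(list)
    for idx, (literal, _) in enumerate(clause):
        groups[find(idx)].append(literal)
    return [groups[root] for root in sorted(groups)]


example = [("p(x)", {"x"}), ("q(y)", {"y"}), ("r(x,z)", {"x", "z"}), ("s", set())]
components = split_components(example)
```

For the example clause p(x) ∨ q(y) ∨ r(x,z) ∨ s, the literals p(x) and r(x,z) share the variable x and therefore stay together, while q(y) and the ground literal s become separate components.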
guide saturation-based proof search to derive clauses C[σ] ∨ ans(r[σ]), where C[σ] and r[σ] are computable.
4.1 From Answer Literals to Programs
Our next result ensures that, if we derive the clause C[σ] ∨ ans(r[σ]), the term r[σ] is a definite answer under the assumption ¬C[σ] (Theorem 1). We note that we do not terminate saturation-based program synthesis once a clause C[σ] ∨ ans(r[σ]) is derived. We rather record the program r[x] with condition ¬C[x] (and possibly also other conditions), replace clause C[σ] ∨ ans(r[σ]) by C[σ], and continue saturation (Corollary 2). As a result, upon establishing validity of (6), we synthesized a program for (6) (Corollary 3).
Theorem 1 [Semantics of Clauses with Answer Literals] Let C be a clause not containing an answer literal. Assume that, using a saturation algorithm based on a sound inference system I, the clause C ∨ ans(r[σ]) is derived from the set of clauses consisting of initial assumptions A1, ..., An, the clausified formula cnf(¬F[σ, y] ∨ ans(y)) and additional assumptions C1, ..., Cm. Then,

Algorithm 1 Saturation Loop for Recursion-Free Program Synthesis
 1 initial set of clauses S := {cnf(A1 ∧ ... ∧ An ∧ ∀y.(¬F[σ, y] ∨ ans(y)))}
 2 initial sets of additional assumptions C := ∅ and programs P := ∅
 3 repeat
 4   Select clause G ∈ S
 5   Derive consequences C1, ..., Cn of G and formulas from S using rules of I
 6   for each Ci do
 7     if Ci = (C[σ] ∨ ans(r[σ])) and C[σ] is ground and computable then
 8       P := P ∪ {⟨r[x], ⋀_{C′∈C} C′ ∧ ¬C[x]⟩}   /* Corollary 2 */
 9       C := C ∪ {C[x]}
10       Ci := C[σ]
11   S := S ∪ {C1, ..., Cn}
12   if □ ∈ S then
13     return program (8) for specification (6), derived from P   /* Corollary 3 */

Our saturation algorithm is given in Algorithm 1. In a nutshell, we use Corollary 2 to construct programs from clauses C[σ] ∨ ans(r[σ]) and replace clauses C[σ] ∨ ans(r[σ]) by C[σ] (lines 7-10 of Algorithm 1). The newly added computable assumptions C[σ] are used to guide saturation towards deriving programs with conditions where the conditions contain C[x]; these programs with conditions are used for synthesizing programs for (6), as given in Corollary 3.
Compared to a standard saturation algorithm used in first-order theorem proving (e.g. lines 4-5 of Algorithm 1), Algorithm 1 implements additional steps for processing newly derived clauses C[σ] ∨ ans(r[σ]) with answer literals (lines 6-10). As a result, Algorithm 1 establishes not only the validity of the specification (6) but also synthesizes a program (lines 12-13).
Throughout the algorithm, we maintain a set P of programs with conditions derived so far and a set C of additional assumptions. For each new clause Ci, we check if it is in the form C[σ] ∨ ans(r[σ]) where C[σ] is ground and computable (line 7). If yes, we construct a program with conditions ⟨r[x], ⋀_{C′∈C} C′ ∧ ¬C[x]⟩, extend C with the additional assumption C[x], and replace Ci by C[σ] (lines 8-10). Then, when we derive the empty clause □, we construct the final program as follows. We first collect all clauses that participated in the derivation of □. We use this clause collection to filter the programs in P: we only keep a program originating from a clause C[σ] ∨ ans(r[σ]) if the condition C[σ] was used in the proof, obtaining programs P1, ..., Pk. From P1, ..., Pk we then synthesize the final program P using the construction (8) from Corollary 3.
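The bookkeeping in lines 7-10 of Algorithm 1 can be sketched as follows. This is only an illustration under simplifying assumptions, not the implementation in Vampire: literals are plain strings, the names SynthesisState, negate and process_clause are hypothetical, the check for ground and computable literals is passed in as a caller-supplied predicate, and the replacement of the Skolem constants σ by the input variables x is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class SynthesisState:
    # The set C of additional assumptions, each stored as a tuple of literals.
    assumptions: list = field(default_factory=list)
    # The set P of programs with conditions, stored as (term, condition)
    # pairs; a condition is a conjunction given as a list of clauses.
    programs: list = field(default_factory=list)


def negate(literal):
    """Syntactic negation on the string representation (an assumption)."""
    return literal[1:] if literal.startswith("¬") else "¬" + literal


def process_clause(state, literals, answer, is_ground_computable):
    """Handle one derived clause `literals ∨ ans(answer)`.

    `answer` is None when the clause has no answer literal. Returns the
    clause to keep in the search space as a pair (literals, answer).
    """
    if answer is None or not all(is_ground_computable(l) for l in literals):
        return literals, answer  # line 7 does not apply: keep the clause
    # Line 8: the condition is the conjunction of all earlier assumptions
    # together with the negation of C, i.e. one negated unit per literal.
    condition = [list(c) for c in state.assumptions]
    condition += [[negate(l)] for l in literals]
    state.programs.append((answer, condition))
    # Line 9: C itself becomes an additional assumption.
    state.assumptions.append(tuple(literals))
    # Line 10: drop the answer literal and continue saturation with C.
    return literals, None


state = SynthesisState()
first = process_clause(state, ["p"], "r1", lambda l: True)
second = process_clause(state, ["q"], "r2", lambda l: True)
```

In this sketch the second recorded program carries the condition p ∧ ¬q, reflecting that the assumption p added for the first program becomes part of every later condition.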

Fig. 3. Selected rules of the extended superposition calculus Sup for reasoning with answer literals, with underlined literals being selected.
P1[x, y] := ⟨z, x * y ≃ y * x⟩
P2[x, y] := ⟨if x * (y * x) ≃ y then x else (if e ≃ x * (y * (x * y)) then x else x * y), x * y ≄ y * x⟩
Note the variable z, representing an arbitrary witness, in P1[x, y]. An arbitrary value is a correct witness in case x * y ≃ y * x holds, as in this case (11) is trivially satisfied. Thus, we do not need to consider the case x * y ≃ y * x separately. Hence, we construct the final program P[x, y] only from P2[x, y] and obtain:
P[x, y] := if x * (y * x) ≃ x then x else (if e ≃ x * (y * (x * y)) then x else x * y)
We conclude this section by illustrating the benefits of computable unifiers.
Example 3. Consider the group theory specification ∀x, y.∃z.z *