What's hard about Boolean Functional Synthesis

Given a relational specification between Boolean inputs and outputs, the goal of Boolean functional synthesis is to synthesize each output as a function of the inputs such that the specification is met. In this paper, we first show that unless some hard conjectures in complexity theory are falsified, Boolean functional synthesis must necessarily generate exponential-sized Skolem functions, thereby requiring exponential time, in the worst case. Given this inherent hardness, what does one do to solve the problem? We present a two-phase algorithm for Boolean functional synthesis, where the first phase is efficient both in terms of time and sizes of synthesized functions, and solves an overwhelming majority of benchmarks. To explain this surprisingly good performance, we provide a sufficient condition under which the first phase must produce exactly correct answers. When this condition fails, the second phase builds upon the result of the first phase, possibly requiring exponential time and generating exponential-sized functions in the worst case. Detailed experimental evaluation shows our algorithm to perform better than state-of-the-art techniques for a majority of benchmarks.


Introduction
The algorithmic synthesis of Boolean functions satisfying relational specifications has long been of interest to logicians and computer scientists. Informally, given a Boolean relation between input and output variables denoting the specification, our goal is to synthesize each output as a function of the inputs such that the relational specification is satisfied. Such functions have also been called Skolem functions in the literature [22,29]. Boole [7] and Lowenheim [27] studied variants of this problem in the context of finding most general unifiers. While these studies are theoretically elegant, implementations of the underlying techniques have been found to scale poorly beyond small problem instances [28]. More recently, synthesis of Boolean functions has found important applications in a wide range of contexts including reactive strategy synthesis [3,18,41], certified QBF-SAT solving [20,34,6,31], automated program synthesis [38,36], circuit repair and debugging [21], disjunctive decomposition of symbolic transition relations [40] and the like. This has spurred recent interest in developing practically efficient Boolean function synthesis algorithms. The resulting new generation of tools [29,22,2,16,39,34,33] have enabled synthesis of Boolean functions from much larger and more complex relational specifications than those that could be handled by earlier techniques, viz. [19,20,28].
In this paper, we re-examine the Boolean functional synthesis problem from both theoretical and practical perspectives. Our investigation shows that unless some hard conjectures in complexity theory are falsified, Boolean functional synthesis must necessarily generate super-polynomial or even exponential-sized Skolem functions, thereby requiring super-polynomial or exponential time, in the worst case. Therefore, it is unlikely that an efficient algorithm exists for solving all instances of Boolean functional synthesis. There are two ways to address this hardness in practice: (i) design algorithms that are provably efficient but may give "approximate" Skolem functions that are correct only on a fraction of all possible input assignments, or (ii) design a phased algorithm, wherein the initial phases are provably efficient and solve a subset of problem instances, and subsequent phases have worst-case exponential behaviour and solve all remaining problem instances. In this paper, we combine the two approaches, while placing heavy emphasis on efficiently solvable instances. We also provide a sufficient condition for our algorithm to be efficient, which indeed is borne out by our experiments.
The primary contributions of this paper can be summarized as follows.
1. We start by showing that unless P = NP, there exist problem instances where Boolean functional synthesis must take super-polynomial time. Also, unless the Polynomial Hierarchy collapses to the second level, there must exist problem instances where Boolean functional synthesis must generate super-polynomial-sized Skolem functions. Moreover, if the non-uniform exponential-time hypothesis [13] holds, there exist problem instances where Boolean functional synthesis must generate exponential-sized Skolem functions, thereby also requiring at least exponential time.
2. We present a new two-phase algorithm for Boolean functional synthesis.
   (a) Phase 1 of our algorithm generates candidate Skolem functions of size polynomial in the input specification. This phase makes polynomially many calls to an NP oracle (a SAT solver in practice). Hence it directly benefits from the progress made by the SAT solving community, and is efficient in practice. Our experiments indicate that Phase 1 suffices to solve a large majority of publicly available benchmarks.
   (b) However, there are indeed cases where the first phase is not enough (our theoretical results imply that such cases likely exist). In such cases, the first phase provides good candidate Skolem functions as starting points for the second phase. Phase 2 of our algorithm starts from these candidate Skolem functions, and uses a CEGAR-based approach to produce correct Skolem functions, whose size may indeed be exponential in the input specification.
3. We analyze the surprisingly good performance of the first phase (especially in light of the theoretical hardness results) and show a sufficient condition on the structure of the input representation that guarantees correctness of the first phase. Interestingly, popular representations like ROBDDs [10] give rise to input structures that satisfy this condition. The goodness of Skolem functions generated in this phase of the algorithm can also be quantified with high confidence by invoking an approximate model counter [12], whose complexity lies in BPP^NP.
4. We conduct an extensive set of experiments over a variety of benchmarks, and show that our algorithm performs favourably vis-a-vis state-of-the-art algorithms for Boolean functional synthesis.
Related work The literature contains several early theoretical studies on variants of Boolean functional synthesis [7,27,15,8,30,5]. More recently, researchers have tried to build practically efficient synthesis tools that scale to medium or large problem instances. In [29], Skolem functions for X are extracted from a proof of validity of ∀Y∃X F (X, Y). Unfortunately, this doesn't work when ∀Y∃X F (X, Y) is not valid, despite this class of problems being important, as discussed in [16,2]. Inspired by the spectacular effectiveness of CDCL-based SAT solvers, an incremental determinization technique for Skolem function synthesis was proposed in [33]. In [19,40], a synthesis approach based on iterated compositions was proposed. Unfortunately, as has been noted in [22,16], this does not scale to large benchmarks. A recent work [16] adapts the composition-based approach to work with ROBDDs. For factored specifications, ideas from symbolic model checking using implicitly conjoined ROBDDs have been used to enhance the scalability of the technique further in [39]. In the genre of CEGAR-based techniques, [22] showed how CEGAR can be used to synthesize Skolem functions from factored specifications. Subsequently, a compositional and parallel technique for Skolem function synthesis from arbitrary specifications represented using AIGs was presented in [2]. The second phase of our algorithm builds on some of this work. In addition to the above techniques, template-based [38] or sketch-based [37] approaches have been found to be effective for synthesis when we have information about the set of candidate solutions. A framework for functional synthesis that reasons about some unbounded domains such as integer arithmetic, was proposed in [25].

Notations and Problem Statement
A Boolean formula F(z_1, . . . z_p) over p variables is a mapping F : {0, 1}^p → {0, 1}. The set of variables {z_1, . . . z_p} is called the support of the formula, and denoted sup(F). A literal is either a variable or its complement. We use F|_{z_i=1} (resp. F|_{z_i=0}) to denote the positive (resp. negative) cofactor of F with respect to z_i. A satisfying assignment or model of F is a mapping of variables in sup(F) to {0, 1} such that F evaluates to 1 under this assignment. If π is a model of F, we write π |= F and use π(z_i) to denote the value assigned to z_i ∈ sup(F) by π. Let Z = (z_{i_1}, z_{i_2}, . . . z_{i_j}) be a sequence of variables in sup(F). We use π↓Z to denote the projection of π on Z, i.e. the sequence (π(z_{i_1}), π(z_{i_2}), . . . π(z_{i_j})). A Boolean formula is in negation normal form (NNF) if (i) the only operators used in the formula are conjunction (∧), disjunction (∨) and negation (¬), and (ii) negation is applied only to variables. Every Boolean formula can be converted to a semantically equivalent formula in NNF. We assume an NNF formula is represented by a rooted directed acyclic graph (DAG), where internal nodes are labeled by ∧ and ∨, and leaves are labeled by literals. In this paper, we use AIGs [24] as the initial representation of specifications. Given an AIG with t nodes, an equivalent NNF formula of size O(t) can be constructed in O(t) time. We use |F| to denote the number of nodes in a DAG representation of F.
Let α be the subformula represented by an internal node N (labeled by ∧ or ∨) in the DAG representation of an NNF formula. We use lits(α) to denote the set of literals labeling leaves that have a path to the node N representing α in the DAG. A formula is said to be in weak decomposable NNF, or wDNNF, if it is in NNF and if for every ∧-labeled node in its DAG representation, the following holds: let α = α_1 ∧ . . . ∧ α_k be the subformula represented by the node. Then, there is no literal l and distinct indices i, j ∈ {1, . . . k} such that l ∈ lits(α_i) and ¬l ∈ lits(α_j). Note that wDNNF is a weaker structural requirement on the NNF representation vis-a-vis the well-studied DNNF representation, which has elegant properties [14]. Specifically, every DNNF formula is also a wDNNF formula, since in a DNNF formula the conjuncts of every ∧-node have pairwise disjoint supports.
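For illustration, the wDNNF condition can be checked by a traversal of the NNF representation. The sketch below is ours, not code from the paper: it uses an assumed tuple encoding of NNF trees, with leaves ('lit', v, p) denoting variable v with polarity p, and recomputes leaf-literal sets for brevity (a linear-time version would memoize them on the DAG).

```python
def lits(node):
    """Set of literal leaves occurring under `node`."""
    if node[0] == 'lit':
        return {node}
    return set().union(*(lits(c) for c in node[1:]))

def is_wdnnf(node):
    """Check the wDNNF condition: no AND node may have a literal in one
    argument and its complement in a different argument."""
    if node[0] == 'lit':
        return True
    if node[0] == 'and':
        for i, a in enumerate(node[1:]):
            la = lits(a)
            for b in node[i + 2:]:
                lb = lits(b)
                # complement of ('lit', v, p) is ('lit', v, 1 - p)
                if any(('lit', v, 1 - p) in lb for (_, v, p) in la):
                    return False
    return all(is_wdnnf(c) for c in node[1:])
```

Note that a complementary pair inside the same argument of an ∧-node is allowed; only pairs split across distinct arguments violate the condition.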
We say a literal l is pure in F iff the NNF representation of F has a leaf labeled l, but no leaf labeled ¬l. F is said to be positive unate in z_i ∈ sup(F) iff F|_{z_i=0} ⇒ F|_{z_i=1} is a tautology; similarly, F is said to be negative unate in z_i iff F|_{z_i=1} ⇒ F|_{z_i=0} is a tautology. F is unate in z_i if it is either positive or negative unate in z_i; otherwise, F is binate in z_i. We also use X = (x_1, . . . x_n) to denote a sequence of Boolean outputs, and Y = (y_1, . . . y_m) to denote a sequence of Boolean inputs. The Boolean functional synthesis problem, henceforth denoted BFnS, asks: given a Boolean formula F(X, Y) specifying a relation between inputs Y = (y_1, . . . y_m) and outputs X = (x_1, . . . x_n), determine functions Ψ = (ψ_1(Y), . . . ψ_n(Y)) such that F(Ψ, Y) holds whenever ∃X F(X, Y) holds. Thus, ∀Y ((∃X F(X, Y)) ⇔ F(Ψ, Y)) must be rendered valid. The function ψ_i is called a Skolem function for x_i in F, and Ψ = (ψ_1, . . . ψ_n) is called a Skolem function vector for X in F.
For 1 ≤ i ≤ j ≤ n, let X_i^j denote the subsequence (x_i, x_{i+1}, . . . x_j). It has been argued in [22,16,2,19] that given a relational specification F(X, Y), the BFnS problem can be solved by first ordering the outputs, say as x_1 ≺ x_2 ··· ≺ x_n, and then synthesizing a function ψ_i(X_{i+1}^n, Y) for each x_i. Once all such ψ_i are obtained, one can substitute ψ_{i+1} through ψ_n for x_{i+1} through x_n, respectively, in ψ_i to obtain a Skolem function for x_i as a function of only Y. We adopt this approach, and therefore focus on obtaining ψ_i in terms of X_{i+1}^n and Y. Furthermore, we know from [22,19] that a function ψ_i is a Skolem function for x_i iff ∆_i^F ⇒ ψ_i and ψ_i ⇒ ¬Γ_i^F, where ∆_i^F ≡ ¬∃X_1^{i-1} F(X, Y)|_{x_i=0} and Γ_i^F ≡ ¬∃X_1^{i-1} F(X, Y)|_{x_i=1}. When F is clear from the context, we often omit it and write ∆_i and Γ_i. It is easy to see that both ∆_i and ¬Γ_i serve as Skolem functions for x_i in F.
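As a concrete (if naive) illustration of ∆_i and the reverse-substitution step, the sketch below is ours, not the paper's implementation: specifications are modeled as hypothetical Python predicates over bit tuples, ∆_i is computed by explicit enumeration, and ψ_i = ∆_i is taken as the Skolem function.

```python
from itertools import product

def synthesize_bruteforce(F, n, m):
    """Compute Skolem functions psi_i = Delta_i by explicit enumeration.
    F maps (xs, ys) bit tuples to bool; n outputs, m inputs.
    Exponential in n + m: a reference sketch only."""
    def delta_i(i, x_rest, ys):
        # Delta_i = NOT exists x_1..x_{i-1}: F holds with x_i = 0
        for pre in product([0, 1], repeat=i - 1):
            if F(pre + (0,) + x_rest, ys):
                return 0
        return 1

    def psi(i, ys):
        # reverse-substitute psi_{i+1}..psi_n, so psi_i depends on ys alone
        x_rest = tuple(psi(j, ys) for j in range(i + 1, n + 1))
        return delta_i(i, x_rest, ys)

    return lambda ys: tuple(psi(i, ys) for i in range(1, n + 1))
```

For instance, for the specification F(X, Y) = (x_1 ⇔ y_1) ∧ (x_2 ⇔ y_2), the synthesized Skolem vector maps each input tuple to itself.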

Complexity-theoretic limits
In this section, we investigate the computational complexity of BFnS. It is easy to see that BFnS can be solved in EXPTIME. Indeed, a naive solution would be to enumerate all possible values of the inputs Y and invoke a SAT solver to find values of X corresponding to each valuation of Y that makes F(X, Y) true. This requires worst-case time exponential in the number of inputs and outputs, and may produce an exponential-sized circuit. Given this, one can ask whether we can develop a better algorithm that works faster and synthesizes "small" Skolem functions in all cases. Our first result shows that the existence of such small Skolem functions would violate hard complexity-theoretic conjectures.

Theorem 1.
1. Unless P = NP, there exist instances of BFnS where any algorithm must take super-polynomial time.
2. Unless the Polynomial Hierarchy collapses to the second level, there exist instances of BFnS where any algorithm must generate super-polynomial-sized Skolem functions.
3. If the non-uniform exponential-time hypothesis holds, there exist instances of BFnS where any algorithm must generate exponential-sized Skolem functions, thereby also requiring exponential time.

The assumption in the first statement implies that the Polynomial Hierarchy (PH) collapses completely (to level 1), while the assumption in the second implies that PH collapses to level 2. A consequence of the third statement is that, under this hypothesis, there must exist an instance of BFnS for which any algorithm must take exponential time. The exponential-time hypothesis ETH and its strengthened version, the non-uniform exponential-time hypothesis ETH_nu, are unproven computational hardness assumptions (see [17], [13]), which have been used to show that several classical decision, functional and parametrized NP-complete problems (such as clique) are unlikely to have sub-exponential algorithms. ETH_nu states that there is no family of algorithms (one for each family of inputs of size n) that can solve 3-SAT in sub-exponential time. In [13] it is shown that if ETH_nu holds, then p-Clique, the parametrized clique problem, cannot be solved in sub-exponential time, i.e., for all d ∈ N and sufficiently large fixed k, determining whether a graph G has a clique of size k cannot be done in DTIME(n^d).
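The naive enumeration procedure can be sketched as follows (our illustration; the inner loop stands in for a SAT call, and the returned lookup table is precisely the exponential-sized "circuit" referred to above).

```python
from itertools import product

def naive_synthesis(F, n, m):
    """Enumerate all 2^m input valuations; for each, search for a witness X
    making F true (a SAT call in practice). The table maps each input
    valuation to a witness, i.e., an exponential-sized Skolem 'circuit'."""
    table = {}
    for ys in product([0, 1], repeat=m):
        for xs in product([0, 1], repeat=n):   # stand-in for a SAT solver
            if F(xs, ys):
                table[ys] = xs
                break
        else:
            table[ys] = (0,) * n   # no witness: any output value is fine
    return lambda ys: table[ys]
```

Note that both the running time and the size of the table grow as 2^m, which is exactly the behaviour the results below show cannot always be avoided.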
Proof. We describe a reduction from p-Clique to BFnS. Given an undirected graph G = (V, E) on n vertices and a number k (encoded in binary), we want to check if G has a clique of size k. We encode the graph as follows: each vertex v ∈ V is identified by a unique number in {1, . . . n}, and for every (i, j) ∈ V × V, we introduce an input variable y_{i,j} that is set to 1 iff (i, j) ∈ E. We call the resulting vector of input variables y. We also have additional input variables z = (z_1, . . . z_m), which represent the binary encoding of k (m = ⌈log_2 k⌉). Finally, we introduce output variables x_v for each v ∈ V, whose values determine which vertices are present in the clique. Let x denote the vector of x_v variables.
Given inputs Y = {y, z} and outputs X = {x}, our specification is represented by a circuit F over X, Y that verifies whether the vertices encoded by X indeed form a k-clique of the graph G. The circuit F is constructed as follows:
1. For every i, j such that 1 ≤ i < j ≤ n, we construct a sub-circuit implementing x_i ∧ x_j ⇒ y_{i,j}. The outputs of all such sub-circuits are conjoined to give an intermediate output, say EdgesOK. Clearly, all the sub-circuits taken together have size O(n^2).
2. We have a tree of binary adders implementing x_1 + x_2 + . . . + x_n. Let the ⌈log_2 n⌉-bit output of the adder be denoted CliqueSz. The size of this adder is clearly O(n).
3. We have an equality checker that checks if CliqueSz = k. Clearly, this sub-circuit has size O(log n). Let the output of this equality checker be called SizeOK.
4. The output of the specification circuit F is EdgesOK ∧ SizeOK.
Given an instance of p-Clique, we now construct the specification F(X, Y) as above and feed it as input to any algorithm A for solving BFnS. Let Ψ be the Skolem function vector output by A. For each v ∈ V, we now substitute ψ_v for the output x_v of the circuit F. This effectively constructs a circuit for F(Ψ, Y). It is easy to see from the definition of Skolem functions that for every valuation of Y, the function F(Ψ, Y) evaluates to 1 iff the graph encoded by Y contains a clique of size k.
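The specification built by the reduction can be modeled as a predicate for small instances (our sketch: the name clique_spec and the dictionary encoding of the adjacency bits y_{i,j} are assumptions, and k is taken directly rather than via its binary encoding in z).

```python
def clique_spec(n, k):
    """Specification F(X, Y) from the reduction: Y encodes the adjacency
    matrix as bits ys[(i, j)] for i < j, X selects vertices; F holds iff
    the selected vertices form a k-clique."""
    def F(xs, ys):
        # EdgesOK: every pair of selected vertices must share an edge
        edges_ok = all(not (xs[i] and xs[j]) or ys[(i, j)]
                       for i in range(n) for j in range(i + 1, n))
        # SizeOK: CliqueSz == k
        return edges_ok and sum(xs) == k
    return F
```

Plugging a Skolem vector Ψ into F then yields, for each adjacency matrix, the answer to "does this graph have a k-clique?", which is the crux of the hardness argument.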
Using this reduction, we can complete the proofs of our statements:
1. If the circuits for the Skolem functions Ψ have super-polynomial size, then of course any algorithm generating Ψ must take super-polynomial time. On the other hand, if the circuits for the Skolem functions Ψ are always poly-sized, then F(Ψ, Y) is polynomial-sized, and evaluating it takes time polynomial in the input size. Thus, if A is a polynomial-time algorithm, we also get an algorithm for solving p-Clique in polynomial time, which implies that P = NP.
2. If the circuits for the Skolem functions Ψ produced by algorithm A are always polynomial-sized, then F(Ψ, Y) is polynomial-sized. Thus, with polynomial-sized circuits we are able to solve p-Clique. Recall that problems that can be solved using polynomial-sized circuits are said to be in the class PSIZE (equivalently called P/poly). But since p-Clique is an NP-complete problem, we obtain that NP ⊆ P/poly. By the Karp-Lipton Theorem [23], this implies that Σ_2^P = Π_2^P, which implies that PH collapses to level 2.
3. If the circuits for the Skolem functions Ψ are sub-exponential-sized in the input n, then F(Ψ, Y) is also sub-exponential-sized and can be evaluated in sub-exponential time. It then follows that we can solve any instance of p-Clique of input length n in sub-exponential time, a violation of ETH_nu. Note that since our circuits can change for different input lengths, we may have different algorithms for different n. Hence we have to appeal to the non-uniform variant of ETH. ⊓⊔

Theorem 1 implies that efficient algorithms for BFnS are unlikely. We therefore propose a two-phase algorithm to solve BFnS in practice. The first phase runs in polynomial time relative to an NP-oracle and generates polynomial-sized "approximate" Skolem functions. We show that under certain structural restrictions on the NNF representation of F, the first phase always returns exact Skolem functions.
However, these structural restrictions may not always be met. An NP-oracle can be used to check if the functions computed by the first phase are indeed exact Skolem functions. In case they aren't, we proceed to the second phase of our algorithm that runs in worst-case exponential time. Below, we discuss the first phase in detail. The second phase is an adaptation of an existing CEGAR-based technique and is described briefly later.

Phase 1: Efficient polynomial-sized synthesis
An easy consequence of the definition of unateness is the following.

Proposition 1. If F(X, Y) is positive (resp. negative) unate in x_i, then ψ_i = 1 (resp. ψ_i = 0) is a correct Skolem function for x_i in F.

Proof. Suppose F is positive unate in x_i. Then F|_{x_i=0} ⇒ F|_{x_i=1}, and hence ∃x_i F(X, Y) ⇔ F|_{x_i=1}. Hence, we conclude that 1 is indeed a correct Skolem function for x_i in F. The proof for negative unateness follows on the same lines.

⊓ ⊔
The above result gives us a way to identify outputs x_i for which a Skolem function can be easily computed. Note that if x_i (resp. ¬x_i) is a pure literal in F, then F is positive (resp. negative) unate in x_i. However, the converse is not necessarily true. In general, a semantic check is necessary for unateness. In fact, it follows from the definition of unateness that F is positive (resp. negative) unate in x_i iff the formula η_i^+ (resp. η_i^-) defined below is unsatisfiable.

η_i^+ ≡ F(X, Y)|_{x_i=0} ∧ ¬F(X, Y)|_{x_i=1}   (1)
η_i^- ≡ F(X, Y)|_{x_i=1} ∧ ¬F(X, Y)|_{x_i=0}   (2)
Note that each such check involves a single invocation of an NP-oracle; a variant of this method is described in [4]. If F is binate in an output x_i, Proposition 1 doesn't help in synthesizing ψ_i. Towards synthesizing Skolem functions for such outputs, recall the definitions of ∆_i and Γ_i from Section 2. Clearly, if we can compute these functions, we can solve BFnS. While computing ∆_i and Γ_i exactly for all x_i is unlikely to be efficient in general (in light of Theorem 1), we show that polynomial-sized "good" approximations of ∆_i and Γ_i can be computed efficiently. As our experiments show, these approximations are good enough to solve BFnS for several benchmarks. Further, with access to an NP-oracle, we can also check when these approximations are indeed good enough.
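A brute-force version of this semantic check can be sketched as follows (our illustration; in practice each of the two tests is a single SAT call on η_i^+ or η_i^-, not an enumeration).

```python
from itertools import product

def unate_status(F, n, m, i):
    """Check semantic unateness of F in output x_i (1-indexed) by testing
    satisfiability of eta_i^+ and eta_i^- over all assignments."""
    pos = neg = True
    for xs in product([0, 1], repeat=n - 1):   # remaining outputs
        for ys in product([0, 1], repeat=m):   # inputs
            lo = F(xs[:i - 1] + (0,) + xs[i - 1:], ys)   # F | x_i = 0
            hi = F(xs[:i - 1] + (1,) + xs[i - 1:], ys)   # F | x_i = 1
            if lo and not hi:
                pos = False   # eta_i^+ is satisfiable
            if hi and not lo:
                neg = False   # eta_i^- is satisfiable
    return pos, neg
```

For example, F = x_1 ∨ y_1 is positive unate in x_1, while F = x_1 ⊕ y_1 is binate in x_1.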
Given a relational specification F(X, Y), we use F̂(X, X̄, Y) to denote the formula obtained by first converting F to NNF, and then replacing every occurrence of ¬x_i (x_i ∈ X) in the NNF formula with a fresh variable x̄_i. As an example, if F(X, Y) = (x_1 ∨ ¬x_2) ∧ (¬x_1 ∨ x_2 ∨ y_1), then F̂(X, X̄, Y) = (x_1 ∨ x̄_2) ∧ (x̄_1 ∨ x_2 ∨ y_1). Note that substituting ¬x_i back for x̄_i recovers F.

Proposition 2. (a) F̂(X, X̄, Y) is positive unate in every variable in X ∪ X̄. (b) F(X, Y) and F̂(X, X̄, Y) with ¬x_i substituted for x̄_i (for every i) are semantically equivalent.

The following is an easy consequence of Proposition 2.
Proposition 3. For every i ∈ {1, . . . n}, the following holds:
∃X_1^i F(X, Y) ⇒ F̂(X, X̄, Y)[X_1^i ↦ 1, X̄_1^i ↦ 1],
where the substitution sets x_1, . . . x_i and x̄_1, . . . x̄_i to 1, and replaces x̄_k by ¬x_k for all k > i. Proposition 3 allows us to bound ∆_i and Γ_i as follows.

Lemma 1. For every x_i ∈ X, we have:
(a) δ_i ⇒ ∆_i, where δ_i ≡ ¬F̂(X, X̄, Y)[X_1^{i-1} ↦ 1, X̄_1^{i-1} ↦ 1][x_i ↦ 0, x̄_i ↦ 1];
(b) γ_i ⇒ Γ_i, where γ_i ≡ ¬F̂(X, X̄, Y)[X_1^{i-1} ↦ 1, X̄_1^{i-1} ↦ 1][x_i ↦ 1, x̄_i ↦ 0].

In the remainder of the paper, we only use these under-approximations of ∆_i and Γ_i, and use δ_i and γ_i respectively, to denote them. Recall from Section 2 that both ∆_i and ¬Γ_i suffice as Skolem functions for x_i. Therefore, we propose to use either δ_i or ¬γ_i (depending on which has a smaller AIG) obtained from Lemma 1 as our approximation of ψ_i. Specifically,

ψ_i ≡ δ_i if the AIG for δ_i is no larger than that for ¬γ_i, and ψ_i ≡ ¬γ_i otherwise.   (3)

Example 1. Consider the specification F(X, Y) = ⋀_{i=1}^n (x_i ⇔ y_i). As noted in [33], this is a difficult example for CEGAR-based QBF solvers, when n is large. Using Lemma 1, we obtain δ_n = ¬γ_n = y_n. Clearly, ψ_n = y_n. On reverse-substituting, we get ψ_{n-1} = y_{n-1} ∨ (ψ_n ⇔ ¬y_n) = y_{n-1} ∨ 0 = y_{n-1}. Continuing in this way, we get ψ_i = y_i for all i ∈ {1, . . . n}. The same result is obtained regardless of whether we choose δ_i or ¬γ_i for each ψ_i. Thus, our approximation is good enough to solve this problem. In fact, it can be shown that the functions thus obtained are exact Skolem functions for this specification, for every n.

Note that the approximations of Skolem functions, as given in Eqn (3), are efficiently computable for all i ∈ {1, . . . n}, as they involve evaluating F̂ with a subset of inputs set to constants. This takes no more than O(|F|) time and space per output. As illustrated by Example 1, these approximations also often suffice to solve BFnS. The following theorem partially explains this.
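The substitution-based computation of δ_i and γ_i can be illustrated with a small sketch. This is our own illustration, not the paper's implementation: it assumes a tuple encoding of NNF trees with leaves ('x', k), ('nx', k), ('y', j), ('ny', j) standing for x_k, ¬x_k, y_j, ¬y_j, treats ('nx', k) as the fresh variable x̄_k of F̂, and assumes δ_i (resp. γ_i) is obtained by setting X_1^{i-1} and X̄_1^{i-1} to 1 and (x_i, x̄_i) to (0, 1) (resp. (1, 0)).

```python
def evalF_hat(node, xval, ys, i, mode):
    """Evaluate F-hat under the Lemma-1 substitution: X_1^{i-1} and their
    bar-copies set to 1; (x_i, bar-x_i) = (0, 1) for mode 'delta' and
    (1, 0) for mode 'gamma'; for k > i, x_k from xval and bar-x_k = 1 - x_k."""
    t = node[0]
    if t == 'and':
        return evalF_hat(node[1], xval, ys, i, mode) & evalF_hat(node[2], xval, ys, i, mode)
    if t == 'or':
        return evalF_hat(node[1], xval, ys, i, mode) | evalF_hat(node[2], xval, ys, i, mode)
    if t == 'y':
        return ys[node[1]]
    if t == 'ny':
        return 1 - ys[node[1]]
    k = node[1]
    if k < i:                                    # x_k and bar-x_k both set to 1
        return 1
    if k == i:                                   # fixed by the mode
        pos = 1 if mode == 'gamma' else 0
        return pos if t == 'x' else 1 - pos
    return xval[k] if t == 'x' else 1 - xval[k]  # k > i

def delta(node, xval, ys, i):
    return 1 - evalF_hat(node, xval, ys, i, 'delta')

def gamma(node, xval, ys, i):
    return 1 - evalF_hat(node, xval, ys, i, 'gamma')
```

For F = (x_1 ∧ y_1) ∨ (¬x_1 ∧ ¬y_1), i.e. x_1 ⇔ y_1, this yields δ_1 = ¬γ_1 = y_1, and each call is a single bottom-up evaluation of the DAG, hence O(|F|) time.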
Theorem 2.
(a) For 1 ≤ i ≤ n, suppose the formula ζ_j(X_{j+1}^n, X̄_{j+1}^n, Y) is unsatisfiable for every j ∈ {1, . . . i}, where ζ_j ≡ F̂(X, X̄, Y)[X_1^{j-1} ↦ 1, X̄_1^{j-1} ↦ 1][x_j ↦ 1, x̄_j ↦ 1] ∧ ¬F̂(X, X̄, Y)[X_1^{j-1} ↦ 1, X̄_1^{j-1} ↦ 1][x_j ↦ 1, x̄_j ↦ 0] ∧ ¬F̂(X, X̄, Y)[X_1^{j-1} ↦ 1, X̄_1^{j-1} ↦ 1][x_j ↦ 0, x̄_j ↦ 1]. Then ∃X_1^i F(X, Y) ⇔ F̂(X, X̄, Y)[X_1^i ↦ 1, X̄_1^i ↦ 1], and hence δ_i = ∆_i and γ_i = Γ_i.
(b) If F is in wDNNF, then δ_i = ∆_i and γ_i = Γ_i for every i ∈ {1, . . . n}.

Proof. To prove part (a), we use induction on i. The base case corresponds to i = 1. Recall that ∃X_1^1 F(X, Y) ⇔ F̂(X, X̄, Y)[x_1 ↦ 1, x̄_1 ↦ 0] ∨ F̂(X, X̄, Y)[x_1 ↦ 0, x̄_1 ↦ 1]. Therefore, if the condition in Theorem 2(a) holds for i = 1, we then have ∃X_1^1 F(X, Y) ⇔ F̂(X, X̄, Y)[x_1 ↦ 1, x̄_1 ↦ 1]. This proves the base case. Let us now assume (inductive hypothesis) that the statement of Theorem 2(a) holds for 1 ≤ i < n. We prove below that the same statement holds for i + 1 as well. Clearly, ∃X_1^{i+1} F(X, Y) ⇔ ∃x_{i+1} (∃X_1^i F(X, Y)) ⇔ ∃x_{i+1} F̂(X, X̄, Y)[X_1^i ↦ 1, X̄_1^i ↦ 1], by the inductive hypothesis. From the condition in Theorem 2(a) for j = i + 1, we also have F̂(X, X̄, Y)[X_1^i ↦ 1, X̄_1^i ↦ 1][x_{i+1} ↦ 1, x̄_{i+1} ↦ 1] ⇒ ∃x_{i+1} F̂(X, X̄, Y)[X_1^i ↦ 1, X̄_1^i ↦ 1]. The implication in the reverse direction follows from Proposition 2(a). Thus we have a bi-implication above, which we have already seen is equivalent to ∃X_1^{i+1} F(X, Y). This proves the inductive case. To prove part (b), we first show that if F is in wDNNF, then the condition in Theorem 2(a) must hold for all j ∈ {1, . . . n}. Theorem 2(b) then follows from the definitions of ∆_i and Γ_i (see Section 2), from the statement of Theorem 2(a), and from the definitions of δ_i and γ_i (see Eqn 3).
To prove this by contradiction, suppose F is in wDNNF but there exists j (1 ≤ j ≤ n) such that ζ_j(X_{j+1}^n, X̄_{j+1}^n, Y) is satisfiable. Let X_{j+1}^n = σ, X̄_{j+1}^n = κ and Y = θ in a satisfying assignment of ζ_j. We now consider the simplified circuit obtained by substituting 1 for every variable in X_1^{j-1} as well as in X̄_1^{j-1}, σ for X_{j+1}^n, κ for X̄_{j+1}^n and θ for Y in the DAG representation of F̂. This simplification replaces the output of every internal node with a constant (0 or 1) if the node evaluates to a constant under the above assignment. Note that the resulting circuit can have only x_j and x̄_j as its inputs. Furthermore, since the assignment satisfies ζ_j, it follows that the simplified circuit evaluates to 1 if both x_j and x̄_j are set to 1, and it evaluates to 0 if either of x_j or x̄_j is set to 0. This can only happen if there is a node labeled ∧ in the DAG representing F with a path leading to it from the leaf labeled x_j, and another path leading to it from the leaf labeled ¬x_j. This is a contradiction, since F is in wDNNF. Therefore, there is no j ∈ {1, . . . n} such that the condition of Theorem 2(a) is violated.
⊓⊔

In general, the candidate Skolem functions generated from the approximations discussed above may not always be correct. Indeed, the conditions discussed above are only sufficient, but not necessary, for the approximations to be exact. Hence, we need a separate check to see if our candidate Skolem functions are correct. To do this, we use an error formula ε_Ψ ≡ F(X′, Y) ∧ ⋀_{i=1}^n (x_i ⇔ ψ_i) ∧ ¬F(X, Y), where X′ is a fresh copy of the variables X, as described in [22], and check its satisfiability. The correctness of this check depends on the following result from [22].
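A brute-force stand-in for this check can be sketched as follows (our illustration; in practice a single SAT call on the error formula replaces the enumeration, and each model found projects to a counterexample input).

```python
from itertools import product

def counterexamples(F, Psi, n, m):
    """Enumerate input valuations Y for which some X satisfies F but the
    candidate vector Psi does not; each such Y corresponds to a satisfying
    assignment of the error formula."""
    cexs = []
    for ys in product([0, 1], repeat=m):
        realizable = any(F(xs, ys) for xs in product([0, 1], repeat=n))
        if realizable and not F(Psi(ys), ys):
            cexs.append(ys)
    return cexs
```

An empty result certifies the candidate vector, mirroring the unsatisfiability direction of the theorem below.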

Theorem 3 ([22]). ε_Ψ is unsatisfiable iff Ψ is a correct Skolem function vector.

We now combine all the above ingredients to come up with algorithm bfss (for Blazingly Fast Skolem Synthesis), as shown in Algorithm 1. The algorithm can be divided into three parts. In the first part (lines 2-11), unateness is checked. This is done in two ways: (i) we identify pure literals in F by simply examining the labels of leaves in the DAG representation of F in NNF, and (ii) we check the satisfiability of the formulas η_i^+ and η_i^-, as defined in Eqn 1 and Eqn 2. This requires invoking a SAT solver in the worst case, and is repeated at most O(n^2) times until there are no more unate variables. Hence this requires O(n^2) calls to a SAT solver. Once we have done this, by Proposition 1, the constants 1 or 0 (for positive or negative unate variables respectively) are correct Skolem functions for these variables.
In the second part, we fix an ordering of the remaining output variables according to an experimentally sound heuristic, as described in Section 6, and compute candidate Skolem functions for these variables according to Equation 3. We then check the satisfiability of the error formula ε_Ψ to determine if the candidate Skolem functions are indeed correct. If the error formula is found to be unsatisfiable, we know from Theorem 3 that we have the correct Skolem functions, which can therefore be output. This concludes phase 1 of algorithm bfss. If the error formula is found to be satisfiable, we move to phase 2 of algorithm bfss, an adaptation of the CEGAR-based technique described in [22], and discussed briefly in Section 5.

It is not difficult to see that the running time of phase 1 is polynomial in the size of the input, relative to an NP-oracle (a SAT solver in practice). This also implies that the Skolem functions generated can be of at most polynomial size. Finally, from Theorem 2(b) we also obtain that if F is in wDNNF, the Skolem functions generated in phase 1 are correct. From the above reasoning, we obtain the following properties of phase 1 of bfss: it runs in polynomial time relative to an NP-oracle, it generates polynomial-sized candidate Skolem functions, and these candidates are guaranteed to be correct whenever F is in wDNNF.

Discussion: We make two crucial and related observations. First, by our hardness results in Section 3, we know that the above algorithm cannot solve BFnS for all inputs, unless some well-regarded complexity-theoretic conjectures fail. As a result, we must go to phase 2 on at least some inputs. Surprisingly, our experiments show that this is not necessary for the majority of benchmarks. The second observation tries to understand why phase 1 works in most cases in practice. While a conclusive explanation isn't easy, we believe Theorem 2 explains the success of phase 1 in several cases. By [14], we know that all Boolean functions have a DNNF (and hence wDNNF) representation, although it may take exponential time to compute this representation. This allows us to define two pre-processing procedures.
In the first, we identify cases where we can directly convert the specification to wDNNF and use the phase 1 algorithm above. In the second, we use several optimization scripts available in the ABC [26] library to optimize the AIG representation of F. For a majority of benchmarks, this appears to yield a representation of F that allows the proof of Theorem 2(a) to go through. For the rest, we apply the phase 2 algorithm as described below.
Quantitative guarantees of "goodness" Given our theoretical and practical insights into the applicability of phase 1 of bfss, it would be interesting to measure how much progress we have made in phase 1, even if it does not give the correct Skolem functions. One way to measure this "goodness" is to estimate the number of counterexamples as a fraction of the size of the input space. Specifically, given the error formula, we get an approximate count of the number of its models projected on the inputs Y. This can be obtained efficiently in practice with high confidence using state-of-the-art approximate model counters, viz. [12], with complexity in BPP^NP. The approximate count thus obtained, when divided by 2^|Y|, gives the fraction of input combinations for which the candidate Skolem functions output by phase 1 do not work correctly. We call this the goodness ratio of our approximation.
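On small instances the goodness ratio can be computed exactly by brute force, as in the sketch below (ours; in practice an approximate model counter such as [12] replaces the enumeration over Y).

```python
from itertools import product

def goodness_ratio(F, Psi, n, m):
    """Fraction of input valuations on which the phase-1 candidates fail,
    i.e., (models of the error formula projected on Y) / 2^m."""
    bad = sum(1 for ys in product([0, 1], repeat=m)
              if any(F(xs, ys) for xs in product([0, 1], repeat=n))
              and not F(Psi(ys), ys))
    return bad / 2 ** m
```

A ratio of 0 means the candidates are already correct; a ratio close to 1 means almost every input still yields a counterexample.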

Phase 2: Counterexample-guided refinement
For phase 2, we can use any off-the-shelf worst-case exponential-time Skolem function generator. However, given that we already have candidate Skolem functions with guarantees on their "goodness", it is natural to use them as starting points for phase 2. Hence, we start off with candidate Skolem functions for all x_i as computed in phase 1, and then update (or refine) them in a counterexample-driven manner. Intuitively, a counterexample is a value of the inputs Y for which there exists a value of X that renders F(X, Y) true, but for which F(Ψ, Y) evaluates to false. As shown in [22], given a candidate Skolem function vector, every satisfying assignment of the error formula ε_Ψ gives a counterexample. The refinement step uses this satisfying assignment to update an appropriate subset of the approximate δ_i and γ_i functions computed in phase 1. The entire process is then repeated until no counterexamples can be found. The final updated vector of Skolem functions then gives a solution of the BFnS problem. Note that this idea is not new [22,2]. The only significant enhancement we make over the algorithm in [22] is to use an almost-uniform sampler [11] to efficiently sample the space of counterexamples almost uniformly. This allows us to do refinement with a diverse set of counterexamples, instead of using counterexamples in a corner of the solution space of ε_Ψ that the SAT solver heuristics zoom down on.
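The overall loop can be sketched as follows. This is our own illustration: for brevity the refinement simply patches the candidate vector on the counterexample input with a table entry, whereas the actual algorithm updates the δ_i and γ_i functions, and a SAT call (or almost-uniform sampler) replaces each enumeration.

```python
from itertools import product

def cegar_refine(F, Psi0, n, m):
    """Generic CEGAR skeleton: repeatedly pick a counterexample input and
    patch the candidate vector on it with a witness for X."""
    patches = {}
    def Psi(ys):
        return patches.get(ys, Psi0(ys))
    while True:
        # find a counterexample: F realizable at ys but candidate fails
        cex = next((ys for ys in product([0, 1], repeat=m)
                    if any(F(xs, ys) for xs in product([0, 1], repeat=n))
                    and not F(Psi(ys), ys)), None)
        if cex is None:
            return Psi          # error formula unsatisfiable: Psi correct
        # refine: record a witness for X at the counterexample input
        patches[cex] = next(xs for xs in product([0, 1], repeat=n) if F(xs, cex))
```

The loop terminates because each iteration eliminates at least one counterexample input, mirroring the worst-case exponential behaviour of phase 2.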

Experimental results
Experimental methodology. Our implementation consists of two parallel pipelines that accept the same input specification but represent it in two different ways. The first pipeline takes the input formula as an AIG and builds an NNF (not necessarily wDNNF) DAG, while the second pipeline builds an ROBDD from the input AIG using dynamic variable reordering (no restrictions on variable order), and then obtains a wDNNF representation from it using the linear-time algorithm described in [14]. Once the NNF/wDNNF representation is built, we use Algorithm 1 in Phase 1 and CEGAR-based synthesis using UniGen [11] to sample counterexamples in Phase 2. We call this ensemble of two pipelines bfss. We compare bfss with the following algorithms/tools: (i) parSyn [2], (ii) Cadet [35], (iii) RSynth [39], and (iv) AbsSynthe-Skolem (based on the BFnS step of AbsSynthe [9]).
Our implementation of bfss uses the ABC [26] library to represent and manipulate Boolean functions. Two different SAT solvers can be used with bfss: ABC's default SAT solver, or UniGen [11] (to give almost-uniformly distributed counterexamples). All our experiments use UniGen.
Since different tools accept benchmarks in different formats, each benchmark was converted to both qdimacs and verilog/aiger formats. All benchmarks and the procedure by which we generated (and converted) them are detailed in [1]. Recall that we use two pipelines for bfss. We use "balance; rewrite -l; refactor -l; balance; rewrite -l; rewrite -lz; balance; refactor -lz; rewrite -lz; balance" as the ABC script for optimizing the AIG representation of the input specification. We observed that while this results in only 4 benchmarks being in wDNNF in the first pipeline, 219 benchmarks were solved in Phase 1 using this pipeline. This is attributable to specifications being unate in several output variables, and also satisfying the condition of Theorem 2(a) (while not being in wDNNF). In the second pipeline, however, we could represent 230 benchmarks in wDNNF, and all of these were solved in Phase 1.
For each benchmark, the order (ref. step 12 of Algorithm 1) in which Skolem functions are generated is such that the variable which occurs in the transitive fan-in of the least number of nodes in the AIG representation of the specification is ordered before other variables. This order is used for both bfss and parSyn. Note that the order is completely independent of the dynamic variable order used to construct an ROBDD of the input specification in the second pipeline, prior to getting the wDNNF representation. All experiments were performed on a message-passing cluster, with 20 cores and 64 GB memory per node, each core being a 2.2 GHz Intel Xeon processor. The operating system was CentOS 6.5. Twenty cores were assigned to each run of parSyn. For RSynth and Cadet, a single core on the cluster was used, since these tools do not exploit parallel processing. Each pipeline of bfss was executed on a single node; the computation of candidate functions, the building of the error formula and the refinement with counterexamples were performed sequentially on one thread, and UniGen had 19 threads at its disposal (idle during Phase 1).
The maximum time allowed for any run was 3600 seconds. The total main memory for any run was restricted to 16 GB. The metric used to compare the algorithms was the time taken to synthesize Boolean functions. The time reported for bfss is the better of the two times obtained from the alternative pipelines described above. Detailed results for the individual pipelines are available in Appendix A.
Results. Of the 504 benchmarks, 177 were not solved by any tool: 6 of these are from the Arithmetic benchmarks and 171 from QBFEval. Table 1 summarizes the performance of bfss (considering the combined pipelines) over the different benchmark suites. Of the 504 benchmarks, bfss was successful on 278; of these, 170 are from QBFEval, 68 from Disjunctive Decomposition, 35 from Arithmetic and 5 from Factorization.
Of the 383 benchmarks in the QBFEval suite, we ran bfss only on 254, since we could not build succinct AIGs for the remaining benchmarks. Of these, 159 benchmarks were solved by Phase 1 (i.e., 62% of the QBFEval benchmarks we could build) and 73 proceeded to Phase 2, of which 11 reached completion. On another 11 QBFEval benchmarks, Phase 1 timed out. Of the 48 Arithmetic benchmarks, Phase 1 successfully solved 35 (i.e., ∼72%) and Phase 2 was started for 8 benchmarks; Phase 1 timed out on 5 benchmarks. Of the 68 Disjunctive Decomposition benchmarks, Phase 1 successfully solved 66 (i.e., 97%), and Phase 2 was started and reached completion for the remaining 2 benchmarks. For the 5 Factorization benchmarks, Phase 1 was successful on all 5.
Recall that the goodness ratio is the ratio of the number of counterexamples remaining after Phase 1 to the total size of the input space. For all benchmarks solved by Phase 1, the goodness ratio is 0. We analyzed the goodness ratio at the beginning of Phase 2 for the 83 benchmarks on which Phase 2 started. For 13 benchmarks this ratio was small (< 0.002), and Phase 2 reached completion for these. Of the remaining benchmarks, 34 also had a small goodness ratio (< 0.1), indicating that we were close to the solution at the time of timeout. However, 27 benchmarks in QBFEval had a goodness ratio greater than 0.9, indicating that most of the counterexamples had not been eliminated by the time of timeout. We next compare the performance of bfss with other state-of-the-art tools. For clarity, since the number of benchmarks in the QBFEval suite is considerably greater, we plot the QBFEval benchmarks separately.
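The goodness ratio used above is a simple quantity; a minimal sketch follows. In bfss the number of remaining counterexamples is estimated with an approximate model counter over the error formula, whereas here it is simply passed in as a number (the function name is illustrative):

```python
def goodness_ratio(num_counterexamples, num_inputs):
    """Fraction of the 2^num_inputs input assignments on which the Phase 1
    candidate Skolem functions are still wrong (illustrative sketch)."""
    return num_counterexamples / (2 ** num_inputs)

# A benchmark solved exactly by Phase 1 has no counterexamples left.
assert goodness_ratio(0, 20) == 0.0

# A ratio below 0.002 was observed to predict that Phase 2 reaches completion.
assert goodness_ratio(1000, 20) < 0.002
```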
bfss vs Cadet: Of the 504 benchmarks, Cadet was successful on 231 benchmarks, of which 24 belonged to Disjunctive Decomposition, 22 to Arithmetic, 1 to Factorization and 184 to QBFEval. Figure 1(a) gives the performance of the two algorithms with respect to time on the QBFEval suite. Here, Cadet solved 35 benchmarks that bfss could not solve, whereas bfss solved 21 benchmarks that could not be solved by Cadet. Figure 1(b) gives the performance of the two algorithms with respect to time on the Arithmetic, Factorization and Disjunctive Decomposition benchmarks. In these categories, there were a total of 62 benchmarks that bfss solved but Cadet could not, and 1 benchmark that Cadet solved but bfss did not. While Cadet takes less time on Arithmetic benchmarks and many QBFEval benchmarks, on Disjunctive Decomposition and Factorization, bfss takes less time. bfss vs parSyn: Figure 2 shows the comparison of time taken by bfss and parSyn. parSyn was successful on a total of 185 benchmarks, and could solve 1 benchmark that bfss could not solve. On the other hand, bfss solved 94 benchmarks that parSyn could not solve. From Figure 2, we can see that on most of the Arithmetic, Disjunctive Decomposition and Factorization benchmarks, bfss takes less time than parSyn. bfss vs RSynth: We next compare the performance of bfss with RSynth. As shown in Figure 3, RSynth was successful on 51 benchmarks, with 4 benchmarks that could be solved by RSynth but not by bfss. In contrast, bfss could solve 231 benchmarks that RSynth could not solve! Of the benchmarks that were solved by both tools, we can see that bfss took less time on most of them. bfss vs AbsSynthe-Skolem: AbsSynthe-Skolem was successful on 217 benchmarks, and could solve 31 benchmarks that bfss could not solve. In contrast, bfss solved a total of 92 benchmarks that AbsSynthe-Skolem could not. Figure 4 shows a comparison of the running times of bfss and AbsSynthe-Skolem.

Conclusion
In this paper, we showed some complexity-theoretic hardness results for the Boolean functional synthesis problem. We then developed a two-phase approach to solve this problem, where the first phase, an efficient algorithm generating poly-sized functions, surprisingly succeeds in solving a large number of benchmarks. To explain this, we identified sufficient conditions under which Phase 1 gives the correct answer. For the remaining benchmarks, we employed the second phase of the algorithm, which uses a CEGAR-based approach and builds Skolem functions by exploiting recent advances in SAT solvers/approximate counters. As future work, we wish to explore further improvements in Phase 2, and other structural restrictions on the input that ensure completeness of Phase 1.

A Detailed Results of the Individual Pipelines

As mentioned in Section 6, bfss is an ensemble of two pipelines, an AIG-NNF pipeline and a BDD-wDNNF pipeline. These two pipelines accept the same input specification but represent it in two different ways. The first pipeline takes the input formula as an AIG and builds an NNF (not necessarily wDNNF) DAG, while the second pipeline first builds an ROBDD from the input AIG using dynamic variable reordering, and then obtains a wDNNF representation from the ROBDD using the linear-time algorithm described in [14]. Once the NNF/wDNNF representation is built, the same algorithm is used to generate Skolem functions: Algorithm 1 in Phase 1, and CEGAR-based synthesis using UniGen [11] to sample counterexamples in Phase 2. In this section, we give the individual results of the two pipelines.
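The ROBDD-to-wDNNF step of the second pipeline can be sketched as a Shannon expansion of each BDD node. This is a simplified illustration, not the actual data structures or algorithm of [14] or bfss: each node (var, low, high) becomes (¬var ∧ low) ∨ (var ∧ high), with memoization so that shared BDD nodes stay shared in the resulting DAG.

```python
def bdd_to_nnf(node, cache=None):
    """Render an ROBDD (nested tuples (var, low, high); leaves True/False)
    as an NNF formula string via Shannon expansion (illustrative sketch)."""
    if cache is None:
        cache = {}
    if node in (True, False):
        return 'true' if node else 'false'
    key = id(node)
    if key not in cache:  # memoize so shared BDD nodes are translated once
        var, low, high = node
        cache[key] = '((~{v} & {l}) | ({v} & {h}))'.format(
            v=var, l=bdd_to_nnf(low, cache), h=bdd_to_nnf(high, cache))
    return cache[key]

# ROBDD for x1 XOR x2, encoded as nested tuples:
xor_bdd = ('x1', ('x2', False, True), ('x2', True, False))
# bdd_to_nnf(xor_bdd) ->
# '((~x1 & ((~x2 & false) | (x2 & true))) | (x1 & ((~x2 & true) | (x2 & false))))'
```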
A.1 Performance of the AIG-NNF pipeline

In the AIG-NNF pipeline, bfss solves a total of 236 benchmarks: 133 in QBFEval, 31 in Arithmetic, all 68 benchmarks of Disjunctive Decomposition and 4 in Factorization. Of the 254 benchmarks in QBFEval (as mentioned in Section 6, we could not build succinct AIGs for the remaining benchmarks and did not run our tool on them), Phase 1 solved 122 benchmarks and Phase 2 was started on 110, of which 11 reached completion. Of the 48 benchmarks in Arithmetic, Phase 1 solved 31 and Phase 2 was started on 12. On the remaining 5 Arithmetic benchmarks, Phase 1 did not reach completion. Of the 68 Disjunctive Decomposition benchmarks, 66 were successfully solved by Phase 1 and the remaining 2 by Phase 2. Phase 2 was started on all 5 benchmarks in Factorization and reached completion on 4.
Plots for the AIG-NNF pipeline. Figure 5 shows the performance of bfss (AIG-NNF pipeline) versus Cadet on all four benchmark domains. Across the four domains, Cadet solved 53 benchmarks that bfss could not solve; of these, 52 belonged to QBFEval and 1 to Arithmetic. On the other hand, bfss solved 58 benchmarks that Cadet could not solve; of these, 1 belonged to QBFEval, 10 to Arithmetic, 3 to Factorization and 44 to Disjunctive Decomposition. From Figure 5, we can see that while Cadet takes less time than bfss on many Arithmetic and QBFEval benchmarks, on Disjunctive Decomposition and Factorization the AIG-NNF pipeline of bfss takes less time. Figure 6 shows the performance of bfss (AIG-NNF pipeline) versus parSyn. Across the four domains, parSyn solved 22 benchmarks that bfss could not solve; of these, 1 belonged to Arithmetic and 21 to QBFEval. On the other hand, bfss solved 73 benchmarks that parSyn could not solve; of these, 51 belonged to QBFEval, 17 to Arithmetic and 4 to Disjunctive Decomposition. From Figure 6, we can see that while the behaviour of parSyn and bfss is comparable on many QBFEval benchmarks, on most of the Arithmetic, Disjunctive Decomposition and Factorization benchmarks the AIG-NNF pipeline of bfss takes less time. Figure 8 compares the performance of the AIG-NNF pipeline of bfss and AbsSynthe-Skolem. While AbsSynthe-Skolem solved 72 benchmarks that bfss could not solve, bfss solved 91 benchmarks that AbsSynthe-Skolem could not solve; of these, 44 belonged to QBFEval, 8 to Arithmetic and 39 to Disjunctive Decomposition.

A.2 Performance of the BDD-wDNNF pipeline
In this section, we discuss the performance of the BDD-wDNNF pipeline of bfss. Recall that in this pipeline the tool builds an ROBDD from the input AIG using dynamic variable reordering and then converts the ROBDD into a wDNNF representation. In this section, by bfss we mean the BDD-wDNNF pipeline of the tool. Table 3 summarizes the performance of the BDD-wDNNF pipeline. Using this pipeline, the tool solved a total of 230 benchmarks, of which 143 belonged to QBFEval, 23 to Arithmetic, 59 to Disjunctive Decomposition and 5 to Factorization. As expected, since the representation is already in wDNNF, the Skolem functions generated at the end of Phase 1 were indeed exact (see Theorem 2(b)), and Phase 2 was not required for any benchmark. We also found that the memory requirements of this pipeline were higher, and for some benchmarks the tool failed because the ROBDDs (and hence the resulting wDNNF representations) were large, resulting in out-of-memory errors or assertion failures in the underlying AIG library. Plots for the BDD-wDNNF pipeline. Figure 9 gives the performance of bfss versus Cadet. The performance of Cadet and bfss is comparable, with Cadet solving 74 benchmarks across all domains that bfss could not, and bfss solving 73 benchmarks that Cadet could not. While Cadet takes less time on many QBFEval benchmarks, on many Arithmetic, Disjunctive Decomposition and Factorization benchmarks the BDD-wDNNF pipeline of bfss takes less time. Figure 10 gives the performance of bfss versus parSyn. While parSyn could solve 30 benchmarks across all domains that bfss could not, the BDD-wDNNF pipeline of bfss solved 75 benchmarks that parSyn could not. Figure 11 gives the performance of bfss versus RSynth. While RSynth could solve 9 benchmarks across all domains that bfss could not, the BDD-wDNNF pipeline of bfss solved 188 benchmarks that RSynth could not.
Furthermore, from Figure 11 we can see that on most benchmarks that both tools could solve, bfss takes less time. Figure 12 gives the performance of bfss versus AbsSynthe-Skolem. While AbsSynthe-Skolem could solve 39 benchmarks across all domains that bfss could not, the BDD-wDNNF pipeline of bfss solved 52 benchmarks that AbsSynthe-Skolem could not.

A.3 Comparison of the two pipelines

Figure 13 compares the performance of the two pipelines. While there were some benchmarks that only one of the pipelines could solve, apart from the Factorization benchmarks, for most of the QBFEval, Arithmetic and Disjunctive Decomposition benchmarks the time taken by the AIG-NNF pipeline was less than that taken by the BDD-wDNNF pipeline.