Abstract
Given a set of constraints F and a user-defined weight function W on the assignment space, the problem of constrained sampling is to sample satisfying assignments of F conditioned on W. Constrained sampling is a fundamental problem with applications in probabilistic reasoning, synthesis, software and hardware testing. Consequently, the problem of sampling has been subject to intense theoretical and practical investigations over the years. Despite such intense investigations, there still remains a gap between theory and practice. In particular, there has been significant progress in the development of sampling techniques when W is a uniform distribution, but such techniques fail to handle general weight functions W. Furthermore, we are, often, interested in \(\varSigma _1^1\) formulas, i.e., \(G(X):=\,\exists Y F(X, Y)\) for some F; typically the set of variables Y are introduced as auxiliary variables during encoding of constraints to F. In this context, one wonders whether it is possible to design sampling techniques whose runtime performance is agnostic to the underlying weight distribution and can handle \(\varSigma _1^1\) formulas?
The primary contribution of this work is a novel technique, called \(\mathsf {WAPS}\), for sampling over \(\varSigma _1^1\) whose runtime is agnostic to W. \(\mathsf {WAPS}\) is based on our recently discovered connection between knowledge compilation and uniform sampling. \(\mathsf {WAPS}\) proceeds by compiling F into a well studied compiled form, d-DNNF, which allows sampling operations to be conducted in linear time in the size of the compiled form. We demonstrate that \(\mathsf {WAPS}\) can significantly outperform existing state-of-the-art weighted and projected sampler \(\mathsf {WeightGen}\), by up to 3 orders of magnitude in runtime while achieving a geometric speedup of \(296{\times }\) and solving 564 more instances out of 773. The distribution generated by \(\mathsf {WAPS}\) is statistically indistinguishable from that generated by an ideal weighted and projected sampler. Furthermore, \(\mathsf {WAPS}\) is almost oblivious to the number of samples requested.
Appendix along with open source tool is available at https://github.com/meelgroup/waps.
This work was supported in part by NUS ODPRT Grant R-252-000-685-133 and National Research Foundation Singapore under its AI Singapore Programme [Award Number: AISG-RP-2018-005]. The computational work for this article was performed on resources of the National Supercomputing Centre, Singapore https://www.nscc.sg.
1 Introduction
Boolean satisfiability (SAT) has gathered applications in bounded model checking of hardware and software systems [5, 7, 51], classical planning [35] and scheduling [27]. Despite the worst-case hardness of SAT, the past few decades have witnessed a significant improvement in the runtime performance of the state-of-the-art SAT solvers [41]. This improvement has led to the usage of SAT solvers as oracles to handle problems whose complexity lies beyond NP. Among these problems, constrained sampling, that concerns with sampling from the space of solutions of a set of constraints F, subject to a user-defined weight function W, has witnessed a surge of interest owing to the wide range of applications ranging from machine learning, probabilistic reasoning, software and hardware verification to statistical physics [3, 32, 39, 45].
Not surprisingly, the problem of sampling is known to be computationally intractable. When the weight function W is fixed to a uniform distribution, the problem of constrained sampling is also known as uniform sampling. Uniform sampling has witnessed a long-standing interest from theoreticians and practitioners alike [4, 33, 38, 45]. The past few years, however, have witnessed a significant improvement in the runtime performance of the sampling tools when the weight function W is fixed to a uniform distribution owing to the rise of hashing-based paradigm [2, 11, 13, 22]. While the significant progress for uniform sampling has paved the way for its usage in constrained random simulation [45], the restriction of uniform distribution is limiting, and several applications of constrained sampling require the underlying techniques to be able to handle a wide variety of distributions and related problem formulations as listed below:
-
Literal-Weighted Sampling. In case of literal-weighted sampling, we consider the weight function over assignments defined as the product of the weight of literals, which is specified using a weight function \(W \! \left( \cdot \right) \) that assigns a non-negative weight to each literal l in a boolean formula F. As argued in [12], literal-weighted weight function suffices for most of the practical applications ranging from constrained random simulation, probabilistic reasoning, and reliability of power-grids [10, 14, 21, 45].
-
Projected Sampling. Typically, users define constraints in high-level modeling languages such as Verilog [1], Bayesian networks [14] and configuration of grids [21] and then \(\mathsf {CNF}\) encodings are employed to convert them into a \(\mathsf {CNF}\) [6]. Commonly used schemes like Tseitin encoding [50] introduce auxiliary variables during encoding; though the encoded formulas are equisatisfiable, they typically do not preserve the number of satisfying assignments. In particular, given an initial set of constraints G expressed over a set of variables X, we obtain another formula F such that \(G(X) = \exists Y F(X, Y)\). Therefore, we are concerned with sampling over solutions of F projected over a subset of variables (such as X in this case). In other words, we are concerned with sampling over \(\varSigma _1^1\) formulas.
-
Conditioned Sampling. Given a boolean formula \(\varPhi \) and a partial assignment \(\sigma \), conditioned sampling refers to sampling from the models of \(\varPhi \) that satisfy \(\sigma \). Conditioning has interesting applications in testing where one is interested in fuzzing the system with inputs that satisfy certain patterns (preconditions). Conditioning has been applied in the past for fault diagnosis [23], conformant planning [46] and databases [15].
Typically, practical applications require sampling techniques that can handle all the above formulations. While techniques based on interval propagation, binary decision diagrams and random perturbation of solution space [22, 25, 44] cannot handle projection, conditioned, and weighted sampling efficiently, the hashing-based techniques have significantly improved the scalability of sampling techniques and are capable of handling projection and literal-weighted scheme [11, 42]. However, the performance of hashing-based techniques is extremely limited in their ability to handle literal-weighted sampling, and one observes a drastic drop in their performance as the weight distribution shifts away from uniform. In this context, one wonders: whether it is possible to design techniques which can handle projection, conditioned, and literal-weighted sampling without degradation in their performance?
In this work, we answer the above question in affirmative: we extend our previously proposed knowledge compilation framework in the context of uniform sampling to handle all the three variants. We have implemented a prototype of our framework, named \(\mathsf {WAPS}\), and demonstrate that within a time limit of 1800 s, \(\mathsf {WAPS}\) performs better than the current state-of-the-art weighted and projected sampler \(\mathsf {WeightGen}\) [10], by up to 3 orders of magnitude in terms of runtime while achieving a geometric speedup of \(296{\times }\). Out of the 773 benchmarks available, \(\mathsf {WAPS}\) was able to sample from 588 benchmarks while \(\mathsf {WeightGen}\) was able to sample from only 24 benchmarks. Furthermore, \(\mathsf {WAPS}\) is almost oblivious to the number of samples requested.
A significant advantage of our framework is its simplicity: we show that our previously proposed framework in the context of uniform sampling, \(\mathsf {KUS}\) [49], can be lifted to handle literal-weighted, projection and conditioned sampling. We demonstrate that unlike hashing-based techniques, the runtime performance of \(\mathsf {WAPS}\) is not dependent on the underlying weight distribution. We want to assert that the simplicity of our framework, combined with its runtime performance and its ability to be agnostic to the underlying distribution is a significant novel contribution to the area of constrained sampling. Besides, an important contribution of our work is the theoretical analysis of sampling techniques that employ knowledge compilation.
The rest of the paper is organized as follows. We first discuss the related work in Sect. 2. We then introduce notations and preliminaries in Sect. 3. In Sect. 4 we present \(\mathsf {WAPS}\) and do theoretical analysis of \(\mathsf {WAPS}\) in Sect. 5. We then describe the experimental methodology and discuss results in Sect. 6. Finally, we conclude in Sect. 7.
2 Related Work
Weighted sampling is extensively studied in the literature with the objective of providing scalability while ensuring strong theoretical guarantees. Markov Chain Monte Carlo (\(\mathsf {MCMC}\)) sampling [32, 40] is the most popular technique for weighted sampling; several algorithms like Metropolis-Hastings and simulated annealing have been extensively studied in the literature [36, 40]. While \(\mathsf {MCMC}\) based sampling is guaranteed to converge to a target distribution under mild requirements, convergence is often impractically slow [31]. The practical adaptations for \(\mathsf {MCMC}\)-based sampling in the context of constrained-random verification has been proposed in [37]. Unfortunately, practical \(\mathsf {MCMC}\) based sampling tools use heuristics that destroy the theoretical guarantees. Interval-propagation and belief networks have also been employed for sampling [20, 26, 29], but, though these techniques are scalable, the generated distributions can deviate significantly from the uniform distribution, as shown in [38].
To bridge the wide gap between scalable algorithms and those that give strong guarantees of uniformity several hashing-based techniques have been proposed [10, 11, 24, 28] for weighted sampling. The key idea behind hashing-based techniques is to employ random parity constraints as pairwise independent hash functions to partition the set of satisfying assignments of \(\mathsf {CNF}\) formula into cells. The hashing-based techniques have achieved significant runtime performance improvement in case of uniform sampling but their scalability suffers for weight distribution and depends strongly on parameters such as tilt, which are unlikely to be small for most practical distributions [42].
In recent past, a significant amount of work has been done to compile propositional theory, often represented as a propositional formula in \(\mathsf {CNF}\) into tractable knowledge representations. One of the prominent and earliest representations is Ordered Binary Decision Diagrams (OBDDs), which have been effectively used for circuit analysis and synthesis [9]. Another family of representations known as Deterministic Decomposable Negation Normal Form (d-DNNF) [19] have proved to be influential in many probabilistic reasoning applications [14, 17, 18]. Recently, another representation called as Sentential Decision Diagram (SDD) [16] was proposed which maintains canonicity and polytime support for boolean combinations and bridged the gap of succinctness between OBDDs and d-DNNFs. In our recent work [49], we were able to tackle the problem of uniform sampling by exploiting the properties of d-DNNF. Specifically, we were able to take advantage of recent advancements made in the field of knowledge compilation and use the compiled structure to generate uniform samples while competing with the state-of-the-art tools for uniform sampling.
3 Notations and Preliminaries
A literal is a boolean variable or its negation. A clause is a disjunction of a set of literals. A propositional formula F in conjunctive normal form (\(\mathsf {CNF}\)) is a conjunction of clauses. Let Vars(F) be the set of variables appearing in F. The set Vars(F) is called support of F. A satisfying assignment or witness of F, denoted by \(\sigma \), is an assignment of truth values to variables in its support such that F evaluates to true. We denote the set of all witnesses of F as \(R_{F}\). Let var(l) denote the variable of literal l, i.e., \(var(l) = var(\lnot l)\) and \(F_{|l}\) denotes the formula obtained when literal l is set to true in F. Given an assignment \(\sigma \) over Vars(F) and a set of variables \(P \subseteq Vars(F)\), define \(\sigma _P = \{l\) | \(l \in \sigma \), \(var(l) \in P\}\) and \(R_{F\downarrow P}\) to be the projection of \(R_{F}\) onto P, i.e., \(R_{F\downarrow P} = \{ \sigma _P | \sigma \in R_{F}\}\).
Given a propositional formula F and a weight function \(W(\cdot )\) that assigns a non-negative weight to every literal, the weight of assignment \(\sigma \) denoted as \(W(\sigma )\) is the product of weights of all the literals appearing in \(\sigma \), i.e., \(W(\sigma ) = \prod _{l \in \sigma } W(l)\). The weight of a set of assignments Y is given by \(W(Y) = \sum _{\sigma \in Y} W(\sigma )\). Note that, we have overloaded the definition of weight function \(W(\cdot )\) to support different arguments – a literal, an assignment and a set of assignments. We want to highlight that the assumption about weight distribution being generated solely by a literal-weighted function stands well, as many real-world applications like probabilistic inference can be efficiently reduced to literal-weighted sampling [14]. Also, for notational convenience, whenever the formula F, weight function W and sampling set P is clear from the context, we omit mentioning it.
3.1 Weighted and Projected Generators
A weighted and projected probabilistic generator is a probabilistic algorithm that generates a witness from \(R_{F\downarrow P}\) with respect to weight distribution generated by weight function W. A weighted and projected generator \(\mathcal {G}^{wp}(\cdot , \cdot , \cdot )\) is a probabilistic generator that guarantees
An almost weighted and projected generator \(\mathcal {G}^{awp}(\cdot , \cdot , \cdot )\) relaxes this requirement, ensuring that: given a tolerance \( \varepsilon > 0\), \(\forall y \in R_{F\downarrow P}\) we have
Probabilistic generators are allowed to occasionally “fail” in the sense that no witness may be returned even if \(R_{F\downarrow P}\) is non-empty. The failure probability for such generators must be bounded by a constant strictly less than 1.
3.2 Deterministic Decomposable Negation Normal Form (d-DNNF)
To formally define d-DNNF, we first define the Negation Normal Form (NNF):
Definition 1
[19]. Let X be the set of propositional variables. A sentence in NNF is a rooted, directed acyclic graph (DAG) where each leaf node i is labeled with true, false, x or \(\lnot x\), \(x \in X\); and each internal node is labeled with \(\vee \) or \(\wedge \) and can have arbitrarily many children.
d-DNNF further imposes that the representation is:
-
Deterministic: An NNF is deterministic if the operands of \(\vee \) in all well-formed boolean formula in the NNF are mutually inconsistent.
-
Decomposable: An NNF is decomposable if the operands of \(\wedge \) in all well-formed boolean formula in the NNF are expressed on a mutually disjoint set of variables.
The deterministic and decomposable properties are conveniently expressed by AND-OR graphs (DAGs) where a node is either an AND node, an OR node or a literal. The operands of AND/OR nodes appear as children of the node. Figure 1 shows an example of d-DNNF representation. For every node t, the subformula corresponding to t is the formula corresponding to d-DNNF obtained by removing all the nodes u such that there does not exist a path from t to u. T(t) represents the set of all partial satisfying assignments for the subformula corresponding to t. The siblings of a node t are the children of the parent of t excluding t itself and the set of such children is given by Siblings(t).
Decision-DNNF is a subset of d-DNNF where the deterministic OR nodes are decision nodes [18]. The state-of-the-art d-DNNF construction tools like C2D [18], \(\textsc {Dsharp}\) [43] and D4 [30], construct the Decision-DNNF representation where each OR node has exactly two children while an AND node may have multiple children. Since our framework \(\mathsf {WAPS}\) employs modern d-DNNF compilers, we assume that the OR node has exactly two children. This assumption is only for the simplicity of exposition as our algorithms can be trivially adopted to the general d-DNNF representations.
4 Algorithm
In this section, we discuss our primary technical contribution: \(\mathsf {WAPS}\), weighted and projected sampler that samples from \(R_{F\downarrow P}\) with respect to weight function W by employing the knowledge compilation techniques.
\(\mathsf {WAPS}\) takes a \(\mathsf {CNF}\) formula F, a set of sampling variables P, a function assigning weights to literals W and required number of samples s as inputs and returns \(\mathsf {SampleList}\), a list of size s which contain samples such that each sample is independently drawn from the weighted distribution generated by W over \(R_{F\downarrow P}\).
Similar to \(\mathsf {KUS}\), \(\mathsf {WAPS}\) (Algorithm 1) mainly comprises of three phases: Compilation, Annotation and Sampling. For d-DNNF compilation, \(\mathsf {WAPS}\) invokes a specialized compilation routine \(\mathsf {PCompile}\) over the formula F and the sampling set P (line 1). This is followed by the normalization of weights such that for any literal l, \(W'(l) + W'(\lnot l) = 1\), where \(W'\) is the normalized weight returned in line 2. Then for annotation, \(\mathsf {WAnnotate}\) is invoked in line 3 which uses the weight function \(W'\) to annotate weights to all the nodes of the d-DNNF tree. Finally, subroutine \(\mathsf {Sampler}\) (line 4) is invoked which returns s independently drawn samples over P following the weighted distribution generated by W over \(R_{F\downarrow P}\). We now describe these three phases in detail.
4.1 Compilation
The compilation phase is performed using the subroutine \(\mathsf {PCompile}\). \(\mathsf {PCompile}\) is a modified procedure over the component caching and clause learning based algorithm of the d-DNNF compiler \(\textsc {Dsharp}\) [43, 47]. It is presented in Algorithm 2. The changes from the existing algorithm are . The rest of the procedure which is similar to \(\textsc {Dsharp}\) is mentioned here for completeness. The description of \(\mathsf {PCompile}\) is as follows:
\(\mathsf {PCompile}\) takes in a \(\mathsf {CNF}\) formula F in the clausal form and a set of sampling variables P as input and returns a d-DNNF over P. If the formula does not contain any variable from P, \(\mathsf {PCompile}\) invokes \(\mathsf {SAT}\) (line 2) which returns a True node if the formula is satisfiable, else it returns a False node. Otherwise, \(\mathsf {DecideLiteral}\) is invoked to choose a literal appearing in F such that \(var(l) \in P\) (line 3). This decision is then recursively propagated by invoking \(\mathsf {CompileBranch}\) to create \(t_1\), the d-DNNF of \(F_{|l}\) (line 4) and \(t_2\), the d-DNNF of \(F_{|\lnot l}\) (line 5). \(\mathsf {Disjoin}\) is invoked in line 6 which takes \(t_1\) and \(t_2\) as input and returns \(t_2\) if \(t_1\) is False node, \(t_1\) if \(t_2\) is False node otherwise a new tree composed by an OR node as the parent of \(t_1\) and \(t_2\). The result of \(\mathsf {Disjoin}\) is then stored in the cache (line 7) and returned as an output of \(\mathsf {PCompile}\) (line 8).
We now discuss the subroutine \(\mathsf {CompileBranch}\). It is presented in Algorithm 3. It takes in a \(\mathsf {CNF}\) formula F, set of sampling variables P and literal l as input and returns a d-DNNF tree of \(F_{|l}\) on P. It first invokes \(\mathsf {BCP}\) (Binary Constraint Propagation), with F, l and P which performs unit-propagation to return a tuple of reduced formula \(F'\) and a set of implied literals (term) projected over variables in P (line 1). Then \(\mathsf {CompileBranch}\) checks if \(F'\) contains an empty clause and returns False node to indicate that \(F_{|l}\) is not satisfiable, else the formula is solved using component decomposition as described below.
At line 6 it breaks the formula \(F'\) into separate components formed by disjoint set of clauses such that no two components share any variables. Then each component is solved independently (lines 8–15). For each component, it first examines the cache to see if this component has been solved earlier and is present in the cache (line 9). If cache lookup fails, it solves the component with a recursive call to \(\mathsf {PCompile}\) (line 11). If any component is found to be unsatisfiable, False node is returned implying that the overall formula is unsatisfiable too, else \(\mathsf {CompileBranch}\) simply conjoins the components’ d-DNNFs together with the decided l and implied literals (term) and returns this after storing it in the cache for the formula \(F_{|l}\) (lines 16–18).
We illustrate \(\mathsf {PCompile}\) procedure on the following example formula F:
Example 1
\(F = \{\{x_1,x_2\}, \{\lnot x_3, \lnot x_5, x_6\}, \{\lnot x_2, x_4, \lnot x_1\}, \{x_3, \lnot x_6, \lnot x_1\}, \{x_6, x_5, \lnot x_1, x_3\}, \{x_3, x_6, \lnot x_5, \lnot x_1\}\}\)
Figure 2 represents the d-DNNF of Example 1 on \(P = \{x_1,x_2,x_3\}\). For detailed discussion about applying \(\mathsf {PCompile}\) on F please refer to Appendix.
4.2 Annotation
The subroutine \(\mathsf {WAnnotate}\) is presented in Algorithm 4. \(\mathsf {WAnnotate}\) takes in a d-DNNF \(\mathsf {dag}\) and a weight function W as inputs and returns an annotated d-DNNF \(\mathsf {dag}\) whose each node t is annotated with a weight, given by the sum of weights of all the partial assignments represented by subtree rooted at t. The weights subsequently annotated on children of an OR node indicate the probability with which it would be selected in the sampling phase.
\(\mathsf {WAnnotate}\) performs a bottom up traversal on d-DNNF \(\mathsf {dag}\) in a reverse topological order. For each node in the d-DNNF \(\mathsf {dag}\), \(\mathsf {WAnnotate}\) maintains an attribute, weight, as per the label of the node given as follows:
-
Literal (lines 2–3): The weight of a literal node is taken as the weight of the literal given by the weight function W.
-
OR (lines 4–8): The weight of OR node is made equal to the sum of weights of both of its children.
-
AND (lines 9–13): The weight of AND node is made equal to the product of weights of all its children.
4.3 Sampling
Algorithm \(\mathsf {Sampler}\) takes the annotated d-DNNF \(\mathsf {dag}\) and the required number of samples s and returns \(\mathsf {SampleList}\), a list of s samples conforming to the distribution of their weights as governed by weight function W given to \(\mathsf {WAnnotate}\). The subroutine \(\mathsf {Sampler}\) is very similar to the sampling procedure in our previous work [49] except that we take the annotated weight of the node instead of the annotated count in the previous work as the probability for Bernoulli trials. We refer the readers to Appendix for a detailed discussion.
4.4 Conditioned Sampling
The usage of samplers in testing environment necessitates sampling from F conditioned on fixing assignment to a subset of variables. The state of the art techniques, such as those based on universal hashing, treat every query as independent and are unable to reuse computation across different queries. In contrast, the compilation to d-DNNF allows \(\mathsf {WAPS}\) to reuse the same d-DNNF. In particular, for a given conditioning expressed as conjunction of different literals, i.e., \( \hat{C} = \bigwedge _{i} l_i\).
In particular, instead of modifying computationally expensive d-DNNF, we modify the weight function as follows:
5 Theoretical Analysis
We now present theoretical analysis of \(\mathsf {WAPS}\), which consists of two components: correctness of \(\mathsf {WAPS}\) and analysis of behavior of \(\mathsf {Sampler}\) on the underlying d-DNNF graph. First, we prove that \(\mathsf {WAPS}\) is an exact weighted and projected sampler in Theorem 1. To this end, we prove the correctness of our projected d-DNNF \(\mathsf {dag}\) compilation procedure \(\mathsf {PCompile}\) in Lemma 1. In Lemma 2, we show that \(\mathsf {WAnnotate}\) annotates each node of the d-DNNF with weights that represent the weight of assignments represented by subtree rooted at that node. This enables us to sample as per the weight distribution in the sampling phase which is proved in Theorem 1 using Lemmas 1 and 2. Secondly, further probing into the behavior of subroutine \(\mathsf {Sampler}\), we provide an analysis of the probability of visiting any node in the d-DNNF \(\mathsf {dag}\) while sampling. For this, we first find a probability of visiting a node by following a particular path in Lemma 3 and then we use this result to prove an upper bound for the general case of visiting a node from all possible paths in Theorem 2. We believe that this analysis will motivate the researchers to find new ways to speed up or device new methods to find exact or approximate sampling techniques over a given compiled representation.
The proofs of Theorem 1 and Lemmas 1, 2 and 3 can be found in Appendix.
Lemma 1
Given a formula F and set of sampling variables P, the tree returned by \(\mathsf {PCompile}\)(F, P) is a d-DNNF \(\mathsf {dag}\) which represents the set of satisfying assignments of the formula F projected over the set of sampling variables P.
Lemma 2
Every node t in the d-DNNF \(\mathsf {dag}\) returned by \(\mathsf {WAnnotate}\) is annotated by W(T(t)), where T(t) is the set of all the partial assignments corresponding to the subtree rooted at t.
Theorem 1
For a given F, P and s, \(\mathsf {SampleList}\) is the list of samples generated by \(\mathsf {WAPS}\). Let \(\mathsf {SampleList}\)[i] indicate the sample at the \(i^{th}\) index of the list. Then for each \(y \in R_{F\downarrow P}\), \(\forall i \in [s]\), we have \(\mathsf {Pr}\left[ y = \mathsf {SampleList}[i]\right] = \frac{W(y)}{W(R_{F\downarrow P})}.\)
Lemma 3
For a given F and P, let \(\mathrm {fol}(\rho )\) be the event of following a path \(\rho \), which start at root and ends at node t, then \(\mathsf {Pr}\left[ \mathrm {fol}(\rho )\right] = \frac{W(T(t))\times c_\rho }{W(R_{F\downarrow P})}\) where \(c_\rho \) is the product of weight of all the OR nodes’ siblings encountered in the path \(\rho \) from root to t and T(t) is the set of all the partial satisfying assignments represented by subtree rooted at t.
Theorem 2
For a given F and P, let \(\mathrm {visit}(t)\) be the event of visiting a node t to fetch one sample as per subroutine \(\mathsf {Sampler}\), then \(\mathsf {Pr}\left[ \mathrm {visit}(t)\right] \le \frac{W(\varGamma (t))}{W(R_{F\downarrow P})}\) where \(\varGamma (t) = \{\sigma ~|~\sigma \in R_{F\downarrow P},~\sigma _{\downarrow Vars(t)} \in T(t)\}\) and T(t) is a set of all the partial satisfying assignments represented by subtree rooted at t.
Proof
In Lemma 3 we have calculated the probability of visiting a node t by taking a particular path from root to node t. So the probability of visiting a node t will be the sum of probability of visiting t by all possible paths. Let \(\mathcal {P} = \{\rho _1, \rho _2, \cdots , \rho _m\}\) be the set of all paths from root to node t and \(\mathrm {visit}(t)\) be the event of visiting a node t in subroutine \(\mathsf {Sampler}\) then,
From Lemma 3, \( \mathsf {Pr}\left[ \mathrm {visit}(t)\right] = \sum _{\rho \in \mathcal {P}} \frac{W(T(t))\times c_\rho }{W(R_{F\downarrow P})} \) where \(c_\rho \) is the product of the weight of all the OR nodes’ siblings encountered in a path \(\rho \) from root to t. For any such path, we call \(\{t^{1}_{\rho }, t^2_\rho , \cdots , t^n_\rho \}\) as the set of all the OR node siblings encountered on the path \(\rho \). Now, let \(\sigma ^{ext}_\rho \) be the set of assignments over P represented by path \(\rho \). Therefore,
where, \(T(\cdot )\) are set of assignments and \(\times \) is a cross product. Now, any tuple from \(\sigma ^{ext}_\rho \) represents a satisfying assignment in the d-DNNF. Therefore, \(\sigma ^{ext}_\rho \subseteq R_{F\downarrow P}\). Note that, from Lemma 2, it follows that weight annotated by \(\mathsf {WAnnotate}\) at t is equal to W(T(t)). Therefore,
And, \( W(\sigma ^{ext}_\rho ) = W(T(t))\times c_\rho \). Notice that, \(\sigma ^{ext}_\rho \subseteq \varGamma (t)\) as \(\sigma ^{ext}_\rho \) represents satisfying extensions of partial assignments contained in T(t) itself. This is true \(\forall \) \(\rho \in \mathcal {P}\). Therefore as W(.) is an increasing function,
Note that, the inequality indeed holds as the intersection of sets of partial assignments represented by t and any other node not lying on the path from root to t may not be \(\phi \) (empty). Therefore,
6 Evaluation
In order to evaluate the runtime performance and analyze the quality of samples generated by \(\mathsf {WAPS}\), we implemented a prototype in Python. For d-DNNF compilation, our prototype makes use of \(\textsc {Dsharp}\) [43] when sampling set is available else we use \(\mathsf {D4}\) [30]. We would have preferred to use state-of-the-art d-DNNF compiler \(\mathsf {D4}\) but owing to its closed source implementation, we could not modify it as per our customized compilation procedure \(\mathsf {PCompile}\). Therefore, for projected compilation, we have modified \(\textsc {Dsharp}\) which has an open-source implementation. We have conducted our experiments on a wide range of publicly available benchmarks. In all, our benchmark suite consisted of 773 benchmarks arising from a wide range of real-world applications. Specifically, we used constraints arising from DQMR networks, bit-blasted versions of SMT-LIB (SMT) benchmarks, and ISCAS89 circuits [8] with parity conditions on randomly chosen subsets of outputs and nextstate variables [34, 48]. We assigned random weights to literals wherever weights were not already available in our benchmarks. All our experiments were conducted on a high performance compute cluster whose each node consists of E5-2690 v3 CPU with 24 cores and 96 GB of RAM. We utilized single core per instance of benchmark with a timeout of 1800 s.
The objective of our evaluation was to answer the following questions:
-
1.
How does \(\mathsf {WAPS}\) perform in terms of runtime in comparison to \(\mathsf {WeightGen}\), the current state-of-the-art weighted and projected sampler?
-
2.
How does \(\mathsf {WAPS}\) perform for incremental sampling and scales when asked for different number of samples?
-
3.
How does the distribution of samples generated by \(\mathsf {WAPS}\) compare with the distribution generated by an ideal weighted and projected sampler?
-
4.
How does \(\mathsf {WAPS}\) perform for conditioning on arbitrary variables?
-
5.
How does our knowledge compilation based sampling techniques perform in comparison to hashing based sampling techniques for the task of generalizing to arbitrary weight distributions?
Our experiment demonstrated that within a time limit of 1800 s, \(\mathsf {WAPS}\) is able to significantly outperform existing state-of-the-art weighted and projected sampler \(\mathsf {WeightGen}\), by up to 3 orders of magnitude in terms of runtime while achieving a geometric speedup of \(296{\times }\). Out of the 773 benchmarks available \(\mathsf {WAPS}\) was able to sample from 588 benchmarks while \(\mathsf {WeightGen}\) was able to sample from only 24 benchmarks. For incremental sampling, \(\mathsf {WAPS}\) achieves a geometric speedup of 3.69. Also, \(\mathsf {WAPS}\) is almost oblivious to the number of samples requested. Empirically, the distribution generated by \(\mathsf {WAPS}\) is statistically indistinguishable from that generated by an ideal weighted and projected sampler. Also, while performing conditioned sampling in \(\mathsf {WAPS}\), we incur no extra cost in terms of runtime in most of the cases. Moreover, the performance of our knowledge compilation based sampling technique is found to be oblivious to weight distribution. We present results for only a subset of representative benchmarks here. Detailed data along with the expanded versions of all the tables presented here is available at https://github.com/meelgroup/waps.
Number of Instances Solved. We compared the runtime performance of \(\mathsf {WAPS}\) with \(\mathsf {WeightGen}\) [10] (state-of-the-art weighted and projected sampler) by generating 1000 samples from each tool with a timeout of 1800 s. Figure 3 shows the cactus plot for \(\mathsf {WeightGen}\) and \(\mathsf {WAPS}\). We present the number of benchmarks on the \(x-\)axis and the time taken on \(y-\)axis. A point (x, y) implies that x benchmarks took less than or equal to y seconds to sample. All our runtime statistics for \(\mathsf {WAPS}\) include the time for the knowledge compilation phase (via \(\mathsf {D4}\) or \(\textsc {Dsharp}\)). From all the 773 available benchmarks \(\mathsf {WeightGen}\) was able to sample from only 24 benchmarks while \(\mathsf {WAPS}\) was able to sample from 588 benchmarks. Table 1 shows the runtimes of some of the benchmarks on the two tools. The columns in the table give the benchmark name, number of variables, number of clauses, size of sampling set, time taken in seconds by \(\mathsf {WeightGen}\) and \(\mathsf {WAPS}\) divided into time taken by Compilation and A+S: Annotation and Sampling followed by speedup of \(\mathsf {WAPS}\) with respect to \(\mathsf {WeightGen}\). Table 1 clearly shows that \(\mathsf {WAPS}\) outperforms \(\mathsf {WeightGen}\) by upto 3 orders of magnitude. For all the 24 benchmarks that \(\mathsf {WeightGen}\) was able to solve \(\mathsf {WAPS}\) outperformed \(\mathsf {WeightGen}\) with a geometric speedup of \(296{\times }\).
Incremental Sampling. Incremental sampling involves fetching multiple, relatively small-sized samples until the objective (such as desired coverage or violation of property) is achieved. We benefit from pre-compiled knowledge representations in this scenario, as they allow us to perform repeated sampling as per varied distributions. If weights are changed, we simply Annotate the tree again followed by sampling, else, we directly move to the sampling phase, thus saving a significant amount of time by bypassing the compilation phase.
In our experiments, we have evaluated the time taken by \(\mathsf {WAPS}\) for 1000 samples in 10 successive calls with same weights. The results are presented in Table 2 for a subset of benchmarks. The first column mentions the benchmark name with the number of variables, clauses and size of sampling set in subsequent columns. The time taken by \(\mathsf {WAPS}\) for first run to fetch 1000 samples is given in the fifth column while the overall time taken for first run together with the subsequent 9 incremental runs is presented in sixth column. The final column shows the average gain in terms of speedup calculated by taking the ratio of time taken by \(\mathsf {WAPS}\) for first run with the average time taken by \(\mathsf {WAPS}\) for subsequent 9 incremental runs thus resulting in a total of 10000 samples. Overall, \(\mathsf {WAPS}\) achieves a geometric speedup of \(3.69{\times }\) on our set of benchmarks.
Effect of Number of Samples. To check how \(\mathsf {WAPS}\) scales with different number of samples, we invoked \(\mathsf {WAPS}\) for fetching different number of samples: 1000, 2000, 4000, 8000, 10000 with a timeout of 1800 s. Table 3 presents the runtime of \(\mathsf {WAPS}\) for different samples on some benchmarks. The first column represents the benchmark name. Second, third and fourth columns represent the number of variables, clauses and size of sampling set. The next five columns represent the time taken by \(\mathsf {WAPS}\) for 1000, 2000, 4000, 8000 and 10000 samples. Table 3 clearly demonstrates that \(\mathsf {WAPS}\) is almost oblivious to the number of samples requested.
Uniform Sampling Generalized for Weighted Sampling. To explore the trend in performance between uniform and weighted sampling on the dimension of hashing based techniques pitched against our newly proposed sampling techniques based on knowledge compilation, we compared \(\mathsf {WAPS}\) to \(\mathsf {KUS}\) in a parallel comparison between \(\mathsf {WeightGen}\) and \(\mathsf {UniGen2}\). Specifically, we ran \(\mathsf {WAPS}\) for weighted sampling and \(\mathsf {KUS}\) for uniform sampling without utilizing the sampling set as \(\mathsf {KUS}\) does not support the sampling set. On the other hand, for hashing based sampling techniques, we compared \(\mathsf {WeightGen}\) to \(\mathsf {UniGen2}\) while using the sampling set. Figure 4 shows the cactus plot for \(\mathsf {WeightGen}\) and \(\mathsf {UniGen2}\) and Fig. 5 shows a cactus plot for \(\mathsf {WAPS}\) and \(\mathsf {KUS}\). From all the 773 benchmarks, \(\mathsf {WeightGen}\) was able to sample from only 24 benchmarks while \(\mathsf {UniGen2}\) was able to sample from 208 benchmarks. In comparison, \(\mathsf {WAPS}\) was able to sample from 606 benchmarks while \(\mathsf {KUS}\) was able to sample from 602 benchmarks. Our experiments demonstrated that the performance of hashing-based techniques is extremely limited in their ability to handle literal-weighted sampling and there is a drastic drop in their performance as the weight distribution shifts away from uniform. While for our knowledge compilation based sampling techniques we observe that their performance is oblivious to the weight distribution.
Distribution Comparison. We measure the distribution of \(\mathsf {WAPS}\) vis-a-vis an ideal weighted and projected sampler (\(\mathsf {IS}\)) and observed that \(\mathsf {WAPS}\) is statistically indistinguishable from \(\mathsf {IS}\). Please refer to Appendix for more detailed discussion.
Effect of Conditioning on Variables. We evaluated the performance of \(\mathsf {WAPS}\) in the context of conditioned sampling. We observed a slight improvement in average runtime as more and more variables get constrained. For detailed results, please refer to Appendix.
7 Conclusion
In this paper, we designed a knowledge compilation-based framework, called \(\mathsf {WAPS}\), for literal-weighted, projected and conditional sampling. \(\mathsf {WAPS}\) provides strong theoretical guarantees and its runtime performance upon the existing state-of-the-art weighted and projected sampler WeightGen, by up to 3 orders of magnitude in terms of runtime. Out of the 773 benchmarks available, \(\mathsf {WAPS}\) is able to sample from 588 benchmarks while \(\mathsf {WeightGen}\) is only able to sample from 24 benchmarks. \(\mathsf {WAPS}\) achieves a geometric speedup of 3.69 for incremental sampling. It is worth noting that \(\mathsf {WeightGen}\) has weaker guarantees than \(\mathsf {WAPS}\). Furthermore, \(\mathsf {WAPS}\) is almost oblivious to the number of samples requested.
Change history
02 April 2019
In the version of this paper that was originally published, there was an error in the acknowledgement at the bottom of the first page. “AI Singapore Grant [R-252-000-A16-490]” was mentioned instead of “National Research Foundation Singapore under its AI Singapore Programme [Award Number: AISG-RP-2018-005]”. This has now been corrected.
References
System Verilog (2015). http://www.systemverilog.org
Achlioptas, D., Hammoudeh, Z.S., Theodoropoulos, P.: Fast sampling of perfectly uniform satisfying assignments. In: Beyersdorff, O., Wintersteiger, C.M. (eds.) SAT 2018. LNCS, vol. 10929, pp. 135–147. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-94144-8_9
Bacchus, F., Dalmao, S., Pitassi, T.: Algorithms and complexity results for #SAT and Bayesian inference. In: Proceedings of FOCS, pp. 340–351 (2003)
Bellare, M., Goldreich, O., Petrank, E.: Uniform generation of NP-witnesses using an NP-oracle. Inf. Comput. 163(2), 510–526 (2000)
Biere, A., Cimatti, A., Clarke, E., Fujita, M., Zhu, Y.: Symbolic model checking using SAT procedures instead of BDDs. In: Proceedings of DAC, pp. 317–320 (1999)
Biere, A., Heule, M., van Maaren, H., Walsh, T.: Handbook of Satisfiability. IOS Press, Amsterdam (2009)
Bjesse, P., Leonard, T., Mokkedem, A.: Finding bugs in an Alpha microprocessor using satisfiability solvers. In: Berry, G., Comon, H., Finkel, A. (eds.) CAV 2001. LNCS, vol. 2102, pp. 454–464. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44585-4_44
Brglez, F., Bryan, D., Kozminski, K.: Combinational profiles of sequential benchmark circuits. In: Proceedings of ISCAS, pp. 1929–1934 (1989)
Bryant, R.E.: Symbolic Boolean manipulation with ordered binary-decision diagrams. ACM Comput. Surv. (CSUR) 24(3), 293–318 (1992)
Chakraborty, S., Fremont, D.J., Meel, K.S., Seshia, S.A., Vardi, M.Y.: Distribution-aware sampling and weighted model counting for SAT. In: Proceedings of AAAI, pp. 1722–1730 (2014)
Chakraborty, S., Fremont, D.J., Meel, K.S., Seshia, S.A., Vardi, M.Y.: On parallel scalable uniform SAT witness generation. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 304–319. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_25
Chakraborty, S., Fried, D., Meel, K.S., Vardi, M.Y.: From weighted to unweighted model counting. In: Proceedings of IJCAI, pp. 689–695 (2015)
Chakraborty, S., Meel, K.S., Vardi, M.Y.: A scalable and nearly uniform generator of SAT witnesses. In: Sharygina, N., Veith, H. (eds.) CAV 2013. LNCS, vol. 8044, pp. 608–623. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39799-8_40
Chavira, M., Darwiche, A.: On probabilistic inference by weighted model counting. Artif. Intell. 172(6), 772–799 (2008)
Dalvi, N.N., Schnaitter, K., Suciu, D.: Computing query probability with incidence algebras. In: Proceedings of PODS, pp. 203–214 (2010)
Darwiche, A.: SDD: a new canonical representation of propositional knowledge bases. In: Proceedings of 22nd International Joint Conference on Artificial Intelligence, pp. 819–826 (2011)
Darwiche, A.: On the tractable counting of theory models and its application to belief revision and truth maintenance. CoRR (2000)
Darwiche, A.: New advances in compiling CNF to decomposable negation normal form. In: Proceedings of ECAI, pp. 318–322 (2004)
Darwiche, A., Marquis, P.: A knowledge compilation map. J. Artif. Intell. Res. 17, 229–264 (2002)
Dechter, R., Kask, K., Bin, E., Emek, R.: Generating random solutions for constraint satisfaction problems. In: Proceedings of AAAI, pp. 15–21 (2002)
Duenas-Osorio, L., Meel, K.S., Paredes, R., Vardi, M.Y.: Counting-based reliability estimation for power-transmission grids. In: Proceedings of AAAI (2017)
Dutra, R., Laeufer, K., Bachrach, J., Sen, K.: Efficient sampling of SAT solutions for testing. In: Proceedings of ICSE, pp. 549–559 (2018)
Elliott, P., Williams, B.: DNNF-based belief state estimation. In: Proceedings of AAAI, pp. 36–41 (2006)
Ermon, S., Gomes, C.P., Sabharwal, A., Selman, B.: Embed and project: discrete sampling with universal hashing. In: Proceedings of NIPS, pp. 2085–2093 (2013)
Ermon, S., Gomes, C.P., Selman, B.: Uniform solution sampling using a constraint solver as an Oracle. In: Proceedings of UAI, pp. 255–264 (2012)
Gogate, V., Dechter, R.: A new algorithm for sampling CSP solutions uniformly at random. In: Benhamou, F. (ed.) CP 2006. LNCS, vol. 4204, pp. 711–715. Springer, Heidelberg (2006). https://doi.org/10.1007/11889205_56
Gomes, C.P., Selman, B., McAloon, K., Tretkoff, C.: Randomization in backtrack search: exploiting heavy-tailed profiles for solving hard scheduling problems. In: Proceedings of AIPS (1998)
Ivrii, A., Malik, S., Meel, K.S., Vardi, M.Y.: On computing minimal independent support and its applications to sampling and counting. Constraints 21, 1–18 (2015). https://doi.org/10.1007/s10601-015-9204-z
Iyer, M.A.: RACE: a word-level ATPG-based constraints solver system for smart random simulation. In: Proceedings of ITC, pp. 299–308 (2003)
Lagniez, J.-M ., Marquis, P.: An improved decision-DNNF compiler. In: Proceedings of IJCAI, pp. 667–673 (2017)
Jerrum, M.R., Sinclair, A.: Approximating the permanent. SIAM J. Comput. 18(6), 1149–1178 (1989)
Jerrum, M.R., Sinclair, A.: The Markov Chain Monte Carlo method: an approach to approximate counting and integration. In: Hochbaum, D.S. (ed.) Approximation Algorithms for NP-Hard Problems, pp. 482–520. ACM, New York (1996)
Jerrum, M.R., Valiant, L.G., Vazirani, V.V.: Random generation of combinatorial structures from a uniform distribution. Theoret. Comput. Sci. 43(2–3), 169–188 (1986)
John, A.K., Chakraborty, S.: A quantifier elimination algorithm for linear modular equations and disequations. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 486–503. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1_39
Kautz, H., Selman, B.: Pushing the envelope: planning, propositional logic, and stochastic search. In: Proceedings of AAAI (1996)
Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983)
Kitchen, N.: Markov Chain Monte Carlo stimulus generation for constrained random simulation. Ph.D. thesis, University of California, Berkeley (2010)
Kitchen, N., Kuehlmann, A.: Stimulus generation for constrained random simulation. In: Proceedings of ICCAD, pp. 258–265 (2007)
Madras, N., Piccioni, M.: Importance sampling for families of distributions. Ann. Appl. Probab. 9, 1202–1225 (1999)
Madras, N.: Lectures on Monte Carlo Methods, Fields Institute Monographs, vol. 16. American Mathematical Society, Providence (2002)
Malik, S., Zhang, L.: Boolean satisfiability from theoretical hardness topractical success. Commun. ACM 52(8), 76–82 (2009)
Meel, K.S.: Constrained counting and sampling: bridging the gap between theory and practice. Ph.D. thesis, Rice University (2017)
Muise, C., McIlraith, S.A., Beck, J.C., Hsu, E.I.: DSHARP: fast d-DNNF compilation with sharpSAT. In: Proceedings of AAAI, pp. 356–361 (2016)
Naveh, R., Metodi, A.: Beyond feasibility: CP usage in constrained-random functional hardware verification. In: Schulte, C. (ed.) CP 2013. LNCS, vol. 8124, pp. 823–831. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40627-0_60
Naveh, Y., et al.: Constraint-based random stimuli generation for hardware verification. In: Proceedings of IAAI, pp. 1720–1727 (2006)
Palacios, H., Bonet, B., Darwiche, A., Geffner, H.: Pruning conformant plans by counting models on compiled d-DNNF representations. In: Proceedings of ICAPS, pp. 141–150 (2005)
Sang, T., Bacchus, F., Beame, P., Kautz, H.A., Pitassi, T.: Combining component caching and clause learning for effective model counting. In: Proceedings of SAT (2004)
Sang, T., Beame, P., Kautz, H.: Performing Bayesian inference by weighted model counting. In: Proceedings of AAAI, pp. 475–481 (2005)
Sharma, S., Gupta, R., Roy, S., Meel, K.S.: Knowledge compilation meets uniform sampling. In: Proceedings of LPAR-22, pp. 620–636 (2018)
Tseitin, G.S.: On the complexity of derivation in propositional calculus. In: Siekmann, J.H., Wrightson, G. (eds.) Automation of Reasoning. Symbolic Computation (Artificial Intelligence). Springer, Berlin, Heidelberg (1983). https://doi.org/10.1007/978-3-642-81955-1_28
Velev, M.N., Bryant, R.E.: Effective use of Boolean satisfiability procedures in the formal verification of superscalar and VLIW microprocessors. J. Symb. Comput. 2, 73–106 (2003)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2019 The Author(s)
About this paper
Cite this paper
Gupta, R., Sharma, S., Roy, S., Meel, K.S. (2019). \(\mathsf {WAPS}\): Weighted and Projected Sampling. In: Vojnar, T., Zhang, L. (eds) Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2019. Lecture Notes in Computer Science(), vol 11427. Springer, Cham. https://doi.org/10.1007/978-3-030-17462-0_4
Download citation
DOI: https://doi.org/10.1007/978-3-030-17462-0_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17461-3
Online ISBN: 978-3-030-17462-0
eBook Packages: Computer ScienceComputer Science (R0)