## Abstract

The problem of counting the number of solutions of a DNF formula, also called #DNF, is a fundamental problem in artificial intelligence with applications in diverse domains ranging from network reliability to probabilistic databases. Owing to the intractability of the exact variant, efforts have focused on the design of approximate techniques for #DNF. Consequently, several Fully Polynomial Randomized Approximation Schemes (FPRASs) based on Monte Carlo techniques have been proposed. Recently, it was discovered that hashing-based techniques too lend themselves to FPRASs for #DNF. Despite significant improvements, the complexity of the hashing-based FPRAS is still worse than that of the best Monte Carlo FPRAS by polylog factors. Two questions were left unanswered in previous works: Can the complexity of the hashing-based techniques be improved? How do the various approaches stack up against each other empirically? In this paper, we first propose a new search procedure for the hashing-based FPRAS that removes the polylog factors from its time complexity. We then present the first empirical study of runtime behavior of different FPRASs for #DNF. The result of our study produces a nuanced picture. First of all, we observe that there is no single best algorithm that outperforms all others for all classes of formulas and input parameters. Second, we observe that the algorithm with one of the worst time complexities solves the largest number of benchmarks.

## Notes

1. Note that \({\mathcal {A}}\) is typically represented implicitly, for example by constraints in DNF in the context of this paper.

2. Code and results can be accessed at https://gitlab.com/Shrotri/DNF_Counting

3. Figures are best viewed online in color.

## References

1. Dueñas-Osorio, L., Meel, K.S., Paredes, R., Vardi, M.Y. (2017). Counting-based reliability estimation for power-transmission grids. In *Proceedings of AAAI conference on artificial intelligence (AAAI)*.
2. Bacchus, F., Dalmao, S., Pitassi, T. (2003). Algorithms and complexity results for #SAT and Bayesian inference. In *Proceedings of FOCS* (pp. 340–351). ISBN: 0-7695-2040-5. http://dl.acm.org/citation.cfm?id=946243.946291.
3. Sang, T., Beame, P., Kautz, H. (2005). Performing Bayesian inference by weighted model counting. In *Proceedings of AAAI* (pp. 475–481).
4. Dalvi, N., & Suciu, D. (2007). Efficient query evaluation on probabilistic databases. *The VLDB Journal*, *16*(4), 523–544.
5. Biondi, F., Enescu, M., Heuser, A., Legay, A., Meel, K.S., Quilbeuf, J. (2018). Scalable approximation of quantitative information flow in programs. In *Proceedings of VMCAI*.
6. Karger, D.R. (2001). A randomized fully polynomial time approximation scheme for the all-terminal network reliability problem. *SIAM Review*.
7. Valiant, L.G. (1979). The complexity of enumeration and reliability problems. *SIAM Journal on Computing*, *8*(3), 410–421.
8. Karp, R.M., & Luby, M. (1983). Monte Carlo algorithms for enumeration and reliability problems. In *Proceedings of FOCS*.
9. Karp, R.M., Luby, M., Madras, N. (1989). Monte Carlo approximation algorithms for enumeration problems. *Journal of Algorithms*, *10*(3), 429–448.
10. Vazirani, V.V. (2013). *Approximation algorithms*. Springer Science & Business Media.
11. Dagum, P., Karp, R., Luby, M., Ross, S. (2000). An optimal algorithm for Monte Carlo estimation. *SIAM Journal on Computing*, *29*(5), 1484–1496.
12. Chakraborty, S., Meel, K.S., Vardi, M.Y. (2016). Algorithmic improvements in approximate counting for probabilistic inference: from linear to logarithmic SAT calls. In *Proceedings of IJCAI*.
13. Meel, K.S., Shrotri, A.A., Vardi, M.Y. (2017). On hashing-based approaches to approximate DNF-counting. In *Proceedings of FSTTCS*.
14. Ermon, S., Gomes, C.P., Sabharwal, A., Selman, B. (2013). Taming the curse of dimensionality: discrete integration by hashing and optimization. In *Proceedings of ICML* (pp. 334–342).
15. Meel, K.S. (2018). Constrained counting and sampling: bridging the gap between theory and practice. arXiv:1806.02239.
16. Carter, J.L., & Wegman, M.N. (1977). Universal classes of hash functions. In *Proceedings of STOC* (pp. 106–112). ACM.
17. Luby, M., & Veličković, B. (1996). On deterministic approximation of DNF. *Algorithmica*, *16*(4), 415–433.
18. Trevisan, L. (2004). A note on approximate counting for k-DNF. In *Approximation, randomization, and combinatorial optimization. Algorithms and techniques* (pp. 417–425). Springer.
19. Gopalan, P., Meka, R., Reingold, O. (2013). DNF sparsification and a faster deterministic counting algorithm. *Computational Complexity*.
20. Ajtai, M., & Wigderson, A. (1985). Deterministic simulation of probabilistic constant depth circuits. In *Proceedings of FOCS* (pp. 11–19). IEEE.
21. Nisan, N. (1991). Pseudorandom bits for constant depth circuits. *Combinatorica*, *11*(1), 63–70.
22. De, A., Etesami, O., Trevisan, L., Tulsiani, M. (2010). Improved pseudorandom generators for depth 2 circuits. In *Approximation, randomization, and combinatorial optimization. Algorithms and techniques* (pp. 504–517). Springer.
23. Olteanu, D., Huang, J., Koch, C. (2010). Approximate confidence computation in probabilistic databases. In *ICDE* (pp. 145–156). IEEE.
24. Fink, R., & Olteanu, D. (2011). On the optimal approximation of queries using tractable propositional languages. In *Proceedings of ICDT*. ACM.
25. Gatterbauer, W., & Suciu, D. (2014). Oblivious bounds on the probability of Boolean functions. *ACM TODS*, *39*(1), 5.
26. Tao, Q., Scott, S., Vinodchandran, N.V., Osugi, T.T. (2004). SVM-based generalized multiple-instance learning via approximate box counting. In *Proceedings of the twenty-first international conference on machine learning* (p. 101). ACM.
27. Babai, L. (1979). Monte-Carlo algorithms in graph isomorphism testing. Université de Montréal Technical Report. DMS, pp 79–10.
28. Motwani, R., & Raghavan, P. (2010). Randomized algorithms.
29. Albrecht, M., & Bard, G. (2012). The M4RI Library – Version 20121224. http://m4ri.sagemath.org.
30. Huang, J., Antova, L., Koch, C., Olteanu, D. (2009). MayBMS: a probabilistic database management system. In *Proceedings of SIGMOD*. ACM.
31. TPC Benchmark H. http://www.tpc.org/.
32. Mitchell, D., Selman, B., Levesque, H. (1992). Hard and easy distributions of SAT problems. In *Proceedings of AAAI* (pp. 459–465).
33. Thurley, M. (2006). SharpSAT: counting models with advanced component caching and implicit BCP. In *Proceedings of SAT* (pp. 424–429).
34. Chakraborty, S., Meel, K.S., Vardi, M.Y. (2013). A scalable approximate model counter. In *Proceedings of CP* (pp. 200–216).

## Acknowledgements

The authors would like to thank the anonymous reviewers for their insightful comments and suggestions. Moshe Y. Vardi and Aditya A. Shrotri’s work was supported in part by NSF grant IIS-1527668 and the NSF Expeditions in Computing project “ExCAPE: Expeditions in Computer Augmented Program Engineering”. Kuldeep S. Meel’s work was supported in part by NUS ODPRT Grant R-252-000-685-133, AI Singapore Grant R-252-000-A16-490, and the Sung Kah Kay Assistant Professorship Fund.

## Additional information

### Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author names are ordered alphabetically by last name; the order does not indicate contribution.

## Appendix

To obtain a concrete algorithm from the framework described in Algorithm 2, we need to instantiate the sub-procedures SampleHashFunction, GetLowerBound, GetUpperBound, EnumerateNextSol, ExtractSlice and ComputeIncrement for a particular counting problem. We now show how SymbolicDNFApproxMC [13], which uses Row Echelon XOR hash functions together with the concepts of Symbolic Hashing and Stochastic Cell-Counting, can be obtained through such instantiations. We then prove that replacing the BinarySearch procedure with ReverseSearch improves the complexity of the resulting algorithm by polylog factors.

### SampleHashFunction

One can directly invoke the procedure SampleBase described in Algorithm 4 of [13] with minor modifications. This is shown in Algorithm 7. Note that the hash function **A**, **b**, **y** so obtained belongs to the Row Echelon XOR family.

### Lower and upper bounds

As shown in [13], it suffices to search between \( {\mathsf {n}} - {\mathsf {w}} - \log {\mathsf {hiThresh}} \) and \( {\mathsf {n}} - {\mathsf {w}} + \log {\mathsf {m}} - \log {\mathsf {hiThresh}} \) hash constraints. Therefore the functions GetLowerBound and GetUpperBound return these values respectively.
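As a rough illustration of the two bounds above, the following sketch evaluates them for concrete parameter values. The function name `search_bounds` is hypothetical, the logarithms are assumed to be base 2, and any rounding to integers that an actual implementation would need is deliberately left out.

```python
import math


def search_bounds(n, w, m, hi_thresh):
    """Evaluate the two bounds quoted in the text (hypothetical helper).

    n, w, m and hi_thresh stand for the quantities written n, w, m and
    hiThresh in the text. Base-2 logarithms are an assumption, and no
    rounding is applied.
    """
    log_thresh = math.log2(hi_thresh)
    lower = n - w - log_thresh                  # GetLowerBound
    upper = n - w + math.log2(m) - log_thresh   # GetUpperBound
    return lower, upper


lower, upper = search_bounds(n=20, w=3, m=8, hi_thresh=16)
print(lower, upper)  # 13.0 16.0
```

Under these assumptions the width of the search interval is always \(\log {\mathsf {m}}\), independent of \({\mathsf {n}}\), \({\mathsf {w}}\) and \({\mathsf {hiThresh}}\).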

### Extracting a prefix slice

Procedure ExtractSlice required for ReverseSearch is shown in Algorithm 8. If flip is false, ExtractSlice returns the result of the procedure Extract (described in [13]) directly. Otherwise, the *p*-th bit of *y*^{[y]} is negated before being passed to Extract.

### EnumerateNextSol

SymbolicDNFApproxMC enumerates solutions in the cell, in the order of a Gray code sequence, for better complexity. This is achieved by invoking the procedure enumREX (Algorithm 1 in [13]).
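To make the Gray code ordering concrete, here is a minimal sketch of the standard reflected Gray code. It is not enumREX from [13]; it only shows the property that makes such an ordering attractive, namely that consecutive elements differ in exactly one bit, so per-solution work can plausibly be updated incrementally. The names `gray_code` and `flipped_bit` are hypothetical.

```python
def gray_code(k):
    """Yield all k-bit integers in reflected Gray code order."""
    for i in range(1 << k):
        yield i ^ (i >> 1)


def flipped_bit(prev, curr):
    """Return the index of the single bit in which prev and curr differ."""
    diff = prev ^ curr
    # A nonzero value with exactly one bit set is a power of two.
    assert diff != 0 and diff & (diff - 1) == 0
    return diff.bit_length() - 1


codes = list(gray_code(3))
flips = [flipped_bit(a, b) for a, b in zip(codes, codes[1:])]
print(codes)  # [0, 1, 3, 2, 6, 7, 5, 4]
print(flips)  # [0, 1, 0, 2, 0, 1, 0]
```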

### ComputeIncrement

Procedure CheckSAT (Algorithm 10, adapted from [13]) can be used to compute the increments to *Y*_{cell} as shown in Algorithm 9. The assignment *s* is divided into a solution **x** and a cube *F*^{i} using the same *Interpret* function used in line 7 of Algorithm 6 in [13]. CheckSAT samples a cube at random in line 3 and checks if the assignment **x** satisfies it in line 5. The returned value follows the geometric distribution [9], and can be used to compute an accurate probabilistic estimate *Y*_{cell} of the true number of solutions in the cell [13].
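The sampling step described above can be sketched as follows. This is one reading of the prose, not the authors' Algorithm 9 or Algorithm 10: the cube representation (a dict from variable to required value), the function names, and the uniform choice of cubes are all assumptions. The sketch also assumes that **x** satisfies at least one cube, which holds when **x** is paired with a cube it satisfies; otherwise the loop would not terminate.

```python
import random


def satisfies(x, cube):
    """True if assignment x (dict var -> bool) agrees with every literal of cube."""
    return all(x[var] == val for var, val in cube.items())


def check_sat(x, cubes, rng):
    """Sample cubes uniformly until x satisfies one; return the trial count c_x.

    Precondition (assumed): x satisfies at least one cube in `cubes`.
    If x satisfies k of the m cubes, c_x is geometric with success
    probability k / m.
    """
    c_x = 0
    while True:
        c_x += 1
        if satisfies(x, rng.choice(cubes)):
            return c_x


def compute_increment(x, cubes, rng):
    """Increment c_x / m added to the running cell estimate."""
    return check_sat(x, cubes, rng) / len(cubes)


# x satisfies 2 of the 4 cubes, so the expected increment is 1/2.
cubes = [{1: True}, {2: True}, {1: False}, {3: True}]
x = {1: True, 2: True, 3: False}
rng = random.Random(0)
mean = sum(compute_increment(x, cubes, rng) for _ in range(20000)) / 20000
```

In this reading, the expected increment for an assignment that satisfies k cubes is 1/k, so summing increments over all (solution, cube) pairs in a cell counts each distinct solution once in expectation.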

### Lemma 1

*The complexity of BSAT is \(\mathcal {O}({\mathsf {m}} \cdot {\mathsf {n}} \cdot {\mathsf {threshold}})\).*

### Proof

*Y*_{cell} is incremented by \(c_{x}/{\mathsf {m}}\) in line 5 of BSAT after a call to ComputeIncrement and CheckSAT. Since BSAT returns after *Y*_{cell} reaches \({\mathsf {threshold}}\), the sum of \(c_{x}\) over all invocations of CheckSAT is \({\mathsf {m}} \cdot {\mathsf {threshold}}\). Every time \(c_{x}\) is incremented, the check in line 5 of CheckSAT is performed, which takes \( \mathcal {O}({\mathsf {n}}) \) time, so all such checks together take \( \mathcal {O}({\mathsf {m}} \cdot {\mathsf {n}} \cdot {\mathsf {threshold}}) \) time. Moreover, EnumerateNextSol also takes \( \mathcal {O}({\mathsf {n}}) \) time, as enumREX in [13] takes \( \mathcal {O}({\mathsf {n}}) \) time. As a result, the complexity of BSAT is \( \mathcal {O}({\mathsf {m}} \cdot {\mathsf {n}} \cdot {\mathsf {threshold}}) \). \(\ \Box \)

### Lemma 2

*The complexity of ReverseSearch is \(\mathcal {O}({\mathsf {m}} \cdot {\mathsf {n}} \cdot {\mathsf {hiThresh}})\).*

### Proof

In ReverseSearch, BSAT is invoked with different thresholds (say \(T_{1}, T_{2}, T_{3}, \ldots \)) in each iteration of the for loop in line 9 (Algorithm 6), depending on the value of *Y*_{total}. As a result of the check in line 13, it follows that \(T_{1} + T_{2} + T_{3} + \ldots = {\mathsf {hiThresh}}\). Therefore the complexity of all invocations of BSAT is \( \mathcal {O}({\mathsf {m}}\cdot {\mathsf {n}}\cdot (T_{1} + T_{2} + T_{3} + \ldots )) = \mathcal {O}({\mathsf {m}}\cdot {\mathsf {n}}\cdot {\mathsf {hiThresh}}) \). The complexity of ExtractSlice in line 12 is \( \mathcal {O}({\mathsf {n}}(\log {\mathsf {m}} + \log (1/\varepsilon ^{2}))^{2}) \) [13], and the loop in line 9 can be executed at most \(\mathcal {O}(\log \log {\mathsf {m}}) \) times. Therefore, the complexity of ReverseSearch is \(\mathcal {O}(\log \log {\mathsf {m}}\cdot ({\mathsf {n}}(\log {\mathsf {m}} + \log (1/\varepsilon ^{2}))^{2}) + {\mathsf {m}}\cdot {\mathsf {n}} \cdot {\mathsf {hiThresh}})\), which is \(\mathcal {O}({\mathsf {m}}\cdot {\mathsf {n}} \cdot {\mathsf {hiThresh}})\). \(\ \Box \)

We are now ready to prove Theorem 1.

### Proof

In Algorithm 2, ApproxMCCore is invoked \( \mathcal {O}(\log (1/\delta )) \) times, and each invocation in turn makes a call to ReverseSearch. The complexity of SampleHashFunction is \( \mathcal {O}({\mathsf {n}}(\log {\mathsf {m}} + \log (1/\varepsilon ^{2}))) \) [13]. Since \({\mathsf {hiThresh}} = \mathcal {O}(1/\varepsilon ^{2}) \), the complexity of Algorithm 2 is \( \mathcal {O}({\mathsf {m}}\cdot {\mathsf {n}}\cdot (1/\varepsilon ^{2})\cdot \log (1/\delta ) + {\mathsf {n}}(\log {\mathsf {m}} + \log (1/\varepsilon ^{2}))) \), which is \( \mathcal {O}({\mathsf {m}}\cdot {\mathsf {n}}\cdot (1/\varepsilon ^{2})\cdot \log (1/\delta ))\). \(\ \Box \)

## About this article

### Cite this article

Meel, K.S., Shrotri, A.A. & Vardi, M.Y. Not all FPRASs are equal: demystifying FPRASs for DNF-counting. *Constraints* **24**, 211–233 (2019). https://doi.org/10.1007/s10601-018-9301-x

### Keywords

- Model counting
- Hashing
- Disjunctive normal form
- Boolean formulas
- Fully polynomial randomized approximation scheme
- ApproxMC