Not all FPRASs are equal: demystifying FPRASs for DNF-counting

Abstract

The problem of counting the number of solutions of a DNF formula, also called #DNF, is a fundamental problem in artificial intelligence with applications in diverse domains ranging from network reliability to probabilistic databases. Owing to the intractability of the exact variant, efforts have focused on the design of approximate techniques for #DNF. Consequently, several Fully Polynomial Randomized Approximation Schemes (FPRASs) based on Monte Carlo techniques have been proposed. Recently, it was discovered that hashing-based techniques also lend themselves to FPRASs for #DNF. Despite significant improvements, the complexity of the hashing-based FPRAS is still worse than that of the best Monte Carlo FPRAS by polylog factors. Two questions were left unanswered in previous works: Can the complexity of the hashing-based techniques be improved? How do the various approaches stack up against each other empirically? In this paper, we first propose a new search procedure for the hashing-based FPRAS that removes the polylog factors from its time complexity. We then present the first empirical study of the runtime behavior of different FPRASs for #DNF. Our study produces a nuanced picture. First, we observe that there is no single best algorithm that outperforms all others for all classes of formulas and input parameters. Second, we observe that the algorithm with one of the worst time complexities solves the largest number of benchmarks.

Notes

  1. Note that \({\mathcal {A}}\) is typically represented implicitly, for example by constraints in DNF as in this paper.

  2. Code and results can be accessed at https://gitlab.com/Shrotri/DNF_Counting

  3. Figures are best viewed online in color.

References

  1. Dueñas-Osorio, L., Meel, K.S., Paredes, R., Vardi, M.Y. (2017). Counting-based reliability estimation for power-transmission grids. In Proceedings of AAAI conference on artificial intelligence (AAAI).

  2. Bacchus, F., Dalmao, S., Pitassi, T. (2003). Algorithms and complexity results for #SAT and Bayesian inference. In Proceedings of FOCS (pp. 340–351). ISBN: 0-7695-2040-5. http://dl.acm.org/citation.cfm?id=946243.946291.

  3. Sang, T., Beame, P., Kautz, H. (2005). Performing Bayesian inference by weighted model counting. In Proceedings of AAAI (pp. 475–481).

  4. Dalvi, N., & Suciu, D. (2007). Efficient query evaluation on probabilistic databases. The VLDB Journal, 16(4), 523–544.

  5. Biondi, F., Enescu, M., Heuser, A., Legay, A., Meel, K.S., Quilbeuf, J. (2018). Scalable approximation of quantitative information flow in programs. In Proceedings of VMCAI.

  6. Karger, D.R. (2001). A randomized fully polynomial time approximation scheme for the all-terminal network reliability problem. SIAM Review.

  7. Valiant, L.G. (1979). The complexity of enumeration and reliability problems. SIAM Journal on Computing, 8(3), 410–421.

  8. Karp, R.M., & Luby, M. (1983). Monte Carlo algorithms for enumeration and reliability problems. In Proceedings of FOCS.

  9. Karp, R.M., Luby, M., Madras, N. (1989). Monte Carlo approximation algorithms for enumeration problems. Journal of Algorithms, 10(3), 429–448.

  10. Vazirani, V.V. (2013). Approximation algorithms. Springer Science & Business Media.

  11. Dagum, P., Karp, R., Luby, M., Ross, S. (2000). An optimal algorithm for Monte Carlo estimation. SIAM Journal on Computing, 29(5), 1484–1496.

  12. Chakraborty, S., Meel, K.S., Vardi, M.Y. (2016). Algorithmic improvements in approximate counting for probabilistic inference: from linear to logarithmic SAT call. In Proceedings of IJCAI.

  13. Meel, K.S., Shrotri, A.A., Vardi, M.Y. (2017). On hashing-based approaches to approximate DNF-counting. In Proceedings of FSTTCS.

  14. Ermon, S., Gomes, C.P., Sabharwal, A., Selman, B. (2013). Taming the curse of dimensionality: discrete integration by hashing and optimization. In Proceedings of ICML (pp. 334–342).

  15. Meel, K.S. (2018). Constrained counting and sampling: bridging the gap between theory and practice. arXiv:1806.02239.

  16. Carter, J.L., & Wegman, M.N. (1977). Universal classes of hash functions. In Proceedings of STOC (pp. 106–112). ACM.

  17. Luby, M., & Veličković, B. (1996). On deterministic approximation of DNF. Algorithmica, 16(4), 415–433.

  18. Trevisan, L. (2004). A note on approximate counting for k-DNF. In Approximation, randomization, and combinatorial optimization. Algorithms and techniques (pp. 417–425). Springer.

  19. Gopalan, P., Meka, R., Reingold, O. (2013). DNF sparsification and a faster deterministic counting algorithm. Computational Complexity.

  20. Ajtai, M., & Wigderson, A. (1985). Deterministic simulation of probabilistic constant depth circuits. In Proceedings of FOCS (pp. 11–19). IEEE.

  21. Nisan, N. (1991). Pseudorandom bits for constant depth circuits. Combinatorica, 11(1), 63–70.

  22. De, A., Etesami, O., Trevisan, L., Tulsiani, M. (2010). Improved pseudorandom generators for depth 2 circuits. In Approximation, randomization, and combinatorial optimization. Algorithms and techniques (pp. 504–517). Springer.

  23. Olteanu, D., Huang, J., Koch, C. (2010). Approximate confidence computation in probabilistic databases. In ICDE (pp. 145–156). IEEE.

  24. Fink, R., & Olteanu, D. (2011). On the optimal approximation of queries using tractable propositional languages. In Proceedings of ICDT. ACM.

  25. Gatterbauer, W., & Suciu, D. (2014). Oblivious bounds on the probability of Boolean functions. ACM TODS, 39(1), 5.

  26. Tao, Q., Scott, S., Vinodchandran, N.V., Osugi, T.T. (2004). SVM-based generalized multiple-instance learning via approximate box counting. In Proceedings of the twenty-first international conference on machine learning (p. 101). ACM.

  27. Babai, L. (1979). Monte-Carlo algorithms in graph isomorphism testing. Université de Montréal Technical Report DMS 79-10.

  28. Motwani, R., & Raghavan, P. (2010). Randomized algorithms.

  29. Albrecht, M., & Bard, G. (2012). The M4RI Library – Version 20121224. http://m4ri.sagemath.org.

  30. Huang, J., Antova, L., Koch, C., Olteanu, D. (2009). MayBMS: a probabilistic database management system. In Proceedings of SIGMOD. ACM.

  31. TPC Benchmark H. http://www.tpc.org/.

  32. Mitchell, D., Selman, B., Levesque, H. (1992). Hard and easy distributions of SAT problems. In Proceedings of AAAI (pp. 459–465).

  33. Thurley, M. (2006). SharpSAT: counting models with advanced component caching and implicit BCP. In Proceedings of SAT (pp. 424–429).

  34. Chakraborty, S., Meel, K.S., Vardi, M.Y. (2013). A scalable approximate model counter. In Proceedings of CP (pp. 200–216).

Acknowledgements

The authors would like to thank anonymous reviewers for their insightful comments and suggestions. Moshe Y. Vardi and Aditya A. Shrotri’s work was supported in parts by NSF grant IIS-1527668, NSF Expeditions in Computing project “ExCAPE: Expeditions in Computer Augmented Program Engineering”. Kuldeep S. Meel’s work was supported in parts by NUS ODPRT Grant R-252-000-685-133, AI Singapore Grant R-252-000-A16-490, and Sung Kah Kay Assistant Professorship Fund.

Author information

Corresponding author

Correspondence to Aditya A. Shrotri.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author names are ordered alphabetically by last name; the order does not indicate contribution.

Appendix

To obtain a concrete algorithm from the framework described in Algorithm 2, we need to instantiate the sub-procedures SampleHashFunction, GetLowerBound, GetUpperBound, EnumerateNextSol, ExtractSlice and ComputeIncrement for a particular counting problem. We now show how SymbolicDNFApproxMC [13], which uses Row Echelon XOR hash functions together with the concepts of Symbolic Hashing and Stochastic Cell-Counting, can be obtained through such instantiations. We then prove that replacing the BinarySearch procedure with ReverseSearch improves the complexity of the resulting algorithm by polylog factors.

SampleHashFunction

One can directly invoke the procedure SampleBase described in Algorithm 4 of [13] with minor modifications. This is shown in Algorithm 7. Note that the hash function \((A, b, y)\) so obtained belongs to the Row Echelon XOR family.

[Algorithm 7: SampleHashFunction. Listing not reproduced here.]

Lower and upper bounds

As shown in [13], it suffices to search between \( {\mathsf {n}} - {\mathsf {w}} - \log {\mathsf {hiThresh}} \) and \( {\mathsf {n}} - {\mathsf {w}} + \log {\mathsf {m}} - \log {\mathsf {hiThresh}} \) hash constraints. Therefore the functions GetLowerBound and GetUpperBound return these values respectively.
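The two bounds can be computed directly from the formula parameters. The following is a minimal sketch, assuming base-2 logarithms and leaving any rounding to integers to the caller; the function name `search_bounds` and the Python parameter names are illustrative and do not come from [13].

```python
import math


def search_bounds(n, w, m, hi_thresh):
    """Return (lower, upper) for the number of hash constraints to search over.

    n, w, m and hi_thresh stand for the quantities n, w, m and hiThresh in the
    text. Base-2 logarithms are an assumption of this sketch.
    """
    log_thresh = math.log2(hi_thresh)
    lower = n - w - log_thresh                  # value for GetLowerBound
    upper = n - w + math.log2(m) - log_thresh   # value for GetUpperBound
    return lower, upper


if __name__ == "__main__":
    print(search_bounds(n=20, w=5, m=8, hi_thresh=16))
```

The two values always differ by exactly log m, so the search window contains on the order of log m candidate constraint counts regardless of n and w.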

Extracting a prefix slice

Procedure ExtractSlice, which is required by ReverseSearch, is shown in Algorithm 8. If flip is false, ExtractSlice returns the result of the procedure Extract (described in [13]) directly. Otherwise, the p-th bit of y is negated before y is passed to Extract.

[Algorithm 8: ExtractSlice. Listing not reproduced here.]
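The control flow of ExtractSlice described above can be sketched as a thin wrapper. This is only an illustration: the real Extract procedure is defined in [13], so here it is passed in as a hypothetical callable `extract`, and its argument list `(A, b, y, p)` as well as the 0-based bit index are assumptions of this sketch.

```python
def extract_slice(A, b, y, p, flip, extract):
    """Wrapper mirroring the description of ExtractSlice.

    `extract` is a stand-in for the Extract procedure of [13]; its signature
    is assumed. `y` is a sequence of 0/1 values and `p` a 0-based position.
    """
    if flip:
        y = list(y)   # copy so the caller's vector is left untouched
        y[p] ^= 1     # negate the p-th bit
    return extract(A, b, y, p)


if __name__ == "__main__":
    # Dummy Extract that just echoes the y it receives.
    echo = lambda A, b, y, p: tuple(y)
    print(extract_slice(None, None, [0, 1, 1], 1, True, echo))
```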

EnumerateNextSol

SymbolicDNFApproxMC enumerates solutions in the cell, in the order of a Gray code sequence, for better complexity. This is achieved by invoking the procedure enumREX (Algorithm 1 in [13]).
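To illustrate why Gray code order helps, the sketch below generates the standard reflected binary Gray code, in which consecutive codewords differ in exactly one bit, so moving to the next candidate only requires a single-bit update. It is not the enumREX procedure of [13]; the function names are illustrative.

```python
def gray_code(k):
    """k-th codeword of the reflected binary Gray code."""
    return k ^ (k >> 1)


def gray_sequence(num_bits):
    """Yield all 2**num_bits codewords in Gray code order."""
    for k in range(1 << num_bits):
        yield gray_code(k)


def flipped_bit(k):
    """Index of the single bit that changes between codewords k-1 and k (k >= 1)."""
    diff = gray_code(k) ^ gray_code(k - 1)
    return diff.bit_length() - 1


if __name__ == "__main__":
    print([format(g, "03b") for g in gray_sequence(3)])
```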

ComputeIncrement

Procedure CheckSAT (Algorithm 10, adapted from [13]) can be used to compute the increments to \(Y_{cell}\), as shown in Algorithm 9. The assignment s is divided into a solution x and a cube \(F_{i}\) using the same Interpret function used in line 7 of Algorithm 6 in [13]. CheckSAT samples a cube at random in line 3 and checks if the assignment x satisfies it in line 5. The returned value follows the geometric distribution [9], and can be used to compute an accurate probabilistic estimate \(Y_{cell}\) of the true number of solutions in the cell [13].

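[Algorithm 9: ComputeIncrement. Listing not reproduced here.]
[Algorithm 10: CheckSAT. Listing not reproduced here.]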
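The sampling loop described for CheckSAT can be sketched as follows. This is a minimal illustration rather than the listing from the paper: cubes are represented as dictionaries mapping a variable to its required Boolean value, the helper names are hypothetical, and the division by the number of cubes follows the increment \(c_{x}/\mathsf{m}\) used in the proof of Lemma 1. The sketch assumes x satisfies at least one cube, since otherwise the loop would not terminate.

```python
import random


def satisfies(x, cube):
    """True if assignment x (dict var -> bool) agrees with every literal of the cube."""
    return all(x[var] == val for var, val in cube.items())


def check_sat(x, cubes, rng):
    """Sample cubes uniformly at random until x satisfies one; return the sample count.

    Precondition (assumed): x satisfies at least one cube in `cubes`.
    The returned count is geometrically distributed.
    """
    c_x = 0
    while True:
        cube = cubes[rng.randrange(len(cubes))]
        c_x += 1
        if satisfies(x, cube):
            return c_x


def compute_increment(x, cubes, rng):
    """Increment to the cell estimate: c_x divided by the number of cubes m."""
    return check_sat(x, cubes, rng) / len(cubes)


if __name__ == "__main__":
    cubes = [{1: True}, {2: False}]
    x = {1: True, 2: True}          # satisfies only the first cube
    print(compute_increment(x, cubes, random.Random(0)))
```

If x satisfies k of the m cubes, the expected value of the count is m/k, so the expected increment is 1/k. Summed over the k (solution, cube) pairs that share the same x, this contributes 1 in expectation, which is why the increments add up to an estimate of the number of distinct solutions.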

Lemma 1

The complexity of BSAT is \(\mathcal {O}({\mathsf {m}} \cdot {\mathsf {n}} \cdot {\mathsf {threshold}})\).

Proof

\(Y_{cell}\) is incremented by \(c_{x}/{\mathsf {m}}\) in line 5 of BSAT after a call to ComputeIncrement and CheckSAT. Since BSAT returns after \(Y_{cell}\) reaches threshold, the sum of \(c_{x}\) over all invocations of CheckSAT is \({\mathsf {m}} \cdot {\mathsf {threshold}}\). Every time \(c_{x}\) is incremented, the check in line 5 of CheckSAT is performed, which takes \( \mathcal {O}({\mathsf {n}}) \) time. Moreover, EnumerateNextSol also takes \( \mathcal {O}({\mathsf {n}}) \) time, as enumREX in [13] takes \( \mathcal {O}({\mathsf {n}}) \) time. As a result, the complexity of BSAT is \( \mathcal {O}({\mathsf {m}} \cdot {\mathsf {n}} \cdot {\mathsf {threshold}}) \). \(\ \Box \)

Lemma 2

The complexity of ReverseSearch is \(\mathcal {O}({\textsf {m}} \cdot {\textsf {n}} \cdot {\textsf {hiThresh}})\).

Proof

In ReverseSearch, BSAT is invoked with different thresholds (say \(T_{1}, T_{2}, T_{3}, \ldots \)) in each iteration of the for loop in line 9 (Algorithm 6), depending on the value of \(Y_{total}\). As a result of the check in line 13, it follows that \(T_{1} + T_{2} + T_{3} + \ldots = {\mathsf {hiThresh}}\). Therefore the complexity of all invocations of BSAT is \( \mathcal {O}({\mathsf {m}}\cdot {\mathsf {n}}\cdot (T_{1} + T_{2} + T_{3} + \ldots )) = \mathcal {O}({\mathsf {m}}\cdot {\mathsf {n}}\cdot {\mathsf {hiThresh}}) \). The complexity of ExtractSlice in line 12 is \( \mathcal {O}({\mathsf {n}}(\log {\mathsf {m}} + \log (1/\varepsilon ^{2}))^{2}) \) [13], and the loop in line 9 can be executed at most \(\mathcal {O}(\log \log {\mathsf {m}}) \) times. Therefore, the complexity of ReverseSearch is \(\mathcal {O}(\log \log {\mathsf {m}}\cdot ({\mathsf {n}}(\log {\mathsf {m}} + \log (1/\varepsilon ^{2}))^{2}) + {\textsf {m}}\cdot {\textsf {n}} \cdot {\textsf {hiThresh}})\), which is \(\mathcal {O}({\mathsf {m}}\cdot {\textsf {n}} \cdot {\textsf {hiThresh}})\). \(\ \Box \)

We are now ready to prove Theorem 1.

Proof

In Algorithm 2, ApproxMCCore is invoked \( \mathcal {O}(\log (1/\delta )) \) times, and each invocation makes a call to ReverseSearch. The complexity of SampleHashFunction is \( \mathcal {O}({\mathsf {n}}(\log {\mathsf {m}} + \log (1/\varepsilon ^{2}))) \) [13]. Since \(\mathsf {hiThresh} = \mathcal {O}(1/\varepsilon ^{2}) \), the complexity of Algorithm 2 is \( \mathcal {O}(\mathsf {m}\cdot {\mathsf {n}}\cdot (1/\varepsilon ^{2})\cdot \log (1/\delta ) + \mathsf {n}(\log \mathsf {m} + \log (1/\varepsilon ^{2}))) \), which is \( \mathcal {O}({\mathsf {m}}\cdot {\mathsf {n}}\cdot (1/\varepsilon ^{2})\cdot \log (1/\delta ))\). \(\ \Box \)

Cite this article

Meel, K.S., Shrotri, A.A. & Vardi, M.Y. Not all FPRASs are equal: demystifying FPRASs for DNF-counting. Constraints 24, 211–233 (2019). https://doi.org/10.1007/s10601-018-9301-x

Keywords

  • Model counting
  • Hashing
  • Disjunctive normal form
  • Boolean formulas
  • Fully polynomial randomized approximation scheme
  • ApproxMC