# Anytime approximation in probabilistic databases

Regular Paper

## Abstract

This article describes an approximation algorithm for computing the probability of propositional formulas over discrete random variables. It incrementally refines lower and upper bounds on the probability of the formulas until the desired absolute or relative error guarantee is reached. This algorithm is used by the SPROUT query engine to approximate the probabilities of results to relational algebra queries on expressive probabilistic databases.
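The refinement loop described above can be illustrated with a minimal sketch. It is not the SPROUT implementation. It assumes independent Boolean variables and a positive DNF formula, and every name in it (`cheap_bounds`, `condition`, `approximate`, `example_dnf`, `example_p`) is hypothetical. The bound heuristics (largest clause probability as lower bound, union bound as upper bound) and the choice of splitting on the most frequent variable are illustrative assumptions. The stopping tests are one common way to phrase the guarantees: an absolute error of `eps` holds for the midpoint once `hi - lo <= 2 * eps`, and a relative error of `eps` is achievable once `(1 - eps) * hi <= (1 + eps) * lo`.

```python
from collections import Counter
from math import prod


def cheap_bounds(dnf, p):
    """Lower and upper bounds on the probability of a positive DNF."""
    if not dnf:
        return 0.0, 0.0            # no clauses: the formula is false
    if any(len(c) == 0 for c in dnf):
        return 1.0, 1.0            # an empty clause: the formula is true
    probs = [prod(p[v] for v in c) for c in dnf]
    # Each clause implies the formula (lower bound); union bound (upper bound).
    return max(probs), min(1.0, sum(probs))


def condition(dnf, var, value):
    """Restrict the DNF to the case var = value."""
    if value:
        return [c - {var} for c in dnf]
    return [c for c in dnf if var not in c]


def approximate(dnf, p, eps, relative=False):
    """Refine bounds by Shannon expansion until the error test passes."""
    # Leaves partition the probability space: (weight, residual formula).
    leaves = [(1.0, [frozenset(c) for c in dnf])]
    while True:
        bounds = [cheap_bounds(f, p) for _, f in leaves]
        lo = sum(w * b[0] for (w, _), b in zip(leaves, bounds))
        hi = sum(w * b[1] for (w, _), b in zip(leaves, bounds))
        if relative:
            done = (1 - eps) * hi <= (1 + eps) * lo
        else:
            done = hi - lo <= 2 * eps
        if done:
            return lo, hi
        # Split the leaf that contributes the largest share of the gap.
        i = max(range(len(leaves)),
                key=lambda k: leaves[k][0] * (bounds[k][1] - bounds[k][0]))
        w, f = leaves.pop(i)
        var = Counter(v for c in f for v in c).most_common(1)[0][0]
        leaves.append((w * p[var], condition(f, var, True)))
        leaves.append((w * (1 - p[var]), condition(f, var, False)))


# Hypothetical input: (x1 AND y1) OR (x1 AND y2) OR (x2 AND y2).
example_dnf = [{"x1", "y1"}, {"x1", "y2"}, {"x2", "y2"}]
example_p = {"x1": 0.3, "x2": 0.6, "y1": 0.5, "y2": 0.8}
print(approximate(example_dnf, example_p, eps=0.01))
print(approximate(example_dnf, example_p, eps=0.05, relative=True))
```

Because the loop holds valid bounds after every split, it can be interrupted at any point and still return an interval that contains the exact probability, which is the sense in which such a procedure is "anytime". Each split removes one variable from a leaf, so in the worst case the sketch degenerates to exhaustive enumeration.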

### Keywords

Probabilistic databases, Query evaluation, Anytime approximation, Model-based approximation

## Notes

### Acknowledgments

We would like to thank the anonymous reviewers and Peter Haas for their insightful comments that helped improve this article. We also thank Christoph Koch and Swaroop Rath for their collaboration on earlier work on which this article is partially based. Jiewen Huang’s work was done while at Oxford.

## Supplementary material

778_2013_310_MOESM1_ESM.pdf (152 kb)

## Copyright information

© Springer-Verlag Berlin Heidelberg 2013