The VLDB Journal, Volume 22, Issue 6, pp 823–848

Anytime approximation in probabilistic databases

Regular Paper


This article describes an approximation algorithm for computing the probability of propositional formulas over discrete random variables. It incrementally refines lower and upper bounds on the probability of the formulas until the desired absolute or relative error guarantee is reached. This algorithm is used by the SPROUT query engine to approximate the probabilities of results to relational algebra queries on expressive probabilistic databases.
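The refinement loop described above can be illustrated with a minimal sketch. It is not the article's algorithm or SPROUT's implementation: it assumes a positive DNF over independent Boolean variables (a simplification of the discrete random variables mentioned above), uses simple clause-based bounds, and refines them by Shannon expansion. The names `leaf_bounds`, `condition`, and `approximate` are hypothetical. The stopping tests follow from elementary reasoning: if L ≤ p ≤ U and U − L ≤ 2ε, the midpoint is within ε of p; if (1 − ε)U ≤ (1 + ε)L, any value between those two quantities is a relative ε-approximation.

```python
import heapq
from collections import Counter
from math import prod


def leaf_bounds(dnf, p):
    """Cheap bounds on P(dnf) for a positive DNF over independent variables."""
    if not dnf:                       # no clauses left: the formula is false
        return 0.0, 0.0
    if any(not c for c in dnf):       # an empty clause: the formula is true
        return 1.0, 1.0
    probs = [prod(p[v] for v in c) for c in dnf]
    # Lower bound: the most likely clause. Upper bound: the union bound.
    return max(probs), min(1.0, sum(probs))


def condition(dnf, var, value):
    """Restrict the DNF to var=value (Shannon cofactor)."""
    if value:
        return frozenset(c - {var} for c in dnf)
    return frozenset(c for c in dnf if var not in c)


def approximate(clauses, p, eps, relative=False):
    """Return (L, U, estimate) with the requested error guarantee (illustrative)."""
    dnf = frozenset(frozenset(c) for c in clauses)
    lo, hi = leaf_bounds(dnf, p)
    L, U = lo, hi                     # global bounds: weighted sums over leaves
    heap, tick = [], 0                # open leaves, widest weighted gap first
    if hi > lo:
        heap.append((-(hi - lo), tick, 1.0, dnf, lo, hi))
    while heap:
        if relative:
            if (1 - eps) * U <= (1 + eps) * L:
                break
        elif U - L <= 2 * eps:
            break
        _, _, w, f, lo, hi = heapq.heappop(heap)
        L -= w * lo                   # retract this leaf's contribution
        U -= w * hi
        counts = Counter(v for c in f for v in c)
        var = max(counts, key=lambda v: (counts[v], v))  # most frequent variable
        for value, pv in ((True, p[var]), (False, 1 - p[var])):
            g = condition(f, var, value)
            glo, ghi = leaf_bounds(g, p)
            L += w * pv * glo
            U += w * pv * ghi
            if ghi > glo:             # only inexact leaves need further refinement
                tick += 1
                gap = w * pv * (ghi - glo)
                heapq.heappush(heap, (-gap, tick, w * pv, g, glo, ghi))
    if relative:
        estimate = ((1 - eps) * U + (1 + eps) * L) / 2
    else:
        estimate = (L + U) / 2
    return L, U, estimate
```

For example, `approximate([{"x1", "y1"}, {"x1", "y2"}, {"x2", "y2"}], p, 0.01)` with a dictionary `p` of marginal probabilities returns bounds enclosing the exact probability and an estimate within absolute error 0.01. Because the loop can be interrupted after any expansion and still report valid bounds, the sketch reflects the anytime character of the approach.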


Keywords: Probabilistic databases, Query evaluation, Anytime approximation, Model-based approximation



We would like to thank the anonymous reviewers and Peter Haas for their insightful comments that helped improve this article. We also thank Christoph Koch and Swaroop Rath for their collaboration on earlier work on which this article is partially based. Jiewen Huang’s work was done while at Oxford.

Supplementary material

Supplementary material 1: 778_2013_310_MOESM1_ESM.pdf (PDF, 152 KB)



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. University of Oxford, Oxford, UK
  2. Yale University, New Haven, USA
