Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Query Processing over Uncertain Data

  • Nilesh Dalvi
  • Dan Olteanu
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80689

Synonyms

Query processing over probabilistic data

Definition

An uncertain or probabilistic database is defined as a probability distribution over a set of deterministic database instances called possible worlds.

In the classical deterministic setting, the query processing problem is to compute the set of tuples representing the answer of a given query on a given database. In the probabilistic setting, the problem becomes computing all pairs (t, p), where p is the probability that the tuple t is in the query answer in a random world of the input probabilistic database.
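
The definition above can be illustrated with a minimal sketch that enumerates the possible worlds explicitly. The toy relation, the probabilities, the projection query, and all names (WORLDS, query, answer_probabilities) are assumptions made for illustration only; they do not come from the entry, and explicit enumeration is feasible only for very small examples.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical toy probabilistic database: an explicit list of
# (world, probability) pairs. Each world is a deterministic instance,
# here a single relation given as a set of (name, city) tuples.
WORLDS = [
    (frozenset({("ann", "paris"), ("bob", "rome")}), Fraction(1, 2)),
    (frozenset({("ann", "paris")}), Fraction(3, 10)),
    (frozenset({("bob", "oslo")}), Fraction(1, 5)),
]


def query(world):
    """Deterministic query (assumed for illustration): project onto the name."""
    return {(name,) for name, _city in world}


def answer_probabilities(worlds, q):
    """Return {t: p}, where p is the total probability of the worlds
    whose deterministic answer q(world) contains the tuple t."""
    # The world probabilities must form a distribution.
    assert sum(p for _, p in worlds) == 1
    result = defaultdict(Fraction)
    for world, p in worlds:
        for t in q(world):
            result[t] += p
    return dict(result)


ANSWERS = answer_probabilities(WORLDS, query)
for t, p in sorted(ANSWERS.items()):
    print(t, p)
```

Exact fractions are used instead of floating-point numbers so that the world probabilities sum to exactly 1 and the resulting pairs (t, p) can be compared without rounding error.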

Scientific Fundamentals

Representation of Uncertain Data

All aspects of query processing over uncertain data, in particular its complexity and the existing techniques, depend heavily on the data representation. Since it is prohibitively expensive to represent explicitly the extremely large set of all possible worlds of a probabilistic database, one has to settle for succinct data representations. Three such...



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Airbnb, San Francisco, USA
  2. Department of Computer Science, University of Oxford, Oxford, UK