Skip to main content

Querying and Learning in Probabilistic Databases

  • Chapter

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8714))

Abstract

Probabilistic Databases (PDBs) lie at the expressive intersection of databases, first-order logic, and probability theory. PDBs employ logical deduction rules to process Select-Project-Join (SPJ) queries, which form the basis for a variety of declarative query languages such as Datalog, Relational Algebra, and SQL. They employ logical consistency constraints to resolve data inconsistencies, and they represent query answers via logical lineage formulas (aka.“data provenance”) to trace the dependencies between these answers and the input tuples that led to their derivation. While the literature on PDBs dates back to more than 25 years of research, only fairly recently the key role of lineage for establishing a closed and complete representation model of relational operations over this kind of probabilistic data was discovered. Although PDBs benefit from their efficient and scalable database infrastructures for data storage and indexing, they couple the data computation with probabilistic inference, the latter of which remains a #P-hard problem also in the context of PDBs.

In this chapter, we provide a review on the key concepts of PDBs with a particular focus on our own recent research results related to this field. We highlight a number of ongoing research challenges related to PDBs, and we keep referring to an information extraction (IE) scenario as a running application to manage uncertain and temporal facts obtained from IE techniques directly inside a PDB setting.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Abiteboul, S., Herr, L., den Bussche, J.V.: Temporal Connectives versus Explicit Timestamps in Temporal Query Languages. In: Clifford, J., Tuzhilin, A. (eds.) Recent Advances in Temporal Databases, Proceedings of the International Workshop on Temporal Databases, Zürich, Switzerland, September 17-18. Workshops in Computing, pp. 43–57. Springer, New York (1995)

    Chapter  Google Scholar 

  2. Abiteboul, S., Hull, R., Vianu, V. (eds.): Foundations of Databases, 1st edn. Addison-Wesley, Boston (1995)

    MATH  Google Scholar 

  3. Abiteboul, S., Kanellakis, P., Grahne, G.: On the representation and querying of sets of possible worlds. SIGMOD Record 16(3), 34–48 (1987)

    Article  MATH  Google Scholar 

  4. Allen, J.F.: Maintaining knowledge about temporal intervals. Communications of the ACM 26(11), 832–843 (1983)

    Article  MATH  Google Scholar 

  5. Antova, L., Jansen, T., Koch, C., Olteanu, D.: Fast and Simple Relational Processing of Uncertain Data. In: Proceedings of the 24th International Conference on Data Engineering, ICDE, pp. 983–992. IEEE Computer Society, Washington, DC (2008)

    Google Scholar 

  6. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.G.: DBpedia: A Nucleus for a Web of Open Data. In: Aberer, K., et al. (eds.) ISWC/ASWC 2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007)

    Google Scholar 

  7. Benjelloun, O., Das Sarma, A., Halevy, A., Theobald, M., Widom, J.: Databases with uncertainty and lineage. VLDB Journal 17(2), 243–264 (2008)

    Article  Google Scholar 

  8. Boulos, J., Dalvi, N., Mandhani, B., Mathur, S., Re, C., Suciu, D.: MYSTIQ: a system for finding more answers by using probabilities. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 891–893. ACM, New York (2005)

    Google Scholar 

  9. Brin, S.: Extracting Patterns and Relations from the World Wide Web. In: Atzeni, P., Mendelzon, A.O., Mecca, G. (eds.) WebDB 1998. LNCS, vol. 1590, pp. 172–183. Springer, Heidelberg (1999)

    Chapter  Google Scholar 

  10. Ceri, S., Gottlob, G., Tanca, L.: What You Always Wanted to Know About Datalog (And Never Dared to Ask). IEEE Transactions on Knowledge and Data Engineering 1(1), 146–166 (1989)

    Article  Google Scholar 

  11. Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. MIT Press, Cambridge (2009)

    MATH  Google Scholar 

  12. Cui, Y., Widom, J., Wiener, J.L.: Tracing the Lineage of View Data in a Warehousing Environment. ACM Transactions on Database Systems 25(2), 179–227 (2000)

    Article  Google Scholar 

  13. Dalvi, N., Ré, C., Suciu, D.: Probabilistic Databases: Diamonds in the Dirt. Communications of the ACM 52(7), 86–94 (2009)

    Article  Google Scholar 

  14. Dalvi, N., Schnaitter, K., Suciu, D.: Computing Query Probability with Incidence Algebras. In: Proceedings of the Twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pp. 203–214. ACM, New York (2010)

    Chapter  Google Scholar 

  15. Dalvi, N., Suciu, D.: Efficient query evaluation on probabilistic databases. VLDB Journal 16(4), 523–544 (2007)

    Article  Google Scholar 

  16. Dalvi, N., Suciu, D.: The dichotomy of conjunctive queries on probabilistic structures. In: Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pp. 293–302. ACM, New York (2007)

    Chapter  Google Scholar 

  17. Dalvi, N., Suciu, D.: The Dichotomy of Probabilistic Inference for Unions of Conjunctive Queries. Journal of the ACM 59(6), 30:1–30:87 (2013)

    Google Scholar 

  18. Dickenstein, A., Emiris, I.Z.: Solving Polynomial Equations: Foundations, Algorithms, and Applications, 1st edn. Springer, Heidelberg (2010)

    MATH  Google Scholar 

  19. Dignös, A., Böhlen, M.H., Gamper, J.: Temporal alignment. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 433–444. ACM, New York (2012)

    Google Scholar 

  20. Dylla, M.: Efficient Querying and Learning in Probabilistic and Temporal Databases. Doctoral Dissertation. Saarland University (2014)

    Google Scholar 

  21. Dylla, M., Miliaraki, I., Theobald, M.: A Temporal-Probabilistic Database Model for Information Extraction. Proceedings of the VLDB Endowment 6(14), 1810–1821 (2013)

    Article  Google Scholar 

  22. Dylla, M., Theobald, M.: Learning Tuple Probabilities in Probabilistic Databases. Research Report MPI-I-2014-5-001, Max-Planck-Institut für Informatik, Stuhlsatzenhausweg 85, 66123 Saarbrücken, Germany (January 2014)

    Google Scholar 

  23. Dylla, M., Theobald, M., Miliaraki, I.: Top-k query processing in probabilistic databases with non-materialized views. In: Proceedings of the 29th International Conference on Data Engineering, ICDE, pp. 122–133. IEEE Computer Society, Washington, DC (2013)

    Google Scholar 

  24. Fagin, R., Lotem, A., Naor, M.: Optimal Aggregation Algorithms for Middleware. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pp. 102–113. ACM, New York (2001)

    Chapter  Google Scholar 

  25. Feller, W.: An introduction to probability theory and its applications, 3rd edn. Wiley, Hoboken (1968)

    MATH  Google Scholar 

  26. Fink, R., Olteanu, D.: On the optimal approximation of queries using tractable propositional languages. In: Proceedings of the 14th International Conference on Database Theory, ICDT, pp. 174–185. ACM, New York (2011)

    Chapter  Google Scholar 

  27. Fisher, M., Gabbay, D., Vila, L.: Handbook of Temporal Reasoning in Artificial Intelligence, 1st edn. Foundations of Artificial Intelligence. Elsevier, Essex (2005)

    MATH  Google Scholar 

  28. Fuhr, N., Rölleke, T.: A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems. ACM Transactions on Information Systems 15(1), 32–66 (1997)

    Article  Google Scholar 

  29. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, New York (1990)

    MATH  Google Scholar 

  30. Grädel, E., Gurevich, Y., Hirsch, C.: The Complexity of Query Reliability. In: Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS, pp. 227–234. ACM, New York (1998)

    Chapter  Google Scholar 

  31. Green, T.J., Tannen, V.: Models for Incomplete and Probabilistic Information. IEEE Data Engineering Bulletin 29(1), 17–24 (2006)

    Google Scholar 

  32. Widom, J., Garcia-Molina, H., Ullman, J.D.: Database systems: the complete book, 1st edn. Prentice-Hall, Upper Saddle River (2002)

    Google Scholar 

  33. Hoffart, J., Suchanek, F.M., Berberich, K., Weikum, G.: YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia. Artificial Intelligence 194, 28–61 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  34. Ilyas, I.F., Beskales, G., Soliman, M.A.: A Survey of Top-k Query Processing Techniques in Relational Database Systems. ACM Computing Surveys 40(4), 11:1–11:58 (2008)

    Google Scholar 

  35. Jensen, C.S.: Temporal Database Management. PhD thesis, Aalborg University, Aalborg, Denmark (April 2000)

    Google Scholar 

  36. Jha, A., Suciu, D.: Knowledge Compilation Meets Database Theory: Compiling Queries to Decision Diagrams. Theory of Computing Systems 52(3), 403–440 (2013)

    Article  MathSciNet  MATH  Google Scholar 

  37. Kanagal, B., Deshpande, A.: Lineage processing over correlated probabilistic databases. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 675–686. ACM, New York (2010)

    Google Scholar 

  38. Kanagal, B., Li, J., Deshpande, A.: Sensitivity analysis and explanations for robust query evaluation in probabilistic databases. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 841–852. ACM, New York (2011)

    Google Scholar 

  39. Kanellakis, P.C., Kuper, G.M., Revesz, P.Z.: Constraint query languages. In: Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS, pp. 299–313. ACM, New York (1990)

    Chapter  Google Scholar 

  40. Khanna, S., Roy, S., Tannen, V.: Queries with Difference on Probabilistic Databases. Proceedings of the VLDB Endowment 4(11), 1051–1062 (2011)

    Google Scholar 

  41. Koch, C., Olteanu, D.: Conditioning probabilistic databases. Proceedings of the VLDB Endowment 1(1), 313–325 (2008)

    Article  Google Scholar 

  42. Nakashole, N., Weikum, G., Suchanek, F.: PATTY: A Taxonomy of Relational Patterns with Semantic Types. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL, pp. 1135–1145. Association for Computational Linguistics, Stroudsburg (2012)

    Google Scholar 

  43. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, Heidelberg (2006)

    MATH  Google Scholar 

  44. Ohrstrom, P.: Temporal Logic: From Ancient Ideas to Artificial Intelligence. Springer, Heidelberg (2009)

    MATH  Google Scholar 

  45. Olteanu, D., Huang, J.: Using OBDDs for Efficient Query Evaluation on Probabilistic Databases. In: Greco, S., Lukasiewicz, T. (eds.) SUM 2008. LNCS (LNAI), vol. 5291, pp. 326–340. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  46. Olteanu, D., Huang, J., Koch, C.: SPROUT: Lazy vs. Eager Query Plans for Tuple-Independent Probabilistic Databases. In: Proceedings of the 25th International Conference on Data Engineering, ICDE, pp. 640–651. IEEE Computer Society, Washington, DC (2009)

    Google Scholar 

  47. Olteanu, D., Huang, J., Koch, C.: Approximate confidence computation in probabilistic databases. In: Proceedings of the 26th International Conference on Data Engineering, ICDE, pp. 145–156. IEEE Computer Society, Washington, DC (2010)

    Google Scholar 

  48. Olteanu, D., Wen, H.: Ranking Query Answers in Probabilistic Databases: Complexity and Efficient Algorithms. In: Proceedings of the 28th International Conference on Data Engineering, ICDE, pp. 282–293. IEEE Computer Society, Washington, DC (2012)

    Google Scholar 

  49. Ré, C., Dalvi, N.N., Suciu, D.: Efficient Top-k Query Evaluation on Probabilistic Data. In: Proceedings of the 23rd International Conference on Data Engineering, ICDE, pp. 886–895. IEEE Computer Society, Washington, DC (2007)

    Google Scholar 

  50. Ré, C., Suciu, D.: Approximate Lineage for Probabilistic Databases. Proceedings of the VLDB Endowment 1(1), 797–808 (2008)

    Article  Google Scholar 

  51. Ré, C., Suciu, D.: The trichotomy of HAVING queries on a probabilistic database. VLDB Journal 18(5), 1091–1116 (2009)

    Article  Google Scholar 

  52. Rekatsinas, T., Deshpande, A., Getoor, L.: Local Structure and Determinism in Probabilistic Databases. In: Proceedings of the International Conference on Management of Data, SIGMOD, pp. 373–384. ACM, New York (2012)

    Google Scholar 

  53. Sagiv, Y., Yannakakis, M.: Equivalences Among Relational Expressions with the Union and Difference Operators. Journal of the ACM 27(4), 633–655 (1980)

    Article  MathSciNet  MATH  Google Scholar 

  54. Sarma, A.D., Theobald, M., Widom, J.: Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases. In: Proceedings of the 24th International Conference on Data Engineering, ICDE, pp. 1023–1032. IEEE Computer Society, Washington, DC (2008)

    Google Scholar 

  55. Das Sarma, A., Theobald, M., Widom, J.: LIVE: a lineage-supported versioned DBMS. In: Gertz, M., Ludäscher, B. (eds.) SSDBM 2010. LNCS, vol. 6187, pp. 416–433. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  56. Sen, P., Deshpande, A., Getoor, L.: PrDB: managing and exploiting rich correlations in probabilistic databases. VLDB Journal 18(5), 1065–1090 (2009)

    Article  Google Scholar 

  57. Sen, P., Deshpande, A., Getoor, L.: Read-once Functions and Query Evaluation in Probabilistic Databases. In: Proceedings of the VLDB Endowment, vol. 3(1-2), pp. 1068–1079 (2010)

    Google Scholar 

  58. Shiryaev, A.N.: Probability, 2nd edn. Springer, Heidelberg (1995)

    MATH  Google Scholar 

  59. Singh, S., Mayfield, C., Mittal, S., Prabhakar, S., Hambrusch, S.E., Shah, R.: Orion 2.0: native support for uncertain data. In: SIGMOD Conference, pp. 1239–1242 (2008)

    Google Scholar 

  60. Smullyan, R.M.: First-order logic, 1st edn. Springer, Heidelberg (1968)

    Book  MATH  Google Scholar 

  61. Stoyanovich, J., Davidson, S., Milo, T., Tannen, V.: Deriving Probabilistic Databases with Inference Ensembles. In: Proceedings of the 27th International Conference on Data Engineering, ICDE, pp. 303–314. IEEE Computer Society, Washington, DC (2011)

    Google Scholar 

  62. Suchanek, F.M., Kasneci, G., Weikum, G.: Yago: A Core of Semantic Knowledge. In: Proceedings of the 16th International Conference on World Wide Web, WWW, pp. 697–706. ACM, New York (2007)

    Chapter  Google Scholar 

  63. Suciu, D., Olteanu, D., Christopher, R., Koch, C.: Probabilistic Databases, 1st edn. Morgan & Claypool, San Rafael (2011)

    MATH  Google Scholar 

  64. Tuzhilin, A., Clifford, J.: A Temporal Relational Algebra As Basis for Temporal Relational Completeness. In: Proceedings of the 16th International Conference on Very Large Data Bases, VLDB, pp. 13–23. Morgan Kaufmann Publishers, San Rafael (1990)

    Google Scholar 

  65. Sperschneider, G.A.V.: Logic: A Foundation for Computer Science, 1st edn. Addison-Wesley, Boston (1991)

    MATH  Google Scholar 

  66. Valiant, L.G.: The Complexity of Computing the Permanent. Theoretical Computer Science 8(2), 189–201 (1979)

    Article  MathSciNet  MATH  Google Scholar 

  67. van Benthem, J.: The Logic of Time: A Model-Theoretic Investigation into the Varieties of Temporal Ontology and Temporal Discourse, 2nd edn. Kluwer Academic Publishers, Dordrecht (1991)

    Book  MATH  Google Scholar 

  68. Weikum, G., Theobald, M.: From Information to Knowledge: Harvesting Entities and Relationships from Web Sources. In: Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS, pp. 65–76. ACM, New York (2010)

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Dylla, M., Theobald, M., Miliaraki, I. (2014). Querying and Learning in Probabilistic Databases. In: Koubarakis, M., et al. Reasoning Web. Reasoning on the Web in the Big Data Era. Reasoning Web 2014. Lecture Notes in Computer Science, vol 8714. Springer, Cham. https://doi.org/10.1007/978-3-319-10587-1_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-10587-1_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10586-4

  • Online ISBN: 978-3-319-10587-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics