The VLDB Journal, Volume 16, Issue 4, pp 523–544

Efficient query evaluation on probabilistic databases

Regular Paper

Abstract

We describe a framework for supporting arbitrarily complex SQL queries with “uncertain” predicates. The query semantics is based on a probabilistic model, and the results are ranked, much as in Information Retrieval. Our main focus is query evaluation. We describe an optimization algorithm that can efficiently compute most queries. We show, however, that the data complexity of some queries is #P-complete, which implies that these queries do not admit any efficient evaluation methods. For these queries we describe both an approximation algorithm and a Monte-Carlo simulation algorithm.
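The abstract does not spell out the simulation algorithm. As a hedged illustration only, assume a tuple-independent database in which the lineage of each query answer is a DNF formula over independent tuple events; the lineage, tuple names, and probabilities used here are hypothetical. The sketch estimates an answer's probability with the Karp-Luby-Madras estimator, a standard Monte-Carlo method for DNF probability. Whether the paper uses exactly this variant is an assumption. A brute-force evaluator over all possible worlds is included only to check the estimate on tiny inputs.

```python
import itertools
import random

# Hypothetical lineage representation: a DNF is a list of clauses, each a list
# of tuple ids. A clause is true in a possible world when all its tuples are
# present. Tuples are assumed independent; prob[t] is P(tuple t is present).


def clause_prob(clause, prob):
    """P(clause) = product of its (independent) tuple probabilities."""
    p = 1.0
    for t in set(clause):
        p *= prob[t]
    return p


def exact_prob(dnf, prob):
    """Sum the probabilities of all worlds satisfying the DNF (exponential)."""
    tuples = sorted({t for c in dnf for t in c})
    total = 0.0
    for bits in itertools.product([False, True], repeat=len(tuples)):
        world = {t for t, b in zip(tuples, bits) if b}
        weight = 1.0
        for t, b in zip(tuples, bits):
            weight *= prob[t] if b else 1.0 - prob[t]
        if any(set(c) <= world for c in dnf):
            total += weight
    return total


def karp_luby(dnf, prob, n_samples, rng=None):
    """Unbiased Karp-Luby-Madras estimate of P(dnf)."""
    rng = rng or random.Random()
    weights = [clause_prob(c, prob) for c in dnf]
    u = sum(weights)  # upper bound: sum of clause probabilities
    if u == 0.0:
        return 0.0
    tuples = sorted({t for c in dnf for t in c})
    hits = 0
    for _ in range(n_samples):
        # Choose clause i with probability P(C_i) / u.
        i = rng.choices(range(len(dnf)), weights=weights)[0]
        # Sample a world conditioned on C_i being true: force its tuples,
        # draw every other tuple independently.
        forced = set(dnf[i])
        world = {t for t in tuples if t in forced or rng.random() < prob[t]}
        # Count the sample only if i is the first clause the world satisfies,
        # so each satisfying world is counted once overall.
        first = next(j for j, c in enumerate(dnf) if set(c) <= world)
        if first == i:
            hits += 1
    return u * hits / n_samples


if __name__ == "__main__":
    lineage = [["r1", "s1"], ["r1", "s2"], ["r2", "s3"]]
    p = {t: 0.5 for t in ["r1", "r2", "s1", "s2", "s3"]}
    print(exact_prob(lineage, p))
    print(karp_luby(lineage, p, 20000, random.Random(0)))
```

On the hypothetical lineage (r1 ∧ s1) ∨ (r1 ∧ s2) ∨ (r2 ∧ s3) with every tuple at probability 0.5, the exact value is 17/32 = 0.53125, and the estimate should land close to it. The brute-force evaluator enumerates 2^n worlds, whereas each estimator sample costs time polynomial in the size of the formula, which is why sampling remains usable where exact evaluation does not.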

Copyright information

© Springer-Verlag 2006

Authors and Affiliations

  1. University of Washington, Seattle, USA