The VLDB Journal, 18:1117

Query evaluation over probabilistic XML

  • Benny Kimelfeld
  • Yuri Kosharovsky
  • Yehoshua Sagiv
Special Issue Paper

Abstract

Query evaluation over probabilistic XML is explored. The queries are twig patterns with projection, and the data is represented in terms of three models of probabilistic XML (that extend existing ones in the literature). The first model makes an assumption of independence among the probabilistic junctions, whereas the second model can encode probabilistic dependencies. The third model combines the first two and, hence, is the most general. An efficient algorithm (under data complexity) is given for query evaluation in the first model. In addition, various optimizations are proposed, and their effectiveness is shown both analytically and experimentally. For the other two models, it is shown that every query is either intractable or trivial. Nonetheless, efficient (additive and multiplicative) approximation algorithms are given for these two models. Finally, Boolean queries are enriched by allowing disjunctions and negations of branches. The above algorithm for the first model is extended to handle these queries. For the other two models, there is an efficient additive approximation, and a multiplicative one also exists if there is no negation; in addition, it is shown that if the query is non-monotonic, then no efficient multiplicative approximation exists unless NP = RP.
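To make two ideas from the abstract concrete (independence among probabilistic junctions, and additive approximation), here is a minimal sketch under heavily simplified assumptions. It is not the paper's algorithm: the `Node` class, the per-edge survival probabilities, and the single-label existence query are hypothetical stand-ins for the paper's models and twig patterns. The exact computation relies only on the independence of edges, and the estimator is plain Monte Carlo sampling with a sample size taken from Hoeffding's inequality.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical toy document node (not the paper's data model)."""
    label: str
    # Each entry is (p, child): the child is kept with probability p,
    # independently of every other edge in the tree.
    children: list = field(default_factory=list)


def exact_prob(node, target):
    """Pr(the subtree of `node` contains a node labeled `target`), given `node` exists."""
    if node.label == target:
        return 1.0
    miss = 1.0
    for p, child in node.children:
        # By independence, the probabilities that each child fails to
        # contribute a match multiply.
        miss *= 1.0 - p * exact_prob(child, target)
    return 1.0 - miss


def sample_has(node, target, rng):
    """Draw one random document lazily and test it for a `target` node."""
    if node.label == target:
        return True
    # Short-circuiting is safe: edges left unsampled cannot change the answer.
    return any(rng.random() < p and sample_has(child, target, rng)
               for p, child in node.children)


def hoeffding_samples(eps, delta):
    """Samples sufficient for additive error eps with probability >= 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))


def estimate_prob(root, target, eps, delta, seed=0):
    """Monte Carlo additive approximation of exact_prob(root, target)."""
    rng = random.Random(seed)
    n = hoeffding_samples(eps, delta)
    hits = sum(sample_has(root, target, rng) for _ in range(n))
    return hits / n


# Toy document: analytically, 1 - (1 - 0.8*0.5) * (1 - 0.5*1.0) = 0.7.
doc = Node("library", [
    (0.8, Node("book", [(0.5, Node("title"))])),
    (0.5, Node("book", [(1.0, Node("title"))])),
])
exact = exact_prob(doc, "title")
approx = estimate_prob(doc, "title", eps=0.05, delta=0.01)
```

The exact recursion runs in time linear in the tree size only because this toy model has independent edges and a single-label query. The sampling estimator does not depend on that structure, which is why sampling remains an option when the distribution encodes dependencies.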

Keywords

Probabilistic databases, Probabilistic XML, Query processing, Query optimization, Approximate query evaluation

Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  • Benny Kimelfeld (1)
  • Yuri Kosharovsky (2)
  • Yehoshua Sagiv (2)

  1. IBM Almaden Research Center, San Jose, USA
  2. The Hebrew University, Jerusalem, Israel
