Query evaluation over probabilistic XML

  • Special Issue Paper
The VLDB Journal

Abstract

Query evaluation over probabilistic XML is explored. The queries are twig patterns with projection, and the data is represented in terms of three models of probabilistic XML (that extend existing ones in the literature). The first model makes an assumption of independence among the probabilistic junctions, whereas the second model can encode probabilistic dependencies. The third model combines the first two and, hence, is the most general. An efficient algorithm (under data complexity) is given for query evaluation in the first model. In addition, various optimizations are proposed, and their effectiveness is shown both analytically and experimentally. For the other two models, it is shown that every query is either intractable or trivial. Nonetheless, efficient (additive and multiplicative) approximation algorithms are given for these two models. Finally, Boolean queries are enriched by allowing disjunctions and negations of branches. The above algorithm for the first model is extended to handle these queries. For the other two models, there is an efficient additive approximation, and a multiplicative one also exists if there is no negation; in addition, it is shown that if the query is non-monotonic, then no efficient multiplicative approximation exists unless NP = RP.
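The contrast drawn above between exact evaluation under independence and additive approximation can be illustrated with a toy sketch. It is not the algorithm of the paper. It assumes a simplified document in which every child edge is kept independently with a given probability, and the query is only "does some node labeled `target` exist", far simpler than a twig pattern with projection. The names `Node`, `prob_has_label`, `samples_needed` and `estimate` are hypothetical, and the sample size comes from the standard Hoeffding bound rather than from the paper.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    # Assumed toy model: each (probability, child) edge is kept independently.
    children: list = field(default_factory=list)


def prob_has_label(node, target):
    """Exact probability that a random document contains a `target` node."""
    if node.label == target:
        return 1.0
    miss = 1.0
    for p, child in node.children:
        # The child contributes a match only if its edge is kept (p)
        # and its own random subtree contains a match.
        miss *= 1.0 - p * prob_has_label(child, target)
    return 1.0 - miss


def samples_needed(eps, delta):
    """Hoeffding bound: samples for additive error eps with confidence 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def sample_has_label(node, target, rng):
    """Draw one random document top-down and test the query on it."""
    if node.label == target:
        return True
    return any(rng.random() < p and sample_has_label(child, target, rng)
               for p, child in node.children)


def estimate(node, target, eps, delta, seed=0):
    """Monte Carlo additive approximation of the query probability."""
    rng = random.Random(seed)
    n = samples_needed(eps, delta)
    hits = sum(sample_has_label(node, target, rng) for _ in range(n))
    return hits / n


# Hypothetical example document.
doc = Node("library", [
    (0.8, Node("book", [(0.5, Node("title"))])),
    (0.3, Node("title")),
])
exact = prob_has_label(doc, "title")   # 1 - (1 - 0.8*0.5) * (1 - 0.3) = 0.58
approx = estimate(doc, "title", eps=0.02, delta=0.01)
```

The bottom-up product is valid only because sibling edges are independent in this toy model. Once dependencies between junctions are allowed, that factorization no longer applies, whereas the sampling estimate needs only the ability to draw random documents.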

References

  1. Kimelfeld, B., Sagiv, Y.: Matching twigs in probabilistic XML. In: Proceedings of the Thirty Third International Conference on Very Large Data Bases, pp. 27–38. ACM Press, New York (2007)

  2. Kimelfeld, B., Kosharovsky, Y., Sagiv, Y.: Query efficiency in probabilistic XML models. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM Press, New York (2008)

  3. Pittarelli, M.: An algebra for probabilistic databases. IEEE Trans. Knowl. Data Eng. 6(2), 293–303 (1994)

  4. Re, C., Dalvi, N.N., Suciu, D.: Efficient top-k query evaluation on probabilistic data. In: Proceedings of the 23rd International Conference on Data Engineering, pp. 886–895. IEEE Computer Society, USA (2007)

  5. Dalvi, N.N., Suciu, D.: The dichotomy of conjunctive queries on probabilistic structures. In: Proceedings of the 26th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 293–302. ACM Press, New York (2007)

  6. Dalvi, N.N., Suciu, D.: Efficient query evaluation on probabilistic databases. VLDB J. 16(4), 523–544 (2007)

  7. Dey, D., Sarkar, S.: A probabilistic relational model and algebra. ACM Trans. Database Syst. 21(3), 339–369 (1996)

  8. Fuhr, N., Rölleke, T.: A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Trans. Inf. Syst. 15(1), 32–66 (1997)

  9. Lakshmanan, L.V.S., Leone, N., Ross, R.B., Subrahmanian, V.S.: ProbView: a flexible probabilistic database system. ACM Trans. Database Syst. 22(3), 419–469 (1997)

  10. Barbará, D., Garcia-Molina, H., Porter, D.: The management of probabilistic data. IEEE Trans. Knowl. Data Eng. 4(5), 487–502 (1992)

  11. Nierman, A., Jagadish, H.V.: ProTDB: probabilistic data in XML. In: Proceedings of 28th International Conference on Very Large Data Bases, pp. 646–657. Morgan Kaufmann, San Francisco (2002)

  12. Abiteboul, S., Senellart, P.: Querying and updating probabilistic information in XML. In: Advances in Database Technology—EDBT 2006, 10th International Conference on Extending Database Technology, Lecture Notes in Computer Science, vol. 3896, pp. 1059–1068. Springer, Berlin (2006)

  13. Senellart, P., Abiteboul, S.: On the complexity of managing probabilistic XML data. In: Proceedings of the 26th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 283–292. ACM Press, New York (2007)

  14. Hung, E., Getoor, L., Subrahmanian, V.S.: PXML: a probabilistic semistructured data model and algebra. In: Proceedings of the 19th International Conference on Data Engineering, pp. 467–478 (2003)

  15. Hung, E., Getoor, L., Subrahmanian, V.S.: Probabilistic interval XML. ACM Trans. Comput. Logic 8(4) (2007)

  16. van Keulen, M., de Keijzer, A., Alink, W.: A probabilistic XML approach to data integration. In: Proceedings of the 21st International Conference on Data Engineering, ICDE 2005, pp. 459–470. IEEE Computer Society, USA (2005)

  17. Li, T., Shao, Q., Chen, Y.: PEPX: a query-friendly probabilistic XML database. In: Proceedings of the 2006 ACM CIKM International Conference on Information and Knowledge Management, pp. 848–849. ACM Press, New York (2006)

  18. Abiteboul, S., Kimelfeld, B., Sagiv, Y., Senellart, P.: On the expressiveness of probabilistic XML models. VLDB J. (2009). doi:10.1007/s00778-009-0146-1

  19. Miklau, G., Suciu, D.: Containment and equivalence for an XPath fragment. In: Proceedings of the 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 65–76. ACM Press, New York (2002)

  20. Kimelfeld, B., Sagiv, Y.: Revisiting redundancy and minimization in an XPath fragment. In: 11th International Conference on Extending Database Technology, pp. 61–72. ACM Press, New York (2008)

  21. Bruno, N., Koudas, N., Srivastava, D.: Holistic twig joins: optimal XML pattern matching. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, pp. 310–321. ACM Press, New York (2002)

  22. Cohen, S., Kanza, Y., Kogan, Y.A., Sagiv, Y., Nutt, W., Serebrenik, A.: EquiX—a search and query language for XML. J. Am. Soc. Inf. Sci. Technol. 53(6), 454–466 (2002)

  23. Vardi, M.Y.: The complexity of relational query languages (extended abstract). In: Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing, pp. 137–146. ACM Press, New York (1982)

  24. Johnson, D., Yannakakis, M., Papadimitriou, C.: On generating all maximal independent sets. Inf. Process. Lett. 27, 119–123 (1988)

  25. Chandra, A.K., Merlin, P.M.: Optimal implementation of conjunctive queries in relational data bases. In: Conference Record of the Ninth Annual ACM Symposium on Theory of Computing, pp. 77–90. ACM Press, New York (1977)

  26. Papadimitriou, C.H., Yannakakis, M.: On the complexity of database queries. J. Comput. Syst. Sci. 58(3), 407–427 (1999)

  27. Downey, R.G., Fellows, M.R.: Fixed-parameter tractability and completeness. I. Basic results. SIAM J. Comput. 24(4), 873–921 (1995)

  28. Toda, S., Ogiwara, M.: Counting classes are at least as hard as the polynomial-time hierarchy. SIAM J. Comput. 21(2), 316–328 (1992)

  29. Grädel, E., Gurevich, Y., Hirsch, C.: The complexity of query reliability. In: Proceedings of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 227–234. ACM Press, New York (1998)

  30. Provan, J.S., Ball, M.O.: The complexity of counting cuts and of computing the probability that a graph is connected. SIAM J. Comput. 12(4), 777–788 (1983)

  31. Warren, D.S.: Memoing for logic programs. Commun. ACM 35(3), 93–111 (1992)

  32. Karp, R.M., Luby, M., Madras, N.: Monte-Carlo approximation algorithms for enumeration problems. J. Algorithms 10(3), 429–448 (1989)

  33. Ko, K.I.: Some observations on the probabilistic algorithms and NP-hard problems. Inf. Process. Lett. 14(1), 39–43 (1982)

  34. Zachos, S.: Probabilistic quantifiers and games. J. Comput. Syst. Sci. 36(3), 433–451 (1988)

  35. Roth, D.: On the hardness of approximate reasoning. Artif. Intell. 82(1–2), 273–302 (1996)

  36. Cohen, S., Kimelfeld, B., Sagiv, Y.: Incorporating constraints in probabilistic XML. In: Proceedings of the 27th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 109–118. ACM Press, New York (2008)

  37. Cohen, S., Kimelfeld, B., Sagiv, Y.: Running tree automata on probabilistic XML. In: Proceedings of the Twenty-Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 227–236. ACM (2009)

  38. Meyer, A.R.: Weak monadic second-order theory of successor is not elementary recursive. Log. Colloquium 453, 132–154 (1975)

  39. Frick, M., Grohe, M.: The complexity of first-order and monadic second-order logic revisited. In: LICS, pp. 215–224. IEEE Computer Society, USA (2002)

Author information

Corresponding author

Correspondence to Benny Kimelfeld.

Additional information

Some of the results described in this paper were reported in [1,2]. This research was supported by The Israel Science Foundation (Grant 893/05). Some of the work of Benny Kimelfeld was done while he was at The Hebrew University.

About this article

Cite this article

Kimelfeld, B., Kosharovsky, Y. & Sagiv, Y. Query evaluation over probabilistic XML. The VLDB Journal 18, 1117–1140 (2009). https://doi.org/10.1007/s00778-009-0150-5

  • DOI: https://doi.org/10.1007/s00778-009-0150-5
