On the expressiveness of probabilistic XML models

  • Special Issue Paper
The VLDB Journal

Abstract

Various known models of probabilistic XML can be represented as instantiations of the abstract notion of p-documents. In addition to ordinary nodes, p-documents have distributional nodes that specify the possible worlds and their probabilistic distribution. Particular families of p-documents are determined by the types of distributional nodes that can be used as well as by the structural constraints on the placement of those nodes in a p-document. Some of the resulting families provide natural extensions and combinations of previously studied probabilistic XML models. The focus of the paper is on the expressive power of families of p-documents. In particular, two main issues are studied. The first is the ability to (efficiently) translate a given p-document of one family into another family. The second is closure under updates, namely, the ability to (efficiently) represent the result of updating the instances of a p-document of a given family as another p-document of that family. For both issues, we distinguish two variants corresponding to value-based and object-based semantics of p-documents.
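
To make the notion of a p-document concrete, the following is a minimal sketch of how distributional nodes induce a distribution over possible worlds. The abstract does not name specific distributional node types, so the two kinds used here are assumptions chosen for illustration: `Ind` keeps each child independently with its own probability, and `Mux` keeps at most one child. The class names, the helper functions, and the example document are all hypothetical. Worlds are treated as unordered trees and structurally equal worlds are merged, which is only one possible reading of a value-based semantics. The enumeration is exponential in general and is meant to illustrate the semantics, not to be an efficient algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass
from fractions import Fraction
from itertools import product


@dataclass(frozen=True)
class Ordinary:
    """An ordinary node: a label and a tuple of child nodes."""
    label: str
    children: tuple = ()


@dataclass(frozen=True)
class Ind:
    """Assumed distributional node: each (child, prob) is kept independently."""
    choices: tuple


@dataclass(frozen=True)
class Mux:
    """Assumed distributional node: at most one (child, prob) is kept."""
    choices: tuple


def _combine(options):
    """Concatenate forests and multiply probabilities of independent options."""
    forest, prob = (), Fraction(1)
    for f, p in options:
        forest += f
        prob *= p
    return forest, prob


def forests(node):
    """Return (forest, probability) pairs; a forest is a tuple of plain trees."""
    if isinstance(node, Ordinary):
        out = []
        for combo in product(*(forests(c) for c in node.children)):
            kids, prob = _combine(combo)
            # Sorting the children models unordered trees (an assumption).
            out.append((((node.label, tuple(sorted(kids))),), prob))
        return out
    if isinstance(node, Ind):
        per_child = []
        for child, p in node.choices:
            kept = [(f, p * q) for f, q in forests(child)]
            per_child.append(kept + [((), 1 - p)])  # child dropped
        return [_combine(combo) for combo in product(*per_child)]
    if isinstance(node, Mux):
        opts = [(f, p * q) for child, p in node.choices for f, q in forests(child)]
        rest = 1 - sum(p for _, p in node.choices)
        if rest > 0:
            opts.append(((), rest))  # no child chosen
        return opts
    raise TypeError(f"unknown node type: {node!r}")


def possible_worlds(root):
    """Map each distinct world (a tree with an ordinary root) to its probability."""
    dist = defaultdict(Fraction)
    for forest, p in forests(root):
        if p > 0:
            dist[forest[0]] += p
    return dict(dist)


# Hypothetical p-document: a person whose name is uncertain (mutually
# exclusive alternatives) and who may or may not have a phone child.
doc = Ordinary("person", (
    Mux(((Ordinary("Ann"), Fraction(3, 5)), (Ordinary("Anne"), Fraction(2, 5)))),
    Ind(((Ordinary("phone"), Fraction(1, 2)),)),
))
worlds = possible_worlds(doc)
for tree, prob in worlds.items():
    print(prob, tree)
```

Exact `Fraction` arithmetic is used so that the probabilities of all worlds sum to exactly 1, which makes the sketch easy to check by hand.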

References

  1. Kimelfeld, B., Kosharovsky, Y., Sagiv, Y.: Query efficiency in probabilistic XML models. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM Press, New York (2008)

  2. Senellart, P., Abiteboul, S.: On the complexity of managing probabilistic XML data. In: Proceedings of the 26th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 283–292. ACM Press, New York (2007)

  3. Nierman, A., Jagadish, H.V.: ProTDB: Probabilistic data in XML. In: VLDB 2002, Proceedings of 28th International Conference on Very Large Data Bases, pp. 646–657. Morgan Kaufmann, Menlo Park (2002)

  4. Hung, E., Getoor, L., Subrahmanian, V.S.: PXML: A probabilistic semistructured data model and algebra. In: Proceedings of the 19th International Conference on Data Engineering, pp. 467–478 (2003)

  5. Hung, E., Getoor, L., Subrahmanian, V.S.: Probabilistic interval XML. ACM Trans. Comput. Logic 8(4) (2007)

  6. van Keulen, M., de Keijzer, A., Alink, W.: A probabilistic XML approach to data integration. In: Proceedings of the 21st International Conference on Data Engineering, ICDE 2005, pp. 459–470. IEEE Computer Society, Washington, DC (2005)

  7. Abiteboul, S., Senellart, P.: Querying and updating probabilistic information in XML. In: Advances in Database Technology—EDBT 2006, 10th International Conference on Extending Database Technology. Lecture Notes in Computer Science, vol. 3896, pp. 1059–1068. Springer, Berlin (2006)

  8. Senellart, P.: Comprendre le Web caché. Understanding the Hidden Web, vol. 11. Ph.D. thesis, Université Paris-Sud (2007)

  9. Kimelfeld, B., Sagiv, Y.: Matching twigs in probabilistic XML. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 27–38. ACM Press, New York (2007)

  10. Kimelfeld, B., Kosharovsky, Y., Sagiv, Y.: Query evaluation over probabilistic XML. VLDB J. (2009)

  11. Li, T., Shao, Q., Chen, Y.: PEPX: a query-friendly probabilistic XML database. In: Proceedings of the 2006 ACM CIKM International Conference on Information and Knowledge Management, pp. 848–849. ACM Press, New York (2006)

  12. Dalvi, N.N., Suciu, D.: Management of probabilistic data: foundations and challenges. In: Proceedings of the 26th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 1–12. ACM Press, New York (2007)

  13. Widom, J.: Trio: A system for integrated management of data, accuracy, and lineage. In: CIDR 2005, 2nd Biennial Conference on Innovative Data Systems Research, pp. 262–276 (2005)

  14. Koch, C.: MayBMS: A system for managing large uncertain and probabilistic databases. In: Aggarwal, C. (ed.) Managing and Mining Uncertain Data. Springer, Berlin (2009)

  15. Imieliński, T., Lipski, W. Jr.: Incomplete information in relational databases. J. ACM 31(4), 761–791 (1984)

  16. Green, T.J., Tannen, V.: Models for incomplete and probabilistic information. In: Current Trends in Database Technology—EDBT 2006, EDBT 2006 Workshops PhD, DataX, IIDB, IIHA, ICSNW, QLQP, PIM, PaRMA, and Reactivity on the Web. Lecture Notes in Computer Science, vol. 4254, pp. 278–296. Springer, Berlin (2006)

  17. XML::DB Initiative: XUpdate. http://xmldb-org.sourceforge.net/xupdate/ (2000). Working Draft

  18. W3C: XQuery Update facility. http://www.w3.org/TR/xquery-update-10/ (2008). Candidate Recommendation

  19. W3C: XML Path language (XPath). http://www.w3.org/TR/xpath (1999). Recommendation

  20. W3C: XQuery 1.0: An XML query language. http://www.w3.org/TR/xquery/ (2007). Recommendation

  21. Cohen, S., Kimelfeld, B., Sagiv, Y.: Incorporating constraints in probabilistic XML. In: Proceedings of the 27th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 109–118. ACM Press, New York (2008)

  22. Cohen, S., Kimelfeld, B., Sagiv, Y.: Running tree automata on probabilistic XML. In: Proceedings of the 28th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (2009, to appear)

Author information

Corresponding author

Correspondence to Benny Kimelfeld.

Additional information

Some of the results described in this paper were reported in [1,2].

The work of Abiteboul and Senellart was supported by the Agence Nationale de la Recherche under grant Docflow 06-MDCA-005, and by the Webdam Grant of the European Research Council.

Some of the work of Benny Kimelfeld was done while he was at The Hebrew University.

The work of Kimelfeld and Sagiv was supported by The Israel Science Foundation (Grant 893/05).

About this article

Cite this article

Abiteboul, S., Kimelfeld, B., Sagiv, Y. et al. On the expressiveness of probabilistic XML models. The VLDB Journal 18, 1041–1064 (2009). https://doi.org/10.1007/s00778-009-0146-1

  • DOI: https://doi.org/10.1007/s00778-009-0146-1
