Path Summaries and Path Partitioning in Modern XML Databases

World Wide Web

Abstract

XML path summaries are compact structures representing all the simple parent-child paths of an XML document. Such paths have also been used in many works as a basis for partitioning the document’s content in a persistent store, in the form of path indices or path tables. We revisit the notions of path summaries and of the path-driven storage model in the context of current-day XML databases. This context is characterized by complex queries, typically expressed in an XQuery subset, and by the presence of efficient encoding techniques such as structural node identifiers. We review a path summary’s many uses for query optimization and give them a common basis, namely relevant paths. We discuss summary-based tree pattern minimization and present some efficient summary-based minimization heuristics. We consider relevant path computation and provide a time- and memory-efficient computation algorithm. We combine the principle of path partitioning with the presence of structural identifiers in a simple path-partitioned storage model, which allows for selective data access and efficient query plans. This model improves the efficiency of twig query processing by up to two orders of magnitude over the comparable tag-partitioned indexing model. We have implemented the path-partitioned storage model and path summaries in the XQueC compressed database prototype [8]. We present an experimental evaluation of a path summary’s practical feasibility and of tree pattern matching in a path-partitioned store.
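To make the two central notions concrete, the following is a minimal illustrative sketch, not the algorithm or storage layout used by the authors or by XQueC. It assumes a toy document, uses (pre, post) traversal ranks as one possible form of structural node identifier, and introduces the hypothetical helper names `path_partition` and `is_ancestor`. The set of distinct root-to-node label paths plays the role of the path summary, and grouping node identifiers by path gives a simple path-partitioned layout.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def path_partition(xml_text):
    """Group element nodes by their root-to-node label path.

    Returns a dict mapping each distinct path (e.g. "/site/item") to the
    list of (pre, post) identifiers of the elements found on that path.
    The (pre, post) scheme is an assumption made for this sketch.
    """
    root = ET.fromstring(xml_text)
    partitions = defaultdict(list)
    counters = {"pre": 0, "post": 0}

    def visit(node, parent_path):
        path = parent_path + "/" + node.tag
        counters["pre"] += 1           # rank in pre-order traversal
        pre = counters["pre"]
        for child in node:
            visit(child, path)
        counters["post"] += 1          # rank in post-order traversal
        partitions[path].append((pre, counters["post"]))

    visit(root, "")
    return dict(partitions)


def is_ancestor(a, d):
    """True if the node with identifier a is an ancestor of the node d."""
    return a[0] < d[0] and a[1] > d[1]


# Hypothetical toy document, used only for illustration.
DOC = (
    "<site>"
    "<item><name>pen</name></item>"
    "<item><name>ink</name><note/></item>"
    "</site>"
)

parts = path_partition(DOC)
summary = sorted(parts)  # the distinct paths: a toy path summary
print(summary)
print(parts["/site/item"], parts["/site/item/name"])
```

In this layout, a query such as `/site/item/name` only needs to read the partition stored under that one path, and ancestor-descendant relationships between nodes from different partitions can be checked by comparing identifiers with `is_ancestor`, without navigating the document.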

References

  1. Aboulnaga, A., Alameldeen, A.R., Naughton, J.F.: Estimating the selectivity of XML path expressions for internet scale applications. In: VLDB (2001)

  2. Al-Khalifa, S., Jagadish, H.V., Patel, J.M., Wu, Y., Koudas, N., Srivastava, D.: Structural joins: A primitive for efficient XML query pattern matching. In: ICDE (2002)

  3. Amer-Yahia, S., Cho, S., Lakshmanan, L.: Minimization of tree pattern queries. In: SIGMOD (2001)

  4. Arion, A., Benzaken, V., Manolescu, I.: XML Access Modules: Towards Physical Data Independence in XML Databases. XIME-P Workshop (2005)

  5. Arion, A., Benzaken, V., Manolescu, I., Vijay, R.: ULoad: choosing the right storage for your XML application. In: VLDB (2005)

  6. Arion, A., Benzaken, V., Manolescu, I., Vijay, R.: Algebra-based tree pattern extraction in XQuery. In: FQAS Conference (2006)

  7. Arion, A., Bonifati, A., Costa, G., D’Aguanno, S., Manolescu, I., Pugliese, A.: XQueC: Pushing queries to compressed XML data (demo). In: VLDB (2003)

  8. Arion, A., Bonifati, A., Costa, G., D’Aguanno, S., Manolescu, I., Pugliese, A.: Efficient query evaluation over compressed XML data. In: EDBT (2004)

  9. Arion, A., Bonifati, A., Manolescu, I., Pugliese, A.: Path summaries and path partitioning in modern XML databases (poster). In: WWW (2006)

  10. Balmin, A., Ozcan, F., Beyer, K., Cochrane, R., Pirahesh, H.: A framework for using materialized XPath views in XML query processing. In: VLDB (2004)

  11. Barbosa, D., Barta, A., Mendelzon, A., Mihaila, G.: The Toronto XML engine. In: WIIW Workshop (2001)

  12. Barta, A., Consens, M., Mendelzon, A.: Benefits of path summaries in an XML query optimizer supporting multiple access methods. In: VLDB (2005)

  13. Bruno, N., Koudas, N., Srivastava, D.: Holistic twig joins: optimal XML pattern matching. In: SIGMOD (2002)

  14. Buneman, P., Grohe, M., Koch, C.: Path queries on compressed XML. In: VLDB (2003)

  15. Buneman, P., Choi, B., Fan, W., Hutchison, R., Mann, R., Viglas, S.: Vectorizing and querying large XML repositories. In: ICDE, pp. 261–272 (2005)

  16. Chen, Z., Jagadish, H.V., Lakshmanan, L., Paparizos, S.: From tree patterns to generalized tree patterns: on efficient evaluation of XQuery. In: VLDB (2003)

  17. Chien, S., Vagena, Z., Zhang, D., Tsotras, V.: Efficient structural joins on indexed XML documents. In: VLDB (2002)

  18. Chung, C.W., Min, J.K., Shim, K.: APEX: an adaptive path index for XML data. In: SIGMOD (2002)

  19. Deutsch, A., Papakonstantinou, Y., Xu, Y.: The NEXT logical framework for XQuery. In: VLDB, pp. 168–179 (2004)

  20. Drukh, N., Polyzotis, N., Garofalakis, M.N., Matias, Y.: Fractional XSKETCH synopses for XML databases. In: XSym (2004)

  21. Fagin, R.: Multivalued dependencies and a new normal form for relational databases. ACM Trans. Database Syst. 2(3), 262–278 (1977)

  22. Fiebig, T., Helmer, S., Kanne, C., Moerkotte, G., Neumann, J., Schiele, R., Westmann, T.: Anatomy of a native XML base management system. VLDB J. 11(4), 292–314 (2002)

  23. Florescu, D., Kossmann, D.: Storing and querying XML data using an RDMBS. In: IEEE Data Eng. Bull. (1999)

  24. Goldman, R., Widom, J.: Dataguides: enabling query formulation and optimization in semistructured databases. In: VLDB. Athens, Greece (1997)

  25. Gottlob, G., Koch, C., Pichler, R.: The complexity of XPath query evaluation. In: PODS (2003)

  26. Halverson, A., Burger, J., Galanis, L., Kini, A., Krishnamurthy, R., Rao, A.N., Tian, F., Viglas, S., Wang, Y., Naughton, J.F., DeWitt, D.J.: Mixed mode XML query processing. In: VLDB (2003)

  27. He, H., Yang, J.: Multiresolution Indexing of XML for Frequent Queries. In: ICDE (2004)

  28. Jiang, H., Lu, H., Wang, W., Yu, J.: Path materialization revisited: an efficient XML storage model. In: AICE (2001)

  29. Jagadish, H.V., Al-Khalifa, S., Chapman, A., Lakshmanan, L.V.S., Nierman, A., Paparizos, S., Patel, J., Srivastava, D., Wiwatwattana, N., Wu, Y., Yu, C.: Timber: a native XML database. VLDB J. 11(4) (2002)

  30. Kaushik, R., Bohannon, P., Naughton, J., Korth, H.: Covering indexes for branching path queries. In: SIGMOD (2002)

  31. Kaushik, R., Shenoy, P., Bohannon, P., Gudes, E.: Exploiting local similarity for indexing paths in graph-structured data. In: ICDE (2002)

  32. Lakshmanan, L., Ramesh, G., Wang, H., Zhao, Z.: On testing satisfiability of tree pattern queries. In: VLDB (2004)

  33. Lee, M., Li, H., Hsu, W., Ooi, B.: A statistical approach for XML query size estimation. In: DataX workshop (2004)

  34. Manolescu, I., Arion, A., Bonifati, A., Pugliese, A.: Un modèle de stockage xml basé sur les séquences. Ing. Syst. Inf. 2(10), 9–37 (2005)

  35. Manolescu, I., Benzaken, V., Arion, A., Papakonstantinou, Y.: Structured materialized views for XML queries. INRIA Tech. Report No. 1233, Available at http://hal.inria.fr.

  36. McHugh, J., Widom, J., Abiteboul, S., Luo, Q., Rajaraman, A.: Indexing semistructured data. Technical Report (1998)

  37. Mignet, L., Barbosa, D., Veltri, P.: The XML web: a first study. In: WWW Conference (2003)

  38. Miklau, G., Suciu, D.: Containment and equivalence for an xpath fragment. In: PODS, pp. 65–76 (2002)

  39. Milo, T., Suciu, D.: Index structures for path expressions. In: ICDT (1999)

  40. O’Neil, P., O’Neil, E., Pal, S., Cseri, I., Schaller, G., Westbury, N.: ORDPATHs: insert-friendly XML node labels. In: SIGMOD (2004)

  41. Nestorov, S., Ullman, J.D., Wiener, J.L., Chawathe, S.S.: Representative objects: concise representations of semistructured, hierarchical data. In: ICDE (1997)

  42. Paparizos, S., Jagadish, H.V.: Pattern tree algebras: sets or sequences? In: VLDB (2005)

  43. Paparizos, S., Wu, Y., Lakshmanan, L., Jagadish, H.: Tree logical classes for the efficient evaluation of XQuery. In: SIGMOD (2004)

  44. Polyzotis, N., Garofalakis, M.N.: Statistical synopses for graph-structured XML databases. In: SIGMOD (2002)

  45. Polyzotis, N., Garofalakis, M.N.: Structure and value synopses for xml data graphs. In: VLDB (2002)

  46. Qun, C., Lim, A., Ong, K.W.: D(k)-Index: an adaptive structural summary for graph-structured data. In: SIGMOD (2003)

  47. Shanmugasundaram, J., Kiernan, J., Shekita, E., Fan, C., Funderburk, J.: Querying XML views of relational data. In: VLDB (2001)

  48. Tatarinov, I., Viglas, S., Beyer, K., Shanmugasundaram, J., Shekita, E., Zhang, C.: Storing and querying ordered XML using a relational database system. In: SIGMOD (2002)

  49. Teubner, J., Grust, T., van Keulen, M.: Bridging the GAP between relational and native XML storage with staircase join. In: VLDB (2003)

  50. W3: The extensible markup language (XML). www.w3.org/TR/XML (2006)

  51. Schmidt, A.: The XMark benchmark. www.xml-benchmark.org (2002)

  52. Marchiori, M.: The XQuery 1.0 language. www.w3.org/XML/Query (2000)

  53. Ullman, J.: Principles of database and knowledge-base systems. Computer Science Press (1989)

  54. University of Washington’s XML repository www.cs.washington.edu/research/xmldatasets (2004)

  55. XSum: www-rocq.inria.fr/gemo/XSum (2005)

  56. Xu, W., Ozsoyoglu, M.: Rewriting XPath queries using materialized views. In: VLDB (2005)

  57. Yoshikawa, M., Amagasa, T., Uemura, T., Shimura, S.: XRel: a path-based approach to storage and retrieval of XML documents using RDBMSs. In: ACM TOIT (2001)

Author information

Corresponding author

Correspondence to Ioana Manolescu.

About this article

Cite this article

Arion, A., Bonifati, A., Manolescu, I. et al. Path Summaries and Path Partitioning in Modern XML Databases. World Wide Web 11, 117–151 (2008). https://doi.org/10.1007/s11280-007-0036-7

  • DOI: https://doi.org/10.1007/s11280-007-0036-7
