XML indexing and storage: fulfilling the wish list

  • Regular Paper
Computer Science - Research and Development

Abstract

XML indexing and storage (XMIS) techniques are crucial for the functionality and overall performance of an XML database management system (XDBMS). Because of the complexity of XQuery and the performance demands of XML query processing, efficient path processing operators, including those for tree-pattern queries (so-called twigs), are urgently needed, and tailor-made indexes and their flexible use are indispensable for them. Although XML indexing and storage are standard problems and manifold approaches have been proposed in the last decade, solutions that are adaptive and broad enough to support satisfactory query evaluation for all path processing operators are still missing in the XDBMS context. Therefore, we think it is worthwhile to take a step back and look at the complete picture to derive a salient and holistic solution. To do so, we first compile an XMIS wish list containing what, in our opinion, are the essential functional storage and indexing requirements of a modern XDBMS. With these desiderata in mind, we then develop a new XMIS scheme which, by reconsidering previous work, can be seen as a practical and general approach to XML storage and indexing. Interestingly, by working on both problems at the same time, we can make the storage and index managers live in a kind of symbiotic partnership, because the document store reuses ideas originally proposed by the indexing community and vice versa. The XMIS scheme is implemented in XTC, an XDBMS used for empirical tests.

Notes

  1. To avoid an application-specific focus, these properties are not ranked. For a universal approach, all properties should have the same importance.

  2. All variations of document stores and index types are implemented using B*-trees. With code reuse for the base structures, the tree entries only differ in the representation of keys and values.

  3. In the following, we are only interested in the path up to the root for a given PCR. Therefore, the relative order among siblings is not relevant, e.g., all permutations of elements issn, name, publisher, and issue as children of journal may appear in the document.

  4. For uniprot, the structure-related savings of the pc and po formats amount to 465 and 731 Mbytes, respectively, relative to the naive format. Content compression would additionally reduce the content part by 23–35%.

  5. Although we simplified the path specifications for presentation purposes, their XPath equivalents can also express twig queries: several path specifications are simply combined (joined), which makes them applicable to twig operators.

  6. Note that subscript D and type T are omitted where unambiguous.
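
The point made in Note 3 can be illustrated with a minimal sketch. It assumes that a PCR identifies one distinct root-to-node label path in a path summary; that reading, the helper name `path_classes`, and the integer ids are illustrative assumptions and not the XTC implementation. The sketch shows that two documents whose journal children appear in different orders yield the same set of root-to-node paths.

```python
import xml.etree.ElementTree as ET


def path_classes(xml_text):
    """Map each distinct root-to-element label path to a small integer id.

    Hypothetical helper: the ids stand in for PCRs and are assigned in
    order of first occurrence during a depth-first traversal.
    """
    root = ET.fromstring(xml_text)
    classes = {}

    def visit(elem, prefix):
        path = prefix + (elem.tag,)
        # Register the path only the first time it is seen.
        classes.setdefault(path, len(classes) + 1)
        for child in elem:
            visit(child, path)

    visit(root, ())
    return classes


# Two permutations of the children named in Note 3.
doc_a = "<journal><issn/><name/><publisher/><issue/></journal>"
doc_b = "<journal><issue/><publisher/><name/><issn/></journal>"

paths_a = set(path_classes(doc_a))
paths_b = set(path_classes(doc_b))

# Sibling order differs, but the set of paths up to the root is identical.
print(paths_a == paths_b)
```

In this sketch the numeric ids may differ between the two documents because they depend on traversal order; only the set of paths is order-independent, which is the property the note relies on.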

Author information

Corresponding author

Correspondence to Theo Härder.

About this article

Cite this article

Mathis, C., Härder, T., Schmidt, K. et al. XML indexing and storage: fulfilling the wish list. Comput Sci Res Dev 30, 51–68 (2015). https://doi.org/10.1007/s00450-012-0204-6

  • DOI: https://doi.org/10.1007/s00450-012-0204-6
