Efficient Memory Representation of XML Documents

  • Conference paper
Database Programming Languages (DBPL 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3774)

Abstract

Implementations that load XML documents and give access to them via, e.g., the DOM suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document itself. A considerable share of this memory is needed to store the tree structure of the document. Here a technique is presented for representing the tree structure of an XML document efficiently. The representation exploits the high regularity of XML documents by “compressing” their tree structure, that is, by detecting and removing repetitions of tree patterns. Basic tree operations, such as traversal along edges, remain available on the compressed representation. Queries (in particular, bulk operations) can therefore be executed directly, without prior decompression. For certain tasks, such as validation against an XML type or checking equality of documents, the representation admits provably more efficient algorithms than those running on conventional representations.
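The central idea of the abstract, storing each repeated tree pattern only once while still being able to navigate the structure, can be illustrated by its simplest special case: sharing identical subtrees, which turns the tree into a directed acyclic graph (DAG). The Python sketch below is an illustration under stated assumptions, not the technique of the paper. The names `Tree`, `Dag`, `add`, `children` and `expanded_size` are hypothetical, text content and attributes are ignored, and only complete identical subtrees are shared (the "tree patterns" of the abstract may be more general than that). It shows edge traversal and a bulk computation running directly on the shared nodes, and it shows equality of subtrees reducing to a comparison of node ids.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Tree:
    """Hypothetical element-only XML tree: a tag name and ordered children."""
    label: str
    children: Tuple["Tree", ...] = ()


# A DAG node is stored as (label, ids of its children); ids index Dag.nodes.
Node = Tuple[str, Tuple[int, ...]]


class Dag:
    """Minimal subtree sharing via hash-consing: each distinct subtree is stored once."""

    def __init__(self) -> None:
        self.nodes: List[Node] = []
        self._ids: Dict[Node, int] = {}

    def add(self, t: Tree) -> int:
        # Post-order: insert the children first so their ids are known.
        key: Node = (t.label, tuple(self.add(c) for c in t.children))
        if key not in self._ids:
            # First occurrence of this subtree: allocate a new shared node.
            self._ids[key] = len(self.nodes)
            self.nodes.append(key)
        # Repeated subtree: return the id of the existing node.
        return self._ids[key]

    def label(self, n: int) -> str:
        return self.nodes[n][0]

    def children(self, n: int) -> Tuple[int, ...]:
        # Navigation along edges works directly on the shared nodes.
        return self.nodes[n][1]

    def expanded_size(self, root: int) -> int:
        """Node count of the uncompressed tree, evaluated once per DAG node."""
        memo: Dict[int, int] = {}

        def go(n: int) -> int:
            if n not in memo:
                memo[n] = 1 + sum(go(c) for c in self.children(n))
            return memo[n]

        return go(root)


if __name__ == "__main__":
    book = Tree("book", (Tree("title"), Tree("author")))
    doc = Tree("bib", tuple(book for _ in range(1000)))
    dag = Dag()
    root = dag.add(doc)
    print(len(dag.nodes))           # 4 shared nodes: title, author, book, bib
    print(dag.expanded_size(root))  # 3001 nodes in the original tree
    # Within one Dag, equal subtrees get the same id, so equality is an int compare.
    print(dag.add(Tree("book", (Tree("title"), Tree("author")))) == dag.children(root)[0])  # True
```

In this sketch the 1000 identical `book` elements collapse into one shared node, but the `bib` node still lists 1000 child ids, because plain subtree sharing does not compress repeated runs of siblings; a more general pattern-based scheme would be needed for that.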

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Busatto, G., Lohrey, M., Maneth, S. (2005). Efficient Memory Representation of XML Documents. In: Bierman, G., Koch, C. (eds) Database Programming Languages. DBPL 2005. Lecture Notes in Computer Science, vol 3774. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11601524_13

  • DOI: https://doi.org/10.1007/11601524_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30951-2

  • Online ISBN: 978-3-540-31445-5

  • eBook Packages: Computer Science, Computer Science (R0)
