Efficient Memory Representation of XML Documents

  • Giorgio Busatto
  • Markus Lohrey
  • Sebastian Maneth
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3774)

Abstract

Implementations that load XML documents and give access to them via, e.g., the DOM, suffer from huge memory demands: the space needed to load an XML document is usually many times larger than the size of the document. A considerable amount of memory is needed to store the tree structure of the XML document. Here a technique is presented that allows the tree structure of an XML document to be represented efficiently. The representation exploits the high regularity in XML documents by “compressing” their tree structure, that is, by detecting and removing repetitions of tree patterns. The functionality of basic tree operations, like traversal along edges, is preserved in the compressed representation. This allows queries (and in particular, bulk operations) to be executed directly, without prior decompression. For certain tasks like validation against an XML type or checking equality of documents, the representation allows for provably more efficient algorithms than those running on conventional representations.
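To make the idea of removing repetitions concrete, the following is a minimal sketch of a simpler, related technique: sharing identical subtrees so that each distinct subtree is stored only once. It is not the representation described in the abstract, which targets repeated tree patterns more generally. The tuple encoding of trees and the names `SharedForest`, `add`, `label`, `child`, and `tree_size` are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: a tree is (label, [child trees]).
Key = Tuple[str, Tuple[int, ...]]


class SharedForest:
    """Stores every distinct subtree exactly once (subtree sharing)."""

    def __init__(self) -> None:
        self._ids: Dict[Key, int] = {}
        self.nodes: List[Key] = []

    def add(self, tree) -> int:
        """Insert a tree and return the id of its shared root node."""
        label, children = tree
        key = (label, tuple(self.add(c) for c in children))
        if key not in self._ids:
            self._ids[key] = len(self.nodes)
            self.nodes.append(key)
        return self._ids[key]

    def label(self, node_id: int) -> str:
        return self.nodes[node_id][0]

    def child(self, node_id: int, i: int) -> int:
        """Follow the i-th child edge without unfolding the tree."""
        return self.nodes[node_id][1][i]


def tree_size(tree) -> int:
    """Number of nodes in the uncompressed tree."""
    _, children = tree
    return 1 + sum(tree_size(c) for c in children)


# A highly regular document: three structurally identical <book> elements.
book = ("book", [("title", []), ("author", [])])
doc = ("library", [book, book, book])

forest = SharedForest()
root = forest.add(doc)
print(tree_size(doc), len(forest.nodes))
```

In this sketch the ten-node tree is stored as four shared nodes, child edges can still be followed through `child`, and two documents added to the same forest are equal exactly when they receive the same root id.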

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Giorgio Busatto (1)
  • Markus Lohrey (2)
  • Sebastian Maneth (3)
  1. Department für Informatik, Universität Oldenburg, Germany
  2. FMI, Universität Stuttgart, Germany
  3. Faculté I & C, EPFL, Switzerland
