Efficient Query Evaluation over Compressed XML Data

  • Andrei Arion
  • Angela Bonifati
  • Gianni Costa
  • Sandra D’Aguanno
  • Ioana Manolescu
  • Andrea Pugliese
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2992)

Abstract

XML suffers from the major limitation of high redundancy. Although compression can be beneficial for XML data, compressed data can seldom be browsed and queried efficiently. To address this problem, we propose XQueC, an [XQue]ry processor and [C]ompressor, which covers a large set of XQuery queries in the compressed domain. We shred compressed XML into suitable data structures, aiming both to reduce memory usage at query time and to query the data while it is compressed. XQueC is the first system to take advantage of a query workload to choose the compression algorithms and to group the compressed data granules according to their common properties. Through experiments, we show that good trade-offs between compression ratio and query capability can be achieved in several real cases, such as those covered by an XML benchmark. On average, XQueC improves over previous XML query-aware compression systems, while remaining reasonably close to general-purpose query-unaware XML compressors. Finally, query execution times (QETs) for a wide variety of queries show that XQueC can reach speeds comparable to XQuery engines on uncompressed data.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Andrei Arion (1)
  • Angela Bonifati (2)
  • Gianni Costa (2)
  • Sandra D’Aguanno (1)
  • Ioana Manolescu (1)
  • Andrea Pugliese (3)

  1. INRIA Futurs, Parc Club Orsay-Universite, Orsay Cedex, France
  2. Icar-CNR, Rende (CS), Italy
  3. DEIS, University of Calabria, Rende (CS), Italy
