Compressing XML Data Streams with DAG+BSBC

  • Stefan Böttcher
  • Rita Hartel
  • Christian Heinzemann
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 18)

Abstract

Whenever the growing amount of XML data that has to be stored, processed, exchanged, or transmitted becomes a major cost driver or performance bottleneck, XML compression is an important way to mitigate these problems. However, many applications, e.g., those exchanging XML data streams, also require efficient path query processing on the structure of compressed XML data streams. We present an XML compression technique called DAG+BSBC, which extends Bit-Stream-Based-Compression (BSBC) [3] by a sparse index to compressed constants that reflects DAG pointers. Furthermore, DAG+BSBC supports XML stream compression and queries on compressed data, and it provides a compression ratio that not only significantly outperforms that of other queriable XML compression techniques, like XGrind, but is also very competitive with non-queriable compression techniques like gzip and XMill.
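
The abstract does not spell out how DAG pointers arise, so the following is only a minimal sketch of the general idea behind DAG-based structure compression: identical subtrees are stored once and referenced by pointer. It is not the DAG+BSBC algorithm itself. The function name `dag_compress` and the node-table layout are hypothetical, and the sketch assumes that only element structure matters (text and attributes are ignored).

```python
import xml.etree.ElementTree as ET


def dag_compress(xml_text):
    """Share identical subtrees (hypothetical helper, structure only).

    Returns (nodes, root_id), where nodes[i] is (tag, child_ids) and each
    distinct subtree appears exactly once in the table.
    """
    table = {}  # (tag, child ids) -> node id
    nodes = []  # node id -> (tag, child ids)

    def visit(elem):
        # Children are numbered first, so equal subtrees yield equal keys.
        key = (elem.tag, tuple(visit(child) for child in elem))
        if key not in table:
            table[key] = len(nodes)
            nodes.append(key)
        return table[key]

    root_id = visit(ET.fromstring(xml_text))
    return nodes, root_id


doc = "<lib><book><title/><author/></book><book><title/><author/></book></lib>"
nodes, root = dag_compress(doc)
for node_id, (tag, children) in enumerate(nodes):
    print(node_id, tag, children)
```

In this example the seven elements of the input collapse to four table entries, because the second `book` subtree is replaced by a pointer to the first.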

Keywords

Compression Ratio, Query Processing, Query Evaluation, Path Query, Inverted List
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Arion, A., Bonifati, A., Manolescu, I., Pugliese, A.: XQueC: A Query-Conscious Compressed XML Database. ACM Transactions on Internet Technology (to appear)
  2. Bayardo, R.J., Gruhl, D., Josifovski, V., Myllymaki, J.: An evaluation of binary XML encoding optimizations for fast stream based XML processing. In: Proc. of the 13th international conference on World Wide Web (2004)
  3. Böttcher, S., Hartel, R., Heinzemann, C.: Towards a succinct data format for XML streams. In: International Conference on Web Information Systems (WEBIST) (2008)
  4. Böttcher, S., Steinmetz, R., Klein, N.: XML Index Compression by DTD Subtraction. In: International Conference on Enterprise Information Systems (ICEIS) (2007)
  5. Böttcher, S., Steinmetz, R.: Data Management for Mobile Ajax Web 2.0 Applications. In: Wagner, R., Revell, N., Pernul, G. (eds.) DEXA 2007. LNCS, vol. 4653, pp. 424–433. Springer, Heidelberg (2007)
  6. Buneman, P., Grohe, M., Koch, C.: Path Queries on Compressed XML. In: VLDB (2003)
  7. Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)
  8. Busatto, G., Lohrey, M., Maneth, S.: Efficient Memory Representation of XML Documents. In: Bierman, G., Koch, C. (eds.) DBPL 2005. LNCS, vol. 3774, pp. 199–216. Springer, Heidelberg (2005)
  9. Candan, K.S., Hsiung, W.-P., Chen, S., Tatemura, J., Agrawal, D.: AFilter: Adaptable XML Filtering with Prefix-Caching and Suffix-Clustering. In: VLDB (2006)
  10. Cheney, J.: Compressing XML with multiplexed hierarchical models. In: Proceedings of the 2001 IEEE Data Compression Conference (DCC 2001) (2001)
  11. Cheng, J., Ng, W.: XQzip: Querying Compressed XML Using Structural Indexing. In: EDBT (2004)
  12. Ferragina, P., Luccio, F., Manzini, G., Muthukrishnan, S.: Compressing and Searching XML Data Via Two Zips. In: Proceedings of the Fifteenth International World Wide Web Conference (2006)
  13. Franceschet, M.: XPathMark: an XPath benchmark for XMark generated data. In: Bressan, S., Ceri, S., Hunt, E., Ives, Z.G., Bellahsène, Z., Rys, M., Unland, R. (eds.) XSym 2005. LNCS, vol. 3671, pp. 129–143. Springer, Heidelberg (2005)
  14. Girardot, M., Sundaresan, N.: Millau: An Encoding Format for Efficient Representation and Exchange of XML over the Web. In: Proceedings of the 9th International WWW Conference (2000)
  15. Huffman, D.A.: A method for the construction of minimum-redundancy codes. In: Proc. of the I.R.E. (1952)
  16. Liefke, H., Suciu, D.: XMill: An Efficient Compressor for XML Data. In: Proc. of ACM SIGMOD (2000)
  17. Min, J.K., Park, M.J., Chung, C.W.: XPRESS: A Queriable Compression for XML Data. In: Proceedings of SIGMOD (2003)
  18. Ng, W., Lam, W.Y., Wood, P.T., Levene, M.: XCQ: A queriable XML compression system. Knowledge and Information Systems (2006)
  19. Olteanu, D., Meuss, H., Furche, T., Bry, F.: XPath: Looking Forward. In: Chaudhri, A.B., Unland, R., Djeraba, C., Lindner, W. (eds.) EDBT 2002. LNCS, vol. 2490, pp. 109–127. Springer, Heidelberg (2002)
  20. Schmidt, A., Waas, F., Kersten, M., Carey, M., Manolescu, I., Busse, R.: XMark: A benchmark for XML data management, Hong Kong, China (2002)
  21. Tolani, P.M., Haritsa, J.R.: XGRIND: A query-friendly XML compressor. In: Proc. ICDE (2002)
  22. Yao, B.B., Özsu, M.T.: XBench - A family of benchmarks for XML DBMS (2002)
  23. Zhang, N., Kacholia, V., Özsu, M.T.: A Succinct Physical Storage Scheme for Efficient Evaluation of Path Queries in XML. In: ICDE (2004)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Stefan Böttcher (1)
  • Rita Hartel (1)
  • Christian Heinzemann (1)
  1. EIM - Electrical Engineering, Computer Science and Mathematics, University of Paderborn, Paderborn, Germany
