A Highly Efficient XML Compression Scheme for the Web

  • Przemysław Skibiński
  • Jakub Swacha
  • Szymon Grabowski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4910)

Abstract

Contemporary XML documents can be tens of megabytes long, and reducing their size, so that they can be transferred faster, offers a significant advantage to their users. In this paper, we describe a new XML compression scheme that outperforms the previous state-of-the-art algorithm, SCMPPM, by over 9% on average in compression ratio, while offering the practical feature of streamlined decompression that is almost twice as fast. Applying the scheme can significantly reduce transmission time and bandwidth usage for XML documents published on the Web. The proposed scheme is based on a semi-dynamic dictionary of the most frequent words in the document (in both the annotation and the content), automatic detection and compact encoding of numbers and specific patterns (such as dates or IP addresses), and a back-end PPM coding variant tailored to handle long matching sequences efficiently. Moreover, we show that the compression ratio can be improved by an additional 9% at the cost of a significant slowdown.
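The abstract names the transform's components but not their internals. As an illustration only, the following minimal Python sketch shows two of them under our own assumptions: a semi-dynamic dictionary (built in a first pass over the document, then kept fixed while encoding) and compact integer encoding of numbers, dates and IPv4 addresses. The regex tokenizer, the functions `build_dictionary`, `encode` and `decode`, the parameters `max_size`, `min_count` and `min_len`, and the integer packing formulas are all hypothetical and are not the authors' format; the PPM back end and any byte-level codeword layout are omitted.

```python
import re
from collections import Counter

# Hypothetical tokenizer. Octets are 0-255 without leading zeros, so packing
# an address into a 32-bit integer can always be reversed exactly.
_OCT = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
TOKEN_RE = re.compile(
    rf"(?P<ip>{_OCT}\.{_OCT}\.{_OCT}\.{_OCT})"
    r"|(?P<date>[0-9]{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01]))"
    r"|(?P<num>0|[1-9][0-9]*)"       # no leading zeros: "007" -> 0, 0, 7
    r"|(?P<word>[A-Za-z]+)"          # tag names and content words alike
    r"|(?P<other>[^A-Za-z0-9]+)"     # markup, punctuation, whitespace, non-ASCII
)


def build_dictionary(text, max_size=256, min_count=2, min_len=2):
    """Pass 1 (hypothetical parameters): keep the most frequent words.

    Words are sorted by decreasing frequency, so frequent words get small indices.
    """
    counts = Counter(w for w in re.findall(r"[A-Za-z]+", text) if len(w) >= min_len)
    return [w for w, c in counts.most_common(max_size) if c >= min_count]


def encode(text, dictionary):
    """Pass 2: replace dictionary words and recognized patterns with integer tokens."""
    index = {w: i for i, w in enumerate(dictionary)}
    tokens = []
    for m in TOKEN_RE.finditer(text):  # the alternatives cover every character
        kind, s = m.lastgroup, m.group()
        if kind == "ip":
            value = 0
            for part in s.split("."):
                value = value * 256 + int(part)
            tokens.append(("I", value))
        elif kind == "date":
            y, mo, d = int(s[:4]), int(s[5:7]), int(s[8:10])
            tokens.append(("D", (y * 12 + mo - 1) * 31 + d - 1))
        elif kind == "num":
            tokens.append(("N", int(s)))
        elif kind == "word" and s in index:
            tokens.append(("W", index[s]))
        else:
            tokens.append(("L", s))  # literal: rare word or non-alphanumeric run
    return tokens


def decode(tokens, dictionary):
    """Invert encode(); the dictionary would be stored alongside the tokens."""
    out = []
    for kind, v in tokens:
        if kind == "I":
            out.append(".".join(str((v >> shift) & 0xFF) for shift in (24, 16, 8, 0)))
        elif kind == "D":
            day, v = v % 31 + 1, v // 31
            out.append(f"{v // 12:04d}-{v % 12 + 1:02d}-{day:02d}")
        elif kind == "N":
            out.append(str(v))
        elif kind == "W":
            out.append(dictionary[v])
        else:
            out.append(v)
    return "".join(out)


doc = ('<log><entry date="2007-11-05" ip="192.168.0.1">ok</entry>'
       '<entry date="2007-11-06" ip="10.0.0.7">ok</entry></log>')
words = build_dictionary(doc)
assert decode(encode(doc, words), words) == doc  # the transform is lossless
```

Because the tokenizer treats every run of ASCII letters the same way, tag names such as `entry` land in the dictionary alongside content words. Ordering the dictionary by frequency is a deliberate choice: in a byte-oriented variant of this sketch, small indices could be given shorter codewords.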

Keywords

XML compression, semi-structural data compression, text transform, prediction by partial matching



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Przemysław Skibiński (1)
  • Jakub Swacha (2)
  • Szymon Grabowski (3)
  1. Institute of Computer Science, University of Wrocław, Poland
  2. Institute of Information Technology in Management, Szczecin University, Szczecin, Poland
  3. Computer Engineering Department, Technical University of Łódź, Łódź, Poland
