Definition
XML is an extremely verbose data format with a high degree of redundancy: the same tags are repeated for every data item, and both tags and data values are represented as strings. Viewed in relational database terms, XML stores the “schema” with each and every “record” in the repository. Publishing data in XML format is estimated to increase its size by as much as 400% [14], which makes XML a prime target for compression. Standard general-purpose compressors, such as zip, gzip, or bzip, typically compress XML data reasonably well, but specialized XML compressors that exploit the specific structural aspects of XML data have been developed over the last decade. These newer techniques fall into two classes: (i) compression-oriented, where the goal is to maximize the compression ratio of the data, typically achieving up to a factor of two better than the general-purpose compressors; and (ii) query-oriented, where the goal is to...
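The redundancy described above can be made concrete with a small measurement. The following is a minimal sketch, not part of the original entry: the sample document, the element names (`people`, `person`, `name`, `age`, `city`), and the helper functions are all hypothetical, chosen only for illustration. It builds a record-style XML document, compares its total size with the size of the data values alone, and then applies two general-purpose compressors from the Python standard library.

```python
import bz2
import gzip
import xml.etree.ElementTree as ET


def build_sample(n_records: int) -> bytes:
    """Build a hypothetical record-style XML document with repeated tags."""
    root = ET.Element("people")
    cities = ["Oslo", "Lima", "Pune"]
    for i in range(n_records):
        person = ET.SubElement(root, "person")
        ET.SubElement(person, "name").text = f"user{i}"
        ET.SubElement(person, "age").text = str(20 + i % 50)
        ET.SubElement(person, "city").text = cities[i % 3]
    return ET.tostring(root, encoding="utf-8")


def payload_size(xml_bytes: bytes) -> int:
    """Count only the bytes of the data values, ignoring all markup."""
    root = ET.fromstring(xml_bytes)
    return sum(len(text.encode("utf-8")) for text in root.itertext())


def report(n_records: int = 1000) -> dict:
    """Compare raw XML size, data-only size, and compressed sizes."""
    xml_bytes = build_sample(n_records)
    payload = payload_size(xml_bytes)
    return {
        "xml_bytes": len(xml_bytes),
        "payload_bytes": payload,
        # Ratio of full document size to data values alone.
        "markup_overhead": len(xml_bytes) / payload,
        "gzip_bytes": len(gzip.compress(xml_bytes)),
        "bz2_bytes": len(bz2.compress(xml_bytes)),
    }


if __name__ == "__main__":
    for key, value in report().items():
        print(f"{key}: {value}")
```

The exact figures depend entirely on the shape of the sample data, so they should not be read as benchmarks. The sketch only illustrates the two observations in the definition: repeated tags account for much of the document size, and a general-purpose compressor removes a large part of that repetition without any knowledge of XML structure.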
Recommended Reading
Arion A, Bonifati A, Manolescu I, Pugliese A. XQueC: a query-conscious compressed XML database. ACM Trans Internet Technol. 2007;7(2):1–35.
Cheney J. Compressing XML with multiplexed hierarchical PPM models. In: Proceedings Data Compression Conference; 2001. p. 163–72.
Ferragina P, Luccio F, Manzini G, Muthukrishnan M. Compressing and searching XML data via two zips. In: Proceedings 15th International World Wide Web Conference; 2006. p. 751–60.
Girardot M, Sundaresan N. Millau: an encoding format for efficient representation and exchange of XML over the Web. In: Proceedings 9th International World Wide Web Conference; 2000.
Liefke H, Suciu D. An extensible compressor for XML data. ACM SIGMOD Rec. 2000;29(1):57–62.
Liefke H, Suciu D. XMill: an efficient compressor for XML data. In: Proceedings ACM SIGMOD International Conference on Management of Data; 2000. p. 153–64.
Min JK, Park M, Chung C. XPRESS: a queriable compression for XML data. In: Proceedings ACM SIGMOD International Conference on Management of Data; 2003. p. 122–33.
Min JK, Park M, Chung C. XPRESS: a compressor for effective archiving, retrieval, and update of XML documents. ACM Trans Internet Technol. 2006;6(3):223–58.
Tolani P, Haritsa J. XGRIND: a query-friendly XML compressor. In: Proceedings 18th International Conference on Data Engineering; 2002. p. 225–35.
Ziv J, Lempel A. A universal algorithm for sequential data compression. IEEE Trans Inf Theory. 1977;23(3):337–43.
Copyright information
© 2016 Springer Science+Business Media LLC
About this entry
Cite this entry
Suciu, D., Haritsa, J.R. (2016). XML Compression. In: Liu, L., Özsu, M. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4899-7993-3_783-2
DOI: https://doi.org/10.1007/978-1-4899-7993-3_783-2
Publisher Name: Springer, New York, NY
Online ISBN: 978-1-4899-7993-3
eBook Packages: Springer Reference Computer Sciences; Reference Module Computer Science and Engineering