Document Decomposition for XML Compression: A Heuristic Approach

  • Byron Choi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3882)

Abstract

Sharing of common subtrees has been reported useful not only for XML compression but also for main-memory XML query processing. This method compresses subtrees only when they exhibit identical structure. Even slight irregularities among subtrees dramatically reduce the performance of compression algorithms of this kind. Furthermore, when XML documents are large, the chance of having large number of identical subtrees is inherently low. In this paper, we proposed a method of decomposing XML documents for better compression. We proposed a heuristic method of locating minor irregularities in XML documents. The irregularities are then projected out from the original XML document. We refered this process to as document decomposition. We demonstrated that better compression can be achieved by compressing the decomposed documents separately. Experimental results demonstrated that the compressed skeletons, for all real-world datasets, to our knowledge, fit comfortably into main memory of commodity computers nowadays. Preliminary results on querying compressed skeletons validate the effectiveness our approach.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Byron Choi
    • 1
  1. 1.Nanyang Technological UniversitySingapore

Personalised recommendations