Algorithm Using Expanded LZ Compression Scheme for Compressing Tree Structured Data

  • Yuko Itokawa
  • Koichiro Katoh
  • Tomoyuki Uchida
  • Takayoshi Shoudai
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 52)


Due to the rapid growth of information technologies, the use of electronic data such as XML/HTML documents, which are a form of tree structured data, has been increasing rapidly. We have developed two algorithms based on the Lempel–Ziv compression scheme: one for effectively compressing tree structured data and one for decompressing a compressed tree. We then implemented both algorithms by applying them to the XMill compressor and XDemill decompressor presented by Liefke and Suciu. Finally, experiments on large synthetic ordered trees and on real-world tree structured data demonstrated the effectiveness and efficiency of our algorithms.
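The abstract does not describe how the expanded LZ scheme operates on trees. For orientation only, the sketch below shows the classic LZ77 idea of Ziv and Lempel (1977) applied to an ordered tree that has first been flattened into a preorder token sequence with an explicit close marker. This is not the authors' expanded scheme. The `(label, children)` tree representation, the `"/"` close marker, the `("lit", token)` / `("copy", offset, length)` code format, and all function names are hypothetical choices made for this illustration.

```python
from typing import List, Tuple

# Hypothetical ordered-tree representation: (label, [children...]).
Tree = Tuple[str, list]

CLOSE = "/"  # assumption: never used as a node label (not a legal XML tag name)


def serialize(tree: Tree) -> List[str]:
    """Preorder token sequence with a close marker after each subtree,
    so the sequence determines the ordered tree uniquely."""
    label, children = tree
    tokens = [label]
    for child in children:
        tokens.extend(serialize(child))
    tokens.append(CLOSE)
    return tokens


def deserialize(tokens: List[str]) -> Tree:
    """Rebuild the ordered tree from its preorder/close-marker tokens."""
    stack: List[Tree] = []
    root = None
    for tok in tokens:
        if tok == CLOSE:
            node = stack.pop()
            if stack:
                stack[-1][1].append(node)  # attach to parent, keeping sibling order
            else:
                root = node
        else:
            stack.append((tok, []))
    return root


def lz77_compress(seq: List[str], window: int = 4096, min_len: int = 2) -> List[tuple]:
    """Greedy LZ77 factorization: at each position emit a literal
    ("lit", token) or a back-reference ("copy", offset, length) to the
    longest earlier match starting inside the sliding window."""
    codes: List[tuple] = []
    i, n = 0, len(seq)
    while i < n:
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (overlapping copy), as in LZ77.
            while i + length < n and seq[j + length] == seq[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_len:
            codes.append(("copy", best_off, best_len))
            i += best_len
        else:
            codes.append(("lit", seq[i]))
            i += 1
    return codes


def lz77_decompress(codes: List[tuple]) -> List[str]:
    """Invert lz77_compress; copying token by token handles overlapping matches."""
    out: List[str] = []
    for code in codes:
        if code[0] == "lit":
            out.append(code[1])
        else:
            _, offset, length = code
            start = len(out) - offset
            for k in range(length):
                out.append(out[start + k])
    return out


if __name__ == "__main__":
    item = ("item", [("name", []), ("price", [])])
    doc = ("doc", [item, item, item])
    tokens = serialize(doc)
    codes = lz77_compress(tokens)
    print(len(tokens), "tokens ->", len(codes), "codes")  # prints: 20 tokens -> 9 codes
    assert deserialize(lz77_decompress(codes)) == doc
```

In this toy document the second and third `item` subtrees collapse into a single overlapping back-reference. Plain LZ77 on a serialization sees the tree only as a token string, so it cannot exploit structure beyond repeated runs; the window size and the `min_len` threshold here are arbitrary choices for the illustration.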


Tree structured data · Lempel–Ziv compression scheme · XMill · XDemill


  1. Abiteboul, S., Buneman, P., & Suciu, D. (2000). Data on the Web: from relations to semistructured data and XML. San Francisco, CA: Morgan Kaufmann.
  2. Adiego, J., Navarro, G., & de la Fuente, P. (2004). Lempel–Ziv compression of structured text. Proceedings of the IEEE data compression conference (DCC 2004) (pp. 112–121).
  3. Cheney, J. (2001). Compressing XML with multiplexed hierarchical PPM models. Proceedings of the IEEE data compression conference (DCC 2001) (pp. 163–172).
  4. Cook, D.J., & Holder, L.B. (2000). Graph-based data mining. IEEE Intelligent Systems, 15(2), 32–41.
  5. Itokawa, Y., Uchida, T., Shoudai, T., Miyahara, T., & Nakamura, Y. (2003). Finding frequent subgraphs from graph structured data with geometric information and its application to lossless compression. Proceedings of the 7th Pacific–Asia conference on advances in knowledge discovery and data mining (PAKDD-2003), Springer, LNAI 2637 (pp. 582–594).
  6. Kida, T., Matsumoto, T., Shibata, Y., Takeda, M., Shinohara, A., & Arikawa, S. (2003). Collage system: a unifying framework for compressed pattern matching. Theoretical Computer Science, 298(1), 253–272.
  7. Liefke, H., & Suciu, D. (2000). XMill: an efficient compressor for XML data. Proceedings of the 2000 ACM SIGMOD international conference on management of data (pp. 153–164).
  8. Matsumoto, S., Hayashi, Y., & Shoudai, T. (1997). Polynomial time inductive inference of regular term tree languages from positive data. Proceedings of the 8th workshop on algorithmic learning theory (ALT-97), Springer, LNAI 1316 (pp. 212–227).
  9. Sakamoto, H. (2005). A fully linear-time approximation algorithm for grammar-based compression. Journal of Discrete Algorithms, 3(2–4), 416–430.
  10. Storer, J.A., & Szymanski, T.G. (1982). Data compression via textual substitution. Journal of the ACM, 29(4), 928–951.
  11. Suzuki, Y., Shoudai, T., Uchida, T., & Miyahara, T. (2006). Ordered term tree languages which are polynomial time inductively inferable from positive data. Theoretical Computer Science, 350, 63–90.
  12. Tolani, P.M., & Haritsa, J.R. (2002). XGRIND: a query-friendly XML compressor. Proceedings of the 18th international conference on data engineering (pp. 225–234).
  13. Yamagata, K., Uchida, T., Shoudai, T., & Nakamura, Y. (2003). An effective grammar-based compression algorithm for tree structured data. Proceedings of the 13th international conference on inductive logic programming (ILP-03), Springer, LNAI 2835 (pp. 383–400).
  14. Yamasaki, H., Sasaki, Y., Shoudai, T., Uchida, T., & Suzuki, Y. (2009). Learning block-preserving graph patterns and its application to data mining. Machine Learning, 76, 137–173.
  15. Ziv, J., & Lempel, A. (1977). A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, IT-23(3), 337–343.

Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  • Yuko Itokawa (1)
  • Koichiro Katoh (2)
  • Tomoyuki Uchida (3)
  • Takayoshi Shoudai (4)
  1. Department of Kansei Design, Hiroshima International University, Higashi Hiroshima, Japan
  2. Enterprise Server Division Department I Server Development, Hitachi Ltd., 1 Horiyamashita, Hadano, Japan
  3. Faculty of Information Sciences, Hiroshima City University, Asa-Minami-Ku, Japan
  4. Department of Informatics, Kyushu University, Nishi-ku, Japan
