A Compression Technique for XML Element Retrieval

Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 110)


The main objective of compression techniques has changed: it is no longer only to reduce storage but also to improve other aspects of efficiency. For instance, for large-scale XML collections, compression techniques are required to improve retrieval time. In this paper, we propose a new XML compression algorithm that supports absolute document XPath indexing and a score sharing function using a top-down scheme. We found that these steps reduce the size of the data by 82.29% and reduce retrieval time by 51.38% when compared to the GPX system. In addition, the score sharing processing time is reduced by 44.18% compared to before compression.
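The abstract does not describe the compression scheme itself. As a rough illustration only, and not the authors' algorithm, the Python sketch below shows one common way the absolute XPath strings stored in an index could be shortened: each distinct tag name is replaced by a small integer code from a dictionary, while the positional predicates are kept so the path can be restored exactly. The function names and the sample paths are hypothetical.

```python
from typing import Dict, List


def build_tag_dictionary(paths: List[str]) -> Dict[str, int]:
    """Assign an integer code to each distinct tag name, in first-seen order."""
    codes: Dict[str, int] = {}
    for path in paths:
        for step in path.strip("/").split("/"):
            tag = step.split("[")[0]  # "section[2]" -> "section"
            if tag not in codes:
                codes[tag] = len(codes)
    return codes


def compress_path(path: str, codes: Dict[str, int]) -> str:
    """Replace every tag name in an absolute XPath with its integer code."""
    steps = []
    for step in path.strip("/").split("/"):
        tag, bracket, rest = step.partition("[")
        steps.append(f"{codes[tag]}{bracket}{rest}")
    return "/" + "/".join(steps)


def decompress_path(path: str, codes: Dict[str, int]) -> str:
    """Invert compress_path using the same dictionary."""
    names = {str(code): tag for tag, code in codes.items()}
    steps = []
    for step in path.strip("/").split("/"):
        code, bracket, rest = step.partition("[")
        steps.append(f"{names[code]}{bracket}{rest}")
    return "/" + "/".join(steps)


# Hypothetical absolute XPaths for two elements of one document.
paths = [
    "/article[1]/body[1]/section[1]/p[2]",
    "/article[1]/body[1]/section[2]",
]
codes = build_tag_dictionary(paths)
compressed = [compress_path(p, codes) for p in paths]
restored = [decompress_path(c, codes) for c in compressed]
```

Because the mapping is lossless, an index built on the shortened paths can still be resolved back to the original element locations. The savings depend on how long and how repetitive the tag names are, so this sketch says nothing about the figures reported above.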


XML retrieval · Compression strategies · Ranking strategies · Indexing unit



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Computer Science, Faculty of Science, Kasetsart University, Bangkok, Thailand
