Abstract
We address the problem of improving variable-length-to-fixed-length codes (VF codes), which have favourable properties for fast compressed pattern matching but achieve only moderate compression ratios. The compression ratio of a VF code depends on the parse tree that is used as its dictionary. We propose a method that trains a parse tree by scanning an input text repeatedly, and we show experimentally that it rapidly improves the compression ratio of VF codes to the level of state-of-the-art compression methods.
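To make the role of the parse tree concrete, the following is a minimal sketch of plain VF coding. It is not the training method proposed in the paper. It assumes a memoryless source with known symbol probabilities and builds the dictionary with the classic Tunstall construction, repeatedly expanding the most probable leaf. The function names (build_tunstall_phrases, vf_encode, vf_decode) and the convention of returning the unparsed tail separately are illustrative assumptions.

```python
import heapq


def build_tunstall_phrases(probs, code_bits):
    """Return the leaf phrases of a Tunstall-style parse tree.

    probs: dict mapping each symbol to its probability
           (assumption: memoryless source, single-character symbols).
    code_bits: fixed codeword length, so at most 2**code_bits phrases.
    """
    max_leaves = 2 ** code_bits
    if len(probs) > max_leaves:
        raise ValueError("code_bits is too small for the alphabet")
    # heapq is a min-heap, so store negated probabilities.
    heap = [(-p, s) for s, p in probs.items()]
    heapq.heapify(heap)
    # Expanding a leaf removes it and adds one child per symbol.
    while len(heap) + len(probs) - 1 <= max_leaves:
        neg_p, phrase = heapq.heappop(heap)
        for s, p in probs.items():
            heapq.heappush(heap, (neg_p * p, phrase + s))
    return sorted(phrase for _, phrase in heap)


def vf_encode(text, phrases):
    """Parse text into dictionary phrases; each phrase maps to one index.

    Returns (codes, tail), where tail is the suffix that stopped at an
    internal node of the parse tree and so has no codeword.
    """
    code_of = {phrase: i for i, phrase in enumerate(phrases)}
    codes, current = [], ""
    for ch in text:
        current += ch
        if current in code_of:  # reached a leaf of the parse tree
            codes.append(code_of[current])
            current = ""
    return codes, current


def vf_decode(codes, tail, phrases):
    """Invert vf_encode by concatenating the phrases and the tail."""
    return "".join(phrases[c] for c in codes) + tail


if __name__ == "__main__":
    phrases = build_tunstall_phrases({"a": 0.7, "b": 0.3}, code_bits=2)
    codes, tail = vf_encode("aaabab", phrases)
    print(phrases, codes, repr(tail))
    print(vf_decode(codes, tail, phrases))
```

Every codeword in this sketch has the same length (code_bits), which is the property that makes VF codes convenient for compressed pattern matching. How well the text compresses depends entirely on which phrases end up as leaves, and that is the part a trained parse tree aims to improve.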
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Uemura, T., Yoshida, S., Kida, T., Asai, T., Okamoto, S. (2010). Training Parse Trees for Efficient VF Coding. In: Chavez, E., Lonardi, S. (eds) String Processing and Information Retrieval. SPIRE 2010. Lecture Notes in Computer Science, vol 6393. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16321-0_17
DOI: https://doi.org/10.1007/978-3-642-16321-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16320-3
Online ISBN: 978-3-642-16321-0
eBook Packages: Computer Science (R0)