Abstract
Given a text, grammar-based compression constructs a grammar that generates the text. Many text compression techniques are of this type. Each compression scheme is categorized as either off-line or on-line, according to how the text is processed. One representative tactic for off-line compression is to substitute the longest repeated factors of a text with a production rule. In this paper, we present an algorithm that compresses a text based on this longest-first principle in linear time. The algorithm employs a suitable index structure for the text and involves technically efficient operations on the structure.
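The longest-first principle can be illustrated with a small sketch. This is not the linear-time algorithm of the paper: it is a naive version that rescans the sequence for every candidate length and uses no index structure. It also assumes, for simplicity, that repeats are searched only in the start rule. The function names and the nonterminal names `<1>`, `<2>`, ... and `<S>` are hypothetical, chosen for illustration.

```python
def longest_repeat(seq):
    """Return the longest factor (length >= 2) of seq that has two
    non-overlapping occurrences, or None if there is none."""
    n = len(seq)
    for length in range(n // 2, 1, -1):          # try longest factors first
        first_pos = {}
        for i in range(n - length + 1):
            factor = tuple(seq[i:i + length])
            j = first_pos.setdefault(factor, i)  # earliest occurrence seen
            if i - j >= length:                  # the two occurrences do not overlap
                return factor
    return None


def replace_all(seq, factor, symbol):
    """Replace occurrences of factor in seq by symbol, scanning left to right."""
    out, i, k = [], 0, len(factor)
    while i < len(seq):
        if tuple(seq[i:i + k]) == factor:
            out.append(symbol)
            i += k
        else:
            out.append(seq[i])
            i += 1
    return out


def longest_first_compress(text):
    """Naive longest-first substitution (assumption: only the start rule is searched)."""
    rules = {}
    start = list(text)                           # terminals are single characters
    while True:
        factor = longest_repeat(start)
        if factor is None:
            break
        name = f"<{len(rules) + 1}>"             # multi-character, so never equal to a terminal
        rules[name] = list(factor)
        start = replace_all(start, factor, name)
    rules["<S>"] = start
    return rules


def expand(rules, symbol="<S>"):
    """Expand a symbol back into the text it generates."""
    if symbol not in rules:
        return symbol                            # terminal character
    return "".join(expand(rules, s) for s in rules[symbol])
```

For the text `abcabcabc`, the sketch produces the rule `<1>` with right-hand side `a b c` and the start rule `<1> <1> <1>`; calling `expand` on the result recovers the original text.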
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Inenaga, S., Funamoto, T., Takeda, M., Shinohara, A. (2003). Linear-Time Off-Line Text Compression by Longest-First Substitution. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds) String Processing and Information Retrieval. SPIRE 2003. Lecture Notes in Computer Science, vol 2857. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39984-1_11
DOI: https://doi.org/10.1007/978-3-540-39984-1_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20177-9
Online ISBN: 978-3-540-39984-1
eBook Packages: Springer Book Archive