Abstract
The performance of data compression on a large static text may be improved if certain variable-length strings are included in the character set for which a code is generated. A new method for extending the alphabet is presented, based on a reduction to a graph-theoretic problem. A related optimization problem is shown to be NP-complete, a fast heuristic is suggested, and experimental results are presented.
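The basic effect of alphabet extension can be seen with a small experiment. The sketch below is not the paper's method: it does not implement the graph-theoretic reduction or the heuristic. It assumes a Huffman code over the symbol set, a greedy longest-match parse, and a hand-picked set of extra strings. Choosing that set well is the actual optimization problem the abstract refers to. The helper names `huffman_lengths`, `parse` and `encoded_bits` are hypothetical, and the bit count ignores the cost of storing the code and the extended alphabet itself.

```python
import heapq
from collections import Counter

def huffman_lengths(freqs):
    """Return {symbol: codeword length} for a Huffman code over freqs."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        return {next(iter(freqs)): 1}
    # Heap entries: (weight, unique tiebreak, {symbol: depth in subtree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

def parse(text, extra):
    """Greedy longest-match parse into single characters and strings from extra
    (a hypothetical parsing rule chosen for illustration)."""
    candidates = sorted((s for s in extra if len(s) > 1), key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for s in candidates:
            if text.startswith(s, i):
                tokens.append(s)
                i += len(s)
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

def encoded_bits(text, extra=()):
    """Size in bits of the Huffman-encoded parse (code/dictionary overhead excluded)."""
    freqs = Counter(parse(text, extra))
    lengths = huffman_lengths(freqs)
    return sum(freqs[s] * lengths[s] for s in freqs)

if __name__ == "__main__":
    text = "the cat sat on the mat with the hat " * 50
    print(encoded_bits(text))                   # 5500
    print(encoded_bits(text, ["the ", "at "]))  # 3250
```

On this toy text, adding the two strings "the " and "at " to the alphabet reduces the parse from 36 to 19 symbols per repetition and the encoded size from 5500 to 3250 bits. A real scheme must weigh such gains against the overhead of the larger alphabet and against interactions between overlapping candidate strings.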
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Klein, S.T. (2000). Improving Static Compression Schemes by Alphabet Extension. In: Giancarlo, R., Sankoff, D. (eds) Combinatorial Pattern Matching. CPM 2000. Lecture Notes in Computer Science, vol 1848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45123-4_19
DOI: https://doi.org/10.1007/3-540-45123-4_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67633-1
Online ISBN: 978-3-540-45123-5
eBook Packages: Springer Book Archive