# Compression Algorithms for Symbolic Data

Roy Hoffman

## Abstract

In Chapters 4 and 5, we will learn how some important data compression algorithms work. The marketplace has selected these algorithms for its de facto and official standards because they are effective and practical to implement, leaving many other compression techniques to languish or remain subjects for further investigation. This chapter begins by looking at how data compression algorithms are constructed. It then examines algorithms for compressing symbolic data, such as character text, numbers, and computer programs. In Chapter 5, the discussion continues with algorithms for compressing diffuse data, including speech, audio, image, and video. For each data type, the simplest and most general algorithms are described first, followed by more powerful or more specialized ones.

## Keywords

Compression Ratio, Data Compression, Compression Algorithm, Dictionary Entry, Symbolic Data

## Notes

1. The probability of a symbol is a number between 0 and 1.0 that measures how frequently it occurs relative to all possible symbols.
2. The information content of a message is measured by entropy, which gives rise to the term *entropy coding*. If the $i$-th symbol of a message has probability $p_i$, the entropy calculation tells us that the minimum number of bits needed to code the message is $-\sum_i \log_2 p_i$ bits, summed over every symbol in the message.
3. For in-depth treatments of arithmetic coding, see [Bell 90, Nels 92, Penn 93, Witt 94].
4. Some references use “code” and “codeword” to describe the output of dictionary compression; others use “token.” Here, “token” is used because the output contains multiple components.
5. Most studies in the literature measure only the compression ratio and sometimes the compression speed, and only on large data files. Other characteristics of interest, such as performance versus the memory the algorithm consumes and speed of adaptation, are less well covered, except in [Bell 90].