# Identification Entropy

R. Ahlswede
Chapter, Lecture Notes in Computer Science (LNCS), volume 4123

## Abstract

Shannon (1948) has shown that a source $$({\mathcal {U}},P,U)$$ with output $$U$$ satisfying $$\mathrm{Prob}(U=u)=P_u$$ can be encoded in a prefix code $${\mathcal{C}}=\{c_u:u\in{\mathcal {U}}\}\subset\{0,1\}^*$$ such that for the entropy

$$H(P)=\sum\limits_{u\in{\mathcal {U}}}-P_u\log P_u\leq\sum\limits_{u\in{\mathcal {U}}}P_u\|c_u\\|\leq H(P)+1,$$

where $$\|c_u\|$$ is the length of $$c_u$$.

We use a prefix code $$\mathcal{C}$$ for another purpose, namely noiseless identification: every user who wants to know whether a $$u\in{\mathcal {U}}$$ of his interest is the actual source output or not can consider the RV $$C$$ with $$C=c_U=(c_{U,1},\dots,c_{U,\|c_U\|})$$ and check whether $$C=(C_1,C_2,\dots)$$ coincides with $$c_u$$ in the first, second, etc. letter, stopping when the first differing letter occurs or when $$C=c_u$$. Let $$L_{\mathcal{C}}(P,u)$$ be the expected number of checkings if code $$\mathcal{C}$$ is used.
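The checking rule above can be sketched directly. The following is a minimal illustration (not from the chapter): for a user interested in $$u$$, the number of comparisons made when the actual codeword is $$c_v$$ is the length of the common prefix plus one on a mismatch, or the full length $$\|c_u\|$$ when $$c_v=c_u$$; $$L_{\mathcal{C}}(P,u)$$ is the expectation over $$v$$.

```python
def num_checks(c_u: str, c_v: str) -> int:
    """Letters compared when the user watches for c_u and the output's
    codeword is c_v: stop at the first differing letter, or after
    |c_u| comparisons when c_v == c_u (prefix-freeness guarantees a
    mismatch occurs within min(|c_u|, |c_v|) letters otherwise)."""
    if c_u == c_v:
        return len(c_u)
    lcp = 0
    for x, y in zip(c_u, c_v):
        if x != y:
            break
        lcp += 1
    return lcp + 1  # mismatch detected at position lcp + 1

def L(code: dict, P: dict, u) -> float:
    """Expected number of checkings L_C(P, u) for a user interested in u."""
    return sum(P[v] * num_checks(code[u], code[v]) for v in code)

# Example: uniform source on 4 symbols, code {00, 01, 10, 11}.
code = {1: '00', 2: '01', 3: '10', 4: '11'}
P = {u: 0.25 for u in code}
values = [L(code, P, u) for u in code]  # 1.5 for every u
```

With this uniform example every user needs 1.5 checkings on average, which coincides with the identification entropy $$H_I(P)=2(1-4\cdot\tfrac1{16})=1.5$$ defined below.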

Our discovery is an identification entropy, namely the function

$$H_I(P)=2\left(1-\sum\limits_{u\in{\mathcal {U}}}P_u^2\right).$$

We prove that $$L_{\mathcal{C}}(P,P)=\sum\limits_{u\in{\mathcal {U}}}P_u L_{\mathcal{C}}(P,u)\geq H_I(P)$$ and thus also that

$$L(P)=\min\limits_{\mathcal{C}}\max\limits_{u\in{\mathcal {U}}}L_{\mathcal{C}}(P,u)\geq H_I(P)$$

and related upper bounds, which demonstrate the operational significance of identification entropy in noiseless source coding, similarly to the role Shannon entropy plays in noiseless data compression.
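The lower bound on the symmetric running time can be checked numerically. Below is a small self-contained sketch; the dyadic distribution and the matching prefix code are illustrative choices, not taken from the chapter.

```python
def H_I(P: dict) -> float:
    """Identification entropy H_I(P) = 2 * (1 - sum_u P_u^2)."""
    return 2.0 * (1.0 - sum(p * p for p in P.values()))

def L(code: dict, P: dict, u) -> float:
    """Expected number of checkings L_C(P, u): compare letter by letter,
    stop at the first mismatch or after |c_u| letters on a full match."""
    def checks(a: str, b: str) -> int:
        if a == b:
            return len(a)
        return next(i for i, (x, y) in enumerate(zip(a, b)) if x != y) + 1
    return sum(P[v] * checks(code[u], code[v]) for v in code)

# Illustrative dyadic source with a matching prefix code.
code = {1: '0', 2: '10', 3: '110', 4: '111'}
P = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}

symmetric = sum(P[u] * L(code, P, u) for u in code)  # L_C(P, P)
assert symmetric >= H_I(P) - 1e-12
```

For this dyadic example both sides evaluate to 1.3125, so the bound $$L_{\mathcal{C}}(P,P)\geq H_I(P)$$ is met with equality here.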

Other averages such as $$\bar L_{\mathcal{C}}(P)=\frac1{|{\mathcal {U}}|}\sum\limits_{u\in{\mathcal {U}}}L_{\mathcal{C}}(P,u)$$ are also discussed, in particular for Huffman codes, where classically equivalent Huffman codes may now perform differently.

We also show that prefix codes whose codewords correspond to the leaves of a regular binary tree are universally good for this average.
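Consistent with this universality claim, a quick experiment (an illustrative sketch, not the chapter's proof) suggests that for the code whose codewords are all $$k$$-bit strings, i.e. the leaves of the complete binary tree of depth $$k$$, the average $$\bar L_{\mathcal{C}}(P)$$ comes out as $$2(1-1/N)$$ with $$N=2^k$$, independent of the source distribution $$P$$:

```python
import random

def L(code: dict, P: dict, u) -> float:
    """Expected number of checkings L_C(P, u) under the letter-by-letter
    checking rule: stop at the first mismatch or after |c_u| letters."""
    def checks(a: str, b: str) -> int:
        if a == b:
            return len(a)
        return next(i for i, (x, y) in enumerate(zip(a, b)) if x != y) + 1
    return sum(P[v] * checks(code[u], code[v]) for v in code)

k = 3                                               # depth of the regular binary tree
N = 2 ** k
code = {u: format(u, f'0{k}b') for u in range(N)}   # leaves = all k-bit strings

random.seed(1)
for _ in range(5):                                  # several random sources P
    w = [random.random() for _ in range(N)]
    s = sum(w)
    P = {u: w[u] / s for u in range(N)}
    avg = sum(L(code, P, u) for u in range(N)) / N  # \bar L_C(P)
    assert abs(avg - 2 * (1 - 1 / N)) < 1e-9        # = 1.75 for k = 3, for every P
```

The value $$2(1-1/N)<2=\lim_{|{\mathcal U}|\to\infty}\sup_P H_I(P)$$, which is why such balanced codes perform well for this average regardless of $$P$$.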

## Keywords

Extended Node, Huffman Code, Decomposition Formula, Codeword Length

## References

1. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Techn. J. 27, 379–423 (1948)
2. Huffman, D.A.: A method for the construction of minimum redundancy codes. Proc. IRE 40, 1098–1101 (1952)
3. Ahlswede, R., Dueck, G.: Identification via channels. IEEE Trans. Inf. Theory 35(1), 15–29 (1989)
4. Ahlswede, R.: General theory of information transfer: updated. In: General Theory of Information Transfer and Combinatorics, Special Issue of Discrete Applied Mathematics
5. Ahlswede, R., Balkenhol, B., Kleinewächter, C.: Identification for sources. In: Ahlswede, R., Bäumer, L., Cai, N., Aydinian, H., Blinovsky, V., Deppe, C., Mashurian, H. (eds.) General Theory of Information Transfer and Combinatorics. LNCS, vol. 4123, pp. 51–61. Springer, Heidelberg (2006)