Abstract
When facing data from a “huge” alphabet, one may not be able to apply the previous results with satisfying theoretical guarantees, especially when those results are asymptotic. By a “huge” alphabet we mean, for instance, one so large that some of its letters have not yet occurred in the data. To understand how to cope with such situations, we will be interested in the case where the alphabet is infinite. Over a finite alphabet, we have seen that there exist universal codes for the class of stationary ergodic sources. For classes of memoryless or Markovian sources, minimax redundancy and regret are both asymptotically equivalent to half the number of parameters times the logarithm base 2 of the length of the encoded word. In the non-parametric class of renewal sources, minimax redundancy and regret grow at the same asymptotic rate, up to multiplicative constants. None of this extends to infinite alphabets: there is no weakly universal code for the class of stationary ergodic sources, and we will see examples of classes for which the regret is infinite whereas the minimax redundancy is not. The chapter starts with an encoding of the integers, which will be useful in the design of other codes. Thanks to a theorem due to John Kieffer, we show that there is no weakly universal code for the class of stationary ergodic sources with values in a countable alphabet. We then focus on memoryless sources (sequences of i.i.d. random variables) and make use of the Minimax–Maximin Theorem 2.12 to obtain lower bounds on the minimax redundancy of classes characterized by the decay of the probability measure at infinity. Another approach is to code in two steps: first, encode the observed alphabet (the letters occurring in the data); then, encode what is known as the “pattern”, which records the positions of letter repetitions in their order of occurrence.
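Two of the ingredients mentioned above can be illustrated concretely. The following Python sketch (not code from the chapter; the function names are ours) implements the Elias gamma code for the positive integers [7] and the extraction of the pattern of a word, i.e. the sequence in which each letter is replaced by the rank of its first occurrence.

```python
def elias_gamma(n: int) -> str:
    """Elias gamma codeword of n >= 1: floor(log2 n) zeros,
    followed by the binary expansion of n (which starts with 1)."""
    if n < 1:
        raise ValueError("Elias gamma is defined for integers n >= 1")
    b = bin(n)[2:]                 # binary expansion, e.g. 5 -> '101'
    return "0" * (len(b) - 1) + b  # e.g. 5 -> '00101'

def pattern(word):
    """Pattern of a word: each letter is replaced by the rank of its
    first occurrence, so repetitions keep the same rank."""
    ranks = {}
    out = []
    for letter in word:
        if letter not in ranks:
            ranks[letter] = len(ranks) + 1  # next unused rank
        out.append(ranks[letter])
    return out
```

For example, `pattern("abracadabra")` yields `[1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]`: the pattern retains where letters repeat but not which letters they are, which is exactly the information left to encode once the observed alphabet has been transmitted.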
References
P. Elias, Universal codeword sets and representations of the integers. IEEE Trans. Inf. Theory 21, 194–203 (1975)
J. Kieffer, A unified approach to weak universal source coding. IEEE Trans. Inf. Theory 24, 674–682 (1978)
J. Acharya, A. Jafarpour, A. Orlitsky, A.T. Suresh, Poissonization and universal compression of envelope classes, in 2014 IEEE International Symposium on Information Theory (ISIT), pp. 1872–1876 (IEEE, 2014)
D. Bontemps, Universal coding on infinite alphabets: exponentially decreasing envelopes. IEEE Trans. Inf. Theory 57(3), 1466–1478 (2011). ISSN 0018-9448. http://dx.doi.org/10.1109/TIT.2010.2103831
D. Haussler, M. Opper, Mutual information, metric entropy and cumulative relative entropy risk. Annals Stat. 25, 2451–2492 (1997)
P. Massart, Concentration Inequalities and Model Selection. Lecture Notes in Mathematics, vol. 1896 (Springer, Berlin, 2007). Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6–23, 2003, with a foreword by Jean Picard. ISBN 978-3-540-48497-4
G.M. Gemelos, T. Weissman, On the entropy rate of pattern processes. IEEE Trans. Inf. Theory 52, 3994–4007 (2006)
A. Orlitsky, N.P. Santhanam, Speaking of infinity. IEEE Trans. Inf. Theory 50, 2215–2230 (2004)
G. Hardy, S. Ramanujan, Asymptotic formulæ in combinatory analysis. Proc. London Math. Soc. 17(2), 75–115 (1918). Reprinted in Collected Papers of Srinivasa Ramanujan, pp. 276–309 (AMS Chelsea Publ., Providence, RI, 2000)
A. Garivier, A lower bound for the maximin redundancy in pattern coding. Entropy 11, 634–642 (2009)
J. Dixmier, J.-L. Nicolas, Partitions sans petits sommants, in A Tribute to Paul Erdős, pp. 121–152 (Cambridge Univ. Press, Cambridge, 1990)
S. Boucheron, A. Garivier, E. Gassiat, Coding on countably infinite alphabets. IEEE Trans. Inf. Theory 55, 358–373 (2009)
D. He, E. Yang, The universality of grammar-based codes for sources with countably infinite alphabets. IEEE Trans. Inf. Theory 51, 3753–3765 (2005)
D. Foster, R. Stine, A. Wyner, Universal codes for finite sequences of integers drawn from a monotone distribution. IEEE Trans. Inf. Theory 48, 1713–1720 (2002)
A. Dhulipala, A. Orlitsky, Universal compression of Markov and related sources over arbitrary alphabets. IEEE Trans. Inf. Theory 53, 4182–4190 (2006)
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this chapter
Gassiat, É. (2018). Universal Coding on Infinite Alphabets. In: Universal Coding and Order Identification by Model Selection Methods. Springer Monographs in Mathematics. Springer, Cham. https://doi.org/10.1007/978-3-319-96262-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96261-0
Online ISBN: 978-3-319-96262-7
eBook Packages: Mathematics and Statistics (R0)