Part of the book series: Springer Monographs in Mathematics ((SMM))

Abstract

When facing data from a “huge” alphabet, one may not be able to apply the previous results with satisfying theoretical guarantees, especially when those results are asymptotic. By a “huge alphabet”, we mean for instance that some letters may not yet have occurred in the data. To understand how to cope with such situations, we consider the case where the alphabet is infinite. For a finite alphabet, we have seen that there exist universal codes over the class of stationary ergodic sources. For classes of memoryless or Markovian sources, minimax redundancy and regret are both asymptotically equivalent to half the number of parameters times the base-2 logarithm of the encoded word length. In the non-parametric class of renewal sources, minimax redundancy and regret have the same asymptotic speed, up to multiplicative constants. None of this extends to infinite alphabets: there is no weakly universal code over the class of stationary ergodic sources, and we will see examples of classes for which the regret is infinite whereas the minimax redundancy is finite. The chapter starts with an encoding of the integers, which will be useful in the design of other codes. Thanks to a theorem due to John Kieffer, we show that there is no weakly universal code over the class of stationary ergodic sources with values in a countable alphabet. We then focus on memoryless sources (sequences of i.i.d. random variables) and make use of the Minimax-Maximin Theorem 2.12 to obtain lower bounds on the minimax redundancy of classes characterized by the decay of the probability measure at infinity. Another approach is to code in two steps: first encode the observed alphabet (the letters occurring in the data), then encode what is known as the “pattern”, which records the positions of letter repetitions in their order of occurrence.
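The two constructions mentioned in the abstract, a prefix code for the integers and the pattern of a word, can be illustrated with a short sketch. The abstract does not say which integer code the chapter uses; the Elias gamma code is assumed here because reference 1 is Elias's paper. The pattern definition used below, replacing each letter by the rank of its first occurrence, is likewise an assumption based on the abstract's description. The function names are hypothetical and chosen for illustration only.

```python
from typing import Hashable, Iterable, List, Tuple


def elias_gamma_encode(n: int) -> str:
    """Encode a positive integer with the Elias gamma code (assumed choice).

    The codeword is floor(log2 n) zeros followed by the binary expansion
    of n, so its length is 2 * floor(log2 n) + 1 bits.
    """
    if n < 1:
        raise ValueError("Elias gamma code is defined for integers >= 1")
    binary = bin(n)[2:]  # binary expansion without the '0b' prefix
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits: str) -> Tuple[int, str]:
    """Decode one codeword from the front of `bits`.

    Returns the decoded integer and the remaining, unread bits.
    """
    zeros = 0
    while zeros < len(bits) and bits[zeros] == "0":
        zeros += 1
    end = 2 * zeros + 1  # the codeword spans `zeros` zeros + (zeros + 1) bits
    if end > len(bits):
        raise ValueError("truncated codeword")
    return int(bits[zeros:end], 2), bits[end:]


def pattern(word: Iterable[Hashable]) -> List[int]:
    """Replace each letter by the rank of its first occurrence (1-based)."""
    rank = {}
    result = []
    for letter in word:
        if letter not in rank:
            rank[letter] = len(rank) + 1  # a new letter gets the next rank
        result.append(rank[letter])
    return result


if __name__ == "__main__":
    print(elias_gamma_encode(5))        # 00101
    print(pattern("abracadabra"))       # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
```

Because the gamma code is prefix-free, codewords can be concatenated and decoded one after another without separators, which is what makes such an integer code usable as a building block for other codes. The pattern discards the identity of the letters and keeps only where repetitions occur, so in a two-step scheme the observed alphabet has to be transmitted separately.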

References

  1. P. Elias, Universal codeword sets and representations of the integers. IEEE Trans. Inf. Theory 21, 194–203 (1975)

  2. J. Kieffer, A unified approach to weak universal source coding. IEEE Trans. Inf. Theory 24, 674–682 (1978)

  3. J. Acharya, A. Jafarpour, A. Orlitsky, A.T. Suresh, Poissonization and universal compression of envelope classes, in 2014 IEEE International Symposium on Information Theory (ISIT), pp. 1872–1876 (IEEE, 2014)

  4. D. Bontemps, Universal coding on infinite alphabets: exponentially decreasing envelopes. IEEE Trans. Inf. Theory 57(3), 1466–1478 (2011). ISSN 0018-9448. http://dx.doi.org/10.1109/TIT.2010.2103831

  5. D. Haussler, M. Opper, Mutual information, metric entropy and cumulative relative entropy risk. Annals Stat. 25, 2451–2492 (1997)

  6. P. Massart, Concentration inequalities and model selection, in Lecture Notes in Mathematics. Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6-23, 2003, With a foreword by Jean Picard, vol. 1896 (Springer, Berlin, 2007). ISBN 978-3-540-48497-4; 3-540-48497-3

  7. G.M. Gemelos, T. Weissman, On the entropy rate of pattern processes. IEEE Trans. Inf. Theory 52, 3994–4007 (2006)

  8. A. Orlitsky, N.P. Santhanam, Speaking of infinity. IEEE Trans. Inf. Theory 50, 2215–2230 (2004)

  9. G. Hardy, S. Ramanujan, Asymptotic formulæ in combinatory analysis (Proc. London Math. Soc. 17(2), 75–115, (1918)), in Collected Papers of Srinivasa Ramanujan, pp. 276–309 (AMS Chelsea Publ., Providence, RI, 2000)

  10. A. Garivier, A lower bound for the maximin redundancy in pattern coding. Entropy 11, 634–642 (2009)

  11. J. Dixmier, J.-L. Nicolas, Partitions sans petits sommants, in A Tribute to Paul Erdős, pp. 121–152 (Cambridge Univ. Press, Cambridge, 1990)

  12. S. Boucheron, A. Garivier, E. Gassiat, Coding on countably infinite alphabets. IEEE Trans. Inf. Theory 55, 358–373 (2009)

  13. D. He, E. Yang, The universality of grammar-based codes for sources with countably infinite alphabets. IEEE Trans. Inf. Theory 51, 3753–3765 (2005)

  14. D. Foster, R. Stine, A. Wyner, Universal codes for finite sequences of integers drawn from a monotone distribution. IEEE Trans. Inf. Theory 48, 1713–1720 (2002)

  15. A. Dhulipala, A. Orlitsky, Universal compression of Markov and related sources over arbitrary alphabets. IEEE Trans. Inf. Theory 53, 4182–4190 (2006)

Author information

Correspondence to Élisabeth Gassiat.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Gassiat, É. (2018). Universal Coding on Infinite Alphabets. In: Universal Coding and Order Identification by Model Selection Methods. Springer Monographs in Mathematics. Springer, Cham. https://doi.org/10.1007/978-3-319-96262-7_3
