Offline to Online Conversion

  • Marcus Hutter
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8776)

Abstract

We consider the problem of converting an offline estimator into an online predictor or estimator with small extra regret. Formally, this is the problem of merging a collection of probability measures over strings of length 1, 2, 3, ... into a single probability measure over infinite sequences. We describe various approaches and their pros and cons on various examples. As a side result we give an elementary, non-heuristic, purely combinatorial derivation of Turing’s famous estimator. Our main technical contribution is to determine the computational complexity of online estimators with good guarantees in general.
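One standard way to perform the conversion described above (and one of the paper's keywords) is normalization: given the offline estimator over strings of length n+1, condition it on the observed length-n prefix to obtain the next-symbol probability. The sketch below is an illustration, not the paper's construction; it uses the classical Bayes–Laplace offline estimator over binary strings (a string with k ones among n bits gets probability k!(n−k)!/(n+1)! under a uniform prior), for which normalization recovers Laplace's rule of succession (k+1)/(n+2).

```python
from math import factorial

def q_offline(x):
    """Offline exchangeable estimator over binary strings of length n:
    Bayes-Laplace with uniform prior, q(x) = k!(n-k)!/(n+1)!,
    where k is the number of ones in x."""
    n, k = len(x), sum(x)
    return factorial(k) * factorial(n - k) / factorial(n + 1)

def p_online(next_bit, x):
    """Online predictor obtained by normalization: condition the
    length-(n+1) offline estimator on the observed prefix x."""
    num = q_offline(x + [next_bit])
    den = q_offline(x + [0]) + q_offline(x + [1])
    return num / den

# For this particular q, normalization recovers (k+1)/(n+2):
x = [1, 0, 1]          # n = 3 observations, k = 2 ones
print(p_online(1, x))  # -> 0.6, i.e. (2+1)/(3+2)
```

For a general family of offline estimators the normalized online predictor need not coincide with any offline estimate; quantifying the extra regret such conversions incur is precisely the question the paper studies.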

Keywords

Offline; online; batch; sequential; probability estimation; prediction; time-consistency; normalization; tractable; regret; combinatorics; Bayes; Laplace; Ristad; Good-Turing


References

  1. [AB09]
    Arora, S., Barak, B.: Computational Complexity: A Modern Approach. Cambridge University Press (2009)
  2. [AS74]
    Abramowitz, M., Stegun, I.A. (eds.): Handbook of Mathematical Functions. Dover Publications (1974)
  3. [BC91]
    Barron, A.R., Cover, T.M.: Minimum complexity density estimation. IEEE Transactions on Information Theory 37, 1034–1054 (1991)
  4. [CG99]
    Chen, S.F., Goodman, J.: An empirical study of smoothing techniques for language modeling. Computer Speech and Language 13, 359–394 (1999)
  5. [Goo53]
    Good, I.J.: The population frequencies of species and the estimation of population parameters. Biometrika 40(3/4), 237–264 (1953)
  6. [Grü07]
    Grünwald, P.D.: The Minimum Description Length Principle. The MIT Press, Cambridge (2007)
  7. [Hut03]
    Hutter, M.: Optimality of universal Bayesian prediction for general loss and alphabet. Journal of Machine Learning Research 4, 971–1000 (2003)
  8. [Hut05]
    Hutter, M.: Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Springer, Berlin (2005)
  9. [Hut09]
    Hutter, M.: Discrete MDL predicts in total variation. In: Advances in Neural Information Processing Systems 22 (NIPS 2009), pp. 817–825. Curran Associates, Cambridge (2009)
  10. [Hut14]
    Hutter, M.: Offline to online conversion. Technical report (2014), http://www.hutter1.net/publ/off2onx.pdf
  11. [Nad85]
    Nadas, A.: On Turing’s formula for word probabilities. IEEE Transactions on Acoustics, Speech, and Signal Processing 33(6), 1414–1416 (1985)
  12. [PH05]
    Poland, J., Hutter, M.: Asymptotics of discrete MDL for online prediction. IEEE Transactions on Information Theory 51(11), 3780–3795 (2005)
  13. [RH07]
    Ryabko, D., Hutter, M.: On sequence prediction for arbitrary measures. In: Proc. IEEE International Symposium on Information Theory (ISIT 2007), pp. 2346–2350. IEEE, Nice (2007)
  14. [Ris95]
    Ristad, E.S.: A natural law of succession. Technical Report CS-TR-495-95, Princeton University (1995)
  15. [San06]
    Santhanam, N.: Probability Estimation and Compression Involving Large Alphabets. PhD thesis, University of California, San Diego, USA (2006)
  16. [Sol78]
    Solomonoff, R.J.: Complexity-based induction systems: Comparisons and convergence theorems. IEEE Transactions on Information Theory IT-24, 422–432 (1978)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Marcus Hutter
  1. Research School of Computer Science, Australian National University, Canberra, Australia