Concentration and Confidence for Discrete Bayesian Sequence Predictors

  • Tor Lattimore
  • Marcus Hutter
  • Peter Sunehag
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8139)


Bayesian sequence prediction is a simple technique for predicting future symbols sampled from an unknown measure on infinite sequences over a countable alphabet. While strong bounds on the expected cumulative error are known, there are only limited results on the distribution of this error. We prove tight high-probability bounds on the cumulative error, which is measured in terms of the Kullback-Leibler (KL) divergence. We also consider the problem of constructing upper confidence bounds on the KL and Hellinger errors similar to those constructed from Hoeffding-like bounds in the i.i.d. case. The new results are applied to show that Bayesian sequence prediction can be used in the Knows What It Knows (KWIK) framework with bounds that match the state-of-the-art.
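The Python sketch below makes the quantities in the abstract concrete. It runs a Bayes mixture ξ over a small model class of Bernoulli measures and records two per-step errors between the true predictive distribution μ(·|x<t) and ξ(·|x<t): the KL divergence and the squared Hellinger distance. The parameter grid, the uniform prior, the sequence length, the Hellinger normalization (no factor 1/2) and all function names are illustrative assumptions and are not taken from the paper. The sketch also tracks ln μ(x1:n)/ξ(x1:n). This log-ratio can never exceed ln(1/w_μ), because ξ ≥ w_μ μ. Its expectation equals the expected cumulative KL error, which gives the well-known bound on that expectation. The sketch does not implement the paper's high-probability bounds, its confidence bounds, or its KWIK construction.

```python
import math
import random


def kl(p, q):
    """KL divergence KL(p || q) in nats between two distributions on a finite alphabet."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def hellinger_sq(p, q):
    """Squared Hellinger distance sum_a (sqrt(p_a) - sqrt(q_a))**2.

    Assumed convention: no factor 1/2. With this choice it is at most KL(p || q).
    """
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))


def run_bayes_mixture(thetas, weights, true_index, n, seed=0):
    """Sample n bits i.i.d. from mu = Bernoulli(thetas[true_index]) and predict them
    with the Bayes mixture xi over the hypothetical class {Bernoulli(theta)}.

    Returns (cumulative KL error, cumulative squared Hellinger error,
    ln mu(x_1:n) / xi(x_1:n)).
    """
    rng = random.Random(seed)
    post = list(weights)  # posterior w(nu | x_<t), initialised to the prior
    mu1 = thetas[true_index]
    p_mu = [1.0 - mu1, mu1]  # mu(. | x_<t) is the same at every step (i.i.d. case)
    cum_kl = cum_h = log_ratio = 0.0
    for _ in range(n):
        # Mixture predictive probability: xi(1 | x_<t) = sum_nu w(nu | x_<t) * nu(1).
        xi1 = sum(w * th for w, th in zip(post, thetas))
        p_xi = [1.0 - xi1, xi1]
        cum_kl += kl(p_mu, p_xi)
        cum_h += hellinger_sq(p_mu, p_xi)
        # Draw the next symbol from the true measure mu.
        x = 1 if rng.random() < mu1 else 0
        log_ratio += math.log(p_mu[x] / p_xi[x])
        # Bayes rule: w(nu | x_1:t) is proportional to w(nu | x_<t) * nu(x_t).
        post = [w * (th if x == 1 else 1.0 - th) for w, th in zip(post, thetas)]
        z = sum(post)  # this normaliser equals xi(x_t | x_<t)
        post = [w / z for w in post]
    return cum_kl, cum_h, log_ratio


if __name__ == "__main__":
    thetas = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical model class
    weights = [0.2] * len(thetas)       # uniform prior, so ln(1 / w_mu) = ln 5
    cum_kl, cum_h, log_ratio = run_bayes_mixture(thetas, weights, true_index=3, n=1000)
    print(f"cumulative KL error          = {cum_kl:.4f}")
    print(f"cumulative Hellinger^2 error = {cum_h:.4f}")
    print(f"ln mu(x)/xi(x) = {log_ratio:.4f}, bound ln 5 = {math.log(5):.4f}")
```

The cumulative KL error of a single sampled sequence is a random quantity. Only its expectation is controlled by ln(1/w_μ). The distribution of this random error is what the high-probability bounds described above address.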


Keywords: Bayesian sequence prediction, concentration of measure, information theory, KWIK learning


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Tor Lattimore (1)
  • Marcus Hutter (1)
  • Peter Sunehag (1)
  1. Australian National University, Australia
