Consistency of Feature Markov Processes

  • Peter Sunehag
  • Marcus Hutter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6331)

Abstract

We study long-term sequence prediction (forecasting). We approach this by investigating criteria for choosing a compact, useful state representation, where the state summarizes the relevant information from the history. We want a method that is asymptotically consistent in the sense that it provably eventually chooses only among alternatives that satisfy an optimality property related to the criterion used. We extend our work to the case where there is side information that one can take advantage of, and we briefly discuss the active setting, where an agent takes actions to achieve desirable outcomes.
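
The abstract does not spell out the paper's selection criterion, so the sketch below is only a rough illustration of the underlying idea: score candidate maps from histories to states by a penalized log-likelihood of the Markov chain they induce, and keep the cheapest map. The BIC-style penalty, the candidate class `phi_last_k` (state = last k observed symbols), and all names here are illustrative assumptions, not the paper's definitions.

```python
# Minimal sketch (assumptions: BIC-style penalty, last-k-symbols map class).
import math
from collections import Counter

def phi_last_k(k):
    """Candidate feature map: state = last k symbols of the history."""
    return lambda history: history[-k:] if k > 0 else ""

def neg_log_likelihood(seq, phi):
    """-log P of seq under the Markov model induced on states by phi."""
    trans = Counter()    # counts of (state, next symbol)
    totals = Counter()   # counts of state visits
    for t in range(len(seq)):
        s = phi(seq[:t])
        trans[(s, seq[t])] += 1
        totals[s] += 1
    nll = 0.0
    for (s, x), n in trans.items():
        nll -= n * math.log(n / totals[s])   # ML plug-in transition probs
    return nll

def criterion(seq, phi, num_states, alphabet_size):
    """Penalized likelihood, smaller is better: a BIC-style term
    (0.5 * #free-parameters * log n) discourages needlessly large state sets."""
    params = num_states * (alphabet_size - 1)
    return neg_log_likelihood(seq, phi) + 0.5 * params * math.log(len(seq))

# Select among candidate maps "remember the last k symbols", k = 0..3.
seq = "abababababababababababab"
alphabet = sorted(set(seq))
best = min(range(4),
           key=lambda k: criterion(seq, phi_last_k(k),
                                   len(alphabet) ** k, len(alphabet)))
print("selected k =", best)   # a period-2 pattern is captured with k = 1
```

On this period-2 sequence, k = 0 pays a likelihood cost for ignoring the history, while k ≥ 2 pays a penalty for extra states; the criterion therefore settles on the compact map k = 1, mirroring the consistency property the abstract describes.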

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Peter Sunehag¹
  • Marcus Hutter¹
  1. RSISE@Australian National University and SML@NICTA, Canberra, Australia
