Consistency of Feature Markov Processes
We study long-term sequence prediction (forecasting). We approach this by investigating criteria for choosing a compact, useful state representation, where the state summarizes the relevant information from the history. We want a method that is asymptotically consistent in the sense that, provably, it eventually chooses only among alternatives satisfying an optimality property with respect to the criterion used. We extend this work to the case where side information is available, and we briefly discuss the active setting, where an agent takes actions to achieve desirable outcomes.
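The paper's criterion is not reproduced on this page, so the following is only a minimal illustrative sketch of criterion-based representation selection. It assumes the candidate state representations are "last k symbols" maps and that the selection criterion is a BIC-style penalized log-likelihood, in the spirit of BIC Markov order estimation; the names `bic_score` and `select_order` are hypothetical, not the authors' API.

```python
import math
from collections import Counter

def bic_score(seq, k, alphabet_size):
    """BIC of the order-k Markov model for seq: maximized log-likelihood
    minus (number of free parameters / 2) * log(sample size)."""
    n = len(seq) - k
    if n <= 0:
        return -math.inf
    trans = Counter()  # counts of (state, next symbol)
    ctx = Counter()    # counts of state = tuple of last k symbols
    for i in range(k, len(seq)):
        state = tuple(seq[i - k:i])
        trans[(state, seq[i])] += 1
        ctx[state] += 1
    loglik = sum(c * math.log(c / ctx[s]) for (s, _), c in trans.items())
    n_params = (alphabet_size ** k) * (alphabet_size - 1)
    return loglik - 0.5 * n_params * math.log(n)

def select_order(seq, max_k, alphabet_size):
    """Pick the memory length k (i.e. the map from histories to states)
    that maximizes the penalized criterion."""
    return max(range(max_k + 1),
               key=lambda k: bic_score(seq, k, alphabet_size))

# Example: a period-2 sequence should favor k = 1 over k = 0.
print(select_order("abababababababab", 3, 2))
```

The penalty term is what makes a selection scheme of this kind consistent: it discourages overly rich state representations that fit the history well only by chance, so that asymptotically only representations satisfying the criterion's optimality property survive.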