Abstract
We consider how state similarity in average reward Markov decision processes (MDPs) can be described by pseudometrics. Introducing the notion of adequate pseudometrics, which are well adapted to the structure of the MDP, we show how these can be used for state aggregation. We give upper bounds on the loss incurred by working on the aggregated instead of the original MDP, and compare them to the bounds that have been achieved for discounted reward MDPs.
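As a rough illustration of the kind of construction the abstract describes, one may measure state similarity by comparing immediate rewards and transition distributions, and then merge states that are close under the resulting pseudometric. The particular distance used below (reward gap plus total-variation distance of transitions, a standard bisimulation-style metric), the threshold `eps`, and the toy MDP are all assumptions made for this sketch; they are not the paper's adequate pseudometrics.

```python
import numpy as np

def pseudometric(r, P, c=1.0):
    """Bisimulation-style pseudometric on the states of a finite MDP (a sketch).

    r: (S, A) reward matrix; P: (S, A, S) transition probabilities.
    d(s, t) = max_a ( |r[s,a] - r[t,a]| + c * TV(P[s,a], P[t,a]) ),
    where TV is the total-variation distance between transition distributions.
    """
    S = r.shape[0]
    d = np.zeros((S, S))
    for s in range(S):
        for t in range(S):
            # One gap per action: reward difference plus scaled TV distance.
            gaps = np.abs(r[s] - r[t]) + c * 0.5 * np.abs(P[s] - P[t]).sum(axis=1)
            d[s, t] = gaps.max()
    return d

def aggregate(d, eps):
    """Greedily merge states within pseudometric distance eps into clusters."""
    S = d.shape[0]
    labels = np.full(S, -1)
    k = 0
    for s in range(S):
        if labels[s] == -1:
            for t in range(S):
                if labels[t] == -1 and d[s, t] <= eps:
                    labels[t] = k
            k += 1
    return labels

# Toy MDP: states 0 and 1 are exact duplicates, state 2 differs.
r = np.array([[1.0], [1.0], [0.0]])            # (S=3, A=1) rewards
P = np.array([[[0.5, 0.5, 0.0]],
              [[0.5, 0.5, 0.0]],
              [[0.0, 0.0, 1.0]]])              # (S, A, S) transitions
d = pseudometric(r, P)
print(aggregate(d, eps=0.1))                    # states 0 and 1 are merged
```

Working on the aggregated MDP then means planning over the cluster labels instead of the original states; the paper's bounds concern how much average reward such an aggregation can cost.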
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
Cite this paper
Ortner, R. (2007). Pseudometrics for State Aggregation in Average Reward Markov Decision Processes. In: Hutter, M., Servedio, R.A., Takimoto, E. (eds) Algorithmic Learning Theory. ALT 2007. Lecture Notes in Computer Science(), vol 4754. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75225-7_30
Print ISBN: 978-3-540-75224-0
Online ISBN: 978-3-540-75225-7