Pseudometrics for State Aggregation in Average Reward Markov Decision Processes

  • Ronald Ortner
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4754)

Abstract

We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics which are well adapted to the structure of the MDP, we show how these may be used for state aggregation. Upper bounds on the loss that may be caused by working on the aggregated instead of the original MDP are given and compared to the bounds that have been achieved for discounted reward MDPs.
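
To illustrate the general aggregation idea at a glance (this is a hypothetical sketch, not the paper's construction of adequate pseudometrics or its loss bounds), the following Python snippet merges states whose distance under a given pseudometric d stays below a threshold epsilon and averages transition probabilities and rewards uniformly within each block. The function name, the greedy grouping, and the uniform weighting are assumptions made here for illustration only.

```python
import numpy as np

def aggregate_states(P, r, d, epsilon):
    """Greedy state aggregation driven by a pseudometric (illustrative sketch).

    P: array of shape (S, A, S), transition probabilities P[s, a, s'].
    r: array of shape (S, A), mean rewards.
    d: array of shape (S, S), a symmetric pseudometric on states.
    epsilon: aggregation threshold; states with d(s, t) <= epsilon may be merged.

    Returns the partition into blocks and an aggregated model obtained by
    uniform averaging within each block (an assumption of this sketch).
    """
    S, A, _ = P.shape
    blocks, assigned = [], np.full(S, -1)
    for s in range(S):
        if assigned[s] >= 0:
            continue
        # group s with all still-unassigned states within distance epsilon
        block = [t for t in range(S) if assigned[t] < 0 and d[s, t] <= epsilon]
        for t in block:
            assigned[t] = len(blocks)
        blocks.append(block)

    k = len(blocks)
    P_agg = np.zeros((k, A, k))
    r_agg = np.zeros((k, A))
    for i, B in enumerate(blocks):
        r_agg[i] = r[B].mean(axis=0)
        for j, C in enumerate(blocks):
            # probability of moving into block C, averaged over the states of B
            P_agg[i, :, j] = P[B][:, :, C].sum(axis=2).mean(axis=0)
    return blocks, P_agg, r_agg
```

Working on (P_agg, r_agg) instead of (P, r) is exactly the setting whose induced loss in average reward the paper bounds from above.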



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Ronald Ortner
    1. University of Leoben, A-8700 Leoben, Austria
