Pseudometrics for State Aggregation in Average Reward Markov Decision Processes

  • Ronald Ortner
Conference paper

DOI: 10.1007/978-3-540-75225-7_30

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4754)
Cite this paper as:
Ortner R. (2007) Pseudometrics for State Aggregation in Average Reward Markov Decision Processes. In: Hutter M., Servedio R.A., Takimoto E. (eds) Algorithmic Learning Theory. ALT 2007. Lecture Notes in Computer Science, vol 4754. Springer, Berlin, Heidelberg

Abstract

We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics, which are well adapted to the structure of the MDP, we show how these may be used for state aggregation. Upper bounds on the loss incurred by working on the aggregated instead of the original MDP are given and compared to the bounds that have been achieved for discounted reward MDPs.
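The paper itself develops what makes a pseudometric "adequate"; as a rough illustration of the underlying aggregation idea, the sketch below uses a bisimulation-style pseudometric (maximum over actions of the reward gap plus the total-variation distance between transition distributions) as a stand-in, not the paper's definition. States whose pairwise distance stays below a threshold ε are merged greedily. All function names and the toy MDP are hypothetical.

```python
import numpy as np

def bisim_style_pseudometric(P, r):
    """Pairwise d(s, t) = max_a ( |r(s,a) - r(t,a)|
    + total-variation distance between P(.|s,a) and P(.|t,a) ).
    This is an illustrative stand-in, not the paper's adequate pseudometric."""
    S, A = r.shape
    d = np.zeros((S, S))
    for s in range(S):
        for t in range(S):
            gaps = [abs(r[s, a] - r[t, a])
                    + 0.5 * np.abs(P[a, s] - P[a, t]).sum()
                    for a in range(A)]
            d[s, t] = max(gaps)
    return d

def aggregate(d, eps):
    """Greedy epsilon-clustering: each state joins the first existing
    cluster whose representative lies within eps, else opens a new one."""
    reps, labels = [], []
    for s in range(d.shape[0]):
        for c, rep in enumerate(reps):
            if d[s, rep] <= eps:
                labels.append(c)
                break
        else:
            reps.append(s)
            labels.append(len(reps) - 1)
    return labels

# Toy 3-state, 2-action MDP: P[a, s] is the distribution P(.|s, a);
# states 0 and 1 behave identically, so they should be merged.
P = np.array([[[0.9, 0.1, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]],
              [[0.1, 0.9, 0.0], [0.1, 0.9, 0.0], [0.5, 0.5, 0.0]]])
r = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 0.5]])  # r[s, a]

d = bisim_style_pseudometric(P, r)
print(aggregate(d, eps=0.05))  # -> [0, 0, 1]: states 0 and 1 merge
```

The loss bounds in the paper quantify how much average reward can be sacrificed by planning on the smaller, aggregated MDP; the threshold ε plays the role of the aggregation coarseness in such bounds.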


Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Ronald Ortner
  1. University of Leoben, A-8700 Leoben, Austria
