Pseudometrics for State Aggregation in Average Reward Markov Decision Processes
- Cite this paper as:
- Ortner R. (2007) Pseudometrics for State Aggregation in Average Reward Markov Decision Processes. In: Hutter M., Servedio R.A., Takimoto E. (eds) Algorithmic Learning Theory. ALT 2007. Lecture Notes in Computer Science, vol 4754. Springer, Berlin, Heidelberg
We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics, which are well adapted to the structure of the MDP, we show how these may be used for state aggregation. We give upper bounds on the loss that may be incurred by working with the aggregated instead of the original MDP, and compare them to the bounds that have been achieved for discounted reward MDPs.
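To illustrate the general idea of pseudometric-based state aggregation, the following sketch greedily merges states whose pairwise distance under a given pseudometric falls below a threshold ε. Both the greedy clustering scheme and the example pseudometric are illustrative assumptions for exposition; they are not the paper's construction of adequate pseudometrics or its aggregation procedure.

```python
def aggregate_states(states, d, epsilon):
    """Greedy epsilon-aggregation of states under a pseudometric d.

    A state joins the first existing cluster whose representative is
    within epsilon of it; otherwise it founds a new cluster. This is a
    simple illustrative scheme, not the aggregation analyzed in the paper.
    """
    clusters = []  # list of (representative, members) pairs
    for s in states:
        for rep, members in clusters:
            if d(rep, s) <= epsilon:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    return [members for _, members in clusters]


# Toy usage with a hypothetical pseudometric: states are real numbers
# and d is the absolute difference (e.g. a crude surrogate for a
# reward-difference distance).
groups = aggregate_states([0.0, 0.1, 1.0, 1.05],
                          lambda s, t: abs(s - t),
                          epsilon=0.2)
```

Each group of states would then be collapsed into a single aggregate state; the bounds discussed in the paper quantify how much average reward such a collapse can cost.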