A Framework for Computing Bounds for the Return of a Policy
We present a framework for computing bounds for the return of a policy in finite-horizon, continuous-state Markov Decision Processes with bounded state transitions. The state transition bounds can be based on either prior knowledge alone, or on a combination of prior knowledge and data. Our framework uses a piecewise-constant representation of the return bounds and a backwards iteration process. We instantiate this framework for a previously investigated type of prior knowledge – namely, Lipschitz continuity of the transition function. In this context, we show that the existing bounds of Fonteneau et al. (2009, 2010) can be expressed as a particular instantiation of our framework, by bounding the immediate rewards using Lipschitz continuity and choosing a particular form for the regions in the piecewise-constant representation. We also show how different instantiations of our framework can improve upon their bounds.
Unable to display preview. Download preview PDF.
- 1.Brunskill, E., Leffler, B., Li, L., Littman, M., Roy, N.: CORL: A continuous-state offset-dynamics reinforcement learner. In: Proceedings of the International Conference on Uncertainty in Artificial Intelligence, pp. 53–61 (2008)Google Scholar
- 3.Ermon, S., Conrad, J., Gomes, C., Selman, B.: Playing games against nature: optimal policies for renewable resource allocation. In: Proceedings of The 26th Conference on Uncertainty in Artificial Intelligence (2010)Google Scholar
- 4.Fonteneau, R., Murphy, S., Wehenkel, L., Ernst, D.: Inferring bounds on the performance of a control policy from a sample of trajectories. In: IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp. 117–123 (2009)Google Scholar
- 6.Kaelbling, L.P.: Learning in embedded systems. MIT Press (1993)Google Scholar
- 7.Kakade, S., Kearns, M., Langford, J.: Exploration in Metric State Spaces. In: International Conference on Machine Learning, vol. 20, p. 306 (2003)Google Scholar