Basis Function Discovery Using Spectral Clustering and Bisimulation Metrics
We study the problem of automatically generating features for function approximation in reinforcement learning. We build on the work of Mahadevan and his colleagues, who pioneered the use of spectral clustering methods for basis function construction. Their methods work on top of a graph that captures state adjacency. Instead, we use bisimulation metrics in order to provide state distances for spectral clustering. The advantage of these metrics is that they incorporate reward information in a natural way, in addition to the state transition information. We provide bisimulation metric bounds for general feature maps. This result suggests a new way of generating features, with strong theoretical guarantees on the quality of the obtained approximation. We also demonstrate empirically that the approximation quality improves when bisimulation metrics are used in the basis function construction process.
KeywordsMarkov Decision Process Spectral Cluster Feature Construction Spectral Cluster Method Reward Information
Unable to display preview. Download preview PDF.
- 2.Chung, F.: Spectral Graph Theory. CBMS Regional Conference Series in Mathematics (1997)Google Scholar
- 3.Ferns, N., Panangaden, P., Precup, D.: Metrics for Finite Markov Decision Processes. In: Conference on Uncertainty in Artificial Intelligence (2004)Google Scholar
- 4.Ferns, N., Panangaden, P., Precup, D.: Metrics for Markov Decision Processes with Infinite State Spaces. In: Conference on Uncertainty in Artificial Intelligence (2005)Google Scholar
- 5.Keller, P.W., Mannor, S., Precup, D.: Automatic Basis Function Construction for Approximate Dynamic Programming and Reinforcement Learning. In: International Conference on Machine Learning, pp. 449–456. ACM Press, New York (2006)Google Scholar
- 6.Mahadevan, S.: Proto-Value Functions: Developmental Reinforcement Learning. In: International Conference on Machine Learning, pp. 553–560 (2005)Google Scholar
- 8.Parr, R., Painter-Wakefiled, H., Li, L., Littman, M.L.: Analyzing Feature Generation for Value Function Approximation. In: International Conference on Machine Learning, pp. 737–744 (2008)Google Scholar
- 9.Petrik, M.: An Analysis of Laplacian Methods for Value Function Approximation in MDPs. In: International Joint Conference on Artificial Intelligence, pp. 2574–2579 (2007)Google Scholar
- 10.Puterman, M.L.: Markov Decision Processes: Discrete and Stochastic Dynamic Programming. Wiley (1994)Google Scholar
- 11.Sutton, R.S., Barto, A.G.: Introduction to Reinforcement Learning. MIT Press (1998)Google Scholar