Abstract
We study the problem of automatically generating features for function approximation in reinforcement learning. We build on the work of Mahadevan and colleagues, who pioneered the use of spectral clustering methods for basis function construction. Their methods operate on a graph that captures state adjacency. We instead use bisimulation metrics to provide the state distances for spectral clustering. The advantage of these metrics is that they naturally incorporate reward information, in addition to state transition information. We provide bisimulation metric bounds for general feature maps. This result suggests a new way of generating features, with strong theoretical guarantees on the quality of the resulting approximation. We also demonstrate empirically that approximation quality improves when bisimulation metrics are used in the basis function construction process.
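The pipeline the abstract describes can be sketched in a few steps: iterate a Ferns-et-al.-style bisimulation metric to its fixed point (the Kantorovich term solved as a small linear program), convert the resulting distances into graph affinities, and take the smoothest Laplacian eigenvectors as basis functions. The sketch below is a minimal illustration under assumed constants (`c_R`, `c_T`) and a hypothetical toy chain MDP; it is not the paper's implementation.

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich(mu, nu, d):
    """Kantorovich (optimal-transport) distance between distributions mu, nu
    under ground metric d, solved as a small linear program over flows f[i,j]."""
    n = len(mu)
    A_eq, b_eq = [], []
    for i in range(n):                  # row marginals: sum_j f[i, j] = mu[i]
        row = np.zeros((n, n)); row[i, :] = 1
        A_eq.append(row.ravel()); b_eq.append(mu[i])
    for j in range(n):                  # column marginals: sum_i f[i, j] = nu[j]
        col = np.zeros((n, n)); col[:, j] = 1
        A_eq.append(col.ravel()); b_eq.append(nu[j])
    res = linprog(d.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None))
    return res.fun

def bisim_metric(P, R, c_R=1.0, c_T=0.9, iters=30):
    """Iterate the fixed-point update
    d(s,t) = max_a [ c_R |R(s,a) - R(t,a)| + c_T K_d(P(.|s,a), P(.|t,a)) ],
    which combines reward differences with transition differences."""
    n_a, n_s, _ = P.shape
    d = np.zeros((n_s, n_s))
    for _ in range(iters):
        d_new = np.zeros_like(d)
        for s in range(n_s):
            for t in range(s + 1, n_s):
                vals = [c_R * abs(R[s, a] - R[t, a])
                        + c_T * kantorovich(P[a, s], P[a, t], d)
                        for a in range(n_a)]
                d_new[s, t] = d_new[t, s] = max(vals)
        d = d_new
    return d

def spectral_features(d, k=2, sigma=1.0):
    """Turn metric distances into a Gaussian affinity graph and return the k
    smoothest eigenvectors of its Laplacian as basis functions."""
    W = np.exp(-(d / sigma) ** 2)
    L = np.diag(W.sum(axis=1)) - W      # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :k]

# Hypothetical 4-state, 2-action chain for illustration:
# action 0 moves right, action 1 stays; states 2 and 3 both yield reward 1.
n_s = 4
P = np.zeros((2, n_s, n_s))
for s in range(n_s):
    P[0, s, min(s + 1, n_s - 1)] = 1.0  # "right"
    P[1, s, s] = 1.0                    # "stay"
R = np.array([[0., 0.], [0., 0.], [1., 1.], [1., 1.]])

d = bisim_metric(P, R)
phi = spectral_features(d)
```

In this toy MDP, states 2 and 3 are bisimilar (their metric distance is zero), so the affinity graph cannot separate them, whereas states with different rewards are pushed apart; this is the sense in which the metric folds reward information into the basis construction.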
References
Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
Chung, F.: Spectral Graph Theory. CBMS Regional Conference Series in Mathematics, vol. 92. American Mathematical Society (1997)
Ferns, N., Panangaden, P., Precup, D.: Metrics for Finite Markov Decision Processes. In: Conference on Uncertainty in Artificial Intelligence (2004)
Ferns, N., Panangaden, P., Precup, D.: Metrics for Markov Decision Processes with Infinite State Spaces. In: Conference on Uncertainty in Artificial Intelligence (2005)
Keller, P.W., Mannor, S., Precup, D.: Automatic Basis Function Construction for Approximate Dynamic Programming and Reinforcement Learning. In: International Conference on Machine Learning, pp. 449–456. ACM Press, New York (2006)
Mahadevan, S.: Proto-Value Functions: Developmental Reinforcement Learning. In: International Conference on Machine Learning, pp. 553–560 (2005)
Mahadevan, S., Maggioni, M.: Proto-Value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes. Journal of Machine Learning Research 8, 2169–2231 (2007)
Parr, R., Painter-Wakefield, C., Li, L., Littman, M.L.: Analyzing Feature Generation for Value Function Approximation. In: International Conference on Machine Learning, pp. 737–744 (2008)
Petrik, M.: An Analysis of Laplacian Methods for Value Function Approximation in MDPs. In: International Joint Conference on Artificial Intelligence, pp. 2574–2579 (2007)
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley (1994)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press (1998)
Tsitsiklis, J.N., Van Roy, B.: An Analysis of Temporal-Difference Learning with Function Approximation. IEEE Transactions on Automatic Control 42(5), 674–690 (1997)
© 2012 Springer-Verlag Berlin Heidelberg
Comanici, G., Precup, D. (2012). Basis Function Discovery Using Spectral Clustering and Bisimulation Metrics. In: Vrancx, P., Knudson, M., Grześ, M. (eds) Adaptive and Learning Agents. ALA 2011. Lecture Notes in Computer Science, vol 7113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28499-1_6
Print ISBN: 978-3-642-28498-4
Online ISBN: 978-3-642-28499-1