Abstract
We study the problem of automatically generating features for function approximation in reinforcement learning. We build on the work of Mahadevan and colleagues, who pioneered the use of spectral clustering methods for basis function construction. Their methods operate on a graph that captures state adjacency. We instead use bisimulation metrics to provide the state distances for spectral clustering. The advantage of these metrics is that they naturally incorporate reward information, in addition to state transition information. We provide bisimulation metric bounds for general feature maps. This result suggests a new way of generating features, with strong theoretical guarantees on the quality of the resulting approximation. We also demonstrate empirically that approximation quality improves when bisimulation metrics are used in the basis function construction process.
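The pipeline the abstract describes can be sketched in a few steps: iterate a Ferns-et-al.-style bisimulation metric to its fixed point (the Kantorovich term solved as a small linear program), convert the resulting distances into graph affinities, and take the smoothest Laplacian eigenvectors as basis functions. The sketch below is a minimal illustration under assumed constants (`c_R`, `c_T`) and a hypothetical toy chain MDP; it is not the paper's implementation.

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich(mu, nu, d):
    """Kantorovich (optimal-transport) distance between distributions mu, nu
    under ground metric d, solved as a small linear program over flows f[i,j]."""
    n = len(mu)
    A_eq, b_eq = [], []
    for i in range(n):                  # row marginals: sum_j f[i, j] = mu[i]
        row = np.zeros((n, n)); row[i, :] = 1
        A_eq.append(row.ravel()); b_eq.append(mu[i])
    for j in range(n):                  # column marginals: sum_i f[i, j] = nu[j]
        col = np.zeros((n, n)); col[:, j] = 1
        A_eq.append(col.ravel()); b_eq.append(nu[j])
    res = linprog(d.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None))
    return res.fun

def bisim_metric(P, R, c_R=1.0, c_T=0.9, iters=30):
    """Iterate the fixed-point update
    d(s,t) = max_a [ c_R |R(s,a) - R(t,a)| + c_T K_d(P(.|s,a), P(.|t,a)) ],
    which combines reward differences with transition differences."""
    n_a, n_s, _ = P.shape
    d = np.zeros((n_s, n_s))
    for _ in range(iters):
        d_new = np.zeros_like(d)
        for s in range(n_s):
            for t in range(s + 1, n_s):
                vals = [c_R * abs(R[s, a] - R[t, a])
                        + c_T * kantorovich(P[a, s], P[a, t], d)
                        for a in range(n_a)]
                d_new[s, t] = d_new[t, s] = max(vals)
        d = d_new
    return d

def spectral_features(d, k=2, sigma=1.0):
    """Turn metric distances into a Gaussian affinity graph and return the k
    smoothest eigenvectors of its Laplacian as basis functions."""
    W = np.exp(-(d / sigma) ** 2)
    L = np.diag(W.sum(axis=1)) - W      # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :k]

# Hypothetical 4-state, 2-action chain for illustration:
# action 0 moves right, action 1 stays; states 2 and 3 both yield reward 1.
n_s = 4
P = np.zeros((2, n_s, n_s))
for s in range(n_s):
    P[0, s, min(s + 1, n_s - 1)] = 1.0  # "right"
    P[1, s, s] = 1.0                    # "stay"
R = np.array([[0., 0.], [0., 0.], [1., 1.], [1., 1.]])

d = bisim_metric(P, R)
phi = spectral_features(d)
```

In this toy MDP, states 2 and 3 are bisimilar (their metric distance is zero), so the affinity graph cannot separate them, whereas states with different rewards are pushed apart; this is the sense in which the metric folds reward information into the basis construction.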
References
Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
Chung, F.: Spectral Graph Theory. CBMS Regional Conference Series in Mathematics, vol. 92. American Mathematical Society (1997)
Ferns, N., Panangaden, P., Precup, D.: Metrics for Finite Markov Decision Processes. In: Conference on Uncertainty in Artificial Intelligence (2004)
Ferns, N., Panangaden, P., Precup, D.: Metrics for Markov Decision Processes with Infinite State Spaces. In: Conference on Uncertainty in Artificial Intelligence (2005)
Keller, P.W., Mannor, S., Precup, D.: Automatic Basis Function Construction for Approximate Dynamic Programming and Reinforcement Learning. In: International Conference on Machine Learning, pp. 449–456. ACM Press, New York (2006)
Mahadevan, S.: Proto-Value Functions: Developmental Reinforcement Learning. In: International Conference on Machine Learning, pp. 553–560 (2005)
Mahadevan, S., Maggioni, M.: Proto-Value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes. Journal of Machine Learning Research 8, 2169–2231 (2007)
Parr, R., Painter-Wakefield, C., Li, L., Littman, M.L.: Analyzing Feature Generation for Value Function Approximation. In: International Conference on Machine Learning, pp. 737–744 (2008)
Petrik, M.: An Analysis of Laplacian Methods for Value Function Approximation in MDPs. In: International Joint Conference on Artificial Intelligence, pp. 2574–2579 (2007)
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley (1994)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press (1998)
Tsitsiklis, J.N., Van Roy, B.: An Analysis of Temporal-Difference Learning with Function Approximation. IEEE Transactions on Automatic Control 42(5), 674–690 (1997)
© 2012 Springer-Verlag Berlin Heidelberg
Comanici, G., Precup, D. (2012). Basis Function Discovery Using Spectral Clustering and Bisimulation Metrics. In: Vrancx, P., Knudson, M., Grześ, M. (eds) Adaptive and Learning Agents. ALA 2011. Lecture Notes in Computer Science, vol 7113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28499-1_6
Print ISBN: 978-3-642-28498-4
Online ISBN: 978-3-642-28499-1