Basis Function Discovery Using Spectral Clustering and Bisimulation Metrics

  • Gheorghe Comanici
  • Doina Precup
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7113)

Abstract

We study the problem of automatically generating features for function approximation in reinforcement learning. We build on the work of Mahadevan and his colleagues, who pioneered the use of spectral clustering methods for basis function construction. Their methods work on top of a graph that captures state adjacency. Instead, we use bisimulation metrics to provide state distances for spectral clustering. The advantage of these metrics is that they incorporate reward information in a natural way, in addition to state transition information. We provide bisimulation metric bounds for general feature maps. This result suggests a new way of generating features, with strong theoretical guarantees on the quality of the resulting approximation. We also demonstrate empirically that approximation quality improves when bisimulation metrics are used in the basis function construction process.
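
As a concrete illustration of the construction the abstract describes, the following Python sketch (a minimal illustration, not the authors' implementation) computes a bisimulation metric in the style of Ferns et al. for a small finite MDP by fixed-point iteration, turns the metric into a Gaussian affinity matrix, and takes low-order eigenvectors of the resulting graph Laplacian as basis functions, in the spirit of Mahadevan's proto-value functions. The toy MDP, the coefficients c_R and c_T, the bandwidth sigma, the number of basis functions k, and the helper names (kantorovich, bisimulation_metric, spectral_basis) are all illustrative assumptions, not artifacts of the paper.

```python
import numpy as np
from scipy.optimize import linprog


def kantorovich(p, q, d):
    """1-Wasserstein distance between distributions p and q under ground
    metric d, solved as a transportation linear program."""
    n = len(p)
    cost = d.reshape(-1)                       # minimize sum_ij F_ij * d_ij
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0       # row sums of the flow F equal p
        A_eq[n + i, i::n] = 1.0                # column sums of F equal q
    b_eq = np.concatenate([p, q])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun


def bisimulation_metric(R, P, c_R=0.1, c_T=0.9, iters=30):
    """Fixed-point iteration for a bisimulation metric of the form
    d(s,t) = max_a [ c_R |R(s,a) - R(t,a)| + c_T K(P(.|s,a), P(.|t,a); d) ],
    where K is the Kantorovich distance with ground metric d."""
    n, n_actions = R.shape
    d = np.zeros((n, n))
    for _ in range(iters):                     # c_T < 1 makes this a contraction
        d_new = np.zeros_like(d)
        for s in range(n):
            for t in range(s + 1, n):
                vals = [c_R * abs(R[s, a] - R[t, a])
                        + c_T * kantorovich(P[a, s], P[a, t], d)
                        for a in range(n_actions)]
                d_new[s, t] = d_new[t, s] = max(vals)
        d = d_new
    return d


def spectral_basis(d, k=3, sigma=0.5):
    """Basis functions: the k eigenvectors of the graph Laplacian of a
    Gaussian affinity matrix built from the metric, smallest eigenvalues
    first (proto-value-function style)."""
    W = np.exp(-(d / sigma) ** 2)              # metric -> affinity
    L = np.diag(W.sum(axis=1)) - W             # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, :k]                      # columns are basis functions


# Toy 4-state, 2-action MDP with random rewards and transitions
# (illustrative numbers only).
rng = np.random.default_rng(0)
R = rng.uniform(0.0, 1.0, size=(4, 2))         # R[s, a]
P = rng.dirichlet(np.ones(4), size=(2, 4))     # P[a, s] = P(. | s, a)

d = bisimulation_metric(R, P)
Phi = spectral_basis(d, k=3)
print(Phi)                                      # feature map: state s -> Phi[s]
```

The Kantorovich distances are computed exactly here with a small transportation linear program, which is only feasible for small state spaces; this is a sketch of the construction, not a scalable implementation.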

References

  1. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
  2. Chung, F.: Spectral Graph Theory. CBMS Regional Conference Series in Mathematics (1997)
  3. Ferns, N., Panangaden, P., Precup, D.: Metrics for Finite Markov Decision Processes. In: Conference on Uncertainty in Artificial Intelligence (2004)
  4. Ferns, N., Panangaden, P., Precup, D.: Metrics for Markov Decision Processes with Infinite State Spaces. In: Conference on Uncertainty in Artificial Intelligence (2005)
  5. Keller, P.W., Mannor, S., Precup, D.: Automatic Basis Function Construction for Approximate Dynamic Programming and Reinforcement Learning. In: International Conference on Machine Learning, pp. 449–456. ACM Press, New York (2006)
  6. Mahadevan, S.: Proto-Value Functions: Developmental Reinforcement Learning. In: International Conference on Machine Learning, pp. 553–560 (2005)
  7. Mahadevan, S., Maggioni, M.: Proto-Value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes. Journal of Machine Learning Research 8, 2169–2231 (2007)
  8. Parr, R., Painter-Wakefield, C., Li, L., Littman, M.L.: Analyzing Feature Generation for Value Function Approximation. In: International Conference on Machine Learning, pp. 737–744 (2007)
  9. Petrik, M.: An Analysis of Laplacian Methods for Value Function Approximation in MDPs. In: International Joint Conference on Artificial Intelligence, pp. 2574–2579 (2007)
  10. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley (1994)
  11. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press (1998)
  12. Tsitsiklis, J.N., Van Roy, B.: An Analysis of Temporal-Difference Learning with Function Approximation. IEEE Transactions on Automatic Control 42(5), 674–690 (1997)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Gheorghe Comanici (1)
  • Doina Precup (1)
  1. School of Computer Science, McGill University, Montreal, Canada
