Low Complexity Proto-Value Function Learning from Sensory Observations with Incremental Slow Feature Analysis

  • Matthew Luciw
  • Juergen Schmidhuber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7553)

Abstract

We show that Incremental Slow Feature Analysis (IncSFA) provides a low-complexity method for learning Proto-Value Functions (PVFs). It has been shown that a small number of PVFs provide a good basis set for linear approximation of value functions in reinforcement learning environments. Our method learns PVFs from a high-dimensional sensory input stream as the agent explores its world, without building a transition model, adjacency matrix, or covariance matrix. A temporal-difference-based reinforcement learner improves a value function approximation built upon the features, and the agent uses the value function to achieve rewards successfully. The algorithm is local in space and time, furthering the biological plausibility and applicability of PVFs.
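The second stage described above, a temporal-difference learner fitting a linear value function on top of learned features, can be illustrated with a minimal TD(0) sketch. This is not the paper's algorithm: the one-hot features below are a deliberate stand-in for the slow features IncSFA would learn from sensory input, and all names are illustrative.

```python
import numpy as np

def td0_update(w, phi_s, phi_next, reward, alpha=0.1, gamma=0.95, terminal=False):
    # TD(0) for a linear value function V(s) = w . phi(s):
    #   delta = r + gamma * V(s') - V(s);   w <- w + alpha * delta * phi(s)
    v_next = 0.0 if terminal else float(w @ phi_next)
    delta = reward + gamma * v_next - float(w @ phi_s)
    return w + alpha * delta * phi_s

n_states = 5
features = np.eye(n_states)   # one-hot placeholder for learned (slow) features
w = np.zeros(n_states)

# Deterministic chain 0 -> 1 -> ... -> 4, reward 1 on reaching the final state.
for _ in range(300):
    for s in range(n_states - 1):
        s2 = s + 1
        r = 1.0 if s2 == n_states - 1 else 0.0
        w = td0_update(w, features[s], features[s2], r,
                       terminal=(s2 == n_states - 1))

values = features @ w  # converges to gamma ** (3 - s) for s = 0..3
```

With a real high-dimensional observation stream, `features` would instead be the IncSFA outputs, and the same linear TD update would apply unchanged; that decoupling of feature learning from value learning is what the abstract describes.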

Keywords

Proto-Value Functions · Incremental Slow Feature Analysis · Biologically Inspired Reinforcement Learning

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Matthew Luciw (1)
  • Juergen Schmidhuber (1)
  1. IDSIA-USI-SUPSI, Manno-Lugano, Switzerland
