Sparse Distributed Memories for On-Line Value-Based Reinforcement Learning

  • Bohdana Ratitch
  • Doina Precup
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3201)


In this paper, we advocate the use of Sparse Distributed Memories (SDMs) for on-line, value-based reinforcement learning (RL). SDMs provide a linear, local function approximation scheme, designed to work when a very large, high-dimensional input (address) space has to be mapped into a much smaller physical memory. We present an implementation of the SDM architecture for on-line, value-based RL in continuous state spaces. An important contribution of this paper is an algorithm for dynamic on-line allocation and adjustment of memory resources for SDMs, which eliminates the need for choosing the memory size and structure a priori. In our experiments, this algorithm provides very good performance while efficiently managing the memory resources.
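The core idea in the abstract can be illustrated with a minimal sketch: memory locations are scattered in the continuous state space, a state activates the nearby locations, the value prediction is a local linear combination of the activated locations' contents, and new locations are allocated on-line when a state activates none. This is not the authors' exact algorithm; the class name `SDMValueApproximator`, the L-infinity activation radius, and the equal sharing of the TD/regression error among active locations are illustrative assumptions standing in for the paper's specific activation and resource-adjustment rules.

```python
import numpy as np

rng = np.random.default_rng(0)

class SDMValueApproximator:
    """Illustrative sparse distributed memory for value estimation.

    Each memory location has a center in the continuous state space and a
    stored value. A state activates the locations whose centers lie within
    `radius` of it (L-infinity distance); the prediction is the mean of the
    active locations' values, and prediction errors are shared among them.
    """

    def __init__(self, dim, radius, alpha=0.1):
        self.radius = radius
        self.alpha = alpha                      # learning rate
        self.centers = np.empty((0, dim))       # location addresses
        self.values = np.empty(0)               # location contents

    def _active(self, state):
        if len(self.values) == 0:
            return np.array([], dtype=int)
        dists = np.max(np.abs(self.centers - state), axis=1)
        return np.flatnonzero(dists <= self.radius)

    def predict(self, state):
        idx = self._active(state)
        return self.values[idx].mean() if len(idx) else 0.0

    def update(self, state, target):
        idx = self._active(state)
        if len(idx) == 0:
            # Dynamic on-line allocation: add a location instead of
            # fixing the memory size and structure a priori.
            self.centers = np.vstack([self.centers, state])
            self.values = np.append(self.values, target)
            return
        # Local linear update: distribute the error over active locations.
        error = target - self.predict(state)
        self.values[idx] += self.alpha * error / len(idx)

# Toy usage: learn a 1-D function from sampled targets.
approx = SDMValueApproximator(dim=1, radius=0.1)
for _ in range(2000):
    s = rng.uniform(0.0, 1.0, size=1)
    approx.update(s, np.sin(2 * np.pi * s[0]))
# the prediction near s = 0.25 approaches sin(pi/2) = 1
```

In an RL setting the `target` would be a bootstrapped TD target such as r + gamma * V(s'), but the memory mechanics (activation, local update, allocation) are the same.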


Keywords: Reinforcement Learning, Input Space, Memory Location, Memory Size, Memory Resource



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Bohdana Ratitch (1)
  • Doina Precup (1)

  1. McGill University, Montreal, Canada
