Prioritized Sweeping Neural DynaQ with Multiple Predecessors, and Hippocampal Replays

  • Lise Aubin
  • Mehdi Khamassi
  • Benoît Girard
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10928)


During sleep and wakeful rest, the hippocampus replays sequences of place cells that have been activated during prior experiences. These replays have been interpreted as a memory consolidation process, but recent results suggest a possible interpretation in terms of reinforcement learning. The Dyna reinforcement learning algorithms use off-line replays to improve learning. Under a limited replay budget, prioritized sweeping, which requires a model of the transitions to the predecessors, can be used to improve performance. We investigate whether such algorithms can explain the experimentally observed replays. We propose a neural network version of prioritized sweeping Q-learning, for which we developed a growing multiple expert algorithm, able to cope with multiple predecessors. The resulting architecture is able to improve the learning of simulated agents confronted with a navigation task. We predict that, in animals, learning the transition and reward models should occur during rest periods, and that the corresponding replays should be shuffled.
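
As an illustration of the prioritized sweeping idea mentioned above, the following is a minimal tabular sketch, not the neural network architecture proposed in the paper. It assumes a known deterministic model on a hypothetical five-state corridor; the names `build_chain_model`, `build_predecessors`, `backup_target` and `prioritized_sweeping`, as well as the values of `gamma`, `theta` and `budget`, are illustrative assumptions. Replayed state-action pairs are ordered by the magnitude of their expected Q-value change, and after each update the predecessors of the updated state are queued in turn.

```python
import heapq
from collections import defaultdict

ACTIONS = ("left", "right")
GOAL = 4  # hypothetical terminal state at the end of a 5-state corridor


def build_chain_model():
    """Deterministic model: (state, action) -> (reward, next_state)."""
    model = {}
    for s in range(GOAL):
        model[(s, "left")] = (0.0, max(s - 1, 0))
        nxt = s + 1
        model[(s, "right")] = (1.0 if nxt == GOAL else 0.0, nxt)
    return model


def build_predecessors(model):
    """Inverse model: state -> set of (state, action) pairs leading to it."""
    preds = defaultdict(set)
    for (s, a), (_, s2) in model.items():
        preds[s2].add((s, a))
    return preds


def backup_target(Q, model, s, a, gamma):
    """One-step Q-learning target computed from the model."""
    r, s2 = model[(s, a)]
    return r + gamma * max(Q[(s2, b)] for b in ACTIONS)


def prioritized_sweeping(model, seed, gamma=0.9, theta=1e-6, budget=200):
    """Replay at most `budget` model-based updates, largest error first."""
    Q = defaultdict(float)  # the terminal state is never updated, so it stays at 0
    preds = build_predecessors(model)
    s, a = seed
    # heapq is a min-heap, so priorities are stored negated.
    queue = [(-abs(backup_target(Q, model, s, a, gamma) - Q[seed]), seed)]
    for _ in range(budget):
        if not queue:
            break
        _, (s, a) = heapq.heappop(queue)
        # Learning rate of 1 is sufficient because the model is deterministic.
        Q[(s, a)] = backup_target(Q, model, s, a, gamma)
        # Queue every predecessor whose value would change noticeably.
        for sp, ap in preds[s]:
            priority = abs(backup_target(Q, model, sp, ap, gamma) - Q[(sp, ap)])
            if priority > theta:
                heapq.heappush(queue, (-priority, (sp, ap)))
    return Q


# Seed the replay with the rewarded transition, as if it had just been experienced.
Q = prioritized_sweeping(build_chain_model(), seed=(3, "right"))
greedy = {s: max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(GOAL)}
```

In this sketch the replay starts from the rewarded transition and propagates value backwards along the corridor, so the greedy policy points towards the goal in every non-terminal state once the queue is empty. A state such as 2 has two predecessors, (1, right) and (3, left), which is the multiple-predecessor situation that the inverse model has to handle.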


Reinforcement learning · Replays · DynaQ · Prioritized sweeping · Neural networks · Hippocampus · Navigation



The authors thank O. Sigaud for fruitful discussions, and F. Cinotti for proofreading. This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 640891 (DREAM Project). This work was performed within the Labex SMART (ANR-11-LABX-65) supported by French state funds managed by the ANR within the Investissements d’Avenir programme under reference ANR-11-IDEX-0004-02.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Institut des Systèmes Intelligents et de Robotique (ISIR), Sorbonne Université, CNRS, Paris, France
