Encyclopedia of Computational Neuroscience

Living Edition
Editors: Dieter Jaeger, Ranu Jung

Reinforcement Learning in Cortical Networks

  • Walter Senn
  • Jean-Pascal Pfister
Living reference work entry


DOI: https://doi.org/10.1007/978-1-4614-7320-6_580-2



Reinforcement learning is a basic paradigm of learning in artificial intelligence and biology. The paradigm considers an agent (robot, human, animal) that acts in a typically stochastic environment and receives rewards when reaching certain states. The agent’s goal is to maximize the expected reward by choosing the optimal action in any given state. In a cortical implementation, the states are defined by sensory stimuli that feed into a neuronal network, and an action is read out once the network activity has settled. Learning consists of adapting the synaptic connection strengths into and within the neuronal network based on (typically binary) feedback about the appropriateness of the chosen action. Policy gradient and temporal difference learning are two methods for deriving synaptic plasticity rules that maximize the expected reward in response...
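The policy-gradient idea can be illustrated with a minimal sketch. It is not the model described in this entry; it is a REINFORCE-style rule for a single stochastic binary unit, written under assumptions chosen only for illustration: two one-hot stimuli, one rewarded action per stimulus, a binary reward, and a running-average reward baseline. The names train and p_action_one, the learning rate eta, and the baseline rate 0.05 are all hypothetical.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def p_action_one(w, x):
    """Probability that the unit emits action 1 for input x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def train(n_trials=5000, eta=0.5, seed=0):
    """REINFORCE-style learning for one stochastic binary unit."""
    rng = random.Random(seed)
    # Assumed toy task: two one-hot stimuli and the action rewarded for each.
    stimuli = [(1.0, 0.0), (0.0, 1.0)]
    rewarded_action = [1, 0]
    w = [0.0, 0.0]   # synaptic weights from the inputs to the unit
    baseline = 0.0   # running estimate of the mean reward
    for _ in range(n_trials):
        s = rng.randrange(len(stimuli))
        x = stimuli[s]
        p = p_action_one(w, x)
        a = 1 if rng.random() < p else 0                  # stochastic action
        reward = 1.0 if a == rewarded_action[s] else 0.0  # binary feedback
        # (a - p) * x is the gradient of log P(a | x) with respect to w;
        # it plays the role of a synaptic eligibility term.
        for i in range(len(w)):
            w[i] += eta * (reward - baseline) * (a - p) * x[i]
        baseline += 0.05 * (reward - baseline)
    return w


if __name__ == "__main__":
    w = train()
    print(p_action_one(w, (1.0, 0.0)), p_action_one(w, (0.0, 1.0)))
```

Each weight change is the product of a global reward signal (reward minus baseline) and a local term that depends only on the presynaptic input and the unit's own stochastic output. In this toy setting the unit comes to choose the rewarded action for each stimulus with high probability.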


Spike Train; Partially Observable Markov Decision Process; Presynaptic Spike; Eligibility Trace; Plasticity Induction
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work was supported by the Swiss National Science Foundation with grants 31003A_133094 and CRSII2_147636 to W.S. and grants PZ00P3_137200 and PP00P3_150637 to J.-P.P. We thank Robert Urbanczik and Johannes Friedrich for valuable comments on the manuscript.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Institut für Physiologie, Universität Bern, Bern, Switzerland
  2. Theoretical Neuroscience Group, Institute of Neuroinformatics, University of Zurich and ETH Zurich, Zurich, Switzerland