Learning with an Asymmetric Teacher: Asymmetric Dopamine-Like Response Can Be Used as an Error Signal for Reinforcement Learning

  • Rea Mitelman
  • Mati Joshua
  • Hagai Bergman
Conference paper
Part of the Advances in Behavioral Biology book series (ABBI, volume 58)

Abstract

According to many computational models, dopamine (DA) neurons in the basal ganglia (BG) play a major role in reinforcement learning. The DA signal is proportional to the difference between actual and predicted reward, and hence could serve as the error signal in a temporal difference (TD) learning algorithm implemented in the BG. Indeed, experiments have found a proportional increase in the firing rate of DA neurons in states with higher values than expected (the positive domain of the error signal). However, many studies indicate that DA neurons do not decrease their firing rate symmetrically in the negative domain: some report a smaller gain relative to the positive domain, whereas others report a decrease in firing to a constant level.
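For reference, the symmetric prediction error of a standard TD(0) learner is delta_t = r_t + gamma * V(s_{t+1}) - V(s_t). A minimal Python sketch of this symmetric signal follows; the function name and the discount factor gamma are illustrative conventions, not taken from the paper:

```python
def td_error(reward, v_next, v_current, gamma=1.0):
    """Symmetric TD(0) prediction error: delta = r + gamma * V(s') - V(s).
    Positive delta: the outcome was better than predicted;
    negative delta: it was worse."""
    return reward + gamma * v_next - v_current
```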

Our work focuses on whether such an asymmetric error signal can drive a TD-like computational algorithm. We simulated a probabilistic classical conditioning task in which the agent sequentially received stimuli associated with different probabilities of rewarding or aversive outcomes. The algorithm learned the value of each stimulus using an asymmetric TD signal, obtained by manipulating the negative domain of the error-signal function in one of three ways: decreasing its slope by a multiplicative factor smaller than one, fixing its negative values to a constant negative level, or fixing them to zero.
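A sketch of the three manipulations, assuming they apply pointwise to the TD error above; the particular values of the reduced gain and the constant negative level are illustrative, not the paper's:

```python
def asymmetric_delta(delta, mode, gain=0.2, floor=-0.2):
    """Transform the negative domain of the TD error; the positive
    domain is left untouched.
      'reduced_gain'   -- scale negative errors by a factor < 1
      'constant_floor' -- replace negative errors with a fixed negative level
      'zero_floor'     -- replace negative errors with zero"""
    if delta >= 0:
        return delta
    if mode == 'reduced_gain':
        return gain * delta
    if mode == 'constant_floor':
        return floor
    if mode == 'zero_floor':
        return 0.0
    raise ValueError(f"unknown mode: {mode!r}")
```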

We show that learning can be achieved when the negative domain of the error-signal function is either fixed at a constant negative level or given a reduced gain, although learning is slower than with a symmetric error signal. Learning cannot be achieved, however, when the negative domain is fixed at zero. We assessed learning by comparing the values the asymmetric algorithm assigned to the stimuli with those assigned by a symmetric TD algorithm: the asymmetric values followed a nonlinear, concave trend and were higher than the symmetric values.
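This pattern can be reproduced in a toy version of the task. The self-contained sketch below (reward-only, one-step trials, illustrative learning rate and parameters; not the authors' implementation) learns the value of a stimulus rewarded with probability p under each variant. With a zero floor the value only grows, climbing toward 1 for any p > 0, so stimuli become indistinguishable; the other two variants converge to values above, and concave in, p:

```python
import numpy as np

def learned_value(p_reward, transform, alpha=0.05, n_trials=20000, seed=0):
    """Value of one stimulus rewarded with probability p_reward.
    One-step trials, so delta = r - V(stimulus); `transform` is
    applied only to negative errors."""
    rng = np.random.default_rng(seed)
    v = 0.0
    for _ in range(n_trials):
        delta = float(rng.random() < p_reward) - v
        v += alpha * (delta if delta >= 0 else transform(delta))
    return v

transforms = {
    'symmetric':      lambda d: d,        # baseline: v -> p_reward
    'reduced_gain':   lambda d: 0.2 * d,  # v -> p / (p + 0.2 * (1 - p))
    'constant_floor': lambda d: -0.2,
    'zero_floor':     lambda d: 0.0,      # v -> 1 for any p > 0
}
for p in (0.25, 0.5, 0.75):
    print(p, {name: round(learned_value(p, f), 2)
              for name, f in transforms.items()})
```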

This suggests that the asymmetric DA signal could serve as the error signal of a TD algorithm implemented in the BG, and that asymmetric DA coding does not require a complementary modulatory error signal in the BG. We further hypothesize that the resulting concave value curve could underlie some aspects of irrational human behavior.

Keywords

Basal ganglia · Conditioned stimulus · Firing rate · Error signal · Unconditioned stimulus

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Physiology, Hadassah Medical School and the Interdisciplinary Center for Neural Computation, The Hebrew University, Jerusalem, Israel
