Adaptive properties of differential learning rates for positive and negative outcomes
The concept of the reward prediction error—the difference between reward obtained and reward predicted—continues to be a focal point for much theoretical and experimental work in psychology, cognitive science, and neuroscience. Models that rely on reward prediction errors typically assume a single learning rate for positive and negative prediction errors. However, behavioral data indicate that better-than-expected and worse-than-expected outcomes often do not have symmetric impacts on learning and decision-making. Furthermore, distinct circuits within cortico-striatal loops appear to support learning from positive and negative prediction errors, respectively. Such differential learning rates would be expected to lead to biased reward predictions and therefore suboptimal choice performance. Contrary to this intuition, we show that on static “bandit” choice tasks, differential learning rates can be adaptive. This occurs because asymmetric learning enables a better separation of learned reward probabilities. We show analytically how the optimal learning rate asymmetry depends on the reward distribution and implement a biologically plausible algorithm that adapts the balance of positive and negative learning rates from experience. These results suggest specific adaptive advantages for separate, differential learning rates in simple reinforcement learning settings and provide a novel, normative perspective on the interpretation of associated neural data.
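The mechanism described above — a value update that applies separate learning rates to positive and negative reward prediction errors on a static bandit task — can be sketched as follows. This is an illustrative implementation only, not the authors' exact algorithm: the epsilon-greedy policy, the initial estimate of 0.5, and all function names are assumptions introduced for the example.

```python
import random

def asymmetric_update(q, reward, alpha_pos, alpha_neg):
    """Update a value estimate with separate learning rates for
    better-than-expected and worse-than-expected outcomes."""
    delta = reward - q                       # reward prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return q + alpha * delta

def run_bandit(probs, alpha_pos, alpha_neg, trials=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy choice on a static Bernoulli bandit with
    asymmetric value updates. Returns the learned value estimates."""
    rng = random.Random(seed)
    q = [0.5 for _ in probs]                 # initial value estimates (assumption)
    for _ in range(trials):
        if rng.random() < epsilon:
            a = rng.randrange(len(probs))    # explore
        else:
            a = max(range(len(probs)), key=q.__getitem__)  # exploit
        r = 1.0 if rng.random() < probs[a] else 0.0
        q[a] = asymmetric_update(q[a], r, alpha_pos, alpha_neg)
    return q
```

For a Bernoulli arm with reward probability p, setting the expected update to zero gives the fixed point q* = p·α⁺ / (p·α⁺ + (1−p)·α⁻), so the asymmetry ratio reshapes where the learned values settle — one way to see how, as the abstract argues, differential learning rates can spread the learned values of different arms further apart.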
Keywords: Reinforcement learning · Reward prediction error · Decision-making · Meta-learning · Basal ganglia
This work originated at the Okinawa Computational Neuroscience Course at the Okinawa Institute of Science and Technology (OIST), Japan. We are grateful to the organizers for providing a stimulating learning environment.