Velocity Estimation in Reinforcement Learning

Abstract

This work studies how people make predictions, within a reinforcement learning framework, in an environment that fluctuates from trial to trial and is corrupted by Gaussian noise. We developed a computer-based experiment in which participants had to predict the future location of a spaceship orbiting planet Earth. Its position was sampled from a Gaussian distribution whose mean changed at a variable velocity, with four different values of variance defining our signal-to-noise conditions. Three reinforcement learning algorithms, implemented with hierarchical Bayesian modeling, were proposed as candidates to describe our data. The first and second models are the standard delta-rule and its Bayesian counterpart, the Kalman filter. The third model is a delta-rule incorporating a velocity component that is itself updated using prediction errors. The main advantage of the latter model over the first two is that it assumes participants estimate the trial-by-trial changes in the mean of the distribution generating the observations. We used leave-one-out cross-validation and the widely applicable information criterion to compare the predictive accuracy of the models. In general, our results provided evidence in favor of the model with the velocity term and showed that the learning rate of velocity and the decision noise change depending on the value of the signal-to-noise ratio. Finally, we modeled these changes using an extension of the model's hierarchical structure that allows us to make prior predictions for untested signal-to-noise conditions.
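
To make the three candidate models concrete, below is a minimal sketch of generic forms of each update rule, together with a toy version of the drifting-mean environment. This is not the paper's exact parametrization: the drift process, learning rates, noise values, and all variable names are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy generative environment (assumed form): the mean drifts at a
    # variable velocity and observations are corrupted by Gaussian noise.
    n_trials = 200
    velocity = np.cumsum(rng.normal(0.0, 0.05, n_trials))  # slowly varying velocity
    mean = np.cumsum(velocity)                              # drifting mean
    obs = mean + rng.normal(0.0, 1.0, n_trials)             # noisy observed positions

    def delta_rule(obs, alpha=0.3):
        """Standard delta-rule: move the estimate toward each observation."""
        x, preds = 0.0, []
        for y in obs:
            preds.append(x)
            x += alpha * (y - x)       # fixed-rate prediction-error update
        return np.array(preds)

    def kalman_filter(obs, q=0.1, r=1.0):
        """Scalar Kalman filter: the learning rate (gain) adapts to uncertainty."""
        x, p, preds = 0.0, 1.0, []
        for y in obs:
            preds.append(x)
            p += q                     # predictive variance grows by drift noise q
            k = p / (p + r)            # Kalman gain given observation noise r
            x += k * (y - x)
            p *= (1.0 - k)
        return np.array(preds)

    def velocity_delta_rule(obs, alpha=0.3, alpha_v=0.1):
        """Delta-rule with a velocity term, itself updated from prediction errors."""
        x, v, preds = 0.0, 0.0, []
        for y in obs:
            x += v                     # extrapolate by the estimated velocity
            preds.append(x)
            err = y - x
            x += alpha * err
            v += alpha_v * err         # velocity tracks the trial-by-trial change
        return np.array(preds)

    for model in (delta_rule, kalman_filter, velocity_delta_rule):
        mse = np.mean((model(obs) - mean) ** 2)
        print(f"{model.__name__}: MSE vs. true mean = {mse:.2f}")

On a drifting environment like this toy one, the velocity model typically tracks the mean more closely, because it extrapolates its running estimate of the trial-to-trial change rather than lagging behind it with a fixed learning rate.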

Notes

  1. Throughout the text, we use the parametrization of the Gaussian distribution in terms of a mean and a precision, where the precision is the reciprocal of the variance. This is largely because the software used for our model-based analysis (JAGS) adopts this convention.
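
As a quick illustration of this convention, the snippet below (an illustrative sketch, not code from the paper) converts between the two parametrizations: JAGS's dnorm(mu, tau) takes the precision tau, whereas NumPy's normal takes the standard deviation.

    import numpy as np

    rng = np.random.default_rng(1)

    mu, sigma = 0.0, 2.0
    tau = 1.0 / sigma**2          # precision = 1 / variance, as in JAGS's dnorm(mu, tau)

    # NumPy parametrizes the Gaussian by its standard deviation,
    # so convert the precision back before sampling:
    x = rng.normal(mu, 1.0 / np.sqrt(tau), size=100_000)
    print(round(x.std(), 2))      # ~2.0, recovering sigma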

Funding

This research was supported by the project PAPIIT IG120818.

Author information

Corresponding author

Correspondence to Carlos Velázquez.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary material of this article, including code and data, is available as a project page on the Open Science Framework at https://osf.io/d6tjw/. A preliminary version of this work was presented at the 51st Annual Meeting of the Society for Mathematical Psychology in 2018.

Electronic supplementary material

The electronic supplementary material is available as a PDF (27.1 KB).

Cite this article

Velázquez, C., Villarreal, M. & Bouzas, A. Velocity Estimation in Reinforcement Learning. Comput Brain Behav 2, 95–108 (2019). https://doi.org/10.1007/s42113-019-00026-1
