Velocity Estimation in Reinforcement Learning

Velázquez, Carlos; Villarreal, Manuel; Bouzas, Arturo

doi:10.1007/s42113-019-00026-1

Velocity Estimation in Reinforcement Learning

Published: 06 February 2019

Volume 2, pages 95–108, (2019)
Cite this article

Computational Brain & Behavior Aims and scope Submit manuscript

Carlos Velázquez¹,
Manuel Villarreal¹ &
Arturo Bouzas¹

1463 Accesses
1 Citation
2 Altmetric
Explore all metrics

Abstract

The current work aims to study how people make predictions, under a reinforcement learning framework, in an environment that fluctuates from trial to trial and is corrupted with Gaussian noise. We developed a computer-based experiment where subjects were required to predict the future location of a spaceship that orbited around planet Earth. Its position was sampled from a Gaussian distribution with the mean changing at a variable velocity and four different values of variance that defined our signal-to-noise conditions. Three reinforcement learning algorithms using hierarchical Bayesian modeling were proposed as candidates to describe our data. The first and second models are the standard delta-rule and its Bayesian counterpart, the Kalman Filter. The third model is a delta-rule incorporating a velocity component which is updated using prediction errors. The main advantage of the later model over the first two is that it assumes participants estimate the trial-by-trial changes in the mean of the distribution generating the observations. We used leave-one-out cross-validation and the widely applicable information criterion to compare the predictive accuracy of the models. In general, our results provided evidence in favor of the model with the velocity term and showed that the learning rate of velocity and the decision noise change depending on the value of the signal-to-noise ratio. Finally, we modeled these changes using an extension of its hierarchical structure that allows us to make prior predictions for untested signal-to-noise conditions.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

PsychoPy2: Experiments in behavior made easy

Article Open access 07 February 2019

Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC

Article 30 August 2016

Overfitting, Model Tuning, and Evaluation of Prediction Performance

Notes

Throughout the text, we use the parametrization of the Gaussian distribution in terms of a mean and a precision, where the precision is the reciprocal of the variance. This is largely because the software used for our model-based analysis (JAGS) adopts this convention.

References

Behrens, T.E., Woolrich, M.W., Walton, M.E., Rushworth, M. (2007). Learning the value of information in an uncertain world. Nature Neuroscience, 10, 1214–1221.
Article Google Scholar
Brainard, D.H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433–436.
Article PubMed Google Scholar
Bush, R.R., & Mosteller, F. (1951). A mathematical model for simple learning. Psychological Review, 58, 313–323.
Article PubMed Google Scholar
Courville, A.C., Daw, N.D., Touretzky, D.S. (2006). Bayesian theories of conditioning in a changing world. Trends in Cognitive Sciences, 10, 294–300.
Article PubMed Google Scholar
Daw, N.D., & Tobler, P.N. (2014). Value learning through reinforcement: the basics of dopamine and reinforcement learning. In Neuroeconomics: decision making and the brain. 2nd edn. (pp. 283–298): Elsevier.
Dayan, P., & Nakahara, H. (2018). Models and methods for reinforcement learning. In Stevens’ handbook of experimental psychology and cognitive neuroscience. 4th edn. New York: Wiley.
Gallistel, C.R., Krishan, M., Liu, Y., Miller, R., Latham, P.E. (2014). The perception of probability. Psychological Review, 121, 96–123.
Article PubMed Google Scholar
Gelman, A., & Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–472.
Article Google Scholar
Gershman, S.J. (2015). A unifying probabilistic view of associative learning. PLOS Computational Biology, 11, e1004567.
Article PubMed PubMed Central Google Scholar
Gershman, S.J. (2017). Dopamine, inference, and uncertainty. Neural Computation, 29, 3311–3326.
Article PubMed Google Scholar
Kakade, S., & Dayan, P. (2002). Acquisition and extinction in autoshaping. Psychological Review, 109, 533–544.
Article PubMed Google Scholar
Kalman, R.E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82, 35–45.
Article Google Scholar
Kleiner, M., Brainard, D., Pelli, D., Ingling, A., Murray, R., Broussard, C. (2007). What’s new in psychtoolbox-3. Perception, 36, 1–16.
Google Scholar
Lee, M.D. (2018). Bayesian methods in cognitive modeling. In The Stevens’ handbook of experimental psychology and cognitive neuroscience. 4th edn. New York: Wiley.
Lee, M.D., & Vanpaemel, W. (2018). Determining informative priors for cognitive models. Psychonomic Bulletin & Review, 25, 114–127.
Article Google Scholar
Lee, M.D., & Wagenmakers, E.J. (2013). Bayesian cognitive modeling: a practical course. Cambridge: Cambridge University Press.
Book Google Scholar
Matzke, D., Dolan, C.V., Batchelder, W.H., Wagenmakers, E.J. (2015). Bayesian estimation of multinomial processing tree models with heterogeneity in participants and items. Psychometrika, 80, 205–235.
Article PubMed Google Scholar
McGuire, J.T., Nassar, M.R., Gold, J.I., Kable, J.W. (2014). Functionally dissociable influences on learning rate in a dynamic environment. Neuron, 84, 870–881.
Article PubMed PubMed Central Google Scholar
Meder, D., Kolling, N., Verhagen, L., Wittmann, M.K., Scholl, J., Madsen, K.H., Hulme, O.J., Behrens, T.E.J., Rushworth, M. (2017). Simultaneous representation of a spectrum of dynamically changing value estimates during decision making. Nature Communications, 8, 1942.
Article PubMed PubMed Central Google Scholar
Miller, R.R., Barnet, R.C., Grahame, N.J. (1995). Assessment of the rescorla-wagner model. Psychological Bulletin, 117, 363–386.
Article PubMed Google Scholar
Nassar, M.R., Wilson, R.C., Heasly, B., Gold, J.I. (2010). An approximately bayesian delta-rule model explains the dynamics of belief updating in a changing environment. Journal of Neuroscience, 30, 12366–12378.
Article PubMed Google Scholar
Navarro, D.J., Tran, P., Baz, N. (2018). Aversion to option loss in a restless bandit task. Computational Brain & Behavior.
Niv, Y. (2009). Reinforcement learning in the brain. Journal of Mathematical Psychology, 53, 139–154.
Article Google Scholar
O’Reilly, J.X. (2013). Making predictions in a changing world—inference, uncertainty, and learning. Frontiers in Neuroscience, 7, 105.
PubMed PubMed Central Google Scholar
Pelli, D.G. (1997). The videotoolbox software for visual psychophysics: transforming numbers into movies. Spatial Vision, 10, 437–442.
Article PubMed Google Scholar
Plummer, M. (2003). Jags: a program for analysis of Bayesian graphical models using Gibbs sampling. In Proceedings of the 3rd international workshop on distributed statistical computing, Vienna, Austria, vol 124.
Pratte, M.S., & Rouder, J.N. (2011). Hierarchical single-and dual-process models of recognition memory. Journal of Mathematical Psychology, 55, 36–46.
Article Google Scholar
Rescorla, R.A., & Wagner, A.R. (1972). A theory of pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. Classical conditioning II: Current Research and Theory, 2, 64–99.
Google Scholar
Ricci, M., & Gallistel, R. (2017). Accurate step-hold tracking of smoothly varying periodic and aperiodic probability. Attention, Perception, & Psychophysics, 79, 1480–1494.
Article Google Scholar
Ritz, H., Nassar, M.R., Frank, M.J., Shenhav, A. (2018). A control theoretic model of adaptive learning in dynamic environments. Journal of Cognitive Neuroscience, 30, 1405–1421.
Article PubMed Google Scholar
Robinson, G.H. (1964). Continuous estimation of a time-varying probability. Ergonomics, 7, 7–21.
Article Google Scholar
Schultz, W., Dayan, P., Montague, P.R. (1997). A neural substrate of prediction and reward. Science, 275, 1593–1599.
Article PubMed Google Scholar
Shiffrin, R.M., Lee, M.D., Kim, W.J., Wagenmakers, E.J. (2008). A survey of model evaluation approaches with a tutorial on hierarchical Bayesian methods. Cognitive Science, 32, 1248–1284.
Article PubMed Google Scholar
Speekenbrink, M., & Konstantinidis, E. (2015). Uncertainty and exploration in a restless bandit problem. Topics in Cognitive Science, 7, 351–367.
Article PubMed Google Scholar
Speekenbrink, M., & Shanks, D.R. (2010). Learning in a changing environment. Journal of Experimental Psychology: General, 139, 266–298.
Article Google Scholar
Sutton, R.S. (1998). Reinforcement learning: an introduction. Cambridge: MIT Press.
Google Scholar
Vehtari, A., Gelman, A., Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and waic. Statistics and Computing, 27, 1413–1432.
Article Google Scholar
Wilson, R.C., Nassar, M.R., Gold, J.I. (2013). A mixture of delta-rules approximation to Bayesian inference in change-point problems. PLOS Computational Biology, 9, e1003150.
Article PubMed PubMed Central Google Scholar
Wittmann, M.K., Kolling, N., Akaishi, R., Chau, B., Brown, J.W., Nelissen, N., Rushworth, M.F. (2016). Predictive decision making driven by multiple time-linked reward representations in the anterior cingulate cortex. Nature Communications, 7, 12327.
Article PubMed PubMed Central Google Scholar
Yu, A.J., & Dayan, P. (2005). Uncertainty, neuromodulation, and attention. Neuron, 46, 681–692.
Article PubMed Google Scholar
Zajkowski, W.K., Kossut, M., Wilson, R.C. (2017). A causal role for right frontopolar cortex in directed, but not random, exploration. eLife, 6, e27430.
Article PubMed PubMed Central Google Scholar

Download references

Funding

This research was supported by the project PAPIIT IG120818.

Author information

Authors and Affiliations

Universidad Nacional Autónoma de México, Avenida Universidad 3004, Coyoacán, Col. Copilco Universidad, 04510, Ciudad de México, México
Carlos Velázquez, Manuel Villarreal & Arturo Bouzas

Authors

Carlos Velázquez
View author publications
You can also search for this author in PubMed Google Scholar
Manuel Villarreal
View author publications
You can also search for this author in PubMed Google Scholar
Arturo Bouzas
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Carlos Velázquez.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary material of this article, including code and data, is available as a project page on the Open Science Framework at https://osf.io/d6tjw/. A preliminary version of this work was presented at the 51st Annual Meeting of the Society for Mathematical Psychology in 2018.

Electronic supplementary material

Below is the link to the electronic supplementary material.

(PDF 27.1 KB)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Velázquez, C., Villarreal, M. & Bouzas, A. Velocity Estimation in Reinforcement Learning. Comput Brain Behav 2, 95–108 (2019). https://doi.org/10.1007/s42113-019-00026-1

Download citation

Published: 06 February 2019
Issue Date: 14 June 2019
DOI: https://doi.org/10.1007/s42113-019-00026-1

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Velocity Estimation in Reinforcement Learning

Abstract

Access this article

Similar content being viewed by others

PsychoPy2: Experiments in behavior made easy

Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC

Overfitting, Model Tuning, and Evaluation of Prediction Performance

Notes

References

Funding

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Electronic supplementary material

(PDF 27.1 KB)

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Velocity Estimation in Reinforcement Learning

Abstract

Access this article

Similar content being viewed by others

PsychoPy2: Experiments in behavior made easy

Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC

Overfitting, Model Tuning, and Evaluation of Prediction Performance

Notes

References

Funding

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher’s Note

Electronic supplementary material

(PDF 27.1 KB)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation