Abstract
The current work studies how people make predictions, within a reinforcement learning framework, in an environment that fluctuates from trial to trial and is corrupted with Gaussian noise. We developed a computer-based experiment in which participants predicted the future location of a spaceship orbiting planet Earth. The spaceship's position was sampled from a Gaussian distribution whose mean changed at a variable velocity; four different values of the variance defined our signal-to-noise conditions. Three reinforcement learning algorithms, implemented with hierarchical Bayesian modeling, were proposed as candidates to describe our data. The first two models are the standard delta-rule and its Bayesian counterpart, the Kalman filter. The third model is a delta-rule that incorporates a velocity component updated using prediction errors. The main advantage of the latter model over the first two is that it assumes participants estimate the trial-by-trial changes in the mean of the distribution generating the observations. We used leave-one-out cross-validation and the widely applicable information criterion (WAIC) to compare the predictive accuracy of the models. Overall, our results provided evidence in favor of the model with the velocity term and showed that the learning rate of velocity and the decision noise change depending on the value of the signal-to-noise ratio. Finally, we modeled these changes using an extension of the model's hierarchical structure that allows us to make prior predictions for untested signal-to-noise conditions.
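As an illustration only (not the paper's implementation, which is available on the OSF project page), here is a minimal Python sketch contrasting the standard delta-rule with a delta-rule carrying a velocity term updated from prediction errors. The function names, the initialization at the first observation, and the specific update order are assumptions made for this sketch:

```python
def delta_rule(observations, alpha=0.1):
    """Standard delta-rule: V <- V + alpha * (x - V)."""
    predictions = []
    v = observations[0]  # assumed initialization at the first observation
    for x in observations:
        predictions.append(v)       # prediction made before seeing x
        v = v + alpha * (x - v)     # update toward the observation
    return predictions


def delta_rule_velocity(observations, alpha=0.1, beta=0.05):
    """Delta-rule with a velocity term learned from prediction errors."""
    predictions = []
    v, vel = observations[0], 0.0
    for x in observations:
        predictions.append(v)
        error = x - v
        vel = vel + beta * error    # velocity tracks the drift in the mean
        v = v + vel + alpha * error # prediction advances by velocity, then corrects
    return predictions
```

On a steadily drifting mean, the plain delta-rule settles into a constant lag behind the observations, whereas the velocity term lets the second model absorb the drift and drive the asymptotic prediction error toward zero.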
Notes
Throughout the text, we use the parametrization of the Gaussian distribution in terms of a mean and a precision, where the precision is the reciprocal of the variance. This is largely because the software used for our model-based analysis (JAGS) adopts this convention.
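The mean/precision convention can be checked numerically. A small sketch (the variable names are ours; `numpy` parametrizes the Gaussian with a standard deviation, so the precision must be converted):

```python
import numpy as np

sigma = 2.0               # standard deviation
tau = 1.0 / sigma**2      # precision, as used by JAGS's dnorm(mean, precision)

rng = np.random.default_rng(0)
# numpy expects a standard deviation, i.e. 1 / sqrt(precision)
samples = rng.normal(loc=0.0, scale=1.0 / np.sqrt(tau), size=100_000)
# the empirical variance of the samples should be close to 1 / tau
```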
Funding
This research was supported by the project PAPIIT IG120818.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary material of this article, including code and data, is available as a project page on the Open Science Framework at https://osf.io/d6tjw/. A preliminary version of this work was presented at the 51st Annual Meeting of the Society for Mathematical Psychology in 2018.
About this article
Cite this article
Velázquez, C., Villarreal, M. & Bouzas, A. Velocity Estimation in Reinforcement Learning. Comput Brain Behav 2, 95–108 (2019). https://doi.org/10.1007/s42113-019-00026-1