Modeling the influence of working memory, reinforcement, and action uncertainty on reaction time and choice during instrumental learning

Abstract

What determines the speed of our decisions? Various models of decision-making have focused on perceptual evidence, past experience, and task complexity as important factors determining the degree of deliberation needed for a decision. Here, we build on a sequential sampling decision-making framework to develop a new model that captures a range of reaction time (RT) effects by accounting for both working memory and instrumental learning processes. The model captures choices and RTs at various stages of learning, and in learning environments with varying complexity. Moreover, the model generalizes from tasks with deterministic reward contingencies to probabilistic ones. The model succeeds in part by incorporating prior uncertainty over actions when modeling RT. This straightforward process model provides a parsimonious account of decision dynamics during instrumental learning and makes unique predictions about internal representations of action values.
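The abstract describes a model that mixes a slow incremental reinforcement-learning process with a fast but decaying working-memory process, and links reaction time to prior uncertainty over actions. The toy sketch below is not the authors' actual model; it is a minimal illustration, under assumed parameter names (`alpha`, `decay`, `wm_weight`, `beta`) and an assumed linear entropy-to-RT mapping, of how such a mixture can produce both sharpening choice and shrinking RT as learning proceeds.

```python
import math

def softmax(values, beta):
    """Convert action values into choice probabilities."""
    exps = [math.exp(beta * v) for v in values]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    """Shannon entropy of the policy -- a proxy for action uncertainty."""
    return -sum(p * math.log(p) for p in probs if p > 0)

class RLWMSketch:
    """Toy mixture of an incremental RL module (delta rule) and a fast,
    decaying working-memory module. Illustrative only."""

    def __init__(self, n_actions, alpha=0.1, decay=0.2, wm_weight=0.8, beta=8.0):
        self.q = [1.0 / n_actions] * n_actions   # slow RL values
        self.wm = [1.0 / n_actions] * n_actions  # one-shot WM traces
        self.alpha, self.decay = alpha, decay
        self.wm_weight, self.beta = wm_weight, beta
        self.n = n_actions

    def policy(self):
        # Choice policy: weighted mixture of WM- and RL-based softmax policies
        p_rl = softmax(self.q, self.beta)
        p_wm = softmax(self.wm, self.beta)
        w = self.wm_weight
        return [w * pw + (1 - w) * pr for pw, pr in zip(p_wm, p_rl)]

    def predicted_rt(self, t0=0.3, k=0.4):
        # Hypothetical mapping: RT grows linearly with policy entropy,
        # so high action uncertainty means slow responses
        return t0 + k * entropy(self.policy())

    def update(self, action, reward):
        # RL module: incremental delta-rule update
        self.q[action] += self.alpha * (reward - self.q[action])
        # WM module: one-shot storage of the last outcome...
        self.wm[action] = reward
        # ...with all WM traces decaying back toward the uniform prior
        u = 1.0 / self.n
        self.wm = [v + self.decay * (u - v) for v in self.wm]

agent = RLWMSketch(n_actions=3)
rt_early = agent.predicted_rt()
for _ in range(20):
    agent.update(0, reward=1.0)  # action 0 is always rewarded
rt_late = agent.predicted_rt()
```

After repeated reward on action 0, the mixture policy concentrates on that action, entropy falls, and the sketch predicts faster responses (`rt_late < rt_early`), mirroring the qualitative RT dynamics the paper models.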

Data availability

The data and code for all experiments and models will be made freely available upon publication (at https://github.com/sdmcdougle). None of the experiments described here were preregistered.


Acknowledgements

We would like to thank William Ryan for help with data collection, the CCN lab at UC Berkeley for helpful comments on analyses and interpretations, and the reviewers for their thorough and helpful comments.

Author information

Corresponding author

Correspondence to Samuel D. McDougle.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

ESM 1

(PDF 862 kb)

About this article

Cite this article

McDougle, S.D., Collins, A.G.E. Modeling the influence of working memory, reinforcement, and action uncertainty on reaction time and choice during instrumental learning. Psychon Bull Rev (2020). https://doi.org/10.3758/s13423-020-01774-z

Keywords

  • Reinforcement learning
  • Working memory
  • Human memory and learning
  • Reaction time analysis