
Reinforcement Learning

Application to fMRI

An Introduction to Model-Based Cognitive Neuroscience

Abstract

Over the last two decades, the model-based approach to analysing functional magnetic resonance imaging (fMRI) data has been adopted across the cognitive neurosciences to study how computations are implemented in the brain. In that time, methods have advanced in both the computational modelling and neuroimaging domains. This chapter provides an introduction to the general method of integrating computational models into fMRI analyses, as well as a discussion of contemporary considerations arising from these recent advances. The chapter begins with an exposition of how qualitative psychological hypotheses are formalised into quantitatively testable and falsifiable computational models. We ground this discussion in examples from the conditioning and reinforcement learning literature, given the origin of model-based fMRI in uncovering neural correlates of learning processes. We then provide an overview of the methodological approach underlying model-based fMRI, extending to pragmatic considerations when working in this domain, with an eye towards recent developments in both fMRI and computational modelling, such as multivariate analyses and assessing model quality, respectively. Finally, we provide examples in which the computations described in the first section of the chapter were successfully bridged with fMRI analyses to provide a richer understanding of reinforcement learning in the brain. The chapter is therefore aimed both at cognitive neuroscientists seeking to adapt computational approaches to their neuroimaging research and at those specifically interested in learning and decision-making across levels of analysis.


Notes

  1.

    There are a variety of ways in which \(\lambda ^{US}\) might be set depending on the experimental context. For example, in experiments where reward magnitudes are not manipulated, a simple coding defines \(\lambda ^{US}\) as 1 on trials in which a reward is presented and 0 otherwise. Alternatively, the magnitude of the UR elicited by the US may be a useful metric.

  2.

    It should be noted that the full formulation of the R-W model includes a second learning rate parameter associated with the US to incorporate the assumption that the rate of learning may also depend on the particular US in the experiment (Rescorla, 1972).
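The error-driven update described in notes 1 and 2 can be sketched in a few lines. This is an illustrative implementation of the Rescorla-Wagner rule for a single CS, not code from the chapter; the function name and parameter values are my own, with `alpha` standing in for the CS learning rate, `beta` for the US-specific learning rate, and `lam` for the asymptote \(\lambda ^{US}\).

```python
# Hypothetical sketch of the Rescorla-Wagner update for a single CS.
def rescorla_wagner(trials, alpha=0.3, beta=1.0, v0=0.0):
    """Return associative strength V after each trial.

    trials: sequence of lambda^US values, e.g. 1 on rewarded trials, 0 otherwise.
    alpha:  CS-specific learning rate.
    beta:   US-specific learning rate (note 2); set to 1 to ignore it.
    """
    v = v0
    history = []
    for lam in trials:
        delta = lam - v            # prediction error
        v += alpha * beta * delta  # error-driven update
        history.append(v)
    return history

# Ten consecutively rewarded trials: V climbs toward the asymptote of 1.
vs = rescorla_wagner([1.0] * 10)
```

With `alpha * beta = 0.3`, the remaining error shrinks by a factor of 0.7 each trial, so after ten rewarded trials V has reached \(1 - 0.7^{10} \approx 0.97\).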

  3.

    Here we ignore some important aspects of the TD framework for simplicity, such as temporal discounting and eligibility traces. References to relevant papers on these aspects are presented in Sect. 7.
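The simplified TD framework the note refers to can be sketched as a tabular TD(0) update. This toy is illustrative only, not the chapter's code: the two-state chain, state names, and parameter values are invented, and `gamma` appears only to mark where the omitted temporal discounting would enter; setting it to 1 recovers the undiscounted case.

```python
# Hypothetical tabular TD(0) sketch; eligibility traces are omitted, as in the note.
def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0, terminal=False):
    """Apply one temporal-difference update to the state-value table V."""
    v_next = 0.0 if terminal else V[s_next]
    delta = r + gamma * v_next - V[s]  # TD prediction error
    V[s] += alpha * delta
    return delta

# Toy two-state chain: a cue is followed by a rewarded state. Repeated
# episodes propagate value backwards from the reward to the cue.
V = {"cue": 0.0, "reward_state": 0.0}
for _ in range(200):
    td0_update(V, "reward_state", r=1.0, s_next=None, terminal=True)
    td0_update(V, "cue", r=0.0, s_next="reward_state")
```

The backward propagation of value from reward to cue is the property that lets TD models capture second-order conditioning effects.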

  4.

    The term ‘policy’ refers to how an animal or human acts given the situation they face.

  5.

    In the case of two alternatives, the softmax function reduces to a logistic sigmoid function of their difference.
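This equivalence is easy to verify numerically. The sketch below is illustrative (function names and values are my own), checking that a two-alternative softmax equals a logistic sigmoid of the value difference, with `beta` playing the role of an inverse-temperature parameter.

```python
import math

def softmax2(q1, q2, beta=1.0):
    """P(choose option 1) under a two-alternative softmax with inverse temperature beta."""
    e1, e2 = math.exp(beta * q1), math.exp(beta * q2)
    return e1 / (e1 + e2)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Dividing numerator and denominator of the softmax by e^(beta*q1) gives
# 1 / (1 + e^(-beta*(q1 - q2))), i.e. a sigmoid of the value difference.
q1, q2, beta = 0.8, 0.3, 2.0
p_softmax = softmax2(q1, q2, beta)
p_sigmoid = sigmoid(beta * (q1 - q2))
```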

  6.

    It can be important to check whether the parametric regressor is normalised or mean-centred. If not, it can be artificially highly correlated with the corresponding onset regressor, particularly if the parametric variable has only positive values. Some, but not all, common fMRI software packages automatically scale this regressor.
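The correlation problem can be demonstrated on toy regressors. This sketch is illustrative only: the regressors are unconvolved stick functions (a real analysis would convolve them with a haemodynamic response function), and the event times and parametric values are invented.

```python
# Toy demonstration: an all-positive parametric modulator correlates highly
# with its onset regressor unless the event values are mean-centred.
def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

T = 20
events = [2, 7, 12, 17]
values = [1.0, 2.0, 3.0, 4.0]  # parametric variable with only positive values

onset = [1.0 if t in events else 0.0 for t in range(T)]

raw = [0.0] * T
for t, v in zip(events, values):
    raw[t] = v

mean_v = sum(values) / len(values)
centred = [0.0] * T
for t, v in zip(events, values):
    centred[t] = v - mean_v  # mean-centre across events before building the regressor

r_raw = pearson(onset, raw)          # artificially high
r_centred = pearson(onset, centred)  # near zero
```

Because the centred event values sum to zero, the covariance with the onset regressor vanishes, which is why mean-centring removes the artificial collinearity.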

  7.

    This can be done easily with most standard fMRI software packages, which include functions to estimate the efficiency of the GLM design (Fig. 2c).

  8.

    In the case of Kahnt et al. (2011), values were computed from a model based on the objective features of the choice alternatives, rather than from a model of subjective valuation that depends on parameters fit to participants’ data. One possibility is that in the latter case it is advantageous to binarise computationally derived continuous variables, turning the problem from one of regression into one of classification, depending on the reliability of the parameter estimates. Further simulation work is required to test these approaches.
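As an illustration of the binarisation idea, a median split converts a model-derived continuous variable into two class labels suitable for a decoding analysis. The helper below is hypothetical, not from Kahnt et al. (2011).

```python
# Hypothetical median split of a model-derived continuous variable.
def median_split(values):
    """Label each trial 'high' or 'low' relative to the median value."""
    s = sorted(values)
    n = len(s)
    med = s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    return ["high" if v > med else "low" for v in values]

# Six trials of a continuous model variable become binary class labels.
labels = median_split([0.1, 0.9, 0.4, 0.7, 0.2, 0.8])
```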

References

  • Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.

  • Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 5, 834–846.

  • Botvinick, M. M., Niv, Y., & Barto, A. G. (2009). Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective. Cognition, 113(3), 262–280.

  • Büchel, C., Bornhövd, K., Quante, M., Glauche, V., Bromm, B., & Weiller, C. (2002). Dissociable neural responses related to pain intensity, stimulus intensity, and stimulus awareness within the anterior cingulate cortex: A parametric single-trial laser functional magnetic resonance imaging study. Journal of Neuroscience, 22(3), 970–976.

  • Büchel, C., Holmes, A., Rees, G., & Friston, K. (1998). Characterizing stimulus–response functions using nonlinear regressors in parametric fMRI experiments. Neuroimage, 8(2), 140–148.

  • Bush, R. R., & Mosteller, F. (1951). A mathematical model for simple learning. Psychological Review, 58(5), 313.

  • Caplin, A., & Dean, M. (2008). Axiomatic methods, dopamine and reward prediction error. Current Opinion in Neurobiology, 18(2), 197–202.

  • Casella, G., & Berger, R. L. (2021). Statistical inference. Cengage Learning.

  • Chan, S. C., Niv, Y., & Norman, K. A. (2016). A probability distribution over latent causes, in the orbitofrontal cortex. Journal of Neuroscience, 36(30), 7817–7828.

  • Cohen, J. D., Daw, N., Engelhardt, B., Hasson, U., Li, K., Niv, Y., Norman, K. A., Pillow, J., Ramadge, P. J., Turk-Browne, N. B., et al. (2017). Computational approaches to fMRI analysis. Nature Neuroscience, 20(3), 304–313.

  • Colas, J. T., Pauli, W. M., Larsen, T., Tyszka, J. M., & O’Doherty, J. P. (2017). Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: Evidence from high-resolution fMRI. PLoS Computational Biology, 13(10), e1005810.

  • Collins, A. G., & Frank, M. J. (2014). Opponent actor learning (OpAL): Modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive. Psychological Review, 121(3), 337.

  • Cross, L., Cockburn, J., Yue, Y., & O’Doherty, J. P. (2020). Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments. Neuron, 109(4), 724–738.

  • Davis, T., LaRocque, K. F., Mumford, J. A., Norman, K. A., Wagner, A. D., & Poldrack, R. A. (2014). What do differences between multi-voxel and univariate analysis mean? How subject-, voxel-, and trial-level variance impact fMRI analysis. Neuroimage, 97, 271–283.

  • Daw, N. D. et al. (2011). Trial-by-trial data analysis using computational models. Decision Making, Affect, and Learning: Attention and Performance XXIII, 23(1), 3–38.

  • Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876–879.

  • Daw, N. D., & Tobler, P. N. (2014). Value learning through reinforcement: The basics of dopamine and reinforcement learning. In Neuroeconomics (pp. 283–298). Elsevier.

  • Diedrichsen, J., & Kriegeskorte, N. (2017). Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysis. PLoS Computational Biology, 13(4), e1005508.

  • Dolan, R. J., & Dayan, P. (2013). Goals and habits in the brain. Neuron, 80(2), 312–325.

  • Edelman, S., Grill-Spector, K., Kushnir, T., & Malach, R. (1998). Toward direct visualization of the internal shape representation space by fMRI. Psychobiology, 26(4), 309–321.

  • Friston, K. J., Holmes, A. P., Price, C., Büchel, C., & Worsley, K. (1999). Multisubject fMRI studies and conjunction analyses. Neuroimage, 10(4), 385–396.

  • Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J.-P., Frith, C. D., & Frackowiak, R. S. (1994). Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapping, 2(4), 189–210.

  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC Press.

  • Gelman, A., Meng, X.-L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4), 733–760.

  • Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4(1), 1–58.

  • Gershman, S. J. (2016). Empirical priors for reinforcement learning models. Journal of Mathematical Psychology, 71, 1–6.

  • Gittins, J. C., & Jones, D. M. (1979). A dynamic allocation index for the discounted multiarmed bandit problem. Biometrika, 66(3), 561–565.

  • Gläscher, J. P., & O’Doherty, J. P. (2010). Model-based approaches to neuroimaging: Combining reinforcement learning theory with fMRI data. Wiley Interdisciplinary Reviews: Cognitive Science, 1(4), 501–510.

  • Glaser, J. I., Benjamin, A. S., Chowdhury, R. H., Perich, M. G., Miller, L. E., & Kording, K. P. (2020). Machine learning for neural decoding. eNeuro, 7(4), 1–16.

  • Güçlü, U., & van Gerven, M. A. (2015). Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience, 35(27), 10005–10014.

  • Hampton, A. N., Bossaerts, P., & O’Doherty, J. P. (2006). The role of the ventromedial prefrontal cortex in abstract state-based inference during decision making in humans. Journal of Neuroscience, 26(32), 8360–8367.

  • Hampton, A. N., Bossaerts, P., & O’Doherty, J. P. (2008). Neural correlates of mentalizing-related computations during strategic interactions in humans. Proceedings of the National Academy of Sciences, 105(18), 6741–6746.

  • Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539), 2425–2430.

  • Haxby, J. V., Gobbini, M. I., & Nastase, S. A. (2020). Naturalistic stimuli reveal a dominant role for agentic action in visual representation. Neuroimage, 216, 116561.

  • Haynes, J.-D. (2015). A primer on pattern-based approaches to fMRI: Principles, pitfalls, and perspectives. Neuron, 87(2), 257–270.

  • Haynes, J.-D., & Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7(7), 523–534.

  • Holland, P. C., & Rescorla, R. A. (1975). Second-order conditioning with food unconditioned stimulus. Journal of Comparative and Physiological Psychology, 88(1), 459.

  • Hull, C. L. (1939). The problem of stimulus equivalence in behavior theory. Psychological Review, 46(1), 9.

  • Hunt, L. T., Malalasekera, W. N., de Berker, A. O., Miranda, B., Farmer, S. F., Behrens, T. E., & Kennerley, S. W. (2018). Triple dissociation of attention and decision computations across prefrontal cortex. Nature Neuroscience, 21(10), 1471–1481.

  • Hutcherson, C. A., Bushong, B., & Rangel, A. (2015). A neurocomputational model of altruistic choice and its implications. Neuron, 87(2), 451–462.

  • Kahnt, T., Heinzle, J., Park, S. Q., & Haynes, J.-D. (2011). Decoding different roles for VMPFC and DLPFC in multi-attribute decision making. Neuroimage, 56(2), 709–715.

  • Kamin, L. (1969). Predictability, surprise, attention, and conditioning. In B. A. Campbell & R. M. Church (Eds.), Punishment and aversive behavior (pp. 279–296). New York: Appleton-Century-Crofts.

  • Khaligh-Razavi, S.-M., & Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology, 10(11), e1003915.

  • Kriegeskorte, N. (2015). Deep neural networks: A new framework for modeling biological vision and brain information processing. Annual Review of Vision Science, 1, 417–446.

  • Kriegeskorte, N., Goebel, R., & Bandettini, P. (2006). Information-based functional brain mapping. Proceedings of the National Academy of Sciences, 103(10), 3863–3868.

  • Kriegeskorte, N., & Kievit, R. A. (2013). Representational geometry: Integrating cognition, computation, and the brain. Trends in Cognitive Sciences, 17(8), 401–412.

  • Kriegeskorte, N., Mur, M., & Bandettini, P. A. (2008). Representational similarity analysis: Connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2, 4.

  • Lau, B., & Glimcher, P. W. (2005). Dynamic response-by-response models of matching behavior in rhesus monkeys. Journal of the Experimental Analysis of Behavior, 84(3), 555–579.

  • Lebreton, M., Bavard, S., Daunizeau, J., & Palminteri, S. (2019). Assessing inter-individual differences with task-related functional neuroimaging. Nature Human Behaviour, 3(9), 897–905.

  • Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press.

  • Mack, M. L., Preston, A. R., & Love, B. C. (2013). Decoding the brain’s algorithm for categorization from its neural implementation. Current Biology, 23(20), 2023–2027.

  • Marr, D., & Poggio, T. (1976). From understanding computation to understanding neural circuitry.

  • Miller, R. R., Barnet, R. C., & Grahame, N. J. (1995). Assessment of the Rescorla-Wagner model. Psychological Bulletin, 117(3), 363.

  • Milosavljevic, M., Malmaud, J., Huth, A., Koch, C., & Rangel, A. (2010). The drift diffusion model can account for value-based choice response times under high and low time pressure. Judgment and Decision Making, 5(6), 437–449.

  • Montague, P. R., Dayan, P., & Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience, 16(5), 1936–1947.

  • Mumford, J. A., Davis, T., & Poldrack, R. A. (2014). The impact of study design on pattern estimation for single-trial multivariate pattern analysis. Neuroimage, 103, 130–138.

  • Mumford, J. A., Poline, J.-B., & Poldrack, R. A. (2015). Orthogonalization of regressors in fMRI models. PloS One, 10(4), e0126255.

  • Mumford, J. A., Turner, B. O., Ashby, F. G., & Poldrack, R. A. (2012). Deconvolving BOLD activation in event-related designs for multivoxel pattern classification analyses. Neuroimage, 59(3), 2636–2643.

  • Myung, I. J. (2003). Tutorial on maximum likelihood estimation. Journal of Mathematical Psychology, 47(1), 90–100.

  • Naselaris, T., Prenger, R. J., Kay, K. N., Oliver, M., & Gallant, J. L. (2009). Bayesian reconstruction of natural images from human brain activity. Neuron, 63(6), 902–915.

  • Nastase, S. A., Goldstein, A., & Hasson, U. (2020). Keep it real: Rethinking the primacy of experimental control in cognitive neuroscience. NeuroImage, 222, 117254.

  • Niv, Y., Daniel, R., Geana, A., Gershman, S. J., Leong, Y. C., Radulescu, A., & Wilson, R. C. (2015). Reinforcement learning in multidimensional environments relies on attention mechanisms. Journal of Neuroscience, 35(21), 8145–8157.

  • Niv, Y., & Langdon, A. (2016). Reinforcement learning with Marr. Current Opinion in Behavioral Sciences, 11, 67–73.

  • Niv, Y., & Schoenbaum, G. (2008). Dialogues on prediction errors. Trends in Cognitive Sciences, 12(7), 265–272.

  • Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: Multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9), 424–430.

  • O’Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., & Dolan, R. J. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304(5669), 452–454.

  • O’Doherty, J. P., Cockburn, J., & Pauli, W. M. (2017). Learning, reward, and decision making. Annual Review of Psychology, 68, 73–100.

  • O’Doherty, J. P., Dayan, P., Friston, K., Critchley, H., & Dolan, R. J. (2003). Temporal difference models and reward-related learning in the human brain. Neuron, 38(2), 329–337.

  • O’Doherty, J. P., Hampton, A., & Kim, H. (2007). Model-based fMRI and its application to reward learning and decision making. Annals of the New York Academy of Sciences, 1104(1), 35–53.

  • O’Doherty, J. P., Lee, S., Tadayonnejad, R., Cockburn, J., Iigaya, K., & Charpentier, C. J. (2021). Why and how the brain weights contributions from a mixture of experts. Neuroscience & Biobehavioral Reviews, 123, 14–23.

  • Palminteri, S., Wyart, V., & Koechlin, E. (2017). The importance of falsification in computational cognitive modeling. Trends in Cognitive Sciences, 21(6), 425–433.

  • Parr, R., & Russell, S. (1998). Reinforcement learning with hierarchies of machines. Advances in Neural Information Processing Systems, 10, 1043–1049.

  • Pavlov, I. P., & Anrep, G. V. (1927). Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex (Vol. 3). London: Oxford University Press.

  • Piray, P., Dezfouli, A., Heskes, T., Frank, M. J., & Daw, N. D. (2019). Hierarchical Bayesian inference for concurrent model fitting and comparison for group studies. PLoS Computational Biology, 15(6), e1007043.

  • Polyn, S. M., Natu, V. S., Cohen, J. D., & Norman, K. A. (2005). Category-specific cortical activity precedes retrieval during memory search. Science, 310(5756), 1963–1966.

  • Pouget, A., Dayan, P., & Zemel, R. (2000). Information processing with population codes. Nature Reviews Neuroscience, 1(2), 125–132.

  • Rescorla, R. A. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Current research and theory (pp. 64–99).

  • Rizley, R. C., & Rescorla, R. A. (1972). Associations in second-order conditioning and sensory preconditioning. Journal of Comparative and Physiological Psychology, 81(1), 1.

  • Rutledge, R. B., Dean, M., Caplin, A., & Glimcher, P. W. (2010). Testing the reward prediction error hypothesis with an axiomatic model. Journal of Neuroscience, 30(40), 13525–13536.

  • Schoenmakers, S., Barth, M., Heskes, T., & Van Gerven, M. (2013). Linear reconstruction of perceived images from human brain activity. NeuroImage, 83, 951–961.

  • Schuck, N. W., Cai, M. B., Wilson, R. C., & Niv, Y. (2016). Human orbitofrontal cortex represents a cognitive map of state space. Neuron, 91(6), 1402–1412.

  • Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599.

  • Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.

  • Skinner, B. F. (1963). Operant behavior. American Psychologist, 18(8), 503.

  • Sonkusare, S., Breakspear, M., & Guo, C. (2019). Naturalistic stimuli in neuroscience: Critically acclaimed. Trends in Cognitive Sciences, 23(8), 699–714.

  • Stephan, K. E., Penny, W. D., Daunizeau, J., Moran, R. J., & Friston, K. J. (2009). Bayesian model selection for group studies. Neuroimage, 46(4), 1004–1017.

  • Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9–44.

  • Sutton, R. S. (1995). TD models: Modeling the world at a mixture of time scales. In Machine Learning Proceedings 1995 (pp. 531–539). Elsevier.

  • Sutton, R. S., & Barto, A. G. (1981). Toward a modern theory of adaptive networks: Expectation and prediction. Psychological Review, 88(2), 135.

  • Sutton, R. S., & Barto, A. G. (1987). A temporal-difference model of classical conditioning. In Proceedings of the Ninth Annual Conference of the Cognitive Science Society (pp. 355–378). Seattle, WA.

  • Sutton, R. S., & Barto, A. G. (1998). Introduction to reinforcement learning (Vol. 135). Cambridge: MIT Press.

  • Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1–2), 181–211.

  • Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. The Psychological Review: Monograph Supplements, 2(4), i.

  • Tolman, E. C. (1948). Cognitive maps in rats and men. Psychological Review, 55(4), 189.

  • Turner, B. M., Forstmann, B. U., Love, B. C., Palmeri, T. J., & Van Maanen, L. (2017). Approaches to analysis in model-based cognitive neuroscience. Journal of Mathematical Psychology, 76, 65–79.

  • Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547.

  • Wilson, R. C., & Niv, Y. (2015). Is model fitting necessary for model-based fMRI? PLoS Computational Biology, 11(6), e1004237.

  • Witten, I. H. (1977). An adaptive optimal controller for discrete-time Markov environments. Information and Control, 34(4), 286–295.

  • Worsley, K. J., Liao, C. H., Aston, J., Petre, V., Duncan, G., Morales, F., & Evans, A. (2002). A general statistical analysis for fMRI data. Neuroimage, 15(1), 1–15.


Author information


Corresponding author

Correspondence to Vincent Man.


Copyright information

© 2024 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Man, V., O’Doherty, J.P. (2024). Reinforcement Learning. In: Forstmann, B.U., Turner, B.M. (eds) An Introduction to Model-Based Cognitive Neuroscience. Springer, Cham. https://doi.org/10.1007/978-3-031-45271-0_3
