Abstract
Over the last two decades, the model-based approach to analysing functional magnetic resonance imaging (fMRI) data has been adopted across the cognitive neurosciences to study how computations are implemented in the brain. In this time, methods have advanced in both the computational modelling and neuroimaging domains. This chapter provides an introduction to the general method of integrating computational models into fMRI analyses, together with a discussion of contemporary considerations arising from these advances. The chapter begins with an exposition of how qualitative psychological hypotheses are formalised into quantitatively testable and falsifiable computational models. We ground this discussion in examples from the conditioning and reinforcement learning literature, given the origins of model-based fMRI in uncovering the neural correlates of learning processes. We then provide an overview of the methodological approach underlying model-based fMRI. This extends to pragmatic considerations when working in this domain, with an eye towards recent developments in both fMRI and computational modelling, such as multivariate analyses and assessing model quality, respectively. Finally, we provide examples in which the computations described in the first section of the chapter were successfully bridged with fMRI analyses to provide a richer understanding of reinforcement learning in the brain. This chapter is therefore aimed both at cognitive neuroscientists seeking to adapt computational approaches to their neuroimaging research and at those specifically interested in learning and decision-making across levels of analysis.
Notes
1. There are a variety of ways in which \(\lambda ^{US}\) might be set depending on the experimental context. A simple coding would be to define \(\lambda ^{US}\) as 1 on trials in which a reward is presented and 0 otherwise, for example, in experiments where reward magnitudes are not manipulated. Alternatively, the magnitude of the UR elicited by the US may be a useful metric.
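The two codings of \(\lambda ^{US}\) described in this note can be sketched for a hypothetical trial sequence (the outcomes and magnitudes below are illustrative, not from any experiment):

```python
# Two possible codings of lambda^US for a hypothetical sequence of five trials
rewards = [1, 0, 1, 1, 0]               # reward delivered (1) or omitted (0)
magnitudes = [0.8, 0.0, 1.2, 0.5, 0.0]  # hypothetical UR magnitudes on each trial

lam_binary = [float(r) for r in rewards]  # simple binary coding: 1 if rewarded, else 0
lam_magnitude = list(magnitudes)          # magnitude-based coding: lambda^US = UR magnitude
```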
2. It should be noted that the full formulation of the R-W model includes a second learning rate parameter associated with the US to incorporate the assumption that the rate of learning may also depend on the particular US in the experiment (Rescorla, 1972).
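The full R-W update with both a CS-specific salience (\(\alpha\)) and the US-specific learning rate (\(\beta\)) mentioned in this note can be sketched as follows (a minimal illustration; the CS names and parameter values are hypothetical):

```python
def rescorla_wagner_update(V, alpha_cs, beta_us, lam):
    """One trial of the Rescorla-Wagner update with a US-specific learning rate.

    V:        dict mapping each CS to its current associative strength
    alpha_cs: dict mapping each CS present on this trial to its salience (alpha)
    beta_us:  learning-rate parameter associated with the US (beta)
    lam:      asymptote of learning set by the US on this trial (lambda^US)
    """
    total_v = sum(V[cs] for cs in alpha_cs)  # summed strength of the present CSs
    delta = lam - total_v                    # prediction error
    for cs, alpha in alpha_cs.items():
        V[cs] += alpha * beta_us * delta     # delta-V = alpha * beta * (lambda - sum V)
    return V

# A single CS repeatedly paired with reward converges towards lambda^US
V = {"light": 0.0}
for _ in range(100):
    V = rescorla_wagner_update(V, {"light": 0.3}, beta_us=0.5, lam=1.0)
```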
3. Here we ignore some important aspects of the TD framework for simplicity, such as temporal discounting and eligibility traces. References to relevant papers on these aspects are presented in Sect. 7.
4. The term ‘policy’ refers to how an animal or human acts given the situation they face.
5. In the case of two alternatives, the softmax function reduces to a logistic sigmoid function of their difference.
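This equivalence is easy to verify numerically: for two alternatives, the softmax probability of choosing the first option equals the sigmoid of the (inverse-temperature-scaled) value difference. A short sketch, with arbitrary illustrative values:

```python
import numpy as np

def softmax(values, beta=1.0):
    """Softmax choice probabilities over a set of action values."""
    z = beta * np.asarray(values, dtype=float)
    z -= z.max()  # subtract max for numerical stability (does not change the result)
    p = np.exp(z)
    return p / p.sum()

def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

v1, v2, beta = 2.0, 1.5, 3.0
p_softmax = softmax([v1, v2], beta)[0]   # P(choose option 1) via softmax
p_sigmoid = sigmoid(beta * (v1 - v2))    # same probability via the sigmoid of the difference
```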
6. It can be important to check whether the parametric regressor is normalised or mean-centred. If not, it can be artificially highly correlated with the corresponding onset regressor, particularly if the parametric variable has only positive values. Some, but not all, common fMRI software packages automatically scale this regressor.
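The collinearity problem described in this note can be demonstrated with a toy simulation. Everything below is hypothetical: the event timing and parametric values are random, and a crude Gaussian bump stands in for a real canonical HRF:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy event-related design: 40 events on a 400-sample timeline
n_scans, n_events = 400, 40
onsets = np.sort(rng.choice(np.arange(10, n_scans - 20), n_events, replace=False))
value = rng.uniform(1.0, 5.0, n_events)  # strictly positive parametric variable

hrf = np.exp(-0.5 * ((np.arange(15) - 5) / 2.0) ** 2)  # crude Gaussian stand-in for an HRF

def build_regressor(weights):
    """Place per-event weights at the onsets and convolve with the toy HRF."""
    stick = np.zeros(n_scans)
    stick[onsets] = weights
    return np.convolve(stick, hrf)[:n_scans]

onset_reg = build_regressor(np.ones(n_events))          # unmodulated onset regressor
raw_pm = build_regressor(value)                          # parametric regressor, NOT mean-centred
centred_pm = build_regressor(value - value.mean())       # parametric regressor, mean-centred

r_raw = np.corrcoef(onset_reg, raw_pm)[0, 1]
r_centred = np.corrcoef(onset_reg, centred_pm)[0, 1]
# The uncentred parametric regressor is far more collinear with the onset
# regressor than the mean-centred one: |r_raw| >> |r_centred|
```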
7. This can be done easily with most standard fMRI software packages, which include functions to estimate the efficiency of the GLM design (Fig. 2c).
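The computation that such package functions perform can be sketched directly, assuming the common trace formulation of design efficiency (the design matrices below are hypothetical random regressors, not real fMRI timeseries):

```python
import numpy as np

def design_efficiency(X, contrast):
    """GLM design efficiency for a contrast vector c:
    eff = 1 / trace(c (X'X)^-1 c'); higher values mean the contrast is
    estimated with lower variance."""
    c = np.atleast_2d(np.asarray(contrast, dtype=float))
    xtx_inv = np.linalg.pinv(X.T @ X)
    return 1.0 / np.trace(c @ xtx_inv @ c.T)

rng = np.random.default_rng(0)
x1 = rng.standard_normal(200)
noise = rng.standard_normal(200)

# Two hypothetical two-regressor designs: uncorrelated vs. highly collinear columns
X_orth = np.column_stack([x1, noise])
X_corr = np.column_stack([x1, 0.9 * x1 + np.sqrt(0.19) * noise])

eff_orth = design_efficiency(X_orth, [0, 1])
eff_corr = design_efficiency(X_corr, [0, 1])
# Collinearity between regressors lowers the efficiency with which the
# second regressor's effect can be estimated: eff_corr < eff_orth
```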
8. In the case of Kahnt et al. (2011), values were computed from a model based on the objective features of the choice alternatives, rather than a model of subjective valuation that depends on parameters fit to participants’ data. In the latter case, it may be advantageous to binarise computationally derived continuous variables, thereby turning the problem from one of regression into one of classification, depending on the reliability of the parameter estimates. Further simulation work is required to test these different approaches.
References
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.
Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13(5), 834–846.
Botvinick, M. M., Niv, Y., & Barto, A. G. (2009). Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective. Cognition, 113(3), 262–280.
Büchel, C., Bornhövd, K., Quante, M., Glauche, V., Bromm, B., & Weiller, C. (2002). Dissociable neural responses related to pain intensity, stimulus intensity, and stimulus awareness within the anterior cingulate cortex: a parametric single-trial laser functional magnetic resonance imaging study. Journal of Neuroscience, 22(3), 970–976.
Büchel, C., Holmes, A., Rees, G., & Friston, K. (1998). Characterizing stimulus–response functions using nonlinear regressors in parametric fMRI experiments. Neuroimage, 8(2), 140–148.
Bush, R. R., & Mosteller, F. (1951). A mathematical model for simple learning. Psychological Review, 58(5), 313.
Caplin, A., & Dean, M. (2008). Axiomatic methods, dopamine and reward prediction error. Current Opinion in Neurobiology, 18(2), 197–202.
Casella, G., & Berger, R. L. (2021). Statistical inference. Cengage Learning.
Chan, S. C., Niv, Y., & Norman, K. A. (2016). A probability distribution over latent causes, in the orbitofrontal cortex. Journal of Neuroscience, 36(30), 7817–7828.
Cohen, J. D., Daw, N., Engelhardt, B., Hasson, U., Li, K., Niv, Y., Norman, K. A., Pillow, J., Ramadge, P. J., Turk-Browne, N. B., et al. (2017). Computational approaches to fMRI analysis. Nature Neuroscience, 20(3), 304–313.
Colas, J. T., Pauli, W. M., Larsen, T., Tyszka, J. M., & O’Doherty, J. P. (2017). Distinct prediction errors in mesostriatal circuits of the human brain mediate learning about the values of both states and actions: Evidence from high-resolution fMRI. PLoS Computational Biology, 13(10), e1005810.
Collins, A. G., & Frank, M. J. (2014). Opponent actor learning (OpAL): Modeling interactive effects of striatal dopamine on reinforcement learning and choice incentive. Psychological Review, 121(3), 337.
Cross, L., Cockburn, J., Yue, Y., & O’Doherty, J. P. (2020). Using deep reinforcement learning to reveal how the brain encodes abstract state-space representations in high-dimensional environments. Neuron, 109(4), 724–738.
Davis, T., LaRocque, K. F., Mumford, J. A., Norman, K. A., Wagner, A. D., & Poldrack, R. A. (2014). What do differences between multi-voxel and univariate analysis mean? how subject-, voxel-, and trial-level variance impact fMRI analysis. Neuroimage, 97, 271–283.
Daw, N. D. (2011). Trial-by-trial data analysis using computational models. Decision Making, Affect, and Learning: Attention and Performance XXIII, 23(1), 3–38.
Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876–879.
Daw, N. D., & Tobler, P. N. (2014). Value learning through reinforcement: the basics of dopamine and reinforcement learning. In Neuroeconomics (pp. 283–298). Elsevier.
Diedrichsen, J., & Kriegeskorte, N. (2017). Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysis. PLoS Computational Biology, 13(4), e1005508.
Dolan, R. J., & Dayan, P. (2013). Goals and habits in the brain. Neuron, 80(2), 312–325.
Edelman, S., Grill-Spector, K., Kushnir, T., & Malach, R. (1998). Toward direct visualization of the internal shape representation space by fMRI. Psychobiology, 26(4), 309–321.
Friston, K. J., Holmes, A. P., Price, C., Büchel, C., & Worsley, K. (1999). Multisubject fMRI studies and conjunction analyses. Neuroimage, 10(4), 385–396.
Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J.-P., Frith, C. D., & Frackowiak, R. S. (1994). Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapping, 2(4), 189–210.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC Press.
Gelman, A., Meng, X.-L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4), 733–760.
Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4(1), 1–58.
Gershman, S. J. (2016). Empirical priors for reinforcement learning models. Journal of Mathematical Psychology, 71, 1–6.
Gittins, J. C., & Jones, D. M. (1979). A dynamic allocation index for the discounted multiarmed bandit problem. Biometrika, 66(3), 561–565.
Gläscher, J. P., & O’Doherty, J. P. (2010). Model-based approaches to neuroimaging: combining reinforcement learning theory with fMRI data. Wiley Interdisciplinary Reviews: Cognitive Science, 1(4), 501–510.
Glaser, J. I., Benjamin, A. S., Chowdhury, R. H., Perich, M. G., Miller, L. E., & Kording, K. P. (2020). Machine learning for neural decoding. eNeuro, 7(4), 1–16.
Güçlü, U., & van Gerven, M. A. (2015). Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience, 35(27), 10005–10014.
Hampton, A. N., Bossaerts, P., & O’Doherty, J. P. (2006). The role of the ventromedial prefrontal cortex in abstract state-based inference during decision making in humans. Journal of Neuroscience, 26(32), 8360–8367.
Hampton, A. N., Bossaerts, P., & O’Doherty, J. P. (2008). Neural correlates of mentalizing-related computations during strategic interactions in humans. Proceedings of the National Academy of Sciences, 105(18), 6741–6746.
Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., & Pietrini, P. (2001). Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science, 293(5539), 2425–2430.
Haxby, J. V., Gobbini, M. I., & Nastase, S. A. (2020). Naturalistic stimuli reveal a dominant role for agentic action in visual representation. Neuroimage, 216, 116561.
Haynes, J.-D. (2015). A primer on pattern-based approaches to fMRI: principles, pitfalls, and perspectives. Neuron, 87(2), 257–270.
Haynes, J.-D., & Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7(7), 523–534.
Holland, P. C., & Rescorla, R. A. (1975). Second-order conditioning with food unconditioned stimulus. Journal of Comparative and Physiological Psychology, 88(1), 459.
Hull, C. L. (1939). The problem of stimulus equivalence in behavior theory. Psychological Review, 46(1), 9.
Hunt, L. T., Malalasekera, W. N., de Berker, A. O., Miranda, B., Farmer, S. F., Behrens, T. E., & Kennerley, S. W. (2018). Triple dissociation of attention and decision computations across prefrontal cortex. Nature Neuroscience, 21(10), 1471–1481.
Hutcherson, C. A., Bushong, B., & Rangel, A. (2015). A neurocomputational model of altruistic choice and its implications. Neuron, 87(2), 451–462.
Kahnt, T., Heinzle, J., Park, S. Q., & Haynes, J.-D. (2011). Decoding different roles for VMPFC and DLPFC in multi-attribute decision making. Neuroimage, 56(2), 709–715.
Kamin, L. (1969). Predictability, surprise, attention, and conditioning. In B. A. Campbell, & R. M. Church (Eds.), Punishment and aversive behavior (pp. 279–296). New York: Appleton-Century-Crofts.
Khaligh-Razavi, S.-M., & Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology, 10(11), e1003915.
Kriegeskorte, N. (2015). Deep neural networks: A new framework for modeling biological vision and brain information processing. Annual Review of Vision Science, 1, 417–446.
Kriegeskorte, N., Goebel, R., & Bandettini, P. (2006). Information-based functional brain mapping. Proceedings of the National Academy of Sciences, 103(10), 3863–3868.
Kriegeskorte, N., & Kievit, R. A. (2013). Representational geometry: integrating cognition, computation, and the brain. Trends in Cognitive Sciences, 17(8), 401–412.
Kriegeskorte, N., Mur, M., & Bandettini, P. A. (2008). Representational similarity analysis-connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2, 4.
Lau, B., & Glimcher, P. W. (2005). Dynamic response-by-response models of matching behavior in rhesus monkeys. Journal of the Experimental Analysis of Behavior, 84(3), 555–579.
Lebreton, M., Bavard, S., Daunizeau, J., & Palminteri, S. (2019). Assessing inter-individual differences with task-related functional neuroimaging. Nature Human Behaviour, 3(9), 897–905.
Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press.
Mack, M. L., Preston, A. R., & Love, B. C. (2013). Decoding the brain’s algorithm for categorization from its neural implementation. Current Biology, 23(20), 2023–2027.
Marr, D., & Poggio, T. (1976). From understanding computation to understanding neural circuitry.
Miller, R. R., Barnet, R. C., & Grahame, N. J. (1995). Assessment of the Rescorla-Wagner model. Psychological Bulletin, 117(3), 363.
Milosavljevic, M., Malmaud, J., Huth, A., Koch, C., & Rangel, A. (2010). The drift diffusion model can account for value-based choice response times under high and low time pressure. Judgment and Decision Making, 5(6), 437–449.
Montague, P. R., Dayan, P., & Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience, 16(5), 1936–1947.
Mumford, J. A., Davis, T., & Poldrack, R. A. (2014). The impact of study design on pattern estimation for single-trial multivariate pattern analysis. Neuroimage, 103, 130–138.
Mumford, J. A., Poline, J.-B., & Poldrack, R. A. (2015). Orthogonalization of regressors in fMRI models. PloS One, 10(4), e0126255.
Mumford, J. A., Turner, B. O., Ashby, F. G., & Poldrack, R. A. (2012). Deconvolving bold activation in event-related designs for multivoxel pattern classification analyses. Neuroimage, 59(3), 2636–2643.
Myung, I. J. (2003). Tutorial on maximum likelihood estimation. Journal of Mathematical Psychology, 47(1), 90–100.
Naselaris, T., Prenger, R. J., Kay, K. N., Oliver, M., & Gallant, J. L. (2009). Bayesian reconstruction of natural images from human brain activity. Neuron, 63(6), 902–915.
Nastase, S. A., Goldstein, A., & Hasson, U. (2020). Keep it real: Rethinking the primacy of experimental control in cognitive neuroscience. NeuroImage, 222, 117254.
Niv, Y., Daniel, R., Geana, A., Gershman, S. J., Leong, Y. C., Radulescu, A., & Wilson, R. C. (2015). Reinforcement learning in multidimensional environments relies on attention mechanisms. Journal of Neuroscience, 35(21), 8145–8157.
Niv, Y., & Langdon, A. (2016). Reinforcement learning with Marr. Current Opinion in Behavioral Sciences, 11, 67–73.
Niv, Y., & Schoenbaum, G. (2008). Dialogues on prediction errors. Trends in Cognitive Sciences, 12(7), 265–272.
Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: Multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9), 424–430.
O’Doherty, J., Dayan, P., Schultz, J., Deichmann, R., Friston, K., & Dolan, R. J. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304(5669), 452–454.
O’Doherty, J. P., Cockburn, J., & Pauli, W. M. (2017). Learning, reward, and decision making. Annual Review of Psychology, 68, 73–100.
O’Doherty, J. P., Dayan, P., Friston, K., Critchley, H., & Dolan, R. J. (2003). Temporal difference models and reward-related learning in the human brain. Neuron, 38(2), 329–337.
O’Doherty, J. P., Hampton, A., & Kim, H. (2007). Model-based fMRI and its application to reward learning and decision making. Annals of the New York Academy of Sciences, 1104(1), 35–53.
O’Doherty, J. P., Lee, S., Tadayonnejad, R., Cockburn, J., Iigaya, K., & Charpentier, C. J. (2021). Why and how the brain weights contributions from a mixture of experts. Neuroscience & Biobehavioral Reviews, 123, 14–23.
Palminteri, S., Wyart, V., & Koechlin, E. (2017). The importance of falsification in computational cognitive modeling. Trends in Cognitive Sciences, 21(6), 425–433.
Parr, R., & Russell, S. (1998). Reinforcement learning with hierarchies of machines. Advances in Neural Information Processing Systems, 10, 1043–1049.
Pavlov, I. P., & Anrep, G. V. (1927). Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex (Vol. 3). London: Oxford University Press.
Piray, P., Dezfouli, A., Heskes, T., Frank, M. J., & Daw, N. D. (2019). Hierarchical Bayesian inference for concurrent model fitting and comparison for group studies. PLoS Computational Biology, 15(6), e1007043.
Polyn, S. M., Natu, V. S., Cohen, J. D., & Norman, K. A. (2005). Category-specific cortical activity precedes retrieval during memory search. Science, 310(5756), 1963–1966.
Pouget, A., Dayan, P., & Zemel, R. (2000). Information processing with population codes. Nature Reviews Neuroscience, 1(2), 125–132.
Rescorla, R. A. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In Classical conditioning II: Current research and theory (pp. 64–99).
Rizley, R. C., & Rescorla, R. A. (1972). Associations in second-order conditioning and sensory preconditioning. Journal of Comparative and Physiological Psychology, 81(1), 1.
Rutledge, R. B., Dean, M., Caplin, A., & Glimcher, P. W. (2010). Testing the reward prediction error hypothesis with an axiomatic model. Journal of Neuroscience, 30(40), 13525–13536.
Schoenmakers, S., Barth, M., Heskes, T., & Van Gerven, M. (2013). Linear reconstruction of perceived images from human brain activity. NeuroImage, 83, 951–961.
Schuck, N. W., Cai, M. B., Wilson, R. C., & Niv, Y. (2016). Human orbitofrontal cortex represents a cognitive map of state space. Neuron, 91(6), 1402–1412.
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.
Skinner, B. F. (1963). Operant behavior. American Psychologist, 18(8), 503.
Sonkusare, S., Breakspear, M., & Guo, C. (2019). Naturalistic stimuli in neuroscience: critically acclaimed. Trends in Cognitive Sciences, 23(8), 699–714.
Stephan, K. E., Penny, W. D., Daunizeau, J., Moran, R. J., & Friston, K. J. (2009). Bayesian model selection for group studies. Neuroimage, 46(4), 1004–1017.
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9–44.
Sutton, R. S. (1995). TD models: Modeling the world at a mixture of time scales. In Machine Learning Proceedings 1995 (pp. 531–539). Elsevier.
Sutton, R. S., & Barto, A. G. (1981). Toward a modern theory of adaptive networks: Expectation and prediction. Psychological Review, 88(2), 135.
Sutton, R. S., & Barto, A. G. (1987). A temporal-difference model of classical conditioning. In Proceedings of the Ninth Annual Conference of the Cognitive Science Society (pp. 355–378). Seattle, WA.
Sutton, R. S., & Barto, A. G. (1998). Introduction to reinforcement learning (Vol. 135). Cambridge: MIT Press.
Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2), 181–211.
Thorndike, E. L. (1898). Animal intelligence: an experimental study of the associative processes in animals. The Psychological Review: Monograph Supplements, 2(4), i.
Tolman, E. C. (1948). Cognitive maps in rats and men. Psychological Review, 55(4), 189.
Turner, B. M., Forstmann, B. U., Love, B. C., Palmeri, T. J., & Van Maanen, L. (2017). Approaches to analysis in model-based cognitive neuroscience. Journal of Mathematical Psychology, 76, 65–79.
Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. Elife, 8, e49547.
Wilson, R. C., & Niv, Y. (2015). Is model fitting necessary for model-based fMRI? PLoS Computational Biology, 11(6), e1004237.
Witten, I. H. (1977). An adaptive optimal controller for discrete-time Markov environments. Information and Control, 34(4), 286–295.
Worsley, K. J., Liao, C. H., Aston, J., Petre, V., Duncan, G., Morales, F., & Evans, A. (2002). A general statistical analysis for fMRI data. Neuroimage, 15(1), 1–15.
© 2024 Springer Nature Switzerland AG
Man, V., O’Doherty, J.P. (2024). Reinforcement Learning. In: Forstmann, B.U., Turner, B.M. (eds) An Introduction to Model-Based Cognitive Neuroscience. Springer, Cham. https://doi.org/10.1007/978-3-031-45271-0_3
Print ISBN: 978-3-031-45270-3
Online ISBN: 978-3-031-45271-0
eBook Packages: Behavioral Science and Psychology (R0)