Cognitive, Affective, & Behavioral Neuroscience, Volume 18, Issue 1, pp 117–126

Pure correlates of exploration and exploitation in the human brain

  • Tommy C. Blanchard
  • Samuel J. Gershman


Balancing exploration and exploitation is a fundamental problem in reinforcement learning. Previous neuroimaging studies of the exploration–exploitation dilemma could not completely disentangle these two processes, making it difficult to unambiguously identify their neural signatures. We overcome this problem using a task in which subjects can either observe (pure exploration) or bet (pure exploitation). The insula and dorsal anterior cingulate cortex showed significantly greater activity on observe trials than on bet trials, suggesting that these regions play a role in driving exploration. A model-based analysis of task performance suggested that subjects chose to observe until a critical evidence threshold was reached. We observed a neural signature of this evidence accumulation process in the ventromedial prefrontal cortex. These findings support theories positing an important role for the anterior cingulate cortex in exploration, while also providing a new perspective on the roles of the insula and the ventromedial prefrontal cortex.
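The threshold rule described in the abstract can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not the authors' fitted model: we assume two possible outcomes (coded 0 and 1), define the evidence as the number of times outcome 1 has been observed minus the number of times outcome 0 has been observed, and use a hypothetical integer threshold `theta`. In keeping with the abstract's framing of betting as pure exploitation, bet trials are assumed to reveal no outcome, so evidence changes only on observe trials. The function name `threshold_policy` and the action labels are illustrative.

```python
from typing import List, Sequence


def threshold_policy(outcomes: Sequence[int], theta: int) -> List[str]:
    """Simulate a hypothetical observe-or-bet agent with an evidence threshold.

    Assumptions (illustrative only, not the study's model):
      - outcomes: the hidden outcome (0 or 1) on each trial.
      - evidence = (#observed 1s) - (#observed 0s).
      - the agent observes while |evidence| < theta, otherwise it bets on
        the outcome it has observed more often.
      - bet trials reveal no outcome, so evidence is frozen while betting.
    """
    if theta < 1:
        raise ValueError("theta must be a positive integer")
    evidence = 0
    actions: List[str] = []
    for outcome in outcomes:
        if abs(evidence) < theta:
            # Below threshold: observe (explore), which reveals the outcome.
            actions.append("observe")
            evidence += 1 if outcome == 1 else -1
        else:
            # Threshold reached: bet (exploit) on the more frequently seen outcome.
            actions.append("bet_1" if evidence > 0 else "bet_0")
    return actions
```

For example, `threshold_policy([1, 1, 0, 1, 1, 0], theta=2)` observes twice, reaches the threshold, and then bets on outcome 1 for the remaining four trials. Because evidence is frozen once betting begins, this static version never returns to observing; modelling a changing environment would need some way for confidence to decay, which goes beyond what the abstract specifies.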


Keywords: reinforcement learning, fMRI, decision making



We are grateful to Joel Voss for helpful comments on an earlier draft. This research was carried out at the Harvard Center for Brain Science with the support of the Pershing Square Fund for Research on the Foundations of Human Behavior. This work involved the use of instrumentation supported by the NIH Shared Instrumentation Grant Program, Grant No. S10OD020039. We acknowledge the University of Minnesota Center for Magnetic Resonance Research for use of the multiband-EPI pulse sequences.

Compliance with ethical standards

Conflict of interest

The authors declare no competing financial interests.

Copyright information

© Psychonomic Society, Inc. 2017

Authors and Affiliations

  1. Department of Psychology and Center for Brain Science, Harvard University, Cambridge, USA
