Abstract
The question of how animals and humans solve arbitrary goal-directed problems remains open. Reinforcement learning (RL) methods approach goal-directed control through model-based algorithms. However, RL's focus on maximizing long-term reward is inconsistent with the psychological notion of planning to satisfy homeostatic drives, in which an agent first sets a goal and then plans actions to achieve it. Optimal control theory suggests a solution: animals can learn a model of the world, learn where goals can be fulfilled, set a goal, and then act to minimize the difference between the actual and desired world states. Here, we present a purely localist neural network model that can autonomously learn the structure of an environment and then achieve any arbitrary goal state in a changing environment without relearning reward values. The model, GOLSA, achieves this through a backward spreading activation that propagates goal values to the agent. The model elucidates how neural inhibitory mechanisms can support competition between goal representations, serving to arbitrate between needs-based planning and exploration. The model performs similarly to humans in canonical revaluation tasks used to classify human and rodent behavior as goal-directed. The model revalues optimal actions when goals, goal values, world structure, or the need to fulfill a drive change. The model also clarifies a number of issues inherent in other RL-based representations, such as policy dependence in successor representations, while elucidating biological constraints such as the role of oscillations in gating information flow for learning versus action. Together, the proposed model suggests a biologically grounded framework for multi-step planning through consideration of how goal representations compete for behavioral expression.
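The core mechanism described above, value spreading backward from a goal state through a learned model of the world, with the agent acting to climb the resulting value gradient, can be illustrated with a minimal sketch. This is not the GOLSA network itself (which is a localist neural model with learned weights, inhibitory competition, and oscillatory gating); the graph, decay factor, and function names below are illustrative assumptions.

```python
def spread_goal_value(adjacency, goal, decay=0.8, iterations=None):
    """Propagate value backward from the goal through a transition graph.

    adjacency[s] is the set of states reachable from s in one step.
    Returns a dict mapping each state to its goal-derived value, so that
    value increases monotonically along any shortest path to the goal.
    """
    states = list(adjacency)
    if iterations is None:
        iterations = len(states)  # enough sweeps for activation to reach every state
    value = {s: 0.0 for s in states}
    value[goal] = 1.0  # the goal is the source of the spreading activation
    for _ in range(iterations):
        for s in states:
            if s == goal:
                continue
            # A state's value is the decayed best value among its successors,
            # i.e., activation spreads backward along learned transitions.
            successors = adjacency[s]
            if successors:
                value[s] = decay * max(value[t] for t in successors)
    return value

def next_state(adjacency, value, state):
    """Greedy step: move to the successor with the highest goal value."""
    return max(adjacency[state], key=lambda t: value[t])

# Toy corridor A -> B -> C -> D (with backtracking allowed); goal at D.
graph = {"A": {"B"}, "B": {"C", "A"}, "C": {"D", "B"}, "D": set()}
v = spread_goal_value(graph, goal="D")
path = ["A"]
while path[-1] != "D":
    path.append(next_state(graph, v, path[-1]))
# path is now ["A", "B", "C", "D"]
```

Note that changing the goal only requires re-spreading activation from a new source node; the transition graph (the learned world model) is untouched, which is the sense in which such a scheme can revalue actions without relearning reward values.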
Acknowledgments
We thank A. Ramamoorthy for helpful discussions on the manuscript.
Funding
JWB was supported by NIH R21 DA040773.
Supplementary Information
ESM 1 (PDF 3210 kb)
Cite this article
Fine, J.M., Zarr, N. & Brown, J.W. Computational Neural Mechanisms of Goal-Directed Planning and Problem Solving. Comput Brain Behav 3, 472–493 (2020). https://doi.org/10.1007/s42113-020-00095-7