Abstract
The outcome of an action often occurs after a delay. One solution for learning appropriate actions from delayed outcomes is to rely on a chain of state transitions. Another solution, which does not rest on state transitions, is to use an eligibility trace (ET) that directly bridges a current outcome and multiple past actions via transient memories. Previous studies revealed that humans (Homo sapiens) learned appropriate actions in a behavioral task in which solutions based on the ET were effective but transition-based solutions were ineffective. This suggests that an ET may be involved in human learning. However, no studies have examined nonhuman animals with an equivalent behavioral task. We designed a task for nonhuman animals following a previous human study. In each trial, participants chose one of two stimuli that were randomly selected from three stimulus types: a stimulus associated with a food reward delivered immediately, a stimulus associated with a reward delivered after a few trials, and a stimulus associated with no reward. The presented stimuli did not vary according to the participants’ choices. To maximize the total reward, participants had to learn the value of the stimulus associated with a delayed reward. Five chimpanzees (Pan troglodytes) performed the task using a touchscreen. Two chimpanzees learned successfully, indicating that learning mechanisms that do not depend on state transitions were involved. The current study extends previous ET research by proposing a behavioral task and providing empirical data from chimpanzees.
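To illustrate the mechanism the abstract describes, the sketch below implements a generic eligibility-trace learner on a simplified version of the task. It is not the authors' model: the learning rate, trace decay, exploration rate, and a fixed two-trial reward delay are all assumptions chosen for illustration. Each trial, two of three stimuli are offered; choosing stimulus 0 pays immediately, stimulus 1 pays two trials later, and stimulus 2 never pays. The trace lets a reward arriving now credit stimuli chosen on recent trials, with no model of state transitions.

```python
import random

def run_task(n_trials=3000, alpha=0.1, decay=0.5, eps=0.1, seed=1):
    """Illustrative eligibility-trace learner (parameters are assumptions,
    not the published model). Stimulus 0: immediate reward; stimulus 1:
    reward delivered two trials later; stimulus 2: no reward."""
    rng = random.Random(seed)
    q = [0.0, 0.0, 0.0]       # learned value of each stimulus
    trace = [0.0, 0.0, 0.0]   # eligibility traces: transient choice memories
    pending = {}              # trial index -> delayed reward due on that trial
    for t in range(n_trials):
        pair = rng.sample(range(3), 2)            # two randomly drawn stimuli
        if rng.random() < eps:                    # epsilon-greedy choice
            choice = rng.choice(pair)
        else:
            choice = max(pair, key=lambda s: q[s])
        trace = [e * decay for e in trace]        # traces fade each trial
        trace[choice] = 1.0                       # replacing trace for the choice
        reward = pending.pop(t, 0.0)              # any delayed reward due now
        if choice == 0:
            reward += 1.0                         # immediate reward
        elif choice == 1:
            pending[t + 2] = pending.get(t + 2, 0.0) + 1.0  # delayed reward
        # credit the outcome to all recently chosen stimuli via their traces
        delta = reward - q[choice]
        for s in range(3):
            q[s] += alpha * delta * trace[s]
    return q
```

Because the delayed reward systematically arrives while stimulus 1's trace is still nonzero, its learned value rises above that of the never-rewarded stimulus without the learner ever representing the trial sequence as state transitions.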
Data availability
Data and materials are available upon reasonable request.
Code availability
Code is available upon reasonable request.
Acknowledgements
We thank the staff and researchers at Kumamoto Sanctuary for their help with the study, particularly Dr. N. Morimura and Dr. F. Kano. We thank Benjamin Knight, MSc., from Edanz (https://jp.edanz.com/ac) for editing a draft of this manuscript.
Funding
This study was supported financially by the Ministry of Education, Culture, Sports, Science and Technology/Japan Society for the Promotion of Science through grants to YSato (grant number 19J22889), SH (grant numbers 26245069, 18H05524, and 23H00494), and TM (grant number 16H06283); a Program for Leading Graduate Schools grant to TM (U04); and the Great Ape Information Network.
Author information
Authors and Affiliations
Contributions
Conceptualization: Yutaro Sato, Yutaka Sakai, Satoshi Hirata. Methodology: Yutaro Sato, Yutaka Sakai. Formal analysis, Investigation, Data Curation, and Visualization: Yutaro Sato. Writing–Original Draft: Yutaro Sato, Yutaka Sakai. Resources: Yutaro Sato, Satoshi Hirata. Writing–Review and Editing: Satoshi Hirata. Supervision: Satoshi Hirata. Project administration: Satoshi Hirata. Funding acquisition: Yutaro Sato, Satoshi Hirata.
Corresponding author
Ethics declarations
Ethics approval
Animal husbandry and research protocols complied with the Guide for Animal Research Ethics provided by the Wildlife Research Center, Kyoto University (No. WRC-2020-KS006A). For human participants (Online Supplementary Materials (OSM)), the research protocol was approved by the Ethics Committee of the Unit for Advanced Study of Mind at Kyoto University (2-P-16).
Consent to participate
Informed consent was obtained from all individual human participants included in the study (OSM).
Consent for publication
Human participants (OSM) provided written informed consent that included consent to publish their data.
Conflict of interest
The authors have no known conflicts of interest to disclose.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open practices statements
None of the data or materials for the experiments reported here have been deposited online, and none of the experiments was preregistered.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sato, Y., Sakai, Y. & Hirata, S. State-transition-free reinforcement learning in chimpanzees (Pan troglodytes). Learn Behav 51, 413–427 (2023). https://doi.org/10.3758/s13420-023-00591-3