Humans Adopt Different Exploration Strategies Depending on the Environment

  • Original Paper
  • Published:
Computational Brain & Behavior

Abstract

Humans explore to learn the structure of their environment. However, it remains unclear how consistent humans are in the exploration strategies they use and in how often they explore across environments that differ in volatility. Using a within-subjects design, participants (n = 30) completed (1) a non-stationary bandit task in which the reward values changed throughout, and (2) a stationary bandit task in which one option always gave a better reward. We used a series of reinforcement learning models to identify the exploration strategies participants adopted in the two tasks. We found that most participants adopted a behavioural heuristic strategy (Win-Stay, Lose-Shift) in the non-stationary bandit task. In contrast, most participants adopted a probabilistic, random exploration strategy (Softmax) in the stationary bandit task. We compared the results of fitting models individually within each task with the results of fitting models across both tasks, that is, focusing on long-term predictions. When fitting across both tasks, we found that most participants solely adopted a probabilistic, random exploration strategy. In addition, we found a moderate, positive relationship between the exploration rates in the two bandit tasks. Our findings show that humans can flexibly adopt different exploration strategies depending on task demands, which we suggest is because the two bandit tasks assessed different aspects of learning and required different levels of cognitive flexibility. In addition, we speculate that the relationship between exploration rates could reflect a personality trait such as risk-taking. In sum, we found evidence for the flexible use of exploration strategies, while also observing evidence of the generalization of exploration across tasks.
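
To make the two strategies named in the abstract concrete, the following is a minimal Python sketch of a Softmax choice rule and a Win-Stay, Lose-Shift rule. It is an illustration only and is not taken from the authors' code: the function names, the `temperature` parameter, and the treatment of a "win" as a boolean supplied by the caller are all assumptions, and the models fitted in the paper may be parameterized differently.

```python
import math
import random


def softmax_probs(values, temperature=1.0):
    """Choice probabilities proportional to exp(value / temperature).

    `temperature` is a hypothetical exploration parameter: higher values
    flatten the distribution, so choices become more random.
    """
    scaled = [v / temperature for v in values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_choice(values, temperature=1.0, rng=random):
    """Sample an arm index according to the softmax probabilities."""
    probs = softmax_probs(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


def wsls_choice(prev_choice, prev_win, n_arms, rng=random):
    """Win-Stay, Lose-Shift: repeat the last arm after a win, else switch.

    What counts as a "win" is left to the caller (an assumption here).
    """
    if prev_choice is None:  # first trial: no history, pick at random
        return rng.randrange(n_arms)
    if prev_win:
        return prev_choice
    others = [a for a in range(n_arms) if a != prev_choice]
    return rng.choice(others)
```

The contrast is that the Softmax rule depends on learned value estimates and chooses probabilistically, whereas the heuristic depends only on the outcome of the previous trial.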

Data Availability

The datasets generated are available from the corresponding author upon reasonable request. The datasets are not publicly available because participants did not consent to their data being shared in a public repository. Modeling and analysis code will be published at: https://github.com/tomferg/BanditComp

Code Availability

Code generated during modeling and analysis will be published in the following GitHub repository: https://github.com/tomferg/BanditComp

Notes

  1. In the non-stationary task, we always divided the point values obtained by 100 for all models that required reward estimates (ε-Greedy, Softmax, Sliding Window Upper Confidence Bound, Gradient, and Kalman Filter with Thompson Sampling).
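
The scaling described in this note can be sketched as follows. Only the divisor of 100 comes from the note; the function names, the delta-rule update, and the `learning_rate` value are assumptions added for illustration and are not the authors' implementation.

```python
REWARD_SCALE = 100.0  # divisor stated in the note


def scale_reward(points):
    """Convert raw points from the non-stationary task to a scaled reward."""
    return points / REWARD_SCALE


def update_estimate(estimate, points, learning_rate=0.1):
    """Move a reward estimate toward the scaled reward.

    The delta rule and `learning_rate` are hypothetical; each model in the
    note may use its own update.
    """
    reward = scale_reward(points)
    return estimate + learning_rate * (reward - estimate)
```

For example, a payout of 50 points enters the update as a reward of 0.5, so an estimate updated this way stays on the scaled range rather than the raw point range.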

Funding

Thomas D. Ferguson would like to acknowledge support from the Dr. Roland and Muriel Haryett Neuroscience Fellowship and the Natural Sciences and Engineering Research Council of Canada. Alona Fyshe would like to acknowledge support from the Canadian Institute for Advanced Research (CIFAR) Canadian AI Chairs program. Adam White would like to acknowledge support from the CIFAR Canadian AI Chairs program. Olave E. Krigolson would like to acknowledge support from the Natural Sciences and Engineering Research Council of Canada (RGPIN 2016–0943). The authors declare that none of the funding sources mentioned above had any involvement in the design of the experiment or the preparation and submission of the manuscript.

Author information

Contributions

Thomas Ferguson and Olave Krigolson contributed to the study conception and design. Material preparation, data collection, and analysis were performed by Thomas Ferguson. Thomas Ferguson, Alona Fyshe, and Adam White contributed to the computational models used. The first draft of the manuscript was written by Thomas Ferguson and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Thomas D. Ferguson.

Ethics declarations

Ethics Approval

The Human Research Ethics Board at the University of Victoria approved all experimental procedures (Date: 25-Sep-2019; 19–0230), and all research was performed in line with the principles of the Declaration of Helsinki.

Consent to Participate

Participants provided written informed consent prior to the completion of the experimental session.

Consent to Publish

Participants provided consent to have aggregated data (averages) published in a research journal.

Competing Interests

The authors have no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Correspondence should be directed to: Thomas D. Ferguson, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada, T6G 2R3.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 970 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ferguson, T., Fyshe, A., White, A. et al. Humans Adopt Different Exploration Strategies Depending on the Environment. Comput Brain Behav 6, 671–696 (2023). https://doi.org/10.1007/s42113-023-00178-1

  • DOI: https://doi.org/10.1007/s42113-023-00178-1
