Introduction

Attention is a selective process that determines what information is represented by the brain in a given environment (Desimone & Duncan, 1995). Attention can be biased toward physically salient stimuli (Theeuwes, 1991, 1992; Theeuwes et al., 1998; Theeuwes et al., 1999; Yantis, 1993) or toward features that align with task goals (Bacon & Egeth, 1994, 1997; Folk et al., 1992). For example, if you are looking for a friend with a red hat, red-colored stimuli will tend to be preferentially attended. More recently, attention has also been shown to be biased toward stimuli that have been previously attended in what is referred to as selection history (Anderson et al., 2021; Awh et al., 2012). The influence of selection history on the control of attention – what we will call experience-driven attention – can be modulated by a variety of factors, including stimulus-reward associations (Anderson et al., 2011; Esterman et al., 2014; Kim & Anderson, 2019), aversive conditioning (Anderson & Britton, 2020; Nissens et al., 2017; Schmidt et al., 2015), and target-location probabilities (Gao & Theeuwes, 2020; Jiang et al., 2013). Previously reward-associated stimuli have been shown to involuntarily capture attention even when physically non-salient and currently task-irrelevant, suggesting that the control of attention can be value-driven (Anderson et al., 2011).

In addition to the influence of stimulus-reward associations, reward has been shown to modulate other examples of experience-driven attention including contextual cueing (Pollmann et al., 2016; Tseng & Lleras, 2013), statistical learning by distractor location (Pearson et al., 2020), and inter-trial priming (Kristjánsson et al., 2010). In the context of goal-directed attention, reward can serve as a motivational factor in improving task performance (Engelmann et al., 2009; Engelmann & Pessoa, 2014; Esterman et al., 2014; Mohanty et al., 2008). The influence of reward on goal-directed attention has generally been studied in situations in which the target of visual search is clearly and narrowly defined on every trial and the efficiency of selecting a single target is probed. How reward history might influence the choice of attentional control settings – or what to voluntarily search for when given a choice – remains to be investigated. Likewise, it is unclear whether value-driven attention might have downstream consequences for what a person chooses to prioritize with respect to the goal-directed control of attention.

The Adaptive Choice Visual Search (ACVS) task is a relatively new experimental paradigm that allows a participant to search for one of two possible targets in a dynamically changing environment (Irons & Leber, 2016, 2018). On each trial, one color is less abundant than the other. In this task, participants could conceivably use a variety of different search strategies on a given trial: they could search for a target of a specific color, search serially through all stimuli until a target is found, alternate between searching through red and blue clusters of stimuli, and so on. It is objectively more efficient or “optimal” to preferentially search through the less abundant color, which participants exhibit a modest but reliable tendency to do (Irons & Leber, 2016, 2018). Previously, we have demonstrated that a state of negative arousal (threat) improves the optimality of attentional control strategies in the ACVS task (Kim et al., 2021). In the present study, we aimed to investigate the modulatory role of reward history on search strategy in this task.

In a training phase, participants searched through stimuli rendered in one of two colors (red or blue) on a given trial to locate a single target. Identifying targets rendered in one color yielded reward while identifying targets in the other did not. In a subsequent test phase, stimuli were rendered in each of these two colors on a given trial and participants could choose which color to search through to find a target. Rewards were no longer available. As is typical in the ACVS task, the distribution of red and blue items varied such that items in one color could be less abundant than the other. Of interest was whether participants would be biased to search through items of the previously high-value color when they have the option of choosing one of a variety of possible search strategies, and whether this bias would be modulated by how optimal it would be to search in this manner on a given trial as a function of the distribution of color stimuli.

Experiment 1

Methods

Participants

Thirty-five participants (25 females), between the ages of 18 and 35 years inclusive (M = 20.3, SD = 3.17), were recruited from the Texas A&M University community. All participants were English-speaking and reported normal or corrected-to-normal visual acuity and normal color vision. All procedures were approved by the Texas A&M Institutional Review Board. The sample size was informed by a power analysis. We took the effect size for a modulatory effect on choice from Kim et al. (2021), in which a similar ACVS task was used to measure the modulatory role of threat (dz = 0.61). According to G*Power 3.1, the final sample size of n = 31 (see Data analysis section) would yield power (1-β) > .90 at α = .05. Written informed consent was obtained from each participant and all study procedures were conducted in accordance with the principles expressed in the Declaration of Helsinki. Participants were compensated with their earnings in the task.
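The power figure above can be roughly cross-checked by simulation. The following is a minimal sketch, not the G*Power computation itself: it assumes a two-tailed test on normally distributed difference scores, and the function name, the number of simulations, and the seed are illustrative choices. The critical value 2.042 is the two-tailed .05 cutoff for df = 30.

```python
import math
import random

def simulated_power(dz, n, t_crit, n_sims=10000, seed=0):
    """Monte Carlo power of a two-tailed one-sample (paired) t-test."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Standardized difference scores: mean dz, SD 1 (normality is assumed).
        diffs = [rng.gauss(dz, 1.0) for _ in range(n)]
        mean = sum(diffs) / n
        var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
        t = mean / math.sqrt(var / n)
        if abs(t) > t_crit:
            hits += 1
    return hits / n_sims

# dz and n come from the text; t_crit is the two-tailed .05 cutoff for df = 30.
power = simulated_power(dz=0.61, n=31, t_crit=2.042)
print(f"estimated power: {power:.3f}")
```

Because the estimate is stochastic, it will vary slightly with the seed and the number of simulations.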

Apparatus

A Dell OptiPlex 7040 equipped with MATLAB software and the Psychophysics Toolbox extension (Brainard, 1997) was used to present the stimuli on a Dell P2717H monitor viewed from a distance of approximately 70 cm in a dimly lit room. Participants made manual responses on a Millikey SR-5 r2 button box.

Stimuli

Each visual search array was composed of 54 colored squares (each approximately 1.1° × 1.1° visual angle) arranged in three concentric rings around the center of the screen. The inner ring had a radius of 7.3° and consisted of 12 boxes, the middle ring had a radius of 10.1° and consisted of 18 boxes, and the outer ring had a radius of 13.0° and consisted of 24 boxes. Squares within each ring were evenly spaced, and each contained a digit between 2 and 9, subtending 0.4° × 0.4°.
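The ring geometry described above can be sketched as follows. This is an illustrative reconstruction, not the presentation code used in the study (which ran in MATLAB): the starting angle of each ring is an assumption, and positions are left in degrees of visual angle rather than converted to pixels.

```python
import math

# (radius in degrees of visual angle, number of squares) per ring, from the text.
RINGS = [(7.3, 12), (10.1, 18), (13.0, 24)]

def ring_positions(rings=RINGS, start_angle=0.0):
    """Return (x, y) square centers in degrees, evenly spaced around each ring.

    start_angle is a hypothetical parameter; the text does not specify it.
    """
    positions = []
    for radius, count in rings:
        step = 2 * math.pi / count
        for k in range(count):
            theta = start_angle + k * step
            positions.append((radius * math.cos(theta), radius * math.sin(theta)))
    return positions

positions = ring_positions()
print(len(positions))  # 54 squares in total (12 + 18 + 24)
```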

Task procedure

Following consent, participants practiced the training phase task. The practice session consisted of 20 trials and would repeat until participants found the target in > 85% of the practice trials. After completing the practice, participants completed three runs of the training phase. Then, participants completed a practice run of the test phase task, which was 21 trials long and would again repeat until > 85% accuracy was achieved. Importantly, no instructions were given concerning how to search and participants were free to follow the search strategy of their choice. Following this practice run, participants completed three runs of the test phase and were then dismissed from the experiment following monetary compensation.

Training phase

The training phase consisted of a fixation display, search array, feedback display, and an inter-trial interval (ITI). The fixation display consisted of a fixation cross at the center of the screen for 1,000 ms. There were two trial types for the search array: red and green color squares or blue and green color squares (Fig. 1a). Participants were instructed to search for a target square: a red or blue color square containing a digit between 2 and 5 inclusive. An equal number of target- and green-color squares was presented on each trial. All red or blue squares besides the target square contained a digit from 6 to 9. Green-colored squares were neutral to the task and contained digits between 2 and 9 to prevent participants from searching for numerical digits without respect to color (Irons & Leber, 2018). All digits inside non-target squares were assigned randomly using the aforementioned constraints. One of the two target colors was associated with monetary reward and which color was used as the reward-associated color alternated across participants. Upon identifying a target in the rewarded color, participants were shown “+$0.15” and their accumulated total earnings. In contrast, upon identifying a target in the unrewarded color, participants were shown “+0.00” and their accumulated total earnings. The feedback display was shown for 1,500 ms. If participants responded with a number other than the target number, they were presented with the word “Missed” and their accumulated total earnings. If participants did not make a manual response within the 5,500-ms time limit, they were presented with the words “Too Slow” and their accumulated total earnings. Lastly, the ITI lasted for 1,000 ms. Each run of the training phase was 60 trials long (30 trials of each trial type, randomly distributed) and participants completed a total of three runs with a short break between each run.

Fig. 1

Sequence of trial events. (a) In the training phase, participants were shown a fixation display, visual search array, feedback display, and an inter-trial interval. On correct response trials, participants would receive either 15 or 0 cents depending on the color-reward association for each participant (counterbalanced). (b) In the test phase, participants were shown a fixation display, visual search array, and an inter-trial interval. Of interest was whether participants would search for the reward-associated color target when it was optimal, non-optimal, or neutral to do so.

Test phase

The test phase consisted of a fixation display, search array, and ITI (Fig. 1b). The fixation display consisted of a fixation cross at the center of the screen that lasted for 1,000 ms. There were three trial types in the visual search array for the test phase: previously reward-associated color optimal (13 reward-associated color boxes, 27 non-reward-associated color boxes, and 14 green color boxes), previously reward-associated color non-optimal (27 reward-associated color boxes, 13 non-reward-associated color boxes, and 14 green color boxes), and neutral trial (18 reward-associated color boxes, 18 non-reward-associated color boxes, and 18 green color boxes). Here, optimality is defined as “optimal for maximizing task performance” in that it would be faster to search through the color with a smaller set-size (note that participants were at no point informed about the existence of an optimal strategy). Participants were informed that one red and one blue target would be present on each trial (digit from 2 to 5 in either a red or blue color square) and that they only needed to report one of these targets. Non-target squares were assigned numbers in the same manner as in the training phase. Feedback was inserted after the search array if participants responded incorrectly or failed to respond before the time limit of 6,500 ms. Lastly, the ITI lasted for 1,000 ms. Each run of the test phase was 90 trials long (30 trials for each trial type, randomly distributed), and participants completed a total of three runs with a short break between each run.
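The three test-phase trial types and the run structure described above can be summarized in a short sketch. The square counts and the 30-trials-per-type run length come from the text; the dictionary keys, function names, and use of a seeded shuffle are illustrative assumptions rather than the authors' implementation.

```python
import random

# (reward-color, non-reward-color, green) square counts per trial type, from the text.
TRIAL_TYPES = {
    "reward_optimal": (13, 27, 14),
    "reward_nonoptimal": (27, 13, 14),
    "neutral": (18, 18, 18),
}

def build_run(trials_per_type=30, seed=None):
    """Return a randomly ordered list of trial-type labels for one run."""
    rng = random.Random(seed)
    run = [name for name in TRIAL_TYPES for _ in range(trials_per_type)]
    rng.shuffle(run)
    return run

def optimal_color(trial_type):
    """Return which task-relevant color is less abundant, or None if balanced."""
    reward, non_reward, _green = TRIAL_TYPES[trial_type]
    if reward == non_reward:
        return None
    return "reward" if reward < non_reward else "non_reward"

run = build_run(seed=1)
print(len(run))  # 90 trials per run
```

Each composition sums to the 54 squares of the search array, and "optimal" here only encodes the smaller set size, in line with the definition given in the text.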

Data analysis

We excluded data from three participants due to low accuracy in the task (more than 3 SD below the group mean), and one participant withdrew before completing the task. Thus, 31 data sets were fully analyzed.

Results and discussion

Training phase

Neither response time (RT) nor accuracy differed as a function of the value of the target color, t(30) = 0.47, p = .641 (reward: M = 2,809 ms, SD = 291 ms, no-reward: M = 2,829 ms, SD = 305 ms), and t(30) = 0.00, p > .999 (reward: M = 93.9%, SD = 4.9%, no-reward: M = 93.9%, SD = 4.2%), respectively.

Test phase

We conducted a one-way repeated-measures analysis of variance (ANOVA) over RT with trial type as a factor (reward optimal, reward non-optimal, neutral), which revealed a significant difference across trial types, F(2,60) = 6.41, p = .003, ηp² = .176 (see Fig. 2a). Post hoc pairwise comparisons showed that participants were faster on trials when the reward-associated color was optimal compared to non-optimal, t(30) = 2.73, p = .010, dz = 0.49, and slower on trials when the reward-associated color was non-optimal compared to neutral trials, t(30) = 3.14, p = .004, dz = 0.56. RT did not differ between trials on which the reward-associated color was optimal and neutral trials, t(30) = 0.95, p = .352.
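The reported effect sizes follow from the test statistics through standard conversions, which can be checked with a small sketch. The function names are illustrative; the formulas are the usual dz = t / √n for a paired or one-sample t-test and partial η² = (F × df_effect) / (F × df_effect + df_error).

```python
import math

def dz_from_t(t, n):
    """Cohen's dz for a paired or one-sample t-test with n participants."""
    return t / math.sqrt(n)

def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)

print(round(partial_eta_squared(6.41, 2, 60), 3))  # 0.176
print(round(dz_from_t(2.73, 31), 2))  # 0.49
print(round(dz_from_t(3.14, 31), 2))  # 0.56
```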

Fig. 2

Behavioral results in the test phase. (a) Response time and (b) rate of choosing a previously reward-associated color target when it was the optimal color to search through, the non-optimal color, or when there was no optimal color in a neutral condition. Error bars depict within-subjects confidence intervals calculated using the Cousineau method with a Morey correction. **p < 0.01
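For readers unfamiliar with the error bars named in the caption, the following is a minimal sketch of the Cousineau normalization with the Morey correction. It is not the authors' analysis code: the function name and data layout are assumptions, and the critical t value is passed in by the caller (for example, 2.042 for a 95% interval with 31 participants).

```python
import math
import statistics

def cousineau_morey_halfwidths(data, t_crit):
    """Within-subject CI half-widths, one per condition.

    data: one row per participant, one column per condition.
    """
    n = len(data)
    j = len(data[0])
    grand_mean = statistics.fmean(v for row in data for v in row)
    # Cousineau normalization: remove each participant's overall level.
    normed = [[v - statistics.fmean(row) + grand_mean for v in row] for row in data]
    # Morey correction: rescale for the bias the normalization introduces.
    morey = math.sqrt(j / (j - 1))
    halfwidths = []
    for c in range(j):
        col = [row[c] for row in normed]
        sem = statistics.stdev(col) / math.sqrt(n)
        halfwidths.append(t_crit * morey * sem)
    return halfwidths

# Hypothetical toy data: participants differ only by a constant offset,
# so the within-subject intervals collapse to zero width.
toy = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]
print(cousineau_morey_halfwidths(toy, t_crit=1.0))
```

The toy example illustrates the point of the method: variability between participants is removed, so only condition-by-participant variability contributes to the interval.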

Most critically, to assess whether reward history influenced choice behavior, we compared the percentage of targets found in the previously reward-associated color to chance (50%, which would reflect unbiased search). Participants were significantly biased to report targets rendered in the previously reward-associated color, t(30) = 2.53, p = .017, dz = 0.45. In contrast, excluding neutral trials in which there was no optimal strategy, participants were not more likely to select the optimal-color target than chance, t(30) = 1.10, p = .278. We then ran an ANOVA over the percentage of targets found in the previously reward-associated color over the three trial types and found no significant difference, F(2,60) = 1.18, p = .316 (Fig. 2b). That is, participants were overall biased to report a target in the previously reward-associated color regardless of how optimal this was with respect to the distribution of color stimuli.

Participants were not significantly faster to report a target in the previously reward-associated color compared to the previously unrewarded color, t(30) = 0.64, p = .528 (reward: M = 2,488 ms, SD = 391 ms, no-reward: M = 2,570 ms, SD = 522 ms). This suggests that reward history influenced which color participants chose to prioritize in their search. We do not see evidence that reporting a target in the previously unrewarded color was overall slowed due to greater sustained competition from the previously reward-associated color when searching through the previously unrewarded color.

In addition, we found significant differences in accuracy across trial type, F(2,60) = 4.63, p = .013, ηp² = .134. Post hoc pairwise comparisons identified that participants were more accurate on neutral trials compared to reward-optimal trials, t(30) = 2.49, p = .018, dz = 0.45 (neutral: M = 96.9%, SD = 3.0%, optimal: M = 95.2%, SD = 4.7%), and also compared to reward non-optimal trials, t(30) = 3.65, p < .001, dz = 0.66 (non-optimal: M = 95.1%, SD = 3.7%), which did not differ from each other, t(30) = 0.23, p = .822. These differences may be related to the fact that the varying distribution of color stimuli made it such that neutral trials had a slightly higher number of green stimuli (which could not be targets).

Overall, participants were biased to adopt a search strategy that prioritized stimuli in the previously reward-associated color, reporting targets in that color more frequently than would be expected from any search strategy that did not involve a color preference. When the previously reward-associated color was the more abundant color, this strategy came at the cost of slower RTs to find and report a target. There was no evidence that participants tended to prefer a strategy that prioritizes the less abundant color; only reward history had a reliable influence on strategy as reflected in the reported target.

Experiment 2

In Experiment 1, reward history influenced the strategy that participants adopted to find a target in a situation in which they were free to choose how to conduct their search. It remains unclear how “optimal” search would have been in the absence of reward history, and whether the reward-biased strategy that participants adopted caused them to abandon an otherwise optimal strategy when the previously reward-associated color was more abundant. Although preferentially searching for the previously reward-associated color came at a cost in RTs on trials in which it was the more abundant color, participants may have otherwise tended toward strategies that do not prioritize the less abundant color, such as searching serially through all red and blue stimuli until a target is found. In prior implementations of the ACVS task, there are typically no neutral trials and there is always a less abundant task-relevant color on every trial (Irons & Leber, 2016, 2018; Kim et al., 2021). The neutral trials in Experiment 1 provided an opportunity to assess any reward history-related bias in the absence of a more optimal strategy, and it is possible that the presence of such trials reduced the overall likelihood that participants would realize or otherwise come to favor a strategy of preferentially searching through the less abundant color. In order to characterize performance in the absence of differential reward history as a baseline, we had a separate group of participants complete the test phase from Experiment 1 without any prior training phase. Of interest was the extent to which participants would favor reporting a target in the less abundant color on trials in which the distribution of red to blue stimuli was imbalanced.

Methods

Participants

Thirty-five new participants (21 females), between the ages of 18 and 35 years inclusive (M = 18.3, SD = 0.53), were recruited from the Texas A&M University community. All participants were English-speaking and reported normal or corrected-to-normal visual acuity and normal color vision. All procedures were approved by the Texas A&M Institutional Review Board. Written informed consent was obtained from each participant and all study procedures were conducted in accordance with the principles expressed in the Declaration of Helsinki. Participants were compensated with course credit.

Apparatus

Identical to Experiment 1.

Stimuli and procedure

Participants performed only the test phase of Experiment 1.

Data analysis

We excluded data from two participants due to low accuracy in the task (more than 3 SD below the group mean), and one participant withdrew before completing the task. Thus, 32 data sets were fully analyzed. Trials were broken down by whether there were more red than blue stimuli or vice versa (imbalanced) or an equal number of red and blue stimuli (neutral). For imbalanced trials, the percentage of optimal targets reported (percentage of targets reported in the less abundant color) was computed and compared against unbiased choice (i.e., 50%).

Results and discussion

Neither RT nor accuracy differed between imbalanced and neutral trials, t(31) = 1.51, p = .142 (imbalanced: M = 2,762 ms, SD = 322 ms, neutral: M = 2,714 ms, SD = 354 ms), and t(31) = -1.67, p = .105 (imbalanced: M = 93.2%, SD = 4.0%, neutral: M = 94.0%, SD = 3.5%), respectively. For the imbalanced trials, the percentage of optimal targets reported was not significantly different from chance, t(31) = -0.30, p = .770 (M = 49.4%, SD = 11.7%), indicating that participants did not preferentially search for the optimal target. For the neutral trials, there was no significant difference between the rates of reporting a red or blue target, t(31) = -0.63, p = .531 (red: M = 48.2%, SD = 16.2%, blue: M = 51.8%, SD = 16.2%).
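The comparisons against chance can be approximately reproduced from the reported summary statistics. This is a sketch under the assumption that each is a one-sample t-test against 50%; the function name is illustrative. Because the published means and SDs are rounded, the recomputed values can differ from the reported t statistics in the second decimal place.

```python
import math

def one_sample_t(mean, sd, n, mu0=50.0):
    """t statistic for a sample mean tested against mu0 (chance = 50%)."""
    return (mean - mu0) / (sd / math.sqrt(n))

# Summary statistics from the text (n = 32 analyzed participants).
t_imbalanced = one_sample_t(49.4, 11.7, 32)   # optimal-target choice rate
t_neutral_red = one_sample_t(48.2, 16.2, 32)  # red-target choice rate, neutral trials
print(round(t_imbalanced, 2), round(t_neutral_red, 2))
```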

Overall, we do not see evidence that participants favor a strategy that prioritizes the less abundant color in our task. This contrasts with prior findings using this paradigm (Irons & Leber, 2016, 2018), which might be explained by the addition of neutral trials in the present study. Prioritizing the less abundant color requires an initial appraisal of the distribution of color stimuli (Hansen et al., 2019), which itself requires time and cognitive effort. On neutral trials, such time and effort would confer no benefit, potentially dissuading participants from adopting such a strategy altogether. In the context of our Experiment 1, our findings here suggest that reward history biases the choice of search strategy when such strategy would otherwise be inconsistent with respect to color, a bias that participants were willing to adopt even on trials in which it led to slower RTs (trials on which the previously reward-associated color was more abundant). It is not the case that reward history led to an abandonment of an otherwise optimal strategy.

General discussion

In the present study, we investigated the modulatory effect of learned stimulus-reward associations on the strategic control of attention. Participants were found to be biased to search through the previously reward-associated color regardless of the distribution of color stimuli on the trial; on trials in which the reward-associated color was more abundant (rendering this strategy objectively non-optimal), this bias came with a performance cost in RT. Reporting a target in the previously unrewarded color was not itself associated with a generalized RT cost, suggesting that reward history biased the color participants chose to search through and was not associated with greater sustained competition (Desimone, 1998; Desimone & Duncan, 1995) when participants tried to search through the previously unrewarded color.

Unlike in prior studies using the ACVS task (Clarke et al., 2020; Irons & Leber, 2020; Kim et al., 2021), participants did not find a target in the optimal color more frequently than would be expected by chance, both with and without prior reward history. It does not seem to be the case that reward history caused participants to abandon a tendency to search through the less abundant color. Rather, we find evidence that reward history influences the choice of attentional strategy when strategy would have otherwise been inconsistent with respect to color. In the absence of reward history, participants might have adopted one or more of any number of possible strategies, for example searching through stimuli serially until a target is found, alternating between searching through red and blue clusters of stimuli, or initially selecting a stimulus at a particular location and continuing to prioritize its color throughout the trial. When a color is previously associated with reward, the choice of search strategy is slanted towards prioritizing the previously reward-associated color.

Previously reward-associated stimuli can capture attention (Anderson, 2013; Pollmann et al., 2016; Tseng & Lleras, 2013), but whether value-driven attentional processes have implications for the strategic control of attention has not been examined. Our results provide a link between value-driven attentional processes and goal-directed attentional control. When given the choice, people are biased to choose to search for a previously high-value stimulus, even when there is no longer any reward incentive to prioritize such stimuli and even on trials on which this strategy comes at a cost to task performance. Without explicit instructions concerning optimal behavior, it appears that strategic attentional priorities are biased in favor of the priorities that were historically favored by the reward structure of the environment.

Our findings have implications for the scope of value-driven attention as a mechanism and how habitual attentional biases that result from reward history are characterized (Anderson, 2016). The influence of learned stimulus-reward associations on the control of attention is not limited to processes of involuntary orienting that may serve in the interest of detecting an unexpected opportunity (Kim & Anderson, 2019), but extends to how an individual chooses to search. Such a mechanism may provide an attentional component underlying habitual reward-seeking behavior, such as that characteristic of drug addiction (Berridge & Robinson, 2003; Robinson & Berridge, 1993). The influence of selection history on the goal-directed/strategic control of attention is generally understudied (see Anderson et al., 2021) and reflects an important area of future research. The present study offers one potential experimental framework with which this influence could be further explored.