How humans react to changing rewards during visual foraging
Much is known about the speed and accuracy of search in single-target search tasks, but less attention has been devoted to understanding search in multiple-target foraging tasks. These tasks raise important questions about how individuals decide to terminate search when the number of targets in each display is unknown. Even when asked to find every target, individuals quit before exhaustively searching a display. Because a failure to notice targets can have profound effects (e.g., missing a malignant tumor in an X-ray), it is important to develop strategies that could limit such errors. Here, we explored the impact of different reward patterns on these failures. In the Neutral condition, the reward for finding a target was constant over time. In the Increasing condition, reward increased for each successive target in a display, penalizing early departure from a display. In the Decreasing condition, reward decreased for each successive target in a display. The experimental results demonstrate that observers forage for longer (and find more targets) when the value of successive targets increases, and do the opposite when value decreases. The data indicate that observers learned to use knowledge of the reward pattern and to forage optimally over the course of the experiment. Simulation results further revealed that human behavior could be modeled with a variant of Charnov’s Marginal Value Theorem (MVT; Charnov, 1976) that includes roles for reward and learning.
Keywords: Human foraging · Optimal foraging · Search termination · Reward pattern · Visual search
In the last few decades, a substantial body of research has explored how humans search for a single target in displays of distractor items (Desimone & Duncan, 1995; Koch & Ullman, 1985; Treisman & Gelade, 1980; Tsotsos et al., 1995; Wolfe, 1994, 2012). Much less work has been directed at tasks where observers search for an indeterminate number of targets in each of many visual displays (visual foraging tasks). In our daily lives, numerous search tasks have a foraging structure, e.g., buying some fruit at a market or looking up information on a website. Important applied tasks can also involve foraging, e.g., finding metastases in an X-ray of a cancer patient. Unlike single-target search tasks, the central question is less “Did I find the target?” than “When do I quit searching the current display and move to the next one?” Recently, search termination in human foraging tasks has begun to attract the attention of researchers (Cain, Vul, Clark, & Mitroff, 2012; Fougnie, Cormiea, Zhang, Alvarez, & Wolfe, 2015; Hutchinson, Wilke, & Todd, 2008; Wolfe, 2013; Zhang, Gong, Fougnie, & Wolfe, 2015).
When foraging, humans, like animals, tend not to search exhaustively, even when explicitly asked to stop searching only when all targets have been found (Wolfe, 2013). The satisfaction of search (SOS) phenomenon, studied in radiology (Berbaum et al., 1990), can be seen as a troubling example. In SOS experiments, observers are more likely to miss a target if they have previously found a different target in the same display (Berbaum et al., 1990; Fleck, Samei, & Mitroff, 2010). Such SOS errors have been found in several kinds of medical images, e.g., abdominal radiography (Franken et al., 1994), chest radiography (Berbaum, Franken, Dorfman, Caldwell, & Krupinski, 2000; Samuel, Kundel, Nodine, & Toto, 1995), and osteoradiology (Ashman, Yu, & Wolfman, 2000). Efforts to reduce SOS have included computer-aided diagnosis (Berbaum, Caldwell, Schartz, Thompson, & Franken, 2007) and an informal checklist (Berbaum, Franken, Caldwell, & Schartz, 2006).
It can be helpful to think about SOS and related errors as the by-products of strongly engrained foraging behavior. If that is so, insights from the study of foraging might suggest new approaches to reducing those errors. Work on Optimal Foraging Theory in the animal literature provides a good theoretical foundation. Charnov’s Marginal Value Theorem (MVT; Charnov, 1976) is a useful starting place for an understanding of why foragers do not collect everything from a patch in tasks like berry picking1.
The Marginal Value Theorem holds that an individual will quit searching the current patch when the current yield from searching (e.g., targets found per unit time) drops below the average rate. The rate of target acquisition slows when searching a display containing many targets because, as targets are found, the remaining items are less likely to be good quality targets and/or are harder to find. When current target acquisition becomes less valuable than average target acquisition, it is optimal to move on to the next patch. However, maximizing the rate of target acquisition is not always the ultimate goal of a foraging task. If the cost of missing a target is high, it may be preferable to search more exhaustively (i.e. avoid SOS-style errors). In such cases, it may be effective to make targets found in later periods of search more valuable than targets found first. If the value of a target increases after each target that is found, this could offset the reduced rate of target acquisition and, thus, could lead to more exhaustive search.
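In symbols (our notation, not a formula given in the original sources), the MVT leaving rule described above can be written as:

```latex
% Leave patch i at the time t^* at which the instantaneous rate of gain
% falls to the long-run average rate of gain:
\frac{d\,\mathbb{E}[g_i(t)]}{dt}\bigg|_{t = t^*} = R^{*}
% where g_i(t) is the expected cumulative gain after foraging in patch i
% for time t, and R^* is the average rate of gain over the whole
% environment, travel time between patches included.
```

Because gains in a depleting patch are concave in time, foraging past $t^*$ earns less per unit time than moving on, and leaving earlier forfeits gains that still exceed the average rate.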
Our experiment had two principal aims: first, to determine whether increasing the value of targets over time eliminates, or at least ameliorates, SOS-style errors by making foraging more exhaustive; second, to determine whether MVT continues to describe human foraging behavior when items have different or changing values.
To anticipate our results, we found evidence that humans change their behavior in response to different patterns of reward, picking more berries when rewards increased from selection to selection and fewer berries when rewards decreased. Moreover, our results suggest that observers were learning to forage more optimally over time. These results indicate that a good model of foraging behavior under different patterns of reward should include, at a minimum, roles for reward and learning. We simulate a foraging model based on Charnov's Marginal Value Theorem and compare its performance to human foraging behavior. The results indicate that MVT is a useful starting place for understanding why foragers do not collect everything from a patch in tasks like berry picking, and how humans forage under different patterns of reward. Note that, while our variant of an MVT model predicts human foraging behavior well, other non-MVT models may predict similar behavior so long as they include roles for reward and learning.
The experiments were conducted at the Harvard Decision Science Laboratory using Psychophysics Toolbox version 3.0.9 in MATLAB 7.10.0 (R2010a) (Brainard, 1997). The stimuli were presented on a 19-in. Viewsonic NX1932w monitor with a resolution of 1440 × 900 pixels. Each monitor was connected to a Dell computer running Windows 7.
An example of the “berry bushes” used in our experiments is shown in Fig. 1. Each bush consisted of a 20 × 20 grid of red or green squares. Each bush subtended a visual angle of 15.6° by 15.6° and each square subtended a visual angle of 0.8° by 0.8°. The stimuli were viewed at a 60 cm viewing distance. Observers’ head positions were not fixed so the actual viewing distance was likely to vary somewhat.
Observers were instructed that the green squares were “leaves” and should be ignored when picking “berries”. The color of the green squares varied from [0, 51, 0] to [0, 100, 0] in RGB color space, with values drawn uniformly in the green color channel. Of the 400 squares in each bush, 40 were red-shaded “berries”. “Good” and “bad” berries had different average colors but were drawn from overlapping color distributions: “good” berries yielded points when picked, while “bad” berries were worthless. The colors of these berries were defined in RGB color space by the triplet [R, (255-R)/4, (255-R)/4]. For good berries, redness (R) was drawn from a normal distribution with a mean of 190 and a standard deviation of 30. For bad berries, R was drawn from a normal distribution with a mean of 130 and a standard deviation of 30. Thus, the d′ of the color signal separating good berries from bad berries was 2.0, meaning that the color of a berry provided an informative but imperfect guide to its “goodness”. Bush quality was fixed at 50%, i.e., 20 berries out of 40 were good. Observers were told that brighter, redder berries were more likely to be good than darker berries, but that the color information was not completely reliable. Observers were not informed about the bush quality.
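As a concrete illustration, the berry parameters above can be reproduced with a short sketch. The function names are ours, and clipping R to the displayable 0–255 range is our assumption (the paper does not say how out-of-range draws were handled):

```python
import random

def make_stimulus_bush(n_good=20, n_bad=20):
    """Redness values (R) for one bush's 40 berries, following the paper's
    parameters: good berries ~ N(190, 30), bad berries ~ N(130, 30),
    giving d' = (190 - 130) / 30 = 2.0."""
    good = [(random.gauss(190, 30), True) for _ in range(n_good)]
    bad = [(random.gauss(130, 30), False) for _ in range(n_bad)]
    berries = good + bad
    random.shuffle(berries)          # random placement within the bush
    return berries                   # list of (redness, is_good) pairs

def berry_rgb(redness):
    """The paper's color mapping: [R, (255 - R)/4, (255 - R)/4].
    Clipping R to 0-255 is our assumption, not stated in the text."""
    r = max(0.0, min(255.0, redness))
    return (r, (255.0 - r) / 4, (255.0 - r) / 4)
```

With these distributions, roughly 16% of draws from the two classes overlap in redness, which is what makes color an informative but imperfect cue.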
The running total of points was displayed in the upper right corner of the screen (outside the berry bush) and updated automatically after each click. Importantly, observers were allowed to leave the current bush at any time by clicking on the “next patch” box in the lower right corner, even if berries still remained on the screen. Participants were neither penalized nor rewarded for any remaining good berries. When a participant quit searching a bush, a new bush was presented after a constant 3 s delay plus recording and display time (“travel time”, in the language of the foraging literature; Charnov, 1976).
Each participant completed three blocks of one of the three conditions. Completing a block required reaching a particular point total. Observers were asked to reach a total of 8000 points in the Increasing condition, 10,000 points in the Decreasing condition, and 8500 points in the Neutral condition. These different point goals were designed to roughly equate the time needed to complete each condition. Since the point goal determined the termination time of the experiment, observers were motivated to earn points as quickly as possible.
A total of 39 observers took part in our experiments: 13 (three males, mean age 25.2 years, range 19–39 years) were tested in the Increasing condition, 13 (three males, mean age 30.4 years, range 19–50 years) in the Decreasing condition, and 13 (six males, mean age 24.1 years, range 18–42 years) in the Neutral condition. All observers gave informed consent under a protocol approved by Brigham and Women’s Hospital and consistent with the Declaration of Helsinki. All were paid US $10/hr for their time, had vision corrected to at least 20/25, and passed the Ishihara color vision screen.
In each of the three blocks of the Increasing condition, the average number of bushes that participants saw was 36.7, 33.6, and 32.7, respectively. They saw 35.1, 37.3, and 38.7 bushes in the blocks of the Decreasing condition, and 33.3, 33.3, and 34.5 bushes in the blocks of the Neutral condition. We used data from all trials (bushes) apart from the first five practice trials and the last, unfinished bush in each block (the last bush was unfinished because the task ended when the point total was reached).
The average elapsed time spent in each block of the Increasing condition was 944.9 s, 860.8 s and 871.4 s. Elapsed times were 753.9 s, 712.5 s and 652.3 s in the Decreasing condition and 791.4 s, 758.8 s and 761.9 s in the Neutral condition. An ANOVA with Condition and Block as factors shows a main effect of Condition [F(2,36) = 9.316, P = 0.0005] because of the different reward patterns and different goals set for these conditions. There is also a main effect of Block [F(2,72) = 25.69, P < 0.0001], demonstrating that observers sped up over the blocks. The significant interaction of Condition and Block [F(4,72) = 4.006, P = 0.0055] reflects the fact that the decrease in time per block is largest for the Decreasing condition and smallest for the Neutral condition.
Do observers react to different reward patterns?
Further analyses focused on the number of clicks in the different conditions. We removed clicks if the click time was greater than 10 s. The number of these outlier clicks was very low: 10 of 18,536 clicks in the Increasing condition, 14 of 13,217 clicks in the Decreasing condition, and 1 of 15,935 clicks in the Neutral condition.
Modeling foraging behavior under different reward patterns
In previous work, we have found that the MVT provides a good description of average behavior on simple versions of this task, akin to the Neutral condition here (Wolfe, 2013; Zhang et al., 2015); MVT predicts that foragers should leave the current bush when the instantaneous rate of input drops below the average rate. In this section, we examine whether a model based on MVT remains a useful framework for understanding human foraging behavior under different patterns of changing reward.
The data reported above reject any model that does not include a role for reward. For example, a model proposing that observers keep picking until the positive predictive value (PPV) of the next click drops below some criterion value can be rejected, because it would predict no difference between conditions.
A model that considers the reward pattern should do better. Thus, if we replace PPV with the expected value (EV) of the click (EV = PPV × Reward), then a threshold will be reached fastest in the Decreasing condition and slowest in the Increasing condition. The threshold is a free parameter, which can be constrained by adopting an MVT framework. To simulate the foraging behavior of this modified MVT model, we created simulated bushes containing 20 “good” and 20 “bad” berries. The signal strength of each berry (“redness” in the actual experiment) was drawn from one of two normal distributions (good and bad) separated by d′ = 2. Note that these simulated bushes mimic the actual experiment and match the experimental parameters. We assumed that the models would pick berries in order from the largest signal (reddest) to the smallest. While human observers would be imperfect in this ordering, the simulated models pick berries in strictly decreasing order of signal strength. We simulated the Increasing, Decreasing, and Neutral conditions for 10,000 trials to obtain the predictions of the different models.
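The threshold rule just described can be written compactly. The function names are ours; the criterion is the free parameter noted above, which the MVT framework ties to the average rate of return:

```python
def ev_of_click(ppv, reward):
    """Expected value of a click: EV = PPV x Reward, where PPV is the
    probability that the clicked berry is good and reward is the value
    the current condition assigns to the next good berry."""
    return ppv * reward

def keep_picking(ppv, reward, threshold):
    """Threshold rule: continue foraging while the EV of the next click
    exceeds the criterion. With a fixed PPV trajectory, a falling reward
    crosses the threshold sooner and a rising reward crosses it later."""
    return ev_of_click(ppv, reward) > threshold
```

This makes the qualitative prediction explicit: for the same sequence of PPV values, the Decreasing condition reaches the threshold in the fewest clicks and the Increasing condition in the most.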
For foraging models based on MVT, the time to leave the current patch is when the current yield from searching (e.g., targets found per unit time) drops below the average rate. To compute the current yield (instantaneous rate), we first calculate the EV for each click: how many points are awarded for the first click, the second, and so forth? The EV of each click is calculated as the average reward of that click over 10,000 simulated trials (e.g., the first click in the Increasing condition would have an EV of 2, assuming that observers always got the first click correct and collected 2 points). To obtain an instantaneous rate, the EV is divided by the average click time for each click, where “click time” is the time since the preceding click. To compute the average rate, we divided the total reward obtained by the total time spent in the experiment (including the “travel time” between bushes).
To calculate the instantaneous rate, the EV is divided by the click time of each click. We employ the average time between clicks made by human foragers. Paired t-tests revealed that the time between clicks sped up over blocks during the task. The average time between clicks was 1.28 s, 1.23 s, and 1.22 s in the three blocks of the Increasing condition [comparing the first and last blocks, t(12) = 2.387, P = 0.034, two-tailed]; 1.3 s, 1.26 s, and 1.23 s in the three blocks of the Decreasing condition [t(12) = 3.902, P = 0.002, two-tailed]; and 1.27 s, 1.23 s, and 1.22 s in the three blocks of the Neutral condition [t(12) = 3.138, P = 0.008, two-tailed]. Paired t-tests revealed no significant changes in the time between clicks over the course of a trial (bush). Based on these observations, we used the same click time for all trials within a block but different click times for different blocks (e.g., the instantaneous rate for the first block of the Increasing condition was computed as EV / 1.28 s, and for the second block as EV / 1.23 s).
The simulated average rate was computed as the accumulated rewards in each simulation divided by the accumulated time (including a 6.2 s “travel time” between displays, derived from the experimental data). The actual travel time varied somewhat from trial to trial; specifically, it was greater than the preset 3 s because the actual delay included the time for data recording and the generation of the next display. We used the mean of the actual “travel time” to calculate the simulated average rate.
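Putting these pieces together, the simulation procedure described above can be sketched as follows. This is our reconstruction, not the authors' code: the reward schedule is passed in as a function (condition-specific and left unspecified here), signals are expressed in d′ units rather than redness values, and the 10,000-trial EV estimate and the quitting rule follow the text.

```python
import random

def make_bush(n_good=20, n_bad=20, d_prime=2.0):
    """Berry outcomes for one bush, already sorted so the model picks in
    strictly decreasing signal order: good ~ N(d', 1), bad ~ N(0, 1)."""
    berries = ([(random.gauss(d_prime, 1.0), True) for _ in range(n_good)] +
               [(random.gauss(0.0, 1.0), False) for _ in range(n_bad)])
    berries.sort(key=lambda b: b[0], reverse=True)   # reddest first
    return [is_good for _, is_good in berries]

def expected_value_per_click(rewards, n_trials=10_000):
    """EV of the 1st, 2nd, ... click, averaged over simulated bushes
    (the paper's 10,000-trial estimate). rewards(k) is the value of the
    kth good berry found; the schedule depends on the condition."""
    totals = [0.0] * 40                # 40 berries per bush
    for _ in range(n_trials):
        found = 0
        for i, is_good in enumerate(make_bush()):
            if is_good:
                found += 1
                totals[i] += rewards(found)   # bad berries add 0 points
    return [t / n_trials for t in totals]

def clicks_before_leaving(ev, click_time, avg_rate):
    """MVT quitting rule: keep clicking while the instantaneous rate
    (EV / click_time) stays at or above the average rate, which is
    itself computed with the 6.2-s travel time between bushes."""
    for i, v in enumerate(ev):
        if v / click_time < avg_rate:
            return i
    return len(ev)
```

For example, a Neutral-style schedule could be approximated with `rewards = lambda k: 1.0` (one point per good berry, an assumption for illustration); the block-specific click times (1.28 s, 1.23 s, ...) and the empirically derived average rate would then be supplied from the data as described above.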
The preceding analyses suggest that the learning MVT model is doing what we wanted it to do, but this is a case of seeking support for a null hypothesis. Accordingly, we compared the predicted click number of the learning MVT model with the actual click number using Grouped Bayesian t-tests (Rouder, Speckman, Sun, Morey, & Iverson, 2009) to see whether there is a strong preference for the null hypothesis in this case. These tests favored the null hypothesis in the first block (Scaled JZS Bayes Factor = 1.702), second block (2.293), and third block (2.714) of the Increasing condition, and likewise in the first block (1.766), second block (2.216), and third block (2.739) of the Decreasing condition. Within a Bayesian framework, our results can be said to provide marginal evidence for the null over the alternative hypothesis.
The main goal of this section was to see whether MVT still provides a good starting point for a description of average behavior on this task, as it did in earlier foraging work (Wolfe, 2013; Zhang et al., 2015). Our data indicate that models that do not include roles for reward and learning can be rejected, since they would predict no difference between conditions and no difference between blocks. Our simulations indicate that an MVT model with a learning parameter and knowledge of the reward pattern gives a credible account of the data. We are not trying to rule out other models, only to argue that any successful model will need to include learning and knowledge of reward.
A large body of work has examined the usefulness of the MVT and related concepts in the understanding of animal foraging (Stephens, Brown, & Ydenberg, 2007; Stephens & Krebs, 1986). More recently, this theoretical perspective has been applied to human behavior in visual search (Wolfe, 2013) and in human cognition more generally (Cain et al., 2012; Constantino & Daw, 2015; Fougnie et al., 2015; Hills, Jones, & Todd, 2012; Hutchinson et al., 2008; Wilke, Hutchinson, Todd, & Czienskowski, 2009). In the present work, we show that MVT continues to be a good starting point for an analysis of foraging behavior when we complicate the situation by adding different values to different foraged items.
The role of value in the control of visual attention has become a topic of interest in recent years (Anderson, Laurent, & Yantis, 2011, 2012; Anderson & Yantis, 2013; Della Libera & Chelazzi, 2009; Hickey, Chelazzi, & Theeuwes, 2010a, 2010b; Wang, Yu, & Zhou, 2013). The present work shows that value shapes foraging behavior in a manner predicted by MVT. The role of value appears to take some time to be learned in this experiment. Looking again at Fig. 10, the learning rate in the third block is significantly larger than in the first block, as observers learned to better extract the available value from each situation. These results suggest that a foraging model should consider roles for both reward and learning in predicting human foraging behavior. Simulation results for this learning variant of the MVT model demonstrate its power to predict human foraging behavior under changing reward patterns. While the MVT model provides a good starting point for an analysis of foraging behavior, non-MVT models that take the reward pattern and the learning factor into account may also predict similar behavior.
Beyond a fundamental interest in human foraging and decision-making, why should we care about these effects? People rarely search exhaustively when foraging for items of interest in the world. For example, when humans or animals harvest resources like berries in a bush, they are unlikely to pick all ripe berries before moving to a new bush (Wolfe, 2013), and this is what would be predicted by optimal foraging models. However, there are times when optimal foraging is not optimal. Sometimes, it is critically important to find all the targets; all the tumors, all the weapons, etc. In these situations, a tendency to stop short of exhaustive search remains even when the goal is to find everything (Ashman et al., 2000; Berbaum et al., 2000; Franken et al., 1994; Samuel et al., 1995). In medical settings, such errors have proven difficult to eradicate by means of interventions such as computer-aided diagnoses (CAD) or checklists (Berbaum et al., 2006, Berbaum et al., 2007). The present results show how a payoff strategy could be used to move an observer toward more complete search. If you counteract diminishing returns with increasing rewards, observers will stay with the current task for a longer period of time. At some level, this seems like common sense, but common sense is not an invariable guide to human behavior so it is useful to have data to back intuition.
An increasing pattern of reward is far from a cure for errors of omission in foraging search tasks. First, when stimuli are ambiguous, as they are here, increasing the number of clicks increases both hits and false alarm errors. In real world tasks like airport security or medical screening, false alarms incur a smaller cost than miss errors but they are much more common because targets are typically rare (Evans, Birdwell, & Wolfe, 2013; Wolfe, Brunelli, Rubinstein, & Horowitz, 2013). Increasing reward might be more applicable when targets, once found, are easily identified. If each successive typo in a manuscript was worth more than the last, proofreaders would probably find more of them. An experiment with stimuli of this sort would be an interesting sequel to the present work.
Second, a linearly increasing pattern of reward does increase the percentage of targets found, but not to 100%. In the present experiment, observers found 57% of targets in the Decreasing condition, 76% in the Neutral condition, and 82% in the Increasing condition. Pushing that 82% to 100% is difficult because, once the bulk of the good berries have been picked, most of the remaining berries are ‘bad’. Again, it may be easier to move observers to 100% with unambiguous but still hard-to-find items.
Third, how different would the foraging behaviors be if a bonus could be awarded after some unknown and variable number of targets was found? Would the observers forage longer in the display and try to get more bonuses? An experiment with this kind of bonus would be another interesting sequel to this work.
In sum, it seems straightforward to tell your child to pick up all the Lego pieces in his room or to ask your radiologist to find every metastasis in a liver CT. However, once these tasks are understood as foraging tasks, it becomes clear that there are strong pressures that make it unlikely that the task will be performed perfectly. Understanding how humans forage is a start to helping them forage more successfully.
We will use Charnov’s Marginal Value formulation (Charnov, 1976) for most of our discussion. It works well for tasks where observers collect a relatively large number of targets (e.g., berry picking). It is not the only approach. For instance, for tasks where the number of targets is relatively small (e.g., finding tumors in radiology), other models like “Potential Value” may be more useful (McNamara, 1985).
For an estimate of effect size, we use r² (the square of the effect size r, not the correlation coefficient), calculated as r² = t²/(t² + df). By convention, r² = 0.01 is considered a small effect, 0.09 a medium effect, and 0.25 a large effect.
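The footnote's formula as a one-line helper (a sketch; the function name is ours):

```python
def r_squared(t, df):
    """Effect size r^2 = t^2 / (t^2 + df), computed from a t statistic
    and its degrees of freedom."""
    return t * t / (t * t + df)
```

For instance, the Increasing-condition click-time comparison reported above, t(12) = 2.387, corresponds to r² of roughly 0.32, a large effect by the convention in the footnote.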
This research was supported by grants to J.M.W. from ONR MURI (N000141010278), NIH-NEI (EY017001), Hewlett-Packard, Google, CELEST (NSF SBE-0354378) and by the National Natural Science Fund of China (Grant numbers 61233011, 61374006, 61473086); the Major Program of the National Natural Science Foundation of China (Grant number 11190015); the Natural Science Foundation of Jiangsu (Grant numbers BK20170692, BK20131300); the Innovation Fund of the Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of the Ministry of Education (Nanjing University of Science and Technology, Grant number JYB201601); the Innovation Fund of the Key Laboratory of Measurement and Control of Complex Systems of Engineering (Southeast University, Grant number MCCSE2017B01); and the Fundamental Research Funds for the Central Universities (2242016k30009).
- Ashman, C., Yu, J., & Wolfman, D. (2000). Satisfaction of search in osteoradiology. American Journal of Roentgenology, 177, 252–253.
- Cain, M. S., Vul, E., Clark, K., & Mitroff, S. R. (2012). A Bayesian optimal foraging model of human visual search. Psychological Science, 1–8.
- McNamara, J. (1985). An optimal sequential policy for controlling a Markov renewal process. Journal of Applied Probability, 324–335.
- Stephens, D. W., Brown, J. S., & Ydenberg, R. C. (2007). Foraging: Behavior and ecology. University of Chicago Press.
- Stephens, D. W., & Krebs, J. R. (1986). Foraging theory. Princeton University Press.
- Wang, L., Yu, H., & Zhou, X. (2013). Interaction between value and perceptual salience in value-driven attentional capture. Journal of Vision, 13(3).
- Wolfe, J. (2012). Approaches to visual search: Feature integration theory and guided search. In A. C. Nobre & S. Kastner (Eds.), Oxford Handbook of Attention. New York: Oxford University Press.
- Wolfe, J. M. (2013). When is it time to move to the next raspberry bush? Foraging rules in human visual search. Journal of Vision, 13, 1–17.