Searching for the right word: Hybrid visual and memory search for words
In “hybrid search” (Wolfe, Psychological Science, 23(7), 698–703, 2012), observers search through visual space for any of multiple targets held in memory. With photorealistic objects as the stimuli, response times (RTs) increase linearly with the visual set size and logarithmically with the memory set size, even when over 100 items are committed to memory. It is well established that pictures of objects are particularly easy to memorize (Brady, Konkle, Alvarez, & Oliva, Proceedings of the National Academy of Sciences, 105, 14325–14329, 2008). Would hybrid-search performance be similar if the targets were words or phrases, in which word order can be important, so that the processes of memorization might be different? In Experiment 1, observers memorized 2, 4, 8, or 16 words in four different blocks. After passing a memory test, confirming their memorization of the list, the observers searched for these words in visual displays containing two to 16 words. Replicating Wolfe (Psychological Science, 23(7), 698–703, 2012), the RTs increased linearly with the visual set size and logarithmically with the length of the word list. The word lists of Experiment 1 were random. In Experiment 2, words were drawn from phrases that observers reported knowing by heart (e.g., “London Bridge is falling down”). Observers were asked to provide four phrases, ranging in length from two words to no less than 20 words (range 21–86). All words longer than two characters from the phrase constituted the target list. Distractor words were matched for length and frequency. Even with these strongly ordered lists, the results again replicated the curvilinear function of memory set size seen in hybrid search. One might expect to find serial position effects, perhaps reducing the RTs for the first (primacy) and/or the last (recency) members of a list (Atkinson & Shiffrin, 1968; Murdock, Journal of Experimental Psychology, 64, 482–488, 1962). Surprisingly, we found no reliable effects of word order. Thus, in “London Bridge is falling down,” “London” and “down” were found no faster than “falling.”
Keywords: Visual search · Long-term memory · Word recognition
Imagine searching through a list of names, looking for your own name. Assuming that the list is not alphabetical, the nature of this search is quite well understood. With one specific target in mind (your name), most visual search tasks will produce response times (RTs) that are a linear function of the number of items—in this case, the length of the list of names. This pattern is seen for many types of simple stimuli (Treisman & Gelade, 1980; Wolfe, 1994, 2007), and the pattern remains the same for visual search of words (Fisk & Schneider, 1983). That is, when observers are searching for a target word amongst varying numbers of distractor words, search times will increase as a function of the visual set size.
Suppose, instead, that you have focused your attention on one name on the list and you are trying to remember whether it is the name of one of 20 students in a class. How we perform this search through memory is less clear. Under some conditions, memory search patterns are similar to those found in the visual domain. Sternberg (1966, 1969) showed that the time that it takes to decide whether a single item is being held in memory is a linear function of the number of items in memory. Again, this result was replicated using various stimulus types, including words (Cavanagh, 1972; Fisk & Schneider, 1983; Juola & Atkinson, 1971; Sternberg, 1969). These studies all used variable-mapping paradigms (Schneider & Shiffrin, 1977), in which the targets and distractors changed on every trial in an unpredictable manner (e.g., a target on one trial could appear as a distractor on the next). In consistent-mapping tasks, in contrast, a particular stimulus will always appear as either a target or a distractor over a block of trials. In general, consistent-mapping tasks produce much more efficient memory search slopes, in some cases completely eradicating set size effects (Fisk & Schneider, 1983; Schneider & Shiffrin, 1977). Other studies have shown that consistent mapping produces decelerating, curvilinear functions of set size, rather than the linear functions of variable mapping (Donkin & Nosofsky, 2012; Kristofferson, 1972; McElree & Dosher, 1989; Monsell, 1978; Ratcliff, 1978). Simpson (1972) found that the mean RT was a linear function of the log of the memory set size in a consistent-mapping memory recognition task with extended practice for the observers. None of the work mentioned above, however, used memory set sizes beyond eight items. Investigating searches through large memory sets, Juola, Fischler, Wood, and Atkinson (1971) and Atkinson and Juola (1973, 1974) found linear searches through sets ranging from ten to 60 words.
Finally, imagine that you are searching the entire list for the names of any one of those 20 students. This requires bringing each of the students’ names into some form of working memory, in order to compare it with each word or name in the list. This combination of visual and memory search is known as “hybrid search” (Schneider & Shiffrin, 1977). By combining multiple memory searches into a single trial (and therefore a single RT), hybrid search can magnify small distinctions between set sizes that may otherwise be lost in a traditional recognition test. Moreover, the hybrid-search paradigm allows us to separate out the effects of visual set size from those of memory set size and to look at their interaction. In earlier work with visual objects, Wolfe (2012) found that the RT in hybrid search was a linear function of visual set size, as in other visual search tasks. However, Wolfe (2012) also found that RTs increased with the log of the memory set size. As we noted, Wolfe used photorealistic objects as the stimuli. Stimuli of this sort have been used to replicate the logarithmic search through memory with a search through time rather than space (Drew & Wolfe, 2013). Moreover, the full hybrid-search pattern applies to search through categories of objects, as well as through sets of specific objects (Cunningham & Wolfe, 2014). Leite and Ratcliff (2010) introduced a potentially useful version of a diffusion model to explain the logarithmically increasing RTs (diagrammed in Fig. 8 below). In the context of hybrid search, a diffuser is assigned to each member of the memory set. This diffuser accumulates evidence for the presence of its particular memory item as the trial progresses. If and when any of the diffusers crosses a decision boundary, a response is given. Noise in the diffusion process might cause an incorrect diffuser to cross the decision bound, generating a false alarm error. More items in memory mean more diffusion processes, and thus a greater chance of such an error. Raising the decision boundary, and therefore requiring more information before committing to a decision, can reduce errors. Higher decision boundaries take longer to reach, however, increasing the RT. If observers attempt to hold error rates constant, decision boundaries and RTs must rise with memory set size. Leite and Ratcliff (2010) showed that the resulting RT × Set Size function will be logarithmic.
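The trade-off described above can be illustrated with a toy simulation. The sketch below is not Leite and Ratcliff's (2010) model or its fitted parameters; it is a minimal race of discrete-time random walks under assumed values (a drift of 0.2 for the one diffuser whose item is present, zero drift for the others, unit Gaussian noise, and a single upper bound). The function names `simulate_trial` and `summarize` are hypothetical.

```python
import random

def simulate_trial(n_items, bound, rng, drift=0.2, noise_sd=1.0, max_steps=10_000):
    """One toy race. Diffuser 0 stands for the memory item that is actually
    present (positive drift); all other diffusers accumulate noise only.
    Returns (steps_taken, index_of_winning_diffuser)."""
    evidence = [0.0] * n_items
    for step in range(1, max_steps + 1):
        for i in range(n_items):
            mean = drift if i == 0 else 0.0  # assumed drift rates
            evidence[i] += rng.gauss(mean, noise_sd)
        leader = max(range(n_items), key=evidence.__getitem__)
        if evidence[leader] >= bound:
            return step, leader
    return max_steps, -1  # no diffuser reached the bound (counted as an error)

def summarize(n_items, bound, n_trials=300, seed=1):
    """Mean decision time (in steps) and false-alarm rate over n_trials."""
    rng = random.Random(seed)
    total_steps = 0
    false_alarms = 0
    for _ in range(n_trials):
        steps, winner = simulate_trial(n_items, bound, rng)
        total_steps += steps
        if winner != 0:
            false_alarms += 1
    return total_steps / n_trials, false_alarms / n_trials

if __name__ == "__main__":
    # With a fixed bound, errors grow with the number of racing diffusers.
    for n_items in (1, 2, 4, 8, 16):
        mean_steps, fa_rate = summarize(n_items, bound=10.0)
        print(f"memory set {n_items:2d}: mean steps {mean_steps:6.1f}, "
              f"false-alarm rate {fa_rate:.2f}")
```

In this sketch, holding the bound fixed lets the false-alarm rate climb as diffusers are added, whereas raising the bound lowers the error rate at the cost of more steps per decision. That is the qualitative trade-off described in the text; the sketch does not attempt to reproduce the logarithmic form itself.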
In the present article, we ask whether the pattern generalizes beyond objects to words. With small set sizes (1–4) in combined memory and visual search through alphanumeric symbols, linear RT × Memory Set Size functions have been reported (Briggs & Blaha, 1969; Burrows & Murdock, 1969; Nickerson, 1966; Schneider & Shiffrin, 1977), but the distinction between linear and logarithmic functions is not easy to detect with small set sizes.
Experiment 1 of the present article replicated Wolfe’s (2012) findings, using words rather than photorealistic objects as the stimuli. Observers were asked to remember between two and 16 words and then to search for those amongst visual set sizes of two to 16 items. As in Wolfe (2012), RTs increased linearly with the visual set size, and log-linearly with the memory set size.
Words provide an opportunity to answer other questions about hybrid search that cannot be readily addressed with object stimuli. For example, up until this point the memorization process for all of the related studies has been the same: Observers have been asked to memorize new and arbitrary sets of items at the start of a block of trials. With words, it is possible to ask observers to search through previously memorized familiar texts. These are highly ordered, structured, meaningful sets of stimuli that have long been engrained in the observer’s memory. Again, we can ask whether the basic hybrid search results are found, and again, there are good reasons to think that these stimuli might produce a different pattern of results.
For example, it is possible that an advantage might exist for targets at the beginning or end of an ordered list, mirroring the well-established primacy and recency effects (Atkinson & Shiffrin, 1968; Glanzer & Cunitz, 1966; Murdock, 1962). This U-shaped function has been found for lists stored in long-term memory up to several weeks (Neath & Brown, 2006). Moreover, serial position has been shown to be an important predictor of recalled targets in coherent passages, as well (Deese & Kaufman, 1957; Freebody & Anderson, 1981; Rubin, 1977, 1978). Kelley, Neath, and Surprenant (2013) found both primacy and recency effects in observers’ memory of cartoon theme song lyrics, the seven Harry Potter books, and two different sets of movies, all of which were thought to be recalled from semantic memory stores. Although much of the work on serial position effects has focused on recall tasks, both early and late members of a list have shown RT benefits in recognition tasks, as well (McElree & Dosher, 1989; Monsell, 1978).
In addition to serial position, the rated importance of a particular section in a passage has also proved to be important for recall. Freebody and Anderson (1981) showed that the higher the rated semantic importance of a particular subsection of a passage, the more likely it was to be recalled. Therefore, in an extension of this finding, we might expect to find RT benefits for words that were rated as being more semantically important in a given passage.
To test these possibilities, we asked observers to provide four phrases of varying lengths that they felt confident were firmly ensconced in their memory. Those words became the memory set in a hybrid search for any member of one of the target phrases amongst distractor words. Again, we replicated the pattern of a linear increase in RTs with visual set size and a logarithmic increase with memory set size. Interestingly, we found no reliable effect of serial position and almost no effect of the importance of the word in the phrase.
Experiment 1: Arbitrary memory sets
In Experiment 1, ten observers, 18 to 48 years of age, were tested (mean age: 29.2; six males, four females). The observers gave informed consent and were compensated $10/h. All observers had at least 20/25 vision with correction, passed the Ishihara Color Blindness Test, and were fluent speakers of English.
In each of four blocks, observers memorized 2, 4, 8, or 16 words. During the memorization task, target words were presented one at a time centrally on the screen for 3 s at a time. Next, the observers were required to pass two recognition tests with 100% accuracy in order to proceed to the search portion of the block. For this learning portion of each block, observers saw a set of words, one at a time, and labeled them as “old” (i.e., part of their memory set) or “new” (distractors). Distractors made up 50% of the recognition test; therefore, in total observers saw twice as many words as the memory set size. If observers failed the test, they reviewed the target words again for 3 s each and then attempted the memory test again. Word order was randomized during all portions of the memorization block, and the distractors were always novel.
After completing the memory portion of the block, observers moved on to a series of 330 search trials: 30 practice trials and 300 experimental trials. During the search task, observers saw displays with 2, 4, 8, or 16 printed words and were instructed to localize any one of their targets with a mouse click as quickly and accurately as possible. One random member from the target list was always present among an array of distractor words. The spatial locations of all of the words in the display were randomly chosen, with the only constraints being that words could not overlap with one another and that the entire word had to fit on the display. After clicking on the target, observers received “correct”/“incorrect” feedback before moving on to the next trial. Participants completed four blocks with memory set size pseudorandomized. From start to finish, the experiment lasted about 1.5 h.
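The display constraints described above (random locations, no overlapping words, every word fully on screen) can be met with simple rejection sampling. The following is a sketch under assumptions, not the software used in the experiment: the display size, the fixed character width, and the names `place_words` and `overlaps` are all hypothetical.

```python
import random

def overlaps(a, b):
    """True if two axis-aligned boxes (x, y, width, height) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def place_words(words, width=1024, height=768, char_w=12, line_h=24,
                seed=None, max_tries=10_000):
    """Return one bounding box per word, drawn uniformly at random, such that
    every box lies fully on the display and no two boxes overlap."""
    rng = random.Random(seed)
    boxes = []
    for word in words:
        box_w = char_w * len(word)  # assumed fixed-width font
        for _ in range(max_tries):
            x = rng.uniform(0, width - box_w)   # keeps the word on screen
            y = rng.uniform(0, height - line_h)
            candidate = (x, y, box_w, line_h)
            if not any(overlaps(candidate, placed) for placed in boxes):
                boxes.append(candidate)
                break
        else:
            raise RuntimeError(f"could not place {word!r} without overlap")
    return boxes

# Hypothetical 16-word display (the largest visual set size in the study).
demo_words = ["bridge", "london", "falling", "down", "garden", "pencil",
              "window", "river", "marble", "candle", "tunnel", "forest",
              "saddle", "basket", "ladder", "planet"]
demo_boxes = place_words(demo_words, seed=7)
```

Rejection sampling is adequate here because the words occupy only a small fraction of the display, so most candidate positions are accepted on the first few tries.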
Results and discussion
Figure 2b shows RTs as a function of memory set size. Note that these are the same data points as in Fig. 2a, simply replotted. The functions appear to be curvilinear. Wolfe (2012) argued that RT was a linear function of log2(memory set size). One way to compare linear and log2 accounts of the data would be to use the three smaller memory set sizes to predict the data for set size 16. This is shown in Fig. 2b, with linear predictions shown as Os and log2 predictions shown as Xs. The data and the log2 predictions are quite close (differences: visual set size 2, 31.7 ms; set size 4, 77.8 ms; set size 8, 87.5 ms; set size 16, 131.3 ms). The linear predictions overestimate the actual data, especially for the larger visual set sizes (differences: visual set size 2, 85.5 ms; set size 4, 43.13 ms; set size 8, 481 ms; set size 16, 691 ms). This illustrates one of the virtues of the hybrid-search paradigm in comparison to a standard recognition task: Because observers need to search memory for each attended word, the larger the visual set size, the larger the number of memory searches that will contribute to the RT. This acts to magnify the differences between the predictions of linear and logarithmic processes. That is why the linear prediction is off by hundreds of milliseconds at the largest set size. The absolute errors (the differences between the predicted RT and actual RT) are significantly smaller for the logarithmic prediction at visual set sizes 2 and 16 [ts(9) > 2.2, ps < .03] and are marginally smaller at set size 8 [t(9) = 1.718, p = .0599].
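The extrapolation test described above can be sketched as follows: fit a line to the mean RTs at memory set sizes 2, 4, and 8, once against the raw set size and once against log2 of the set size, and then compare the two predictions for set size 16. The RT values below are hypothetical placeholders chosen for easy arithmetic, not the data from Experiment 1, and the helper names are our own.

```python
from math import log2

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_rt(set_sizes, rts, target_size, transform=lambda m: m):
    """Fit RT against transform(set size), then extrapolate to target_size."""
    xs = [transform(m) for m in set_sizes]
    slope, intercept = fit_line(xs, rts)
    return slope * transform(target_size) + intercept

memory_set_sizes = [2, 4, 8]
mean_rts_ms = [1000.0, 1200.0, 1400.0]  # hypothetical values, not real data

linear_prediction = predict_rt(memory_set_sizes, mean_rts_ms, 16)
log2_prediction = predict_rt(memory_set_sizes, mean_rts_ms, 16, transform=log2)

print(f"linear prediction for set size 16: {linear_prediction:.1f} ms")
print(f"log2 prediction for set size 16:   {log2_prediction:.1f} ms")
```

When the observed RTs grow by a constant amount with each doubling of the memory set, as in these placeholder values, the linear extrapolation overshoots the log2 extrapolation, which is the direction of the discrepancy reported for the actual data.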
Experiment 1 replicated the work of Wolfe (2012) in the lexical domain, showing that the original result is not restricted to specific objects. Furthermore, by using the hybrid-search paradigm, we can perhaps shed some light on memory search in a consistent-mapping task through words. Since larger visual set sizes require multiple memory searches, deviations from a linear model become more evident, and this perhaps explains the difference of the present results from earlier work with smaller set sizes.
In the present experiment, observers had no trouble encoding the random words up to the maximum of 16. In Experiment 1, the words in memory had no meaning or grammatical structure. Moreover, they were not presented and tested in any fixed order in the first part of the study. Accordingly, we could not assess the effects of word order, including any analysis of serial position. Under normal circumstances, however, word order is important (at least in English). In Experiment 2, we asked whether the basic hybrid-search result would change if we switched the target sets from arbitrary lists of words, learned for the task, to structured lists of words, derived from well-learned text held in our observers’ long-term memory.
Experiment 2: Familiar phrases as memory sets
In Experiment 2, 12 observers 19 to 48 years of age (mean age: 27; four males, eight females) were initially asked to select and enter into the computer four phrases that they knew very well. For three of the phrases, they were instructed to think of passages that, as closely as possible, contained 2, 8, and 16 words. For the fourth and final target set, we asked for the longest phrase that the observer had fully committed to memory. These largest set sizes ranged in length from 19 to 86 words, with an average of 33.75 words. The participants were told that words less than three letters long and repeated words would not count toward their target list. Observers were also given a list of well-known phrases as suggestions (e.g., “Twinkle, Twinkle Little Star,” The US Pledge of Allegiance, etc.); they were instructed not to simply pick “popular” phrases, but rather to choose phrases that they were sure they knew well.
Observers were given a test of their memory for each phrase prior to the search portion of the experiment. Since these target sets were supplied by the observers themselves, we expected the memory test not to cause many problems. However, in order to keep the two experiments as close as possible, as in Experiment 1, observers were shown the target and distractor words and were asked to make an “old” (part of the phrase) or “new” (distractor) response to each word. Half of the words were target words. We lowered the passing threshold relative to Experiment 1 so that the experiment would not be delayed by motor errors: Observers had to score 90% or higher twice in a row in order to pass this memory test. In other respects, the methods for Experiment 2 were the same as those for Experiment 1.
Results and discussion
Because of the variety of memory set sizes and the variability of the individual observers’ data, it is difficult to see the relationship in the individual observer data. Accordingly, in order to pool the data for purposes of analysis, the memory set sizes were binned into four distinct groups: short, medium, long, and extra long. All short memory sets were two words long. The medium bin consisted of phrases of 4–11 words (average: 7.2 words), the long bin of phrases of 12–19 words (average: 15.3 words), and the extra-long group of phrases of more than 19 words (average: 35.0 words). Using these criteria, all but one of the observers were tested on one phrase per memory group. (The longest phrase for that observer was merely “long,” not “extra long.”)
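The binning rule can be written out as a short function. This is a sketch of the criteria stated above, with one assumption: the text specifies only that all short sets contained two words, so treating any set of fewer than four words as “short” is our own choice for handling the unspecified case of three words. The function name `memory_set_bin` is hypothetical.

```python
def memory_set_bin(n_words):
    """Map a memory set size (number of target words) to its analysis bin."""
    if n_words < 1:
        raise ValueError("memory set size must be at least 1")
    if n_words < 4:       # assumption: the source lists only 2-word short sets
        return "short"
    if n_words <= 11:     # medium: 4-11 words
        return "medium"
    if n_words <= 19:     # long: 12-19 words
        return "long"
    return "extra long"   # more than 19 words

# Example: set sizes spanning the reported range of phrase lengths.
for size in (2, 8, 16, 33, 86):
    print(size, memory_set_bin(size))
```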
Although serial order did not seem to have a significant effect on response times, it might be that some word or words in a phrase would be privileged in activated long-term memory (the supposed store for memory set items in hybrid search; Cowan, 1995; Cunningham & Wolfe, 2014; Drew & Wolfe, 2013), in a way that would affect RTs. For instance, one might expect words that are more semantically salient to be accessed more quickly (e.g., “America” might be found faster than “under” in the US Pledge of Allegiance). We asked eight new observers to rate each word of each phrase used in Experiment 2 on a semantic salience scale from 1 to 5. The observers were asked how strongly a particular word “contributes to the overall meaning of the phrase,” where 1 indicated no contribution and 5 indicated a strong contribution. They were instructed not to consider words that contributed to the grammar or structure of a phrase, but rather to focus on words that were necessary to understand the feeling of the phrase. With these ratings in mind, we looked for a negative correlation between the ratings and the RTs (i.e., faster responses to more semantically relevant words).
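One way to run such a test is to compute a Pearson correlation between each word's salience rating and its mean RT. The sketch below assumes that approach; the text does not specify the exact statistic, and the ratings and RTs shown are invented for illustration only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical per-word values: mean salience rating (1-5) and mean RT (ms).
salience_ratings = [1.5, 2.0, 3.5, 4.0, 4.5]
mean_rts_ms = [1850.0, 1790.0, 1820.0, 1760.0, 1800.0]

r = pearson_r(salience_ratings, mean_rts_ms)
print(f"r = {r:.3f}")  # a salience benefit would appear as a negative r
```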
Admittedly, the conditions of the present study were not ideal for finding serial position effects. In a replication of Sternberg (1966), Donkin and Nosofsky (2012) showed that serial position curves tend to flatten during recognition tests with enough time for rehearsal. Furthermore, since the phrases used in Experiment 2 were so well-known, they may have been stored as single units of information, and could therefore be retrieved as such. This would suggest that arbitrary words presented in a specific order (e.g., in a list) are more likely to show primacy and recency effects. However, the goal of this experiment was to investigate the efficiency of memory search in well-known phrases. The lack of serial position effects is in fact consistent with an efficient logarithmic search through the items in the memory, with little or no effect of either serial position or the significance of the words.
The apparently logarithmic increases in RT with memory set size found in both Experiments 1 and 2, accompanied by the failure to find primacy or recency effects in Experiment 2, suggest that hybrid search is not performed by a serial search through the memory set. The added attribute of large visual set sizes, which was not found in the earlier memory search literature, allowed us to distinguish logarithmic from linear increases in RT as a function of memory set size more easily. It is also clear that Wolfe’s (2012) findings are not restricted to photorealistic objects, which are easier to memorize (Gehring, Toglia, & Kimble, 1976; Brady, Konkle, Alvarez, & Oliva, 2008) and to find (Paivio & Begg, 1974) than words.
Alternatively, Nosofsky, Cox, Cao, and Shiffrin (2014) proposed a model that also accounts for logarithmic RTs in memory search. As in Leite and Ratcliff (2010), items are racing to a decision boundary. However, in the Nosofsky version, these memory templates are not affected by the size of the memory set, but rather are dictated by their “memory strength,” or the time that they were presented last.
The present results show that this pattern holds for words, including words in ordered phrases, in a manner that is qualitatively similar to the results seen with specific objects. Hybrid search for words in the present experiments is substantially slower than was hybrid search for objects in Wolfe (2012). For instance, in Experiment 1, 3,500 ms were required to find the target when the memory set size and visual set size were both 16. In the corresponding experiment of Wolfe (2012), the comparable RT was 2,700 ms. Not too much should be made of the differences between experiments. The most obvious source of difference is that word reading probably requires fixation on each item, but object recognition does not. What is important is that the pattern of log memory search and linear visual search is a general phenomenon.
This analysis was repeated for varying groupings, including memory sets of between six and nine words [primacy, F(4, 35) = 0.1249, p = .97; recency, F(4, 36) = 1.384, p = .259], as well as memory sets of between 18 and 24 words [primacy, F(4, 23) = 0.893, p = .48; recency, F(4, 19) = 2.631, p = .07]. No reliable effects of primacy or recency were found.
- Atkinson, R. C., & Juola, J. F. (1973). Factors influencing speed and accuracy of word recognition. In S. Kornblum (Ed.), Attention and performance IV (pp. 583–612). New York, NY: Academic Press.
- Atkinson, R. C., & Juola, J. F. (1974). Search and decision processes in recognition memory. In D. H. Krantz, R. C. Atkinson, R. D. Luce, & P. Suppes (Eds.), Contemporary developments in mathematical psychology (Learning, memory, and thinking, Vol. 1, pp. 243–293). San Francisco, CA: W. H. Freeman.
- Atkinson, R. C., & Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation: Advances in research and theory (Vol. 2, pp. 89–195). New York, NY: Academic Press.
- Cowan, N. (1995). Attention and memory: An integrated framework. New York, NY: Oxford University Press.
- Freebody, P., & Anderson, R. C. (1981). Serial position and rated importance in the recall of text. Urbana, IL: University of Illinois.
- Neath, I., & Brown, G. D. A. (2006). SIMPLE: Further applications of a local distinctiveness model of memory. In B. H. Ross (Ed.), The psychology of learning and motivation (Vol. 46, pp. 201–243). San Diego, CA: Academic Press.
- Nosofsky, R. M., Cox, G. E., Cao, R., & Shiffrin, R. M. (2014). An exemplar-familiarity model predicts short-term and long-term probe recognition across diverse forms of memory search. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40, 1524–1539. doi:10.1037/xlm0000015
- Sternberg, S. (1969). Memory-scanning: Mental processes revealed by reaction-time experiments. American Scientist, 57, 421–457.
- Zeno, S. M. (1995). The educator’s word frequency guide. New York, NY: Touchstone Applied Science Associates.