Introduction

The ability to communicate and understand information regarding not only the present, but also the past, future, and even the impossible is an important feature of human language (Hockett, 1960). It has been suggested that such processes are facilitated in language users through the construction of mental representations, which depict the described events and set up expectations about forthcoming referents (Zwaan, 2004). Despite over a decade of research evaluating the structure and content of such mental representations for understanding concrete language, no study so far has considered how they are affected by uncertainty. Therefore, in the present study, we examined the way that readers represent described events under varying levels of certainty (by comparing knows vs. thinks).

The idea that readers construct a mental representation of linguistic information is commonly referred to as a situation model (Glenberg, Meyer, & Lindem, 1987; Graesser, Millis, & Zwaan, 1997; Zwaan & Radvansky, 1998). According to this model, mental simulations are experiential in nature since they implicate embodiment and are assumed to be grounded in perception and action. Thus, understanding language entails recapitulating motor or perceptual activity as a simulation of the language input. A number of findings have been produced over the last decade, using various tasks, to demonstrate the validity of this mental simulation view (e.g., Martin & Chao, 2001; Pecher, Zeelenberg, & Barsalou, 2003; Pulvermüller, 1999, 2002; Spence, Nicholls, & Driver, 2001; Spivey, Tyler, Richardson, & Young, 2000). In this article, we will concentrate on those studies that have exploited the link between linguistic input and experimental task and have shown reliable facilitation/interference effects using different language structures (e.g., Glenberg & Kaschak, 2002; Zwaan, 2004; Zwaan & Yaxley, 2003).

One popular version of this paradigm is the sentence–picture verification task. Here, participants read sentences such as “The ranger saw an eagle in the sky,” then subsequently respond (mentioned/not mentioned) to an image that depicts the eagle in a matching physical form (i.e., an eagle with its wings outstretched) or a mismatching physical form (an eagle with folded wings). Response times reveal facilitation effects (shorter reaction times) when the depicted object’s shape and orientation matches information implied in the preceding sentence and interference effects (longer reaction times) when these two sources of information are in conflict (Zwaan, Stanfield, & Yaxley, 2002).

Interestingly, the degree of facilitation/interference that readers experience during comprehension is also influenced by the wider sentence context. Using the sentence–picture verification task described above, Yaxley and Zwaan (2007) presented participants with sentences that set up either a high level of visual resolution, such as “Through the clean goggles, the skier could easily identify the moose,” or a low level of visual resolution, such as “Through the fogged goggles, the skier could hardly identify the moose.” Results revealed faster responses when the visual resolution of the subsequent picture matched that described in the preceding sentence, suggesting that contextually defined visual properties of an object are encoded into the mental representation. Similar effects have been reported when the context modifies other perceptual aspects of an object (e.g., direction of motion and orientation; see Stanfield & Zwaan, 2001; Zwaan, Madden, Yaxley, & Aveyard, 2004). Clearly, then, understanding language involves representations driven by both the described state of affairs and inferences from the wider context.

The present study

In this article, we attempt to extend these findings by examining how contextual uncertainty about the state of an object (e.g., whether someone knows vs. thinks that an eagle is flying) influences the content and strength of subsequent mental representations. Uncertainty is a pervasive component of everyday language; however, it remains largely neglected in the psychological research literature. In speech, uncertainty is typically communicated to others through prosodic cues (see Pon-Barry & Shieber, 2011), but choice of words provides the strongest indicator of the degree of certainty afforded to an utterance (e.g., conditional terms such as some, most, maybe, often, perhaps, likely, unlikely, typically, usually, possibly, etc.). Given that most previous studies of experiential language have examined the mental representations underlying concrete statements (but see de Vega & Urrutia, 2011), it is important to understand how these representations are influenced (if at all) by a context that implies uncertainty. Moreover, given evidence that in everyday language production, adults tend to focus on past or potential future events, rather than on ongoing events in their immediate environment (Tomasello & Kruger, 1992), it is crucial to know how language listeners construct mental models of uncertain events. For example, are we less likely to set up mental representations of events when they are described as being uncertain?

Moreover, understanding uncertainty is likely to involve the production and storage of multiple representations of the world, since comprehenders need to accommodate all the possible permutations of described events. For example, saying sentence 1 below implies that both the described (i.e., “the picnic basket is open”) and the alternative (“the picnic basket is closed”) states of the world are possible. In contrast, following a definite statement such as sentence 2, only the described state is deemed plausible.

  (1) The old lady thinks that the picnic basket is open.

  (2) The old lady knows that the picnic basket is open.

The construction of such “multiple worlds” is a common feature in other language constructs, such as conditionals and negation (e.g., Ferguson & Sanford, 2008; Ferguson, Sanford, & Leuthold, 2008; Ferguson, Scheepers, & Sanford, 2010), where it has been associated with higher levels of cognitive effort. Indeed, numerous investigations of negation (e.g., “the eagle is not in the nest”) have used the sentence–picture verification task to reveal that mental representations of both the factual and negated forms of an utterance are set up during comprehension (Kaup, Lüdtke, & Zwaan, 2006; Kaup, Yaxley, Madden, Zwaan, & Lüdtke, 2007; Lüdtke, Friedrich, De Filippis, & Kaup, 2005; but see Tian, Breheny, & Ferguson, 2010). Interestingly, these studies also demonstrate a delay in comprehension due to the process of constructing and selecting appropriate “multiple worlds,” with the “negated state” being available at short interstimulus intervals (ISIs; 250 ms) and the alternative state becoming available only with an ISI of 1,500 ms. Furthermore, representing uncertainty is likely to share many processing steps with theory of mind (ToM) understanding. Research on ToM commonly emphasizes the need to hold two mental representations in mind when carrying out a false belief task (i.e., one’s own true belief and someone else’s false belief), which naturally incurs a degree of working memory load (e.g., Apperly, Back, Samson, & France, 2008; German & Hehman, 2006; Lin, Keysar, & Epley, 2010; McKinnon & Moscovitch, 2007; but see also Ferguson & Breheny, 2011, 2012). Similar to the work on negation, much of this work has found a delay in making reality and belief judgments when the two are in conflict, suggesting that representing multiple perspectives of the world engages more effortful cognitive processes relative to other inferences that elicit only a single representation.

To investigate how contextual uncertainty influences the time course and strength of activation of mental representations, we compared performance on a sentence–picture verification task when the context depicted two levels of certainty about an object’s physical state (knows vs. thinks). In the experiment, participants read sentences like (1) and (2) above. Following a delay of either 250 or 1,500 ms (see Footnote 1), they responded to pictures that varied in the physical form of the target object (matching vs. mismatching). On the basis of previous findings, we predicted shorter response times when the shape of the depicted object matched, as compared with when it mismatched, that described in the preceding target sentence. Regarding the novel certainty manipulation, we predicted that uncertain events (described by the verb thinks) would slow down the construction of a mental simulation, possibly due to readers representing both the described and alternative states. This effect should lead to processing costs reflected in increased response times for thinks versus knows, which elicits a single concrete representation of events. This prediction was tested against the null hypothesis that comprehending uncertain events is no more effortful than comprehending certain events, meaning that there should be no difference in response times between certain (knows) and uncertain conditions. This could be because readers do not activate an additional mental representation of the alternative state following an uncertain verb (thinks) or simply because this process occurs rapidly. Examining effects at the different ISIs will allow us to explore this possibility. Finally, we tested the possibility that (un)certainty might influence the size of the mismatch effect emerging between these conditions. Such a difference in processing could be manifest as a reduced mismatch effect following thinks if both possibilities are equally available for comparison.

Method

Participants

A total of 80 native English-speakers from the University of Kent took part in the study, either for course credits or for a small payment. Of these, 40 participants completed the task with a sentence–picture ISI of 250 ms, and 40 completed the task with a sentence–picture ISI of 1,500 ms.

Stimuli and design

Forty experimental sentences were created according to the form [Character] knows/thinks that the X is Y/Z, where Y and Z describe opposing physical states of X. Prior to the main experiment, a pretest was conducted to validate the binary status of the physical states described in these sentences. Twenty participants (who did not take part in the main experiment) were presented with 40 truncated experimental sentences of the form The X is Y in one of two lists (to test both states of X) and were asked to provide an alternative word for each item that described the opposite state to that described by the sentence final word (e.g., open/closed). This pretest revealed a high overall probability of eliciting the desired alternative state across items (M = 81 %), which did not differ between the two state descriptions, t(39) = 0.16, p > .88.

For the main experiment, each sentence appeared in one of four forms: two knows and two thinks (e.g., The old lady knows that the picnic basket is open; The old lady knows that the picnic basket is closed; The old lady thinks that the picnic basket is open; The old lady thinks that the picnic basket is closed). Each experimental sentence was paired with one of two color pictures (so 80 experimental pictures were used in total), with the two pictures depicting object X in the two different physical states described by the corresponding experimental sentences (see Table 1). The typicality of each picture was assessed in a pretest, where 20 participants rated how well each image corresponded to how they would typically think of that object. Participants saw one version of each image pair, counterbalanced across two lists. Ratings were provided on a scale of 1–5, where 1 meant that the image was not at all similar to how they would typically think about this object and 5 meant that the image was very similar to how they would normally think about this object. Typicality ratings across all pictures averaged 3.88 (SD = 0.39), with an average difference between alternate images of the same item of 0.11, t(39) = 0.52, p = .6. As such, we can assume that our images are matched in typicality across (and within) items.

Table 1 Example experimental sentences and associated visual displays

In addition, 40 filler items were added to each list. These filler items included various verb structures (e.g., saw, hopes, wishes, noticed) and states (e.g., alive, nervous, broken). They all consisted of incorrectly matched picture–sentence pairings and were interspersed randomly among the 40 experimental trials to create a single random order.

One version of each item was assigned to one of eight presentation lists, with each list containing 40 experimental items. Each list contained one of the eight possible versions (4 sentence types × 2 picture types) of each item, presented in a fixed random order to ensure that they were evenly distributed. Each participant saw each item only once, in one of the eight conditions. Thus, half the experimental sentences were paired with a picture that matched the described physical state of object X, and half were paired with a picture that mismatched the described physical state of X. All experimental trials required a mentioned response, and all filler trials required a not-mentioned response.
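The counterbalancing described above can be illustrated with a short sketch. It assumes a simple Latin-square rotation of the eight versions across the eight lists; the rotation scheme, the function name build_lists, and the labels state_Y/state_Z and picture_Y/picture_Z are illustrative assumptions, since the text specifies only that each list contained one version of each item.

```python
from itertools import product

# The four sentence forms (verb x described state) crossed with the two
# pictures give the eight versions of each item. Labels are hypothetical.
SENTENCE_TYPES = [
    ("knows", "state_Y"), ("knows", "state_Z"),
    ("thinks", "state_Y"), ("thinks", "state_Z"),
]
PICTURE_TYPES = ["picture_Y", "picture_Z"]
VERSIONS = [
    (verb, state, picture)
    for (verb, state), picture in product(SENTENCE_TYPES, PICTURE_TYPES)
]


def build_lists(n_items=40):
    """Assign one version of every item to each list by rotation.

    Assumption: a Latin-square rotation, so that each item appears in
    all eight versions across the eight lists.
    """
    n_lists = len(VERSIONS)
    lists = []
    for list_idx in range(n_lists):
        trials = []
        for item in range(n_items):
            verb, state, picture = VERSIONS[(item + list_idx) % n_lists]
            trials.append({
                "item": item,
                "verb": verb,
                "state": state,
                "picture": picture,
                # A trial matches when the picture shows the described state.
                "match": state[-1] == picture[-1],
            })
        lists.append(trials)
    return lists


lists = build_lists()
print(len(lists), len(lists[0]))             # 8 lists of 40 experimental items
print(sum(t["match"] for t in lists[0]))     # 20 matching trials per list
```

Under this rotation, every list contains 20 matching and 20 mismatching experimental trials, in line with the design described above.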

Comprehension questions followed half of the trials (20 experimental, 20 filler). These questions required a binary true/false or yes/no response and tested participants’ memory of the preceding sentence or picture (e.g., “Did the old lady have a picnic basket?”).

Procedure

The experiment was run on a PC using E-Prime 2.0 Professional software (Psychology Software Tools, Pittsburgh, PA). Each trial began with the presentation of a single sentence in the center of the screen. Participants read this sentence for understanding and pressed the space bar to proceed. A fixation point then appeared in the center of the screen for either 250 or 1,500 ms (depending on the experimental group), followed by the target picture (200 × 200 pixels). Participants then had to decide whether that object had been mentioned in the preceding sentence or not, by pressing “m” for mentioned and “n” for not mentioned. Participants were told to respond as quickly and as accurately as possible throughout the task.

The experiment began with 10 practice trials, after which participants completed the 80 main trials (40 experimental, 40 filler) in two short blocks.

Results

Main analyses

Reading times for the target sentences averaged 1,987 and 2,003 ms for sentences containing knows versus thinks, respectively (both Fs < 0.26), with no difference between ISI conditions (both Fs < 0.27). Overall, participants averaged 79 % accuracy on the comprehension questions that followed half the items, and none responded at lower than 65 %. As such, it appears that there was no initial comprehension advantage between the different levels of certainty. Furthermore, picture verification accuracy on the not-mentioned filler trials averaged 98 %.

The main analyses focused on participants’ accuracy and reaction times in the sentence–picture verification task. Mean accuracy data are shown in Fig. 1. Prior to analysis, any reaction times more than 2.5 standard deviations below or above each participant’s mean reaction time were removed. This eliminated 5.4 % of the data. Only correct picture verification responses were included in the reaction time analysis. Mean reaction times are shown in Fig. 2.
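The trimming rule can be sketched as follows. This is a minimal illustration, not the analysis script used in the study: the function name trim_reaction_times is hypothetical, and the use of the sample standard deviation, computed over all of a participant's trials, is an assumption.

```python
from statistics import mean, stdev


def trim_reaction_times(rts_by_participant, criterion=2.5):
    """Drop RTs more than `criterion` SDs from each participant's own mean.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    trimmed = {}
    for participant, rts in rts_by_participant.items():
        if len(rts) < 2:
            # An SD cannot be computed from a single value; keep it as is.
            trimmed[participant] = list(rts)
            continue
        m, sd = mean(rts), stdev(rts)
        lower, upper = m - criterion * sd, m + criterion * sd
        trimmed[participant] = [rt for rt in rts if lower <= rt <= upper]
    return trimmed


# Hypothetical data: one very slow response among otherwise fast ones.
example = {"p1": [500] * 19 + [5000]}
print(len(trim_reaction_times(example)["p1"]))  # 19
```

Because the cutoffs are computed separately for each participant, a response that is slow for one participant may be retained for another whose responses are generally slower.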

Fig. 1 Average accuracy for each condition. Error bars show standard errors

Fig. 2 Average reaction times for each experimental group and condition. Error bars show standard errors

For statistical analysis, we calculated an average accuracy and reaction time for each condition, allowing generalization to participants (F1; in which participants are seen as a random factor and items as a fixed factor) and items (F2; in which items are seen as a random factor and participants as a fixed factor). Significance on both these tests was examined to ensure generalizability of the results across the different participants and experimental items. Strength of association is reported in terms of partial eta-squared (ηp²). Analyses were conducted using a mixed 2 (250 ms vs. 1,500 ms ISI) × 2 (knows vs. thinks) × 2 (match vs. mismatch) × 8 (list) ANOVA, with ISI and list as the between-subjects factors and knows/thinks and match/mismatch as the repeated measures factors. List did not significantly interact with any other variables (all Fs < 1.52).
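The two aggregation steps that feed the by-participants and by-items analyses can be sketched as below. The trial dictionary layout and the function name condition_means are illustrative assumptions; only the averaging logic (one mean per participant or per item in each condition) follows from the description above.

```python
from collections import defaultdict
from statistics import mean


def condition_means(trials, unit):
    """Average RT per condition cell for each participant or item.

    unit="participant" gives the cell means for the by-participants
    analysis; unit="item" gives those for the by-items analysis.
    """
    cells = defaultdict(list)
    for trial in trials:
        cells[(trial[unit], trial["verb"], trial["match"])].append(trial["rt"])
    return {cell: mean(rts) for cell, rts in cells.items()}


# Hypothetical trials: two participants, two items, one condition.
trials = [
    {"participant": 1, "item": 1, "verb": "knows", "match": True, "rt": 600},
    {"participant": 1, "item": 2, "verb": "knows", "match": True, "rt": 700},
    {"participant": 2, "item": 1, "verb": "knows", "match": True, "rt": 800},
]
by_participant = condition_means(trials, "participant")
by_item = condition_means(trials, "item")
print(by_participant[(1, "knows", True)])  # 650
print(by_item[(1, "knows", True)])         # 700
```

The same trials thus yield two data sets, one with participants as the unit of observation and one with items, each of which is then submitted to its own ANOVA.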

Statistics on the accuracy data revealed a main effect of match/mismatch, F1(1, 78) = 19.71, p < .001, ηp² = .2; F2(1, 78) = 49.75, p < .001, ηp² = .39, reflecting higher accuracy on match trials than on mismatch trials. No other effects or interactions reached significance (all Fs < 1.94).
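As a reading aid, partial eta-squared can be recovered from an F ratio and its degrees of freedom using the standard identity F × df_effect / (F × df_effect + df_error). The sketch below applies this identity to the accuracy effect reported above; the function name is hypothetical, and the calculation is offered as a consistency check rather than as the authors' own procedure.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta-squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of match/mismatch on accuracy, as reported above.
print(round(partial_eta_squared(19.71, 1, 78), 2))  # 0.2  (by participants)
print(round(partial_eta_squared(49.75, 1, 78), 2))  # 0.39 (by items)
```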

Statistical analyses on the reaction time data revealed faster overall responses to the image when the physical state of the depicted object matched, as compared with when it mismatched, that described in the preceding sentence, F1(1, 78) = 48.52, p < .001, ηp² = .38; F2(1, 78) = 78.1, p < .001, ηp² = .5. There was also a main effect of ISI, F1(1, 78) = 4.02, p < .05, ηp² = .05; F2(1, 78) = 10.67, p = .002, ηp² = .12, with faster responses to the picture at the shorter ISI (250 ms) than at the longer ISI (1,500 ms). Results also revealed an interaction between ISI and knows/thinks that was significant by participants and marginal by items, F1(1, 78) = 6.86, p < .01, ηp² = .08; F2(1, 78) = 3.63, p = .06, ηp² = .05. In order to examine the underlying effects further, we ran separate 2 (knows vs. thinks) × 2 (match vs. mismatch) ANOVAs for each ISI condition.

250 ms ISI

As in the overall ANOVA, statistical analyses for the 250 ms ISI condition revealed a main effect of match/mismatch, F1(1, 39) = 28.31, p < .001, ηp² = .42; F2(1, 39) = 36.65, p < .001, ηp² = .48. Analyses also revealed a main effect of knows/thinks at an ISI of 250 ms, F1(1, 39) = 4.27, p < .05, ηp² = .1; F2(1, 39) = 3.27, p = .08, ηp² = .06, with shorter reaction times when the preceding sentence included the definite verb knows than when it included the uncertain verb thinks (significant by participants, marginal by items). The interaction between knows/thinks and match/mismatch was not significant, Fs < 1.

1,500 ms ISI

Once again, analyses revealed a main effect of match/mismatch, F1(1, 39) = 23.47, p < .001, ηp² = .38; F2(1, 39) = 41.51, p < .001, ηp² = .52. In contrast, at this ISI of 1,500 ms, the main effect of knows/thinks was not significant, F1(1, 39) = 2.88, p = .1, ηp² = .07; F2(1, 39) = 0.99, p = .33, ηp² = .03, and, in fact, reflected a trend for faster responses following thinks than following knows. The interaction between knows/thinks and match/mismatch was not significant, Fs < 1.

The overall ANOVA showed no significant match/mismatch × ISI interaction, Fs < 1.31, no significant knows/thinks × match/mismatch interaction, Fs < 0.49, and no significant three-way knows/thinks × match/mismatch × ISI interaction, Fs < 0.26. As such, we can infer that responses to matching and mismatching images did not differ across the different levels of certainty.

Taken together, these data reveal that when the picture appeared 250 ms after the sentence, the uncertainty associated with thinks (and, potentially, the multiple representations that are activated) interfered with readers’ representation of the described object, leading to a slowdown in responses. However, when participants were given longer to set up these mental representations (i.e., 1,500 ms ISI), this interference was removed.

Discussion

The purpose of this study was to investigate the time course and content of the mental representations that are activated by readers under varying levels of certainty (by comparing knows vs. thinks). Specifically, we were looking for evidence that uncertain contexts (e.g., “The old lady thinks that . . .”) delay readers’ mental representations of the world, possibly through the construction of multiple representations, and examined the degree of cognitive effort involved in such processes. To this end, we compared performance of individuals on a sentence–picture verification task. On the basis of previous research with this task, we examined reaction times as a measure of the timing of representations and cognitive effort, with reduced reaction times revealing facilitation between described and depicted states of an object and increased reaction times demonstrating interference effects.

Results showed the expected mismatch effect, with responses to the picture probe being significantly faster when the object’s shape matched that described in the preceding sentence, as compared with when it mismatched. This is in line with previous findings (e.g., Stanfield & Zwaan, 2001; Yaxley & Zwaan, 2007; Zwaan et al., 2002) and reflects the fact that comprehenders rapidly mentally represent the described physical state of the object when comprehending the sentence, which in turn facilitates their matching response.

Crucially, the data also provide evidence for the use of different processing strategies following certain and uncertain utterances, which influenced the speed of responses differently at short versus long ISIs. Recall that at 250 ms ISI, reaction times to the target image were significantly shorter when the preceding sentence included the verb knows, as compared with thinks. This difference suggests that extra processing steps were required to construct and map a simulation of events onto the available image in the uncertain conditions and that these processes had not yet been completed within the short ISI period. However, this difference in reaction times between knows and thinks disappeared when the ISI was extended to 1,500 ms. This suggests that when sufficient time was available to set up the appropriate mental representations, uncertain events were no more effortful to understand than certain events. Thus, the cognitive slowdown observed following thinks could be due to either a delay in setting up these multiple representations or a delay in accessing them for comparison with the target image.

Interestingly, match/mismatch did not interact with knows/thinks in this study. This finding suggests that readers do not activate multiple representations of uncertain events simultaneously and with equal strength, since doing so would have led to equal response times for matching and mismatching images in the thinks condition. As such, we can infer that linguistic events preferentially elicit mental representations of the objects in their described form, regardless of whether they were described as certain (e.g., “The old lady knows that the picnic basket is open”) or uncertain (e.g., “The old lady thinks that the picnic basket is open”), as evidenced by facilitation effects for matching images in both cases. Thus, these results point to stronger activation of the described state, against which the target image is initially checked prior to considering the implied alternative state (i.e., the picnic basket is closed). Taken together, these effects and the lack of difference between knows/thinks conditions at longer ISIs suggest that difficulties in accessing, rather than setting up, multiple versions of the world are responsible for the increased processing time following thinks at short ISIs.

This finding offers parallels with related research on ToM understanding. Both false belief tasks and the present task require comprehenders to construct and utilize a representation of events that differs from that of the actual/described events. Recent research has shown that considering an agent’s false belief is cognitively demanding and can lead to interference effects from one’s own true belief (e.g., Apperly, Samson, & Humphreys, 2009), leading to longer response times for judgments that involve conflicting perspectives. This disruption is often explained as a bias to initially map events onto one’s own knowledge and to consider the other person’s perspective only at some later stage (Birch & Bloom, 2007). The present data offer a wider potential explanation for this effect, since under most circumstances, it can be assumed that we are more certain about our own knowledge than about that held by others. Hence, it is possible that processing in both these types of task is influenced by the same bias to initially anchor our understanding of events onto certain information. This would suggest that understanding uncertain information about events or other peoples’ perspectives operates only as a subsequent and controlled checking mechanism, which may incur increased cognitive effort.

Identifying the source of the “costly” cognitive processes involved in representing uncertain information—and their relation to other processes such as ToM—offers an interesting avenue for future research. For example, do other linguistic (and nonlinguistic) cues about certainty elicit the same mental representations and cognitive demands as those found here? Systematically comparing how a speaker’s prosody and choice of language influence the speed and strength of mental simulations of uncertain events would help us gain a better understanding of how these cues can be used for effective communication about possible events, with obvious implications for individuals with communication disorders.

In sum, the present study supports previous findings in demonstrating that readers activate mental representations of described events during language comprehension. Furthermore, it suggests that introducing uncertainty into a discourse (knows vs. thinks) necessitates the activation of at least one additional representation: the object’s implied alternative form. The results reported here suggest that accessing these mental representations occurs as part of a time-consuming process, which maintains an advantage for the explicitly described events over the possible-world alternatives. Thus, these results point to general cognitive difficulty in representing and manipulating uncertain, as compared with certain, events.