Sentence-based mental simulations: Evidence from behavioral experiments using garden-path sentences

Schütt, Emanuel; Dudschig, Carolin; Bergen, Benjamin K.; Kaup, Barbara

doi:10.3758/s13421-022-01367-2

Open access
Published: 28 October 2022

Volume 51, pages 952–965, (2023)

Memory & Cognition

Emanuel Schütt ORCID: orcid.org/0000-0002-8257-167X¹,
Carolin Dudschig¹,
Benjamin K. Bergen² &
…
Barbara Kaup¹

Abstract

Language comprehenders activate mental representations of sensorimotor experiences related to the content of utterances they process. However, it is still unclear whether these sensorimotor simulations are driven by associations with words or by a more complex process of meaning composition into larger linguistic expressions, such as sentences. In two experiments, we investigated whether comprehenders indeed create sentence-based simulations. Materials were constructed such that simulation effects could only emerge from sentence meaning and not from word-based associations alone. We additionally asked when during sentence processing these simulations are constructed, using a garden-path paradigm. Participants read either a garden-path sentence (e.g., “As Mary ate the egg was in the fridge”) or a corresponding unambiguous control with the same meaning and words (e.g., “The egg was in the fridge as Mary ate”). Participants then judged whether a depicted entity was mentioned in the sentence or not. In both experiments, picture response times were faster when the picture was compatible (vs. incompatible) with the sentence-based interpretation of the target entity (e.g., both for garden-path and control sentence: an unpeeled egg), suggesting that participants created simulations based on the sentence content and that these simulations operated only over the sentence as a whole.

Throughout the past two decades, the embodied cognition view has had an increasing influence on research concerned with human cognition (e.g., Chatterjee, 2010). This is especially true for the area of language comprehension. Embodied cognition views of human language comprehension (e.g., Barsalou, 1999; Bergen, 2012; Glenberg & Kaschak, 2002; Zwaan & Madden, 2005) propose that comprehenders grasp the meaning of a word by mentally simulating the word’s referent. Concretely, upon hearing or reading a word, comprehenders are thought to reactivate sensorimotor experiences that are associated with its referent. For example, when hearing the word “sky”, comprehenders might reactivate the experience of perceiving the typical color of the sky (i.e., blue) or the experience of looking up. Importantly, words are hypothesized to activate such sensorimotor experiences because, particularly during childhood, they tend to regularly co-occur with their referents in everyday life (Vogt et al., 2019). For instance, a mother may point at a cat and say to her child: “Look! A cat”.

Naturally, words are typically encountered together with other words forming phrases and sentences that convey meaning beyond the word level. Embodied cognition views of language comprehension suggest that comprehenders combine reactivated word-based sensorimotor experiences to create mental simulations corresponding to the meaning of the phrase or sentence in question. Thus, the literature proposes two types of simulation mechanism: Word-based simulations that are sensorimotor experiences triggered by individual words and sentence-based simulations that result from merging word-based simulations to obtain a combined meaning on phrasal or sentential level (see, for instance, Kaup et al., 2016).

Evidence for the word-based mechanism is extensive. A number of behavioral studies have demonstrated that comprehenders reactivate spatial experiences when they encounter words whose referents are typically associated with an upper or lower vertical location (implicit location words; e.g., “satellite” vs. “grave”). For example, in a study by Lachmair et al. (2011), implicit location words were shown centered on the screen and in different font colors. Participants were asked to respond to the font color of the words by performing upward and downward arm movements. Crucially, even though the task did not require lexical access, response times were faster when the movement direction matched the typical vertical location of the implicit location word. The identical pattern of results was obtained in further studies using highly similar materials and experimental procedures (e.g., Ahlberg et al., 2018; Dudschig et al., 2012, 2014, 2015; Öttl et al., 2017; Schütt et al., 2022; Thornton et al., 2013; Vogt et al., 2019). In another study by Dunn et al. (2014), participants made lexical decisions on auditorily presented implicit location words, non-spatial words, and non-words. Concretely, they provided their decision by fixating a target located above or below the screen center. As it turned out, initiating saccadic eye movements was faster when the vertical location typically associated with the implicit location word matched the saccade direction (e.g., when participants performed an upward saccade to state that “moon” is a word; see also Dudschig et al., 2013). Interestingly, Ansorge et al. (2010) used simple German words referring to an upper or lower spatial position on the vertical axis (translated into English: “on top”; “above”; “upward”; “high”; “downward”; “deep”; “down”; “below”) as primes and targets in a masked priming paradigm. 
The task was to press a higher response key when the target referred to a higher spatial position and to press a lower response key when the target referred to a lower spatial position. The results revealed that response times were faster when prime and target words had the same spatial feature (e.g., for the prime–target pair “above–high”). This illustrates that even rather unconscious word processing can influence subsequent sensorimotor processing. Moreover, there is also evidence from neuroscience suggesting that word processing involves reactivating sensorimotor experiences. For instance, reading odor-related words (e.g., “cinnamon”; “garlic”; “jasmine”) compared with reading odor-neutral words (e.g., “coat”; “poker”; “glasses”) elicited activation in the primary olfactory cortex (González et al., 2006). Similarly, reading action verbs associated with movements of the face, the arm, or the leg (e.g., to “lick”; “pick”; “kick”) evoked a somatotopic activation in motor and premotor brain areas that are related to actual movements of the tongue, the fingers, or the feet (Hauk et al., 2004; but see Miller et al., 2018).

In contrast, the situation is much less clear for sentence-based simulations conveying meaning beyond the word level. Even though there has been research using sentence-based materials and producing results usually considered as simulation effects, it is uncertain whether these results reflect specifically sentence-based and not merely word-based simulation processes. A prototypical example for this issue is the seminal work of Zwaan et al. (2002), which asked whether language comprehenders mentally simulate the shapes of mentioned entities. Participants read sentences referring to entities in specific locations that modulated the implied shape of the entity in question. For instance, reading the sentence “The ranger saw the eagle in the sky” should trigger the simulation of an eagle with outstretched wings, but reading the sentence “The ranger saw the eagle in its nest” should be more likely to trigger the simulation of an eagle with folded wings. After reading sentences like these, participants saw an image of the entity and decided whether it had been mentioned in the sentence. The results revealed faster responses when the shape implied by the sentence matched the shape depicted in the image. Even though these effects were in response to sentences, it is also plausible that they were driven by individual words within those sentences, irrespective of sentential meaning composition. For example, associations with the words “eagle” and “sky” might have produced the mental simulation of an eagle with outstretched wings, whereas encountering the words “eagle” and “nest” might have elicited the mental simulation of an eagle with folded wings. In line with this interpretation, Kaup et al. (2007) found comparable simulation effects when sentences included a negation marker (e.g., “The eagle was not in the sky/nest”). 
Response times were faster when the picture matched the situation that was negated (e.g., an eagle with outstretched/folded wings) than when the picture matched the situation that was actually conveyed by the sentential meaning (e.g., an eagle with folded/outstretched wings). In addition, participants have been found to react faster to picture probes showing specific entity shapes indicated by sets of content words presented in word lists (Kaup et al., 2012). In sum, this line of work shows that simulation effects in response to sentences may or may not be attributable to simulation processes beyond the word level.

This same confound applies to other influential work in the area. The action-sentence compatibility effect (ACE; Glenberg & Kaschak, 2002) is the observation that participants are faster to judge the sensibility of sentences when the direction of the response movement matches (vs. mismatches) the direction of the action described in the sentences (e.g., when reacting with a movement towards the body to sentences such as “Courtney handed you the notebook” or “Andy delivered the pizza to you” compared with sentences such as “You handed Courtney the notebook” or “You delivered the pizza to Andy”). This ACE has recently been found to be hard to replicate (Morey et al., 2022; Winter et al., 2022). But even if the effect is real, it does not necessarily reflect simulation processes regarding the sentential meaning as a whole. Rather, participants might be reactivating sensorimotor experiences related to those words mentioned at the end of the sentence (e.g., “to you” and “handed you”: movement towards the body; “to Andy” and “handed Courtney”: movement away from the body), immediately before engaging their own motor response. The same is true of a similar paradigm, in which participants read sentences describing clockwise or counterclockwise manual rotations (e.g., “Jenny screwed in the light bulb” vs. “Liza opened the pickle jar”) while turning a knob device clockwise or counterclockwise (e.g., Capuano et al., 2022; Claus, 2015; Zwaan & Taylor, 2006). Once again, ACEs obtained in the context of this paradigm could equally be explained in terms of word-based effects.

Finally, this issue also appears to apply to experiments conducted in the context of a set of studies addressing the activation of specific hand-action representations during language comprehension (e.g., Bub & Masson, 2010; Masson et al., 2013; Masson, Bub, & Newton-Taylor, 2008; Masson, Bub, & Warren, 2008). In general, these studies distinguish functional hand actions related to interacting with an object according to its common function (e.g., pulling the trigger of a water pistol) and volumetric hand actions related to picking up or holding an object. For instance, in one experiment by Bub and Masson (2010) that might be affected by the confound discussed here, participants were presented with context sentences implying a functional or a volumetric hand action (e.g., “David wrote with the pencil” vs. “Bert picked up the pencil”). After a short or long delay (300 vs. 750 ms), which was accompanied by an image of the target referent (here: a pencil), a cue appeared prompting participants to perform an unrelated hand action or the functional or volumetric hand action typically associated with the object referenced in the sentence. Irrespective of the sentence context, there was a priming effect on response latencies for functional and volumetric actions after the short delay. In contrast, after the long delay, a priming effect was only present when the sentence context and the hand action matched. This might suggest that participants created a hand-action representation reflecting the sentential meaning over the course of time. However, it is again conceivable that these effects resulted from single words included in the sentences. For example, associations with the words “wrote” and “pencil” could have caused a functional hand-action representation, whereas reading the words “picked up” and “pencil” might have evoked a volumetric hand-action representation. 
Consequently—just as in the case of the examples outlined previously—the obtained results can be attributed to either sentence-based or word-based simulation effects.

The most compelling evidence for sentence-based simulation effects comes from studies using grammatical modifications to sentences for driving changes in simulation effects. For instance, Taylor and Zwaan (2008) observed that the rotational ACE persisted when a postverbal adverb referred to the matching action (e.g., “He found a new light bulb which he screwed in rapidly”) but ended when the postverbal adverb addressed the acting individual (e.g., “On the shelf, he found a closed jar which he opened hungrily”). Similarly, Bergen and Wheeler (2010) found that progressive sentences (e.g., “Beverley is closing/opening the drawer”) induce an ACE, whereas perfect sentences (e.g., “Beverley closed/opened the drawer”) do not. Another line of work (Bergen et al., 2007) showed that verbs of upwards or downwards motion provoke simulation effects when combined with concrete nouns (e.g., “The cork rocketed”), but not when combined with abstract nouns (e.g., “The numbers rocketed”). Moreover, a study by Bidet-Ildei, Gimenes, Toussaint, Almecija, et al. (2017) revealed that sentence plausibility can affect the judgment about biological motions. The visual detection capacity for human actions displayed under point-light conditions was better when an auditorily presented sentence including a congruent action verb was plausible compared with implausible (e.g., “The neighbor is running in the garden” vs. “The garden is running in the neighbor”), suggesting that simulations were influenced by contextual aspects beyond the word level (for related work, see Bidet-Ildei et al., 2020; Bidet-Ildei, Gimenes, Toussaint, Beauprez, et al., 2017). In general, however, findings suggesting sentence-based simulations are few and in some cases rely on effects that are hard to replicate.

Taken together, there is little doubt that comprehenders indeed generate word-based simulations, whereas clear evidence in favor of sentence-based simulations is still sparse. Therefore, the first aim of the present research was to provide a new method for investigating whether comprehenders engage in creating mental simulations beyond the word level when processing sentential materials. To this end, we adapted the sentence–picture verification framework (see Zwaan et al., 2002), building sentential materials containing words that independently should equally well activate both entity shapes that match the final sentence meaning and entity shapes that do not. For instance, a sentence like “The egg was in the fridge as Mary ate” includes the word “egg”, denoting an object that can take on different shapes, such as intact in its shell (i.e., unpeeled) versus cracked open and peeled. The sentence by design comprises a word associated with each of these shapes—“fridge” with the intact egg and “ate” with the cracked and peeled egg. A sentence–picture compatibility effect to a sentence like this would therefore be unlikely to derive from lexical associations alone.

As a second-order question, we also interrogated the time course of simulation processes during sentence comprehension. If simulations are constructed on the basis of language structures larger than the word alone, then does this occur incrementally over the course of processing an utterance, or does it wait until the end of a sentence, manifesting as a sort of sentential wrap-up effect? The incrementality of sentential mental simulations is an issue that has barely been tackled. Available evidence stems from research investigating the modulation of the rotational ACE during sentence comprehension. For instance, in the study by Zwaan and Taylor (2006), participants turned a knob device clockwise or counterclockwise to read sentences describing manual rotations frame by frame in a self-paced manner (e.g., “He / realized / that / the music / was / too loud / so he / turned down / the / volume”). Interestingly, the authors found that the rotational ACE occurred when encountering the critical verb region referring to the manual rotation movement (e.g., “turned down”). This suggests that participants immediately created motor simulations and did not wait until the end of the sentence. In another line of work, Sato et al. (2013) made use of the verb-final word order of the Japanese language. In one of their experiments, they investigated whether comprehenders build specific object shape simulations even before reaching the verb at the end of a sentence. Participants were presented with sentences generating the expectation of a certain object shape prior to encountering the verb. For instance, an item paraphrased as “Mother put the shirt neatly in the drawer” was arranged in the typical Japanese word order: “Mother-NOM shirt-ACC drawer-LOC neatly put”. Crucially, reading the preverbal arguments could provide sufficiently constraining information for the comprehender to infer that the shape of the shirt was folded. 
To test for this early activation of scene-compatible object shape, participants responded to a picture probe before the verb appeared. The results showed faster responses when the shape implied by the preverbal phrase matched the depicted shape, indicating that detailed object shape simulations were created even though critical information was still missing. However, as the authors themselves noted, it is quite possible that exposure to the picture probe itself prompted the participants to form detailed object shape simulations to perform the task, even if they would not have done so spontaneously during more naturalistic language processing. Thus, based on the few currently existing findings, it remains unknown whether language comprehenders routinely create incremental simulations during sentence processing. A second aim of our research therefore was to evaluate whether comprehenders construct mental simulations incrementally when reading sentences.

In order to investigate whether language comprehenders engage in forming sentence-based simulations and whether these are built in an incremental manner over the course of sentence processing, we presented participants with two kinds of sentences. The first were unambiguous sentences (e.g., “The egg was in the fridge as Mary ate”) comprising words that were associated with multiple possible shapes as described above. The second were manipulated versions of those same sentences, so-called garden-path sentences, which used the same words but were temporarily ambiguous (e.g., “As Mary ate the egg was in the fridge”). Typically, comprehenders interpret the first verb of such garden-path sentences as transitive (in the example: “Mary ate the egg”). If comprehenders form incremental simulations, they should thus activate an initial shape interpretation (e.g., a ready-to-eat egg). However, when arriving at the second verb, where they have to reanalyze the sentence, they should then activate the final sentence-based shape interpretation (e.g., an unpeeled egg in its shell). Previous research on incrementality of semantic and syntactic processing has found that the initial syntactic or semantic interpretation created during garden-path processing tends to linger after the sentence has been reanalyzed (Christianson et al., 2001; Patson et al., 2009; Slattery et al., 2013). So, if participants construct incremental simulations, there should be evidence of both object shape interpretations being activated at the end of the garden-path sentences. By contrast, the unambiguous control sentences should only evoke a single entity interpretation reflecting the sentence-based object shape interpretation (e.g., an unpeeled egg in its shell).

In each trial, participants read either a garden-path or a control sentence, followed by a picture probe displaying the target entity (e.g., a ready-to-eat egg vs. an unpeeled egg in its shell). If language comprehenders create mental simulations on the basis of the sentence as a whole, we should see faster picture-verification times when the picture probe matched the sentence-based interpretation of the target entity. However, if simulation effects are driven by independent word associations, then there should be no such difference—sentences like “The egg was in the fridge as Mary ate” include the same number of words consistent with each of the two possible depicted shapes of an egg. Moreover, if language comprehenders create sentence-level simulations incrementally, then the sentence–picture compatibility effect should be larger for unambiguous sentences than for garden-path sentences; since they will have representations corresponding to both shapes active at the end of the sentence in the garden-path condition, there should be a smaller difference between response times to the pictures or none at all.

Experiment 1

Method

Participants

We aimed to collect data from N = 96 participants through Amazon Mechanical Turk. All participants reported being right-handed native English speakers. They also declared normal or corrected-to-normal vision. Their ages ranged from 23 to 60 years (M = 38.21 years, SD = 9.60 years). There were 41 female and 55 male participants. In total, 19 additional participants completed the experiment, but were excluded and replaced due to an error rate higher than 25% in at least one experimental condition or on the filler trials. All participants gave informed consent and received $4.00 in return for participation. The experiment took about 20 to 30 minutes to complete. The study was approved by the Ethics Committee for Psychological Research at the University of Tübingen (Identifier: 2018_0831_132).

Apparatus and stimuli

We employed the open-source JavaScript library jsPsych (Version 6.1.0; de Leeuw, 2015) to implement a browser-based experiment. Participants were explicitly asked to use a laptop or a desktop computer for participating. They pressed the space bar to start trials and indicate that they had read and understood a sentence. The “d” key and the “k” key served as response keys in the sentence–picture verification task.

For experimental trials, we created 36 pairs of critical sentences. Each pair included one garden-path sentence (e.g., “While Amber hunted the turkey was on the table”; “As Zoe bathed the baby slept in the bed”; “While Ryan won the car was in poor condition”) and one matching unambiguous control sentence (e.g., “The turkey was on the table while Amber hunted”; “The baby slept in the bed as Zoe bathed”; “The car was in poor condition while Ryan won”). As described above, comprehenders tend to interpret the first verb of such garden-path sentences as transitive, constructing an initial interpretation of the target entity mentioned in the sentence (e.g., a living turkey; a baby in a bathtub; a car in brand-new condition; see, for instance, Christianson et al., 2001). Importantly, this initial interpretation corresponds to an intermediate processing step as comprehenders have to reanalyze the sentence when encountering the second verb, which should lead to creating a final sentence-based interpretation of the target entity (e.g., a ready-to-eat turkey; a dressed baby lying in the bed; a squalid car). Unambiguous control sentences, however, required comprehenders to form only a single interpretation of the target entity reflecting the sentence-based meaning (e.g., a ready-to-eat turkey; a dressed baby lying in the bed; a squalid car). As correctly answering experimental sentences always meant giving a “yes” response during the sentence–picture verification task, we also generated 36 filler sentences demanding a “no” response. Three fourths of the filler sentences followed the structure of unambiguous control sentences (e.g., “The scarf was in the washing machine as Bill knitted”); the remaining fourth of the filler sentences had the same structure as the garden-path sentences (e.g., “While Samuel ordered the fish swam upstream”). This reduced the proportion of sentences with garden-path structure participants encountered throughout the experiment. 
This in turn gave them fewer chances to learn the sentence structures and draw conclusions, reducing the likelihood that they would develop specific strategies with respect to garden-path processing (e.g., avoiding the initial entity interpretation). Four additional experimental sentences and four filler sentences were created for the practice session. Our sentential materials were partially adapted from or inspired by prior research (Christianson et al., 2001; Slattery et al., 2013; van Gompel et al., 2006).

For each pair of critical sentences, there were two pictures showing the respective target entity mentioned in the sentences. One of the pictures depicted the entity in the shape implied by the initial interpretation that could be inferred during garden-path processing. The other picture displayed the entity in the shape corresponding to the sentence-based interpretation, which was always the same for both garden-path and unambiguous control sentences. For instance, for the garden-path sentence “While Amber hunted the turkey was on the table” and the corresponding unambiguous control sentence “The turkey was on the table while Amber hunted”, one picture showed a living turkey (initial entity interpretation during garden-path processing), whereas the other picture depicted a ready-to-eat turkey as served at Thanksgiving (sentence-based entity interpretation). A pretest ensured that the pictures referring to the target entity were comparable with respect to how clearly they depicted the entity irrespective of the shape (all ps > .05; see Footnote 1). In filler trials, we presented participants with pictures showing an entity not mentioned in the respective filler sentence. For example, the filler sentence “The scarf was in the washing machine as Bill knitted” was followed by a picture of green olives. Pictures were in color and scaled to a size of 768 (width) × 576 (height) pixels. Some example materials are given in Table 1 as well as in Fig. 1. For copyright reasons, we are not able to make the pictures publicly available. However, all sentential and pictorial materials will be made accessible upon scientific request (please contact the corresponding author).

Table 1 Examples of the sentential materials used in Experiments 1 and 2

Procedure

We instructed participants to complete the experiment in an environment free of distractions. Each trial started with the prompt “Please press the space bar to initiate the trial”. After the space bar was pressed, a fixation cross (“+”; 800 ms) appeared centered on the screen. Then the fixation cross was replaced by a critical or a filler sentence. Participants were asked to read the sentence at a normal pace and to press the space bar once they had read and understood it. After this, a blank screen followed (500 ms), before participants were presented with the picture. Their task was to judge whether the depicted entity was mentioned in the previous sentence or not. Half of the participants pressed the “d” key for a “yes” response and the “k” key for a “no” response. For the other half of the participants, the response mapping was reversed. They were instructed to provide their response as fast as possible. During practice trials, participants received feedback regarding their response accuracy (“Correct!” vs. “Wrong!”; 1000 ms). The intertrial interval was 1500 ms. Participants first completed a practice session. Subsequently, they performed three experimental blocks, each consisting of 12 critical and 12 filler trials. The conditions for each item (sentence type: garden-path vs. control sentence; picture type: compatible vs. incompatible with the sentence-based entity interpretation) were counterbalanced using four lists. Likewise, conditions were counterbalanced within the blocks. The order of trials was randomized. After each block, there was a self-paced break.
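The counterbalancing described above can be illustrated with a short sketch. It is written in Python purely for illustration and is not the experiment code (the experiment itself ran in jsPsych). The Latin-square rotation, the function names `build_lists` and `block_of`, and the rule that consecutive items share a block are all assumptions; the text only states that four lists were used and that conditions were balanced within blocks.

```python
from collections import Counter
from itertools import product

# The four cells of the 2 x 2 design (sentence type x picture type).
CONDITIONS = list(product(("garden-path", "control"),
                          ("compatible", "incompatible")))


def build_lists(n_items=36, n_lists=4):
    """Latin-square rotation: every list shows each item once, in one condition."""
    return [
        {item: CONDITIONS[(item + shift) % len(CONDITIONS)]
         for item in range(n_items)}
        for shift in range(n_lists)
    ]


def block_of(item, block_size=12):
    """Hypothetical blocking rule (an assumption): consecutive items share a block."""
    return item // block_size


lists = build_lists()

# Count how often each condition occurs in each block of the first list.
per_block = Counter(
    (block_of(item), cond) for item, cond in lists[0].items()
)
```

Under these assumptions, each list assigns 9 of the 36 critical items to each cell of the design, each block of 12 critical trials contains 3 items per cell, and every item appears in all four conditions across the four lists.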

Design and data analysis

The experiment had a 2 × 2 within-subjects design, including the factors sentence type (garden-path vs. unambiguous control sentence) and picture type (compatible vs. incompatible with the sentence-based entity interpretation). Importantly, regarding the factor “picture type”, the level “incompatible with the sentence-based entity interpretation” reflected the initial entity interpretation during garden-path processing (i.e., this particular entity interpretation should not be formed during control sentences). The time period from the occurrence of the picture on the screen until pressing the response key (picture response time) served as dependent variable. All data and R analysis scripts are publicly available online (https://doi.org/10.5281/zenodo.6504181).

We preprocessed and analyzed picture response times using the free statistical software R (Version 4.1.1). First, we removed filler trials and incorrectly answered critical trials. Next, we eliminated extreme outliers (picture response times shorter than 150 ms or longer than 3000 ms). Finally, to detect further outliers, we applied the two-step procedure proposed by Kaup et al. (2006): We transformed the picture response times of each participant to z-scores and discarded picture response times with a z-score that deviated by more than 2.5 standard deviations from the mean z-score of the respective item in the respective condition. In all, outlier exclusion reduced the data set by less than 4%.

We used the R package lme4 (Version 1.1-27.1; Bates et al., 2015) to build a linear mixed model (see Baayen et al., 2008). Our model contained fixed effects for sentence type, picture type, and the interaction of both factors. To arrive at a suitable random effects structure, we referred to the data-driven model selection criterion introduced by Matuschek et al. (2017), which aims at balancing Type I error rate and power. When performing the procedure, however, we obtained warning messages indicating singular fits and convergence issues for the more complex models. Consequently, our final model was restricted to random intercepts for participants and items. To assess the significance of the fixed effects, we employed the function mixed from the R package afex (Version 1.0-1; Singmann et al., 2021), which estimates mixed models based on lme4. We calculated p values through likelihood ratio tests (i.e., we chose the option “LRT” for the argument method within the function mixed). Generally, this means that the goodness of fit of a model with a specific fixed effect is compared with the goodness of fit of a model without this effect by referring to the ratio of their likelihoods. With the function mixed, the complete model with all fixed effects under consideration must be entered; the function then automatically builds suitable reduced models and performs likelihood ratio tests to compute p values for all fixed effects included in the complete model.
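For reference, the comparison underlying a likelihood ratio test can be written in its general textbook form. The notation below is generic and is not taken from the analysis scripts: L_full and L_reduced stand for the maximized likelihoods of the model with and without the fixed effect in question, and k for the number of estimated parameters in each model.

```latex
\chi^2 = -2\,\bigl[\ln L_{\text{reduced}} - \ln L_{\text{full}}\bigr],
\qquad
\mathrm{df} = k_{\text{full}} - k_{\text{reduced}}
```

Under the null hypothesis that the removed effect is zero, this statistic is approximately chi-square distributed with the stated degrees of freedom. Dropping a single two-level factor or the interaction of two such factors removes one parameter, which corresponds to the χ²(1) values reported for each fixed effect.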

We also evaluated reading times (i.e., the time from the appearance of the sentence on the screen until the space bar was pressed). In particular, we were interested in whether reading times were modulated by the factor sentence type. For this purpose, we processed and analyzed reading times in the same way as picture response times, with the following adaptations. First, we defined extreme outliers as reading times shorter than 500 ms or longer than 7000 ms. In total, removing outliers (including the two-step procedure suggested by Kaup et al., 2006) reduced the data set by less than 7%. Second, the mixed model comprised only a fixed effect for the factor sentence type. Again, we discarded some more complex models because of singular fits when determining the random effects structure. The final model contained random intercepts for participants and items and by-item random slopes for sentence type.
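The trimming steps described for both measures can be illustrated with a short Python sketch. It is one possible reading of the verbal description, not a port of the published R scripts. The dictionary keys (`participant`, `item`, `condition`, `rt`), the function names, the use of the sample standard deviation, the inclusive cutoffs, and the handling of cells without any spread are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def z_transform(values):
    """Return z-scores (sample SD); zeros if the SD is undefined or 0."""
    if len(values) < 2:
        return [0.0] * len(values)
    m, s = mean(values), stdev(values)
    if s == 0:
        return [0.0] * len(values)
    return [(v - m) / s for v in values]


def trim(trials, lower, upper, criterion=2.5):
    """Apply absolute cutoffs, then z-score-based trimming.

    trials: list of dicts with the hypothetical keys
            'participant', 'item', 'condition', and 'rt' (in ms).
    """
    # Absolute cutoffs: drop times shorter than `lower` or longer than `upper`.
    kept = [t for t in trials if lower <= t["rt"] <= upper]

    # Step 1: z-transform the remaining times within each participant.
    by_participant = defaultdict(list)
    for i, t in enumerate(kept):
        by_participant[t["participant"]].append(i)
    z = [0.0] * len(kept)
    for idx in by_participant.values():
        scores = z_transform([kept[j]["rt"] for j in idx])
        for i, score in zip(idx, scores):
            z[i] = score

    # Step 2: within each item-by-condition cell, discard trials whose z-score
    # lies more than `criterion` SDs from the cell's mean z-score.
    by_cell = defaultdict(list)
    for i, t in enumerate(kept):
        by_cell[(t["item"], t["condition"])].append(i)
    keep = [True] * len(kept)
    for idx in by_cell.values():
        cell_z = [z[j] for j in idx]
        if len(cell_z) < 2:
            continue  # a single observation cannot be judged against its cell
        m, s = mean(cell_z), stdev(cell_z)
        for i in idx:
            if s > 0 and abs(z[i] - m) > criterion * s:
                keep[i] = False

    # Return the surviving trials in their original order.
    return [t for t, ok in zip(kept, keep) if ok]
```

Under these assumptions, picture response times would be passed with `lower=150, upper=3000` and reading times with `lower=500, upper=7000`; only the cutoffs differ between the two measures, while the z-score step is shared.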

Results and discussion

The data analysis revealed a significant effect of sentence type on reading times, χ²(1) = 17.78, p < .001. As expected, participants needed more time to read garden-path sentences (M = 1812 ms) than unambiguous control sentences (M = 1677 ms). Figure 2 depicts the mean response times for the sentence–picture verification task as a function of sentence type and picture type. For picture response times, the effect of sentence type was not significant, χ²(1) = 0.80, p = .372. However, there was a significant effect of picture type, χ²(1) = 9.42, p = .002, with participants responding faster when the picture probe matched the sentence-based entity interpretation (M = 800 ms) than when it mismatched this interpretation (M = 822 ms).^{Footnote 2} This effect was not significantly modulated by sentence type, χ²(1) = 0.58, p = .446.
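As a reading aid, the relation between the reported χ²(1) statistics and their p values can be reproduced with a few lines of Python. This is a sketch, not part of the original analysis: the function names are illustrative, and the closed-form expression applies only to tests with one degree of freedom, where the upper-tail probability of a chi-square variable equals erfc(√(x/2)).

```python
from math import erfc, sqrt


def lrt_statistic(loglik_full, loglik_reduced):
    """Likelihood ratio statistic: twice the log-likelihood difference."""
    return 2.0 * (loglik_full - loglik_reduced)


def p_value_df1(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 df.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)), so no external library is needed.
    """
    return erfc(sqrt(chi_square / 2.0))


# Statistics as reported in the text (rounded to two decimals there).
reported = {
    "reading times, sentence type": 17.78,
    "picture response times, sentence type": 0.80,
    "picture response times, picture type": 9.42,
    "picture response times, interaction": 0.58,
}

for label, chi_square in reported.items():
    p = p_value_df1(chi_square)
    print(f"{label}: chi2(1) = {chi_square:.2f}, p = {p:.4f}")
```

Because the statistics are entered as rounded in the text, a recomputed p value can differ from the reported one in the last decimal place.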

First, these results can be interpreted as evidence for the validity of the experimental procedure. Specifically, reading times were longer for garden-path sentences than for control sentences. This most likely reflects the additional processing difficulties associated with garden-path sentences (e.g., Ferreira & Henderson, 1991; Frazier & Rayner, 1982; Pickering & Traxler, 1998) and provides indirect evidence that participants read the sentences for comprehension and indeed created an intermediate, incremental interpretation of some kind when facing garden-path sentences. Participants also responded faster to picture probes compatible with the sentence-based entity interpretation. Since our materials were explicitly constructed to be less prone to word-based effects, this suggests that participants indeed generated sentence-based simulations. As argued, this comes against a backdrop of quite sparse evidence in favor of sentence-based simulations.

However, it should be noted that limitations in the construction of the experimental materials did not allow us to manipulate which entity shape interpretation, and thus which picture of a target entity, was associated with the sentence-based interpretation in an individual item. For instance, for the garden-path sentence “As Mary ate the egg was in the fridge” and the control sentence “The egg was in the fridge as Mary ate”, the sentence-based entity interpretation could not be varied and always corresponded to the unpeeled egg in its shell. Therefore, even though the pretest indicated that both pictures related to a target entity depicted this entity with similar clarity, we cannot rule out the possibility that picture probes referring to the sentence-based shape interpretation were somehow preferred for the entities in question and thus led to faster picture response times. Moreover, it remains possible that even if we included words in the sentences that aligned with each of the two entity shape interpretations, these words might nevertheless have had unbalanced effects, such that one shape was more consistent with the aggregate lexical associations of the sentence.

To address these limitations, we conducted a second experiment, adapting the materials in such a way that the sentence-based entity interpretation in control sentences was linked to the opposite shape and picture from the current experiment. Importantly, if the observed effect is due to comprehenders simulating the sentence-based meaning (and is not an artefact of picture preference), the advantage for pictures compatible with the sentence-based entity interpretation should still occur in both garden-path sentences and control sentences. Since we did not observe evidence for the creation of incremental simulations during sentence comprehension in the present experiment (there was no significant interaction of picture type and sentence type), our adaptations of the materials additionally aimed to create more fertile conditions for garden-path effects by making it more difficult to quickly identify this sentence type upon sentence presentation. Prior to starting data collection, we preregistered the experiment (https://aspredicted.org/tq6p9.pdf).