1 Introduction

1.1 The Complement Coercion Phenomenon

It has been proposed that verbs like begin and enjoy carry a selectional restriction and must combine with an event-denoting complement (Pustejovsky 1995; Jackendoff 1997). Evidence in support of this restriction comes from the observation that even in sentences where the complement denotes an individual of the ordinary sort, an eventive interpretation is obtained. For instance, the sentence in (1), despite containing an entity-denoting complement, the book, can be interpreted as making reference to some event involving a book with John as its agent. In (2) and (3), the selectional restrictions of begin are satisfied when it combines with the gerundival complement reading the book or an event-denoting nominal the fight.

  (1) John began/enjoyed the book.

  (2) John began/enjoyed reading the book.

  (3) John began/enjoyed the fight.

This eventive interpretation associated with (1) has been analyzed in the literature as an instance of the broader phenomenon of type coercion.Footnote 1 The hypothesis is that there is a class of verbs that exclusively selects for event-denoting complements. This selectional restriction leads to a mismatch in the semantic representation when such verbs combine with complements denoting ordinary individuals. This mismatch is resolved by a semantic operation called type-shifting (Partee 1987; Partee and Rooth 1983) that coerces the semantic type of the entity-denoting complement into the appropriate event-denoting type (see Pylkkänen 2008 for a summary of descriptions of the hypothesis).

Experimental investigation of this phenomenon has revealed behavioral and neurological patterns that are taken to support this linguistic analysis. Psycholinguistic studies, using a variety of experimental paradigms, report that combining an entity-denoting complement with a coercion verb (John began the book) engenders more processing cost than combining it with a non-coercion verb (John read/wrote the book) during real-time comprehension (Baggio et al. 2010; Frisson and McElree 2008; Katsika et al. 2012; Lapata et al. 2003; McElree et al. 2001, 2006; Pickering et al. 2005, 2006; Scheepers et al. 2004, 2008; Traxler et al. 2002, 2005). On the neurolinguistic side, it has been found that the complement coercion effect recruits three distinct cortical regions: Wernicke’s area (Piñango et al. 2001), ventro-medial prefrontal cortex (vmPFC) (Pylkkänen and McElree 2007; Pylkkänen et al. 2009), and BA45 (Husband et al. 2011). The additional cost has been interpreted as manifesting the type-shifting operation or the mechanism of building an eventive representation from an entity-denoting expression (Frisson and McElree 2008: 8; McElree et al. 2001: 22; Pickering et al. 2005: 9). The observed patterns in particular brain regions have been interpreted as evidence that these areas implement the operations of type-shifting and coercion.

1.2 Challenges to the Type-Shifting Hypothesis

The type-shifting hypothesis is, at its core, a hypothesis about the lexical semantics of so-called “coercion verbs” and the conceptual entities that composition involving such verbs is sensitive to. We propose that this hypothesis as formulated faces at least three challenges which have direct implications for how the experimental results are interpreted.

The first challenge has to do with the underlying linguistic pattern. At least for aspectual verbs, the most representative subset of coercion verbs, the eventive interpretation is not obligatory in the presence of an entity-denoting complement. In their simple transitive uses, the context in which these verbs have been tested, aspectual verbs do not exclusively select for eventive complements and agentive subject-referents. Sentences (4)–(6), found in the Corpus of Contemporary American English (COCA), exemplify such cases. For example, there is no construal of (4) in which the book is coerced into an event of which the new autobiographical memoir is an agentive participant. The same is observed in (5); the sentence does not give rise to an eventive interpretation though the “coercion verb” is followed by an entity-denoting complement (the genealogy of the kings of England).Footnote 2

  (4) Although this is mostly a collection of previously published essays, it is notable because of the new autobiographical memoir that begins the book.

  (5) This image begins the genealogy of the kings of England and flows into materials specifically written for St. Albans.

  (6) This column continues the review of the new PSA Club Services Website.

This observation has been elaborated on in Piñango and Deo (2015) who observe that “not only is there no eventive complement (syntactically explicit or coerced) in these examples, but the sentences themselves are aspectually stative, and it is more accurate to say that they report configurational relations between individuals rather than causal relations between events (p. 10).” These distributional and interpretational patterns for the subset of aspectual verbs cast doubt on the validity of the empirical generalization that underpins the type-shifting analysis.

The second challenge comes from distributional differences in the complements of coercion verbs. Reporting the results of a corpus-based study, Utt et al. (2013) found that aspectual verbs co-occur significantly more often with event-denoting nominals (such as fight in (3)) than psychological verbs do. They call such expressions event-nouns and define them as nouns denoting actions, cognitive processes, or biological processes. These findings, while not incompatible with an event-selecting lexical semantics for coercion verbs, are rather surprising. There should be no reason why psychological verbs, if they lexically select for event-denoting complements, occur far less frequently with such complements than aspectual verbs. Further, this differential frequency distribution suggests that if the type-shifting hypothesis is correct, then the nominal complements of psychological verbs must undergo coercion far more frequently than the nominal complements of aspectual verbs. This asymmetry between the two classes is not naturally reconcilable with the uniform event-selecting lexical semantics for aspectual and psychological verbs that the type-shifting hypothesis relies on.

Finally, the third challenge comes from the real-time implementation of coercion. Specifically, Katsika et al. (2012) observe that the class of coercion verbs investigated in previous studies on complement coercion is semantically heterogeneous, including at least two distinctly separable subclasses: aspectual verbs (e.g. begin, finish, start) and psychological verbs (e.g. enjoy, prefer, endure).Footnote 3 They also observe that when these two subclasses are studied separately, they exhibit distinct processing profiles. The coercion effect (increased cost sometime after the complement head) was observable with the aspectual verb set and not with the psychological verb set.Footnote 4

These three challenges, taken collectively, suggest that the type-shifting hypothesis, and the lexical semantics for coercion verbs that it presupposes, is, at best, problematic. In terms of grammatical possibilities, frequency distribution, and processing behavior, coercion verbs do not appear to form a unified class.Footnote 5

In what follows, we present a hypothesis which, on the basis of a new analysis of aspectual verbs (Piñango and Deo 2015), captures the pattern of real-time comprehension of aspectual verbs observed so far and makes further testable predictions, which we then test from processing and neurological perspectives.

1.3 The Structured Individual (SI) Analysis

Piñango and Deo (2015) propose that aspectual verbs lexically select for complements that are structured individuals, rather than events. This proposal is motivated by the goal of offering a unified analysis for both “coercion configuration” sentences like John began the book and sentences like The chapter on global warming began the book (similar examples, which are not part of the traditional coercion set, were given in (4)–(6)). The intuition underpinning the notion of a structured individual is that of an ordinary entity that maps onto a one-dimensional directed path structure (one-dimensional DPS) along a range of dimensions. A one-dimensional directed path structure is defined, following Krifka (1998), as a totally ordered structure whose adjacent (non-overlapping) parts bear the precedence relation along some dimension (temporal, spatial, eventive, etc.). For instance, one may construe an entity like a bridge, which is a three-dimensional spatial entity, as mapping onto a line (a one-dimensional DPS) along its most salient dimension, say spatial (Jackendoff 1992; Verkuyl and Zwarts 1992). Similarly, the text of a book may be construed as a one-dimensional path structure along the informational dimension, taking its chapters as the adjacent parts. More examples from the COCA are given below, further illustrating the intuition.

  (7) A moving train finishes the display. (Spatial part)

  (8) This January 1 begins the dawn of a new age of attainable resolution. (Temporal part)

  (9) His death begins the Revel. (Eventive part)

  (10) The only adequate or appropriate response to this reality seems to be the expression that both begins and ends the novel. (Informational part)

Following Gawron (2009) and Deo et al. (2013), Piñango and Deo (2015) call such one-dimensional DPSs in any ontological domain an “axis.” In defining the notion of a structured individual, they make reference to axes onto which individuals are mapped. This simply means that the individual is in a homomorphic relation to an (ontologically, conventionally, or pragmatically) given one-dimensional DPS. The predicate axis is taken to be the set of all entities (temporal, spatial, material, or abstract) that are one-dimensional DPSs. An axis is an element from this set.

Structured individuals are defined as entities that are relatable to such axes via homomorphic functions. The formal definition is given in (11). According to (11), an individual x of any type τ is taken to be a structured individual relative to a function f of any type (τ, σ) iff f(x) is an axis and f is a homomorphism from the part structure of x to the axis f(x).

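  (11) \(\forall x_{\tau}\,[\text{struct-ind}_{f_{\langle\tau,\sigma\rangle}}(x) \leftrightarrow [\text{axis}(f(x)) \wedge \forall x', x'' \le x\,[x' \le x'' \leftrightarrow f(x') \le f(x'')]]]\)

    (Piñango and Deo 2015)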
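
Definition (11) can be made concrete with a toy finite model. The sketch below is illustrative only and is not taken from Piñango and Deo (2015). It assumes that the parts of an individual are modelled as sets of atomic pieces, that an axis is a run of consecutive integer positions, and that part-of on both sides is set inclusion. The names `parts_of`, `is_axis`, `struct_ind`, and the sample book are hypothetical.

```python
from itertools import combinations

def parts_of(atoms):
    """Every non-empty sub-collection of atoms, standing in for the parts x' <= x."""
    atoms = list(atoms)
    return [frozenset(c)
            for r in range(1, len(atoms) + 1)
            for c in combinations(atoms, r)]

def is_axis(points):
    """Toy axis(.): a non-empty run of consecutive integers (assumed simplification)."""
    pts = sorted(points)
    return bool(pts) and pts == list(range(pts[0], pts[0] + len(pts)))

def struct_ind(f, atoms):
    """Check (11): f(x) is an axis, and x' <= x'' holds exactly when f(x') <= f(x'')."""
    x = frozenset(atoms)
    if not is_axis(f(x)):
        return False
    ps = parts_of(atoms)
    return all((p <= q) == (f(p) <= f(q)) for p in ps for q in ps)

# Hypothetical individual: a book whose atomic parts are its text units.
BOOK = ["preface", "chapter 1", "chapter 2"]
ORDER = {"preface": 0, "chapter 1": 1, "chapter 2": 2}

def informational(part):
    """Maps a part of the book to the positions its units occupy in the text."""
    return frozenset(ORDER[a] for a in part)

def flat(part):
    """Collapses every part onto one point, so part structure is not preserved."""
    return frozenset({0})

print(struct_ind(informational, BOOK))  # the book is a structured individual here
print(struct_ind(flat, BOOK))           # the collapsing map fails the biconditional
```

Under these assumptions the book qualifies as a structured individual relative to `informational` but not relative to `flat`, which mirrors the point that the status of x depends on the function f.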

Aspectual verbs are analyzed as having both a presuppositional and a truth-conditional component. They carry a lexical presupposition that requires their complement denotation to be a structured individual. Truth-conditionally, they require the subject to map onto some privileged small subpart of the axis determined by the complement denotation. A sentence with an aspectual verb is true iff the subject denotation is construed as a specific (e.g. initial, medial, final) subpart of the axis that the complement denotation is mapped to.

The sample lexical entry for begin in (12) (Piñango and Deo 2015) illustrates this general schema assumed for aspectual verbs. The function \(f_c\) that maps the complement denotation to the axis comes from a (lexically encoded) set of functions, each associated with a dimension. The idea encapsulated in the definition is that aspectual verbs are sensitive to whether their complement denotation can be construed in context as a structured individual, i.e. whether it is possible given contextual information to map it to an axis along some dimension. The contextually accessed function \(f_c\) reflects this contextual reliance. Once the presupposition that the complement denotes a structured individual is met, aspectual verbs require that there be some function \(f'\) (\(\exists f'\)) such that it maps the subject denotation to a small initial part (\(<_{\text{small-init}}\)) of the axis given by \(f_c(x)\).Footnote 6

  (12) a. \([\![\text{begin}]\!] = \lambda x_{\tau}\,\lambda y_{\sigma} : \text{struct-ind}_{f_c}(x).\ \exists f'\,[f'(y) <_{\text{small-init}} f_c(x)]\)

       b. Begin(x)(y) is defined iff x is a structured individual with respect to the contextually determined function \(f_c\). If defined, begin(x)(y) is true iff there is some function \(f'\) (possibly identical to \(f_c\)) such that \(f'(y)\) is a small initial subpart of the axis \(f_c(x)\).

    (Piñango and Deo 2015)

With this generalized lexical meaning, aspectual verbs are able to combine with arguments of different semantic types to yield a range of possible interpretations.

Consider the example in (7):

  (7) A moving train finishes the display. (Spatial part)

In (7), the complement (the display) is construable as a structured individual along the spatial dimension, mapping via the spatial trace function σ to its spatial extent, which yields an axis along the spatial dimension (the spatial extent of the display). The sentence is true iff the spatial extent of the entity denoted by a moving train is a small final subpart of the axis that the display maps onto along the spatial dimension.

The selection of the functions that map the complement denotation and the subject denotation to the axis is constrained by both the properties of the complement and context. The lexical meaning of the complement must encode information about the possible dimensions that are relevant to understanding the concept denoted by the complement. For instance, an expression like magazine must contain information about its spatial as well as informational structure; expressions like bridge and river must contain the information that they have a salient spatial dimension. This information will be mined during composition with aspectual verbs (and probably in other cases as well), for determining the dimension along which the complement is to be construed as a structured individual in a given context.

The structured individual analysis treats the meanings of sentences containing aspectual verbs as underdetermined, with full determination being dependent on which dimension is chosen for interpretation in a given context. Take “The tedious preface begins the novel” as an example. Here the complement can be conceptualized as a body of informational content, that is, an axis along the informational dimension. The tedious preface maps to its informational content and is asserted to be a small initial subpart of the informational content of the novel. In the informational dimension, the physical form (e.g. size, number of pages) of the novel is irrelevant. It does not matter which page the content of the preface appears on; the preface could be printed on page 5, preceded by the title page, the copyright page, or the table of contents. Since on this construal the truth of the sentence is determined by the informational extent of the two arguments, the sentence can be true here and yet false on the spatial interpretation, in which the preface is printed on page 5. On that alternative construal, the novel is conceptualized as a spatial entity, that is, an axis along the spatial dimension, in which the novel denotes a structured individual consisting of physical pages of a particular size, with a pagination order. In the spatial dimension, the sentence is true iff the preface is printed on the first page of the novel. A printer might say something like “It is weird that, instead of the half-title and the copyright page, it is the preface that begins the novel.” In this context, no reference is made to the informational content of the preface.

Given that entity-denoting expressions may map onto axes in multiple dimensions, comprehenders must rely on the context to determine the exact function that determines the dimension along which the axis is construed. In the context of formatting or printing the novel, the spatial dimension will be chosen; in the context of reading the novel, the informational dimension is more relevant, and therefore the more salient dimension.
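
The dimension-dependence of “The tedious preface begins the novel” can be sketched in code, following the schema in (12). This is a simplified illustration under stated assumptions, not the authors' formalization: axes are modelled as ordered lists of units, the subject is mapped to a single unit, and “small” is approximated by an arbitrary 25% cutoff. The names `NOVEL`, `dimension`, `is_small_initial`, `begin`, and `as_unit` are hypothetical.

```python
from typing import Callable, List, Optional

# Hypothetical novel: the same entity yields a different axis on each dimension.
NOVEL = {
    "spatial": ["title page", "copyright page", "table of contents",
                "preface", "chapter 1", "chapter 2", "chapter 3", "chapter 4"],
    "informational": ["preface", "chapter 1", "chapter 2", "chapter 3", "chapter 4"],
}

def dimension(name: str) -> Callable[[dict], Optional[List[str]]]:
    """Build a dimension function f_c; it returns None if the entity has no such axis."""
    return lambda entity: entity.get(name)

def is_small_initial(part: List[str], axis: List[str], max_share: float = 0.25) -> bool:
    """True if part is a proper prefix of axis covering at most max_share of it.

    The 25% cutoff is an assumed stand-in for 'small'.
    """
    n = len(part)
    return 0 < n < len(axis) and axis[:n] == part and n / len(axis) <= max_share

def begin(f_c, x, f_prime, y) -> Optional[bool]:
    """Sketch of (12): None models presupposition failure, otherwise a truth value."""
    axis = f_c(x)
    if axis is None:
        return None
    return is_small_initial(f_prime(y), axis)

def as_unit(y: str) -> List[str]:
    """Simplified f': the subject occupies exactly one unit of the axis."""
    return [y]

for dim in ("informational", "spatial", "temporal"):
    print(dim, begin(dimension(dim), NOVEL, as_unit, "preface"))
```

In this toy model the sentence comes out true on the informational axis, false on the spatial axis, and undefined on a dimension for which the novel supplies no axis. Choosing among such candidate functions is what the context must do.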

The “coercion” uses of aspectual verbs are naturally accommodated within this general analysis. This configuration is characterizable as one in which an aspectual verb combines with an animate agentive argument and an entity-denoting complement, with the resulting reading that the animate subject referent is the agent of some implicit dynamic eventuality. These cases are analyzed as involving functions that map entities to the events that they are participants of, namely inverse thematic functions. While thematic roles (e.g. agent, patient, theme) map events to their participants as the actor or undergoer, inverse thematic functions are defined as functions that “map pairs of individuals and times to the smallest event that the individual bears a participant role to at that time in a given context” (Piñango and Deo 2015). So in an intuitive way, inverse thematic functions are ways of accessing events via the individuals that participate in them, rather than accessing individuals via the participant role that they bear in an event. \({f_{a{g_i}}}\) maps an individual to the smallest event that they are the agentive participant of at the reference interval i in a given context. \({f_{t{h_i}}}\) maps an individual to the smallest event that they are the patient/theme of at i in a given context.

The salient reading for a sentence like “Jane Austen began the anthology” is one that involves an eventive dimension (Jane Austen began reading/writing/etc. the anthology). In this case, the axis is construed as some event of which Jane Austen and the anthology are participants. The function \(f_c\) is taken to be \({f_{t{h_i}}}\), which maps the complement, the anthology (as a structured individual), to the smallest event of which it is the theme at the reference time. The subject, Jane Austen, is mapped to the smallest event of which it is the agent by the inverse thematic function \({f_{a{g_i}}}\). The sentence comes out true if the smallest event of which Jane Austen is an agentive participant at a reference interval i in a given context is a small initial part of the event of which the anthology is a theme participant at i. This is the agentive reading of the sentence.

However, the sentence also has another, constitutive, reading in which the individuals, Jane Austen and the anthology, get mapped to an informational axis, such that the informational content corresponding to Jane Austen is understood to be a small initial subpart of the informational content corresponding to the anthology. In order to achieve this interpretation, metonymy has to be applied to the subject, yielding an interpretation that may be paraphrased as [the work of Jane Austen] began the anthology.

Although one reading might appear to be more salient than the other, the agentive and constitutive readings are both possible in principle and are determined based on context and plausibility. Thus the indeterminacy of interpretation (agentive and constitutive) also extends to aspectual verb sentences in the coercion configuration.

The underspecification of meanings for aspectual verbs in the proposed analysis may give rise to a concern that the Structured Individual analysis may over-generalize. Indeed, a reviewer notes that sentences like “The jogger began the bridge” sound awkward but are, in fact, predicted to be acceptable by our analysis, given the spatial structure of the bridge. We agree that such sentences are predicted to be acceptable and note that, given the right context, such readings do become available. Consider, for example, the web-attested sentences (13–14).

  (13) Fred Sebolt began to get rocks for the Wagon Bridge at Liberty Falls in the last week in August, began the bridge in September 1881. (→ began building the bridge)

  (14) So with 25 miles left my teammate and I attached and began the bridge in a headwind. (→ began cycling across the bridge)

Here, we note that the Structured Individual analysis, in fact, constrains the properties of the eventuality that is construed as the relevant axis in the coercion configuration. This is because the structured individual presupposition requires that the contextually determined \(f_c\) be a homomorphism from the part structure of the complement denotation to the part structure of the axis. The axial eventuality must therefore be one in which the part structure of the complement denotation is incrementally related to the course of the event; that is, the complement of begin, finish, etc. must be interpreted as an incremental theme argument of the implicitly construed eventuality. This is indeed the case with coercion sentences like John began the book, which cannot make reference to an event in which the book is not construed incrementally (such as an event of John playing with or seeing the book). This incrementality constraint on the interpretation of the eventuality associated with aspectual verbs in the coercion configuration falls out as a natural consequence of the lexical meanings for aspectual verbs assumed in the Structured Individual analysis, but must be stipulated in a type-shifting semantics for these verbs.

To summarize, the Structured Individual analysis not only grounds the lexical meanings proposed (i.e., dimension functions) in independently motivated conceptual properties, but also makes no assumptions about selectional restrictions with respect to events. Therefore, under this analysis, the comprehension of aspectual verbs requires neither a type mismatch nor the implementation of a type-shifting operation. Instead, aspectual verbs combine with their complements and subjects just like other transitive verbs, but their full interpretation requires the contextual resolution of the specific dimension along which the complement can be construed as a structured individual. We call this process the resolution of dimensional ambiguity.

The analysis leads to a hypothesis about the processing of aspectual verbs which we term the Structured Individual (SI) hypothesis. On this hypothesis, the observed psycho- and neuro-linguistic reflexes of complement coercion are taken to reflect not type-shifting operations but the retrieval of the potential dimension-functions and the ultimate dimensional resolution. This allows us not only to maintain a uniform semantics for aspectual verbs across their uses (viz. agentive and constitutive readings), but also to capture the observation from Katsika et al. (2012) that aspectual verbs, but not psychological verbs, engender additional cost.

Under the Structured Individual hypothesis, the processing of aspectual verbs is implemented in real-time comprehension as follows:

  (A) When readers encounter an aspectual verb, they retrieve the verb together with the (lexically encoded) dimension functions it contains. We call this process the exhaustive activation of lexically encoded functions. These functions map between the domain of individuals (denoted by the subject) and subparts of the axis construed from one of the dimensions associated with the complement denotation.

  (B) In order to get a determinate interpretation for the composition of the aspectual verb and the complement, readers must determine the dimension along which the complement denotation should be construed as a structured individual. This means, in turn, that the parser must choose a particular function from among those encoded in the verb. Because each complement denotation provides multiple dimensions, the parser is faced with dimensional ambiguity, an ambiguity that must be resolved for the sentence to be interpreted at all. Once the function is chosen (out of the possible ones offered by the interaction of the complement’s and the subject’s denotations), the complement can be construed as a structured individual along the dimension associated with that function. This is an instance of ambiguity resolution. These two processes are described in Schema 1.Footnote 7 Footnote 8

    Schema 1 Processing of aspectual verbs

According to this analysis, then, resolving any sentence containing an aspectual verb requires the above two processes, both of which have been previously and independently invoked to capture semantic composition effects: (A) exhaustive activation of the verb’s lexical functions (e.g. Shapiro et al. 1989 Footnote 9), and (B) resolution of ambiguity created by immediate composition demands, i.e. dimension extraction from the complement (e.g. Frazier and Rayner 1990 Footnote 10).

We propose here that the “coercion” cost previously observed arises from the processing of aspectual verbs, the best-represented verb class among the verbs tested in the past. In addition, we hypothesize that the seemingly incongruent neurological patterns previously associated with complement coercion are in fact consistent with both step (A) and step (B).

In what follows, we present the results of two experimental studies that examine the psychological and neurological viability of the Structured Individual hypothesis (and ultimately the Structured Individual analysis). To this end, we carry out two experiments using self-paced reading (Exp. 1) and fMRI (Exp. 2) respectively, along with a pretest questionnaire. Three verb types are considered: aspectual verbs (AspectualV), psychological verbs of the enjoy-“type” (EnjoyingV) (these include only those psychological verbs that were tested as coercion verbs in previous work (see Katsika et al. 2012 for details)) and psychological verbs of the love-“type” (LovingV), which have been previously claimed to involve no coercion (Pustejovsky 1995) and thereby serve as controls. The three conditions all contain sentences with an animate subject and an entity-denoting complement.

Based on the Structured Individual hypothesis, we predict that aspectual verbs will induce longer reading times than either type of psychological verb in the self-paced reading task because the former involve the resolution of dimensional ambiguity. With respect to neural correlates, we expect aspectual verbs to recruit additional cortical areas at two positions corresponding to the two processes mentioned above: (a) when the subject combines with an aspectual verb, where the parser exhausts the verb’s lexical functions, and (b) when the aspectual verb combines with the complement, where the parser must mine the complement denotation to determine the possible dimensions along which it is construable as a structured individual (Schema 1).

2 Pretest: Norming Questionnaire

To ensure equal acceptability of the manipulated conditions, we employed a rating questionnaire to test the stimuli.

2.1 Method

2.1.1 Participants

Forty native speakers of American English took the questionnaire, all between the ages of 18–30 and without reading disabilities. The data of three participants were discarded because their responses were either undifferentiated or inconsistent.

2.1.2 Materials

We created 50 triplets, each containing the three manipulated conditions as shown in Table 1. Aspectual verbs (AspectualV) were contrasted with psychological verbs of the enjoy type (EnjoyingV). Note that the two are collapsed as “coercion verbs” under the type-shifting account. We further introduced psychological verbs of the love-type as the control condition (LovingV). Under the type-shifting hypothesis, EnjoyingV and LovingV are taken to differ in that only the former exclusively selects for eventive complements whereas the latter does not. From our perspective, both EnjoyingV and LovingV are psychological verbs. We have no independent reason to say that the two differ in terms of lexical semantics. The reason we separate them here is that the verbs in the EnjoyingV condition have been tested in the literature as “coercion verbs” while those included in the LovingV set have not. Part of the contribution of Experiment 1 (Sect. 3) is to determine whether there are any processing differences between these two sets that could warrant two different linguistic treatments.

Table 1 Conditions and sample sentences

Aside from the 50 triplets, we introduced 150 filler sentences (50 of them were nonsensical sentences); the whole set of the stimuli amounted to 300 sentences. The 50 triplets were split into two lists, each list containing 25 triplets of the three conditions along with half of the fillers. Each list was assigned to 20 participants, and each participant received a unique pseudo-randomization of her corresponding list.
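
The list construction described above can be sketched as follows. This is a hypothetical reconstruction for illustration: item identifiers are invented, the split simply takes the first and second halves, and pseudo-randomization is approximated by a seeded shuffle keyed to a participant number. Any additional ordering constraints used in the actual study are not stated in the text and are not modelled.

```python
import random

CONDITIONS = ("AspectualV", "EnjoyingV", "LovingV")

def build_lists(n_triplets=50, n_fillers=150):
    """Split triplets and fillers in half; each list keeps all three conditions of its triplets."""
    triplets = list(range(1, n_triplets + 1))
    fillers = [("filler", i) for i in range(1, n_fillers + 1)]
    half_t, half_f = n_triplets // 2, n_fillers // 2
    lists = []
    for t_ids, f_items in ((triplets[:half_t], fillers[:half_f]),
                           (triplets[half_t:], fillers[half_f:])):
        items = [(cond, t) for t in t_ids for cond in CONDITIONS]
        lists.append(items + f_items)
    return lists

def presentation_order(items, participant_id):
    """Seeded shuffle (assumed procedure): reproducible order for a given participant."""
    order = list(items)
    random.Random(participant_id).shuffle(order)
    return order

list_a, list_b = build_lists()
print(len(list_a), len(list_b))  # 150 sentences per list, 300 in total
```

Each list thus holds 25 triplets × 3 conditions = 75 experimental sentences plus 75 fillers.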

2.1.3 Procedures

The participants were asked to rate the acceptability of each sentence on a 1–5 scale (1 = does not make sense; 5 = makes sense) and answer a multiple-choice, multiple-answer question probing possible interpretations.

2.2 Results

Results from the means (Table 2) show that the three conditions were within the acceptable range.

Table 2 Results of the sensicality rating (N = 37)

A repeated-measures analysis of variance (ANOVA) revealed an effect of condition (F(2, 72) = 32.59, p < 0.001). Planned pairwise comparisons indicated that LovingV was rated significantly higher than both AspectualV and EnjoyingV (both ps < 0.001).Footnote 11 Crucially, no difference was found between the AspectualV and EnjoyingV conditions (p > 0.05). In addition, we performed a reliability test on the items within each condition to evaluate their internal consistency. The reliability results showed that the items used in each condition were highly reliable (Cronbach’s alpha: AspectualV = 0.92; EnjoyingV = 0.85; LovingV = 0.72). This means that, while each condition contained 50 items (sentences), the items within each condition were closely related as a group, yielding similar responses.
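
For readers unfamiliar with the reliability measure, Cronbach's alpha for k items is k/(k − 1) × (1 − Σ item variances / variance of the summed scores). The sketch below computes it with the Python standard library. The ratings are invented for illustration and are not the study's data, and the layout (one row per participant, one column per item) is an assumption about how such a matrix would be arranged.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha for a matrix with one row per participant and one column per item."""
    k = len(ratings[0])
    # Sample variance of each item across participants.
    item_vars = [variance(col) for col in zip(*ratings)]
    # Sample variance of each participant's summed score.
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 1-5 sensicality ratings: 4 participants x 3 items.
EXAMPLE = [
    [5, 4, 5],
    [4, 4, 4],
    [3, 2, 3],
    [2, 2, 1],
]

print(round(cronbach_alpha(EXAMPLE), 2))
```

Values near 1 indicate that the items pattern together across participants, which is the sense in which the items within each condition are described as closely related.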

3 Experiment 1: Self-paced Reading

We conducted a self-paced reading experiment with a moving window paradigm to investigate the time-course of the cost underlying the processing of aspectual verbs and psychological verbs.

3.1 Method

3.1.1 Participants

Twenty-eight native speakers of American English were recruited, all between the ages of 18–30 and with normal vision and auditory acuity. None of them had a history of reading disabilities.

3.1.2 Materials

The materials were adapted from the pretest questionnaire. The script contained 50 triplets, each consisting of the three conditions, and 150 filler sentences (among which 100 were nonsensical). Each participant saw all the 300 sentences during the experiment. Each sentence was segmented into several windows as shown in Table 3. Our windows of interest were the Verb, Complement head, Head+1, and Head+2 regions.

Table 3 Example of a set of experimental sentences and segmentations

The verbs used (and their frequencies) in the stimuli are the following: AspectualVs included start (12), begin (13), finish (12), continue (9), complete (3), end (1). EnjoyingVs included enjoy (14), prefer (10), favor (9), tolerate (8), endure (8), resist (1). These verbs were adopted from previous studies and have been used in Katsika et al. (2012). LovingVs included love (13), like (12), dislike (8), hate (5), detest (5), approve of (3), be fond of (2), disapprove of (1), respect (1). All verbs in the three conditions were matched on reaction times from an independent lexical decision study carried out in our lab (DiNardo, unpublished thesis). That study showed no difference in access times among AspectualVs (465.77 ms), EnjoyingVs (454.46 ms), and LovingVs Footnote 12 (488 ms), all ps > 0.05.

3.1.3 Procedure

The stimuli were presented in black Courier New font in the center of a computer screen with a white background. The participants read the sentences segment by segment at their own pace, which allowed them to fully understand the sentences’ meanings. Every trial began with a series of dashes, with a “+” sign at the left edge of the screen, signaling the starting point of the sentence. The participants began by pressing the space bar, causing the first segment to show up. With each subsequent press, the next segment appeared, and the previous segment was replaced by a set of dashes. At the end of the sentence, they were presented with a statement probing either the content or the acceptability of the sentence just read to ensure full comprehension. The participants responded by pressing the “Agree” or “Disagree” key on the keyboard. A practice session was given beforehand; the participants had to reach 80% accuracy in the comprehension task before proceeding to the real trials.

3.1.4 Data Analysis

All 28 participants recruited were taken into account in the data analysis; none was excluded. We performed a mixed model analysis, incorporating a fixed effect of condition (3 levels: AspectualV, EnjoyingV, LovingV) and random intercepts for subject and item. Analyses were carried out in the R statistical environment, using the lmer function in the lme4 package (Baayen et al. 2008; R Development Core Team 2014). The reading time measure was evaluated by contrasting a model including condition as the predictor against a base model without it. This contrast shows whether there is a significant effect of condition. For the pairwise comparisons, the p-values were corrected by Tukey tests, and the b values represent the unstandardized coefficients. All significant contrasts are reported.
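The last step of such a model comparison can be sketched as follows. With a three-level predictor, the full model has two more fixed-effect parameters than the base model, so the likelihood-ratio statistic is referred to a chi-square distribution with 2 degrees of freedom, for which the upper-tail probability has the closed form exp(-x/2). This is only an illustration in Python, not the R code used for the analysis; the function name is ours, and the statistics are the values reported in Sect. 3.2.

```python
import math

def lrt_p_value_df2(chi_square):
    """Upper-tail p-value of a chi-square statistic with exactly 2 df.

    For df = 2 the chi-square survival function reduces to exp(-x / 2),
    so no statistics library is needed.
    """
    if chi_square < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.exp(-chi_square / 2.0)

# Likelihood-ratio statistics reported in the Results (Sect. 3.2), by region.
reported = {
    "Verb": 5.475,
    "Complement head": 1.445,
    "Head+1": 14.315,
    "Head+2": 11.197,
}

p_values = {region: lrt_p_value_df2(x) for region, x in reported.items()}
for region, p in p_values.items():
    print(f"{region}: chi2(2) = {reported[region]:.3f}, p = {p:.4f}")
```

The computed probabilities agree with the reported ones after rounding, which is what one would expect if the model contrast differs by exactly the two condition parameters.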

3.2 Results

The accuracy of the comprehension task was 95.03%. Results of the reading task are reported in Table 4 and Fig. 1. A marginally significant effect of condition was found at the verb (χ2(2) = 5.475, p = 0.0647: AspectualV/EnjoyingV > LovingV) which went away at the complement head (χ2(2) = 1.445, p = 0.486). Sustained significant differences appeared instead at the Head+1 and Head+2 positions.

Table 4 Results of reading times in milliseconds (standard errors in parentheses)
Fig. 1 Results of reading times (ms); error bars: ± 1 standard error of the mean

At the Head+1 position, there was a significant effect of condition (χ2(2) = 14.315, p < 0.001). The pairwise comparisons indicate that AspectualV engendered significantly longer reading times (RTs) than EnjoyingV (b = 18.301, p = 0.036) and LovingV (b = 27.581, p < 0.001) respectively.

The Head+2 position revealed the same pattern. A significant effect of condition was found (χ2(2) = 11.197, p = 0.004). The pairwise comparisons suggest that AspectualV engendered longer RTs than both EnjoyingV (b = 27.172, p = 0.011) and LovingV (b = 27.504, p = 0.010).

Overall, the results indicate that aspectual verbs induced longer RTs than both the enjoy-type and the love-type of psychological verbs at the two windows following the complement head, while the two types of psychological verbs did not differ from each other. These results replicate Katsika et al.’s (2012) and Utt et al.’s (2013) findings, and further show that psychological verbs behave as a class in terms of processing profile. We interpret these findings as suggesting that the unique cost observed for aspectual verbs, but not psychological verbs (EnjoyingV and LovingV), is due to the specific interpretive requirements of the aspectual verb class: the determination among multiple possibilities of the dimension (e.g. eventive, informational, spatial) along which the structured individual associated with the complement denotation must be construed. Crucially, this determination is required; failure to resolve the ambiguity leads to failure to interpret the sentence.Footnote 13

4 Experiment 2: fMRI

Previous neurological studies of the complement coercion effect report activity in three distinct brain regions: Wernicke’s area in Piñango et al.’s (2001) lesion study, ventral medial prefrontal cortex (vmPFC) in Pylkkänen and McElree’s (2007) MEG study, and BA45 in Husband et al.’s (2011) fMRI experiment. Despite the discrepant results, all these studies attribute the effect to type-shifting the complement to obtain an event interpretation. Notice that these experiments are subject to the heterogeneous stimuli problem, mixing aspectual verbs, psychological verbs, and control verbs (e.g. master, try). Yet as mentioned, recent studies (Katsika et al. 2012; Utt et al. 2013) indicate that only aspectual verbs engender additional cost. The Structured Individual hypothesis, by contrast, accounts for the complement coercion effect as the processing of aspectual verbs. In our fMRI experiment, we aim to investigate its neural basis, expecting to find brain activity corresponding to the two hypothesized processes associated with it.

4.1 Method

4.1.1 Participants

Sixteen native speakers of American English participated in this study, all between the ages of 18 and 30, right-handed, without reading disabilities or history of neurological disorders. The data from one participant were excluded from the analysis due to severe head movement.

4.1.2 Materials

The stimuli were the same as in the self-paced reading experiment, with a different set of fillers. The 50 triplets and 150 fillers yielded 300 sentences in total. Each participant saw all 300 sentences.

4.1.3 Experimental Design

We used an event-related paradigm. The stimuli were visually presented segment-by-segment as in the self-paced reading experiment, each lasting for 500 ms. For 75% of the sentences, the participants were queried with a yes/no comprehension question, which lasted for 4000 ms. There was a 500 ms interval between the sentence-final word and the comprehension question following the sentence.

The 300 sentences were divided into 10 runs. The stimuli were pseudo-randomized such that no two successive sentences were of the same condition. Each run contained 30 sentences and lasted 5 min 33 s with the inclusion of the machine connection delay.Footnote 14

4.1.4 Imaging Acquisition

Anatomical measurements: The fMRI experiment was performed on a Siemens Sonata 3T whole body MRI scanner (Erlangen, Germany). Each session began with a 3-plane localizer followed by a sagittal localizer, and an inversion recovery T1 weighted scan (TE/TR = 2.61/285 ms, matrix 192 × 192, FOV = 220 mm, flip angle = 70°, bandwidth = 501 Hz/pix, 51 slices with 2.5 mm thickness). This acquisition was used to define the AC-PC (anterior and posterior commissure) line for prescription of the anatomic T1 images and functional images in the following series.

Functional measurements: During the task, we conducted event-related functional MRI using gradient echo echo-planar imaging (EPI) blood oxygenation level dependent (BOLD) contrast, with TE = 30 ms, TR = 956 ms, matrix 84 × 84, FOV = 210 mm, flip angle = 62°, bandwidth = 2289 Hz/pixel, slice thickness = 2.5 mm, with 321 measurements (images per slice). The scanner was set to trigger the stimulus presentation program, which enabled the image acquisition to be synchronized with the stimulus presentation.

At the end of the functional imaging, a high-resolution 3D Magnetization Prepared Rapid Gradient Echo (MPRAGE) was used to acquire sagittal images for multi-subject registration, with TE = 2.77 ms, TR = 2530 ms, acquisition matrix 256 × 256, FOV = 256 mm, bandwidth = 179 Hz/pix, flip angle = 7°, 176 slices with slice thickness = 1 mm. The fMRI data within subjects were registered to this brain volume, which was then registered across subjects into a common 3D brain space using the Yale BioImage Suite software package (Papademetris et al. 2006).
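The voxel dimensions implied by these acquisition parameters follow from dividing the field of view by the matrix size. The sketch below works this out for the three acquisitions, assuming a square field of view and a square matrix in each case (an assumption on our part; only one FOV value is reported per scan). The function and variable names are illustrative.

```python
def in_plane_voxel_mm(fov_mm, matrix):
    """In-plane voxel edge length, assuming a square FOV and square matrix."""
    return fov_mm / matrix

# (FOV in mm, matrix size, slice thickness in mm) as reported above.
acquisitions = {
    "T1 inversion recovery": (220.0, 192, 2.5),
    "EPI functional": (210.0, 84, 2.5),
    "MPRAGE": (256.0, 256, 1.0),
}

voxel_sizes = {}
for name, (fov, matrix, thickness) in acquisitions.items():
    edge = in_plane_voxel_mm(fov, matrix)
    voxel_sizes[name] = (edge, edge, thickness)
    print(f"{name}: {edge:.3f} x {edge:.3f} x {thickness:.1f} mm")
```

Under these assumptions the functional images come out at 2.5 mm isotropic and the MPRAGE at 1 mm isotropic, in line with the resolutions stated in the data analysis section.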

4.1.5 fMRI Data Analysis

All data were converted from Digital Imaging and Communication in Medicine (DICOM) format to Analyze format using XMedCon (Nolfe et al. 2003). During the conversion process, the first 6 images at the beginning of each of the 10 functional runs were discarded to enable the signal to achieve steady-state equilibrium between radio frequency pulsing and relaxation, leaving 315 images per slice per run for analysis. Functional images were motion-corrected with the Statistical Parametric Mapping (SPM) 5 algorithm for three translational directions (x, y or z) and three possible rotations (pitch, yaw or roll). Trials with linear motion that had a displacement in excess of 1.5 mm or rotation in excess of 2° were rejected. All further analyses were performed using BioImage Suite (Papademetris et al. 2006).
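The volume count and the rejection criteria described above can be made concrete with a short sketch. It is not the SPM or BioImage Suite procedure; it only encodes the reported numbers. Applying the thresholds to each axis separately is our assumption, since the text does not state whether they apply per axis or to a combined displacement, and the motion estimates in the usage lines are hypothetical.

```python
VOLUMES_ACQUIRED = 321      # measurements per run, as reported
VOLUMES_DISCARDED = 6       # initial images dropped to reach steady state
MAX_TRANSLATION_MM = 1.5    # rejection threshold for linear displacement
MAX_ROTATION_DEG = 2.0      # rejection threshold for rotation

volumes_analyzed = VOLUMES_ACQUIRED - VOLUMES_DISCARDED

def exceeds_motion_limits(translations_mm, rotations_deg):
    """Return True if any of the six motion parameters crosses its threshold.

    translations_mm: (x, y, z) displacements in millimetres.
    rotations_deg: (pitch, yaw, roll) in degrees.
    Per-axis checking is an assumption, not stated in the text.
    """
    too_far = any(abs(t) > MAX_TRANSLATION_MM for t in translations_mm)
    too_rotated = any(abs(r) > MAX_ROTATION_DEG for r in rotations_deg)
    return too_far or too_rotated

# Hypothetical motion estimates, for illustration only.
print(volumes_analyzed)
print(exceeds_motion_limits((0.4, 0.2, 1.1), (0.5, 0.3, 0.1)))
print(exceeds_motion_limits((0.4, 1.8, 0.1), (0.5, 0.3, 0.1)))
```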

Individual subject data were analyzed using a General Linear Model (GLM) on each voxel in the entire brain volume with regressors specific for each task. For each of the 3 sentence types (AspectualV, EnjoyingV, LovingV) there were two regressors for two events, as shown in Table 5. These events correspond to the two hypothesized processes induced by aspectual verbs under the SI hypothesis. Event 1 spanned from the onset of the subject phrase to the offset of the verb; Event 2 spanned from the onset of the complement to the offset of the sentence-final word. We hypothesize that the exhaustive activation of an aspectual verb’s functions takes place at Event 1, when readers encounter the verb, and that readers attempt to determine the dimension along which the complement is construed as a structured individual at Event 2, while facing the dimension ambiguity in sentences with aspectual verbs.

Table 5 Event segmentation in the fMRI experiment

The resulting beta images for each task were spatially smoothed with a 6 mm Gaussian kernel to account for variations in the location of activation across subjects. The output maps were normalized beta-maps, which were in the acquired space (2.5 mm × 2.5 mm × 2.5 mm).
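For readers who want to relate the smoothing kernel to the voxel grid, the sketch below converts the kernel width into a Gaussian standard deviation. It assumes that the 6 mm figure is a full width at half maximum (FWHM), which is the usual convention but is not stated explicitly here; the names are illustrative.

```python
import math

def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given full width at half maximum."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

KERNEL_FWHM_MM = 6.0   # assumed to be a FWHM value
VOXEL_EDGE_MM = 2.5    # acquired voxel size, as reported

sigma_mm = fwhm_to_sigma(KERNEL_FWHM_MM)
sigma_voxels = sigma_mm / VOXEL_EDGE_MM
print(f"sigma = {sigma_mm:.3f} mm = {sigma_voxels:.3f} voxels")
```

Under that assumption the kernel has a standard deviation of roughly one voxel in the acquired space.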

To take these data into a common reference space, three registrations were calculated within the Yale BioImage Suite software package. The first registration performed a linear registration between the individual subject raw functional image and that subject’s 2D anatomical image. The 2D anatomical image was then linearly registered to the individual’s 3D anatomical image. The 3D differs from the 2D in that it has a 1 × 1 × 1 mm resolution whereas the 2D z-dimension is set by slice-thickness and its x-y dimensions are set by voxel size. Finally, a non-linear registration was computed between the individual 3D anatomical image and a reference 3D image. The reference brain used was the Colin27 Brain (Holmes et al. 1998) in Montreal Neurological Institute (MNI) space (Evans et al. 1992). All three registrations were applied sequentially to the individual normalized beta-maps to bring all data into the common reference space.

Data were corrected for multiple comparisons by spatial extent of contiguous suprathreshold individual voxels at an experiment-wise p < 0.05. In a Monte Carlo simulation within the AFNI software package and using a smoothing kernel of 6 mm and a connection radius of 4.33 mm on 2.5 mm × 2.5 mm × 2.5 mm voxels, it was determined that an activation volume of 183 original voxels (4953 microliters) satisfied the p < 0.05 threshold. Clusters were created for each of the subtractions. Each cluster was identified with a region label, and then associated with additional numeral labels corresponding to Brodmann areas.

4.2 Results

4.2.1 Behavioral Results

The overall accuracy of the comprehension task was 88.6%. The response times (RTs) for the questions are shown in Table 6. Results of repeated measures ANOVA (trials with no response were excluded) revealed no significant effect of conditions in RTs (F(2, 28) = 2.786, p = 0.08).
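The degrees of freedom of the reported F statistic can be checked against the sample size. The sketch below uses the standard formulas for a one-way repeated-measures ANOVA with sphericity assumed; the function name is ours, and the participant counts come from Sect. 4.1.1.

```python
def rm_anova_dfs(n_subjects, n_levels):
    """Degrees of freedom for a one-way repeated-measures ANOVA."""
    df_effect = n_levels - 1
    df_error = (n_subjects - 1) * (n_levels - 1)
    return df_effect, df_error

recruited = 16
excluded = 1     # removed for severe head movement
conditions = 3   # AspectualV, EnjoyingV, LovingV

dfs = rm_anova_dfs(recruited - excluded, conditions)
print(dfs)
```

With 15 analyzed participants and three conditions, the result matches the F(2, 28) reported above.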

Table 6 Mean response times (ms) of the comprehension questions

4.2.2 Imaging Results

The imaging results showed that, at Event 1 (Subject + Verb), AspectualV preferentially recruited Wernicke’s area (BA40), bilateral BA7, 6, 24, and primary sensory areas over EnjoyingV (Fig. 2). At Event 2 (Complement ~ Sentence-final), AspectualV preferentially recruited the left inferior frontal cortex (LIFC), including BA44, 45, 47, and left insula, as well as bilateral BA6, right BA8, right inferior frontal cortex, and primary visual cortex over the control LovingV (Fig. 3). These results are summarized in Tables 7, 8 and 9.

Fig. 2 AspectualV > EnjoyingV at Event 1 (Subject + Verb): activations in left BA 40 (Wernicke’s area), bilateral BA7, bilateral BA 6/24, and bilateral primary sensory areas

Fig. 3 AspectualV > LovingV at Event 2 (Complement ~ S-final): activations in LIFC (BA44, 45, 47), left insula, right BA8, bilateral BA6, right IF cortex, and left primary visual cortex

Table 7 Summary of the results of the fMRI experiment (significant activations)
Table 8 The differentially active regions of the AspectualV—EnjoyingV at Event 1(Subject+Verb). L = left, R = right, AntCingulate = anterior cingulate, PrimSensory = primary sensory cortex, Prim_Motor = primary motor cortex
Table 9 The differentially active regions of the AspectualV—LovingV at Event 2(Complement ~S-final) L = left, R = right, IFG = inferior frontal gyrus, AntCingulate = anterior cingulate

Overall, the fMRI results reveal that at Event 1 (Subject + Verb) AspectualV shows preferential recruitment in Wernicke’s area (BA 40) over EnjoyingV (Fig. 2) whereas at Event 2 (complement phrase), preferential recruitment for AspectualV (over LovingV) shifts to left inferior frontal cortex (Fig. 3).

5 Discussion

Experiment 1 (self-paced reading) showed that the AspectualV condition engendered longer reading times than psychological verbs (EnjoyingV and LovingV) during real-time comprehension after the complement head was encountered, thus replicating and expanding on Katsika et al.’s (2012) and Utt et al.’s (2013) findings. These observations call into question our traditional understanding of the “complement coercion” phenomenon. The fact that aspectual and psychological verbs show not only distinct linguistic behaviors but also computationally distinct processing profiles suggests that the set of “coercion verbs” studied in previous work collapses at least two semantically distinct classes of verbs. These observations are in principle inconsistent with the type-shifting hypothesis, which predicts processing cost at least for those psychological verbs that fall in the coercion set—a prediction that is not supported by the results.

In order to account for this pattern of results, we propose the Structured Individual hypothesis (based on the Structured Individual analysis for aspectual verbs) whereby the processing cost associated with aspectual verbs results from (A) exhaustively retrieving the possible functions stored in the verb, and (B) the resolution of dimension ambiguity (e.g. spatial, temporal, informational, eventive) that is required to interpret the complement and consequently, the full sentence. According to this hypothesis, the dimension is identified when the complement head denoting a structured individual is encountered and the dimension along which this structure must be mapped onto the axis is determined.Footnote 15

We already find initial support for this general approach in previously puzzling findings. Traxler et al. (2005) show that a prior context sentence that either contained the same coercion verb or explicitly mentioned the event structure attenuated the cost in the following target sentence.

  1. (15)

    Context: The student started/read a book…

    Target: Before he started/read the book…

In their eye-tracking experiment, when either “started a book” or “read a book” is given in the context sentence, there was no difference between “started the book” and “read the book” in the target sentences. The Structured Individual (SI) hypothesis captures this naturally: because the proper interpretation among multiple ones is contextually determined, a context sentence that biases the interpretation towards a certain dimension (likely the eventive dimension in this case) resolves the dimension ambiguity associated with aspectual verbs and therefore attenuates the cost.

A seemingly problematic finding for the SI hypothesis is Traxler et al.’s (2002) results. They show that, following coercion verbs (including both aspectual and psychological), entity-denoting complements (started the puzzle) induced more processing cost than event-denoting complements (started the fight). This may seem to go against the SI hypothesis, which states that any sentence containing aspectual verbs (including those with eventive complements) will require both exhaustive dimension-function retrieval and dimension-ambiguity resolution (the two potential sources of cost). The SI hypothesis accounts for this reported difference as an attenuation of the effect brought about by a pre-determination of the intended dimension along which the structured individual must be construed. Recall that the objective of interpreting AspectualV sentences is to decide the exact dimension along which the structured individual is construed. In cases like started the fight, the event-denoting complement (“fight”) determines, or strongly biases towards, the eventive dimension along which a structured individual is construed. Because the dimension along which the structured individual must be construed is already determined by the eventive complement, the effort of resolving the dimension ambiguity (process B) is decreased, thus the observable cost of the composition is attenuated.

We focus now on Experiment 2 (fMRI), which shows that aspectual verbs induced preferential recruitment of the left posterior superior temporo-parietal cortex (i.e. Wernicke’s area, BA40) when readers encountered the verb and the left inferior frontal cortex (i.e. Broca’s area) when readers encountered the complement, over each of the psychological conditions (enjoy-type and love-type respectively). The Structured Individual hypothesis captures these patterns as follows: At Event 1, BA40 (green circles in Fig. 2) reflects the exhaustive activation of the aspectual verb’s dimension functions. The reason that aspectual verbs would engage this cortical region above and beyond the engagement induced by the EnjoyingV condition is presumably that only the former must encode discrete, pre-specified functions. This encoding represents a measure of complexity and therefore of cost. This interpretation of Wernicke’s area is consistent with previous work showing not only that it is involved in the retrieval of lexico-semantic representations (Badre et al. 2005; Binder et al. 2009; Damasio et al. 1996; Hickok and Poeppel 2004, 2007; Humphries et al. 2007; Lau et al. 2008) but also, as Shapiro et al. (1993) show through a Broca’s versus Wernicke’s comparison, that this cortical region is specifically involved in exhaustive retrieval of lexical items during comprehension.

The activation of the precuneus (BA 7) (pink circles in Fig. 2), which was also preferentially recruited for AspectualV at Event 1, although not initially expected given previous neurolinguistic work on complement coercion, is completely consistent with our view of the fundamental meaning of aspectual verbs. This cortical region is reported to support spatio-temporal tasks such as processing visual-spatial information in perception and memory, especially spatial representation of sequential movements (Cavanna and Trimble 2006; Fletcher et al. 1995; Wallentin et al. 2008). We therefore interpret the recruitment of BA7 for aspectual verbs over psychological verbs as reflecting axial conceptualization associated with the set of dimension functions, one of which will ultimately map the structured individual onto an axis. The activation of BA40 and BA7 combined is correlated with dimension function retrieval, which in the absence of a complement is still underdetermined. These observations are clearly grounded in the basic principles of the SI hypothesis, and in this way support it. By contrast, they are not naturally connectable to the type-shifting approach, which by its very nature is expected to target only one cortical region, and cannot be naturally connected to spatial conceptualization.

The neurological patterns discussed in the paper are those that can be clearly interpreted in the context of the comparison between the Structured Individual hypothesis and the Type-Shifting hypothesis. To the extent that EnjoyingV shows different patterns from AspectualV, we interpret the results as suggesting that the two sets of verbs do not belong to a unified class.

At Event 2, the LIF cortex and insula activations (green in Fig. 3) are taken to reflect the process of determining the dimension along which the complement denotation is structured. This is not an unexpected activation pattern given that the LIF cortex has in the past been reported to support certain kinds of ambiguity resolution (e.g. Badre et al. 2005; Krain et al. 2006; Lau et al. 2008; Rodd et al. 2010). In particular, several studies have reported that the insula is involved in processing of time. For instance, in an fMRI study using prismatic adaptation (PA)Footnote 16, Magnani et al. (2014) had participants perform a time reproduction task, in which they indicated the time intervals as perceived. The PA-induced rightward aftereffect resulted in an overestimation of time intervals whereas the leftward aftereffect resulted in an underestimation of time intervals. Their imaging results reveal that the left anterior insula and left superior frontal gyrus showed increased activity after versus before PA. They hence suggest that these regions are involved in spatial manipulation of the representation of time. Connecting those results with the ones reported here, we propose that the insula recruitment for aspectual verbs is likely to reflect a more general kind of structural configuration, more specifically the precedence relation on the axis along some dimension, including the temporal dimension.

For Event 2 (as for Event 1) the premotor area or the supplementary motor area (BA6) was recruited preferentially for AspectualV (light-blue circle in Fig. 3). This area is reported to be involved in action planning or event sequencing (e.g. Crozier 1999), action simulation, the generation of ordinally structured sequences (Stadler et al. 2011), and the updating of spatial information (Tanaka et al. 2005). We propose that the BA6 activation associated with the AspectualV condition reflects the sequential action planning along the eventive dimension.

Finally, we take the activity in the visual cortex (purple in Fig. 3) to reflect components not specifically relevant to the linguistic mechanisms in question, but mechanisms connected to attention, generalized task difficulty (Chen et al. 2008; Tan et al. 2001), or written word recognition (see review in Price 2012). We leave it for future research to further clarify the connection between those presumed general processes and the specific meaning composition mechanisms discussed here.

Our fMRI results replicate Piñango et al.’s (2001) and Husband et al.’s (2011) findings. Whereas Piñango et al. (2001) report that Wernicke’s aphasics (with damage to the left posterior superior temporo-parietal cortex) have difficulty comprehending sentences involving complement coercion, Husband et al. (2011) show that BA45 in LIF cortex (Broca’s area) is preferentially involved in the implementation of sentences demanding complement coercion (that is, to the extent that a large portion of their stimuli sentences used aspectual verbs). Given their respective experimental designs, it was not possible in those studies to elucidate when, during the course of comprehension, these cortical regions were maximally recruited. Experiment 2 presented here resolves this question by showing that Wernicke’s area is preferentially recruited during the unfolding of the subject + aspectual verb composition whereas LIF cortex is preferentially recruited later in the comprehension process, once the complement head (indicating the presence of a structured individual) is retrieved. And again, this multiplicity of regions finds coherence in the Structured Individual hypothesis while it is left unexplained in the type-shifting approach, which by definition requires the activation to be localized to one cortical region.

Regarding the psychological verb conditions, as we have seen, the fMRI results show a “split” behavior such that for Event 1, only EnjoyingV shows a difference in preferential recruitment with respect to AspectualV; and for Event 2, only LovingV shows a difference in preferential recruitment with respect to AspectualV. We believe that this difference in behavior is rooted in independent but potentially interacting factors which the present results can only begin to tease apart. So, what we offer here is a conjecture whose examination we leave for future work.

As mentioned in the Introduction section, we do not predict a linguistic difference between the EnjoyingV and LovingV conditions. Consistent with previous proposals, verbs in both conditions are all equally expected to select for a target of emotion (Pesetsky 1995: 55, 96; Levin 1993; Katsika et al. 2012), thus predicting unified psycholinguistic behavior, which the self-paced reading results reported here support.

Moreover, we note that it is only with respect to composition with the complement (Event 2) that our analysis predicts preferential recruitment for aspectual verbs above and beyond that for the two psychological verb conditions. With respect to composition with the subject (Event 1), our analysis in principle makes no such prediction. The reason for this is that, like aspectual verbs, psychological verbs also select for both entity-denoting and event-denoting complements. It is possible that this complexity in lexical encoding gives rise to a comparable degree of neurological recruitment in Wernicke’s area during lexical retrieval of the verbs.

To the extent that we found a difference at Event 1 between the AspectualV and LovingV sets on one hand and the EnjoyingV set on the other hand (AspectualV = LovingV > EnjoyingV), we look for the cause in the interpretive biases of this latter subset of psychological verbs. Indeed, what unites the verbs in the EnjoyingV set is that even though they may select for entity-denoting complements, the complement is preferentially interpreted eventively, i.e. the preferential interpretation of enjoy the book is as enjoy reading/writing the book.Footnote 17 This contrasts with the LovingV set in that these verbs allow for such a paraphrase, but not necessarily preferentially. So love the book can be understood as love reading the book, but does not have to be; it can also be understood as love the story in the book (state). If this eventive bias in interpretation for EnjoyingV were to impact lexical retrieval (by allowing the event-denoting possibility to be considered first by the processor), it could make the EnjoyingV set less ambiguous and therefore less taxing on Wernicke’s area than the LovingV set. This could, in turn, result in a greater difference between AspectualV and EnjoyingV than between AspectualV and LovingV at Event 1.

Finally, regarding Event 2, our hypothesis predicts AspectualV > {EnjoyingV = LovingV}, a prediction that holds only for LovingV. In line with our conjecture, we observe that the EnjoyingV condition, constructed with an event-selecting bias, would be tapping a similar conceptual representation as the AspectualV set, which disambiguates along the eventive dimension.Footnote 18 Both conditions, AspectualV and EnjoyingV, would then recruit possibly overlapping cortical regions. Importantly, this would take place during Event 2, the segment when the decision in favor of the eventive dimension would have been made, a reading that, crucially, is not equally salient in the LovingV condition.

All this said, what is most important for our analysis is that the traditional coercion verb set shows a split in neurological recruitment of Wernicke’s area (AspectualV > EnjoyingV) at Event 1. And this split is consistent with the linguistic distinction proposed here.

6 Conclusion

This study represents an implementation of the Structured Individual analysis, working out its psychological and neurological viability in the form of the Structured Individual hypothesis: a hypothesis that captures a psycho- and neurolinguistic distinction between aspectual verbs and psychological verbs in a linguistically principled manner, and invokes independently motivated processing mechanisms to capture their behavior. Our findings suggest that the complement coercion effect is better understood as involving the real-time composition of aspectual verbs rather than involving a special semantic operation such as type-shifting. In this way, this kind of hypothesis represents a manifestation of an approach to meaning composition which connects functional application to concept composition through lexicalization, all along grounded in fundamental principles of conceptual structure.