In this chapter, we generalize our eager left-corner incremental interpreter to cover conditionals and conjunctions. We focus on the (dynamic) semantic contrast between conditionals and conjunctions because the interaction between these sentential operators and anaphora/cataphora provides a strong argument that semantic parsing needs to be incremental, eager and competence-based, as argued in Brasoveanu and Dotlačil (2015a).

An extreme but clear way to state the main theoretical proposal of this chapter is that anaphora, and presupposition in general, are properly understood as processing-level phenomena that guide and constrain memory retrieval processes associated with incremental interpretation. That is, they guide and constrain the cognitive process of integration, or linking, of new and old semantic information.

Under the assumption that anaphora and presuppositions are components of memory retrieval processes associated with the real-time integration of semantic information, the intrusion of world knowledge in these processes, i.e., the “pragmatics of anaphora resolution and presupposition resolution,” comes in naturally: world knowledge is stored in declarative memory, so it is natural for memory retrieval processes to be modulated by it.

Thus, the (most probably oversimplifying) hypothesis is that anaphora and presupposition have semantic effects, but they are not exclusively, or even primarily, semantic phenomena. The proper way to analyze them is as a part of the processing component of a broad theory of natural language interpretation.

This proposal is very close in spirit to the DRT account of presupposition proposed in van der Sandt (1992), Kamp (2001a, b), among others. Kamp (2001b), with its extended argument for and extensive use of preliminary representations (that is, meaning representations that explicitly include unresolved presuppositions), is particularly close to our proposal.

To see the connection between our proposal and the theory of presuppositions proposed in Kamp (2001b), consider the problem associated with the use of preliminary representations raised at the end of Kamp (2001b):

“[S]emanticists of a model-theoretic persuasion may want to see a formal semantics of [...] preliminary representations. [...] [T]he possibility of such a semantics is limited. To define a syntax of preliminary representations [...] which characterizes them as the expressions of a given representation formalism (or as data structures of a certain form) is not too difficult. Moreover, for those preliminary representations [...] in which all presuppositions appear in the highest possible position, an intuitively plausible model-theoretic semantics can be stated without too much difficulty. But for representations with presuppositions in subordinate positions [...] I very much doubt that one is to be had.” (Kamp 2001b, 250–51)

The solution to this problem that we propose in this chapter is that we shouldn’t even ask for a semantics of preliminary representations. This is a category error: preliminary representations are central to natural language interpretation, but they are not semantic representations: they are processing-level representations that support incremental interpretation mechanisms, and have semantic effects because of this.

The chapter is structured as follows. In Sect. 9.1, we discuss why the interaction of conditionals and cataphora provides a strong empirical argument for incremental interpretation at the semantic level. We also describe two experiments investigating the interaction between conditionals and pronominal cataphora on one hand (Sect. 9.1.1), and the interaction between conditionals and cataphoric presuppositions on the other hand (Sect. 9.1.2).

Section 9.2 sets up the general theoretical scene for the remainder of the chapter by arguing that mechanistic processing goals should be one of the core explanatory goals for formal semantics, and that a cognitive-architectural approach to natural language meaning and interpretation is one way to pursue that goal.

Section 9.3 moves on to the specifics of an ACT-R processing model for conditionals with a sentence-final if-clause that explicitly models their incremental interpretation. We show that the model qualitatively captures the interaction between conditionals and pronominal cataphora in a simple example.

Section 9.4 expands this model to capture the interaction between conditionals and cataphoric presuppositions for the items of the study reported in Sect. 9.1.2. We embed the resulting model in a Bayesian model, and show that it can quantitatively fit the experimental data fairly well. This section shows that the Bayes+ACT-R+DRT framework we introduced enables us to specify in a fully explicit way different competence-performance hypotheses about conditionals, and that experimental evidence from real-time experiments can be used to quantitatively compare alternative theories of conditionals and cataphora.

Finally, Sect. 9.5 concludes with a brief summary and an outline of directions for future research.

9.1 Two Experiments Studying the Interaction Between Conditionals and Cataphora

Brasoveanu and Dotlačil (2015a) investigate whether meaning representations commonly used in formal semantics are built up incrementally and predictively when language is used in real time, similar to the incremental and predictive construction of syntactic representations (Steedman 2001; Lewis and Vasishth 2005; Lau 2009; Hale 2011 among many others).

The main empirical challenge when studying the incremental processing of semantic representations is identifying phenomena that can tease apart the syntactic and semantic components of the interpretation process. The pervasive aspects of meaning composition that are syntax-based cannot provide an unambiguous window into the nature of semantic representation building: the incremental and predictive nature of real-time compositional interpretation could be primarily or exclusively due to our processing strategies for building syntactic representations.

There is a significant amount of work in psycholinguistics on incremental interpretation (Hagoort et al. 2004; Pickering et al. 2006 among many others), but this research usually focuses on the processing of lexical semantic and syntactic representations, and the incremental integration of world knowledge into the language interpretation process. The processing of logical representations of the kind formal semanticists are interested in is much less studied.

Similarly, there is a significant amount of work on incremental interpretation in natural language processing/understanding (Poesio 1994; Bos et al. 2004; Bos 2005; Hough et al. 2015 among many others), but this research usually discusses it from a formal and implementation perspective, and focuses much less on the cognitive aspects of processing semantic representations (the research in Steedman 2001 and related work is a notable exception).

Brasoveanu and Dotlačil (2015a) report two studies that argue for the incremental nature of processing formal semantic representations, as distinct from the syntactic representations they supervene on. The crucial evidence is provided by the interaction of anaphora and presupposition resolution on one hand, and conjunctions versus conditionals with a sentence-final antecedent on the other. Consider the contrast between and and if in the example below, where the presupposition trigger again is cataphoric:

figure a

Assume the construction of semantic representations is incremental, i.e., the interpreter processes if as soon as it is encountered. Furthermore, assume incremental interpretation is predictive, i.e., once if is read, the interpreter expects the upcoming if-clause to provide (some of) the interpretation context for the previously processed matrix clause. Then we expect to see a facilitation/speed-up in the second clause she had coffee with him after if is read, compared to when the same clause follows and. As we will see, this is what the experimental results show.

Specifically, we expect the second clause in (1) to be more difficult after and than after if because and signals that a potential antecedent for the again presupposition is unlikely to come after this point. Dynamic conjunction is interpreted sequentially: the second conjunct is interpreted relative to the context provided by the first conjunct, and not vice-versa. Consequently, an unresolved presupposition in the first conjunct cannot find an antecedent in the second conjunct.

In contrast, if leaves open the possibility that a suitable resolution for the again presupposition is forthcoming since the first clause (the matrix) is interpreted relative to the context provided by the second clause (the if-clause). This possibility allows interpreters to make better predictions about the content of the clause coming after if, which should ease its processing.
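The asymmetry between and and if can be illustrated with a toy sketch. This is not the processing model developed later in the chapter: contexts are reduced to plain sets of event descriptions, the clause contents are placeholder strings, and the function names `evaluation_context` and `again_is_resolved` are hypothetical, introduced only for this illustration.

```python
# Toy illustration (hypothetical names and data): which context is the FIRST
# clause of a two-clause sentence interpreted against?

def evaluation_context(connective, global_context, second_clause_content):
    """Return the context relative to which the first clause is interpreted."""
    if connective == "and":
        # Dynamic conjunction is sequential: the first conjunct only sees
        # the global context, never the content of the second conjunct.
        return set(global_context)
    if connective == "if":
        # Sentence-final if-clause: the matrix (first) clause is interpreted
        # relative to the context updated with the if-clause.
        return set(global_context) | set(second_clause_content)
    raise ValueError(f"unknown connective: {connective!r}")


def again_is_resolved(presupposed_event, context):
    """An 'again' presupposition counts as resolved if a matching event is in the context."""
    return presupposed_event in context


global_context = set()                     # nothing relevant known beforehand
presupposed = "had coffee with him"        # placeholder for what 'again' presupposes
second_clause = {"had coffee with him"}    # placeholder content of the second clause

for connective in ("and", "if"):
    ctx = evaluation_context(connective, global_context, second_clause)
    print(connective, again_is_resolved(presupposed, ctx))
```

Under these simplifying assumptions, the cataphoric presupposition can only find its antecedent when the connective is if, which is the expectation the interpreter is hypothesized to form as soon as the connective is read.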

Crucially, our expectations—which arise from the interaction between the presupposition trigger again and the operators and versus if—are semantically driven. Nothing in the syntax of conjunction versus if-adjunction could make a successful presupposition resolution more or less likely.

9.1.1 Experiment 1: Anaphora Versus Cataphora in Conjunctions Versus Conditionals

Elbourne (2009, 1) defines donkey cataphora as “a configuration in which a pronoun precedes and depends for its interpretation on an indefinite that does not c-command it.” Some cataphora examples with conditionals are provided below, both with sentence-initial, (2)–(6), and with sentence-final if-clauses, (7).

figure b

Certain configurations are not acceptable (Elbourne 2009, 2), e.g., (8c) below, due to Principle C violations. Antecedents are marked with a superscript, and the corresponding anaphors/cataphors are marked with a subscript.

figure c

The contrast between (8b) and (8c), as well as the fact that Principle C is not violated if cataphoric pronouns appear in object position (see (7) above and (9) below), provides evidence that a sentence-final if-clause is adjoined lower than the matrix-clause subject, but higher than the object. For concreteness, let’s say that a sentence-final if-clause is VP-adjoined.

figure d

In contrast, a sentence-initial if-clause is adjoined higher than the matrix-clause subject. For concreteness, let’s say it is CP-adjoined.

Other arguments for these two syntactic structures are provided by VP ellipsis, as in (10), and VP topicalization, as in (11); see Bhatt and Pancheva (2006) for more discussion.

figure e

Based on these observations, Brasoveanu and Dotlačil (2015a) conclude that there is no ‘ordinary’ syntax-mediated binding from a c-commanding position for direct object (DO) donkey cataphora in conditionals with sentence-final if clauses. That is, donkey cataphora from the DO position of the matrix clause is a ‘true’ example of donkey cataphora that can be used to test the incrementality and predictiveness of semantic parsing.

Brasoveanu and Dotlačil’s Experiment 1 tested donkey anaphora and cataphora in a \(2 \times 2\) design, exemplified in (12):

figure f

Kazanina et al. (2007) used an on-line reading methodology (self-paced reading) to show that a cataphoric pronoun triggers an active search for an antecedent in the following material, and that this search takes into account structural constraints (Principle C) from an early stage. Kazanina et al. (2007) take the temporal priority of syntactic information as evidence for the incremental and predictive nature of syntactic constraints. The question investigated in this experiment can therefore be further specified as: is this active search mechanism also semantically constrained?

The methodology used in Brasoveanu and Dotlačil’s Experiment 1 was also self-paced reading (Just et al. 1982) with a non-cumulative moving window. The regular anaphora cases provide the baseline conditions. Assuming a deep enough incremental and predictive interpretation, the second clause in these conditions is expected to be more difficult after if than after and because of extra cognitive load coming from two sources.

The first source of difficulty is the semantics of conditionals versus conjunctions. For conditionals, we generate a hypothetical intermediate interpretation context satisfying the antecedent, and we evaluate the consequent relative to this hypothetical context. That is, we need to maintain both the actual, global interpretation context and the intermediate, antecedent-satisfying context, to complete the interpretation of conditionals. There is no similar cognitive load for conjunctions.

The second source of difficulty is specific to conditionals with a sentence-final if-clause. When such conditionals are incrementally interpreted, the matrix clause needs to be semantically reanalyzed. The matrix clause is initially interpreted relative to the global context, just like a top-level assertion is. However, when if is reached, the matrix clause has to be reinterpreted relative to the intermediate, antecedent-satisfying context: the comprehender realizes that the matrix clause is not a top-level assertion, but is the consequent of a conditional instead. There is no such difficulty for conjunctions: the first conjunct is simply interpreted relative to the global context, and the second conjunct is interpreted relative to the context that is the result of the update contributed by the first conjunct.

For the cataphora (non-baseline) cases, we expect a cognitive load reversal. The conjunction and signals that no suitable antecedent for the cataphor is forthcoming since the second clause is interpreted relative to the context provided/updated by the first clause. In contrast, if triggers the semantic reanalysis of the matrix clause and leaves open the possibility that a suitable antecedent for the cataphor is forthcoming since the matrix clause is interpreted relative to the context provided/updated by the second clause. This expectation (and the fact that it is confirmed) should speed up the processing. So we expect to see a speed-up in the if & cataphora cases, i.e., a negative if \(\times \) cataphora interaction.

These predictions were only partially confirmed in Brasoveanu and Dotlačil’s first experiment: baseline if was indeed harder (a statistically significant effect), but the if \(\times \) cataphora interaction, while negative, did not reach significance.

The regions of interest (ROIs) were primarily (i) the post-connective ROIs his helpful colleague, and secondarily (ii) the post-resolution ROIs that whole; see (13) below.

The mean log reading times (log RTs) for the 5 ROIs are plotted in Fig. 9.1 (plots created with R and ggplot2; R Core Team 2014; Wickham 2009).

figure g

Brasoveanu and Dotlačil analyzed the data using linear mixed-effects models. The response was the log-transformed reading times (log RTs) for the 3 ROIs immediately following the sentential operator and/if (RTs are log-transformed to mitigate their characteristic right-skewness). Residualized log RTs (residualized for word length and word position, following Trueswell et al. 1994) were also analyzed, but the pattern of results did not change, so Brasoveanu and Dotlačil report the more easily intelligible models with raw log RT responses.
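The motivation for the log transformation can be checked on a small sample. The sketch below uses made-up reading times in milliseconds (not data from the experiment) and a hypothetical helper, `skewness`, to compare the sample skewness before and after taking logs.

```python
import math

def skewness(values):
    """Sample skewness: the third standardized moment (population formulas)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up reading times (ms) with the long right tail typical of RT data.
rts = [250, 280, 300, 310, 330, 350, 400, 450, 600, 1200]
log_rts = [math.log(rt) for rt in rts]

raw_skew = skewness(rts)
log_skew = skewness(log_rts)
print(f"skewness of raw RTs: {raw_skew:.2f}")
print(f"skewness of log RTs: {log_skew:.2f}")
```

For this sample, the log-transformed values are still right-skewed, but less so than the raw values, which is the sense in which the transformation mitigates skewness rather than removing it.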

Fig. 9.1 Experiment 1: mean log RTs for the five regions of interest (ROIs)

The predictors (fixed effects) were as follows:

  • main effects of connective and ana/cataphora, and

  • their interaction;

  • the levels of the connective factor: and (reference level) versus if;

  • the levels of the ana/cataphora factor: anaphora (reference level) versus cataphora.

Crossed random effects for subjects and items were included, and the models with maximal random effect structures that converged (Barr et al. 2013) were reported, usually subject and item random intercepts, and subject and item random slopes for at least one of the two main effects. The maximum likelihood estimates (MLEs) and associated standard errors (SEs) and p-values are provided in (14) and (15) below (we omit the intercepts). Significant and nearly significant effects (\(p < 0.1\)) are boldfaced.
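With this treatment (dummy) coding, the structure of such a model can be sketched as follows. The sketch is a simplification that keeps only the random intercepts and leaves out random slopes, and the symbols \(\beta_0, \ldots, \beta_3\), \(u_s\), \(w_i\) and \(\varepsilon_{si}\) are notation assumed here for illustration; they are not taken from Brasoveanu and Dotlačil (2015a).

\[
\log \mathrm{RT}_{si} = \beta_0 + \beta_1\, \mathit{if}_{si} + \beta_2\, \mathit{cata}_{si} + \beta_3\, \mathit{if}_{si} \cdot \mathit{cata}_{si} + u_s + w_i + \varepsilon_{si}
\]

Here \(\mathit{if}_{si}, \mathit{cata}_{si} \in \{0, 1\}\) indicate the non-reference levels of the two factors for subject \(s\) and item \(i\), \(u_s\) is a by-subject random intercept, and \(w_i\) is a by-item random intercept. The fixed-effects predictions for the four conditions are then:

\[
\begin{array}{ll}
\text{and \& anaphora:} & \beta_0 \\
\text{if \& anaphora:} & \beta_0 + \beta_1 \\
\text{and \& cataphora:} & \beta_0 + \beta_2 \\
\text{if \& cataphora:} & \beta_0 + \beta_1 + \beta_2 + \beta_3
\end{array}
\]

On this reading, a negative \(\beta_3\) means that the if & cataphora condition is read faster than the two main effects alone would predict.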

figure h
figure i

The table in (14) shows several effects. First, baseline if (i.e., if & anaphora) is more difficult than baseline and (i.e., and & anaphora). This effect is compatible with the hypothesis that to interpret conditionals, we need to maintain both the actual, global interpretation context and the intermediate, antecedent-satisfying context. The effect is also compatible with the hypothesis that the matrix clause is reanalyzed in conditionals with a final if-clause: it is initially interpreted relative to the global context until if is reached, at which point it is reinterpreted relative to the intermediate, antecedent-satisfying context.

Cataphora seems to be more difficult than anaphora for and, but the effect never reaches significance (it is close to significant in the first ROI after cataphora is resolved). Maybe the and & cataphora condition is simply too hard, so readers stop trying to fully comprehend the sentence and speed up. If so, this will obscure the if \(\times \) cataphora interaction: there is a negative interaction between if and cataphora in all ROIs, i.e., if seems to facilitate cataphora (as expected if semantic evaluation is incremental and predictive), but this effect is not significant.

The consistent negative interaction is promising, so Brasoveanu and Dotlačil (2015a) elicited it in a follow-up experiment with a hard presupposition trigger (again; Abusch 2010; Schwarz 2014 among others), which might have a larger effect. A third ‘(mis)match’ manipulation was also added to control for readers speeding up through conditions that are too hard.

9.1.2 Experiment 2: Cataphoric Presuppositions in Conjunctions Versus Conditionals

The second experiment in Brasoveanu and Dotlačil (2015a) had a 2 \(\times \) 2 \(\times \) 2 design, exemplified in (16) below. The match/mismatch manipulation was new, and consisted of verbs in the second clause that matched or didn’t match the corresponding verbs in the first clause.

figure j

The method was similar to Experiment 1. Self-paced reading with a moving window was used, but each stimulus ended with an acceptability judgment on a 5-point Likert scale, from 1 (very bad) to 5 (very good). The acceptability judgment was elicited on a new screen after every item or filler. Every experimental item was followed by a comprehension question. Each of the 8 conditions was tested 4 times, for 32 items total; one item had a typo, and was discarded from all subsequent analyses. There were 70 fillers: monoclausal and multiclausal, conditionals, conjunctions, when-clauses, relative clauses, quantifiers, adverbs, etc.

Thirty-two native speakers of English participated (UCSC undergraduate students). They completed the experiment online for course (extra-)credit on a UCSC hosted installation of the IBEX platform. Each item was passed through all 8 conditions, and 8 lists were generated following a Latin square design. The participants were rotated through the 8 lists. Every participant responded to 102 stimuli (32 items + 70 fillers), the order of which was randomized for every participant. Any two items were separated by at least one filler.
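The list construction can be sketched with a simple rotation scheme. This is one standard way to build a Latin square assignment, shown under the assumption that conditions are rotated across lists; the source does not specify how the lists were actually generated, and the function name `latin_square_lists` is hypothetical.

```python
from collections import Counter

N_ITEMS = 32        # experimental items, as in the design described above
N_CONDITIONS = 8    # 2 x 2 x 2 design
N_LISTS = 8

def latin_square_lists(n_items, n_conditions, n_lists):
    """Assign each item to one condition per list by rotating conditions."""
    lists = []
    for list_idx in range(n_lists):
        # Item i appears in condition (i + list_idx) mod n_conditions.
        assignment = {item: (item + list_idx) % n_conditions
                      for item in range(n_items)}
        lists.append(assignment)
    return lists

lists = latin_square_lists(N_ITEMS, N_CONDITIONS, N_LISTS)

# Within each list, every condition occurs equally often (32 / 8 = 4 times).
for assignment in lists:
    counts = Counter(assignment.values())
    assert all(counts[c] == N_ITEMS // N_CONDITIONS for c in range(N_CONDITIONS))

# Across the 8 lists, every item is seen in all 8 conditions.
for item in range(N_ITEMS):
    assert {assignment[item] for assignment in lists} == set(range(N_CONDITIONS))
```

With this scheme, no participant sees the same item in more than one condition, while each item contributes observations to every condition across participants.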

There were fillers that were both acceptable (Bob ate his burger and he rented something to watch, but he didn’t say what) and unacceptable (Willem visited Paris because Sarah visited Amsterdam too). All participants exhibited the expected difference in acceptability ratings between these two types of fillers.

There were 72 comprehension questions with correct/incorrect answers, 32 after experimental items. The accuracy for all participants was above \(80\%\).

The results of this study are visually summarized in Fig. 9.2. The ROIs for Exp. 2 are the words following the verb in the second clause, i.e., the words immediately following the last experimental manipulation, which is (mis)match. Brasoveanu and Dotlačil examined only the 4 immediately post-verbal ROIs because the fifth word was the final one for some items, and the wrap-up effect associated with sentence-final words would contribute additional, possibly biasing noise.

figure k
Fig. 9.2 Experiment 2: mean log RTs for the four regions of interest (ROIs)

The data analysis was similar to the one conducted for Experiment 1: linear mixed-effects models with log RTs as the response variable, and main effects of connective, nothing/cataphora and match/mismatch, together with their 2-way and 3-way interactions, as predictors (fixed effects). The levels of the 3 factors were as follows:

  • for connective, and (reference level) versus if,

  • for nothing/cataphora: nothing (reference level) versus cataphora, and

  • for (mis)match: match (reference level) versus mismatch.

The models also included crossed random effects for subjects and items, namely the maximal random effect structure that converged (usually subject and item random intercepts, and subject and item random slopes for at least two of the three main effects). The statistical modeling results are summarized in (18) below (once again, we omit the intercepts).

figure l

Just as in Experiment 1, baseline if (i.e., if & nothing & match) is more difficult than baseline and (i.e., and & nothing & match). This is compatible with the hypothesis that conditionals are harder than conjunctions, because we need to maintain two evaluation contexts, and/or because the matrix clause is semantically reanalyzed when if is reached.

There is a significant negative interaction of mismatch \(\times \) if (note: again is not present here), which basically cancels out the main effect of if. That is, conditionals with non-identical VP meanings in the antecedent and consequent clauses are processed more easily than conditionals with identical VP meanings, about as easily as conjunctions with non-identical VP meanings in the two conjuncts. The difficulties tied to conditionals with identical VP meanings are probably caused by a violation of Maximize Presupposition (Heim 1991), which requires that a presupposed VP meaning should be marked as such by again. This penalizes conditionals with matching VP meanings, while conditionals with non-identical VP meanings are not affected. Furthermore, if participants interpret incrementally and predictively, Maximize Presupposition should not affect coordinations (specifically, the first conjunct in a coordination), which corresponds to our findings.

The Maximize Presupposition constraint provides a third possible reason for the cost of baseline if relative to baseline and aside from the suggestions discussed before, namely that conditionals are harder than conjunctions because we need to maintain two evaluation contexts, and/or because the matrix clause is semantically reanalyzed. They all might be at work here (distinguishing between them is left for a future occasion), but Maximize Presupposition might be particularly suitable as an explanation for the Experiment 2 results: it explains the cost of if, but it also explains the negative interaction mismatch \(\times \) if, which is unexpected under the hypothesis that if on its own is costly. Furthermore, in Experiment 2, the effect of if is observed on with and her, which makes the explanation in terms of reanalysis unlikely given the lateness of the effect. In Experiment 1, the effect of if was detectable on the second word after if, so the reanalysis explanation is more plausible for that experiment.

There are no main effects of cataphora and mismatch, but their 2-way interaction is negative and significant (or close to significant) in two out of the four regions. Whenever (close to) significant, this interaction effectively cancels the main effects of both mismatch and cataphora. That is, the and & cataphora & mismatch condition is about as difficult as the reference condition and & nothing & match, which suggests that participants stopped trying to properly interpret the difficult condition and & cataphora & mismatch and moved on/sped up.

There is a (close to) significant negative interaction of cataphora \(\times \) if in the two regions immediately following the verb (note: we are discussing matching conditions). In both regions, this 2-way interaction effectively cancels out the positive main effects of cataphora and if put together. This is exactly the configuration Brasoveanu and Dotlačil were looking for in Experiment 1, only it did not reach significance there. That is, if facilitates the processing of cataphora, even though if and cataphora on their own are more difficult. This supports the hypothesis that the construction of formal semantic representations is incremental and predictive.
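What it means for the interaction to cancel out the two main effects can be made concrete under treatment coding. The coefficients below are hypothetical values on the log RT scale, chosen only to mimic the qualitative pattern just described; they are not the estimates reported in the study, and the function name `predicted_log_rt` is introduced for illustration. Only the matching conditions are shown, so the mismatch terms drop out.

```python
from itertools import product

# Hypothetical coefficients (log RT scale); NOT the reported estimates.
coef = {
    "intercept": 5.70,       # and & nothing & match (reference condition)
    "if": 0.05,              # positive main effect of if
    "cataphora": 0.04,       # positive main effect of cataphora
    "cataphora:if": -0.09,   # negative 2-way interaction
}

def predicted_log_rt(is_if, is_cataphora):
    """Fixed-effects prediction for a matching condition under treatment coding."""
    return (coef["intercept"]
            + coef["if"] * is_if
            + coef["cataphora"] * is_cataphora
            + coef["cataphora:if"] * is_if * is_cataphora)

for is_if, is_cata in product((0, 1), repeat=2):
    label = f"{'if' if is_if else 'and'} & {'cataphora' if is_cata else 'nothing'}"
    print(f"{label:16s} {predicted_log_rt(is_if, is_cata):.2f}")
```

With these illustrative numbers, if alone and cataphora alone each raise the predicted log RT above the reference condition, while their combination returns it to the reference level, because the interaction is as large as the two main effects summed.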

Finally, the statistically significant and positive 3-way interaction cataphora \(\times \) if \(\times \) mismatch in the region immediately following the verb provides further empirical support for the hypothesis that the construction of formal semantic representations is incremental and predictive. The mismatch is surprising because the human interpreter expects to find a suitable antecedent for the again presupposition, and that expectation is not satisfied.

In sum, Experiments 1 and 2 in Brasoveanu and Dotlačil (2015a) provide coherent support for the incremental and predictive nature of the process of constructing meaning representations of the kind employed in formal semantics.

9.2 Mechanistic Processing Models as an Explanatory Goal for Semantics

The main questions at this point are the following. As formal semanticists, should we account for the incremental and predictive nature of the real-time semantic interpretation process? And if so, how?

It is important to remember that addressing these questions is firmly rooted in the tradition of dynamic semantics. Kamp (1981) begins like this:

“Two conceptions of meaning have dominated formal semantics of natural language. The first of these sees meaning principally as that which determines conditions of truth. [...] According to the second conception meaning is, first and foremost, that which a language user grasps when he understands the words he hears or reads. [...] these two conceptions [...] have remained largely separated for a considerable period of time. This separation has become an obstacle to the development of semantic theory [...] The theory presented here is an attempt to remove this obstacle. It combines a definition of truth with a systematic account of semantic representations.” (Kamp 1981, 189)

Thus, the implicit overarching goal for us as (cognitive) scientists studying natural language meaning and interpretation is to provide a formally explicit account of natural language interpretive behavior, i.e., a mathematically explicit, unified theory of semantic/pragmatic competence and performance.

To contextualize our position and outline some possible alternatives, let us consider the corresponding debate on the syntax side. Phillips and Lewis (2013, 14) identify two reasonable positions that working linguists more or less implicitly subscribe to in practice: (i) principled extensionalism, and (ii) strategic extensionalism.

Principled extensionalism takes a grammar/grammatical theory to be merely an abstract characterization of a function whose extension is all and only the well-formed sentences of a given language (plus their denotations, if the grammar incorporates a semantic component). The individual components of the grammatical theory have no independent status as mental objects or processes: they are components of an abstract function, not of a more concrete description of a mental system.

This kind of position cannot be tested using most empirical evidence aside from acceptability (or truth-value/entailment) judgments, since the position only aims to capture the ‘end products’ of the grammatical system and not the way these products are actually produced/comprehended.

The ‘principled’ part is that the extensionalist enterprise is understood as an end in itself, relevant even if lower-level characterizations of the human language system are provided (algorithmic/mechanistic, or implementation/neural level; Marr 1982). The linguist’s task is to characterize what the human language system computes and distinguish it from how speakers actually carry out that computation, which is the psycholinguist’s task.

The strategic extensionalism position takes the goal of formulating a grammatical theory to be a reasonable interim goal, but not an end in itself. The ultimate goal is to move beyond extensional description to a more detailed, mechanistic understanding of the human language system: describing an abstract function that identifies all the grammatical sentences of a language is just a first step in understanding how speakers actually comprehend/produce sentences in real time. We seek theories that capture how sentences are put together, and not just what their final form is. From this perspective, we should try to account for left-to-right structure building mechanisms, both at the syntactic and at the semantic level.

The strategic-extensionalism position is closely related to the cognitive-architecture based approach to research in cognitive science, which we have in fact followed throughout this book. As Anderson (2007, 7–8) puts it:

“A cognitive architecture is a specification of the structure of the brain at a level of abstraction that explains how it achieves the function of the mind [...] [i.e.,] human cognition in all of its complexity. [...] Th[is] type of architectural program [...] requires paying attention to three things: brain, mind (functional cognition), and the architectural abstractions that link them. [...] [A]pproaches that tried to get by with less [...] can be viewed as shortcuts to understanding.”

Anderson (2007) goes on to consider three such shortcuts. The first one is classical information-processing psychology that completely ignores the brain (at any level of abstraction), and that is basically the same as the principled-extensionalism position we characterized above. However, unlike much of formal semantics, cognitive psychology has realized by now that “cognition is not so abstract that our understanding of it can be totally divorced from our understanding of [the] physical reality [underlying it].” (Anderson 2007, 11)

This does not mean, of course, that the opposite position—eliminative connectionism—is not a shortcut also. “This approach ignores mental function as a constraint and just provides an abstract characterization of brain structure [...] [but these models] work only because we are able to imagine how [they] could serve a useful function in a larger system [...] [However, this] functionality is not achieved by a connectionist system.” (Anderson 2007, 11–14)

Finally, a third shortcut, which has recently become fairly popular in formal semantics and pragmatics, is the rational-analysis approach to cognition. The basic insight behind this approach is that “a constraint on how the brain achieves the mind is that both the brain and the mind have to survive in the real world: rather than focus on architecture as the key abstraction, focus on adaptation to the environment [...] [T]he Bayesian statistical methodology that accompanies much of this research [...] comes to take the place of the cognitive architecture.” (Anderson 2007, 15–16)

Anderson’s own work on declarative memory is an early instantiation of this approach; see Anderson (1990); Anderson and Schooler (1991); Schooler and Anderson (1997). As we discussed in detail in Chap. 6, the ACT-R base activation equation encodes that “a memory for something diminishes in proportion to how likely people are to need that memory. [...] Human memory [...] mirror[s] statistical relationship[s] in the environment. [...] Thus, the argument goes, one does not need a description of how memory works, which is what an architecture gives; rather, one just needs to focus on how memory solves the problems it encounters.” (Anderson 2007, 17)
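As a reminder of what the base activation equation computes, here is a minimal sketch of the standard ACT-R base-level learning equation, \(B_i = \log \sum_{j} t_j^{-d}\), where \(t_j\) is the time elapsed since the \(j\)-th past use of chunk \(i\) and \(d\) is the decay parameter (conventionally 0.5). The function name `base_activation` and the usage histories are assumptions made for illustration; this is not the implementation used elsewhere in the book.

```python
import math

def base_activation(times_since_use, decay=0.5):
    """ACT-R base-level activation: log of summed, power-law-decayed traces.

    times_since_use: positive elapsed times (e.g., in seconds) since each
    past use of the memory chunk. decay=0.5 is the conventional default.
    """
    return math.log(sum(t ** -decay for t in times_since_use))

# Made-up usage histories, for illustration only.
recent_and_frequent = [1.0, 5.0, 20.0]   # used three times, most recently 1 s ago
old_and_rare = [100.0]                   # used once, 100 s ago

print(base_activation(recent_and_frequent))
print(base_activation(old_and_rare))
```

Activation rises with frequency of use and falls with time since use, which is how the equation ties the availability of a memory to how likely it is to be needed.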

While possibly enlightening for individual cognitive components, this approach falls short of a complete theory of the human mind.

“[T]he human mind is not just the sum of core competences such as memory, or categorization, or reasoning. It is about how all these pieces and other pieces work together to produce cognition. All the pieces might be adapted to the regularities in the world, but understanding their individual adaptations does not address how they are put together. [...] What distinguishes humans is their ability to bring the pieces together, and this unique ability is just what adaptive analyses do not address, and just what a cognitive architecture is all about.” (Anderson 2007, 18)

In this book, we have consistently taken a cognitive-architectural approach to natural language meaning and interpretation. Our ultimate goal is to provide a framework in which we can build mechanistic processing models for natural language comprehension, with pieces that are independently needed for other higher-level cognitive processes.

It is in this context that we introduced and used Bayesian methods: we use them for theoretically-informed data analysis. More precisely, we use them as essential bridges that systematically connect independently-motivated semantics and processing theories and cognitive-architectural organization principles and constraints on one hand, and experimental data on the other hand.

9.3 Modeling the Interaction of Conditionals and Pronominal Cataphora

Assuming a cognitive-architecture-based approach to semantics, as we have done throughout this book (or a strategic extensionalist position), the next question is: how should we account for the incremental and predictive nature of semantic interpretation? We will not settle this question here, but we will outline two distinct approaches and flesh out one of them in detail.

As far as we can tell, there is a spectrum of approaches to incrementality effects, and the two extremes on that spectrum are accounting for incrementality (i) in the semantics versus (ii) in the processor.

The first alternative is parallel to the proposal in Phillips (1996, 2003) on the syntax side. The main claim in Phillips (1996, 2003) is that syntactic structures are built left-to-right, not top-down/bottom-up, and the incremental left-to-right system is the only structure-building system that humans have (‘the parser is the grammar’).

A similar proposal on the semantics side is sketched in Brasoveanu and Dotlačil (2015a, b). The idea is to provide a recursive definition of truth and satisfaction for first-order predicate logic that is fully incremental, building on the incremental propositional logic system in Vermeulen (1994). The resulting system, dubbed Incremental Dynamic Predicate Logic (IDPL), builds incrementality into the heart of semantics.

The second alternative is parallel to the proposal in Hofmeister et al. (2013) on the syntax side, the main goal of which is to argue that “many of the findings from studies putatively supporting grammar-based interpretations of island phenomena have plausible, alternative interpretations rooted in specific, well-documented processing mechanisms” (Hofmeister et al. 2013, 44). The remainder of this chapter is dedicated to fleshing out this approach.

Our specific proposal on the processing side is to extend the eager left-corner parser for DRT we introduced in the previous chapter with conjunctions, conditionals and anaphora/cataphora, so that we can explicitly and fully model the two self-paced reading experiments discussed in Sect. 9.1 above.

In this section, we introduce the basic model that captures the qualitative pattern of interactions between cataphora and conjunctions versus conditionals in Experiment 1. In the next section, we introduce the model in its full complexity. The full model can syntactically and semantically parse the items in Experiment 2, which enables us to quantitatively fit it to the data from Experiment 2.

To model pronominal and presuppositional anaphora/cataphora, we add a new goal-like buffer unresolved_discourse to our ACT-R mind, which will store the unresolved DRSs contributed by pronouns and the presuppositional trigger again. We set the encoding delay for this buffer, as well as for the imaginal and discourse_context buffers we used in the previous chapter, to 0:

figure m

In principle, we might be able to model anaphora and cataphora without this additional unresolved_discourse buffer, but we decided to use it here for presentational clarity (see footnote 4).

9.3.1 Chunk Types and the Lexical Information Stored in Declarative Memory

The chunk types we will need are the same as the ones we used in the previous chapter, plus a new chunk type for predicates. They are listed in (20) below.

figure n

Parsing goal chunks have the expected features:

  • the current parsing task;

  • the stack of syntactic goals driving the parsing process (stack1 stack2 stack3);

  • a stack for arguments (arg_stack1 arg_stack2) that need to be passed across different semantic chunks, e.g., from the subject to the verbal predicate;

  • the right-edge stack keeps track of possible points of attachment made available by the current, partially-built syntactic tree (right_edge_stack1 right_edge_stack2 ...);

    • we called this stack right_frontier in the previous chapter, but we renamed it here for brevity;

    • the right-edge stack we need in this chapter has additional positions because of the need to attach conjuncts and if-adjuncts;

  • the parsed_word and found features are used in much the same way as in the previous chapters;

  • the discourse_status feature will keep track of whether a DRS constructed at some point during the incremental interpretation process:

    • contributes to the at_issue meaning, or

    • is unresolved, e.g., it is the presupposition contributed by a pronoun or the adverb again, or

    • is presupposed, i.e., it is the resolved presupposition contributed by a pronoun or the adverb again;

  • just as in the previous chapter, we introduce new discourse referents (drefs) with fresh (previously unused) indices by keeping track of the current dref_peg/index in the goal buffer and updating it as soon as a dref with that peg/index is introduced;

    • but in this chapter, we have event drefs (needed for again) in addition to individual-level drefs, so we also keep track of the current event_peg;

    • in addition, we keep track of which sub-DRSs are part of the main DRS by associating them with current drs_peg; this is a flatter/simpler solution than the one we used in the previous chapter, where a main DRS had sub-DRSs stored in its slots;

    • DRS drefs are basically propositional drefs and are independently needed for conditionals, for example; we keep track of the previous DRS peg (prev_drs_peg) to be able to capture the semantic reanalysis triggered by sentence-final if-clauses;

  • finally, the three features entity_cataphora, event_cataphora and if_conseq_pred are needed to account for the processing of cataphoric pronouns and again, and will be discussed in detail later in the chapter.
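To make the inventory of goal features above easier to keep in mind, here is a minimal plain-Python sketch of a parsing goal. It is not the pyactr chunk type listed in (20): the class name `ParsingGoal`, the use of Python lists for the three stacks, the integer pegs and all default values are our own illustrative assumptions. Only the feature names come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParsingGoal:
    """Plain-Python stand-in for a parsing goal chunk (not a pyactr chunk)."""
    task: str = "parsing"  # assumed default task
    # Syntactic goal stack; index 0 is the top (stack1 in the text).
    stack: List[str] = field(default_factory=lambda: ["S"])
    # Drefs waiting to be passed on as arguments (arg_stack1, arg_stack2).
    arg_stack: List[str] = field(default_factory=list)
    # Attachment points on the right edge of the partial tree.
    right_edge_stack: List[str] = field(default_factory=list)
    parsed_word: Optional[str] = None
    found: Optional[str] = None
    discourse_status: str = "at_issue"  # or "unresolved" / "presupposed"
    dref_peg: int = 1   # index of the next fresh entity dref (assumed encoding)
    event_peg: int = 1  # index of the next fresh event dref (assumed encoding)
    drs_peg: int = 1    # index of the current DRS dref (assumed encoding)
    prev_drs_peg: Optional[int] = None
    entity_cataphora: Optional[str] = None
    event_cataphora: Optional[str] = None
    if_conseq_pred: Optional[str] = None


goal = ParsingGoal()
# After a subject determiner: expect N, then NP, then the original S goal.
goal.stack = ["N", "NP"] + goal.stack
print(goal.stack)
```

In the actual model these features are slots of a chunk in the goal buffer, and the stacks are spread over numbered slots rather than stored in a single list.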

Chunks of parse_state type have the same structure as before, except for the addition of a mother_of_mother feature. This feature enables us to keep track of the partial syntactic structure constructed by the incremental comprehension process in a little more detail at the local level of an individual chunk.

The lexical entry of a word keeps track of:

  • its written form (a proxy for its phonological representation),

  • its syntactic category, and

  • up to two predicates pred1 and pred2 that represent the meaning of that word.

We use these two predicate slots in various ways. For example, a proper name like Danielle contributes:

  • a predicate danielle (with a singleton set denotation) as its pred1 value (this follows the analysis of proper names in Kamp and Reyle 1993), and

  • the gender predicate female as its pred2 value that can be leveraged to resolve a subsequent pronoun she/her anaphoric to the proper name.

The values of these two pred1 and pred2 features are chunks of type pred, which specify the name of the non-logical constant associated with the predicate (the feature constant_name) and the arity of the constant.

Finally, chunks of type drs have the same structure as the one used in the previous chapter, with the addition of several new features necessary to capture the interaction of cataphora and conjunctions/conditionals:

  • the dref feature keeps track of the new propositional dref (if any) introduced by the DRS;

  • pred1 and pred2 store the predicates that are part of the conditions contributed by the DRS;

  • event_arg stores the event dref (if any) taken as an argument by pred1 and/or pred2;

  • arg1 and arg2 store the entity drefs that are the arguments of pred1 and/or pred2;

  • discourse_status keeps track of the discourse status of the DRS;

    • we only need three possible values for this feature in this chapter: at_issue, unresolved and presupposed;

  • the drs feature keeps track of the DRS peg that the current DRS is associated with;

  • finally, embedding_level keeps track of the embedding level of the DRS, the main function of which is to constrain pronoun and anaphora resolution, just as in Kamp and Reyle (1993);

    • discourse-initial main clauses are embedding_level 0;

    • conditional antecedents or the second conjunct in a conjunction are embedding_level 1;

    • conditional consequents or the third conjunct in a conjunction will be embedding_level 2;

    • pronouns and presuppositions, whether anaphoric or cataphoric, can only find antecedents at a higher, i.e., less deeply embedded, level (a numerically smaller embedding_level value), or at the same level in certain cases.
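The way embedding levels constrain resolution can be illustrated with a small plain-Python sketch. It assumes that a candidate antecedent is accessible if its embedding_level value is numerically smaller than that of the pronoun, or equal to it when same-level resolution is allowed. The names `DrsRecord` and `find_antecedent`, the `allow_same_level` switch and the choice of the most recent matching DRS are hypothetical simplifications. In the model itself, the antecedent is retrieved from declarative memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DrsRecord:
    arg1: str             # entity dref, e.g. "x1"
    pred2: str            # gender predicate, e.g. "female"
    embedding_level: int  # 0 = discourse-initial main clause


def find_antecedent(pro_gender: str, pro_level: int,
                    context: List[DrsRecord],
                    allow_same_level: bool = True) -> Optional[str]:
    """Return the dref of the most recent accessible antecedent, if any."""
    for drs in reversed(context):
        less_embedded = drs.embedding_level < pro_level
        same_level = allow_same_level and drs.embedding_level == pro_level
        if drs.pred2 == pro_gender and (less_embedded or same_level):
            return drs.arg1
    return None


context = [DrsRecord("x1", "female", 0), DrsRecord("x2", "male", 1)]
# A pronoun in the second conjunct can reach a dref in the first conjunct.
print(find_antecedent("female", 1, context))
# A pronoun in the first conjunct cannot reach a dref in the second one.
print(find_antecedent("male", 0, context))
```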

Let us look at some example lexical entries that will be stored in declarative memory (dm). The lexical entry of a proper name like Danielle is a chunk of the following form:

figure o

In (21), we use the chunkstring method to assemble the lexical entry for the proper name Danielle (of type word) from a Python3 string. The values of the form and cat features are as expected. The pred1 and pred2 values are themselves chunks of type pred. These predicate chunks are assumed to be already available in dm at the time we assemble the lexical entry for Danielle, and they are declared as follows:

figure p

Just as in the previous chapter, the lexical entry for the determiner a does not contain any semantic information. The associated semantic representations and operations, namely, introducing a new dref, predicating the common noun and the verbal predicate of it etc., are all contributed by production rules stored in procedural memory.

A pronoun like she has a lexical entry of the following form:

figure q

The gender of the pronoun, which the antecedent of the pronoun will have to satisfy, is stored as the pred2 value. We follow the semantics for pronouns proposed in Kamp and Reyle (1993) and assume that pronouns introduce their own dref, but they need to equate it with the dref contributed by a suitable antecedent. This is the reason for making EQUALS the ‘main’ predicate contributed by the pronoun, i.e., the value of the pred1 feature. The exact specification of the EQUALS predicate, which has an arity of 2 (as expected), is provided in (25) below.

figure r
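The lexical entries for proper names and pronouns described so far can be mirrored in a few lines of plain Python. This is only a sketch, not the chunkstring-based declarations in the listings. The classes `Pred` and `Word`, the category label "ProperN", the arity 1 for danielle, the entry for he and the helper `gender_compatible` are all assumptions we introduce for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Pred:
    constant_name: str
    arity: Union[int, str]  # e.g. 1, 2 or "event_plus_1"


@dataclass(frozen=True)
class Word:
    form: str
    cat: str
    pred1: Optional[Pred] = None
    pred2: Optional[Pred] = None


FEMALE = Pred("female", 1)
EQUALS = Pred("EQUALS", 2)

# Proper name: pred1 is the name predicate, pred2 the gender predicate.
danielle = Word("Danielle", "ProperN", Pred("danielle", 1), FEMALE)
# Pronoun: pred1 is EQUALS, pred2 the gender the antecedent must satisfy.
she = Word("she", "PRO", EQUALS, FEMALE)
he = Word("he", "PRO", EQUALS, Pred("male", 1))  # hypothetical extra entry


def gender_compatible(pronoun: Word, antecedent: Word) -> bool:
    """A pronoun can only be equated with a dref of matching gender."""
    return pronoun.pred2 == antecedent.pred2


print(gender_compatible(she, danielle))
print(gender_compatible(he, danielle))
```

Storing gender as an ordinary predicate in pred2 means that the same equality check works for any antecedent whose entry carries a gender predicate.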

It would be natural to have a lexical entry for again that would be parallel to that of pronouns, except that it would relate event drefs instead of entity drefs, and it would contribute the predicate PRECEDES in (26) below instead of EQUALS.

figure s

It turns out, however, that it is more convenient to give again a semantically empty lexical entry, shown in (27) below, and let suitably formulated production rules make the correct semantic contributions.

figure t

This superficial difference between pronouns and again is due to the fact that, syntactically, again is an adjunct that needs to retrieve the VP it adjoins to and reopen it for adjunction. In addition, semantically, again is ‘parasitic’ on the event dref contributed by the verbal predicate it adjoins to. This is in contrast to pronouns, which introduce their own entity dref.

However, what again and pronouns do have in common is that they both need to be resolved: they have to relate their dref, whether they introduce that dref or ‘inherit’ it from the VP, to another dref. A successful resolution requires the other dref to be available in, and retrieved from, declarative memory.

Just like proper names, common nouns introduce two predicates, one being the common noun itself and the other being the gender. For example, the lexical entry for car and its associated predicate chunks are as follows:

figure u

Intransitive and transitive verbs introduce a single predicate, with an arity that specifies that an event argument is required, plus 1 or 2 individual-level arguments. For example, laughed and greeted have the lexical entries in (29) and (30) below.

figure v

The items used in Experiment 2 (see Sect. 9.1.2 above) also contain prepositional verbs like argue/play with. We analyze them as transitive verbs, but we assign a different syntactic category VtPP to them. This will enable us to formulate production rules that will syntactically and semantically integrate them with the subsequent preposition.

figure w

For simplicity, we assume that prepositions like with that are part of prepositional verbs do not take an event argument, as shown in (32) below. We make the same simplifying assumption about adjectives like overcooked, as shown in (33).

figure x

Finally, we take the lexical entry of the sentential operators and and if to be semantically empty. The associated semantic representations and operations will all be contributed by production rules.

figure y

The full code for this part of the model is linked to at the end of the chapter in Appendix 9.6.1.

9.3.2 Rules to Advance Dref Peg Positions, Key Presses and Word-Related Rules

The entire set of production rules is linked to at the end of this chapter (Appendix 9.6.2). Here, we will highlight only the most crucial ones.

We have several families of production rules that advance the dref peg position for (i) entity/individual-level drefs, (ii) event drefs and (iii) DRS drefs. The idea of peg positions was introduced and justified in Chap. 8 (see Sect. 8.3) and the rules we use in this chapter are the same, except we generalize this idea to drefs for types other than entities/individuals, namely event drefs and DRS/propositional drefs.
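The idea of peg positions for the three dref types can be sketched in plain Python as three counters. The class `Pegs`, the method `fresh` and the prefixes e and d for event and DRS drefs are assumptions made for illustration. Note also that the sketch fuses two steps that the model keeps apart: there, a rule first introduces a dref at the current peg, and a separate peg-advancing rule then moves the peg.

```python
from dataclasses import dataclass


@dataclass
class Pegs:
    entity: int = 1
    event: int = 1
    drs: int = 1

    def fresh(self, kind: str) -> str:
        """Return a dref at the current peg of the given kind, then advance it."""
        prefix = {"entity": "x", "event": "e", "drs": "d"}[kind]
        index = getattr(self, kind)
        setattr(self, kind, index + 1)  # move the peg to the next position
        return f"{prefix}{index}"


pegs = Pegs()
first, second = pegs.fresh("entity"), pegs.fresh("entity")
event = pegs.fresh("event")
print(first, second, event)
```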

Turning to word-related rules, the "encode word" rule in (35) below fires whenever the parser is not engaged in a set of tasks that should take priority relative to word encoding (lines 4–14). We can think of the "encode word" rule as an ‘elsewhere’ rule: if the parser is not engaged in a more pressing task, it should check whether there is a value in the visual buffer that can be encoded (lines 17–19 in (35)). If such a value is available, it is encoded in the goal buffer as the value of the parsed_word feature (line 24).

figure z

The ‘elsewhere’ nature of the "encode word" rule is also reflected in the fact that we assign it a lower utility of \(-1\) (line 27 in (35)) than the default, which is 0. In a more realistic system that learns rule utilities from data, this utility would be automatically inferred, and we would be able to greatly simplify the rule by removing the long list of negative conditions on lines 4–14: the fact that all the tasks listed on lines 4–14 need to take priority over word encoding should arise from the utilities of the relevant production rules, rather than being hard-coded in this fashion. However, to make rule preferences transparent, we opted for hard-coding them in this model.
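To illustrate how utilities alone could make "encode word" an ‘elsewhere’ rule, here is a minimal plain-Python sketch of utility-based rule selection. It is not pyactr's conflict-resolution code, and it ignores utility noise. The competing rule "advance dref peg", the state keys `task` and `visual`, and the names `Rule` and `select_rule` are hypothetical.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Optional


class Rule(NamedTuple):
    name: str
    matches: Callable[[Dict[str, Any]], bool]
    utility: float = 0.0  # default utility is 0


def select_rule(rules: List[Rule], state: Dict[str, Any]) -> Optional[Rule]:
    """Among the rules whose conditions match, pick the highest-utility one."""
    candidates = [rule for rule in rules if rule.matches(state)]
    return max(candidates, key=lambda rule: rule.utility, default=None)


rules = [
    # Elsewhere rule: matches whenever a word is waiting in the visual buffer.
    Rule("encode word", lambda s: s["visual"] is not None, utility=-1.0),
    # A more pressing task, with the default utility.
    Rule("advance dref peg", lambda s: s["task"] == "move_dref_peg"),
]

busy = {"task": "move_dref_peg", "visual": "woman"}
idle = {"task": "parsing", "visual": "woman"}
print(select_rule(rules, busy).name)
print(select_rule(rules, idle).name)
```

With this kind of selection, the negative conditions on the "encode word" rule become unnecessary: the rule matches in both states, but it only wins when nothing with a higher utility matches.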

The "retrieve word" rule in (36) below requests lexical information about the word we just encoded from declarative memory.

figure aa

Once the lexical information is retrieved and available in the retrieval buffer, we shift and project the word (37). This means that we build a unary-branching syntactic structure in the imaginal buffer, with the word as the daughter and its syntactic category as the mother (lines 15–18 in (37)). At the same time, we update the found slot in the goal buffer to the same syntactic category (line 14), so that other syntax/semantics processing steps necessary to integrate the retrieved word are triggered. Finally, we start a key_press task (line 13). The motor operations associated with this task will execute in parallel to the additional syntax/semantics processing steps associated with the retrieved word.

figure ab

The shift-and-project rule in (37) applies to all words except nouns (N). The shift-and-project N rule is different only because the preceding Det has already created an NP syntactic structure in the imaginal buffer that the N needs to update before creating its own unary-branching structure in the same buffer. We do not list the rule here; the complete set of rules is linked to in Appendix 9.6.2.

The key_press task consists of only one rule: the "press spacebar" rule in (38) below. This rule adds a command to press the spacebar to the manual buffer (lines 13–16). Once that is done, we revert to the default task of parsing (line 12).

figure ac

Finally, when there are no more words to be read on the virtual screen, we end the syntax/semantics parsing process with the "finished: no visual input" rule in (39) below. This rule flushes all the goal/goal-like buffers.

figure ad

9.3.3 Phrase Structure Rules

The project and project-and-complete phrase structure rules encode most of the syntactic parsing and all the semantic parsing work. Consequently, these rules tend to have relatively large lists of actions to be executed. We will only discuss here some of the most important phrase structure rules. The remaining ones, linked to in Appendix 9.6.2, have the same kind of structure and should be straightforward to understand.

Consider first the "project: NP ==> Det N" rule in (40) below. This rule is triggered once a determiner is shifted and projected, so the found feature stores a Det value (line 10). Another important condition for this rule is that the top parsing goal is to parse an S (line 5). That is, this rule is triggered by determiners in subject position. Determiners in object position trigger a different rule ("project and complete: NP ==> Det N"; see Appendix 9.6.2), which is very similar except that the top parsing goal is an NP rather than an S.

figure ae

The "project: NP ==> Det N" rule in (40) also conditions on the retrieval buffer making available the semantics of the word we just parsed, specifically, the two predicates =p1 and =p2 it contributes (lines 14–17). This is spurious in the present case because we have just parsed a determiner, which does not contribute any predicates, but we include it for uniformity with the phrase structure rules for proper names, nouns, verbs, pronouns etc.

The "project: NP ==> Det N" rule in (40) triggers three main parsing actions: (i) it updates the goal buffer, thereby setting the context for subsequent parsing rules (lines 19–28); (ii) it updates the imaginal buffer with the expected NP node with two daughters (Det and N), and it attaches the NP to the top right-edge attachment point (the S node) (lines 29–35); (iii) finally, it updates the discourse_context buffer with a new DRS (lines 36–42).

The new DRS has a similar structure to the sub-DRSs contributed by indefinites that we discussed in the previous chapter (Chap. 8): we introduce a new individual-level dref (line 38) and we plug it in as the argument of the upcoming N (line 39).

However, in contrast to what we did in the previous chapter, we mark the current sub-DRS as being part of a larger DRS (i.e., part of a larger propositional update) by means of a drs slot which has the current DRS dref =drs_peg as its value (line 40). Thus, the fact that this sub-DRS is part of a larger main DRS is only implicit in this representation, unlike the more explicitly hierarchical structure we built in the previous chapter. We prefer this flatter structure for sub-DRSs and main DRSs in this chapter because it is simpler to use, and sufficient for our purposes.

However, because we are modeling pronoun and presupposition resolution, we need to keep track of two other semantic features: the embedding level of the current sub-DRS (line 41), which is crucial for the resolution process, and the discourse status of the current sub-DRS (line 42). The discourse status is at_issue, to be distinguished from either (i) the unresolved discourse status associated with unresolved pronouns/presuppositions, or (ii) the presupposed discourse status associated with resolved pronouns/presuppositions.

Given these imaginal and discourse_context buffer updates, we update the goal buffer in a variety of ways:

  • we update the task to move_dref_peg (line 21 in (40)): we have just introduced a new dref indexed with the current dref_peg, so we need to move the peg to the next position in preparation for subsequent indefinites;

  • we set up the stack of parsing goals in the expected way (lines 22–24): we have just parsed a Det in subject position, so we expect to parse an N next, after which we expect to finish parsing the subject NP, and once that is completed, we revert to the initial goal of parsing an S;

  • we reset the found and parsed_word features to None (lines 25–26) to indicate that we are finished parsing the current word;

  • we push the dref we just introduced on the argument stack (lines 27–28) so that it is available as an argument for the predicate(s) introduced by the upcoming VP.

Finally, we flush the retrieval buffer (line 43 in (40)), since we have no further use for the lexical information associated with the current word.
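The goal and discourse_context updates performed by the project-NP rule can be summarized in a plain-Python sketch. It is not the production rule in (40): the function name `project_np_det_n`, the dictionary encoding of the buffers, the string "d1" for the DRS peg and the storage of the embedding level in the goal dictionary are illustrative assumptions, and the imaginal buffer update is left out.

```python
from typing import Any, Dict, Tuple


def project_np_det_n(goal: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return the updated goal and the new DRS for a subject determiner."""
    # Pre-conditions from the text: the top goal is S and a Det was just found.
    if goal["stack"][0] != "S" or goal["found"] != "Det":
        raise ValueError("rule does not match the current goal")
    dref = f"x{goal['dref_peg']}"  # new dref at the current entity peg
    new_drs = {
        "dref": dref,
        "arg1": dref,  # argument of the upcoming N
        "drs": goal["drs_peg"],  # the main DRS this sub-DRS belongs to
        "embedding_level": goal["embedding_level"],
        "discourse_status": "at_issue",
    }
    new_goal = dict(
        goal,
        task="move_dref_peg",               # the peg is advanced by a later rule
        stack=["N", "NP"] + goal["stack"],  # expect N, then NP, then S
        found=None,
        parsed_word=None,
        arg_stack=[dref] + goal["arg_stack"],  # make the dref available to the VP
    )
    return new_goal, new_drs


start = {"task": "parsing", "stack": ["S"], "found": "Det", "parsed_word": "A",
         "arg_stack": [], "dref_peg": 1, "drs_peg": "d1", "embedding_level": 0}
new_goal, drs = project_np_det_n(start)
print(new_goal["stack"], new_goal["arg_stack"])
print(drs)
```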

The syntactic and semantic parsing actions triggered by the project-NP rule set up the context for the "project and complete: N" rule in (41) below. If the syntactic category at the top of the goal stack is N (line 5) and we have just shifted and projected an N (line 8), the semantics of which is available in the retrieval buffer (lines 9–12), we take two main parsing actions. On one hand, we add the two predicates =p1 and =p2 lexically contributed by the N to the DRS in the discourse_context buffer, which was contributed by the preceding indefinite determiner (lines 23–26). On the other hand, we pop the N goal off the goal-buffer stack and reset the found and parsed_word features back to None. To wrap up, we do some cognitive-state clean-up and flush the retrieval and imaginal buffers (lines 27–28).

figure af

At this point, we have fully parsed the subject NP, so we need to pop that goal off the top of the goal-buffer stack. Moreover, we have found the left corner of the S, namely the subject NP, so we can also pop the S goal off the stack and replace it with the goal of finding the VP that will complete the S. The "project and complete: S ==> NP VP" rule in (42) below triggers all these parsing actions, and also builds the binary-branching \([\!\!_{\text{ S }}\!\!\!\text{ NP }\!\!\!\text{ VP }\!\!]\) structure in the imaginal buffer.

figure ag

At this point, various project-and-complete-VP rules can be triggered, depending on the syntactic category of the finite verb. We will only discuss here the simplest case, namely intransitive verbs, and then work through a simple example. The "project and complete: VP ==> Vi" rule is provided in (43) below. This rule is triggered if the top syntactic category in the goal-buffer stack is VP (line 5), and we have just shifted and projected an intransitive verb (line 6), the semantics of which is available in the retrieval buffer (lines 16–18).

figure ah

If these pre-conditions are met, the "project and complete: VP ==> Vi" rule in (43) triggers three main parsing actions, just like the project-NP rule in (40) above, or most other rules associated with words contributing essential semantic information and operations. Just as before, the three main parsing actions involve: (i) the goal buffer (lines 20–29); (ii) the imaginal buffer, where new syntactic information is encoded (lines 30–36); (iii) the discourse_context buffer, where new semantic information is encoded (lines 37–45).

The imaginal buffer update (lines 30–36 in (43)) simply builds a unary-branching structure \([\!\!_{\text{ VP }}\!\!\!\!\text{ Vi }\!\!]\) and specifies its lexical head and its attachment point in the larger syntactic structure.

The discourse_context buffer update on lines 37–45 is more substantial:

  • we introduce a new DRS and mark it as a sub-DRS of the current main DRS by setting the drs feature to the current =drs_peg (line 43), its embedding_level feature to the current embedding level =el (line 44), and its discourse status to at_issue (line 45);

  • since this DRS is contributed by a verb, we introduce a new event dref and index it with the current event dref peg =ev_peg (line 39);

  • the intransitive verb contributes a predicate =p1 (line 42) that takes two arguments: an event argument that is set to the newly introduced event dref =ev_peg (line 40), and an entity argument that is set to the dref =a1 previously introduced by the subject and stored at the top of the goal-buffer argument stack (line 41).

With the syntactic and semantic updates in place, the goal buffer can be updated accordingly (lines 20–29):

  • since we just introduced a new event dref, we need to update the event dref peg, so the new task is set to move_event_peg (line 22);

  • we have just completely finished parsing the intransitive verb, so we pop the VP category off the goal-buffer stack (line 23) and reset the found and parsed_word features to None (lines 24–25);

  • we also pop the S node off the right-edge stack since this node is not available for future attachments anymore (lines 26–29).
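The semantic contribution of an intransitive verb, as just described, can be sketched in plain Python as follows. This is a sketch under stated assumptions rather than the rule in (43): the function name `project_and_complete_vp_vi`, the dictionary encoding, the dref names and the string "d1" for the DRS peg are hypothetical, and the imaginal buffer update is omitted.

```python
from typing import Any, Dict, Tuple


def project_and_complete_vp_vi(
        goal: Dict[str, Any], pred1: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return the updated goal and the DRS contributed by an intransitive verb."""
    # Pre-conditions from the text: the top goal is VP and a Vi was just found.
    if goal["stack"][0] != "VP" or goal["found"] != "Vi":
        raise ValueError("rule does not match the current goal")
    event = f"e{goal['event_peg']}"  # new event dref at the current event peg
    new_drs = {
        "dref": event,
        "pred1": pred1,
        "event_arg": event,
        "arg1": goal["arg_stack"][0],  # dref introduced by the subject
        "drs": goal["drs_peg"],
        "embedding_level": goal["embedding_level"],
        "discourse_status": "at_issue",
    }
    new_goal = dict(
        goal,
        task="move_event_peg",
        stack=goal["stack"][1:],  # pop the VP goal
        found=None,
        parsed_word=None,
        right_edge_stack=goal["right_edge_stack"][1:],  # S no longer attachable
    )
    return new_goal, new_drs


before = {"task": "parsing", "stack": ["VP"], "found": "Vi",
          "parsed_word": "smiled", "arg_stack": ["x1"],
          "right_edge_stack": ["S"], "event_peg": 1, "drs_peg": "d1",
          "embedding_level": 0}
after, verb_drs = project_and_complete_vp_vi(before, "smile")
print(after["stack"], after["task"])
print(verb_drs)
```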

We are now ready to parse a simple example. When the model reads the sentence in (44) below word by word (as in a self-paced reading task), it goes through a cognitive process whose temporal trace is provided in (45). To run the model on the sentence in (44) and obtain the temporal trace, uncomment the relevant line in the file (linked to in Appendix 9.6.3) and run the file in the terminal with the command:

  • python3

The full pyactr output is very detailed, so we edit it down significantly in (45) below to be able to focus on the main steps of the incremental interpretation process.

figure ai

Line 1 of the temporal trace in (45) indicates that the first word, namely the indefinite determiner A, is displayed on the virtual screen from which the model takes its visual input. This word is encoded 25.5 ms later (line 2), after which a retrieval request for its lexical information is placed (line 3, time: 38 ms after the start of the cognitive process). The lexical information is retrieved after \({\approx }140\) ms (line 4), then the word is shifted and projected (line 5), and as a result, our first syntactic structure \([\!\!_{\text{ Det }}\!\!\!\text{ A }\!\!]\) is built in the imaginal buffer (lines 6–7). At this point, the cognitive process branches into two sub-processes running in parallel: (i) a motor sub-process that will culminate in pressing the spacebar to reveal the next word (line 8), and (ii) the continuation of the incremental parsing process triggered by the indefinite determiner A (lines 9–15).

The incremental parsing process continues with the project-NP rule (line 9), which results in the creation of an \([\!_{\text{ NP }}\!\!\!\!\text{ Det }\!\!\text{ N }\!\!]\) syntactic structure in the imaginal buffer (lines 10–11), and the creation of our first DRS in the discourse_context buffer (lines 12–14). Using the familiar DRT format, the DRS can be represented as follows:

figure aj

Given that we just introduced a new dref \(x_{1}\) for entities/individuals, we have to move the entity dref peg to the next position \(x_{2}\) (line 15 in (45)). After this, the incremental interpretation sub-process has nothing left to do, so we wait until the motor sub-process completes and the spacebar is pressed (line 16).

At that point, the next word woman is displayed on the virtual screen (line 17), and we go through the same cycle of encode-retrieve-project rules for the new word (lines 18–31). Specifically, we encode the word (line 18) and place a retrieval request for its lexical information (line 19). The lexical entry for the noun woman is retrieved at \({\approx }530\) ms after the start of the entire cognitive process (lines 20–22). Notably, the noun contributes two predicates: _woman_ and _female_, both of arity 1. The explicit gender specification is useful for pronoun resolution.

The noun is then shifted and projected (line 23) and, as a result, the unary branching structure \([\!\!_{\text{ N }}\!\!\!\text{ woman }\!\!]\) is created in the imaginal buffer (lines 24–25). At this point, the cognitive process branches again into a motor process that will culminate in pressing the spacebar (line 26) and the continuation of the incremental parsing process. Incremental parsing continues with the project-and-complete-N rule (line 27), which results in an update of the DRS contributed by the indefinite determiner A and stored in the discourse_context buffer (line 28). The DRS is updated with the two predicates contributed by the noun woman, and can be represented in the familiar DRT format as follows.

figure ak

The project-S rule is then fired (line 29) and the binary-branching structure \([\!\!_{\text{ S }}\!\!\!\text{ NP }\!\!\!\text{ VP }\!\!]\) is created in the imaginal buffer (lines 30–31). This completes the sequence of incremental interpretation steps triggered by the noun woman. The process then waits for the motor sub-process to complete and the spacebar to be pressed, which happens \({\approx }120\) ms later (line 32).

At this point, the final word smiled is displayed on the virtual screen (line 33). After it is encoded (line 34) and the request for its lexical entry is placed (line 35), we have access to its lexical information in the retrieval buffer (lines 36–37). The most notable aspect of this lexical entry is the arity event_plus_1 of the predicate _smile_ (line 37): this simply means that the predicate _smile_ takes an event argument and, in addition, an entity argument. The intransitive verb is then shifted and projected (line 38), at which point the unary branching syntactic structure \([\!\!_{\text{ Vi }}\!\!\!\text{ smiled }\!\!]\) is built in the imaginal buffer (lines 39–40).

Once again, and for the final time, the cognitive process branches into a motor sub-process that will culminate with a spacebar press (line 41) and a sub-process that continues with the incremental parsing triggered by the intransitive verb smiled. The parsing process continues with the project-and-complete-VP rule (line 42), which simultaneously creates a syntactic structure \([\!\!_{\text{ VP }}\!\!\!\!\text{ Vi }\!\!]\) in the imaginal buffer (lines 43–44) and a DRS in the discourse_context buffer (lines 45–48). This DRS can be represented in the usual DRT format as shown below.

figure al

The parsing process is basically done, so once the event-dref peg is updated (line 49), the "finished" rule ends the entire cognitive process because of the lack of visual input (no more words on the virtual screen). For convenience, we list the syntactic structures built during the parsing process together with their time stamps on lines 54–61 in (45), and the two DRSs with their time stamps on lines 65–71.

The model includes a variety of other rules—for transitive verbs, prepositional verbs, NPs in object position, adjectives etc. They are all available in the file linked to in Appendix 9.6.2. We encourage you to run the model on the variety of sentences in the file (Appendix 9.6.3) and examine the temporal-trace outputs to understand the time-course predictions of the syntax/semantics interpretation process implemented by this model.

9.3.4 Rules for Conjunctions and Anaphora Resolution

We are now ready to model basic bi-clausal examples, specifically, conjunctions. The project-and-complete rule for and is provided in (49) below. The rule starts a new sentence S, that is, the second conjunct, by placing the category S at the top of the goal-buffer stack (line 13), advancing the DRS dref peg (line 12: the task is updated to move_drs_peg), and setting the embedding level for the second conjunct to 1 (line 22).

Incrementing the embedding level ensures that drefs in the second conjunct are not available as antecedents for pronouns in the first conjunct. In general, we use embedding levels to model both the explicit and the implicit aspects of the discourse accessibility relation used in DRT. The fact that discourse referents in the second conjunct cannot serve as antecedents to pronouns in the first conjunct is not explicitly encoded in the accessibility relation defined in Kamp and Reyle (1993), but it is a by-product of the DRS construction algorithm that requires the DRS construction for the second conjunct to take place in the context of the DRS constructed based on the first conjunct. In our model, we use embedding level uniformly to constrain dref accessibility, both in conjunctions and conditionals.
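
The way embedding levels constrain accessibility can be illustrated with a small Python sketch. It is not part of the model's code: `ACCESSIBLE_LEVELS` and `is_accessible` are hypothetical names. The sketch encodes only what this chapter states, namely that a level-0 pronoun cannot access level-1 drefs and that a level-1 pronoun needs an antecedent at a level below 2. The exclusion of level 2 for level-0 pronouns and the entire row for level 2 are assumptions made for illustration.

```python
# Embedding levels used in this chapter:
# 0 = main assertion / first conjunct,
# 1 = second conjunct or conditional antecedent,
# 2 = conditional consequent.
ACCESSIBLE_LEVELS = {
    0: {0},        # level-0 pronouns cannot see level-1 drefs (level 2: assumed)
    1: {0, 1},     # level-1 pronouns need an antecedent at a level below 2
    2: {0, 1, 2},  # assumption: a consequent can see every level
}


def is_accessible(pronoun_level: int, antecedent_level: int) -> bool:
    """Return True if a dref at antecedent_level is visible to the pronoun."""
    return antecedent_level in ACCESSIBLE_LEVELS[pronoun_level]
```

For instance, `is_accessible(1, 0)` holds for a pronoun in a second conjunct whose antecedent is in the first conjunct, while `is_accessible(0, 1)` does not.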

Finally, the and rule creates a (non-headed) ternary-branching structure \([\!\!_{\text{ ConjS }}\!\!\!\text{ S }\!\!\!\text{ Conj }\!\!\!\text{ S }\!\!]\) in the imaginal buffer (lines 23–31).

figure am

We need to introduce three more rules, related to pronoun resolution, before we can go through an example. The first rule is the "project: NP ==> PRO" rule in (50) below, which completes the syntax/semantics parsing of pronouns in subject position. Given that a PRO has just been found (line 11) and is available in the retrieval buffer (lines 15–18), we take the usual three types of parsing actions and update the goal, imaginal and discourse_context buffers. In addition, we also add an unresolved DRS to the unresolved_discourse buffer. Let’s discuss them in turn.

figure an

The imaginal buffer update on lines 29–35 in (50) is straightforward: we create the expected unary branching \([\!\!_{\text{ NP }}\!\!\!\!\text{ PRO }\!\!]\) structure.

The DRS created in the discourse_context buffer on lines 36–43 follows the analysis of pronouns in Kamp and Reyle (1993). We introduce a new dref (line 38) and predicate the pronoun gender of it (line 40).

The unresolved_discourse buffer stores a DRS that encodes the unresolved presupposition of the pronoun (lines 44–52):

  • the main predicate =p1 contributed by the pronoun (line 48) is identity (EQUALS), relating the dref introduced by the pronoun (line 46) and an UNKNOWN second dref that needs to be retrieved;

  • the UNKNOWN dref is the antecedent dref that needs to be found to complete the pronoun resolution;

  • the antecedent dref needs to be accessible, which is why we keep track of the embedding level of the pronoun (line 51), and it also needs to satisfy the gender predicate =p2 contributed by the pronoun (line 49);

  • finally, the entire DRS is marked as having an unresolved discourse status (line 52).
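
The structure of this unresolved DRS can be mirrored in plain Python. This is a minimal sketch rather than the pyactr chunk itself. The slot names `arg1`, `arg2`, `pred1`, `dref` and the values `EQUALS` and `UNKNOWN` come from the text. The class name `DRSChunk`, the slot name `pred2`, the helper `pronoun_presupposition`, and the values `"d2"`, `"x2"` and `"female"` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

UNKNOWN = "UNKNOWN"


@dataclass
class DRSChunk:
    drs: str                      # DRS peg the chunk is indexed with
    dref: Optional[str] = None    # newly introduced dref, if any
    arg1: Optional[str] = None
    arg2: Optional[str] = None
    pred1: Optional[str] = None
    pred2: Optional[str] = None   # hypothetical slot name for the gender predicate
    embedding_level: int = 0
    discourse_status: str = "at_issue"


def pronoun_presupposition(drs, pronoun_dref, gender, embedding_level):
    """Build the unresolved presupposition: pronoun_dref EQUALS UNKNOWN."""
    return DRSChunk(
        drs=drs,
        arg1=pronoun_dref,        # dref introduced by the pronoun
        arg2=UNKNOWN,             # antecedent dref, still to be retrieved
        pred1="EQUALS",
        pred2=gender,             # the antecedent must satisfy this predicate
        embedding_level=embedding_level,
        discourse_status="unresolved",
    )


she = pronoun_presupposition("d2", "x2", "female", embedding_level=1)
```

Note that the chunk introduces no dref of its own (`she.dref` is `None`): it only relates the pronoun's dref to an antecedent that has yet to be found.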

The goal buffer is updated in the expected way after parsing an NP in subject position (lines 20–28): the task is updated to move_dref_peg (line 22), the dref introduced by the pronoun is added to the top of the argument stack (line 26), and NP becomes the top goal on the goal-buffer stack (line 23) in preparation for the project-S rule.

After fully parsing a pronoun, the cognitive state satisfies the conditions for attempting to resolve it. There are three different rules for anaphoric pronoun resolution, depending on the embedding level of the pronoun. We only discuss here the rule for embedding level 1, provided in (51) below. The other rules, linked to in Appendix 9.6.2, are very similar. The rule in (51) is triggered only if the unresolved_discourse buffer contains an unresolved DRS (lines 15–21).

Pronoun resolution rules are fired when other, higher-ranked tasks have already been completed (lines 5–11 in 51). Once these higher-ranked tasks are completed, pronoun resolution has high priority (it has a utility of 5—line 33). As we mentioned before, ordering rule firing preferences in this fashion (multiple negative specifications for the task slot & manually setting the utility to a non-default, i.e., non-0, value) is not cognitively realistic, and does not scale up to larger systems of rules. The conditions for these rules, as well as their utility, should emerge as a result of a learning algorithm that leverages both production compilation (for new rule generation) and reinforcement learning (for utility ‘tuning’).

The rule triggers two actions. The main action is a retrieval request for a suitable antecedent for the pronoun (lines 26–32): we need to retrieve a DRS in which a new dref was introduced (line 28) that was not an event dref (line 29: event_arg None ensures that only entities, not events, will be considered as possible antecedents). Furthermore, this DRS should not be indexed with the same DRS dref (line 30): this is a way to enforce Binding Principle B (Chomsky 1981). Finally, the antecedent should have an embedding level lower than 2 (line 31) and an at_issue discourse status (line 32).

The second action is updating the task so that the resolution attempt concludes with this retrieval request (line 25), and we do not keep attempting to retrieve an antecedent again and again.

figure ao
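
The constraints in this retrieval request can be read as a filter over the DRSs stored in declarative memory. The sketch below is a simplification under stated assumptions: chunks are plain dictionaries, the function name `antecedent_candidates` and all slot values are hypothetical, and the function returns every matching chunk, whereas ACT-R retrieval would return a single chunk selected by activation.

```python
def antecedent_candidates(memory, current_drs):
    """Return the chunks that satisfy the constraints of the retrieval request."""
    return [
        chunk for chunk in memory
        if chunk.get("dref") is not None                # a new dref was introduced
        and chunk.get("event_arg") is None              # entities only, no events
        and chunk.get("drs") != current_drs             # not the pronoun's own DRS
        and chunk.get("embedding_level", 0) < 2         # accessible from level 1
        and chunk.get("discourse_status") == "at_issue"
    ]


# Hypothetical memory contents after reading "A woman smiled and she"
memory = [
    {"drs": "d1", "dref": "x1", "pred1": "woman",
     "embedding_level": 0, "discourse_status": "at_issue"},
    {"drs": "d1", "dref": "e1", "event_arg": "e1", "pred1": "smile",
     "embedding_level": 0, "discourse_status": "at_issue"},
    {"drs": "d2", "dref": "x2", "pred1": "female",
     "embedding_level": 1, "discourse_status": "at_issue"},
]

candidates = antecedent_candidates(memory, current_drs="d2")
```

Only the chunk introducing `x1` survives the filter: the event chunk is excluded by the `event_arg` constraint, and the pronoun's own at-issue chunk is excluded because it is indexed with the current DRS peg.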

The retrieval request for an antecedent either succeeds or fails. If it succeeds, the rule in (52) below fires. The rule takes the antecedent available in the retrieval buffer (lines 6–10) and the unresolved presupposition in the unresolved_discourse buffer (lines 11–18) and merges information from them into a new DRS added to the discourse_context buffer.

This DRS encodes the resolved pronominal presupposition: it basically takes the unresolved presupposition from the unresolved_discourse buffer and specifies its second, UNKNOWN argument to be the same dref as the dref of the antecedent DRS available in the retrieval buffer (line 27). The discourse status of the resolved presupposition is marked as presupposed (line 32).

With the pronoun resolved, we can flush the unresolved_discourse and retrieval buffers (lines 33–34).

figure ap

If the retrieval request for an antecedent fails, we trigger the rule in (53) below. The rule checks that the unsuccessfully resolved presupposition targets an entity, not an event (line 12): we check that arg2 is UNKNOWN. For events, we will see that the arg1 slot will be marked as UNKNOWN. The retrieval of a suitable entity dref has failed, but we do not simply mark the pronoun as unresolved and move on: we assume that the pronoun is in fact cataphoric, so we set the entity_cataphora feature to True (line 18). This ensures that when entity drefs are subsequently introduced in discourse, a cataphoric search is triggered to check whether they could be suitable antecedents for the unresolved pronoun. Finally, the rule flushes the unresolved_discourse and retrieval buffers (lines 20–21).

figure aq

There is another way that pronoun resolution could fail. Recall that the retrieval request for an antecedent placed in (51) above does not constrain the gender of the antecedent: it only constrains its embedding level and DRS peg (in addition to requiring the dref slot to be non-empty and the event_arg slot to be empty). We could, therefore, retrieve an antecedent dref that is suitable with respect to all these features, but that does not match in gender. This is very much like the process of retrieving foil sentences in the fan experiment and the corresponding fan model discussed in the previous chapter (Chap. 8).

The "resolution of PRO failed: antecedent with non-matching gender" rule in (54) below takes care of this case: if the retrieved DRS has a certain gender specification =p2 (line 10) and the unresolved pronoun presupposition has a different gender specification ~=p2 (line 13), we declare the pronoun unresolved and assume that it is a cataphoric pronoun (line 19).

figure ar
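
The three possible outcomes of an anaphoric resolution attempt (success, no antecedent retrieved, antecedent with non-matching gender) can be summarized in one function. This is a sketch of the decision logic only, not of the production rules themselves: the function `resolve_pronoun`, the dictionary representation, the slot name `pred2` for gender, and the example values are all hypothetical.

```python
UNKNOWN = "UNKNOWN"


def resolve_pronoun(unresolved, retrieved, goal):
    """Return the resolved DRS, or None after marking the pronoun as cataphoric."""
    if retrieved is None:
        # Retrieval failed: assume the pronoun is cataphoric.
        goal["entity_cataphora"] = True
        return None
    if retrieved["pred2"] != unresolved["pred2"]:
        # An antecedent was retrieved, but its gender does not match.
        goal["entity_cataphora"] = True
        return None
    # Success: identify the UNKNOWN argument with the antecedent's dref.
    resolved = dict(unresolved)
    resolved["arg2"] = retrieved["dref"]
    resolved["discourse_status"] = "presupposed"
    return resolved


woman = {"dref": "x1", "pred2": "female"}
she = {"arg1": "x2", "arg2": UNKNOWN, "pred1": "EQUALS",
       "pred2": "female", "discourse_status": "unresolved"}
he = dict(she, pred2="male")

goal_she, goal_he, goal_none = {}, {}, {}
resolved_she = resolve_pronoun(she, woman, goal_she)
resolved_he = resolve_pronoun(he, woman, goal_he)
resolved_none = resolve_pronoun(she, None, goal_none)
```

In the sketch, both failure cases leave the same trace in the goal dictionary, which is what later licenses a cataphoric search.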

To bring all these rules together, let us work through two examples of conjoined discourses, one in which the pronoun resolution succeeds and one in which the resolution fails because of a gender mismatch. The following subsection will discuss cataphora, so we will show an example of a ‘no retrieved antecedent’ resolution failure at that point.

Let us first simulate the example in (55) below. The temporal trace is provided in (56). We omit the parsing steps associated with the first conjunct since they are identical to the ones in (45) above, except for minor random noise in visual, motor and retrieval timings. To obtain the temporal trace, we uncomment the relevant sentence in the file and run the file with the command python3, as we did before.

figure as

The first point at which the temporal trace in (56) differs from the previous one in (45) is the word and. This word is displayed on the virtual screen at 1.06 s after the model starts reading the sentence in (55)—see lines 1–2 in (56).

The word and is encoded, its lexical information is retrieved and then the word is projected, i.e., a unary branching structure \([\!\!_{\text{ Conj }}\!\!\!\text{ and }\!\!]\) is created in the imaginal buffer (lines 7–8). The by-now familiar split into two cognitive sub-processes happens at this point: on one hand, a motor process to press the spacebar is started (line 9), on the other hand, we continue to process the word and. Specifically, the project-and-complete-and rule is fired (line 10), and a ternary branching structure \([\!\!_{\text{ ConjS }}\!\!\!\text{ S }\!\!\!\text{ Conj }\!\!\!\text{ S }\!\!]\) is created in the imaginal buffer (lines 11–13). Furthermore, since a new clause (the second conjunct) is about to start, we update the DRS dref peg from \(d_{1}\) to \(d_{2}\) (line 14).

The space bar is pressed and the next word, namely the pronoun she, appears on the virtual screen (lines 15–16) at time 1.4165 s. We go through the usual encode-retrieve-project-spacebar sequence of rules (lines 17–25), after which the project-NP rule for pronouns is fired (line 26). At that point, three chunks are created in the imaginal, discourse_context and unresolved_discourse buffers. The parse state in the imaginal buffer (lines 27–29) encodes the unary branching \([\!\!_{\text{ NP }}\!\!\!\!\text{ PRO }\!\!]\) structure.

The DRS in the discourse_context buffer (lines 30–33) is the contribution made by the pronoun to at-issue content, and is represented in familiar DRT format as follows.

figure at

The DRS in the unresolved_discourse buffer (lines 34–38) is the contribution made by the pronoun to the unresolved presupposed content, and is represented in familiar DRT format as shown in (58) below. The double contribution of the pronoun to both at-issue and unresolved presupposed content follows the account of presupposition projection as anaphora resolution in van der Sandt (1992) (see also Kamp 2001a).

figure au

After the dref peg is updated to \(x_{3}\) (line 39), we attempt to resolve the pronoun (lines 40–41), which means that a retrieval request is placed for a suitable antecedent. While we wait for the retrieval to complete, the motor module presses the space bar and reveals the final word left on the virtual screen (lines 42–43). At 1.8005 s, we successfully retrieve an antecedent for the pronoun (lines 44–47), namely the DRS contributed by the indefinite NP A woman in the first conjunct. In DRT format, the retrieved DRS is represented as follows.

figure av

Since the antecedent matches in gender, we declare the pronoun successfully resolved (line 48) and add a resolved presupposition DRS to the discourse_context buffer at time 1.8130 s (lines 49–52). This DRS is represented in the usual DRT format as shown below.

figure aw

After successfully resolving the pronoun, the incremental parsing process proceeds as expected. The project-S rule (line 53) is followed by:

  • encoding, retrieving and projecting (‘shifting’) the intransitive verb left (lines 56–62),

  • the project-VP rule and the syntactic structure and the DRS contributed by this rule (lines 63–69),

  • the move-event-dref-peg rule (line 70), and finally,

  • the "finished: no visual input" rule (line 71).

At the end of the simulation, we have seven DRSs in memory, listed on lines 75–99 in (56) above, together with their time stamps. The first two DRSs (lines 75–81) are part of the main DRS \(d_1\) contributed by the first conjunct. The next two DRSs (lines 82–88) are contributed by the pronoun she in the second conjunct. The fifth DRS (lines 89–92) is just the first DRS (contributed by A woman) that is recalled to serve as the antecedent of the pronoun. The sixth DRS (lines 93–96) encodes the resolved presupposition of the pronoun, while the seventh DRS is the one contributed by the intransitive verb left.

Let us turn now to the example in (61) below, where the pronoun resolution fails because an antecedent with a non-matching gender is retrieved.

The only part of the temporal trace that differs from the one in (56) above is provided in (62) below. We see that the DRS contributed by the indefinite A woman is retrieved when we attempt to resolve the pronoun he (lines 5–8). We therefore declare the resolution failed because the antecedent and the pronoun do not match in gender.

As was the case in Chap. 8, the model predicts that recalling the mismatching antecedent should take slightly more time than recalling a match because the mismatching antecedent does not receive any spreading activation from the gender of the pronoun held in one of the buffers at the moment of recall.
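
The direction of this prediction follows from the standard ACT-R relation between activation and retrieval latency, in which latency decreases exponentially with activation. The sketch below illustrates that relation with made-up numbers: the function name `retrieval_latency` and all parameter and activation values are hypothetical and are not the values used in the model.

```python
import math


def retrieval_latency(base_activation, spreading_activation,
                      latency_factor=0.1, latency_exponent=1.0):
    """ACT-R retrieval latency: F * exp(-f * A), with A = base + spreading."""
    activation = base_activation + spreading_activation
    return latency_factor * math.exp(-latency_exponent * activation)


# Same base activation; only the matching antecedent receives
# spreading activation from the pronoun's gender.
latency_match = retrieval_latency(0.5, spreading_activation=0.3)
latency_mismatch = retrieval_latency(0.5, spreading_activation=0.0)
```

With these illustrative values, `latency_mismatch` is larger than `latency_match`, which is the qualitative pattern described above.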

At the end of the simulation in (62), we see that only six DRSs are stored in declarative memory (lines 14–34). These are basically the same DRSs as the ones stored after the simulation in (55), except for the sixth DRS that encoded the successfully resolved presupposition of the pronoun.

figure ax

9.3.5 Rules for Conditionals and Cataphora Resolution

We are now ready to discuss conditionals and cataphora resolution.

The project-and-complete rule for conditionals with a sentence-final if-clause needed to model Experiments 1 and 2 above follows the pattern of the project-and-complete rule for and. As shown in (63) below, we start a new sentence S for the conditional antecedent (line 14), we advance the DRS dref peg (line 13), and we mark the embedding level of the conditional antecedent as 1 (line 23), ensuring that pronouns in a matrix clause (with embedding level 0) won’t be able to access drefs introduced by expressions in the conditional antecedent.

Finally, the rule creates a \([\!\!_{\text{ CP }}\!\![\!\!_{\text{ C }}\!\!\text{ if }]\!\!\text{ S }\!\!]\) structure in the imaginal buffer (lines 24–31) that will be Chomsky-adjoined to the previous (matrix clause) S by the next rule we will examine.

figure ay

The project-and-complete rule for sentence final if sets the stage for a sequence of rules reanalyzing the previous (matrix) clause. We already mentioned this when we discussed Experiments 1 and 2 earlier in this chapter: a sentence-final if-clause triggers the reanalysis of the previous matrix clause because, until if is read, the incremental processor assumes the sentence is an unconditionalized assertion, so it has an embedding level of 0. When if is read, the previous clause has to be reanalyzed from a main clause/main assertion to a conditional consequent. Specifically, all the DRSs contributed as part of that clause should have their embedding level changed from 0 to 2.

The "start if-triggered reanalysis" rule in (64) below begins the process of recalling these DRSs for reanalysis. The task is updated to if_reanalysis (line 21), the CP structure created by the project-and-complete-if rule in (63) above is Chomsky-adjoined to the S node of the previous clause, and the structure is encoded in the imaginal buffer (lines 23–29). Most importantly, a retrieval request is placed for a DRS that has an embedding level of 0 and is indexed with the DRS dref peg of the previous sentence (lines 30–33).

figure az

Once the first DRS is recalled for reanalysis, we want to update its embedding level. To do this, we trigger one of two rules:

  • "if-triggered reanalysis (no event recalled)", provided in (65) below, or

  • "if-triggered reanalysis (event recalled)", provided in (66).

We need an ‘event recalled’ version of the rule because we want to capture the Maximize Presupposition effect we observed in Experiment 2; see the discussion of Maximize Presupposition in Sect. 9.1.2 above.

The default version of the rule, i.e., "if-triggered reanalysis (no event recalled)" in (65), is triggered when a DRS has been successfully retrieved and is available in the retrieval buffer (lines 8–21 in (65)). The rule triggers two actions. First, an identical DRS, except with an embedding level of 2, is placed in the discourse_context buffer (lines 25–37). Second, a new retrieval request for a DRS that has an embedding level of 0 and is indexed with the DRS dref peg of the previous sentence is placed (lines 40–43). Crucially, however, if a new DRS is to be retrieved, it should be different from the previous-sentence DRSs that have already been retrieved (lines 38–39)—we use here the FINST (‘fingers of instantiation’) feature we discussed in the previous chapter.

figure ba

The "if-triggered reanalysis (event recalled)" rule in (66) below is very similar to the no-event-recalled rule in (65) above. The only difference is that when a DRS with an event dref is recalled, we keep track of its main predicate =p1 in the goal buffer: this predicate is stored as the value of an if_conseq_pred feature (line 40), i.e., it is indexed as the predicate that was contributed by the conditional consequent.

figure bb

The two rules for if-triggered reanalysis (no-event-recalled and event-recalled) run repeatedly until all the DRSs contributed by the previous sentence are recalled and reanalyzed, i.e., their embedding level gets set to 2. Once there are no more DRSs to be recalled, the if-triggered reanalysis is complete and the "stop if-triggered reanalysis" rule in (67) below is triggered. This rule simply resets the task to reading_word (line 12) and flushes the retrieval, imaginal and discourse_context buffers (lines 15–17).

figure bc
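
The recall-and-reanalyze cycle implemented by the rules in (64) through (67) can be sketched as a simple loop. This is an illustration under several assumptions: chunks are dictionaries, the set `finst` stands in for the FINST mechanism, chunks are recalled most recent first (a stand-in for activation-based retrieval), and the names `if_reanalysis`, `memory` and the example values are hypothetical.

```python
def if_reanalysis(memory, prev_drs):
    """Recall every level-0 DRS of the previous clause and re-encode it at level 2."""
    finst = set()            # indices of chunks that were already recalled
    reanalyzed = []          # new DRSs added to the discourse context
    if_conseq_pred = None    # predicate of the conditional consequent
    while True:
        pending = [
            i for i, chunk in enumerate(memory)
            if i not in finst
            and chunk["drs"] == prev_drs
            and chunk["embedding_level"] == 0
        ]
        if not pending:
            break            # nothing left to recall: stop the reanalysis
        i = pending[-1]      # simplification: most recent chunk first
        finst.add(i)
        chunk = memory[i]
        if chunk.get("event_arg") is not None:
            # 'event recalled' version: remember the verbal predicate
            if_conseq_pred = chunk["pred1"]
        reanalyzed.append({**chunk, "embedding_level": 2})
    return reanalyzed, if_conseq_pred


memory = [
    {"drs": "d1", "dref": "x1", "pred1": "john", "embedding_level": 0},
    {"drs": "d1", "pred1": "NOT", "embedding_level": 0},
    {"drs": "d1", "event_arg": "e1", "pred1": "eat", "embedding_level": 0},
]
reanalyzed, conseq_pred = if_reanalysis(memory, "d1")
```

Because the original level-0 chunks are left unchanged in `memory`, the loop would recall the same chunk forever without the `finst` check, which is the role the FINST feature plays in the rules above.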

We are almost ready to parse the conditional & cataphora example in (7) above (John won’t eat it if a hamburger is overcooked, Elbourne 2009). We only need to introduce two kinds of rules related to cataphoric search.

First, there are three rules that trigger a cataphoric search for an antecedent, depending on the embedding level of the potential antecedent. We only discuss the ‘embedding level 1’ rule, provided in (68) below. The other rules (linked to in Appendix 9.6.2) are similar.

figure bd

Just like the anaphoric search rules, the cataphoric search rules are ‘elsewhere’ rules: they have a set of negative constraints for the current task (lines 5–11 in (68)), and a high utility (line 46). The rule is triggered if the entity_cataphora feature is set to True (line 13) and the found feature is set to None (line 12). Consequently, the rule cannot immediately follow a failed anaphoric search because such a search sets the found feature to no_antecedent. Most importantly, a cataphoric search is triggered only if a suitable antecedent is available in the discourse_context buffer (lines 16–24).

If these conditions are met, a cataphoric search is triggered. Cataphoric searches are mirror images of anaphoric searches. For cataphora, we have a potential antecedent in place, and place a retrieval request for an unresolved presupposition DRS that could be resolved by the antecedent. In contrast, for anaphora, we have an unresolved presupposition and we place a retrieval request for an antecedent that could resolve it.

The "attempting to resolve cataphoric pronoun" rule triggers three actions. The most important one is placing a retrieval request for an unresolved presupposition (lines 37–45 in (68)). We specify that no new drefs should be introduced in this unresolved DRS (line 39), that the arg1 and pred1 slots should be non-empty (lines 40 and 42), that the arg2 slot should be set to UNKNOWN (line 41), that the DRS should be part of a main DRS that is different from the one that the potential antecedent belongs to (line 43), and that the embedding level of the unresolved presupposition should not be 0 (line 44) since the potential antecedent has an embedding level of 1.

The second action is to store information about the current potential antecedent in the unresolved_discourse buffer (lines 29–36). This is not strictly necessary since the information will be maintained in the discourse_context buffer, but we do it here just to show how specific buffers can be used to safeguard information that might otherwise be flushed by subsequent rules. And involving the unresolved_discourse buffer in the process of cataphoric search is a natural choice. We save the dref information (line 31), the gender predicate =p2 (line 33), the DRS peg (line 34) and the embedding level of 1 (line 35).

The third and final action is to update the goal buffer (lines 26–28) so that the resolution attempt is stopped with this one retrieval request and does not enter a loop. We therefore update the task to stop_resolution_attempt_PRO.

The retrieval request, i.e., the cataphoric resolution attempt, can either succeed or fail. If the attempt fails, the same failure rules as for anaphoric attempts are triggered—see (53) and (54) above.

But if the cataphoric search succeeds, the "resolution of cataphoric PRO succeeded" rule in (69) below is triggered. This rule adds a DRS to the discourse_context buffer that resolves the retrieved unresolved presupposition to the currently available antecedent (lines 25–33). The DRS has a presupposed discourse status (line 33), and is otherwise identical to the DRS contributed by the "resolution of PRO succeeded" rule in (52) above.

figure be
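
The mirror-image character of cataphoric search can be made concrete with a sketch in which the antecedent is given and the unresolved presupposition is what gets retrieved. As before, this is an illustration rather than the model's code: the functions `cataphoric_search` and `resolve_cataphor`, the dictionary representation, the slot name `pred2` and the example values are hypothetical.

```python
UNKNOWN = "UNKNOWN"


def cataphoric_search(memory, antecedent):
    """Return unresolved presuppositions that the antecedent might resolve."""
    return [
        chunk for chunk in memory
        if chunk.get("dref") is None                 # introduces no new dref
        and chunk.get("arg1") is not None
        and chunk.get("pred1") is not None
        and chunk.get("arg2") == UNKNOWN             # entity presupposition
        and chunk.get("drs") != antecedent["drs"]    # belongs to a different DRS
        and chunk.get("embedding_level") != 0        # antecedent is at level 1
    ]


def resolve_cataphor(unresolved, antecedent):
    """Resolve the presupposition if the gender predicates match, else None."""
    if unresolved["pred2"] != antecedent["pred2"]:
        return None
    return {**unresolved, "arg2": antecedent["dref"],
            "discourse_status": "presupposed"}


hamburger = {"drs": "d2", "dref": "x3", "pred1": "hamburger",
             "pred2": "inanimate", "embedding_level": 1}
it_level0 = {"drs": "d1", "arg1": "x2", "arg2": UNKNOWN, "pred1": "EQUALS",
             "pred2": "inanimate", "embedding_level": 0,
             "discourse_status": "unresolved"}
it_level2 = dict(it_level0, embedding_level=2)   # copy created by if-reanalysis

found = cataphoric_search([it_level0, it_level2], hamburger)
resolved = resolve_cataphor(found[0], hamburger)
```

In this example, only the reanalyzed copy of the presupposition (embedding level 2) is found; the original level-0 copy is filtered out by the embedding-level constraint.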

We can now see how the model parses the conditional + cataphora example in (70) below, repeated from (7) above. The temporal trace is provided in (71). We omit all the output that is not directly relevant to incremental semantic interpretation.

figure bf

The first word, namely the proper name John, contributes the first DRS to the discourse_context buffer—see lines 3–6 in (71) above. This DRS can be represented in the familiar DRT format as shown below.

figure bg

Providing an analysis of negation is outside the scope of this model, so the negated auxiliary won’t contributes a ‘placeholder’ DRS with no drefs or arguments and a predicate NOT that simply marks that the DRS was contributed by a form of sentential negation (lines 10–11).

The transitive verb eat contributes the DRS below (see lines 14–16 in (71)):

figure bh

The pronoun it further specifies the DRS introduced by the verb eat (not shown in the temporal trace) and introduces its own DRSs (lines 19–25):

figure bi

An attempt to resolve the pronoun it (lines 26–27) ends in failure (lines 29–31), since there is no suitable antecedent for it. The parsing process then moves on to the complementizer if (lines 32–33), which triggers the reanalysis of the previous clause from a main assertion to a conditional consequent. This means recalling all five DRSs contributed by the previous clause and creating five new DRSs with the same content, except that the embedding level is set to 2 (conditional consequent) instead of 0 (main assertion).

The first DRS that gets recalled as part of the if-triggered reanalysis is the at-issue DRS in (75) above contributed by the pronoun it (lines 39–41). The reanalysis of this DRS (which sets the embedding level to 2) creates a new DRS in the discourse_context buffer (lines 44–46), shown below.

figure bj

The second DRS retrieved as part of if-reanalysis is the unresolved-presupposition DRS contributed by the pronoun (lines 47–50). We see that recency is the most important factor for DRS activation in the if-reanalysis process. A new, reanalyzed unresolved-presupposition DRS is added to the discourse_context buffer (lines 53–56), which can be represented as follows.

figure bk

The third DRS retrieved during the if-reanalysis process is the one contributed by the transitive verb eat (lines 57–59). Because this DRS introduces an event, it triggers the ‘event recalled’ version of the "if-triggered reanalysis" rule, which does two things. On one hand, it adds the verbal predicate to the goal buffer as the value of the if_conseq_pred feature. On the other hand, it resets the embedding level of the recalled DRS to 2 and adds the modified DRS to the discourse_context buffer (lines 62–65). This DRS can be represented in DRT format as follows.

figure bl

After this, the ‘dummy’ negative DRS contributed by the negated auxiliary won’t is retrieved (lines 66–67) and reanalyzed (lines 68–71). Finally, the DRS contributed by the proper name John is retrieved (lines 72–75) and reanalyzed (lines 76–81). The new DRS, which is the final one contributed by the if-reanalysis process, can be represented as shown below.

figure bm

At this point, the reanalysis process concludes and we continue with the incremental parsing process, specifically with the interpretation of the indefinite article a (in ...a hamburger is ...). We should note here that this process is cognitively unrealistic: it predicts that we need about 1.2 s to process sentence-final if, because the process of retrieving all these DRSs and reanalyzing them is very time consuming. We implement it here just to show that this type of process can be modeled in our framework, and leave a more appropriate model for a future occasion.

The next DRS is contributed by the indefinite NP a hamburger in the if-clause (lines 84–88), and can be represented as follows.

figure bn

The presence of this new DRS in the discourse_context buffer and the fact that the entity_cataphora feature in the goal buffer is turned on after the unsuccessful resolution of the pronoun it trigger an attempt to resolve the cataphoric pronoun (lines 89–90). The rule places the relevant information about the potential antecedent a hamburger in the unresolved_discourse buffer (lines 91–93) and places a retrieval request for an unresolved presupposition contributed by a pronoun. This request is successfully completed, and as a result, the unresolved DRS contributed by it is available in the retrieval buffer (lines 95–98).

The resolution of the cataphoric pronoun is declared a success (lines 99–100) and the resolved presupposition, provided below for convenience, is added to the discourse_context buffer (lines 101–104).

figure bo

The copula is and the adjectival participle overcooked contribute a final DRS (lines 107–109) to the discourse_context buffer:

figure bp

9.4 Modeling the Interaction of Conditionals and Cataphoric Presuppositions

We are now ready to move to the somewhat more complex case of event anaphora/cataphora associated with the adverb again. We first introduce the rules for the syntax and semantics of again and for the process of presupposition resolution for event anaphora (Sect. 9.4.1). We then discuss one way of capturing the ‘maximize presupposition’ effect we saw in Experiment 2 above (Sect. 9.4.2). Finally, we discuss the results of fitting the model to part of the Experiment 2 data (Sect. 9.4.3).

9.4.1 Rules for ‘Again’ and Presupposition Resolution

The resolution of event anaphora/cataphora contributed by the adverb again follows the same pattern as the resolution of entity anaphora/cataphora contributed by pronouns. The main difference is in the syntax of again: again is an adverb/adjunct, and given the optionality of adjuncts, they basically need to be parsed bottom-up. That is, the syntactic attachment of adjuncts requires a form of syntactic reanalysis.

Consider again the conditional + event cataphora example in (16d) above, repeated in (84) below:

figure bq

The matrix clause ends with the anaphoric adverb again. To parse it, we need to attach it to the VP will argue with Danielle that has already been completely parsed and closed by the time we encounter again. To build the appropriate syntactic structure, we would therefore need to recall both the VP node and the higher S node so that the adverb again can be attached intermediately between them. For simplicity, we only recall the higher S node and add the adverb again as a third daughter, as shown by the two rules in (85) and (86) below.

figure br

In addition to reanalyzing the S node, rule (86) places another retrieval request for the event contributed by the verb modified by the adverb again. This event is needed for the semantics of again. The fact that we need two separate retrieval requests, one on the syntax side for the S node and one on the semantics side for the event, is an artifact of our setup for syntactic and semantic parsing.

For expository simplicity, the semantic information constructed during incremental interpretation is assembled in chunks and buffers that are separate from the chunks and buffers where syntactic information is constructed. This enabled us to import the left-corner syntax parser we introduced in Chap. 4 basically as-is, and we were able to focus on the semantic aspects of interpretation in this and the previous chapter (Chap. 8) without worrying about a tighter integration of the syntactic and semantic aspects of parsing.

As we investigate more complex structures and their interpretation, these simplifying assumptions are likely to come into focus and require revision. It is possible that the structures constructed during the incremental interpretation process integrate phonetics/phonology, syntax and semantics in a tighter way, for example, along the lines of the linguistic representations countenanced by HPSG or CG.

Once the event contributed by the modified verb is available in the retrieval buffer, we are able to encode the unresolved presupposition contributed by again. This is accomplished by the rule in (87) below. The unresolved presupposition DRS contributed by again on lines 22–30 is largely parallel to the unresolved presupposition DRS contributed by pronouns. There are three main differences: (i) arg1 (not arg2) is marked as UNKNOWN for again (line 24), (ii) the first predicate is (temporally) PRECEDES (not EQUALS; line 26), and (iii) the second predicate is the verbal predicate contributed by the modified verb (not gender; line 27).

figure bs
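
The parallel with the pronoun presupposition, and the three differences just listed, can be seen in a small sketch. It is an illustration only: the function `again_presupposition`, the dictionary representation, the slot name `pred2` and the example values are hypothetical, and filling `arg2` with the event dref of the modified verb is an assumption based on the fact that `arg1` is the UNKNOWN argument.

```python
UNKNOWN = "UNKNOWN"


def again_presupposition(event_drs, embedding_level):
    """Build the unresolved presupposition of 'again': UNKNOWN PRECEDES event."""
    return {
        "drs": event_drs["drs"],
        "arg1": UNKNOWN,                   # (i) arg1, not arg2, is UNKNOWN
        "arg2": event_drs["event_arg"],    # assumption: event of the modified verb
        "pred1": "PRECEDES",               # (ii) temporal precedence, not EQUALS
        "pred2": event_drs["pred1"],       # (iii) verbal predicate, not gender
        "embedding_level": embedding_level,
        "discourse_status": "unresolved",
    }


# Hypothetical event DRS recalled for "(will) argue with"
argue = {"drs": "d1", "event_arg": "e1", "pred1": "argue_with",
         "embedding_level": 0}
presupposition = again_presupposition(argue, embedding_level=0)
```

The antecedent event to be retrieved must therefore satisfy the same verbal predicate and temporally precede the event described by the modified verb.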

Once the again presupposition is encoded, we can attempt to resolve it. There are three different rules for the three different embedding levels of the presupposition. We provide only the rule for embedding level 1 in (88) below, for ease of comparison with the pronoun rule. The rule has the same structure; the main difference is that we now require the potential antecedent to have a non-empty event argument (lines 31–32).

figure bt

If the resolution succeeds, we add the resolved presupposition to the discourse_context buffer with the "resolution of AGAIN succeeded" rule in (89) below.

figure bu

Just as for pronouns, the resolution of the again presupposition can fail, either (i) because no suitable antecedent is retrieved or (ii) because an antecedent is retrieved, but the retrieved verbal predicate is different from the verbal predicate of the unresolved again presupposition. These two cases are handled by the rules in (90) and (91) below, respectively.

In both cases, we turn the event_cataphora feature on when the retrieval of a suitable antecedent event fails. That is, we do not simply mark the presupposition resolution process as a failure and end it. Instead, we assume that the again presupposition is cataphoric.

figure bv
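The control flow described above (success, failure with no antecedent, failure because of a predicate mismatch) can be summarized in a short plain-Python sketch. It is not a transcription of the production rules in (89)–(91): the function name, the dictionary keys `pred` and `event_arg`, and the returned fields are hypothetical, and the check for a non-empty event argument is folded into the function rather than stated as a constraint on the retrieval request.

```python
def resolve_again(presup_pred, retrieved):
    """Sketch of the outcome of an anaphoric resolution attempt for again.

    presup_pred: verbal predicate of the unresolved again presupposition.
    retrieved:   None if retrieval failed, else a dict standing in for the
                 retrieved DRS chunk (hypothetical keys: pred, event_arg).
    """
    # Case (i): no suitable antecedent, i.e., nothing was retrieved or the
    # retrieved chunk has no event argument.
    if retrieved is None or not retrieved.get("event_arg"):
        return {"resolved": False, "event_cataphora": True,
                "reason": "no suitable antecedent"}
    # Case (ii): an antecedent was retrieved, but its verbal predicate differs.
    if retrieved["pred"] != presup_pred:
        return {"resolved": False, "event_cataphora": True,
                "reason": "predicate mismatch"}
    # Success: the resolved presupposition can be added to the discourse context.
    return {"resolved": True, "event_cataphora": False, "reason": None}


no_antecedent = resolve_again("ARGUE_WITH", None)
mismatch = resolve_again("ARGUE_WITH", {"pred": "PLAY_WITH", "event_arg": "e0"})
success = resolve_again("ARGUE_WITH", {"pred": "ARGUE_WITH", "event_arg": "e0"})
```

In both failure cases the sketch switches event_cataphora on instead of simply ending the resolution process, which is the point made in the text.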

Once the event presupposition contributed by again is marked as cataphoric and a suitable antecedent is available in the discourse_context buffer, we start a cataphoric search, just as we did for pronouns. Once again, there are three rules for cataphoric search depending on the embedding level of the potential event antecedent. We provide only the embedding level 1 rule in (92) below.

figure bw

If the resolution of cataphoric again succeeds, we add the resolved presupposition to the discourse_context buffer with the rule in (93) below.

figure bx

If the resolution of cataphoric again fails because the antecedent does not match the verbal predicate, the rule in (94) below is triggered. If there is no suitable antecedent, the general rule in (90) above, which applies to both anaphoric and cataphoric searches, is triggered.

figure by

We are now ready to see how all these rules get deployed during the incremental interpretation of the conditional & event cataphora example in (95) below, repeated from above. We stop the parsing immediately after the preposition with in the if-clause since this is when event cataphora is resolved.

figure bz

In (96), we only list the processing steps that are most relevant to conditionals and cataphora resolution. After the proper name Jeffrey, the prepositional verb (will) argue with and the proper name Danielle contribute DRSs to the discourse context (lines 1–21), we start parsing the adverb again. The event DRS contributed by (will) argue with is recalled (lines 29–32), and the again presupposition is encoded in the unresolved_discourse buffer (lines 35–38). An attempt to anaphorically resolve this presupposition fails (lines 39–43), after which the process of if-triggered reanalysis begins, which contributes four new DRSs to the discourse context (lines 46–66): these are the four DRSs contributed by Jeffrey, (will) argue with, Danielle and again, except their embedding level is set to 2 (conditional consequent) instead of 0 (main assertion).

The pronoun he is then parsed and correctly resolved to the dref contributed by the proper name Jeffrey (lines 67–86). Then, as soon as the verb argued is parsed (lines 87–91), a cataphoric search attempting to resolve again is started (lines 92–96). The search is successfully completed (lines 98–101), so the again-contributed presupposition is resolved (lines 102–107).

The simulation ends with the time taken to read the preposition with, reported on line 109. This time crucially includes the again cataphoric search, so it can be used to model the Experiment 2 data for this ROI. We see that the time taken to read the preposition is about 360 ms, which is reasonable. In the next subsection (Sect. 9.4.2), we will see that this time increases under specific conditions, and in the final subsection (Sect. 9.4.3), we will fit the predicted RTs for this region to the Experiment 2 data.

9.4.2 Rules for ‘Maximize Presupposition’

In Sect. 9.1.2 above, we noted that conditionals with matching VP meanings and no presuppositional again, like the one in (97) below (repeated from above), were significantly slower than conjunctions with matching meanings or conditionals with mismatching meanings.

figure ca

We conjectured that the processing difficulty associated with these conditionals was an effect of the Maximize Presupposition principle (Heim 1991), which requires that a presupposed VP meaning should be marked as such by again. This penalizes conditionals with matching VP meanings, while conditionals with non-identical VP meanings and coordinations should not be affected.

While Maximize Presupposition is commonly used as an explanatory principle in the formal semantics literature, there is no received way to formalize it and no explicit conjectures about the way it could become part of a mechanistic processing model of natural language interpretation.

In this section, we propose a tentative model of Maximize Presupposition processing as noise/error correction. Specifically, we conjecture that upon encountering the verb argued in the if-clause of example (97) above, the human processor pauses to consider whether it has erroneously failed to activate the event_cataphora feature in the goal buffer.

That is, given that the if-clause could satisfy a presuppositional again in the matrix clause, which would not violate Maximize Presupposition, the human processor hypothesizes that such an unresolved again presupposition might actually be present in declarative memory, but the event_cataphora feature in the goal buffer has erroneously failed to encode it.

Consequently, a search for an unresolved again presupposition is initialized, which adds extra reading time. In sum, the processing difficulty associated with a Maximize Presupposition violation is attributed to an extra retrieval request meant to check whether a goal feature was erroneously encoded.
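To see why a failed retrieval request adds reading time, recall the standard ACT-R latency equations: a chunk with activation A is retrieved in time F·exp(−f·A), and a retrieval failure is reported after F·exp(−f·τ), where F is the latency factor, f the latency exponent and τ the retrieval threshold. The sketch below simply evaluates these equations. The parameter values are hypothetical and chosen only for illustration; they are not the values used in our simulations.

```python
import math


def retrieval_latency(activation, latency_factor, latency_exponent):
    """ACT-R retrieval time (in seconds) for a chunk with the given activation."""
    return latency_factor * math.exp(-latency_exponent * activation)


def failed_retrieval_latency(threshold, latency_factor, latency_exponent):
    """Time (in seconds) after which a retrieval failure is reported.

    A failure takes as long as retrieving a chunk whose activation sits
    exactly at the retrieval threshold.
    """
    return retrieval_latency(threshold, latency_factor, latency_exponent)


# Hypothetical parameter values, for illustration only.
F, f, tau = 0.01, 1.0, -1.0

extra_time = failed_retrieval_latency(tau, F, f)       # failed search for again
fast_retrieval = retrieval_latency(0.5, F, f)          # chunk well above threshold
```

Because any retrieved chunk has an activation at or above the threshold, a failed search is the slowest possible outcome of a retrieval request, which is why an extra, unsuccessful search is a natural way to model added processing cost.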

This account is implemented in our mechanistic processing model by means of several rules. First, when the event DRS is recalled during the if-reanalysis process, its verbal predicate is stored in the goal buffer as the value of the if_conseq_pred feature (see the rule in (66) above).

Assuming the presence of such a feature, the rule in (98) below is triggered when we parse the matching verb argued in the if-clause. Specifically, the if_conseq_pred feature has to be non-empty (line 18), and the event_cataphora feature should be turned off (+ +True; line 20). If that is the case, the usual actions associated with a prepositional verb are triggered (lines 25–51). Crucially, we also place a retrieval request for an unresolved event presupposition that we might have mistakenly failed to encode in the event_cataphora feature because of communication noise, comprehension noise, encoding noise, etc. (lines 52–61).

figure cb

In our example (97), this search for an unencoded again fails, and triggers the rule in (99) below.

figure cc

We can see all these rules in action, as well as their consequences for reading times, in (101) below.

figure cd

The simulation in (101) proceeds as expected until we reach the crucial point, namely the project-and-complete-VP rule on lines 62–63. This rule triggers a memory search for an unencoded again presupposition, which takes place while the preposition with is being read (line 67). The search fails (lines 68–69), but the extra time needed for such a failed search can be seen in the higher reading time associated with the preposition, which is now almost 400 ms (line 71 in (101)).

9.4.3 Fitting the Model to the Experiment 2 Data

We are now ready to fit the model to part of the Experiment 2 data. Specifically, we will focus on the four match conditions in (102a)–(102d) below, and the two mismatch & cataphora conditions in (102e)–(102f).

figure ce

We do not attempt to model the remaining two conditions of Experiment 2 because our model does not really have anything to say about the mismatch & nothing, i.e., no-cataphora, cases. In fact, our model is not designed to capture the and & cataphora cases either, i.e., (102c) (match) or (102e) (mismatch), but we include them here for completeness.

Our model is set up to capture the if-conditions in (102b), (102d) and (102f), and we will focus on these conditions for most of our discussion in this subsection.

The mean RTs for the 6 conditions in (102) obtained in Experiment 2 (averaging over both subjects and items) are, in order: 364.05, 429.01, 390.13, 378.83, 374.28, and 387.72. The file (linked to in Appendix 9.6.4) lists these 6 conditions and the corresponding mean RTs, and provides the code for the Bayesian model that fits our incremental interpreter to data.

The Bayesian model is particularly simple: we only estimate the latency exponent and keep the other parameters fixed (see Appendix 9.6.1 for the exact values). The prior for the latency exponent is a half-Normal distribution—line 4 in (103) below. The ACT-R model we have introduced provides the likelihood function (lines 6–8). We sample NDRAWS (=1500) values from the posterior (lines 10–11).

figure cf
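The inferential setup just described (a half-Normal prior on a single parameter, with a simulation supplying the predicted means) can be illustrated with a self-contained sketch. Two simplifications should be kept in mind. First, we use a grid approximation of the posterior rather than sampling. Second, the function `toy_predicted_rts` is a hypothetical stand-in for the ACT-R simulation, with invented activations and constants; the prior and noise standard deviations are likewise assumptions. Only the six observed mean RTs are taken from the text.

```python
import math

# Observed mean RTs (ms) for the six conditions, as reported in the text.
OBSERVED_RT = [364.05, 429.01, 390.13, 378.83, 374.28, 387.72]

# Hypothetical stand-in for the ACT-R model: invented per-condition activations.
TOY_ACTIVATIONS = [1.0, -1.0, -0.5, 0.5, -0.5, 0.0]
BASE_MS, FACTOR_MS = 350.0, 20.0      # invented constants
PRIOR_SD, NOISE_SD = 1.0, 20.0        # assumed prior and likelihood scales


def toy_predicted_rts(latency_exponent):
    """Toy predicted RTs (ms); NOT the incremental interpreter."""
    return [BASE_MS + FACTOR_MS * math.exp(-latency_exponent * a)
            for a in TOY_ACTIVATIONS]


def half_normal_logpdf(x, sd):
    """Log density of a half-Normal distribution with scale sd."""
    if x < 0:
        return -math.inf
    return 0.5 * math.log(2 / math.pi) - math.log(sd) - x * x / (2 * sd * sd)


def normal_logpdf(y, mu, sd):
    """Log density of a Normal distribution."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - (y - mu) ** 2 / (2 * sd * sd)


def grid_posterior(grid):
    """Normalized posterior weights for the latency exponent over a grid."""
    log_post = []
    for f in grid:
        lp = half_normal_logpdf(f, PRIOR_SD)
        lp += sum(normal_logpdf(y, mu, NOISE_SD)
                  for y, mu in zip(OBSERVED_RT, toy_predicted_rts(f)))
        log_post.append(lp)
    top = max(log_post)                       # subtract max for numerical stability
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


GRID = [i * 0.01 for i in range(0, 301)]      # latency exponent in [0, 3]
posterior = grid_posterior(GRID)
posterior_mean = sum(f * w for f, w in zip(GRID, posterior))
```

A grid is feasible here only because a single parameter is estimated; with more free parameters, sampling from the posterior becomes the practical option.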

The posterior estimates for the latency exponent and the 6 mean RTs are provided in (104) below and are plotted in Fig. 9.3.

figure cg

Note that the Rhat values for this model are practically 1:

figure ch

Fig. 9.3 Parser model: observed versus predicted RT

We see that the model captures the conditional & cataphora conditions—both match, \(\mu _3\), and mismatch, \(\mu _5\)—very well. This is a consequence of the spreading activation from the discourse_context buffer, which boosts the activation of the correct antecedent for the match condition \(\mu _3\), but which has no effect for the mismatch condition \(\mu _5\). The explanatory processing mechanism used here is spreading activation, i.e., the influence of the cognitive context on memory retrieval latency (and accuracy). This is the same explanatory mechanism as the one we used in the previous chapter when we captured the difference in latency between the semantic evaluation of sentences with varying fans.
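The contrast between the match and mismatch conditions can be illustrated with the standard ACT-R spreading activation equations: the activation of a chunk is its base-level activation plus the weighted sum of the association strengths from the sources in the cognitive context, and each association strength is S − ln(fan), where S is the maximum associative strength. The sketch below evaluates these equations for one chunk that shares a value with the context and one that does not. All numerical values are hypothetical, and the sketch gives every source the same weight.

```python
import math


def association_strength(max_strength, fan):
    """S_ji = S - ln(fan_j): a source linked to more chunks spreads less to each."""
    return max_strength - math.log(fan)


def activation(base_level, source_weight, strengths):
    """Base-level activation plus activation spread from the context sources."""
    return base_level + source_weight * sum(strengths)


# Hypothetical values, for illustration only.
BASE_LEVEL, SOURCE_WEIGHT, MAX_STRENGTH = 0.0, 1.0, 2.0

# Match: the antecedent shares its verbal predicate with the context,
# so it receives spreading activation from that source (fan of 2 assumed).
match_activation = activation(
    BASE_LEVEL, SOURCE_WEIGHT, [association_strength(MAX_STRENGTH, fan=2)]
)

# Mismatch: no shared value, hence no activation is spread to the chunk.
mismatch_activation = activation(BASE_LEVEL, SOURCE_WEIGHT, [])
```

Since retrieval latency in ACT-R decreases as activation increases, the boosted chunk in the match case is retrieved faster than an otherwise identical chunk that receives no spreading activation.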

The conjunction conditions are captured reasonably well, particularly the control condition \(\mu _0\) and the conjunction & cataphora & match condition \(\mu _2\). The model does not make any distinction between the two conjunction & cataphora conditions \(\mu _2\) (match) and \(\mu _4\) (mismatch): both retrieval requests proceed and fail the same way. Unfortunately, we overestimate the time required for retrieval failure, particularly for the mismatch \(\mu _4\) condition, where the observed value is a low 374.28 ms.

It might be that, by the time the human participants in the experiment read the preposition following the second finite verb, they realize that the stimulus overall is hopeless, and they give up on deeper processing rules like attempting a cataphoric search. This would explain why the observed RT for the \(\mu _4\) condition is fairly close to the control (conjunction) condition \(\mu _0\). One way to implement this in our model would be to have a rule that turns off the event_cataphora feature for overly difficult/incoherent conditions like \(\mu _4\). Firing this extra rule would add about 10 ms relative to the control condition \(\mu _0\), which would be almost exactly right.

Our attempt to capture the ‘maximize presupposition’ effect in the \(\mu _1\) condition provides a good qualitative fit, but quantitatively, the effect is greatly underestimated: the estimated mean is 385.25 ms, while the observed mean is 429.01 ms. Clearly, a failed attempt to retrieve an unencoded again from declarative memory is not sufficient to capture the processing effects of the semantic-pragmatic reasoning involved in ascertaining a failure to ‘maximize presupposition’.

But the extra retrieval request and its failure are sufficient to capture the qualitative pattern. The estimated mean for \(\mu _1\) (conditionals with matching predicates and no cataphora) is greater than the estimated mean for the control condition \(\mu _0\) (conjunctions with matching predicates and no cataphora). It is also greater than the estimated mean for \(\mu _3\) (conditionals with matching predicates and cataphora).

Similarly, we capture the qualitative pattern involving the two cataphora & match cases: the estimated mean for \(\mu _2\) (conjunctions), where the attempt to resolve the again cataphora fails, is higher than the estimated mean for \(\mu _3\) (conditionals), where the attempt to resolve the again cataphora succeeds.

However, we do not capture the difference between conditionals and conjunctions in the cataphora & mismatch cases, i.e., \(\mu _4\) versus \(\mu _5\). We predict that conjunctions (estimated mean: 407.43 ms) take longer than conditionals (estimated mean: 387.72 ms). The observed values exhibit the opposite pattern: 374.28 and 387.72 ms, respectively. Once again, the observation we made above about conjunction & cataphora & mismatch cases could resolve this issue: if these conditions are too hard and human participants never even start a cataphoric search process, we expect the predicted pattern to be reversed.
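The arithmetic behind these comparisons is easy to check. The sketch below uses only numbers quoted in the text: the six observed means and the three estimated means we report explicitly (for \(\mu _1\), \(\mu _4\) and \(\mu _5\)). The variable names are ours and purely illustrative.

```python
# Observed mean RTs (ms), in the order mu0 ... mu5.
OBSERVED = {"mu0": 364.05, "mu1": 429.01, "mu2": 390.13,
            "mu3": 378.83, "mu4": 374.28, "mu5": 387.72}

# Estimated means (ms) quoted in the text; the others are not listed here.
ESTIMATED = {"mu1": 385.25, "mu4": 407.43, "mu5": 387.72}

# Estimated minus observed: negative means the model underestimates.
residuals = {k: round(ESTIMATED[k] - OBSERVED[k], 2) for k in ESTIMATED}

# Observed gap between conjunction & cataphora & mismatch and the control.
gap_mu4_mu0 = round(OBSERVED["mu4"] - OBSERVED["mu0"], 2)

# Direction of the conjunction vs. conditional contrast in the mismatch cases.
predicted_conj_slower = ESTIMATED["mu4"] > ESTIMATED["mu5"]
observed_conj_slower = OBSERVED["mu4"] > OBSERVED["mu5"]
```

The residual for \(\mu _1\) is about −44 ms (underestimation) and the residual for \(\mu _4\) is about +33 ms (overestimation). The observed gap between \(\mu _4\) and \(\mu _0\) is about 10 ms, and the predicted and observed directions of the \(\mu _4\) versus \(\mu _5\) contrast disagree.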

We leave further developments of this model for future work. But given the relatively poor data fit exhibited by our complex model, it is reasonable to ask if going for a simpler, but less explanatory model, for example, a linear model of some sort, is not a better way to proceed.

We believe that the independently-motivated commitments we made to specific (i) formal semantics representations and theories, (ii) cognitive-architectural organization principles and constraints and (iii) language processing theories and models should not be abandoned in future iterations of this modeling endeavor. As Neal (1996) puts it:

“Sometimes a simple model will outperform a more complex model [...] [But] deliberately limiting the complexity of the model is not fruitful when the problem is evidently complex. Instead, if a simple model is found that outperforms some particular complex model, the appropriate response is to define a different complex model that captures whatever aspect of the problem led to the simple model performing well.” (Neal 1996, 103–104)

9.5 Conclusion

In this chapter, we built the first (to our knowledge) formally and computationally explicit mechanistic model of active anaphoric/cataphoric search for an antecedent. This model integrates rich semantic representations (independently motivated in the formal semantics literature) and processing mechanisms (independently motivated in the psycholinguistics literature) into a wide-coverage cognitive architecture (ACT-R).

In the spirit of van der Sandt (1992), Kamp (2001a, b), the model analyzes anaphora and presupposition as fundamentally processing-level phenomena that guide and constrain the cognitive process of integration, or linking, of new and old semantic information. Anaphora and presupposition have semantic effects, but they are not exclusively, or even primarily, semantics.

There are many open questions left for future research about (i) the exact nature of the semantic representations deployed in these models, (ii) the fine details of the processing mechanisms, (iii) how exactly these representations and processes should be integrated into a general, independently-constrained cognitive architecture, and (iv) the exact division of labor between semantics and processing for the analysis of anaphora and presupposition. We hope the modeling endeavor pursued in this chapter provides a framework for formulating these questions in a precise way, and for mounting a systematic search for answers.

We think that the most important lesson to be drawn from the extensive and detailed modeling attempt in this chapter is methodological. Computationally explicit mechanistic processing models that can be fit to experimental data are crucial when working at the interface between theoretical linguistics and experimental psycholinguistics in general, and at the semantics-psycholinguistics interface in particular.

Our argument for this is as follows. We started the chapter with a general question about the processing of semantic representations (is it incremental and predictive?). To shed light on this question, we collected a reasonably rich amount of real-time experimental data, we analyzed the data with standard methods (mixed-effects linear models) and we informally stated an account that linked the theoretical question and the experimental data in a reasonably adequate way.

However, our ACT-R model, which explicitly formalized the proposed account, was able to quantitatively capture some, but crucially not all the data. This partial quantitative failure of our detailed, computationally-explicit cognitive model is enlightening: it opens up a variety of specific questions for future research that would have been missed had we stayed at the informal level of our initial account.

Specifically, it turns out that for the informal account to really work, we need auxiliary hypotheses that are both crucial for and completely glossed over by that informal account. Looking carefully only at the experimental data (or the results of standard statistical methods applied to that data), or only at the formal semantic representations, or even at both, but separately, is not enough. We need to be formally and computationally explicit about how we link them via mechanistically-specified processing models.

This lesson is not new by any means. In fact, it is very familiar to generative linguists when it comes to formulating competence-level theories:

“Precisely constructed models for linguistic structure can play an important role, both negative and positive, in the process of discovery itself. By pushing a precise but inadequate formulation to an unacceptable conclusion, we can often expose the exact source of this inadequacy and, consequently, gain a deeper understanding of the linguistic data. More positively, a formalized theory may automatically provide solutions for many problems other than those for which it was explicitly designed. Obscure and intuition-bound notions can neither lead to absurd conclusions nor provide new and correct ones, and hence they fail to be useful in two important respects. [We need] to recognize the productive potential in the method of rigorously stating a proposed theory and applying it strictly to linguistic material with no attempt to avoid unacceptable conclusions by ad hoc adjustments or loose formulation.” (Chomsky 1957, p. 5)

Computational modeling of cognitive phenomena has become increasingly central to cognitive science for fundamentally the same reason. As Lewandowsky and Farrell (2010, p. 9) put it: “[e]ven intuitively attractive notions may fail to provide the desired explanation for behavior once subjected to the rigorous analysis required by a computational model.”

We take the explicit focus on computationally-specified mechanistic processing models that are both (i) theoretically informed and (ii) quantitatively fit to experimental data in a statistically informed and thoughtful way to be a distinguishing feature of research at the semantics-psycholinguistics interface. We distinguish this kind of work from experimentally-informed semantics, as well as from semantically-informed psycholinguistics.

In our view, work in experimentally-informed semantics engages primarily with semantic theories using empirical investigation methodologies (mostly offline: forced choice, acceptability etc.) that have become standard in psycholinguistics. But the semantic theories are connected to the experimental measurements of linguistic behavior only implicitly and/or informally, hence weakly. In addition, designing properly powered experiments in semantics and pragmatics, where the effects are usually subtle and difficult to detect, is non-trivial, which makes the presumed links between theory and experimental data even more tenuous.

Similarly, work in semantically-informed psycholinguistics engages only in informal ways with formal semantics theories. While insightful, this work falls short of the standard of systematicity and formalization that permeates work in formal semantics, and does not engage in substantial ways with formal semantics frameworks and systems as a whole.

In sum, taking real-time experimental data and explicit computational modeling seriously opens up exciting new directions of research in formal semantics, and new ways of (re)connecting formal semantics and the broader field of cognitive science. This is the reason for the copious amounts of code and behavioral data introduced in this chapter, and in the book overall.

The specifics of the code and models we included in the book will likely become obsolete in the near future, just as many of our other auxiliary assumptions will. That is perfectly fine: their main purpose is to get the larger project off the ground and demonstrate its feasibility. Ultimately, our intention was to argue for a new range of theoretical and empirical goals for semantics, introduce an appropriate research workflow, and help semanticists and psycholinguists start using it.