Anaphora and negation

A Correction to this article was published on 26 October 2020

This article has been updated


One of the central questions of discourse dynamics is when an anaphoric pronoun is licensed. This paper addresses this question as it pertains to the complex data involving anaphora and negation. It is commonly held that negation blocks anaphoric potential; for example, we cannot say “Bill doesn’t have a car. It is black”. However, there are many exceptions to this generalization. This paper examines a variety of types of discourses in which anaphora on indefinites under the scope of negation is felicitous. These cases are not just of intrinsic interest: I argue that they present serious problems for the dynamic semantic framework, which builds the licensing facts into the semantics. I argue in favor of adopting a dynamic pragmatics, a theory that explains context change through general Gricean principles, and combining it with a static, d-type theory of anaphora, in which pronouns go proxy for definite descriptions.


  1. 1.

    Examples from Karttunen (1976, 4).

  2. 2.

    See, e.g., Kamp (1981) and Kamp and Reyle (1993) on Discourse Representation Theory, Heim (1982) on File Change Semantics, Groenendijk and Stokhof (1991) on Dynamic Predicate Logic, and Muskens (1991), Chierchia (1995), Groenendijk et al. (1996), Beaver (2001) for different versions of update semantics.

  3. 3.

    See, e.g., Evans (1977), Cooper (1979), Davies (1981), Heim (1990), Neale (1990), Elbourne (2005, 2013). D-type pronouns are also sometimes called e-type pronouns.

  4. 4.

    But see also King (1987, 1991, 1994) for another static option, Context Dependent Quantifier (CDQ) theory.

  5. 5.

    Geurts (1999, 188).

  6. 6.

    This kind of example was pointed out to me by Maria Bittner (p.c.).

  7. 7.

    Examples of this kind are found in Chierchia (1995) and Elbourne (2005).

  8. 8.

    It is helpful to look at some specific examples of clauses for negation in dynamic semantics. For example, in Beaver (2001)’s system ABLE (ch.7), the clause for negation is as follows:

    (11) \(\llbracket \lnot \phi \rrbracket = \lambda I \lambda J [\exists K I \downarrow \llbracket \phi \rrbracket K \wedge J = I \backslash K]\)

    In ABLE, an information state tracks assignment function/world pairs. Negation is a function from an input information state I to an output information state J, such that there is some intermediate information state K, where K is the output of updating I with the anaphoric closure of \(\llbracket \phi \rrbracket \). Updating I with the anaphoric closure of \(\llbracket \phi \rrbracket \) involves there being some further state L which is the result of updating I with \(\llbracket \phi \rrbracket \), and the output, in this case K, is the subset of the assignment function/world pairs in I which have extensions in L, that is, where an extension of an assignment function g is one that is the same as g except that it has a larger domain (i.e. it assigns values to more indices). In this sense, we see here that anaphoric connections are allowed within \(\phi \), in particular in the update of I to L, the intermediate state in calculating the anaphoric closure of \(\phi \), but disallowed outside of it (once we get to the anaphoric closure).

    Groenendijk and Stokhof (1991)’s Dynamic Predicate Logic (DPL) is only concerned with tracking anaphoric information, so contexts only track assignment functions. In DPL, the clause for negation is:

    (12) \( \llbracket \lnot \phi \rrbracket = \{\langle g,h \rangle | h = g \& \lnot \exists k: \langle h, k \rangle \in \llbracket \phi \rrbracket \}\)

    The criterion that the input assignment function g be identical to the output assignment function h is what makes negation externally static. In other words, no changes can be made to assignment functions outside the scope of negation, which is what represents that no discourse referents are introduced into the global context. Thus external infelicity is predicted. To calculate a negation, however, the material in the scope of the negation is processed as a whole, including any anaphoric relations within it. (Heim (1982)’s definition of negation works similarly.)

    Chierchia (1995)’s Dynamic Intensional Logic (DIL) also predicts external infelicity and internal felicity, since the clause for negation is: \({\underline{\lnot }}\)A = \(\uparrow \lnot \downarrow \) A (where the underlined negation sign is dynamic negation and the other is regular static negation). The down arrow is here an assertion operator—it takes a CCP to a static proposition—and the up arrow is the opposite—it maps a static proposition to its corresponding CCP. So A is calculated normally (with all the dynamic CCPs it may contain inside), but the negation blocks its context change potential by taking only its static content, negating it, and turning the result into a test (its corresponding CCP) on the context.

  9. 9.

    Quantificational subordination is the phenomenon by which anaphoric relationships hold across two quantificational sentences; that is, the indefinite antecedent is under the scope of one quantifier and the pronoun under the scope of another. For example, “Harvey courts a woman at every convention. She always comes to the banquet with him”. (Example from Brasoveanu (2010).)

  10. 10.

    Classic examples of first generation dynamic semantics include Kamp (1981), Heim (1982), and Groenendijk and Stokhof (1991). The literature on second generation dynamic semantics and beyond is too extensive to cite in its entirety, but some relevant references are van den Berg (1996), Brasoveanu (2010), and Keshet (2018).

  11. 11.

    Kurafuji (1998, 1999) argues that there is evidence for a dynamically bound variable/d-type distinction in third-person Japanese pronouns. Elbourne (2005, pp. 26–31) argues persuasively that the data does not support this ambiguity.

  12. 12.

    They do propose that whatever the rules are, they obey a monotonicity constraint, namely that \(\lnot \phi \) is interpreted dynamically only if \(\phi \) is downward monotonic (so that every step in a discourse is upward monotonic in the sense that we never lose truth conditional information as updates occur). But even if this is right, this only provides a necessary condition.

  13. 13.

    Since it doesn’t matter for present purposes, I am glossing over the details of the accounts, including important differences between them, such as whether modal subordination is accounted for by antecedent accommodation (Roberts) or anaphorically (Frank & Kamp). Technically, on Roberts’s view, the pronoun is not anaphoric on the original indefinite, but on the indefinite in the accommodated material, but again, nothing rests on such details for the purposes of my argument.

  14. 14.

    The same goes for (5). Note that (18) is fine with ‘because’ instead of ‘and’.

  15. 15.

    (Geurts 1999, 189). Geurts has other examples, but they are problematic because the modal sentence is in the indicative mood, which Frank and Kamp (1997)’s theory rules out.

  16. 16.

    Global accommodation is when the necessary adjustment (such as the introduction of a discourse referent) is added to the global (rather than local) context.

  17. 17.

    See Gauker (2008) for more arguments against accommodation.

  18. 18.

    See e.g. Heim (1982, 1990), Elbourne (2005).

  19. 19.

    The definition of affirmative embedding is a little hard to apply here, because (1) Evans is only thinking about sentences rather than discourses when he defines affirmative embedding and (2) he is thinking of pronouns as referring expressions, the reference being fixed by a definite description. The definition is as follows: Let \(\Sigma \)(\(\sigma \),\(\sigma '\)) be a sentence embedding an existential sentence, \(\sigma \), and a sentence \(\sigma '\) that contains a pronoun anaphoric on the indefinite in \(\sigma \). \(\sigma \) is affirmatively embedded in \(\Sigma \) relative to \(\sigma '\) iff when the truth of \(\Sigma \) turns on the truth or falsity of \(\sigma '\), \(\sigma \) is true. Intuitively, the idea is that whenever a sentence’s truth turns on the truth or falsity of the pronominal sentence contained in it, there is something that the pronoun refers to. We can extend this idea to discourses by thinking of the discourses as the conjunction of the sentences within them, and we could tweak the definition to better suit d-type theories so that the requirement is that there is a unique object that satisfies the description.

  20. 20.

    King (1994) claims something similar with respect to his CDQ theory, “that an occurrence of a quantifier in a sentence must be existentially positive to support subsequent (simple) anaphora in another sentence” (p. 229), where a quantifier being existentially positive means it is not non-existence entailing, i.e. (Dx:Fx)Gx does not entail that the intersection of F and G is empty.

  21. 21.

    Elbourne (2005, 2013) is silent on these cases; his theory runs into the same problems, but he could appeal to the same considerations as Heim.

  22. 22.

    See Lewis (2012, 2014).

  23. 23.

    See Roberts (2004, 2012) for more on questions under discussion. The set of questions under discussion likely has a hierarchical structure, but that is not important for present purposes, so here I treat it as a simple set.

  24. 24.

    Formally the discourse referents are modeled as indices and the information associated with them as partial assignment functions; see “Appendix A” for more details.

  25. 25.

    See Lewis (2012, 2014, 2017) for detailed arguments in support of this point.

  26. 26.

    Roberts (2003, 294–5).

  27. 27.

    This stands in contrast to a view like that of Jankovic (2014), who argues that communicative intentions are shared intentions.

  28. 28.

    One thing I have not mentioned here is that plans typically have a hierarchical structure, with general intentions embedding more specific intentions. It could be that the best way to make sense of discourse intentions to introduce a new discourse referent is to think of them as embedded in a more general communicative intention regarding the proposition expressed.

  29. 29.

    The difference is captured at the level of the propositional content only if we have Russellian or structured propositions. Nevertheless, even a system with unstructured propositions can distinguish between sentences containing singular denoting phrases and those which do not.

  30. 30.

    An anonymous reviewer raised the point that I am excluding other potential alternatives such as ‘this woman’ (used in its specific indefinite sense) and ‘a woman I haven’t mentioned yet’. To this I can add others, like ‘one or more women’, ‘at least one woman’, ‘some woman’. If, for example, the alternatives include ‘a woman I haven’t mentioned yet’, then this should equally implicate non-novelty. (This is a version of the symmetry problem.) I take it the alternatives should be constrained to denoting phrases that people actually tend to use in discourse, thus excluding ‘a woman I haven’t mentioned yet’. Incidentally, I do think that it is a prediction of my view that if we did start to use such phrases in everyday conversation, simple indefinites like ‘a woman’ would no longer indicate a plan to introduce a new discourse referent. However, if my view is right, it is also a prediction that speakers have little reason to start using such clumsy phrases. The rest of the expressions just mentioned are things people tend to say, but there is nothing relevant here to infer from the speaker not using them. That is to say, these are all equally good ways of asserting the existence of a woman who walked in. Not using one of these doesn’t indicate anything for present purposes. (I say for present purposes because I am interested in deriving the novelty implicature. There may be other implicatures that can be derived from the subtle differences in meaning between these indefinite expressions, but that is a project for another time.)

  31. 31.

    This example is based on Stalnaker (1978).

  32. 32.

    I use examples involving ‘it’ because ‘it’, unlike ‘he’ and ‘she’, cannot be used as a deictic pronoun.

  33. 33.

    For a detailed discussion and defense of this view of pronouns (and definite descriptions), see Lewis (ms).

  34. 34.

    They are local contexts in some sense of the term, but not in the sense that they are constructed by way of the semantic composition of a sentence.

  35. 35.

    This also explains why (44) with “because” replacing “and” is acceptable.

  36. 36.

    For example, Horn (1989) cites psycholinguistic research that shows that processing time is longer for negative sentences than positive ones, but not if a proper context of denial is set up. Furthermore, many of the scholars he cites agree that negative sentences can be odd out of the blue when their positive counterparts aren’t. Tian and Breheny (2016) argue that “negation is a cue for retrieving the prominent QUD” (27).

  37. 37.

    Thank you to an anonymous reviewer for raising this point and for this example.

  38. 38.

    We shouldn’t forget that there are some contexts in which contextual factors make it such that existence does not need to be asserted and a discourse referent will be established anyway because the reason for a speaker taking a non-standard route to introducing a new object under discussion will be clear. These will be highly contextual, particularized pragmatic derivations. So in a specific context that warrants saying something like ‘fewer than two’ when the speaker clearly means ‘exactly one’, a discourse referent will be introduced.

  39. 39.

    Comparing a syntactic tree for a standard bound pronoun sentence ‘a student asked for her grade’ with ‘it’s not the case that a student came and asked for her grade’ shows that the pronoun is c-commanded by the quantifier ‘a student’ in both cases, which means the quantifier can syntactically bind the pronoun. A node A c-commands a node B in a syntactic tree iff the lowest branching node dominating A also dominates B and A does not dominate B nor does B dominate A. In these examples, the pronoun ‘her’ is c-commanded by the DP ‘a student’ in both trees. The tree with VP conjunction could also be written as a non-binary branching tree, depending on the syntactic theory to which one subscribes; this makes no difference to the c-command relationship of DP and pronoun.

  40. 40.

    See Geurts (1998) and van der Sandt and Maier (2003) for more on denial.

  41. 41.

    For the ‘I doubt’ and ‘There’s no way’ locutions the echoic denial reading is more salient. To get the infelicitous reading, the reader must imagine a case in which this is not being uttered as an echoic denial.

  42. 42.

    I use the set of all worlds rather than the worlds in the common ground because the information the assignment functions encode is the properties associated with the discourse referents; this has nothing to do with which worlds are still considered open according to the conversation. Keeping these separate is helpful in maintaining the ordinary notion of truth at a world in the system.

  43. 43.

    I don’t discuss conditionals in this fragment for the sake of simplicity.

  44. 44.

    I am suppressing treatment of gender and number in pronouns, but that could be easily incorporated.

  45. 45.

    I haven’t included a QDR variable on ‘a woman’ for the sake of perspicuity, though I think all quantifiers come with the QDR variable.


Thanks are due to Josh Dever, Michael Glanzberg, Jim Pryor, Jessica Rett, and Anders Schoubye for helpful comments and discussion. Thanks as well to audiences at the Philosophy of Language and Linguistics Conference in Dubrovnik, Croatia in September 2014, the Workshop on Semantic Content in Barcelona, Spain in November 2014, and the Dartmouth Philosophy of Language Workshop at Dartmouth College in September 2015, and members of the NYU Mind and Language seminar in February 2018 for questions and discussion, as well as two anonymous reviewers for this journal for their comments, all of which helped improve the paper vastly.

A formal implementation

In this appendix, I briefly outline the formal implementation of both context and the semantics of pronouns, showing how these work together to account for basic cases of discourse anaphora.

  1. 1.

    Context: A context C contains the following elements:

  • DR, the set of discourse referents, modeled as indices. These indices are the domain of the assignment functions.

  • WG, a set of world/assignment function pairs \(\langle \)w, g\(\rangle \) such that w \(\in \) W, where W is the set of all possible worlds, and g is one of the possible assignments of indices in DR to entities in w, for each possible such assignment g (see footnote 42).

  • CG, the common ground, modeled as a context set, i.e. {w \(\in \) W \(\mid \) w is possible given conversational presuppositions}

  • QUD, the set of questions under discussion.

  1. 2.

    Syntax: The syntax of the language includes variables x, y, z, with or without subscripts, numerical indices, logical constants (proper names), predicates (from English), the logical connectives \(\wedge \), \(\vee \), \(\lnot \), the indefinite article a, the definite article the, the generalized quantifiers every, some, most, few, no.

  2. 3.

    Semantics:


  • A model M = \(\langle \)W, D\(_e\), D\(_t\), I\(\rangle \) where W = set of worlds w, D\(_e\) = domain of entities e, D\(_t\) = {0,1}, and I = interpretation function

  • For each constant c, I(c) = some e \(\in \) D\(_e\)

  • For n-place predicates p, I(p) = {\(\langle \)w,{\(\langle \)e\(_1\),...e\(_n\rangle \), \(\langle \)e\(_1\),...e\(_n \rangle \)...}\(\rangle \), \(\langle \)w,{\(\langle \)e\(_1\),...e\(_n \rangle \), \(\langle \)e\(_1\),...e\(_n \rangle \)...}\(\rangle \)...}, such that w \(\in \) W and e\(_1\),...e\(_n\) \(\in \) D\(_e\). That is, I(p) pairs each world with the extension of p at that world, and the second member of each pair is a set of entities if p is a 1-place predicate and a set of n-tuples of entities otherwise.

  • The logical connectives have their ordinary static denotations (see footnote 43).

  • The quantifiers have standard denotations from generalized quantifier theory, e.g.:

  1. a.

    \(\llbracket \)[every x: \(\phi \)](\(\psi \))\(\rrbracket ^{M, w, c,h}\) = 1 iff \(\forall \)e in D\(_e\) s.t. \(\llbracket \phi \rrbracket ^{h[x\rightarrow e]} = 1\) at w, \(\llbracket \psi \rrbracket ^{h[x\rightarrow e]} = 1\) at w

  2. b.

    \(\llbracket \)[some x: \(\phi \)](\(\psi \))\(\rrbracket ^{M, w, c,h} = 1\) iff \(\exists \)e in D\(_e\) s.t. \(\llbracket \phi \rrbracket ^{h[x\rightarrow e]} = 1\) at w & \(\llbracket \psi \rrbracket ^{h[x\rightarrow e]} = 1\) at w

  • The assignment functions relative to which denotations are calculated are distinct from the ones in WG. The assignment functions in WG are functions from indices to entities, the other assignment functions are functions from variables to entities or indices. I reserve ‘g’ for the former and ‘h’ for the latter.

  • An has the same denotation as the existential quantifier some.

  • Pronouns are equivalent to definite descriptions with null overt material, and the definite article is treated as a generalized quantifier that presupposes discourse uniqueness:

    $$ \begin{aligned}&\llbracket \text {[The x:} \phi ] (\psi )\rrbracket ^{M, w, c, h} \\&\quad = {\left\{ \begin{array}{ll} \text {defined if } \exists !\text {n} \in \text {DR s.t. for all} \ \langle \text {w,g} \rangle \in \text {WG}, \llbracket \phi \rrbracket ^{h[x\rightarrow g(n)]} = 1 \ \text {at w}\\ \text {1 iff } \exists \text {e in D}_e \ \text {s.t.} \ \llbracket \phi \rrbracket ^{h[x\rightarrow e]} = 1 \ \text {at w} \ \& \llbracket \psi \rrbracket ^{h[x\rightarrow e]} = 1 \ \text {at w}\\ \text {0 otherwise} \end{array}\right. } \end{aligned}$$

The clause defines the discourse uniqueness presupposition as the requirement that there is exactly one index n in DR such that the (possibly complex, massively conjoined) property associated with that index is exactly the same as the property denoted by the descriptive material in the definite description. The presupposition is a condition on definedness. The relevant descriptive material on the definite description that must satisfy discourse uniqueness is the completed description. The completed description occurs via the mechanism of quantifier domain restriction. If defined, The F is G is true iff there is at least one F that is G.
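As a rough illustration of how the clause separates definedness from truth, here is a minimal Python sketch. The representation is an assumption made for illustration and is not part of the formal system: properties are functions from an entity and a world to a boolean, WG is a list of (world, assignment) pairs with assignments as dictionaries, and the two-world model and the names `the`, `DOMAIN`, `cat`, and `purred` are hypothetical.

```python
# Hypothetical two-world model; none of these names come from the paper.
DOMAIN = {"Tom", "Felix", "Rex"}
CAT = {"w1": {"Tom", "Felix"}, "w2": {"Tom", "Felix"}}
PURRED = {"w1": {"Tom"}, "w2": set()}


def cat(e, w):
    return e in CAT[w]


def purred(e, w):
    return e in PURRED[w]


def the(restrictor, scope, DR, WG, w):
    """Return None on presupposition failure, else the truth value at w."""
    # Definedness: exactly one index n in DR such that, for every <w', g>
    # in WG, g(n) satisfies the restrictor at w'.
    matching = [n for n in DR
                if all(restrictor(g[n], wv) for wv, g in WG)]
    if len(matching) != 1:
        return None
    # Truth: at least one entity satisfies both restrictor and scope at w.
    return any(restrictor(e, w) and scope(e, w) for e in DOMAIN)


# One discourse referent (index 1), known only to be a cat.
DR = {1}
WG = [("w1", {1: "Tom"}), ("w1", {1: "Felix"}),
      ("w2", {1: "Tom"}), ("w2", {1: "Felix"})]

print(the(cat, purred, DR, WG, "w1"))     # True
print(the(cat, purred, DR, WG, "w2"))     # False
print(the(cat, purred, set(), [], "w1"))  # None: no discourse referent
```

Returning `None` for the undefined case is only one way to model presupposition failure; the sketch also fails when two indices both satisfy the description, which is the discourse uniqueness requirement.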

I treat quantifier domain restriction following Stanley and Szabo (2000). All quantifiers are restricted by a variable that shares a node with the noun at the level of syntax. The variable is of the form f(i), where i is an individual variable that can either be bound or get a value from the context and f is a contextually determined function from individuals to quantifier domains, which in the intensional version is a property, a function from worlds to sets of individuals. The domain of the quantifier is determined by combining the denotation of the overt predicate (if any) and the result of f(i). (In Stanley and Szabo’s extensional version, they combine by set intersection. In the intensional version, we can think of them as combining by something that looks exactly like predicate modification, except for the fact that the noun and variable share the same node.) For example, take the sentence ‘every student was happy’ in a particular context of use, formally written as [Every x: \(\langle \)student, f(i)\(\rangle \) x] (happy x). In a particular context, f might be the function from events to a function from worlds to their participants at that world, and the value for i might be the 2019 Met Gala. Hence, f(i) yields a function from worlds to the set of all the participants of the 2019 Met Gala at that world, and combining that with the denotation of ‘student’ (i.e. a function from worlds to the students at that world), we get students at the 2019 Met Gala as the restrictor property. Thus we get the restricted universal claim that every student at the 2019 Met Gala was happy.
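The Met Gala example can be sketched in Python as follows. This is only an illustration under stated assumptions: properties are modeled as dictionaries from worlds to sets of individuals, the intensional combination is implemented as world-by-world intersection, and all data and names (`STUDENT`, `HAPPY`, `PARTICIPANTS`, `restrict`, `every`) are hypothetical.

```python
# Hypothetical data; only the shape of the computation follows the text.
STUDENT = {"w1": {"Ann", "Ben", "Cy"}, "w2": {"Ann", "Ben"}}
HAPPY = {"w1": {"Ann", "Ben"}, "w2": {"Ann"}}
PARTICIPANTS = {  # event -> world -> participants of that event
    "met_gala_2019": {"w1": {"Ann", "Ben", "Dee"},
                      "w2": {"Ann", "Ben", "Dee"}},
}


def f(i):
    """Contextual function from an individual (here an event) to a property."""
    return PARTICIPANTS[i]


def restrict(noun, domain):
    """Combine two properties world by world (assumed: pointwise intersection)."""
    return {w: noun[w] & domain[w] for w in noun}


def every(restrictor, scope, w):
    """[Every x: restrictor x](scope x) is true at w iff restrictor is a subset of scope."""
    return restrictor[w] <= scope[w]


restrictor = restrict(STUDENT, f("met_gala_2019"))
print(every(restrictor, HAPPY, "w1"))  # True: every student at the gala was happy
print(every(STUDENT, HAPPY, "w1"))     # False: unrestricted, Cy was not happy
```

The contrast between the last two lines shows the work done by f(i): the same overt material yields different truth values depending on the contextually supplied domain.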

As I said, pronouns are treated as definite descriptions with null overt material, and so the descriptive material comes entirely from the quantifier domain restriction (see footnote 44). Let’s see how this applies to our central example:

(57) a. A woman walked in.
a’. [An x: woman x] (walked in x) (see footnote 45)


b. She sat down.
b’. [The x: \(\langle \emptyset \), f(i)\(\rangle \) x] (sat down x)

Introduction and update of discourse referents are modeled as changes to DR and WG. (57a) adds a novel index, say 1, to DR, and changes WG such that the new set contains all the 1-variants of the input assignment functions, such that 1 is now assigned to an individual in both I(woman) and I(walked in), relative to its paired world. A 1-variant of an assignment function g is here defined as any extension of g that assigns 1 to something in the relevant interpretation.

Consider the following toy model:

  1. 1.

    W = {\(w_1\), \(w_2\), \(w_3\)}

  2. 2.

    D\(_e\) = {Alice, Bob, Carol, David, Emily, Francine}

  3. 3.

    I(woman) = {\(\langle w_1\), {Alice, Carol, Emily, Francine}\(\rangle \), \(\langle w_2\), {Alice, Carol, Emily, Francine}\(\rangle \), \(\langle w_3\), {Alice, Carol, Emily, Francine}\(\rangle \)}

  4. 4.

    I(walked in) = {\(\langle w_1\), {Alice, Carol, Bob}\(\rangle \), \(\langle w_2\), {Bob, Carol, David, Emily}\(\rangle \), \(\langle w_3\), {Alice, Emily, Francine}\(\rangle \)}

  5. 5.

    I(sat down) = {\(\langle w_1\), {Alice, Bob, Francine}\(\rangle \), \(\langle w_2\), {Carol, Emily}\(\rangle \), \(\langle w_3\), {Francine}\(\rangle \)}

For simplicity assume an initial context in which CG = W and DR = {}. I suppress the QUD in what follows. The pragmatic effect on the context of asserting (57a) is to add a novel discourse referent for a woman who walked in, and to eliminate all worlds in CG that are incompatible with a woman walking in:

Context after (57a) is asserted:

  1. 1.

    CG = W \(\cap \) \(\llbracket \)a woman walked in\(\rrbracket ^{M,c,h}\) = W\(_1\)

  2. 2.

    DR = {1}

  3. 3.

    WG = {\(\langle w_1\), g\(_1\): 1\(\rightarrow \) Alice\(\rangle \), \(\langle w_1\), g\(_2\): 1\(\rightarrow \) Carol\(\rangle \), \(\langle w_2\), g\(_3\): 1\(\rightarrow \) Carol\(\rangle \), \(\langle w_2\), g\(_4\): 1\(\rightarrow \) Emily\(\rangle \), \(\langle w_3\), g\(_5\): 1\(\rightarrow \) Alice\(\rangle \), \(\langle w_3\), g\(_6\): 1\(\rightarrow \) Emily\(\rangle \), \(\langle w_3\), g\(_7\): 1\(\rightarrow \) Francine\(\rangle \)}
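The update just listed can be reproduced with a short Python sketch over the toy model. The encoding is an assumption for illustration: interpretations are dictionaries from worlds to sets, WG is a list of (world, assignment) pairs, and the initial WG pairs each world with the empty assignment (since DR is empty). The function name `introduce` is hypothetical.

```python
# Toy model from the text.
W = {"w1", "w2", "w3"}
WOMAN = {w: {"Alice", "Carol", "Emily", "Francine"} for w in W}
WALKED_IN = {
    "w1": {"Alice", "Carol", "Bob"},
    "w2": {"Bob", "Carol", "David", "Emily"},
    "w3": {"Alice", "Emily", "Francine"},
}


def introduce(index, WG, *properties):
    """Replace each <w, g> by all extensions of g that map `index`
    to an entity satisfying every property at w."""
    new_WG = []
    for w, g in WG:
        witnesses = set.intersection(*(p[w] for p in properties))
        for e in sorted(witnesses):
            new_WG.append((w, {**g, index: e}))
    return new_WG


# Initial context: CG = W, DR = {}, one empty assignment per world (assumed).
CG0 = set(W)
DR0 = set()
WG0 = [(w, {}) for w in sorted(W)]

# Pragmatic effect of asserting (57a) 'A woman walked in'.
CG1 = {w for w in CG0 if WOMAN[w] & WALKED_IN[w]}
DR1 = DR0 | {1}
WG1 = introduce(1, WG0, WOMAN, WALKED_IN)

for w, g in WG1:
    print(w, g)
```

Run on the toy model, this yields the seven world/assignment pairs given above, and CG is unchanged because a woman walked in at every world.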

The discourse referent serves as the value for the salient individual i in (57b), and the QDR function that goes in for the variable f gathers all the information associated with that discourse referent, yielding the set of all possible individuals who could be a witness for the discourse referent relative to each possible world:

Contextual QDR function for anaphoric pronouns:

For some index n, f(n) = {\(\langle \)w, {e\(_1\)...e\(_k\)}\(\rangle \) \(\mid \) w is the first member of a pair in WG and {e\(_1\)...e\(_k\)} is the set of all entities assigned to n by some g which is the second member of a pair that has w as its first member}

This will yield a function from worlds w to the set of all women who walked in at w. (57b) is then true at a world iff there is at least one object in that domain of possible witnesses that sat down, i.e. it is true iff there is at least one woman who walked in and sat down.
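A minimal Python sketch of the QDR function and the resulting truth conditions for (57b), using the toy model and the WG listed after (57a). The dictionary encoding and the names `f` and `true_at` are assumptions made for illustration.

```python
# WG after (57a), as listed in the text: (world, assignment) pairs.
WG1 = [
    ("w1", {1: "Alice"}), ("w1", {1: "Carol"}),
    ("w2", {1: "Carol"}), ("w2", {1: "Emily"}),
    ("w3", {1: "Alice"}), ("w3", {1: "Emily"}), ("w3", {1: "Francine"}),
]
SAT_DOWN = {
    "w1": {"Alice", "Bob", "Francine"},
    "w2": {"Carol", "Emily"},
    "w3": {"Francine"},
}


def f(n, WG):
    """Map each world in WG to the set of entities that some paired
    assignment gives to index n (the possible witnesses at that world)."""
    prop = {}
    for w, g in WG:
        prop.setdefault(w, set()).add(g[n])
    return prop


def true_at(w, restrictor, scope):
    """Truth condition of (57b): some possible witness at w satisfies the scope."""
    return bool(restrictor.get(w, set()) & scope[w])


domain = f(1, WG1)
print(sorted(domain["w3"]))  # ['Alice', 'Emily', 'Francine']
print({w: true_at(w, domain, SAT_DOWN) for w in sorted(domain)})
```

In this model (57b) comes out true at all three worlds, since at each world at least one woman who walked in also sat down.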

Context after (57b) is asserted:

  1. 1.

    CG = W\(_1\) \(\cap \) \(\llbracket \)some woman walked in and sat down\(\rrbracket ^{M,c,h}\)

  2. 2.

    DR = {1}

  3. 3.

    WG = {\(\langle w_1\), g\(_1\): 1\(\rightarrow \) Alice\(\rangle \), \(\langle w_2\), g\(_3\): 1\(\rightarrow \) Carol\(\rangle \), \(\langle w_2\), g\(_4\): 1\(\rightarrow \) Emily\(\rangle \), \(\langle w_3\), g\(_7\): 1\(\rightarrow \) Francine\(\rangle \)}

The effect here is again purely pragmatically motivated. Eliminating incompatible assignment functions captures the fact that conversational participants use discourse referents to track what properties hang together as satisfied by a single witness according to the discourse.
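The elimination step can be sketched in Python as a filter on WG. As before, the encoding (a list of (world, assignment) pairs, interpretations as dictionaries) is an assumption for illustration, and `update_with` is a hypothetical name.

```python
# WG after (57a), as listed in the text.
WG1 = [
    ("w1", {1: "Alice"}), ("w1", {1: "Carol"}),
    ("w2", {1: "Carol"}), ("w2", {1: "Emily"}),
    ("w3", {1: "Alice"}), ("w3", {1: "Emily"}), ("w3", {1: "Francine"}),
]
SAT_DOWN = {
    "w1": {"Alice", "Bob", "Francine"},
    "w2": {"Carol", "Emily"},
    "w3": {"Francine"},
}


def update_with(index, WG, prop):
    """Keep only the pairs <w, g> in which g(index) has the property at w."""
    return [(w, g) for w, g in WG if g[index] in prop[w]]


# Pragmatic effect of asserting (57b) 'She sat down' on WG and CG.
WG2 = update_with(1, WG1, SAT_DOWN)
CG2 = {w for w, _ in WG2}  # worlds with at least one surviving witness

for w, g in WG2:
    print(w, g)
```

The four surviving pairs match the WG given above: index 1 now tracks someone who is a woman, walked in, and sat down at the paired world.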

For the cases that involve accommodation, such as the external felicity cases central to this paper, the relevant discourse referent is accommodated in the context before the semantic machinery does its work. For example, take (39): when we get to the pronoun ‘it’, there is no unique discourse referent to which it can be resolved. So a discourse referent for Bryan’s Paris apartment is added to the context, and then semantics works in the same way as described.

  • Anaphora
  • Negation
  • Dynamic semantics
  • Dynamic pragmatics
  • Descriptions
  • Pronouns
  • Discourse