Introduction

Speakers can express the same meaning in many different ways. For example, after saying the boy liked the girl, a speaker can refer back to the girl with a pronoun (she) or with a repeated noun (the girl). What makes people choose one referring expression over another? Many theories of reference assume that accessibility affects the choice of referring expressions: People produce less explicit referring expressions, such as pronouns, more frequently when the referent is more salient in the prior discourse and, hence, more easily retrieved from memory (i.e., accessible; see Bock, 1982; Bock & Warren, 1985), and they produce more explicit referring expressions, such as repeated nouns, more frequently when the referent is less accessible (e.g., Ariel, 1990; Givón, 1983; Gundel, Hedberg, & Zacharski, 1993). However, the question of what sources of information affect accessibility and, hence, choice of expressions has not been fully settled.

Previous research has identified some important factors that influence the referent’s accessibility and, hence, the choice of referring expressions. It has been shown, for example, that pronouns are used more often when the antecedent is the syntactic subject of the sentence than when it plays some other grammatical role (Arnold, 2001; Brennan, 1995; Fletcher, 1984; Fukumura & Van Gompel, 2010, 2011; Stevenson, Crawley, & Kleinman, 1994). According to various theoretical accounts (Brennan, 1995; Brennan, Friedman, & Pollard, 1987; Gordon, Grosz, & Gilliom, 1993; Grosz, Joshi, & Weinstein, 1995), the syntactic subject is more accessible than other syntactic functions. Moreover, the presence of a referential competitor in the linguistic (Arnold & Griffin, 2007) or visual (Fukumura, Van Gompel, & Pickering, 2010) context reduces the use of pronouns, possibly because similarity between referential candidates results in semantic interference, thereby reducing the referent’s accessibility (Fukumura, Hyönä, & Scholfield, 2013; Fukumura, Van Gompel, Harley, & Pickering, 2011). Additionally, people are more likely to use pronouns to refer to animate rather than inanimate entities (Fukumura & Van Gompel, 2011), and animate entities have been argued to be more accessible (Bock, 1982; Bock & Warren, 1985).

In the present study, we investigated whether the length of an antecedent (hereafter, antecedent length) affects subsequent choice of referring expressions to that antecedent (pronoun vs. repeated noun). According to functional-linguistic theories of reference (e.g., Ariel, 1990; Givón, 1983), the amount of information attached to a noun phrase (henceforth, NP) signals the referent’s accessibility in discourse. For example, in Ariel’s (1990) accessibility hierarchy, long definite descriptions such as the first woman selected to be on the team of an American spaceship are ranked lower (i.e., are deemed to be less accessible) than short definite descriptions such as the woman or she in terms of the accessibility of the referent in discourse. According to Ariel (1990, 1996), longer NPs are typically used when the referent is less accessible in the context because they refer to new information in discourse, whereas shorter NPs are more common when the associated entity is given and, hence, more accessible (Ariel, 1990, 1996; Givón, 1983, 1988, 1989; Gundel et al., 1993). Importantly, Ariel assumes that the amount of information predicated of an NP indicates how accessible the referent is—that is, the shorter the NP, the more accessible the referent in discourse. Thus, on the basis of this account, there should be a greater preference for reduced referring expressions following shorter NPs.

Another possibility, however, is that longer NPs are more accessible than shorter ones, because extra information tends to lead to richer memory representations. Research on memory suggests that elaborative information on words enhances later retrieval of those words, possibly because extra information provides additional retrieval cues (e.g., Craik & Tulving, 1975; Fisher & Craik, 1980; Marks, 1987). Therefore, it may be that longer antecedents are more easily retrieved from memory, increasing the probability of pronominal reference. Consistent with this possibility, Hofmeister (2011) found that semantically richer antecedents resulted in faster reading times in long-distance dependencies than did semantically more impoverished antecedents. For example, in (1a), the direct object of banned is a communist, but this phrase does not appear in its standard (“canonical”) location.

  1. (1)
    1. a.

      It was a communist who the members of the club banned from ever entering the premises.

    2. b.

      It was an alleged Venezuelan communist who the members of the club banned from ever entering the premises.

Efficient comprehension involves rapidly associating a communist with banned. Hofmeister found that people read the words immediately following banned faster in (1b) than in (1a), presumably because semantically richer phrases are encoded more clearly in memory and, hence, are easier to access.

Although no study thus far has examined the effect of antecedent length on the choice of referring expressions, previous research has shown that the length of an NP affects constituent order. For instance, Stallings, MacDonald, and O’Seaghdha (1998) showed that English speakers preferentially produce the prepositional object of a ditransitive sentence before the direct object when the latter is long (the manger exhibited to Jill the new line of bright summer beach and resort fashions) (see also Arnold, Wasow, Losongco, & Ginstrom, 2000). Importantly, such an effect of length on word order, often termed heavy NP shift, has been assumed to occur because of accessibility. Researchers argue that shorter NPs tend to precede longer NPs in English because shorter NPs are more accessible (e.g., Arnold et al., 2000; Stallings & MacDonald, 2011; Stallings et al., 1998), which makes sense in light of the given-before-new ordering preferences observed more generally (E. Clark & Clark, 1978, H. H. Clark & Haviland, 1977, Halliday, 1967). Interestingly, however, Yamashita and Chang (2001) showed that, unlike English speakers, Japanese speakers prefer to place the longer NPs before the shorter NPs when producing both transitive and ditransitive structures. Consistent with Hofmeister (2011), Yamashita and Chang argued that longer NPs are more accessible than shorter ones because extra linguistic material adds more information to the referent, making it semantically richer and, therefore, conceptually more salient. Thus, although length may affect accessibility differently in different languages (e.g., Chang, 2009; Hawkins, 1994), if the effect of length on constituent order is indeed mediated by accessibility, we might expect length to also affect referential forms.

We thus carried out three experiments to investigate whether and how the length of a potential antecedent affects the form of the associated referring expressions. A functional-linguistic account (Ariel, 1990, 1996; Givón, 1983, 1988, 1989; Gundel et al., 1993) predicts that shorter NPs signal higher accessibility, as compared with longer NPs, and that this should cause people to produce more pronouns when the antecedent is shorter than when it is longer. In contrast, the semantic richness account claims that the more information predicated of an NP, the more conceptually accessible it becomes, because extra semantic information leads to an enriched semantic representation of the referent and/or provides additional recall cues for retrieval. As was discussed earlier, research on memory and recall (Craik & Tulving, 1975; Fisher & Craik, 1980; Marks, 1987) and filler-gap dependencies (Hofmeister, 2011) is consistent with this account. If length indeed enhances the accessibility of the referent, we could expect more pronouns following a long antecedent than a short antecedent. In all experiments, we used a free sentence continuation task (Fukumura & Van Gompel, 2010; Garvey & Caramazza, 1974; Stevenson et al., 1994; Stevenson, Knott, Oberlander, & McDonald, 2000). To preview the results, in all three experiments, we found that participants produced more pronouns when the antecedent was longer rather than shorter, consistent with the semantic richness account.

Experiment 1

We examined the effect of antecedent length by manipulating the presence or absence of a relative clause attached to a potential antecedent. We created three preceding sentence conditions, as in (2). In the long–short condition (2a), a relative clause was attached to NP1 (the actor), but not to NP2 (the actress), making NP1 longer than NP2. In the short–long condition (2b), the same relative clause was attached to NP2, making NP2 longer than NP1. Finally, in the short–short (2c) condition, neither NP had a relative clause.

  1. (2)
    1. a.

      Long–short: The actor who was frustrated and visibly upset about the night’s disastrous performance walked away from the actress.

    2. b.

      Short–long: The actor walked away from the actress who was frustrated and visibly upset about the night’s disastrous performance.

    3. c.

      Short–short: The actor walked away from the actress.

Participants were asked to read the sentences and then provide a meaningful continuation for it. We were primarily interested in how participants referred back to the entity they chose to talk about in their continuations—with a pronoun (he or she) or with a repeated noun (the actor or the actress)—and how this might be affected by the presence of a relative clause attached to the antecedent.

The functional account predicts that shorter referents are more accessible than longer ones in discourse (Ariel, 1990; Givón, 1983). Specifically, in her corpus analyses, Ariel (1996) found that NPs followed by relative clauses typically occur when the referent is less accessible in the context, which led her to argue that long descriptions are “low accessibility markers.” Therefore, if the form of the antecedent is indeed taken to signal the referent’s accessibility, participants should produce fewer pronouns (relative to repeated head nouns) when the antecedent is longer than when it is shorter. That is, when the antecedent is NP1, more pronouns are expected in the short–long (2b) than in the long–short (2a) condition, whereas when the antecedent is NP2, more pronouns are expected in the long–short (2a) than in the short–long (2b) condition. Alternatively, the semantic richness account (e.g., Hofmeister, 2011; Marks, 1987; Yamashita & Chang, 2001) predicts that longer antecedents boost the referent’s accessibility because of the additional information that is predicated of the referent. Therefore, more pronouns are expected in the long–short (2a) than in the short–long (2b) condition for NP1 antecedents, whereas more pronouns are expected in the short–long (2b) than in the long–short (2a) condition for NP2 antecedents. In addition, it is possible that the presence of a relative clause generally influences the referent’s accessibility (i.e., regardless of which NP it is attached to). For instance, the additional words might increase the distance between the antecedent and the anaphor, which may decrease the referent’s accessibility (Ariel, 1990; Givón, 1983). As a result, participants may generally produce fewer pronouns, rather than repeated head nouns, when the preceding sentence contains a relative clause anywhere (the long–short and short–long conditions) than when both NPs are short (the short-short condition).

Furthermore, because the participants were free to talk about either of the entities in the preceding sentence, an interesting question was how the length of the antecedent would affect the choice of referent (i.e., which antecedent the participants chose to talk about). Some previous studies have argued that choice of referent and form of referring expression are driven by the same underlying forces (e.g., Arnold, 2001, 2008; Givón 1988, 1989); that is, language users produce more reduced expressions such as pronouns for the referent that is most likely to be referred to, because the more predictable the referent is, the more accessible it is. But other research suggests that choice and form of reference are guided by different mechanisms (Fukumura & Van Gompel, 2010; Kehler, Kertz, Rohde, & Elman, 2008; Stevenson et al., 1994).

Method

Participants

Thirty-six undergraduate students studying at the University of Edinburgh took part in the experiment in exchange for £6. They were all native speakers of British English.

Materials and design

We constructed 42 experimental sentences such as (2). Each sentence included two NPs of different genders (NP1 and NP2), and a relative clause was attached to NP1, NP2, or neither, creating three preceding sentence conditions. We counterbalanced NP order by including three additional conditions, in which the order of the NPs was reversed. This resulted in six conditions for each experimental sentence: preceding sentence (short–short vs. long–short vs. short–long) × NP order (male–female vs. female–male). We also constructed 60 fillers. The fillers did not contain relative clauses, but about half of them contained constituents, such as prepositional phrases, that made them appear similar to the long experimental sentences. The 42 experimental and 60 filler sentences were distributed in a fixed random order across six lists, subject to the constraints that at least one filler sentence occurred between two experimental sentences and that no more than two experimental sentences of the same condition occurred consecutively. Each experimental list contained one version of each item and seven items from each condition, together with all 60 fillers. Six participants were randomly assigned to each list.

Procedure

Participants were given a booklet that contained the to-be-continued sentences and were asked to write a meaningful continuation for each sentence. The participants were encouraged to produce continuations “quickly” and “with the first thing that comes to mind,” but there was no time limit. Participants were permitted to take a short break halfway through. The experimental session lasted about 45–60 min.

Scoring

We scored whether the subject of the continuation referred to NP1 or NP2 in the preceding sentence and what referring expressions were used to refer to them. Responses were scored as other responses if (1) the referring expression referred to neither NP1 nor NP2, (2) neither a pronoun nor a repeated noun was produced to refer to NP1 or NP2, (3) the referent was not the first-mentioned entity in the continuation, (4) participants did not produce a new sentence, and (5) the referring expression was part of a subordinate clause in the continuation (e.g., When he/the actor asked for an explanation, she/the actress didn’t provide one).

Results

Throughout this article, we analyze the effect of our manipulations on the choice of referring expression (i.e., how the participants referred to the antecedent they talked about, with a pronoun or with a repeated noun), as well as on the choice of referent (i.e., which antecedent the participants talked about in the continuations). Because our dependent variable was categorical for both choice of referent and choice of referring expression, we always used logit mixed-effects models (Baayen, Davidson, & Bates, 2008; Jaeger, 2008). For random effects, we always included by-participants and by-items random intercepts. We attempted fitting the maximum random effect structure (Barr, Levy, Scheepers, & Tily, 2013), but because the models failed to converge, as is often the case with categorical data, we included by-participants and by-items random slopes only if their inclusion was justified by the model (Baayen et al., 2008). We also ran more traditional F 1 and F 2 analyses of variance (ANOVAs) on by-participants and by-items means, which, according to Barr et al., control type 1 error rate well. The results were consistent with those from the mixed effects reported below.

Choice of referring expressions

Table 1 reports the percentage of pronominal reference relative to repeated noun reference for NP1 and NP2 references in each preceding sentence. Participants rarely produced referring expressions other than pronouns and repeated nouns; the ones they did generate included null pronouns (… and went out to smoke) and modified nouns (… The frustrated actress …) (long–short, N = 2; short–long, N = 1; short–short, N = 2). These responses were therefore excluded from further analyses on choice of referring expressions.

Table 1 The percentage of pronouns out of all pronouns and repeated nouns for NP1 and NP2 reference by preceding sentence in Experiment 1

We analyzed the number of pronouns and repeated nouns as functions of antecedent position (NP1 vs. NP2) and preceding sentence. We first analyzed how the inclusion of a relative clause modulated the effect of antecedent position by comparing (1) the short–short condition with the long–short condition and (2) the short–short condition with the short–long condition. We then analyzed the effect of antecedent length on pronoun use by comparing (3) the long–short condition with the short–long condition. In all three analyses, both antecedent position and preceding sentence were centered, so that the results could be interpreted in the same way as in traditional ANOVAs. We collapsed across NP order, because this was merely a counterbalancing variable. Table 2 provides a summary of all the coefficients for choice of referring expression analyses.

Table 2 Summary of the coefficients from the analyses on choice of referring expressions in Experiment 1

The comparison of the short–short and the long–short conditions revealed a significant main effect of antecedent position, with more pronominal reference to NP1 than to NP2, but there was no significant main effect of preceding sentence, nor any interaction between antecedent position and preceding sentence (short–short vs. long–short). The comparison between the short–short and the short–long condition, which included by-items random slopes for antecedent position and preceding sentence, also revealed a significant main effect of antecedent position, with more pronoun use for NP1 than for NP2, but there was no significant main effect of preceding sentence. However, the effect of antecedent position was significantly modulated by preceding sentence (short–short vs. short–long). We followed up this interaction by analyzing simple effects of preceding sentence (long–short vs. short–long) for NP1 and NP2 separately. When the antecedent was NP1, there was no difference between the short–short and the short–long condition, but when the antecedent was NP2, there were significantly more pronouns in the short–long condition than in the short–short condition.

Finally, the comparison between the long–short and the short–long conditions, which included by-item random slopes for antecedent position and preceding sentence, revealed that antecedent position had a significant influence on pronoun use, indicating that there were more pronouns following NP1 than NP2 antecedents. There was no overall effect of preceding sentence (long–short vs. short–long), however, suggesting that participants produced similar numbers of pronouns following long–short and short–long sentences. Importantly, there was a significant antecedent position × preceding sentence (long–short vs. short–long) interaction, indicating that the effect of antecedent position was larger in the long–short than in the short–long condition. When the antecedent was NP1, although there was a numerical tendency for more pronouns for the long–short than for the short–long condition, this difference did not reach significance. However, when the antecedent was NP2, there were significantly more pronouns in the short–long condition than in the long–short condition, indicating that a longer antecedent was more likely to be realized with a pronoun, as compared with a shorter antecedent.

Choice of referent

We also analyzed whether and how the choice of referent was affected by our manipulations. Table 3 reports the percentage of NP1, NP2, and other references for each condition.

Table 3 Percentage of NP1, NP2, and other references by preceding sentence in Experiment 1

There were slightly more other responses in the short–short condition than in both the long–short condition (p = .08) and the short–long condition (p = .08), but there was no difference in other responses between the long–short and the short–long conditions (p = .92). Given that the number of other responses differed between the conditions, we analyzed the number of NP1 references and NP2 references relative to all trials (including NP1, NP2, and other). As in the analyses on the choice of referring expression, we first compared the short–short (baseline) condition with (1) the long–short and with (2) the short–long conditions and then compared (3) the long–short and the short–long conditions with respect to the numbers of NP1 and N2 references (out of all trials). Table 4 reports the results.

Table 4 Summary of the coefficients from the analyses on the choice of referent in Experiment 1

The results revealed no significant difference in NP1 reference between the short–short and the long–short conditions, but there were significantly fewer NP2 references in the short–short than in the long–short condition. There were significantly fewer NP1 references in the short–short condition than in the short–long condition and significantly more NP2 references in the short–short condition than in the short–long condition. The comparison between the long–short and the short–long conditions included by-items random slopes for preceding sentence. The results showed significantly fewer NP1 references in the long–short condition than in the short–long condition and significantly more NP2 references in the long–short condition than in the short–long condition, indicating that there was a tendency to talk about the relatively shorter, rather than the longer, antecedent.

Discussion

The results of Experiment 1 revealed that participants produced more pronouns for NP2 in the short–long condition than in the long–short condition, which indicated that people tend to use more pronouns when referring to long rather than short antecedents. When the antecedent was NP1, although there was a numerical tendency toward more pronouns in the long–short than in the short–long condition, the effect was not significant. In addition, there were more pronouns for NP2 in the short–long condition than in the short–short condition. However, participants did not generally use more pronouns when the preceding sentence was longer: for NP1 reference, there was no significant difference between the short–short and the short–long condition. Instead, participants may have preferentially produced more pronouns for longer antecedents.

The tendency to use pronouns more often with longer antecedents is compatible with the semantic richness account, which claims that additional semantic information in long NPs makes them conceptually more accessible, and is incompatible with the functional linguistic account, which claims that longer descriptions signal low accessibility in discourse. In addition, participants used more pronouns when the antecedent was NP1 (subject) than NP2 (object), in accord with previous research (Arnold, 2001; Brennan, 1995; Fletcher, 1984; Fukumura & Van Gompel, 2010, 2011; Stevenson et al., 1994). The preference to use pronouns to refer to NP1 antecedents was so strong that pronominal reference approached ceiling level in the NP1 conditions (over 90%). This might explain why the effect of antecedent length did not reach significance in this condition.

Interestingly, there were more NP1 references in the short–long condition than in the long–short condition, whereas there were more NP2 references in the long–short condition than in the short–long condition. This indicated that participants referred to the shorter antecedents more often. Such preference to refer to relatively shorter antecedents may also explain why there were more NP1 and fewer NP2 references in the short–long condition than in the short–short condition and more NP2 references in the long–short than in the short–short condition.

Thus, although participants were less likely to refer to longer antecedents, they tended to use more pronouns to refer to them. These findings run counter to theories of reference production that assume that choice of referent and choice of referring expression are both driven by the same underlying force (Arnold, 2001, 2008; Givón 1988, 1989), because these theories maintain that the more likely an entity is to be referred to, the more accessible it is and, hence, more reduced referring expressions should be used to refer to it.

Experiment 2

In Experiment 1, the tendency to use a pronoun to refer to the NP modified by a relative clause was significant for NP2 but not for NP1. This difference might be related to the fact that pronouns were generally used much more often than repeated nouns in all conditions, and this was especially the case for NP1 reference, where participants hardly produced any repeated nouns at all (2.6%). Thus, a data set containing fewer pronouns overall might allow us to observe an effect of NP length on both NP1 and NP2. In Experiment 2, therefore, we attempted to increase the proportion of repeated nouns relative to pronouns by matching the genders of the two NPs in the preceding sentence, as in (3). Previous studies have shown that people tend to use fewer pronouns and more repeated nouns to refer to one of two NPs that share gender or other semantic features than otherwise (Arnold & Griffin, 2007; Fukumura & Van Gompel, 2011; Fukumura et al., 2013; Fukumura et al., 2011; Fukumura et al., 2010).

  1. (3)
    1. a.

      Long–short: The actor who was frustrated and visibly upset about the night’s disastrous performance walked away from the cameraman.

    2. b.

      Short–long: The actor walked away from the cameraman who was frustrated and visibly upset about the night’s disastrous performance.

    3. c.

      Short–short: The actor walked away from the cameraman.

Method

Participants

Thirty-six participants were drawn from the same population as in Experiment 1 and compensated in the same manner. None had participated in Experiment 1.

Materials, design, and procedure

Unlike in Experiment 1, the two NPs in the preceding sentences were always of the same gender, as in (3). A few other minor changes were also made to keep the sentences as natural as possible given the gender changes (see the Appendix). Fillers, design, and procedure were the same as in Experiment 1.

Scoring

The criteria for scoring were the same as in Experiment 1. Because the two NPs had the same gender, a pronoun referring back to either of the NPs was ambiguous. We thus asked two native speakers of British English who were blind to the purposes of the study to score all the continuations containing pronouns and decide whether the pronoun referred to NP1 or NP2 or was ambiguous. Any continuation for which the scorers did not agree about the referent of the pronoun was also excluded. This resulted in the removal of 18 continuations.

Results

Choice of referring expression

Table 5 reports the percentage of pronominal reference relative to repeated noun reference to NP1 and NP2 by preceding sentence. As before, participants almost never produced other referring expressions (long–short, N = 0; short–long, N = 0; short–short, N = 1), so these (as well as other responses) were excluded.

Table 5 The percentage of pronouns out of all pronouns and repeated nouns for NP1 and NP2 reference by preceding sentence in Experiment 2

We analysed the choice of referring expression (pronouns vs. repeated nouns) in the same way as in Experiment 1, by including antecedent position (NP1 vs. NP2) and preceding sentence (short–short vs. long–short and short–short vs. short–long) as the fixed factors. Table 6 provides a summary of all the coefficients for choice of referring expression analyses.

Table 6 Summary of the coefficients from the analyses on the choice of referring expressions in Experiment 2

First, we compared the short–short condition with (1) the long–short and (2) the short–long conditions in terms of pronoun use. The comparison between the short–short and the long–short conditions (which included by-subjects random slopes for antecedent position) revealed a significant main effect of antecedent position, with more pronouns used to refer to NP1 than to NP2. Also, there was a main effect of preceding sentence (short–short vs. long–short), with significantly more pronouns in the short–short condition than in the long–short condition. There was also a significant interaction between antecedent position and preceding sentence (short–short vs. long–short). For NP1 reference, there was no difference between the short–short and the long–short conditions, but for NP2 reference, there were significantly more pronouns in the short–short condition than in the long–short condition. The comparison between the short–short and the short–long condition, which included by-subjects random slopes for antecedent position and by-items random slopes for antecedent position, preceding sentence (short–short vs. short–long), as well as their interaction, also revealed a significant antecedent position effect, with more pronominal reference to NP1 than to NP2. In addition, there were significantly more pronouns in the short–short condition than in the short–long condition. There was also a significant interaction between antecedent position and preceding sentence (short–short vs. short–long). When the antecedent was NP1, there were more pronouns in the short–short than in the short–long condition, but when the antecedent was NP2, there was no significant difference between the short–short condition and the short–long condition.

Next, we examined the effect of antecedent length by comparing (3) the long–short and the short–long conditions. As before, there was a significant effect of antecedent position, with more pronouns following NP1 than NP2 antecedents, but there was no effect of preceding sentence. Importantly, there was a significant antecedent position × preceding sentence (long–short vs. short–long) interaction, which indicated that the effect of antecedent position was greater in the long–short than in the short–long condition. Simple effects further revealed that when the antecedent was NP1, there were significantly more pronouns in the long–short than in the short–long condition. In contrast, when the antecedent was NP2, there were significantly fewer pronouns in the long–short than in the short–long condition, indicating that participants used more pronouns for longer antecedents.

Choice of referent

Table 7 reports the frequencies of NP1, NP2, and other references. There were significantly more other responses in the short–short condition than in the long–short condition (p < .01), but there was no difference between the short–short condition and the short–long condition (p = .95). In addition, there were more other responses in the short–long condition than in the long–short condition (p < .05). Therefore, we analyzed the number of NP1 references and that of NP2 references, relative to all trials (including other responses).

Table 7 The percentages of NP1, NP2 and other references by preceding sentence in Experiment 2

The analysis was the same as in Experiment 1. Table 8 reports the results of analyses on choice of referent for Experiment 2.

Table 8 Summary of the coefficients from the analyses on the choice of referent in Experiment 2

We first compared the short–short condition with (1) the long–short and (2) the short–long conditions in terms of choice of referent. There was no significant difference between the short–short and the long–short conditions for NP1 reference or for NP2 reference. Similarly, the comparison between the short–short and the short–long condition revealed no significant difference in NP1 or NP2 reference. We then compared (3) the long–short and the short–long conditions, which included by-item random slopes for preceding sentence (long–short vs. short–long). The results showed no significant difference between the long–short and the short–long conditions in NP1 references, but there were more NP2 references in the long–short condition than in the short–long condition.

Discussion

We obtained even clearer results concerning the effect of antecedent length in this experiment, as compared with Experiment 1. Participants were more likely to use pronouns for NP1 in the long–short condition than for NP1 in the short–long condition, whereas they were less likely to use pronouns for NP2 in the long–short than for NP2 in the short–long condition, indicating that antecedent length increases pronoun use for both NP1 and NP2 antecedents. These results are in line with the semantic richness account, which assumes that the amount of information predicated of the antecedent increases its accessibility (Hofmeister, 2011; Yamashita & Chang, 2001). However, there were more pronouns for NP2 reference in the short–short than in the long–short condition and more pronouns for NP1 reference in the short–short than in the short–long condition. These results are unlikely to be due to a general tendency to use more pronouns when the preceding sentence is shorter, because there were no more pronouns in the short–short condition, as compared with the long–short condition for NP1, and similarly, the short–short and short–long conditions did not differ in NP2 reference. Instead, the effect may have occurred because participants were less likely to use pronouns when referring to an antecedent shorter than the referential alternative.

As in Experiment 1, there was a preference to refer to the shorter of the two potential antecedents in a sentence, although the effect was less pronounced: Although there were more NP2 references in the long–short condition than in the short–long condition, indicating a preference to refer to shorter antecedents, there was no difference in the number of NP1 references between the long–short and short–long conditions.

Experiment 3

In the first and second experiments, we found that antecedent length affects the use of pronouns. Experiment 3 examined whether the effect of antecedent length could also be observed in spoken language production. Some research suggests that task demands can modulate language processing (Ferreira, Ferraro, & Bailey, 2002; Ferreira, Foucart, & Engelhardt, 2013; Ferreira & Patson, 2007; Sanford & Sturt, 2002; Swets, Desmet, Clifton, & Ferreira 2008). Since the linguistic signal is transient in speech, the representations of the discourse entities might fade faster in memory, as compared with written language processing, and this difference could potentially influence the choice of referring expressions. Therefore, in Experiment 3, we presented the preceding sentences auditorily and asked participants to continue the discourse orally. The stimuli for this experiment were the ones used in Experiment 1.

Method

Participants

Twenty-four undergraduate students were drawn from the participant pool of the University of South Carolina. They were all native speakers of American English and participated in the study in exchange for course credit.

Materials and design

The materials and design were identical to those in Experiment 1, except that the sentences were recorded by a female native speaker of North American English at a slightly slower than normal speech rate.

Procedure

The recorded experimental lists were programmed in the Experiment Builder software such that each sentence was orally played to the participants once they pressed the “space” button on the keyboard. Immediately after the sentence was finished, a “speak” prompt appeared on the screen and indicated that they could start speaking their continuations into a microphone. They were required to press the “space” button again to stop recording themselves and one more time to move to the next sentence. All the participants were tested individually in a quiet testing room. As in the previous experiments, the participants were encouraged to respond quickly and “with the first thing that comes to mind,” but there was no time limit for starting to speak the continuation. In addition, they could take a break in the middle of the experiment if they needed to do so. The experiment took approximately 30 min to complete.

Scoring

The scoring of this experiment was identical to that in Experiment 1.

Results

Choice of referring expression

Table 9 reports the percentage of pronominal reference relative to repeated noun reference to NP1 and NP2 for each preceding sentence. As in Experiments 1 and 2, other referring expressions were extremely rare (long–short, N = 2; short–long, N = 2; short–short, N = 0), so we focused on the number of pronouns relative to repeated nouns in our analyses.

Table 9 Percentages of pronouns (out of all pronouns and repeated nouns) for NP1 and NP2 reference by preceding sentence in Experiment 3

Table 10 contains a summary of the coefficients for choice of referring expression analyses. As before, we included antecedent position (NP1 vs. NP2) and preceding sentence (short–short vs. long–short and short–short vs. short–long) as the fixed factors. The comparison between the short–short and the long–short conditions (1) revealed a significant effect of antecedent position, with more pronoun use for NP1 than for NP2, and also an effect of preceding sentence (short–short vs. long–short), with more pronoun use in the long–short condition than in the short–short condition, but there was no significant interaction between antecedent position and preceding sentence (short–short vs. long–short). The comparison between the short–short and the short–long conditions (2) also revealed a significant effect of antecedent position, with more pronominal reference to NP1 than to NP2, and a significant effect of preceding sentence (short–short vs. short–long), showing that there were significantly more pronouns in the short–long condition than in the short–short condition. Additionally, there was a significant interaction between antecedent position and preceding sentence (short–short vs. short–long). When the antecedent was NP1, there was no significant difference between the short–short and the short–long conditions, but when the antecedent was NP2, there were significantly more pronouns in the short–long condition than in the short–short condition. The comparison between the long–short and the short–long conditions (3) revealed a main effect of antecedent position, with more pronouns following NP1 than NP2 antecedents, but there was no main effect of preceding sentence (long–short vs. short–long). Most important and in line with the results of the previous experiments, we found a significant antecedent position × preceding sentence (long–short vs. short–long) interaction, indicating that that the effect of antecedent position was larger in the long–short than in the short–long condition. Simple effects revealed that when NP1 was the antecedent, there were more pronouns in the long–short than in the short–long condition. In contrast, when the antecedent was NP2, there were fewer pronouns in the long–short than in the short–long condition. Thus, there were more pronoun references to longer antecedents.

Table 10 Summary of the coefficients from the analyses on the choice of referring expressions in Experiment 3

Choice of referent

Table 11 reports choice of referent by preceding sentence. There was no significant difference in the number of other responses between the short–short condition and the long–short condition (p = .25) or between the short–short condition and the short–long condition (p = .35). However, there were significantly more other responses in the short–long condition than in the long–short condition (p < .05). We thus analyzed the numbers of NP1 and NP2 references relative to all trials (NP1, NP2, and other references). We also analyzed the choice of referent, as in Experiments 1 and 2. Table 12 reports the results of these analyses. The comparisons between the short–short and the long–short condition (1) revealed that there were significantly fewer NP1 references in the short–short condition than in the long–short condition, but there was no significant difference in NP2 reference. The comparisons between the short–short and the short–long condition (2) found fewer NP1 references in the short–short condition than in the short–long condition, whereas there were significantly more NP2 references in the short–short condition than in the short–long condition. There were significantly more NP2 references in the long–short condition than in the short–long condition (3), but no difference between these two conditions for NP1 reference.

Table 11 Percentages of NP1, NP2 and other references by preceding sentence in Experiment 3
Table 12 Summary of the coefficients from the analyses on the choice of referent in Experiment 3

Discussion

Experiment 3 replicated the effect of length in speech; participants were more likely to use pronouns to refer to longer than to refer to shorter antecedents. Specifically, similar to Experiment 2, participants were more likely to use a pronoun to refer to NP1 in the long–short condition than in the short–long condition and more likely to use a pronoun to refer to NP2 in the short–long condition than in the long–short condition. The advantage of longer antecedents was also found between the short–short and the short–long conditions; there were significantly more pronouns for NP2 in the short–long condition than in the short–short condition. This effect was unlikely to be due to the length of the preceding sentence, because there were no significant differences in NP1 reference between these two conditions. Instead, the effect is most straightforwardly explained by the semantic richness account: Longer antecedents were more salient, and participants were more likely to use pronouns to refer to them.

Antecedent length also affected choice of referent. Participants were more likely to refer to NP2 in the long–short than in the short–long condition, although the two conditions did not differ in NP1 reference. In addition, more NP1 references and fewer NP2 references in the short–long condition than in the short–short condition indicate that participants tended to refer to the relatively shorter antecedents, consistent with the findings from Experiments 1 and 2. The only result that goes against this interpretation is the comparison between the short–short and long–short conditions for NP1 reference, where we found more NP1 references in the long–short condition. However, neither Experiments 1 nor 2 revealed such an effect, which suggests the effect is rather weak.

General discussion

Across experiments, there was a clear and strong effect of antecedent position, with NP1 (the sentence subject) being considerably more likely to be realized with a pronoun than NP2 (the sentence object). This effect has repeatedly been shown in previous studies and is attributed to the greater prominence associated with the first-mentioned entity and the syntactic subject of a sentence (Arnold, 2001; Brennan, 1995; Fletcher, 1984; Fukumura & Van Gompel, 2010, 2011; Stevenson et al, 1994). More important, all experiments consistently showed an interaction between antecedent length and antecedent position. Specifically, the advantage of NP1 over NP2 antecedents for pronominal reference increased when NP1 was longer than NP2, whereas it decreased when NP2 was longer than NP1, indicating that longer antecedents received more pronoun referring expressions. These results are in line with theories of language processing that maintain that length increases the accessibility of the associated NP (e.g., Almor, 1999, 2004; Hofmeister, 2011; Yamashita & Chang, 2001) and run counter to the functionalist view of accessibility which assumes that a longer NP might be perceived as less “given” by comprehenders and, therefore, should be rendered less accessible (Ariel, 1990, 1996; Givón, 1988, 1989; Gundel et al., 1993). Although Ariel’s corpus analyses found that NPs that are modified by relative clauses tend to be produced when the referent is less accessible in the prior discourse context, we found that long antecedents attenuate anaphoric forms, suggesting that length increases accessibility for subsequent reference. Therefore, longer antecedents do not seem to signal low referent accessibility, contrary to what Ariel has proposed.

But the question then is: Why does length enhance accessibility? One possibility, as Hofmeister (2011) pointed out, is that the extra information predicated of longer NPs causes them to be encoded more firmly in memory and, therefore, allows them to be retrieved faster, by providing additional retrieval cues when language users reaccess the referent. This possibility is consistent with theories proposing that processing of an NP whose representation depends on some other modifying information requires reactivating that NP so that the new information can be efficiently incorporated into the discourse representation (Hofmeister, 2011; Lewis & Vasishth, 2005). In our study, the head noun of the long antecedent has likely been repeatedly reactivated in the process of incorporating the words included in the attached relative clause, facilitating subsequent retrieval (i.e., higher accessibility).

Another possibility is that since length necessarily adds more information to the associated antecedent, it renders it more predicable (Keil, 1979). Simply put, predicability refers to the number of possible conceptual pathways for retrieving a certain concept from memory (Bock, 1982; Bock & Warren, 1985). For example, an animate entity such as an actor can move, stop, fall, think, sleep, and eat. However, an inanimate entity such as a car can move, stop, and fall but cannot think, sleep, or eat. Consequently, “actor” is connected to more concepts than “car,” making it possible to predicate more ideas of “actor” and, therefore, making it more conceptually accessible. It could be the case that length acts in a similar fashion. That is, the extra information attached to the long entity necessarily connects it to more concepts, rendering it more accessible.

Regardless of the precise mechanism underlying the effect of length, an important question is why more accessible antecedents are realized with less marked referring expressions. One possible explanation of this inverse relationship between accessibility and explicitness of referring expressions is that subsequent retrieval of an accessible referent is easier than that of a less accessible referent. Fukumura et al. (2011; Fukumura & Van Gompel, 2010) thus suggested that when referring to less accessible referents, speakers need to reactivate more information about the referent, which affects the activation of concepts associated with more explicit referring expressions. In contrast, when referring to more accessible antecedents, less information needs to be activated to retrieve the referent, which leads to the production of less explicit referring expressions.

Another potential explanation is that the language processing system might have a bias toward keeping a constant ratio between information to be relayed and amount of linguistic signal, a view proposed by Jaeger (2010) and dubbed the “uniform information density” (UID) hypothesis. Originally, it aimed at explaining optional that realization in English (e.g., Ferreira & Dell, 2000). Take the following sentences, for example:

  1. (4)

    My boss confirmed/thought (that) I was absolutely crazy.

According to UID, that is more likely to be included in (4) when the main clause verb is confirmed rather than thought, because the verb confirm is statistically less likely to take a complement clause. (This pattern occurs because confirm can take both complement clauses and direct objects, whereas think almost always takes only complement clauses; Garnsey, Pearlmutter, Myers, & Lotocky, 1997) Therefore, speakers are more likely to produce that after confirmed, as compared with thought as a way of preventing a sudden disturbance in the “information-to-linguistic signal” balance, thus achieving a more uniform distribution of information over words. We might therefore wonder whether participants produced more pronouns following longer antecedents in our study because longer antecedents conveyed more information about the referent, and therefore, the use of a pronoun, which encoded less information than repeated nouns, was more efficient from an information-theoretic point of view.

However, UID is primarily based on statistical predictability: Jaeger (2010) argued that “speakers should be more likely to produce pronouns (e.g. she) instead of full NPs (e.g. the girl) when reference to the expression’s referent is probable in that context” (p.48). Whether speakers are more likely to choose pronouns over repeated NPs when the referent is more likely to be referred to is controversial, however (Arnold, 2001; Fukumura & Van Gompel, 2010; Stevenson et al., 1994). In the present study, whereas pronouns were more frequently used to refer to longer antecedents, participants were more likely to refer to shorter than to longer antecedents in their continuations. Thus, UID is not likely to be the underlying force behind the effects of length. How can this tendency to refer to relatively shorter antecedents be explained? Can this tendency be linked to the results for the choice of referring expression? Although we can only speculate on what factors might be behind the findings related to choice of referent, one possibility is that the more information is predicated of an antecedent, the more specific it becomes and, as a result, the harder it becomes to add more information. This is because the new information that is added to a long antecedent should semantically fit the information already predicated of it. However, because short antecedents (in our case, bare nouns such as “the actor”) have no extra information, the participants had more freedom in describing them (because almost any new information would fit). Thus, talking about a longer antecedent is perhaps more difficult than talking about a shorter antecedent. Note that, under this view, both choice of referent and choice of referring expression are guided by the general tendency to minimize processing effort (Jaeger, 2010; Zipf, 1949).

Another possibility is that speakers prefer to refer to the shorter NP in order to balance the amount of information associated with the discourse entities. That is, speakers may not decide to refer to a particular antecedent merely because it is accessible. Instead, they choose what to refer to depending on whether it is communicatively informative or semantically sensible. In our experiments, participants tended to refer more often to the NPs that did not include a relative clause, possibly because there was little description of those characters, as compared with the characters described with a relative clause. In other words, participants may have produced reference in order to fill the gap in their knowledge (cf. Simner & Pickering, 2005). But because of the relatively limited information about those characters, participants used fewer pronouns to refer to them through the mechanisms discussed earlier. In support of this idea, Brennan (1995) found that participants typically first make a character salient in the discourse by referring to it in subject position and with a fuller form of reference before referring to that same character with a pronoun.

An alternative explanation of our results for choice of referent and choice of referring expression comes from the connectionist account proposed by Chang (2002, 2009). Under this account, the notion of accessibility is not restricted to conceptual factors such as animacy (Bock, 1982; Bock & Warren, 1985) and can be extended to include lexical factors. Therefore, the choice between different forms of referring expressions may be driven by the conceptual accessibility of the antecedent, as well as the lexical accessibility of the anaphoric expression itself. Assuming that lexical accessibility is affected by competition among different alternatives (Chang, 2002, 2009), there may have been fewer linguistic forms that can be used to refer to shorter antecedents (“he,” “the actor”) than longer ones (“he,” “the actor,” “the frustrated actor,” “the upset actor”, “the frustrated and upset actor,” etc.). If so, there would be less competition between referring forms for short antecedents than for long antecedents. This may have made it easier to refer to shorter antecedents, as compared with longer ones, and may have led to more pronouns for longer antecedents: Under stronger competition, pronouns may be more likely to win over other alternative forms of reference, due to their ease of production.

However, if the lexical competition account (Chang, 2002, 2009) is correct, we would expect to have found more variability in the form of referring expressions for longer antecedents. For example, participants would have used more modified NPs (e.g., “the frustrated actor,” “the frustrated and upset actress”) to refer to the longer antecedents. Examination of our data revealed that such modified referring expressions were almost nonexistent. Thus, the lexical competition account is not fully supported by our data. In fact, Fukumura et al. (2013) recently showed that factors that affect pronoun use are different from those that affect lexical competition. Although pronoun use can be affected by competing representations of different discourse entities (Arnold & Griffin, 2007; Fukumura et al., 2013; Fukumura & Van Gompel, 2011; Fukumura et al., 2011), the competition is assumed to occur at a nonlinguistic level (between the nonlexicalized representations), so the idea that lexical competition drives pronoun use goes against current theories of pronoun production.

In conclusion, our study demonstrates that antecedent length has a major effect on form of reference: Language users were more likely to use a pronoun when the antecedent was elaborated on with a modifier than when it was a simple NP. This finding is consistent with the proposal that the additional information increases the referent’s prominence and, thus, supports the view that semantic enrichment enhances accessibility.