Assessing priming for prosodic representations: Speaking rate, intonational phrase boundaries, and pitch accenting

Tooley, Kristen M.; Konopka, Agnieszka E.; Watson, Duane G.

doi:10.3758/s13421-018-0789-5

Published: 18 January 2018

Volume 46, pages 625–641 (2018)

Memory & Cognition

Kristen M. Tooley¹,
Agnieszka E. Konopka² &
Duane G. Watson³

Abstract

Recent work in the literature on prosody presents a puzzle: Some aspects of prosody can be primed in production (e.g., speech rate), but others cannot (e.g., intonational phrase boundaries, or IPBs). In three experiments we aimed to replicate these effects and identify the source of this dissociation. In Experiment 1 we investigated how speaking rate and the presence of an intonational boundary in a prime sentence presented auditorily affect the production of these aspects of prosody in a target sentence presented visually. Analyses of the targets revealed that participants’ speaking rates, but not their production of boundaries, were affected by the priming manipulation. Experiment 2 verified whether speakers are more sensitive to IPBs when the boundaries provide disambiguating information, and in this different context replicated Experiment 1 in showing no IPB priming. Experiment 3 tested whether speakers are sensitive to another aspect of prosody—pitch accenting—in a similar paradigm. Again, we found no evidence that this manipulation affected pitch accenting in target sentences. These findings are consistent with earlier research and suggest that aspects of prosody that are paralinguistic (like speaking rate) may be more amenable to priming than are linguistic aspects of prosody (such as phrase boundaries and pitch accenting).

Priming and models of language processing

As in most domains of cognitive psychology, understanding the processes that underlie language behavior is an empirical challenge because these processes cannot be observed directly. Instead, the field relies on analyses of performance in language-oriented tasks to shed light on these processes and then to formulate models of language representation and use. A classic example is the use of priming tasks, which have provided extensive insight into the ways that language knowledge is stored, activated, and used during comprehension and production (Pickering & Ferreira, 2008). Our previous work (Tooley, Konopka, & Watson, 2014) suggested that, unlike every other level of linguistic representation tested so far, one aspect of prosody—intonational phrase boundaries (IPBs)—is not amenable to priming. A further puzzle is that previous work has suggested that another aspect of prosody—speech rate—is primeable (Jungers & Hupp, 2009; Jungers, Palmer, & Speer, 2002). In this article, we first aim to replicate this prosodic priming asymmetry in one experiment; then we investigate priming for IPBs and pitch accenting in two further experiments, to assess the similarity of the underlying representations of these aspects of prosody. Below we first discuss the role of priming in language research and the types of inferences made from priming effects. Then we present three studies investigating the priming of intonational boundaries, speech rate, and pitch accenting.

Priming as a method

The term priming usually refers to a facilitation of a construction/retrieval mechanism that the language user deploys due to recent experience with similar representations. For example, in lexical priming studies, participants are normally faster in lexical decision tasks after exposure to related words (e.g., Meyer & Schvaneveldt, 1971). These findings helped to motivate models of semantic memory (Collins & Loftus, 1975; Collins & Quillian, 1970; see also Gabora, Rosch, & Aerts, 2008; Rosch, 1975) and lexical ambiguity resolution (e.g., Swinney, 1979), and they paved the way for more specific models of lexical access during comprehension (McClelland & Elman, 1986; Vigliocco, Vinson, Damian, & Levelt, 2002).

More recently, priming tasks have also been used to investigate complex representations like syntactic structure. Studies of syntactic priming show that speakers reuse recently processed structures when producing novel sentences, such as when describing pictured events or completing sentence preambles (Bock, 1986; see Pickering & Ferreira, 2008, for reviews). Priming also occurs in dialogue, including alignment of terminology (Brennan & Clark, 1996; Schober & Clark, 1989) and syntactic structures (Branigan, Pickering, & Cleland, 2000) between conversation partners, as well as higher-level representations, like situation models (Schober, 1993). Such findings helped motivate the interactive alignment model (Garrod & Pickering, 2004), which posits a language mechanism that causes language users to adapt their linguistic representations to those of their conversation partners in order to facilitate communication.

Thus, priming tasks provide a fruitful avenue to study the mental representations and processing of language: The extent to which priming is observed for a specific aspect of language has implications for whether, and how, that information is represented during processing. Recently, priming tasks have also been used to investigate how different aspects of prosody are represented and planned during production (Jungers & Hupp, 2009; Tooley et al., 2014). This line of research provides novel and important evidence about the nature of prosodic representations and helps to establish where prosody fits in a process model of language production.

Priming for prosodic representations

Prosody refers to acoustic aspects of spoken language, such as rhythm, pitch, intonation, and speech rate, that are specific not to individual vowels or consonants but to larger units, such as the words or phrases of an utterance. Intonational phrasing refers to perceptual groupings of words within an utterance. Intonational phrases are separated by intonational phrase boundaries (IPBs), which are perceived as pauses and can be recognized by a pause in sound energy and/or lengthening of the preboundary word and tonal movement at the end of the phrase (Pierrehumbert & Hirschberg, 1990; see Wagner & Watson, 2010, for a review). How this aspect of prosody is represented and planned during production is still unclear. However, if IPBs are a structuring property of spoken language (grouping words together in time) in the same way that syntax is a structuring property of language (grouping words into grammatical phrases), they may be expected to prime in the same way that syntactic structures do.

Yet, our earlier work in this area found no evidence of priming for IPBs (Tooley et al., 2014). We manipulated the presence of a boundary at two syntactic locations in prime sentences that were presented to participants auditorily. Participants repeated the prime sentences, and then silently read a visually presented target sentence and repeated it aloud from memory. In three experiments, participants produced pauses at the primed locations in the prime sentences they repeated, but these effects did not carry over to the target sentences. This was the case whether participants repeated back the prime sentence (Exp. 2) or not (Exp. 3) before receiving the target sentence. These findings are markedly different from priming effects observed at other levels of linguistic representation, including syntax, where experience with a prime sentence reliably affects syntactic choice in a target sentence (e.g., Bock, 1986; Pickering & Branigan, 1998). To our knowledge, intonational boundary production may be the only aspect of linguistic representation reported thus far that is not amenable to priming.

One possible explanation for the lack of IPB priming is that speakers do not create a separate, abstract plan for when and where to produce boundaries in a sentence. Thus, there may be no representation to prime. Instead, boundaries may be triggered by cues from the syntactic and semantic processing stages (see Tooley et al., 2014, for a production model that incorporates this account of IPBs).

Although these findings offer an explanation for how one aspect of prosody may be represented during planning, they also present a puzzle. There have been a number of reports of robust priming effects for different aspects of prosody. For example, interlocutors’ F0 and intensity become more alike over the course of a conversation (de Looze, Scherer, Vaughan, & Campbell, 2014; Levitan & Hirschberg, 2011; Ward & Litman, 2007). This entrainment is linked to real-world behavior. For example, the amount of prosodic convergence that occurs in a conversation can be used to predict positive and negative affect in couples undergoing marriage counseling (Lee et al., 2010). Additional work suggests that prosodic entrainment can interact with the content of the conversation: Couples who entrain while discussing a conflict are less likely to resolve that conflict (Weidman, Breen, & Haydon, 2016). While this line of work reveals a strategic or communicative dimension to prosodic entrainment, it also points to the possibility of priming occurring for the underlying prosodic representation.

There is also clear evidence of prosodic priming for speech rate in nonconversational tasks that closely resemble traditional priming paradigms (Jungers & Hupp, 2009; Jungers et al., 2002). Jungers and Hupp auditorily presented participants with three prime sentences, spoken at a fast or slow rate, while they viewed a clipart image depicting the meaning of each sentence. Participants were then asked to describe a new target image. The rate of their productions for target sentences depended on the rate of the prime sentences they heard (target speech rate was faster after fast primes and slower after slow primes). The fact that encountered speech rate does prime speaking rate of later utterances suggests that speech rate may be planned separately from other representations.

If different aspects of prosody share a common type of underlying representation and are planned at a similar stage of processing, then in principle they should be equally primeable. Yet the evidence presented above suggests that this is not the case: Unlike with speech rate, we have no evidence that IPBs can be primed. We extend this work in the present study by testing whether the same type of linguistic representations underlie three different aspects of prosody: speech rate, intonational boundaries, and pitch accents. We use a priming paradigm again to test whether production of these aspects of prosody persists from one sentence to another.

The goal of Experiment 1 was to verify that the priming asymmetry between speech rate and IPBs exists when both are investigated within the same experiment. Assessing priming for speech rate and IPBs in one experiment is critical for eliminating the possibility that differences in participants and methodology across studies produced the observed differences in priming. Next, Experiment 2 tested whether priming for intonational phrase boundaries may occur only when these boundaries have communicative value. Finally, Experiment 3 tested the validity of our conclusions for IPBs by examining priming of an aspect of prosody that has not been investigated previously: pitch accenting. If the lack of priming for IPBs is due to processing constraints or representational constraints on this specific aspect of prosody, one might expect other aspects of prosody (i.e., pitch accenting) to show priming. However, if both IPBs and pitch accenting are found to be immune to priming manipulations, this would support the claim that not all aspects of prosody are subsumed under the same processing stage.

Experiment 1

Experiment 1 used a prime–target paradigm to test whether boundary placement and speech rate of prime sentences can influence the production of new target sentences. Participants listened to and immediately repeated back the prime sentences they heard. These sentences either had no intonational phrase boundaries or had a boundary spliced in at a syntactically preferred location. The sentences were then resynthesized to be either 10% faster or 10% slower than the originally recorded speaking rate (i.e., the naturally produced rate of the speaker). Primes were followed by target trials, in which speakers silently read a novel sentence and then repeated it aloud from memory. Durational and perceptual measures were used to determine whether participants persisted in producing the speech rate and IPBs in the target sentences that they had heard in the primes.

Method

Participants

In all, 64 students from the University of Illinois participated for course credit. Participants in all three experiments were native speakers of English with normal hearing and normal (or corrected-to-normal) vision.

Materials

We used the experimental sentences from Tooley et al. (2014). The experimental set consisted of 40 items: 20 sentences with relative clauses (e.g., The dolphin that tossed the ball wanted a reward for his trick) and 20 sentences with main clauses (e.g., The girl bought new clothes at the mall today; Appx. 1). Two sentences with the same syntactic structure were yoked together to create 20 prime–target pairs (ten main-clause and ten relative-clause pairs).

To create the boundary manipulation, two versions of each sentence were initially recorded by a native English speaker: one with and one without an IPB at the critical location. The critical boundary location followed the second noun (e.g., The dolphin that tossed the ball // wanted a reward for his trick), as the clause boundary and boundary between the noun and verb phrases make this a natural location for a boundary (e.g., Truckenbrodt, 1999; Watson & Gibson, 2004). For the purpose of our analyses, the critical boundary region includes the word immediately preceding the boundary, the boundary itself, and the word after the boundary. All experimental sentences were created by splicing critical regions from recordings of each condition into a neutral carrier sentence that had no prosodic boundaries. This ensured that the prosody of regions that were outside of the critical region did not influence perception of the critical region. This splicing procedure was used to create sentences in the control condition (with no boundaries) as well as the experimental condition (with a boundary at the critical region). On average, sentences with a boundary were approximately 400 ms longer than those with no boundaries. The sentences were then subjected to a rate manipulation using a rate-resynthesizing script in PRAAT that created two sentence versions that were 10% slower and 10% faster than the original sentences, respectively. (The stimuli are available at the following link: https://dataverse.tdl.org/dataset.xhtml?persistentId=doi:10.18738/T8/LHQZDQ).

The boundary manipulation crossed with the rate manipulation yielded four conditions: fast sentences without boundaries, fast sentences with boundaries, slow sentences without boundaries, and slow sentences with boundaries. Both factors were counterbalanced within-participants and within-items, so each participant saw each sentence in only one of these conditions. Within lists, each participant received five items in each of the four conditions. Additionally, each sentence could appear both as a prime and as a target. Thus, we created eight lists of stimuli to counterbalance the four conditions as well as the prime/target status of each sentence (referred to as sentence position below) on that list. The experimental sentences were arranged such that no more than two items from the same condition followed one another. Targets always immediately followed primes, and three filler sentences intervened between all prime–target pairs.
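The counterbalancing scheme described above can be illustrated with a minimal sketch. It assumes a simple Latin-square rotation of the four prime conditions across the 20 prime–target pairs, crossed with which member of each pair serves as the prime. The rotation rule and the names (build_lists, "first", "second") are hypothetical; the text does not state how items were actually assigned to lists, and the ordering constraints and filler placement are not modeled here.

```python
from collections import Counter
from itertools import product

RATES = ("fast", "slow")
BOUNDARIES = ("absent", "present")
CONDITIONS = list(product(RATES, BOUNDARIES))  # the four prime conditions
N_PAIRS = 20  # prime-target pairs in the experimental set


def build_lists(n_pairs=N_PAIRS):
    """Return eight lists; each maps a pair index to (rate, boundary, prime_member).

    Hypothetical scheme: conditions are rotated across pairs (Latin square),
    and the rotation is repeated with the other pair member as the prime.
    """
    lists = []
    for prime_member in ("first", "second"):  # sentence position
        for shift in range(len(CONDITIONS)):
            assignment = {}
            for pair in range(n_pairs):
                rate, boundary = CONDITIONS[(pair + shift) % len(CONDITIONS)]
                assignment[pair] = (rate, boundary, prime_member)
            lists.append(assignment)
    return lists


lists = build_lists()
# Number of pairs per prime condition within the first list.
per_condition = Counter((rate, boundary) for rate, boundary, _ in lists[0].values())
```

Under this rotation, every list contains five pairs per condition, and across the eight lists each pair occurs once in every combination of condition and sentence position.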

Filler sentences included a variety of syntactic structures (e.g., cleft constructions, sentences with fronted prepositional phrases, sentences with that-complements, and sentences with fronted temporal phrases). To reduce the salience of the manipulations in the primes, the fillers also varied with respect to IPBs and speaking rates. Roughly, one half of the fillers had one boundary, one quarter had two boundaries, and one quarter had no boundaries. Boundaries were produced naturally by the speaker and did not include any splicing. Half of the filler sentences were presented at the original recording rate, one quarter were resynthesized to be 10% faster, and one quarter were resynthesized to be 10% slower.

Procedure

The procedure used was the same as in Tooley et al.’s (2014) second experiment. Participants were told that they would either hear recorded sentences or read sentences printed on the screen. After either hearing a sentence or silently reading a sentence, their task was to repeat the sentence back out loud. If the sentence was presented auditorily, the word LISTEN appeared and remained on the screen while the recording played. At sentence offset, the word REPEAT appeared on the screen to prompt participants to repeat the sentence. Participants then pressed the spacebar to advance to the next sentence. If the sentence was presented visually (i.e., if it was printed on the screen), participants first saw the word READ for 1 s, followed by the sentence. The sentence remained on the screen for an amount of time equal to 50 ms multiplied by the number of words in the sentence. After that amount of time had elapsed, the word REPEAT appeared on the screen, prompting participants to repeat the sentence aloud from memory. Participants then pressed the spacebar to advance to the next trial.

The prime sentences were always listen-and-repeat trials (as these recordings contained the manipulations), and the target sentences were always read-and-repeat trials (so they were prosodically neutral). Roughly half of the filler sentences were randomly assigned to be presented as listen-and-repeat trials, and half as read-and-repeat trials. The modality of fillers remained constant across all eight lists, and varied throughout the experiment to reduce the predictability of the trial type. The experiment started with a practice block of four listen-and-repeat and four read-and-repeat sentences, presented in a pseudorandom order.

Scoring and analyses

We excluded responses in which participants changed the syntactic structure of the sentence, paused for extended periods of time (average pause time of 1.36 s for excluded trials), produced disfluencies at or near the critical sentence region, or produced sentence fragments. Minor wording changes were acceptable. These exclusion criteria left 1,131 trials (out of 1,280 total trials) for analysis. Participants’ boundary productions were assessed in two ways: One coder (the first author) rated whether or not a boundary was discernible in the critical region, and a second coder (the second author) measured the duration from the onset of the preboundary word through the onset of the first postboundary word. Total speaking time of each sentence was also measured. Coders were blind to condition in all experiments.
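As an illustration of the word-and-pause measure, the following sketch computes the interval from the onset of the preboundary word to the onset of the first postboundary word. It assumes word-level time alignments stored as (label, onset, offset) tuples in seconds; this format, the function name, and the timings are invented for illustration and are not the coders' actual procedure or data from the study.

```python
from typing import List, Tuple

# Hypothetical annotation format: (word label, onset in s, offset in s).
Word = Tuple[str, float, float]


def word_and_pause_duration(words: List[Word], preboundary_index: int) -> float:
    """Time from the onset of the preboundary word to the onset of the next word.

    The interval covers the preboundary word plus any following silence.
    """
    _, pre_onset, _ = words[preboundary_index]
    _, post_onset, _ = words[preboundary_index + 1]
    return post_onset - pre_onset


# Invented timings for the critical region of one utterance (illustration only).
critical_region = [
    ("ball", 1.10, 1.45),    # preboundary word
    ("wanted", 1.80, 2.15),  # first postboundary word
]
duration = word_and_pause_duration(critical_region, 0)
```

Because the measure runs onset to onset, it grows both with preboundary lengthening and with the length of any silent pause, which are the two durational cues to a boundary mentioned earlier.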

Analyses were carried out in R (R Development Core Team, 2008) using logit mixed models for the measure of perceived intonational boundaries, and linear mixed-effects models for the analyses of word-and-pause durations and total sentence speaking durations (Baayen, Davidson, & Bates, 2008; Jaeger, 2008). Prime boundary (present vs. absent), speech rate (fast vs. slow), and sentence position (prime vs. target) were included as mean-centered fixed effects (along with all interactions), and all models estimated random effects for participants and items. In all experiments, the maximal version of the models (warranted by the design) was used unless this resulted in nonconvergence. In those cases, random effects were removed on the basis of the size of their variance components (smaller effects were removed first) until the model reached convergence. All effects were considered significant at α = .05.
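To make the mean-centering step concrete, here is a minimal sketch in Python (the analyses themselves were run in R). It assumes each two-level factor is first coded 0/1 and then centered on its mean over the analyzed trials; the 0/1 coding and the toy values are assumptions, not the study's data.

```python
def mean_center(values):
    """Subtract the sample mean so that the predictor averages to zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Toy trial-level codes (illustration only): 0 = fast, 1 = slow; 0 = prime, 1 = target.
rate = [0, 0, 1, 1, 1]
position = [0, 1, 0, 1, 1]

rate_c = mean_center(rate)
position_c = mean_center(position)

# An interaction term is the product of the centered predictors.
rate_by_position = [r * p for r, p in zip(rate_c, position_c)]
```

With a perfectly balanced design the centered codes are -0.5 and +0.5. After trial exclusions the cells are no longer balanced, so the centered values shift slightly, which keeps each main effect interpretable as an effect at the average level of the other predictors.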

Results and discussion

Sentence speaking duration

The listen-and-repeat (prime) sentences were spoken faster when the original recording was the fast sentence version, and slower when the original recording was the slow sentence version (Fig. 1, left panel). This effect carried over into the read-and-repeat (target) sentences (Fig. 1, right panel).

The analysis of overall speaking times showed significant main effects of speech rate and boundary, as well as an interaction between speech rate and sentence position (Table 1). Participants spoke faster and slower after hearing fast and slow prime sentences, respectively, but this effect was smaller in the targets than in the primes. Participants also spoke more slowly when they heard a prime sentence with a boundary. Follow-up analysis of these effects in target sentences alone revealed a significant effect of prime speaking rate, suggesting that the speaking rate of the prime sentences did influence participants’ speaking rates in the targets.

Table 1 Analyses of total sentence speaking duration for all sentences (primes and targets)

Full size table

Production of intonational phrase boundaries

The repeated prime sentences were longer and contained a perceived boundary more often when the recorded prime sentence also had a boundary (Fig. 2, left panel). This effect, however, did not carry over into the read-and-repeat (target) sentences (Fig. 2, right panel).

The overall analysis of perceived pauses revealed a main effect of boundary, a marginal effect of speaking rate, and an interaction between boundary and sentence position (Table 2). Participants were slightly more likely to produce a boundary after hearing a slow-rate prime. They also produced boundaries at the critical region more often when they were primed to do so, but this effect depended on sentence position: Participants reproduced the heard boundaries in the prime sentences but did not generalize these boundaries to the target sentences. A follow-up analysis restricted to the target sentences confirmed that the effect of speaking rate was significant (p = .037) but the effect of boundary was not (p = .10). In other words, hearing a slower prime increased the chances that participants would produce a boundary in the target, but hearing a boundary in the prime sentence did not.

Table 2 Analyses of perceived boundary production and word-and-pause durations at the critical sentence region of all sentences (primes and targets)

Full size table

A similar pattern was observed with word-and-pause durations: Word-and-pause durations were longer in sentences produced after hearing slow primes than in sentences after fast primes, and after hearing primes with boundaries than after primes without boundaries (Fig. 3). This resulted in main effects of boundary and speaking rate but no interaction (Table 2). Importantly, we observed an interaction between boundary and sentence position, in which the effect of boundary on word-and-pause durations was limited to the repeated prime sentences. There was also a weak interaction between speaking rate and sentence position, in which the effect of speaking rate was again limited to the repeated prime sentences.

Thus, Experiment 1 replicated previous studies (Jungers & Hupp, 2009; Jungers et al., 2002; Tooley et al., 2014): Participants spoke faster after fast primes and more slowly after slow primes, and this difference in rate persisted into the target sentences. However, hearing a boundary at the critical location in the prime did not lead participants to produce a boundary at the corresponding location in the target. This supports the observation that speaking rate is much more amenable to priming than is the production of IPBs. Thus, different types of underlying representations and/or processes may be involved in the production of IPBs and speaking rate.

Interestingly, when participants heard a slow-rate prime, their production of the target sentence was more likely to contain a boundary. Likewise, the presence of a boundary in the prime sentence resulted in participants taking longer to produce the target sentence. Participants may have perceived an overall speaking rate that was slower in a prime sentence with a boundary, leading to an overall reduction in their speaking rate. This is consistent with earlier work (Lass, 1970) suggesting that the perception of speech rate is influenced by the presence of a boundary. Thus, our results are consistent with earlier work showing a relationship between speaking rate and boundary production (e.g., Gee & Grosjean, 1983). Though not the primary focus of this study, this interplay between boundary production and speaking rate can provide novel insight into the relationship between the perception and production of prosody.

Naturally, there are limitations to these conclusions. The absence of a priming effect for boundaries does not necessarily mean that no effect was present, since the null finding could reflect an inability to detect such effects in the present paradigm. However, we have consistently found that participants reproduce the boundaries they hear when repeating prime sentences (Tooley et al., 2014). This implies that our manipulation is not too weak to influence production and that participants do in fact retain some prosodic information from the prime sentences. Our paradigm was also successful in showing variation in participants’ boundary production in the targets, but importantly, this variation was not driven by the boundary-priming manipulation.

One plausible alternative explanation for the lack of IPB priming concerns the optionality and information value of the boundaries in the prime sentences. In Experiment 1, as well as in previous studies, the boundaries produced in the primes were not strictly necessary and did not add syntactic or semantic information that might influence comprehension. Thus, the boundaries in the primes may have been ineffective because they did not contribute to participants’ interpretation of the sentences. It is therefore plausible that priming might be observed in sentences with more “meaningful” boundaries. We tested this possibility in Experiment 2.

Experiment 2

Previous studies had used sentence structures in which IPBs were optional and did not add meaningful syntactic or semantic information to the sentence, which may have decreased the saliency of the boundaries. Thus, in Experiment 2 we used the same prime–target paradigm and the same measures of boundary production as in Experiment 1, but with new, ambiguous sentences in which boundaries supported disambiguation. We manipulated the presence of a boundary in the prime sentences (with no manipulation of speech rate). The target sentences were always ambiguous (e.g., She put the money in the basket on the table), so their structural interpretations could be influenced by the presence of a boundary in the critical location (i.e., between the phrases in the basket and on the table in the present example). If priming for IPBs is dependent on the saliency or meaningfulness of those boundaries to the listener, then participants should be more likely to produce a boundary at the critical location in target sentences when they have heard a boundary in that location in the primes.