Rhythmic and speech rate effects in the perception of durational cues

A Correction to this article was published on 29 July 2021

This article has been updated

Abstract

Listeners’ perception of temporal contrasts in spoken language is highly sensitive to contextual information, such as variation in speech rate. The present study tests how rate-dependent perception is also mediated by distal (i.e., temporally removed) rhythmic patterns. In four experiments, the role of rhythmic alternations and their interaction with speech rate effects are tested. Experiment 1 shows that proximal speech rate (contrast) effects obtain based on changes in local context. Experiment 2 shows that these effects disappear with the addition of distal rhythmic alternations, indicating that rhythmic grouping shifts listeners’ perception, even when proximal context conflicts. Experiments 3 and 4 explore how orthogonal variation in overall speech rate impacts these effects and find that trial-to-trial (i.e., global) speech rate variation eliminates rhythmic grouping effects, both with and without variation in proximal (immediately preceding) context. Together, these results suggest a role for rhythmic patterning in listeners’ processing of durational cues in speech, which interacts in various ways with proximal, distal, and global rate contexts.

Notes

  1. Cues that have been shown to be rate-dependent include voice onset time (Miller & Volaitis, 1989; Toscano & McMurray, 2015), formant transition duration as a manner cue (Wade & Holt, 2005; Miller & Liberman, 1979), vowel duration as a cue to coda obstruent voicing (Heffner, Newman, & Idsardi, 2017; Steffman, 2019), and vowel duration in a language with contrastive vowel length (Bosker, 2017; Reinisch & Sjerps, 2013). Rate-dependent perception also extends to syllable identification and word segmentation, discussed below (Bosker, Sjerps, & Reinisch, 2020; Dilley & Pitt, 2010; Reinisch, Jesse, & McQueen, 2011).

  2. Proximal, distal, and global contexts all shape listeners’ perception of durational cues in speech, though recent work suggests the mechanisms responsible for these effects may not be the same (Bosker, 2017; Bosker & Ghitza, 2018; Maslowski, Meyer, & Bosker, 2020). For example, a durational contrast account (Diehl & Walsh, 1989; Wade & Holt, 2005) is often offered to explain proximal effects, whereby the “perceived length of a given acoustic segment is affected contrastively by the duration of adjacent segments” (Diehl & Walsh, 1989, p. 2154), such that a given segment is perceived as shorter when following a segment that is relatively long. This sort of localized contrast account is supported by a variety of speech perception findings in which only proximal context is manipulated (e.g., Miller & Liberman, 1979; Miller & Volaitis, 1989). On the other hand, a growing body of literature suggests that more distal rate effects may be best accounted for by entrainment (Doelling, Arnal, Ghitza, & Poeppel, 2014; Luo & Poeppel, 2007), a model in which oscillators encode rate information neurally on the basis of the rate of repetition of roughly syllable-sized envelope fluctuations in the signal (Bosker, 2017; Peelle & Davis, 2012; Pitt, Szostak, & Dilley, 2016). This premise further has recent neurobiological support (Kösem et al., 2018; Kösem, Bosker, Jensen, Hagoort, & Riecke, 2020).

  3. It should be noted more generally that in the domain of spectral contrast, the evidence favors a clear precedence of proximal over distal context (Stilp, 2018; see Stilp, 2020, for a review).

  4. Reinisch et al. (2011) tested how variation in rate influenced segmentation of ambiguous sequences in Dutch in which, for example, a durational event (e.g., closure duration for [t]) signaled a sequence of two /t/s across a word boundary, or a single /t/, as in “nooit rap” versus “nooit trap” (“never quick”/“never staircase”). A faster contextual rate in this case leads to increased perception of /t/-initial words; that is, closure duration is perceived as relatively long in relation to a fast rate, signaling a geminate at the word boundary.

  5. For example, isochronous timing for linguistic units (e.g., metrically prominent, or stressed, syllables) has been hypothesized to aid speech processing (Lehiste, 1977; Hawkins & Smith, 2001), as related to the more general theory of dynamic attending (e.g., Jones, 1976; Large & Jones, 1999), in which recurrent patterns in a stimulus guide attentional resources and expectations for incoming auditory material. Indeed, regular timing for trochaic (strong-weak) and iambic (weak-strong) syllabic patterns was shown by Quené and Port (2005) to facilitate processing in a phoneme monitoring task (see also Cutler & Darwin, 1981), regardless of the sequence type or its deviation from previous sequences (i.e., an iamb following a series of trochees).

  6. The log10 frequency of “coat” is 3.33, the log10 frequency of “code” is 3.43.

  7. PSOLA synthesis allows for manipulation of pitch and duration by analyzing the speech signal into pitch-synchronous Hanning-windowed sub-units for voiced portions of speech. Duration is manipulated by the duplication or reduction of windowed units. Pitch is manipulated by moving units closer together, raising pitch, or further apart, lowering pitch. The output signal is constructed via convolution of units with the overlap-add technique (Crochiere, 1980; Oppenheim & Schafer, 1975). PSOLA is used frequently in perception experiments where pitch and/or duration are manipulated (e.g., Bosker, 2017; Dilley & McAuley, 2008; Reinisch & Sjerps, 2013; Steffman & Jun, 2019).

  8. The Experiment 2 stimuli can be considered in terms of the so-called “iambic/trochaic law” (e.g., Hayes, 1995). This refers to a tendency for listeners to perceive alternating sequences as iambic or trochaic on the basis of the alternating acoustic medium. Alternations in intensity have been suggested to be perceived generally as trochaic (strong-weak), while alternations in duration are generally perceived as iambic (weak-strong). This is relevant to the present design in the sense that the alternations employed are purely durational, and accordingly, if the iambic/trochaic law obtains, one might predict that both conditions here would be perceived as alternating in a weak-strong fashion. However, previous research has suggested this pattern is only a tendency (Crowhurst & Olivares, 2014) and can be overridden. In particular, Hay and Diehl (2007) found a “strong tendency” for listeners’ perception of rhythm to be based on the starting pattern of a given sequence, i.e., the structure of the first two units in the pattern. This led the authors in that study to create onset masking in which stimuli were gradually faded in to obscure the starting point of the sequence. In the absence of such masking, as in the present stimuli, it is assumed that listeners’ perception of sequence structure will be based largely on the starting pattern in the sequence. The results from Experiment 2 further support this conclusion.

  9. This is also worth considering in light of the finding that repeating speech can sometimes be perceived as sung, i.e., the speech-to-song illusion (Deutsch, Henthorn, & Lapidis, 2011), which might impact listeners’ perception of rhythmic timing patterns.

  10. Though Experiment 3 contained twice as many trials as Experiment 2, scaling and centering trial as a predictor for each experiment individually allows scaled values to occupy the same range, making trial as a variable more comparable across experiments.

  11. In Experiment 2: β = 0.05, 95% CI = [-0.16, 0.27]; in Experiment 3: β = -0.04, 95% CI = [-0.21, 0.12]. This provides, at best, very weak evidence for an asymmetrical change over trials for the effect of rhythm in each experiment.

  12. In an exploratory analysis, trial number was included in all other models reported in this paper. The only experiment in which trial, or any of its interactions, showed a credible effect was Experiment 2, which was also the only experiment that evidenced the rhythmic grouping effect.
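The duration manipulation described in Note 7 (duplicating or dropping windowed units, then overlap-adding them) can be illustrated with a toy sketch. This is not the Praat implementation used for the stimuli: it assumes a signal with a constant, known pitch period, and the function names `hann` and `psola_stretch` are hypothetical, introduced only for illustration.

```python
import math

def hann(n_len):
    # Periodic Hann window; copies spaced half a window apart sum to 1.
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_len) for n in range(n_len)]

def psola_stretch(signal, period, factor):
    """Toy duration change, assuming a constant pitch period (in samples)."""
    win = hann(2 * period)
    # One analysis mark per period, keeping each two-period window inside the signal.
    marks = range(period, len(signal) - period + 1, period)
    units = [[signal[m - period + i] * win[i] for i in range(2 * period)]
             for m in marks]
    n_out = max(1, round(len(units) * factor))
    out = [0.0] * ((n_out + 1) * period)
    for j in range(n_out):
        # factor > 1 duplicates units (longer); factor < 1 skips units (shorter).
        src = min(int(j / factor), len(units) - 1)
        start = j * period  # unit spacing is unchanged, so pitch is preserved
        for i, value in enumerate(units[src]):
            out[start + i] += value  # overlap-add
    return out

# Hypothetical input: a pure tone with a 20-sample period, doubled in duration.
period = 20
tone = [math.sin(2 * math.pi * n / period) for n in range(200)]
stretched = psola_stretch(tone, period, 2.0)
```

Because the units are placed one period apart in the output, the sketch changes duration only; moving the units closer together or further apart, as Note 7 describes for pitch, is not shown.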

References

  • Baese-Berk, M. M., Heffner, C. C., Dilley, L. C., Pitt, M. A., Morrill, T. H., & McAuley, J. D. (2014). Long-term temporal tracking of speech rate affects spoken-word recognition. Psychological Science, 25(8), 1546–1553.

  • Barnes, R., & Jones, M. R. (2000). Expectancy, attention, and time. Cognitive Psychology, 41(3), 254–311.

  • Barry, W., Andreeva, B., & Koreman, J. (2009). Do rhythm measures reflect perceived rhythm? Phonetica, 66(1-2), 78–94.

  • Boersma, P., & Weenink, D. (2020). Praat: doing phonetics by computer (version 6.1.09). http://www.praat.org.

  • Bosker, H. R. (2017). Accounting for rate-dependent category boundary shifts in speech perception. Attention, Perception, & Psychophysics, 79(1), 333–343.

  • Bosker, H. R., & Ghitza, O. (2018). Entrained theta oscillations guide perception of subsequent speech: Behavioural evidence from rate normalisation. Language, Cognition and Neuroscience, 33(8), 955–967.

  • Bosker, H. R., Sjerps, M. J., & Reinisch, E. (2020). Temporal contrast effects in human speech perception are immune to selective attention. Scientific Reports, 10(1), 1–11.

  • Brown, M., Salverda, A. P., Dilley, L. C., & Tanenhaus, M. K. (2015). Metrical expectations from preceding prosody influence perception of lexical stress. Journal of Experimental Psychology: Human Perception and Performance, 41(2), 306–323.

  • Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990.

  • Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28.

  • Chen, M. (1970). Vowel length variation as a function of the voicing of the consonant environment. Phonetica, 22(3), 129–159.

  • Crochiere, R. (1980). A weighted overlap-add method of short-time Fourier analysis/synthesis. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(1), 99–102.

  • Crowhurst, M. J., & Olivares, A. T. (2014). Beyond the iambic-trochaic law: The joint influence of duration and intensity on the perception of rhythmic speech. Phonology, 31(1), 51–94.

  • Cutler, A., & Darwin, C. J. (1981). Phoneme-monitoring reaction time and preceding prosody: Effects of stop closure duration and of fundamental frequency. Perception & Psychophysics, 29(3), 217–224.

  • Deutsch, D., Henthorn, T., & Lapidis, R. (2011). Illusory transformation from speech to song. The Journal of the Acoustical Society of America, 129(4), 2245–2252.

  • Diehl, R. L., & Walsh, M. A. (1989). An auditory basis for the stimulus-length effect in the perception of stops and glides. The Journal of the Acoustical Society of America, 85(5), 2154–2164.

  • Dilley, L. C., Mattys, S. L., & Vinke, L. (2010). Potent prosody: Comparing the effects of distal prosody, proximal prosody, and semantic context on word segmentation. Journal of Memory and Language, 63(3), 274–294.

  • Dilley, L. C., & McAuley, J. D. (2008). Distal prosodic context affects word segmentation and lexical processing. Journal of Memory and Language, 59(3), 294–311.

  • Dilley, L. C., & Pitt, M. A. (2010). Altering context speech rate can cause words to appear or disappear. Psychological Science, 21(11), 1664–1670.

  • Doelling, K. B., Arnal, L. H., Ghitza, O., & Poeppel, D. (2014). Acoustic landmarks drive delta–theta oscillations to enable speech comprehension by facilitating perceptual parsing. NeuroImage, 85, 761–768.

  • Handel, S. (1993). Listening: An introduction to the perception of auditory events. Cambridge: The MIT Press.

  • Hawkins, S., & Smith, R. (2001). Polysp: A polysystemic, phonetically-rich approach to speech understanding. Italian Journal of Linguistics, 13, 99–188.

  • Hay, J. S. F., & Diehl, R. L. (2007). Perception of rhythmic grouping: Testing the iambic/trochaic law. Perception & Psychophysics, 69(1), 113–122.

  • Hayes, B. (1995). Metrical stress theory: Principles and case studies. Chicago: University of Chicago Press.

  • Heffner, C. C., Newman, R. S., & Idsardi, W. J. (2017). Support for context effects on segmentation and segments depends on the context. Attention, Perception, & Psychophysics, 79(3), 964–988.

  • Hoequist, C. E., & Kohler, K. J. (1986). Further experiments on speech rate perception with logatomes. Arbeitsberichte des Instituts für Phonetik der Universität Kiel, 22, 29–136.

  • Horr, N. K., & Di Luca, M. (2015). Taking a long look at isochrony: Perceived duration increases with temporal, but not stimulus regularity. Attention, Perception, & Psychophysics, 77(2), 592–602.

  • Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention, and memory. Psychological Review, 83(5), 323–355.

  • Jones, M. R., & McAuley, J. D. (2005). Time judgments in global temporal contexts. Perception & Psychophysics, 67(3), 398–417.

  • Jun, S.-A. (2012). Prosodic typology revisited: Adding macro-rhythm. In Proceedings of speech prosody, Vol. 6.

  • Jungers, M. K., Palmer, C., & Speer, S. R. (2002). Time after time: The coordinating influence of tempo in music and speech. Cognitive Processing, 1(2), 21–35.

  • Kidd, G. R. (1989). Articulatory-rate context effects in phoneme identification. Journal of Experimental Psychology: Human Perception and Performance, 15(4), 736–748.

  • Kim, S., Mitterer, H., & Cho, T. (2018). A time course of prosodic modulation in phonological inferencing: The case of Korean post-obstruent tensing. PLoS ONE, 13(8), e0202912.

  • Kösem, A., Bosker, H. R., Jensen, O., Hagoort, P., & Riecke, L. (2020). Biasing the perception of spoken words with transcranial alternating current stimulation. Journal of Cognitive Neuroscience, 32(8), 1428–1437.

  • Kösem, A., Bosker, H. R., Takashima, A., Meyer, A., Jensen, O., & Hagoort, P. (2018). Neural entrainment determines the words we hear. Current Biology, 28(18), 2867–2875.

  • Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying events. Psychological Review, 106(1), 119–159.

  • Lehiste, I. (1977). Isochrony reconsidered. Journal of Phonetics, 5(3), 253–263.

  • Lenth, R., Singmann, H., Love, J., Buerkner, P., & Herve, M. (2018). emmeans: Estimated Marginal Means, aka Least-Squares Means. https://CRAN.R-project.org/package=emmeans.

  • Luo, H., & Poeppel, D. (2007). Phase patterns of neuronal responses reliably discriminate speech in human auditory cortex. Neuron, 54(6), 1001–1010.

  • Maslowski, M., Meyer, A. S., & Bosker, H. R. (2020). Eye-tracking the time course of distal and global speech rate effects. Journal of Experimental Psychology: Human Perception and Performance, 46(10), 1148–1163.

  • Maslowski, M., Meyer, A. S., & Bosker, H. R. (2019). How the tracking of habitual rate influences speech perception. Journal of Experimental Psychology: Learning, Memory, and Cognition, 45(1), 128–138.

  • Mattys, S. L., White, L., & Melhorn, J. F. (2005). Integration of multiple speech segmentation cues: a hierarchical framework. Journal of Experimental Psychology: General, 134(4), 477–500.

  • McAuley, J. D., & Jones, M. R. (2003). Modeling effects of rhythmic context on perceived duration: A comparison of interval and entrainment approaches to short-interval timing. Journal of Experimental Psychology: Human Perception and Performance, 29(6), 1102–1125.

  • Miller, J. L., Grosjean, F., & Lomanto, C. (1984). Articulation rate and its variability in spontaneous speech: A reanalysis and some implications. Phonetica, 41(4), 215–225.

  • Miller, J. L., & Liberman, A. M. (1979). Some effects of later-occurring information on the perception of stop consonant and semivowel. Perception & Psychophysics, 25(6), 457–465.

  • Miller, J. L., & Volaitis, L. E. (1989). Effect of speaking rate on the perceptual structure of a phonetic category. Perception & Psychophysics, 46(6), 505–512.

  • Mitterer, H., Kim, S., & Cho, T. (2019). The glottal stop between segmental and suprasegmental processing: The case of Maltese. Journal of Memory and Language, 108, 104034.

  • Morrill, T. H., Dilley, L. C., McAuley, J. D., & Pitt, M. A. (2014). Distal rhythm influences whether or not listeners hear a word in continuous speech: Support for a perceptual grouping hypothesis. Cognition, 131(1), 69–74.

  • Moulines, E., & Charpentier, F. (1990). Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication, 9(5-6), 453–467.

  • Newman, R. S., & Sawusch, J. R. (1996). Perceptual normalization for speaking rate: Effects of temporal distance. Perception & Psychophysics, 58(4), 540–560.

  • Oppenheim, A. V., & Schafer, R. W. (1975). Digital signal processing. Upper Saddle River: Prentice-Hall.

  • Peelle, J. E., & Davis, M. H. (2012). Neural oscillations carry speech rhythm through to comprehension. Frontiers in Psychology, 3.

  • Pellegrino, F., Coupé, C., & Marsico, E. (2011). A cross-language perspective on speech information rate. Language, 539–558.

  • Pitt, M. A., Szostak, C., & Dilley, L. C. (2016). Rate-dependent speech processing can be speech specific: Evidence from the perceptual disappearance of words under changes in context speech rate. Attention, Perception, & Psychophysics, 78(1), 334–345.

  • Quené, H. (2008). Multilevel modeling of between-speaker and within-speaker variation in spontaneous speech tempo. The Journal of the Acoustical Society of America, 123(2), 1104–1113.

  • Quené, H. (2013). Longitudinal trends in speech tempo: The case of Queen Beatrix. The Journal of the Acoustical Society of America, 133(6), EL452–EL457.

  • Quené, H., & Port, R. F. (2005). Effects of timing regularity and metrical expectancy on spoken-word perception. Phonetica, 62(1), 1–13.

  • Raphael, L. J. (1972). Preceding vowel duration as a cue to the perception of the voicing characteristic of word-final consonants in American English. The Journal of the Acoustical Society of America, 51(4B), 1296–1303.

  • Reinisch, E., Jesse, A., & McQueen, J. M. (2011). Speaking rate from proximal and distal contexts is used during word segmentation. Journal of Experimental Psychology: Human Perception and Performance, 37(3), 978–996.

  • Reinisch, E., & Sjerps, M. J. (2013). The uptake of spectral and temporal cues in vowel perception is rapidly influenced by context. Journal of Phonetics, 41(2), 101–116.

  • Steffman, J. (2019). Intonational structure mediates speech rate normalization in the perception of segmental categories. Journal of Phonetics, 74, 114–129.

  • Steffman, J., & Jun, S.-A. (2019). Perceptual integration of pitch and duration: Prosodic and psychoacoustic influences in speech perception. The Journal of the Acoustical Society of America, 146(3), EL251–EL257.

  • Steffman, J., & Katsuda, H. (2020). Intonational structure influences perception of contrastive vowel length: The case of phrase-final lengthening in Tokyo Japanese. Language and Speech, 0023830920971842.

  • Stilp, C. (2018). Short-term, not long-term, average spectra of preceding sentences bias consonant categorization. The Journal of the Acoustical Society of America, 144(3), 1797–1797.

  • Stilp, C. (2020). Acoustic context effects in speech perception. Wiley Interdisciplinary Reviews: Cognitive Science, 11(1), e1517.

  • Tehrani, H. (2020). Appsobabble: Online applications platform. https://www.appsobabble.com.

  • Toscano, J. C., & McMurray, B. (2015). The time-course of speaking rate compensation: Effects of sentential rate and vowel length on voicing judgments. Language, Cognition and Neuroscience, 30(5), 529–543.

  • Vasishth, S., Nicenboim, B., Beckman, M. E., Li, F., & Kong, E. J. (2018). Bayesian data analysis in the phonetic sciences: A tutorial introduction. Journal of Phonetics, 71, 147–161.

  • Wade, T., & Holt, L. L. (2005). Perceptual effects of preceding nonspeech rate on temporal properties of speech categories. Perception & Psychophysics, 67(6), 939–950.

  • Warren, R. M. (1985). Criterion shift rule and perceptual homeostasis. Psychological Review, 92(4), 574–584.

  • Woodrow, H. (1909). A quantitative study of rhythm: The effect of variations in intensity, rate and duration. San Francisco: Science Press.

  • Woodrow, H. (1911). The role of pitch in rhythm. Psychological Review, 18(1), 54–77.

Acknowledgements

Many thanks are due to Adam Royer for recording speech materials for the stimuli, and to Yang Wang, Danielle Bagnas, and Qinxia Guo for help with data collection. Additional thanks are due to four anonymous reviewers, and to members of the UCLA Phonetics lab, for insightful feedback and discussion.

Author information

Corresponding author

Correspondence to Jeremy Steffman.

Ethics declarations

Conflict of Interests

The author declares that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Combined analysis of Experiment 2 and 3

To allow for more concrete comparison of Experiments 2 and 3, a model was fit to combined data from both experiments, described in this section. Given the idea that global rate regularity is beneficial for rhythmic effects, we might expect these influences to change over the course of an experiment as listeners accumulate more exposure to a global rate, with the consistent rate in Experiment 2 potentially strengthening the rhythmic effect over time. Accordingly, testing how listeners’ responses shift over the course of an experiment might be insightful, which motivated inclusion of trial number as a variable in the model. The model predicted listeners’ responses as a function of the continuum, rhythm condition, and experiment (contrast-coded with Experiment 2 mapped to -0.5 and Experiment 3 mapped to 0.5), as well as trial number (scaled and centered within each experiment; see Note 10). Note that rate was not included as a predictor in the model because only Experiment 3 varied rate. Random effects in the model included all fixed effects and interactions as by-participant random slopes, save for experiment. This model allows us to more thoroughly compare the differences observed across these two experiments, while additionally testing how they vary over time. If it is the case that more exposure to regularity helps strengthen rhythmic grouping effects in Experiment 2, we should expect a three-way interaction in the model between experiment, rhythm, and trial, which would show increasing strength of rhythmic effects over the course of the trials in Experiment 2 (where global rate is invariant), but not in Experiment 3. We might also expect to see an interaction between experiment and trial if a different global rate in each impacted overall responses.
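As a rough illustration of the predictor coding described above, the following sketch contrast-codes experiment and scales trial number within each experiment. The trial counts are hypothetical (Note 10 states only that Experiment 3 had twice as many trials as Experiment 2), the helper name `scale_within` is introduced here for illustration, and the use of the sample standard deviation is an assumption; the original analysis was carried out in R.

```python
from statistics import mean, stdev

# Contrast coding for experiment, as described in the text.
EXPERIMENT_CODE = {"Experiment 2": -0.5, "Experiment 3": 0.5}

def scale_within(trials):
    """Center and scale trial numbers within a single experiment (z-scores)."""
    m = mean(trials)
    sd = stdev(trials)  # assumption: sample standard deviation
    return [(t - m) / sd for t in trials]

# Hypothetical trial counts: only the 1:2 ratio comes from Note 10.
exp2_trials = list(range(1, 101))
exp3_trials = list(range(1, 201))

exp2_scaled = scale_within(exp2_trials)
exp3_scaled = scale_within(exp3_trials)

# After scaling, both experiments span nearly the same range despite
# their different lengths.
exp2_range = (min(exp2_scaled), max(exp2_scaled))
exp3_range = (min(exp3_scaled), max(exp3_scaled))
```

Under these assumptions both ranges come out at roughly ±1.7, which is the sense in which scaled trial is comparable across experiments of different lengths.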

The combined analysis (model output shown in Table 7 in Appendix 2) finds that, in addition to the expected effect of continuum, only two predictors were credible. The first was the interaction of precursor rhythm and experiment, in line with the reversal of the rhythmic effect observed across experiments. Unsurprisingly, comparing model contrasts using emmeans showed that each experiment evidenced a different credible effect of rhythm (Experiment 2: β = 0.33, 95% CI = [0.12, 0.56]; Experiment 3: β = -0.32, 95% CI = [-0.50, -0.15]), as was established by the main effects of rhythm in the individual analyses of these experiments (see also Fig. 5).

The other model estimate that was observed to be credible was the interaction between trial and experiment, suggesting that responses changed over the course of each experiment in a different fashion. Change over the course of each experiment is shown in Fig. 7, which plots scaled trial number against overall “code” responses, also split by condition. The emtrends function of emmeans was used to test for the influence of changing trial number in each experiment. This assessment finds a credible effect of trial in Experiment 2, whereby “code” responses increase over the course of the experiment (β = 0.26, 95% CI = [0.04, 0.45]). In comparison, no credible change across trials was found in Experiment 3, though the estimated effect is negative, unlike that in Experiment 2 (β = -0.08, 95% CI = [-0.27, 0.08]), providing weak evidence for a decrease in “code” responses over the course of Experiment 3. The three-way interaction between experiment, rhythm, and trial was not observed to be credible, and indeed post hoc inspection of the influence of rhythm over the course of each experiment found that the effect did not change reliably in either, though the estimate was positive in Experiment 2 and negative in Experiment 3 (see Note 11). We thus do not have clear evidence for an effect of rhythm that changes in a different way across experiments, though we do have clear evidence for a difference in overall responses, such that listeners reliably increased their “code” responses over the course of Experiment 2.

What might explain this effect? One possibility is that regular rhythmic patterns influenced listeners’ perception of speech rate in the stimuli, given that links between rhythmicity and perceived duration have previously been suggested in the literature (see Note 12). Horr and Di Luca (2015) found that an isochronous pulse train of tones was perceived to last longer than an anisochronous interval of the same duration. If listeners’ perception of global rate incorporates this effect, listeners should come to perceive a slower global pace with increased exposure to isochronous rhythmic patterning in the stimuli. This in turn would make a given stimulus sound relatively fast, increasing “code” responses over the course of the experiment, and would, by hypothesis, be disrupted by variation in global rate, as in Experiment 3, though this explanation is speculative.

Fig. 7 Model fit for the effect of trial number on overall responses from the combined analysis. Fits are split by rhythm, indicated by line type and color within a panel, and by experiment, across panels.

The lack of an effect of trial in Experiment 3 can also be compared to previous studies examining global rate effects. For example, it might be expected that the neutral rate condition would be perceived as slower in relation to the fast rate (in line with Maslowski et al., 2020, discussed in “Introduction”), and therefore that we would see relatively decreased “code” responses in Experiment 3’s neutral rate condition as compared to Experiment 2; however, this is not the case. Baese-Berk et al. (2014) showed that effects like these grow in strength over the course of an experiment as listeners accumulate exposure to global rate patterns, further suggesting we might have expected to see this change occurring over the course of the trials in Experiment 3. A likely explanation for the present lack of an effect is the relatively short duration of Experiment 3, which lasted approximately 20 min. In both Maslowski, Meyer, and Bosker (2019) and Baese-Berk et al. (2014), the experiment lasted longer than 50 min, giving listeners longer exposure to global rate patterns. In fact, Baese-Berk et al. (2014), who report that their experiment took approximately 1 h to complete, analyzed the effect of global rate over the course of the experiment in three blocks. In the first block of the experiment (corresponding to about 20 min) there was no observable effect of global rate (see Baese-Berk et al., 2014, Figure 2); the effect emerged only in the second block and strengthened in the third. These data suggest that 20 min of exposure to a global rate pattern may not be enough time to generate previously documented effects, consistent with the lack of an effect seen here in Experiment 3. We can also note that the model estimate for trial in Experiment 3, though not credible, is in the direction that we would expect given a faster global rate in the experiment (i.e., decreasing “code” responses over the course of the experiment), suggesting that a longer experiment might have allowed the expected effect to appear.

In summarizing the comparison across experiments, we have reaffirmed that global speech rate variation disrupts the rhythmic grouping effects seen in Experiment 2, as discussed in “Results and discussion”. We also have evidence that temporally regular rhythmic patterns induce a change over time, potentially indicating listeners’ increasing sensitivity to the pattern.

Appendix 2: Model summaries for all Experiments

Fixed effect estimates and the upper and lower bounds of the 95% CI are given in each table. Credible fixed effects, for which the CI excludes zero, are bolded. Model estimates are also given for (by-participant) random intercepts and for random slopes.
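The credibility criterion used for bolding can be written out as a short check. The sketch below applies it to the four intervals reported in Appendix 1; the function name `is_credible` and the dictionary labels are hypothetical, introduced only for illustration.

```python
def is_credible(ci_lower, ci_upper):
    """An estimate counts as credible when its 95% CI excludes zero."""
    return ci_lower > 0 or ci_upper < 0

# 95% CIs reported in Appendix 1 (labels are illustrative).
reported = {
    "rhythm, Experiment 2": (0.12, 0.56),
    "rhythm, Experiment 3": (-0.50, -0.15),
    "trial, Experiment 2": (0.04, 0.45),
    "trial, Experiment 3": (-0.27, 0.08),
}

credible = {name: is_credible(lo, hi) for name, (lo, hi) in reported.items()}

for name, flag in credible.items():
    print(f"{name}: {'credible' if flag else 'not credible'}")
```

Only the trial effect in Experiment 3 has an interval that spans zero, matching the description in Appendix 1.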

Table 2 Model results for Experiment 1
Table 3 Model results for Experiment 2
Table 4 Model results for Experiment 3
Table 5 Pairwise comparison of contrasts for all rhythm and rate combinations in Experiment 3
Table 6 Model results for Experiment 4
Table 7 Model results for the combined analysis of Experiment 2 and Experiment 3

About this article

Cite this article

Steffman, J. Rhythmic and speech rate effects in the perception of durational cues. Atten Percept Psychophys 83, 3162–3182 (2021). https://doi.org/10.3758/s13414-021-02334-w

  • DOI: https://doi.org/10.3758/s13414-021-02334-w

Keywords

  • Speech perception
  • Durational processing
  • Speech rhythm
  • Speech rate
  • Perceptual grouping