Close Shadowing Natural Versus Synthetic Speech

International Journal of Speech Technology

Abstract

Close shadowing experiments involving natural and synthetic stimuli are described. Preliminary results show that speakers are able to follow natural stimuli with an average delay of 70 ms, whereas this delay typically exceeds 100 ms for stimuli produced by text-to-speech systems. A complementary experiment shows that this contrast is mainly due to the inappropriate or impoverished prosody generated by current text-to-speech systems.


Cite this article

Bailly, G. Close Shadowing Natural Versus Synthetic Speech. International Journal of Speech Technology 6, 11–19 (2003). https://doi.org/10.1023/A:1021091720511

  • DOI: https://doi.org/10.1023/A:1021091720511
