Close Shadowing Natural Versus Synthetic Speech

Bailly, G.

doi:10.1023/A:1021091720511

Published: January 2003

Volume 6, pages 11–19, (2003)

International Journal of Speech Technology

G. Bailly¹

Abstract

Close shadowing experiments involving natural and synthetic stimuli are described. Preliminary results show that speakers are able to follow natural stimuli with an average delay of 70 ms, whereas this delay typically exceeds 100 ms for stimuli produced by text-to-speech systems. A complementary experiment shows that this contrast is mainly due to the inappropriate or impoverished prosody generated by current text-to-speech systems.

Author information

Authors and Affiliations

Institut de la Communication Parlée, UMR CNRS n°5009, INPG/Univ. Stendhal, 46, av. Félix Viallet, 38031, Grenoble CEDEX, France
G. Bailly

About this article

Cite this article

Bailly, G. Close Shadowing Natural Versus Synthetic Speech. International Journal of Speech Technology 6, 11–19 (2003). https://doi.org/10.1023/A:1021091720511

Issue Date: January 2003
DOI: https://doi.org/10.1023/A:1021091720511
