Text-to-Speech Synthesis with Dynamic Control of Source Parameters

Oliveira, Luís C.

doi:10.1007/978-1-4612-1894-4_3



Abstract

This chapter describes a study of the dynamics of voice source parameters, used to derive a preliminary set of rules that were integrated into text-to-speech (TTS) systems. An automated procedure estimated the source parameters over 534 seconds of voiced speech taken from a set of 300 English sentences spoken by a single female speaker. The results showed a strong correlation between the values of the source parameters at the vowel midpoint and the vowel duration. The same parameters tend to decrease at vowel onsets and to increase at vowel offsets. This suggests that these parameters are prosodic in nature and therefore require special treatment in concatenative TTS systems that use source modification techniques, such as pitch-synchronous overlap-add (PSOLA) and multipulse.
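The two regularities reported above (midpoint value correlated with vowel duration; falling values at the onset and rising values at the offset) are the kind of observations that can be turned into synthesis rules. The sketch below is illustrative only and is not the chapter's method or its published rules. It assumes a single scalar source parameter per analysis frame, a linear model linking midpoint value to duration, and linear ramps at the vowel edges. It also reads "decrease at onsets, increase at offsets" as the direction of the parameter trajectory. The function names `pearson_r`, `fit_line` and `vowel_contour` and the parameters `onset_excess`, `offset_excess` and `edge` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. vowel duration vs. midpoint parameter value."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def vowel_contour(duration_s: float, slope: float, intercept: float,
                  onset_excess: float, offset_excess: float,
                  n_frames: int = 11, edge: float = 0.2) -> List[float]:
    """Hypothetical rule for one source parameter across a vowel.

    The midpoint value comes from the (assumed) linear duration model.
    Over the first `edge` fraction of the vowel the value falls linearly
    from mid + onset_excess to mid; over the last `edge` fraction it rises
    linearly from mid to mid + offset_excess.
    """
    mid = slope * duration_s + intercept
    contour = []
    for i in range(n_frames):
        t = i / (n_frames - 1)  # normalized time within the vowel, 0..1
        if t < edge:
            # Onset region: value decreases toward the midpoint value.
            value = mid + onset_excess * (1.0 - t / edge)
        elif t > 1.0 - edge:
            # Offset region: value increases away from the midpoint value.
            value = mid + offset_excess * (t - (1.0 - edge)) / edge
        else:
            value = mid
        contour.append(value)
    return contour
```

In use, `fit_line` would be applied to one's own measured (duration, midpoint value) pairs, `pearson_r` would report how strong that relationship is, and `vowel_contour` would generate a target trajectory for a synthesized vowel of a given duration. Linear ramps were chosen only for simplicity; any smooth onset and offset shape could replace them.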



Authors

Luís C. Oliveira

Editors and Affiliations

Jan P. H. van Santen: Bell Laboratories, Room 2D-452, 600 Mountain Avenue, Murray Hill, NJ 07974-0636, USA
Joseph P. Olive: Bell Laboratories, Room 2D-447, 600 Mountain Avenue, Murray Hill, NJ 07974-0636, USA
Richard W. Sproat: Bell Laboratories, Room 2D-451, 600 Mountain Avenue, Murray Hill, NJ 07974-0636, USA
Julia Hirschberg: AT&T Research, Room 2C-409, 600 Mountain Avenue, Murray Hill, NJ 07974-0636, USA


About this chapter

Cite this chapter

Oliveira, L.C. (1997). Text-to-Speech Synthesis with Dynamic Control of Source Parameters. In: van Santen, J.P.H., Olive, J.P., Sproat, R.W., Hirschberg, J. (eds) Progress in Speech Synthesis. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-1894-4_3


DOI: https://doi.org/10.1007/978-1-4612-1894-4_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7328-8
Online ISBN: 978-1-4612-1894-4
eBook Packages: Springer Book Archive
