Abstract
This chapter studies some characteristics of source-parameter dynamics in order to derive a preliminary set of rules that were integrated into text-to-speech (TTS) systems. An automated procedure estimated the source parameters of 534 seconds of voiced speech from a set of 300 English sentences spoken by a single female speaker. The results showed a strong correlation between the values of the source parameters at the vowel midpoint and the vowel duration. The same parameters tend to decrease at vowel onsets and to increase at vowel offsets. This suggests that these parameters are prosodic in nature and require special treatment in concatenative TTS systems that use source-modification techniques, such as pitch-synchronous overlap-add (PSOLA) and multipulse.
Copyright information
© 1997 Springer Science+Business Media New York
About this chapter
Cite this chapter
Oliveira, L.C. (1997). Text-to-Speech Synthesis with Dynamic Control of Source Parameters. In: van Santen, J.P.H., Olive, J.P., Sproat, R.W., Hirschberg, J. (eds) Progress in Speech Synthesis. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-1894-4_3
DOI: https://doi.org/10.1007/978-1-4612-1894-4_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7328-8
Online ISBN: 978-1-4612-1894-4
eBook Packages: Springer Book Archive