Abstract
The paper examines the physical mechanisms of frequency modulation in the acoustics of the vocal tract and methods for estimating this modulation in the speech signal. Vibrations of the vocal tract walls are found to have a negligible effect on the modulation of its resonance frequencies. A model of speech production that accounts for the subglottal cavity shows that the change in boundary conditions at the open glottis produces noticeable variations in the resonance frequencies. In addition to this type of modulation, the speech signal contains modulations determined by the shape of the excitation source. These depend substantially on the ratio of the fundamental frequency to the resonance frequency and on the parameters of the modulation-estimation and speech-analysis methods. Taken together, these effects can sometimes cause unstable and unpredictable modulations of the formant frequencies estimated from the speech signal.
Additional information
Original Russian Text © A.S. Leonov, I.S. Makarov, V.N. Sorokin, 2009, published in Akusticheskiĭ Zhurnal, 2009, Vol. 55, No. 6, pp. 809–821.