Abstract
Mathematical models of the phase function and its parameters have been investigated for speech-signal analysis problems. The phase spectrum of a speech signal has been calculated using the Hilbert transform of the signals at the outputs of a gammatone filterbank. Short- and long-term modulations of the linear phase component, the phase derivatives with respect to frequency and time, and the mixed derivative have been considered. A vowel segmentation method based on aggregating the correlation coefficients of the phase parameters is described. Experiments on estimating formant and pitch frequencies and glottal opening and closure instants have been performed.
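The phase quantities named above (the Hilbert-transform phase of each gammatone channel, its derivatives with respect to time and frequency, and the mixed derivative) can be illustrated with a short sketch. It is not the authors' implementation. It assumes fourth-order gammatone filters with Glasberg-Moore ERB bandwidths (b = 1.019·ERB), an FFT-based Hilbert transform, and first differences between adjacent samples and adjacent channels as stand-ins for the derivatives. The function names `gammatone_ir`, `analytic_signal`, and `phase_parameters` are hypothetical.

```python
import numpy as np


def gammatone_ir(fc, fs, duration=0.05, order=4):
    """Impulse response of an order-`order` gammatone filter centred at fc (Hz).

    Assumption: Glasberg-Moore ERB scale and the common b = 1.019 * ERB bandwidth.
    """
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37e-3 * fc + 1.0)  # equivalent rectangular bandwidth, Hz
    b = 1.019 * erb
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sum(np.abs(g))  # rough gain normalisation


def analytic_signal(x):
    """x + j*H{x} via the FFT: zero negative frequencies, double positive ones."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def phase_parameters(x, fs, centre_freqs):
    """Phase derivatives of the analytic signals of gammatone channels.

    Returns (Z, inst_freq, dphi_df, mixed):
      Z          complex analytic signals, shape (channels, samples)
      inst_freq  dphi/dt / (2*pi) in Hz, shape (channels, samples - 1)
      dphi_df    dphi/df between adjacent channels in rad/Hz, shape (channels - 1, samples)
      mixed      d2phi/(df dt) in rad/(Hz*s), shape (channels - 1, samples - 1)
    """
    centre_freqs = np.asarray(centre_freqs, dtype=float)
    Z = np.array([analytic_signal(np.convolve(x, gammatone_ir(fc, fs), mode="same"))
                  for fc in centre_freqs])
    # Phase increments from z[n+1] * conj(z[n]) avoid explicit unwrapping;
    # each increment is wrapped to (-pi, pi].
    inst_freq = np.angle(Z[:, 1:] * np.conj(Z[:, :-1])) * fs / (2 * np.pi)
    df = np.diff(centre_freqs)[:, None]
    # Wrapped phase difference between neighbouring channels, per Hz of spacing.
    dphi_df = np.angle(Z[1:, :] * np.conj(Z[:-1, :])) / df
    # Mixed derivative: change of instantaneous angular frequency across channels.
    mixed = np.diff(2 * np.pi * inst_freq, axis=0) / df
    return Z, inst_freq, dphi_df, mixed
```

As a usage example, feeding a steady 500 Hz tone sampled at 16 kHz through channels centred between 300 and 800 Hz should give an instantaneous frequency close to 500 Hz in every channel away from the signal edges, and a mixed derivative close to zero. Voiced speech would instead show these quantities modulated by formants and by glottal excitation. Working with conjugate products rather than unwrapped phase keeps the finite differences free of 2π ambiguities.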
Funding
A.S. Leonov acknowledges support from the Program for Increasing the Competitiveness of the National Research Nuclear University MEPhI (project no. 02.a03.21.0005, dated August 27, 2013).
Ethics declarations
The authors declare that they have no conflicts of interest.
Additional information
Translated by A. Sin’kov
About this article
Cite this article
Sorokin, V.N., Leonov, A.S. Phase Modulations in a Speech Signal. Acoust. Phys. 68, 187–200 (2022). https://doi.org/10.1134/S1063771022020099
DOI: https://doi.org/10.1134/S1063771022020099