Time–Frequency Analysis of Vietnamese Speech Inspired on Chirp Auditory Selectivity
In speech analysis, the pitch, or fundamental frequency, is usually treated as a parameter that characterizes the vocal cord excitation, but it plays almost no role in the time–spectral analysis of the speech signal itself. In this paper, we present a novel speech analysis approach in which the pitch and its variation over time play a leading role. The pitch and the pitch rate are computed within each segment by minimizing Huber’s loss over the short-time correlation according to a second-order polynomial fitting law. The proposed method is integrated with the Fan-Chirp transform and the Spectral All-Pole Estimation method, both previously proposed by the authors. Results on Vietnamese speech reveal the advantages of the proposed analysis methodology over the popular linear prediction estimation. Finally, the paper discusses the possible impact of the proposed method on speech coding, which is the subject of upcoming research.
Keywords: Pitch-driven time–frequency analysis, frequency-selective AR estimation, speech coding
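The abstract does not spell out the cost function, so the following is only a minimal sketch of the general idea of a second-order polynomial fit under Huber's loss, not the authors' procedure. It assumes the fit is applied to a few noisy pitch estimates inside one segment (rather than directly to the short-time correlation), uses iteratively reweighted least squares as a stand-in solver, and all names (`huber_polyfit`, `delta`, the synthetic pitch track) are hypothetical.

```python
import numpy as np


def huber_polyfit(t, y, delta=1.0, degree=2, n_iter=200, tol=1e-10):
    """Fit a polynomial to (t, y) by minimizing Huber's loss with IRLS.

    Returns coefficients in increasing order of power: c0 + c1*t + c2*t**2.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1, t, t^2, ...
    A = np.vander(t, degree + 1, increasing=True)
    # Start from the ordinary least-squares solution.
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - A @ coef
        abs_r = np.abs(r)
        # Huber weights: 1 in the quadratic zone, delta/|r| in the linear zone.
        w = np.where(abs_r <= delta, 1.0, delta / np.maximum(abs_r, 1e-12))
        sw = np.sqrt(w)
        new_coef = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef


# Synthetic in-segment pitch track (assumed values, normalized time in [-1, 1]).
rng = np.random.default_rng(0)
t = np.linspace(-1.0, 1.0, 41)
true_coef = np.array([120.0, 15.0, -8.0])  # Hz, Hz/unit, Hz/unit^2
f0_true = true_coef[0] + true_coef[1] * t + true_coef[2] * t**2
f0_obs = f0_true + rng.normal(0.0, 0.3, t.size)
f0_obs[[10, 20, 30]] += 120.0  # gross errors, e.g. octave jumps

coef_huber = huber_polyfit(t, f0_obs, delta=1.0)
coef_ls = np.linalg.lstsq(np.vander(t, 3, increasing=True), f0_obs, rcond=None)[0]

# With this parameterization, the pitch at the segment center is c0 and
# its rate of change there is c1.
pitch_center = coef_huber[0]
pitch_rate_center = coef_huber[1]
```

In this toy setting the Huber fit bounds the influence of the three gross errors, whereas the plain least-squares fit is pulled toward them. Which quantity the loss is actually applied to in the paper would need to be taken from the full text.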