Time–Frequency Analysis of Vietnamese Speech Inspired by Chirp Auditory Selectivity

  • Ha Nguyen
  • Luis Weruaga
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5351)

Abstract

In speech analysis, the pitch or fundamental frequency is usually regarded as a parameter characterizing the vocal-cord excitation, yet it plays hardly any role in the time–frequency analysis of the speech signal itself. In this paper, we present a novel speech analysis approach in which the pitch, and its variation over time, plays a leading role. The pitch and the pitch rate are computed in-segment by minimizing Huber’s loss over the short-time correlation under a second-order polynomial fitting law. The proposed method is integrated within the Fan-Chirp transform and the Spectral All-Pole Estimation method, both proposed previously by the authors. Results on Vietnamese speech reveal the advantages of the proposed analysis methodology over the popular linear prediction estimation. The paper finally discusses the possible impact of the proposed method on speech coding, which constitutes the upcoming research work.
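
The in-segment estimation of a pitch value and a pitch rate under a second-order polynomial law, made robust through Huber’s loss, can be illustrated with a minimal sketch. The Python code below is an assumption-laden illustration, not the authors’ implementation: it draws per-frame pitch candidates from a plain short-time autocorrelation peak (a simplified stand-in for the paper’s short-time correlation formulation) and fits a quadratic pitch trajectory by minimizing the summed Huber loss. All function names, frame settings, the Huber threshold, and the synthetic test signal are hypothetical choices for the example.

# Minimal sketch (not the authors' method): robust quadratic pitch-law fitting
# with Huber's loss over frame-wise autocorrelation pitch candidates.
import numpy as np
from scipy.optimize import minimize

def frame_pitch_candidates(x, fs, frame_len=0.03, hop=0.01, f0_min=70.0, f0_max=400.0):
    """Crude per-frame pitch estimates from the short-time autocorrelation peak."""
    n_frame = int(frame_len * fs)
    n_hop = int(hop * fs)
    lags = np.arange(int(fs / f0_max), int(fs / f0_min))
    times, f0s = [], []
    for start in range(0, len(x) - n_frame, n_hop):
        frame = x[start:start + n_frame] * np.hanning(n_frame)
        r = np.correlate(frame, frame, mode="full")[n_frame - 1:]
        lag = lags[np.argmax(r[lags])]
        times.append((start + n_frame / 2) / fs)
        f0s.append(fs / lag)
    return np.array(times), np.array(f0s)

def huber(residual, delta):
    """Huber's loss: quadratic near zero, linear in the tails (robust to outlier frames)."""
    a = np.abs(residual)
    return np.where(a <= delta, 0.5 * a**2, delta * (a - 0.5 * delta))

def fit_quadratic_pitch_law(times, f0s, delta=10.0):
    """Fit f0(t) = c0 + c1*t + c2*t^2 by minimizing the summed Huber loss."""
    def cost(c):
        model = c[0] + c[1] * times + c[2] * times**2
        return huber(f0s - model, delta).sum()
    c_init = np.array([np.median(f0s), 0.0, 0.0])   # robust starting point
    res = minimize(cost, c_init, method="Nelder-Mead")
    # c[0]: pitch (Hz) at t = 0, c[1]: pitch rate (Hz/s), c[2]: half the pitch acceleration
    return res.x

if __name__ == "__main__":
    # Synthetic voiced segment with a linearly rising pitch (hypothetical test signal).
    fs = 16000
    t = np.arange(0, 0.4, 1 / fs)
    f0_true = 120.0 + 80.0 * t                      # Hz, rising pitch
    phase = 2 * np.pi * np.cumsum(f0_true) / fs
    x = np.sin(phase) + 0.5 * np.sin(2 * phase) + 0.01 * np.random.randn(len(t))
    times, f0s = frame_pitch_candidates(x, fs)
    print("fitted pitch law coefficients:", fit_quadratic_pitch_law(times, f0s))

In this sketch the quadratic coefficients directly yield the pitch and pitch-rate values that a fan-chirp style analysis would consume for one segment; the robust loss simply keeps occasional octave errors in the frame candidates from biasing the fit.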

Keywords

Pitch-driven time–frequency analysis · Frequency-selective AR estimation · Speech coding

References

  1. Kondoz, A.M.: Digital Speech: Coding for Low Bit Rate Communication Systems. John Wiley & Sons, Chichester (2004)
  2. Quatieri, T.F.: Discrete-Time Speech Signal Processing. Prentice-Hall, Englewood Cliffs (2001)
  3. Weruaga, L., Képesi, M.: The fan-chirp transform for nonstationary harmonic signals. Signal Processing 87, 1504–1522 (2007)
  4. Kawahara, H., et al.: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds. Speech Communication 27, 187–207 (1999)
  5. Mercado, E., Myers, C.E., Gluck, M.A.: Modeling auditory cortical processing as an adaptive chirplet transform. Neurocomputing 32–33, 913–919 (2000)
  6. Dunn, R., Quatieri, T.F.: Sinewave analysis/synthesis based on the fan-chirp transform. In: Proc. IEEE WASPAA, pp. 247–250 (2007)
  7. Li, P., Guan, Y., Xu, B., Liu, W.: Monaural speech separation based on computational auditory scene analysis and objective quality assessment of speech. In: Proc. IEEE ICASSP, pp. 2014–2023 (2008)
  8. Weruaga, L.: All-pole estimation in spectral domain. IEEE Trans. Signal Processing 55, 4821–4830 (2007)
  9. Whittle, P.: Gaussian estimation in stationary time series. Bull. Intl. Stat. Instit. 39, 105–130 (1961)
  10. Képesi, M., Weruaga, L.: Adaptive chirp-based time-frequency analysis of speech signals. Speech Communication 55, 474–492 (2006)
  11. Marques, J.S., et al.: Improved pitch prediction with fractional delays in CELP coding. In: Proc. IEEE ICASSP, pp. 665–668 (1990)
  12. Schölkopf, B., Smola, A.J.: Learning with Kernels. MIT Press, Cambridge (2002)
  13. Rojo-Álvarez, J.L., et al.: A robust support vector algorithm for nonparametric spectral analysis. IEEE Signal Processing Lett. 10, 320–323 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Ha Nguyen¹
  • Luis Weruaga¹
  1. Commission for Scientific Visualisation, Austrian Academy of Sciences, Vienna, Austria