Oscillating Statistical Moments for Speech Polarity Detection

  • Thomas Drugman
  • Thierry Dutoit
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7015)

Abstract

An inversion of the speech polarity may have a dramatic detrimental effect on the performance of various speech processing techniques. An automatic method for determining the speech polarity (which depends on the recording setup) is thus required as a preliminary step to ensure that such techniques behave correctly. This paper proposes a new approach to polarity detection relying on oscillating statistical moments. These moments oscillate at the local fundamental frequency and exhibit a phase shift which depends on the speech polarity. This dependency stems from the introduction of non-linearity or higher-order statistics in the moment calculation. The resulting method is shown on 10 speech corpora to provide a substantial improvement over state-of-the-art techniques.
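The polarity dependence described in the abstract can be illustrated with a small numerical sketch. This is not the authors' algorithm; it only demonstrates the underlying property that a windowed odd-order moment changes sign when the signal is inverted, so its oscillation at the fundamental frequency shifts by π, while an even-order moment is unaffected. The function names `sliding_moment` and `phase_at`, the synthetic test signal, the Hann window, and the window length of 1.5 pitch periods are all assumptions made for this illustration.

```python
import numpy as np

def sliding_moment(x, order, win_len):
    """Windowed raw moment: Hann-weighted local mean of x**order (illustrative helper)."""
    w = np.hanning(win_len)
    w = w / w.sum()
    return np.convolve(x ** order, w, mode="same")

def phase_at(y, f0, fs):
    """Phase in radians of the component of y at frequency f0 (illustrative helper)."""
    n = np.arange(len(y))
    return np.angle(np.sum((y - y.mean()) * np.exp(-2j * np.pi * f0 * n / fs)))

# Synthetic asymmetric periodic signal standing in for voiced speech (assumption).
fs, f0 = 16000, 100.0
t = np.arange(int(0.2 * fs)) / fs
x = sum(np.cos(2 * np.pi * k * f0 * t) / k for k in range(1, 6))

# Window of about 1.5 pitch periods, short enough that the moment still oscillates at f0.
win_len = int(1.5 * fs / f0)

odd_pos = sliding_moment(x, 3, win_len)    # odd order: sensitive to polarity
odd_neg = sliding_moment(-x, 3, win_len)   # same signal with inverted polarity
even_pos = sliding_moment(x, 4, win_len)   # even order: insensitive to polarity
even_neg = sliding_moment(-x, 4, win_len)

shift = phase_at(odd_neg, f0, fs) - phase_at(odd_pos, f0, fs)
print("odd moment flips sign:", np.allclose(odd_neg, -odd_pos))
print("even moment unchanged:", np.allclose(even_neg, even_pos))
print("phase shift of odd moment at f0 (rad):", abs(shift))
```

Because the even-order moment is identical for both polarities, it can serve as a polarity-independent reference, and the phase of the odd-order moment relative to it is what carries the polarity information in this sketch.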

Keywords

Speech Signal, Statistical Moment, Vocal Tract, Speech Synthesis, Emotional Speech
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Thomas Drugman (1)
  • Thierry Dutoit (1)
  1. TCTS Lab, University of Mons, Belgium
