
Cognitive Computation, Volume 5, Issue 4, pp 442–447

Detecting Speech Polarity with High-Order Statistics

  • Thomas Drugman
  • Thierry Dutoit

Abstract

An inversion of the speech polarity, which depends on the recording setup, may seriously degrade the performance of various speech processing applications. Its automatic detection from the speech signal is therefore required as a preliminary step to ensure that such techniques behave correctly. In this paper, a new method for polarity detection is proposed. This approach relies on oscillating statistical moments whose phase shift depends on the speech polarity. This dependency arises from the use of higher-order statistics in the moment calculation. The proposed approach is compared to state-of-the-art techniques on 10 speech corpora, and the performance of all methods in clean conditions, as well as their robustness to additive noise, is discussed.
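The abstract attributes the polarity dependency to higher-order statistics. The underlying property can be illustrated with a minimal sketch: inverting a signal (x → −x) negates its odd-order moments (mean, skewness) while leaving its even-order moments (variance, kurtosis) unchanged, so a skewness track that oscillates at the pitch period undergoes a phase shift of π. This is not the authors' detector. The helper names `local_moments` and `phase_at`, the fixed 61-sample window, and the synthetic three-harmonic signal standing in for voiced speech are all hypothetical choices made for illustration.

```python
import numpy as np

def local_moments(x, win):
    """Hypothetical helper: return a (4, N) array with the sliding-window
    mean, variance, skewness and kurtosis of x (win is an odd length)."""
    half = win // 2
    n_out = len(x) - 2 * half
    out = np.empty((4, n_out))
    for i in range(n_out):
        seg = x[i:i + win]                 # window centred on sample i + half
        d = seg - seg.mean()
        var = np.mean(d ** 2)
        std = np.sqrt(var) + 1e-12         # guard against constant windows
        out[0, i] = seg.mean()             # 1st order: sign follows polarity
        out[1, i] = var                    # 2nd order: polarity-invariant
        out[2, i] = np.mean(d ** 3) / std ** 3   # 3rd order: sign follows polarity
        out[3, i] = np.mean(d ** 4) / std ** 4   # 4th order: polarity-invariant
    return out

def phase_at(sig, period):
    """Hypothetical helper: phase of the component of sig at 1/period cycles/sample."""
    n = np.arange(len(sig))
    return np.angle(np.sum((sig - sig.mean()) * np.exp(-2j * np.pi * n / period)))

# Synthetic, asymmetric periodic signal (assumption: stands in for voiced speech)
period = 100
t = np.arange(2000)
x = (np.sin(2 * np.pi * t / period)
     + 0.6 * np.sin(4 * np.pi * t / period + 0.7)
     + 0.3 * np.sin(6 * np.pi * t / period + 1.9))

m_pos = local_moments(x, 61)    # original polarity
m_neg = local_moments(-x, 61)   # inverted polarity

# Wrapped phase difference of the oscillating skewness between the two polarities
dphi = np.angle(np.exp(1j * (phase_at(m_neg[2], period) - phase_at(m_pos[2], period))))
print(abs(dphi))  # pi: negating x negates the whole skewness track
```

The window is deliberately shorter than the period so that the local moments oscillate. A complete detector would still need a rule relating the observed phase relationship between moments to positive or negative polarity, and a pitch-dependent window for real speech; those parts of the method are described in the paper and are not reproduced here.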

Keywords

Speech processing; Speech analysis; Speech polarity; Glottal source; Pitch-synchronous; Glottal closure instant

Notes

Acknowledgments

The authors would like to thank the Walloon Region, Belgium, for its support (grant WIST 3 COMPTOUX # 1017071).

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. TCTS Lab, University of Mons, Mons, Belgium
