Detecting Speech Polarity with High-Order Statistics
An inversion of the speech polarity, which depends on the recording setup, may seriously degrade the performance of various speech processing applications. Its automatic detection from the speech signal is therefore required as a preliminary step to ensure that such techniques behave well. In this paper, a new method for polarity detection is proposed. The approach relies on oscillating statistical moments whose phase shift depends on the speech polarity. This dependency arises from the higher-order statistics in the moment calculation. The proposed approach is compared to state-of-the-art techniques on 10 speech corpora. Their performance in clean conditions and their robustness to additive noise are discussed.
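The polarity dependence of higher-order moments can be illustrated with a minimal sketch. It is not the detection method proposed in the paper: it only shows, on a synthetic signal, that a windowed odd-order moment changes sign when the waveform is inverted while an even-order moment does not. The function name `sliding_moment`, the toy pulse train, the Blackman window, and its length are all assumptions made for illustration.

```python
import numpy as np

def sliding_moment(signal, order, win_len):
    """Local raw moment: windowed average of signal**order (illustrative helper)."""
    window = np.blackman(win_len)
    window /= window.sum()  # normalize so the output is a weighted mean
    return np.convolve(signal ** order, window, mode="same")

# Toy asymmetric periodic signal standing in for voiced speech (assumption):
# an exponentially decaying pulse repeated every `period` samples.
fs = 16000
period = 160  # 100 Hz at 16 kHz
n = np.arange(fs // 4)
pulse = np.exp(-(n % period) / 20.0)
speech_like = pulse - pulse.mean()
inverted = -speech_like  # same signal with inverted polarity

win_len = int(2.5 * period)  # window length chosen arbitrarily for this sketch

odd_pos = sliding_moment(speech_like, 3, win_len)
odd_neg = sliding_moment(inverted, 3, win_len)
even_pos = sliding_moment(speech_like, 4, win_len)
even_neg = sliding_moment(inverted, 4, win_len)

# Odd-order moment flips sign under inversion; even-order moment is unchanged.
print(np.allclose(odd_neg, -odd_pos))
print(np.allclose(even_neg, even_pos))
```

Because only the odd-order moment carries the sign of the waveform, comparing it against an even-order moment gives a polarity-dependent relationship, which is the kind of asymmetry the abstract attributes to higher-order statistics.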
Keywords: Speech processing, Speech analysis, Speech polarity, Glottal source, Pitch-synchronous, Glottal closure instant
The authors would like to thank the Walloon Region, Belgium, for its support (grant WIST 3 COMPTOUX # 1017071).