Monaural Voiced Speech Separation with Multipitch Tracking

  • Wei Jiang
  • Pengfei Hu
  • Shan Liang
  • Wenju Liu
  • Zhanlei Yang
Part of the Communications in Computer and Information Science book series (CCIS, volume 321)

Abstract

Separating voiced speech from interference in monaural recordings is an important but challenging task. Since multipitch tracking substantially improves speech separation in CASA systems, we propose a new multipitch determination algorithm that can be used under various noise conditions. During multipitch estimation, a new representation is combined with a maximum-support constraint and a harmonic-completeness constraint, allowing the algorithm to reliably detect up to two pitches in each frame. Sequential grouping is then performed with a new target-pitch tracking strategy. System evaluations show that our algorithm yields significantly better speech separation results than previous methods.
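The abstract describes detecting up to two pitches per frame before sequential grouping. The paper's actual algorithm is not reproduced here; as a purely illustrative sketch of the general idea, the fragment below picks at most two pitch candidates per frame from the normalized autocorrelation, keeping only peaks whose strength clears a relative threshold (the function name, threshold, and search range are assumptions for illustration, not the authors' method):

```python
import numpy as np

def pitch_candidates(frame, sr, fmin=80.0, fmax=400.0,
                     max_pitches=2, rel_thresh=0.5):
    """Illustrative frame-wise pitch candidate picker (not the paper's
    algorithm): keep at most `max_pitches` autocorrelation peaks whose
    normalized strength exceeds `rel_thresh`."""
    frame = frame - np.mean(frame)
    # One-sided autocorrelation, lags 0 .. len(frame)-1.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:          # silent frame: no voicing evidence
        return []
    ac = ac / ac[0]         # normalize so the zero-lag value is 1
    lo = max(int(sr / fmax), 1)
    hi = min(int(sr / fmin), len(ac) - 2)
    peaks = []
    for lag in range(lo, hi):
        # Local maximum above the relative strength threshold.
        if ac[lag] > ac[lag - 1] and ac[lag] >= ac[lag + 1] \
                and ac[lag] > rel_thresh:
            peaks.append((ac[lag], sr / lag))
    peaks.sort(reverse=True)            # strongest candidates first
    return [freq for _, freq in peaks[:max_pitches]]
```

A real CASA front end would work on per-channel correlograms and apply cross-frame continuity constraints; this sketch only shows the per-frame "at most two candidates" selection step. Note that subharmonic (octave) peaks can also survive the threshold, which is one reason constraints such as harmonic completeness are needed.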

Keywords

voiced speech separation · computational auditory scene analysis · multipitch determination



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Wei Jiang 1
  • Pengfei Hu 1
  • Shan Liang 1
  • Wenju Liu 1
  • Zhanlei Yang 1

  1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
