
Multi-channel Speaker Separation Using Speaker-Aware Beamformer

  • Conference paper
Intelligent Computing (CompCom 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 997)

Abstract

In this work, we address the problem of multi-channel speech separation. Following a recently proposed method, we use a localization network to estimate time delays, compute steering vectors from these delays, and derive spatial filters from the steering vectors and the mixtures. Such a beamformer struggles to separate speech when the speakers are close to each other or when their locations are estimated inaccurately. To overcome this problem, we propose informing the beamformer about the speakers, so that it tracks each speaker throughout an utterance using not only location but also speaker characteristics. We investigate and compare different ways of using speaker information in beamforming, such as multiplying steering vectors by speaker weights. Experiments on simulated data demonstrate that the proposed method can improve the performance of both speech separation and speech recognition.
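
The delay-to-steering-vector-to-spatial-filter step can be illustrated for a single frequency bin with a minimal sketch, under several assumptions that are not taken from the paper. The per-microphone delays are simply given (the paper estimates them with a localization network, which is not reproduced here); a far-field model d_m = exp(-j 2π f τ_m) is assumed; and the spatial filter is assumed to be an MVDR beamformer with diagonal loading, computed from the mixture covariance, since the abstract does not name the filter type. The speaker-weighting step is not shown because the abstract does not specify how the weights enter. All function names (`steering_vector`, `spatial_covariance`, `mvdr_weights`, `apply_beamformer`) and the toy source delays are hypothetical, and plain Python is used to keep the example dependency-free.

```python
import cmath
import math
import random


def steering_vector(delays, freq_hz):
    """Far-field steering vector at one frequency: d_m = exp(-j * 2*pi * f * tau_m).

    `delays` holds per-microphone time delays in seconds relative to a
    reference microphone (so delays[0] is typically 0.0). Assumed model.
    """
    return [cmath.exp(-2j * math.pi * freq_hz * tau) for tau in delays]


def spatial_covariance(frames):
    """Sample covariance R = (1/T) * sum_t x_t x_t^H of T multichannel STFT frames at one frequency."""
    m = len(frames[0])
    r = [[0j] * m for _ in range(m)]
    for x in frames:
        for i in range(m):
            for k in range(m):
                r[i][k] += x[i] * x[k].conjugate()
    return [[v / len(frames) for v in row] for row in r]


def solve(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting (a is not modified)."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    y = [0j] * n
    for row in range(n - 1, -1, -1):
        s = aug[row][n] - sum(aug[row][c] * y[c] for c in range(row + 1, n))
        y[row] = s / aug[row][row]
    return y


def mvdr_weights(r, d, loading=1e-3):
    """Assumed MVDR filter w = R^{-1} d / (d^H R^{-1} d), with diagonal loading."""
    m = len(d)
    # Load the diagonal in proportion to the average channel power.
    eps = loading * sum(r[i][i].real for i in range(m)) / m
    r_loaded = [[r[i][k] + (eps if i == k else 0.0) for k in range(m)] for i in range(m)]
    r_inv_d = solve(r_loaded, d)
    denom = sum(d[i].conjugate() * r_inv_d[i] for i in range(m))
    return [v / denom for v in r_inv_d]


def apply_beamformer(w, x):
    """Beamformer output y = w^H x for one multichannel STFT bin."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))


# Hypothetical toy setup: 4 microphones, one 1 kHz bin, two simulated point sources.
rng = random.Random(0)


def cgauss(scale=1.0):
    """Circular complex Gaussian sample with variance scale**2."""
    s = scale * math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


freq = 1000.0
d_target = steering_vector([0.0, 1.0e-4, 2.0e-4, 3.0e-4], freq)
d_interf = steering_vector([0.0, -0.5e-4, -1.0e-4, -1.5e-4], freq)

frames = []
for _ in range(2000):
    src, intf = cgauss(), cgauss()
    frames.append([src * dt + intf * di + cgauss(0.01) for dt, di in zip(d_target, d_interf)])

# The filter is derived from the target steering vector and the mixture covariance.
w = mvdr_weights(spatial_covariance(frames), d_target)
target_gain = apply_beamformer(w, d_target)      # distortionless constraint: equals 1
interferer_gain = apply_beamformer(w, d_interf)  # small: the second source is attenuated
print(abs(target_gain), abs(interferer_gain))
```

By construction the filter passes the target direction with unit gain, while the second source is attenuated. Because the covariance includes the target and is estimated from a finite number of frames, a little interferer leakage remains; the loading factor `loading=1e-3` is an arbitrary illustrative value.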

Author information

Correspondence to Conggui Liu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, C., Liu, Y. (2019). Multi-channel Speaker Separation Using Speaker-Aware Beamformer. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Intelligent Computing. CompCom 2019. Advances in Intelligent Systems and Computing, vol 997. Springer, Cham. https://doi.org/10.1007/978-3-030-22871-2_32