Far Field Speech Enhancement at Low SNR in Presence of Nonstationary Noise Based on Spectral Masking and MVDR Beamforming

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11096)


Low Signal-to-Noise Ratio (SNR) conditions are highly likely during remote speech acquisition. This paper presents a multi-channel signal processing method for enhancing remotely acquired speech in the presence of strong nonstationary noise. The approach builds upon Minimum Variance Distortionless Response (MVDR) beamforming, additionally filtering the multi-channel signal with a spectral mask prior to estimating the MVDR beamforming coefficients. The mask is obtained by clustering the mixture observation vectors under a spatial correlation model estimated with a Complex Gaussian Mixture Model (CGMM). The posterior probabilities produced by the CGMM Expectation-Maximization (EM) algorithm yield a cumulative noise mask, which is applied to the mixture; the masked mixture is then used to compute the MVDR covariance matrix and beamforming coefficients. The method is tested on four mixtures acquired with a 66-microphone array at various low SNRs. The results are compared to conventional MVDR and several other methods and evaluated with the Signal-to-Distortion Ratio (SDR) improvement metric. In the majority of cases the presented method improves SDR by at least 1–1.5 dB over conventional MVDR, and it performs best specifically at low SNRs of \(-15\) to \(-20\) dB.
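The pipeline summarized above — estimate a time-frequency noise mask, derive noise and speech spatial covariances from the masked mixture, then form MVDR weights — can be sketched in the STFT domain as follows. This is a minimal illustration, not the authors' implementation: the CGMM-EM step is assumed to have already produced the noise mask, mask-weighted covariance estimates stand in for masking the mixture itself, and the steering vector is taken as the principal eigenvector of the speech covariance (a common choice in mask-based MVDR). The function name `mvdr_with_mask` and all parameter names are hypothetical.

```python
import numpy as np

def mvdr_with_mask(Y, noise_mask, ref_mic=0):
    """Mask-based MVDR beamforming in the STFT domain.

    Y          : (F, T, M) complex STFT of the M-channel mixture
    noise_mask : (F, T) values in [0, 1]; 1 = time-frequency bin dominated
                 by noise (e.g. a CGMM posterior probability)
    Returns    : (F, T) complex STFT of the enhanced signal
    """
    F, T, M = Y.shape
    out = np.zeros((F, T), dtype=complex)
    eps = 1e-10
    for f in range(F):
        Yf = Y[f]                      # (T, M) frames for this frequency bin
        wn = noise_mask[f][:, None]    # per-frame noise weights
        ws = 1.0 - wn                  # per-frame speech-presence weights
        # Mask-weighted spatial covariance matrices R = sum_t w_t y_t y_t^H
        Rn = (wn * Yf).T @ Yf.conj() / (wn.sum() + eps)
        Rs = (ws * Yf).T @ Yf.conj() / (ws.sum() + eps)
        # Steering vector: principal eigenvector of the speech covariance,
        # normalized for a distortionless response at the reference mic
        _, eigvec = np.linalg.eigh(Rs)
        h = eigvec[:, -1]
        h = h / (h[ref_mic] + eps)
        # MVDR weights: w = Rn^{-1} h / (h^H Rn^{-1} h)
        Rn_inv_h = np.linalg.solve(Rn + eps * np.eye(M), h)
        w = Rn_inv_h / (h.conj() @ Rn_inv_h + eps)
        out[f] = Yf @ w.conj()         # apply w^H y per frame
    return out
```

Weighting each frame's outer product by the mask is equivalent, up to normalization, to computing the covariance of the masked mixture, which is the role the cumulative noise mask plays in the method above.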


Speech enhancement · Low SNR · Microphone array · Nonstationary noise · MVDR · Complex Gaussian Mixture Model (CGMM)



This research was financially supported by the Ministry of Education and Science of the Russian Federation, Contract 14.575.21.0132 (IDRFMEFI57517X0132).


  1. Araki, S., Okada, M., Higuchi, T., Ogawa, A., Nakatani, T.: Spatial correlation model based observation vector clustering and MVDR beamforming for meeting recognition. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, pp. 385–389, March 2016
  2. Benesty, J., Chen, J., Huang, Y., Cohen, I.: Noise Reduction in Speech Processing. Springer Topics in Signal Processing, vol. 2. Springer, Heidelberg (2010)
  3. Brandstein, M., Ward, D. (eds.): Microphone Arrays: Signal Processing Techniques and Applications. Digital Signal Processing. Springer, Heidelberg (2010)
  4. Cauchi, B., et al.: Combination of MVDR beamforming and single-channel spectral processing for enhancing noisy and reverberant speech. EURASIP J. Adv. Signal Process. 61 (2015)
  5. Erdogan, H., Hershey, J.R., Watanabe, S., Mandel, M.I., Le Roux, J.: Improved MVDR beamforming using single-channel mask prediction networks. In: Proceedings of INTERSPEECH, pp. 1981–1985 (2016)
  6. Habets, E.A.P., Benesty, J.: A two-stage beamforming approach for noise reduction and dereverberation. IEEE Trans. Audio Speech Lang. Process. 21(5), 945–958 (2013)
  7. Higuchi, T., Ito, N., Araki, S., Yoshioka, T., Delcroix, M., Nakatani, T.: Online MVDR beamformer based on complex Gaussian mixture model with spatial prior for noise robust ASR. IEEE/ACM Trans. Audio Speech Lang. Process. 25(4), 780–793 (2017)
  8. Hong, L., Rosca, J., Balan, R.: Independent component analysis based single channel speech enhancement. In: Proceedings of the 3rd IEEE International Symposium on Signal Processing and Information Technology, Darmstadt, pp. 522–525, December 2003
  9. Jaureguiberry, X., Vincent, E., Richard, G.: Fusion methods for speech enhancement and audio source separation. IEEE Trans. Audio Speech Lang. Process. 24(7), 1266–1279 (2016)
  10. Korenevsky, M.L., Matveev, Y.N., Yakovlev, A.V.: Investigation and development of methods for improving robustness of automatic speech recognition algorithms in complex acoustic environments. In: Anisimov, K.V., et al. (eds.) Proceedings of the Scientific-Practical Conference “Research and Development - 2016”, pp. 11–20. Springer, Cham (2018)
  11. Oleinik, A.: A lightweight face tracking system for video surveillance. In: Campilho, A., Karray, F. (eds.) ICIAR 2016. LNCS, vol. 9730. Springer, Cham (2016)
  12. Prudnikov, A., Korenevsky, M., Aleinik, S.: Adaptive beamforming and adaptive training of DNN acoustic models for enhanced multichannel noisy speech recognition. In: IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), Scottsdale, pp. 401–408, December 2015
  13. Stolbov, M., Aleinik, S.: Speech enhancement with microphone array using frequency-domain alignment technique. In: Proceedings of the Audio Engineering Society 54th International Conference, Audio Forensics, London, pp. 1–6, June 2014
  14. Upadhyay, N., Karmakar, A.: Speech enhancement using spectral subtraction-type algorithms: a comparison and simulation study. Procedia Comput. Sci. 54, 574–584 (2015)
  15. Zhao, Y., Jensen, J.R., Christensen, M.G., Doclo, S., Chen, J.: Experimental study of robust beamforming techniques for acoustic applications. In: Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 86–90, October 2017

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Speech Information Systems, ITMO University, St. Petersburg, Russia
  2. Speech Technology Center, St. Petersburg, Russia