Far Field Speech Enhancement at Low SNR in Presence of Nonstationary Noise Based on Spectral Masking and MVDR Beamforming
Low Signal to Noise Ratio (SNR) conditions are highly likely during remote speech acquisition. This paper addresses multi-channel processing of remote speech signals for speech enhancement in the presence of strong nonstationary noise. The presented approach builds upon the Minimum Variance Distortionless Response (MVDR) method and additionally filters the multi-channel signal with a spectral mask before the MVDR beamforming coefficients are estimated. This mask is obtained by clustering the mixture observation vectors according to a spatial correlation model, which is estimated with a Complex Gaussian Mixture Model (CGMM). The posterior probabilities obtained during the CGMM Expectation-Maximization (EM) algorithm are used to estimate the cumulative noise mask, which is applied to the mixture. The masked mixture is then used to calculate the MVDR covariance matrix and beamforming coefficients. The method is tested on four mixtures acquired with a 66-microphone array at various low SNRs. The results are compared with conventional MVDR and several other methods and evaluated using the Signal to Distortion Ratio (SDR) improvement metric. Compared with MVDR, the presented method yields an SDR improvement of at least 1–1.5 dB in the majority of cases and performs best at low SNRs of \(-15\) to \(-20\) dB.
Keywords: Speech enhancement, Low SNR, Microphone array, Nonstationary noise, MVDR, Complex Gaussian Mixture Model (CGMM)
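The masking and beamforming steps summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the noise mask has already been estimated (the CGMM EM step is not shown), it assumes a steering vector is available per frequency bin (the abstract does not say how it is obtained), and the function name `masked_mvdr`, the mask normalization, and the diagonal loading term `eps` are all hypothetical choices made for illustration.

```python
import numpy as np


def masked_mvdr(Y, noise_mask, steering, eps=1e-6):
    """Mask-based MVDR sketch (illustrative, not the paper's code).

    Y          : (F, T, M) complex STFT of the M-channel mixture
    noise_mask : (F, T) real mask in [0, 1], assumed given (e.g. from CGMM posteriors)
    steering   : (F, M) complex steering vectors, assumed known
    eps        : diagonal loading factor (an assumption, for numerical stability)
    """
    F, T, M = Y.shape

    # Apply the noise mask to the mixture.
    Yn = noise_mask[..., None] * Y                       # (F, T, M)

    # Noise spatial covariance per frequency bin from the masked mixture.
    Phi = np.einsum('ftm,ftn->fmn', Yn, Yn.conj())       # (F, M, M)
    norm = np.maximum((noise_mask ** 2).sum(axis=1), eps)
    Phi = Phi / norm[:, None, None]

    # Diagonal loading keeps the covariance matrices well conditioned.
    trace = np.trace(Phi, axis1=1, axis2=2).real
    Phi = Phi + (eps * trace / M + eps)[:, None, None] * np.eye(M)

    # MVDR weights: w = Phi^{-1} d / (d^H Phi^{-1} d).
    num = np.linalg.solve(Phi, steering[..., None])[..., 0]   # (F, M)
    den = np.einsum('fm,fm->f', steering.conj(), num)         # (F,)
    W = num / den[:, None]

    # Beamformer output: X[f, t] = w[f]^H y[f, t].
    X = np.einsum('fm,ftm->ft', W.conj(), Y)
    return X, W
```

By construction the weights satisfy the distortionless constraint \(w^H d = 1\) in every frequency bin, so a signal arriving along the assumed steering vector passes unchanged while the noise power described by the masked covariance is minimized.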
This research was financially supported by the Ministry of Education and Science of the Russian Federation, Contract 14.575.21.0132 (ID RFMEFI57517X0132).