Improving Reverberant Speech Separation with Binaural Cues Using Temporal Context and Convolutional Neural Networks
Given binaural features as input, such as the interaural level difference and the interaural phase difference, Deep Neural Networks (DNNs) have recently been used to localize sound sources in a mixture of speech signals and/or noise, and to create time-frequency masks for estimating the sound sources in reverberant rooms. Here, we explore a more advanced system in which the feed-forward DNNs are replaced by Convolutional Neural Networks (CNNs). In addition, the frames adjacent to each time frame (those occurring before and after it) are used to exploit contextual information, thus improving the localization and separation of each source. The quality of the separation results is evaluated in terms of Signal-to-Distortion Ratio (SDR).
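The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `binaural_cues`, `stack_context` and `sdr_db` are hypothetical, the edge padding used for the temporal context is an assumption (the abstract does not say how the first and last frames are handled), and the SDR shown is a plain energy ratio rather than the full decomposition used by standard evaluation toolkits.

```python
import cmath
import math

EPS = 1e-12  # guards against log(0) and division by zero


def binaural_cues(left, right):
    """Return (ILD in dB, IPD in radians) for one time-frequency bin.

    left, right: complex STFT coefficients of the two ear signals.
    """
    ild = 20.0 * math.log10((abs(left) + EPS) / (abs(right) + EPS))
    ipd = cmath.phase(left * right.conjugate())  # wrapped to [-pi, pi]
    return ild, ipd


def stack_context(frames, n_context):
    """Concatenate each frame with n_context neighbours on either side.

    frames: list of per-frame feature lists. Edges are padded by repeating
    the first or last frame (an assumption, not taken from the source).
    """
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        window = []
        for k in range(t - n_context, t + n_context + 1):
            window.extend(frames[min(max(k, 0), last)])
        stacked.append(window)
    return stacked


def sdr_db(reference, estimate):
    """Simplified SDR in dB: reference energy over error energy."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10((signal + EPS) / (error + EPS))
```

With one context frame on each side, every stacked vector is three times the length of a single frame, which is the kind of widened input a CNN or DNN could consume in place of an isolated frame.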
Keywords: Convolutional Neural Networks, Binaural cues, Reverberant rooms, Speech separation, Contextual information
The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7-PEOPLE-2013-ITN) under grant agreement no 607290 SpaRTaN.