Abstract
This article addresses the problem of single-channel speech enhancement in the presence of nonstationary noise. A novel filter bank approach based on a modified nonnegative matrix factorization (NMF) is proposed. The method applies filter bank analysis to the noisy input signal and then extracts the speech signal with the modified NMF (MNMF), which learns a speaker-independent speech dictionary using a precomputed noise dictionary. The proposed method works well with real-world nonstationary noise, independently of the speaker. It is evaluated on a speech database containing different speakers and shows promising enhancement performance, reducing the nonstationary noise while improving PESQ (perceptual evaluation of speech quality) scores compared with other competitive state-of-the-art methods.
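The idea of learning a speech dictionary while a precomputed noise dictionary stays fixed can be illustrated with a small sketch. This is not the chapter's MNMF: it uses the standard Lee-Seung multiplicative updates for the Euclidean cost, omits the filter bank stage, and assumes the input is already a nonnegative matrix (for example subband magnitudes, frequency by frame). The names `separate`, `w_noise`, `n_speech`, and the Wiener-like mask at the end are assumptions made for illustration.

```python
import random

EPS = 1e-9  # keeps the divisions below away from zero


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def mult_update(x, num, den):
    """Element-wise multiplicative update x * num / den."""
    return [[xi * n / (d + EPS) for xi, n, d in zip(rx, rn, rd)]
            for rx, rn, rd in zip(x, num, den)]


def separate(v, w_noise, n_speech=2, n_iter=200, seed=0):
    """Factor v ~ [w_speech, w_noise] @ h with w_noise held fixed.

    v       : nonnegative matrix, frequency bins x frames (assumed input)
    w_noise : precomputed noise dictionary, frequency bins x noise atoms
    Returns the masked speech estimate, the learned speech dictionary,
    and the stacked activations (speech rows first, then noise rows).
    """
    rng = random.Random(seed)
    n_freq, n_frames = len(v), len(v[0])
    n_noise = len(w_noise[0])
    w_speech = [[rng.random() + 0.1 for _ in range(n_speech)]
                for _ in range(n_freq)]
    h = [[rng.random() + 0.1 for _ in range(n_frames)]
         for _ in range(n_speech + n_noise)]

    for _ in range(n_iter):
        # Stack the learned speech atoms next to the fixed noise atoms.
        w = [rs + rn for rs, rn in zip(w_speech, w_noise)]
        wt = transpose(w)
        # Update all activations (speech and noise).
        h = mult_update(h, matmul(wt, v), matmul(wt, matmul(w, h)))
        # Update only the speech dictionary; w_noise is never touched.
        hs_t = transpose(h[:n_speech])
        approx = matmul(w, h)
        w_speech = mult_update(w_speech, matmul(v, hs_t), matmul(approx, hs_t))

    speech = matmul(w_speech, h[:n_speech])
    noise = matmul(w_noise, h[n_speech:])
    # Wiener-like soft mask: share of each entry attributed to speech.
    s_hat = [[vi * s / (s + n + EPS) for vi, s, n in zip(rv, rs, rn)]
             for rv, rs, rn in zip(v, speech, noise)]
    return s_hat, w_speech, h


if __name__ == "__main__":
    # Toy 4 x 5 "spectrogram" and a single hypothetical noise atom.
    v = [[1.0, 2.0, 0.5, 1.5, 1.0],
         [0.5, 1.0, 0.3, 0.8, 0.6],
         [2.0, 0.2, 1.0, 0.1, 1.2],
         [1.0, 0.1, 0.6, 0.1, 0.7]]
    w_noise = [[1.0], [0.5], [0.1], [0.1]]
    s_hat, w_speech, h = separate(v, w_noise)
    for row in s_hat:
        print([round(x, 3) for x in row])
```

Plain Python lists keep the sketch free of dependencies; in practice an array library would be the natural choice. Because the mask lies between 0 and 1, each entry of the speech estimate is bounded by the corresponding entry of the noisy input.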
Notes
1. The Noizeus speech corpus is publicly available online at the following URL: http://www.utdallas.edu/~loizou/speech/noizeus.
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Sattar, F., Jin, F. (2016). A Modified NMF-Based Filter Bank Approach for Enhancement of Speech Data in Nonstationary Noise. In: Naik, G. (ed.) Non-negative Matrix Factorization Techniques. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48331-2_5
DOI: https://doi.org/10.1007/978-3-662-48331-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48330-5
Online ISBN: 978-3-662-48331-2
eBook Packages: Engineering, Engineering (R0)