Low rank sparse decomposition model based speech enhancement using gammatone filterbank and Kullback–Leibler divergence

Saleem, Nasir; Ijaz, Gohar

doi:10.1007/s10772-018-9500-2

Published: 08 March 2018

Volume 21, pages 217–231, (2018)

International Journal of Speech Technology

Abstract

In speech enhancement systems, the key stage is noise estimation, which generally requires prior speech or noise models. However, such prior models are sometimes difficult to obtain. This paper presents a speech enhancement algorithm that requires no prior knowledge of speech or noise. It is based on a low-rank and sparse matrix decomposition model that uses a gammatone filterbank and the Kullback–Leibler divergence to estimate noise and speech by decomposing the noisy speech magnitude spectra into a low-rank noise part and a sparse speech part. Noise signals are modeled as low-rank components because noise spectra in different time frames are usually highly correlated with each other, while speech signals are modeled as sparse components because they are relatively sparse in the time–frequency domain. Based on these assumptions, we have developed an alternative speech enhancement algorithm that separates the speech and noise magnitude spectra by imposing rank and sparsity constraints; the enhanced time-domain speech is then reconstructed from the sparse matrix. The proposed technique differs significantly from existing speech enhancement techniques in that it enhances noisy speech in an uncomplicated manner, without needing a noise estimation algorithm to locate noise-only excerpts. Moreover, it achieves improved performance in low-SNR conditions and does not need to know the exact distribution of the noise signals. Experimental results show that the proposed technique performs better than conventional techniques in many types of strong noise conditions, yielding less residual noise, lower speech distortion and better overall speech quality. A notable improvement in PESQ, SNRSeg, SIG and BAK is observed with the proposed algorithm over the baseline algorithms.
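
The low-rank plus sparse idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it fits the data with a squared-error (Frobenius) criterion instead of the Kullback–Leibler divergence, and it runs on a synthetic nonnegative matrix instead of gammatone-filterbank magnitude spectra. The function names (`soft_threshold`, `low_rank_sparse`) and all parameter values (`rank`, `lam`, `n_iter`, matrix sizes) are assumptions chosen for illustration only.

```python
import numpy as np


def soft_threshold(a, tau):
    """Elementwise shrinkage: pull every entry toward zero by tau."""
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)


def low_rank_sparse(X, rank=1, lam=0.5, n_iter=50):
    """Alternate between a rank-limited fit L and a sparse residual S.

    Assumed illustrative scheme (squared-error fit, not KL divergence):
      L <- best rank-`rank` approximation of X - S (truncated SVD)
      S <- soft-thresholded X - L
    """
    L = np.zeros_like(X)
    S = np.zeros_like(X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        S = soft_threshold(X - L, lam)
    return L, S


# Synthetic stand-in for a noisy magnitude spectrogram (frequency x frames).
rng = np.random.default_rng(0)
n_freq, n_frames = 64, 100

# "Noise": highly correlated across frames, here exactly rank 1.
noise = np.outer(rng.uniform(0.5, 1.0, n_freq), rng.uniform(0.8, 1.2, n_frames))

# "Speech": a few large entries scattered over the time-frequency plane.
speech = np.zeros((n_freq, n_frames))
mask = rng.random((n_freq, n_frames)) < 0.05
speech[mask] = rng.uniform(3.0, 5.0, mask.sum())

X = noise + speech
L, S = low_rank_sparse(X, rank=1, lam=0.5)

print("rank of L:", np.linalg.matrix_rank(L))
print("fraction of nonzero entries in S:", np.count_nonzero(S) / S.size)
```

In this sketch `L` plays the role of the low-rank noise estimate and `S` the sparse speech estimate; in the setting the abstract describes, the enhanced time-domain signal would be reconstructed from the sparse part.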

Acknowledgements

The authors would like to thank the editor and anonymous reviewers for their valuable and constructive comments.

Author information

Authors and Affiliations

Department of Electrical Engineering, Faculty of Engineering & Technology, Gomal University, Dera Ismail Khan, Pakistan
Nasir Saleem & Gohar Ijaz

Corresponding author

Correspondence to Nasir Saleem.

About this article

Cite this article

Saleem, N., Ijaz, G. Low rank sparse decomposition model based speech enhancement using gammatone filterbank and Kullback–Leibler divergence. Int J Speech Technol 21, 217–231 (2018). https://doi.org/10.1007/s10772-018-9500-2

Received: 24 October 2017
Accepted: 04 March 2018
Published: 08 March 2018
Issue Date: June 2018
DOI: https://doi.org/10.1007/s10772-018-9500-2
