Sparse Representations for Single Channel Speech Enhancement Based on Voiced/Unvoiced Classification

Ben Messaoud, Mohamed Anouar; Bouzid, Aïcha

doi:10.1007/s00034-016-0384-6

Published: 02 September 2016

Volume 36, pages 1912–1933 (2017)

Circuits, Systems, and Signal Processing

Mohamed Anouar Ben Messaoud¹ & Aïcha Bouzid¹

Abstract

The approach presented herein relies on a new voicing decision algorithm based on the characteristics of the multi-scale product (MP). The MP is the product of the wavelet transform coefficients of the signal at several scales. According to the voicing decision, an improved subspace decomposition is applied to the voiced segments of the noisy speech signal, and a multi-scale principal component analysis is applied to the unvoiced segments of the same signal. The voiced frames are decomposed into three subspaces: sparse, low-rank, and residual noise components, and the computation of these components is formulated as a segregation problem. In the unvoiced frames, we combine the straightforward multivariate generalization of the wavelet denoising technique with the principal component analysis method. Experiments on the NOIZEUS and NTT databases show that the proposed approach obtains satisfactory results for most types of noise with little speech degradation and outperforms several competitive methods.
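
The multi-scale product described in the abstract can be illustrated with a minimal sketch. It only shows the general idea of multiplying wavelet coefficients across scales and is not the authors' implementation: the first-derivative-of-Gaussian wavelet, the dyadic scales (2, 4, 8), the L1 normalisation, and the function names `gaussian_derivative_wavelet` and `multiscale_product` are all assumptions chosen for illustration. The voicing decision itself is not reproduced here.

```python
import numpy as np

def gaussian_derivative_wavelet(scale, width=4.0):
    """Sampled first derivative of a Gaussian (hypothetical wavelet choice)."""
    half = int(np.ceil(width * scale))
    t = np.arange(-half, half + 1, dtype=float)
    psi = -t * np.exp(-t**2 / (2.0 * scale**2))
    # L1 normalisation so that responses at different scales are comparable
    # (an illustrative choice, not taken from the article).
    return psi / np.sum(np.abs(psi))

def multiscale_product(x, scales=(2, 4, 8)):
    """Pointwise product of wavelet coefficients computed at several scales."""
    x = np.asarray(x, dtype=float)
    product = np.ones_like(x)
    for s in scales:
        # mode="same" keeps the output aligned with x; this assumes x is
        # longer than the longest wavelet (65 samples for scale 8).
        coeffs = np.convolve(x, gaussian_derivative_wavelet(s), mode="same")
        product *= coeffs
    return product

# Toy input: a rectangular pulse with a rising edge at sample 150
# and a falling edge at sample 250.
x = np.zeros(400)
x[150:250] = 1.0
mp = multiscale_product(x)
peak = int(np.argmax(mp))  # location of the strongest positive response
```

In this toy case the product is large only where the coefficients at every scale respond to the same discontinuity, which is why it concentrates around the rising edge and stays at zero in the flat regions. Real speech would call for frame-wise processing and scale choices matched to the sampling rate.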

Acknowledgments

The authors would like to thank the editor and the anonymous reviewers for their helpful and constructive comments.

Author information

Authors and Affiliations

Electrical Engineering Department, National School of Engineers of Tunis, University of Tunis El Manar, Tunis, Tunisia
Mohamed Anouar Ben Messaoud & Aïcha Bouzid

Corresponding author

Correspondence to Mohamed Anouar Ben Messaoud.

About this article

Cite this article

Ben Messaoud, M.A., Bouzid, A. Sparse Representations for Single Channel Speech Enhancement Based on Voiced/Unvoiced Classification. Circuits Syst Signal Process 36, 1912–1933 (2017). https://doi.org/10.1007/s00034-016-0384-6

Received: 13 May 2015
Revised: 05 August 2016
Accepted: 08 August 2016
Published: 02 September 2016
Issue Date: May 2017
DOI: https://doi.org/10.1007/s00034-016-0384-6
