Sparse Representations for Single Channel Speech Enhancement Based on Voiced/Unvoiced Classification

Circuits, Systems, and Signal Processing

Abstract

The approach presented herein relies on a new voicing decision algorithm based on the characteristics of the multi-scale product (MP), which is obtained by multiplying the wavelet transform coefficients of the signal at several scales. According to the voicing decision, an improved subspace decomposition is applied to the voiced segments of the noisy speech signal, and a multi-scale principal component analysis is applied to the unvoiced segments of the same signal. The voiced frames are decomposed into three subspaces: sparse, low-rank, and remaining noise components, and these components are estimated by treating the decomposition as a segregation problem. In the unvoiced frames, we combine the straightforward multivariate generalization of the wavelet denoising technique with the principal component analysis method. Experiments on the NOIZEUS and NTT databases show that the proposed approach obtains satisfactory results for most types of noise with little speech degradation and outperforms several competitive methods.
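
To make the multi-scale product concrete, the following is a minimal sketch and not the authors' implementation. It assumes a first-derivative-of-Gaussian analysis wavelet, three dyadic scales, and direct convolution; none of these choices is stated in the abstract, and the names `dog_wavelet` and `multiscale_product` are hypothetical. Multiplying the coefficients across scales reinforces sharp transitions that persist at every scale, while noise that is uncorrelated across scales tends to be attenuated.

```python
import numpy as np


def dog_wavelet(scale, width=4.0):
    """First derivative of a Gaussian dilated by `scale` samples (assumed wavelet)."""
    half = int(np.ceil(width * scale))
    t = np.arange(-half, half + 1, dtype=float)
    g = -t / scale**2 * np.exp(-t**2 / (2.0 * scale**2))
    # L1 normalisation so that coefficients are comparable across scales.
    return g / np.sum(np.abs(g))


def multiscale_product(x, scales=(2, 4, 8)):
    """Point-wise product of the wavelet coefficients of `x` at the given scales.

    `x` is expected to be at least as long as the widest wavelet so that the
    output of np.convolve(..., mode="same") keeps the length of `x`.
    """
    x = np.asarray(x, dtype=float)
    product = np.ones_like(x)
    for s in scales:  # the scales are an assumption of this sketch
        coeffs = np.convolve(x, dog_wavelet(s), mode="same")
        product *= coeffs
    return product


if __name__ == "__main__":
    # Toy signal: a unit step at sample 128, standing in for an abrupt transition.
    signal = np.zeros(256)
    signal[128:] = 1.0
    mp = multiscale_product(signal)
    print("largest MP value found at sample", int(np.argmax(mp)))
```

A voicing decision built on this quantity would still need a rule that maps MP features of each frame to a voiced or unvoiced label; that rule is not described in the abstract and is left out here.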

Acknowledgments

The authors would like to thank the editor and the anonymous reviewers for their helpful and constructive comments.

Corresponding author

Correspondence to Mohamed Anouar Ben Messaoud.

Cite this article

Ben Messaoud, M.A., Bouzid, A. Sparse Representations for Single Channel Speech Enhancement Based on Voiced/Unvoiced Classification. Circuits Syst Signal Process 36, 1912–1933 (2017). https://doi.org/10.1007/s00034-016-0384-6
