Abstract
The approach presented herein relies on a new voicing decision algorithm based on the characteristics of the multi-scale product (MP). The MP is obtained by multiplying the wavelet transform coefficients of the signal at several scales. According to the voicing decision, an improved subspace decomposition is applied to the voiced segments of the noisy speech signal, and a multi-scale principal component analysis is applied to the unvoiced segments of the same signal. The voiced frames are decomposed into three subspaces: a sparse component, a low-rank component, and a remaining noise component; estimating these components is treated as a segregation problem. In the unvoiced frames, we combine a straightforward multivariate generalization of the wavelet denoising technique with principal component analysis. Experiments on the NOIZEUS and NTT databases show that the proposed approach obtains satisfying results for most types of noise with little speech degradation and outperforms several competitive methods.
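As an illustration of the multi-scale product described above, the following minimal sketch multiplies, sample by sample, the wavelet coefficients of a signal computed at a few dyadic scales. The abstract does not specify the wavelet, the scales, or the decision rule, so the first-derivative-of-Gaussian wavelet, the scales (1, 2, 4), and the function names `gaussian_derivative_wavelet` and `multiscale_product` are assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np


def gaussian_derivative_wavelet(scale):
    """Sampled first derivative of a Gaussian (an assumed wavelet choice).

    The kernel spans +/- 4*scale samples and is normalised so that the
    absolute values of its taps sum to one.
    """
    half = int(np.ceil(4 * scale))
    t = np.arange(-half, half + 1, dtype=float)
    w = -t * np.exp(-t ** 2 / (2.0 * scale ** 2))
    return w / np.sum(np.abs(w))


def multiscale_product(x, scales=(1, 2, 4)):
    """Pointwise product of the wavelet coefficients of x at each scale.

    The signal must be at least as long as the widest kernel
    (8 * max(scales) + 1 samples) so that mode="same" keeps its length.
    """
    x = np.asarray(x, dtype=float)
    mp = np.ones_like(x)
    for s in scales:
        # Wavelet coefficients at this scale, aligned with the input samples.
        coeffs = np.convolve(x, gaussian_derivative_wavelet(s), mode="same")
        mp *= coeffs
    return mp


if __name__ == "__main__":
    # Hypothetical test signal: a noisy step, standing in for an abrupt
    # transition in a speech frame.
    rng = np.random.default_rng(0)
    signal = np.zeros(256)
    signal[128:] = 1.0
    noisy = signal + 0.05 * rng.standard_normal(signal.size)
    mp = multiscale_product(noisy)
    print("largest MP value found at sample", int(np.argmax(mp)))
```

Because a sharp transition produces large coefficients of the same sign at every scale while noise does not stay aligned across scales, the product tends to emphasise the transition and suppress the noise. With an odd number of scales the sign of the discontinuity is preserved in the product.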
Acknowledgments
The authors would like to thank the editor and the anonymous reviewers for their helpful and constructive comments.
Cite this article
Ben Messaoud, M.A., Bouzid, A. Sparse Representations for Single Channel Speech Enhancement Based on Voiced/Unvoiced Classification. Circuits Syst Signal Process 36, 1912–1933 (2017). https://doi.org/10.1007/s00034-016-0384-6
DOI: https://doi.org/10.1007/s00034-016-0384-6