Abstract
In this paper, we address the speech denoising problem, where Gaussian and coloured additive noises are to be removed from a given speech signal. Our approach is based on a redundant, analysis-sparse representation of the original speech signal. We pick an eigenvector of the Zauner unitary matrix and, under certain assumptions on the ambient dimension, use it as the window vector to generate a spark deficient Gabor frame. The analysis operator associated with such a frame is a (highly) redundant Gabor transform, which we use as a sparsifying transform in the denoising procedure. We conduct computational experiments on real-world speech data, using as baselines three Gabor transforms generated by state-of-the-art window vectors in time-frequency analysis, and compare their performance to that of the proposed Gabor transform. The results show that the proposed redundant Gabor transform consistently outperforms the baselines for all examined types of signals and noise.
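To make the notion of a Gabor analysis operator concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds the full system of L² time-frequency shifts of a window in dimension L and applies the resulting analysis operator to a signal. The window here is a random complex vector used purely as a stand-in (an assumption); the paper instead uses an eigenvector of the Zauner unitary matrix, which this sketch does not construct. The function name `gabor_analysis_matrix` is hypothetical.

```python
import numpy as np

def gabor_analysis_matrix(g):
    """Analysis operator of the full Gabor system generated by window g.

    Row (k, l) is the conjugate of the atom
    g_{k,l}[n] = exp(2*pi*i*l*n/L) * g[(n - k) mod L],
    so that (Phi @ x)[k*L + l] equals the inner product <x, g_{k,l}>.
    """
    L = len(g)
    n = np.arange(L)
    rows = []
    for k in range(L):                       # cyclic translation by k samples
        shifted = np.roll(g, k)              # shifted[n] = g[(n - k) mod L]
        for l in range(L):                   # modulation by frequency index l
            atom = np.exp(2j * np.pi * l * n / L) * shifted
            rows.append(np.conj(atom))
    return np.array(rows)                    # shape (L*L, L): redundancy L

rng = np.random.default_rng(0)
L = 7
# Stand-in window (assumption): NOT the Zauner eigenvector used in the paper.
g = rng.standard_normal(L) + 1j * rng.standard_normal(L)
Phi = gabor_analysis_matrix(g)

x = rng.standard_normal(L)                   # toy signal
coeffs = Phi @ x                             # L*L redundant Gabor coefficients

# The full system of L^2 shifts of any nonzero window is a tight frame
# with frame bound L * ||g||^2, so the adjoint inverts the analysis step.
frame_bound = L * np.linalg.norm(g) ** 2
x_rec = Phi.conj().T @ coeffs / frame_bound
```

The explicit matrix is only practical for small L; for real speech signals a fast Gabor transform implementation would be used instead.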
Data Availability
The Librispeech dataset we utilized is publicly available at http://www.openslr.org/12.
Notes
Equivalently, we call such frames full spark frames.
That is, the sparsity and cosparsity of a signal with respect to a sparsifying operator.
Such a frame can also be generated by the eigenvectors of certain unitaries belonging to the Clifford group, under certain assumptions. However, since the algebraic nature of these assumptions goes beyond the scope of the present paper, we preferred to employ only the Zauner unitary matrix.
The autocorrelation of a signal is defined as the inner product between the signal and its time-translated version.
Since we target speech denoising, a tractable alternative could be some perceptual variant of the matching pursuit (MP) method, e.g. [45]; however, to our knowledge, perceptual variants of MP account for synthesis sparsity, while we are interested in analysis-sparsity-based denoising.
We will interchangeably use both terms in the sequel.
\(\beta \beta^{-1} \equiv 1 \pmod{L}\).
In the rest of the paper, when we speak of coloured noises, we mean the examined cases of pink and blue noise.
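Two of the definitions in the notes above, the autocorrelation of a signal and the inverse of \(\beta\) modulo \(L\), can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the time translation is taken to be cyclic (the note does not specify this), and the function names `autocorrelation` and `modular_inverse` are hypothetical.

```python
import numpy as np

def autocorrelation(x, lag):
    """Inner product <x, T_lag x> of x with its time-translated copy.

    Assumption: the translation is cyclic, i.e. (T_lag x)[n] = x[(n - lag) mod L].
    """
    translated = np.roll(x, lag)
    # np.vdot conjugates its first argument: sum(conj(translated) * x).
    return np.vdot(translated, x)

def modular_inverse(beta, L):
    """Return beta^{-1} with beta * beta^{-1} = 1 (mod L).

    Requires Python 3.8+; raises ValueError if beta is not invertible mod L.
    """
    return pow(beta, -1, L)

signal = np.array([1.0, 2.0, 3.0])
energy = autocorrelation(signal, 0)      # lag 0 gives the squared norm of the signal
beta_inv = modular_inverse(3, 7)         # 5, since 3 * 5 = 15 = 1 (mod 7)
```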
References
Chowdhury, T.H., Poudel, K.N., Hu, Y.: Time-frequency analysis, denoising, compression, segmentation, and classification of PCG signals. IEEE Access 8, 160882–160890 (2020)
Yasuda, M., Koizumi, Y., Saito, S., Uematsu, H., Imoto, K.: Sound event localization based on sound intensity vector refined by DNN-based denoising and source separation. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 651–655. IEEE (2020)
Grozdić, D.T., Jovičić, S.T., Subotić, M.: Whispered speech recognition using deep denoising autoencoder. Eng. Appl. Artif. Intell. 59, 15–22 (2017)
Han, K., Wang, Y., Wang, D., Woods, W.S., Merks, I., Zhang, T.: Learning spectral mapping for speech dereverberation and denoising. IEEE/ACM Trans. Audio Speech Lang. Process. 23(6), 982–992 (2015)
Yu, C., Zezario, R.E., Wang, S.-S., Sherman, J., Hsieh, Y.-Y., Lu, X., Wang, H.-M., Tsao, Y.: Speech enhancement based on denoising autoencoder with multi-branched encoders. IEEE/ACM Trans. Audio Speech Lang. Process. 28, 2756–2769 (2020)
Zengyuan, L., Anming, D.: A speech denoising algorithm based on harmonic regeneration. In: IOP Conference Series: Earth and Environmental Science, vol. 332, p. 022042. IOP Publishing (2019)
Grais, E.M., Plumbley, M.D.: Single channel audio source separation using convolutional denoising autoencoders. In: 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 1265–1269. IEEE (2017)
Févotte, C., Torrésani, B., Daudet, L., Godsill, S.J.: Sparse linear regression with structured priors and application to denoising of musical audio. IEEE Trans. Audio Speech Lang. Process. 16(1), 174–185 (2007)
Attias, H., Platt, J.C., Acero, A., Deng, L.: Speech denoising and dereverberation using probabilistic models. In: Leen, T., Dietterich, T., Tresp, V. (eds.) Advances in neural information processing systems, pp. 758–764. MIT Press (2001)
Hasan, T., Hasan, M.K.: Suppression of residual noise from speech signals using empirical mode decomposition. IEEE Signal Process. Lett. 16(1), 2–5 (2008)
Hussein, R., Shaban, K.B., El-Hag, A.H.: Denoising different types of acoustic partial discharge signals using power spectral subtraction. High Volt. 3(1), 44–50 (2018)
Kamath, S., Loizou, P., et al.: A multi-band spectral subtraction method for enhancing speech corrupted by colored noise. In: ICASSP, vol. 4, pp. 44164–44164. Citeseer (2002)
Yu, G., Mallat, S., Bacry, E.: Audio denoising by time-frequency block thresholding. IEEE Trans. Signal Process. 56(5), 1830–1839 (2008)
Siedenburg, K., Dörfler, M.: Audio denoising by generalized time-frequency thresholding. In: Audio Engineering Society Conference: 45th International Conference: Applications of Time-Frequency Processing in Audio. Audio Engineering Society (2012)
Rethage, D., Pons, J., Serra, X.: A wavenet for speech denoising. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5069–5073. IEEE (2018)
Xu, L., Choy, C.-S., Li, Y.-W.: Deep sparse rectifier neural networks for speech denoising. In: 2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 1–5 (2016)
Masuyama, Y., Yatabe, K., Oikawa, Y.: Low-rankness of complex-valued spectrogram and its application to phase-aware audio processing. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 855–859. IEEE (2019)
Sprechmann, P., Bronstein, A., Bronstein, M., Sapiro, G.: Learnable low rank sparse models for speech denoising. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 136–140. IEEE (2013)
Plumbley, M.D., Blumensath, T., Daudet, L., Gribonval, R., Davies, M.E.: Sparse representations in audio and music: from coding to source separation. Proc. IEEE 98(6), 995–1005 (2009)
Brajović, M., Stanković, I., Daković, M., Stanković, L.: Audio signal denoising based on Laplacian filter and sparse signal reconstruction. In: 26th International Conference on Information Technology (IT), pp. 1–4. IEEE (2022)
Liu, H., Liu, S., Li, Y., Li, D., Truong, T.-K.: Speech denoising based on group sparse representation in the case of Gaussian noise. In: 23rd International Conference on Digital Signal Processing (DSP), pp. 1–5. IEEE (2018)
Hadhami, I., Bouzid, A.: Speech denoising based on empirical mode decomposition and improved thresholding. In: International Conference on Nonlinear Speech Processing, pp. 200–207. Springer (2013)
Abdulatif, S., Armanious, K., Guirguis, K., Sajeev, J.T., Yang, B.: AeGAN: Time-frequency speech denoising via generative adversarial networks. In: 28th European Signal Processing Conference (EUSIPCO), pp. 451–455. IEEE (2021)
Fletcher, A.K., Rangan, S., Goyal, V.K., Ramchandran, K.: Analysis of denoising by sparse approximation with random frame asymptotics. In: Proceedings. International Symposium on Information Theory, 2005. ISIT 2005., pp. 1706–1710. IEEE (2005)
Coifman, R.R., Donoho, D.L.: Translation-invariant de-noising. In: Antoniadis, A., Oppenheim, G. (eds.) Wavelets and Statistics, pp. 125–150. Springer, New York, NY (1995)
Yatabe, K., Oikawa, Y.: Phase corrected total variation for audio signals. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 656–660. IEEE (2018)
Gaultier, C., Kitić, S., Bertin, N., Gribonval, R.: AUDASCITY: Audio denoising by adaptive social cosparsity. In: 25th European Signal Processing Conference (EUSIPCO), pp. 1265–1269. IEEE (2017)
Genzel, M., Kutyniok, G., März, M.: \(l_1\)-analysis minimization and generalized (co-) sparsity: when does recovery succeed? Appl. Comput. Harmon. Anal. 52, 82–140 (2021)
Selesnick, I.W., Figueiredo, M.A.: Signal restoration with overcomplete wavelet transforms: Comparison of analysis and synthesis priors. In: Wavelets XIII, vol. 7446, p. 74460. International Society for Optics and Photonics (2009)
Kabanava, M., Rauhut, H.: Analysis \(l_1\)-recovery with frames and Gaussian measurements. Acta Appl. Math. 140(1), 173–195 (2015)
Elad, M.: Sparse and redundant representations: from theory to applications in signal and image processing, vol. 2, no. 1. Springer, New York, NY (2010)
Bhattacharya, G., Depalle, P.: Sparse denoising of audio by greedy time-frequency shrinkage. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2898–2902. IEEE (2014)
Lawrence, J., Pfander, G.E., Walnut, D.: Linear independence of Gabor systems in finite dimensional vector spaces. J. Fourier Anal. Appl. 11(6), 715–726 (2005)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60(1–4), 259–268 (1992)
Nam, S., Davies, M.E., Elad, M., Gribonval, R.: The cosparse analysis model and algorithms. Appl. Comput. Harmon. Anal. 34(1), 30–56 (2013)
Blumensath, T., Davies, M.E.: Sampling theorems for signals from the union of finite-dimensional linear subspaces. IEEE Trans. Inf. Theory 55(4), 1872–1882 (2009)
Fickus, M., Mixon, D.G., Tremain, J.C.: Steiner equiangular tight frames. Linear Algebra Appl. 436(5), 1014–1027 (2012)
van Schijndel, N.H., Houtgast, T., Festen, J.M.: Intensity discrimination of Gaussian-windowed tones: indications for the shape of the auditory frequency-time window. J. Acoust. Soc. Am. 105(6), 3425–3435 (1999)
Guenther, F.H., Hickok, G.: Role of the auditory system in speech production. Handb. Clin. Neurol. 129, 161–175 (2015)
Qiu, A., Schreiner, C.E., Escabí, M.A.: Gabor analysis of auditory midbrain receptive fields: spectro-temporal and binaural composition. J. Neurophysiol. 90(1), 456–476 (2003)
Necciari, T., Holighaus, N., Balazs, P., Pruša, Z., Majdak, P., Derrien, O.: Audlet filter banks: a versatile analysis/synthesis framework using auditory frequency scales. Appl. Sci. 8(1), 96 (2018)
Kouni, V., Rauhut, H.: Spark deficient Gabor frame provides a novel analysis operator for compressed sensing. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds.) Neural Information Processing, pp. 700–708. Springer, Cham (2021)
Zauner, G.: Quantum Designs. Ph.D. thesis, University of Vienna, Vienna (1999)
Zhivomirov, H.: A method for colored noise generation. Roman. J. Acoust. Vibr. 15(1), 14–19 (2018)
Chardon, G., Necciari, T., Balazs, P.: Perceptual matching pursuit with Gabor dictionaries and time-frequency masking. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3102–3106. IEEE (2014)
Becker, S.R., Candès, E.J., Grant, M.C.: Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3(3), 165 (2011)
Søndergaard, P.L.: Efficient algorithms for the discrete Gabor transform with a long FIR window. J. Fourier Anal. Appl. 18(3), 456–470 (2012)
Malikiosis, R.-D.: A note on Gabor frames in finite dimensions. Appl. Comput. Harmon. Anal. 38(2), 318–330 (2015)
Scherzer, O.: Handbook of Mathematical Methods in Imaging. Springer Science & Business Media, Berlin (2010)
Malikiosis, R.-D.: Spark deficient Gabor frames. Pac. J. Math. 294(1), 159–180 (2018)
Dang, H.B., Blanchfield, K., Bengtsson, I., Appleby, D.M.: Linear dependencies in Weyl–Heisenberg orbits. Quantum Inf. Process. 12(11), 3449–3475 (2013)
Søndergaard, P.L., Torrésani, B., Balazs, P.: The linear time frequency analysis toolbox. Int. J. Wavelets Multiresolut. Inf. Process. 10(04), 1250032 (2012)
Panayotov, V., Chen, G., Povey, D., Khudanpur, S.: Librispeech: an ASR corpus based on public domain audio books. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5206–5210. IEEE (2015)
Booth, T.E.: Power iteration method for the several largest eigenvalues and eigenfunctions. Nucl. Sci. Eng. 154(1), 48–62 (2006)
Isar, D., Gajitzki, P.: Pink noise generation using wavelets. In: 2016 12th IEEE International Symposium on Electronics and Telecommunications (ISETC), pp. 261–264. IEEE (2016)
Kailkhura, B., Thiagarajan, J.J., Bremer, P.-T., Varshney, P.K.: Stair blue noise sampling. ACM Trans. Graph. (TOG) 35(6), 1–10 (2016)
Chergui, L., Bouguezel, S.: A new pre-whitening transform domain LMS algorithm and its application to speech denoising. Signal Process. 130, 118–128 (2017)
Dahlke, S., Heuer, S., Holzmann, H., Tafo, P.: Statistically optimal estimation of signals in modulation spaces using Gabor frames. IEEE Trans. Inf. Theory 68(6), 4182–4200 (2022)
Luan, S., Chen, C., Zhang, B., Han, J., Liu, J.: Gabor convolutional networks. IEEE Trans. Image Process. 27(9), 4357–4366 (2018)
Tillmann, A.M.: Computing the spark: mixed-integer programming for the (vector) matroid girth problem. Comput. Optim. Appl. 74(2), 387–441 (2019)
Acknowledgements
V. Kouni was financially supported for this work by the German Academic Exchange Service (DAAD) through the program Research Grants - One-Year Grants for Doctoral Candidates, 2020–2021. V. Kouni would also like to thank G. Paraskevopoulos for his valuable advice and insightful discussions around the framework presented in this paper.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Communicated by Ron Levie.
About this article
Cite this article
Kouni, V., Rauhut, H. & Theoharis, T. Star DGT: a robust Gabor transform for speech denoising. Sampl. Theory Signal Process. Data Anal. 21, 14 (2023). https://doi.org/10.1007/s43670-023-00053-x
DOI: https://doi.org/10.1007/s43670-023-00053-x