Star DGT: a robust Gabor transform for speech denoising

Kouni, Vicky; Rauhut, Holger; Theoharis, Theoharis

doi:10.1007/s43670-023-00053-x

Original Article
Published: 12 April 2023

Volume 21, article number 14 (2023)

Sampling Theory, Signal Processing, and Data Analysis

Abstract

In this paper, we address the speech denoising problem, where Gaussian and coloured additive noises are to be removed from a given speech signal. Our approach is based on a redundant, analysis-sparse representation of the original speech signal. We pick an eigenvector of the Zauner unitary matrix and, under certain assumptions on the ambient dimension, use it as the window vector to generate a spark deficient Gabor frame. The analysis operator associated with such a frame is a (highly) redundant Gabor transform, which we use as a sparsifying transform in the denoising procedure. We conduct computational experiments on real-world speech data, using as baselines three Gabor transforms generated by state-of-the-art window vectors in time-frequency analysis, and compare their performance to that of the proposed Gabor transform. The results show that the proposed redundant Gabor transform consistently outperforms the baselines across all examined signals and noise types.
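To make the construction described in the abstract more concrete, the following sketch builds the analysis operator of a full Gabor system in dimension L from a window vector and shows its redundancy (L² coefficients for L samples). It is an illustration only: the function name `gabor_analysis_matrix` is hypothetical, the window is a random placeholder rather than the Zauner eigenvector used in the paper, and the shift and modulation conventions are one common choice assumed here.

```python
import numpy as np

def gabor_analysis_matrix(g):
    """Analysis matrix of the full Gabor system generated by window g.

    Row (k*L + l) holds the conjugate of the atom M_l T_k g, so that
    A @ x returns the inner products <x, M_l T_k g>.
    """
    g = np.asarray(g, dtype=complex)
    L = g.size
    n = np.arange(L)
    A = np.empty((L * L, L), dtype=complex)
    for k in range(L):                      # circular time shift by k
        shifted = np.roll(g, k)             # (T_k g)[n] = g[(n - k) mod L]
        for l in range(L):                  # modulation by frequency l
            atom = np.exp(2j * np.pi * l * n / L) * shifted
            A[k * L + l] = atom.conj()
    return A

# Placeholder window (assumption): random, NOT the Zauner eigenvector.
rng = np.random.default_rng(0)
L = 7
g = rng.standard_normal(L) + 1j * rng.standard_normal(L)

A = gabor_analysis_matrix(g)                # shape (L*L, L): highly redundant
x = rng.standard_normal(L)
coeffs = A @ x                              # redundant Gabor coefficients of x

# All L^2 time-frequency shifts form a tight frame with bound L * ||g||^2,
# so the scaled adjoint recovers x from its coefficients.
frame_bound = L * np.linalg.norm(g) ** 2
x_rec = A.conj().T @ coeffs / frame_bound
```

Because all L² time-frequency shifts are used, the system is a tight frame for any nonzero window, which is why the scaled adjoint inverts the transform here. Whether the resulting frame is full spark or spark deficient depends on the window, and this sketch does not test that property.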

Data Availability

The Librispeech dataset we utilized is publicly available at http://www.openslr.org/12.

Notes

Equivalently, we call such frames full spark frames.
That is, the sparsity and cosparsity of a signal with respect to a sparsifying operator.
Such a frame can also be generated by the eigenvectors of certain unitaries belonging to the Clifford group, under certain assumptions. However, since the algebraic nature of these assumptions goes beyond the scope of the present paper, we preferred to employ only the Zauner unitary matrix.
The autocorrelation of a signal is defined as the inner product between the signal and its time-translated version.
Since we target speech denoising, a tractable alternative could be some perceptual variant of the matching pursuit (MP) method, e.g. [45]; however, to our knowledge, perceptual variants of MP account for synthesis sparsity, while we are interested in analysis-sparsity-based denoising.
In terms of optimization, it is preferable to solve (4) instead of (3).
We will interchangeably use both terms in the sequel.
\(\beta \beta^{-1} \equiv 1 \pmod{L}\).
In the rest of the paper, when we speak of coloured noises, we mean the examined cases of pink and blue noise.
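
Two of the notes above, the autocorrelation note and the modular-inverse note, are small computational definitions that can be sketched as follows. The helper names `circular_autocorrelation` and `modular_inverse` are hypothetical, the translation is assumed to be circular (the note does not specify boundary handling), and the inverse of β modulo L exists only when β and L are coprime.

```python
import math
import numpy as np

def circular_autocorrelation(x):
    """Return r[k] = <x, T_k x> for k = 0..L-1.

    Assumption: T_k is a circular shift, (T_k x)[n] = x[(n - k) mod L].
    """
    x = np.asarray(x)
    # np.vdot conjugates its first argument: sum(conj(T_k x) * x).
    return np.array([np.vdot(np.roll(x, k), x) for k in range(x.size)])

def modular_inverse(beta, L):
    """Return beta^{-1} with beta * beta^{-1} = 1 (mod L)."""
    if math.gcd(beta, L) != 1:
        raise ValueError("beta has no inverse modulo L")
    return pow(beta, -1, L)  # three-argument pow with exponent -1 (Python 3.8+)

r = circular_autocorrelation([1.0, 2.0, 3.0])  # r[0] is the signal energy
inv = modular_inverse(3, 7)                    # 3 * inv leaves remainder 1 mod 7
```

At lag zero the autocorrelation equals the squared norm of the signal, which gives a quick sanity check on any implementation.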

References

Chowdhury, T.H., Poudel, K.N., Hu, Y.: Time-frequency analysis, denoising, compression, segmentation, and classification of PCG signals. IEEE Access 8, 160882–160890 (2020)
Yasuda, M., Koizumi, Y., Saito, S., Uematsu, H., Imoto, K.: Sound event localization based on sound intensity vector refined by DNN-based denoising and source separation. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 651–655. IEEE (2020)
Grozdić, D.T., Jovičić, S.T., Subotić, M.: Whispered speech recognition using deep denoising autoencoder. Eng. Appl. Artif. Intell. 59, 15–22 (2017)
Han, K., Wang, Y., Wang, D., Woods, W.S., Merks, I., Zhang, T.: Learning spectral mapping for speech dereverberation and denoising. IEEE/ACM Trans. Audio Speech Lang. Process. 23(6), 982–992 (2015)
Yu, C., Zezario, R.E., Wang, S.-S., Sherman, J., Hsieh, Y.-Y., Lu, X., Wang, H.-M., Tsao, Y.: Speech enhancement based on denoising autoencoder with multi-branched encoders. IEEE/ACM Trans. Audio Speech Lang. Process. 28, 2756–2769 (2020)
Zengyuan, L., Anming, D.: A speech denoising algorithm based on harmonic regeneration. In: IOP Conference Series: Earth and Environmental Science, vol. 332, p. 022042. IOP Publishing (2019)
Grais, E.M., Plumbley, M.D.: Single channel audio source separation using convolutional denoising autoencoders. In: 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 1265–1269. IEEE (2017)
Févotte, C., Torrésani, B., Daudet, L., Godsill, S.J.: Sparse linear regression with structured priors and application to denoising of musical audio. IEEE Trans. Audio Speech Lang. Process. 16(1), 174–185 (2007)
Attias, H., Platt, J.C., Acero, A., Deng, L.: Speech denoising and dereverberation using probabilistic models. In: Leen, T., Dietterich, T., Tresp, V. (eds.) Advances in neural information processing systems, pp. 758–764. MIT Press (2001)
Hasan, T., Hasan, M.K.: Suppression of residual noise from speech signals using empirical mode decomposition. IEEE Signal Process. Lett. 16(1), 2–5 (2008)
Hussein, R., Shaban, K.B., El-Hag, A.H.: Denoising different types of acoustic partial discharge signals using power spectral subtraction. High Volt. 3(1), 44–50 (2018)
Kamath, S., Loizou, P., et al.: A multi-band spectral subtraction method for enhancing speech corrupted by colored noise. In: ICASSP, vol. 4, pp. 44164–44164. Citeseer (2002)
Yu, G., Mallat, S., Bacry, E.: Audio denoising by time-frequency block thresholding. IEEE Trans. Signal Process. 56(5), 1830–1839 (2008)
Siedenburg, K., Dörfler, M.: Audio denoising by generalized time-frequency thresholding. In: Audio Engineering Society Conference: 45th International Conference: Applications of Time-Frequency Processing in Audio. Audio Engineering Society (2012)
Rethage, D., Pons, J., Serra, X.: A wavenet for speech denoising. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5069–5073. IEEE (2018)
Xu, L., Choy, C.-S., Li, Y.-W.: Deep sparse rectifier neural networks for speech denoising. In: 2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 1–5 (2016)
Masuyama, Y., Yatabe, K., Oikawa, Y.: Low-rankness of complex-valued spectrogram and its application to phase-aware audio processing. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 855–859. IEEE (2019)
Sprechmann, P., Bronstein, A., Bronstein, M., Sapiro, G.: Learnable low rank sparse models for speech denoising. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 136–140. IEEE (2013)
Plumbley, M.D., Blumensath, T., Daudet, L., Gribonval, R., Davies, M.E.: Sparse representations in audio and music: from coding to source separation. Proc. IEEE 98(6), 995–1005 (2009)
Brajović, M., Stanković, I., Daković, M., Stanković, L.: Audio signal denoising based on laplacian filter and sparse signal reconstruction. In: 26th International Conference on Information Technology (IT), pp. 1–4. IEEE (2022)
Liu, H., Liu, S., Li, Y., Li, D., Truong, T.-K.: Speech denoising based on group sparse representation in the case of gaussian noise. In: 23rd International Conference on Digital Signal Processing (DSP), pp. 1–5. IEEE (2018)
Hadhami, I., Bouzid, A.: Speech denoising based on empirical mode decomposition and improved thresholding. In: International Conference on Nonlinear Speech Processing, pp. 200–207. Springer (2013)
Abdulatif, S., Armanious, K., Guirguis, K., Sajeev, J.T., Yang, B.: Aegan: Time-frequency speech denoising via generative adversarial networks. In: 28th European Signal Processing Conference (EUSIPCO), pp. 451–455. IEEE (2021)
Fletcher, A.K., Rangan, S., Goyal, V.K., Ramchandran, K.: Analysis of denoising by sparse approximation with random frame asymptotics. In: Proceedings. International Symposium on Information Theory, 2005. ISIT 2005., pp. 1706–1710. IEEE (2005)
Coifman, R.R., Donoho, D.L.: Translation-invariant de-noising. In: Antoniadis, A., Oppenheim, G. (eds.) Wavelets and Statistics, pp. 125–150. Springer, New York, NY (1995)
Yatabe, K., Oikawa, Y.: Phase corrected total variation for audio signals. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 656–660. IEEE (2018)
Gaultier, C., Kitić, S., Bertin, N., Gribonval, R.: AUDASCITY: Audio denoising by adaptive social cosparsity. In: 25th European Signal Processing Conference (EUSIPCO), pp. 1265–1269. IEEE (2017)
Genzel, M., Kutyniok, G., März, M.: \(l_1\)-analysis minimization and generalized (co-) sparsity: when does recovery succeed? Appl. Comput. Harmon. Anal. 52, 82–140 (2021)
Selesnick, I.W., Figueiredo, M.A.: Signal restoration with overcomplete wavelet transforms: Comparison of analysis and synthesis priors. In: Wavelets XIII, vol. 7446, p. 74460. International Society for Optics and Photonics (2009)
Kabanava, M., Rauhut, H.: Analysis \(l_1\)-recovery with frames and Gaussian measurements. Acta Appl. Math. 140(1), 173–195 (2015)
Elad, M.: Sparse and redundant representations: from theory to applications in signal and image processing, vol. 2, no. 1. Springer, New York, NY (2010)
Bhattacharya, G., Depalle, P.: Sparse denoising of audio by greedy time-frequency shrinkage. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2898–2902. IEEE (2014)
Lawrence, J., Pfander, G.E., Walnut, D.: Linear independence of Gabor systems in finite dimensional vector spaces. J. Fourier Anal. Appl. 11(6), 715–726 (2005)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60(1–4), 259–268 (1992)
Nam, S., Davies, M.E., Elad, M., Gribonval, R.: The cosparse analysis model and algorithms. Appl. Comput. Harmon. Anal. 34(1), 30–56 (2013)
Blumensath, T., Davies, M.E.: Sampling theorems for signals from the union of finite-dimensional linear subspaces. IEEE Trans. Inf. Theory 55(4), 1872–1882 (2009)
Fickus, M., Mixon, D.G., Tremain, J.C.: Steiner equiangular tight frames. Linear Algebra Appl. 436(5), 1014–1027 (2012)
van Schijndel, N.H., Houtgast, T., Festen, J.M.: Intensity discrimination of gaussian-windowed tones: indications for the shape of the auditory frequency-time window. J. Acoust. Soc. Am. 105(6), 3425–3435 (1999)
Guenther, F.H., Hickok, G.: Role of the auditory system in speech production. Handb. Clin. Neurol. 129, 161–175 (2015)
Qiu, A., Schreiner, C.E., Escabí, M.A.: Gabor analysis of auditory midbrain receptive fields: spectro-temporal and binaural composition. J. Neurophysiol. 90(1), 456–476 (2003)
Necciari, T., Holighaus, N., Balazs, P., Pruša, Z., Majdak, P., Derrien, O.: Audlet filter banks: a versatile analysis/synthesis framework using auditory frequency scales. Appl. Sci. 8(1), 96 (2018)
Kouni, V., Rauhut, H.: Spark deficient Gabor frame provides a novel analysis operator for compressed sensing. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds.) Neural Information Processing, pp. 700–708. Springer, Cham (2021)
Zauner, G.: Quantum Designs. Ph.D. thesis, University of Vienna, Vienna (1999)
Zhivomirov, H.: A method for colored noise generation. Roman. J. Acoust. Vibr. 15(1), 14–19 (2018)
Chardon, G., Necciari, T., Balazs, P.: Perceptual matching pursuit with Gabor dictionaries and time-frequency masking. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3102–3106. IEEE (2014)
Becker, S.R., Candès, E.J., Grant, M.C.: Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3(3), 165 (2011)
Søndergaard, P.L.: Efficient algorithms for the discrete Gabor transform with a long fir window. J. Fourier Anal. Appl. 18(3), 456–470 (2012)
Malikiosis, R.-D.: A note on Gabor frames in finite dimensions. Appl. Comput. Harmon. Anal. 38(2), 318–330 (2015)
Scherzer, O.: Handbook of Mathematical Methods in Imaging. Springer Science & Business Media, Berlin (2010)
Malikiosis, R.-D.: Spark deficient Gabor frames. Pac. J. Math. 294(1), 159–180 (2018)
Dang, H.B., Blanchfield, K., Bengtsson, I., Appleby, D.M.: Linear dependencies in Weyl–Heisenberg orbits. Quantum Inf. Process. 12(11), 3449–3475 (2013)
Søndergaard, P.L., Torrésani, B., Balazs, P.: The linear time frequency analysis toolbox. Int. J. Wavelets Multiresolut. Inf. Process. 10(04), 1250032 (2012)
Panayotov, V., Chen, G., Povey, D., Khudanpur, S.: Librispeech: an ASR corpus based on public domain audio books. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5206–5210. IEEE (2015)
Booth, T.E.: Power iteration method for the several largest eigenvalues and eigenfunctions. Nucl. Sci. Eng. 154(1), 48–62 (2006)
Isar, D., Gajitzki, P.: Pink noise generation using wavelets. In: 2016 12th IEEE International Symposium on Electronics and Telecommunications (ISETC), pp. 261–264. IEEE (2016)
Kailkhura, B., Thiagarajan, J.J., Bremer, P.-T., Varshney, P.K.: Stair blue noise sampling. ACM Trans. Graph. (TOG) 35(6), 1–10 (2016)
Chergui, L., Bouguezel, S.: A new pre-whitening transform domain LMS algorithm and its application to speech denoising. Signal Process. 130, 118–128 (2017)
Dahlke, S., Heuer, S., Holzmann, H., Tafo, P.: Statistically optimal estimation of signals in modulation spaces using Gabor frames. IEEE Trans. Inf. Theory 68(6), 4182–4200 (2022)
Luan, S., Chen, C., Zhang, B., Han, J., Liu, J.: Gabor convolutional networks. IEEE Trans. Image Process. 27(9), 4357–4366 (2018)
Tillmann, A.M.: Computing the spark: mixed-integer programming for the (vector) matroid girth problem. Comput. Optim. Appl. 74(2), 387–441 (2019)

Acknowledgements

V. Kouni was financially supported for this work by the German Academic Exchange Service (DAAD) through the program Research Grants - One-Year Grants for Doctoral Candidates, 2020–2021. V. Kouni would also like to thank G. Paraskevopoulos for his valuable advice and insightful discussions around the framework presented in this paper.

Author information

Authors and Affiliations

Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, Greece
Vicky Kouni & Theoharis Theoharis
Chair for Mathematics of Information Processing, RWTH Aachen University, Aachen, Germany
Holger Rauhut

Authors

Vicky Kouni
Holger Rauhut
Theoharis Theoharis

Corresponding author

Correspondence to Vicky Kouni.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Communicated by Ron Levie.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kouni, V., Rauhut, H. & Theoharis, T. Star DGT: a robust Gabor transform for speech denoising. Sampl. Theory Signal Process. Data Anal. 21, 14 (2023). https://doi.org/10.1007/s43670-023-00053-x

Received: 22 February 2022
Accepted: 21 March 2023
Published: 12 April 2023
DOI: https://doi.org/10.1007/s43670-023-00053-x
