Star DGT: a robust Gabor transform for speech denoising

  • Original Article
Sampling Theory, Signal Processing, and Data Analysis

Abstract

In this paper, we address the speech denoising problem, where Gaussian and coloured additive noise is to be removed from a given speech signal. Our approach is based on a redundant, analysis-sparse representation of the original speech signal. We pick an eigenvector of the Zauner unitary matrix and, under certain assumptions on the ambient dimension, use it as a window vector to generate a spark deficient Gabor frame. The analysis operator associated with such a frame is a (highly) redundant Gabor transform, which we use as a sparsifying transform in the denoising procedure. We conduct computational experiments on real-world speech data, using as baselines three Gabor transforms generated by state-of-the-art window vectors in time-frequency analysis, and compare their performance with that of the proposed Gabor transform. The results show that the proposed redundant Gabor transform consistently outperforms the baselines for all examined types of noise.
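The frame construction summarized in the abstract can be illustrated with a short pure-Python sketch. It is not the authors' implementation, and several details are assumptions: the closed form used for the Zauner matrix, Z[j][k] = exp(iπ(L−1)/12)/√L · exp((iπ/L)(2jk + (L+1)k²)), is not stated in the text above; the helpers `zauner_matrix`, `zauner_window` and `gabor_analysis` are hypothetical names; and the Gabor system is taken over the full lattice of cyclic time shifts T_k and modulations M_l with 0 ≤ k, l < L, so the analysis operator has L² rows. The sketch does not check the dimension assumptions or the spark deficiency discussed in the paper.

```python
import cmath
import math
import random


def zauner_matrix(L):
    """Zauner matrix on C^L, using an ASSUMED closed form (see the lead-in)."""
    c = cmath.exp(1j * math.pi * (L - 1) / 12) / math.sqrt(L)
    return [[c * cmath.exp(1j * math.pi / L * (2 * j * k + (L + 1) * k * k))
             for k in range(L)]
            for j in range(L)]


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(A, x):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def zauner_window(L, seed=0, tol=1e-9):
    """Hypothetical helper: a unit-norm eigenvector g of Z and its eigenvalue lam.

    If Z^3 = I (verified numerically here), then (I + conj(lam) Z + conj(lam)^2 Z^2) / 3
    projects onto the eigenspace of the cube root of unity lam.
    """
    Z = zauner_matrix(L)
    Z2 = matmul(Z, Z)
    Z3 = matmul(Z2, Z)
    if any(abs(Z3[i][j] - (1 if i == j else 0)) > tol for i in range(L) for j in range(L)):
        raise ValueError("assumed Zauner formula does not satisfy Z^3 = I for this L")
    rng = random.Random(seed)
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(L)]
    Zv, Z2v = matvec(Z, v), matvec(Z2, v)
    for lam in (1 + 0j, cmath.exp(2j * math.pi / 3), cmath.exp(-2j * math.pi / 3)):
        b = lam.conjugate()
        g = [(v[n] + b * Zv[n] + b * b * Z2v[n]) / 3 for n in range(L)]
        norm = math.sqrt(sum(abs(t) ** 2 for t in g))
        if norm > tol:  # the random vector has a component in this eigenspace
            return [t / norm for t in g], lam
    raise RuntimeError("random start vector missed every eigenspace")


def gabor_analysis(x, g):
    """Full-lattice Gabor coefficients c[k][l] = <x, M_l T_k g>, with cyclic shifts."""
    L = len(g)
    return [[sum(x[n] * (g[(n - k) % L] * cmath.exp(2j * math.pi * l * n / L)).conjugate()
                 for n in range(L))
             for l in range(L)]
            for k in range(L)]


g, lam = zauner_window(3)
C = gabor_analysis([1.0, -2.0, 0.5], g)
energy = sum(abs(c) ** 2 for row in C for c in row)
print(round(energy, 6))  # 15.75, i.e. L * ||x||^2, because the frame is tight and ||g|| = 1
```

The projector trick avoids a general eigensolver, but it is only valid when Z³ = I, which the helper checks before using it; with NumPy available, `numpy.linalg.eig` would be the more usual way to obtain an eigenvector. The printed energy reflects that a full-lattice Gabor system in C^L is a tight frame with bound L‖g‖².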

Fig. 1, Fig. 2, Fig. 3, Fig. 4

Data Availability

The Librispeech dataset used in this work is publicly available at http://www.openslr.org/12.

Notes

  1. Equivalently, we call such frames full spark frames.

  2. That is, the sparsity and cosparsity of a signal with respect to a sparsifying operator.

  3. Under certain assumptions, such a frame can also be generated by eigenvectors of other unitaries belonging to the Clifford group. However, since the algebraic nature of these assumptions is beyond the scope of the present paper, we use only the Zauner unitary matrix.

  4. The autocorrelation of a signal is defined as the inner product between the signal and its time-translated version.

  5. Since we target speech denoising, a tractable alternative could be a perceptual variant of the matching pursuit (MP) method, e.g. [45]; however, to our knowledge, perceptual variants of MP account for synthesis sparsity, while we are interested in analysis-sparsity-based denoising.

  6. In terms of optimization, it is preferable to solve (4) rather than (3).

  7. We use both terms interchangeably in the sequel.

  8. Here \(\beta^{-1}\) denotes the inverse of \(\beta\) modulo \(L\), i.e. \(\beta \beta^{-1} \equiv 1 \pmod{L}\).

  9. In the rest of the paper, by coloured noise we mean the two examined cases, pink and blue noise.
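Notes 1 and 2 can be made concrete with a brute-force sketch, practical only for very small L because it inspects every L-element subset of the frame (computing the spark is hard in general, cf. [60]). A frame of vectors in C^L is full spark when every L of its vectors are linearly independent; for the analysis coefficients we count nonzeros as sparsity and zeros as cosparsity, following the usual convention of the cosparse analysis model [35]. The helper names and the tolerance `tol` are hypothetical, and time shifts are assumed cyclic.

```python
import cmath
import itertools
import math
import random


def gabor_system(g):
    """All L^2 vectors M_l T_k g with cyclic shifts (k outer, l inner)."""
    L = len(g)
    return [[g[(n - k) % L] * cmath.exp(2j * math.pi * l * n / L) for n in range(L)]
            for k in range(L) for l in range(L)]


def det(M):
    """Determinant via Gaussian elimination with partial pivoting."""
    A = [[complex(v) for v in row] for row in M]
    n, d = len(A), 1 + 0j
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0:
            return 0j
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d  # a row swap flips the sign of the determinant
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n):
                A[r][j] -= f * A[c][j]
    return d


def is_full_spark(vectors, tol=1e-9):
    """True if every L of the given vectors in C^L are linearly independent."""
    L = len(vectors[0])
    return all(abs(det(sub)) > tol for sub in itertools.combinations(vectors, L))


def sparsity_cosparsity(x, vectors, tol=1e-9):
    """(number of nonzero, number of zero) analysis coefficients <x, phi>."""
    coeffs = [sum(a * complex(b).conjugate() for a, b in zip(x, phi)) for phi in vectors]
    nonzero = sum(1 for c in coeffs if abs(c) > tol)
    return nonzero, len(coeffs) - nonzero


delta = [1, 0, 0]
print(is_full_spark(gabor_system(delta)))               # False
print(sparsity_cosparsity(delta, gabor_system(delta)))  # (3, 6)
rng = random.Random(0)
g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(3)]
print(is_full_spark(gabor_system(g)))  # prime L: almost every window is full spark [33]
```

For the delta window, every modulation of the untranslated window equals the window itself, so that Gabor system cannot be full spark, and only the L coefficients with k = 0 are nonzero. For prime L, [33] shows that almost every window yields a full spark Gabor frame, which the random window illustrates.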
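The definitions in Notes 4 and 8 are easy to check numerically. The sketch below assumes cyclic time shifts, (T_k x)[n] = x[(n−k) mod L], and the inner product ⟨x, y⟩ = Σ_n x[n] conj(y[n]); both are our assumptions for the finite-dimensional setting. The modular inverse uses Python's built-in three-argument `pow` with exponent −1 (Python 3.8 or later), which raises `ValueError` when β and L are not coprime. The function names are hypothetical.

```python
def circular_autocorrelation(x):
    """r[k] = <x, T_k x> = sum_n x[n] * conj(x[(n - k) mod L]) for k = 0, ..., L-1."""
    L = len(x)
    return [sum(x[n] * complex(x[(n - k) % L]).conjugate() for n in range(L))
            for k in range(L)]


def inverse_mod(beta, L):
    """beta^{-1} with beta * beta^{-1} = 1 (mod L); ValueError if gcd(beta, L) != 1."""
    return pow(beta, -1, L)  # three-argument pow with a negative exponent: Python 3.8+


print(circular_autocorrelation([1, 2, 3]))  # [(14+0j), (11+0j), (11+0j)]
print(inverse_mod(3, 10))                   # 7, since 3 * 7 = 21 = 2 * 10 + 1
```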
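Pink and blue noise (Note 9) are commonly generated by spectrally shaping white Gaussian noise so that the power spectral density behaves like 1/f (pink) or like f (blue). The sketch below does this with a naive O(N²) DFT in pure Python; it is one common approach, not necessarily the generator used in the experiments, and `coloured_noise` and its `exponent` parameter are hypothetical. The DC bin is zeroed and the output is scaled to unit RMS.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; the inverse includes the 1/N factor."""
    N = len(x)
    sign = 1 if inverse else -1
    X = [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N) for n in range(N))
         for k in range(N)]
    return [v / N for v in X] if inverse else X


def coloured_noise(N, exponent, seed=0):
    """Noise whose PSD scales like f**exponent: exponent=-1 gives pink, +1 gives blue."""
    rng = random.Random(seed)
    white = [rng.gauss(0.0, 1.0) for _ in range(N)]
    shaped = []
    for k, w in enumerate(dft(white)):
        f = min(k, N - k)  # symmetric frequency index keeps the spectrum Hermitian
        # amplitude ~ f**(exponent/2), so power ~ f**exponent; the DC bin is set to zero
        shaped.append(0j if f == 0 else w * f ** (exponent / 2))
    x = [v.real for v in dft(shaped, inverse=True)]  # real up to rounding error
    rms = math.sqrt(sum(v * v for v in x) / N)
    return [v / rms for v in x]


pink = coloured_noise(256, -1, seed=1)
blue = coloured_noise(256, +1, seed=1)
```

Scaling bin k by a function of min(k, N−k) preserves the Hermitian symmetry of the white-noise spectrum, so the inverse DFT is real apart from rounding; in practice an FFT would replace the naive DFT.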

References

  1. Chowdhury, T.H., Poudel, K.N., Hu, Y.: Time-frequency analysis, denoising, compression, segmentation, and classification of PCG signals. IEEE Access 8, 160882–160890 (2020)

  2. Yasuda, M., Koizumi, Y., Saito, S., Uematsu, H., Imoto, K.: Sound event localization based on sound intensity vector refined by DNN-based denoising and source separation. In: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 651–655. IEEE (2020)

  3. Grozdić, D.T., Jovičić, S.T., Subotić, M.: Whispered speech recognition using deep denoising autoencoder. Eng. Appl. Artif. Intell. 59, 15–22 (2017)

  4. Han, K., Wang, Y., Wang, D., Woods, W.S., Merks, I., Zhang, T.: Learning spectral mapping for speech dereverberation and denoising. IEEE/ACM Trans. Audio Speech Lang. Process. 23(6), 982–992 (2015)

  5. Yu, C., Zezario, R.E., Wang, S.-S., Sherman, J., Hsieh, Y.-Y., Lu, X., Wang, H.-M., Tsao, Y.: Speech enhancement based on denoising autoencoder with multi-branched encoders. IEEE/ACM Trans. Audio Speech Lang. Process. 28, 2756–2769 (2020)

  6. Zengyuan, L., Anming, D.: A speech denoising algorithm based on harmonic regeneration. In: IOP Conference Series: Earth and Environmental Science, vol. 332, p. 022042. IOP Publishing (2019)

  7. Grais, E.M., Plumbley, M.D.: Single channel audio source separation using convolutional denoising autoencoders. In: 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 1265–1269. IEEE (2017)

  8. Févotte, C., Torrésani, B., Daudet, L., Godsill, S.J.: Sparse linear regression with structured priors and application to denoising of musical audio. IEEE Trans. Audio Speech Lang. Process. 16(1), 174–185 (2007)

  9. Attias, H., Platt, J.C., Acero, A., Deng, L.: Speech denoising and dereverberation using probabilistic models. In: Leen, T., Dietterich, T., Tresp, V. (eds.) Advances in Neural Information Processing Systems, pp. 758–764. MIT Press (2001)

  10. Hasan, T., Hasan, M.K.: Suppression of residual noise from speech signals using empirical mode decomposition. IEEE Signal Process. Lett. 16(1), 2–5 (2008)

  11. Hussein, R., Shaban, K.B., El-Hag, A.H.: Denoising different types of acoustic partial discharge signals using power spectral subtraction. High Volt. 3(1), 44–50 (2018)

  12. Kamath, S., Loizou, P.: A multi-band spectral subtraction method for enhancing speech corrupted by colored noise. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 4, p. 4164. IEEE (2002)

  13. Yu, G., Mallat, S., Bacry, E.: Audio denoising by time-frequency block thresholding. IEEE Trans. Signal Process. 56(5), 1830–1839 (2008)

  14. Siedenburg, K., Dörfler, M.: Audio denoising by generalized time-frequency thresholding. In: Audio Engineering Society Conference: 45th International Conference: Applications of Time-Frequency Processing in Audio. Audio Engineering Society (2012)

  15. Rethage, D., Pons, J., Serra, X.: A WaveNet for speech denoising. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5069–5073. IEEE (2018)

  16. Xu, L., Choy, C.-S., Li, Y.-W.: Deep sparse rectifier neural networks for speech denoising. In: 2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 1–5 (2016)

  17. Masuyama, Y., Yatabe, K., Oikawa, Y.: Low-rankness of complex-valued spectrogram and its application to phase-aware audio processing. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 855–859. IEEE (2019)

  18. Sprechmann, P., Bronstein, A., Bronstein, M., Sapiro, G.: Learnable low rank sparse models for speech denoising. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 136–140. IEEE (2013)

  19. Plumbley, M.D., Blumensath, T., Daudet, L., Gribonval, R., Davies, M.E.: Sparse representations in audio and music: from coding to source separation. Proc. IEEE 98(6), 995–1005 (2009)

  20. Brajović, M., Stanković, I., Daković, M., Stanković, L.: Audio signal denoising based on Laplacian filter and sparse signal reconstruction. In: 26th International Conference on Information Technology (IT), pp. 1–4. IEEE (2022)

  21. Liu, H., Liu, S., Li, Y., Li, D., Truong, T.-K.: Speech denoising based on group sparse representation in the case of Gaussian noise. In: 23rd International Conference on Digital Signal Processing (DSP), pp. 1–5. IEEE (2018)

  22. Hadhami, I., Bouzid, A.: Speech denoising based on empirical mode decomposition and improved thresholding. In: International Conference on Nonlinear Speech Processing, pp. 200–207. Springer (2013)

  23. Abdulatif, S., Armanious, K., Guirguis, K., Sajeev, J.T., Yang, B.: AeGAN: Time-frequency speech denoising via generative adversarial networks. In: 28th European Signal Processing Conference (EUSIPCO), pp. 451–455. IEEE (2021)

  24. Fletcher, A.K., Rangan, S., Goyal, V.K., Ramchandran, K.: Analysis of denoising by sparse approximation with random frame asymptotics. In: Proceedings of the International Symposium on Information Theory (ISIT 2005), pp. 1706–1710. IEEE (2005)

  25. Coifman, R.R., Donoho, D.L.: Translation-invariant de-noising. In: Antoniadis, A., Oppenheim, G. (eds.) Wavelets and Statistics, pp. 125–150. Springer, New York, NY (1995)

  26. Yatabe, K., Oikawa, Y.: Phase corrected total variation for audio signals. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 656–660. IEEE (2018)

  27. Gaultier, C., Kitić, S., Bertin, N., Gribonval, R.: AUDASCITY: Audio denoising by adaptive social cosparsity. In: 25th European Signal Processing Conference (EUSIPCO), pp. 1265–1269. IEEE (2017)

  28. Genzel, M., Kutyniok, G., März, M.: \(l_1\)-analysis minimization and generalized (co-)sparsity: when does recovery succeed? Appl. Comput. Harmon. Anal. 52, 82–140 (2021)

  29. Selesnick, I.W., Figueiredo, M.A.: Signal restoration with overcomplete wavelet transforms: Comparison of analysis and synthesis priors. In: Wavelets XIII, vol. 7446, p. 74460. International Society for Optics and Photonics (2009)

  30. Kabanava, M., Rauhut, H.: Analysis \(l_1\)-recovery with frames and Gaussian measurements. Acta Appl. Math. 140(1), 173–195 (2015)

  31. Elad, M.: Sparse and redundant representations: from theory to applications in signal and image processing. Springer, New York, NY (2010)

  32. Bhattacharya, G., Depalle, P.: Sparse denoising of audio by greedy time-frequency shrinkage. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2898–2902. IEEE (2014)

  33. Lawrence, J., Pfander, G.E., Walnut, D.: Linear independence of Gabor systems in finite dimensional vector spaces. J. Fourier Anal. Appl. 11(6), 715–726 (2005)

  34. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60(1–4), 259–268 (1992)

  35. Nam, S., Davies, M.E., Elad, M., Gribonval, R.: The cosparse analysis model and algorithms. Appl. Comput. Harmon. Anal. 34(1), 30–56 (2013)

  36. Blumensath, T., Davies, M.E.: Sampling theorems for signals from the union of finite-dimensional linear subspaces. IEEE Trans. Inf. Theory 55(4), 1872–1882 (2009)

  37. Fickus, M., Mixon, D.G., Tremain, J.C.: Steiner equiangular tight frames. Linear Algebra Appl. 436(5), 1014–1027 (2012)

  38. van Schijndel, N.H., Houtgast, T., Festen, J.M.: Intensity discrimination of Gaussian-windowed tones: indications for the shape of the auditory frequency-time window. J. Acoust. Soc. Am. 105(6), 3425–3435 (1999)

  39. Guenther, F.H., Hickok, G.: Role of the auditory system in speech production. Handb. Clin. Neurol. 129, 161–175 (2015)

  40. Qiu, A., Schreiner, C.E., Escabí, M.A.: Gabor analysis of auditory midbrain receptive fields: spectro-temporal and binaural composition. J. Neurophysiol. 90(1), 456–476 (2003)

  41. Necciari, T., Holighaus, N., Balazs, P., Pruša, Z., Majdak, P., Derrien, O.: Audlet filter banks: a versatile analysis/synthesis framework using auditory frequency scales. Appl. Sci. 8(1), 96 (2018)

  42. Kouni, V., Rauhut, H.: Spark deficient Gabor frame provides a novel analysis operator for compressed sensing. In: Mantoro, T., Lee, M., Ayu, M.A., Wong, K.W., Hidayanto, A.N. (eds.) Neural Information Processing, pp. 700–708. Springer, Cham (2021)

  43. Zauner, G.: Quantum Designs. Ph.D. thesis, University of Vienna, Vienna (1999)

  44. Zhivomirov, H.: A method for colored noise generation. Roman. J. Acoust. Vibr. 15(1), 14–19 (2018)

  45. Chardon, G., Necciari, T., Balazs, P.: Perceptual matching pursuit with Gabor dictionaries and time-frequency masking. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3102–3106. IEEE (2014)

  46. Becker, S.R., Candès, E.J., Grant, M.C.: Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comput. 3(3), 165 (2011)

  47. Søndergaard, P.L.: Efficient algorithms for the discrete Gabor transform with a long FIR window. J. Fourier Anal. Appl. 18(3), 456–470 (2012)

  48. Malikiosis, R.-D.: A note on Gabor frames in finite dimensions. Appl. Comput. Harmon. Anal. 38(2), 318–330 (2015)

  49. Scherzer, O.: Handbook of Mathematical Methods in Imaging. Springer Science & Business Media, Berlin (2010)

  50. Malikiosis, R.-D.: Spark deficient Gabor frames. Pac. J. Math. 294(1), 159–180 (2018)

  51. Dang, H.B., Blanchfield, K., Bengtsson, I., Appleby, D.M.: Linear dependencies in Weyl–Heisenberg orbits. Quantum Inf. Process. 12(11), 3449–3475 (2013)

  52. Søndergaard, P.L., Torrésani, B., Balazs, P.: The linear time frequency analysis toolbox. Int. J. Wavelets Multiresolut. Inf. Process. 10(04), 1250032 (2012)

  53. Panayotov, V., Chen, G., Povey, D., Khudanpur, S.: Librispeech: an ASR corpus based on public domain audio books. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5206–5210. IEEE (2015)

  54. Booth, T.E.: Power iteration method for the several largest eigenvalues and eigenfunctions. Nucl. Sci. Eng. 154(1), 48–62 (2006)

  55. Isar, D., Gajitzki, P.: Pink noise generation using wavelets. In: 2016 12th IEEE International Symposium on Electronics and Telecommunications (ISETC), pp. 261–264. IEEE (2016)

  56. Kailkhura, B., Thiagarajan, J.J., Bremer, P.-T., Varshney, P.K.: Stair blue noise sampling. ACM Trans. Graph. (TOG) 35(6), 1–10 (2016)

  57. Chergui, L., Bouguezel, S.: A new pre-whitening transform domain LMS algorithm and its application to speech denoising. Signal Process. 130, 118–128 (2017)

  58. Dahlke, S., Heuer, S., Holzmann, H., Tafo, P.: Statistically optimal estimation of signals in modulation spaces using Gabor frames. IEEE Trans. Inf. Theory 68(6), 4182–4200 (2022)

  59. Luan, S., Chen, C., Zhang, B., Han, J., Liu, J.: Gabor convolutional networks. IEEE Trans. Image Process. 27(9), 4357–4366 (2018)

  60. Tillmann, A.M.: Computing the spark: mixed-integer programming for the (vector) matroid girth problem. Comput. Optim. Appl. 74(2), 387–441 (2019)

Acknowledgements

V. Kouni was financially supported for this work by the German Academic Exchange Service (DAAD) through the program Research Grants - One-Year Grants for Doctoral Candidates, 2020–2021. V. Kouni would also like to thank G. Paraskevopoulos for his valuable advice and insightful discussions about the framework presented in this paper.

Author information

Corresponding author

Correspondence to Vicky Kouni.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Communicated by Ron Levie.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kouni, V., Rauhut, H. & Theoharis, T. Star DGT: a robust Gabor transform for speech denoising. Sampl. Theory Signal Process. Data Anal. 21, 14 (2023). https://doi.org/10.1007/s43670-023-00053-x

  • DOI: https://doi.org/10.1007/s43670-023-00053-x
