
Multilayered convolutional neural network-based auto-CODEC for audio signal denoising using mel-frequency cepstral coefficients

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Audio signal denoising and quality enhancement make a substantial contribution to speaker identification, audio transmission, hearing aids, microphones, mobile phones and other applications. Hence, an efficient denoising method is required to enhance audio signal quality securely. A robust multilayered convolutional neural network (MLCNN)-based auto-CODEC for audio signal denoising that utilizes mel-frequency cepstral coefficients (MFCCs) is proposed in this research. The MLCNN takes as input MFCC frames extracted from the noise-contaminated audio signal for training and testing. The proposed MLCNN model has been trained and tested with an 80:20 split of the TIMIT database and then validated; the validation shows that the proposed MLCNN model provides an accuracy of 93.25%. The performance of the MLCNN has been evaluated and compared with previously reported methods using short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ) and cosine similarity. The comparison shows that the proposed MLCNN model outperforms the other models, and the cosine similarity results indicate that the MLCNN provides a high level of security, making it suitable for many secure applications.
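The article itself does not include code, but the pipeline the abstract describes (MFCC frames extracted from noise-contaminated audio and passed through a multilayered convolutional autoencoder trained on an 80:20 split) can be sketched roughly as follows. This is a minimal illustration using librosa and Keras; the frame parameters, layer sizes, training settings and file paths are assumptions for illustration, not the authors' reported configuration.

```python
# Minimal sketch (assumptions, not the authors' implementation): MFCC features
# from a noise-contaminated signal fed to a small multilayered convolutional
# autoencoder that is trained to reproduce the clean-signal MFCCs.
import numpy as np
import librosa
import tensorflow as tf

def mfcc_frames(wav_path, n_mfcc=20, sr=16000):
    """Load audio and return an (n_frames, n_mfcc) matrix of MFCC frames."""
    y, _ = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, n_frames)
    return mfcc.T.astype("float32")

def build_mlcnn_autoencoder(n_mfcc=20):
    """Convolutional encoder/decoder over the MFCC axis (assumed layer sizes)."""
    inp = tf.keras.Input(shape=(n_mfcc, 1))
    x = tf.keras.layers.Conv1D(32, 3, padding="same", activation="relu")(inp)
    x = tf.keras.layers.MaxPooling1D(2)(x)                                    # encoder
    x = tf.keras.layers.Conv1D(16, 3, padding="same", activation="relu")(x)   # code
    x = tf.keras.layers.UpSampling1D(2)(x)                                    # decoder
    out = tf.keras.layers.Conv1D(1, 3, padding="same")(x)
    model = tf.keras.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model

def cosine_similarity(a, b):
    """Cosine similarity between two feature arrays, as used for evaluation."""
    a, b = np.ravel(a), np.ravel(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical usage: train on noisy frames with clean frames as targets,
# using an 80:20 train/test split as described in the abstract.
# noisy = mfcc_frames("noisy.wav")[..., np.newaxis]
# clean = mfcc_frames("clean.wav")[..., np.newaxis]
# split = int(0.8 * len(noisy))
# model = build_mlcnn_autoencoder()
# model.fit(noisy[:split], clean[:split],
#           validation_data=(noisy[split:], clean[split:]), epochs=50)
# denoised = model.predict(noisy[split:])
# print(cosine_similarity(denoised, clean[split:]))
```

For the reported STOI and PESQ comparisons, third-party Python packages such as pystoi and pesq can compute those metrics; they operate on the reconstructed time-domain signals rather than on the MFCC frames themselves.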

Code availability

Custom code.

Notes

  1. https://www.kaggle.com/nltkdata/timitcorpus
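The clean TIMIT recordings linked above must be contaminated with noise to produce the model's noisy inputs. The mixing procedure is not specified here, but a common approach, sketched below purely as an assumption with hypothetical file paths, is to add white Gaussian noise at a chosen signal-to-noise ratio.

```python
# Rough sketch (an assumed procedure, not taken from the paper): mix white
# Gaussian noise into a clean mono recording at a target SNR in dB.
import numpy as np
import soundfile as sf

def add_noise(clean_path, noisy_path, snr_db=5.0, seed=0):
    clean, sr = sf.read(clean_path)  # assumes a mono recording, as in TIMIT
    noise = np.random.default_rng(seed).standard_normal(len(clean))
    # Scale the noise so that 10*log10(P_clean / P_noise) equals snr_db.
    scale = np.sqrt(np.mean(clean ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    sf.write(noisy_path, clean + noise * scale, sr)

# add_noise("timit/sa1.wav", "timit/sa1_noisy.wav", snr_db=5.0)  # hypothetical paths
```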


Author information

Corresponding author

Correspondence to P. Prakasam.

Ethics declarations

Conflicts of interest

The authors declare that there is no conflict of interest regarding this research work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Raj, S., Prakasam, P. & Gupta, S. Multilayered convolutional neural network-based auto-CODEC for audio signal denoising using mel-frequency cepstral coefficients. Neural Comput & Applic 33, 10199–10209 (2021). https://doi.org/10.1007/s00521-021-05782-5
