Abstract
Real-world datasets come in many formats (audio, music, image, ...). In our case, audio data from several sources are mixed together. These mixtures represent noisy audio signals whose features must be extracted, compressed, and analysed so that they can be presented in a standard form. The resulting data are then used for the blind source separation task. In this paper, we deal with two types of autoencoders: convolutional and denoising. The novelty of our work is to reconstruct the audio signal at the output of the neural network after extracting the meaningful features that carry the pure, most informative content. Simulation results show strong performance, yielding 87% accuracy for the reconstructed signals, which will be integrated into an automated system for real-world applications.
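As a rough illustration of the approach described above, the sketch below builds a convolutional denoising autoencoder in Keras: an encoder compresses noisy audio frames into a small feature map, and a decoder reconstructs the clean signal. This is a minimal sketch under assumed settings (frame length of 1024 samples, filter sizes, synthetic data), not the paper's actual architecture, whose layer configuration is not specified in the abstract.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

FRAME = 1024  # hypothetical audio frame length (must be divisible by 16 here)

def build_conv_denoising_autoencoder(frame_len=FRAME):
    inp = layers.Input(shape=(frame_len, 1))
    # Encoder: compress the noisy frame into a low-dimensional feature map.
    x = layers.Conv1D(16, 9, padding="same", activation="relu")(inp)
    x = layers.MaxPooling1D(4)(x)
    x = layers.Conv1D(8, 9, padding="same", activation="relu")(x)
    encoded = layers.MaxPooling1D(4)(x)
    # Decoder: reconstruct the clean signal from the compressed features.
    x = layers.Conv1D(8, 9, padding="same", activation="relu")(encoded)
    x = layers.UpSampling1D(4)(x)
    x = layers.Conv1D(16, 9, padding="same", activation="relu")(x)
    x = layers.UpSampling1D(4)(x)
    out = layers.Conv1D(1, 9, padding="same", activation="linear")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model

# Denoising training pairs: the network sees the noisy frame as input
# and is trained to output the clean frame (synthetic toy data here).
clean = np.random.randn(32, FRAME, 1).astype("float32")
noisy = clean + 0.1 * np.random.randn(32, FRAME, 1).astype("float32")

ae = build_conv_denoising_autoencoder()
ae.fit(noisy, clean, epochs=1, batch_size=8, verbose=0)
denoised = ae.predict(noisy, verbose=0)
print(denoised.shape)  # same shape as the input: (32, 1024, 1)
```

Training against clean targets from noisy inputs is what makes the autoencoder "denoising"; reconstruction quality would then be measured against the clean reference signals.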
Acknowledgements
Drs. Reyes and Ventura wish to acknowledge the financial support of the Spanish Ministry of Economy and Competitiveness and the Fund of Regional Development (Project TIN2017-83445-P).
This work was supported by the research group KDIS (Knowledge Discovery and Intelligent Systems) during my research period at the KDIS laboratory, Department of Computer Science and Numerical Analysis, University of Cordoba, Spain.
Cite this article
Abouzid, H., Chakkor, O., Reyes, O.G. et al. Signal speech reconstruction and noise removal using convolutional denoising audioencoders with neural deep learning. Analog Integr Circ Sig Process 100, 501–512 (2019). https://doi.org/10.1007/s10470-019-01446-6