Signal speech reconstruction and noise removal using convolutional denoising audioencoders with neural deep learning

Published in: Analog Integrated Circuits and Signal Processing

Abstract

Real-world datasets come in many formats (audio, music, images, ...). In our case, audio signals from several sources are mixed together. These mixtures are noisy audio data whose features must be extracted, compressed, and analysed before they can be presented in a standard form; the resulting data are then used for the blind source separation task. In this paper, we work with two types of autoencoder: convolutional and denoising. The novelty of our work is to reconstruct the audio signal at the output of the neural network after extracting the meaningful features that carry the pure, useful information. Simulation results show strong performance, yielding 87% for the reconstructed signals, which will be integrated into an automated system for real-world applications.
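
The abstract combines two autoencoder ideas: convolutional layers that compress audio frames into compact features, and denoising training, in which the network receives a corrupted input and is optimised to reproduce the clean signal. The sketch below is a minimal illustration of a convolutional denoising autoencoder for 1-D audio frames, written in Keras; the frame length, layer widths, kernel size, noise level, and the synthetic stand-in data are all assumptions for illustration, not the configuration used in the paper.

# Minimal sketch of a convolutional denoising autoencoder for 1-D audio.
# Layer sizes, frame length, and noise level are illustrative assumptions,
# not the authors' published configuration.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

WINDOW = 1024  # samples per audio frame (assumed)

def build_model():
    inp = layers.Input(shape=(WINDOW, 1))
    # Encoder: strided pooling + convolutions compress the frame
    # into a compact feature code.
    x = layers.Conv1D(16, 9, padding="same", activation="relu")(inp)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Conv1D(32, 9, padding="same", activation="relu")(x)
    code = layers.MaxPooling1D(2)(x)
    # Decoder: upsampling + convolutions reconstruct the clean waveform.
    x = layers.Conv1D(32, 9, padding="same", activation="relu")(code)
    x = layers.UpSampling1D(2)(x)
    x = layers.Conv1D(16, 9, padding="same", activation="relu")(x)
    x = layers.UpSampling1D(2)(x)
    out = layers.Conv1D(1, 9, padding="same", activation="tanh")(x)
    model = keras.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model

# Denoising training: the network sees noisy frames as input but is
# fitted against the clean frames as targets. Synthetic data stand in
# for the real audio mixtures here.
clean = np.random.uniform(-1.0, 1.0, size=(256, WINDOW, 1)).astype("float32")
noisy = clean + 0.1 * np.random.randn(*clean.shape).astype("float32")

model = build_model()
model.fit(noisy, clean, epochs=5, batch_size=32)
denoised = model.predict(noisy[:8])  # reconstructed (denoised) frames

Training on (noisy, clean) pairs is what distinguishes the denoising variant from a plain autoencoder, which would simply be trained to reproduce its own input.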

Notes

  1. http://sisec2008.wiki.irisa.fr/tiki-index.php?page=Under-determined+speech+and+music+mixtures.


Acknowledgements

Drs. Reyes and Ventura acknowledge the financial support of the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund (Project TIN2017-83445-P).

Author information


Corresponding author

Correspondence to Houda Abouzid.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work has been supported by the KDIS (Knowledge Discovery and Intelligent Systems) research group during the first author's research stay at the KDIS laboratory, Department of Computer Science and Numerical Analysis, University of Cordoba, Spain.

About this article

Cite this article

Abouzid, H., Chakkor, O., Reyes, O.G. et al. Signal speech reconstruction and noise removal using convolutional denoising audioencoders with neural deep learning. Analog Integr Circ Sig Process 100, 501–512 (2019). https://doi.org/10.1007/s10470-019-01446-6

