Abstract
Real-world datasets come in many formats (audio, music, image, ...). In our case, audio data from several sources are mixed together. These mixtures represent noisy audio signals whose features must be extracted, compressed, and analysed so that they can be presented in a standard form. The resulting data are then used for the blind source separation task. In this paper, we deal with two types of autoencoders: convolutional and denoising. The novelty of our work is to reconstruct the audio signal at the output of the neural network after extracting the meaningful features that carry the pure, most informative content. Simulation results show strong performance, yielding 87% accuracy for the reconstructed signals, which will be integrated into an automated system for real-world applications.
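As a rough illustration of the approach described above, the sketch below builds a convolutional denoising autoencoder in Keras: an encoder compresses noisy audio frames into a small feature map, and a decoder reconstructs the clean signal. This is a minimal sketch under assumed settings (frame length of 1024 samples, filter sizes, synthetic data), not the paper's actual architecture, whose layer configuration is not specified in the abstract.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

FRAME = 1024  # hypothetical audio frame length (must be divisible by 16 here)

def build_conv_denoising_autoencoder(frame_len=FRAME):
    inp = layers.Input(shape=(frame_len, 1))
    # Encoder: compress the noisy frame into a low-dimensional feature map.
    x = layers.Conv1D(16, 9, padding="same", activation="relu")(inp)
    x = layers.MaxPooling1D(4)(x)
    x = layers.Conv1D(8, 9, padding="same", activation="relu")(x)
    encoded = layers.MaxPooling1D(4)(x)
    # Decoder: reconstruct the clean signal from the compressed features.
    x = layers.Conv1D(8, 9, padding="same", activation="relu")(encoded)
    x = layers.UpSampling1D(4)(x)
    x = layers.Conv1D(16, 9, padding="same", activation="relu")(x)
    x = layers.UpSampling1D(4)(x)
    out = layers.Conv1D(1, 9, padding="same", activation="linear")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model

# Denoising training pairs: the network sees the noisy frame as input
# and is trained to output the clean frame (synthetic toy data here).
clean = np.random.randn(32, FRAME, 1).astype("float32")
noisy = clean + 0.1 * np.random.randn(32, FRAME, 1).astype("float32")

ae = build_conv_denoising_autoencoder()
ae.fit(noisy, clean, epochs=1, batch_size=8, verbose=0)
denoised = ae.predict(noisy, verbose=0)
print(denoised.shape)  # same shape as the input: (32, 1024, 1)
```

Training against clean targets from noisy inputs is what makes the autoencoder "denoising"; reconstruction quality would then be measured against the clean reference signals.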
Acknowledgements
Drs. Reyes and Ventura wish to acknowledge the financial support of the Spanish Ministry of Economy and Competitiveness and the Fund of Regional Development (Project TIN2017-83445-P).
This work was supported by the research group KDIS (Knowledge Discovery and Intelligent Systems) during my research period at the KDIS laboratory, Department of Computer Science and Numerical Analysis, University of Cordoba, Spain.
Cite this article
Abouzid, H., Chakkor, O., Reyes, O.G. et al. Signal speech reconstruction and noise removal using convolutional denoising audioencoders with neural deep learning. Analog Integr Circ Sig Process 100, 501–512 (2019). https://doi.org/10.1007/s10470-019-01446-6