
Construction of complex environment speech signal communication system based on 5G and AI driven feature extraction techniques

Published in: International Journal of Speech Technology

Abstract

Voice is the most direct and important mode of human communication in daily life. With the rise of the Internet and advances in communication technology, non-voice signals such as images and data account for a growing share of traffic in communication systems. Nevertheless, most communication systems still require a voice transmission function, so transmitting voice information effectively remains essential for many of them. With the rapid development of the Internet, and in particular the recent commercialization of 5G technology and its forthcoming civilian applications, human–computer interaction will become increasingly intelligent, which in turn poses greater challenges for speech recognition as a human–computer interface. Noise interference is one of the biggest obstacles to the practical deployment of speech systems. Although deep learning models trained on large amounts of noisy data can solve part of the noise-robustness problem, non-stationary noise interference remains a major challenge for speech recognition systems in complex scenes with very low SNR. In addition, multi-information fusion communication systems must transmit many kinds of data, placing high demands on bandwidth and storage space. Hence, this paper studies the construction of a voice signal communication system based on artificial intelligence and 5G technology. The model is designed and implemented with complex scenarios in mind, and its performance is validated through simulations against state-of-the-art methods.
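The "very low SNR" conditions the abstract refers to can be made concrete with a small sketch. The snippet below (illustrative only, not from the paper) mixes a stand-in clean signal with noise scaled to a target SNR, the standard way noisy training and test data are constructed for speech enhancement experiments; the function names `mix_at_snr` and `snr_db` are hypothetical.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db_target):
    """Scale the noise so the mixture speech + noise has the requested SNR (dB)."""
    p_speech = np.mean(speech ** 2)          # average signal power
    p_noise = np.mean(noise ** 2)            # average noise power
    # Solve p_speech / (scale^2 * p_noise) = 10^(snr/10) for the noise gain.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db_target / 10)))
    return speech + scale * noise

def snr_db(signal, residual):
    """SNR of `signal` relative to the `residual` noise component, in dB."""
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(residual ** 2))

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0               # one second at 16 kHz
speech = np.sin(2 * np.pi * 220 * t)         # sinusoid standing in for speech
noise = rng.standard_normal(t.size)          # white Gaussian noise

mixture = mix_at_snr(speech, noise, -5.0)    # a "very low SNR" scene
print(round(snr_db(speech, mixture - speech), 1))  # → -5.0
```

At -5 dB the noise power is roughly three times the speech power, which is the regime where the abstract notes that deep-learning-based enhancement still struggles with non-stationary noise.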



Author information

Corresponding author: Yali Zhang.


Cite this article

Jiang, Y., Cheng, E., Li, Y. et al. Construction of complex environment speech signal communication system based on 5G and AI driven feature extraction techniques. Int J Speech Technol 25, 817–830 (2022). https://doi.org/10.1007/s10772-021-09900-5
