
Performance Optimization of Speech Recognition System with Deep Neural Network Model

Published in Optical Memory and Neural Networks

Abstract

As the internet develops, human-machine interaction is becoming increasingly important, and accurate speech recognition is a key means of achieving it. In this study, deep neural network models were used to improve speech recognition performance. Four architectures were examined: the feedforward fully connected deep neural network (DNN), the time-delay neural network (TDNN), the convolutional neural network (CNN), and the feedforward sequential memory network (FSMN). Their recognition performance was evaluated by comparing the resulting acoustic models, and each model was also tested after adding vocal features of different dimensions. The results showed that deep neural network models effectively improved the performance of the speech recognition system: the FSMN performed best, followed by the DNN, TDNN, and CNN. Different input features improved model performance to different degrees; models trained on Fbank features outperformed those trained on Mel-frequency cepstral coefficient (MFCC) features. Adding vocal features further improved performance, with the optimal feature dimension differing across models.
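The abstract contrasts Fbank and MFCC input features. As a rough, self-contained sketch (not the paper's implementation; the 25 ms frames, 10 ms hop, and filter counts below are common defaults chosen for illustration), both feature types can be computed from a waveform like this:

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    hz_pts = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def fbank_and_mfcc(signal, sr=16000, frame_len=400, hop=160,
                   n_fft=512, n_filters=26, n_ceps=13):
    # Frame the signal, apply a Hamming window, take the power spectrum.
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Fbank features: log energy in each mel-spaced triangular filter.
    fb = mel_filterbank(n_filters, n_fft, sr)
    fbank = np.log(power @ fb.T + 1e-10)
    # MFCC features: DCT-II of the log filterbank energies,
    # keeping the first n_ceps coefficients.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                  (2 * n + 1) / (2.0 * n_filters)))
    mfcc = fbank @ dct.T
    return fbank, mfcc

# Example: one second of a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
sig = np.sin(2 * np.pi * 440.0 * t)
fbank, mfcc = fbank_and_mfcc(sig)
print(fbank.shape, mfcc.shape)  # (98, 26) (98, 13)
```

The MFCC is simply the DCT of the log-Fbank energies; the DCT decorrelates the dimensions, which GMM-based acoustic models needed, whereas neural acoustic models can exploit the correlated Fbank representation directly. That distinction is consistent with the abstract's finding that Fbank inputs outperformed MFCC inputs.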


Figures 1-7 (not reproduced in this preview).



Author information


Correspondence to Wei Guan.

About this article


Cite this article

Guan, Wei, Performance Optimization of Speech Recognition System with Deep Neural Network Model, Opt. Mem. Neural Networks, 2018, vol. 27, pp. 272–282. https://doi.org/10.3103/S1060992X18040094
