Abstract
English is one of the most widely used languages. As the global village shrinks, smart-home devices, in-vehicle voice systems, and speech recognition software that use English as the recognition language have gradually entered everyday life and won users over through their practical accuracy. Meanwhile, deep learning, with its hierarchical feature learning and data modeling capabilities, has surpassed shallow learning techniques on many tasks. This paper therefore takes English speech as its research object and proposes a deep learning speech recognition algorithm that combines speech features with speech attributes. First, a deep neural network is trained by supervised learning to extract high-level speech features: the output of a fixed hidden layer is taken as a new speech feature, and a GMM–HMM acoustic model is trained on these new features. Second, deep-neural-network-based speech attribute extractors are trained for multiple speech attributes, and the extracted attributes are classified into phonemes by a further deep neural network. Finally, the speech features and the speech attribute features are merged into the same CNN framework by a neural network based on a linear feature fusion algorithm. The experimental results show that the proposed English speech recognition algorithm, based on a deep neural network with multiple features, can directly and effectively combine the two methods by joining the speaker's speech features and speech attributes at the input layer of the deep neural network, and that it significantly improves the performance of the English speech recognition system.
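The pipeline described above can be illustrated with a minimal sketch: a small feed-forward network is run over a speech frame, the activations of one fixed hidden layer are tapped as the new "bottleneck" speech feature, and that feature is linearly fused (by concatenation) with the outputs of separately trained attribute detectors before being handed to a downstream acoustic model. All dimensions, the random weights, and the stand-in attribute vector are hypothetical; this is not the paper's actual model, only the shape of the idea.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def mlp_forward(x, weights, tap_layer):
    """Run x through an MLP and return the final output together with
    the activations of hidden layer `tap_layer` (the layer whose output
    is reused as a new speech feature)."""
    h = x
    tapped = None
    for i, (W, b) in enumerate(weights):
        h = relu(h @ W + b)
        if i == tap_layer:
            tapped = h
    return h, tapped

# Hypothetical layer sizes: a 39-dim MFCC-like input, a 16-unit
# bottleneck layer, and a 40-class output (e.g. phoneme states).
dims = [39, 64, 16, 64, 40]
weights = [(rng.standard_normal((a, b)) * 0.1, np.zeros(b))
           for a, b in zip(dims[:-1], dims[1:])]

frame = rng.standard_normal(39)                 # one speech frame
_, bottleneck = mlp_forward(frame, weights, tap_layer=1)

# Stand-in for the posteriors of separately trained speech attribute
# detectors (e.g. manner/place of articulation); 8 attributes here.
attribute_feats = rng.random(8)

# Linear feature fusion: concatenate the two streams into one vector,
# which would then feed the CNN acoustic model's input layer.
fused = np.concatenate([bottleneck, attribute_feats])
print(fused.shape)   # (24,)
```

In a real system the bottleneck network and the attribute detectors would each be trained on labeled speech before fusion; concatenation at the input layer is what lets the downstream model see both feature streams jointly.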
Song, Z. English speech recognition based on deep learning with multiple features. Computing 102, 663–682 (2020). https://doi.org/10.1007/s00607-019-00753-0