Introduction of Semantic Model to Help Speech Recognition

Level, Stephane; Illina, Irina; Fohr, Dominique

doi:10.1007/978-3-030-58323-1_41

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12284)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue


Abstract

Current Automatic Speech Recognition (ASR) systems mainly take into account acoustic, lexical and local syntactic information; long-term semantic relations are not used. The performance of ASR systems decreases significantly when training and testing conditions differ, for example because of noise. In this case, the acoustic information can be less reliable. To help an ASR system operating in noisy conditions, we propose to supplement it with a semantic module. This module re-evaluates the list of N-best speech recognition hypotheses and can be seen as a form of adaptation to noise. For the words of the processed sentence that may have been poorly recognized, this module chooses words that better match the semantic context of the sentence. To achieve this, we introduce the notions of a context part and possibility zones, which measure the similarity between the semantic context of the document and the corresponding possible hypothesis. The proposed methodology uses two continuous word representations: word2vec and FastText. We conduct experiments on the publicly available TED conference dataset (TED-LIUM) mixed with real noise. The proposed method achieves a significant improvement in word error rate (WER) over the ASR system without semantic information.
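The general idea of semantic N-best rescoring can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not define the context part or the possibility zones in enough detail to reproduce them. Instead, the sketch assumes a simple linear combination of each hypothesis's ASR log-score with the mean cosine similarity between its words and an averaged context vector. The embedding values, the `weight` parameter and all function names are hypothetical; in practice the vectors would come from a pretrained word2vec or FastText model.

```python
import math

# Hypothetical toy 3-d word vectors standing in for word2vec/FastText embeddings.
EMBEDDINGS = {
    "plane": [0.9, 0.1, 0.0],
    "plain": [0.1, 0.9, 0.2],
    "pilot": [0.8, 0.2, 0.1],
    "cockpit": [0.75, 0.25, 0.1],
    "flight": [0.85, 0.15, 0.05],
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_vector(words):
    """Average the embeddings of in-vocabulary words; None if no word is known."""
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vecs:
        return None
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]


def semantic_score(words, context_vec):
    """Mean cosine similarity of each known hypothesis word to the context vector."""
    sims = [cosine(EMBEDDINGS[w], context_vec) for w in words if w in EMBEDDINGS]
    return sum(sims) / len(sims) if sims else 0.0


def rescore(nbest, context_words, weight=1.0):
    """Re-rank (hypothesis, asr_log_score) pairs by asr_score + weight * semantic score.

    Returns a list of (hypothesis, combined_score) pairs, best first.
    """
    ctx = mean_vector(context_words)
    scored = []
    for text, asr_score in nbest:
        sem = semantic_score(text.split(), ctx) if ctx is not None else 0.0
        scored.append((text, asr_score + weight * sem))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # The acoustically preferred hypothesis ("plain") fits the aviation context poorly.
    nbest = [("the plain landed", -4.0), ("the plane landed", -4.1)]
    context = ["pilot", "cockpit", "flight"]
    print(rescore(nbest, context)[0][0])  # prints "the plane landed"
```

With `weight=0.0` the ranking falls back to the ASR scores alone and "the plain landed" stays on top; with a positive weight, the semantically coherent "the plane landed" overtakes it. The interpolation weight would normally be tuned on development data.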

Acknowledgments

The authors thank the DGA (Direction Générale de l’Armement, part of the French Ministry of Defence), Thales AVS and Dassault Aviation, which support the funding of this study and the “Man-Machine Teaming” scientific program in which this research project takes place.

Author information

Authors and Affiliations

Université de Lorraine, CNRS, Inria, 54000, Nancy, France
Stephane Level, Irina Illina & Dominique Fohr

Corresponding author

Correspondence to Irina Illina.

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Karel Pala
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Aleš Horák

About this paper

Cite this paper

Level, S., Illina, I., Fohr, D. (2020). Introduction of Semantic Model to Help Speech Recognition. In: Sojka, P., Kopeček, I., Pala, K., Horák, A. (eds) Text, Speech, and Dialogue. TSD 2020. Lecture Notes in Computer Science, vol 12284. Springer, Cham. https://doi.org/10.1007/978-3-030-58323-1_41

DOI: https://doi.org/10.1007/978-3-030-58323-1_41
Published: 01 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58322-4
Online ISBN: 978-3-030-58323-1
eBook Packages: Computer Science, Computer Science (R0)