Abstract
Discriminative language models (DLMs) are widely used to rerank the competing hypotheses produced by an Automatic Speech Recognition (ASR) system. Because existing DLMs suffer from limited generalization power, we propose a novel DLM based on a discriminatively trained Restricted Boltzmann Machine (RBM). The hidden layer of the RBM improves generalization and allows additional prior knowledge to be incorporated, including pre-trained parameters and entity-related priors. Our approach outperforms the single-layer-perceptron (SLP) reranking model, and fusing our approach with the SLP achieves up to a 1.3% absolute Word Error Rate (WER) reduction, a 180% relative improvement in WER reduction over the SLP reranker alone. In particular, the proposed prior-informed RBM reranker achieves its largest ASR error reduction (3.1% absolute WER) on content words.
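For intuition, the following is a minimal sketch (not the paper's implementation) of how an RBM can rerank n-best ASR hypotheses: each hypothesis is encoded as a bag-of-words vector, the RBM's free energy acts as a language-model score, and that score is fused with the ASR model score before selecting the top hypothesis. The class name, the binary bag-of-words featurization, and the fusion weight `alpha` are illustrative assumptions; the paper's actual features, discriminative training procedure, and entity-related priors are not reproduced here.

```python
import numpy as np

class RBMReranker:
    """Illustrative RBM scorer; lower free energy = more probable hypothesis."""

    def __init__(self, vocab_size, n_hidden, rng=None):
        rng = rng or np.random.default_rng(0)
        # Visible-hidden weights and biases; in the paper these could be
        # pre-trained and informed by entity-related priors (assumption here:
        # random init, no training shown).
        self.W = rng.normal(0.0, 0.01, (vocab_size, n_hidden))
        self.a = np.zeros(vocab_size)   # visible biases
        self.b = np.zeros(n_hidden)     # hidden biases

    def free_energy(self, v):
        # F(v) = -a.v - sum_j log(1 + exp(b_j + W_j.v))
        return -v @ self.a - np.logaddexp(0.0, v @ self.W + self.b).sum()

    def score(self, v, asr_score, alpha=0.5):
        # Fuse the ASR log-score with the negated free energy (higher = better).
        return asr_score - alpha * self.free_energy(v)

def rerank(hypotheses, features, asr_scores, rbm):
    """Return the hypothesis with the highest fused score."""
    scores = [rbm.score(v, s) for v, s in zip(features, asr_scores)]
    return hypotheses[int(np.argmax(scores))]

if __name__ == "__main__":
    # Toy usage: two hypotheses as bag-of-words vectors over a 5-word vocabulary.
    rbm = RBMReranker(vocab_size=5, n_hidden=4)
    feats = [np.array([1, 1, 0, 0, 1.0]), np.array([1, 0, 1, 0, 1.0])]
    print(rerank(["a b e", "a c e"], feats, asr_scores=[-4.2, -4.5], rbm=rbm))
```

The key design point the sketch illustrates is that the hidden units let the model share evidence across correlated word features, which is the source of the generalization gain over a linear (SLP) reranker.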
Acknowledgements
This work was conducted within the Rolls-Royce@NTU Corp Lab with support from the National Research Foundation Singapore under the Corp Lab@University Scheme.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Ma, Y., Cambria, E., Bigot, B. (2018). ASR Hypothesis Reranking Using Prior-Informed Restricted Boltzmann Machine. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2017. Lecture Notes in Computer Science, vol. 10761. Springer, Cham. https://doi.org/10.1007/978-3-319-77113-7_39
DOI: https://doi.org/10.1007/978-3-319-77113-7_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77112-0
Online ISBN: 978-3-319-77113-7
eBook Packages: Computer Science (R0)