Online LDA-Based Language Model Adaptation

Lehečka, Jan; Pražák, Aleš

doi:10.1007/978-3-030-00794-2_36

Conference paper
First Online: 08 September 2018

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11107)

Abstract

In this paper, we present our improvements in online topic-based language model adaptation. Our aim is to improve automatic speech recognition of multi-topic speech that has to be recognized in real time (online). Latent Dirichlet Allocation (LDA) is an unsupervised topic model designed to uncover hidden semantic relationships between words and documents in a text corpus and thus reveal latent topics automatically. We use LDA to cluster the text corpus and to predict topics online from partial hypotheses during real-time speech recognition. Based on detected topic changes in the speech, we adapt the language model on the fly. We demonstrate the improvement of our system on the task of online subtitling of TV news, where we achieved an 18% relative reduction in perplexity and a 3.52% relative reduction in word error rate (WER) over the non-adapted system.
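
The pipeline outlined in the abstract (infer topics from a partial hypothesis, adapt the language model, measure the relative perplexity reduction) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes hand-made toy topic-word distributions in place of a trained LDA model, uses a simple EM fold-in with the topic-word distributions held fixed instead of full LDA inference, and adapts a unigram model by mixing topic distributions. All names (`TOPICS`, `infer_topic_mixture`, `adapted_unigram`, and so on) are hypothetical.

```python
import math

# Hypothetical toy topic-word distributions P(w | topic). In a real system
# these would come from an LDA model trained on the text corpus.
TOPICS = {
    "sport":    {"goal": 0.4, "match": 0.4, "vote": 0.1, "party": 0.1},
    "politics": {"goal": 0.1, "match": 0.1, "vote": 0.4, "party": 0.4},
}


def infer_topic_mixture(words, topics, iters=50):
    """Estimate P(topic | words) by EM, keeping topic-word distributions fixed."""
    names = list(topics)
    theta = {t: 1.0 / len(names) for t in names}  # start from a uniform mixture
    known = [w for w in words if any(w in topics[t] for t in names)]
    if not known:
        return theta
    for _ in range(iters):
        counts = {t: 0.0 for t in names}
        for w in known:
            # E-step: responsibility of each topic for this word
            post = {t: theta[t] * topics[t].get(w, 0.0) for t in names}
            z = sum(post.values())
            if z == 0.0:
                continue
            for t in names:
                counts[t] += post[t] / z
        total = sum(counts.values())
        if total == 0.0:
            break
        # M-step: renormalise the expected topic counts
        theta = {t: counts[t] / total for t in names}
    return theta


def adapted_unigram(theta, topics):
    """Mix the topic-specific unigram distributions with weights theta."""
    vocab = {w for dist in topics.values() for w in dist}
    return {w: sum(theta[t] * topics[t].get(w, 0.0) for t in topics) for w in vocab}


def perplexity(words, lm):
    """Perplexity of a word sequence under a unigram model."""
    log_sum = sum(math.log(lm[w]) for w in words)
    return math.exp(-log_sum / len(words))


def relative_reduction(baseline, adapted):
    """Relative reduction, e.g. 0.18 for an 18% relative improvement."""
    return (baseline - adapted) / baseline


# A partial hypothesis as it might arrive from the recogniser (toy data).
partial_hypothesis = ["goal", "match", "goal", "match"]
uniform = {t: 1.0 / len(TOPICS) for t in TOPICS}

theta = infer_topic_mixture(partial_hypothesis, TOPICS)
baseline_ppl = perplexity(partial_hypothesis, adapted_unigram(uniform, TOPICS))
adapted_ppl = perplexity(partial_hypothesis, adapted_unigram(theta, TOPICS))
reduction = relative_reduction(baseline_ppl, adapted_ppl)
```

Here the perplexity is computed on the same words that drove the adaptation, which overstates the gain; a realistic evaluation would adapt on the partial hypothesis and score the speech that follows it. The "relative reduction" quoted in the abstract is the quantity computed by `relative_reduction`: the drop in perplexity or WER divided by the baseline value.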

Acknowledgments

This paper was supported by the project no. P103/12/G084 of the Grant Agency of the Czech Republic and by the grant of the University of West Bohemia, project no. SGS-2016-039.

Author information

Authors and Affiliations

Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Univerzitní 8, 306 14, Plzeň, Czech Republic
Jan Lehečka & Aleš Pražák

Corresponding author

Correspondence to Jan Lehečka.

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Aleš Horák
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Masaryk University, Brno, Czech Republic
Karel Pala

About this paper

Cite this paper

Lehečka, J., Pražák, A. (2018). Online LDA-Based Language Model Adaptation. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2018. Lecture Notes in Computer Science, vol 11107. Springer, Cham. https://doi.org/10.1007/978-3-030-00794-2_36

DOI: https://doi.org/10.1007/978-3-030-00794-2_36
Published: 08 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00793-5
Online ISBN: 978-3-030-00794-2
eBook Packages: Computer Science, Computer Science (R0)
