Online LDA-Based Language Model Adaptation

  • Jan Lehečka
  • Aleš Pražák
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107)


In this paper, we present our improvements in online topic-based language model adaptation. Our aim is to enhance automatic speech recognition of multi-topic speech that must be recognized in real time (online). Latent Dirichlet Allocation (LDA) is an unsupervised topic model designed to uncover hidden semantic relationships between words and documents in a text corpus and thus reveal latent topics automatically. We use LDA to cluster the text corpus and to predict topics online from partial hypotheses during real-time speech recognition. Based on topic changes detected in the speech, we adapt the language model on the fly. We demonstrate the improvement of our system on the task of online subtitling of TV news, where we achieved an \(18\%\) relative reduction in perplexity and a \(3.52\%\) relative reduction in WER over a non-adapted system.
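The pipeline summarized above (infer a topic mixture from a partial hypothesis, then re-weight the language model) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy topic-word table `TOPIC_WORD`, the helper names, the use of unigram models, and the simple EM fold-in used in place of full LDA inference are all assumptions made for illustration only.

```python
import math

# Hypothetical toy topic-word distributions p(word | topic). In a real system
# these would come from an LDA model trained on the text corpus.
TOPIC_WORD = {
    "sports":   {"goal": 0.4, "match": 0.4, "vote": 0.1, "law": 0.1},
    "politics": {"goal": 0.1, "match": 0.1, "vote": 0.4, "law": 0.4},
}


def infer_topic_weights(words, topic_word, iterations=20):
    """Estimate p(topic | words) by EM with fixed topic-word distributions.

    This is a simple fold-in stand-in for LDA inference (an assumption).
    """
    topics = list(topic_word)
    weights = {t: 1.0 / len(topics) for t in topics}
    for _ in range(iterations):
        counts = {t: 0.0 for t in topics}
        for w in words:
            joint = {t: weights[t] * topic_word[t].get(w, 0.0) for t in topics}
            total = sum(joint.values())
            if total == 0.0:
                continue  # word unknown to every topic: carries no evidence
            for t in topics:
                counts[t] += joint[t] / total
        normalizer = sum(counts.values())
        if normalizer == 0.0:
            break  # no usable evidence: keep the current weights
        weights = {t: counts[t] / normalizer for t in topics}
    return weights


def adapted_unigram(word, weights, topic_word, floor=1e-6):
    """Mixture of topic-specific unigram models, floored to avoid log(0)."""
    p = sum(weights[t] * topic_word[t].get(word, 0.0) for t in weights)
    return max(p, floor)


def perplexity(words, prob):
    """Perplexity of a word sequence under the word-probability function."""
    log_sum = sum(math.log(prob(w)) for w in words)
    return math.exp(-log_sum / len(words))


# Partial hypothesis produced so far by the recognizer (toy data).
partial_hypothesis = ["vote", "law", "vote"]
weights = infer_topic_weights(partial_hypothesis, TOPIC_WORD)

# Non-adapted baseline: every topic weighted equally.
uniform = {t: 1.0 / len(TOPIC_WORD) for t in TOPIC_WORD}

upcoming = ["law", "vote"]
ppl_base = perplexity(upcoming, lambda w: adapted_unigram(w, uniform, TOPIC_WORD))
ppl_adapted = perplexity(upcoming, lambda w: adapted_unigram(w, weights, TOPIC_WORD))

# Relative perplexity reduction, the kind of figure reported in the abstract.
relative_reduction = (ppl_base - ppl_adapted) / ppl_base
print(weights, ppl_base, ppl_adapted, relative_reduction)
```

On this toy data the inferred mixture favors the "politics" topic, so the adapted model assigns higher probability to the upcoming words than the uniform baseline does. A production system would use n-gram rather than unigram models and would re-run inference as each new partial hypothesis arrives.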


Keywords: Topic modeling, Language model adaptation



This paper was supported by the project no. P103/12/G084 of the Grant Agency of the Czech Republic and by the grant of the University of West Bohemia, project no. SGS-2016-039.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Plzeň, Czech Republic
