Skip to main content

Online LDA-Based Language Model Adaptation

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11107))

Abstract

In this paper, we present our improvements in online topic-based language model adaptation. Our aim is to enhance the automatic speech recognition of a multi-topic speech which is to be recognized in the real-time (online). Latent Dirichlet Allocation (LDA) is an unsupervised topic model designed to uncover hidden semantic relationships between words and documents in a text corpus and thus reveal latent topics automatically. We use LDA to cluster the text corpus and to predict topics online from partial hypotheses during the real-time speech recognition. Based on detected topic changes in the speech, we adapt the language model on-the-fly. We are demonstrating the improvement of our system on the task of online subtitling of TV news, where we achieved \(18\%\) relative reduction of perplexity and \(3.52\%\) relative reduction of WER over non-adapted system.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Bellegarda, J.R.: Statistical language model adaptation: review and perspectives. Speech Commun. 42(1), 93–108 (2004)

    Article  Google Scholar 

  2. Chen, L., Lamel, L., Gauvain, J.L., Adda, G.: Dynamic language modeling for broadcast news. In: Eighth International Conference on Spoken Language Processing (2004)

    Google Scholar 

  3. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

    MATH  Google Scholar 

  4. Tam, Y.C., Schultz, T.: Dynamic language model adaptation using variational bayes inference. In: Ninth European Conference on Speech Communication and Technology (2005)

    Google Scholar 

  5. Hsu, B.J.P., Glass, J.: Style & topic language model adaptation using HMM-LDA. In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pp. 373–381. Association for Computational Linguistics (2006)

    Google Scholar 

  6. Heidel, A., Chang, H., Lee, L.: Language model adaptation using latent Dirichlet allocation and an efficient topic inference algorithm. In: Eighth Annual Conference of the International Speech Communication Association (2007)

    Google Scholar 

  7. Liu, Y., Liu, F.: Unsupervised language model adaptation via topic modeling based on named entity hypotheses. In: IEEE International Conference on Acoustics, Speech and Signal Processing 2008, ICASSP 2008, pp. 4921–4924. IEEE (2008)

    Google Scholar 

  8. Haidar, M.A., O’Shaughnessy, D.: Unsupervised language model adaptation using latent Dirichlet allocation and dynamic marginals. In: 2011 19th European Signal Processing Conference, pp. 1480–1484. IEEE (2011)

    Google Scholar 

  9. Jeon, H.B., Lee, S.Y.: Language model adaptation based on topic probability of latent dirichlet allocation. ETRI J. 38(3), 487–493 (2016)

    Google Scholar 

  10. Hoffman, M., Bach, F.R., Blei, D.M.: Online learning for latent Dirichlet allocation. In: Advances in Neural Information Processing Systems, pp. 856–864 (2010)

    Google Scholar 

  11. Pražák, A., Loose, Z., Trmal, J., Psutka, J.V., Psutka, J.: Novel approach to live captioning through re-speaking: tailoring speech recognition to re-speaker’s needs. In: INTERSPEECH (2012)

    Google Scholar 

  12. Švec, J., et al.: General framework for mining, processing and storing large amounts of electronic texts for language modeling purposes. Lang. Resour. Eval. 48(2), 227–248 (2014)

    Article  Google Scholar 

Download references

Acknowledgments

This paper was supported by the project no. P103/12/G084 of the Grant Agency of the Czech Republic and by the grant of the University of West Bohemia, project no. SGS-2016-039.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jan Lehečka .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Lehečka, J., Pražák, A. (2018). Online LDA-Based Language Model Adaptation. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2018. Lecture Notes in Computer Science(), vol 11107. Springer, Cham. https://doi.org/10.1007/978-3-030-00794-2_36

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-00794-2_36

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00793-5

  • Online ISBN: 978-3-030-00794-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics