
Progress in Neural Network Based Statistical Language Modeling

Chapter in Deep Learning: Concepts and Architectures

Part of the book series: Studies in Computational Intelligence (SCI, volume 866)


Abstract

Statistical Language Modeling (LM) is a central step in many Natural Language Processing (NLP) tasks, including Automatic Speech Recognition (ASR), Statistical Machine Translation (SMT), sentence completion, and automatic text generation, to name a few. A good-quality language model has been one of the key success factors for many commercial NLP applications. Over the past three decades, diverse research communities, including psychology, neuroscience, data compression, machine translation, speech recognition, and linguistics, have advanced research in the field of language modeling. We first present the mathematical background of the LM problem. We then review various neural network based LM techniques in the order in which they were developed, and also survey recent developments in Recurrent Neural Network (RNN) based language models. Early LM research in ASR gave rise to a commercially successful class of models known as N-gram LMs. These models are purely statistical and make little use of the linguistic information present in the text itself. With advances in computing power and the availability of large, rich sources of textual data, neural network based LMs entered the arena. These techniques proved significant because they map word tokens into a continuous space rather than treating them as discrete symbols. Once the performance of neural network LMs (NNLMs) was shown to be comparable to that of state-of-the-art N-gram LMs, researchers also successfully applied Deep Neural Networks to LM. It was soon realized that the inherently sequential nature of textual input makes LM a good candidate for the Recurrent Neural Network (RNN) architecture, and today the RNN is the neural architecture of choice for LM among most practitioners. This chapter sheds light on variants of neural network based LMs.
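
For concreteness, the sketch below states the standard factorization that the language models surveyed in this chapter estimate, together with the N-gram truncation and the recurrent formulation referred to in the abstract. The notation (w_t for the t-th word, n for the N-gram order, x_{w_{t-1}} for the embedding of the previous word, h_t for the recurrent hidden state, f for the recurrence, and W, b for the output projection) is assumed here for illustration only and is not taken from the chapter itself.

% Chain rule: a language model assigns a probability to a word sequence
% by predicting each word from its full history.
\[
P(w_1, \dots, w_T) \;=\; \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})
\]

% N-gram LMs approximate the history by the previous n-1 words,
% with the conditional probabilities estimated from smoothed corpus counts.
\[
P(w_t \mid w_1, \dots, w_{t-1}) \;\approx\; P(w_t \mid w_{t-n+1}, \dots, w_{t-1})
\]

% Neural and recurrent LMs instead map words to continuous vectors and
% summarize the unbounded history in a hidden state, from which the next
% word is predicted with a softmax over the vocabulary.
\[
h_t = f\!\left(h_{t-1},\, x_{w_{t-1}}\right), \qquad
P(w_t \mid w_1, \dots, w_{t-1}) = \operatorname{softmax}\!\left(W h_t + b\right)_{w_t}
\]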



Author information

Correspondence to Anup Shrikant Kunte or Vahida Z. Attar.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Kunte, A.S., Attar, V.Z. (2020). Progress in Neural Network Based Statistical Language Modeling. In: Pedrycz, W., Chen, S.-M. (eds) Deep Learning: Concepts and Architectures. Studies in Computational Intelligence, vol. 866. Springer, Cham. https://doi.org/10.1007/978-3-030-31756-0_11
