Abstract
Statistical Language Modeling (LM) is a central step in many Natural Language Processing (NLP) tasks, including Automatic Speech Recognition (ASR), Statistical Machine Translation (SMT), sentence completion, and automatic text generation. A good-quality language model has been one of the key success factors for many commercial NLP applications. Over the past three decades, diverse research communities such as psychology, neuroscience, data compression, machine translation, speech recognition, and linguistics have advanced research in the field of language modeling. We first present the mathematical background of the LM problem, then review Neural Network based LM techniques in the order in which they were developed, and also cover recent developments in Recurrent Neural Network (RNN) based language models. Early LM research in ASR gave rise to a commercially successful class of models known as N-gram LMs. These models are purely statistical and do not exploit the linguistic information present in the text itself. With advances in computing power and the availability of large, rich sources of textual data, Neural Network based LMs paved their way into the arena. These techniques proved significant because they map word tokens into a continuous space rather than treating them as discrete symbols. Once NNLM performance was shown to be comparable to that of state-of-the-art N-gram LMs, researchers also successfully applied Deep Neural Networks to LM. They soon realized that the inherently sequential nature of textual input makes the LM problem a good candidate for the Recurrent Neural Network (RNN) architecture, and today the RNN is the neural architecture of choice for LM among most practitioners. This chapter sheds light on variants of Neural Network based LMs.
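For readers who want a concrete anchor for the purely statistical N-gram approach mentioned above, the following is a minimal sketch of a count-based bigram model with add-one smoothing; the toy corpus, function names, and smoothing choice are illustrative assumptions rather than details taken from the chapter.

# Minimal bigram language model sketch (illustrative assumptions:
# toy corpus, add-one smoothing, no handling of unknown words).
from collections import Counter

corpus = ["the cat sat on the mat", "the dog sat on the rug"]

# Collect unigram and bigram counts over the toy corpus.
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

vocab_size = len(unigrams)

def bigram_prob(prev, word):
    # P(word | prev) with add-one (Laplace) smoothing.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def sentence_prob(sentence):
    # Chain-rule factorization: P(w1..wn) = product over i of P(w_i | w_{i-1}).
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word)
    return p

print(sentence_prob("the cat sat on the rug"))

A neural LM replaces these count-based estimates with a learned function over continuous word embeddings, which is what allows it to generalize to word sequences never observed in training.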
About this chapter
Cite this chapter
Kunte, A.S., Attar, V.Z. (2020). Progress in Neural Network Based Statistical Language Modeling. In: Pedrycz, W., Chen, SM. (eds) Deep Learning: Concepts and Architectures. Studies in Computational Intelligence, vol 866. Springer, Cham. https://doi.org/10.1007/978-3-030-31756-0_11
DOI: https://doi.org/10.1007/978-3-030-31756-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31755-3
Online ISBN: 978-3-030-31756-0
eBook Packages: Intelligent Technologies and Robotics (R0)