
Neural Computing and Applications, Volume 31, Supplement 2, pp 999–1011

Global context-dependent recurrent neural network language model with sparse feature learning

  • Hongli Deng
  • Lei Zhang
  • Lituan Wang
Original Article

Abstract

Recurrent neural network language models (RNNLMs) are an important class of language model. In recent years, context-dependent RNNLMs have been the most widely used variants, because they exploit additional information summarized from other sequences to access a larger context. However, when the sequences are mutually independent or randomly shuffled, these models cannot learn useful additional information, so no larger context is actually taken into account. To ensure that the model can obtain more contextual information in any case, a new language model is proposed in this paper. It captures the global context using only the words within the current sequence, incorporating all the preceding and following words of the target, without resorting to additional information summarized from other sequences. The model comprises two main modules: a recurrent global context module that extracts the global contextual information of the target, and a sparse feature learning module that learns sparse features of all possible output words to distinguish the target word from the others at the output layer. The proposed model was tested on three language modeling tasks. Experimental results show that it improves perplexity, speeds up convergence of the network and learns better word embeddings compared with other language models.
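The abstract does not give the exact architecture, but the following is a minimal, hypothetical sketch of the idea it describes, written in PyTorch (the paper itself reports using Theano). The class and parameter names, the use of GRUs, the layer sizes and the L1 weight are illustrative assumptions rather than the authors' implementation; in particular, the abstract does not specify how the target word itself is excluded from its own global context, so this sketch leaves that detail open.

```python
# Hypothetical sketch only: a forward LM recurrence plus a recurrent
# "global context" module over the current sequence, and an L1-penalized
# ("sparse") output-word feature matrix. Names and sizes are assumptions.
import torch
import torch.nn as nn


class GlobalContextRNNLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256, ctx_dim=128,
                 l1_weight=1e-4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Standard forward LM recurrence over the sequence prefix.
        self.lm_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Recurrent global context module: a bidirectional RNN over the
        # whole current sequence, so each position is informed by the
        # words before and after the target, with no external information.
        self.ctx_rnn = nn.GRU(emb_dim, ctx_dim, batch_first=True,
                              bidirectional=True)
        # Output-word features; an L1 penalty on this matrix encourages
        # sparse features for all possible output words.
        self.out_features = nn.Linear(hid_dim + 2 * ctx_dim, vocab_size,
                                      bias=False)
        self.l1_weight = l1_weight

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer word ids
        emb = self.embed(tokens)
        lm_states, _ = self.lm_rnn(emb)    # (batch, seq_len, hid_dim)
        ctx_states, _ = self.ctx_rnn(emb)  # (batch, seq_len, 2 * ctx_dim)
        combined = torch.cat([lm_states, ctx_states], dim=-1)
        return self.out_features(combined)  # scores over the vocabulary

    def sparsity_penalty(self):
        # L1 norm of the output-word feature matrix, added to the LM loss.
        return self.l1_weight * self.out_features.weight.abs().sum()
```

In training, the term returned by sparsity_penalty() would simply be added to the usual next-word cross-entropy loss, pushing the output-word feature matrix toward sparse rows so that words are separated by a few strong features rather than many weak ones.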

Keywords

Recurrent neural network · Language model · Global context · Sparse feature · Deep learning

Notes

Acknowledgements

This work was supported by the Fok Ying Tung Education Foundation (Grant 151068), the National Natural Science Foundation of China (Grant 61332002) and the Foundation for Youth Science and Technology Innovation Research Team of Sichuan Province (Grant 2016TD0018).

Compliance with ethical standards

Conflicts of interest

The authors declare that they have no conflict of interest related to this work.


Copyright information

© The Natural Computing Applications Forum 2017

Authors and Affiliations

  1. College of Computer Science, Sichuan University, Chengdu, China
  2. Education and Information Technology Center, China West Normal University, Nanchong, China
