Named Entity Recognition in Russian with Word Representation Learned by a Bidirectional Language Model

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 930)


Named Entity Recognition is one of the most popular tasks of the natural language processing. Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for natural language processing tasks. However, in most cases, a recurrent network that operates on word-level representations to produce context sensitive representations is trained on relatively few labeled data. Also, there are many difficulties in processing Russian language. In this paper, we present a semi-supervised approach for adding deep contextualized word representation that models both complex characteristics of word usage (e.g., syntax and semantics), and how these usages vary across linguistic contexts (i.e., to model polysemy). Here word vectors are learned functions of the internal states of a deep bidirectional language model, which is pretrained on a large text corpus. We show that these representations can be easily added to existing models and be combined with other word representation features. We evaluate our model on FactRuEval-2016 dataset for named entity recognition in Russian and achieve state of the art results.


NER Word representation Semi-supervised learning Language modeling Bi-LSTM 



The authors would like to thank Ivan Smetannikov for helpful conversation and unknown AINL reviewers for useful comments.

The research was supported by the Government of the Russian Federation (Grant 08-08).


  1. 1.
    Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)Google Scholar
  2. 2.
    Pennington, J., Socher, R., Manning, C.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)Google Scholar
  3. 3.
    Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)zbMATHGoogle Scholar
  4. 4.
    Wieting, J., Bansal, M., Gimpel, K., Livescu, K.: Charagram: embedding words and sentences via character n-Grams. arXiv:1607.02789 (2016)
  5. 5.
    Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. arXiv:1607.04606 (2016)
  6. 6.
    Neelakantan, A., Shankar, J., Passos, A., McCallum, A.: Efficient non-parametric estimation of multiple embeddings per word in vector space. arXiv:1504.06654 (2015)
  7. 7.
    Melamud, O., Goldberger, J., Dagan, I.: context2vec: learning generic context embedding with bidirectional LSTM. In: Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, pp. 51–61 (2016)Google Scholar
  8. 8.
    McCann, B., Bradbury, J., Xiong, C., Socher, R.: Learned in translation: contextualized word vectors. In: Advances in Neural Information Processing Systems, pp. 6297–6308 (2017)Google Scholar
  9. 9.
    Peters, M.E., Ammar, W., Bhagavatula, C., Power, R.: Semi-supervised sequence tagging with bidirectional language models. arXiv:1705.00108 (2017)
  10. 10.
    Hashimoto, K., Xiong, C., Tsuruoka, Y., Socher, R.: A joint many-task model: growing a neural network for multiple NLP tasks. arXiv:1611.01587 (2016)
  11. 11.
    Belinkov, Y., Durrani, N., Dalvi, F., Sajjad, H., Glass, J.: What do neural machine translation models learn about morphology? arXiv:1704.03471 (2017)
  12. 12.
    Yang, Z., Salakhutdinov, R., Cohen, W.W.: Transfer learning for sequence tagging with hierarchical recurrent networks. arXiv:1703.06345 (2017)
  13. 13.
    Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for named entity recognition. arXiv:1603.01360 (2016)
  14. 14.
    Peters, M.E., et al.: Deep contextualized word representations. arXiv:1802.05365 (2018)
  15. 15.
    Howard, J., Sebastian, R.: Fine-tuned language models for text classification. arXiv:1801.06146 (2018)
  16. 16.
    Jozefowicz, R., Vinyals, O., Schuster, M., Shazeer, N., Wu, Y.: Exploring the limits of language modeling. arXiv:1602.02410 (2016)
  17. 17.
    Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: Advances in Neural Information Processing Systems, pp. 3320–3328 (2014)Google Scholar
  18. 18.
    Ruder, S.: An Overview of gradient descent optimization algorithms. arXiv:1609.04747 (2016)
  19. 19.
    Felbo, B., Mislove, A., Søgaard, A., Rahwan, I., Lehmann, S.: Using millions of Emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. arXiv:1708.00524 (2017)
  20. 20.
    Ba, J.L., Kiros, J.R., Hinton, G.E.: Layer normalization. arXiv:1607.06450 (2016)
  21. 21.
    Srivastava, R.K., Greff, K., Schmidhuber, J.: Training very deep networks. In: Advances in Neural Information Processing Systems, pp. 2377–2385 (2015)Google Scholar
  22. 22.
    Lafferty, J., McCallum, A., Pereira, F.C.: Conditional random fields: probabilistic models for segmenting and labeling sequence data (2001)Google Scholar
  23. 23.
    Trofimov, I.V.: Person name recognition in news articles based on the per-sons1000/1111-F collections. In: 16th All-Russian Scientific Conference Digital Libraries: Advanced Methods and Technologies, Digital Collections, RCDL 2014, pp. 217–221 (2014)Google Scholar
  24. 24.
    Gareev, R., Tkachenko, M., Solovyev, V., Simanovsky, A., Ivanov, V.: Introducing baselines for russian named entity recognition. In: Gelbukh, A. (ed.) CICLing 2013 Part I. LNCS, vol. 7816, pp. 329–342. Springer, Heidelberg (2013). Scholar
  25. 25.
    Mozharova, V., Loukachevitch, N.: Two-stage approach in Russian named entity recognition. In: Proceeding of IEEE International FRUCT Conference on Intelligence, Social Media and Web (ISMW FRUCT 2016), pp. 1–6 (2016)Google Scholar
  26. 26.
    Ivanitskiy, R., Shipilo, A., Kovriguina, L.: Russian named entities recognition and classification using distributed word and phrase representations. In: SIMBig, pp. 150–156 (2016)Google Scholar
  27. 27.
    Sysoev, A.A., Andrianov, I.A.: Named entity recognition in Russian: the power of wiki-based approach. In: Dialog Conference (2016, in Russian)Google Scholar
  28. 28.
    Malykh, V., Ozerin, A.: Reproducing Russian NER baseline quality without additional data. In: CDUD@ CLA, pp. 54–59 (2016)Google Scholar
  29. 29.
    Rubaylo, A.V., Kosenko, M.Y.: Software utilities for natural language information retrievial. Alm. Mod. Sci. Educ. 12(114), 87–92 (2016)Google Scholar
  30. 30.
    Huang, Z., Xu, W., Yu, K.: Bidirectional LSTM-CRF models for sequence tagging. arXiv:1508.01991 (2015)
  31. 31.
    Tutubalina, E., Nikolenko, S.: Combination of deep recurrent neural networks and conditional random fields for extracting adverse drug reactions from user reviews. J Healthc. Eng. 2017 (2017)CrossRefGoogle Scholar
  32. 32.
    Anh, L.T., Arkhipov, M.Y., Burtsev. M.S.: Application of a hybrid Bi-LSTM-CRF model to the task of russian named entity recognition. arXiv:1709.09686 (2017)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. 1.ITMO UniversitySaint PetersburgRussia
  2. 2.Kurchatov InstituteMoscowRussia

Personalised recommendations