Application of a Hybrid Bi-LSTM-CRF Model to the Task of Russian Named Entity Recognition

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 789)


Named Entity Recognition (NER) is one of the most common tasks in natural language processing. The purpose of NER is to find tokens in text documents and classify them into predefined categories called tags, such as person names, quantity expressions, percentage expressions, names of locations and organizations, as well as expressions of time, currency and others. Although a number of approaches have been proposed for this task in the Russian language, there is still substantial room for better solutions. In this work, we studied several deep neural network models, starting from a vanilla Bi-directional Long Short-Term Memory (Bi-LSTM) network, then supplementing it with Conditional Random Fields (CRF) as well as highway networks, and finally adding external word embeddings. All models were evaluated across three datasets: Gareev’s, Person-1000 and FactRuEval 2016. We found that extending the Bi-LSTM model with CRF significantly increased the quality of predictions. Encoding input tokens with external word embeddings reduced training time and allowed us to achieve state-of-the-art results for the Russian NER task.
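To illustrate why a CRF layer on top of per-token Bi-LSTM scores can change predictions, the following is a minimal sketch of Viterbi decoding in plain Python. It is not the authors' implementation: the function name `viterbi_decode`, the three-tag set (O, B-PER, I-PER) and all scores are assumptions made up for this example. The emission scores stand in for Bi-LSTM outputs, and the transition scores stand in for learned CRF parameters.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence.

    emissions[t][j]   : score of tag j at token t (stand-in for Bi-LSTM output)
    transitions[i][j] : score of moving from tag i to tag j (stand-in for CRF weights)
    """
    if not emissions:
        return []
    num_tags = len(transitions)
    # best[j] holds the score of the best path that ends in tag j at the current token.
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(num_tags):
            # Choose the previous tag that maximises the path score into tag j.
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag to recover the full path.
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Hypothetical tag indices: 0 = O, 1 = B-PER, 2 = I-PER.
emissions = [
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 1.5],
    [0.0, 0.0, 2.0],
]
# The only non-zero entry penalises the invalid transition O -> I-PER.
transitions = [
    [0.0, 0.0, -10.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

greedy = [max(range(3), key=lambda j: row[j]) for row in emissions]
print(greedy)                                  # [0, 2, 2]
print(viterbi_decode(emissions, transitions))  # [0, 1, 2]
```

In this toy case, choosing the best tag for each token independently yields O, I-PER, I-PER, which is not a valid tag sequence, while joint decoding with the transition scores yields O, B-PER, I-PER.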





Statement of author contributions. AL conducted the initial literature review, selected a baseline (Bi-LSTM + CRF) model, prepared the datasets and ran the experiments under the supervision of MB. AM implemented and studied extensions of the NeuroNER model. AL drafted the first version of the paper. AM added a review of works related to Russian NER and materials related to the NeuroNER modifications. MB, AL and AM edited and extended the manuscript.

This work was supported by the National Technology Initiative and PAO Sberbank, project ID 0000000007417F630002.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Neural Networks and Deep Learning Lab, Moscow Institute of Physics and Technology, Dolgoprudny, Russia
  2. Faculty of Information Technology, Vietnam Maritime University, Haiphong, Viet Nam
