Combining Knowledge and CRF-Based Approach to Named Entity Recognition in Russian

  • V. A. Mozharova
  • N. V. Loukachevitch
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 661)

Abstract

Current machine-learning approaches to information extraction often include features based on large volumes of knowledge in the form of gazetteers, word clusters, etc. In this paper we consider a CRF-based approach to Russian named entity recognition that draws on multiple lexicons. We test our system on the open Russian collections “Persons-1000” and “Persons-1111”, which are labeled with personal names. We additionally annotated the “Persons-1000” collection with names of organizations, media, locations, and geo-political entities, and we present the results of our experiments for one type of name (Persons) for comparison purposes, for three types (Persons, Organizations, and Locations), and for five types of names. We also compare two labeling schemes for Russian: the IO-scheme and the BIO-scheme.
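
The difference between the two labeling schemes mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `label_tokens`, the `(start, end, type)` span format, the `I-`/`B-` label spelling, and the example sentence are all assumptions made for illustration. In the IO-scheme every token inside an entity receives the same inside label, while the BIO-scheme marks the first token of each entity separately.

```python
from typing import List, Tuple

# Hypothetical span format: (start index, end index exclusive, entity type).
Span = Tuple[int, int, str]

def label_tokens(n_tokens: int, spans: List[Span], scheme: str = "BIO") -> List[str]:
    """Convert entity spans to per-token labels under the IO or BIO scheme."""
    if scheme not in ("IO", "BIO"):
        raise ValueError("scheme must be 'IO' or 'BIO'")
    labels = ["O"] * n_tokens  # tokens outside any entity
    for start, end, etype in spans:
        for i in range(start, end):
            # BIO marks the entity-initial token with B-; IO uses I- everywhere.
            prefix = "B" if (scheme == "BIO" and i == start) else "I"
            labels[i] = f"{prefix}-{etype}"
    return labels

# Illustrative (made-up) sentence with two adjacent person names.
tokens = ["Anna", "Ivanova", "Petr", "Sidorov", "arrived"]
spans = [(0, 2, "PER"), (2, 4, "PER")]

bio = label_tokens(len(tokens), spans, "BIO")
io = label_tokens(len(tokens), spans, "IO")
```

With these inputs the BIO labels keep the boundary between the two adjacent names, whereas the IO labels merge them into one run of identical tags. That is the usual trade-off between the schemes: IO has fewer labels to learn, BIO can separate adjacent entities of the same type.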

Keywords

CRF, Named entity recognition

Notes

Acknowledgments

This work is partially supported by RFBR grant No. 15-07-09306.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Lomonosov Moscow State University, Moscow, Russia