Context-Based Rules for Grammatical Disambiguation in the Tatar Language

  • Ramil Gataullin
  • Bulat Khakimov
  • Dzhavdet Suleymanov
  • Rinat Gilmullin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10449)


This paper addresses the problem of grammatical ambiguity in the Tatar National Corpus and describes the methodology and software used to automate disambiguation. Grammatical ambiguity is widespread in agglutinative languages such as the Turkic and Finno-Ugric languages. Disambiguation in the corpus is based on a context-oriented classification of ambiguity types, carried out on Tatar corpus data for the first time. In this study the corpus serves both as the source of research data and as the target in which the results are implemented. Grammatical ambiguity types are detected automatically with a finite-state morphological analyzer and then classified. To build a grammatically disambiguated subcorpus, a dedicated software module was developed. It searches for ambiguous tokens in the corpus, collects statistical information, and supports creating and applying formal context-based disambiguation rules for different ambiguity types.
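The workflow described above (find ambiguous tokens, count ambiguity types, apply context-based rules) can be illustrated with a minimal Python sketch. It is not the authors' module: the `Token` and `Rule` classes, the function names, the tag labels `N`, `V` and `PUNCT`, the placeholder word forms, and the sample rule are all assumptions made for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Token:
    form: str
    analyses: List[str]  # candidate tags, as a morphological analyzer might return them

    @property
    def ambiguous(self) -> bool:
        return len(self.analyses) > 1

    @property
    def ambiguity_type(self) -> str:
        # Hypothetical convention: an ambiguity type is the sorted set of competing tags.
        return "|".join(sorted(self.analyses))


@dataclass
class Rule:
    ambiguity_type: str  # which ambiguity type the rule targets
    condition: Callable[[Optional[Token], Optional[Token]], bool]  # test on (left, right) neighbours
    select: str  # analysis to keep when the condition holds


def ambiguity_stats(tokens: List[Token]) -> Counter:
    """Count how often each ambiguity type occurs."""
    return Counter(t.ambiguity_type for t in tokens if t.ambiguous)


def apply_rules(tokens: List[Token], rules: List[Rule]) -> List[Token]:
    """Return a copy of the tokens with the first matching rule applied to each ambiguous one."""
    result = []
    for i, tok in enumerate(tokens):
        left = tokens[i - 1] if i > 0 else None
        right = tokens[i + 1] if i + 1 < len(tokens) else None
        chosen = list(tok.analyses)
        if tok.ambiguous:
            for rule in rules:
                if (rule.ambiguity_type == tok.ambiguity_type
                        and rule.select in tok.analyses
                        and rule.condition(left, right)):
                    chosen = [rule.select]
                    break
        result.append(Token(tok.form, chosen))
    return result


# Placeholder word forms and tags; not real Tatar data.
sentence = [
    Token("w1", ["N"]),
    Token("w2", ["N", "V"]),
    Token("w3", ["PUNCT"]),
]

# Illustrative rule only: prefer the verb reading directly before punctuation.
verb_before_punct = Rule(
    ambiguity_type="N|V",
    condition=lambda left, right: right is not None and right.analyses == ["PUNCT"],
    select="V",
)

stats = ambiguity_stats(sentence)
resolved = apply_rules(sentence, [verb_before_punct])
```

Keying each rule to an ambiguity type keeps the rule set aligned with a classification of types, and the frequency counts suggest which types would be worth writing rules for first.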


Keywords: Disambiguation; Grammatical homonymy; Context-based rules; Linguistic software; Turkic languages; Corpus linguistics



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Institute of Applied Semiotics, TAS, Kazan, Russia
  2. Kazan (Volga Region) Federal University, Kazan, Russia
