Morphosyntactic Analyzer for the Tibetan Language: Aspects of Structural Ambiguity

  • Alexei Dobrov
  • Anastasia Dobrova
  • Pavel Grokhovskiy (corresponding author)
  • Nikolay Soms
  • Victor Zakharov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9924)

Abstract

The paper describes the development of a morphosyntactic analyzer for the Tibetan language. The aim is a consistent formal grammatical description (formal grammar) of Tibetan that covers all grammatical levels of the language system, from morphosyntax (the syntactics of morphemes) to the syntax of composite sentences and supra-phrasal entities. Syntactic annotation was created on the basis of morphologically tagged corpora of Tibetan texts. A distinctive feature of the annotation is that it combines immediate constituent structure with dependency structure. A separate (basic) grammar module was created that defines the Tibetan grammatical categories, their possible values, and the restrictions on their combination. Token types and their grammatical features form the basis of the formal grammar under development, allowing the linguistic processor to build syntactic trees of various kinds. Methods for avoiding redundant structural ambiguity are proposed.
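One common way to combine immediate constituents with dependencies in a single annotation is to mark a head child in every constituent, so that dependency arcs can be read off the constituent tree. The Python sketch below illustrates that general idea only; it is not the authors' implementation. The class names, the feature dictionary, the placeholder tokens (not real Tibetan data), and the choice of which child heads each phrase are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Token:
    """A leaf: one token with its grammatical features (hypothetical names)."""
    form: str
    features: dict = field(default_factory=dict)


@dataclass
class Phrase:
    """An immediate-constituent node; `head` is the index of the head child."""
    label: str
    children: List[Union["Phrase", Token]]
    head: int


def lexical_head(node: Union[Phrase, Token]) -> Token:
    """Follow head children down to the token that heads the node."""
    while isinstance(node, Phrase):
        node = node.children[node.head]
    return node


def dependencies(node: Union[Phrase, Token]) -> List[Tuple[str, str]]:
    """Return (head_form, dependent_form) arcs implied by the head marking."""
    arcs: List[Tuple[str, str]] = []
    if isinstance(node, Phrase):
        head_token = lexical_head(node)
        for i, child in enumerate(node.children):
            if i != node.head:
                # Each non-head child depends on the lexical head of its parent.
                arcs.append((head_token.form, lexical_head(child).form))
            arcs.extend(dependencies(child))
    return arcs


# Placeholder tokens; the head choices below are arbitrary assumptions.
noun = Token("NOUN", {"pos": "noun"})
case = Token("CASE", {"pos": "case_marker"})
verb = Token("VERB", {"pos": "verb"})

noun_phrase = Phrase("NP", [noun, case], head=0)
clause = Phrase("S", [noun_phrase, verb], head=1)

arcs = dependencies(clause)
print(arcs)
```

Storing one head index per constituent keeps a single tree as the source of truth: the bracketing gives the immediate constituents, and the dependency view is derived from it rather than annotated separately.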

Keywords

Corpus linguistics; Tibetan language; Morphosyntactic analyzer; Tokenization; Immediate constituents; Dependency grammar; Natural language processing

Notes

Acknowledgment

The development of the morphosyntactic analyzer for the Tibetan language is supported by grant No. 16-06-00578, “Morphosyntactic Analyser of Texts in the Tibetan language”, from the Russian Foundation for Basic Research. The authors also thank Saint-Petersburg State University for research grant 2.38.293.2014, “Modernizing the Tibetan Literary Tradition”, which enabled the conceptual study of the original Tibetan texts.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Alexei Dobrov (1)
  • Anastasia Dobrova (2)
  • Pavel Grokhovskiy (1), corresponding author
  • Nikolay Soms (2)
  • Victor Zakharov (1)

  1. Saint-Petersburg State University, Saint-Petersburg, Russia
  2. LLC “AIIRE”, Saint-Petersburg, Russia
