POS Tagging and Less Resources Languages Individuated Features in CorpusWiki

  • Maarten Janssen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9561)

Abstract

CorpusWiki (http://www.corpuswiki.org) is an online tool for building POS tagged corpora in (almost) any language. The system is primarily aimed at those languages for which no corpus data exist, and for which it would be very difficult to create tagged data by traditional means. This article describes how CorpusWiki uses individuated morphosyntactic features to combine the flexibility required in annotating less-described languages with the requirements of a POS tagger.

Keywords

POS tagging, Less-resourced languages, Morphosyntax


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Centro de Linguística, Universidade de Lisboa, Lisbon, Portugal
