Annotated Clause Boundaries’ Influence on Parsing Results

  • Dage Särg
  • Kadri Muischnek
  • Kaili Müürisep
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107)

Abstract

The aim of the paper is to study the effect of pre-annotated clause boundaries on dependency parsing of Estonian new media texts. Our hypothesis is that correctly identified clause boundaries improve parsing: once the text is split into smaller, syntactically meaningful units, it should be easier for the parser to determine the syntactic structure of each unit. To test the hypothesis, we performed two experiments on a 14,000-word corpus of Estonian web texts whose morphological analysis had been manually validated. In the first experiment, the corpus with gold standard morphological tags was parsed with MaltParser both with and without the manually annotated clause boundaries. In the second experiment, only the segmentation of the text was preserved, and the morphological analysis was done automatically before parsing. The experiments confirmed our hypothesis about the influence of correct clause boundaries by a small margin: in both experiments, the labeled attachment score (LAS) improved by 0.6%.
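The LAS (labeled attachment score) reported above is conventionally the share of tokens whose predicted head and dependency label both match the gold annotation. The following is a minimal sketch of that computation, not the authors' evaluation script: the function name `attachment_scores`, the `(head, label)` tuple layout, and the toy labels are assumptions made for illustration only.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
# The labels used below are illustrative, not the Estonian treebank's label set.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for two token-aligned analyses of the same text."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty analysis")

    # UAS counts tokens whose head is correct, regardless of label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS counts tokens whose head AND label are both correct.
    full_hits = sum(g == p for g, p in zip(gold, predicted))

    n = len(gold)
    return head_hits / n, full_hits / n


# Hypothetical four-token sentence: one wrong label, one wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "punct")]
uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

Under this definition, comparing parses produced with and without clause boundary annotation amounts to calling the function once per parser output against the same gold standard and taking the difference of the two LAS values.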

Keywords

Dependency parsing; Clause boundaries; New media language; Estonian

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Dage Särg (1, 2)
  • Kadri Muischnek (1, 2)
  • Kaili Müürisep (1)

  1. Institute of Computer Science, University of Tartu, Tartu, Estonia
  2. Institute of Estonian and General Linguistics, University of Tartu, Tartu, Estonia
