Verbal Morphosyntactic Disambiguation through Topological Field Recognition in German-Language Law Texts

  • Kyoko Sugisaki
  • Stefan Höfler
Part of the Communications in Computer and Information Science book series (CCIS, volume 380)

Abstract

The morphosyntactic disambiguation of verbs is a crucial pre-processing step for the syntactic analysis of morphologically rich languages like German and domains with complex clause structures like law texts. This paper explores how much linguistically motivated rules can contribute to the task. It introduces an incremental system of verbal morphosyntactic disambiguation that exploits the concept of topological fields. The system presented is capable of reducing the rate of POS-tagging mistakes from 10.2% to 1.6%. The evaluation shows that this reduction is mostly gained through checking the compatibility of morphosyntactic features within the long-distance syntactic relationships of discontinuous verbal elements. Furthermore, the present study shows that in law texts, the average distance between the left and right bracket of clauses is relatively large (9.5 tokens), and that in this domain, a wide context window is therefore necessary for the morphosyntactic disambiguation of verbs.
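
The compatibility check between discontinuous verbal elements described in the abstract can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' system: the function name `disambiguate_right_bracket`, the rule table `LICENSED_BY_LEFT_BRACKET`, the token format, and the way the bracket distance is counted are all hypothetical. Only the STTS-style tag names (VAFIN, VMFIN, VVPP, VVINF, VVFIN) follow an existing tagset. The idea shown is that the finite verb in the left bracket restricts which readings of an ambiguous verb form in the right bracket remain possible.

```python
# Toy illustration only: the rule table and all names are assumptions,
# not the rules used in the paper.
# STTS-style tags: VAFIN = finite auxiliary, VMFIN = finite modal,
# VVPP = past participle, VVINF = infinitive, VVFIN = finite full verb.

# Readings of the right-bracket verb that each left-bracket lemma licenses.
LICENSED_BY_LEFT_BRACKET = {
    "haben": {"VVPP"},             # perfect: "hat ... erlassen"
    "können": {"VVINF"},           # modal: "kann ... erlassen"
    "werden": {"VVPP", "VVINF"},   # passive or future: remains ambiguous
}


def disambiguate_right_bracket(tokens):
    """Filter the candidate tags of the right-bracket verb.

    tokens: list of (form, lemma, candidate_tags) with candidate_tags a set.
    Returns (new_tokens, distance), where distance is the difference in token
    positions between the two brackets, or None if no pair was found.
    """
    # Find the first finite auxiliary or modal that the rule table knows.
    left = None
    for i, (_, lemma, tags) in enumerate(tokens):
        if tags & {"VAFIN", "VMFIN"} and lemma in LICENSED_BY_LEFT_BRACKET:
            left = i
            break
    if left is None:
        return tokens, None

    licensed = LICENSED_BY_LEFT_BRACKET[tokens[left][1]]

    # Scan from the end of the clause for a token with a licensed reading.
    for j in range(len(tokens) - 1, left, -1):
        form, lemma, tags = tokens[j]
        kept = tags & licensed
        if kept:
            result = list(tokens)
            result[j] = (form, lemma, kept)  # drop incompatible readings
            return result, j - left
    return tokens, None


sentence = [
    ("Der", "der", {"ART"}),
    ("Bundesrat", "Bundesrat", {"NN"}),
    ("hat", "haben", {"VAFIN"}),
    ("die", "die", {"ART"}),
    ("Verordnung", "Verordnung", {"NN"}),
    ("erlassen", "erlassen", {"VVINF", "VVPP", "VVFIN"}),
    (".", ".", {"$."}),
]

resolved, distance = disambiguate_right_bracket(sentence)
print(resolved[5], distance)
```

In this example the three-way ambiguous form "erlassen" is reduced to its participle reading because the left bracket contains a form of "haben". The two brackets are only three token positions apart here; with the average distance of 9.5 tokens reported for law texts, a check of this kind has to look well beyond the immediate neighbours of the verb.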

Keywords

Morphosyntactic disambiguation, topological field model, Constraint Grammar, law texts, German verbs, POS-tagging

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Kyoko Sugisaki
    • 1
  • Stefan Höfler
    • 1
  1. Institute of Computational Linguistics, University of Zurich, Zürich, Switzerland
