International Conference on Text, Speech and Dialogue

TSD 2012: Text, Speech and Dialogue pp 174-182

Towards a Constraint Grammar Based Morphological Tagger for Croatian

  • Hrvoje Peradin
  • Jan Šnajder
Conference paper

DOI: 10.1007/978-3-642-32790-2_21

Volume 7499 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Peradin H., Šnajder J. (2012) Towards a Constraint Grammar Based Morphological Tagger for Croatian. In: Sojka P., Horák A., Kopeček I., Pala K. (eds) Text, Speech and Dialogue. TSD 2012. Lecture Notes in Computer Science, vol 7499. Springer, Berlin, Heidelberg

Abstract

A Constraint Grammar (CG) uses context-dependent hand-crafted rules to disambiguate the possible grammatical readings of words in running text. In this paper we describe the development of a CG-based morphological tagger for the Croatian language. Our CG tagger uses a morphological analyzer based on an automatically acquired inflectional lexicon and an elaborate tagset based on MULTEXT-East and the Croatian Verb Valence Lexicon. Currently, our grammar has 290 rules, organized into cleanup and mapping rules, disambiguation rules, and heuristic rules. The grammar is implemented in the CG3 formalism and compiled with the vislcg3 open-source compiler. The preliminary tagging performance is P: 96.1%, R: 99.8% for POS tagging and P: 88.2%, R: 98.1% for complete morphosyntactic tagging.
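
As a rough illustration of the idea described in the abstract, the following Python sketch applies one REMOVE-style contextual rule to a two-word example and then scores the result. Everything in it is an assumption made for illustration: the tag names (`PREP`, `N.dat`, `N.loc`), the helper names `remove_after` and `precision_recall`, and the rule itself are hypothetical and deliberately simplified, and they are not taken from the paper's grammar or tagset. The scoring follows one common convention for taggers that may leave ambiguity (precision over retained readings, recall over tokens); the paper may define its measures differently.

```python
from typing import List, Set, Tuple

# A cohort pairs a word form with its set of candidate readings (tags).
Cohort = Tuple[str, Set[str]]


def remove_after(cohorts: List[Cohort], target: str, prev_tag: str) -> List[Cohort]:
    """Toy REMOVE-style rule: drop `target` from a cohort when the
    preceding cohort has been reduced to exactly `prev_tag`.
    The last remaining reading of a cohort is never removed."""
    out: List[Cohort] = []
    for form, readings in cohorts:
        kept = set(readings)
        if out:
            prev_readings = out[-1][1]
            # Context condition: the previous word is unambiguously `prev_tag`.
            if prev_readings == {prev_tag} and target in kept and len(kept) > 1:
                kept.discard(target)
        out.append((form, kept))
    return out


def precision_recall(cohorts: List[Cohort], gold: List[str]) -> Tuple[float, float]:
    """Precision = correct readings retained / all readings retained.
    Recall = tokens whose correct reading survived / all tokens."""
    retained = sum(len(readings) for _, readings in cohorts)
    correct = sum(1 for (_, readings), g in zip(cohorts, gold) if g in readings)
    return correct / retained, correct / len(gold)


# Hypothetical analyzer output for "u kući" ("in the house"):
# the noun form is ambiguous between dative and locative.
sentence: List[Cohort] = [("u", {"PREP"}), ("kući", {"N.dat", "N.loc"})]
gold = ["PREP", "N.loc"]

before = precision_recall(sentence, gold)
disambiguated = remove_after(sentence, target="N.dat", prev_tag="PREP")
after = precision_recall(disambiguated, gold)

print("before:", before)
print("after: ", after)
```

In this toy setting the rule removes one incorrect reading without discarding the correct one, so precision rises while recall stays the same. That is the general trade-off behind figures such as a high recall paired with a lower precision: a cautious grammar keeps the correct reading almost always, at the cost of leaving some ambiguity unresolved.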

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Hrvoje Peradin (1)
  • Jan Šnajder (2)
  1. Faculty of Science, Department of Mathematics, University of Zagreb, Zagreb, Croatia
  2. Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia