Tagging a Morphologically Complex Language Using Heuristics

  • Hrafn Loftsson
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4139)

Abstract

We describe and evaluate heuristics, a collection of algorithmic procedures, which have been developed as a part of a linguistic rule-based tagger, IceTagger, for POS tagging Icelandic text. The purpose of the heuristics is to mark grammatical functions and prepositional phrases, and use this information to force feature agreement where appropriate. The heuristics are run after the application of local rules, i.e. rules which perform initial disambiguation based on a local context. Evaluation shows that the accuracy of two of the heuristics, which guess subjects and objects of verbs, is relatively high when compared to the results of parsing-based systems. Similar heuristics could be used for POS tagging texts in other morphologically complex languages.
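
The idea of guessing a grammatical function and then forcing feature agreement can be illustrated with a minimal Python sketch. It is not IceTagger's implementation: the `Tag` and `Token` types, the reduced feature set (word class, case, number), the rule "nearest nominative-capable noun to the left of the first verb is the subject", and the hand-written candidate tags for the example words are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Tag:
    """A simplified, hypothetical tag: word class plus two features."""
    pos: str                      # "noun" or "verb" in this sketch
    case: Optional[str] = None    # "nom", "acc", ... (nouns only)
    number: Optional[str] = None  # "sg" or "pl"


@dataclass
class Token:
    word: str
    candidates: List[Tag]         # tags assumed to remain after local rules
    function: Optional[str] = None


def restrict(token: Token, keep: Callable[[Tag], bool]) -> None:
    """Drop candidates failing `keep`, but never remove the last reading."""
    filtered = [tag for tag in token.candidates if keep(tag)]
    if filtered:
        token.candidates = filtered


def guess_subject(tokens: List[Token]) -> Optional[Tuple[int, int]]:
    """Mark the nearest nominative-capable noun left of the first verb as SUBJ."""
    verb_idx = next(
        (i for i, tok in enumerate(tokens)
         if all(tag.pos == "verb" for tag in tok.candidates)),
        None,
    )
    if verb_idx is None:
        return None
    for i in range(verb_idx - 1, -1, -1):
        if any(t.pos == "noun" and t.case == "nom" for t in tokens[i].candidates):
            tokens[i].function = "SUBJ"
            return i, verb_idx
    return None


def force_agreement(tokens: List[Token]) -> None:
    """Use the guessed subject-verb pair to prune disagreeing readings."""
    pair = guess_subject(tokens)
    if pair is None:
        return
    subj, verb = tokens[pair[0]], tokens[pair[1]]
    # Assumption: a subject is nominative.
    restrict(subj, lambda t: t.pos == "noun" and t.case == "nom")
    # Subject and verb must share a number value, in both directions.
    subj_numbers = {t.number for t in subj.candidates}
    restrict(verb, lambda t: t.number in subj_numbers)
    verb_numbers = {t.number for t in verb.candidates}
    restrict(subj, lambda t: t.number in verb_numbers)


# Hand-written candidate tags (not taken from any lexicon): the noun form
# is ambiguous in case and number, the verb form is treated as plural only.
sentence = [
    Token("hús", [Tag("noun", "nom", "sg"), Tag("noun", "acc", "sg"),
                  Tag("noun", "nom", "pl"), Tag("noun", "acc", "pl")]),
    Token("standa", [Tag("verb", number="pl")]),
]
force_agreement(sentence)
print([(t.pos, t.case, t.number) for t in sentence[0].candidates])
# → [('noun', 'nom', 'pl')]
```

In this toy case the noun starts with four readings; marking it as a subject removes the accusative ones, and agreement with the plural verb removes the singular one. The `restrict` helper refuses to delete the last remaining reading, a conservative design choice assumed here so that a wrong guess cannot leave a token without a tag.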

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Hrafn Loftsson, Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
