A Morphosyntactic Brill Tagger for Inflectional Languages

  • Szymon Acedański
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6233)

Abstract

In this paper we present and evaluate a morphosyntactic transformation-based tagger in the style of Brill, adapted to the specifics of highly inflectional languages. Multi-phase tagging, combined with transformations that match individual grammatical categories and with lexical transformations, brings significant accuracy improvements compared to previous work. Evaluation shows an accuracy of 92.44% for Polish, which is higher than that of the other known taggers of Polish: a stochastic trigram tagger (90.59%) and TaKIPI, a hybrid tagger employing a decision tree classifier and an automatically extracted rule-based tagger, which was used for tagging the IPI PAN Corpus of Polish (91.06%).
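The abstract mentions transformations that match individual grammatical categories rather than whole tags. The following is a minimal sketch of how such a rule could fit into Brill-style transformation-based learning, assuming a tag is represented as a mapping from category names (for example "case") to values. The names `CategoryRule`, `apply`, `errors` and `learn`, the rule template and the toy data are all hypothetical illustrations, not the paper's implementation; multi-phase tagging and lexical transformations are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical tag representation: grammatical category -> value,
# e.g. {"pos": "adj", "case": "nom"}. Not the paper's data structure.
Tag = Dict[str, str]


@dataclass(frozen=True)
class CategoryRule:
    """Hypothetical rule template: change `category` from `from_value` to
    `to_value` on the current token when the token at relative position
    `offset` has `trigger_category` equal to `trigger_value`."""
    category: str
    from_value: str
    to_value: str
    offset: int
    trigger_category: str
    trigger_value: str

    def apply(self, tags: List[Tag]) -> List[Tag]:
        # Conditions are tested on the input tags, so all matches change at
        # once (a simplification; Brill applies rules left to right).
        out = [dict(t) for t in tags]
        for i, tag in enumerate(tags):
            j = i + self.offset
            if (0 <= j < len(tags)
                    and tag.get(self.category) == self.from_value
                    and tags[j].get(self.trigger_category) == self.trigger_value):
                out[i][self.category] = self.to_value
        return out


def errors(tags: List[Tag], gold: List[Tag]) -> int:
    # Count tokens whose complete tag differs from the gold-standard tag.
    return sum(t != g for t, g in zip(tags, gold))


def learn(tags: List[Tag], gold: List[Tag],
          candidates: List[CategoryRule], max_rules: int = 10) -> List[CategoryRule]:
    """Greedy transformation-based learning: at each step keep the candidate
    rule with the largest net error reduction; stop when no rule helps."""
    learned: List[CategoryRule] = []
    for _ in range(max_rules):
        base = errors(tags, gold)
        best: Optional[CategoryRule] = None
        best_gain = 0
        for rule in candidates:
            gain = base - errors(rule.apply(tags), gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:
            break
        learned.append(best)
        tags = best.apply(tags)
    return learned


if __name__ == "__main__":
    # Toy adjective-noun pair whose adjective case was mis-tagged initially.
    initial = [{"pos": "adj", "case": "acc"}, {"pos": "subst", "case": "nom"}]
    gold = [{"pos": "adj", "case": "nom"}, {"pos": "subst", "case": "nom"}]
    candidates = [
        CategoryRule("case", "acc", "nom", 1, "case", "nom"),
        CategoryRule("case", "nom", "acc", -1, "case", "acc"),
    ]
    print(learn(initial, gold, candidates))  # only the first candidate is kept
```

Because each rule rewrites a single category value, one rule can fix, say, case disagreement regardless of the token's other categories, which a whole-tag rule could only do with many separate instances. A real trainer would generate candidate rules from templates at the positions where errors occur instead of taking a fixed list.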

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Szymon Acedański (1, 2)
  1. Institute of Informatics, University of Warsaw, Warszawa, Poland
  2. Institute of Computer Science, Polish Academy of Sciences, Warszawa, Poland