
A Morphosyntactic Brill Tagger for Inflectional Languages

  • Conference paper
Advances in Natural Language Processing (NLP 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6233)


Abstract

In this paper we present and evaluate a Brill morphosyntactic transformation-based tagger adapted to the specifics of highly inflectional languages. Multi-phase tagging with grammatical category matching transformations and lexical transformations brings significant accuracy improvements compared to previous work. Evaluation shows an accuracy of 92.44% for Polish, which is higher than the same metric for the other known taggers of Polish: a stochastic trigram tagger (90.59%) and TaKIPI (91.06%), a hybrid tagger that combines a decision tree classifier with an automatically extracted rule-based tagger and was used for tagging the IPI PAN Corpus of Polish.
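As a rough illustration of the general transformation-based (Brill-style) idea the abstract builds on, the following minimal sketch assigns each word an initial tag from a lexicon and then applies an ordered list of context-dependent rewrite rules. It is not the tagger described in the paper: the `Rule` class, the `apply_rule` and `tag` functions, the toy English tags, and the single "previous tag" rule template are all assumptions made for illustration, and the multi-phase scheme, grammatical category matching, and lexical transformations mentioned above are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule template: rewrite from_tag to to_tag
    when the preceding token carries prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    """Apply one transformation at every position.

    Conditions are checked against the tags as they were before
    this rule ran, so a rule never triggers on its own output.
    """
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
            out[i] = rule.to_tag
    return out


def tag(words: List[str], lexicon: Dict[str, str],
        rules: List[Rule], default: str = "NN") -> List[str]:
    """Initial lexicon lookup followed by ordered rule application."""
    tags = [lexicon.get(w.lower(), default) for w in words]
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


# Toy data (assumed, not from the paper): "race" defaults to a noun,
# and one rule retags it as a verb after the infinitive marker "to".
LEXICON = {"to": "TO", "the": "DT", "race": "NN"}
RULES = [Rule(from_tag="NN", to_tag="VB", prev_tag="TO")]

print(tag(["to", "race"], LEXICON, RULES))   # ['TO', 'VB']
print(tag(["the", "race"], LEXICON, RULES))  # ['DT', 'NN']
```

In a trained tagger of this family, the rule list is learned from an annotated corpus by repeatedly picking the transformation that removes the most errors; the sketch covers only the application step.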




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Acedański, S. (2010). A Morphosyntactic Brill Tagger for Inflectional Languages. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds) Advances in Natural Language Processing. NLP 2010. Lecture Notes in Computer Science, vol 6233. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14770-8_3


  • DOI: https://doi.org/10.1007/978-3-642-14770-8_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14769-2

  • Online ISBN: 978-3-642-14770-8

  • eBook Packages: Computer Science, Computer Science (R0)
