Abstract
In this paper we present and evaluate a Brill morphosyntactic transformation-based tagger adapted to the specifics of highly inflectional languages. Multi-phase tagging with grammatical category matching transformations and lexical transformations brings significant accuracy improvements compared to previous work. Evaluation shows an accuracy of 92.44% for Polish, which is higher than the same metric for the other known taggers of Polish: a stochastic trigram tagger (90.59%) and the hybrid tagger TaKIPI (91.06%), which combines a decision tree classifier with automatically extracted rules and was used for tagging the IPI PAN Corpus of Polish.
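As a rough illustration of what a transformation-based (Brill-style) tagger does at tagging time, the sketch below applies an ordered list of contextual rules to an initial tag assignment. It is not the implementation described in the paper: the `Rule` class, the `apply_rule` and `tag` helpers, and the English tags used in the example are all hypothetical and chosen only for illustration. The multi-phase scheme, the grammatical category matching transformations, and the lexical transformations mentioned in the abstract are not modelled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule template: change from_tag to to_tag
    when the token immediately to the left carries prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    """Apply one rule left to right and return a new tag list.

    The left neighbour is read from the list being updated, so a change
    made earlier in the same pass is visible to later positions.
    """
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == rule.from_tag and out[i - 1] == rule.prev_tag:
            out[i] = rule.to_tag
    return out


def tag(initial_tags: List[str], rules: List[Rule]) -> List[str]:
    """Apply the rules in the given order, starting from an initial tagging
    (for example, the most frequent tag of each word)."""
    tags = list(initial_tags)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


# Illustrative input: a noun reading after "to" is retagged as a verb.
initial = ["PRP", "VBZ", "TO", "NN"]
rules = [Rule(from_tag="NN", to_tag="VB", prev_tag="TO")]
result = tag(initial, rules)
```

In a trained tagger the rule list would be learned from an annotated corpus by repeatedly picking the transformation that removes the most errors; the sketch only covers applying rules that are already given.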
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Acedański, S. (2010). A Morphosyntactic Brill Tagger for Inflectional Languages. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds) Advances in Natural Language Processing. NLP 2010. Lecture Notes in Computer Science, vol 6233. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14770-8_3
DOI: https://doi.org/10.1007/978-3-642-14770-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14769-2
Online ISBN: 978-3-642-14770-8
eBook Packages: Computer Science (R0)