A Tiered CRF Tagger for Polish

Radziszewski, Adam

doi:10.1007/978-3-642-35647-6_16

Adam Radziszewski

Part of the book series: Studies in Computational Intelligence (SCI, volume 467)

Abstract

In this paper we present a new approach to morphosyntactic tagging of Polish that combines Conditional Random Fields with tiered tagging. The approach also makes it possible to exploit a rich set of morphological features, which rely on an external morphological analyser. The proposed algorithm is implemented as a tagger for Polish. Evaluation of the tagger shows a significant improvement in tagging accuracy over two state-of-the-art taggers, namely PANTERA and WMBT.
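
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what tiered disambiguation can look like, under stated assumptions. It assumes each token arrives with a set of candidate tags from a morphological analyser, that tags are represented as attribute dictionaries, and that attributes are resolved in a fixed order (here: class, number, case). The names `tiered_disambiguate`, `toy_choose` and `PREFERENCE` are hypothetical, and `toy_choose` is a fixed-preference stand-in for a trained per-tier sequence model such as a CRF; it is not the tagger described in the chapter.

```python
from typing import Callable, Dict, List, Optional

Tag = Dict[str, str]  # e.g. {"class": "subst", "number": "sg", "case": "acc"}
Chooser = Callable[[str, List[str], List[List[str]]], List[Optional[str]]]


def tiered_disambiguate(
    tokens: List[str],
    candidates: List[List[Tag]],
    tiers: List[str],
    choose: Chooser,
) -> List[List[Tag]]:
    """Narrow each token's candidate tags one attribute (tier) at a time."""
    remaining = [list(tags) for tags in candidates]
    for tier in tiers:
        # Values of this attribute still possible for each token.
        options = [
            sorted({tag[tier] for tag in tags if tier in tag})
            for tags in remaining
        ]
        # A per-tier sequence model would be consulted here (assumption).
        chosen = choose(tier, tokens, options)
        for i, value in enumerate(chosen):
            if value is None:  # attribute does not apply to this token
                continue
            remaining[i] = [t for t in remaining[i] if t.get(tier) == value]
    return remaining


# Hypothetical fixed preferences standing in for learned model scores.
PREFERENCE = {
    "class": ["fin", "subst"],
    "number": ["sg", "pl"],
    "case": ["acc", "gen", "nom"],
}


def toy_choose(
    tier: str, tokens: List[str], options: List[List[str]]
) -> List[Optional[str]]:
    """Pick, per token, the most preferred value among those available."""
    picks: List[Optional[str]] = []
    for opts in options:
        ranked = [v for v in PREFERENCE[tier] if v in opts]
        picks.append(ranked[0] if ranked else (opts[0] if opts else None))
    return picks


tokens = ["mam", "kota"]
candidates = [
    [{"class": "fin", "number": "sg"},
     {"class": "subst", "number": "pl", "case": "gen"}],
    [{"class": "subst", "number": "sg", "case": "gen"},
     {"class": "subst", "number": "sg", "case": "acc"}],
]
result = tiered_disambiguate(
    tokens, candidates, ["class", "number", "case"], toy_choose
)
print(result)
```

Each tier only ever chooses among values that survive the earlier tiers, so the final tag is always one the analyser proposed. In a real system the chooser for each tier would be a trained model with access to the surrounding context.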

Author information

Authors and Affiliations

Institute of Informatics, Wrocław University of Technology, Wrocław, Poland
Adam Radziszewski

Editor information

Editors and Affiliations

Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warszawa, 00-665, Poland
Robert Bembenik
Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warszawa, 00-665, Poland
Lukasz Skonieczny
Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warszawa, 00-665, Poland
Henryk Rybinski
Institute of Computer Science, Warsaw University of Technology, ul. Nowowiejska 15/19, Warszawa, 00-665, Poland
Marzena Kryszkiewicz
Interdisciplinary Centre for, University of Warsaw, Pawińskiego 5a bl. D, Warsaw, 02-106, Poland
Marek Niezgodka

About this chapter

Cite this chapter

Radziszewski, A. (2013). A Tiered CRF Tagger for Polish. In: Bembenik, R., Skonieczny, L., Rybinski, H., Kryszkiewicz, M., Niezgodka, M. (eds) Intelligent Tools for Building a Scientific Information Platform. Studies in Computational Intelligence, vol 467. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35647-6_16

DOI: https://doi.org/10.1007/978-3-642-35647-6_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35646-9
Online ISBN: 978-3-642-35647-6
eBook Packages: Engineering, Engineering (R0)
