Combining Polish Morphosyntactic Taggers

  • Conference paper
Security and Intelligent Information Systems (SIIS 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7053)

Abstract

This paper describes the construction of a morphosyntactic tagger for Polish as an ensemble of the best-performing Polish taggers, TaKIPI and Pantera. The set of taggers was extended with RFTagger, trained on a Polish corpus. Several methods of ensemble construction were tested; the best result, in terms of tagging error reduction, was achieved with simple, unweighted voting among the three taggers. Two evaluation metrics were used: weak and strong accuracy. The ensemble tagger showed a significant increase in both metrics, reaching nearly 94% weak accuracy. This represents a one-percentage-point increase over the best individual tagger tested, or an error rate reduction of over 15%.
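The voting scheme and the metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how ties are broken or how weak and strong accuracy are defined, so the sketch makes its own assumptions: a tie goes to the first-listed tagger; weak correctness means the predicted and gold tag sets overlap; strong correctness means they are identical. All function names, the example tags, and the accuracy figures in the comments are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Set


def vote(tags: Sequence[str]) -> str:
    """Unweighted vote over one token's tags, one tag per tagger.

    Assumption: on a tie, the tag from the earliest-listed tagger wins.
    """
    if not tags:
        raise ValueError("at least one tagger output is required")
    counts = Counter(tags)
    top = max(counts.values())
    # Scan in tagger order so ties resolve to the first tagger listed.
    return next(tag for tag in tags if counts[tag] == top)


def combine(outputs: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-tagger tag sequences that are aligned to the same tokens."""
    return [vote(per_token) for per_token in zip(*outputs)]


def weak_correct(predicted: Set[str], gold: Set[str]) -> bool:
    """Assumed definition: the predicted and gold tag sets share a tag."""
    return bool(predicted & gold)


def strong_correct(predicted: Set[str], gold: Set[str]) -> bool:
    """Assumed definition: the predicted and gold tag sets are identical."""
    return predicted == gold


def error_rate_reduction(baseline_acc: float, ensemble_acc: float) -> float:
    """Relative drop in error rate; accuracies are fractions in [0, 1)."""
    baseline_err = 1.0 - baseline_acc
    ensemble_err = 1.0 - ensemble_acc
    return (baseline_err - ensemble_err) / baseline_err


if __name__ == "__main__":
    # Hypothetical outputs of three taggers for the same two tokens.
    tagger_outputs = [
        ["subst:sg:nom:f", "fin:sg:ter:imperf"],
        ["subst:sg:nom:f", "fin:sg:ter:perf"],
        ["subst:sg:acc:f", "fin:sg:ter:imperf"],
    ]
    print(combine(tagger_outputs))
    # Made-up accuracies (not the paper's): 90% -> 92% is roughly a 20% reduction.
    print(error_rate_reduction(0.90, 0.92))
```

The relative error reduction is the accuracy gain divided by the baseline error rate, which is why a gain of about one percentage point near 93% accuracy corresponds to a reduction on the order of 15%.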

Editor information

Pascal Bouvry, Mieczysław A. Kłopotek, Franck Leprévost, Małgorzata Marciniak, Agnieszka Mykowiecka, Henryk Rybiński

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Śniatowski, T., Piasecki, M. (2012). Combining Polish Morphosyntactic Taggers. In: Bouvry, P., Kłopotek, M.A., Leprévost, F., Marciniak, M., Mykowiecka, A., Rybiński, H. (eds) Security and Intelligent Information Systems. SIIS 2011. Lecture Notes in Computer Science, vol 7053. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25261-7_28

  • DOI: https://doi.org/10.1007/978-3-642-25261-7_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25260-0

  • Online ISBN: 978-3-642-25261-7

  • eBook Packages: Computer Science, Computer Science (R0)
