Improving Part-of-Speech Tagging by Meta-learning

  • Łukasz Kobyliński
  • Michał Wasiluk
  • Grzegorz Wojdyga
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107)

Abstract

Part-of-speech (POS) tagging for Polish has progressed rapidly in recent years. PolEval, a shared task organized in late 2017, prompted many new approaches to the problem, and new deep learning paradigms have helped to narrow the gap between the accuracy of POS tagging methods for Polish and for English. Still, the number of errors made by taggers on large corpora remains very high: even the currently best-performing tagger reaches an accuracy of ca. 94.5%, which translates to tens of millions of errors in a billion-word corpus.

To further improve the accuracy of Polish POS tagging, we propose employing a meta-learning approach on top of several existing taggers. The approach is motivated by the observation that the taggers, while often similar in overall accuracy, make different errors, which suggests that some methods perform better than others in specific contexts. We therefore train a machine learning model that captures the relationship between the accuracy of a particular tagger and the language context. The resulting model selects among the taggers in each context so as to maximize the expected tagging accuracy.
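The selection idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the learning method or the context features, so a simple frequency table stands in for the trained model, the context is assumed to be the last three letters of the word, and the names `context_key`, `train_selector`, `select_tag`, and `TOY_EXAMPLES`, as well as the toy data and simplified tags, are hypothetical.

```python
from collections import Counter, defaultdict

# Toy training data (invented for illustration): each item is
# (word, [tag proposed by tagger 0, tag proposed by tagger 1], gold tag).
TOY_EXAMPLES = [
    ("dobrego", ["adj", "subst"], "adj"),
    ("nowego", ["adj", "subst"], "adj"),
    ("kota", ["adj", "subst"], "subst"),
    ("psa", ["adj", "subst"], "subst"),
    ("domu", ["adj", "subst"], "subst"),
]


def context_key(word):
    """Assumed context feature: the last three letters of the lowercased word."""
    return word.lower()[-3:]


def train_selector(examples):
    """Count, per context and overall, how often each tagger was correct."""
    hits = defaultdict(Counter)  # context -> tagger index -> correct count
    overall = Counter()          # tagger index -> correct count
    for word, proposals, gold in examples:
        key = context_key(word)
        for idx, tag in enumerate(proposals):
            if tag == gold:
                hits[key][idx] += 1
                overall[idx] += 1
    return hits, overall


def select_tag(model, word, proposals):
    """Return the proposal of the tagger that was most often right in this context.

    Falls back to the overall counts when the context was never seen,
    and uses the overall counts to break ties.
    """
    hits, overall = model
    counts = hits.get(context_key(word)) or overall
    best = max(range(len(proposals)), key=lambda i: (counts[i], overall[i]))
    return proposals[best]


model = train_selector(TOY_EXAMPLES)
print(select_tag(model, "starego", ["adj", "subst"]))  # context "ego" was seen in training
print(select_tag(model, "zamek", ["adj", "subst"]))    # unseen context, overall fallback
```

On this toy data the selector trusts tagger 0 for words ending in "ego" and otherwise falls back to tagger 1, which was correct more often overall. A real meta-learner would replace the lookup table with a trained classifier over richer context features.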

Keywords

Part-of-speech tagging, Meta learning, Natural language processing

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Łukasz Kobyliński (1)
  • Michał Wasiluk (1)
  • Grzegorz Wojdyga (1)
  1. Institute of Computer Science, Polish Academy of Sciences, Warszawa, Poland
