Heuristic Algorithm for Zero Subject Detection in Polish

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9302)

Abstract

This article describes a heuristic approach to zero subject detection in Polish, focusing on it as a crucial step in end-to-end coreference resolution. Zero subject verbs are recognized using a set of manually created rules that draw on information from several sources, including a dependency parser, a shallow relational parser and a valence dictionary. The rules were developed and evaluated on the Polish Coreference Corpus. The experimental results show that the presented method significantly outperforms MentionDetector, the only machine learning-based alternative for Polish. We also discuss and evaluate the importance of zero subject detection for existing coreference resolution tools for Polish.
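
The kind of rule-based decision summarized above can be illustrated with a minimal Python sketch. This is not the authors' rule set. The `Verb` record and its fields, the `subj` dependency label, and the small `SUBJECTLESS_LEMMAS` set (a stand-in for a valence dictionary lookup) are all hypothetical. They were chosen only to show how signals from a dependency parser, a shallow relational parser and a valence dictionary might be combined into one yes/no decision per verb.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Verb:
    """A verb occurrence with the signals a rule set might consult (hypothetical schema)."""
    orth: str          # surface form as it appears in the text
    lemma: str         # base form, used for the valence lookup
    finite: bool       # finite, person-marked form (e.g. 'poszedł')
    impersonal: bool   # impersonal forms such as '-no/-to' ('zrobiono')
    dep_relations: Set[str] = field(default_factory=set)  # labels of dependents from a dependency parser
    shallow_subject: bool = False  # True if a shallow relational parser linked a subject


# Hypothetical stand-in for a valence dictionary: lemmas whose frames
# (in this toy example) have no nominative subject position.
SUBJECTLESS_LEMMAS = {"brakować", "grzmieć"}


def is_zero_subject(verb: Verb, subjectless_lemmas: Set[str] = SUBJECTLESS_LEMMAS) -> bool:
    """Return True if the verb is judged to have an omitted (zero) subject."""
    # Rule 1: only finite, personal forms can carry a dropped subject.
    if not verb.finite or verb.impersonal:
        return False
    # Rule 2: the valence lookup says the verb takes no subject at all.
    if verb.lemma in subjectless_lemmas:
        return False
    # Rule 3: either parser found an explicit subject.
    if "subj" in verb.dep_relations or verb.shallow_subject:
        return False
    return True


examples = [
    Verb("Poszedł", "pójść", finite=True, impersonal=False),                          # "(He) went."
    Verb("poszedł", "pójść", finite=True, impersonal=False, dep_relations={"subj"}),  # "Jan poszedł."
    Verb("Grzmi", "grzmieć", finite=True, impersonal=False),                          # "It is thundering."
    Verb("zrobiono", "zrobić", finite=False, impersonal=True),                        # "(It) was done."
]
print([is_zero_subject(v) for v in examples])  # [True, False, False, False]
```

In this sketch a verb is flagged only when it is a finite personal form, its (toy) valence entry expects a subject, and neither parser attached an explicit one. A practical system would likely need further rules and a real valence resource rather than a hard-coded lemma set.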

Keywords

Zero subject, Anaphora detection, Coreference resolution, Polish

References

  1. Broda, B., Marcińczuk, M., Maziarz, M., Radziszewski, A., Wardyński, A.: KPWr: towards a free corpus of Polish. In: Calzolari, N., Choukri, K., Declerck, T., Doğan, M.U., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S. (eds.) Proceedings of LREC 2012, Istanbul, Turkey. ELRA (2012)
  2. Chomsky, N.: Lectures on government and binding. In: The Pisa Lectures. Foris Publications, Holland (1981)
  3. Russo, L., Loáiciga, S., Gulati, A.: Improving machine translation of null subjects in Italian and Spanish. In: Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics, Avignon, France, pp. 81–89. Association for Computational Linguistics, April 2012
  4. Rello, L., Ferraro, G., Gayo, I.: A first approach to the automatic detection of zero subjects and impersonal constructions in Portuguese. Procesamiento del Lenguaje Natural 49, 163–170 (2012)
  5. Mihăilă, C., Ilisei, I., Inkpen, D.: Zero pronominal anaphora resolution for the Romanian language
  6. Kopeć, M.: Zero subject detection for Polish. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. Short Papers, Gothenburg, Sweden, vol. 2, pp. 221–225. Association for Computational Linguistics (2014)
  7. Ogrodniczuk, M., Głowińska, K., Kopeć, M., Savary, A., Zawisławska, M.: Polish coreference corpus. In: Vetulani, Z. (ed.) Proceedings of the 6th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, Poznań, Poland, pp. 494–498. Wydawnictwo Poznańskie, Fundacja Uniwersytetu im. Adama Mickiewicza (2013)
  8. Radziszewski, A.: A tiered CRF tagger for Polish. In: Bembenik, R., Skonieczny, Ł., Rybiński, H., Kryszkiewicz, M., Niezgódka, M. (eds.) Intell. Tools for Building a Scientific Information. SCI, vol. 467, pp. 215–230. Springer, Heidelberg (2013)
  9. Przepiórkowski, A., Hajnicz, E., Patejuk, A., Woliński, M., Skwarski, F., Świdziński, M.: Walenty: towards a comprehensive valence dictionary of Polish. In: Calzolari, N. (Conference Chair), Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland. European Language Resources Association (ELRA), May 2014
  10. Nivre, J., Hall, J., Nilsson, J.: MaltParser: a data-driven parser-generator for dependency parsing. In: Proc. of LREC-2006, pp. 2216–2219 (2006)
  11. Wróblewska, A.: Polish dependency bank. Linguistic Issues in Language Technology 7(1) (2012)
  12. Radziszewski, A., Orłowicz, P., Broda, B.: Classification of predicate-argument relations in Polish data. In: Kłopotek, M.A., Koronacki, J., Marciniak, M., Mykowiecka, A., Wierzchoń, S.T. (eds.) IIS 2013. LNCS, vol. 7912, pp. 28–38. Springer, Heidelberg (2013)
  13. Ogrodniczuk, M., Kopeć, M.: Rule-based coreference resolution module for Polish. In: Proceedings of the 8th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC 2011), Faro, Portugal, pp. 191–200 (2011)
  14. Kopeć, M., Ogrodniczuk, M.: Creating a coreference resolution system for Polish. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, Istanbul, Turkey, pp. 192–195. ELRA (2012)
  15. Broda, B., Burdka, L., Maziarz, M.: IKAR: an improved kit for anaphora resolution for Polish. In: Proceedings of COLING 2012: Demonstration Papers, Mumbai, India, pp. 25–32. The COLING 2012 Organizing Committee, December 2012
  16. Marcińczuk, M., Kocoń, J., Janicki, M.: Liner2 — a customizable framework for proper names recognition for Polish. In: Bembenik, R., Skonieczny, Ł., Rybiński, H., Kryszkiewicz, M., Niezgódka, M. (eds.) Intell. Tools for Building a Scientific Information. SCI, vol. 467, pp. 231–254. Springer, Heidelberg (2013)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Computational Intelligence Research Group, Institute of Computer Science, University of Wrocław, Wrocław, Poland
  2. G4.19 Research Group: Computational Linguistics and Language Technology, Department of Computational Intelligence, Wrocław University of Technology, Wrocław, Poland