Multi-pass Sieve Coreference Resolution System for Polish

  • Bartłomiej Nitoń
  • Maciej Ogrodniczuk
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10318)

Abstract

This paper examines the portability of Stanford’s multi-pass rule-based sieve coreference resolution system to an inflectional language (Polish) with a different annotation scheme. The presented system is implemented in BART, a modular toolkit later adapted to the sieve architecture by Baumann et al. The sieves for Polish include processing of zero subjects and an experimental knowledge-intensive sieve that uses a newly created database of periphrastic expressions. Evaluation shows that the results for Polish are higher than those reported on the CoNLL-2011/2012 data.

Keywords

Coreference resolution · BART · Stanford’s multi-pass sieve architecture · Polish language · Knowledge-based resources

Notes

Acknowledgements

The work reported here was carried out within the research project financed by the Polish National Science Centre (contract number 2014/15/B/HS2/03435) and as part of the investment in the CLARIN-PL research infrastructure funded by the Polish Ministry of Science and Higher Education.

References

  1. Acedański, S.: A morphosyntactic Brill Tagger for inflectional languages. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds.) NLP 2010. LNCS, vol. 6233, pp. 3–14. Springer, Heidelberg (2010). doi: 10.1007/978-3-642-14770-8_3
  2. Bagga, A., Baldwin, B.: Algorithms for scoring coreference chains. In: Proceedings of the 1st International Conference on Language Resources and Evaluation Workshop on Linguistics Coreference, pp. 563–566 (1998)
  3. Baumann, J., Kühling, X., Ruder, S.: Rule-based coreference resolution with BART, Technical poster (2014). http://www.cl.uni-heidelberg.de/studies/projects/poster/baumann_kuehling_ruder_poster.pdf
  4. Chen, C., Ng, V.: Combining the best of two worlds: a hybrid approach to multilingual coreference resolution. In: Proceedings of the Shared Task on Joint Conference on EMNLP and CoNLL, pp. 56–63 (2012)
  5. Durrett, G., Klein, D.: Easy victories and uphill battles in coreference resolution. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1971–1982. Association for Computational Linguistics, Seattle (2013). http://www.aclweb.org/anthology/D13-1203
  6. Haghighi, A., Klein, D.: Unsupervised coreference resolution in a nonparametric Bayesian model. In: Carroll, J.A., van den Bosch, A., Zaenen, A. (eds.) Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp. 848–855. The Association for Computational Linguistics (2007)
  7. Haghighi, A., Klein, D.: Simple coreference resolution with rich syntactic and semantic features. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP 2009), vol. 3, pp. 1152–1161. Association for Computational Linguistics (2009)
  8. Kopeć, M.: Zero subject detection for Polish. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Short Papers, vol. 2, pp. 221–225. Association for Computational Linguistics, Gothenburg (2014)
  9. Kopeć, M., Ogrodniczuk, M.: Creating a coreference resolution system for Polish. In: Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, pp. 192–195. ELRA, Istanbul (2012)
  10. Krug, M., Puppe, F., Jannidis, F., Macharowsky, L., Reger, I., Weimar, L.: Rule-based coreference resolution in German historic novels. In: Proceedings of the 4th Workshop on Computational Linguistics for Literature, pp. 98–104. Association for Computational Linguistics, Denver (2015). http://www.aclweb.org/anthology/W15-0711
  11. Lee, H., Peirsman, Y., Chang, A., Chambers, N., Surdeanu, M., Jurafsky, D.: Stanford’s multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. In: Proceedings of the 15th Conference on Computational Natural Language Learning: Shared Task, pp. 28–34. Association for Computational Linguistics (2011)
  12. Luo, X.: On coreference resolution performance metrics. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT 2005, pp. 25–32. Association for Computational Linguistics, Vancouver (2005). http://dx.doi.org/10.3115/1220575.1220579
  13. Maziarz, M., Piasecki, M., Szpakowicz, S.: Approaching plWordNet 2.0. In: Proceedings of the 6th Global Wordnet Conference, Matsue, Japan (2012)
  14. Ng, V.: Supervised noun phrase coreference research: the first fifteen years. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010, pp. 1396–1411. Association for Computational Linguistics, Stroudsburg (2010)
  15. Ogrodniczuk, M.: Discovery of common nominal facts for coreference resolution: proof of concept. In: Prasath, R., Kathirvalavakumar, T. (eds.) MIKE 2013. LNCS, vol. 8284, pp. 709–716. Springer, Cham (2013). doi: 10.1007/978-3-319-03844-5_69
  16. Ogrodniczuk, M., Głowińska, K., Kopeć, M., Savary, A., Zawisławska, M.: Coreference in Polish: Annotation, Resolution and Evaluation. Walter De Gruyter (2015). http://www.degruyter.com/view/product/428667
  17. Ogrodniczuk, M., Kopeć, M.: Rule-based coreference resolution module for Polish. In: Proceedings of the 8th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC 2011), Faro, Portugal, pp. 191–200 (2011)
  18. Ponzetto, S.P., Strube, M.: Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution. In: Proceedings of the Human Language Technology Conference of the NAACL, Main Conference, pp. 192–199. Association for Computational Linguistics, New York (2006). http://www.aclweb.org/anthology/N06-1025
  19. Ponzetto, S.P., Strube, M.: Knowledge derived from Wikipedia for computing semantic relatedness. J. Artif. Intell. Res. 30(1), 181–212 (2007)
  20. Pradhan, S., Ramshaw, L., Marcus, M., Palmer, M., Weischedel, R., Xue, N.: CoNLL-2011 shared task: modeling unrestricted coreference in OntoNotes. In: Proceedings of the 15th Conference on Computational Natural Language Learning: Shared Task, CONLL Shared Task 2011, pp. 1–27. Association for Computational Linguistics, Stroudsburg (2011). http://dl.acm.org/citation.cfm?id=2132936.2132937
  21. Przepiórkowski, A., Bańko, M., Górski, R.L., Lewandowska-Tomaszczyk, B. (eds.): Narodowy Korpus Języka Polskiego [Eng.: National Corpus of Polish]. Wydawnictwo Naukowe PWN, Warsaw (2012)
  22. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco (1993)
  23. Raghunathan, K., Lee, H., Rangarajan, S., Chambers, N., Surdeanu, M., Jurafsky, D., Manning, C.: A multi-pass sieve for coreference resolution. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp. 492–501. Association for Computational Linguistics (2010)
  24. Rahman, A., Ng, V.: Coreference resolution with world knowledge. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 814–824 (2011)
  25. Ratinov, L., Roth, D.: Learning-based multi-sieve co-reference resolution with knowledge. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012, pp. 1234–1244. Association for Computational Linguistics, Stroudsburg (2012). http://dl.acm.org/citation.cfm?id=2390948.2391088
  26. Stoyanov, V., Eisner, J.: Easy-first coreference resolution. In: Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India, pp. 2519–2534 (2012)
  27. Versley, Y., Ponzetto, S.P., Poesio, M., Eidelman, V., Jern, A., Smith, J., Yang, X., Moschitti, A.: BART: a modular toolkit for coreference resolution. In: Association for Computational Linguistics (ACL) Demo Session (2008)
  28. Vilain, M., Burger, J., Aberdeen, J., Connolly, D., Hirschman, L.: A model-theoretic coreference scoring scheme. In: Proceedings of the 6th Message Understanding Conference (MUC-6), pp. 45–52 (1995)
  29. Waszczuk, J.: Harnessing the CRF complexity with domain-specific constraints. The case of morphosyntactic tagging of a highly inflected language. In: Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India, pp. 2789–2804 (2012)
  30. Woliński, M.: Morfeusz reloaded. In: Calzolari, N., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014, pp. 1106–1111. ELRA, Reykjavík (2014). http://www.lrec-conf.org/proceedings/lrec2014/index.html

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
