Multi-pass Sieve Coreference Resolution System for Polish

Nitoń, Bartłomiej; Ogrodniczuk, Maciej

doi:10.1007/978-3-319-59888-8_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10318)

Included in the following conference series:

International Conference on Language, Data and Knowledge

Abstract

This paper examines the portability of Stanford’s multi-pass rule-based sieve coreference resolution system to an inflectional language (Polish) with a different annotation scheme. The presented system is implemented in BART, a modular toolkit later adapted to the sieve architecture by Baumann et al. The sieves for Polish include processing of zero subjects and an experimental knowledge-intensive sieve that uses a newly created database of periphrastic expressions. Evaluation shows that the results for Polish are higher than those reported on the CoNLL-2011/2012 data.

Notes

1. http://zil.ipipan.waw.pl/Scoreference.
2. http://zil.ipipan.waw.pl/PCC.
3. http://nkjp.pl.
4. See http://nkjp.pl/poliqarp/help/en.html for concise tag descriptions.
5. http://sgjp.pl/morfeusz/.
6. http://zil.ipipan.waw.pl/PANTERA.
7. The Polish stop-word list was taken from the Polish Wikipedia stop-words list, https://pl.wikipedia.org/wiki/Wikipedia:Stopwords.
8. http://sjp.pl/.
9. http://plwordnet.pwr.wroc.pl/.
10. https://www.wikidata.org/.
11. http://szarada.net/.

References

Acedański, S.: A morphosyntactic Brill Tagger for inflectional languages. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds.) NLP 2010. LNCS, vol. 6233, pp. 3–14. Springer, Heidelberg (2010). doi:10.1007/978-3-642-14770-8_3
Bagga, A., Baldwin, B.: Algorithms for scoring coreference chains. In: Proceedings of the 1st International Conference on Language Resources and Evaluation Workshop on Linguistics Coreference, pp. 563–566 (1998)
Baumann, J., Kühling, X., Ruder, S.: Rule-based coreference resolution with BART, Technical poster (2014). http://www.cl.uni-heidelberg.de/studies/projects/poster/baumann_kuehling_ruder_poster.pdf
Chen, C., Ng, V.: Combining the best of two worlds: a hybrid approach to multilingual coreference resolution. In: Proceedings of the Shared Task on Joint Conference on EMNLP and CoNLL, pp. 56–63 (2012)
Durrett, G., Klein, D.: Easy victories and uphill battles in coreference resolution. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1971–1982. Association for Computational Linguistics, Seattle (2013). http://www.aclweb.org/anthology/D13-1203
Haghighi, A., Klein, D.: Unsupervised coreference resolution in a nonparametric Bayesian model. In: Carroll, J.A., van den Bosch, A., Zaenen, A. (eds.) Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp. 848–855. The Association for Computational Linguistics (2007)
Haghighi, A., Klein, D.: Simple coreference resolution with rich syntactic and semantic features. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP 2009), vol. 3, pp. 1152–1161. Association for Computational Linguistics (2009)
Kopeć, M.: Zero subject detection for Polish. In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, Short Papers, vol. 2, pp. 221–225. Association for Computational Linguistics, Gothenburg (2014)
Kopeć, M., Ogrodniczuk, M.: Creating a coreference resolution system for Polish. In: Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012, pp. 192–195. ELRA, Istanbul (2012)
Krug, M., Puppe, F., Jannidis, F., Macharowsky, L., Reger, I., Weimar, L.: Rule-based coreference resolution in German historic novels. In: Proceedings of the 4th Workshop on Computational Linguistics for Literature, pp. 98–104. Association for Computational Linguistics, Denver, June 2015. http://www.aclweb.org/anthology/W15-0711
Lee, H., Peirsman, Y., Chang, A., Chambers, N., Surdeanu, M., Jurafsky, D.: Stanford’s multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. In: Proceedings of the 15th Conference on Computational Natural Language Learning: Shared Task, pp. 28–34. Association for Computational Linguistics (2011)
Luo, X.: On coreference resolution performance metrics. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT 2005, pp. 25–32. Association for Computational Linguistics, Vancouver (2005). http://dx.doi.org/10.3115/1220575.1220579
Maziarz, M., Piasecki, M., Szpakowicz, S.: Approaching plWordNet 2.0. In: Proceedings of the 6th Global Wordnet Conference, Matsue, Japan, January 2012
Ng, V.: Supervised noun phrase coreference research: the first fifteen years. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010, pp. 1396–1411. Association for Computational Linguistics, Stroudsburg (2010)
Ogrodniczuk, M.: Discovery of common nominal facts for coreference resolution: proof of concept. In: Prasath, R., Kathirvalavakumar, T. (eds.) MIKE 2013. LNCS, vol. 8284, pp. 709–716. Springer, Cham (2013). doi:10.1007/978-3-319-03844-5_69
Ogrodniczuk, M., Głowińska, K., Kopeć, M., Savary, A., Zawisławska, M.: Coreference in Polish: Annotation, Resolution and Evaluation. Walter De Gruyter (2015). http://www.degruyter.com/view/product/428667
Ogrodniczuk, M., Kopeć, M.: Rule-based coreference resolution module for Polish. In: Proceedings of the 8th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC 2011), Faro, Portugal, pp. 191–200 (2011)
Ponzetto, S.P., Strube, M.: Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution. In: Proceedings of the Human Language Technology Conference of the NAACL, Main Conference, pp. 192–199. Association for Computational Linguistics, New York (2006). http://www.aclweb.org/anthology/N06-1025
Ponzetto, S.P., Strube, M.: Knowledge derived from Wikipedia for computing semantic relatedness. J. Artif. Intell. Res. 30(1), 181–212 (2007)
Pradhan, S., Ramshaw, L., Marcus, M., Palmer, M., Weischedel, R., Xue, N.: CoNLL-2011 shared task: modeling unrestricted coreference in OntoNotes. In: Proceedings of the 15th Conference on Computational Natural Language Learning: Shared Task, CONLL Shared Task 2011, pp. 1–27. Association for Computational Linguistics, Stroudsburg (2011). http://dl.acm.org/citation.cfm?id=2132936.2132937
Przepiórkowski, A., Bańko, M., Górski, R.L., Lewandowska-Tomaszczyk, B. (eds.): Narodowy Korpus Języka Polskiego [Eng.: National Corpus of Polish]. Wydawnictwo Naukowe PWN, Warsaw (2012)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco (1993)
Raghunathan, K., Lee, H., Rangarajan, S., Chambers, N., Surdeanu, M., Jurafsky, D., Manning, C.: A multi-pass sieve for coreference resolution. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp. 492–501. Association for Computational Linguistics (2010)
Rahman, A., Ng, V.: Coreference resolution with world knowledge. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pp. 814–824 (2011)
Ratinov, L., Roth, D.: Learning-based multi-sieve co-reference resolution with knowledge. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012, pp. 1234–1244. Association for Computational Linguistics, Stroudsburg (2012). http://dl.acm.org/citation.cfm?id=2390948.2391088
Stoyanov, V., Eisner, J.: Easy-first coreference resolution. In: Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India, pp. 2519–2534 (2012)
Versley, Y., Ponzetto, S.P., Poesio, M., Eidelman, V., Jern, A., Smith, J., Yang, X., Moschitti, A.: BART: a modular toolkit for coreference resolution. In: Association for Computational Linguistics (ACL) Demo Session (2008)
Vilain, M., Burger, J., Aberdeen, J., Connolly, D., Hirschman, L.: A model-theoretic coreference scoring scheme. In: Proceedings of the 6th Message Understanding Conference (MUC-6), pp. 45–52 (1995)
Waszczuk, J.: Harnessing the CRF complexity with domain-specific constraints. The case of morphosyntactic tagging of a highly inflected language. In: Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), Mumbai, India, pp. 2789–2804 (2012)
Woliński, M.: Morfeusz reloaded. In: Calzolari, N., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014, pp. 1106–1111. ELRA, Reykjavík (2014). http://www.lrec-conf.org/proceedings/lrec2014/index.html

Acknowledgements

The work reported here was carried out within the research project financed by the Polish National Science Centre (contract number 2014/15/B/HS2/03435) and as part of the investment in the CLARIN-PL research infrastructure funded by the Polish Ministry of Science and Higher Education.

Author information

Authors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248, Warsaw, Poland
Bartłomiej Nitoń & Maciej Ogrodniczuk

Corresponding author

Correspondence to Maciej Ogrodniczuk.

Editor information

Editors and Affiliations

Universidad Politécnica de Madrid, Madrid, Spain
Jorge Gracia
Nanyang Technological University, Singapore, Singapore
Francis Bond
Insight Centre for Data Analytics, National University of Ireland, Galway, Galway, Ireland
John P. McCrae
Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
Paul Buitelaar
Goethe-University Frankfurt, Frankfurt, Germany
Christian Chiarcos
University of Leipzig, Leipzig, Germany
Sebastian Hellmann

About this paper

Cite this paper

Nitoń, B., Ogrodniczuk, M. (2017). Multi-pass Sieve Coreference Resolution System for Polish. In: Gracia, J., Bond, F., McCrae, J., Buitelaar, P., Chiarcos, C., Hellmann, S. (eds) Language, Data, and Knowledge. LDK 2017. Lecture Notes in Computer Science, vol 10318. Springer, Cham. https://doi.org/10.1007/978-3-319-59888-8_20

DOI: https://doi.org/10.1007/978-3-319-59888-8_20
Published: 27 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59887-1
Online ISBN: 978-3-319-59888-8
eBook Packages: Computer Science, Computer Science (R0)