Preliminary Study on Automatic Recognition of Spatial Expressions in Polish Texts

Marcińczuk, Michał; Oleksy, Marcin; Wieczorek, Jan

doi:10.1007/978-3-319-45510-5_18

Conference paper
First Online: 03 September 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9924)

Abstract

In this paper we address the problem of recognizing spatial expressions in Polish text. A spatial expression is a text fragment that describes the location of two or more physical objects relative to each other. The first part of the paper describes a Polish corpus annotated with spatial expressions and reports inter-annotator agreement. In the second part we assess the feasibility of spatial expression recognition by reviewing the relevant text-processing tools and resources for Polish. We then present a knowledge-based approach that uses the existing tools and resources for Polish, including a morpho-syntactic tagger, shallow parsers, a dependency parser, a named entity recognizer, a general ontology, a wordnet, and a wordnet-to-ontology mapping. We also present a dedicated set of manually created syntactic and semantic patterns for generating and filtering candidate spatial expressions. In the last part we discuss the results obtained with the proposed method on the reference corpus and present a detailed error analysis.
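
The generate-then-filter idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the token format, the single "noun, preposition, noun" pattern, the preposition list, and the `PHYSICAL_OBJECTS` set (a toy stand-in for the wordnet-to-ontology mapping) are all assumptions made for illustration only.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    orth: str   # surface form
    lemma: str  # base form
    pos: str    # simplified part of speech: "noun", "prep", "verb", ...


class Candidate(NamedTuple):
    located: str      # lemma of the object being located
    preposition: str  # lemma of the spatial preposition
    reference: str    # lemma of the reference object


# Hypothetical toy resources; a real system would query a tagger,
# a wordnet and an ontology instead of hard-coded sets.
SPATIAL_PREPOSITIONS = {"na", "w", "pod", "nad", "przy", "obok"}
PHYSICAL_OBJECTS = {"książka", "stół", "kot", "krzesło"}


def generate_candidates(tokens: List[Token]) -> List[Candidate]:
    """Syntactic step: pair each spatial preposition with the nearest
    noun on its left and the nearest noun on its right."""
    candidates = []
    for i, tok in enumerate(tokens):
        if tok.pos != "prep" or tok.lemma not in SPATIAL_PREPOSITIONS:
            continue
        left = next((t for t in reversed(tokens[:i]) if t.pos == "noun"), None)
        right = next((t for t in tokens[i + 1:] if t.pos == "noun"), None)
        if left is not None and right is not None:
            candidates.append(Candidate(left.lemma, tok.lemma, right.lemma))
    return candidates


def filter_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Semantic step: keep only candidates whose two arguments are both
    known physical objects."""
    return [
        c for c in candidates
        if c.located in PHYSICAL_OBJECTS and c.reference in PHYSICAL_OBJECTS
    ]


# "Książka leży na stole." (The book lies on the table.)
SENTENCE_A = [
    Token("Książka", "książka", "noun"),
    Token("leży", "leżeć", "verb"),
    Token("na", "na", "prep"),
    Token("stole", "stół", "noun"),
]

# "Rozmowa na temat pogody." (A conversation about the weather.)
SENTENCE_B = [
    Token("Rozmowa", "rozmowa", "noun"),
    Token("na", "na", "prep"),
    Token("temat", "temat", "noun"),
    Token("pogody", "pogoda", "noun"),
]

kept_a = filter_candidates(generate_candidates(SENTENCE_A))
kept_b = filter_candidates(generate_candidates(SENTENCE_B))
```

In this toy setup the first sentence yields one candidate that survives filtering, while the second yields a candidate ("rozmowa", "na", "temat") that the semantic filter rejects because neither argument is a physical object. That is the kind of non-spatial use of a preposition the filtering step is meant to discard.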

Acknowledgements

Work financed as part of the investment in the CLARIN-PL research infrastructure funded by the Polish Ministry of Science and Higher Education.

Author information

Authors and Affiliations

G4.19 Research Group, Department of Computational Intelligence, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Michał Marcińczuk, Marcin Oleksy & Jan Wieczorek

Corresponding author

Correspondence to Michał Marcińczuk.

Editor information

Editors and Affiliations

Masaryk University, Brno, Czech Republic
Petr Sojka, Aleš Horák, Ivan Kopeček & Karel Pala

About this paper

Cite this paper

Marcińczuk, M., Oleksy, M., Wieczorek, J. (2016). Preliminary Study on Automatic Recognition of Spatial Expressions in Polish Texts. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_18

DOI: https://doi.org/10.1007/978-3-319-45510-5_18
Published: 03 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45509-9
Online ISBN: 978-3-319-45510-5
eBook Packages: Computer Science, Computer Science (R0)
