Polish Language Processing Chains for Multilingual Information Systems

Ogrodniczuk, Maciej; Przepiórkowski, Adam

doi:10.1007/978-3-642-31178-9_14

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7337)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

The ATLAS project, started in March 2010, aims to create a multilingual language processing framework integrating a common set of linguistic tools for a group of European languages, among them Polish. The chained tools produce multi-level UIMA-encoded annotation of texts, which NLP applications can use for complex language-intensive operations such as automated categorization, information extraction, machine translation or summarization.
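The idea of chained tools that each add one annotation level can be illustrated with a minimal sketch. It is not the ATLAS implementation and does not use the UIMA API: the `Document` class, the `tokenize` and `lemmatize` annotators, the `TOY_LEMMAS` table and the `run_chain` helper are all hypothetical names introduced here, and the standoff layers only stand in for real multi-level annotation.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# One standoff annotation: (start offset, end offset, value).
Span = Tuple[int, int, str]


@dataclass
class Document:
    """Hypothetical container: raw text plus named annotation layers."""
    text: str
    layers: Dict[str, List[Span]] = field(default_factory=dict)


Annotator = Callable[[Document], Document]

# Toy lemma table (assumption for illustration only, not a real lexicon).
TOY_LEMMAS = {"ma": "mieć", "kota": "kot"}


def tokenize(doc: Document) -> Document:
    """Add a 'token' layer with one span per word."""
    doc.layers["token"] = [
        (m.start(), m.end(), m.group()) for m in re.finditer(r"\w+", doc.text)
    ]
    return doc


def lemmatize(doc: Document) -> Document:
    """Add a 'lemma' layer built on top of the existing 'token' layer."""
    doc.layers["lemma"] = [
        (start, end, TOY_LEMMAS.get(form, form))
        for start, end, form in doc.layers["token"]
    ]
    return doc


def run_chain(doc: Document, chain: List[Annotator]) -> Document:
    """Apply annotators in order; later ones may read earlier layers."""
    for annotator in chain:
        doc = annotator(doc)
    return doc


doc = run_chain(Document("Ala ma kota"), [tokenize, lemmatize])
lemmas = [value for _, _, value in doc.layers["lemma"]]
```

Because every annotator only appends a layer and leaves the text untouched, a downstream application can pick whichever levels it needs.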

This paper concentrates on applications of ATLAS language processing chains to multilingual information systems, with particular interest in processing Polish. The inflectional characteristics of this language offer the opportunity to comment on a few more advanced functions, such as multiword unit lemmatisation, which is vital for real-life presentation of extracted phrases. Several sample applications using the NLP chain are also presented.
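A small sketch shows why multiword unit lemmatisation matters for an inflecting language. It is a toy illustration, not the method used in the paper: the `LEXICON` and `ADJ_NOM_SG` tables and both functions are hypothetical, and the only linguistic fact relied on is that a Polish adjective agrees in gender with its head noun. The genitive phrase "czerwonej kartki" lemmatised word by word gives "czerwony kartka", which is ungrammatical, while the presentable base form is "czerwona kartka".

```python
from typing import Dict, List, Optional, Tuple

# Toy lexicon (assumption): surface form -> (lemma, part of speech, noun gender).
LEXICON: Dict[str, Tuple[str, str, Optional[str]]] = {
    "czerwonej": ("czerwony", "adj", None),
    "kartki": ("kartka", "noun", "f"),
}

# Nominative singular forms of the adjective for each gender (toy table).
ADJ_NOM_SG: Dict[str, Dict[str, str]] = {
    "czerwony": {"m": "czerwony", "f": "czerwona", "n": "czerwone"},
}


def lemmatise_words(phrase: str) -> List[str]:
    """Naive approach: replace each word by its dictionary lemma."""
    return [LEXICON[word][0] for word in phrase.split()]


def lemmatise_phrase(phrase: str) -> str:
    """Phrase-level approach: keep adjectives in agreement with the head noun."""
    entries = [LEXICON[word] for word in phrase.split()]
    # Simplification: treat the first noun in the phrase as its head.
    gender = next(g for _, pos, g in entries if pos == "noun")
    parts = []
    for lemma, pos, _ in entries:
        if pos == "adj":
            parts.append(ADJ_NOM_SG[lemma][gender])
        else:
            parts.append(lemma)
    return " ".join(parts)


word_level = lemmatise_words("czerwonej kartki")
phrase_level = lemmatise_phrase("czerwonej kartki")
```

Here `word_level` holds the two dictionary lemmas, while `phrase_level` holds the agreed base form suitable for display to a user.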

The work reported here was carried out within the Applied Technology for Language-Aided CMS project co-funded by the European Commission under the Information and Communications Technologies (ICT) Policy Support Programme (Grant Agreement No 250467).

Author information

Authors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, Warsaw, Poland
Maciej Ogrodniczuk & Adam Przepiórkowski

Editor information

Editors and Affiliations

Information Science Department, University of Groningen, Oude Kijk in ’t Jatstraat 26, 9712 EK, Groningen, The Netherlands
Gosse Bouma
Faculty of Economics and Business, University of Groningen, Nettelbosje 2, 9747 AE, Groningen, The Netherlands
Ashwin Ittoo & Hans Wortmann
CNAM-Laboratoire Cédric, 292 rue St. Martin, 75141, Paris Cedex 03, France
Elisabeth Métais

About this paper

Cite this paper

Ogrodniczuk, M., Przepiórkowski, A. (2012). Polish Language Processing Chains for Multilingual Information Systems. In: Bouma, G., Ittoo, A., Métais, E., Wortmann, H. (eds) Natural Language Processing and Information Systems. NLDB 2012. Lecture Notes in Computer Science, vol 7337. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31178-9_14

DOI: https://doi.org/10.1007/978-3-642-31178-9_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31177-2
Online ISBN: 978-3-642-31178-9
eBook Packages: Computer Science, Computer Science (R0)
