Polish Language Processing Chains for Multilingual Information Systems

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7337)

Abstract

The ATLAS project, started in March 2010, aims to create a multilingual language processing framework that integrates a common set of linguistic tools for a group of European languages, Polish among them. The chained tools produce multi-level, UIMA-encoded annotation of texts, which NLP applications can use for complex language-intensive operations such as automated categorization, information extraction, machine translation, or summarization.

This paper concentrates on applications of ATLAS language processing chains to multilingual information systems, with particular interest in processing Polish. The inflectional characteristics of this language offer an opportunity to comment on a few more advanced functions, such as multiword unit lemmatisation, which is vital for real-life presentation of extracted phrases. Several sample applications using the NLP chain are also presented.

The work reported here was carried out within the Applied Technology for Language-Aided CMS project co-funded by the European Commission under the Information and Communications Technologies (ICT) Policy Support Programme (Grant Agreement No 250467).

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ogrodniczuk, M., Przepiórkowski, A. (2012). Polish Language Processing Chains for Multilingual Information Systems. In: Bouma, G., Ittoo, A., Métais, E., Wortmann, H. (eds) Natural Language Processing and Information Systems. NLDB 2012. Lecture Notes in Computer Science, vol 7337. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31178-9_14

  • DOI: https://doi.org/10.1007/978-3-642-31178-9_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31177-2

  • Online ISBN: 978-3-642-31178-9

  • eBook Packages: Computer Science, Computer Science (R0)
