Polish Language Processing Chains for Multilingual Information Systems

  • Maciej Ogrodniczuk
  • Adam Przepiórkowski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7337)

Abstract

The ATLAS project, started in March 2010, aims to create a multilingual language processing framework integrating a common set of linguistic tools for a group of European languages, among them Polish. The chained tools produce multi-level UIMA-encoded annotation of texts, which NLP applications can use for complex language-intensive operations such as automated categorization, information extraction, machine translation or summarization.

This paper concentrates on applications of ATLAS language processing chains to multilingual information systems, with particular interest in processing Polish. The inflectional characteristics of this language offer an opportunity to comment on a few more advanced functions such as multiword unit lemmatisation, vital for real-life presentation of extracted phrases. Several sample applications using the NLP chain are also presented.
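
To illustrate why multiword unit lemmatisation matters for an inflectional language, the following minimal sketch contrasts token-by-token lemmatisation with lemmatisation of the phrase as a whole. It is not the ATLAS implementation: the `Token` class, the `role` labels and the hand-written `NOMINATIVE` lookup are hypothetical stand-ins for the output of a morphological analyser and a shallow grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str   # surface form as found in the text
    lemma: str  # base form of the token taken in isolation
    role: str   # "head", "agreeing" (modifier agreeing with the head) or "fixed"


# Hypothetical toy lookup standing in for a morphological generator:
# (lemma, gender of the phrase head) -> nominative singular form.
NOMINATIVE = {
    ("polski", "f"): "polska",
    ("akademia", "f"): "akademia",
}


def naive_lemma(tokens):
    """Lemmatise every token independently and join the results."""
    return " ".join(t.lemma for t in tokens)


def phrase_lemma(tokens, head_gender):
    """Put the head and agreeing modifiers into the nominative,
    leaving fixed (e.g. genitive) modifiers in their surface form."""
    parts = []
    for t in tokens:
        if t.role == "fixed":
            parts.append(t.form.lower())
        else:
            parts.append(NOMINATIVE[(t.lemma, head_gender)])
    return " ".join(parts)


# "Polskiej Akademii Nauk" is the genitive of "Polska Akademia Nauk".
phrase = [
    Token("Polskiej", "polski", "agreeing"),
    Token("Akademii", "akademia", "head"),
    Token("Nauk", "nauka", "fixed"),
]

print(naive_lemma(phrase))        # polski akademia nauka
print(phrase_lemma(phrase, "f"))  # polska akademia nauk
```

The token-by-token result is not a well-formed Polish phrase, whereas the phrase-level result is the form a reader would expect to see in a list of extracted terms. Restoring capitalisation is left out of the sketch.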

Keywords

Noun Phrase; Machine Translation; Latent Dirichlet Allocation; Statistical Machine Translation; National Corpus
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Maciej Ogrodniczuk (1)
  • Adam Przepiórkowski (1)
  1. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
