Abstract
The ATLAS project, started in March 2010, aims to create a multilingual language processing framework that integrates a common set of linguistic tools for a group of European languages, including Polish. The chained tools produce multi-level, UIMA-encoded annotation of texts, which NLP applications can use for complex language-intensive operations such as automated categorization, information extraction, machine translation or summarization.
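The following minimal Python sketch illustrates what a processing chain means here, under simplifying assumptions. Each tool is a function that reads the annotation layers already present on a shared document and adds its own layer as stand-off spans over the text. This is not the UIMA API: in UIMA, annotators are components that write typed feature structures into a CAS. The class and function names, the regex tokenizer and the two-word toy lexicon are all hypothetical stand-ins for real components such as a morphological analyser.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Annotation:
    begin: int      # character offset where the span starts
    end: int        # character offset just past the span
    features: dict  # layer-specific attributes, e.g. {"lemma": ...}

@dataclass
class Document:
    text: str
    layers: dict = field(default_factory=dict)  # layer name -> list of Annotation

    def add(self, layer: str, begin: int, end: int, **features) -> None:
        self.layers.setdefault(layer, []).append(Annotation(begin, end, features))

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]

Annotator = Callable[[Document], None]

def tokenizer(doc: Document) -> None:
    """Adds a 'token' layer: runs of word characters or single punctuation marks."""
    for m in re.finditer(r"\w+|[^\w\s]", doc.text):
        doc.add("token", m.start(), m.end())

# Hypothetical toy lexicon standing in for a real morphological analyser.
TOY_LEXICON = {"psy": "pies", "szczekają": "szczekać"}

def lemmatizer(doc: Document) -> None:
    """Reads the 'token' layer and adds a 'lemma' layer over the same spans."""
    for tok in doc.layers.get("token", []):
        form = doc.covered_text(tok).lower()
        doc.add("lemma", tok.begin, tok.end, lemma=TOY_LEXICON.get(form, form))

def run_chain(doc: Document, chain: list[Annotator]) -> Document:
    # Each annotator sees all layers produced by the annotators before it.
    for annotator in chain:
        annotator(doc)
    return doc

doc = run_chain(Document("Psy szczekają."), [tokenizer, lemmatizer])
print([doc.covered_text(t) for t in doc.layers["token"]])  # ['Psy', 'szczekają', '.']
print([a.features["lemma"] for a in doc.layers["lemma"]])  # ['pies', 'szczekać', '.']
```

Adding a further level, for instance part-of-speech tags, would mean appending one more annotator to the list. Later tools read earlier layers instead of re-processing the raw text, and this is what lets a chain build up multi-level annotation.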
This paper concentrates on applications of ATLAS language processing chains to multilingual information systems, with particular attention to processing Polish. The inflectional characteristics of this language make it possible to discuss a few more advanced functions, such as multiword unit lemmatisation, which is vital for presenting extracted phrases to users in real-life applications. Several sample applications using the NLP chain are also presented.
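A small Python sketch shows why multiword lemmatisation cannot be done word by word. Lemmatising each token of the genitive phrase "Narodowego Korpusu Języka Polskiego" separately yields "narodowy korpus język polski". The expected base form, however, is "Narodowy Korpus Języka Polskiego": only the head noun and the modifiers agreeing with it move to the nominative, while the genitive complement keeps its inflected form. The sketch assumes that a shallow parser has already marked which tokens belong to the head's group (the hypothetical `group` field). It uses a hypothetical four-entry table in place of a real inflectional generator and simplifies gender and number. It is an illustration only, not the method used in ATLAS.

```python
from dataclasses import dataclass

@dataclass
class Token:
    form: str    # surface form as it appears in the text
    lemma: str   # single-word lemma from a morphological analyser
    number: str  # "sg" or "pl" (simplified)
    gender: str  # "m" or "f" (simplified; real Polish tagsets distinguish more genders)
    group: int   # nominal group id, assumed to come from a shallow parser

# Hypothetical toy generator: (lemma, number, gender) -> nominative form.
NOMINATIVE = {
    ("narodowy", "sg", "m"): "narodowy",
    ("korpus", "sg", "m"): "korpus",
    ("polski", "sg", "f"): "polska",
    ("akademia", "sg", "f"): "akademia",
}

def keep_capitalisation(original: str, word: str) -> str:
    """Capitalise `word` if the original surface form was capitalised."""
    return word[:1].upper() + word[1:] if original[:1].isupper() else word

def lemmatise_mwu(tokens: list[Token], head: int) -> str:
    """Base form of a multiword unit: the head's group goes to the nominative,
    agreeing in number and gender with the head; other groups stay as they are."""
    h = tokens[head]
    out = []
    for t in tokens:
        if t.group == h.group:
            nominative = NOMINATIVE[(t.lemma, h.number, h.gender)]
            out.append(keep_capitalisation(t.form, nominative))
        else:
            out.append(t.form)  # e.g. a genitive complement keeps its inflection
    return " ".join(out)

nkjp = [
    Token("Narodowego", "narodowy", "sg", "m", 0),
    Token("Korpusu", "korpus", "sg", "m", 0),
    Token("Języka", "język", "sg", "m", 1),
    Token("Polskiego", "polski", "sg", "m", 1),
]
print(" ".join(t.lemma for t in nkjp))  # narodowy korpus język polski
print(lemmatise_mwu(nkjp, head=1))      # Narodowy Korpus Języka Polskiego

pan = [
    Token("Polskiej", "polski", "sg", "f", 0),
    Token("Akademii", "akademia", "sg", "f", 0),
    Token("Nauk", "nauka", "pl", "f", 1),
]
print(lemmatise_mwu(pan, head=1))       # Polska Akademia Nauk
```

The second call shows the role of agreement. "Polskiej Akademii" becomes "Polska Akademia" because the adjective takes the feminine nominative of its head, while the plural complement "Nauk" is left untouched.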
The work reported here was carried out within the Applied Technology for Language-Aided CMS project co-funded by the European Commission under the Information and Communications Technologies (ICT) Policy Support Programme (Grant Agreement No 250467).
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ogrodniczuk, M., Przepiórkowski, A. (2012). Polish Language Processing Chains for Multilingual Information Systems. In: Bouma, G., Ittoo, A., Métais, E., Wortmann, H. (eds) Natural Language Processing and Information Systems. NLDB 2012. Lecture Notes in Computer Science, vol 7337. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31178-9_14
DOI: https://doi.org/10.1007/978-3-642-31178-9_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31177-2
Online ISBN: 978-3-642-31178-9
eBook Packages: Computer Science, Computer Science (R0)