Terminology Extraction from Domain Texts in Polish

  • Małgorzata Marciniak
  • Agnieszka Mykowiecka
Part of the Studies in Computational Intelligence book series (SCI, volume 467)


The paper presents a two-step method for extracting terminology from Polish texts. The first step identifies term candidates and relies on linguistic knowledge: a shallow grammar describing the extracted phrases is given. The second step is statistical: candidates for domain terms are ranked and filtered using the C-value method and phrases extracted from general Polish texts. The approach can also find terms that occur as subphrases of longer phrases. We applied the method to economics texts and describe the results of the experiment. The paper closes with an evaluation and a discussion of the results.
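The ranking step can be illustrated with a minimal sketch of the C-value score in its commonly cited form (Frantzi et al.), which weights a candidate's frequency by the logarithm of its length and discounts occurrences inside longer candidates. This is a simplified illustration, not the authors' implementation: the function names `contains` and `c_value`, the toy frequency counts, and the choice to leave one-word candidates with a score of 0 are all assumptions made here, and the filtering against general Polish phrases is not shown.

```python
import math
from collections import Counter


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous token run inside `longer`."""
    n, m = len(longer), len(shorter)
    return m < n and any(longer[i:i + m] == shorter for i in range(n - m + 1))


def c_value(freq):
    """Score each candidate phrase (a tuple of tokens) from its frequency."""
    scores = {}
    for a, f_a in freq.items():
        # Frequencies of longer candidates that contain `a` as a subphrase.
        nested_in = [f_b for b, f_b in freq.items() if contains(b, a)]
        if nested_in:
            # Discount by the mean frequency of the containing candidates.
            f_a = f_a - sum(nested_in) / len(nested_in)
        # log2(1) == 0, so one-word candidates score 0 in this plain variant.
        scores[a] = math.log2(len(a)) * f_a
    return scores


# Invented toy counts, for illustration only (not data from the paper).
freq = Counter({
    ("stopa", "procentowa"): 10,
    ("realna", "stopa", "procentowa"): 4,
    ("nominalna", "stopa", "procentowa"): 2,
})
scores = c_value(freq)
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy example the two-word candidate keeps the top rank even after its frequency is discounted for the two longer phrases that contain it, which is the behaviour that lets nested terms surface in the ranking.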


Keywords: terminology extraction, shallow text processing, domain corpora





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
