Morphosyntactic Constraints in the Acquisition of Linguistic Knowledge for Polish

  • Maciej Piasecki
  • Adam Radziszewski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5070)

Abstract

Many approaches to the construction of language tools and acquisition of linguistic knowledge from corpora assume the application of some robust shallow parser. Construction of such a parser is difficult in the case of inflective languages with relaxed word order like Polish. The goal of the work presented here is to analyse the extent of knowledge that can be expressed in the form of morphosyntactic constraints referring to morphological properties of word forms, and its applications in the automatic extraction of syntactic and semantic knowledge. Basic properties of an extended version of the language of morphosyntactic constraints called JOSKIPI are briefly presented. The application of morphosyntactic constraints as background knowledge for extraction of disambiguation rules for Polish is discussed. A new approach to extraction of lexical semantic relations is presented: it relies on the constraints in identifying lexico-morphosyntactic dependencies among word forms in the text. Finally, a combination of the constraints and statistical analysis in the acquisition of multiword expressions is outlined.
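
To make the notion of a morphosyntactic constraint concrete, the following is a minimal Python sketch of one such constraint: agreement in case, number and gender between an adjective and a nearby noun. It is an illustration only and is not JOSKIPI syntax. The `Token` class, the `agrees` and `adj_noun_pairs` functions, the `window` parameter, and the simplified tag values are all assumptions introduced here. The sketch also assumes each word form carries a single, already disambiguated tag, which real Polish corpora do not guarantee.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    """A word form with one (hypothetical, simplified) morphosyntactic tag."""
    form: str
    pos: str                      # e.g. "adj", "subst", "fin"
    case: Optional[str] = None    # e.g. "nom", "acc"
    number: Optional[str] = None  # e.g. "sg", "pl"
    gender: Optional[str] = None  # simplified: "m", "f", "n"


def agrees(a: Token, b: Token) -> bool:
    """Constraint: both tokens share case, number and gender."""
    return (a.case, a.number, a.gender) == (b.case, b.number, b.gender)


def adj_noun_pairs(tokens: List[Token], window: int = 2) -> List[Tuple[str, str]]:
    """Collect (adjective, noun) pairs that satisfy the agreement constraint.

    The search looks `window` positions to either side of each adjective,
    so it does not rely on a fixed word order.
    """
    pairs = []
    for i, tok in enumerate(tokens):
        if tok.pos != "adj":
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            other = tokens[j]
            if j != i and other.pos == "subst" and agrees(tok, other):
                pairs.append((tok.form, other.form))
    return pairs


# "duży pies goni małą kotkę" (a big dog chases a small cat)
sentence = [
    Token("duży", "adj", "nom", "sg", "m"),
    Token("pies", "subst", "nom", "sg", "m"),
    Token("goni", "fin"),
    Token("małą", "adj", "acc", "sg", "f"),
    Token("kotkę", "subst", "acc", "sg", "f"),
]
print(adj_noun_pairs(sentence))
```

Pairs found this way could serve as co-occurrence features for semantic relatedness or as candidates for multiword expressions, without running a full parser; how the paper itself uses the constraints is described only at the level of the abstract above.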

Keywords

morphosyntactic constraints; morphosyntactic tagging; measures of semantic relatedness; decision trees; extraction of multiword expressions; annotated corpus; Polish

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Maciej Piasecki
    • 1
  • Adam Radziszewski
    • 1
  1. Institute of Applied Informatics, Wrocław University of Technology, Poland