Semantic Similarity Measure of Polish Nouns Based on Linguistic Features

  • Maciej Piasecki
  • Bartosz Broda
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4439)

Abstract

A word-to-word similarity function extracted automatically from a text corpus can be a very helpful tool in the automatic extraction of lexical semantic relations. Many approaches exist for English, but only a few for inflective languages with almost free word order. This paper proposes a method for constructing a similarity function for Polish nouns. The method uses only simple language processing tools (e.g. it does not need a parser). Its core is the construction of a matrix of noun-adjective co-occurrences, built by applying morpho-syntactic constraints that test agreement between an adjective and a noun. Several methods of transforming the matrix and calculating the similarity function are presented. The achieved accuracy of 81.15% in the WordNet-based Synonymy Test (for 4 611 Polish nouns, using the current version of Polish WordNet) seems to be comparable with the best results reported for English (e.g. 75.8% [5]).
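
The core idea of the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy noun-adjective pairs, the function names, and the choice of plain cosine similarity on raw counts are all assumptions made for illustration, and the agreement test between adjective and noun is assumed to have been applied before the pairs reach this code.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy input: (noun, adjective) pairs that are assumed to have
# already passed a morpho-syntactic agreement test (case, number, gender).
pairs = [
    ("dom", "duży"), ("dom", "stary"), ("dom", "nowy"),
    ("budynek", "duży"), ("budynek", "stary"), ("budynek", "wysoki"),
    ("pies", "stary"), ("pies", "wierny"),
]


def build_matrix(pairs):
    """Count noun-adjective co-occurrences; one sparse row per noun."""
    rows = defaultdict(Counter)
    for noun, adjective in pairs:
        rows[noun][adjective] += 1
    return rows


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (an assumed measure)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(rows, noun, candidates):
    """Pick the candidate whose row is closest to the row of `noun`."""
    return max(candidates, key=lambda c: cosine(rows[noun], rows[c]))


rows = build_matrix(pairs)
print(most_similar(rows, "dom", ["budynek", "pies"]))
```

Choosing the closest noun among a few candidates mirrors the general shape of a synonymy test, in which the similarity function must rank the true synonym above the distractors. The matrix transformations mentioned in the abstract would be applied to `rows` before similarity is computed; they are left out here.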

Keywords

semantic similarity function, Polish, automatic extraction, nouns, LSA

References

  1. Berry, M.: Large Scale Singular Value Computations. International Journal of Supercomputer Applications 6(1), 13–49 (1992)
  2. Dagan, I., Pereira, F., Lee, L.: Similarity-based Estimation of Word Co-occurrence Probabilities. In: ACL, vol. 32, pp. 272–278 (1997)
  3. Ehlert, B.: Making Accurate Lexical Semantic Similarity Judgments Using Word-context Co-occurrence Statistics. Master’s thesis, University of California, San Diego (2003)
  4. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
  5. Freitag, D., et al.: New Experiments in Distributional Representations of Synonymy. In: Proceedings of the 9th Conference on Computational Natural Language Learning, pp. 25–32. ACL (2005)
  6. Gärdenfors, P.: Conceptual Spaces — The Geometry of Thought. MIT Press, Cambridge (2000)
  7. Girju, R., Badulescu, A., Moldovan, D.: Automatic Discovery of Part-Whole Relations. Computational Linguistics 32(1), 83–135 (2006)
  8. Grefenstette, G.: Evaluation Techniques for Automatic Semantic Extraction: Comparing Syntactic and Window Based Approaches. In: Proceedings of The Workshop on Acquisition of Lexical Knowledge from Text, Columbus, SIGLEX/ACL (1993)
  9. Harris, Z.: Mathematical Structures of Language. Interscience Publishers, New York (1968)
  10. Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds.): Intelligent Information Processing and Web Mining — Proceedings of the International IIS: IIPWM’06 Conference, Ustroń, Poland. Advances in Soft Computing, vol. 35. Springer, Heidelberg (2006)
  11. Landauer, T., Dumais, S.: A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition. Psychological Review 104(2), 211–240 (1997)
  12. Lin, D., Pantel, P.: Induction of Semantic Classes from Natural Language Text. ACM Press, New York (2001)
  13. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
  14. Pantel, P., Pennacchiotti, M.: Espresso: Leveraging Generic Patterns for Automatically Harvesting Semantic Relations. In: Proceedings of 21st International Conference on Computational Linguistics (COLING-06), Sydney, pp. 113–120. ACL (2006)
  15. Piasecki, M.: LSA Based Extraction of Semantic Similarity for Polish. In: Proceedings of Multimedia and Network Information Systems, Wrocław University of Technology (2006)
  16. Piasecki, M., Godlewski, G.: Effective Architecture of the Polish Tagger. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2006. LNCS (LNAI), vol. 4188. Springer, Heidelberg (2006)
  17. Polish WordNet — Homepage of The Project. http://www.plwordnet.pwr.wroc.pl/main/, State: December 2006
  18. Przepiórkowski, A.: The IPI PAN Corpus Preliminary Version. Institute of Computer Science PAS (2004)
  19. Ruge, G.: Experiments on Linguistically-based Term Associations. Information Processing and Management 28(3), 317–332 (1992)
  20. Ruge, G.: Automatic Detection of Thesaurus Relations for Information Retrieval Applications. In: Freksa, C., Jantzen, M., Valk, R. (eds.) Foundations of Computer Science. LNCS, vol. 1337, pp. 499–506. Springer, Heidelberg (1997)
  21. Schütze, H.: Automatic Word Sense Discrimination. Computational Linguistics 24(1), 97–123 (1998)
  22. Turney, P.T.: Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In: Flach, P.A., De Raedt, L. (eds.) ECML 2001. LNCS (LNAI), vol. 2167, pp. 491–502. Springer, Heidelberg (2001)
  23. Turney, P.T., et al.: Combining Independent Modules to Solve Multiple-choice Synonym and Analogy Problems. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing (2003)
  24. Widdows, D.: Geometry and Meaning. CSLI Publications, Stanford (2004)
  25. Woliński, M.: Morfeusz — a Practical Tool for the Morphological Analysis of Polish. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds.) Intelligent Information Processing and Web Mining — Proceedings of the International IIS: IIPWM’06 Conference, Ustroń, Poland. Advances in Soft Computing, vol. 35. Springer, Heidelberg (2006)

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Maciej Piasecki (1)
  • Bartosz Broda (1)
  1. Institute of Applied Informatics, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, Wrocław, Poland
