Abstract
A word-to-word similarity function automatically extracted from a corpus of texts can be a very helpful tool for the automatic extraction of lexical semantic relations. Many approaches exist for English, but only a few for inflective languages with almost free word order. This paper proposes a method for constructing a similarity function for Polish nouns. The method uses only simple language-processing tools (e.g. it does not require a parser). Its core is the construction of a matrix of co-occurrences of nouns and adjectives, built by applying morpho-syntactic constraints that test agreement between an adjective and a noun. Several methods for transforming the matrix and calculating the similarity function are presented. The achieved accuracy of 81.15% in the WordNet-based Synonymy Test (for 4,611 Polish nouns, using the current version of Polish WordNet) seems comparable with the best results reported for English (e.g. 75.8% [5]).
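The abstract describes the pipeline only at a high level. The Python sketch below illustrates the general idea under several assumptions that do not come from the paper: tokens are hypothetical `(lemma, part_of_speech, case, number, gender)` tuples rather than real tagger output; an adjective is counted for a noun when it lies within a hypothetical window of ±3 tokens and agrees with the noun in case, number and gender; positive pointwise mutual information (PPMI) stands in for whichever matrix transformation the paper actually uses; and cosine is used as the similarity function. The `answer_wbst` helper shows how a WordNet-based Synonymy Test question (pick the synonym of a word from several candidates) can be answered by choosing the most similar candidate.

```python
import math
from collections import defaultdict

# Hypothetical token format (an assumption, not the paper's data structure):
# (lemma, part_of_speech, case, number, gender)
Token = tuple[str, str, str, str, str]


def agrees(adj: Token, noun: Token) -> bool:
    """Morpho-syntactic agreement test: case, number and gender must all match."""
    return adj[2:] == noun[2:]


def cooccurrence(sentences: list[list[Token]], window: int = 3) -> dict:
    """Count agreeing adjective-noun pairs within +/- `window` tokens.

    Both directions are searched because Polish word order is relatively free.
    """
    counts: dict = defaultdict(lambda: defaultdict(float))
    for sent in sentences:
        for i, noun in enumerate(sent):
            if noun[1] != "noun":
                continue
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                adj = sent[j]
                if adj[1] == "adj" and agrees(adj, noun):
                    counts[noun[0]][adj[0]] += 1.0
    return counts


def ppmi(counts: dict) -> dict[str, dict[str, float]]:
    """Positive pointwise mutual information: one possible matrix transformation."""
    total = sum(sum(row.values()) for row in counts.values())
    col_sums: dict = defaultdict(float)
    for row in counts.values():
        for adj, c in row.items():
            col_sums[adj] += c
    weighted = {}
    for noun, row in counts.items():
        row_sum = sum(row.values())
        weighted[noun] = {
            adj: max(0.0, math.log(c * total / (row_sum * col_sums[adj])))
            for adj, c in row.items()
        }
    return weighted


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def answer_wbst(question: str, candidates: list[str], vectors: dict) -> str:
    """WBST-style question: return the candidate most similar to `question`."""
    q = vectors.get(question, {})
    return max(candidates, key=lambda c: cosine(q, vectors.get(c, {})))


# Toy usage on a hand-made "tagged corpus" (illustrative data only).
corpus = [
    [("duży", "adj", "nom", "sg", "m"), ("pies", "noun", "nom", "sg", "m")],
    [("czarny", "adj", "nom", "sg", "m"), ("kot", "noun", "nom", "sg", "m")],
    [("kot", "noun", "nom", "sg", "m"), ("duży", "adj", "nom", "sg", "m")],
    [("czarny", "adj", "nom", "sg", "m"), ("pies", "noun", "nom", "sg", "m")],
    [("ciekawy", "adj", "nom", "sg", "f"), ("książka", "noun", "nom", "sg", "f")],
    [("gruby", "adj", "nom", "sg", "f"), ("książka", "noun", "nom", "sg", "f")],
    # Gender mismatch (feminine adjective, masculine noun): filtered out.
    [("duży", "adj", "nom", "sg", "f"), ("pies", "noun", "nom", "sg", "m")],
]
vectors = ppmi(cooccurrence(corpus))
print(answer_wbst("pies", ["książka", "kot"], vectors))  # prints "kot"
```

The toy corpus only demonstrates the mechanics. Searching on both sides of the noun reflects the relatively free word order of Polish, and the agreement test is what keeps the non-agreeing pair in the last sentence out of the matrix without any parsing.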
References
[1] Berry, M.: Large Scale Singular Value Computations. International Journal of Supercomputer Applications 6(1), 13–49 (1992)
[2] Dagan, I., Pereira, F., Lee, L.: Similarity-based Estimation of Word Co-occurrence Probabilities. In: ACL, vol. 32, pp. 272–278 (1997)
[3] Ehlert, B.: Making Accurate Lexical Semantic Similarity Judgments Using Word-context Co-occurrence Statistics. Master’s thesis, University of California, San Diego (2003)
[4] Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
[5] Freitag, D., et al.: New Experiments in Distributional Representations of Synonymy. In: Proceedings of the 9th Conference on Computational Natural Language Learning, pp. 25–32. ACL (2005)
[6] Gärdenfors, P.: Conceptual Spaces: The Geometry of Thought. MIT Press, Cambridge (2000)
[7] Girju, R., Badulescu, A., Moldovan, D.: Automatic Discovery of Part-Whole Relations. Computational Linguistics 32(1), 83–135 (2006)
[8] Grefenstette, G.: Evaluation Techniques for Automatic Semantic Extraction: Comparing Syntactic and Window Based Approaches. In: Proceedings of the Workshop on Acquisition of Lexical Knowledge from Text, Columbus, SIGLEX/ACL (1993)
[9] Harris, Z.: Mathematical Structures of Language. Interscience Publishers, New York (1968)
[10] Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds.): Intelligent Information Processing and Web Mining: Proceedings of the International IIS: IIPWM’06 Conference, Ustroń, Poland. Advances in Soft Computing, vol. 35. Springer, Heidelberg (2006)
[11] Landauer, T., Dumais, S.: A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition. Psychological Review 104(2), 211–240 (1997)
[12] Lin, D., Pantel, P.: Induction of Semantic Classes from Natural Language Text. ACM Press, New York (2001)
[13] Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
[14] Pantel, P., Pennacchiotti, M.: Espresso: Leveraging Generic Patterns for Automatically Harvesting Semantic Relations. In: Proceedings of the 21st International Conference on Computational Linguistics (COLING-06), Sydney, pp. 113–120. ACL (2006)
[15] Piasecki, M.: LSA Based Extraction of Semantic Similarity for Polish. In: Proceedings of Multimedia and Network Information Systems, Wrocław University of Technology (2006)
[16] Piasecki, M., Godlewski, G.: Effective Architecture of the Polish Tagger. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2006. LNCS (LNAI), vol. 4188. Springer, Heidelberg (2006)
[17] Polish WordNet: Homepage of the Project. http://www.plwordnet.pwr.wroc.pl/main/ (state as of December 2006)
[18] Przepiórkowski, A.: The IPI PAN Corpus: Preliminary Version. Institute of Computer Science PAS (2004)
[19] Ruge, G.: Experiments on Linguistically-based Term Associations. Information Processing and Management 28(3), 317–332 (1992)
[20] Ruge, G.: Automatic Detection of Thesaurus Relations for Information Retrieval Applications. In: Freksa, C., Jantzen, M., Valk, R. (eds.) Foundations of Computer Science. LNCS, vol. 1337, pp. 499–506. Springer, Heidelberg (1997)
[21] Schütze, H.: Automatic Word Sense Discrimination. Computational Linguistics 24(1), 97–123 (1998)
[22] Turney, P.D.: Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In: Flach, P.A., De Raedt, L. (eds.) ECML 2001. LNCS (LNAI), vol. 2167, pp. 491–502. Springer, Heidelberg (2001)
[23] Turney, P.D., et al.: Combining Independent Modules to Solve Multiple-choice Synonym and Analogy Problems. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing (2003)
[24] Widdows, D.: Geometry and Meaning. CSLI Publications, Stanford (2004)
[25] Woliński, M.: Morfeusz: a Practical Tool for the Morphological Analysis of Polish. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds.) Intelligent Information Processing and Web Mining: Proceedings of the International IIS: IIPWM’06 Conference, Ustroń, Poland. Advances in Soft Computing, vol. 35. Springer, Heidelberg (2006)
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Piasecki, M., Broda, B. (2007). Semantic Similarity Measure of Polish Nouns Based on Linguistic Features. In: Abramowicz, W. (ed.) Business Information Systems. BIS 2007. Lecture Notes in Computer Science, vol 4439. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72035-5_29
DOI: https://doi.org/10.1007/978-3-540-72035-5_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72034-8
Online ISBN: 978-3-540-72035-5
eBook Packages: Computer Science (R0)