Semantic Similarity Measure of Polish Nouns Based on Linguistic Features

Piasecki, Maciej; Broda, Bartosz

doi:10.1007/978-3-540-72035-5_29

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4439)

Included in the following conference series:

International Conference on Business Information Systems

Abstract

A word-to-word similarity function automatically extracted from a text corpus can be a very helpful tool in the automatic extraction of lexical semantic relations. Many approaches exist for English, but only a few for inflective languages with almost free word order. This paper proposes a method for constructing a similarity function for Polish nouns. The method uses only simple language processing tools (e.g. it does not need a parser). Its core is a matrix of noun-adjective co-occurrences, built by applying morpho-syntactic constraints that test agreement between an adjective and a noun. Several methods of transforming the matrix and calculating the similarity function are presented. The achieved accuracy of 81.15% in the WordNet-based Synonymy Test (for 4 611 Polish nouns, using the current version of Polish WordNet) seems to be comparable with the best results reported for English (e.g. 75.8% [5]).
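As a rough illustration of the idea in the abstract, the sketch below builds a small noun-by-adjective co-occurrence matrix and answers one synonymy-test-style question with it. It is only a minimal sketch under stated assumptions: the toy pairs are invented, the agreement check is assumed to have happened already, no matrix transformation is applied, and cosine similarity stands in for whichever similarity functions the paper actually evaluates. The names `build_matrix`, `cosine` and `answer_question` are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy data: (noun, adjective) pairs assumed to have already
# passed an agreement check (case, number, gender). The paper's actual
# morpho-syntactic constraints are not reproduced here.
pairs = [
    ("dom", "duży"), ("dom", "stary"), ("dom", "nowy"),
    ("budynek", "duży"), ("budynek", "stary"), ("budynek", "wysoki"),
    ("pies", "głodny"), ("pies", "stary"),
    ("kot", "głodny"), ("kot", "czarny"),
]


def build_matrix(pairs):
    """Count how often each noun co-occurs with each adjective."""
    rows = defaultdict(Counter)
    for noun, adjective in pairs:
        rows[noun][adjective] += 1
    return rows


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (an assumed measure)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def answer_question(matrix, question, candidates):
    """Pick the candidate whose row is most similar to the question noun."""
    return max(candidates, key=lambda c: cosine(matrix[question], matrix[c]))


matrix = build_matrix(pairs)
best = answer_question(matrix, "dom", ["budynek", "pies", "kot"])
print(best)
```

In a synonymy test of this kind, accuracy would be the share of questions for which the selected candidate is the true synonym. Raw counts are used here only to keep the example short.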

References

1. Berry, M.: Large Scale Singular Value Computations. International Journal of Supercomputer Applications 6(1), 13–49 (1992)
2. Dagan, I., Pereira, F., Lee, L.: Similarity-based Estimation of Word Co-occurrence Probabilities. In: ACL, vol. 32, pp. 272–278 (1997)
3. Ehlert, B.: Making Accurate Lexical Semantic Similarity Judgments Using Word-context Co-occurrence Statistics. Master’s thesis, University of California, San Diego (2003)
4. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
5. Freitag, D., et al.: New Experiments in Distributional Representations of Synonymy. In: Proceedings of the 9th Conference on Computational Natural Language Learning, pp. 25–32. ACL (2005)
6. Gärdenfors, P.: Conceptual Spaces: The Geometry of Thought. MIT Press, Cambridge (2000)
7. Girju, R., Badulescu, A., Moldovan, D.: Automatic Discovery of Part-Whole Relations. Computational Linguistics 32(1), 83–135 (2006)
8. Grefenstette, G.: Evaluation Techniques for Automatic Semantic Extraction: Comparing Syntactic and Window Based Approaches. In: Proceedings of The Workshop on Acquisition of Lexical Knowledge from Text, Columbus, SIGLEX/ACL (1993)
9. Harris, Z.: Mathematical Structures of Language. Interscience Publishers, New York (1968)
10. Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds.): Intelligent Information Processing and Web Mining. Proceedings of the International IIS: IIPWM’06 Conference, Ustroń, Poland. Advances in Soft Computing, vol. 35. Springer, Heidelberg (2006)
11. Landauer, T., Dumais, S.: A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition. Psychological Review 104(2), 211–240 (1997)
12. Lin, D., Pantel, P.: Induction of Semantic Classes from Natural Language Text. ACM Press, New York (2001)
13. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
14. Pantel, P., Pennacchiotti, M.: Espresso: Leveraging Generic Patterns for Automatically Harvesting Semantic Relations. In: Proceedings of 21st International Conference on Computational Linguistics (COLING-06), Sydney, pp. 113–120. ACL (2006)
15. Piasecki, M.: LSA Based Extraction of Semantic Similarity for Polish. In: Proceedings of Multimedia and Network Information Systems, Wrocław University of Technology (2006)
16. Piasecki, M., Godlewski, G.: Effective Architecture of the Polish Tagger. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2006. LNCS (LNAI), vol. 4188. Springer, Heidelberg (2006)
17. Polish WordNet: Homepage of the Project. http://www.plwordnet.pwr.wroc.pl/main/, state as of December 2006
18. Przepiórkowski, A.: The IPI PAN Corpus: Preliminary Version. Institute of Computer Science PAS (2004)
19. Ruge, G.: Experiments on Linguistically-based Term Associations. Information Processing and Management 28(3), 317–332 (1992)
20. Ruge, G.: Automatic Detection of Thesaurus Relations for Information Retrieval Applications. In: Freksa, C., Jantzen, M., Valk, R. (eds.) Foundations of Computer Science. LNCS, vol. 1337, pp. 499–506. Springer, Heidelberg (1997)
21. Schütze, H.: Automatic Word Sense Discrimination. Computational Linguistics 24(1), 97–123 (1998)
22. Turney, P.D.: Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In: Flach, P.A., De Raedt, L. (eds.) ECML 2001. LNCS (LNAI), vol. 2167, pp. 491–502. Springer, Heidelberg (2001)
23. Turney, P.D., et al.: Combining Independent Modules to Solve Multiple-choice Synonym and Analogy Problems. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing (2003)
24. Widdows, D.: Geometry and Meaning. CSLI Publications, Stanford (2004)
25. Woliński, M.: Morfeusz: A Practical Tool for the Morphological Analysis of Polish. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds.) Intelligent Information Processing and Web Mining. Proceedings of the International IIS: IIPWM’06 Conference, Ustroń, Poland. Advances in Soft Computing, vol. 35. Springer, Heidelberg (2006)

Author information

Authors and Affiliations

Institute of Applied Informatics, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, Wrocław, Poland
Maciej Piasecki & Bartosz Broda


Editor information

Witold Abramowicz


Cite this paper

Piasecki, M., Broda, B. (2007). Semantic Similarity Measure of Polish Nouns Based on Linguistic Features. In: Abramowicz, W. (eds) Business Information Systems. BIS 2007. Lecture Notes in Computer Science, vol 4439. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72035-5_29

DOI: https://doi.org/10.1007/978-3-540-72035-5_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72034-8
Online ISBN: 978-3-540-72035-5
eBook Packages: Computer Science, Computer Science (R0)
