Semantic Similarity Measure of Polish Nouns Based on Linguistic Features

  • Conference paper
Business Information Systems (BIS 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4439)

Abstract

A word-to-word similarity function automatically extracted from a text corpus can be a very helpful tool in the automatic extraction of lexical semantic relations. There are many approaches for English, but only a few for inflectional languages with almost free word order. This paper proposes a method for constructing a similarity function for Polish nouns. The method uses only simple language-processing tools (e.g. it does not need a parser). Its core is a matrix of noun-adjective co-occurrences, built by applying morpho-syntactic constraints that test agreement between an adjective and a noun. Several methods of transforming the matrix and calculating the similarity function are presented. The achieved accuracy of 81.15% in the WordNet-based Synonymy Test (for 4 611 Polish nouns, using the current version of Polish WordNet) seems comparable with the best results reported for English (e.g. 75.8% [5]).
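The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not say which matrix transformations or similarity functions the paper uses, so the cosine measure over raw counts below is an assumption chosen for simplicity. The toy `PAIRS` data and the names `build_cooccurrence`, `cosine`, and `pick_synonym` are hypothetical. The (noun, adjective) pairs are assumed to have already passed an agreement check, which is not implemented here.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy data: (noun, adjective) pairs assumed to have already
# passed a morpho-syntactic agreement check (not implemented here).
PAIRS = [
    ("dom", "stary"), ("dom", "murowany"), ("dom", "wysoki"),
    ("budynek", "wysoki"), ("budynek", "murowany"), ("budynek", "stary"),
    ("pies", "stary"), ("pies", "wierny"), ("pies", "głodny"),
]


def build_cooccurrence(pairs):
    """Map each noun to a Counter of the adjectives observed with it."""
    matrix = defaultdict(Counter)
    for noun, adjective in pairs:
        matrix[noun][adjective] += 1
    return matrix


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (an assumed measure)."""
    dot = sum(count * v[adj] for adj, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_synonym(matrix, question, candidates):
    """Answer one synonymy-test item: choose the most similar candidate."""
    return max(candidates, key=lambda c: cosine(matrix[question], matrix[c]))


matrix = build_cooccurrence(PAIRS)
answer = pick_synonym(matrix, "dom", ["pies", "budynek"])
```

In this toy setting "dom" (house) shares all of its adjectives with "budynek" (building) and only one with "pies" (dog), so the test item is resolved in favour of "budynek". Accuracy over a set of such items would be the fraction answered correctly.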

References

  1. Berry, M.: Large Scale Singular Value Computations. International Journal of Supercomputer Applications 6(1), 13–49 (1992)

  2. Dagan, I., Pereira, F., Lee, L.: Similarity-based Estimation of Word Co-occurrence Probabilities. In: ACL, vol. 32, pp. 272–278 (1997)

  3. Ehlert, B.: Making Accurate Lexical Semantic Similarity Judgments Using Word-context Co-occurrence Statistics. Master’s thesis, University of California, San Diego (2003)

  4. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)

  5. Freitag, D., et al.: New Experiments in Distributional Representations of Synonymy. In: Proceedings of the 9th Conference on Computational Natural Language Learning, pp. 25–32. ACL (2005)

  6. Gärdenfors, P.: Conceptual Spaces: The Geometry of Thought. MIT Press, Cambridge (2000)

  7. Girju, R., Badulescu, A., Moldovan, D.: Automatic Discovery of Part-Whole Relations. Computational Linguistics 32(1), 83–135 (2006)

  8. Grefenstette, G.: Evaluation Techniques for Automatic Semantic Extraction: Comparing Syntactic and Window Based Approaches. In: Proceedings of The Workshop on Acquisition of Lexical Knowledge from Text, Columbus, SIGLEX/ACL (1993)

  9. Harris, Z.: Mathematical Structures of Language. Interscience Publishers, New York (1968)

  10. Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds.): Intelligent Information Processing and Web Mining: Proceedings of the International IIS: IIPWM’06 Conference, Ustroń, Poland. Advances in Soft Computing, vol. 35. Springer, Heidelberg (2006)

  11. Landauer, T., Dumais, S.: A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition. Psychological Review 104(2), 211–240 (1997)

  12. Lin, D., Pantel, P.: Induction of Semantic Classes from Natural Language Text. ACM Press, New York (2001)

  13. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)

  14. Pantel, P., Pennacchiotti, M.: Espresso: Leveraging Generic Patterns for Automatically Harvesting Semantic Relations. In: Proceedings of 21st International Conference on Computational Linguistics (COLING-06), Sydney, pp. 113–120. ACL (2006)

  15. Piasecki, M.: LSA Based Extraction of Semantic Similarity for Polish. In: Proceedings of Multimedia and Network Information Systems, Wrocław University of Technology (2006)

  16. Piasecki, M., Godlewski, G.: Effective Architecture of the Polish Tagger. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2006. LNCS (LNAI), vol. 4188, Springer, Heidelberg (2006)

  17. Polish WordNet: Homepage of the Project. http://www.plwordnet.pwr.wroc.pl/main/ , state as of December 2006

  18. Przepiórkowski, A.: The IPI PAN Corpus: Preliminary Version. Institute of Computer Science PAS (2004)

  19. Ruge, G.: Experiments on Linguistically-based Term Associations. Information Processing and Management 28(3), 317–332 (1992)

  20. Ruge, G.: Automatic Detection of Thesaurus Relations for Information Retrieval Applications. In: Freksa, C., Jantzen, M., Valk, R. (eds.) Foundations of Computer Science. LNCS, vol. 1337, pp. 499–506. Springer, Heidelberg (1997)

  21. Schütze, H.: Automatic Word Sense Discrimination. Computational Linguistics 24(1), 97–123 (1998)

  22. Turney, P.D.: Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL. In: Flach, P.A., De Raedt, L. (eds.) ECML 2001. LNCS (LNAI), vol. 2167, pp. 491–502. Springer, Heidelberg (2001)

  23. Turney, P.D., et al.: Combining Independent Modules to Solve Multiple-choice Synonym and Analogy Problems. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing (2003)

  24. Widdows, D.: Geometry and Meaning. CSLI Publications, Stanford (2004)

  25. Woliński, M.: Morfeusz: A Practical Tool for the Morphological Analysis of Polish. In: Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds.) Intelligent Information Processing and Web Mining: Proceedings of the International IIS: IIPWM’06 Conference, Ustroń, Poland. Advances in Soft Computing, vol. 35, Springer, Heidelberg (2006)

Editor information

Witold Abramowicz

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Piasecki, M., Broda, B. (2007). Semantic Similarity Measure of Polish Nouns Based on Linguistic Features. In: Abramowicz, W. (eds) Business Information Systems. BIS 2007. Lecture Notes in Computer Science, vol 4439. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72035-5_29

  • DOI: https://doi.org/10.1007/978-3-540-72035-5_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72034-8

  • Online ISBN: 978-3-540-72035-5

  • eBook Packages: Computer Science, Computer Science (R0)
