Calculation of Textual Similarity Using Semantic Relatedness Functions

  • Ammar Riadh Kairaldeen
  • Gonenc Ercan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9042)


Semantic similarity between two sentences measures the extent to which they share the same or related meaning. Two methods in the literature for measuring sentence similarity are cosine similarity and overall similarity. In this work we investigate whether the performance of these methods can be improved by integrating different word-level semantic relatedness methods. Four word relatedness methods are compared on four data sets compiled from different domains, providing a testbed that covers a wide range of writing styles to challenge the selected methods. Results show that, within the sentence similarity methods, a corpus-based word semantic similarity function significantly outperforms a WordNet-based one. Moreover, we propose a new sentence similarity measure by modifying an existing method, known as overall similarity, which incorporates word order and lexical similarity. The results show that the proposed method significantly improves on the performance of the overall similarity method. All the selected methods are tested and compared with other state-of-the-art methods.
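As an illustration of how a word-level relatedness function might be plugged into a cosine-based sentence measure, the following is a minimal sketch and not the authors' implementation. It assumes whitespace tokenization, a semantic vector built by taking, for each word in the joint vocabulary, the maximum relatedness to any word of the sentence, and a hypothetical lookup table (`TOY_SCORES`) standing in for a real WordNet-based or corpus-based measure. All function names are illustrative.

```python
import math
from typing import Callable, Dict, FrozenSet, List

# Hypothetical toy scores, invented for illustration only. A real system
# would replace this table with a WordNet-based or corpus-based measure.
TOY_SCORES: Dict[FrozenSet[str], float] = {
    frozenset({"car", "automobile"}): 0.9,
    frozenset({"fast", "quick"}): 0.8,
}

Relatedness = Callable[[str, str], float]


def exact_match(w1: str, w2: str) -> float:
    """Baseline: words are related only if they are identical."""
    return 1.0 if w1 == w2 else 0.0


def toy_relatedness(w1: str, w2: str) -> float:
    """Identical words score 1.0; other pairs use the toy table."""
    if w1 == w2:
        return 1.0
    return TOY_SCORES.get(frozenset({w1, w2}), 0.0)


def semantic_vector(words: List[str], joint: List[str],
                    rel: Relatedness) -> List[float]:
    """One entry per joint-vocabulary word: its best match in `words`."""
    return [max(rel(j, w) for w in words) for j in joint]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_similarity(s1: str, s2: str,
                        rel: Relatedness = toy_relatedness) -> float:
    """Cosine similarity of the two sentences' semantic vectors."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    if not t1 or not t2:
        return 0.0
    joint = sorted(set(t1) | set(t2))
    return cosine(semantic_vector(t1, joint, rel),
                  semantic_vector(t2, joint, rel))


if __name__ == "__main__":
    a, b = "the car is fast", "the automobile is quick"
    print(sentence_similarity(a, b, exact_match))
    print(sentence_similarity(a, b, toy_relatedness))
```

With the exact-match baseline only the shared words contribute, whereas the relatedness table lets near-synonyms raise the score. This is the kind of effect the comparison of word relatedness methods is meant to measure.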


Singular Value Decomposition, Semantic Similarity, Semantic Relatedness, Cosine Similarity, Latent Semantic Analysis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. University of Baghdad, Baghdad, Iraq
  2. Department of Informatics, Hacettepe University, Ankara, Turkey
