
Calculation of Textual Similarity Using Semantic Relatedness Functions

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9042)


Abstract

Semantic similarity between two sentences measures how much the two sentences share the same or related meaning. Two methods in the literature for measuring sentence similarity are cosine similarity and overall similarity. In this work we investigate whether the performance of these methods can be improved by integrating different word-level semantic relatedness methods. Four word relatedness methods are compared on four data sets compiled from different domains, which together provide a testbed with a wide range of writing styles to challenge the selected methods. The results show that, within the sentence similarity methods, a corpus-based word semantic similarity function significantly outperforms a WordNet-based one. Moreover, we propose a new sentence similarity measure by modifying an existing method, called overall similarity, which incorporates word order and lexical similarity. The results show that the proposed method significantly improves on the performance of the original overall similarity method. All the selected methods are tested and compared with other state-of-the-art methods.
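The two sentence-level measures named in the abstract can be sketched as follows, assuming the general scheme of the overall similarity measure as formulated by Li et al. (2006): each sentence is mapped to a semantic vector and a word order vector over the joint word set of both sentences, the semantic part is the cosine of the semantic vectors, the order part is S_r = 1 - ||r1 - r2|| / ||r1 + r2||, and the two are combined as S = δ·S_s + (1 - δ)·S_r. This is a simplified illustration, not the authors' implementation: it omits the information-content weighting of the original formulation, does not implement the modification proposed in this paper (the abstract does not describe it), and uses a hypothetical toy relatedness table (`make_table_sim`) where a real corpus-based or WordNet-based function would go. The thresholds and δ = 0.85 are assumed defaults, not values taken from this paper. Sentences are assumed to be non-empty lists of tokens.

```python
import math
from typing import Callable, Dict, List, Tuple

# A word relatedness function maps a pair of words to a score in [0, 1].
WordSim = Callable[[str, str], float]


def make_table_sim(table: Dict[Tuple[str, str], float]) -> WordSim:
    """Hypothetical stand-in for a corpus-based or WordNet-based measure:
    1.0 for identical words, otherwise a symmetric table lookup (default 0.0)."""
    def sim(a: str, b: str) -> float:
        if a == b:
            return 1.0
        return table.get((a, b), table.get((b, a), 0.0))
    return sim


def joint_word_set(s1: List[str], s2: List[str]) -> List[str]:
    """Distinct words of both sentences, in order of first appearance."""
    return list(dict.fromkeys(s1 + s2))


def best_match(word: str, sentence: List[str], sim: WordSim) -> Tuple[int, float]:
    """0-based position and score of the sentence word most related to `word`."""
    scores = [sim(word, w) for w in sentence]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


def semantic_vector(joint: List[str], sentence: List[str], sim: WordSim, threshold: float) -> List[float]:
    """Per joint word: its best relatedness to the sentence, or 0 below the threshold."""
    vec = []
    for w in joint:
        _, score = best_match(w, sentence, sim)
        vec.append(score if score >= threshold else 0.0)
    return vec


def order_vector(joint: List[str], sentence: List[str], sim: WordSim, threshold: float) -> List[float]:
    """Per joint word: 1-based position of its best match, or 0 below the threshold."""
    vec = []
    for w in joint:
        pos, score = best_match(w, sentence, sim)
        vec.append(float(pos + 1) if score >= threshold else 0.0)
    return vec


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cosine_sentence_similarity(s1: List[str], s2: List[str], sim: WordSim, threshold: float = 0.2) -> float:
    """Cosine of the two semantic vectors built over the joint word set."""
    joint = joint_word_set(s1, s2)
    return cosine(semantic_vector(joint, s1, sim, threshold),
                  semantic_vector(joint, s2, sim, threshold))


def order_similarity(r1: List[float], r2: List[float]) -> float:
    """S_r = 1 - ||r1 - r2|| / ||r1 + r2||."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1.0 - diff / total if total else 0.0


def overall_similarity(s1: List[str], s2: List[str], sim: WordSim, delta: float = 0.85,
                       sem_threshold: float = 0.2, order_threshold: float = 0.4) -> float:
    """S = delta * S_s + (1 - delta) * S_r (assumed defaults for delta and thresholds)."""
    joint = joint_word_set(s1, s2)
    s_s = cosine_sentence_similarity(s1, s2, sim, sem_threshold)
    s_r = order_similarity(order_vector(joint, s1, sim, order_threshold),
                           order_vector(joint, s2, sim, order_threshold))
    return delta * s_s + (1.0 - delta) * s_r
```

Because the word relatedness function is passed in as a parameter, comparing word-level measures only requires swapping `sim`; the sentence-level code stays the same. In this sketch, "dog bites man" and "man bites dog" get a cosine similarity of 1.0 but an overall similarity below 1.0, which shows why the word order term is included.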



Author information

Correspondence to Ammar Riadh Kairaldeen.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kairaldeen, A.R., Ercan, G. (2015). Calculation of Textual Similarity Using Semantic Relatedness Functions. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9042. Springer, Cham. https://doi.org/10.1007/978-3-319-18117-2_38


  • DOI: https://doi.org/10.1007/978-3-319-18117-2_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18116-5

  • Online ISBN: 978-3-319-18117-2

  • eBook Packages: Computer Science, Computer Science (R0)
