An Approach to Semantic Text Similarity Computing

  • Imen Akermi
  • Rim Faiz
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 285)


Text similarity plays an important role in many Computational Linguistics applications, such as Text Classification and Information Extraction and Retrieval. Several tasks also require computing the similarity between two short segments of text. In this work, we propose a sentence similarity computing approach that takes into account both the semantic and the syntactic information contained in the sentences. The proposed method can be applied in a variety of applications, for example text knowledge representation and discovery. Experiments on a set of sentence pairs show that the similarity measure produced by our approach correlates considerably with human judgment.
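As a rough illustration of the general idea of combining a semantic and a syntactic component, and of checking a measure against human ratings, here is a minimal sketch. It is not the authors' method: the Jaccard word overlap is only a crude stand-in for a real semantic component, the word-order score is one simple way to capture syntactic information, and the names `lexical_overlap`, `word_order_similarity`, `sentence_similarity`, `pearson` and the weight `alpha` are all assumptions made for this example.

```python
import math

PUNCT = ".,;:!?\"'()"

def tokenize(sentence):
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    words = (w.strip(PUNCT).lower() for w in sentence.split())
    return [w for w in words if w]

def lexical_overlap(tokens_a, tokens_b):
    """Jaccard overlap of word sets (assumed stand-in for a semantic score)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

def word_order_similarity(tokens_a, tokens_b):
    """Compare first-occurrence positions of words (assumed syntactic score)."""
    vocab = sorted(set(tokens_a) | set(tokens_b))

    def order_vector(tokens):
        first = {}
        for position, word in enumerate(tokens, start=1):
            first.setdefault(word, position)
        # 0 marks a word that is absent from this sentence
        return [first.get(word, 0) for word in vocab]

    r1, r2 = order_vector(tokens_a), order_vector(tokens_b)
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(r1, r2)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(r1, r2)))
    return 1.0 if total == 0 else 1.0 - diff / total

def sentence_similarity(a, b, alpha=0.8):
    """Weighted mix of the two scores; alpha is an arbitrary illustrative weight."""
    ta, tb = tokenize(a), tokenize(b)
    return alpha * lexical_overlap(ta, tb) + (1 - alpha) * word_order_similarity(ta, tb)

def pearson(xs, ys):
    """Pearson correlation between system scores and human ratings."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

In such a setup, `sentence_similarity` would be computed for each sentence pair in a benchmark and `pearson` would compare the resulting scores with the human ratings for the same pairs. Two sentences with the same words in a different order score below 1.0 here only because of the word-order term, which is the kind of distinction a purely bag-of-words measure would miss.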


Keywords: Natural language processing, Semantic similarity, Computational linguistics



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. University of Tunis, ISG, LARODEC, 2000 Bardo, Tunisia
  2. University of Carthage, IHEC, LARODEC, 2016 Carthage, Tunisia
