The Evaluation of Sentence Similarity Measures

  • Palakorn Achananuparp
  • Xiaohua Hu
  • Xiajiong Shen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5182)


The ability to accurately judge the similarity between natural language sentences is critical to the performance of several applications such as text mining, question answering, and text summarization. Given two sentences, an effective similarity measure should be able to determine whether the sentences are semantically equivalent or not, taking into account the variability of natural language expression. That is, the correct similarity judgment should be made even if the sentences do not share a similar surface form. In this work, we evaluate fourteen existing text similarity measures which have been used to calculate similarity scores between sentences in many text applications. The evaluation is conducted on three different data sets: the TREC9 question variants, the Microsoft Research paraphrase corpus, and the third recognizing textual entailment data set.
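To make the surface-form problem concrete, the following is a minimal sketch of one generic surface-level measure, Jaccard word overlap. It is an illustration only: the function names and example sentences are hypothetical, and the measure is not presented here as one of the fourteen evaluated in the paper.

```python
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def jaccard_similarity(sentence_a, sentence_b):
    """Jaccard overlap: |shared words| / |all distinct words|."""
    tokens_a = tokenize(sentence_a)
    tokens_b = tokenize(sentence_b)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty sentences are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical sentences, chosen for illustration only.
paraphrase_score = jaccard_similarity(
    "Who wrote Hamlet?", "Who is the author of Hamlet?"
)
different_score = jaccard_similarity(
    "Who wrote Hamlet?", "Who wrote Macbeth?"
)
print(paraphrase_score, different_score)
```

In this sketch the paraphrase pair shares only two of seven distinct words, while the non-equivalent pair shares two of four, so a purely surface-based score ranks the wrong pair higher. This is the kind of case a measure must handle if it is to judge semantic equivalence.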


Keywords: Sentence similarity, Paraphrase recognition, Textual entailment recognition





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Palakorn Achananuparp (1)
  • Xiaohua Hu (1, 2)
  • Xiajiong Shen (2)

  1. College of Information Science and Technology, Drexel University, Philadelphia
  2. College of Computer and Information Engineering, Henan University, Henan, China
