A Comparative Study of Two Short Text Semantic Similarity Measures

  • James O’Shea
  • Zuhair Bandar
  • Keeley Crockett
  • David McLean
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4953)


This paper describes a comparative study of two measures of semantic similarity, STASIS and Latent Semantic Analysis (LSA). Both can be applied to short texts for use in Conversational Agents (CAs), which are computer programs that interact with humans through natural language dialogue. Business organizations have spent large sums of money in recent years developing CAs for online customer self-service, but achievements have been limited to simple FAQ systems. We believe this is due to the labour-intensive process of scripting, which could be reduced radically by the use of short-text semantic similarity measures. “Short texts” are typically 10-20 words long and need not be grammatically correct sentences; examples include spoken utterances and text messages. We also present a benchmark data set of 65 sentence pairs with human-derived similarity ratings. This data set is the first of its kind developed specifically to evaluate such measures, and we believe it will be valuable to future researchers.


Keywords: Natural Language, Semantic Similarity, Dialogue Management, User Modeling, Benchmark, Sentence





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • James O’Shea (1)
  • Zuhair Bandar (1)
  • Keeley Crockett (1)
  • David McLean (1)

  1. Department of Computing and Mathematics, Manchester Metropolitan University, Manchester, United Kingdom
