Similarity Measures for Short Segments of Text

  • Donald Metzler
  • Susan Dumais
  • Christopher Meek
Conference paper

DOI: 10.1007/978-3-540-71496-5_5

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4425)
Cite this paper as:
Metzler D., Dumais S., Meek C. (2007) Similarity Measures for Short Segments of Text. In: Amati G., Carpineto C., Romano G. (eds) Advances in Information Retrieval. ECIR 2007. Lecture Notes in Computer Science, vol 4425. Springer, Berlin, Heidelberg

Abstract

Measuring the similarity between documents and queries has been extensively studied in information retrieval. However, there are a growing number of tasks that require computing the similarity between two very short segments of text. These tasks include query reformulation, sponsored search, and image retrieval. Standard text similarity measures perform poorly on such tasks because of data sparseness and the lack of context. In this work, we study this problem from an information retrieval perspective, focusing on text representations and similarity measures. We examine a range of similarity measures, including purely lexical measures, stemming, and language modeling-based measures. We formally evaluate and analyze the methods on a query-query similarity task using 363,822 queries from a web search log. Our analysis provides insights into the strengths and weaknesses of each method, including important tradeoffs between effectiveness and efficiency.

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Donald Metzler (1)
  • Susan Dumais (2)
  • Christopher Meek (2)

  1. University of Massachusetts, Amherst, MA
  2. Microsoft Research, Redmond, WA
