Information Retrieval, Volume 13, Issue 2, pp 157–187

Utilizing passage-based language models for ad hoc document retrieval

DOI: 10.1007/s10791-009-9118-8

Cite this article as:
Bendersky, M. & Kurland, O. Inf Retrieval (2010) 13: 157. doi:10.1007/s10791-009-9118-8

Abstract

In the ad hoc retrieval setting, a document relevant to a query may contain only a few short passages with query-related information. To cope with this, researchers have proposed passage-based document ranking approaches. We show that several of these retrieval methods can be understood, and new ones can be derived, using the same probabilistic model. We use language-model estimates to instantiate specific retrieval algorithms, and in doing so present a novel passage language model that integrates information from the containing document to an extent controlled by the estimated document homogeneity. Several document-homogeneity measures that we present yield passage language models that are more effective than the standard passage model for basic document retrieval and for constructing and utilizing passage-based relevance models; these relevance models also outperform a document-based relevance model. Finally, we demonstrate the merits of using the document-homogeneity measures for integrating document-query and passage-query similarity information for document retrieval.
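As a rough illustration of the idea described in the abstract, the sketch below interpolates a passage's maximum-likelihood unigram model with that of its containing document, and ranks a document by its best-scoring passage. It is a minimal sketch under stated assumptions, not the estimate used in the article: the function names, the linear interpolation form, the `homogeneity` weight supplied by the caller, the `floor` constant, and the max-passage scoring rule are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def mle(tokens):
    """Maximum-likelihood unigram model: relative term frequencies."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def passage_lm(passage, document, homogeneity):
    """Mix passage and document models (assumed linear interpolation).

    `homogeneity` in [0, 1] is the weight given to the document model;
    how it is estimated is left open here.
    """
    p_psg = mle(passage)
    p_doc = mle(document)
    vocab = set(p_psg) | set(p_doc)
    return {
        w: (1 - homogeneity) * p_psg.get(w, 0.0) + homogeneity * p_doc.get(w, 0.0)
        for w in vocab
    }


def query_log_likelihood(query, lm, floor=1e-9):
    """Log-probability of the query terms; `floor` avoids log(0)."""
    return sum(math.log(lm.get(w, 0.0) + floor) for w in query)


def score_document(query, passages, homogeneity):
    """Score a document by its highest-scoring passage (one possible rule)."""
    document = [w for p in passages for w in p]
    return max(
        query_log_likelihood(query, passage_lm(p, document, homogeneity))
        for p in passages
    )
```

With `homogeneity` set to 0 each passage is modeled on its own, and with 1 every passage falls back to the document model, so intermediate values trade off the two sources of evidence.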

Keywords

Ad hoc document retrieval; Passage-based language models; Document homogeneity; Relevance models; Passage-based relevance models

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Computer Science, Center for Intelligent Information Retrieval, University of Massachusetts Amherst, Amherst, USA
  2. Faculty of Industrial Engineering and Management, Technion – Israel Institute of Technology, Haifa, Israel
