Information Retrieval, 14:593

A study of the integration of passage-, document-, and cluster-based information for re-ranking search results


DOI: 10.1007/s10791-011-9168-6

Cite this article as:
Krikon, E. & Kurland, O. Inf Retrieval (2011) 14: 593. doi:10.1007/s10791-011-9168-6


Cluster-based and passage-based document retrieval paradigms have been shown to be effective. The former utilize query-related corpus context manifested in clusters of similar documents. The latter address the fact that a document can be relevant even if only a very small part of it contains query-pertaining information. Hence, cluster-based approaches can be viewed as "expanding" the document representation, while passage-based approaches can be thought of as utilizing a "contracted" document representation. We present a study of the relative benefits of each of these two approaches, and of the potential merits of integrating them. To that end, we devise two methods that integrate whole-document-based, cluster-based and passage-based information. The methods are applied to the re-ranking task, that is, re-ordering documents in an initially retrieved list so as to improve precision at the very top ranks. Extensive empirical evaluation attests to the potential merits of integrating these information types. Specifically, the resultant performance substantially exceeds that of the initial ranking and is often better than that of a state-of-the-art pseudo-feedback-based query expansion approach.
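The abstract does not spell out the two integration methods, so what follows illustrates only the general re-ranking setting, not the authors' approach. It is a minimal sketch assuming each document in the initial list already carries three comparable (e.g., normalized) scores: a whole-document score, a cluster-based score, and a passage-based score. It combines them by a fixed linear interpolation. The `Candidate` class, the `rerank` function, the weight values, and the example scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    # Hypothetical per-document evidence; scores are assumed to be on a comparable scale.
    doc_id: str
    doc_score: float      # whole-document query similarity
    cluster_score: float  # e.g., similarity of clusters containing the document
    passage_score: float  # e.g., best similarity of any passage in the document


def rerank(candidates: List[Candidate],
           w_doc: float = 0.4,
           w_cluster: float = 0.3,
           w_passage: float = 0.3) -> List[Candidate]:
    """Re-order an initially retrieved list by a weighted sum of the three scores.

    The linear interpolation and default weights are illustrative assumptions.
    """
    def combined(c: Candidate) -> float:
        return (w_doc * c.doc_score
                + w_cluster * c.cluster_score
                + w_passage * c.passage_score)

    # sorted() is stable, so ties keep their order from the initial ranking.
    return sorted(candidates, key=combined, reverse=True)


# Usage: a toy initial list (made-up scores) in its original retrieval order.
initial = [
    Candidate("d1", doc_score=0.9, cluster_score=0.2, passage_score=0.3),
    Candidate("d2", doc_score=0.6, cluster_score=0.8, passage_score=0.9),
    Candidate("d3", doc_score=0.5, cluster_score=0.1, passage_score=0.2),
]
ranked = rerank(initial)
print([c.doc_id for c in ranked])
```

Here `d2`, whose document score alone would rank it second, moves to the top because its cluster and passage evidence is strong. Setting `w_cluster` and `w_passage` to zero recovers the initial document-based order. In a real system the weights would presumably be tuned on training queries rather than fixed by hand.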


Keywords: Ad hoc retrieval; Re-ranking; Clusters; Cluster-based language models; Passages; Passage-based language models

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. Faculty of Industrial Engineering and Management, Technion, Israel Institute of Technology, Haifa, Israel
