“A term is known by the company it keeps”: On Selecting a Good Expansion Set in Pseudo-Relevance Feedback

Udupa, Raghavendra; Bhole, Abhijit; Bhattacharyya, Pushpak

doi:10.1007/978-3-642-04417-5_10

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5766)

Included in the following conference series:

Conference on the Theory of Information Retrieval

Abstract

It is well known that pseudo-relevance feedback (PRF) generally improves the retrieval performance of Information Retrieval (IR) systems. However, a recent study by Cao et al. [3] has shown that a non-negligible fraction of the expansion terms used by PRF algorithms are harmful to retrieval. In other words, a PRF algorithm would be better off using only a subset of the feedback terms. The challenge, then, is to find a good expansion set within the set of all candidate expansion terms. A natural approach is to make a term independence assumption and use one or more term selection criteria, or a statistical classifier, to identify good expansion terms independently of each other. In this work, we challenge this approach and show empirically that a feedback term is, in general, neither good nor bad in itself; the behavior of a term depends very much on the other expansion terms. Our finding implies that, in general, a good expansion set cannot be found under a term independence assumption. As a principled solution to the problem, we propose spectral partitioning of the expansion terms using a specific term-term interaction matrix. We demonstrate on several test collections that the expansion terms can be partitioned into two sets and that the better of the two sets gives substantial improvements in retrieval performance over model-based feedback.
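
To make the idea of spectral partitioning concrete, the following is a minimal sketch and not the authors' method. The abstract does not define the term-term interaction matrix, so the sketch assumes a hypothetical symmetric, non-negative affinity matrix `W` over six made-up candidate terms, and it uses the standard Fiedler-vector bipartition of the unnormalized graph Laplacian. The function name `fiedler_partition`, the term list, and the weights are all illustrative assumptions. Deciding which of the two resulting sets is the better expansion set would require a retrieval evaluation, which is outside the sketch.

```python
import math

def fiedler_partition(W, iters=500):
    """Split item indices into two groups by the sign of the Fiedler vector.

    W: symmetric, non-negative affinity matrix (list of lists) describing a
    connected graph. This is an illustrative assumption, not the
    interaction matrix used in the paper.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    # Unnormalized graph Laplacian L = D - W.
    L = [[(deg[i] if i == j else 0.0) - W[i][j] for j in range(n)]
         for i in range(n)]
    # Gershgorin bound: every eigenvalue of L is at most 2 * max degree,
    # so shift * I - L has only non-negative eigenvalues.
    shift = 2.0 * max(deg)

    # Power iteration on (shift * I - L), restricted to vectors orthogonal
    # to the constant vector, converges to the Fiedler vector (the
    # eigenvector of L with the second-smallest eigenvalue).
    v = [float(i) for i in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant vector
        v = [shift * v[i] - sum(L[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]

    group_pos = [i for i in range(n) if v[i] >= 0]
    group_neg = [i for i in range(n) if v[i] < 0]
    return group_pos, group_neg

# Hypothetical candidate expansion terms and affinities: two tightly
# connected triples joined by one weak link (between "credit" and "river").
terms = ["bank", "loan", "credit", "river", "shore", "water"]
W = [
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
]

group_a, group_b = fiedler_partition(W)
print([terms[i] for i in group_a], [terms[i] for i in group_b])
```

The sketch uses pure-Python power iteration only to stay dependency-free; for realistic term sets a library eigensolver would be the more sensible choice. The overall sign of the Fiedler vector is arbitrary, so only the grouping is meaningful, not which group is labeled first.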

References

[1] Buckley, C., Salton, G., Allan, J.: Automatic retrieval with locality information using SMART. In: TREC, pp. 59–72 (1992)
[2] Carpineto, C., Romano, G.: Towards more effective techniques for automatic query expansion. In: Abiteboul, S., Vercoustre, A.-M. (eds.) ECDL 1999. LNCS, vol. 1696, pp. 126–141. Springer, Heidelberg (1999)
[3] Cao, G., Nie, J.Y., Gao, J., Robertson, S.: Selecting good expansion terms for pseudo-relevance feedback. In: SIGIR 2008: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pp. 243–250. ACM Press, New York (2008)
[4] Zhai, C., Lafferty, J.: Model-based feedback in the language modeling approach to information retrieval. In: Proceedings of Tenth International Conference on Information and Knowledge Management, pp. 403–410 (2001)
[5] Lavrenko, V., Croft, W.B.: Relevance based language models. In: SIGIR 2001: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 120–127. ACM Press, New York (2001)
[6] Chung, F.R.K.: Spectral Graph Theory. CBMS Regional Conference Series in Mathematics, vol. 92. American Mathematical Society, Providence (1997)
[7] Meyer, C., Basabe, I., Langville, A.: Clustering with the SVD. In: Workshop on Numerical Linear Algebra, the Internet and its Applications, Monopoli (2007)
[8] von Luxburg, U.: A tutorial on spectral clustering. Technical Report 149, Max Planck Institute for Biological Cybernetics (August 2006)
[9] Efthimiadis, E.N.: Query expansion. Annual Review of Information Science and Technology 31, 121–187 (1996)
[10] Zhai, C.: Statistical language models for information retrieval: a critical review. Found. Trends Inf. Retr. 2(3), 137–213 (2008)
[11] Kullback, S., Leibler, R.A.: On information and sufficiency. The Annals of Mathematical Statistics 22(1), 79–86 (1951)
[12] Tao, T., Zhai, C.: Regularized estimation of mixture models for robust pseudo-relevance feedback. In: SIGIR 2006: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 162–169. ACM, New York (2006)
[13] Spielman, D.A., Teng, S.: Spectral partitioning works: Planar graphs and finite element meshes. Technical report, Berkeley, CA, USA (1996)
[14] Zhai, C., Lafferty, J.: A study of smoothing methods for language models applied to information retrieval. ACM Trans. Inf. Syst. 22(2), 179–214 (2004)
[15] Lafferty, J., Zhai, C.: Document language models, query models, and risk minimization for information retrieval. In: SIGIR 2001: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 111–119. ACM, New York (2001)
[16] Robertson, S.E.: On term selection for query expansion. J. Doc. 46(4), 359–364 (1990)
[17] Smeaton, A.F., van Rijsbergen, C.J.: The retrieval effects of query expansion on a feedback document retrieval system. Comput. J. 26(3), 239–246 (1983)
[18] Robertson, S.E., Jones, S.K.: Relevance weighting of search terms. Journal of the American Society for Information Science 27(3), 129–146 (1976)

Author information

Authors and Affiliations

Microsoft Research India, Bangalore, 560080, India
Raghavendra Udupa
Department of Computer Science and Engineering, Indian Institute of Technology, Bombay, Mumbai, 400076, India
Abhijit Bhole & Pushpak Bhattacharyya

Editor information

Editors and Affiliations

Department of Computing Science, Sir Alwyn Williams Building, Lilybank Gardens, University of Glasgow, G12 8QQ, Glasgow, Scotland, UK
Leif Azzopardi
Microsoft Research Ltd, 7 JJ Thomson Avenue, CB3 0FB, Cambridge, UK
Gabriella Kazai & Stephen Robertson
Knowledge Media Institute, The Open University, MK7 6AA, Milton Keynes, UK
Stefan Rüger
Microsoft Research Ltd, 7 JJ Thomson Avenue, CB3 0FB, Cambridge, United Kingdom
Milad Shokouhi & Emine Yilmaz
School of Computing, The Robert Gordon University, St Andrew Street, AB25 1HG, Aberdeen, UK
Dawei Song

Cite this paper

Udupa, R., Bhole, A., Bhattacharyya, P. (2009). “A term is known by the company it keeps”: On Selecting a Good Expansion Set in Pseudo-Relevance Feedback. In: Azzopardi, L., et al. Advances in Information Retrieval Theory. ICTIR 2009. Lecture Notes in Computer Science, vol 5766. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04417-5_10

DOI: https://doi.org/10.1007/978-3-642-04417-5_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04416-8
Online ISBN: 978-3-642-04417-5
eBook Packages: Computer Science, Computer Science (R0)
