“A term is known by the company it keeps”: On Selecting a Good Expansion Set in Pseudo-Relevance Feedback

  • Conference paper
Advances in Information Retrieval Theory (ICTIR 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5766)

Abstract

It is well known that pseudo-relevance feedback (PRF) generally improves the retrieval performance of Information Retrieval (IR) systems. However, a recent study by Cao et al. [3] has shown that a non-negligible fraction of the expansion terms used by PRF algorithms are harmful to retrieval. In other words, a PRF algorithm would be better off if it used only a subset of the feedback terms. The challenge, then, is to find a good expansion set within the set of all candidate expansion terms. A natural approach to the problem is to make a term independence assumption and use one or more term selection criteria, or a statistical classifier, to identify good expansion terms independently of each other. In this work, we challenge this approach and show empirically that, in general, a feedback term is neither good nor bad in itself; the behavior of a term depends very much on the other expansion terms. Our finding implies that, in general, a good expansion set cannot be found by making a term independence assumption. As a principled solution to the problem, we propose spectral partitioning of expansion terms using a specific term-term interaction matrix. We demonstrate on several test collections that expansion terms can be partitioned into two sets and that the better of the two sets gives substantial improvements in retrieval performance over model-based feedback.
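
The partitioning step described in the abstract can be illustrated with a minimal sketch of generic spectral bipartitioning. The abstract does not define the specific term-term interaction matrix, so everything concrete here is an assumption made for illustration: the function name `spectral_bipartition`, the requirement that the matrix be symmetric and non-negative, the use of the unnormalized graph Laplacian, the sign split on its second eigenvector, and the toy terms and values. It should not be read as the authors' method.

```python
import numpy as np


def spectral_bipartition(terms, interaction):
    """Split terms into two sets by the sign of the Fiedler vector.

    `interaction` is a hypothetical symmetric, non-negative term-term
    matrix; the paper's actual matrix is not given in the abstract.
    """
    w = np.asarray(interaction, dtype=float)
    n = len(terms)
    if n < 2:
        raise ValueError("need at least two terms to partition")
    if w.shape != (n, n):
        raise ValueError("interaction must be an n x n matrix")

    # Enforce symmetry and ignore self-interaction.
    w = (w + w.T) / 2.0
    np.fill_diagonal(w, 0.0)

    # Unnormalized graph Laplacian: L = D - W.
    laplacian = np.diag(w.sum(axis=1)) - w

    # eigh returns eigenvalues of a symmetric matrix in ascending order,
    # so column 1 is the eigenvector of the second-smallest eigenvalue.
    _, vectors = np.linalg.eigh(laplacian)
    fiedler = vectors[:, 1]

    first = [t for t, v in zip(terms, fiedler) if v >= 0]
    second = [t for t, v in zip(terms, fiedler) if v < 0]
    return first, second


# Toy data (made up): two pairs of strongly interacting terms,
# with only weak interaction across the pairs.
toy_terms = ["engine", "motor", "recipe", "flour"]
toy_matrix = [
    [0.00, 1.00, 0.01, 0.01],
    [1.00, 0.00, 0.01, 0.01],
    [0.01, 0.01, 0.00, 1.00],
    [0.01, 0.01, 1.00, 0.00],
]
set_a, set_b = spectral_bipartition(toy_terms, toy_matrix)
print(set_a, set_b)
```

The sign of an eigenvector is arbitrary, so which group comes back as the first set can vary. Choosing the better of the two sets, as the abstract describes, would require a retrieval-quality measure, which this sketch does not cover.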

References

  1. Buckley, C., Salton, G., Allan, J.: Automatic retrieval with locality information using SMART. In: TREC, pp. 59–72 (1992)

  2. Carpineto, C., Romano, G.: Towards more effective techniques for automatic query expansion. In: Abiteboul, S., Vercoustre, A.-M. (eds.) ECDL 1999. LNCS, vol. 1696, pp. 126–141. Springer, Heidelberg (1999)

  3. Cao, G., Nie, J.Y., Gao, J., Robertson, S.: Selecting good expansion terms for pseudo-relevance feedback. In: SIGIR 2008: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pp. 243–250. ACM Press, New York (2008)

  4. Zhai, C., Lafferty, J.: Model-based feedback in the language modeling approach to information retrieval. In: Proceedings of the Tenth International Conference on Information and Knowledge Management, pp. 403–410 (2001)

  5. Lavrenko, V., Croft, W.B.: Relevance based language models. In: SIGIR 2001: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 120–127. ACM Press, New York (2001)

  6. Chung, F.R.K.: Spectral Graph Theory. CBMS Regional Conference Series in Mathematics, vol. 92. American Mathematical Society, Providence (1997)

  7. Meyer, C., Basabe, I., Langville, A.: Clustering with the SVD. In: Workshop on Numerical Linear Algebra, the Internet and its Applications, Monopoli (2007)

  8. von Luxburg, U.: A tutorial on spectral clustering. Technical Report 149, Max Planck Institute for Biological Cybernetics (August 2006)

  9. Efthimiadis, E.N.: Query expansion. Annual Review of Information Science and Technology 31, 121–187 (1996)

  10. Zhai, C.: Statistical language models for information retrieval: a critical review. Found. Trends Inf. Retr. 2(3), 137–213 (2008)

  11. Kullback, S., Leibler, R.A.: On information and sufficiency. The Annals of Mathematical Statistics 22(1), 79–86 (1951)

  12. Tao, T., Zhai, C.: Regularized estimation of mixture models for robust pseudo-relevance feedback. In: SIGIR 2006: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 162–169. ACM, New York (2006)

  13. Spielman, D.A., Teng, S.: Spectral partitioning works: Planar graphs and finite element meshes. Technical report, Berkeley, CA, USA (1996)

  14. Zhai, C., Lafferty, J.: A study of smoothing methods for language models applied to information retrieval. ACM Trans. Inf. Syst. 22(2), 179–214 (2004)

  15. Lafferty, J., Zhai, C.: Document language models, query models, and risk minimization for information retrieval. In: SIGIR 2001: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 111–119. ACM, New York (2001)

  16. Robertson, S.E.: On term selection for query expansion. J. Doc. 46(4), 359–364 (1990)

  17. Smeaton, A.F., van Rijsbergen, C.J.: The retrieval effects of query expansion on a feedback document retrieval system. Comput. J. 26(3), 239–246 (1983)

  18. Robertson, S.E., Sparck Jones, K.: Relevance weighting of search terms. Journal of the American Society for Information Science 27(3), 129–146 (1976)

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Udupa, R., Bhole, A., Bhattacharyya, P. (2009). “A term is known by the company it keeps”: On Selecting a Good Expansion Set in Pseudo-Relevance Feedback. In: Azzopardi, L., et al. Advances in Information Retrieval Theory. ICTIR 2009. Lecture Notes in Computer Science, vol 5766. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04417-5_10

  • DOI: https://doi.org/10.1007/978-3-642-04417-5_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04416-8

  • Online ISBN: 978-3-642-04417-5

  • eBook Packages: Computer Science, Computer Science (R0)