Query Performance Prediction for Information Retrieval Based on Covering Topic Score

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

We present a statistical method called Covering Topic Score (CTS) for predicting query performance in information retrieval. The estimate is based on how well the topic of a user’s query is covered by the documents that a given retrieval system returns. The approach is conceptually simple and intuitive, and extends easily to features beyond bag-of-words, such as phrases and term proximity. Experiments show that CTS correlates significantly with query performance on a variety of TREC test collections, and that its predictive power increases further when phrase and term-proximity features are added. We compare CTS with previous state-of-the-art query performance predictors, including clarity score and robustness score; CTS consistently performs better than, or at least as well as, these methods. CTS also has very low computational complexity, which makes it practical for real applications.
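
To make the idea of "topic coverage" concrete, the following is a minimal sketch and not the CTS formula from the paper, which the abstract does not give. It assumes a query and its retrieved documents are already tokenized, and scores each document by the weighted fraction of distinct query terms it contains, then averages over the retrieved list. The function name `toy_coverage_score` and the `term_weights` parameter are hypothetical, introduced only for illustration.

```python
from typing import Dict, Optional, Sequence


def toy_coverage_score(
    query_terms: Sequence[str],
    retrieved_docs: Sequence[Sequence[str]],
    term_weights: Optional[Dict[str, float]] = None,
) -> float:
    """Average weighted fraction of query terms present in each retrieved document.

    Illustrative assumption only: this is NOT the published CTS definition.
    """
    # De-duplicate query terms while keeping their original order.
    terms = list(dict.fromkeys(query_terms))
    if not terms or not retrieved_docs:
        return 0.0

    # Terms without an explicit weight default to 1.0 (uniform importance).
    weights = {t: (term_weights or {}).get(t, 1.0) for t in terms}
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0

    per_doc = []
    for doc in retrieved_docs:
        doc_terms = set(doc)
        covered = sum(w for t, w in weights.items() if t in doc_terms)
        per_doc.append(covered / total_weight)

    # Higher values mean the retrieved list covers more of the query's terms.
    return sum(per_doc) / len(per_doc)


# Hypothetical toy data: one document covers the whole query, one covers a third.
query = ["query", "performance", "prediction"]
docs = [
    ["query", "performance", "prediction", "for", "retrieval"],
    ["query", "logs", "analysis"],
]
score = toy_coverage_score(query, docs)
```

Under these assumptions a low score flags a query whose terms are poorly represented in the retrieved list. Phrase or proximity features, which the abstract says CTS can incorporate, would need positional information that this bag-of-words sketch deliberately leaves out.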

Author information

Corresponding author

Correspondence to Hao Lang.

Additional information

This work is supported by the National Natural Science Foundation of China under Grant No. 60603094 and the National Grand Fundamental Research 973 Program of China under Grant No. 2004CB318109.

About this article

Cite this article

Lang, H., Wang, B., Jones, G. et al. Query Performance Prediction for Information Retrieval Based on Covering Topic Score. J. Comput. Sci. Technol. 23, 590–601 (2008). https://doi.org/10.1007/s11390-008-9155-6

  • DOI: https://doi.org/10.1007/s11390-008-9155-6
