A Cluster Based Pseudo Feedback Technique Which Exploits Good and Bad Clusters

  • Javier Parapar
  • Álvaro Barreiro
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7023)

Abstract

In recent years, cluster-based retrieval has proven to be an effective tool both for interactive retrieval and for pseudo-relevance feedback. In this paper we propose a new cluster-based retrieval function that exploits both the best and the worst clusters containing a document in the cluster ranking in order to improve retrieval effectiveness. Evaluation on standard TREC collections shows improvements over state-of-the-art techniques in both precision and robustness.
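The abstract's core idea, re-ranking documents using the scores of the best and worst query-specific clusters that contain them, can be sketched as follows. The paper's exact scoring function is not given in this excerpt, so the function name, the linear combination, and the weights below are all hypothetical illustrations, not the authors' formula.

```python
def rerank(doc_scores, clusters, cluster_scores, best_w=0.5, worst_w=0.5):
    """Hypothetical sketch of best/worst-cluster re-ranking.

    doc_scores:     {doc_id: baseline retrieval score}
    clusters:       {cluster_id: set of doc_ids} (query-specific clusters)
    cluster_scores: {cluster_id: score of the cluster against the query}
    """
    reranked = {}
    for doc, score in doc_scores.items():
        # Scores of every cluster this document belongs to.
        containing = [s for c, s in cluster_scores.items() if doc in clusters[c]]
        if containing:
            best, worst = max(containing), min(containing)
            # Membership in a highly ranked cluster boosts the document;
            # a low-scoring worst cluster contributes little, dragging the
            # document down relative to documents in good clusters only.
            score = score + best_w * best + worst_w * worst
        reranked[doc] = score
    # Return documents sorted by the combined score, highest first.
    return dict(sorted(reranked.items(), key=lambda kv: kv[1], reverse=True))
```

Under this sketch, two documents with equal baseline scores are separated by the quality of the clusters they fall into, which is the intuition the abstract describes.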



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Javier Parapar
  • Álvaro Barreiro
  1. IRLab, Computer Science Department, University of A Coruña, Spain
