Viewing Term Proximity from a Different Perspective

  • Ruihua Song
  • Michael J. Taylor
  • Ji-Rong Wen
  • Hsiao-Wuen Hon
  • Yong Yu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4956)

Abstract

This paper extends the state-of-the-art probabilistic model BM25 to utilize term proximity from a new perspective. Most previous work considers only dependencies between pairs of terms and regards phrases as additional independent evidence. It is difficult to estimate the importance of a phrase and its extra contribution to a relevance score, because the phrase actually overlaps with its component terms. This paper proposes a new approach. First, query terms are grouped locally into non-overlapping phrases that may contain one or more query terms. Second, these phrases are not scored independently but are instead treated as providing a context for the component query terms. The relevance contribution of a term occurrence is measured by how many query terms occur in the context phrase and how compact they are. Third, we replace term frequency with the accumulated relevance contribution. Consequently, term proximity is easily integrated into the probabilistic model. Experimental results on the TREC-10 and TREC-11 collections show stable improvements in average precision and significant improvements in precision at top ranks.
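The three steps described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how phrases are segmented or how "how many query terms" and "how compact" are combined, so the segmentation rule (`max_gap`, splitting when a term repeats), the contribution formula with exponents `x` and `y`, and every function name below are hypothetical. The scoring function is ordinary BM25 (parameters `k1` and `b`, Robertson-Sparck Jones IDF) with raw term frequency swapped for the accumulated contribution.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    pos: int   # token position of the hit in the document
    term: str  # the query term found at that position


def segment_spans(hits, max_gap=4):
    """Group position-sorted query-term hits into non-overlapping spans.

    Hypothetical rule: start a new span when the gap to the previous hit
    exceeds max_gap, or when the hit's term already occurs in the span.
    """
    spans, current = [], []
    for hit in hits:
        if current and (hit.pos - current[-1].pos > max_gap
                        or any(h.term == hit.term for h in current)):
            spans.append(current)
            current = []
        current.append(hit)
    if current:
        spans.append(current)
    return spans


def span_contribution(span, x=0.25, y=0.3):
    """Assumed form: (term density) ** x * (number of terms) ** y.

    A span holding a single term contributes exactly 1, so isolated
    occurrences are counted just as raw term frequency would count them.
    """
    n = len(span)
    width = span[-1].pos - span[0].pos + 1
    return (n / width) ** x * n ** y


def accumulated_contribution(doc_tokens, query_terms, max_gap=4):
    """Sum, per query term, the contribution of the span around each occurrence."""
    query = set(query_terms)
    hits = [Hit(i, t) for i, t in enumerate(doc_tokens) if t in query]
    acc = {t: 0.0 for t in query}
    for span in segment_spans(hits, max_gap):
        rc = span_contribution(span)
        for hit in span:
            acc[hit.term] += rc
    return acc


def bm25_proximity(doc_tokens, query_terms, doc_freq, num_docs, avg_len,
                   k1=1.2, b=0.75, max_gap=4):
    """Standard BM25, with the accumulated contribution in place of raw tf."""
    acc = accumulated_contribution(doc_tokens, query_terms, max_gap)
    norm = k1 * ((1 - b) + b * len(doc_tokens) / avg_len)
    score = 0.0
    for term, f in acc.items():
        if f == 0.0:
            continue  # term absent from the document
        idf = math.log((num_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
        score += idf * (k1 + 1) * f / (norm + f)
    return score


# Same length and term frequencies; only the proximity of "a" and "b" differs.
df = {"a": 1, "b": 1}
near = bm25_proximity("a b x x x x x x".split(), ["a", "b"], df, 10, 8.0)
far = bm25_proximity("a x x x x x x b".split(), ["a", "b"], df, 10, 8.0)
print(near > far)  # True
```

Because a single-term span contributes 1, a document in which the query terms never occur near one another receives its plain BM25 score; proximity only ever adds weight on top of that baseline, which is one way to read "replace term frequency with the accumulated relevance contribution".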

Keywords

Language Model, Average Precision, Ranking Function, Markov Random Field, Term Frequency
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Ruihua Song (1, 2)
  • Michael J. Taylor (3)
  • Ji-Rong Wen (2)
  • Hsiao-Wuen Hon (2)
  • Yong Yu (1)
  1. Dept. of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
  2. Microsoft Research Asia, Beijing, China
  3. Microsoft Research Ltd, Cambridge, England
