Exploring and Exploiting Proximity Statistic for Information Retrieval Model

  • Conference paper
Information Retrieval Technology (AIRS 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7675)

Abstract

Proximity among query terms has been recognized as useful for boosting retrieval performance. However, how to model proximity effectively and efficiently remains a challenging research problem. In this paper, we propose a novel proximity statistic, namely Phrase Frequency, to model term proximity systematically. We then propose a new proximity-enhanced retrieval model, named BM25PF, that combines phrase frequency information with the basic BM25 model to rank documents. Extensive experiments on four standard TREC collections illustrate the effectiveness of the BM25PF model and also show the significant influence of phrase frequency on retrieval performance.
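
The abstract does not give the definition of Phrase Frequency or the exact form of BM25PF, so the following is only a rough sketch of the general idea of adding a proximity signal to BM25. Everything beyond plain BM25 is an assumption made for illustration: `phrase_frequency` here simply counts exact, in-order, contiguous occurrences of the whole query, and `combined_score` adds a log-damped bonus with a hypothetical `weight` parameter. Neither should be read as the paper's actual statistic or ranking function.

```python
import math
from collections import Counter


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Plain BM25 over tokenized documents (each document is a list of terms)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in set(query):
        df = sum(1 for d in docs if term in d)  # document frequency
        if df == 0 or tf[term] == 0:
            continue
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def phrase_frequency(query, doc):
    """ASSUMPTION: a stand-in proximity statistic, not the paper's definition.

    Counts positions where the full query appears as a contiguous,
    in-order phrase in the document.
    """
    n = len(query)
    return sum(1 for i in range(len(doc) - n + 1) if doc[i:i + n] == query)


def combined_score(query, doc, docs, weight=0.5):
    """ASSUMPTION: a hypothetical additive combination, not BM25PF itself."""
    bonus = math.log(1 + phrase_frequency(query, doc))
    return bm25_score(query, doc, docs) + weight * bonus


docs = [
    ["term", "proximity", "helps", "retrieval"],
    ["proximity", "of", "term", "helps"],
    ["unrelated", "text", "here", "now"],
]
query = ["term", "proximity"]
for d in docs:
    print(d, round(combined_score(query, d, docs), 4))
```

In this toy collection the first two documents contain the same query terms with the same frequencies and lengths, so plain BM25 cannot separate them; only the proximity bonus ranks the document containing the adjacent phrase higher.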

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhu, Y., Xue, Y., Guo, J., Lan, Y., Cheng, X., Yu, X. (2012). Exploring and Exploiting Proximity Statistic for Information Retrieval Model. In: Hou, Y., Nie, JY., Sun, L., Wang, B., Zhang, P. (eds) Information Retrieval Technology. AIRS 2012. Lecture Notes in Computer Science, vol 7675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35341-3_1

  • DOI: https://doi.org/10.1007/978-3-642-35341-3_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35340-6

  • Online ISBN: 978-3-642-35341-3

  • eBook Packages: Computer Science, Computer Science (R0)
