Abstract
Proximity among query terms has been recognized as useful for boosting retrieval performance. However, modeling proximity effectively and efficiently remains a challenging research problem. In this paper, we propose a novel proximity statistic, Phrase Frequency, to model term proximity systematically. We then propose a new proximity-enhanced retrieval model, BM25PF, that combines phrase frequency information with the basic BM25 model to rank documents. Extensive experiments on four standard TREC collections demonstrate the effectiveness of BM25PF and show that phrase frequency has a significant influence on retrieval performance.
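The abstract does not give the definition of phrase frequency or the formula used to combine it with BM25, so the following is only an illustrative sketch of the general idea, not the BM25PF model itself. It assumes that a "phrase" is a pair of adjacent query terms matched exactly and in order, that phrases are weighted with the same BM25 form as single terms, and that the two parts are mixed by a hypothetical weight `lam`. The function names and the smoothed idf variant are likewise assumptions made for illustration.

```python
import math


def bm25_weight(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 weight of one unit (term or phrase) in one document.

    Uses a smoothed idf, log(1 + (N - df + 0.5) / (df + 0.5)), which stays
    positive; this variant is an assumption, not taken from the paper.
    """
    if tf == 0:
        return 0.0
    idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1.0 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


def phrase_count(tokens, phrase):
    """Count exact, in-order occurrences of `phrase` (a token list) in `tokens`."""
    n = len(phrase)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase)


def score(query, doc, docs, lam=0.5):
    """Term-level BM25 plus a phrase-level part weighted by `lam` (hypothetical)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Term part: ordinary BM25 over the distinct query terms.
    term_score = 0.0
    for term in dict.fromkeys(query):
        tf = doc.count(term)
        df = sum(1 for d in docs if term in d)
        term_score += bm25_weight(tf, df, n_docs, len(doc), avg_len)

    # Phrase part: adjacent query-term pairs treated as pseudo-terms
    # (an assumed definition of "phrase" for this sketch).
    phrase_score = 0.0
    for i in range(len(query) - 1):
        phrase = query[i:i + 2]
        pf = phrase_count(doc, phrase)
        df = sum(1 for d in docs if phrase_count(d, phrase) > 0)
        phrase_score += bm25_weight(pf, df, n_docs, len(doc), avg_len)

    return term_score + lam * phrase_score
```

With `lam=0` the sketch reduces to plain BM25, so a document that contains the query terms next to each other gains score only through the phrase part, while a document that contains them far apart is scored exactly as before.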
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhu, Y., Xue, Y., Guo, J., Lan, Y., Cheng, X., Yu, X. (2012). Exploring and Exploiting Proximity Statistic for Information Retrieval Model. In: Hou, Y., Nie, JY., Sun, L., Wang, B., Zhang, P. (eds) Information Retrieval Technology. AIRS 2012. Lecture Notes in Computer Science, vol 7675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35341-3_1
DOI: https://doi.org/10.1007/978-3-642-35341-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35340-6
Online ISBN: 978-3-642-35341-3
eBook Packages: Computer Science, Computer Science (R0)