Exploring and Exploiting Proximity Statistic for Information Retrieval Model

Zhu, Yadong; Xue, Yuanhai; Guo, Jiafeng; Lan, Yanyan; Cheng, Xueqi; Yu, Xiaoming

doi:10.1007/978-3-642-35341-3_1


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7675)

Included in the following conference series:

Asia Information Retrieval Symposium


Abstract

Proximity among query terms has been recognized as useful for boosting retrieval performance. However, modeling proximity effectively and efficiently remains a challenging research problem. In this paper, we propose a novel proximity statistic, namely Phrase Frequency, to model term proximity systematically. We then propose a new proximity-enhanced retrieval model named BM25PF, which combines the phrase frequency information with the basic BM25 model to rank documents. Extensive experiments on four standard TREC collections illustrate the effectiveness of the BM25PF model and show that phrase frequency has a significant influence on retrieval performance.
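
The abstract does not give the definition of Phrase Frequency or the exact way BM25PF combines it with BM25, so the following is only a minimal sketch of the general idea of adding a proximity statistic to a BM25 score. Everything specific here is an assumption made for illustration: `phrase_frequency` counts exact contiguous, in-order matches of the whole query, the bonus term `weight * pf / (pf + 1)` is a hypothetical saturating combination, and the parameter values `k1=1.2`, `b=0.75` are common defaults rather than the paper's settings.

```python
import math
from collections import Counter

def bm25_term(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """BM25 contribution of one query term, using a non-negative idf variant."""
    if tf == 0:
        return 0.0
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    norm = tf + k1 * (1 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1) / norm

def phrase_frequency(query, doc):
    """Assumed stand-in statistic: number of positions where the query
    terms occur contiguously and in order inside the document."""
    n = len(query)
    return sum(1 for i in range(len(doc) - n + 1) if doc[i:i + n] == query)

def score(query, doc, docs, weight=1.0):
    """BM25 score plus a hypothetical saturating phrase-frequency bonus."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    tf = Counter(doc)
    base = sum(
        bm25_term(tf[t], sum(1 for d in docs if t in d), n_docs, len(doc), avg_len)
        for t in query
    )
    pf = phrase_frequency(query, doc)
    # Assumption: the bonus saturates as pf grows, mirroring BM25's tf saturation.
    return base + weight * pf / (pf + 1.0)

# Toy collection of tokenized documents (illustrative data only).
docs = [
    ["term", "proximity", "helps"],
    ["proximity", "of", "term"],
    ["other", "text", "here"],
]
query = ["term", "proximity"]
ranked = sorted(range(len(docs)), key=lambda i: score(query, docs[i], docs), reverse=True)
```

In this toy collection the first two documents contain the same query terms with the same frequencies and lengths, so plain BM25 cannot separate them; only the phrase-based bonus ranks the document containing the adjacent pair higher.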



Author information

Authors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, China
Yadong Zhu, Yuanhai Xue, Jiafeng Guo, Yanyan Lan, Xueqi Cheng & Xiaoming Yu


Editor information

Editors and Affiliations

School of Computer Science and Technology, Tianjin University, Tianjin, 300072, China
Yuexian Hou
DIRO, University of Montreal, CP. 6128, succursale Centre-ville, H3C 3J7, Montreal, QC, Canada
Jian-Yun Nie
Institute of Software, Storage & Information Retrieval Laboratory, Chinese Academy of Sciences, 100190, Beijing, China
Le Sun
School of Computer Science and Technology, Tianjin University, 300072, Tianjin, China
Bo Wang
School of Computing, Robert Gordon University, St Andrew Street, AB25 1HG, Aberdeen, UK
Peng Zhang


About this paper

Cite this paper

Zhu, Y., Xue, Y., Guo, J., Lan, Y., Cheng, X., Yu, X. (2012). Exploring and Exploiting Proximity Statistic for Information Retrieval Model. In: Hou, Y., Nie, JY., Sun, L., Wang, B., Zhang, P. (eds) Information Retrieval Technology. AIRS 2012. Lecture Notes in Computer Science, vol 7675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35341-3_1


DOI: https://doi.org/10.1007/978-3-642-35341-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35340-6
Online ISBN: 978-3-642-35341-3
eBook Packages: Computer Science, Computer Science (R0)
