Exploring URL Hit Priors for Web Search

Song, Ruihua; Xin, Guomao; Shi, Shuming; Wen, Ji-Rong; Ma, Wei-Ying

doi:10.1007/11735106_25


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3936)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

A URL usually contains meaningful information for measuring the relevance of a Web page to a query in Web search. Some existing work uses URL depth priors (i.e. the probability of being a good page given the length and depth of a URL) to improve some types of Web search tasks. This paper suggests using the locations at which query terms occur in a URL to measure how well a Web page matches a user’s information need in Web search. First, we define URL hit types and estimate URL hit priors, i.e. the prior probability of being a good answer given the type of query term hits in the URL. The main advantage of URL hit priors (over depth priors) is that they achieve stable improvement for both informational and navigational queries. Second, an obstacle to exploiting such priors is that shortening and concatenation are frequently used in URLs. Our investigation shows that only 30% of URL hits are recognized by an ordinary word breaking approach. Thus we combine three methods to improve matching. Finally, the priors are integrated into the probabilistic model to enhance Web document retrieval. Our experiments were conducted using 7 query sets from TREC 2002, TREC 2003 and TREC 2004, and show that the proposed approach is stable and improves retrieval effectiveness by 4% to 11% for navigational queries and 10% for informational queries.
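As a rough illustration of the idea in the abstract, the sketch below assigns each (query, URL) pair a hit type and estimates a prior for each type as the fraction of judged pages of that type that were good answers. Everything concrete here is an assumption made for illustration: the abstract does not list the hit types, so the three labels (`host`, `path`, `none`) are hypothetical, and plain substring matching merely stands in for the three matching methods the paper combines, which the abstract does not name. The function names and the `.example` URLs are likewise invented.

```python
import re
from collections import Counter
from urllib.parse import urlparse


def word_break_tokens(url):
    """Ordinary word breaking: split the URL on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", url.lower())


def hit_type(query, url):
    """Return a hypothetical hit type: 'host', 'path', or 'none'.

    Substring matching is an assumed stand-in for the paper's matching
    methods; it finds query terms inside concatenated URL words.
    """
    # urlparse only fills netloc when the URL has a scheme or a leading "//".
    parsed = urlparse(url if "//" in url else "//" + url)
    host, path = parsed.netloc.lower(), parsed.path.lower()
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if any(term in host for term in terms):
        return "host"
    if any(term in path for term in terms):
        return "path"
    return "none"


def estimate_priors(judgments):
    """Estimate P(good answer | hit type) by relative frequency.

    judgments: iterable of (query, url, is_relevant) triples.
    """
    seen, relevant = Counter(), Counter()
    for query, url, is_relevant in judgments:
        kind = hit_type(query, url)
        seen[kind] += 1
        if is_relevant:
            relevant[kind] += 1
    return {kind: relevant[kind] / seen[kind] for kind in seen}


# Invented toy judgments, not data from the paper.
judgments = [
    ("microsoft research", "http://www.microsoftresearch.example/", True),
    ("microsoft research", "http://site.example/microsoft/news.html", False),
    ("microsoft research", "http://site.example/microsoft/research/", True),
    ("microsoft research", "http://site.example/about.html", False),
]
priors = estimate_priors(judgments)
```

In the first toy URL, ordinary word breaking yields the single token `microsoftresearch`, so neither query term matches exactly, while the substring check still registers a host hit. This is the kind of concatenation problem the abstract describes. Substring matching also produces spurious hits for very short terms, so a real system would need something more careful.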

Author information

Authors and Affiliations

Microsoft Research Asia, 5F, Sigma Center, No.49 Zhichun Road, 100080, Beijing, P.R. China
Ruihua Song, Guomao Xin, Shuming Shi, Ji-Rong Wen & Wei-Ying Ma

Editor information

Editors and Affiliations

Queen Mary, University of London, London, UK
Mounia Lalmas
Department of Information Science, City University, Northampton Square, EC1V 0HB, London, UK
Andy MacFarlane
Knowledge Media Institute, The Open University, MK7 6AA, Milton Keynes, UK
Stefan Rüger
Queen Mary University of London, UK
Anastasios Tombros
CWI, Amsterdam, The Netherlands
Theodora Tsikrika
Department of Computing, Imperial College London, South Kensington Campus, SW7 2AZ, London, UK
Alexei Yavlinsky

About this paper

Cite this paper

Song, R., Xin, G., Shi, S., Wen, JR., Ma, WY. (2006). Exploring URL Hit Priors for Web Search. In: Lalmas, M., MacFarlane, A., Rüger, S., Tombros, A., Tsikrika, T., Yavlinsky, A. (eds) Advances in Information Retrieval. ECIR 2006. Lecture Notes in Computer Science, vol 3936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11735106_25

DOI: https://doi.org/10.1007/11735106_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33347-0
Online ISBN: 978-3-540-33348-7
eBook Packages: Computer Science, Computer Science (R0)
