Information Retrieval, Volume 14, Issue 3, pp. 290–314

Incorporating web browsing activities into anchor texts for web search

  • Bo Zhou
  • Yiqun Liu
  • Min Zhang
  • Yijiang Jin
  • Shaoping Ma
Web Mining for Search

Abstract

Anchor texts complement Web page content and have been used extensively in commercial Web search engines. Existing methods for anchor text weighting rely on hyperlink information, which is created by page content editors. Since anchor texts are created to help users browse the Web, the browsing behavior of Web users may also provide useful or complementary information for anchor text weighting. In this paper, we discuss the possibility and effectiveness of incorporating the browsing activities of Web users into anchor texts for Web search. We first analyze the effectiveness of anchor texts with respect to browsing activities. We then propose two new anchor models that incorporate browsing activities. To deal with the data sparseness problem of user-clicked anchor texts, we explore and analyze two features of users' browsing behavior. Based on these features, we propose a smoothing method for the new anchor models. Experimental results show that, by incorporating browsing activities, the new anchor models outperform state-of-the-art anchor models that use only hyperlink information. This study demonstrates the benefit of using Web browsing activities in anchor text weighting.

Keywords

Anchor text, Document representation, Web browsing behavior, Web search

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Bo Zhou (1)
  • Yiqun Liu (1)
  • Min Zhang (1)
  • Yijiang Jin (1)
  • Shaoping Ma (1)
  1. State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, CS&T Department, Tsinghua University, Beijing, People’s Republic of China