Abstract
The use of links to improve page ranking has been widely studied. The underlying assumption is that a link conveys a recommendation. Although this technique has been applied successfully to global web search, it produces poor results for website search, because most links within a website serve to organize information and convey no recommendation. Building on the distinction between these two kinds of links, those for recommendation and those for information organization, this paper describes a path-based method for web page ranking. We define the Hierarchical Navigation Path (HNP) as a new resource for improving web search. An HNP is composed of the multi-step navigation information that visitors follow when browsing a website, and it provides indications of the content of the destination page. We first classify the links inside a website. Then, the links used for information organization are exploited to construct the HNPs for each page. Finally, we describe the PathRank algorithm for web page retrieval. Experiments show that our approach yields significant improvements over existing solutions.
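As a rough illustration only, the sketch below assumes the link-classification step has already labeled each link as organizational or not. It builds a hypothetical HNP for each page as the anchor texts along one shortest organizational path from the home page (breadth-first search). It then scores a page by mixing query-term overlap with the page text and with its path text. The names `build_hnps`, `overlap`, `path_score`, and the weight `alpha` are hypothetical. The abstract does not specify how links are classified, which path is chosen when several exist, or the PathRank scoring formula, so this is not the authors' algorithm.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical link record: (source, target, anchor_text, is_organizational).
# The flag is assumed to come from an earlier link-classification step.
Link = Tuple[str, str, str, bool]


def build_hnps(home: str, links: List[Link]) -> Dict[str, List[str]]:
    """Map each page reachable from `home` via organizational links to the
    anchor texts along one shortest such path (a hypothetical HNP)."""
    graph: Dict[str, List[Tuple[str, str]]] = {}
    for src, dst, anchor, is_org in links:
        if is_org:  # recommendation links are ignored when building paths
            graph.setdefault(src, []).append((dst, anchor))

    hnps: Dict[str, List[str]] = {home: []}
    queue = deque([home])
    while queue:  # breadth-first search: the first visit gives a shortest path
        page = queue.popleft()
        for dst, anchor in graph.get(page, []):
            if dst not in hnps:
                hnps[dst] = hnps[page] + [anchor]
                queue.append(dst)
    return hnps


def overlap(terms: List[str], text: str) -> float:
    """Fraction of query terms that occur as words in `text`."""
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms)


def path_score(query: str, content: str, hnp: List[str], alpha: float = 0.5) -> float:
    """Hypothetical linear mix of content evidence and navigation-path evidence."""
    terms = query.lower().split()
    return alpha * overlap(terms, content) + (1 - alpha) * overlap(terms, " ".join(hnp))


if __name__ == "__main__":
    links = [
        ("/", "/research", "Research", True),
        ("/research", "/research/ir", "Information Retrieval", True),
        ("/research/ir", "/research/ir/pubs.html", "Publications", True),
        ("/news", "/research/ir/pubs.html", "a great paper", False),
    ]
    hnps = build_hnps("/", links)
    print(hnps["/research/ir/pubs.html"])
    # The page text alone matches no query term; the path supplies the evidence.
    print(path_score("information retrieval publications", "List of papers 2009",
                     hnps["/research/ir/pubs.html"]))
```

Running it prints `['Research', 'Information Retrieval', 'Publications']` and then `0.5`: the page text matches none of the query terms, so the whole score comes from the navigation path. This mirrors the intuition that an HNP indicates what the destination page is about. Keeping only one breadth-first path per page is a simplifying choice; a fuller system might retain several paths or weight them.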
References
Amento, B., Terveen, L., Hill, W.: Does authority mean quality? Predicting expert quality ratings of web documents. Proc. of SIGIR (2000)
Asadi, S., Zhou, X., Yang, G.: Using local popularity of web resources for geo-ranking of search engine results. World Wide Web Internet Web Inf. Syst. 12(2), 149–170 (2009)
Baeza-Yates, R., Ribeiro-Neto, B.: Modern information retrieval. Addison-Wesley (1999)
Bao, S., Wu, X., Fei, B., Xue, G., Su, Z., Yu, Y.: Optimizing web search using social annotation. Proc. of WWW (2007)
Borges, J., Levene, M.: Ranking pages by topology and popularity within web sites. World Wide Web Internet Web Inf. Syst. 9(3), 301–316 (2006)
Broder, A.: A taxonomy of web search. SIGIR Forum 36(2), 3–10 (2002)
Broder, A., Kumar, R., Maghoul, F., et al.: Graph structure in the web. Proc. of WWW (2000)
Cai, D., He, X., Wen, J.-R., Ma, W.-Y.: Block-level link analysis. Proc. of SIGIR, pp. 440–447 (2004)
Cai, D., Yu, S., Wen, J.-R., Ma, W.-Y.: VIPS: A vision-based page segmentation algorithm. Microsoft Technical Report MSR-TR-2003-79 (2003)
Chen, M., Hearst, M., et al.: Cha-Cha: A system for organizing intranet search results. Proc. of USENIX USITS (1999)
Chen, Z., Liu, S., Liu, W., Pu, G., Ma, W.-Y.: Building a web thesaurus from web link structure. Proc. of SIGIR03 (2003)
Chen, J.L., Zhou, B.Y., Shi, J., Zhang, H.J., Wu, Q.F.: Function-based object model towards website adaptation. Proc. of WWW01 (2001)
Chi, E.H., et al.: Using information scent to model user information needs and actions on the web. Proc. of SIGCHI (2001)
Delicious: http://del.icio.us
Dyreson, C.E.: A jumping spider: Restructuring the WWW graph to index concepts that span pages. Proc. of WWW (1998)
Eiron, N., McCurley, K.S.: Analysis of anchor text for web search. Proc. of SIGIR03, pp. 459–460 (2003)
Fagin, R., Kumar, R., McCurley, K.S., Novak, J., Sivakumar, D., Tomlin, J.A., Williamson, D.P.: Searching the workplace web. Proc. of WWW03, pp. 366–375 (2003)
Glover, E.J., et al.: Improving category specific web search by learning query modifications. Proc. of the Symposium on Applications and the Internet, pp. 23–32 (2001)
Glover, E.J., Tsioutsiouliklis, K., Lawrence, S., Pennock, D.M., Flake, G.W.: Using web structure for classifying and describing web pages. Proc. of WWW02, pp. 562–569 (2002)
Hagen, P., Manning, H., Paul, Y.: Must search stink? The Forrester report, Forrester, June (2000)
Han, S.K., Shin, D., Jung, J.-Y., Park, J.: Exploring the relationship between keywords and feed elements in blog post search. World Wide Web Internet Web Inf. Syst. 12(4), 381–398 (2009)
Haveliwala, T.H.: Topic-sensitive PageRank. Proc. of WWW02, pp. 517–526 (2002)
Hawking, D., Voorhees, E., Bailey, P., Craswell, N.: Overview of TREC-8 web track. Proc. of TREC-8, pp. 131–150 (1999)
Hawking, D., Craswell, N.: Overview of the TREC 2001 web track. Proc. of TREC 2001 (2001)
Hawking, D.: Overview of the TREC-9 web track. Proc. of TREC-9 (2000)
Henzinger, M.: Link analysis on the world wide web. Proc. of ACM Hypertext, pp. 1–3 (2005)
Hu, Y., Xin, G., Song, R., Hu, G., et al.: Title extraction from bodies of HTML documents and its application to web page retrieval. Proc. of SIGIR05, pp. 250–257 (2005)
Jeh, G., Widom, J.: Scaling personalized web search. Proc. of WWW03, pp. 271–279 (2003)
Kleinberg, J.: Authoritative sources in a linked environment. J. ACM 46(5), 604–622 (1999)
Li, L., Otsuka, S., Kitsuregawa, M.: Finding related search engine queries by web community based query enrichment. World Wide Web Internet Web Inf. Syst. 13(1–2), 121–142 (2009)
Li, J.Q., Zhao, Y.: PathRank: Web page retrieval with navigation path. Proc. of ECIR09, pp. 350–361 (2009)
Lin, S.-H., Ho, J.-M.: Discovering informative content blocks from web documents. Proc. of SIGKDD (2002)
Matsuda, K., Fukushima, T.: Task-oriented world wide web retrieval by document type classification. Proc. of CIKM99, pp. 109–113 (1999)
Mizuuchi, Y., Tajima, K.: Finding context paths for Web pages. Proc. of ACM Hypertext, pp. 13–22 (1999)
Najork, M., Wiener, J.: Breadth-first search crawling yields high-quality pages. Proc. of WWW2000, pp. 114–118 (2000)
Nie, Z., Zhang, Y., Wen, J., Ma, W.-Y.: Object-level ranking: Bringing order to web objects. Proc. of WWW05, pp. 567–574 (2005)
Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the web. Technical Report, Stanford University (1998)
Pandit, S., Olston, C.: Navigation-aided retrieval. Proc. of WWW2007, pp. 391–400 (2007)
Robertson, S.E., Walker, S., Hancock-Beaulieu, M., Gull, A., Lau, M.: Okapi at TREC. Proc. of the Text REtrieval Conference (1992)
Shen, D., Sun, J.-T., Yang, Q., Chen, Z.: A comparison of implicit and explicit links for web page classification. Proc. of WWW06, pp. 643–650 (2006)
Soboroff, I.: Do TREC web collections look like the web? SIGIR Forum 36, 23–31 (2002)
Vaughan, L., Thelwall, M.: Scholarly use of the Web: what are the key inducers of links to journal web sites? J. Am. Soc. Inf. Sci. Technol. 54(1), 29–38 (2003)
WT10G, http://ir.dcs.gla.ac.uk/test_collections/access_to_data.html
Xue, G., Zeng, H., Chen, Z., Ma, W., et al.: Implicit link analysis for small web search. Proc. of SIGIR03, pp. 56–63 (2003)
Yu, W., Zhang, W., Lin, X., Zhang, Q., Le, J.: A space and time efficient algorithm for SimRank computation. World Wide Web Internet Web Inf. Syst. (2010)
Zakos, J., Verma, B.: A novel context-based technique for web information retrieval. World Wide Web Internet Web Inf. Syst. 9(4), 485–503 (2006)
Additional information
A preliminary version of this paper appeared in [31] (Li and Zhao, PathRank, ECIR 2009). This submission includes a more complete and formal description of the algorithms and experiments.
About this article
Cite this article
Li, JQ., Zhao, Y. & Garcia-Molina, H. A path-based approach for web page retrieval. World Wide Web 15, 257–283 (2012). https://doi.org/10.1007/s11280-011-0133-5