Exploiting PageRank at Different Block Level

  • Conference paper
Web Information Systems – WISE 2004 (WISE 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3306)

Abstract

In recent years, information retrieval methods based on link analysis have been developed; PageRank and HITS are two typical examples. Following the hierarchical organization of Web pages, the Web graph can be partitioned into blocks at different levels, such as the page level, directory level, host level, and domain level. On the basis of these blocks, the hyperlinks among pages can be distinguished and analyzed. Several approaches have suggested that intra-hyperlinks within a host may be less useful for computing PageRank. However, there are no reports on how, concretely, intra- and inter-hyperlinks affect PageRank. Furthermore, intra-hyperlink and inter-hyperlink are relative concepts that depend on the chosen block level. Which level, then, is best for distinguishing intra-hyperlinks from inter-hyperlinks? And what weight ratio between intra-hyperlinks and inter-hyperlinks ultimately improves the performance of the PageRank algorithm? In this paper, we analyze the link distribution at different block levels and evaluate the importance of intra- and inter-hyperlinks to PageRank on the TREC Web Track data set. Experiments show that retrieval achieves the best performance when blocks are set at the host level and the weight ratio of intra-hyperlinks to inter-hyperlinks is 1:4.
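The idea of weighting intra- and inter-hyperlinks differently can be illustrated with a small sketch. The abstract does not say how the weights enter the computation, so the code below is only one plausible reading and not the authors' implementation: each link gets a weight according to whether its source and target share a host, and the weights are normalized per source page before a standard power iteration. The function names (`host_of`, `weighted_pagerank`), the damping factor of 0.85, the uniform treatment of pages without outlinks, and the example URLs are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def host_of(url):
    """Return the host part of a URL, used here as the block identifier."""
    return urlparse(url).netloc.lower()


def weighted_pagerank(links, intra_weight=1.0, inter_weight=4.0,
                      damping=0.85, iterations=50):
    """Power-iteration PageRank with block-dependent link weights.

    links: dict mapping each page URL to the list of URLs it links to.
    A link counts as "intra" when source and target share a host,
    otherwise as "inter". All parameter defaults are illustrative.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Build weighted transition probabilities, normalized per source page.
    transitions = {}
    for src in pages:
        weights = {}
        for dst in links.get(src, []):
            same_host = host_of(src) == host_of(dst)
            w = intra_weight if same_host else inter_weight
            weights[dst] = weights.get(dst, 0.0) + w
        total = sum(weights.values())
        if total > 0:
            transitions[src] = {d: w / total for d, w in weights.items()}
        else:
            transitions[src] = {}

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread uniformly.
        dangling = sum(rank[p] for p in pages if not transitions[p])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        for src, outs in transitions.items():
            for dst, prob in outs.items():
                new_rank[dst] += damping * rank[src] * prob
        rank = new_rank
    return rank


# Toy graph: one page links to a same-host page and to a page on another host.
example_links = {
    "http://a.example/1": ["http://a.example/2", "http://b.example/1"],
    "http://a.example/2": ["http://a.example/1"],
    "http://b.example/1": ["http://a.example/1"],
}
ranks_1_to_4 = weighted_pagerank(example_links)
ranks_equal = weighted_pagerank(example_links, intra_weight=1.0, inter_weight=1.0)
```

In this toy graph, the 1:4 setting sends four fifths of the first page's outgoing rank to the page on the other host, whereas equal weights split it evenly. Switching to a different block level (directory or domain) would only require replacing `host_of` with a function that returns the corresponding block identifier.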

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiang, XM., Xue, GR., Song, WG., Zeng, HJ., Chen, Z., Ma, WY. (2004). Exploiting PageRank at Different Block Level. In: Zhou, X., Su, S., Papazoglou, M.P., Orlowska, M.E., Jeffery, K. (eds) Web Information Systems – WISE 2004. WISE 2004. Lecture Notes in Computer Science, vol 3306. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30480-7_26

  • DOI: https://doi.org/10.1007/978-3-540-30480-7_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23894-2

  • Online ISBN: 978-3-540-30480-7

  • eBook Packages: Springer Book Archive
