Skip to main content

Fractional PageRank Crawler: Prioritizing URLs Efficiently for Crawling Important Pages Early

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA,volume 5463)

Abstract

Crawling important pages early is a well studied problem. However, the availability of different types of framework for publishing web content greatly increases the number of web pages. Therefore, the crawler should be fast enough to prioritize and download the important pages. As the importance of a page is not known before or during its download, the crawler needs a great deal of time to approximate the importance to prioritize the download of the web pages. In this research, we propose Fractional PageRank crawlers that prioritize the downloaded pages for the purpose of discovering important URLs early during the crawl. Our experiments demonstrate that they improve the running time dramatically while crawling the important pages early.

This work was supported by grant No. RO1-2006-000-10510-0 from the Basic Research Program of the Korea Science & Engineering Foundation.

This is a preview of subscription content, access via your institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (Canada)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (Canada)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (Canada)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30, 107–117 (1998)

    CrossRef  Google Scholar 

  2. Cho, J., Schonfeld, U.: Rankmass crawler: a crawler with high personalized pagerank coverage guarantee. In: Proceedings of the 33rd International Conference on Very Large Data Bases, pp. 375–386 (2007)

    Google Scholar 

  3. Web Collection Uk-2006-06. Crawled by the Laboratory for Web Algorithmics, University of Millan, http://law.dsi.unimi.it/

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alam, M.H., Ha, J., Lee, S. (2009). Fractional PageRank Crawler: Prioritizing URLs Efficiently for Crawling Important Pages Early. In: Zhou, X., Yokota, H., Deng, K., Liu, Q. (eds) Database Systems for Advanced Applications. DASFAA 2009. Lecture Notes in Computer Science, vol 5463. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00887-0_52

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-00887-0_52

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00886-3

  • Online ISBN: 978-3-642-00887-0

  • eBook Packages: Computer ScienceComputer Science (R0)