Focused Web Crawling

  • Reference work entry
  • In: Encyclopedia of Database Systems

Recommended Reading

  1. Babaria R, Saketha Nath J, Krishnan S, Sivaramakrishnan KR, Bhattacharyya C, Murty MN. Focused crawling with scalable ordinal regression solvers. In: Proceedings of the 24th International Conference on Machine Learning; 2007. p. 57–64.

  2. Broder A et al. Graph structure in the web: experiments and models. In: Proceedings of the 9th International World Wide Web Conference; 2000. p. 309–20.

  3. Chakrabarti S. Mining the web: discovering knowledge from hypertext data. Morgan Kaufmann; 2002.

  4. Chakrabarti S, Dom B, Indyk P. Enhanced hypertext categorization using hyperlinks. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1998. p. 307–18.

  5. Chakrabarti S, van den Berg M, Dom B. Focused crawling: a new approach to topic-specific web resource discovery. Comput Netw. 1999;31(11–16):1623–40.

  6. Chakrabarti S, Joshi MM, Punera K, Pennock DM. The structure of broad topics on the web. In: Proceedings of the 11th International World Wide Web Conference; 2002. p. 251–62.

  7. Chakrabarti S, Punera K, Subramanyam M. Accelerated focused crawling through online relevance feedback. In: Proceedings of the 11th International World Wide Web Conference; 2002. p. 148–59.

  8. Cho J, Garcia-Molina H, Page L. Efficient crawling through URL ordering. In: Proceedings of the 7th International World Wide Web Conference; 1998. p. 161–72.

  9. Davison BD. Topical locality in the web. In: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2000. p. 272–9.

  10. Diligenti M, Coetzee F, Lawrence S, Giles CL, Gori M. Focused crawling using context graphs. In: Proceedings of the 26th International Conference on Very Large Data Bases; 2000. p. 527–34.

  11. Dill S, Ravi Kumar S, McCurley KS, Rajagopalan S, Sivakumar D, Tomkins A. Self-similarity in the web. In: Proceedings of the 27th International Conference on Very Large Data Bases; 2001. p. 69–78.

  12. Hersovici M, Jacovi M, Maarek YS, Pelleg D, Shtalhaim M, Ur S. The shark-search algorithm – an application: tailored web site mapping. In: Proceedings of the 7th International World Wide Web Conference; 1998. p. 317–26.

  13. Lafferty J, McCallum A, Pereira F. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the 18th International Conference on Machine Learning; 2001. p. 282–9.

  14. Najork M, Wiener JL. Breadth-first search crawling yields high-quality pages. In: Proceedings of the 10th International World Wide Web Conference; 2001. p. 114–8.

  15. Pandey S, Olston C. User-centric web crawling. In: Proceedings of the 14th International World Wide Web Conference; 2005. p. 401–11.

  16. Page L, Brin S, Motwani R, Winograd T. The PageRank citation ranking: bringing order to the web. Manuscript, Stanford University; 1998.

  17. Rennie J, McCallum A. Using reinforcement learning to spider the web efficiently. In: Proceedings of the 16th International Conference on Machine Learning; 1999. p. 335–43.

  18. Sutton RS, Barto AG. Reinforcement learning: an introduction. Cambridge, MA: MIT Press; 1998.

  19. Vinod Vydiswaran VG, Sarawagi S. Learning to extract information from large websites using sequential models. In: Proceedings of the 11th International Conference on Management of Data; 2005. p. 3–14.

  20. Wikipedia page on Focused Crawling at http://en.wikipedia.org/wiki/Focused_crawler

Author information

Correspondence to Soumen Chakrabarti.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Chakrabarti, S. (2018). Focused Web Crawling. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_165