International Conference on Web Information Systems Engineering

Web Information Systems Engineering – WISE 2015 pp 554-569

Adaptive Focused Crawling of Linked Data

  • Ran Yu
  • Ujwal Gadiraju
  • Besnik Fetahu
  • Stefan Dietze
Conference paper

DOI: 10.1007/978-3-319-26190-4_37

Part of the Lecture Notes in Computer Science book series (LNCS, volume 9418)
Cite this paper as:
Yu R., Gadiraju U., Fetahu B., Dietze S. (2015) Adaptive Focused Crawling of Linked Data. In: Wang J. et al. (eds) Web Information Systems Engineering – WISE 2015. Lecture Notes in Computer Science, vol 9418. Springer, Cham

Abstract

Given the evolution of publicly available Linked Data, crawling and preservation have become increasingly important challenges. Due to the scale of available data on the Web, efficient focused crawling approaches are required that can capture the relevant semantic neighborhood of seed entities. Here, determining which entities are relevant to a given set of seed entities is a crucial problem. Because the weights of seeds within a seed list vary significantly with respect to the crawl intent, we argue that an adaptive crawler is required, one that considers such characteristics when configuring the crawling and relevance detection approach. To address this problem, we introduce a crawling configuration that considers seed list-specific features as part of its crawling and ranking algorithm. We evaluate it through extensive experiments in comparison to a number of baseline methods and crawling parameters. We demonstrate that configurations which consider seed list features outperform the baselines, and we present further insights gained from our experiments.

Keywords

Focused crawling; Linked data; Relevance assessment

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Ran Yu (1)
  • Ujwal Gadiraju (1)
  • Besnik Fetahu (1)
  • Stefan Dietze (1)
  1. L3S Research Center, Leibniz Universität Hannover, Hannover, Germany
