Multimedia Tools and Applications, Volume 75, Issue 3, pp 1563–1587

Focussed crawling of environmental Web resources based on the combination of multimedia evidence

  • Theodora Tsikrika
  • Anastasia Moumtzidou
  • Stefanos Vrochidis
  • Ioannis Kompatsiaris

DOI: 10.1007/s11042-015-2624-3

Cite this article as:
Tsikrika, T., Moumtzidou, A., Vrochidis, S. et al. Multimed Tools Appl (2016) 75: 1563. doi:10.1007/s11042-015-2624-3

Abstract

Focussed crawlers enable the automatic discovery of Web resources about a given topic by navigating the Web link structure and selecting which hyperlinks to follow, estimating their relevance to the topic from evidence obtained from the already downloaded pages. This work proposes a classifier-guided focussed crawling approach that estimates the relevance of a hyperlink to an unvisited Web resource based on the combination of textual evidence representing its local context, namely the textual content appearing in its vicinity in the parent page, with visual evidence associated with its global context, namely the presence of images relevant to the topic within the parent page. The proposed focussed crawling approach is applied towards the discovery of environmental Web resources that provide air quality measurements and forecasts, since such measurements (and particularly the forecasts) are not only provided in textual form, but are also commonly encoded as multimedia, mainly in the form of heatmaps. Our evaluation experiments indicate the effectiveness of incorporating visual evidence in the link selection process applied by the focussed crawler over the use of textual features alone, particularly in conjunction with hyperlink exploration strategies that allow for the discovery of highly relevant pages that lie behind apparently irrelevant ones.
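
The link selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifiers or how the two kinds of evidence are fused, so the weighted sum, the `text_weight` value, and all names below (`combined_link_score`, `Frontier`) are assumptions made purely for illustration. The textual score stands in for a classifier's output on the text surrounding a hyperlink, and the boolean flag stands in for an image classifier's decision on whether the parent page contains a topic-relevant image such as a heatmap.

```python
import heapq


def combined_link_score(text_score, parent_has_relevant_image, text_weight=0.7):
    """Fuse textual (local) and visual (global) evidence into one link score.

    Assumption: a simple weighted sum; the paper may combine evidence differently.
    text_score is expected in [0, 1]; the visual evidence is treated as binary.
    """
    visual_score = 1.0 if parent_has_relevant_image else 0.0
    return text_weight * text_score + (1.0 - text_weight) * visual_score


class Frontier:
    """Priority queue of unvisited URLs, highest estimated relevance first."""

    def __init__(self):
        self._heap = []      # entries are (-score, url), since heapq is a min-heap
        self._seen = set()   # URLs already queued, to avoid duplicates

    def push(self, url, score):
        if url in self._seen:
            return
        self._seen.add(url)
        heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


# Hypothetical links extracted from two parent pages.
frontier = Frontier()
frontier.push("http://example.org/forecast", combined_link_score(0.4, True))
frontier.push("http://example.org/about", combined_link_score(0.5, False))
first_url, first_score = frontier.pop()
```

In this sketch the link found on a page containing a relevant image is visited first even though its surrounding text scored lower, which is the kind of effect the abstract attributes to adding visual evidence.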

Keywords

  • Focussed crawling
  • Environmental data
  • Link context
  • Image classification
  • Heatmaps

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Theodora Tsikrika (1)
  • Anastasia Moumtzidou (1)
  • Stefanos Vrochidis (1)
  • Ioannis Kompatsiaris (1)

  1. Information Technologies Institute, CERTH, Thessaloniki, Greece