MultiCrawler: A Pipelined Architecture for Crawling and Indexing Semantic Web Data

  • Andreas Harth
  • Jürgen Umbrich
  • Stefan Decker
Conference paper

DOI: 10.1007/11926078_19

Part of the Lecture Notes in Computer Science book series (LNCS, volume 4273)
Cite this paper as:
Harth A., Umbrich J., Decker S. (2006) MultiCrawler: A Pipelined Architecture for Crawling and Indexing Semantic Web Data. In: Cruz I. et al. (eds) The Semantic Web - ISWC 2006. ISWC 2006. Lecture Notes in Computer Science, vol 4273. Springer, Berlin, Heidelberg

Abstract

The goal of the work presented in this paper is to obtain large amounts of semistructured data from the web. Harvesting semistructured data is a prerequisite for large-scale query answering over web sources. We contrast our approach with that of conventional web crawlers, and describe and evaluate a five-step pipelined architecture to crawl and index data from both the traditional and the Semantic Web.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Andreas Harth (1)
  • Jürgen Umbrich (1)
  • Stefan Decker (1)
  1. Digital Enterprise Research Institute, National University of Ireland, Galway