MultiCrawler: A Pipelined Architecture for Crawling and Indexing Semantic Web Data
Harth A., Umbrich J., Decker S. (2006) MultiCrawler: A Pipelined Architecture for Crawling and Indexing Semantic Web Data. In: Cruz I. et al. (eds) The Semantic Web - ISWC 2006. Lecture Notes in Computer Science, vol 4273. Springer, Berlin, Heidelberg
The goal of the work presented in this paper is to obtain large amounts of semistructured data from the web. Harvesting semistructured data is a prerequisite for large-scale query answering over web sources. We contrast our approach with conventional web crawlers, and we describe and evaluate a five-step pipelined architecture that crawls and indexes data from both the traditional web and the Semantic Web.
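The abstract does not name the five steps, so the following is only a generic sketch of what "pipelined" means here, not the MultiCrawler implementation. It assumes five hypothetical stages (`fetch`, `detect`, `transform`, `extract_links`, `index`) and an in-memory `FAKE_WEB` dictionary standing in for real HTTP requests. Each stage runs in its own thread and hands results to the next stage through a bounded queue, so different documents occupy different stages at the same time.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_DONE = object()  # sentinel that marks the end of the input stream

def run_pipeline(items: Iterable[Any], stages: List[Callable[[Any], Any]]) -> List[Any]:
    """Run each stage in its own thread, linked by bounded FIFO queues.

    While stage k works on item i, stage k+1 can already work on item i-1;
    bounded queues add back-pressure so a fast stage cannot run far ahead.
    """
    pipes = [queue.Queue(maxsize=8) for _ in range(len(stages) + 1)]

    def worker(fn, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # propagate shutdown downstream
                return
            outbox.put(fn(item))

    def feed():  # separate feeder thread, so the main thread can drain results concurrently
        for item in items:
            pipes[0].put(item)
        pipes[0].put(_DONE)

    threads = [threading.Thread(target=worker, args=(fn, pipes[i], pipes[i + 1]))
               for i, fn in enumerate(stages)]
    threads.append(threading.Thread(target=feed))
    for t in threads:
        t.start()

    results = []
    while (item := pipes[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results

# --- Hypothetical stages; names and formats are illustrative assumptions ---
FAKE_WEB = {  # simulated web: URL -> raw content (no network access)
    "http://example.org/a.rdf": "<rdf> ex:a ex:knows http://example.org/b.html",
    "http://example.org/b.html": "<html> Hello world",
}

def fetch(url):          # stage 1: retrieve raw content
    return url, FAKE_WEB[url]

def detect(doc):         # stage 2: guess the format from a content prefix
    url, raw = doc
    return url, ("rdf" if raw.startswith("<rdf>") else "html"), raw

def transform(doc):      # stage 3: convert content to (subject, predicate, object) triples
    url, fmt, raw = doc
    body = raw.split(" ", 1)[1]
    if fmt == "rdf":
        return url, [tuple(body.split())]
    return url, [(url, "ex:text", body)]

def extract_links(doc):  # stage 4: collect objects that look like HTTP URIs
    url, triples = doc
    return url, triples, [o for _, _, o in triples if o.startswith("http")]

def index(doc):          # stage 5: emit an index entry keyed by source URL
    url, triples, links = doc
    return {"url": url, "subjects": sorted({s for s, _, _ in triples}), "links": links}

if __name__ == "__main__":
    for entry in run_pipeline(list(FAKE_WEB), [fetch, detect, transform, extract_links, index]):
        print(entry)
```

Because each stage is a single thread reading a FIFO queue, output order matches input order. Error handling is deliberately omitted: if a stage raises, its thread dies and the pipeline stalls, which a real crawler would need to handle.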