Reference Work Entry

Encyclopedia of Database Systems

pp 3465-3471

Web Data Extraction System

  • Robert Baumgartner, affiliated with Vienna University of Technology and Lixto Software GmbH
  • Wolfgang Gatterbauer, affiliated with University of Washington
  • Georg Gottlob, affiliated with Oxford University

Synonyms

Web information extraction system; Wrapper generator; Web macros; Web scraper

Definition

A web data extraction system is a software system that automatically and repeatedly extracts data from web pages with changing content and delivers the extracted data to a database or some other application. The task of web data extraction performed by such a system is usually divided into five functions: (i) web interaction, which mainly comprises navigating to the target web pages, typically predetermined, that contain the desired information; (ii) support for wrapper generation and execution, where a wrapper is a program that identifies the desired data on target pages, extracts the data, and transforms it into a structured format; (iii) scheduling, which allows previously generated wrappers to be applied repeatedly to their respective target pages; (iv) data transformation, which includes filtering, transforming, refining, and integrating data extracted from one or mo ...
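The following is a minimal sketch of how the functions listed above might fit together. It is a toy illustration under stated assumptions and does not represent the design of any particular system.

All names are hypothetical: `TableWrapper`, `wrap`, `transform`, `deliver`, `run_schedule`, and the `product` table. Web interaction is only stubbed. The "changing" target page comes from an in-memory iterator rather than an HTTP fetch.

The wrapper is hand-written for one fixed table layout. Real systems typically also support generating wrappers, which this sketch does not attempt. The definition above is truncated before its fifth function, so that function is not modelled here.

```python
import sqlite3
import time
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional


class TableWrapper(HTMLParser):
    """Hypothetical wrapper: collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row[-1] += data.strip()


def wrap(html: str) -> List[Dict[str, str]]:
    """Wrapper execution: identify the cells and map each row to a structured record."""
    parser = TableWrapper()
    parser.feed(html)
    parser.close()
    return [{"name": r[0], "price": r[1]} for r in parser.rows if len(r) >= 2]


def transform(records: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Data transformation: filter out rows whose price is not numeric, convert the rest to float."""
    out: List[Dict[str, object]] = []
    for rec in records:
        try:
            out.append({"name": rec["name"], "price": float(rec["price"])})
        except ValueError:
            continue  # drop records with an unparsable price
    return out


def deliver(conn: sqlite3.Connection, records: List[Dict[str, object]]) -> None:
    """Delivery: store the structured records in a database table."""
    conn.executemany("INSERT INTO product(name, price) VALUES (:name, :price)", records)
    conn.commit()


def run_schedule(fetch: Callable[[], str], conn: sqlite3.Connection,
                 runs: int, interval_s: float) -> None:
    """Scheduling: re-apply the same wrapper to the (changing) target page `runs` times."""
    for i in range(runs):
        deliver(conn, transform(wrap(fetch())))
        if i < runs - 1:
            time.sleep(interval_s)


# Usage: two successive versions of the same (simulated) target page.
pages = iter([
    "<table><tr><td>Widget</td><td>9.50</td></tr><tr><td>Gadget</td><td>n/a</td></tr></table>",
    "<table><tr><td>Widget</td><td>9.75</td></tr></table>",
])
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product(name TEXT, price REAL)")
run_schedule(lambda: next(pages), conn, runs=2, interval_s=0.0)
rows = conn.execute("SELECT name, price FROM product ORDER BY rowid").fetchall()
print(rows)  # [('Widget', 9.5), ('Widget', 9.75)]
```

Separating `wrap`, `transform`, and `deliver` mirrors the division of functions in the definition. A scheduler can re-run an unchanged wrapper while the page content changes between runs. The "Gadget" row is discarded by the transformation step, not by the wrapper.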
