- Wolfgang Gatterbauer, University of Washington, Seattle, WA, USA
Web harvesting describes the process of gathering and integrating data from various heterogeneous web sources. Necessary input is an appropriate knowledge representation of the domain of interest (e.g., an ontology), together with example instances of concepts or relationships (seed knowledge). Output is structured data (e.g., in the form of a relational database) that is gathered from the Web. The term harvesting implies that, while passing over a large body of available information, the process gathers only such information that lies in the domain of interest and is, as such, relevant.
The process of web harvesting can be divided into three successive tasks: (i) data or information retrieval, which involves finding relevant information on the Web and storing it locally. This task requires tools for searching and navigating the Web, i.e., crawlers and means for interacting ...
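To make task (i) concrete, the following is a minimal sketch of one building block of a crawler: extracting the outgoing links of a fetched page so they can be added to the crawl frontier. The class and function names are illustrative assumptions, not part of the entry; a real harvester would fetch pages over the network (e.g., with `urllib.request`) and keep only links relevant to the domain of interest.

```python
# Sketch of a crawler component for task (i), data retrieval:
# collect absolute URLs from the anchor tags of an already-fetched page.
# (Names such as LinkExtractor and extract_links are hypothetical.)
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


page = '<a href="/papers/1.html">Paper 1</a> <a href="http://other.org/x">X</a>'
print(extract_links(page, "http://example.org/index.html"))
# → ['http://example.org/papers/1.html', 'http://other.org/x']
```

A crawler would repeatedly pop a URL from the frontier, fetch it, store the page locally, and enqueue the extracted links that pass a relevance filter, which is where the domain knowledge (the ontology and seed instances) comes in.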
- Web Harvesting
- Reference Work Title: Encyclopedia of Database Systems, pp 3472–3473
- Publisher: Springer US
- Copyright Holder: Springer Science+Business Media, LLC
- Editor Affiliations: College of Computing, Georgia Institute of Technology; Database Research Group, David R. Cheriton School of Computer Science, University of Waterloo