Synonyms
Personalized web; Web content mining; Web information integration and schema matching
Definition
Data integration in Web data extraction systems refers to the task of providing uniform access to multiple Web data sources. The ultimate goal of Web data integration is similar to that of data integration in database systems. The main difference, however, is that Web data sources (i.e., Websites) do not offer a structured data format that can be accessed and queried by means of a query language. Instead, Web data extraction systems need to provide an additional layer that transforms Web pages into (semi-)structured data sources. Typically, this layer provides an extraction mechanism that uses several kinds of evidence to extract data instances from the given Web pages: the inherent document structure of HTML pages (i.e., the Document Object Model), the content of the document (i.e., text), visual cues (i.e., formatting and layout), and the inter-document structure (i.e., hyperlinks). Due...
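The following Python sketch makes the extraction layer described above more concrete. It wraps two hypothetical Web sources that publish the same kind of data (book offers) in different HTML layouts. It relies only on document structure (tag names and `class` attributes) to extract data instances, maps each source's fields onto one uniform schema, and then answers a query over the integrated result. Everything in it is an illustrative assumption, not part of any system covered by this entry: the page contents, the `RowExtractor` class, the wrapper configurations, the uniform schema (`title`, `price`, `source`), and the premise that both sources quote prices in the same currency. Real extraction systems typically also draw on text content, visual cues, and hyperlinks.

```python
from html.parser import HTMLParser


class RowExtractor(HTMLParser):
    """Hypothetical minimal wrapper: every <record_tag class="record_class">
    element is one data instance, and the text of each <field_tag> element
    inside it is one field value, in document order."""

    def __init__(self, record_tag, record_class, field_tag):
        super().__init__()
        self.record_tag = record_tag
        self.record_class = record_class
        self.field_tag = field_tag
        self.records = []        # list of field-value lists, one per record
        self._in_record = False
        self._in_field = False
        self._fields = []
        self._text = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.record_tag and self.record_class in classes:
            self._in_record, self._fields = True, []
        elif self._in_record and tag == self.field_tag:
            self._in_field, self._text = True, []

    def handle_endtag(self, tag):
        if self._in_field and tag == self.field_tag:
            self._fields.append("".join(self._text).strip())
            self._in_field = False
        elif self._in_record and tag == self.record_tag:
            self.records.append(self._fields)
            self._in_record = False

    def handle_data(self, data):
        if self._in_field:           # ignore text outside field elements
            self._text.append(data)


def extract(html, record_tag, record_class, field_tag, field_names):
    """Run the wrapper on one page and label each field by its position."""
    parser = RowExtractor(record_tag, record_class, field_tag)
    parser.feed(html)
    parser.close()
    return [dict(zip(field_names, rec)) for rec in parser.records]


def to_uniform(record, source):
    """Map a source-specific record onto the uniform schema (title, price, source)."""
    price = float(record["price"].replace("EUR", "").strip())
    return {"title": record["title"], "price": price, "source": source}


def integrate(sources):
    """Extract and normalize the records of every (name, html, config) source."""
    return [to_uniform(rec, name)
            for name, html, config in sources
            for rec in extract(html, **config)]


def cheapest_by_title(rows):
    """A simple query over the integrated data: lowest-priced offer per title."""
    best = {}
    for row in rows:
        if row["title"] not in best or row["price"] < best[row["title"]]["price"]:
            best[row["title"]] = row
    return best


# Two hypothetical sources publishing the same kind of data in different layouts.
PAGE_A = """
<table>
  <tr class="item"><td>Database Systems</td><td>12.50 EUR</td></tr>
  <tr class="item"><td>Web Mining</td><td>30.00 EUR</td></tr>
</table>
"""
PAGE_B = """
<div class="offer"><span>9.99</span><span>Web Mining</span></div>
<div class="offer"><span>45.00</span><span>Database Systems</span></div>
"""
SOURCES = [
    ("shop-a", PAGE_A, {"record_tag": "tr", "record_class": "item",
                        "field_tag": "td", "field_names": ["title", "price"]}),
    ("shop-b", PAGE_B, {"record_tag": "div", "record_class": "offer",
                        "field_tag": "span", "field_names": ["price", "title"]}),
]

if __name__ == "__main__":
    for title, row in cheapest_by_title(integrate(SOURCES)).items():
        print(f"{title}: {row['price']} from {row['source']}")
```

In this sketch, all source-specific knowledge lives in small wrapper configurations, and the mapping to the uniform schema is shared. A new source can therefore be added by writing another configuration, with no change to the query. The parser is deliberately naive. For example, an element nested inside a record that has the same tag name as the record would end the record early. This is one reason practical systems work on full DOM trees with path expressions instead.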
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Herzog, M. (2018). Data Integration in Web Data Extraction System. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1161
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1161
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering