Information Integration

doi:10.1007/978-3-540-37882-2_10

Part of the book series: Data-Centric Systems and Applications ((DCSA))

Abstract

In Chap. 9, we studied data extraction from Web pages. The extracted data is put in tables. For an application, it is, however, often not sufficient to extract data from only a single site. Instead, data from a large number of sites are gathered in order to provide value-added services. In such cases, extraction is only part of the story. The other part is the integration of the extracted data to produce a consistent and coherent database because different sites typically use different data formats. Intuitively, integration means to match columns in different data tables that contain the same type of information (e.g., product names) and to match values that are semantically identical but represented differently in different Web sites (e.g., “Coke” and “Coca Cola”). Unfortunately, limited integration research has been done so far in this specific context. Much of the Web information integration research has been focused on the integration of Web query interfaces. This chapter will have several sections on their integration. However, many ideas developed are also applicable to the integration of the extracted data because the problems are similar.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 44.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

(2007). Information Integration. In: Web Data Mining. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-37882-2_10

Download citation

DOI: https://doi.org/10.1007/978-3-540-37882-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37881-5
Online ISBN: 978-3-540-37882-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics