Information Integration

  • Bing Liu
Chapter
Part of the Data-Centric Systems and Applications book series (DCSA)

Abstract

In Chap. 9, we studied data extraction from Web pages, where the extracted data is put in tables. For an application, however, it is often not sufficient to extract data from a single site. Instead, data from a large number of sites is gathered in order to provide value-added services. In such cases, extraction is only part of the story. The other part is the integration of the extracted data to produce a consistent and coherent database, because different sites typically use different data formats. Intuitively, integration means matching columns in different data tables that contain the same type of information (e.g., product names) and matching values that are semantically identical but represented differently on different Web sites (e.g., “Coke” and “Coca Cola”). Unfortunately, limited integration research has been done so far in this specific context. Much of the Web information integration research has instead focused on the integration of Web query interfaces, and several sections of this chapter are devoted to that topic. However, many of the ideas developed there are also applicable to the integration of extracted data because the problems are similar.
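
As a rough illustration of the two matching tasks named above, the following Python sketch pairs columns from two extracted tables by the overlap of their normalized values. It is a minimal sketch under stated assumptions, not the chapter's method: the synonym table `CANONICAL`, the helper names, the Jaccard measure, the threshold of 0.4, and the sample data are all hypothetical choices made for this example.

```python
# Hypothetical synonym table: maps variant spellings to one canonical form.
# In practice such a table would have to be built or learned; it is assumed here.
CANONICAL = {"coke": "coca cola", "coca-cola": "coca cola"}


def normalize(value):
    """Lowercase, trim, and map known variants to a canonical form."""
    key = value.strip().lower()
    return CANONICAL.get(key, key)


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_columns(table_a, table_b, threshold=0.4):
    """Pair each column of table_a with its most similar column of table_b.

    A pair is kept only if the overlap of normalized values reaches the
    (assumed) threshold; columns with no good partner are left unmatched.
    """
    matches = {}
    for name_a, values_a in table_a.items():
        set_a = {normalize(v) for v in values_a}
        best_name, best_score = None, 0.0
        for name_b, values_b in table_b.items():
            score = jaccard(set_a, {normalize(v) for v in values_b})
            if score > best_score:
                best_name, best_score = name_b, score
        if best_score >= threshold:
            matches[name_a] = best_name
    return matches


# Made-up tables standing in for data extracted from two different sites.
site_a = {"product": ["Coke", "Pepsi", "Sprite"], "price": ["1.50", "1.40", "1.30"]}
site_b = {"item name": ["Coca Cola", "Pepsi", "Fanta"], "cost": ["1.50", "1.45", "1.30"]}

print(match_columns(site_a, site_b))
# prints {'product': 'item name', 'price': 'cost'}
```

Here "Coke" and "Coca Cola" count as the same value only because the assumed synonym table says so. Comparing column contents rather than column labels is one possible design choice; it lets "product" and "item name" be paired even though the labels share no words.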

Keywords

Information Integration, Global Schema, Schema Match, Schema Element, Query Interface
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  1. Department of Computer Science, University of Illinois, Chicago, Chicago, USA
