Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Data Integration

  • Paolo Papotti
  • Donatello Santoro
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_6-1



The goal of data integration systems is to provide uniform access to a set of heterogeneous data sources. These sources can differ in their data model (relational, hierarchical, semi-structured), in their schemas, or in their query-processing capabilities. In a data integration architecture, the sources are queried through a global schema, also called a mediated schema, which provides a virtual view of the underlying sources.
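The mediated architecture described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the entry's own method. It assumes a global-as-view (GAV) style mapping, in which the global relation is defined as the union of the wrapped sources. All names (`Event`, `wrap_relational`, `wrap_json`, `query_events_on`) and all sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """A tuple of the hypothetical global (mediated) relation Event."""
    name: str
    sport: str
    date: str  # ISO 8601 date, e.g. "2018-06-14"


# Hypothetical source 1: relational-style rows with their own column order.
RELATIONAL_ROWS = [
    ("2018-06-14", "Football", "Opening match"),
    ("2018-06-15", "Tennis", "Quarter final"),
]

# Hypothetical source 2: semi-structured records with nested, differently named fields.
JSON_RECORDS = [
    {"title": "City marathon", "info": {"discipline": "Athletics", "day": "2018-06-14"}},
]


# Wrappers translate each source's data model into tuples of the global schema.
def wrap_relational() -> Iterator[Event]:
    for day, sport, name in RELATIONAL_ROWS:
        yield Event(name=name, sport=sport, date=day)


def wrap_json() -> Iterator[Event]:
    for rec in JSON_RECORDS:
        info = rec["info"]
        yield Event(name=rec["title"], sport=info["discipline"], date=info["day"])


# GAV-style mapping (assumption): the global relation is the union of the wrapped sources.
WRAPPERS: list[Callable[[], Iterable[Event]]] = [wrap_relational, wrap_json]


def query_events_on(date: str) -> list[Event]:
    """Answer a query posed on the global schema.

    Nothing is materialized: the sources are read only when the query arrives,
    so the global relation behaves as a virtual view.
    """
    return [e for wrapper in WRAPPERS for e in wrapper() if e.date == date]


if __name__ == "__main__":
    for event in query_events_on("2018-06-14"):
        print(event)
```

The user writes the query once, against `Event`, and never sees the source-specific layouts. Under a local-as-view (LAV) design, each source would instead be described as a view over the global schema. Answering a query would then require rewriting it using those views rather than simply unfolding a union.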


Integrating data from different sources is a crucial step in many real-life applications, and the growth of structured data sources available on the Web is making this problem even more challenging. Consider, as an example, a Web application where users can query information about sports events scheduled on a particular day. In a traditional data management application, the information is stored in a database with a fixed schema (e.g., in a relational database management system) and retrieved by using a query....
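The traditional, single-database case described above can be illustrated with Python's built-in `sqlite3` module. The table name, columns, and rows are hypothetical. The sketch only shows that the schema is fixed in advance and that the application answers the user's request with a single query.

```python
import sqlite3

# A fixed relational schema for the hypothetical sports-events application.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (name TEXT NOT NULL, sport TEXT NOT NULL, event_date TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO events (name, sport, event_date) VALUES (?, ?, ?)",
    [
        ("Opening match", "Football", "2018-06-14"),
        ("Quarter final", "Tennis", "2018-06-15"),
        ("City marathon", "Athletics", "2018-06-14"),
    ],
)


def events_on(day: str) -> list[tuple[str, str]]:
    """Return (name, sport) pairs for events scheduled on the given ISO date."""
    cur = conn.execute(
        "SELECT name, sport FROM events WHERE event_date = ? ORDER BY name",
        (day,),
    )
    return cur.fetchall()


print(events_on("2018-06-14"))
# prints [('City marathon', 'Athletics'), ('Opening match', 'Football')]
```

This approach works because every record conforms to one schema known in advance. The integration setting differs because the same information is spread across sources whose data models and schemas do not agree.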


  1. Balakrishnan S, Halevy AY, Harb B, Lee H, Madhavan J, Rostamizadeh A, Shen W, Wilder K, Wu F, Yu C (2015) Applying webtables in practice. In: CIDRGoogle Scholar
  2. Bernstein PA, Madhavan J, Rahm E (2011) Generic schema matching, ten years later. PVLDB 4(11): 695–701Google Scholar
  3. Chakrabarti K, Chaudhuri S, Chen Z, Ganjam K, He Y, Redmond W (2016) Data services leveraging bing’s data assets. IEEE Data Eng Bull 39(3):15–28Google Scholar
  4. Crescenzi V, Mecca G, Merialdo P (2001) Roadrunner: towards automatic data extraction from large web sites. In: VLDB 2001, proceedings of 27th international conference on very large data bases, Roma, 11–14 Sept 2001, pp 109–118Google Scholar
  5. Doan A, Halevy A, Ives Z (2012) Principles of data integration, 1st edn. Morgan Kaufmann Publishers Inc., San FranciscoGoogle Scholar
  6. Franklin M, Halevy A, Maier D (2005) From databases to dataspaces: a new abstraction for information management. ACM SIGMOD Rec 34(4):27–33CrossRefGoogle Scholar
  7. Golshan B, Halevy AY, Mihaila GA, Tan W (2017) Data integration: after the teenage years. In: Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI symposium on principles of database systems, PODS 2017, Chicago, 14–19 May 2017, pp 101–106Google Scholar
  8. Halevy AY, Ives ZG, Suciu D, Tatarinov I (2003) Schema mediation in peer data management systems. In: Proceedings 19th international conference on data engineering, 2003. IEEE, pp 505–516Google Scholar
  9. Halevy A, Rajaraman A, Ordille J (2006) Data integration: the teenage years. In: Proceedings of the 32nd international conference on very large data bases, VLDB Endowment, VLDB’06, pp 9–16Google Scholar
  10. Ives ZG, Florescu D, Friedman M, Levy A, Weld DS (1999) An adaptive query execution system for data integration. In: Proceedings of the 1999 ACM SIGMOD international conference on management of data, SIGMOD’99. ACM, New York, pp 299–310. https://doi.org/10.1145/304182.304209 CrossRefGoogle Scholar
  11. Ives ZG, Halevy AY, Weld DS (2004) Adapting to source properties in processing data integration queries. In: Proceedings of the 2004 ACM SIGMOD international conference on management of data, SIGMOD’04, Paris, 13–18 June 2004. ACM, New York, pp 395–406. https://doi.org/10.1145/1007568.1007613 Google Scholar
  12. Lenzerini M (2002) Data integration: a theoretical perspective. In: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems, PODS’02. ACM, New York, pp 233–246. https://doi.org/10.1145/543613.543644 CrossRefGoogle Scholar
  13. Liu L, Zsu MT (2009) Encyclopedia of database systems, 1st edn. Springer, Incorporated, New York/LondonCrossRefGoogle Scholar
  14. Popa L, Velegrakis Y, Miller RJ, Hernández MA, Fagin R (2002) Translating web data. In: Proceedings of 28th international conference on very large data bases, VLDB 2002, Hong Kong, 20–23 Aug 2002, pp 598–609CrossRefGoogle Scholar
  15. Pottinger R, Halevy A (2001) Minicon: a scalable algorithm for answering queries using views. VLDB J 10(2–3):182–198zbMATHGoogle Scholar
  16. Tatarinov I, Ives Z, Madhavan J, Halevy A, Suciu D, Dalvi N, Dong XL, Kadiyska Y, Miklau G, Mork P (2003) The piazza peer data management project. ACM SIGMOD Rec 32(3):47–52CrossRefGoogle Scholar

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Data Science Department, Eurecom, Biot, France
  2. Dipartimento di Matematica, Informatica ed Economia, Università degli Studi della Basilicata, Potenza, Italy

Section editors and affiliations

  • Maik Thiele, Database Systems Group, Technische Universität Dresden, Dresden, Germany