Data Integration under Integrity Constraints

  • Andrea Calì
  • Diego Calvanese
  • Giuseppe De Giacomo
  • Maurizio Lenzerini


Data integration systems provide access to a set of heterogeneous, autonomous data sources through a so-called global schema. There are two basic approaches to designing a data integration system. In the global-centric approach, the elements of the global schema are defined as views over the sources, whereas in the local-centric approach, the sources are characterized as views over the global schema. It is well known that processing queries in the latter approach is similar to query answering with incomplete information and is therefore a complex task. By contrast, it is commonly held that query processing is much easier in the former approach. In this paper we show the surprising result that, when the global schema is expressed in the relational model with integrity constraints, even of simple types, the problem of incomplete information implicitly arises, making query processing difficult in the global-centric approach as well. We then focus on global schemas with key and foreign key constraints, a situation that is very common in practice, and we illustrate techniques for effectively answering queries posed to the data integration system in this case.
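How a foreign key constraint can introduce incomplete information in the global-centric setting may be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the global relations employee(name, dept) and dept(code, manager), the sources s1 and s2, their data, the helper functions, and the assumption that sources are sound (they may be missing tuples). It is not the technique the paper presents, only a toy instance of the phenomenon the abstract describes.

```python
from itertools import count

# Hypothetical source extensions (illustrative data only).
s1 = {("alice", "d1"), ("bob", "d2")}  # source defining employee(name, dept)
s2 = {("d2", "carol")}                 # source defining dept(code, manager)


class Null:
    """A labeled null standing for a value that exists but is unknown."""
    _ids = count(1)

    def __init__(self):
        self.label = f"_N{next(self._ids)}"

    def __repr__(self):
        return self.label


def retrieve():
    """Global-centric mapping: each global relation is a view over the sources."""
    return {"employee": set(s1), "dept": set(s2)}


def repair_foreign_key(db):
    """Add the dept tuples required by the assumed foreign key
    employee.dept -> dept.code, using a labeled null for the manager."""
    completed = {"employee": set(db["employee"]), "dept": set(db["dept"])}
    known = {code for code, _ in db["dept"]}
    for _, dept_code in sorted(db["employee"]):
        if dept_code not in known:
            completed["dept"].add((dept_code, Null()))
            known.add(dept_code)
    return completed


def dept_codes(db):
    """Query q(c) :- dept(c, m)."""
    return {code for code, _ in db["dept"]}


def dept_managers(db):
    """Query q(m) :- dept(c, m), keeping only known constants."""
    return {m for _, m in db["dept"] if not isinstance(m, Null)}


naive = dept_codes(retrieve())                        # plain view unfolding
certain = dept_codes(repair_foreign_key(retrieve()))  # constraint taken into account
managers = dept_managers(repair_foreign_key(retrieve()))
```

Plain unfolding of the views misses department d1, because no source tuple for dept mentions it. The foreign key nevertheless guarantees that some dept tuple with code d1 exists in every global database consistent with the sources, so d1 is an answer to the first query, while the unknown manager of d1 never appears as an answer to the second.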


Keywords: Logic Program, Query Processing, Global Schema, Integrity Constraint, Relation Symbol



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Andrea Calì (1)
  • Diego Calvanese (1)
  • Giuseppe De Giacomo (1)
  • Maurizio Lenzerini (1)

  1. Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”, Roma, Italy
