Scalability of Source Identification in Data Integration Systems

  • François Boisson
  • Michel Scholl
  • Imen Sebei
  • Dan Vodislav
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4879)


Given a large number of data sources, each indexed by attributes from a predefined set \(\mathcal{A}\), and a query q over a subset Q of \(\mathcal{A}\) containing k attributes, we are interested in identifying all combinations of sources such that the union of their attributes covers Q. Each combination c may lead to a rewriting of q as a join over the sources in c. Furthermore, to limit redundancy and combinatorial explosion, we want each combination of sources to produce a minimal cover of Q. Although this work is motivated by query rewriting in OpenXView [3], an XML data integration system with a large number of XML sources, we believe that the solutions provided in this paper apply to other scalable data integration schemes. We focus on the case where the number of sources is very large while the size of queries is small. We propose a novel algorithm for computing the set of minimal covers of a query and experimentally evaluate its performance.
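To make the problem statement concrete, the following Python sketch enumerates minimal covers by brute force. It is not the algorithm proposed in the paper, only an illustrative baseline. It assumes that a cover is minimal when no source can be removed from it without losing coverage of Q; the function name `minimal_covers`, the dictionary representation of sources, and the example data are hypothetical.

```python
from itertools import combinations


def minimal_covers(sources, query):
    """Enumerate minimal covers of `query` by brute force (illustrative only).

    sources: dict mapping a source name to the set of attributes it indexes.
    query:   set of attributes Q to be covered.
    Assumption: a cover is minimal if removing any one source breaks coverage.
    """
    query = frozenset(query)
    # Restrict every source to the attributes that matter for this query,
    # and drop sources that contribute nothing.
    relevant = {name: frozenset(attrs) & query for name, attrs in sources.items()}
    relevant = {name: attrs for name, attrs in relevant.items() if attrs}
    names = sorted(relevant)

    def covered(combo):
        # Union of the query attributes provided by the sources in combo.
        return frozenset().union(*(relevant[n] for n in combo))

    covers = []
    # In a minimal cover each source provides some attribute that no other
    # source in the cover provides, so a minimal cover has at most k = |Q| sources.
    for size in range(1, min(len(query), len(names)) + 1):
        for combo in combinations(names, size):
            if covered(combo) != query:
                continue
            # Keep the cover only if every source in it is necessary.
            if all(covered([m for m in combo if m != n]) != query for n in combo):
                covers.append(set(combo))
    return covers


# Hypothetical example data: five sources over attributes a, b, c, d.
example_sources = {
    "s1": {"a", "b"},
    "s2": {"b", "c"},
    "s3": {"c"},
    "s4": {"a", "b", "c"},
    "s5": {"d"},
}
example_covers = minimal_covers(example_sources, {"a", "b", "c"})
print(example_covers)
```

With n relevant sources and a query of k attributes, this enumeration examines on the order of n^k combinations, which illustrates why the setting with very many sources and small queries calls for a dedicated algorithm.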


Equivalence Class · Global Schema · Minimal Cover · Answering Query · Data Integration System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Amann, B., Beeri, C., Fundulaki, I., Scholl, M.: Querying XML sources using an ontology-based mediator. In: Meersman, R., Tari, Z., et al. (eds.) CoopIS 2002, DOA 2002, and ODBASE 2002. LNCS, vol. 2519, pp. 429–448. Springer, Heidelberg (2002)
  2. Baru, C.K., Gupta, A., Ludäscher, B., Marciano, R., Papakonstantinou, Y., Velikhov, P., Chu, V.: XML-Based Information Mediation with MIX. In: SIGMOD (1999)
  3. Boisson, F., Scholl, M., Sebei, I., Vodislav, D.: Query rewriting for open XML data integration systems. In: IADIS WWW/Internet (2006)
  4. Deutsch, A., Katsis, Y., Papakonstantinou, Y.: Determining source contribution in integration systems. In: PODS (2005)
  5. Halevy, A.: Answering queries using views: A survey. The VLDB Journal, 270–294 (2001)
  6. Josifovski, V., Schwarz, P., Haas, L., Lin, E.: Garlic: a new flavor of federated query processing for DB2. In: SIGMOD (2002)
  7. Lenzerini, M.: Data integration: a theoretical perspective. In: PODS (2002)
  8. Levy, A., Mendelzon, A., Sagiv, Y., Srivastava, D.: Answering queries using views. In: PODS (1995)
  9. Macula, A.J.: Covers of a finite set. Math. Mag. 67, 141–144 (1994)
  10. Pottinger, R., Halevy, A.: MiniCon: A scalable algorithm for answering queries using views. The VLDB Journal, 182–198 (2001)
  11. Vodislav, D., Cluet, S., Corona, G., Sebei, I.: Views for simplifying access to heterogeneous XML data. In: CoopIS (2006)
  12. Yu, C., Popa, L.: Constraint-based XML query rewriting for data integration. In: SIGMOD (2004)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • François Boisson (1)
  • Michel Scholl (1)
  • Imen Sebei (1)
  • Dan Vodislav (1)
  1. CNAM/CEDRIC, Paris, France
