Abstract
Given a large number of data sources, each indexed by attributes from a predefined set \(\cal{A}\), and a query q over a subset Q of \(\cal{A}\) of size k, we are interested in identifying all combinations of sources whose attributes together cover Q. Each combination c may lead to a rewriting of q as a join over the sources in c. Furthermore, to limit redundancy and combinatorial explosion, we require each combination of sources to be a minimal cover of Q. Although this work is motivated by query rewriting in OpenXView [3], an XML data integration system with a large number of XML sources, we believe the solutions presented in this paper apply to other scalable data integration schemes. We focus on the case where the number of sources is very large while queries are small. We propose a novel algorithm for computing the set of minimal covers of a query and evaluate its performance experimentally.
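The abstract does not describe the proposed algorithm, so the following Python sketch is only a brute-force reference point, not the authors' method. It assumes that each source is given as a set of attribute names, that a cover is a set of sources whose attributes include every attribute of Q, and that a cover is minimal when no source can be removed from it without leaving some attribute of Q uncovered. The function names minimal_covers and _is_minimal and the example sources S1 to S5 are hypothetical. The sketch leans on the small-query assumption: sources are first grouped by their projection onto Q, of which there are at most 2^k - 1 nonempty ones, so the combinatorial search depends on k rather than on the number of sources.

```python
from collections import defaultdict
from itertools import combinations, product


def _is_minimal(masks, full):
    """True if removing any single member leaves some query attribute uncovered."""
    for i in range(len(masks)):
        rest = 0
        for j, m in enumerate(masks):
            if j != i:
                rest |= m
        if rest == full:
            return False
    return True


def minimal_covers(query, sources):
    """Brute-force enumeration of the minimal covers of `query` (hypothetical helper).

    query   -- iterable of attribute names (the set Q, |Q| = k)
    sources -- dict mapping a source name to its set of attributes
    Returns a list of frozensets of source names.
    """
    attrs = sorted(set(query))
    bit = {a: 1 << i for i, a in enumerate(attrs)}
    full = (1 << len(attrs)) - 1

    # Project each source onto Q as a bitmask and group sources by projection.
    groups = defaultdict(list)
    for name, source_attrs in sources.items():
        mask = 0
        for a in source_attrs:
            mask |= bit.get(a, 0)
        if mask:  # a source sharing no attribute with Q is never needed
            groups[mask].append(name)

    projections = list(groups)
    covers = []
    # Each member of a minimal cover owns at least one attribute no other member
    # covers, so a minimal cover of a k-element set has at most k members.
    for size in range(1, len(attrs) + 1):
        for combo in combinations(projections, size):
            union = 0
            for m in combo:
                union |= m
            if union == full and _is_minimal(combo, full):
                # Any choice of one source per projection is a minimal source cover.
                for choice in product(*(groups[m] for m in combo)):
                    covers.append(frozenset(choice))
    return covers


if __name__ == "__main__":
    sources = {
        "S1": {"title", "author"},
        "S2": {"year"},
        "S3": {"author", "year"},
        "S4": {"title"},
        "S5": {"publisher"},  # irrelevant to the query below
    }
    for cover in minimal_covers({"title", "author", "year"}, sources):
        print(sorted(cover))
    # prints ['S1', 'S2'], then ['S1', 'S3'], then ['S3', 'S4']
```

Two sources with the same projection onto Q can never appear together in a minimal cover, which is why the final step simply picks one source per projection. That expansion costs time proportional to the number of covers returned, which can itself be large when many sources share a projection.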
References
[1] Amann, B., Beeri, C., Fundulaki, I., Scholl, M.: Querying XML sources using an ontology-based mediator. In: Meersman, R., Tari, Z., et al. (eds.) CoopIS 2002, DOA 2002, and ODBASE 2002. LNCS, vol. 2519, pp. 429–448. Springer, Heidelberg (2002)
[2] Baru, C.K., Gupta, A., Ludäscher, B., Marciano, R., Papakonstantinou, Y., Velikhov, P., Chu, V.: XML-Based Information Mediation with MIX. In: SIGMOD (1999)
[3] Boisson, F., Scholl, M., Sebei, I., Vodislav, D.: Query rewriting for open XML data integration systems. In: IADIS WWW/Internet (2006)
[4] Deutsch, A., Katsis, Y., Papakonstantinou, Y.: Determining source contribution in integration systems. In: PODS (2005)
[5] Halevy, A.: Answering queries using views: A survey. The VLDB Journal, 270–294 (2001)
[6] Josifovski, V., Schwarz, P., Haas, L., Lin, E.: Garlic: a new flavor of federated query processing for DB2. In: SIGMOD (2002)
[7] Lenzerini, M.: Data integration: a theoretical perspective. In: PODS (2002)
[8] Levy, A., Mendelzon, A., Sagiv, Y., Srivastava, D.: Answering queries using views. In: PODS (1995)
[9] Macula, A.J.: Covers of a finite set. Math. Mag. 67, 141–144 (1994)
[10] Pottinger, R., Halevy, A.: MiniCon: A scalable algorithm for answering queries using views. The VLDB Journal, 182–198 (2001)
[11] Vodislav, D., Cluet, S., Corona, G., Sebei, I.: Views for simplifying access to heterogeneous XML data. In: CoopIS (2006)
[12] Yu, C., Popa, L.: Constraint-based XML query rewriting for data integration. In: SIGMOD (2004)
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boisson, F., Scholl, M., Sebei, I., Vodislav, D. (2009). Scalability of Source Identification in Data Integration Systems. In: Damiani, E., Yetongnon, K., Chbeir, R., Dipanda, A. (eds) Advanced Internet Based Systems and Applications. SITIS 2006. Lecture Notes in Computer Science, vol 4879. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01350-8_25
DOI: https://doi.org/10.1007/978-3-642-01350-8_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01349-2
Online ISBN: 978-3-642-01350-8
eBook Packages: Computer Science, Computer Science (R0)