Scalability of Source Identification in Data Integration Systems

  • Conference paper
Advanced Internet Based Systems and Applications (SITIS 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4879)

Abstract

Given a large number of data sources, each indexed by attributes from a predefined set \(\mathcal{A}\), and a query q over a subset Q of \(\mathcal{A}\) containing k attributes, we are interested in identifying all combinations of sources such that the union of their attributes covers Q. Each combination c may lead to a rewriting of q as a join over the sources in c. Furthermore, to limit redundancy and combinatorial explosion, we want each combination of sources to form a minimal cover of Q. Although this work is motivated by query rewriting in OpenXView [3], an XML data integration system with a large number of XML sources, we believe that the solutions provided in this paper apply to other scalable data integration schemes. We focus on the case where the number of sources is very large while the size of queries is small. We propose a novel algorithm for computing the set of minimal covers of a query and experimentally evaluate its performance.
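To make the problem statement concrete, the following is a minimal brute-force sketch of minimal-cover enumeration. It is not the algorithm proposed in the paper, which is not described in this abstract; it only illustrates the definition. The function name `minimal_covers`, the dictionary representation of sources, and the sample attribute names are all assumptions made for illustration.

```python
from itertools import combinations


def minimal_covers(sources, query):
    """Enumerate every minimal cover of `query` by exhaustive search.

    sources: dict mapping a (hypothetical) source name to its attribute set.
    query:   the attribute set Q to cover.
    """
    query = frozenset(query)
    # Only the attributes a source shares with Q matter; sources that share
    # none can never belong to a minimal cover.
    relevant = {name: frozenset(attrs) & query for name, attrs in sources.items()}
    relevant = {name: attrs for name, attrs in relevant.items() if attrs}
    names = sorted(relevant)

    def covers(combo):
        return frozenset().union(*(relevant[n] for n in combo)) == query

    result = []
    # In a minimal cover each source contributes at least one attribute that
    # no other member provides, so a minimal cover has at most k = |Q| sources.
    for size in range(1, min(len(query), len(names)) + 1):
        for combo in combinations(names, size):
            if not covers(combo):
                continue
            # Coverage is monotone, so checking single removals is enough:
            # the cover is minimal if dropping any one source breaks it.
            if all(not covers([m for m in combo if m != n]) for n in combo):
                result.append(set(combo))
    return result


# Illustrative data only: five sources and a query with k = 3 attributes.
sources = {
    "s1": {"title", "author"},
    "s2": {"year"},
    "s3": {"author", "year"},
    "s4": {"title", "author", "year", "price"},
    "s5": {"price"},
}
print(minimal_covers(sources, {"title", "author", "year"}))
```

In this example, {s1, s2}, {s1, s3}, and {s4} are minimal covers, while {s1, s2, s3} covers Q but is not minimal because s2 can be dropped. The exhaustive search examines on the order of n^k combinations for n relevant sources, which is the combinatorial explosion that motivates a more scalable approach when n is very large.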

References

  1. Amann, B., Beeri, C., Fundulaki, I., Scholl, M.: Querying XML sources using an ontology-based mediator. In: Meersman, R., Tari, Z., et al. (eds.) CoopIS 2002, DOA 2002, and ODBASE 2002. LNCS, vol. 2519, pp. 429–448. Springer, Heidelberg (2002)

  2. Baru, C.K., Gupta, A., Ludäscher, B., Marciano, R., Papakonstantinou, Y., Velikhov, P., Chu, V.: XML-Based Information Mediation with MIX. In: SIGMOD (1999)

  3. Boisson, F., Scholl, M., Sebei, I., Vodislav, D.: Query rewriting for open XML data integration systems. In: IADIS WWW/Internet (2006)

  4. Deutsch, A., Katsis, Y., Papakonstantinou, Y.: Determining source contribution in integration systems. In: PODS (2005)

  5. Halevy, A.: Answering queries using views: A survey. The VLDB Journal, 270–294 (2001)

  6. Josifovski, V., Schwarz, P., Haas, L., Lin, E.: Garlic: a new flavor of federated query processing for DB2. In: SIGMOD (2002)

  7. Lenzerini, M.: Data integration: a theoretical perspective. In: PODS (2002)

  8. Levy, A., Mendelzon, A., Sagiv, Y., Srivastava, D.: Answering queries using views. In: PODS (1995)

  9. Macula, A.J.: Covers of a finite set. Math. Mag. 67, 141–144 (1994)

  10. Pottinger, R., Halevy, A.: MiniCon: A scalable algorithm for answering queries using views. The VLDB Journal, 182–198 (2001)

  11. Vodislav, D., Cluet, S., Corona, G., Sebei, I.: Views for simplifying access to heterogeneous XML data. In: CoopIS (2006)

  12. Yu, C., Popa, L.: Constraint-based XML query rewriting for data integration. In: SIGMOD (2004)

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Boisson, F., Scholl, M., Sebei, I., Vodislav, D. (2009). Scalability of Source Identification in Data Integration Systems. In: Damiani, E., Yetongnon, K., Chbeir, R., Dipanda, A. (eds) Advanced Internet Based Systems and Applications. SITIS 2006. Lecture Notes in Computer Science, vol 4879. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01350-8_25

  • DOI: https://doi.org/10.1007/978-3-642-01350-8_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01349-2

  • Online ISBN: 978-3-642-01350-8

  • eBook Packages: Computer Science (R0)
