
Declarative Data Fusion – Syntax, Semantics, and Implementation

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3631)

Abstract

In today’s information integration systems, data fusion, i.e., the merging of multiple tuples about the same real-world object into a single tuple, is left to ETL tools and other specialized software. While much attention has been paid to architecture, query languages, and query execution, the final step of actually fusing data from multiple sources into a consistent and homogeneous set is often ignored.
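
To make the definition concrete, here is a minimal Python sketch of fusion as defined above: tuples that describe the same real-world object are merged into a single tuple. It assumes duplicate detection has already given matching tuples a shared identifier (`id`), and it resolves conflicting attribute values with per-column functions. The names `fuse`, `coalesce`, and `max_non_null`, and the sample data, are hypothetical illustrations, not the operator defined in the paper.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Resolver = Callable[[List[Any]], Any]


def coalesce(values: List[Any]) -> Any:
    """Conflict resolution: return the first non-null value (None if all are null)."""
    for value in values:
        if value is not None:
            return value
    return None


def max_non_null(values: List[Any]) -> Any:
    """Conflict resolution: return the largest non-null value (None if all are null)."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


def fuse(rows: List[Row], key: str,
         resolvers: Dict[str, Resolver],
         default: Resolver = coalesce) -> List[Row]:
    """Merge all rows sharing the same `key` value into one row.

    Assumes (hypothetically) that duplicate detection already happened,
    i.e. rows describing the same real-world object carry the same key.
    """
    groups: Dict[Any, List[Row]] = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)

    fused: List[Row] = []
    for key_value, group in groups.items():
        # Union of all column names seen in this group, in first-seen order.
        columns: List[str] = []
        for row in group:
            for column in row:
                if column != key and column not in columns:
                    columns.append(column)
        merged: Row = {key: key_value}
        for column in columns:
            resolve = resolvers.get(column, default)
            # Attributes missing from a source tuple are treated as nulls.
            merged[column] = resolve([row.get(column) for row in group])
        fused.append(merged)
    return fused


source_a = [{"id": 1, "name": "Alice", "age": None},
            {"id": 2, "name": "Bob", "age": 40}]
source_b = [{"id": 1, "name": "A. Smith", "age": 34}]

for row in fuse(source_a + source_b, key="id", resolvers={"age": max_non_null}):
    print(row)
# {'id': 1, 'name': 'Alice', 'age': 34}
# {'id': 2, 'name': 'Bob', 'age': 40}
```

The two tuples for object 1 disagree on `name` and complement each other on `age`; the per-column resolvers decide both cases, which is the kind of choice a fusion operator has to make explicit.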

This paper states the formal problem of data fusion in relational databases and discusses which parts of the problem can already be solved with standard SQL. To bridge the remaining gap, we propose the SQL FUSE BY statement and define its syntax and semantics. A first implementation of the statement in a prototype database system demonstrates the usefulness and feasibility of the new operator.
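
As one plausible illustration of the part of the problem that standard SQL can already handle, the sketch below runs plain SQL through Python's built-in `sqlite3` module: it unions two sources and groups by a shared object identifier, using `MAX` as the conflict-resolution function. The table names, columns, sample rows, and the choice of `MAX` are assumptions for illustration; this is not the FUSE BY syntax proposed in the paper.

```python
import sqlite3
from typing import List, Tuple


def fuse_with_standard_sql() -> List[Tuple]:
    """Fuse two overlapping sources using only UNION ALL, GROUP BY and MAX."""
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        # Two hypothetical sources with the same schema; `id` identifies the object.
        cur.execute("CREATE TABLE source_a (id INTEGER, name TEXT, age INTEGER)")
        cur.execute("CREATE TABLE source_b (id INTEGER, name TEXT, age INTEGER)")
        cur.executemany("INSERT INTO source_a VALUES (?, ?, ?)",
                        [(1, "Alice", None), (2, "Bob", 40)])
        cur.executemany("INSERT INTO source_b VALUES (?, ?, ?)",
                        [(1, "Alice", 34), (3, "Carol", 29)])
        # Union both sources, then emit one output tuple per id.
        # MAX ignores NULLs, so it fills gaps and picks one value on conflict.
        return cur.execute("""
            SELECT id, MAX(name) AS name, MAX(age) AS age
            FROM (SELECT id, name, age FROM source_a
                  UNION ALL
                  SELECT id, name, age FROM source_b) AS combined
            GROUP BY id
            ORDER BY id
        """).fetchall()
    finally:
        conn.close()


print(fuse_with_standard_sql())
# [(1, 'Alice', 34), (2, 'Bob', 40), (3, 'Carol', 29)]
```

This works only because `MAX` happens to be an acceptable resolution function for these columns. Strategies without a built-in aggregate, such as a majority vote or preferring one source over another, are harder to express this way; how FUSE BY handles them is left to the paper itself.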




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bleiholder, J., Naumann, F. (2005). Declarative Data Fusion – Syntax, Semantics, and Implementation. In: Eder, J., Haav, HM., Kalja, A., Penjam, J. (eds) Advances in Databases and Information Systems. ADBIS 2005. Lecture Notes in Computer Science, vol 3631. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11547686_5


  • DOI: https://doi.org/10.1007/11547686_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28585-4

  • Online ISBN: 978-3-540-31895-8

  • eBook Packages: Computer Science (R0)
