Schema mediation for large-scale semantic data sharing

  • Regular Paper
The VLDB Journal

Abstract.

Intuitively, data management and data integration tools should be well suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require a common and comprehensive schema design before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may break backward compatibility. As a result, many large-scale data sharing tasks are more easily facilitated by non-database-oriented tools that have little support for semantics.

The goal of the peer data management system (PDMS) is to address this need: we propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers’ schemas. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers’ individual schemas.
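As a rough illustration of the "interlinked collection of semantic mappings" described above, the sketch below models a PDMS as a graph whose nodes are peers and whose edges are pairwise schema mappings, then computes which peers a given peer can reach through chains of mappings. The peer names, the `reachable_peers` function, and the simplification that every mapping can be followed in both directions are assumptions made for illustration; they are not taken from the paper.

```python
from collections import deque


def reachable_peers(mappings, start):
    """Return the peers reachable from `start` by following chains of mappings.

    `mappings` is an iterable of (peer_a, peer_b) pairs. Treating each mapping
    as usable in both directions is a simplifying assumption of this sketch.
    """
    # Build an adjacency list from the pairwise mappings.
    neighbors = {}
    for a, b in mappings:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Breadth-first search over the mapping graph.
    seen = {start}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for nxt in neighbors.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


# Hypothetical peers: no single global schema, only local pairwise mappings.
example_mappings = [("UW", "Stanford"), ("Stanford", "Berkeley"), ("MIT", "CMU")]
uw_reach = reachable_peers(example_mappings, "UW")
```

In this toy setting a query posed at `UW` could in principle draw on data at `Berkeley` even though no direct mapping connects the two, which is the sense in which the mappings replace a single logical schema.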

This paper considers the problem of schema mediation in a PDMS. Our first contribution is a flexible language for mediating between peer schemas that extends known data integration formalisms to our more complex architecture. We precisely characterize the complexity of query answering for our language. Next, we describe a reformulation algorithm for our language that generalizes both global-as-view and local-as-view query answering algorithms. Then we describe several methods for optimizing the reformulation algorithm and an initial set of experiments studying its performance. Finally, we define and consider several global problems in managing semantic mappings in a PDMS.
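For readers unfamiliar with the global-as-view (GAV) style that the reformulation algorithm is said to generalize, the following minimal sketch shows one step of GAV query unfolding: each query atom over a mediated relation is replaced by the body of that relation's definition. It is not the paper's algorithm. The relation names, the `unfold` function, and the restrictions (one definition per relation, variables only, a single unfolding pass) are assumptions chosen to keep the example small.

```python
from itertools import count


def unfold(query_atoms, definitions):
    """Unfold a conjunctive query one level, GAV style.

    An atom is (relation_name, (var, ...)).
    `definitions` maps a mediated relation to (head_vars, body_atoms).
    Relations without a definition are assumed to be stored relations.
    """
    fresh = count()
    result = []
    for rel, args in query_atoms:
        if rel not in definitions:
            result.append((rel, args))  # already over stored data
            continue
        head_vars, body = definitions[rel]
        # Bind the definition's head variables to the query atom's arguments.
        subst = dict(zip(head_vars, args))
        # Body-only (existential) variables get a fresh suffix per expansion,
        # so separate expansions never share variables by accident.
        suffix = next(fresh)
        for body_rel, body_args in body:
            new_args = tuple(subst.get(v, f"{v}_{suffix}") for v in body_args)
            result.append((body_rel, new_args))
    return result


# Hypothetical mapping: Course(C, T) :- Teaches(P, C), Title(C, T)
definitions = {
    "Course": (("C", "T"), [("Teaches", ("P", "C")), ("Title", ("C", "T"))]),
}
rewritten = unfold([("Course", ("X", "Y"))], definitions)
```

Local-as-view mappings go the other way (stored relations are described as views over the mediated schema) and call for answering queries using views rather than simple unfolding, which is why an algorithm covering both is more involved than this sketch.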

Author information

Corresponding author

Correspondence to Alon Y. Halevy.

Additional information

Received: 16 December 2002, Accepted: 14 April 2003, Published online: 12 December 2003

Edited by: V. Atluri

About this article

Cite this article

Halevy, A.Y., Ives, Z.G., Suciu, D. et al. Schema mediation for large-scale semantic data sharing. The VLDB Journal 14, 68–83 (2005). https://doi.org/10.1007/s00778-003-0116-y

  • DOI: https://doi.org/10.1007/s00778-003-0116-y
