Abstract.
Intuitively, data management and data integration tools should be well suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require a common and comprehensive schema design before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may break backward compatibility. As a result, many large-scale data sharing tasks are more easily facilitated by non-database-oriented tools that have little support for semantics.
The goal of the peer data management system (PDMS) is to address this need: we propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers’ schemas. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers’ individual schemas.
This paper considers the problem of schema mediation in a PDMS. Our first contribution is a flexible language for mediating between peer schemas that extends known data integration formalisms to our more complex architecture. We precisely characterize the complexity of query answering for our language. Next, we describe a reformulation algorithm for our language that generalizes both global-as-view and local-as-view query answering algorithms. Then we describe several methods for optimizing the reformulation algorithm and an initial set of experiments studying its performance. Finally, we define and consider several global problems in managing semantic mappings in a PDMS.
Additional information
Received: 16 December 2002, Accepted: 14 April 2003, Published online: 12 December 2003
Edited by: V. Atluri
Cite this article
Halevy, A.Y., Ives, Z.G., Suciu, D. et al. Schema mediation for large-scale semantic data sharing. The VLDB Journal 14, 68–83 (2005). https://doi.org/10.1007/s00778-003-0116-y
DOI: https://doi.org/10.1007/s00778-003-0116-y