A structure-based approach to querying semi-structured data
Several researchers have considered integrating multiple unstructured, semi-structured, and structured data sources by modeling all sources as edge-labeled graphs. Data in this model is self-describing and dynamically typed, and captures both schema and data information. The labels are arbitrary atomic values, such as strings, integers, and reals, and the integrated data graph is stored in a single data repository, as a relation of edges. The relation is dynamically typed, i.e., each edge label is tagged with its type.
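As a rough illustration of the edge relation described above, the following Python sketch stores a small data graph as a list of edges whose labels carry a type tag. The names `Edge`, `make_edge`, and `children`, the node identifiers, and the sample data are all hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple, Union

# Atomic label values allowed in this sketch (an assumption, not the paper's definition).
Label = Union[str, int, float]


class Edge(NamedTuple):
    """One row of the edge relation: source node, type tag, label, target node."""
    source: int
    label_type: str
    label: Label
    target: int


def make_edge(source: int, label: Label, target: int) -> Edge:
    # Tag the label with the name of its runtime type ("str", "int", "float").
    return Edge(source, type(label).__name__, label, target)


# A tiny self-describing graph: root -person-> 1 -name-> 2 -"Alice"-> 3, and so on.
EDGES: List[Edge] = [
    make_edge(0, "person", 1),
    make_edge(1, "name", 2),
    make_edge(2, "Alice", 3),
    make_edge(1, "age", 4),
    make_edge(4, 31, 5),
]


def children(edges: List[Edge], node: int) -> List[Edge]:
    """Return all edges leaving the given node."""
    return [e for e in edges if e.source == node]
```

Because every label is tagged only at run time, a query over this single relation has to check the tag on each edge it touches, which is one source of the inefficiency that a repository of this kind suffers from.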
Although the single labeled-graph repository is flexible, it loses all static type information and incurs severe efficiency penalties compared to querying structured databases, such as relational or object-oriented databases. In this paper we propose an alternative method of storing and querying semi-structured data, using storage schemas, which are closely related to the recently introduced graph schemas [BDFS97]. A storage schema splits the graph's edges into several relations, some of which may have labels of known types (such as strings or integers), while others may remain dynamically typed. We show that all positive queries in UnQL, a query language for semi-structured data, can be translated into conjunctive queries against the relations in the storage schema. This result may be surprising, because UnQL is a powerful language, featuring regular path expressions, restructuring queries, joins, and unions. We use this technique to translate queries on the integrated, semi-structured data into queries on the external sources. In this setting the integrated semi-structured data is not materialized but virtual, and the problem is to translate a query against the integrated view, possibly involving regular path expressions and restructuring, into queries that can be answered by the external sources. Here we again use the storage schema, this time to split the graph into relations according to their sources. Any positive UnQL query is decomposed based on these relations and translated into queries on the external sources.
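The idea of splitting the edges into several relations and then answering a path query with a conjunctive query can be sketched as follows. This is only a toy illustration under stated assumptions: the relation names `FieldEdges`, `IntEdges`, and `Dynamic`, the classification rule in `relation_for`, and the sample data are hypothetical, and the code does not reproduce the paper's storage schema construction or its UnQL translation.

```python
from collections import defaultdict

# Edge relation as (source, label, target) triples; the data is invented for illustration.
EDGES = [
    (0, "person", 1),
    (1, "name", 2),
    (2, "Alice", 3),
    (1, "age", 4),
    (4, 31, 5),
    (0, "person", 6),
    (6, "age", 7),
    (7, 45, 8),
]

KNOWN_FIELDS = {"person", "name", "age"}


def relation_for(label):
    """Hypothetical classification rule standing in for a storage schema."""
    if isinstance(label, int):
        return "IntEdges"      # labels statically known to be integers
    if label in KNOWN_FIELDS:
        return "FieldEdges"    # labels statically known to be field names
    return "Dynamic"           # everything else stays dynamically typed


def split(edges):
    """Partition the single edge relation into several named relations."""
    relations = defaultdict(list)
    for source, label, target in edges:
        relations[relation_for(label)].append((source, label, target))
    return relations


def ages(relations):
    """Evaluate the conjunctive query
    Q(v) :- FieldEdges(_, "person", p), FieldEdges(p, "age", a), IntEdges(a, v, _).
    """
    return sorted(
        v
        for (_s, l1, p) in relations["FieldEdges"] if l1 == "person"
        for (p2, l2, a) in relations["FieldEdges"] if l2 == "age" and p2 == p
        for (a2, v, _t) in relations["IntEdges"] if a2 == a
    )
```

In this sketch, `ages(split(EDGES))` joins only the two typed relations and never touches `Dynamic`, which hints at why knowing label types statically can make evaluation cheaper.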
Keywords: Query Language, Data Graph, Recursive Function, Conjunctive Query, Storage Schema
- [Abi97] Serge Abiteboul. Querying semi-structured data. In ICDT, 1997.
- [AHV95] Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases. Addison Wesley Publishing Co, 1995.
- [AV97] Serge Abiteboul and Victor Vianu. Queries and computation on the web. In ICDT, pages 262–275, Delphi, Greece, 1997. Springer Verlag.
- [BDFS97] Peter Buneman, Susan Davidson, Mary Fernandez, and Dan Suciu. Adding structure to unstructured data. In ICDT, pages 336–350, Delphi, Greece, 1997. Springer Verlag.
- [BDHS96a] Peter Buneman, Susan Davidson, Gerd Hillebrand, and Dan Suciu. A query language and optimization techniques for unstructured data. In SIGMOD, 1996.
- [BDHS96b] Peter Buneman, Susan Davidson, Gerd Hillebrand, and Dan Suciu. A query language and optimization techniques for unstructured data. Technical Report 96-09, University of Pennsylvania, Computer and Information Science Department, February 1996.
- [BLS+94] P. Buneman, L. Libkin, D. Suciu, V. Tannen, and L. Wong. Comprehension syntax. SIGMOD Record, 23(1):87–96, March 1994.
- [FFK+97] M. Fernandez, D. Florescu, J. Kang, A. Levy, and D. Suciu. STRUDEL: a web-site management system. In SIGMOD, Tucson, Arizona, May 1997.
- [PAGM96] Y. Papakonstantinou, S. Abiteboul, and H. Garcia-Molina. Object fusion in mediator systems. In Proceedings of VLDB, September 1996.