Abstract
Semistructured data has no absolute schema fixed in advance and its structure may be irregular or incomplete. Such data commonly arises in sources that do not impose a rigid structure (such as the World-Wide Web) and when data is combined from several heterogeneous sources. Data models and query languages designed for well-structured data are inappropriate in such environments. Starting with a “lightweight” object model adopted for the TSIMMIS project at Stanford, in this paper we describe a query language and object repository designed specifically for semistructured data. Our language provides meaningful query results in cases where conventional models and languages do not: when some data is absent, when data does not have regular structure, when similar concepts are represented using different types, when heterogeneous sets are present, and when object structure is not fully known. This paper motivates the key concepts behind our approach, describes the language through a series of examples (a complete semantics is available in an accompanying technical report [QRS+94]), and describes the basic architecture and query processing strategy of the “lightweight” object repository we have developed.
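To make some of these behaviors concrete, here is a minimal Python sketch, assuming a simplified labeled-object model in which each object carries a label and either an atomic value or a list of sub-objects, loosely in the spirit of the lightweight object model mentioned above. The names `Obj`, `path`, `coerce_num`, `gt`, `select`, and the `"*"` wildcard are hypothetical illustrations; they are not the paper's query language or the repository's interface. The sketch shows a missing `age` producing no match rather than an error, an `age` stored as a string being coerced before comparison, a multi-valued `age` handled with "some value satisfies" semantics, and a wildcard for structure that is not fully known.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Union

# Hypothetical simplified object model: every object carries a label and
# either an atomic value (int, float, str) or a list of sub-objects.
@dataclass
class Obj:
    label: str
    value: Union[int, float, str, List["Obj"]]


def path(objs: List[Obj], labels: List[str]) -> List[Obj]:
    """Follow a sequence of labels downward from objs.

    Objects that lack a label (or are atomic) contribute nothing, so absent
    data shrinks the result instead of raising an error. The label "*" is a
    hypothetical wildcard matching any sub-object.
    """
    current = objs
    for lab in labels:
        nxt: List[Obj] = []
        for o in current:
            if isinstance(o.value, list):
                nxt.extend(c for c in o.value if lab == "*" or c.label == lab)
        current = nxt
    return current


def coerce_num(v) -> Optional[float]:
    """Coerce ints, floats and numeric strings to float; otherwise None."""
    if isinstance(v, (int, float)):
        return float(v)
    if isinstance(v, str):
        try:
            return float(v)
        except ValueError:
            return None
    return None


def gt(threshold: float) -> Callable[[object], bool]:
    """Predicate 'value > threshold' after coercion; non-numeric values fail."""
    def pred(v) -> bool:
        n = coerce_num(v)
        return n is not None and n > threshold
    return pred


def select(db: List[Obj], root_label: str, labels: List[str],
           pred: Callable[[object], bool]) -> List[Obj]:
    """Root objects for which SOME value reached along the path satisfies pred."""
    return [r for r in db
            if r.label == root_label and any(pred(o.value) for o in path([r], labels))]


db = [
    Obj("person", [Obj("name", "Ann"), Obj("age", 34)]),
    Obj("person", [Obj("name", "Bob"), Obj("age", "41")]),   # age stored as a string
    Obj("person", [Obj("name", "Cy")]),                      # age missing entirely
    Obj("person", [Obj("name", "Di"), Obj("age", 29), Obj("age", 52)]),  # two ages
]

# Names of persons with some age over 30.
result = [n.value for r in select(db, "person", ["age"], gt(30))
          for n in path([r], ["name"])]
print(result)  # ['Ann', 'Bob', 'Di']
```

Returning an empty result for absent or non-comparable data, rather than raising a type or missing-attribute error, is what lets one query run unchanged over irregular sources in this sketch.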
This work was supported by ARPA Contract F33615-93-1-1339, by the Anderson Faculty Scholar Fund, and by equipment grants from Digital Equipment Corporation and IBM Corporation.
References
F. Bancilhon, S. Cluet, and C. Delobel. A query language for O2. In F. Bancilhon, C. Delobel, and P. Kanellakis, editors, Building an Object Oriented Database System — The Story of O2, pages 234–255. Morgan Kaufmann, 1992.
G. Blake, M. Consens, P. Kilpeläinen, P. Larson, T. Snider, and F. Tompa. Text/relational database management systems: Harmonizing SQL and SGML. In W. Litwin and T. Risch, editors, Applications of Databases: First International Conference, pages 267–280. Vadstena, Sweden, 1994.
V. Christophides, S. Abiteboul, S. Cluet, and M. Scholl. From structured documents to novel query facilities. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 313–324, Minneapolis, MN, May 1994.
R. Cattell, editor. The Object Database Standard: ODMG-93. Morgan Kaufmann, 1994.
M. Carey, D. DeWitt, and S. Vandenberg. A data model and query language for Exodus. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 413–423, Chicago, IL, June 1988.
S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom. The TSIMMIS project: Integration of heterogeneous information sources. In Proceedings of the 100th IPSJ, Tokyo, Japan, October 1994.
P. Dadam, K. Kuespert, F. Andersen, H. Blanken, R. Erbe, J. Guenauer, V. Lum, P. Pistor, and G. Walch. A DBMS prototype to support extended NF² relations: An integrated view on flat tables and hierarchies. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 356–367, 1986.
D. Fishman et al. Overview of the Iris DBMS. In W. Kim and F.H. Lochovsky, editors, Object-Oriented Concepts, Languages, and Applications, pages 219–250. Addison-Wesley, 1989.
M. Freedman. WILLOW: Technical overview. Available by anonymous ftp from ftp.cac.washington.edu as the file willow/Tech-Report.ps, September 1994.
M. Genesereth and R. Fikes. Knowledge interchange format reference manual (version 3.0). Available at the URL http://logic.stanford.edu/sharing/papers/kif.ps, 1994.
C. Harrison. An adaptive query language for object-oriented databases: Automatic navigation through partially specified data structures. Available by anonymous ftp from ftp.ccs.neu.edu as the file pub/people/lieber/adaptive-query-lang.ps, 1994.
ISO 8879. Information processing—text and office systems—Standard Generalized Markup Language (SGML), 1986.
W. Kim. On object oriented database technology. UniSQL product literature, 1994.
M. Kifer, W. Kim, and Y. Sagiv. Querying object-oriented databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 393–402, 1992.
W. Litwin, L. Mark, and N. Roussopoulos. Interoperability of multiple autonomous databases. ACM Computing Surveys, 22(3):267–293, 1990.
Microsoft Corporation. OLE2 Programmer's Reference. Microsoft Press, Redmond, WA, 1994.
J. Melton and A.R. Simon. Understanding the New SQL: A Complete Guide. Morgan Kaufmann, San Mateo, California, 1993.
OMG ORBTF. Common Object Request Broker Architecture. Object Management Group, Framingham, MA, 1992.
Y. Papakonstantinou, H. Garcia-Molina, and J. Ullman. MedMaker: A mediation system based on declarative specifications. Available by anonymous ftp from db.stanford.edu as the file pub/papakonstantinou/1995/medmaker.ps, 1995.
Y. Papakonstantinou, H. Garcia-Molina, and J. Widom. Object exchange across heterogeneous information sources. In Proceedings of the Eleventh International Conference on Data Engineering, pages 251–260, Taipei, Taiwan, March 1995.
X. Qian. Semantic interoperation via intelligent mediation. In Proceedings of the Third International Workshop on Research Issues in Data Engineering: Interoperability in Multidatabase Systems, pages 228–231. IEEE Computer Society Press, April 1993.
D. Quass, A. Rajaraman, Y. Sagiv, J. Ullman, and J. Widom. Querying semistructured heterogeneous information. Available by anonymous ftp from db.stanford.edu as the file pub/quass/1994/querying-full.ps, 1994.
A. Rafii, R. Ahmed, M. Ketabchi, P. DeSmedt, and W. Du. Integration strategies in Pegasus object oriented multidatabase system. In Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences, Volume II, pages 323–334, January 1992.
R. Rao, B. Janssen, and A. Rajaraman. GAIA technical overview. Technical Report, Xerox Palo Alto Research Center, 1994.
K. Shoens, A. Luniewski, P. Schwarz, J. Stamos, and J. Thomas. The RUFUS system: Information organization for semi-structured data. In Proceedings of the Nineteenth International Conference on Very Large Data Bases, pages 97–107, Dublin, Ireland, August 1993.
J.E. Stoy. Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory. The MIT Press, Cambridge, Massachusetts, 1977.
T. Yan and J. Annevelink. Integrating a structured-text retrieval system with an object-oriented database system. In Proceedings of the Twentieth International Conference on Very Large Data Bases, Santiago, Chile, September 1994.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Quass, D., Rajaraman, A., Sagiv, Y., Ullman, J., Widom, J. (1995). Querying semistructured heterogeneous information. In: Ling, T.W., Mendelzon, A.O., Vieille, L. (eds) Deductive and Object-Oriented Databases. DOOD 1995. Lecture Notes in Computer Science, vol 1013. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60608-4_48
DOI: https://doi.org/10.1007/3-540-60608-4_48
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60608-6
Online ISBN: 978-3-540-48460-8
eBook Packages: Springer Book Archive