Querying semistructured heterogeneous information

  • Query Processing II
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1013)

Abstract

Semistructured data has no absolute schema fixed in advance and its structure may be irregular or incomplete. Such data commonly arises in sources that do not impose a rigid structure (such as the World-Wide Web) and when data is combined from several heterogeneous sources. Data models and query languages designed for well structured data are inappropriate in such environments. Starting with a “lightweight” object model adopted for the TSIMMIS project at Stanford, in this paper we describe a query language and object repository designed specifically for semistructured data. Our language provides meaningful query results in cases where conventional models and languages do not: when some data is absent, when data does not have regular structure, when similar concepts are represented using different types, when heterogeneous sets are present, and when object structure is not fully known. This paper motivates the key concepts behind our approach, describes the language through a series of examples (a complete semantics is available in an accompanying technical report [QRS+94]), and describes the basic architecture and query processing strategy of the “lightweight” object repository we have developed.
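As a rough illustration of the situations listed in the abstract (absent data, irregular structure, one concept stored under different types, heterogeneous sets), the following Python sketch models self-describing labeled objects and a tolerant path lookup. It is not the paper's query language or the TSIMMIS object model; the class `Obj`, the functions `follow`, `loosely_equal`, and `select`, and the sample data are all hypothetical names and values invented here for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List, Union

Atom = Union[str, int, float]


@dataclass
class Obj:
    """Hypothetical self-describing object: a label plus an atom or sub-objects."""
    label: str
    value: Union[Atom, List["Obj"]]

    def children(self) -> List["Obj"]:
        # Atomic objects have no children; this keeps traversal uniform.
        return self.value if isinstance(self.value, list) else []


def follow(root: Obj, path: List[str]) -> Iterator[Obj]:
    """Yield objects reached from root by following the labels in path.

    An object that lacks a label on the path contributes nothing,
    rather than causing an error.
    """
    frontier = [root]
    for label in path:
        frontier = [c for o in frontier for c in o.children() if c.label == label]
    yield from frontier


def loosely_equal(value: Atom, target: Atom) -> bool:
    """Compare numerically when both sides convert to numbers, else as strings."""
    try:
        return float(value) == float(target)
    except (TypeError, ValueError):
        return str(value) == str(target)


def select(root: Obj, item_label: str, where_path: List[str], target: Atom) -> List[Obj]:
    """Return children of root with the given label whose path reaches an atom equal to target."""
    return [
        item
        for item in root.children()
        if item.label == item_label
        and any(
            loosely_equal(o.value, target)
            for o in follow(item, where_path)
            if not isinstance(o.value, list)  # skip non-atomic values
        )
    ]


# Invented sample data: irregular, partly missing, mixed types.
people = Obj("people", [
    Obj("person", [Obj("name", "Ann"), Obj("salary", 50000)]),
    Obj("person", [Obj("name", "Bob"), Obj("salary", "50000")]),  # salary stored as a string
    Obj("person", [Obj("name", "Cy")]),                           # salary absent
    Obj("person", [Obj("name", "Di"),
                   Obj("salary", [Obj("base", 40000), Obj("bonus", 10000)])]),  # nested salary
    Obj("note", "maintained by hand"),                            # different kind of member
])

matches = select(people, "person", ["salary"], 50000)
names = [n.value for m in matches for n in follow(m, ["name"])]
print(names)
```

In this sketch the lookup returns the persons whose salary matches whether it is stored as a number or a string, and silently passes over the person with no salary, the person whose salary is a nested object, and the unrelated `note` member. A rigid schema would instead have to reject or special-case each of those.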

This work was supported by ARPA Contract F33615-93-1-1339, by the Anderson Faculty Scholar Fund, and by equipment grants from Digital Equipment Corporation and IBM Corporation.

References

  1. F. Bancilhon, S. Cluet, and C. Delobel. A query language for O2. In F. Bancilhon, C. Delobel, and P. Kanellakis, editors, Building an Object Oriented Database System — The Story of O2, pages 234–255. Morgan Kaufmann, 1992.

  2. G. Blake, M. Consens, P. Kilpeläinen, P. Larson, T. Snider, and F. Tompa. Text/relational database management systems: Harmonizing SQL and SGML. In W. Litwin and T. Risch, editors, Applications of Databases: First International Conference, pages 267–280. Vadstena, Sweden, 1994.

  3. V. Christophides, S. Abiteboul, S. Cluet, and M. Scholl. From structured documents to novel query facilities. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 313–324, Minneapolis, MN, May 1994.

  4. R. Cattell, editor. The Object Database Standard: ODMG-93. Morgan Kaufmann, 1994.

  5. M. Carey, D. DeWitt, and S. Vandenberg. A data model and query language for Exodus. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 413–423, Chicago, IL, June 1988.

  6. S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom. The TSIMMIS project: Integration of heterogeneous information sources. In Proceedings of the 100th IPSJ, Tokyo, Japan, October 1994.

  7. P. Dadam, K. Kuespert, F. Andersen, H. Blanken, R. Erbe, J. Guenauer, V. Lum, P. Pistor, and G. Walch. A DBMS prototype to support extended NF² relations: An integrated view on flat tables and hierarchies. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 356–367, 1986.

  8. D. Fishman et al. Overview of the Iris DBMS. In W. Kim and F.H. Lochovsky, editors, Object-Oriented Concepts, Languages, and Applications, pages 219–250. Addison-Wesley, 1989.

  9. M. Freedman. WILLOW: Technical overview. Available by anonymous ftp from ftp.cac.washington.edu as the file willow/Tech-Report.ps, September 1994.

  10. M. Genesereth and R. Fikes. Knowledge interchange format reference manual (version 3.0). Available at the URL http://logic.stanford.edu/sharing/papers/kif.ps, 1994.

  11. C. Harrison. An adaptive query language for object-oriented databases: Automatic navigation through partially specified data structures. Available by anonymous ftp from ftp.ccs.neu.edu as the file pub/people/lieber/adaptive-query-lang.ps, 1994.

  12. ISO 8879. Information processing—text and office systems—Standard Generalized Markup Language (SGML), 1986.

  13. W. Kim. On object oriented database technology. UniSQL product literature, 1994.

  14. M. Kifer, W. Kim, and Y. Sagiv. Querying object-oriented databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 393–402, 1992.

  15. W. Litwin, L. Mark, and N. Roussopoulos. Interoperability of multiple autonomous databases. ACM Computing Surveys, 22(3):267–293, 1990.

  16. Microsoft Corporation. OLE2 Programmer's Reference. Microsoft Press, Redmond, WA, 1994.

  17. J. Melton and A.R. Simon. Understanding the New SQL: A Complete Guide. Morgan Kaufmann, San Mateo, California, 1993.

  18. OMG ORBTF. Common Object Request Broker Architecture. Object Management Group, Framingham, MA, 1992.

  19. Y. Papakonstantinou, H. Garcia-Molina, and J. Ullman. MedMaker: A mediation system based on declarative specifications. Available by anonymous ftp from db.stanford.edu as the file pub/papakonstantinou/1995/medmaker.ps, 1995.

  20. Y. Papakonstantinou, H. Garcia-Molina, and J. Widom. Object exchange across heterogeneous information sources. In Proceedings of the Eleventh International Conference on Data Engineering, pages 251–260, Taipei, Taiwan, March 1995.

  21. X. Qian. Semantic interoperation via intelligent mediation. In Proceedings of the Third International Workshop on Research Issues in Data Engineering: Interoperability in Multidatabase Systems, pages 228–231. IEEE Computer Society Press, April 1993.

  22. D. Quass, A. Rajaraman, Y. Sagiv, J. Ullman, and J. Widom. Querying semistructured heterogeneous information. Available by anonymous ftp from db.stanford.edu as the file pub/quass/1994/querying-full.ps, 1994.

  23. A. Rafii, R. Ahmed, M. Ketabchi, P. DeSmedt, and W. Du. Integration strategies in Pegasus object oriented multidatabase system. In Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences, Volume II, pages 323–334, January 1992.

  24. R. Rao, B. Janssen, and A. Rajaraman. GAIA technical overview. Technical Report, Xerox Palo Alto Research Center, 1994.

  25. K. Shoens, A. Luniewski, P. Schwarz, J. Stamos, and J. Thomas. The RUFUS system: Information organization for semi-structured data. In Proceedings of the Nineteenth International Conference on Very Large Data Bases, pages 97–107, Dublin, Ireland, August 1993.

  26. J.E. Stoy. Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory. The MIT Press, Cambridge, Massachusetts, 1977.

  27. T. Yan and J. Annevelink. Integrating a structured-text retrieval system with an object-oriented database system. In Proceedings of the Twentieth International Conference on Very Large Data Bases, Santiago, Chile, September 1994.

Editor information

Tok Wang Ling, Alberto O. Mendelzon, Laurent Vieille

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Quass, D., Rajaraman, A., Sagiv, Y., Ullman, J., Widom, J. (1995). Querying semistructured heterogeneous information. In: Ling, T.W., Mendelzon, A.O., Vieille, L. (eds) Deductive and Object-Oriented Databases. DOOD 1995. Lecture Notes in Computer Science, vol 1013. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60608-4_48

  • DOI: https://doi.org/10.1007/3-540-60608-4_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60608-6

  • Online ISBN: 978-3-540-48460-8

  • eBook Packages: Springer Book Archive
