Adding structure to unstructured data

  • Contributed Papers
  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1186)

Abstract

We develop a new schema for unstructured data. Traditional schemas resemble the type systems of programming languages. For unstructured data, however, the underlying type may be much less constrained and hence an alternative way of expressing constraints on the data is needed. Here, we propose that both data and schema be represented as edge-labeled graphs. We develop notions of conformance between a graph database and a graph schema and show that there is a natural and efficiently computable ordering on graph schemas. We then examine certain subclasses of schemas and show that schemas are closed under query applications. Finally, we discuss how they may be used in query decomposition and optimization.
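
To make the idea of conformance concrete, here is a minimal sketch in Python. It assumes that a database graph conforms to a schema graph when there is a simulation from the database root to the schema root, and that schema edges carry predicates over labels; the abstract itself does not spell out the definition, so both points are assumptions. The function names (`greatest_simulation`, `conforms`), the dictionary encoding of graphs, and the sample data are all hypothetical, and the naive fixpoint loop is chosen for readability rather than efficiency.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Node = Hashable
# Data graph: node -> list of (edge label, target node).
DataGraph = Dict[Node, List[Tuple[str, Node]]]
# Schema graph: node -> list of (predicate on labels, target node).
SchemaGraph = Dict[Node, List[Tuple[Callable[[str], bool], Node]]]


def greatest_simulation(data: DataGraph, data_root: Node,
                        schema: SchemaGraph, schema_root: Node) -> Set[Tuple[Node, Node]]:
    """Return the largest relation R such that (u, s) in R implies every
    data edge u --label--> v is matched by a schema edge s --pred--> t
    with pred(label) true and (v, t) in R."""
    data_nodes = {data_root} | set(data) | {v for outs in data.values() for _, v in outs}
    schema_nodes = {schema_root} | set(schema) | {t for outs in schema.values() for _, t in outs}

    # Start from all pairs and remove pairs that violate the condition.
    sim = {(u, s) for u in data_nodes for s in schema_nodes}
    changed = True
    while changed:
        changed = False
        for (u, s) in list(sim):
            for label, v in data.get(u, []):
                matched = any(pred(label) and (v, t) in sim
                              for pred, t in schema.get(s, []))
                if not matched:
                    sim.discard((u, s))
                    changed = True
                    break
    return sim


def conforms(data: DataGraph, data_root: Node,
             schema: SchemaGraph, schema_root: Node) -> bool:
    """A database conforms if its root is simulated by the schema root."""
    return (data_root, schema_root) in greatest_simulation(
        data, data_root, schema, schema_root)


# Hypothetical example: a root with "person" edges, each person with
# optional "name" and "age" edges.
data = {
    "r": [("person", "p1"), ("person", "p2")],
    "p1": [("name", "n1"), ("age", "a1")],
    "p2": [("name", "n2")],
}
schema = {
    "S": [(lambda label: label == "person", "P")],
    "P": [(lambda label: label in {"name", "age"}, "V")],
}

print(conforms(data, "r", schema, "S"))  # True

# Adding an edge the schema does not allow breaks conformance.
bad = dict(data)
bad["p2"] = [("email", "e1")]
print(conforms(bad, "r", schema, "S"))  # False
```

Note that the schema constrains only which edges may appear, not which must appear: `p2` has no `age` edge and still conforms. This matches the abstract's point that the underlying type of unstructured data is much less constrained than in a traditional schema.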

Editor information

Foto Afrati, Phokion Kolaitis

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Buneman, P., Davidson, S., Fernandez, M., Suciu, D. (1996). Adding structure to unstructured data. In: Afrati, F., Kolaitis, P. (eds) Database Theory — ICDT '97. ICDT 1997. Lecture Notes in Computer Science, vol 1186. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62222-5_55

  • DOI: https://doi.org/10.1007/3-540-62222-5_55

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-62222-2

  • Online ISBN: 978-3-540-49682-3

  • eBook Packages: Springer Book Archive
