Adding structure to unstructured data

  • Peter Buneman
  • Susan Davidson
  • Mary Fernandez
  • Dan Suciu
Contributed Papers Session 7: Unstructured Data
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1186)


We develop a new notion of schema for unstructured data. Traditional schemas resemble the type systems of programming languages. For unstructured data, however, the underlying type may be much less constrained, so an alternative way of expressing constraints on the data is needed. Here, we propose that both data and schema be represented as edge-labeled graphs. We develop notions of conformance between a graph database and a graph schema and show that there is a natural and efficiently computable ordering on graph schemas. We then examine certain subclasses of schemas and show that schemas are closed under query application. Finally, we discuss how schemas may be used in query decomposition and optimization.
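To make the idea of conformance concrete, here is a minimal sketch under stated assumptions. It is not the paper's definition. It assumes that each schema edge carries a unary predicate on data labels, and that a database conforms to a schema when the schema's root simulates the database's root. Under this reading, every data edge must be matched by a schema edge whose predicate accepts the label and whose target again simulates the data edge's target. The names `greatest_simulation`, `conforms`, `equals`, `example_schema` and `example_db` are all hypothetical. The fixpoint loop is deliberately naive. It is not one of the efficient simulation algorithms from the literature.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

# Hypothetical encodings (an assumption, not the paper's notation):
# a data graph maps each node to its outgoing (label, target) edges;
# a schema graph maps each node to its outgoing (predicate, target) edges,
# where a predicate is a unary test on a data label.
DataGraph = Dict[Hashable, List[Tuple[object, Hashable]]]
SchemaGraph = Dict[Hashable, List[Tuple[Callable[[object], bool], Hashable]]]


def greatest_simulation(db: DataGraph, schema: SchemaGraph) -> Set[Tuple[Hashable, Hashable]]:
    """Naive greatest-fixpoint computation of the largest simulation from db to schema."""
    rel = {(x, y) for x in db for y in schema}  # start from all node pairs
    changed = True
    while changed:
        changed = False
        for x, y in list(rel):
            # Each data edge x -a-> x2 must be matched by some schema edge
            # y -p-> y2 with p(a) true and (x2, y2) still in the relation.
            matched = all(
                any(pred(a) and (x2, y2) in rel for pred, y2 in schema[y])
                for a, x2 in db[x]
            )
            if not matched:
                rel.discard((x, y))
                changed = True
    return rel


def conforms(db: DataGraph, db_root: Hashable, schema: SchemaGraph, schema_root: Hashable) -> bool:
    """Assumed conformance test: the schema root must simulate the database root."""
    return (db_root, schema_root) in greatest_simulation(db, schema)


def is_str(a: object) -> bool:
    return isinstance(a, str)


def is_int(a: object) -> bool:
    return isinstance(a, int)


def equals(expected: object) -> Callable[[object], bool]:
    """Build a unary predicate that accepts exactly one label."""
    return lambda a: a == expected


# Toy schema: the root has "person" edges; a person has "name" (string) and "age" (int) edges.
example_schema: SchemaGraph = {
    "S0": [(equals("person"), "S1")],
    "S1": [(equals("name"), "S2"), (equals("age"), "S3")],
    "S2": [(is_str, "S4")],
    "S3": [(is_int, "S4")],
    "S4": [],
}

# Toy data graph: the second person has no age, which a simulation still permits.
example_db: DataGraph = {
    "r": [("person", "p1"), ("person", "p2")],
    "p1": [("name", "n1"), ("age", "a1")],
    "p2": [("name", "n2")],
    "n1": [("Alice", "v1")],
    "a1": [(30, "v2")],
    "n2": [("Bob", "v3")],
    "v1": [], "v2": [], "v3": [],
}

print(conforms(example_db, "r", example_schema, "S0"))  # True
```

In this encoding, every node must appear as a key in its graph, including leaves with no outgoing edges. A simulation only requires data edges to be matched, so data may omit edges that the schema allows. Replacing the age 30 with the string "thirty" would break conformance, because no `age` predicate then leads to a schema node that accepts the label.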


Copyright information

© Springer-Verlag Berlin Heidelberg 1996

Authors and Affiliations

  • Peter Buneman, University of Pennsylvania, USA
  • Susan Davidson, University of Pennsylvania, USA
  • Mary Fernandez, AT&T Labs - Research, USA
  • Dan Suciu, AT&T Labs - Research, USA
