On the Expressiveness of Implicit Provenance in Query and Update Languages

  • Peter Buneman
  • James Cheney
  • Stijn Vansummeren
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4353)

Abstract

Information concerning the origin of data (that is, its provenance) is important in many areas, especially scientific recordkeeping. Currently, provenance information must be maintained explicitly, through additional effort by the database maintainer. Since such maintenance is tedious and error-prone, it is desirable to provide support for provenance in the database system itself. To provide such support, however, we first need a clear explanation of the behavior and meaning of existing database operations, both queries and updates, with respect to provenance. In this paper we take the view that a query or update implicitly defines a provenance mapping linking components of the output to the originating components in the input. Our key result is that the proposed semantics are expressively complete relative to natural classes of queries that explicitly manipulate provenance.
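
The idea of an implicit provenance mapping can be illustrated with a small sketch. This is not the formal semantics of the paper: the `select_project` function, the `(table, row, column)` addressing scheme, and the sample `genes` table are all assumptions made up for illustration. The sketch runs a simple select-project query and records, as a side effect, which input cell each output cell was copied from.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical addressing scheme: a cell is named by (table name, row index, column name).
Loc = Tuple[str, int, str]
Row = Dict[str, object]


def select_project(
    table_name: str,
    rows: List[Row],
    keep: Callable[[Row], bool],
    columns: List[str],
) -> Tuple[List[Row], Dict[Loc, Loc]]:
    """Run a select-project query and return (output rows, provenance mapping).

    The mapping links each output cell to the input cell it was copied from.
    """
    output: List[Row] = []
    provenance: Dict[Loc, Loc] = {}
    for i, row in enumerate(rows):
        if not keep(row):
            continue  # rows that fail the selection contribute nothing to the output
        out_index = len(output)
        output.append({c: row[c] for c in columns})
        for c in columns:
            # The query never mentions provenance; the link is implied by the copy.
            provenance[("out", out_index, c)] = (table_name, i, c)
    return output, provenance


# Made-up sample data, for illustration only.
genes = [
    {"id": "g1", "organism": "yeast", "length": 1200},
    {"id": "g2", "organism": "mouse", "length": 3400},
    {"id": "g3", "organism": "yeast", "length": 800},
]

result, prov = select_project(
    "genes", genes, lambda r: r["organism"] == "yeast", ["id", "length"]
)
```

Here the second output row is copied from the third input row, so `prov[("out", 1, "length")]` is `("genes", 2, "length")`. The query itself says nothing about provenance; the mapping follows from how the output is assembled from the input.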

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Peter Buneman (1)
  • James Cheney (1)
  • Stijn Vansummeren (2)

  1. University of Edinburgh, Scotland
  2. Hasselt University and Transnational University of Limburg, Belgium
