Why and Where: A Characterization of Data Provenance

  • Peter Buneman
  • Sanjeev Khanna
  • Wang-Chiew Tan 
Conference paper

DOI: 10.1007/3-540-44503-X_20

Part of the Lecture Notes in Computer Science book series (LNCS, volume 1973)
Cite this paper as:
Buneman P., Khanna S., Tan W.-C. (2001) Why and Where: A Characterization of Data Provenance. In: Van den Bussche J., Vianu V. (eds) Database Theory — ICDT 2001. ICDT 2001. Lecture Notes in Computer Science, vol 1973. Springer, Berlin, Heidelberg

Abstract

With the proliferation of database views and curated databases, the issue of data provenance - where a piece of data came from and the process by which it arrived in the database - is becoming increasingly important, especially in scientific databases where understanding provenance is crucial to the accuracy and currency of data. In this paper we describe an approach to computing provenance when the data of interest has been created by a database query. We adopt a syntactic approach and present results for a general data model that applies to relational databases as well as to hierarchical data such as XML. A novel aspect of our work is a distinction between “why” provenance (refers to the source data that had some influence on the existence of the data) and “where” provenance (refers to the location(s) in the source databases from which the data was extracted).
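
The distinction between “why” and “where” provenance can be illustrated with a small relational join. The sketch below is only an informal illustration under simplifying assumptions, not the paper's formal definitions or algorithm: the tables `emp` and `dept`, the row identifiers, the `Loc` type, and the function `join_with_provenance` are all hypothetical names introduced here. For each output row it records the source rows whose presence made the row appear (a rough stand-in for why-provenance) and, for each output field, the source location the value was copied from (a rough stand-in for where-provenance).

```python
from typing import NamedTuple


class Loc(NamedTuple):
    """A location in a source database: table, row identifier, field name."""
    table: str
    row_id: str
    field: str


# Hypothetical source tables, keyed by an illustrative row identifier.
emp = {
    "e1": {"name": "Ann", "dept": "CS"},
    "e2": {"name": "Bob", "dept": "Math"},
}
dept = {
    "d1": {"dept": "CS", "phone": "555"},
    "d2": {"dept": "Bio", "phone": "777"},
}


def join_with_provenance(emp, dept):
    """Evaluate SELECT e.name, d.phone FROM emp e, dept d WHERE e.dept = d.dept
    and annotate each output row with informal why/where information."""
    results = []
    for eid, e in emp.items():
        for did, d in dept.items():
            if e["dept"] == d["dept"]:
                results.append({
                    "row": {"name": e["name"], "phone": d["phone"]},
                    # Source rows that influenced the existence of this output row.
                    "why": {("emp", eid), ("dept", did)},
                    # Source location each output value was copied from.
                    "where": {
                        "name": Loc("emp", eid, "name"),
                        "phone": Loc("dept", did, "phone"),
                    },
                })
    return results


results = join_with_provenance(emp, dept)
```

In this sketch the employee row `e1` is part of the why-information for the output row because the join would not have produced the row without it, yet the `phone` value itself was copied only from row `d1` of `dept`, which is what the where-information records.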

Supported in part by an Alfred P. Sloan Research Fellowship.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Peter Buneman
    • 1
  • Sanjeev Khanna
    • 1
  • Wang-Chiew Tan 
    • 1
  1. Department of Computer and Information Science, University of Pennsylvania, Philadelphia, USA
