Encyclopedia of Database Systems

2018 Edition | Editors: Ling Liu, M. Tamer Özsu

Provenance in Databases

  • James Cheney
  • Wang-Chiew Tan
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_283

Synonyms

History; Lineage; Origin; Pedigree; Source

Definition

Let t be a data element in the result of a query Q applied to a dataset D. The provenance of t is the set of all proofs for t according to Q and D. A proof for t according to Q and D is a subset D′ of the data elements in D such that t is in the result of applying Q to D′. In some cases, a proof also details the process by which t is derived from Q and D′.
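The definition can be made concrete with a small brute-force sketch. Everything named here is an assumption for illustration and does not come from the entry: the relations R and S, their contents, the query Q (a join followed by a projection), and the helper names `proofs` and `minimal_proofs`. The sketch enumerates every subset D′ of D and keeps those for which t appears in Q(D′); it is exponential in the size of D and meant only to illustrate the definition, not to suggest how provenance is computed in practice.

```python
from itertools import combinations

# Hypothetical dataset D: each data element is a (relation name, tuple) pair.
D = frozenset({
    ("R", ("a1", "b1")),
    ("R", ("a1", "b2")),
    ("S", ("b1", "c1")),
    ("S", ("b2", "c1")),
    ("S", ("b3", "c2")),
})

def Q(data):
    """Illustrative query: project (a, c) from R(a, b) JOIN S(b, c)."""
    r = [tup for rel, tup in data if rel == "R"]
    s = [tup for rel, tup in data if rel == "S"]
    return {(a, c) for a, b in r for b2, c in s if b == b2}

def proofs(query, data, t):
    """All subsets D' of data such that t is in query(D') (brute force)."""
    elements = sorted(data)
    return [
        frozenset(subset)
        for size in range(len(elements) + 1)
        for subset in combinations(elements, size)
        if t in query(subset)
    ]

def minimal_proofs(all_proofs):
    """Proofs that have no proper subset which is also a proof."""
    return [p for p in all_proofs if not any(q < p for q in all_proofs)]

t = ("a1", "c1")
all_proofs = proofs(Q, D, t)
minimal = minimal_proofs(all_proofs)
```

For this toy data, t = (a1, c1) has 14 proofs, of which two are minimal: one pairs R(a1, b1) with S(b1, c1), the other pairs R(a1, b2) with S(b2, c1). Every other proof is a superset of at least one of these.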

Most work on provenance in databases has focused on finding minimal subsets of D that witness the existence of t in the result, and on identifying the parts of D from which t is copied. More general forms of provenance based on annotations (e.g., elements of algebraic structures such as semirings) have also been investigated. Provenance is also important for understanding how data in databases has evolved as a result of updates over time, particularly in curated scientific databases.
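As a rough illustration of annotation-based provenance, the sketch below evaluates one fixed query over relations whose tuples carry annotations drawn from a semiring. The `Semiring` class, the two instances, the tuple identifiers r1, r2, s1, s2, s3, and the function `join_project` are all hypothetical names chosen for this example. The assumed convention is that the semiring product combines annotations of tuples used jointly (here, in a join) and the semiring sum combines alternative derivations of the same output tuple (here, arising from projection).

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]  # combines alternative derivations
    mul: Callable[[Any, Any], Any]  # combines jointly used inputs

# Natural numbers: annotations count the number of derivations.
COUNTING = Semiring(0, 1, lambda x, y: x + y, lambda x, y: x * y)

# Witness sets: an annotation is a set of witnesses (sets of tuple ids).
WHY = Semiring(
    frozenset(),
    frozenset({frozenset()}),
    lambda x, y: x | y,
    lambda x, y: frozenset(a | b for a in x for b in y),
)

def join_project(r, s, k):
    """Annotated evaluation of: project (a, c) from R(a, b) JOIN S(b, c).

    r and s map tuples to annotations drawn from the semiring k.
    """
    out = {}
    for (a, b), x in r.items():
        for (b2, c), y in s.items():
            if b == b2:
                out[(a, c)] = k.add(out.get((a, c), k.zero), k.mul(x, y))
    return out

def tag(name):
    """Annotation for a source tuple: a single witness holding its own id."""
    return frozenset({frozenset({name})})

r_why = {("a1", "b1"): tag("r1"), ("a1", "b2"): tag("r2")}
s_why = {("b1", "c1"): tag("s1"), ("b2", "c1"): tag("s2"),
         ("b3", "c2"): tag("s3")}
why_result = join_project(r_why, s_why, WHY)

# Same data, but every source tuple annotated with multiplicity 1.
r_cnt = {t: 1 for t in r_why}
s_cnt = {t: 1 for t in s_why}
count_result = join_project(r_cnt, s_cnt, COUNTING)
```

With the witness-set instance, the output tuple (a1, c1) is annotated with the two witnesses {r1, s1} and {r2, s2}. With the counting instance, the same tuple is annotated with 2, the number of ways it can be derived. Swapping the semiring changes what the annotation means without changing the evaluation code.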

Historical Background

Data provenance (or fine-grained provenance) is an account of the derivation of a piece...



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. University of Edinburgh, Edinburgh, UK
  2. University of California-Santa Cruz, Santa Cruz, USA