Data Provenance

Synonyms

Data lineage; Data pedigree; Data tracking; Provenance metadata

Definition

The term “data provenance” refers to a record trail that accounts for the origin of a piece of data (in a database, document or repository) together with an explanation of how and why it got to the present place.

Example

In an application domain such as molecular biology, much of the data is derived from public databases. Those databases may in turn be derived from published papers after some transformation (for example, only the most significant data are entered into the public database), and the papers are themselves derived from experimental observations. A provenance record keeps this history for each piece of data.
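The chain described in this example can be sketched as a linked list of records, each pointing back to the record it was derived from. This is only an illustrative sketch, not a standard provenance model: the class `ProvenanceRecord`, its fields, the function `trace`, and the sample values are all hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    """One step in the history of a piece of data (hypothetical structure)."""
    item: str                                   # what the data is
    location: str                               # where it currently lives
    how: str                                    # how and why it got there
    derived_from: Optional["ProvenanceRecord"] = None  # previous step, if any


def trace(record: ProvenanceRecord) -> List[ProvenanceRecord]:
    """Return the trail from the given record back to its origin."""
    trail = []
    current: Optional[ProvenanceRecord] = record
    while current is not None:
        trail.append(current)
        current = current.derived_from
    return trail


# Made-up history mirroring the example: observation -> paper -> public database -> application.
observation = ProvenanceRecord(
    "raw measurement", "experimental observation", "measured in the laboratory")
paper = ProvenanceRecord(
    "reported value", "paper", "analysed and published", observation)
public_entry = ProvenanceRecord(
    "curated value", "public database", "selected as significant and loaded", paper)
local_copy = ProvenanceRecord(
    "working value", "application database", "queried from the public database", public_entry)

for step in trace(local_copy):
    print(f"{step.location}: {step.how}")
```

Walking the `derived_from` links answers the "where did this come from, and how" question for a single item; a real system would also need to handle items derived from several sources at once, which a single back-pointer cannot express.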

Key Points

Databases today lack good support for managing provenance data, and the subject is an active research area. One category of provenance research focuses on the case where one database derives some of its data by querying another database; one may then try to “invert” the query to determine which input data elements contribute to this...
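As a rough illustration of what "inverting" a query can mean, the sketch below assumes the derived database is produced by a simple select-and-project query over in-memory rows. For a given output tuple, it returns the source rows that pass the selection and project onto that tuple. The functions `run_query` and `lineage`, the predicate `is_significant`, and the sample rows are hypothetical and made up for this illustration; research systems handle far richer queries than this.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]


def project(row: Row, columns: Sequence[str]) -> Tuple[object, ...]:
    """Keep only the named columns of a row, as a tuple."""
    return tuple(row[c] for c in columns)


def run_query(source: List[Row],
              predicate: Callable[[Row], bool],
              columns: Sequence[str]) -> List[Tuple[object, ...]]:
    """Select rows satisfying the predicate, project them, and drop duplicates."""
    result: List[Tuple[object, ...]] = []
    for row in source:
        if predicate(row):
            out = project(row, columns)
            if out not in result:
                result.append(out)
    return result


def lineage(source: List[Row],
            predicate: Callable[[Row], bool],
            columns: Sequence[str],
            output: Tuple[object, ...]) -> List[Row]:
    """Return the source rows that contribute to one output tuple."""
    return [row for row in source
            if predicate(row) and project(row, columns) == output]


# Made-up source table; the values carry no scientific meaning.
source = [
    {"id": 1, "gene": "BRCA1", "score": 0.95},
    {"id": 2, "gene": "BRCA1", "score": 0.91},
    {"id": 3, "gene": "TP53", "score": 0.40},
    {"id": 4, "gene": "TP53", "score": 0.97},
]


def is_significant(row: Row) -> bool:
    """Hypothetical selection: keep only high-scoring rows."""
    return row["score"] >= 0.9


derived = run_query(source, is_significant, ("gene",))
print(derived)  # [('BRCA1',), ('TP53',)]
print([row["id"] for row in lineage(source, is_significant, ("gene",), ("BRCA1",))])  # [1, 2]
```

Note that the derived tuple `('TP53',)` traces back only to the row with `id` 4: the row with `id` 3 mentions the same gene but is filtered out by the selection, so it does not contribute to the output.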

Corresponding author

Correspondence to Amarnath Gupta.

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Gupta, A. (2018). Data Provenance. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1305