Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Provenance in Scientific Databases

  • Sarah Cohen-Boulakia
  • Wang-Chiew Tan
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_282

Synonyms

History; Lineage; Origin; Pedigree; Source

Definition

Scientific databases contain data that may have been produced as the answer to a query posed over other resources, generated by in silico experiments (or scientific workflows) involving various software tools, or manually curated by domain experts based on analysis of several other resources. The provenance of a piece of data in a scientific database typically includes information about where the data originates, as well as details of the scientific process (e.g., parameters used in the experiments, software versions, etc.) by which it arrived in the database.
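
As an illustration only, the kinds of information named in this definition could be captured in a record like the one sketched below. The class name, field names, and sample values are hypothetical and do not come from any particular provenance system; the sketch simply pairs the origin of a data item with details of the process that produced it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record of where a data item came from and how it was derived."""
    data_id: str        # identifier of the data item in the scientific database
    sources: List[str]  # resources the item originates from
    process: str        # query, workflow step, or curation activity that produced it
    parameters: Dict[str, str] = field(default_factory=dict)         # experiment parameters
    software_versions: Dict[str, str] = field(default_factory=dict)  # tool name -> version


def origins(record: ProvenanceRecord) -> List[str]:
    """Return the sources of a data item, sorted for stable output."""
    return sorted(record.sources)


# Made-up example: a derived item and the inputs and settings behind it.
record = ProvenanceRecord(
    data_id="alignment_42",
    sources=["seq_db/entry_B", "seq_db/entry_A"],
    process="in silico sequence alignment",
    parameters={"gap_penalty": "10"},
    software_versions={"aligner": "2.1"},
)
print(origins(record))
```

A record of this shape answers both halves of the definition: `sources` says where the data originates, while `process`, `parameters`, and `software_versions` describe how it arrived in the database.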

Historical Background

Provenance in scientific databases has been studied at two granularities: workflow provenance and data provenance.

Workflow provenance (or coarse-grained provenance) refers to the record of the history (or workflow) of the derivation of some dataset in a scientific workflow [1, 2, 3]. The amount of information recorded for...


Recommended Reading

  1. Bose R, Frew J. Lineage retrieval for scientific data processing: a survey. ACM Comput Surv. 2005;37(1):1–28.
  2. Moreau L, Ludäscher B, Altintas I, Barga RS, Bowers S, Callahan S, Chin G Jr, Clifford B, Cohen S, Cohen-Boulakia S, Davidson S, Deelman E, Digiampietri L, Foster I, Freire J, Frew J, Futrelle J, Gibson T, Gil Y, Goble C, Golbeck J, Groth P, Holland DA, Jiang S, Kim J, Koop D, Krenek A, McPhillips T, Mehta G, Miles S, Metzger D, Munroe S, Myers J, Plale B, Podhorszki N, Ratnakar V, Santos E, Scheidegger C, Schuchardt K, Seltzer M, Simmhan YL, Silva C, Slaughter P, Stephan E, Stevens R, Turi D, Vo H, Wilde M, Zhao J, Zhao Y. The first provenance challenge. Concurrency Comput Pract Exp. 2007;20(5):409–18. Special issue on the First Provenance Challenge.
  3. Simmhan Y, Plale B, Gannon D. A survey of data provenance in e-science. ACM SIGMOD Rec. 2005;34:31–6.
  4. Buneman P, Tan WC. Provenance in databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2007. p. 1171–3.
  5. Biton O, Cohen-Boulakia S, Davidson S. Zoom* UserViews: querying relevant provenance in workflow systems. In: Proceedings of the 33rd International Conference on Very Large Data Bases; 2007. p. 1366–9.
  6. Biton O, Cohen-Boulakia S, Davidson S, Hara CS. Querying and managing provenance through user views in scientific workflows. In: Proceedings of the 24th International Conference on Data Engineering; 2008.
  7. Buneman P, Khanna S, Tan WC. Why and where: a characterization of data provenance. In: Proceedings of the 8th International Conference on Database Theory; 2001. p. 316–30.
  8. Cui Y, Widom J, Wiener JL. Tracing the lineage of view data in a warehousing environment. ACM Trans Database Syst. 2000;25(2):179–227.
  9. Green TJ, Karvounarakis G, Tannen V. Provenance semirings. In: Proceedings of the 26th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems; 2007. p. 31–40.
  10. Wang YR, Madnick SE. A polygen model for heterogeneous database systems: the source tagging perspective. In: Proceedings of the 16th International Conference on Very Large Data Bases; 1990. p. 519–38.
  11. Buneman P, Khanna S, Tan WC. On propagation of deletions and annotations through views. In: Proceedings of the 21st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems; 2002. p. 150–8.
  12. Bhagwat D, Chiticariu L, Tan WC, Vijayvargiya G. An annotation management system for relational databases. VLDB J. 2005;14(4):373–96.
  13. Buneman P, Chapman A, Cheney J. Provenance management in curated databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2006. p. 539–50.
  14. Benjelloun O, Sarma AD, Halevy AY, Widom J. ULDBs: databases with uncertainty and lineage. In: Proceedings of the 32nd International Conference on Very Large Data Bases; 2006. p. 953–64.
  15. Chiticariu L, Tan WC. Debugging schema mappings with routes. In: Proceedings of the 32nd International Conference on Very Large Data Bases; 2006. p. 79–90.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. University Paris-Sud, Orsay Cedex, France
  2. University of California-Santa Cruz, Santa Cruz, USA

Section editors and affiliations

  • Juliana Freire (1)

  1. New York University, New York, USA