Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Provenance Storage

  • Thomas Heinis
  • Adriane Chapman
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80746

Synonyms

History storage; Lineage storage; Pedigree organization; Provenance organization

Definition

Given the provenance of data processing or manipulation (e.g., through ad hoc manipulations, workflows, or database operators), provenance storage defines where and how that provenance information is stored and organized on disk. Provenance information captures everything that describes the history, creation, and modification of a data product. In the context of workflows, for example, relevant information includes, but is not limited to, the parameters and software versions used in each step of the workflow, recorded recursively for every step that contributed to the data product.
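
As a rough illustration of the definition, the following minimal sketch stores per-step workflow provenance (software, version, parameters, and inputs) in a relational table and traces a data product's history recursively. It is not a design taken from this entry or from any of the cited systems: the table names (step, used), the function names, and the example file names are all hypothetical, and SQLite is assumed only because it ships with Python.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open a hypothetical two-table provenance store (schema is an assumption)."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS step (
            output_id TEXT PRIMARY KEY,  -- data product produced by the step
            tool      TEXT NOT NULL,     -- software that ran the step
            version   TEXT NOT NULL,     -- software version used
            params    TEXT NOT NULL      -- step parameters, JSON-encoded
        );
        CREATE TABLE IF NOT EXISTS used (
            output_id TEXT NOT NULL,     -- derived data product
            input_id  TEXT NOT NULL      -- data product it was derived from
        );
    """)
    return conn


def record_step(conn, output_id, tool, version, params, inputs):
    """Persist one workflow step and the inputs it consumed."""
    conn.execute(
        "INSERT INTO step VALUES (?, ?, ?, ?)",
        (output_id, tool, version, json.dumps(params, sort_keys=True)),
    )
    conn.executemany(
        "INSERT INTO used VALUES (?, ?)",
        [(output_id, input_id) for input_id in inputs],
    )
    conn.commit()


def lineage(conn, output_id):
    """Return every upstream data product, following 'used' edges recursively."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(id) AS (
            SELECT input_id FROM used WHERE output_id = ?
            UNION
            SELECT u.input_id FROM used u JOIN ancestors a ON u.output_id = a.id
        )
        SELECT id FROM ancestors ORDER BY id
        """,
        (output_id,),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical two-step workflow: raw.csv -> clean.csv -> plot.png
conn = open_store()
record_step(conn, "clean.csv", "cleaner", "1.2", {"drop_na": True}, ["raw.csv"])
record_step(conn, "plot.png", "plotter", "0.9", {"dpi": 300}, ["clean.csv"])
print(lineage(conn, "plot.png"))
```

Keeping the derivation edges in their own table is one possible choice among many; it lets the history of a data product be reconstructed with a single recursive query, at the cost of a join per traversal step.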

Historical Background

The original academic works on digital provenance focused on provenance within relational databases [1, 2, 3]. However, workflow systems also found a use for provenance and quickly began capturing and storing provenance information [4, 5, 6, 7, 8]. Throughout the...

Recommended Reading

  1. Buneman P, Khanna S, Tan W-C. Why and where: a characterization of data provenance. In: Proceedings of the 8th International Conference on Database Theory; p. 316–30.
  2. Cui Y, Widom J. Practical lineage tracing in data warehouses. In: Proceedings of the 16th International Conference on Data Engineering; p. 367–78.
  3. Woodruff A, Stonebraker M. Supporting fine-grained data lineage in a database visualization environment. In: Proceedings of the 13th International Conference on Data Engineering; p. 97–102.
  4. Altintas I, Barney O, Jaeger-Frank E. Provenance collection support in the Kepler scientific workflow system. In: Proceedings of the International Provenance and Annotation Workshop; 2006. p. 118–32.
  5. Foster I, Vockler J, Wilde M, Zhao Y. Chimera: a virtual data system for representing, querying, and automating data derivation. In: Proceedings of the 14th International Conference on Scientific and Statistical Database Management; 2002. p. 37–46.
  6. Freire J, Silva CT, et al. Managing rapidly-evolving scientific workflows. 2006.
  7. Simmhan Y, Plale B, Gannon D. A framework for collecting provenance in data-centric scientific workflows. In: Proceedings of the IEEE International Conference on Web Services; 2006.
  8. Wong SC, Miles S, Fang W, Groth P, Moreau L. Provenance-based validation of E-Science experiments. In: Proceedings of the 4th International Semantic Web Conference, Lecture Notes in Computer Science. 2005. p. 801–15.
  9. Anand MK, Bowers S, McPhillips T, Ludascher B. Efficient provenance storage over nested data collections. In: Advances in Database Technology, Proceedings of the 12th International Conference on Extending Database Technology; 2009. p. 958–69.
  10. Chebotko A, Lu S, Fei X, Fotouhi F. RDFPROV: a relational RDF store for querying and managing scientific workflow provenance. Data Knowl Eng. 2010;69(8):836–65.
  11. Buneman P, Chapman A, Cheney J. Provenance management in curated databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2006. p. 539–50.
  12. Heinis T, Alonso G. Efficient lineage tracking for scientific workflows. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2008. p. 1007–18.
  13. Xie Y, Muniswamy-Reddy K-K, Feng D, Li Y, Long DDE, Tan Z, Chen L. A hybrid approach for efficient provenance storage. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management; 2012.
  14. Park H, Ikeda R, Widom J. RAMP: a system for capturing and tracing provenance in MapReduce workflows. In: Proceedings of the 37th International Conference on Very Large Data Bases; 2011.
  15. Mason C. Cryptographic binding of metadata. The National Security Agency’s Review of Emerging Technologies, vol. 18. 2009.
  16. Allen MD, Chapman A, Blaustein B. Engineering choices for open world provenance. In: Proceedings of the 6th International Provenance and Annotation Workshop; 2014.
  17. Dey S, Agun M, Wang M, Ludäscher B, Bowers S, Missier P. A provenance repository for storing and retrieving data lineage information. Technical Report, DataONE Provenance & Workflow Working Group. 2011.
  18. Missier P, Chen Z. Extracting PROV provenance traces from Wikipedia history pages. In: Proceedings of the 16th International Conference on Extending Database Technology; 2013.
  19. Robinson I, Webber J, Eifrem E. Graph databases. O’Reilly Media, Inc.; 2013.
  20. Dublin Core Metadata Initiative Usage Board. DCMI Metadata Terms: A complete historical record. Dublin Core Metadata Initiative (DCMI), Online, 2014.
  21. Moreau L, Clifford B, Freire J, Futrelle J, Gil Y, Groth P, Kwasnikowska N, Miles S, Missier P, Myers J, Plale B, Simmhan Y, Stephan E, Van den Bussche J. The Open Provenance Model core specification (v1.1). Future Gener Comput Syst. 2011;27(6):743–56.
  22. Moreau L, Groth P. Provenance: an introduction to PROV. Morgan & Claypool Publishers; 2013.
  23. Groth P, Moreau L. PROV-Overview. World Wide Web Consortium (W3C), Online, 2013.
  24. Abawajy JH, Jami SI, Shaikh ZA, Hammad SA. A framework for scalable distributed provenance storage system. Comput Stand Interfaces. 2013;35(1):179–86.
  25. Allen MD, Chapman A, Blaustein B, Seligman L. Getting it together: enabling multi-organization provenance exchange. In: Proceedings of the 3rd USENIX Workshop on the Theory and Practice of Provenance; 2011.
  26. Groth P, Jiang S, Miles S, Munroe S, Tan V, Tsasakou S, Moreau L. An architecture for provenance systems. Technical Report, ECS, University of Southampton. 2006.
  27. Zhao D, Shou C, Malik T, Raicu I. Distributed data provenance for large-scale data-intensive computing. IEEE Cluster. 2013.
  28. Groth P, Miles S, Moreau L. PReServ: provenance recording for services. UK OST e-Science second AHM. 2005.
  29.
  30. Simmhan Y, Plale B, Gannon D. Karma2: provenance management for data driven workflows. J Web Serv Res. 2008;5(2):1–22.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Imperial College London, London, UK
  2. University of Southampton, Southampton, UK

Section editors and affiliations

  • Juliana Freire (1)
  1. University of Utah, Salt Lake City, USA