The Right (Provenance) Hammer for the Job: A Comparison of Data Provenance Instrumentation

Provenance in Data Science

Abstract

As data science techniques are being applied to solve societal problems, understanding what is happening within the “pipeline” is essential for establishing trust in, and reproducibility of, the results. Provenance captures information about what happened during design and execution in order to support reasoning about trust and reproducibility. However, how and where this information is captured as provenance within data science pipelines changes how it can be utilized. In this work, we describe three different mechanisms to capture provenance in data science pipelines: human-based, tool-based, and script-based. Using an implementation of all three techniques in a standard data science toolkit, we analyze how the provenance generated by each method differs and how those differences affect its use.
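To make the script-based mechanism more concrete, the following is a minimal sketch of what capturing provenance from inside a pipeline script could look like. It is not the chapter's implementation: the decorator `capture_provenance`, the in-memory `PROVENANCE_LOG`, the `fingerprint` helper, and the two toy pipeline steps are all hypothetical names introduced for illustration, and the record fields only loosely follow the activity/used/generated vocabulary of provenance models. Only the standard library is assumed.

```python
import functools
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical in-memory provenance store (an assumption for this sketch).
PROVENANCE_LOG = []


def fingerprint(value):
    """Return a short, stable identifier for a JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def capture_provenance(func):
    """Record which inputs a pipeline step used and which output it generated.

    Only positional arguments are fingerprinted in this sketch.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = datetime.now(timezone.utc).isoformat()
        result = func(*args, **kwargs)
        PROVENANCE_LOG.append({
            "activity": func.__name__,
            "used": [fingerprint(a) for a in args],
            "generated": fingerprint(result),
            "started_at": started,
        })
        return result
    return wrapper


@capture_provenance
def drop_missing(rows):
    # Toy cleaning step: discard rows that contain a missing value.
    return [r for r in rows if None not in r.values()]


@capture_provenance
def scale_age(rows):
    # Toy transformation step: rescale the "age" column.
    return [{**r, "age": r["age"] / 100} for r in rows]


raw = [{"age": 30, "city": "Rome"}, {"age": None, "city": "Leeds"}]
cleaned = scale_age(drop_missing(raw))

for record in PROVENANCE_LOG:
    print(record["activity"], record["used"], "->", record["generated"])
```

Because the output fingerprint of one step reappears as the input fingerprint of the next, the log can be traversed backwards from a result to the raw data. A human-based or tool-based approach would record the same pipeline at a different granularity, which is the kind of difference the chapter sets out to compare.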

Notes

  1. https://www.iso.org/obp/ui/#iso:std:iso:19115:-1:ed-1:v1:en

  2. https://orange.biolab.si

  3. https://prov.readthedocs.io

Acknowledgements

This work was partially supported by EPSRC (EP/S028366/1).

Author information

Corresponding author

Correspondence to Adriane Chapman.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Chapman, A., Sasikant, A., Simonelli, G., Missier, P., Torlone, R. (2021). The Right (Provenance) Hammer for the Job: A Comparison of Data Provenance Instrumentation. In: Sikos, L.F., Seneviratne, O.W., McGuinness, D.L. (eds) Provenance in Data Science. Advanced Information and Knowledge Processing. Springer, Cham. https://doi.org/10.1007/978-3-030-67681-0_3

  • DOI: https://doi.org/10.1007/978-3-030-67681-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-67680-3

  • Online ISBN: 978-3-030-67681-0

  • eBook Packages: Computer Science, Computer Science (R0)
