Provenance in Workflows

Encyclopedia of Database Systems

Synonyms

Computational provenance; Lineage; Origin; Source; History

Definition

Data- and compute-intensive science requires the ability to orchestrate computational steps and integrate distinct tools. Scientific workflow systems have been developed to structure such computations. A scientific workflow is a directed graph in which a set of computational steps is linked together. Each computational step (called a module, actor, or processor) has a set of input and output ports; a connection (also called a link, edge, or channel) between an output port of one module and an input port of another indicates a data dependency. Modules may also have settable parameters that influence their computations. Workflow provenance may then include information about the specification of the workflow, the evolution of that specification, and executions of the workflow.
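
As a rough illustration of this definition, the following Python sketch models a workflow as a directed graph of modules with named ports and parameters, and keeps the three kinds of provenance mentioned above: the specification (modules and connections), its evolution (a change log), and executions (per-run records). All names here (`Module`, `Connection`, `Workflow`, `set_param`, `run`, and the record layout) are hypothetical and chosen for illustration; they are not taken from any particular workflow system.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Module:
    """A computational step with named ports and settable parameters."""
    name: str
    func: Callable[..., Dict[str, Any]]  # must return {output_port: value}
    inputs: Tuple[str, ...] = ()
    outputs: Tuple[str, ...] = ()
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Connection:
    """A data dependency: an output port of `src` feeds an input port of `dst`."""
    src: str
    src_port: str
    dst: str
    dst_port: str


@dataclass
class Workflow:
    # Specification: the modules and the connections between their ports.
    modules: Dict[str, Module] = field(default_factory=dict)
    connections: List[Connection] = field(default_factory=list)
    # Evolution: an append-only log of changes to the specification.
    evolution: List[str] = field(default_factory=list)
    # Executions: one list of per-module records for each run.
    executions: List[List[dict]] = field(default_factory=list)

    def add_module(self, m: Module) -> None:
        self.modules[m.name] = m
        self.evolution.append(f"add module {m.name}")

    def connect(self, src: str, src_port: str, dst: str, dst_port: str) -> None:
        if src_port not in self.modules[src].outputs:
            raise ValueError(f"{src} has no output port {src_port}")
        if dst_port not in self.modules[dst].inputs:
            raise ValueError(f"{dst} has no input port {dst_port}")
        self.connections.append(Connection(src, src_port, dst, dst_port))
        self.evolution.append(f"connect {src}.{src_port} -> {dst}.{dst_port}")

    def set_param(self, module: str, key: str, value: Any) -> None:
        self.modules[module].params[key] = value
        self.evolution.append(f"set {module}.{key} = {value!r}")

    def run(self) -> Dict[Tuple[str, str], Any]:
        # Map each module to the set of modules it depends on.
        preds = {name: set() for name in self.modules}
        for c in self.connections:
            preds[c.dst].add(c.src)
        values: Dict[Tuple[str, str], Any] = {}  # (module, output port) -> value
        records: List[dict] = []
        # Run modules so that every dependency executes before its consumer.
        for name in TopologicalSorter(preds).static_order():
            m = self.modules[name]
            args = {c.dst_port: values[(c.src, c.src_port)]
                    for c in self.connections if c.dst == name}
            out = m.func(**args, **m.params)
            for port in m.outputs:
                values[(name, port)] = out[port]
            records.append({"module": name, "params": dict(m.params),
                            "inputs": args,
                            "outputs": {p: out[p] for p in m.outputs}})
        self.executions.append(records)
        return values


# A two-module workflow: "load" produces data, "scale" multiplies it.
wf = Workflow()
wf.add_module(Module("load", lambda: {"data": [1, 2, 3]}, outputs=("data",)))
wf.add_module(Module("scale",
                     lambda data, factor: {"scaled": [x * factor for x in data]},
                     inputs=("data",), outputs=("scaled",),
                     params={"factor": 2}))
wf.connect("load", "data", "scale", "data")
result = wf.run()
print(result[("scale", "scaled")])  # [2, 4, 6]
```

In this sketch, changing a parameter with `wf.set_param("scale", "factor", 3)` and calling `wf.run()` again appends one entry to `wf.evolution` and a second record list to `wf.executions`, so both the change to the specification and the run it led to remain inspectable. Real systems capture considerably more detail (timestamps, users, environment, data identifiers); this sketch only shows how the three kinds of information relate.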

Historical Background

Workflows have been used to model business processes [14]. Business workflows, scripts, coordination languages, and dataflow systems are precursors of today’s...

Recommended Reading

  1. Alper P, Belhajjame K, Goble C, Karagoz P. Small is beautiful: summarizing scientific workflows using semantic annotations. In: Proceedings of the 2013 IEEE International Congress on Big Data; 2013. p. 18–25.

  2. Biton O, Cohen-Boulakia S, Davidson SB. Zoom* userviews: querying relevant provenance in workflow systems. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB Endowment; 2007. p. 1366–69.

  3. Bose R, Frew J. Lineage retrieval for scientific data processing: a survey. ACM Comput Surv. 2005;37(1):1–28.

  4. Chapman AP, Jagadish HV, Ramanan P. Efficient provenance storage. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data; 2008. p. 993–1006.

  5. Chirigati FS, Shasha D, Freire J. Packing experiments for sharing and publication. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2013. p. 977–80.

  6. Davidson SB, Boulakia SC, Eyal A, Ludäscher B, McPhillips TM, Bowers S, Anand MK, Freire J. Provenance in scientific workflow systems. IEEE Data Eng Bull. 2007;30(4):44–50.

  7. Dias J, Guerra G, Rochinha F, Coutinho ALGA, Valduriez P, Mattoso M. Data-centric iteration in dynamic workflows. Futur Gener Comput Syst. 2015;46:114–26. http://dx.doi.org/10.1016/j.future.2014.10.021.

  8. Freire J, Koop D, Santos E, Silva C. Provenance for computational tasks: a survey. Comput Sci Eng. 2008;10(3):11–21.

  9. Freire J, Silva C, Callahan S, Santos E, Scheidegger C, Vo H. Managing rapidly-evolving scientific workflows. In: International Provenance and Annotation Workshop (IPAW), LNCS, vol. 4145. Springer; 2006. p. 10–8.

  10. Koop D, Freire J, Silva CT. Visual summaries for graph collections. In: Visualization Symposium (PacificVis), 2013 IEEE Pacific; 2013. p. 57–64.

  11. Mattoso M, Dias J, Ocaña KACS, Ogasawara E, Costa F, Horta F, Silva V, de Oliveira D. Dynamic steering of HPC scientific workflows: a survey. Futur Gener Comput Syst. 2015;46(May):100–13.

  12. Scheidegger CE, Vo HT, Koop D, Freire J, Silva CT. Querying and creating visualizations by analogy. IEEE Trans Vis Comput Graph. 2007;13(6):1560–67.

  13. Silva CT, Anderson E, Santos E, Freire J. Using VisTrails and provenance for teaching scientific visualization. Comput Graphics Forum. 2011;30(1):75–84.

  14. Van Der Aalst WMP, Ter Hofstede AHM, Weske M. Business process management: a survey. In: Business Process Management. Springer; 2003. p. 1–2.

  15. Walker E, Guiang C. Challenges in executing large parameter sweep studies across widely distributed computing environments. In: Proceedings of the 5th IEEE Workshop on Challenges of Large Applications in Distributed Environments; 2007. p. 11–8.

  16. Zhao Y, Foster I. Scientific workflow systems for 21st century, new bottle or new wine. In: IEEE Workshop on Scientific Workflows; 2008.

  17. Zhou W, Mapara S, Ren Y, Li Y, Haeberlen A, Ives Z, Loo BT, Sherr M. Distributed time-aware provenance. Proc VLDB Endow. 2012;6(2):49–60.

Author information

Corresponding author

Correspondence to David Koop.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Koop, D., Mattoso, M., Freire, J. (2018). Provenance in Workflows. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80745
