Towards Integrating Workflow and Database Provenance

  • Fernando Chirigati
  • Juliana Freire
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7525)


While there has been substantial work on both database and workflow provenance, the two problems have so far been examined only in isolation. It is widely accepted that the existing models are incompatible. Database provenance is fine-grained and captures changes to tuples in a database. In contrast, workflow provenance is represented at a coarser level and reflects the functional model of workflow systems, which is stateless: each computational step derives a new artifact. In this paper, we propose a new approach to combine database and workflow provenance. We address the mismatch between the two kinds of provenance by using a temporal model that explicitly represents the database states as updates are applied. We discuss how, under this model, reproducibility is obtained for workflows that manipulate databases, and how queries that straddle the two provenance traces can be evaluated. We also describe a proof-of-concept implementation that integrates a workflow system and a commercial relational database.
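The temporal model described in the abstract can be illustrated with a small in-memory sketch. It assumes a single table with a logical clock, where each update closes the current version of a tuple and appends a new one, and where each workflow module execution records the timestamp of the database state it read. All names here (`TemporalTable`, `ModuleExecution`, `run_module`, `writer_as_of`, and the run identifiers) are hypothetical; this is not the authors' implementation, which the abstract says is built on a workflow system and a commercial relational database.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Dict[str, Any]]


@dataclass
class TupleVersion:
    key: str
    value: Dict[str, Any]
    start: int          # logical time at which this version became current
    end: Optional[int]  # logical time it was superseded; None while current
    writer: str         # id of the (hypothetical) workflow run that wrote it


class TemporalTable:
    """Hypothetical append-only table: an update closes the current version
    of a tuple and appends a new one, so every past state stays queryable."""

    def __init__(self) -> None:
        self.versions: List[TupleVersion] = []
        self.clock = 0  # logical clock, advanced once per update

    def write(self, key: str, value: Dict[str, Any], writer: str) -> int:
        self.clock += 1
        for v in self.versions:
            if v.key == key and v.end is None:
                v.end = self.clock  # close the previously current version
        self.versions.append(TupleVersion(key, dict(value), self.clock, None, writer))
        return self.clock

    def _visible(self, v: TupleVersion, t: int) -> bool:
        return v.start <= t and (v.end is None or t < v.end)

    def as_of(self, t: int) -> State:
        """Reconstruct the database state as it was at logical time t."""
        return {v.key: v.value for v in self.versions if self._visible(v, t)}

    def writer_as_of(self, key: str, t: int) -> Optional[str]:
        """Which workflow run wrote the version of `key` visible at time t?"""
        for v in self.versions:
            if v.key == key and self._visible(v, t):
                return v.writer
        return None


@dataclass
class ModuleExecution:
    run_id: str
    module: str
    read_time: int  # the database state this module execution observed


def run_module(db: TemporalTable, log: List[ModuleExecution], run_id: str,
               module: str, fn: Callable[[State], Any]) -> Any:
    """Run `fn` on the current state and log which state it read."""
    t = db.clock
    log.append(ModuleExecution(run_id, module, t))
    return fn(db.as_of(t))


def total_score(state: State) -> int:
    return sum(row["score"] for row in state.values())


# Usage: load data, run a workflow, update the database, run it again.
db = TemporalTable()
log: List[ModuleExecution] = []
db.write("alice", {"score": 10}, writer="load-1")
db.write("bob", {"score": 7}, writer="load-1")
total1 = run_module(db, log, "run-1", "total_score", total_score)
db.write("alice", {"score": 12}, writer="update-2")
total2 = run_module(db, log, "run-2", "total_score", total_score)

# Reproduce run-1 against the state it originally read, despite the update.
past = next(e for e in log if e.run_id == "run-1")
replayed = total_score(db.as_of(past.read_time))
print(total1, total2, replayed)                  # 17 19 17
print(db.writer_as_of("alice", past.read_time))  # load-1
```

Because old versions are never overwritten, replaying `run-1` against `as_of(read_time)` returns the original result even after later updates. `writer_as_of` is a toy example of a query that straddles both traces: it starts from the workflow trace (which state a run read) and ends in the database trace (which run last wrote a given tuple).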


Workflow Provenance, Database Provenance, Reproducibility



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Fernando Chirigati (1)
  • Juliana Freire (1)
  1. Computer Science and Engineering Department, Polytechnic Institute of NYU, USA
