
SHARP: Harmonizing and Bridging Cross-Workflow Provenance

  • Alban Gaignard
  • Khalid Belhajjame
  • Hala Skaf-Molli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10577)

Abstract

PROV has been adopted by a number of workflow systems for encoding the traces of workflow executions. Exploiting these provenance traces is hampered by two main impediments. Firstly, workflow systems extend PROV differently to cater for system-specific constructs. The differences between the adopted PROV extensions yield heterogeneity in the generated provenance traces. This heterogeneity diminishes the value of such traces, e.g. when combining and querying provenance traces of different workflow systems. Secondly, the provenance recorded by workflow systems tends to be large, and as such difficult for a human user to browse and understand. In this paper (extending [14], initially published at SeWeBMeDA’17), we propose SHARP, a Linked Data approach for harmonizing cross-workflow provenance. The harmonization is performed by chasing tuple-generating and equality-generating dependencies defined for workflow provenance. This results in a provenance graph that can be summarized using domain-specific vocabularies. We experimentally evaluate SHARP (i) on publicly available provenance documents and (ii) using a real-world omic experiment involving workflow traces generated by the Taverna and Galaxy systems.
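
To make the idea of chasing dependencies concrete, the following is a minimal sketch, not the SHARP implementation. It assumes provenance is held as a set of (subject, predicate, object) string triples. The tuple-generating dependency is the generation-use-communication inference from the W3C PROV constraints; whether SHARP uses this exact rule is not stated in the abstract. The equality-generating dependency, the `ex:checksum` predicate, and all node names are hypothetical and chosen only for illustration.

```python
def apply_tgd(triples):
    """TGD: wasGeneratedBy(e, a1) and used(a2, e) => wasInformedBy(a2, a1)."""
    generated = {(s, o) for s, p, o in triples if p == "prov:wasGeneratedBy"}
    used = {(s, o) for s, p, o in triples if p == "prov:used"}
    return {(a2, "prov:wasInformedBy", a1)
            for e, a1 in generated
            for a2, e2 in used if e2 == e}


def apply_egd(triples, key="ex:checksum"):
    """Hypothetical EGD: nodes sharing the same key value are the same node."""
    representative = {}  # key value -> first node seen (in sorted order)
    rename = {}          # node -> node it is merged into
    for s, p, o in sorted(triples):
        if p == key:
            rep = representative.setdefault(o, s)
            if rep != s:
                rename[s] = rep
    return {(rename.get(s, s), p, rename.get(o, o)) for s, p, o in triples}


def chase(triples):
    """Apply both dependencies repeatedly until nothing changes."""
    current = set(triples)
    while True:
        merged = apply_egd(current)
        nxt = merged | apply_tgd(merged)
        if nxt == current:
            return current
        current = nxt


# Two hypothetical traces: one step produces a file, another step in a
# different system consumes a copy of it. All identifiers are made up.
trace = {
    ("ex:aligned.bam", "prov:wasGeneratedBy", "galaxy:align"),
    ("ex:aligned.bam", "ex:checksum", "abc123"),
    ("taverna:call_variants", "prov:used", "ex:aligned_copy"),
    ("ex:aligned_copy", "ex:checksum", "abc123"),
}
result = chase(trace)
```

In this sketch the equality rule first merges the two file nodes, which then lets the tuple-generating rule link the consuming step to the producing step across the two traces.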

Keywords

Reproducibility, Scientific workflows, Provenance, PROV constraints

References

  1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Reading (1995)
  2. Afgan, E., Baker, D., van den Beek, M., et al.: The galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucl. Acids Res. 44(W1), W3–W10 (2016)
  3. Aggarwal, C.C., Wang, H.: Graph data management and mining: a survey of algorithms and applications. In: Aggarwal, C., Wang, H. (eds.) Managing and Mining Graph Data. Advances in Database Systems, vol. 40, pp. 13–68. Springer, Boston (2010). https://doi.org/10.1007/978-1-4419-6045-0_2
  4. Alper, P., Belhajjame, K., Goble, C.A., Karagoz, P.: Enhancing and abstracting scientific workflow provenance for data publishing. In: Proceedings of the Joint EDBT/ICDT 2013 Workshops, pp. 313–318. ACM (2013)
  5. Alper, P., Belhajjame, K., Goble, C.A., Karagoz, P.: LabelFlow: exploiting workflow provenance to surface scientific data provenance. In: Ludäscher, B., Plale, B. (eds.) IPAW 2014. LNCS, vol. 8628, pp. 84–96. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16462-5_7
  6. Altintas, I., Barney, O., Jaeger-Frank, E.: Provenance collection support in the kepler scientific workflow system. In: Moreau, L., Foster, I. (eds.) IPAW 2006. LNCS, vol. 4145, pp. 118–132. Springer, Heidelberg (2006). https://doi.org/10.1007/11890850_14
  7. Carroll, J.J., Dickinson, I., et al.: Jena: implementing the semantic web recommendations. In: Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters, pp. 74–83. ACM (2004)
  8. Cheney, J., Missier, P., Moreau, L.: Constraints of the provenance data model. Technical report (2012)
  9. Chirigati, F., Shasha, D., Freire, J.: ReproZip: using provenance to support computational reproducibility. In: 5th USENIX Workshop on the Theory and Practice of Provenance, Berkeley (2013)
  10. Daga, E., d’Aquin, M., et al.: Describing semantic web applications through relations between data nodes (2014)
  11. Doan, A., Halevy, A., Ives, Z.: Principles of Data Integration, 1st edn. Morgan Kaufmann Publishers Inc., San Francisco (2012)
  12. Dodds, L., Davis, I.: Linked Data patterns: a pattern catalogue for modelling, publishing, and consuming Linked Data, May 2012
  13. Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)
  14. Gaignard, A., Belhajjame, K., Skaf-Molli, H.: Sharp: harmonizing cross-workflow provenance. In: SeWeBMeDA Workshop on Semantic Web Solutions for Large-Scale Biomedical Data Analytics (2016)
  15. Gaignard, A., Skaf-Molli, H., Bihouée, A.: From scientific workflow patterns to 5-star linked open data. In: 8th USENIX Workshop on the Theory and Practice of Provenance (2016)
  16. Lebo, T., Sahoo, S., McGuinness, D., et al.: PROV-O: the PROV ontology. W3C Recommendation, 30 April 2013
  17. Miles, S., Groth, P., Branco, M., Moreau, L.: The requirements of using provenance in E-science experiments. J. Grid Comput. 5(1), 1–25 (2007)
  18. Missier, P., Belhajjame, K., Cheney, J.: The W3C PROV family of specifications for modelling provenance metadata. In: Proceedings of the 16th International Conference on Extending Database Technology, pp. 773–776. ACM (2013)
  19. Kuhn, T., Chichester, C., Krauthammer, M., et al.: Decentralized provenance-aware publishing with nanopublications. PeerJ Comput. Sci. 2, e78 (2016). https://doi.org/10.7717/peerj-cs.78
  20. Wolstencroft, K., Haines, R., Fellows, D., et al.: The taverna workflow suite: designing and executing workflows of web services on the desktop, web or in the cloud. Nucl. Acids Res. 41(Webserver–Issue), 557–561 (2013)
  21. Zhao, J., Wroe, C., Goble, C., Stevens, R., Quan, D., Greenwood, M.: Using semantic web technologies for representing E-science provenance. In: McIlraith, S.A., Plexousakis, D., Harmelen, F. (eds.) ISWC 2004. LNCS, vol. 3298, pp. 92–106. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30475-3_8

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Alban Gaignard (1)
  • Khalid Belhajjame (2)
  • Hala Skaf-Molli (3)
  1. l’institut du thorax, INSERM, CNRS, UNIV Nantes, Nantes, France
  2. Université de Paris-Dauphine, LAMSADE, Paris, France
  3. Université de Nantes, LS2N, Nantes, France
