Intermediate Notation for Provenance and Workflow Reproducibility

  • Danius T. Michaelides
  • Richard Parker
  • Chris Charlton
  • William J. Browne
  • Luc Moreau
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9672)

Abstract

We present a technique to capture retrospective provenance across a number of tools in a statistical software suite. Our goal is to facilitate portability of processes between the tools to enhance usability and to support reproducibility. We describe an intermediate notation to aid runtime capture of provenance and demonstrate conversion to an executable and editable workflow. The notation is amenable to conversion to PROV via a template expansion mechanism. We discuss how recording this intermediate notation affects the runtime performance of our system, and the benefits it brings.

Notes

Acknowledgements

This research was supported by the UK’s Economic and Social Research Council (grant reference ES/K007246/1). Downloadable versions of workflows, provenance and other resources used in this paper can be found at http://dx.doi.org/10.5258/SOTON/393118.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Danius T. Michaelides (1)
  • Richard Parker (2)
  • Chris Charlton (2)
  • William J. Browne (2)
  • Luc Moreau (1)

  1. Electronics and Computer Science, University of Southampton, Southampton, UK
  2. Graduate School of Education, University of Bristol, Bristol, UK
