Towards Complete Tracking of Provenance in Experimental Distributed Systems Research

  • Tomasz Buchert
  • Lucas Nussbaum
  • Jens Gustedt
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9523)


Running experiments on modern systems such as supercomputers, cloud infrastructures or P2P networks has become very complex, both technically and methodologically. It is difficult to re-run an experiment or understand its results, even with a technical background in the technology and methods used. Storing the provenance of experimental data, i.e., information about how the results were produced, has proved to be a powerful tool for addressing similar problems in the computational natural sciences. In this paper, we (1) survey provenance collection in various domains of computer science, (2) introduce a new classification of provenance types, and (3) sketch the design of a provenance system inspired by this classification.


Keywords (added by machine, not by the authors): Business Process Modeling, General Computing, Experiment Description, Data Provenance, Provenance Information



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Tomasz Buchert (1, 2, 3)
  • Lucas Nussbaum (1, 2, 3)
  • Jens Gustedt (1, 4, 5)
  1. Inria, Villers-lès-Nancy, France
  2. Université de Lorraine, LORIA, Nancy, France
  3. CNRS, LORIA - UMR, Nancy, France
  4. Université de Strasbourg, Strasbourg, France
  5. CNRS, Icube - UMR, Strasbourg, France
