A Semantic-Based Approach to Attain Reproducibility of Computational Environments in Scientific Workflows: A Case Study

  • Idafen Santana-Perez
  • Rafael Ferreira da Silva
  • Mats Rynge
  • Ewa Deelman
  • María S. Pérez-Hernández
  • Oscar Corcho
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8805)

Abstract

Reproducible research in scientific workflows is often addressed by tracking the provenance of the produced results. While this approach allows inspecting intermediate and final results, improves understanding, and permits replaying a workflow execution, it does not ensure that the computational environment is available for subsequent executions to reproduce the experiment. In this work, we propose describing the resources involved in the execution of an experiment using a set of semantic vocabularies, so as to conserve the computational environment. We define a process for documenting the workflow application, its management system, and their dependencies based on four domain ontologies. We then conduct an experimental evaluation using a real workflow application on an academic Cloud and a public Cloud platform. Results show that our approach can reproduce an equivalent execution environment of a predefined virtual machine image on both computing platforms.
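
To make the idea of a machine-readable environment description concrete, the sketch below shows how a workflow component and one of its software dependencies might be annotated as RDF using Python's rdflib. The namespace URI and the class and property names (WorkflowComponent, SoftwarePackage, requiresSoftware, installationScript) are illustrative placeholders, not the vocabularies defined in the paper; they only indicate the kind of statements such a description would contain.

    # Minimal illustrative sketch: placeholder ontology terms, not the paper's vocabularies.
    from rdflib import Graph, Literal, Namespace, RDF

    SW = Namespace("http://example.org/environment#")  # hypothetical namespace

    g = Graph()
    g.bind("sw", SW)

    component = SW["MontageMosaicJob"]   # a workflow component (illustrative)
    dependency = SW["montage-4.0"]       # a software package it depends on

    g.add((component, RDF.type, SW.WorkflowComponent))
    g.add((dependency, RDF.type, SW.SoftwarePackage))
    g.add((dependency, SW.version, Literal("4.0")))
    g.add((dependency, SW.installationScript, Literal("install-montage.sh")))
    g.add((component, SW.requiresSoftware, dependency))

    # Store the description alongside the workflow so the environment can be
    # re-created later on a different infrastructure.
    print(g.serialize(format="turtle"))

From a description of this kind, a deployment tool could decide which packages to install on a freshly provisioned virtual machine, rather than shipping and preserving a full VM image.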

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Idafen Santana-Perez (1)
  • Rafael Ferreira da Silva (2)
  • Mats Rynge (2)
  • Ewa Deelman (2)
  • María S. Pérez-Hernández (1)
  • Oscar Corcho (1)

  1. Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
  2. USC Information Sciences Institute, Marina del Rey, USA
