Abstract
Reproducible research in scientific workflows is often addressed by tracking the provenance of the produced results. While this approach allows inspecting intermediate and final results, improves understanding, and permits replaying a workflow execution, it does not ensure that the computational environment remains available for subsequent executions to reproduce the experiment. In this work, we propose describing the resources involved in the execution of an experiment using a set of semantic vocabularies, so as to preserve the computational environment. We define a process for documenting the workflow application, the workflow management system, and their dependencies based on four domain ontologies. We then conduct an experimental evaluation using a real workflow application on an academic and a public cloud platform. Results show that our approach can reproduce an equivalent execution environment of a predefined virtual machine image on both computing platforms.
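The core idea of describing execution resources with semantic vocabularies can be illustrated with a small sketch. The namespace, resource names, and properties below are hypothetical placeholders (the abstract does not name the four ontologies or their terms); the sketch only shows the general pattern of annotating a workflow step and its software dependencies as RDF-style triples serialized in Turtle-like syntax, so that an equivalent environment could later be re-provisioned from the description.

```python
# Minimal, hypothetical sketch: describing a computational environment
# as semantic annotations (RDF-style triples). All names below are
# illustrative assumptions, not terms from the paper's ontologies.

EX = "http://example.org/env#"  # hypothetical vocabulary namespace

def triple(subject, predicate, obj):
    """Render one subject-predicate-object statement in Turtle-like syntax."""
    return f"<{EX}{subject}> <{EX}{predicate}> {obj} ."

# Describe a (hypothetical) workflow step, the software it depends on,
# and how that software can be re-installed on a fresh virtual machine.
description = [
    triple("mosaic-step", "usesSoftware", f"<{EX}montage-3.3>"),
    triple("montage-3.3", "requiresOperatingSystem", '"CentOS 6"'),
    triple("montage-3.3", "hasInstallationScript", '"yum install montage"'),
]

for statement in description:
    print(statement)
```

An environment-provisioning tool could then read such annotations back and execute the recorded installation steps on a newly created virtual machine, rather than relying on a pre-built image being available.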
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Santana-Perez, I., Ferreira da Silva, R., Rynge, M., Deelman, E., Pérez-Hernández, M.S., Corcho, O. (2014). A Semantic-Based Approach to Attain Reproducibility of Computational Environments in Scientific Workflows: A Case Study. In: Lopes, L., et al. Euro-Par 2014: Parallel Processing Workshops. Euro-Par 2014. Lecture Notes in Computer Science, vol 8805. Springer, Cham. https://doi.org/10.1007/978-3-319-14325-5_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14324-8
Online ISBN: 978-3-319-14325-5