Combining P-Plan and the REPRODUCE-ME Ontology to Achieve Semantic Enrichment of Scientific Experiments Using Interactive Notebooks
End-to-end reproducibility of scientific experiments requires scientists to share their experimental data along with the computational environment. Interactive notebooks have recently gained widespread popularity among scientists because they allow users to document their experiments alongside the code, visualize the results inline and selectively execute the code. In a multi-user environment where users can run and modify shared notebooks, it becomes essential to capture the provenance of the notebooks along with the experiments that used them. In this paper, we propose a way to capture the provenance of these interactive notebooks and convert it into semantic descriptions so that a user can query for differences in the results, steps, errors and execution environment of the code. We use the REPRODUCE-ME ontology, which extends PROV-O and P-Plan, to describe the provenance of notebook execution. We evaluate our prototype in a multi-user environment provided by JupyterHub.
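To make the idea of converting notebook provenance into semantic descriptions more concrete, the following is a minimal sketch and not the prototype described in this paper. It assumes a notebook loaded as a dictionary in the Jupyter version 4 JSON layout (`cells`, `cell_type`, `execution_count`, `outputs`) and maps code cells to P-Plan steps and executed cells to P-Plan activities with PROV-O output entities. The base IRI `http://example.org/notebook/`, the function names, and the cell-to-step mapping itself are assumptions made for illustration; REPRODUCE-ME terms are deliberately left out.

```python
# Illustrative sketch only: the mapping below is an assumption, not the paper's method.
PROV = "http://www.w3.org/ns/prov#"
PPLAN = "http://purl.org/net/p-plan#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/notebook/"  # hypothetical base IRI


def notebook_to_triples(nb, run_id):
    """Return (subject, predicate, object) IRI triples describing one notebook run."""
    plan = EX + "plan"
    triples = [(plan, RDF_TYPE, PPLAN + "Plan")]
    previous_step = None
    code_cells = [c for c in nb["cells"] if c["cell_type"] == "code"]
    for index, cell in enumerate(code_cells):
        # Every code cell is a planned step, in notebook order.
        step = f"{EX}step/{index}"
        triples.append((step, RDF_TYPE, PPLAN + "Step"))
        triples.append((step, PPLAN + "isStepOfPlan", plan))
        if previous_step is not None:
            triples.append((step, PPLAN + "isPrecededBy", previous_step))
        previous_step = step
        # Only cells that were actually executed yield an activity.
        if cell.get("execution_count") is None:
            continue
        activity = f"{EX}run/{run_id}/cell/{index}"
        triples.append((activity, RDF_TYPE, PPLAN + "Activity"))
        triples.append((activity, PPLAN + "correspondsToStep", step))
        # Each output (result, stream, or error) becomes a generated entity.
        for out_index, _output in enumerate(cell.get("outputs", [])):
            entity = f"{activity}/output/{out_index}"
            triples.append((entity, RDF_TYPE, PROV + "Entity"))
            triples.append((entity, PROV + "wasGeneratedBy", activity))
    return triples


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)
```

Calling `to_ntriples(notebook_to_triples(nb, "run-1"))` on a loaded notebook yields statements that can be placed in a triple store. Keeping steps (the plan) separate from activities (one execution) is what would allow two runs of the same notebook to be compared in a query.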
Keywords: Notebooks, Provenance, Reproducibility, Experiments, Ontology
This research is supported by the “Deutsche Forschungsgemeinschaft” (DFG) in Project Z2 of the CRC/TRR 166 “High-end light microscopy elucidates membrane receptor function - ReceptorLight”. We thank Christoph Biskup, Kathrin Groeneveld and Tom Kache from University Hospital Jena, Germany, for providing the requirements to develop the proposed approach and evaluating the system.