Abstract
Computational notebooks have gained widespread adoption among researchers in academia and industry because they support reproducible science. These notebooks allow users to combine code, text, and visualizations for easy sharing of experiments and results. They are widely shared on GitHub, which currently hosts more than 100 million repositories, making it the world’s largest host of source code. Recent reproducibility studies indicate that both good and bad practices exist in writing these notebooks, and that these practices can affect their overall reproducibility. We present ReproduceMeGit, a visualization tool for analyzing the reproducibility of Jupyter Notebooks. It helps repository users and owners to re-execute any GitHub repository containing Jupyter Notebooks and to assess its reproducibility directly. The tool reports the number of notebooks that were successfully reproduced, the number that raised exceptions, and the number that produced results different from those of the original notebooks, among other information. Through integration with the ProvBook tool, each notebook in the repository can also be exported in RDF along with the provenance information of its execution.
Acknowledgments
The authors thank the Carl Zeiss Foundation for the financial support of the project “A Virtual Werkstatt for Digitization in the Sciences (K3)” within the scope of the program-line “Breakthroughs: Exploring Intelligent Systems” for “Digitization – explore the basics, use applications”.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Samuel, S., König-Ries, B. (2021). ReproduceMeGit: A Visualization Tool for Analyzing Reproducibility of Jupyter Notebooks. In: Glavic, B., Braganholo, V., Koop, D. (eds) Provenance and Annotation of Data and Processes. IPAW 2020, IPAW 2021. Lecture Notes in Computer Science, vol 12839. Springer, Cham. https://doi.org/10.1007/978-3-030-80960-7_12
DOI: https://doi.org/10.1007/978-3-030-80960-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-80959-1
Online ISBN: 978-3-030-80960-7
eBook Packages: Computer Science, Computer Science (R0)