ReproduceMeGit: A Visualization Tool for Analyzing Reproducibility of Jupyter Notebooks

  • Conference paper
Provenance and Annotation of Data and Processes (IPAW 2020, IPAW 2021)

Abstract

Computational notebooks have gained widespread adoption among researchers in academia and industry because they support reproducible science. These notebooks allow users to combine code, text, and visualizations, which makes it easy to share experiments and results. They are widely shared on GitHub, which currently hosts more than 100 million repositories, making it the world’s largest host of source code. Recent reproducibility studies have indicated that both good and bad practices exist in writing these notebooks, and that these practices can affect their overall reproducibility. We present ReproduceMeGit, a visualization tool for analyzing the reproducibility of Jupyter Notebooks. It helps repository users and owners to reproduce, analyze, and assess the reproducibility of any GitHub repository containing Jupyter Notebooks. The tool reports the number of notebooks that were successfully reproduced, those that raised exceptions, those whose results differed from the original notebooks, and so on. Through its integration with the ProvBook tool, each notebook in the repository, along with the provenance information of its execution, can also be exported as RDF.
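The outcome categories named in the abstract (reproduced, exception, different results) can be illustrated with a minimal sketch. This is not ReproduceMeGit's implementation. It assumes the original notebook and a re-executed copy are both available as nbformat v4 JSON dictionaries, compares only the plain-text outputs of code cells, and labels a notebook as an exception whenever the re-run produced an error output. The function names (`load_notebook`, `code_cell_outputs`, `has_error`, `classify`) and the category labels are hypothetical.

```python
import json
from typing import Any


def load_notebook(path: str) -> dict:
    """Read an .ipynb file (nbformat v4 JSON) into a dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def _normalize_text(value: Any) -> str:
    # nbformat allows multiline strings to be stored as a list of lines.
    if isinstance(value, list):
        return "".join(value)
    return str(value)


def code_cell_outputs(notebook: dict) -> list[list[str]]:
    """Return, per code cell, the comparable plain-text outputs.

    Execution counts and rich outputs (images, HTML) are deliberately
    ignored in this simplified, hypothetical comparison.
    """
    result = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        texts = []
        for out in cell.get("outputs", []):
            kind = out.get("output_type")
            if kind == "stream":
                texts.append(_normalize_text(out.get("text", "")))
            elif kind in ("execute_result", "display_data"):
                data = out.get("data", {})
                texts.append(_normalize_text(data.get("text/plain", "")))
            elif kind == "error":
                texts.append(f"{out.get('ename')}: {out.get('evalue')}")
        result.append(texts)
    return result


def has_error(notebook: dict) -> bool:
    """True if any code cell contains an error output."""
    return any(
        out.get("output_type") == "error"
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
        for out in cell.get("outputs", [])
    )


def classify(original: dict, rerun: dict) -> str:
    """Assign one of three hypothetical reproducibility categories."""
    if has_error(rerun):
        return "exception"
    if code_cell_outputs(original) == code_cell_outputs(rerun):
        return "same results"
    return "different results"
```

Comparing only `text/plain` outputs keeps the sketch short, but it will report notebooks whose plots or HTML tables differ as having the same results; a fuller comparison would need a notebook-aware diff tool such as nbdime.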

Notes

  1. https://www.anaconda.com.

  2. https://github.com/fusion-jena/ReproduceMeGit.

Acknowledgments

The authors thank the Carl Zeiss Foundation for the financial support of the project “A Virtual Werkstatt for Digitization in the Sciences (K3)” within the scope of the program-line “Breakthroughs: Exploring Intelligent Systems” for “Digitization – explore the basics, use applications”.

Author information

Correspondence to Sheeba Samuel.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Samuel, S., König-Ries, B. (2021). ReproduceMeGit: A Visualization Tool for Analyzing Reproducibility of Jupyter Notebooks. In: Glavic, B., Braganholo, V., Koop, D. (eds) Provenance and Annotation of Data and Processes. IPAW 2020, IPAW 2021. Lecture Notes in Computer Science, vol 12839. Springer, Cham. https://doi.org/10.1007/978-3-030-80960-7_12

  • DOI: https://doi.org/10.1007/978-3-030-80960-7_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-80959-1

  • Online ISBN: 978-3-030-80960-7

  • eBook Packages: Computer Science, Computer Science (R0)
