
Reproducibility: Evaluating the Evaluations

  • Conference paper
  • Part of the proceedings: Reproducible Research in Pattern Recognition (RRPR 2021)

Abstract

Evaluation is at the heart of reproducibility in research, as well as of the related but distinct concept of replicability. The difference between the two is whether the determination is based on the original author’s source code (replicability) or is independent of the code and based purely on a written description of the method (reproducibility). A recent study of published machine learning experiments concluded that only two-thirds were reproducible and that, paradoxically, having access to the source code did not help with reproducibility, even though it obviously provides for replicability. Reproducibility depends critically, then, on the quality and completeness of both internal and external documentation. The growing popularity of competitions at pattern recognition conferences presents an opportunity to develop and disseminate new best practices for evaluating reproducibility. As an initial step forward, we collected the final reports and reviewed the competition websites associated with recent ICPR and ICDAR conferences. We used these data from 42 competitions to assess current practices and to posit ways to extend evaluations from replicability (already checked by some competitions) to reproducibility on application-oriented data. We recommend that empirical standards, monitoring competitions, and modified code testing be considered and discussed by the research community as we all work together to advance the desirable goal of conducting and publishing research that achieves higher degrees of reproducibility. Competitions can play a special role in this regard, but only if certain changes are made in the way they are formulated, run, and documented.
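
To make the replicability/reproducibility distinction concrete, the following minimal sketch (not taken from the paper) shows one way a competition organizer might flag results whose independently reimplemented score drifts too far from the originally reported score. The class and function names, the 5% relative tolerance, and the example scores are illustrative assumptions only, not anything specified by the authors.

    # Illustrative sketch only (not from the paper): compare an independently
    # reproduced score against the originally reported score. The tolerance,
    # scores, and names below are assumptions chosen for illustration.
    from dataclasses import dataclass


    @dataclass
    class Result:
        method: str
        reported_score: float      # score claimed in the original publication
        reproduced_score: float    # score from an independent reimplementation


    def is_reproduced(result: Result, rel_tolerance: float = 0.05) -> bool:
        """Count a result as reproduced if the independently obtained score
        lies within a relative tolerance of the reported score."""
        if result.reported_score == 0.0:
            return abs(result.reproduced_score) <= rel_tolerance
        gap = abs(result.reported_score - result.reproduced_score)
        return gap / abs(result.reported_score) <= rel_tolerance


    if __name__ == "__main__":
        results = [
            Result("method-A", reported_score=0.91, reproduced_score=0.89),
            Result("method-B", reported_score=0.88, reproduced_score=0.71),
        ]
        for r in results:
            status = "reproduced" if is_reproduced(r) else "not reproduced"
            print(f"{r.method}: {status}")

Under these assumptions, method-A would be flagged as reproduced and method-B would not; a real competition would of course need a domain-appropriate metric, tolerance, and evaluation protocol.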

Notes

  1. While ICPR and ICDAR seemed to us to be two obvious candidates to study, as noted by one of the reviewers, there are, of course, many other relevant examples that may be instructive to consider, including Kaggle, the KITTI Vision Benchmark Suite, ImageNet, and reproducedpapers.org, among others.

Acknowledgements

We thank the reviewers for their carefully considered feedback and helpful comments, many of which we have incorporated into the present version of this paper.

Author information

Corresponding author

Correspondence to Daniel Lopresti.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Lopresti, D., Nagy, G. (2021). Reproducibility: Evaluating the Evaluations. In: Kerautret, B., Colom, M., Krähenbühl, A., Lopresti, D., Monasse, P., Talbot, H. (eds.) Reproducible Research in Pattern Recognition. RRPR 2021. Lecture Notes in Computer Science, vol. 12636. Springer, Cham. https://doi.org/10.1007/978-3-030-76423-4_2

  • DOI: https://doi.org/10.1007/978-3-030-76423-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-76422-7

  • Online ISBN: 978-3-030-76423-4

  • eBook Packages: Computer Science (R0)
