
Reproducibility: Evaluating the Evaluations

  • Conference paper
  • Part of the proceedings: Reproducible Research in Pattern Recognition (RRPR 2021)

Abstract

Evaluation is at the heart of reproducibility in research, as well as of the related but distinct concept of replicability. The difference between the two is whether the determination is based on the original author’s source code (replicability) or is independent of the code and based purely on a written description of the method (reproducibility). A recent study of published machine learning experiments concluded that only two-thirds were reproducible and that, paradoxically, having access to the source code did not help with reproducibility, even though it obviously provides for replicability. Reproducibility depends critically, then, on the quality and completeness of both internal and external documentation. The growing popularity of competitions at pattern recognition conferences presents an opportunity to develop and disseminate new best practices for evaluating reproducibility. As an initial step forward, we collected the final reports and reviewed the competition websites associated with recent ICPR and ICDAR conferences. We used these data from 42 competitions to assess current practices and to posit ways to extend evaluations from replicability (already checked by some competitions) to reproducibility on application-oriented data. We recommend that empirical standards, monitoring competitions, and modified code testing be considered and discussed by the research community as we all work together to advance the desirable goal of conducting and publishing research that achieves higher degrees of reproducibility. Competitions can play a special role in this regard, but only if certain changes are made in the way they are formulated, run, and documented.
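
To make the replicability/reproducibility distinction concrete, the following minimal sketch (not taken from the paper) shows one way a competition organizer might flag results whose independently reimplemented score drifts too far from the originally reported score. The class and function names, the 5% relative tolerance, and the example scores are illustrative assumptions only, not anything specified by the authors.

    # Illustrative sketch only (not from the paper): compare an independently
    # reproduced score against the originally reported score. The tolerance,
    # scores, and names below are assumptions chosen for illustration.
    from dataclasses import dataclass


    @dataclass
    class Result:
        method: str
        reported_score: float      # score claimed in the original publication
        reproduced_score: float    # score from an independent reimplementation


    def is_reproduced(result: Result, rel_tolerance: float = 0.05) -> bool:
        """Count a result as reproduced if the independently obtained score
        lies within a relative tolerance of the reported score."""
        if result.reported_score == 0.0:
            return abs(result.reproduced_score) <= rel_tolerance
        gap = abs(result.reported_score - result.reproduced_score)
        return gap / abs(result.reported_score) <= rel_tolerance


    if __name__ == "__main__":
        results = [
            Result("method-A", reported_score=0.91, reproduced_score=0.89),
            Result("method-B", reported_score=0.88, reproduced_score=0.71),
        ]
        for r in results:
            status = "reproduced" if is_reproduced(r) else "not reproduced"
            print(f"{r.method}: {status}")

Under these assumptions, method-A would be flagged as reproduced and method-B would not; a real competition would of course need a domain-appropriate metric, tolerance, and evaluation protocol.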

Notes

  1. While ICPR and ICDAR seemed to us to be two obvious candidates to study, as noted by one of the reviewers, there are, of course, many other relevant examples that may be instructive to consider, including Kaggle, the KITTI Vision Benchmark Suite, ImageNet, and reproducedpapers.org, among others.

Acknowledgements

We thank the reviewers for their carefully considered feedback and helpful comments, many of which we have incorporated into the present version of this paper.

Author information

Corresponding author

Correspondence to Daniel Lopresti.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Lopresti, D., Nagy, G. (2021). Reproducibility: Evaluating the Evaluations. In: Kerautret, B., Colom, M., Krähenbühl, A., Lopresti, D., Monasse, P., Talbot, H. (eds.) Reproducible Research in Pattern Recognition. RRPR 2021. Lecture Notes in Computer Science, vol. 12636. Springer, Cham. https://doi.org/10.1007/978-3-030-76423-4_2

  • DOI: https://doi.org/10.1007/978-3-030-76423-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-76422-7

  • Online ISBN: 978-3-030-76423-4

  • eBook Packages: Computer Science (R0)
