Abstract
Evaluation is at the heart of reproducibility in research, as well as the related but distinct concept of replicability. The difference between the two is whether the determination is based on the original authors' source code (replicability) or is independent of the code and based purely on a written description of the method (reproducibility). A recent study of published machine learning experiments concluded that only two-thirds were reproducible and that, paradoxically, having access to the source code did not help with reproducibility, even though it obviously provides for replicability. Reproducibility therefore depends critically on the quality and completeness of both internal and external documentation. The growing popularity of competitions at pattern recognition conferences presents an opportunity to develop and disseminate new best practices for evaluating reproducibility. As an initial step, we collected the final reports and reviewed the competition websites associated with recent ICPR and ICDAR conferences. We used this data from 42 competitions to assess current practices and to propose ways of extending evaluations from replicability (already checked by some competitions) to reproducibility on application-oriented data. We recommend empirical standards, monitoring competitions, and modified code testing for consideration and discussion by the research community as we work together toward the desirable goal of conducting and publishing research that achieves higher degrees of reproducibility. Competitions can play a special role in this regard, but only if certain changes are made in the way they are formulated, run, and documented.
Notes
1. While ICPR and ICDAR seemed to us to be two obvious candidates to study, as noted by one of the reviewers, there are, of course, many other relevant examples that may be instructive to consider, including Kaggle, the KITTI Vision Benchmark Suite, ImageNet, and reproducedpapers.org, among others.
Acknowledgements
We thank the reviewers for their carefully considered feedback and helpful comments, many of which we have included in the present version of this paper.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Lopresti, D., Nagy, G. (2021). Reproducibility: Evaluating the Evaluations. In: Kerautret, B., Colom, M., Krähenbühl, A., Lopresti, D., Monasse, P., Talbot, H. (eds) Reproducible Research in Pattern Recognition. RRPR 2021. Lecture Notes in Computer Science(), vol 12636. Springer, Cham. https://doi.org/10.1007/978-3-030-76423-4_2
Print ISBN: 978-3-030-76422-7
Online ISBN: 978-3-030-76423-4