Truth, Proof, and Reproducibility: There’s No Counter-Attack for the Codeless

Gray, Charles T.; Marwick, Ben

doi:10.1007/978-981-15-1960-4_8

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1150)

Included in the following conference series: Research School on Statistics and Data Science

Abstract

Current concerns about reproducibility in many research communities can be traced back to a high value placed on empirical reproducibility of the physical details of scientific experiments and observations. For example, the detailed descriptions by 17th century scientist Robert Boyle of his vacuum pump experiments are often held to be the ideal of reproducibility as a cornerstone of scientific practice. Victoria Stodden has claimed that the computer is an analog for Boyle’s pump – another kind of scientific instrument that needs detailed descriptions of how it generates results. In the place of Boyle’s hand-written notes, we now expect code in open source programming languages to be available to enable others to reproduce and extend computational experiments. In this paper we show that there is another genealogy for reproducibility, starting at least from Euclid, in the production of proofs in mathematics. Proofs have a distinctive quality of being necessarily reproducible, and are the cornerstone of mathematical science. However, the task of the modern mathematical scientist has drifted from that of blackboard rhetorician, where the craft of proof reigned, to a scientific workflow that now more closely resembles that of an experimental scientist. So, what is proof in modern mathematics? And, if proof is unattainable in other fields, what is due scientific diligence in a computational experimental environment? How do we measure truth in the context of uncertainty? Adopting a manner of Lakatosian conversant conjecture between two mathematicians, we examine how proof informs our practice of computational statistical inquiry. We propose that a reorientation of mathematical science is necessary so that its reproducibility can be readily assessed.

Thank you to Kerrie Mengersen, Kate Smith-Miles, Mark Padgham, Hien Nguyen, Emily Kothe, Fiona Fidler, Mathew Ling, Luke Prendergast, Adam Sparks, Hannah Fraser, Felix Singleton Thorn, James Goldie, and Michel Penguin (Michael Sumner), in no particular order, with whom initial bits and pieces of this paper were discussed. Special thanks to Brian A. Davey for proofing the proofs and Alex Hayes for his edifying post [16].

Notes

1. We might argue that here we use the term research software engineer (RSE) in the sense of Katz and McHenry's Super RSEs: developers who ‘work with and support researchers, and also work in teams of RSEs who research and develop their own software, support it, grow it, sustain it, etc.’ [20]. Alternatively, one could choose the more ambiguous Research Software Engineers Association definition of RSEs as people in academia who ‘combine expertise in programming with an intricate understanding of research’ [45].
2. We focus in this manuscript on R packages, but the reader is invited to consider these as examples rather than definitive guidance. The same arguments hold for other languages, such as Python, and associated tools.
3. Let P be a set. An order on P is a binary relation \(\leqslant \) on P such that, for all \(x, y, z \in P\): \(x \leqslant x\); \(x \leqslant y\) and \(y \leqslant x\) imply \(x = y\); and, finally, \(x \leqslant y\) and \(y \leqslant z\) imply \(x \leqslant z\). We then say that \(\leqslant \) is reflexive, antisymmetric, and transitive, respectively [8].
4. Lewis Carroll, author of Alice in Wonderland, is the pen name of Charles Lutwidge Dodgson, born in 1832, who taught mathematics at Christ Church, Oxford [7].
5. In mathematics, we read \(:=\) as ‘be defined as’, \(\implies \) as ‘implies’, and \(<\) as ‘less than but not equal to’.
6. Turning to the bible of algebra, Lattices and Order [8], we learn the Axiom of Choice ‘asserts that it is possible to find a map which picks one element from each member of a family of non-empty sets’.
7. From Wickham’s Tidy data [15], we describe data as tidy if: (1) each variable forms a column; (2) each observation forms a row; and (3) each type of observational unit forms a table.
8. Indeed, a natural consequence of questioning how we practice mathematical science is questioning how we train the next generation of practitioners. Important as this may be, it is beyond the scope of this manuscript.
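
As an illustration of the definition in Note 3, the three order axioms can be checked by brute force on a finite set. The following is a minimal Python sketch, not taken from the paper: the function name `is_order`, the example set, and the two example relations are assumptions chosen purely for illustration.

```python
from itertools import product


def is_order(elements, leq):
    """Check whether `leq` is an order on the finite collection `elements`.

    `leq(x, y)` should return True when x is related to y.
    The check is exhaustive, so it is only practical for small sets.
    """
    elements = list(elements)

    # Reflexive: x <= x for every x.
    reflexive = all(leq(x, x) for x in elements)

    # Antisymmetric: x <= y and y <= x together imply x == y.
    antisymmetric = all(
        x == y
        for x, y in product(elements, repeat=2)
        if leq(x, y) and leq(y, x)
    )

    # Transitive: x <= y and y <= z together imply x <= z.
    transitive = all(
        leq(x, z)
        for x, y, z in product(elements, repeat=3)
        if leq(x, y) and leq(y, z)
    )

    return reflexive and antisymmetric and transitive


def divides(x, y):
    """Hypothetical example relation: x divides y."""
    return y % x == 0


def strictly_less(x, y):
    """Hypothetical example relation: strict 'less than' (not reflexive)."""
    return x < y


# Illustrative set: the positive divisors of 12.
P = [1, 2, 3, 4, 6, 12]

divisibility_is_order = is_order(P, divides)
strict_less_is_order = is_order(P, strictly_less)
```

Under these assumptions, divisibility satisfies all three axioms on this set, whereas strict inequality fails reflexivity. Such a check is evidence for one finite case only; it does not replace a proof of the general statement.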

References

1. Amrhein, V., Greenland, S., McShane, B.: Scientists rise up against statistical significance. Nature 567(7748), 305 (2019). https://doi.org/10.1038/d41586-019-00857-9
2. Auburn, D.: Proof: A Play. Farrar, Straus and Giroux, New York (2001). Google-Books-ID: 6AUtQVhrY90C
3. Bertot, Y.: A short presentation of Coq. In: Mohamed, O.A., Muñoz, C., Tahar, S. (eds.) TPHOLs 2008. LNCS, vol. 5170, pp. 12–16. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-71067-7_3
4. Brown, S.: Partial unpacking and indirect proofs: a study of students’ productive use of the symbolic proof scheme. In: Proceedings of the 16th Annual Conference on Research in Undergraduate Mathematics Education, vol. 2, pp. 47–54 (2013)
5. Bryan, J.: Excuse me, do you have a moment to talk about version control? Am. Stat. 72(1), 20–27 (2018). https://doi.org/10.1080/00031305.2017.1399928
6. Camerer, C.F., et al.: Evaluating replicability of laboratory experiments in economics. Science 351(6280), 1433–1436 (2016). https://doi.org/10.1126/science.aaf0918
7. Carroll, L.: The Annotated Alice: The Definitive Edition, updated, subsequent edn. W. W. Norton & Company, New York (1999)
8. Davey, B.A., Priestley, H.A.: Introduction to Lattices and Order. Cambridge University Press, Cambridge (2002). Google-Books-ID: vVVTxeuiyvQC
9. Davey, B.A.: When is a Proof?, 2nd edn. La Trobe University, Bundoora (2009)
10. Davey, B.A., Gray, C.T., Pitkethly, J.G.: The homomorphism lattice induced by a finite algebra. Order 35(2), 193–214 (2018). https://doi.org/10.1007/s11083-017-9426-3
11. Donoho, D.L.: An invitation to reproducible computational research. Biostatistics 11(3), 385–388 (2010). https://doi.org/10.1093/biostatistics/kxq028
12. Fidler, F., Wilcox, J.: Reproducibility of scientific results. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, winter 2018 edn. (2018)
13. Fraser, H., Parker, T., Nakagawa, S., Barnett, A., Fidler, F.: Questionable research practices in ecology and evolution. PLOS One 13(7), e0200303 (2018). https://doi.org/10.1371/journal.pone.0200303
14. Haack, S.: Defending Science - within Reason: Between Scientism and Cynicism. Prometheus Books, Buffalo (2011). Google-Books-ID: RhXxaPTc_EYC
15. Wickham, H.: Tidy data. J. Stat. Softw. 59(10), 1–23 (2014). https://doi.org/10.18637/jss.v059.i10
16. Hayes, A.: testing statistical software - aleatoric, July 2019. https://www.alexpghayes.com/blog/testing-statistical-software/
17. Head, M.L., Holman, L., Lanfear, R., Kahn, A.T., Jennions, M.D.: The extent and consequences of p-hacking in science. PLOS Biol. 13(3), e1002106 (2015). https://doi.org/10.1371/journal.pbio.1002106
18. Hester, J.: covr: Bringing test coverage to R, January 2016. https://www.rstudio.com/resources/webinars/covr-bringing-test-coverage-to-r/
19. Hester, J.: covr: Test Coverage for Packages (2018). https://CRAN.R-project.org/package=covr
20. Katz, D.S., McHenry, K.: Super RSEs: Combining research and service in three dimensions of Research Software Engineering, July 2019. https://danielskatzblog.wordpress.com/2019/07/12/
21. Lakatos, I.: Proofs and Refutations: The Logic of Mathematical Discovery, reissue edn. Cambridge University Press, Cambridge (2015)
22. LeVeque, R.J., Mitchell, I.M., Stodden, V.: Reproducible research for scientific computing: tools and strategies for changing the culture. Comput. Sci. Eng. 14 (2012). https://doi.org/10.1109/mcse.2012.38
23. Martin-Löf, P.: Constructive mathematics and computer programming. In: Cohen, L.J., Łoś, J., Pfeiffer, H., Podewski, K.P. (eds.) Studies in Logic and the Foundations of Mathematics, Logic, Methodology and Philosophy of Science VI, vol. 104, pp. 153–175. Elsevier (1982). https://doi.org/10.1016/S0049-237X(09)70189-2
24. Marwick, B.: rrtools: Creates a reproducible research compendium (2018). https://github.com/benmarwick/rrtools
25. Marwick, B., Boettiger, C., Mullen, L.: Packaging data analytical work reproducibly using R (and friends). Technical report e3192v2, PeerJ Inc., March 2018. https://doi.org/10.7287/peerj.preprints.3192v2, https://peerj.com/preprints/3192
26. Merton, R.K.: On Social Structure and Science. University of Chicago Press, Chicago (1996). Google-Books-ID: j94XiVDwAZEC
27. Murray, C.: How to accuse the other guy of lying with statistics. Stat. Sci. 20(3), 239–241 (2005). https://www.jstor.org/stable/20061179
28. Nowogrodzki, A.: How to support open-source software and stay sane. Nature 571, 133 (2019). https://doi.org/10.1038/d41586-019-02046-0
29. Parker, H.: Opinionated analysis development. preprint (2017). https://doi.org/10.7287/peerj.preprints.3210v1
30. Peng, R.D.: Reproducible research in computational science. Science 334(6060), 1226–1227 (2011). https://doi.org/10.1126/science.1213847
31. Pickering, A.: The Mangle of Practice: Time, Agency, and Science. University of Chicago Press, Chicago (2010)
32. Robinson, D., Hayes, A.: broom: Convert Statistical Analysis Objects into Tidy Tibbles (2019). https://CRAN.R-project.org/package=broom
33. Rodriguez-Sanchez, F., Pérez-Luque, A.J., Bartomeus, I., Varela, S.: Ciencia reproducible: qué, por qué, cómo. Revista Ecosistemas 25(2), 83–92 (2016). https://doi.org/10.7818/re.2014.25-2.00, https://www.revistaecosistemas.net/index.php/ecosistemas/article/view/1178
34. Shapin, S., Schaffer, S.: Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life (New in paper), vol. 32. Princeton University Press, Princeton (2011)
35. Stodden, V.: What scientific idea is ready for retirement? (2014). https://www.edge.org/response-detail/25340
36. Stodden, V., Borwein, J., Bailey, D.H.: “Setting the default to reproducible” in computational science research. SIAM News 46(5), 4–6 (2013)
37. Sørensen, M.H., Urzyczyn, P.: Lectures on the Curry-Howard Isomorphism, vol. 149. Elsevier, Amsterdam (2006)
38. Wallach, J.D., Boyack, K.W., Ioannidis, J.P.A.: Reproducible research practices, transparency, and open access data in the biomedical literature, 2015–2017. PLOS Biol. 16(11), e2006930 (2018). https://doi.org/10.1371/journal.pbio.2006930
39. Westgate, M., et al.: metaverse: Workflows for evidence synthesis projects (2019). https://github.com/rmetaverse/metaverse, R package version 0.0.1
40. Wickham, H.: R Packages: Organize, Test, Document, and Share Your Code. O’Reilly Media, Sebastopol (2015). https://books.google.com.au/books?id=DqSxBwAAQBAJ
41. Wickham, H.: testthat: Get Started with Testing (2011)
42. Wickham, H.: tidyverse: Easily Install and Load the ‘Tidyverse’ (2017). https://CRAN.R-project.org/package=tidyverse
43. Wilson, G., et al.: Best practices for scientific computing. PLOS Biol. 12(1), e1001745 (2014). https://doi.org/10.1371/journal.pbio.1001745
44. Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., Teal, T.K.: Good enough practices in scientific computing. PLOS Comput. Biol. 13(6), e1005510 (2017). https://doi.org/10.1371/journal.pcbi.1005510
45. Wyatt, C.: Research Software Engineers Association (2019). https://rse.ac.uk/
46. Zeileis, A.: CRAN task views. R News 5(1), 39–40 (2005). https://CRAN.R-project.org/doc/Rnews/

Author information

Authors and Affiliations

La Trobe University, Melbourne, Australia
Charles T. Gray
University of Washington, Seattle, USA
Ben Marwick

Corresponding author

Correspondence to Charles T. Gray.

Editor information

Editors and Affiliations

La Trobe University, Bundoora, VIC, Australia
Hien Nguyen

About this paper

Cite this paper

Gray, C.T., Marwick, B. (2019). Truth, Proof, and Reproducibility: There’s No Counter-Attack for the Codeless. In: Nguyen, H. (eds) Statistics and Data Science. RSSDS 2019. Communications in Computer and Information Science, vol 1150. Springer, Singapore. https://doi.org/10.1007/978-981-15-1960-4_8

DOI: https://doi.org/10.1007/978-981-15-1960-4_8
Published: 03 January 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1959-8
Online ISBN: 978-981-15-1960-4
eBook Packages: Computer Science, Computer Science (R0)
