Abstract
Computational tools for data analysis are released daily on repositories such as the Comprehensive R Archive Network. Integrating these tools to solve a research problem is increasingly complex and requires frequent updates. To mitigate these Kafkaesque computational challenges in research, this manuscript proposes the toolchain walkthrough, an opinionated documentation of a scientific workflow. As a practical complement to our proof-based argument (Gray and Marwick, arXiv, 2019) for reproducible data analysis, here we focus on the practicality of setting up reproducible research compendia, with unit tests, as a measure of code::proof, confidence in computational algorithms.
Thank you to Ben Marwick, Hien Nguyen, Emily Kothe, James Goldie, Mathew Ling, J.D. Long, Kate Smith-Miles, Greg Wilson, Kerrie Mengersen, Jacinta Holloway, Alex Hayes, Noam Ross, Rowland Mosbergen, Luke Prendergast, Dale Maschette, Elio Campitelli, Thomas Lumley, and Daniel S. Katz for advising on particular aspects of this manuscript.
Notes
- 1.
Stack Overflow (https://stackoverflow.com/) is a forum for asking tightly scoped programming questions.
- 2.
Visit the discussion on metaresearch and RSEs on the research compendium associated with this manuscript as an example of why this paper, and its companion [8], have so many acknowledgements. Canonical literature is not yet established in the field of RSE, and thus we consult leaders of RSE projects, such as Alex Hayes, maintainer of the broom:: package [23]. Maintaining this package has propelled Hayes rapidly to the level of expert, by virtue of its pioneering collaborative structure, where hundreds of statistical modellers contribute integrated code.
- 3.
As in the companion manuscript [8], we focus on R packages, but the reader is invited to consider these as examples rather than definitive guidance. The same arguments hold for other languages, such as Python, and associated tools.
- 4.
As opposed to ofttimes unattainable or impractical best practices [33] in scientific computing.
References
Aust, F.: Citr: ‘RStudio’ add-in to insert markdown citations (2018). https://github.com/crsh/citr. R package version 0.3.0
Belmonte, A.: The tangled web of self-tying knots. Proc. Natl. Acad. Sci. 104(44), 17243–17244 (2007). https://doi.org/10.1073/pnas.0708150104. https://www.pnas.org/content/104/44/17243
Bland, M.: Estimating mean and standard deviation from the sample size, three quartiles, minimum, and maximum. Int. J. Stat. Med. Res. 4(1), 57–64 (2014). http://lifescienceglobal.com/pms/index.php/ijsmr/article/view/2688
Bryan, J.: Excuse me, do you have a moment to talk about version control? PeerJ PrePrints 5, e3159 (2017)
Consalvo, M.: Zelda 64 and video game fans: a walkthrough of games, intertextuality, and narrative. Televis. New Media 4(3), 321–334 (2003). https://doi.org/10.1177/1527476403253993
Ford, D., Smith, J., Guo, P.J., Parnin, C.: Paradise unplugged: identifying barriers for female participation on stack overflow. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016, Seattle, WA, USA, pp. 846–857. ACM, New York (2016). https://doi.org/10.1145/2950290.2950331
Fraser, H., Parker, T., Nakagawa, S., Barnett, A., Fidler, F.: Questionable research practices in ecology and evolution. PLOS ONE 13(7), e0200303 (2018). https://doi.org/10.1371/journal.pone.0200303. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0200303
Gray, C.T., Marwick, B.: Truth, proof, and reproducibility: there’s no counter-attack for the codeless. arXiv:1907.05947 [math], July 2019
Grolemund, G., Wickham, H.: R for data science (2017). https://r4ds.had.co.nz/
Hester, J.: covr: Test Coverage for Packages (2018). https://CRAN.R-project.org/package=covr
Hopkin, M.: Palaeontology journal will ‘fuel black market’. Nature 445, 234–235 (2007). https://doi.org/10.1038/445234b. https://www.nature.com/articles/445234b
Hozo, S.P., Djulbegovic, B., Hozo, I.: Estimating the mean and variance from the median, range, and the size of a sample. BMC Med. Res. Methodol. 5(1), 13 (2005). https://doi.org/10.1186/1471-2288-5-13
Hüttermann, M.: DevOps for Developers. Apress, New York (2012). google-Books-ID: JfUAkB8AA7EC
Kafka, F.: The Trial, April 2005. http://www.gutenberg.org/ebooks/7849
Katz, D.S., McHenry, K.: Super RSEs: combining research and service in three dimensions of Research Software Engineering, July 2019. https://danielskatzblog.wordpress.com/2019/07/12/
Klees, G., Ruef, A., Cooper, B., Wei, S., Hicks, M.: Evaluating fuzz testing. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, Canada, pp. 2123–2138. ACM, New York (2018). https://doi.org/10.1145/3243734.3243804. http://doi.acm.org/10.1145/3243734.3243804
Long, J.J., Teetor, P.: R Cookbook, 2nd edn. (2019). https://rc2e.com/
Marwick, B.: rrtools: creates a reproducible research compendium (2018). https://github.com/benmarwick/rrtools
McBain, M., Carroll, J.: Datapasta: R tools for data copy-pasta (2018). R package version 3.0.0. https://CRAN.R-project.org/package=datapasta
Navarro, D.: Learning statistics with R: A tutorial for psychology students and other beginners. (Version 0.6.1) (2019). https://learningstatisticswithr.com/book/
Parker, H.: Opinionated analysis development. preprint (2017). https://doi.org/10.7287/peerj.preprints.3210v1
Ragkhitwetsagul, C., Krinke, J., Paixao, M., Bianco, G., Oliveto, R.: Toxic code snippets on stack overflow. IEEE Trans. Softw. Eng. 1 (2019). https://doi.org/10.1109/TSE.2019.2900307
Robinson, D., Hayes, A.: broom: convert statistical analysis objects into tidy tibbles (2019). https://CRAN.R-project.org/package=broom
Serfling, R.J.: Approximation Theorems of Mathematical Statistics. Wiley, Hoboken (2009)
skyisup: Deadly Boss Mods Addon Guide (2019). https://www.wowhead.com/deadly-boss-mods-addon-guide
UHS: Universal Hint System: Not your ordinary walkthrough. Just the hints you need (2019). http://www.uhs-hints.com/
Viechtbauer, W.: Conducting meta-analyses in R with the metafor package. J. Stat. Softw. 36(3), 1–48 (2010). http://www.jstatsoft.org/v36/i03/
Wan, X., Wang, W., Liu, J., Tong, T.: Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Med. Res. Methodol. 14(1), 135 (2014). https://doi.org/10.1186/1471-2288-14-135
Wickham, H.: R Packages: Organize, Test, Document, and Share Your Code. O’Reilly Media, Newton (2015). https://books.google.com.au/books?id=DqSxBwAAQBAJ
Wickham, H.: Advanced R. Routledge, Boca Raton (2014)
Wickham, H., Bryan, J.: usethis: automate package and project setup (2019). https://CRAN.R-project.org/package=usethis
Wickham, H., Danenberg, P., Eugster, M.: Roxygen2: in-line documentation for R (2019). R package version 6.1.1.9000. https://github.com/klutometis/roxygen
Wilson, G., et al.: Best Practices for scientific computing. PLoS Biol. 12(1), e1001745 (2014). https://doi.org/10.1371/journal.pbio.1001745
Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., Teal, T.K.: Good enough practices in scientific computing. PLOS Comput. Biol. 13(6), e1005510 (2017). https://doi.org/10.1371/journal.pcbi.1005510
Wyatt, C.: Research Software Engineers Association (2019). https://rse.ac.uk/
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Gray, C.T. (2019). code::proof: Prepare for Most Weather Conditions. In: Nguyen, H. (eds) Statistics and Data Science. RSSDS 2019. Communications in Computer and Information Science, vol 1150. Springer, Singapore. https://doi.org/10.1007/978-981-15-1960-4_2
DOI: https://doi.org/10.1007/978-981-15-1960-4_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1959-8
Online ISBN: 978-981-15-1960-4
eBook Packages: Computer Science, Computer Science (R0)