code::proof: Prepare for Most Weather Conditions

  • Conference paper
Statistics and Data Science (RSSDS 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1150)

Abstract

Computational tools for data analysis are released daily on repositories such as the Comprehensive R Archive Network. Integrating these tools to solve a research problem is increasingly complex and requires frequent updates. To mitigate these Kafkaesque computational challenges in research, this manuscript proposes the toolchain walkthrough, an opinionated documentation of a scientific workflow. As a practical complement to our proof-based argument (Gray and Marwick, arXiv, 2019) for reproducible data analysis, here we focus on the practicality of setting up reproducible research compendia, with unit tests, as a measure of code::proof, that is, confidence in computational algorithms.

Thank you to Ben Marwick, Hien Nguyen, Emily Kothe, James Goldie, Mathew Ling, J.D. Long, Kate Smith-Miles, Greg Wilson, Kerrie Mengersen, Jacinta Holloway, Alex Hayes, Noam Ross, Rowland Mosbergen, Luke Prendergast, Dale Maschette, Elio Campitelli, Thomas Lumley, and Daniel S. Katz for advising on particular aspects of this manuscript.

Notes

  1. Stack Overflow (https://stackoverflow.com/) is a forum for asking tightly scoped programming questions.

  2. Visit the discussion on metaresearch and RSEs on the research compendium associated with this manuscript as an example of why this paper, and its companion [8], have so many acknowledgements. Canonical literature is not yet established in the field of RSE, and thus we rely on leaders of RSE projects, such as Alex Hayes, who maintains the broom:: package [23]. This work has propelled Hayes rapidly to the level of expert, by virtue of the pioneering collaborative structure of the package, where hundreds of statistical modellers contribute integrated code.

  3. As in the companion manuscript [8], we focus on R packages, but the reader is invited to consider these as examples rather than definitive guidance. The same arguments hold for other languages, such as Python, and associated tools.

  4. As opposed to ofttimes unattainable or impractical best practices [33] in scientific computing.

References

  1. Aust, F.: Citr: ‘RStudio’ add-in to insert markdown citations (2018). https://github.com/crsh/citr. R package version 0.3.0

  2. Belmonte, A.: The tangled web of self-tying knots. Proc. Natl. Acad. Sci. 104(44), 17243–17244 (2007). https://doi.org/10.1073/pnas.0708150104. https://www.pnas.org/content/104/44/17243

  3. Bland, M.: Estimating mean and standard deviation from the sample size, three quartiles, minimum, and maximum. Int. J. Stat. Med. Res. 4(1), 57–64 (2014). http://lifescienceglobal.com/pms/index.php/ijsmr/article/view/2688

  4. Bryan, J.: Excuse me, do you have a moment to talk about version control? PeerJ PrePrints 5, e3159 (2017)

  5. Consalvo, M.: Zelda 64 and video game fans: a walkthrough of games, intertextuality, and narrative. Televis. New Media 4(3), 321–334 (2003). https://doi.org/10.1177/1527476403253993

  6. Ford, D., Smith, J., Guo, P.J., Parnin, C.: Paradise unplugged: identifying barriers for female participation on stack overflow. In: Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016, Seattle, WA, USA, pp. 846–857. ACM, New York (2016). https://doi.org/10.1145/2950290.2950331

  7. Fraser, H., Parker, T., Nakagawa, S., Barnett, A., Fidler, F.: Questionable research practices in ecology and evolution. PLOS ONE 13(7), e0200303 (2018). https://doi.org/10.1371/journal.pone.0200303. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0200303

  8. Gray, C.T., Marwick, B.: Truth, proof, and reproducibility: there’s no counter-attack for the codeless. arXiv:1907.05947 [math], July 2019

  9. Grolemund, G., Wickham, H.: R for data science (2017). https://r4ds.had.co.nz/

  10. Hester, J.: covr: Test Coverage for Packages (2018). https://CRAN.R-project.org/package=covr

  11. Hopkin, M.: Palaeontology journal will ‘fuel black market’. Nature 445, 234–235 (2007). https://doi.org/10.1038/445234b. https://www.nature.com/articles/445234b

  12. Hozo, S.P., Djulbegovic, B., Hozo, I.: Estimating the mean and variance from the median, range, and the size of a sample. BMC Med. Res. Methodol. 5(1), 13 (2005). https://doi.org/10.1186/1471-2288-5-13

  13. Hüttermann, M.: DevOps for Developers. Apress, New York (2012)

  14. Kafka, F.: The Trial, April 2005. http://www.gutenberg.org/ebooks/7849

  15. Katz, D.S., McHenry, K.: Super RSEs: combining research and service in three dimensions of Research Software Engineering, July 2019. https://danielskatzblog.wordpress.com/2019/07/12/

  16. Klees, G., Ruef, A., Cooper, B., Wei, S., Hicks, M.: Evaluating fuzz testing. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, Canada, pp. 2123–2138. ACM, New York (2018). https://doi.org/10.1145/3243734.3243804. http://doi.acm.org/10.1145/3243734.3243804

  17. Long, J.J., Teetor, P.: R Cookbook, 2nd edn. (2019). https://rc2e.com/

  18. Marwick, B.: rrtools: creates a reproducible research compendium (2018). https://github.com/benmarwick/rrtools

  19. McBain, M., Carroll, J.: Datapasta: R tools for data copy-pasta (2018). R package version 3.0.0. https://CRAN.R-project.org/package=datapasta

  20. Navarro, D.: Learning statistics with R: A tutorial for psychology students and other beginners. (Version 0.6.1) (2019). https://learningstatisticswithr.com/book/

  21. Parker, H.: Opinionated analysis development. preprint (2017). https://doi.org/10.7287/peerj.preprints.3210v1

  22. Ragkhitwetsagul, C., Krinke, J., Paixao, M., Bianco, G., Oliveto, R.: Toxic code snippets on stack overflow. IEEE Trans. Softw. Eng. 1 (2019). https://doi.org/10.1109/TSE.2019.2900307

  23. Robinson, D., Hayes, A.: broom: convert statistical analysis objects into tidy tibbles (2019). https://CRAN.R-project.org/package=broom

  24. Serfling, R.J.: Approximation Theorems of Mathematical Statistics. Wiley, Hoboken (2009)

  25. skyisup: Deadly Boss Mods Addon Guide (2019). https://www.wowhead.com/deadly-boss-mods-addon-guide

  26. UHS: Universal Hint System: Not your ordinary walkthrough. Just the hints you need (2019). http://www.uhs-hints.com/

  27. Viechtbauer, W.: Conducting meta-analyses in R with the metafor package. J. Stat. Softw. 36(3), 1–48 (2010). http://www.jstatsoft.org/v36/i03/

  28. Wan, X., Wang, W., Liu, J., Tong, T.: Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Med. Res. Methodol. 14(1), 135 (2014). https://doi.org/10.1186/1471-2288-14-135

  29. Wickham, H.: R Packages: Organize, Test, Document, and Share Your Code. O’Reilly Media, Newton (2015). https://books.google.com.au/books?id=DqSxBwAAQBAJ

  30. Wickham, H.: Advanced R. Routledge, Boca Raton (2014)

  31. Wickham, H., Bryan, J.: usethis: automate package and project setup (2019). https://CRAN.R-project.org/package=usethis

  32. Wickham, H., Danenberg, P., Eugster, M.: Roxygen2: in-line documentation for R (2019). R package version 6.1.1.9000. https://github.com/klutometis/roxygen

  33. Wilson, G., et al.: Best practices for scientific computing. PLoS Biol. 12(1), e1001745 (2014). https://doi.org/10.1371/journal.pbio.1001745

  34. Wilson, G., Bryan, J., Cranston, K., Kitzes, J., Nederbragt, L., Teal, T.K.: Good enough practices in scientific computing. PLOS Comput. Biol. 13(6), e1005510 (2017). https://doi.org/10.1371/journal.pcbi.1005510

  35. Wyatt, C.: Research Software Engineers Association (2019). https://rse.ac.uk/

Author information

Corresponding author

Correspondence to Charles T. Gray.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Gray, C.T. (2019). code::proof: Prepare for Most Weather Conditions. In: Nguyen, H. (eds) Statistics and Data Science. RSSDS 2019. Communications in Computer and Information Science, vol 1150. Springer, Singapore. https://doi.org/10.1007/978-981-15-1960-4_2

  • DOI: https://doi.org/10.1007/978-981-15-1960-4_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-1959-8

  • Online ISBN: 978-981-15-1960-4

  • eBook Packages: Computer Science, Computer Science (R0)
