Caching and Visualizing Statistical Analyses

  • Roger D. Peng
  • Duncan Temple Lang
Chapter

Abstract

We present the cacher and CodeDepends packages for R, which provide tools for (1) caching and analyzing the code for statistical analyses and (2) distributing these analyses to others efficiently over the Web. The cacher package takes objects created by evaluating R expressions and stores them in key-value databases. These databases of cached objects can subsequently be assembled into “cache packages” for distribution over the Web. The cacher package also provides tools to help readers examine the data and code in a statistical analysis, reproduce, modify, or improve upon the results, and conduct alternate analyses of the data. The CodeDepends package provides complementary tools for analyzing and visualizing the code for a statistical analysis, and this functionality has been integrated into the cacher package. In this chapter, we describe the cacher and CodeDepends packages and provide examples of how they can be used for reproducible research.
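The caching idea in the abstract, storing the objects created by each evaluated expression in a key-value database, can be illustrated with a minimal sketch. This is not the cacher package's API and it is written in Python rather than R; the function name `run_cached`, the use of a plain dictionary as the key-value store, and the choice to key each entry by a SHA-1 hash of the code seen so far are all assumptions made for illustration.

```python
import hashlib
import pickle


def run_cached(expressions, env, store):
    """Evaluate each expression once; on later runs, reload its objects from `store`.

    Hypothetical helper for illustration only (not part of cacher or CodeDepends).
    `store` is any dict-like key-value database mapping str -> bytes.
    """
    evaluated = []  # expressions that actually had to be executed
    history = ""    # code seen so far, so each key depends on the preceding code
    for expr in expressions:
        history += expr + "\n"
        key = hashlib.sha1(history.encode("utf-8")).hexdigest()
        if key in store:
            # Cache hit: restore the objects this expression created.
            env.update(pickle.loads(store[key]))
        else:
            before = dict(env)
            exec(expr, {}, env)
            # Keep only names that are new or were rebound by this expression.
            created = {
                name: value
                for name, value in env.items()
                if name not in before or before[name] is not value
            }
            store[key] = pickle.dumps(created)
            evaluated.append(expr)
    return evaluated


# Usage: the second run finds every expression in the store and evaluates nothing.
store = {}
exprs = ["x = list(range(5))", "total = sum(x)"]

env1 = {}
first = run_cached(exprs, env1, store)

env2 = {}
second = run_cached(exprs, env2, store)
```

In this sketch a reader who receives only `store` and the expression list can recover `total` without rerunning the computation, which is the role the abstract assigns to a distributed cache package.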

Keywords

Source File Metadata File Reproducible Research Statistical Analysis Code Cache Directory 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, USA