
Computational Statistics, Volume 26, Issue 2, pp 219–239

Hands-on tutorial for parallel computing with R

  • Manuel J. A. Eugster
  • Jochen Knaus
  • Christine Porzelius
  • Markus Schmidberger
  • Esmeralda Vicedo
Original Paper

Abstract

Due to the increasing availability of powerful hardware resources, parallel computing is becoming increasingly important, since it can yield a noticeable speedup. The statistical programming language R supports parallel computing on computer clusters as well as on multicore systems through several packages. This tutorial gives a short, practical overview of four packages for parallel computing in R that the authors consider important: multicore, snow, snowfall and nws. First, the general principle of parallelizing simple tasks is briefly illustrated with a statistical cross-validation example. Afterwards, the usage of each of the introduced packages is demonstrated on this example. Furthermore, we address some specific features of the packages and provide guidance for selecting an adequate package for the computing environment at hand.
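The cross-validation principle described above can be sketched briefly: because the K folds are fitted independently of each other, the per-fold loop maps directly onto a parallel version of `lapply`. This is a minimal sketch, not the authors' example. It assumes the base `parallel` package (R >= 2.14.0, released after this paper), which bundles functionality derived from multicore and snow; the `mtcars` data, the linear model and the helper `cv_fold()` are hypothetical choices made only for illustration.

```r
# Minimal sketch: parallelizing K-fold cross-validation in R.
# Assumption: uses the base 'parallel' package instead of the packages
# discussed in the paper; data set, model and cv_fold() are illustrative.
library(parallel)

set.seed(1)
K <- 5
# Randomly assign each of the 32 observations to one of K folds
folds <- sample(rep(seq_len(K), length.out = nrow(mtcars)))

# Hypothetical helper: fit on all folds except k, return the test MSE on fold k
cv_fold <- function(k, data, folds) {
  train <- data[folds != k, ]
  test  <- data[folds == k, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  mean((test$mpg - predict(fit, newdata = test))^2)
}

# Sequential version: folds are independent ("embarrassingly parallel")
mse_seq <- sapply(seq_len(K), cv_fold, data = mtcars, folds = folds)

# Parallel version on a snow-style socket cluster (works on all platforms);
# data and folds are shipped to the workers as extra arguments
cl <- makeCluster(2)
mse_par <- unlist(parLapply(cl, seq_len(K), cv_fold, data = mtcars, folds = folds))
stopCluster(cl)

# On Unix-alikes, a fork-based (multicore-style) alternative would be:
# mse_fork <- unlist(mclapply(seq_len(K), cv_fold, data = mtcars,
#                             folds = folds, mc.cores = 2))

cat("CV estimate of MSE:", mean(mse_par), "\n")
```

The socket cluster is used here because forking is unavailable on Windows; on a single Unix machine the commented `mclapply()` call avoids starting separate R processes and copying the data to them.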

Keywords

Parallel computing · Multicore · Snow · Snowfall · nws



Copyright information

© Springer-Verlag 2010

Authors and Affiliations

  • Manuel J. A. Eugster (1)
  • Jochen Knaus (2)
  • Christine Porzelius (2)
  • Markus Schmidberger (3)
  • Esmeralda Vicedo (3)
  1. Department of Statistics, Ludwig-Maximilians-Universität München, Munich, Germany
  2. Institute of Medical Biometry and Medical Informatics, University Medical Center Freiburg, Freiburg, Germany
  3. Division of Biometrics and Bioinformatics, Ludwig-Maximilians-Universität München, Munich, Germany
