
Hands-on tutorial for parallel computing with R

  • Original Paper
  • Computational Statistics

Abstract

Due to the increasing availability of powerful hardware resources, parallel computing is becoming increasingly important, since it can yield noticeable speedups. The statistical programming language R supports parallel computing on computer clusters as well as on multicore systems through several packages. This tutorial gives a short, practical overview of four packages for parallel computing in R that the authors consider important: multicore, snow, snowfall and nws. First, the general principle of parallelizing simple tasks is briefly illustrated using a statistical cross-validation example. Afterwards, the usage of each of the introduced packages is demonstrated on this example. Furthermore, we address some specific features of the packages and provide guidance for selecting an adequate package for the computing environment at hand.
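The principle of parallelizing cross-validation can be sketched in R as follows. This is a minimal illustration under stated assumptions, not the paper's own code: the toy data, the hypothetical helper `cv_fold()` and the choice of two cores are made up for the example. It uses `mclapply()` from the base `parallel` package, which since R 2.14.0 incorporates functionality from multicore and snow. Because the k folds are independent of each other, each one can be evaluated by a separate worker, and the parallel result should match the sequential one.

```r
library(parallel)

# Hypothetical toy data: y depends linearly on x plus noise
set.seed(1)
n <- 100
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n)

# Randomly assign each observation to one of k folds
k <- 10
folds <- sample(rep(1:k, length.out = n))

# Hypothetical helper: fit on all folds except i, return the test MSE on fold i
cv_fold <- function(i) {
  train <- dat[folds != i, ]
  test  <- dat[folds == i, ]
  fit <- lm(y ~ x, data = train)
  mean((test$y - predict(fit, newdata = test))^2)
}

# Sequential evaluation of the k independent folds
seq_err <- sapply(1:k, cv_fold)

# Parallel evaluation: each fold is handled by a forked worker process
# (fork-based, so mc.cores > 1 is not supported on Windows)
par_err <- unlist(mclapply(1:k, cv_fold, mc.cores = 2))

# Both should agree, since cv_fold itself draws no random numbers
all.equal(seq_err, par_err)
mean(par_err)  # cross-validated estimate of the prediction error
```

On Windows or across several machines, the snow-style equivalent from the same package is to create a cluster with `cl <- makeCluster(2)`, ship the data with `clusterExport(cl, c("dat", "folds"))`, run `parLapply(cl, 1:k, cv_fold)` and finally call `stopCluster(cl)`.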



Author information

Corresponding author: Manuel J. A. Eugster.

Additional information

Manuel J. A. Eugster, Jochen Knaus, Christine Porzelius, Markus Schmidberger, and Esmeralda Vicedo contributed equally to this work.


About this article

Cite this article

Eugster, M.J.A., Knaus, J., Porzelius, C. et al. Hands-on tutorial for parallel computing with R. Comput Stat 26, 219–239 (2011). https://doi.org/10.1007/s00180-010-0206-4


  • DOI: https://doi.org/10.1007/s00180-010-0206-4
