Cloud Computing for Genome-Wide Association Analysis

  • James W. Baurley
  • Christopher K. Edlund
  • Bens Pardamean
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 144)


With the increasing availability and affordability of genome-wide genotyping and sequencing technologies, biomedical researchers face growing computational challenges in managing and analyzing large quantities of genetic data. Previously, this data-intensive research required computing and personnel resources accessible only to large institutions. Cloud computing allows researchers to analyze their data without a local computing infrastructure. We evaluated the feasibility of cloud computing for association analysis of genome-wide data. Our approach used the MapReduce model, which divides the analysis into independent units and distributes the work to a computing cloud. We evaluated our approach by modeling the relationships between genetic variants and disease in a simulated genome-wide association study. We generated several data sets, each with 100,000 subjects and a varying number of genetic variants, and demonstrated that our analysis approach is scalable and provides an attractive alternative to establishing and maintaining a local computing cluster.
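
The idea of dividing an association analysis into independent units can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the test statistic or the data layout, so the allelic chi-square test, the simulator, and the names `simulate`, `map_variant`, and `reduce_top` are all assumptions made for illustration. A plain local loop stands in for the distribution of work to a computing cloud.

```python
import random
from functools import reduce


def simulate(n_subjects, n_variants, causal=0, seed=1):
    """Hypothetical simulator: returns per-variant genotypes and case status."""
    rng = random.Random(seed)
    status = [rng.random() < 0.5 for _ in range(n_subjects)]
    variants = []
    for v in range(n_variants):
        geno = []
        for is_case in status:
            # Assumption: the causal variant is more frequent among cases.
            freq = 0.5 if (v == causal and is_case) else 0.3
            # Genotype = number of minor alleles (0, 1, or 2).
            geno.append((rng.random() < freq) + (rng.random() < freq))
        variants.append((v, geno))
    return variants, status


def map_variant(record, status):
    """Map step: test one variant without reference to any other variant."""
    variant_id, geno = record
    n_cases = sum(status)
    a = sum(g for g, s in zip(geno, status) if s)      # minor alleles, cases
    c = sum(g for g, s in zip(geno, status) if not s)  # minor alleles, controls
    b = 2 * n_cases - a                                # major alleles, cases
    d = 2 * (len(status) - n_cases) - c                # major alleles, controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # Allelic chi-square statistic on the 2x2 table (assumed test, 1 df).
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return variant_id, chi2


def reduce_top(best, result):
    """Reduce step: keep the variant with the largest statistic."""
    return result if result[1] > best[1] else best


variants, status = simulate(n_subjects=2000, n_variants=20, causal=3)
# Each call is independent, so the calls could run on separate workers.
results = [map_variant(record, status) for record in variants]
top = reduce(reduce_top, results)
print("strongest association: variant", top[0])
```

Because each map call reads only one variant's genotypes plus the shared case status, the per-variant work can be partitioned across machines in any order, which is the property the MapReduce model relies on.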


Cloud Computing; Cloud Resource; MapReduce Model; MapReduce Programming Model; Virtual Core
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2012

Authors and Affiliations

  • James W. Baurley (1)
  • Christopher K. Edlund (2)
  • Bens Pardamean (3)
  1. Biorealm, Los Angeles, USA
  2. University of Southern California, Los Angeles, USA
  3. Bina Nusantara University, Jakarta, Indonesia
