Abstract
Genome-wide association studies (GWAS) have been highly successful in identifying common genetic determinants of disease. One challenge posed by GWAS is the analysis of the large amount of data they generate. This review aims to provide non-geneticists with an overview of the steps involved in analyzing GWAS data, with an emphasis on the popular bioinformatics tools available. GWAS data generation, analysis, and interpretation are covered.
Cite this article
Pare, G. Genome-Wide Association Studies—Data Generation, Storage, Interpretation, and Bioinformatics. J. of Cardiovasc. Trans. Res. 3, 183–188 (2010). https://doi.org/10.1007/s12265-010-9181-y
DOI: https://doi.org/10.1007/s12265-010-9181-y