Abstract
Many traits, such as height, the response to a given drug, or the susceptibility to certain diseases, are presumably co-determined by genetics. Especially in medicine, it is of major interest to identify genetic aberrations that alter an individual’s risk of developing a certain phenotypic trait. Addressing this question requires comprehensive, high-quality genetic datasets. Technological advances and the decreasing cost of genotyping over the last decade have made such datasets increasingly available. In parallel with this technological progress, an analysis framework known as genome-wide association studies was developed to collect and analyze these data. Genome-wide association studies aim to find statistical dependencies, or associations, between a trait of interest and point mutations in the DNA. The statistical models used to detect such associations are diverse, ranging from frequentist to Bayesian approaches.
Since genetic datasets are inherently high-dimensional, the search for associations poses not only a statistical but also a computational challenge. As a result, a variety of toolboxes and software packages have been developed, each implementing different statistical methods and relying on various optimizations and mathematical techniques to speed up the computations.
This chapter discusses widely used methods and tools in genome-wide association studies. We present the different statistical models and the assumptions on which they are based, explain peculiarities of the data that have to be accounted for and, most importantly, introduce commonly used tools and software packages for the different tasks in a genome-wide association study, complemented with examples of their application.
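As a rough illustration of what an association between a trait and a single point mutation can look like in practice, the following minimal sketch runs a Pearson chi-square test on a 2x2 table of allele counts in cases and controls, and computes a Bonferroni-corrected significance threshold. This is only one of the simplest frequentist tests and is not necessarily the approach taken in the chapter; the function names and the toy counts are hypothetical and chosen purely for illustration.

```python
import math


def allelic_chi2(case_counts, control_counts):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    case_counts and control_counts are (minor_allele_count, major_allele_count).
    Assumes all row and column totals are non-zero.
    Returns (chi2_statistic, p_value).
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of allele and case status
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold when n_tests hypotheses are tested."""
    return alpha / n_tests


# Hypothetical toy data: allele counts for one SNP in cases and controls
chi2, p = allelic_chi2(case_counts=(30, 70), control_counts=(10, 90))
threshold = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)
print(f"chi2={chi2:.2f}, p={p:.2e}, significant={p < threshold}")
```

With on the order of a million markers tested, the per-test threshold becomes very small, which is one reason why multiple-testing correction and computational efficiency both matter in this setting.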
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Gumpinger, A.C., Roqueiro, D., Grimm, D.G., Borgwardt, K.M. (2018). Methods and Tools in Genome-wide Association Studies. In: von Stechow, L., Santos Delgado, A. (eds) Computational Cell Biology. Methods in Molecular Biology, vol 1819. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8618-7_5
DOI: https://doi.org/10.1007/978-1-4939-8618-7_5
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-8617-0
Online ISBN: 978-1-4939-8618-7
eBook Packages: Springer Protocols