Identification of Disease-Related Genes Using a Genome-Wide Association Study Approach

  • Tobias Wohland
  • Dorit Schleinitz
Part of the Methods in Molecular Biology book series (MIMB, volume 1706)


Genome-wide association studies (GWAS) provide a hypothesis-free approach to discover genetic variants contributing to the risk of a certain disease or disease-related trait. Ongoing efforts to annotate the human genome have helped to localize disease-causing variants and point to mechanisms by which genetic variants might exert functional effects. By integrating bioinformatics approaches with in vivo and in vitro genomic strategies to predict and subsequently validate the functional roles of GWAS-identified variants, disease-related pathways can be characterized, providing new possibilities for therapeutic intervention. Here, we describe a basic workflow, from sample preparation to data analysis, for performing a GWAS to identify disease genes. We also discuss resources for the annotation and interpretation of GWAS results.

Key words

GWAS, Affymetrix, Illumina, GenABEL, SNP annotation



Acknowledgments

We would like to cordially thank Peter Kovacs, head of the research group Genetics of Obesity and Diabetes, and our colleagues for their everlasting scientific and personal support.

Funding: Tobias Wohland is funded by the IFB AdiposityDiseases (AD2-6E95). Dorit Schleinitz is funded by the Boehringer Ingelheim Foundation and by a Collaborative Research Center (C1, CRC1052).



Copyright information

© Springer Science+Business Media, LLC 2018

Authors and Affiliations

  1. IFB AdiposityDiseases, Leipzig University Medical Center, University of Leipzig - Medical Faculty, Leipzig, Germany
  2. Clinic and Policlinic for Endocrinology and Nephrology, Leipzig University Medical Center, Leipzig, Germany
