Incorporating Prior Knowledge to Increase the Power of Genome-Wide Association Studies

  • Ashley Petersen
  • Justin Spratt
  • Nathan L. Tintle
Part of the Methods in Molecular Biology book series (MIMB, volume 1019)


Typical methods of analyzing genome-wide single nucleotide variant (SNV) data in cases and controls involve testing each variant’s genotypes separately for phenotype association, and then applying a substantial multiple-testing penalty to control the rate of false positives. This approach, however, can result in low power for modestly associated SNVs. Furthermore, simply looking at the most strongly associated SNVs may not directly yield biological insights about disease etiology. SNVset methods attempt to address both limitations of the traditional approach by testing biologically meaningful sets of SNVs (e.g., genes or pathways). The number of tests run in a SNVset analysis is typically much lower (hundreds or thousands instead of millions) than in a traditional analysis, so the multiple-testing penalty needed to control false positives is less severe. Additionally, because the SNVsets tested are biologically meaningful, finding a significant set may more quickly yield insights into disease etiology.
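The effect of the number of tests on the multiple-testing penalty can be illustrated with a small calculation. The sketch below assumes a Bonferroni correction, which the text above does not name, and uses illustrative test counts (one million single-variant tests versus one thousand SNVset tests) chosen only to match the orders of magnitude mentioned; the function and variable names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction.

    Dividing alpha by the number of tests keeps the family-wise
    error rate at or below alpha.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Illustrative values only; none of these come from the chapter.
ALPHA = 0.05               # assumed family-wise error rate
N_SNV_TESTS = 1_000_000    # assumed number of single-variant tests
N_SET_TESTS = 1_000        # assumed number of SNVset tests

snv_threshold = bonferroni_threshold(ALPHA, N_SNV_TESTS)
set_threshold = bonferroni_threshold(ALPHA, N_SET_TESTS)

# How much less stringent the per-test threshold becomes.
relaxation = set_threshold / snv_threshold

print(f"per-SNV threshold:    {snv_threshold:.1e}")
print(f"per-SNVset threshold: {set_threshold:.1e}")
print(f"threshold relaxed by a factor of about {relaxation:.0f}")
```

Under these assumptions the per-test threshold moves from 0.05/1,000,000 to 0.05/1,000, a thousandfold relaxation. Whether that translates into higher power in practice depends on how the set-level test statistic is constructed, which this calculation does not address.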

In this chapter we summarize the short history of SNVset testing and provide an overview of the many recently proposed methods. Furthermore, we provide detailed step-by-step instructions on how to perform a SNVset analysis, including a substantial number of practical tips and questions that researchers should consider before undertaking one. Lastly, we describe a companion R package (snvset) that implements recently proposed SNVset methods. While SNVset testing is a new approach, with many new methods still being developed and many open questions, its promise makes it worth serious consideration when choosing analytic methods for GWAS.

Key words

SNVset; SNP-set; Gene set; Pathway; GWAS



Copyright information

© Springer Science+Business Media, LLC 2013

Authors and Affiliations

  • Ashley Petersen (1)
  • Justin Spratt (2)
  • Nathan L. Tintle (2)
  1. Department of Biostatistics, University of Washington, Seattle, USA
  2. Department of Mathematics, Statistics, and Computer Science, Dordt College, Sioux Center, USA
