Human Genetics, Volume 135, Issue 6, pp 625–634

Discovery of rare variants for complex phenotypes

  • Jack A. Kosmicki
  • Claire L. Churchhouse
  • Manuel A. Rivas
  • Benjamin M. Neale (corresponding author)
Part of the following topical collections:
  1. Exome Sequencing


With the rise of sequencing technologies, it is now feasible to assess the role rare variants play in the genetic contribution to complex trait variation. While some of the earlier targeted sequencing studies successfully identified rare variants of large effect, unbiased gene discovery using exome sequencing has experienced limited success for complex traits. Nevertheless, rare variant association studies have demonstrated that rare variants do contribute to phenotypic variability, but sample sizes will likely have to be even larger than those of common variant association studies to be powered for the detection of genes and loci. Large-scale sequencing efforts of tens of thousands of individuals, such as the UK10K Project and aggregation efforts such as the Exome Aggregation Consortium, have made great strides in advancing our knowledge of the landscape of rare variation, but there remain many considerations when studying rare variation in the context of complex traits. We discuss these considerations in this review, presenting a broad range of topics at a high level as an introduction to rare variant analysis in complex traits including the issues of power, study design, sample ascertainment, de novo variation, and statistical testing approaches. Ultimately, as sequencing costs continue to decline, larger sequencing studies will yield clearer insights into the biological consequence of rare mutations and may reveal which genes play a role in the etiology of complex traits.


Keywords (machine-generated, not author-supplied): Rare Variant, Complex Trait, Whole Genome Sequencing, Exome Sequencing, Transmission Disequilibrium Test



We thank all members of the ATGU and the Wall lab for their insightful discussions and assistance in writing this manuscript. We also acknowledge grant 1R01MH101244-02.



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, USA
  2. Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, USA
  3. Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, USA
  4. Bioinformatics and Integrative Genomics, Harvard University, Cambridge, USA
