Improving Imputation Accuracy by Inferring Causal Variants in Genetic Studies

  • Yue Wu
  • Farhad Hormozdiari
  • Jong Wha J. Joo
  • Eleazar Eskin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10229)


Genotype imputation is widely used in the analysis of genome-wide association studies (GWAS) for two reasons. The first is to increase the power of association studies when causal SNPs are not collected in the GWAS. The second is to aid the interpretation of a GWAS result by predicting the association statistics at untyped variants. In this paper, we show that predicting association statistics at untyped variants that influence the trait produces overly conservative results. Current imputation methods assume that none of the variants in a region (a locus consisting of multiple variants) affects the trait, an assumption that is often inconsistent with the observed data. We propose a new method, CAUSAL-Imp, which imputes the association statistics at untyped variants while taking into account variants in the region that may affect the trait. Our method builds on recent methods that impute the marginal statistics for GWAS by using the fact that marginal statistics follow a multivariate normal distribution. We use both simulated and real data sets to assess the performance of our method. We show that traditional imputation approaches underestimate the association statistics for variants that affect the trait, and that our approach provides less biased estimates of these statistics.
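
The traditional approach that the abstract contrasts with can be illustrated with a minimal sketch. It assumes every variant in the region is null, so the joint vector of z-scores is multivariate normal with mean zero and covariance equal to the LD (correlation) matrix, and the untyped statistics are predicted by the conditional mean given the typed ones. This is not the CAUSAL-Imp method itself; the function name `impute_z_null`, its parameters, and the optional `ridge` regulariser are hypothetical choices made for this illustration.

```python
import numpy as np

def impute_z_null(z_typed, ld_tt, ld_ut, ridge=0.0):
    """Conditional-mean imputation of z-scores at untyped variants.

    Assumption (for illustration only): no variant affects the trait,
    so the stacked z-scores follow N(0, LD).

    z_typed : (t,)   observed z-scores at typed variants
    ld_tt   : (t, t) LD (correlation) among typed variants
    ld_ut   : (u, t) LD between untyped and typed variants
    ridge   : optional diagonal regulariser (hypothetical convenience)
    """
    z_typed = np.asarray(z_typed, dtype=float)
    ld_tt = np.asarray(ld_tt, dtype=float)
    ld_ut = np.asarray(ld_ut, dtype=float)

    # weights = ld_ut @ inv(ld_tt + ridge * I), computed via a linear solve
    reg = ld_tt + ridge * np.eye(ld_tt.shape[0])
    weights = np.linalg.solve(reg, ld_ut.T).T

    # Conditional mean of the untyped z-scores given the typed ones
    z_hat = weights @ z_typed
    # Conditional variance of each untyped z-score (diagonal only)
    var = 1.0 - np.einsum("ij,ij->i", weights, ld_ut)
    return z_hat, var

# One typed SNP in LD r = 0.8 with one untyped SNP.
z_hat, var = impute_z_null([4.0], [[1.0]], [[0.8]])
```

In this toy case, suppose the untyped SNP is causal with an expected z-score of 5. The typed SNP then has an expected z-score of 0.8 × 5 = 4, and the null-based imputation returns 0.8 × 4 = 3.2, which falls short of 5. This is one way to see why statistics at variants that affect the trait are underestimated when all variants are assumed null.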


Linkage Disequilibrium, Summary Statistic, Association Statistic, Marginal Statistic, Causal Variant
These keywords were added by machine and not by the authors.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Yue Wu (1)
  • Farhad Hormozdiari (1, 2)
  • Jong Wha J. Joo (1, 3)
  • Eleazar Eskin (1, 4)
  1. Department of Computer Science, UCLA, Los Angeles, USA
  2. Program in Genetic Epidemiology and Statistical Genetics, Harvard University, Cambridge, USA
  3. Department of Molecular and Medical Pharmacology, UCLA, Los Angeles, USA
  4. Department of Human Genetics, UCLA, Los Angeles, USA
