An Evaluation of the MiDCoP Method for Imputing Allele Frequency in Genome Wide Association Studies

  • Yadu Gautam
  • Carl Lee
  • Chin-I Cheng
  • Carl Langefeld
Part of the Studies in Computational Intelligence book series (SCI, volume 569)


Genome-wide association studies (GWAS) require genotyping the DNA sequences of a large sample of individuals with and without the disease of interest. Current genotyping technologies cover only a limited portion of each individual's DNA sequence, so a large fraction of single nucleotide polymorphisms (SNPs) are not genotyped. Existing imputation methods are based on individual-level data and are often time-consuming and costly. A recently developed method, the Minimum Deviation of Conditional Probability (MiDCoP), aims to impute the allele frequencies of missing SNPs from the allele frequencies of neighboring SNPs, without using individual-level SNP information. This article evaluates the performance of the MiDCoP approach by applying association analysis based on the imputed allele frequencies to the GAIN Schizophrenia data. The results indicate that the choice of reference set has a strong impact on performance. Imputation accuracy improves when the case and control data sets are each imputed using a separate, better-matched reference set.
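To make "association analysis based on allele frequency" concrete, the following is a minimal sketch of a standard allelic chi-square test computed from summary data only. It is not the MiDCoP imputation itself, and the abstract does not confirm that this exact test is the one used in the chapter. The function name `allelic_chi_square` and its parameters are hypothetical, and the sketch assumes each individual contributes two independent alleles.

```python
import math


def allelic_chi_square(freq_case, n_case, freq_control, n_control):
    """Pearson chi-square test (1 df) on a 2x2 allele-count table.

    Assumption (not from the source): each individual contributes two
    alleles, so counts are reconstructed as 2 * n * frequency.
    Returns (chi-square statistic, p-value).
    """
    # Expected allele counts reconstructed from frequencies and sample sizes.
    a = 2 * n_case * freq_case              # reference allele, cases
    b = 2 * n_case * (1 - freq_case)        # other allele, cases
    c = 2 * n_control * freq_control        # reference allele, controls
    d = 2 * n_control * (1 - freq_control)  # other allele, controls

    total = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Degenerate table (e.g. a monomorphic SNP): no evidence of association.
        return 0.0, 1.0

    chi2 = total * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the GAIN data.
    stat, p = allelic_chi_square(0.6, 100, 0.4, 100)
    print(f"chi2 = {stat:.2f}, p = {p:.2e}")
```

Because the test needs only a frequency and a sample size per group, it can be applied to imputed allele frequencies in the same way as to observed ones, which is why frequency-level imputation is sufficient for this kind of analysis.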


Keywords: Association Tests; Conditional Probability; Imputation; Minimum Deviation; Multilocus Information Measure; Single Nucleotide Polymorphisms





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Yadu Gautam (1)
  • Carl Lee (1)
  • Chin-I Cheng (1)
  • Carl Langefeld (2)

  1. Department of Mathematics, Central Michigan University, Mt. Pleasant, USA
  2. Department of Biostatistical Sciences, Division of Public Health Sciences, Wake Forest University, Winston-Salem, USA
