Genotype Error Detection Using Hidden Markov Models of Haplotype Diversity

  • Justin Kennedy
  • Ion Măndoiu
  • Bogdan Paşaniuc
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4645)


Abstract

The presence of genotyping errors can invalidate statistical tests for linkage and disease association, particularly for methods based on haplotype analysis. Becker et al. have recently proposed a simple likelihood ratio approach for detecting errors in trio genotype data. Under this approach, a SNP genotype is flagged as a potential error if the likelihood of the original trio genotype data increases by a multiplicative factor exceeding a user-selected threshold when the SNP genotype under test is deleted. In this paper we give improved error detection methods that combine the likelihood ratio test approach with likelihood functions that can be computed efficiently from a Hidden Markov Model of haplotype diversity in the population under study. Experimental results on both simulated and real datasets show that the proposed methods achieve significantly higher detection accuracy than previous methods, with highly scalable running time.
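The likelihood ratio test described above can be illustrated with a minimal sketch. Its assumptions differ from the paper's setting. It models a single unrelated individual rather than a trio, and it treats that individual's two haplotypes as independent paths through a toy HMM with K hypothetical "founder" states per SNP. The names `init`, `trans`, `emit`, `deletion_likelihoods` and `flag_errors` are illustrative only. The recursions are unscaled, so the sketch is suitable only for short SNP windows.

In this sketch, deleting a genotype means marginalizing it out, which sets its emission probability to 1 in every state. A single forward-backward pass then yields the deletion likelihood for every SNP at once, instead of one full recomputation per SNP.

```python
import numpy as np


def emission_matrix(p1, g):
    """Pair-state emission P(g | a, b) at one SNP (toy model, an assumption).

    p1[k]: hypothetical probability that founder state k carries allele 1.
    g: genotype as a count of allele 1 (0, 1 or 2), or None if missing/deleted.
    """
    p1 = np.asarray(p1, dtype=float)
    if g is None:  # deleted genotype: marginalize it out, emission = 1
        return np.ones((p1.size, p1.size))
    p0 = 1.0 - p1
    if g == 0:
        return np.outer(p0, p0)
    if g == 2:
        return np.outer(p1, p1)
    if g == 1:  # heterozygous: either haplotype may carry allele 1
        return np.outer(p1, p0) + np.outer(p0, p1)
    raise ValueError(f"invalid genotype {g!r}")


def likelihood(init, trans, emit, genos):
    """Forward algorithm over pair states (a, b), one state per haplotype."""
    alpha = np.outer(init, init) * emission_matrix(emit[0], genos[0])
    for j in range(1, len(genos)):
        T = trans[j - 1]  # T[a, a'] = P(state a' at SNP j | state a at SNP j-1)
        alpha = (T.T @ alpha @ T) * emission_matrix(emit[j], genos[j])
    return float(alpha.sum())


def deletion_likelihoods(init, trans, emit, genos):
    """Full likelihood plus the likelihood with each genotype deleted in turn."""
    n = len(genos)
    E = [emission_matrix(emit[j], genos[j]) for j in range(n)]
    # f[j]: forward probabilities at SNP j *before* SNP j's emission is applied
    f = [np.outer(init, init)]
    for j in range(1, n):
        T = trans[j - 1]
        f.append(T.T @ (f[j - 1] * E[j - 1]) @ T)
    # b[j]: probability of genotypes j+1..n-1 given the pair state at SNP j
    b = [None] * n
    b[n - 1] = np.ones_like(f[0])
    for j in range(n - 2, -1, -1):
        T = trans[j]
        b[j] = T @ (E[j + 1] * b[j + 1]) @ T.T
    full = float((f[0] * E[0] * b[0]).sum())
    # Dropping SNP j's emission gives the likelihood with genotype j deleted
    deleted = np.array([(f[j] * b[j]).sum() for j in range(n)])
    return full, deleted


def flag_errors(init, trans, emit, genos, threshold=100.0):
    """Flag SNP j if L(genotype j deleted) / L(original) exceeds threshold."""
    full, deleted = deletion_likelihoods(init, trans, emit, genos)
    ratios = deleted / full
    return [j for j, r in enumerate(ratios) if r > threshold], ratios


if __name__ == "__main__":
    n, K = 5, 2
    init = np.full(K, 1.0 / K)
    trans = [np.array([[0.95, 0.05], [0.05, 0.95]])] * (n - 1)
    emit = [np.array([0.01, 0.99])] * n  # founder 0 ~ allele 0, founder 1 ~ allele 1
    genos = [0, 0, 2, 0, 0]  # isolated homozygous call at SNP 2
    flagged, ratios = flag_errors(init, trans, emit, genos)
    print(flagged)  # [2]
```

In this toy example, the isolated homozygous call at SNP 2 is the only genotype flagged. Explaining it under the model requires both haplotypes to switch founders and switch back. For each SNP, the genotype probabilities sum to one over {0, 1, 2}, so deleting a genotype can never decrease the likelihood and every ratio is at least 1. The threshold therefore trades detection power against false positives.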






References

  1. Pompanon, F., Bonin, A., Bellemain, E., Taberlet, P.: Genotyping errors: causes, consequences and solutions. Nat. Rev. Genet. 6, 847–859 (2005)
  2. Zaitlen, N., Kang, H., Feolo, M., Sherry, S.T., Halperin, E., Eskin, E.: Inference and analysis of haplotypes from combined genotyping studies deposited in dbSNP. Genome Res. 15, 1595–1600 (2005)
  3. Douglas, J., Skol, A., Boehnke, M.: Probability of detection of genotyping errors and mutations as inheritance inconsistencies in nuclear-family data. Am. J. Hum. Genet. 70, 487–495 (2002)
  4. Gordon, D., Heath, S., Ott, J.: True pedigree errors more frequent than apparent errors for single nucleotide polymorphisms. Hum. Hered. 49, 65–70 (1999)
  5. Ahn, K., Haynes, C., Kim, W., Fleur, R., Gordon, D., Finch, S.: The effects of SNP genotyping errors on the power of the Cochran-Armitage linear trend test for case/control association studies. Ann. Hum. Genet. 71, 249–261 (2007)
  6. Abecasis, G., Cherny, S., Cardon, L.: The impact of genotyping error on family-based analysis of quantitative traits. Eur. J. Hum. Genet. 9, 130–134 (2001)
  7. Cherny, S., Abecasis, G., Cookson, W., Sham, P., Cardon, L.: The effect of genotype and pedigree error on linkage analysis: Analysis of three asthma genome scans. Genet. Epidemiol. 21, S117–S122 (2001)
  8. Knapp, M., Becker, T.: Impact of genotyping errors on type I error rate of the haplotype-sharing transmission/disequilibrium test (HS-TDT). Am. J. Hum. Genet. 74, 589–591 (2004)
  9. Cheng, K.: Analysis of case-only studies accounting for genotyping error. Ann. Hum. Genet. 71, 238–248 (2007)
  10. Liu, W., Yang, T., Zhao, W., Chase, G.: Accounting for genotyping errors in tagging SNP selection. Am. J. Hum. Genet. 71(4), 467–479 (2007)
  11. Sobel, E., Papp, J., Lange, K.: Detection and integration of genotyping errors in statistical genetics. Am. J. Hum. Genet. 70, 496–508 (2002)
  12. Abecasis, G., Cherny, S., Cookson, W., Cardon, L.: Merlin-rapid analysis of dense genetic maps using sparse gene flow trees. Nat. Genet. 30, 97–101 (2002)
  13. Becker, T., Valentonyte, R., Croucher, P., Strauch, K., Schreiber, S., Hampe, J., Knapp, M.: Identification of probable genotyping errors by consideration of haplotypes. Eur. J. Hum. Genet. 14, 450–458 (2006)
  14. Kimmel, G., Shamir, R.: A block-free hidden Markov model for genotypes and its application to disease association. J. Comput. Biol. 12, 1243–1260 (2005)
  15. Rastas, P., Koivisto, M., Mannila, H., Ukkonen, E.: Phasing genotypes using a hidden Markov model. In: Bioinformatics Algorithms: Techniques and Applications, Wiley, Chichester, preliminary version in Proc. WABI 2005 (to appear)
  16. Scheet, P., Stephens, M.: A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am. J. Hum. Genet. (to appear)
  17. Schwartz, R.: Algorithms for association study design using a generalized model of haplotype conservation. In: Proc. CSB, pp. 90–97 (2004)
  18. Gusev, A., Paşaniuc, B., Măndoiu, I.: Highly scalable genotype phasing by entropy minimization. IEEE Transactions on Computational Biology and Bioinformatics (to appear)
  19. Becker, T., Knapp, M.: Maximum-likelihood estimation of haplotype frequencies in nuclear families. Genet. Epidemiol. 27, 21–32 (2004)
  20. Douglas, J., Boehnke, M., Lange, K.: A multipoint method for detecting genotyping errors and mutations in sibling-pair linkage data. Am. J. Hum. Genet. 66, 1287–1297 (2000)
  21. Mukhopadhyay, N., Buxbaum, S., Weeks, D.: Comparative study of multipoint methods for genotype error detection. Hum. Hered. 58, 175–189 (2004)

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Justin Kennedy, Ion Măndoiu, Bogdan Paşaniuc
    CSE Department, University of Connecticut, Storrs, CT 06269
