Phasing of 2-SNP Genotypes Based on Non-random Mating Model

  • Dumitru Brinza
  • Alexander Zelikovsky
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3992)


Emerging microarray technologies allow genotyping of long genome sequences, resulting in huge amounts of data. A key challenge is to provide accurate phasing of very long single nucleotide polymorphism (SNP) sequences. In this paper we explore phasing of genotypes with 2 SNPs adjusted to the non-random mating model and then apply it to the haplotype inference of complete genotypes using maximum spanning trees. The runtime of the algorithm is O(nm(n+m)), where n and m are the number of genotypes and SNPs, respectively. The proposed phasing algorithm (2SNP) can be used for comparatively accurate phasing of a large number of very long genome sequences. On datasets across 79 regions from HapMap [7], 2SNP is several orders of magnitude faster than GERBIL and PHASE while matching them in quality, as measured by the number of correctly phased genotypes and by single-site and switching errors. For example, 2SNP requires 41 s on a 2 GHz Pentium 4 processor to phase 30 genotypes with 1381 SNPs (ENm010.7p15:2 data from HapMap), whereas GERBIL and PHASE require more than a week of runtime and make no fewer errors than 2SNP. 2SNP software is publicly available at
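The general idea of phasing SNP pairs first and then combining the pairwise decisions along a maximum spanning tree can be illustrated with a minimal sketch. This is not the authors' 2SNP implementation: the genotype encoding (0 and 1 for homozygous sites, 2 for heterozygous sites), the count-based confidence score, the pseudocount `eps`, and all function names are assumptions made for illustration. The non-random mating adjustment is not modeled, and the naive tree construction below does not attain the O(nm(n+m)) bound.

```python
import math
from itertools import combinations

HET = 2  # assumed encoding: 0/1 = homozygous alleles, 2 = heterozygous site


def pair_counts(genotypes, i, j):
    """Count 2-SNP haplotypes at sites (i, j) from unambiguous genotypes."""
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for g in genotypes:
        if g[i] == HET and g[j] == HET:
            continue  # double heterozygote: cis or trans is unknown, so skip it
        alleles_i = (0, 1) if g[i] == HET else (g[i], g[i])
        alleles_j = (0, 1) if g[j] == HET else (g[j], g[j])
        for a, b in zip(alleles_i, alleles_j):
            counts[(a, b)] += 1
    return counts


def pair_vote(genotypes, i, j, eps=0.5):
    """Return (is_cis, confidence) for a double heterozygote at sites (i, j)."""
    c = pair_counts(genotypes, i, j)
    cis = (c[(0, 0)] + eps) * (c[(1, 1)] + eps)    # haplotypes 00 and 11
    trans = (c[(0, 1)] + eps) * (c[(1, 0)] + eps)  # haplotypes 01 and 10
    return cis >= trans, abs(math.log(cis / trans))


def phase(genotypes, g):
    """Phase g along a maximum spanning tree over its heterozygous sites."""
    het = [s for s, x in enumerate(g) if x == HET]
    h1 = list(g)
    if het:
        votes = {}
        for i, j in combinations(het, 2):
            votes[(i, j)] = votes[(j, i)] = pair_vote(genotypes, i, j)
        h1[het[0]] = 0  # tree root; fixes the arbitrary labelling of haplotypes
        in_tree, rest = {het[0]}, set(het[1:])
        while rest:  # naive Prim: repeatedly attach the most confident edge
            u, v = max(((u, v) for u in in_tree for v in rest),
                       key=lambda e: votes[e][1])
            is_cis = votes[(u, v)][0]
            h1[v] = h1[u] if is_cis else 1 - h1[u]
            in_tree.add(v)
            rest.remove(v)
    # The second haplotype is the complement at heterozygous sites only.
    h2 = [1 - a if x == HET else a for a, x in zip(h1, g)]
    return h1, h2
```

In this sketch, each edge of the tree carries one pairwise cis/trans decision, so the phase of every heterozygous site is determined by the path back to the root. Using the most confident edges first means weakly supported pairs influence the result only when no better-supported connection exists.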


International HapMap Project; Haplotype Inference; Heterozygous Site; Genotype Graph; Complete Genotype
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


  1. Brinza, D., Zelikovsky, A.: 2SNP: Scalable Phasing Based on 2-SNP Haplotypes. Bioinformatics 22(3), 371–374 (2006)
  2. Clark, A.: Inference of haplotypes from PCR-amplified samples of diploid populations. Mol. Biol. Evol. 7, 111–122 (1990)
  3. Daly, M., Rioux, J., Schaffner, S., Hudson, T., Lander, E.: High resolution haplotype structure in the human genome. Nat. Genet. 29, 229–232 (2001)
  4. Gabriel, G., Schaffner, S., Nguyen, H., Moore, J., Roy, J., Blumenstiel, B., Higgins, J., et al.: The structure of haplotype blocks in the human genome. Science 296, 2225–2229 (2002)
  5. Gusfield, D.: Haplotype inference by pure parsimony. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds.) CPM 2003. LNCS, vol. 2676, pp. 144–155. Springer, Heidelberg (2003)
  6. Halperin, E., Eskin, E.: Haplotype Reconstruction from Genotype Data using Imperfect Phylogeny. Bioinformatics 20, 1842–1849 (2004)
  7. International HapMap Consortium: The International HapMap Project. Nature 426, 789–796 (2003)
  8. Hudson, R.: Gene genealogies and the coalescent process. Oxford Survey of Evolutionary Biology 7, 1–44 (1990)
  9. Hull, J., Rowlands, K., Lockhart, E., Sharland, M., Moore, C., Hanchard, N., Kwiatkowski, D.P.: Haplotype mapping of the bronchiolitis susceptibility locus near IL8. Am. J. Hum. Genet. 114, 272–279 (2004)
  10. Kimmel, G., Shamir, R.: GERBIL: Genotype resolution and block identification using likelihood. Proc. Natl. Acad. Sci. 102, 158–162 (2005)
  11. Kruglyak, L., Nickerson, D.A.: Variation is the spice of life. Nat. Genet. 27, 234–236 (2001)
  12. Niu, T., Qin, Z., Xu, X., Liu, J.S.: Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms. Am. J. Hum. Genet. 70, 157–169 (2002)
  13. Niu, T.: Algorithms for inferring haplotypes. Genet. Epidemiol. 27(4), 334–347 (2004)
  14. Stephens, M., Smith, N., Donnelly, P.: A new statistical method for haplotype reconstruction from population data. Am. J. Hum. Genet. 68, 978–989 (2001)
  15. Stephens, M., Donnelly, P.: A Comparison of Bayesian Methods for Haplotype Reconstruction from Population Genotype Data. Am. J. Hum. Genet. 73, 1162–1169 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Dumitru Brinza (1)
  • Alexander Zelikovsky (1)
  1. Department of Computer Science, Georgia State University, Atlanta, USA
