Fast Bayesian Haplotype Inference Via Context Tree Weighting

  • Pasi Rastas
  • Jussi Kollin
  • Mikko Koivisto
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5251)


We present a new Bayesian method for inferring haplotypes from unphased genotypes. The method can be viewed as a unification of ideas from variable-order Markov chain modelling and ensemble learning that so far have been implemented only separately in some of the state-of-the-art methods. Specifically, we use the Context Tree Weighting algorithm to efficiently compute the posterior probability of any given haplotype assignment; we employ a simulated annealing scheme to rapidly find several local optima of the posterior; and we sketch a full Bayesian analogue, in which a weighted sample of haplotype assignments is drawn to summarize the posterior distribution. We also show that, given a (weighted) sample of haplotype assignments, one can find in linear time the assignment that minimizes the average switch distance, a popular measure of phasing accuracy, to that sample. We demonstrate empirically that the presented method typically performs as well as the leading fast haplotype inference methods, and sometimes better. The methods are freely available in the computer program BACH (Bayesian Context-based Haplotyping).
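The linear-time claim about the average switch distance can be illustrated with a small sketch. It is not taken from BACH; the function names are hypothetical, and it assumes biallelic sites coded 0/1, no missing data, and sampled haplotype pairs that all resolve the same genotype. Under those assumptions, the switch distance between two phasings equals the Hamming distance between their switch sequences (the relative phase of each pair of consecutive heterozygous sites), so a weighted majority vote at each position of the switch sequence minimizes the weighted average distance.

```python
from typing import List, Sequence, Tuple

Phasing = Tuple[Sequence[int], Sequence[int]]  # a haplotype pair, alleles coded 0/1


def switch_sequence(h1: Sequence[int], h2: Sequence[int]) -> List[int]:
    """Relative phase between consecutive heterozygous sites.

    Bit k is 1 if h1 carries different alleles at the k-th and (k+1)-th
    heterozygous sites, and 0 otherwise.
    """
    het = [i for i, (a, b) in enumerate(zip(h1, h2)) if a != b]
    return [h1[i] ^ h1[j] for i, j in zip(het, het[1:])]


def switch_distance(p: Phasing, q: Phasing) -> int:
    """Hamming distance between the switch sequences of two phasings."""
    return sum(x != y for x, y in zip(switch_sequence(*p), switch_sequence(*q)))


def consensus_phasing(samples: Sequence[Phasing],
                      weights: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Phasing that minimizes the weighted average switch distance to the samples.

    Runs in time linear in the total size of the samples (hypothetical helper).
    """
    h1_ref, h2_ref = samples[0]
    het = [i for i in range(len(h1_ref)) if h1_ref[i] != h2_ref[i]]
    total = sum(weights)

    # Weighted vote for each bit of the switch sequence.
    votes = [0.0] * max(len(het) - 1, 0)
    for (h1, h2), w in zip(samples, weights):
        for k, bit in enumerate(switch_sequence(h1, h2)):
            votes[k] += w * bit
    consensus = [1 if v > total / 2 else 0 for v in votes]

    # Rebuild a haplotype pair; homozygous sites are copied from the first sample.
    a, b = list(h1_ref), list(h2_ref)
    if het:
        a[het[0]], b[het[0]] = 0, 1  # the labelling of the two haplotypes is arbitrary
        for k, site in enumerate(het[1:]):
            a[site] = a[het[k]] ^ consensus[k]
            b[site] = 1 - a[site]
    return a, b


# Three phasings of one genotype; site 2 is homozygous.
A = ([0, 1, 1, 0, 1], [1, 0, 1, 1, 0])
B = ([0, 0, 1, 1, 1], [1, 1, 1, 0, 0])
C = ([0, 1, 1, 1, 0], [1, 0, 1, 0, 1])
best = consensus_phasing([A, B, C], [1.0, 1.0, 1.0])
```

Because each position of the switch sequence is decided independently, the vote never needs to compare whole haplotype pairs, which is what keeps the cost linear. Ties are broken toward "no switch" here; that choice is arbitrary.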


Keywords (machine-generated, not supplied by the authors): Posterior Distribution, Switch Sequence, Haplotype Inference, Heterozygous Site, Haplotype Pair





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Pasi Rastas
  • Jussi Kollin
  • Mikko Koivisto

Department of Computer Science & HIIT Basic Research Unit, University of Helsinki, Finland
