Abstract
We give a new algorithm for the genotype phasing problem. Our solution is based on a hidden Markov model for haplotypes. The model has a uniform structure, unlike most previously proposed solutions, which model recombinations using haplotype blocks. In our model, the haplotypes can be seen as the result of iterated recombinations applied to a few founder haplotypes. We find a maximum likelihood model of this type using the EM algorithm, and we show how to handle the subtleties of the EM algorithm that arise when genotypes are generated using a haplotype model. We compare our method to well-known, currently available algorithms (PHASE, HAP, GERBIL) on standard and new datasets. Our algorithm is relatively fast and gives results that are always best or second best among the methods compared.
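The founder-based view of haplotypes can be illustrated with a toy sketch. It is not the method of the paper: the founders are fixed by hand rather than learned with EM, the transition model is a single hypothetical switch probability `rho`, emissions use a hypothetical error rate `eps`, and phasing is done by brute-force enumeration, which is only feasible for a handful of heterozygous sites. The function names and the 0/1/2 genotype encoding (number of copies of allele 1) are likewise assumptions made for illustration.

```python
from itertools import product


def haplotype_likelihood(hap, founders, rho=0.1, eps=0.01):
    """Forward algorithm for a toy founder HMM (illustrative assumptions only).

    Hidden state at marker j = index of the founder being copied.
    Between markers the chain keeps its founder with probability 1 - rho,
    otherwise it switches to a founder drawn uniformly at random.
    The copied allele is emitted correctly with probability 1 - eps.
    """
    k = len(founders)

    def emit(f, j):
        return 1.0 - eps if founders[f][j] == hap[j] else eps

    # Uniform initial distribution over founders.
    alpha = [emit(f, 0) / k for f in range(k)]
    for j in range(1, len(hap)):
        total = sum(alpha)
        alpha = [((1.0 - rho) * alpha[f] + rho * total / k) * emit(f, j)
                 for f in range(k)]
    return sum(alpha)


def phase_genotype(genotype, founders, rho=0.1, eps=0.01):
    """Return the most likely haplotype pair for an unphased genotype.

    genotype[j] is 0 or 2 at homozygous sites and 1 at heterozygous sites.
    All phasings of the heterozygous sites are enumerated (exponential cost).
    """
    het = [j for j, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # allele at homozygous sites
    best_pair, best_prob = None, -1.0
    for bits in product((0, 1), repeat=len(het)):
        if bits and bits[0] == 1:
            continue  # skip the mirror image of a pair already considered
        h1, h2 = list(base), list(base)
        for j, b in zip(het, bits):
            h1[j], h2[j] = b, 1 - b
        p = (haplotype_likelihood(h1, founders, rho, eps)
             * haplotype_likelihood(h2, founders, rho, eps))
        if p > best_prob:
            best_pair, best_prob = (tuple(h1), tuple(h2)), p
    return best_pair, best_prob


if __name__ == "__main__":
    founders = [(0, 0, 0, 0), (1, 1, 1, 1)]
    pair, prob = phase_genotype((1, 1, 1, 1), founders)
    print(pair, prob)
```

With two hand-picked founders and a genotype that is heterozygous at every site, the sketch prefers the pair that copies each founder without recombination, because every other phasing has to pay for at least one switch in each haplotype.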
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rastas, P., Koivisto, M., Mannila, H., Ukkonen, E. (2005). A Hidden Markov Technique for Haplotype Reconstruction. In: Casadio, R., Myers, G. (eds) Algorithms in Bioinformatics. WABI 2005. Lecture Notes in Computer Science, vol 3692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557067_12
DOI: https://doi.org/10.1007/11557067_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29008-7
Online ISBN: 978-3-540-31812-5
eBook Packages: Computer Science, Computer Science (R0)