Abstract
Within-species genetic variation due to recombination gives DNA a mosaic-like structure. This structure can be modeled, for example, by parsing sample sequences of current DNA with respect to a small number of founders. The founders represent the ancestral sequence material from which the sample was created in a sequence of recombination steps. This scenario has recently been applied successfully to the development of probabilistic hidden Markov methods for haplotyping genotypic data. In this paper we introduce a combinatorial method for haplotyping that is based on a similar parsing idea. We formulate a polynomial-time parsing algorithm that finds a minimum cross-over parse in a simplified ‘flat’ parsing model that ignores the historical hierarchy of recombinations. The problem of constructing optimal founders that would give the minimum possible parse for given genotypic sequences is shown to be NP-hard. A heuristic, locally optimal algorithm is given for founder construction. Combined with flat parsing, this already gives quite good haplotyping results. Improved haplotyping is obtained by using hierarchical parsing that properly models the natural recombination process. For finding short hierarchical parses, a greedy polynomial-time algorithm is given. Empirical haplotyping results on HapMap data are reported.
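To make the flat parsing idea concrete, the following is a minimal sketch of a minimum cross-over parse by dynamic programming. It is an illustration under simplifying assumptions, not the algorithm of the paper: it parses a single already-phased haplotype (the paper parses genotypes), it assumes that the sample and all founders are equal-length strings over the same allele alphabet, and it allows no mismatches. The function name `min_crossover_parse` is hypothetical.

```python
import math
from typing import Sequence


def min_crossover_parse(sample: str, founders: Sequence[str]) -> int:
    """Fewest cross-overs needed to spell `sample` from aligned founder segments.

    Simplifying assumptions (not from the paper): `sample` is a phased
    haplotype, every founder has the same length as `sample`, and the
    parse must match exactly. Returns -1 if no parse exists.
    """
    n = len(sample)
    if any(len(f) != n for f in founders):
        raise ValueError("sample and founders must have equal length")
    if n == 0:
        return 0

    # cost[k]: fewest cross-overs for a parse of the prefix read so far
    # whose last position is copied from founder k (inf if impossible).
    cost = [0 if f[0] == sample[0] else math.inf for f in founders]

    for i in range(1, n):
        best_prev = min(cost)  # cheapest parse of the previous prefix
        cost = [
            # Either stay on founder k, or switch to it at the price of one cross-over.
            min(cost[k], best_prev + 1) if f[i] == sample[i] else math.inf
            for k, f in enumerate(founders)
        ]

    best = min(cost)
    return -1 if math.isinf(best) else int(best)
```

With founders `0000` and `1111`, the sample `0011` needs one cross-over and `0101` needs three. The sketch runs in time proportional to the number of founders times the sequence length, because each position only needs the cheapest cost from the previous position.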
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rastas, P., Ukkonen, E. (2007). Haplotype Inference Via Hierarchical Genotype Parsing. In: Giancarlo, R., Hannenhalli, S. (eds) Algorithms in Bioinformatics. WABI 2007. Lecture Notes in Computer Science, vol 4645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74126-8_9
DOI: https://doi.org/10.1007/978-3-540-74126-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74125-1
Online ISBN: 978-3-540-74126-8
eBook Packages: Computer Science, Computer Science (R0)