Abstract
Understanding recombination is a central problem in population genetics. In this paper, we address an established computational problem in this area: computing lower bounds on the minimum number of historical recombinations needed to generate a set of sequences (Hudson and Kaplan in Genetics 111, 147–164, 1985; Myers and Griffiths in Genetics 163, 375–394, 2003; Gusfield et al. in Discrete Appl. Math. 155, 806–830, 2007; Bafna and Bansal in IEEE/ACM Trans. Comput. Biol. Bioinf. 1, 78–90, 2004 and in J. Comput. Biol. 13, 501–521, 2006; Song et al. in Bioinformatics 421, i413–i422, 2005). In particular, we propose a new recombination lower bound: the forest bound. We show that the forest bound can be formulated as the minimum perfect phylogenetic forest problem, a natural extension of the classic binary perfect phylogeny problem, which may be of interest in its own right. We then show that the forest bound is provably higher than the optimal haplotype bound (Myers and Griffiths in Genetics 163, 375–394, 2003), a very good lower bound in practice (Song et al. in Bioinformatics 421, i413–i422, 2005). We prove that, like several other lower bounds (Bafna and Bansal in J. Comput. Biol. 13, 501–521, 2006), the forest bound is NP-hard to compute. Finally, we describe an integer linear programming (ILP) formulation that computes the forest bound exactly for a certain range of data. Simulation results show that the forest bound may be useful in computing lower bounds for low-quality data.
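For orientation, the haplotype bound of Myers and Griffiths (2003), against which the forest bound is compared, can be sketched in a few lines. For a set of binary sequences restricted to a chosen set of sites, it is the number of distinct sequences minus the number of chosen sites minus one, and the optimal haplotype bound is the maximum of this quantity over all subsets of sites. The sketch below is an illustration only: the function names are hypothetical, the exhaustive search over subsets is an assumption made for clarity (it is exponential in the number of sites), and it is not the method used in the paper.

```python
from itertools import combinations


def haplotype_bound(rows, sites):
    """Haplotype bound on the given sites: K - S - 1, floored at 0.

    rows  -- list of equal-length binary strings (one per sequence)
    sites -- tuple of column indices to restrict the sequences to
    """
    # K: number of distinct sequences after restricting to the chosen sites
    distinct = {tuple(row[s] for s in sites) for row in rows}
    # S: number of chosen sites
    return max(0, len(distinct) - len(sites) - 1)


def optimal_haplotype_bound(rows):
    """Maximum haplotype bound over all non-empty subsets of sites.

    Brute force, for illustration only (assumed, not from the paper).
    """
    num_sites = len(rows[0])
    best = 0
    for size in range(1, num_sites + 1):
        for sites in combinations(range(num_sites), size):
            best = max(best, haplotype_bound(rows, sites))
    return best


if __name__ == "__main__":
    data = ["000", "011", "101", "110"]
    print(haplotype_bound(data, (0, 1, 2)))
    print(optimal_haplotype_bound(data))
```

In this toy input, all three sites together give 4 - 3 - 1 = 0, while any pair of sites shows all four combinations 00, 01, 10, 11 and gives 4 - 2 - 1 = 1, which is why maximizing over subsets can improve the bound.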
References
Bafna V, Bansal V (2004) The number of recombination events in a sample history: conflict graph and lower bounds. IEEE/ACM Trans Comput Biol Bioinf 1:78–90
Bafna V, Bansal V (2006) Inference about recombination from haplotype data: lower bounds and recombination hotspots. J Comput Biol 13:501–521
Bordewich M, Semple C (2004) On the computational complexity of the rooted subtree prune and regraft distance. Ann Comb 8:409–423
Foulds LR, Graham RL (1982) The Steiner tree in phylogeny is NP-complete. Adv Appl Math 3
Garey M, Johnson D (1979) Computers and intractability. Freeman, San Francisco
Griffiths RC, Marjoram P (1996) Ancestral inference from samples of DNA sequences with recombination. J Comput Biol 3:479–502
Gusfield D (1991) Efficient algorithms for inferring evolutionary history. Networks 21:19–28
Gusfield D, Eddhu S, Langley C (2004) Optimal, efficient reconstruction of phylogenetic networks with constrained recombination. J Bioinf Comput Biol 2:173–213
Gusfield D, Hickerson D, Eddhu S (2007) An efficiently-computed lower bound on the number of recombinations in phylogenetic networks: theory and empirical study. Discrete Appl Math 155:806–830
Hudson R (2002) Generating samples under the Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338
Hudson R, Kaplan N (1985) Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics 111:147–164
Myers S (2003) The detection of recombination events using DNA sequence data. PhD dissertation, Dept of Statistics, University of Oxford, Oxford, England
Myers SR, Griffiths RC (2003) Bounds on the minimum number of recombination events in a sample history. Genetics 163:375–394
Song YS, Ding Z, Gusfield D, Langley C, Wu Y (2006) Algorithms to distinguish the role of gene-conversion from single-crossover recombination in the derivations of SNP sequences in populations. In: Proceedings of RECOMB 2006. LNBI, vol 3909
Song YS, Wu Y, Gusfield D (2005) Efficient computation of close lower and upper bounds on the minimum number of needed recombinations in the evolution of biological sequences. Bioinformatics 421:i413–i422. Proceedings of ISMB 2005
Wang L, Zhang K, Zhang L (2001) Perfect phylogenetic networks with recombination. J Comput Biol 8:69–78
Additional information
A preliminary version of this paper appeared in the Proceedings of COCOON 2007, LNCS, vol. 4598, pp. 16–26.
The work was performed while Y. Wu was with UC Davis and was supported by grants CCF-0515278 and IIS-0513910 from the National Science Foundation.
D. Gusfield was supported by grants CCF-0515278 and IIS-0513910 from the National Science Foundation.
Cite this article
Wu, Y., Gusfield, D. A new recombination lower bound and the minimum perfect phylogenetic forest problem. J Comb Optim 16, 229–247 (2008). https://doi.org/10.1007/s10878-007-9129-6
DOI: https://doi.org/10.1007/s10878-007-9129-6