Mihaescu, R., Hill, C. & Rao, S. Algorithmica (2013) 66: 419. doi:10.1007/s00453-012-9644-4
Given natural limitations on the length of DNA sequences, designing phylogenetic reconstruction methods that remain reliable under limited information is a crucial endeavor. Two approaches to this problem have been pursued: reconstructing partial but reliable information about the tree (Mossel in IEEE Comput. Biol. Bioinform. 4:108–116, 2007; Daskalakis et al. in SIAM J. Discrete Math. 25:872–893, 2011; Daskalakis et al. in Proc. of RECOMB 2006, pp. 281–295, 2006; Gronau et al. in Proc. of the 19th Annual SODA 2008, pp. 379–388, 2008), and reaching “deeper” in the tree through reconstruction of ancestral sequences. In the latter category, Daskalakis et al. (Proc. of the 38th Annual STOC, pp. 159–168, 2006) settled an important conjecture of M. Steel (My favourite conjecture. Preprint, 2001), showing that, under the CFN model of evolution, all trees on n leaves with edge lengths bounded by the Ising model phase transition can be recovered with high probability from genomes of length O(log n) by a polynomial-time algorithm. Their methods had a running time of O(n^10).
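To make the setting concrete, the sketch below simulates the CFN (Cavender-Farris-Neyman) model on a small tree. Each site carries a ±1 state, the root state is uniform, and the state flips along edge e with probability p_e. The correlation between two leaves is then the product of θ_e = 1 - 2p_e along the path joining them, so d = -ln(correlation) is an additive tree distance that can be estimated from the sequences. For binary trees the Ising phase transition corresponds to 2θ² = 1, i.e. p* = (1 - 1/√2)/2 ≈ 0.146, so "edge lengths bounded by the phase transition" means p_e < p*. The quartet, its flip probabilities and the helper names (`TREE`, `simulate`, `cfn_distance`, `tree_distance`) are hypothetical and chosen only for illustration; this shows the model, not the reconstruction algorithm of the paper.

```python
import math
import random

# Hypothetical quartet ((A,B),(C,D)): child -> (parent, flip probability p_e
# on the edge into the child). Parents are listed before their children.
TREE = {
    "u": ("root", 0.05),
    "v": ("root", 0.05),
    "A": ("u", 0.10),
    "B": ("u", 0.10),
    "C": ("v", 0.10),
    "D": ("v", 0.10),
}
LEAVES = ("A", "B", "C", "D")

# Kesten-Stigum / Ising threshold for binary trees: 2 * (1 - 2p)^2 = 1.
P_STAR = (1 - 1 / math.sqrt(2)) / 2


def simulate(k, seed=0):
    """Evolve k independent +/-1 sites down the tree under the CFN model."""
    rng = random.Random(seed)
    seqs = {"root": [rng.choice((-1, 1)) for _ in range(k)]}
    for child, (parent, p) in TREE.items():
        # Each site flips independently with probability p on this edge.
        seqs[child] = [-s if rng.random() < p else s for s in seqs[parent]]
    return {leaf: seqs[leaf] for leaf in LEAVES}


def cfn_distance(x, y):
    """Estimate -ln(prod of theta_e) from the empirical site correlation."""
    corr = sum(a * b for a, b in zip(x, y)) / len(x)
    return -math.log(corr) if corr > 0 else math.inf


def ancestor_dists(node):
    """Map each ancestor of `node` (including itself) to its CFN distance."""
    out, d = {node: 0.0}, 0.0
    while node != "root":
        parent, p = TREE[node]
        d += -math.log(1 - 2 * p)  # edge length -ln(theta_e)
        node = parent
        out[node] = d
    return out


def tree_distance(a, b):
    """True additive distance: the minimum is attained at the lowest common ancestor."""
    da, db = ancestor_dists(a), ancestor_dists(b)
    return min(da[z] + db[z] for z in da if z in db)


if __name__ == "__main__":
    seqs = simulate(20000, seed=1)
    for a, b in [("A", "B"), ("A", "C")]:
        print(a, b, round(cfn_distance(seqs[a], seqs[b]), 3), round(tree_distance(a, b), 3))
```

With short sequences the empirical correlation is noisy, and long paths (small correlations) amplify that noise under -ln, which is why sequence-length requirements hinge on keeping edges below p*.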
Here we enhance our methods from Daskalakis et al. (Proc. of RECOMB 2006, pp. 281–295, 2006) with the learning of ancestral sequences and provide an algorithm for reconstructing a sub-forest of the tree that is reliable given the available data, without requiring bounds on the edge lengths of the tree to be known a priori. Our methods are based on an intuitive minimum spanning tree approach and run in O(n^3) time. For the case of full reconstruction of trees with edge lengths below the phase transition, we maintain the same asymptotic sequence length requirements as Daskalakis et al. (Proc. of the 38th Annual STOC, pp. 159–168, 2006), despite the considerably faster running time.
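The abstract does not spell out the algorithm beyond saying it is "based on an intuitive minimum spanning tree approach". As a point of reference only, the sketch below is a generic Prim's algorithm over a complete graph given as a pairwise distance matrix (for example, estimated leaf distances), running in O(n²) time. It is not the authors' procedure, which additionally learns ancestral sequences and outputs a reliable sub-forest; the function name `prim_mst` and the example matrix are hypothetical.

```python
import math


def prim_mst(dist):
    """Return MST edges (i, j, w) of the complete graph given by an n x n distance matrix."""
    n = len(dist)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from each vertex into the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest vertex not yet in the tree (O(n) scan, O(n^2) overall).
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, dist[parent[u]][u]))
        # Relax edges from the newly added vertex.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


if __name__ == "__main__":
    D = [[0, 1, 4, 5],
         [1, 0, 3, 6],
         [4, 3, 0, 2],
         [5, 6, 2, 0]]
    print(prim_mst(D))
```

The dense O(n²) variant is used here because a distance matrix over leaves is a complete graph, where a heap offers no asymptotic advantage.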
Keywords: Phylogenetic reconstruction; Ising model; Phase transitions; Phylogenetic forests; Information flow; Ancestral sequence reconstruction