Abstract
Multi-State Perfect Phylogeny is an extension of Binary Perfect Phylogeny in which characters are allowed more than two states. In this paper we consider four problems that extend its utility. In the Missing Data (MD) Problem, some entries in the input are missing, and the question is whether (bounded) values can be imputed so that the resulting data has a multi-state Perfect Phylogeny. In the Character-Removal (CR) Problem, we want to minimize the number of characters removed from the data so that the resulting data has a multi-state Perfect Phylogeny. In the Missing-Data Character-Removal (MDCR) Problem, we want to impute values for the missing data so as to minimize the solution to the resulting Character-Removal Problem. In the Insertion and Deletion (ID) Problem, insertion and deletion mutational events spanning multiple characters are also allowed.
We introduce a new general conceptual solution to these four problems. The method reduces k-state problems to binary problems with missing data. This gives a new conceptual solution to the multi-state Perfect Phylogeny problem, and conceptual solutions to the MD, CR, MDCR and ID problems for any k, significantly improving on previous work. Empirical evaluations of our implementations show that they are faster than previously established methods for general k and remain effective on larger inputs.
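To make the binary target of the reduction concrete, the following minimal sketch illustrates what "binary perfect phylogeny with missing data" asks. It is not the authors' reduction or implementation. It relies on the standard four-gamete condition (a complete binary matrix has an undirected perfect phylogeny exactly when no pair of columns shows all four combinations 00, 01, 10, 11) and fills missing entries by exhaustive search, which is exponential and only suitable for toy inputs. The function names and the use of `None` to mark a missing entry are assumptions made for this illustration.

```python
from itertools import combinations, product


def has_binary_perfect_phylogeny(matrix):
    """Four-gamete test on a complete 0/1 matrix (rows = taxa, columns = characters)."""
    n_cols = len(matrix[0]) if matrix else 0
    for i, j in combinations(range(n_cols), 2):
        # Collect the distinct (state_i, state_j) pairs seen across all taxa.
        gametes = {(row[i], row[j]) for row in matrix}
        if len(gametes) == 4:
            return False  # columns i and j conflict
    return True


def impute_missing(matrix):
    """Brute-force illustration of the binary missing-data problem.

    Entries equal to None (a convention assumed here) are treated as missing.
    Returns a completed matrix that passes the four-gamete test, or None
    if no assignment of 0/1 values works.
    """
    holes = [(r, c)
             for r, row in enumerate(matrix)
             for c, value in enumerate(row)
             if value is None]
    for fill in product((0, 1), repeat=len(holes)):
        candidate = [list(row) for row in matrix]
        for (r, c), value in zip(holes, fill):
            candidate[r][c] = value
        if has_binary_perfect_phylogeny(candidate):
            return candidate
    return None
```

For example, `impute_missing([[0, 0], [0, 1], [1, 0], [1, None]])` has to set the missing entry to 0, because setting it to 1 would produce all four gametes in the two columns.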
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stevens, K., Gusfield, D. (2010). Reducing Multi-state to Binary Perfect Phylogeny with Applications to Missing, Removable, Inserted, and Deleted Data. In: Moulton, V., Singh, M. (eds) Algorithms in Bioinformatics. WABI 2010. Lecture Notes in Computer Science, vol 6293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15294-8_23
DOI: https://doi.org/10.1007/978-3-642-15294-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15293-1
Online ISBN: 978-3-642-15294-8
eBook Packages: Computer Science (R0)