Identifying Rogue Taxa through Reduced Consensus: NP-Hardness and Exact Algorithms

  • Akshay Deepak
  • Jianrong Dong
  • David Fernández-Baca
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7292)

Abstract

A rogue taxon in a collection of phylogenetic trees is one whose position varies drastically from tree to tree. The presence of such taxa can greatly reduce the resolution of the consensus tree (e.g., the majority-rule or strict consensus) for a collection. The reduced consensus approach aims to identify and eliminate rogue taxa to produce more informative consensus trees. Given a collection of phylogenetic trees over the same leaf set, the goal is to find a set of taxa whose removal maximizes the number of internal edges in the consensus tree of the collection. We show that this problem is NP-hard for strict and majority-rule consensus. We give a polynomial-time algorithm for reduced strict consensus when the maximum degree of the strict consensus of the original trees is bounded. We describe exact integer linear programming formulations for computing reduced strict, majority and loose consensus trees. In experimental tests, our exact solutions improved over heuristic methods on several problem instances.
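
To make the optimization problem concrete, the following is a minimal brute-force sketch of reduced strict consensus on a toy instance. It is not the authors' polynomial-time algorithm or their integer linear programming formulation. It assumes rooted trees encoded as nested Python tuples of leaf names, and it counts internal edges as nontrivial clusters shared by every restricted tree. The function names (`clusters`, `strict_consensus_edges`, `best_removal`), the `max_removed` bound, and the example trees are all hypothetical, introduced only for illustration.

```python
from itertools import combinations

def clusters(tree, keep):
    """Nontrivial clusters of a rooted tree after restricting it to the taxa in `keep`."""
    found = set()

    def leaves_below(node):
        if isinstance(node, str):              # a leaf is a taxon name
            return frozenset([node]) & keep
        below = frozenset().union(*(leaves_below(child) for child in node))
        # A cluster is nontrivial if it has at least two taxa and is not the whole leaf set.
        if 1 < len(below) < len(keep):
            found.add(below)
        return below

    leaves_below(tree)
    return found

def strict_consensus_edges(trees, keep):
    """Number of internal edges (shared nontrivial clusters) in the strict consensus."""
    return len(set.intersection(*(clusters(t, keep) for t in trees)))

def best_removal(trees, taxa, max_removed):
    """Try every removal set of size <= max_removed (exponential; toy inputs only)."""
    best_set, best_score = frozenset(), -1
    for k in range(max_removed + 1):
        for removed in combinations(sorted(taxa), k):
            keep = frozenset(taxa) - set(removed)
            score = strict_consensus_edges(trees, keep)
            if score > best_score:             # ties keep the smaller, earlier removal set
                best_set, best_score = frozenset(removed), score
    return best_set, best_score

# Hypothetical input: taxon "R" sits at opposite ends of two otherwise identical trees.
taxa = {"A", "B", "C", "D", "R"}
trees = [
    (((("A", "R"), "B"), "C"), "D"),
    (((("A", "B"), "C"), "D"), "R"),
]
print(strict_consensus_edges(trees, frozenset(taxa)))  # consensus before any removal
print(best_removal(trees, taxa, max_removed=2))
```

On this toy input the strict consensus of the full trees has no internal edges, while removing the single taxon `R` recovers two. Exhaustive search of this kind is feasible only for very small inputs, which is consistent with the NP-hardness result stated in the abstract.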

Keywords

Integer Linear Programming; Consensus Tree; Internal Edge; Input Tree; Consensus Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Akshay Deepak (1)
  • Jianrong Dong (1)
  • David Fernández-Baca (1)
  1. Department of Computer Science, Iowa State University, Ames, USA
