Summary
In this chapter we present new techniques for discovering knowledge from evolutionary trees. An evolutionary tree is modeled as a rooted, unordered, labeled tree: it has a designated root, and the order among siblings is unimportant. The knowledge to be discovered consists of “cousin pairs” in the trees. A cousin pair is a pair of nodes that share the same parent, the same grandparent, the same great-grandparent, and so on. Given a tree T, our algorithm finds all interesting cousin pairs of T in O(|T|^2) time, where |T| is the number of nodes in T. We also extend this algorithm to find interesting cousin pairs in multiple trees. Experimental results on synthetic data and real trees demonstrate the scalability and effectiveness of the proposed algorithms. To show the usefulness of these techniques, we discuss an application of cousin pairs to evaluating the consensus of equally parsimonious trees, and we compare cousin pairs with the widely used clusters in the trees. We also report the implementation status of the system built on the proposed algorithms, which is fully operational and available on the World Wide Web.
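The definition of a cousin pair can be illustrated with a minimal sketch. This is not the chapter's algorithm: it is a naive enumeration written under stated assumptions. The function name `cousin_pairs`, the parent-map representation, and the `max_dist` threshold are all hypothetical. In particular, the summary does not say what makes a pair "interesting", so the sketch simply treats a pair as a cousin pair when both nodes lie the same number of generations below their nearest common ancestor, and it reports that number minus one as an assumed distance (0 for siblings, 1 for nodes sharing a grandparent, and so on).

```python
from itertools import combinations


def cousin_pairs(parent, labels, max_dist=1):
    """Enumerate cousin pairs in a rooted, unordered, labeled tree.

    parent: dict mapping each node id to its parent id (the root maps to None).
    labels: dict mapping each node id to its label.
    max_dist: hypothetical cutoff on the assumed distance (generations - 1).
    Returns a sorted list of (label_a, label_b, dist) tuples.
    """
    def ancestor_chain(node):
        # [node, parent, grandparent, ..., root]
        chain = []
        while node is not None:
            chain.append(node)
            node = parent[node]
        return chain

    chains = {n: ancestor_chain(n) for n in parent}
    result = []
    for u, v in combinations(sorted(parent), 2):
        steps_from_u = {a: i for i, a in enumerate(chains[u])}
        # Walk up from v; the first node also above u is the nearest common
        # ancestor. Both chains end at the root, so the loop always breaks.
        for j, a in enumerate(chains[v]):
            if a in steps_from_u:
                i = steps_from_u[a]
                break
        # Same number of generations below the common ancestor, at least one,
        # so neither node is an ancestor of the other.
        if i == j and i >= 1 and i - 1 <= max_dist:
            a_label, b_label = sorted((labels[u], labels[v]))
            result.append((a_label, b_label, i - 1))
    return sorted(result)


# Example tree (node ids and labels are made up for illustration):
#         r
#        / \
#       a   b
#      / \   \
#     c   d   e
parent = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
labels = {0: "r", 1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}

for pair in cousin_pairs(parent, labels, max_dist=1):
    print(pair)
```

Because siblings are unordered, each pair is stored with its labels sorted, so (c, e) and (e, c) count as the same pair. The sketch walks an ancestor chain for every pair of nodes, so its cost grows with tree height as well as with the number of pairs; it is meant to make the definition concrete, not to match the O(|T|^2) bound stated above.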
Copyright information
© 2005 Dr Sanghamitra Bandyopadhyay
About this chapter
Cite this chapter
Zhang, S., Wang, J.T.L. (2005). Knowledge Discovery from Evolutionary Trees. In: Advanced Methods for Knowledge Discovery from Complex Data. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/1-84628-284-5_8
DOI: https://doi.org/10.1007/1-84628-284-5_8
Publisher Name: Springer, London
Print ISBN: 978-1-85233-989-0
Online ISBN: 978-1-84628-284-3
eBook Packages: Computer Science, Computer Science (R0)