Knowledge Discovery from Evolutionary Trees

Zhang, Sen; Wang, Jason T. L.

doi:10.1007/1-84628-284-5_8

Part of the book series: Advanced Information and Knowledge Processing (AI&KP)

Summary

In this chapter we present new techniques for discovering knowledge from evolutionary trees. An evolutionary tree is a rooted, unordered, labeled tree: it has a root, and the order among siblings is unimportant. The knowledge to be discovered from these trees consists of “cousin pairs”. A cousin pair is a pair of nodes that share the same parent, the same grandparent, the same great-grandparent, and so on. Given a tree T, our algorithm finds all interesting cousin pairs of T in O(|T|²) time, where |T| is the number of nodes in T. We also extend this algorithm to find interesting cousin pairs in multiple trees. Experimental results on synthetic data and real trees demonstrate the scalability and effectiveness of the proposed algorithms. To show the usefulness of these techniques, we discuss an application of cousin pairs to evaluating the consensus of equally parsimonious trees, and we compare cousin pairs with the widely used clusters in the trees. We also report the implementation status of the system built on the proposed algorithms, which is fully operational and available on the World Wide Web.
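
The definition of a cousin pair can be illustrated with a minimal Python sketch. This is not the chapter's algorithm: it is a naive enumeration written only from the definition in the summary, and its cost grows with the tree height as well as with the number of node pairs. The function name `cousin_pairs`, the `parent` dictionary encoding, the `max_dist` parameter, and the distance convention (0 for siblings, 1 for nodes sharing a grandparent, and so on) are all assumptions made for illustration.

```python
from itertools import combinations


def cousin_pairs(parent, max_dist=None):
    """Enumerate cousin pairs of a rooted tree (illustrative sketch only).

    parent: dict mapping each node label to its parent label; the root
            maps to None. This encoding is an assumption of the sketch.
    max_dist: optional cutoff on the (assumed) cousin distance.
    Returns a list of (u, v, dist) tuples, where dist is 0 for siblings,
    1 for nodes sharing a grandparent, 2 for a great-grandparent, etc.
    """
    def ancestors(node):
        # Proper ancestors of `node`, nearest first, ending at the root.
        chain = []
        node = parent[node]
        while node is not None:
            chain.append(node)
            node = parent[node]
        return chain

    anc = {v: ancestors(v) for v in parent}
    pairs = []
    for u, v in combinations(sorted(parent), 2):
        au, av = anc[u], anc[v]
        # The k-th ancestors can only coincide when u and v have the same
        # depth; the first such k identifies their lowest shared ancestor.
        for k in range(min(len(au), len(av))):
            if au[k] == av[k]:
                if max_dist is None or k <= max_dist:
                    pairs.append((u, v, k))
                break
    return pairs


# Hypothetical example tree: r has children a and b; a has c and d; b has e.
tree = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "b"}
print(cousin_pairs(tree))
```

In this example tree, `a` and `b` are siblings, as are `c` and `d`, while `c` and `e` (and `d` and `e`) share only the grandparent `r`. What makes a cousin pair "interesting" is defined in the chapter itself and is not reproduced here; the `max_dist` cutoff is merely one plausible filter.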

About this chapter

Cite this chapter

Zhang, S., Wang, J.T.L. (2005). Knowledge Discovery from Evolutionary Trees. In: Advanced Methods for Knowledge Discovery from Complex Data. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/1-84628-284-5_8

DOI: https://doi.org/10.1007/1-84628-284-5_8
Publisher Name: Springer, London
Print ISBN: 978-1-85233-989-0
Online ISBN: 978-1-84628-284-3
eBook Packages: Computer Science, Computer Science (R0)
