Part of the book series: Advanced Information and Knowledge Processing (AI&KP)

Summary

In this chapter we present new techniques for discovering knowledge from evolutionary trees. An evolutionary tree is a rooted, unordered, labeled tree: it has a designated root, and the order among siblings is unimportant. The knowledge to be discovered from these trees consists of “cousin pairs”. A cousin pair is a pair of nodes that share the same parent, the same grandparent, the same great-grandparent, and so on. Given a tree T, our algorithm finds all interesting cousin pairs of T in O(|T|^2) time, where |T| is the number of nodes in T. We also extend this algorithm to find interesting cousin pairs in multiple trees. Experimental results on synthetic data and real trees demonstrate the scalability and effectiveness of the proposed algorithms. To show the usefulness of these techniques, we discuss an application of cousin pairs to evaluating the consensus of equally parsimonious trees, and we compare cousin pairs with the widely used clusters in the trees. We also report the implementation status of the system built on the proposed algorithms, which is fully operational and available on the World Wide Web.
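
The cousin-pair idea described above can be illustrated with a minimal sketch. This is not the chapter's algorithm, whose details are not given in this summary. It assumes a tree is supplied as a child-to-parent mapping, that a pair counts when both nodes reach a common ancestor after the same number of upward steps, and that "interesting" means within `max_up` generations and present in at least `min_support` trees. The names `cousin_pairs`, `frequent_pairs`, `max_up`, and `min_support` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cousin_pairs(parent, label, max_up=2):
    """Count label pairs whose nodes meet at a common ancestor k levels up.

    parent: dict mapping each node to its parent (the root maps to None).
    label:  dict mapping each node to its label.
    max_up: largest number of generations to climb (assumed cutoff).
    Returns a Counter keyed by ((label_a, label_b), k), with 1 <= k <= max_up.
    """
    counts = Counter()
    # Examine every unordered pair of nodes: quadratic in the number of nodes.
    for u, v in combinations(parent, 2):
        a, b = u, v
        for k in range(1, max_up + 1):
            a, b = parent[a], parent[b]   # climb one generation on both sides
            if a is None or b is None:    # one side climbed past the root
                break
            if a == b:                    # nearest common ancestor is k levels up
                pair = tuple(sorted((label[u], label[v])))  # siblings are unordered
                counts[(pair, k)] += 1
                break
    return counts


def frequent_pairs(trees, max_up=2, min_support=2):
    """Keep cousin pairs that occur in at least min_support of the given trees.

    trees: iterable of (parent, label) tuples, one per tree.
    """
    support = Counter()
    for parent, label in trees:
        # Count each (pair, k) key at most once per tree.
        support.update(set(cousin_pairs(parent, label, max_up)))
    return {key: n for key, n in support.items() if n >= min_support}


# Two small example trees with made-up node names.
tree1 = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "b"}
tree2 = {"r": None, "a": "r", "e": "r", "c": "a", "d": "a"}
labels1 = {node: node for node in tree1}
labels2 = {node: node for node in tree2}

print(cousin_pairs(tree1, labels1))
print(frequent_pairs([(tree1, labels1), (tree2, labels2)]))
```

With `max_up` held constant, the loop over all node pairs does quadratic work in the number of nodes, which matches the order of the bound stated in the summary, though the chapter's own procedure may differ.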

Copyright information

© 2005 Dr Sanghamitra Bandyopadhyay

About this chapter

Cite this chapter

Zhang, S., Wang, J.T.L. (2005). Knowledge Discovery from Evolutionary Trees. In: Advanced Methods for Knowledge Discovery from Complex Data. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/1-84628-284-5_8

  • DOI: https://doi.org/10.1007/1-84628-284-5_8

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-989-0

  • Online ISBN: 978-1-84628-284-3

  • eBook Packages: Computer Science (R0)
