Journal of Mathematical Biology, Volume 66, Issue 1–2, pp 399–420

Orthology relations, symbolic ultrametrics, and cographs

  • Marc Hellmuth
  • Maribel Hernandez-Rosales
  • Katharina T. Huber
  • Vincent Moulton
  • Peter F. Stadler
  • Nicolas Wieseke

Abstract

Orthology detection is an important problem in comparative and evolutionary genomics and, consequently, a variety of orthology detection methods have been devised in recent years. Although many of these methods depend on generating gene and/or species trees, it has been shown that orthology can be estimated at acceptable levels of accuracy without having to infer gene trees and/or reconcile gene trees with species trees. Thus, it is of interest to understand how much information about the gene tree, the species tree, and their reconciliation is already contained in the orthology relation on the underlying set of genes. Here we shall show that a result by Böcker and Dress concerning symbolic ultrametrics, and subsequent algorithmic results by Semple and Steel for processing these structures, can throw a considerable amount of light on this problem. More specifically, building upon these authors’ results, we present some new characterizations for symbolic ultrametrics and new algorithms for recovering the associated trees, with an emphasis on how these algorithms could potentially be extended to deal with arbitrary orthology relations. In so doing, we shall also show that, somewhat surprisingly, symbolic ultrametrics are very closely related to cographs, graphs that do not contain an induced path on any subset of four vertices. We conclude with a discussion on how our results might be applied in practice to orthology detection.
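
To illustrate the cograph condition stated in the abstract (no induced path on four vertices), the following is a minimal Python sketch. It is a brute-force check written for clarity and is not the algorithm developed in the paper; the function names `has_induced_p4` and `is_cograph` and the gene labels are hypothetical, and the edges are assumed to represent pairs of genes estimated to be orthologs.

```python
from itertools import combinations, permutations


def has_induced_p4(vertices, edges):
    """Return True if the undirected graph contains an induced path on four vertices."""
    # Store each undirected edge as a frozenset so (u, v) and (v, u) match.
    edge_set = {frozenset(e) for e in edges}

    def joined(u, v):
        return frozenset((u, v)) in edge_set

    for quad in combinations(vertices, 4):
        for a, b, c, d in permutations(quad):
            # a-b-c-d is an induced path when the three consecutive pairs
            # are edges and the three remaining pairs are non-edges.
            if (joined(a, b) and joined(b, c) and joined(c, d)
                    and not joined(a, c)
                    and not joined(a, d)
                    and not joined(b, d)):
                return True
    return False


def is_cograph(vertices, edges):
    """A graph is a cograph exactly when it has no induced path on four vertices."""
    return not has_induced_p4(vertices, edges)


# Hypothetical genes; an edge means the two genes are estimated to be orthologs.
genes = ["a", "b", "c", "d"]
path_relation = [("a", "b"), ("b", "c"), ("c", "d")]   # the path a-b-c-d itself
star_relation = [("a", "b"), ("a", "c"), ("a", "d")]   # gene a orthologous to all others

print(is_cograph(genes, path_relation))
print(is_cograph(genes, star_relation))
```

The check enumerates all four-element vertex subsets, so it is only practical for small gene sets; it is meant to make the definition concrete rather than to be efficient.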

Keywords

Orthology, Symbolic ultrametric, Cograph, Cotree, Rooted triples

Mathematics Subject Classification

05C05, 92D15, 68R10

References

  1. Aho AV, Sagiv Y, Szymanski TG, Ullman JD (1981) Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM J Comput 10: 405–421
  2. Altenhoff AM, Dessimoz C (2009) Phylogenetic and functional assessment of orthologs inference projects and methods. PLoS Comput Biol 5: e1000262
  3. Berglund AC, Sjölund E, Ostlund G, Sonnhammer EL (2008) InParanoid 6: eukaryotic ortholog clusters with inparalogs. Nucleic Acids Res 36: D263–D266
  4. Böcker S, Dress AWM (1998) Recovering symbolically dated, rooted trees from symbolic ultrametrics. Adv Math 138: 105–125
  5. Brandstädt A, Le VB, Spinrad JP (1999) Graph classes: a survey. SIAM monographs on discrete mathematics and applications. Soc Ind Appl Math, Philadelphia
  6. Byrka J, Guillemot S, Jansson J (2010) New results on optimizing rooted triplets consistency. Discrete Appl Math 158: 1136–1147
  7. Corneil DG, Lerchs H, Stewart Burlingham LK (1981) Complement reducible graphs. Discrete Appl Math 3: 163–174
  8. Datta RS, Meacham C, Samad B, Neyer C, Sjölander K (2009) Berkeley PHOG: phylofacts orthology group prediction web server. Nucleic Acids Res 37: W84–W89
  9. Falls C, Powell B, Snoeyink J (2008) Computing high-stringency COGs using Turán-type graphs. Technical report. http://www.cs.unc.edu/~snoeyink/comp145/cogs.pdf
  10. Goodstadt L, Ponting CP (2006) Phylogenetic reconstruction of orthology, paralogy, and conserved synteny for dog and human. PLoS Comput Biol 2: e133
  11. Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Herrero J, Holland R, Howe K, Howe K, Johnson N, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Melsopp C, Megy K, Meidl P, Ouverdin B, Parker A, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Severin J, Slater G, Smedley D, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wood M, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Flicek P, Kasprzyk A, Proctor G, Searle S, Smith J, Ureta-Vidal A, Birney E (2007) Ensembl 2007. Nucleic Acids Res 35: D610–D617
  12. Huson D, Rupp R, Scornavacca C (2010) Phylogenetic networks. Cambridge University Press, Cambridge
  13. Kristensen DM, Wolf YI, Mushegian AR, Koonin EV (2011) Computational methods for gene orthology inference. Brief Bioinf 12: 379–391
  14. Lechner M, Findeiß S, Steiner L, Marz M, Stadler PF, Prohaska SJ (2011) Proteinortho: detection of (co-) orthologs in large-scale analysis. BMC Bioinformatics 12: 124
  15. Li L, Stoeckert CJ Jr, Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res 13: 2178–2189
  16. Li H, Coghlan A, Ruan J, Coin LJ, Hériché JK, Osmotherly L, Li R, Liu T, Zhang Z, Bolund L, Wong GK, Zheng W, Dehal P, Wang J, Durbin R (2006) TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res 34: D572–D580
  17. Liu Y, Wang J, Guo J, Chen J (2011) Cographs editing: complexity and parametrized algorithms. In: Fu B, Du DZ (eds) COCOON 2011. Lecture notes in computer science, vol 6842. Springer, Berlin, pp 110–121
  18. Maddison WP (1997) Gene trees in species trees. Syst Biol 46: 523–536
  19. Page RDM, Charleston MA (1998) Trees within trees: phylogeny and historical associations. Trends Ecol Evol 13: 356–359
  20. Protti F, Dantas da Silva M, Szwarcfiter JL (2009) Applying modular decomposition to parameterized cluster editing problems. Theory Comput Syst 44: 91–104
  21. Pryszcz LP, Huerta-Cepas J, Gabaldón T (2011) MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score. Nucleic Acids Res 39: e32
  22. Rauch Henzinger M, King V, Warnow T (1999) Constructing a tree from homeomorphic subtrees, with applications to computational evolutionary biology. Algorithmica 24: 1–13
  23. Semple C, Steel M (2000) A supertree method for rooted trees. Discrete Appl Math 105: 147–158
  24. Semple C, Steel M (2003) Phylogenetics. Oxford lecture series in mathematics and its applications, vol 24. Oxford University Press, Oxford
  25. Sneath P, Sokal R (1973) Numerical taxonomy. W.H. Freeman and Company, San Francisco, pp 230–234
  26. Tatusov RL, Galperin MY, Natale DA, Koonin EV (2000) The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28: 33–36
  27. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Edgar R, Federhen S, Feolo M, Geer LY, Helmberg W, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Miller V, Ostell J, Pruitt KD, Schuler GD, Shumway M, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L, Yaschenko E (2008) Database resources of the national center for biotechnology information. Nucleic Acids Res 36: D13–D21
  28. Zhang J (2003) Evolution by gene duplication: an update. Trends Ecol Evol 18: 292–298

Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  • Marc Hellmuth (1)
  • Maribel Hernandez-Rosales (2, 3)
  • Katharina T. Huber (4)
  • Vincent Moulton (4)
  • Peter F. Stadler (2, 3, 5, 6, 7)
  • Nicolas Wieseke (3, 8)

  1. Center for Bioinformatics, Saarland University, Saarbrücken, Germany
  2. Max-Planck-Institute for Mathematics in the Sciences, Leipzig, Germany
  3. Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Germany
  4. School of Computing Sciences, University of East Anglia, Norwich, UK
  5. Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany
  6. Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
  7. Santa Fe Institute, Santa Fe, USA
  8. Parallel Computing and Complex Systems Group, Department of Computer Science, University of Leipzig, Leipzig, Germany
