Reciprocal best match graphs

  • Manuela Geiß
  • Peter F. Stadler
  • Marc Hellmuth

Abstract

Reciprocal best matches play an important role in numerous applications in computational biology, in particular as the basis of many widely used tools for orthology assessment. Nevertheless, very little is known about their mathematical structure. Here, we investigate the structure of reciprocal best match graphs (RBMGs). In order to abstract from the details of measuring distances, we define reciprocal best matches here as pairwise most closely related leaves in a gene tree, arguing that conceptually this is the notion that is pragmatically approximated by distance- or similarity-based heuristics. We start by showing that a graph G is an RBMG if and only if its quotient graph w.r.t. a certain thinness relation is an RBMG. Furthermore, it is necessary and sufficient that all connected components of G are RBMGs. The main result of this contribution is a complete characterization of RBMGs with 3 colors/species that can be checked in polynomial time. For 3 colors, there are three distinct classes of trees that are related to the structure of the phylogenetic trees explaining them. We derive an approach to recognize RBMGs with an arbitrary number of colors; it remains open, however, whether a polynomial-time algorithm for RBMG recognition exists. In addition, we show that RBMGs that at the same time are cographs (co-RBMGs) can be recognized in polynomial time. Co-RBMGs are characterized in terms of hierarchically colored cographs, a particular class of vertex colored cographs that is introduced here. The (least resolved) trees that explain co-RBMGs can be constructed in polynomial time.
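
As an illustration of the tree-based definition used above, the following minimal Python sketch computes the reciprocal best matches of a small leaf-colored tree. It assumes that a leaf y is a best match of a leaf x if y has a different color (species) than x and no other leaf of y's color has a last common ancestor with x that lies strictly below lca(x, y). The function name `reciprocal_best_matches`, the parent-map encoding of the tree, and the toy example are hypothetical choices made for this sketch; they are not taken from the article.

```python
from itertools import combinations


def reciprocal_best_matches(parent, color):
    """Return the edge set of the reciprocal best match graph.

    parent: dict mapping every vertex to its parent (the root maps to None).
    color:  dict mapping every leaf to its color (species).
    Both encodings are assumptions made for this illustration.
    """

    def ancestors(v):
        # Path from v up to the root, v itself included.
        path = []
        while v is not None:
            path.append(v)
            v = parent[v]
        return path

    # Depth of each vertex (root has depth 0).
    depth = {v: len(ancestors(v)) - 1 for v in parent}

    def lca(x, y):
        # First vertex on the path from y to the root that is also above x.
        above_x = set(ancestors(x))
        for v in ancestors(y):
            if v in above_x:
                return v

    leaves = list(color)

    def best_matches(x):
        # All candidates' lcas with x lie on the path from x to the root,
        # so the deepest lca is the one closest to x.
        result = set()
        for c in set(color.values()) - {color[x]}:
            candidates = [y for y in leaves if color[y] == c]
            best = max(depth[lca(x, y)] for y in candidates)
            result |= {y for y in candidates if depth[lca(x, y)] == best}
        return result

    bm = {x: best_matches(x) for x in leaves}
    # Keep only pairs that are best matches in both directions.
    return {
        frozenset((x, y))
        for x, y in combinations(leaves, 2)
        if y in bm[x] and x in bm[y]
    }


# Toy tree: root r with children u, b2, c1; inner vertex u with children a1, b1.
parent = {"r": None, "u": "r", "a1": "u", "b1": "u", "b2": "r", "c1": "r"}
color = {"a1": "A", "b1": "B", "b2": "B", "c1": "C"}
edges = reciprocal_best_matches(parent, color)
```

In this toy tree, a1 and b2 are not reciprocal best matches: b1 is more closely related to a1 than b2 is, so b2 is not a best match of a1 even though a1 is a best match of b2. The sketch only constructs the graph from a given tree; it does not address the recognition problem discussed in the abstract.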

Keywords

Pairwise best hit; Reciprocal best match heuristics; Vertex colored graph; Phylogenetic tree; Hierarchically colored cograph

Mathematics Subject Classification

05C90; 92D15

Notes

Acknowledgements

Partial financial support by the German Federal Ministry of Education and Research (BMBF, Project No. 031A538A, de.NBI-RBC) is gratefully acknowledged.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Manuela Geiß (1, 2)
  • Peter F. Stadler (1, 2, 3, 4, 5, 6, 7, 8)
  • Marc Hellmuth (9, 10)

  1. Bioinformatics Group, Department of Computer Science, Leipzig University, Leipzig, Germany
  2. Interdisciplinary Center of Bioinformatics, Leipzig University, Leipzig, Germany
  3. German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig University, Leipzig, Germany
  4. Competence Center for Scalable Data Services and Solutions, Leipzig University, Leipzig, Germany
  5. Leipzig Research Center for Civilization Diseases, Leipzig University, Leipzig, Germany
  6. Max-Planck-Institute for Mathematics in the Sciences, Leipzig, Germany
  7. Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
  8. Santa Fe Institute, Santa Fe, USA
  9. Institute of Mathematics and Computer Science, University of Greifswald, Greifswald, Germany
  10. Center for Bioinformatics, Saarland University, Saarbrücken, Germany