## Abstract

Reciprocal best matches play an important role in numerous applications in computational biology, in particular as the basis of many widely used tools for orthology assessment. Nevertheless, very little is known about their mathematical structure. Here, we investigate the structure of reciprocal best match graphs (RBMGs). In order to abstract from the details of measuring distances, we define reciprocal best matches here as pairwise most closely related leaves in a gene tree, arguing that conceptually this is the notion that is pragmatically approximated by distance- or similarity-based heuristics. We start by showing that a graph *G* is an RBMG if and only if its quotient graph w.r.t. a certain thinness relation is an RBMG. Furthermore, it is necessary and sufficient that all connected components of *G* are RBMGs. The main result of this contribution is a complete characterization of RBMGs with 3 colors/species that can be checked in polynomial time. For 3 colors, there are three distinct classes of trees that are related to the structure of the phylogenetic trees explaining them. We derive an approach to recognize RBMGs with an arbitrary number of colors; it remains open, however, whether a polynomial-time algorithm for RBMG recognition exists. In addition, we show that RBMGs that are at the same time cographs (co-RBMGs) can be recognized in polynomial time. Co-RBMGs are characterized in terms of hierarchically colored cographs, a particular class of vertex-colored cographs that is introduced here. The (least resolved) trees that explain co-RBMGs can be constructed in polynomial time.
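
To make the tree-based definition concrete, the following minimal Python sketch computes the reciprocal best matches of a small leaf-colored tree. It assumes the usual formalization of "most closely related": a leaf y is a best match of x if the two have different colors and the last common ancestor of x and y is at least as low in the tree as that of x and any other leaf with the color of y. The tree encoding (a child-to-parent dictionary), the function names, and the toy example are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations

def ancestors(parent, v):
    """Return the path from v up to the root, starting with v itself."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path

def lca_depth(parent, x, y):
    """Depth (distance from the root) of the last common ancestor of x and y."""
    on_y_path = set(ancestors(parent, y))
    lca = next(v for v in ancestors(parent, x) if v in on_y_path)
    return len(ancestors(parent, lca)) - 1

def best_matches(parent, color, x):
    """Leaves of each other color whose lca with x is lowest in the tree."""
    best = set()
    for c in set(color.values()) - {color[x]}:
        same = [y for y in color if color[y] == c]
        # All lca(x, y) lie on the path from x to the root, so the
        # lowest one is simply the one with the greatest depth.
        deepest = max(lca_depth(parent, x, y) for y in same)
        best |= {y for y in same if lca_depth(parent, x, y) == deepest}
    return best

def reciprocal_best_matches(parent, color):
    """Edge set of the graph whose edges are the reciprocal best matches."""
    bm = {x: best_matches(parent, color, x) for x in color}
    return {frozenset((x, y)) for x, y in combinations(color, 2)
            if y in bm[x] and x in bm[y]}

# Hypothetical toy tree: root r with children u and b2; u has children a1, b1.
parent = {"u": "r", "b2": "r", "a1": "u", "b1": "u"}
color = {"a1": "A", "b1": "B", "b2": "B"}   # leaf -> species (color)
edges = reciprocal_best_matches(parent, color)
```

In this toy tree, a1 is a best match of b2 (it is the only leaf of color A), but b2 is not a best match of a1, because b1 is more closely related to a1. The pair is therefore not reciprocal, and `edges` contains only the pair {a1, b1}.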

## Keywords

Pairwise best hit, Reciprocal best match heuristics, Vertex colored graph, Phylogenetic tree, Hierarchically colored cograph

## Mathematics Subject Classification

05C90, 92D15

## Notes

### Acknowledgements

Partial financial support by the German Federal Ministry of Education and Research (BMBF, Project No. 031A538A, de.NBI-RBC) is gratefully acknowledged.
