The Gene Family-Free Median of Three

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9838)


The gene family-free framework for comparative genomics aims to develop methods for gene order analysis that do not require prior gene family assignment but work directly on a sequence similarity graph. We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity into the score of a gene adjacency. We show that the corresponding computational problem is MAX SNP-hard, and we present a 0–1 linear program for its exact solution. The result of this program is a median genome whose median genes are associated with extant genes, and in which median adjacencies are assumed to define positional orthologs. We demonstrate through simulations and comparison with the OMA orthology database that the method presented here is able to compute accurate medians and positional orthologs for genomes comparable in size to bacterial genomes.
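For readers unfamiliar with the classical breakpoint distance that the objective generalizes, the following is a minimal sketch, not the authors' model. It assumes two signed, linear, single-chromosome genomes with identical gene content, ignores telomeres, and counts the adjacencies of one genome that are absent from the other. The function names `adjacencies` and `breakpoint_distance` and the `"+a"` / `"-a"` gene encoding are illustrative choices, not taken from the paper.

```python
from typing import FrozenSet, List, Set, Tuple

# An extremity is (gene name, "h" for head or "t" for tail).
Extremity = Tuple[str, str]


def adjacencies(genome: List[str]) -> Set[FrozenSet[Extremity]]:
    """Return the adjacencies of a signed linear genome such as ["+a", "-b"].

    Assumption: each gene is written as "+name" or "-name".
    """
    def ends(gene: str) -> Tuple[Extremity, Extremity]:
        name = gene.lstrip("+-")
        if gene.startswith("-"):
            # A reversed gene is read head first, then tail.
            return (name, "h"), (name, "t")
        # A forward gene is read tail first, then head.
        return (name, "t"), (name, "h")

    result: Set[FrozenSet[Extremity]] = set()
    for left, right in zip(genome, genome[1:]):
        # Join the right end of the left gene to the left end of the right gene.
        result.add(frozenset((ends(left)[1], ends(right)[0])))
    return result


def breakpoint_distance(a: List[str], b: List[str]) -> int:
    """Count adjacencies of genome a that do not occur in genome b."""
    return len(adjacencies(a) - adjacencies(b))


if __name__ == "__main__":
    original = ["+a", "+b", "+c", "+d"]
    inverted = ["+a", "-c", "-b", "+d"]  # segment b..c inverted
    print(breakpoint_distance(original, inverted))
```

In this simplified form every adjacency is either conserved or not. The family-free setting described in the abstract replaces this all-or-nothing count with a score that takes sequence similarity into account; the exact scoring function is defined in the paper and is not reproduced here.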


Keywords (added by machine, not by the authors): Integer Linear Program; Orthologous Group; Extant Gene; Gene Extremity; Gene Family Evolution



The authors are grateful to the anonymous reviewers for their valuable comments. DD and MB wish to thank Bernard M.E. Moret for helpful discussions. CC acknowledges funding from the NSERC Discovery Grant.




Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. School of Computer and Communication Sciences, EPFL, Lausanne, Switzerland
  2. Faculty of Technology, Bielefeld University, Bielefeld, Germany
  3. Department of Mathematics, Simon Fraser University, Burnaby, Canada
