The Potential of Family-Free Genome Comparison

  • Marília D. V. Braga
  • Cedric Chauve
  • Daniel Doerr
  • Katharina Jahn
  • Jens Stoye
  • Annelyse Thévenin
  • Roland Wittler
Part of the Computational Biology book series (COBO, volume 19)


Many methods in computational comparative genomics require gene family assignments as a prerequisite. While the biological concept of gene families is well established, their computational prediction remains unreliable. This paper continues a new line of research in which family assignments are not presumed. We study the potential of several family-free approaches in detecting conserved structures and genome rearrangements, and in reconstructing ancestral gene orders.


Adjacency Graph, Ancestral Genome, Common Interval, Ancestral Gene Order, Orthologous Marker
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



MDVB is funded by the Brazilian research agency CNPq grant PROMETRO 563087/10-2. DD receives a scholarship from the CLIB Graduate Cluster Industrial Biotechnology. KJ is funded by DFG grant ST 431/5-1. AT is a research fellow of the Alexander von Humboldt Foundation.



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Marília D. V. Braga (1)
  • Cedric Chauve (2, 3)
  • Daniel Doerr (4, 5)
  • Katharina Jahn (4, 5)
  • Jens Stoye (4, 5)
  • Annelyse Thévenin (4, 5)
  • Roland Wittler (4, 5)

  1. Inmetro, Duque de Caxias, Brazil
  2. Department of Mathematics, Simon Fraser University, Burnaby, Canada
  3. LaBRI, Université Bordeaux I, Talence, France
  4. Genome Informatics, Faculty of Technology, Bielefeld University, Bielefeld, Germany
  5. Institute for Bioinformatics, CeBiTec, Bielefeld University, Bielefeld, Germany
