Algorithms for Computing the Family-Free Genomic Similarity Under DCJ

  • Diego P. Rubert
  • Gabriel L. Medeiros
  • Edna A. Hoshino
  • Marília D. V. Braga
  • Jens Stoye
  • Fábio V. Martinez (corresponding author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10562)


The genomic similarity is a large-scale measure for comparing two given genomes. In this work we study the NP-hard problem of computing the genomic similarity under the double-cut-and-join (DCJ) model in a setting that does not assume that the genes of the compared genomes are grouped into gene families. This problem is called family-free DCJ similarity. We propose an exact integer linear programming (ILP) algorithm to solve it, show that the problem is APX-hard, and present three combinatorial heuristics, with computational experiments comparing their results to those of the ILP. Experiments on simulated datasets show that the proposed heuristics are very fast and, for some instances, even competitive with the ILP algorithm.


Keywords: Genome rearrangement; Double-cut-and-join; Family-free genomic similarity



We would like to thank Pedro Feijão and Daniel Doerr for helping us with hints on how to get the simulated data for our experiments.




Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Diego P. Rubert (1)
  • Gabriel L. Medeiros (1)
  • Edna A. Hoshino (1)
  • Marília D. V. Braga (2)
  • Jens Stoye (2)
  • Fábio V. Martinez (1, corresponding author)

  1. Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Campo Grande, Brazil
  2. Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
