# Algorithms for Computing the Family-Free Genomic Similarity Under DCJ

## Abstract

Genomic similarity is a large-scale measure for comparing two genomes. In this work we study the NP-hard problem of computing the genomic similarity under the double-cut-and-join (DCJ) model in a setting that does not assume that the genes of the compared genomes are grouped into gene families. This problem is called family-free DCJ similarity. We propose an exact integer linear programming (ILP) algorithm to solve it, show that the problem is APX-hard, and present three combinatorial heuristics, together with computational experiments comparing their results to those of the ILP. Experiments on simulated datasets show that the proposed heuristics are very fast and, for some instances, competitive with the ILP algorithm.

## Keywords

Genome rearrangement, Double-cut-and-join, Family-free genomic similarity

## Notes

### Acknowledgments

We would like to thank Pedro Feijão and Daniel Doerr for their hints on how to obtain the simulated data for our experiments.
