A Tractable Variant of the Single Cut or Join Distance with Duplicated Genes

  • Pedro Feijão
  • Aniket Mane
  • Cedric Chauve
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10562)


In this work, we introduce a variant of the Single Cut or Join distance that accounts for duplicated genes, in the context of directed evolution from an ancestral genome to a descendant genome where orthology relations between ancestral genes and their descendants are known. Our model includes two duplication mechanisms: single-gene tandem duplication and creation of single-gene circular chromosomes. We prove that in this model, computing the distance and a parsimonious evolutionary scenario in terms of SCJ and single-gene duplication events can be done in linear time. Simulations show that the inferred number of cuts and joins scales linearly with the true number of such events even at high rates of genome rearrangements and segmental duplications. We also show that the median problem is tractable for this distance.
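As background for the variant described above, the following is a minimal sketch of the classical SCJ distance for genomes *without* duplicated genes. It is not the duplication-aware model of this paper. Each genome is encoded as a set of adjacencies between gene extremities. Every adjacency present only in the source genome must be cut, and every adjacency present only in the target genome must be joined, so the distance is the size of the symmetric difference of the two adjacency sets. The genome encoding, the `"t"`/`"h"` extremity labels, and the function names are hypothetical choices made for illustration. The sketch also assumes both genomes contain the same genes, each exactly once.

```python
from typing import FrozenSet, List, Tuple, Set

# Hypothetical encoding (illustrative, not from the paper): a gene extremity is
# (gene_id, "t") for the tail or (gene_id, "h") for the head.
Extremity = Tuple[int, str]
Adjacency = FrozenSet[Extremity]
# A genome is a list of chromosomes; each chromosome is a list of signed
# gene ids plus a flag telling whether it is circular.
Genome = List[Tuple[List[int], bool]]


def ends(signed_gene: int) -> Tuple[Extremity, Extremity]:
    """Left and right extremities of a signed gene, read along its chromosome."""
    g = abs(signed_gene)
    return ((g, "t"), (g, "h")) if signed_gene > 0 else ((g, "h"), (g, "t"))


def adjacencies(genome: Genome) -> Set[Adjacency]:
    """Set of unordered adjacencies (pairs of extremities) in a genome."""
    adj: Set[Adjacency] = set()
    for genes, circular in genome:
        neighbours = list(zip(genes, genes[1:]))
        if circular and genes:
            # Close the circle; a single-gene circle joins its own head and tail.
            neighbours.append((genes[-1], genes[0]))
        for x, y in neighbours:
            adj.add(frozenset((ends(x)[1], ends(y)[0])))
    return adj


def scj_cuts_and_joins(source: Genome, target: Genome) -> Tuple[int, int]:
    """Cuts and joins of a parsimonious classical SCJ scenario (same gene content)."""
    a, b = adjacencies(source), adjacencies(target)
    # Adjacencies only in the source are cut; those only in the target are joined.
    return len(a - b), len(b - a)


def scj_distance(source: Genome, target: Genome) -> int:
    """Classical SCJ distance: size of the symmetric difference of adjacency sets."""
    cuts, joins = scj_cuts_and_joins(source, target)
    return cuts + joins


if __name__ == "__main__":
    ancestor = [([1, 2, 3], False)]
    inverted = [([1, -2, 3], False)]  # gene 2 inverted
    print(scj_cuts_and_joins(ancestor, inverted))  # (2, 2)
```

Under this encoding, inverting an internal gene costs four operations (two cuts and two joins). A chromosome read in the opposite direction, such as `[-3, -2, -1]` versus `[1, 2, 3]`, yields the same adjacency set. A single-gene circular chromosome is represented by the adjacency between that gene's head and tail.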



CC is supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada. PF is supported by the Genome Canada grant PathoGiST.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. School of Computing Science, Simon Fraser University, Burnaby, Canada
  2. Department of Mathematics, Simon Fraser University, Burnaby, Canada