A Linear Time Approximation Algorithm for the DCJ Distance for Genomes with Bounded Number of Duplicates

  • Diego P. Rubert
  • Pedro Feijão
  • Marília D. V. Braga
  • Jens Stoye
  • Fábio V. Martinez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9838)


Rearrangements are large-scale mutations in genomes, responsible for complex changes and structural variations. Most rearrangements that modify the organization of a genome can be represented by the double cut and join (DCJ) operation. Given two genomes with the same content, so that we have exactly the same number of copies of each gene in each genome, we are interested in the problem of computing the rearrangement distance between them, i.e., finding the minimum number of DCJ operations that transform one genome into the other. We propose a linear time approximation algorithm with approximation factor O(k) for the DCJ distance problem, where k is the maximum number of duplicates of any gene in the input genomes. Our algorithm uses as an intermediate step an O(k)-approximation for the minimum common string partition problem, which is closely related to the DCJ distance problem. Experiments on simulated data sets show that the algorithm is very competitive both in efficiency and quality of the solutions.
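For intuition about the quantity being approximated, consider the special case in which every gene occurs exactly once (k = 1) and all chromosomes are circular. In that case the DCJ distance has a well-known closed form, d = n - c, where n is the number of genes and c is the number of cycles in the adjacency graph of the two genomes. The Python sketch below covers only this duplicate-free case and is not the approximation algorithm proposed in the paper. The signed-integer encoding of genes and the function names `adjacency_partners` and `dcj_distance_circular` are assumptions made for illustration.

```python
def adjacency_partners(genome):
    """Map each gene extremity to its neighbouring extremity.

    A genome is a list of circular chromosomes; each chromosome is a list of
    nonzero signed integers (illustrative encoding). Gene g has a tail
    (g, 't') and a head (g, 'h'); +g is read tail-to-head, -g head-to-tail.
    """
    partner = {}
    seen_genes = set()
    for chromosome in genome:
        for gene in chromosome:
            if abs(gene) in seen_genes:
                raise ValueError("this sketch assumes each gene occurs once")
            seen_genes.add(abs(gene))
        m = len(chromosome)
        for i in range(m):
            a, b = chromosome[i], chromosome[(i + 1) % m]
            right_end = (abs(a), 'h' if a > 0 else 't')  # right extremity of a
            left_end = (abs(b), 't' if b > 0 else 'h')   # left extremity of b
            partner[right_end] = left_end
            partner[left_end] = right_end
    return partner


def dcj_distance_circular(genome_a, genome_b):
    """DCJ distance n - c for duplicate-free genomes with circular chromosomes."""
    pa = adjacency_partners(genome_a)
    pb = adjacency_partners(genome_b)
    if pa.keys() != pb.keys():
        raise ValueError("genomes must have the same gene content")
    n = len(pa) // 2  # two extremities per gene
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Walk the cycle, alternating adjacencies of genome A and genome B.
        while x not in seen:
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return n - cycles
```

For example, the genomes `[[1, 2, 3]]` and `[[1, -2, 3]]` differ by a single inversion, and the sketch returns 1 for them. With duplicated genes, the adjacency graph is no longer uniquely defined, since one must first choose which copies correspond to each other; that choice is the source of the hardness the approximation addresses.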





Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Diego P. Rubert (1)
  • Pedro Feijão (2)
  • Marília D. V. Braga (2)
  • Jens Stoye (2)
  • Fábio V. Martinez (1, corresponding author)
  1. Faculdade de Computação, Universidade Federal de Mato Grosso do Sul, Campo Grande, Brazil
  2. Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
