Exemplar or matching: modeling DCJ problems with unequal content genome data

Journal of Combinatorial Optimization

Abstract

The edit distance under the DCJ model can be computed in linear time for genomes with equal content or with Indels (i.e., insertions and deletions). It becomes NP-hard, however, in the presence of duplications, a problem that remains largely unsolved, especially when Indels are also considered. In this paper, we compare two mainstream methods of handling duplications and combine each with Indels: one based on deletion, the DCJ-Indel-Exemplar distance, and the other based on gene matching, the DCJ-Indel-Matching distance. We design branch-and-bound algorithms with a set of optimization methods to compute exact values of both distances. We further discuss the median problem under each of these distances, which asks for a median genome that minimizes the sum of its distances to three given genomes. For median computation, we use the Lin–Kernighan heuristic, strengthened by sub-graph decomposition and search-space reduction techniques. A wide range of experiments on synthetic and real data sets shows the strengths and weaknesses of the two distance metrics, both on their own and within median computation.
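To make the contrast between the exemplar and matching approaches concrete, here is a minimal toy sketch under strong simplifying assumptions; it is not the authors' implementation. It computes the equal-content DCJ distance with the standard adjacency-graph formula d = N − (C + I/2), where N is the number of genes, C the number of cycles and I the number of odd-length paths, using union-find in near-linear time. Duplicates are resolved by exhaustive enumeration rather than by the paper's branch-and-bound, and insertions and deletions are ignored entirely, so `exemplar_distance` and `matching_distance` compute plain exemplar and matching DCJ distances, not the DCJ-Indel variants studied in the paper. The genome representation and all function names are hypothetical.

```python
from collections import Counter
from itertools import permutations, product

# Hypothetical toy representation (not the paper's): a genome is a list of
# chromosomes (is_circular, genes); each gene is a (name, strand) pair, strand +1/-1.

def from_signed(genome):
    """Convert chromosomes written with signed ints, e.g. (False, [1, -2, 3])."""
    return [(circ, [(abs(g), 1 if g > 0 else -1) for g in genes]) for circ, genes in genome]

def vertices(genome):
    """Adjacencies (two extremities) and telomeres (one extremity) of a genome."""
    verts = []
    for circular, genes in genome:
        # A gene on the + strand is read tail->head, on the - strand head->tail.
        ends = [((n, "t"), (n, "h")) if s > 0 else ((n, "h"), (n, "t")) for n, s in genes]
        verts += [(ends[i][1], ends[i + 1][0]) for i in range(len(ends) - 1)]
        verts += [(ends[-1][1], ends[0][0])] if circular else [(ends[0][0],), (ends[-1][1],)]
    return verts

def dcj_distance(a, b):
    """Equal-content DCJ distance N - (C + I/2) from the adjacency graph of a and b."""
    va, vb = vertices(a), vertices(b)
    in_a = {e: ("A", i) for i, v in enumerate(va) for e in v}
    in_b = {e: ("B", j) for j, v in enumerate(vb) for e in v}
    if (set(in_a) != set(in_b) or len(in_a) != sum(map(len, va))
            or len(in_b) != sum(map(len, vb))):
        raise ValueError("genomes must have equal content and no duplicate genes")
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Each extremity is one edge joining the A-vertex and the B-vertex that contain it.
    for e in in_a:
        ra, rb = find(in_a[e]), find(in_b[e])
        if ra != rb:
            parent[ra] = rb
    nodes = Counter(find(v) for v in set(in_a.values()) | set(in_b.values()))
    edges = Counter(find(in_a[e]) for e in in_a)
    cycles = sum(1 for r, n in nodes.items() if edges[r] == n)
    odd_paths = sum(1 for r, n in nodes.items()
                    if edges[r] == n - 1 and edges[r] % 2 == 1)
    return len(in_a) // 2 - (cycles + odd_paths // 2)

def occurrences(genome):
    """Map each gene family name to the (chromosome, position) of every copy."""
    occ = {}
    for c, (_, genes) in enumerate(genome):
        for p, (name, _) in enumerate(genes):
            occ.setdefault(name, []).append((c, p))
    return occ

def keep_only(genome, kept):
    """Delete copies whose (chromosome, position) is not in kept; drop empty chromosomes."""
    out = [(circ, [g for p, g in enumerate(genes) if (c, p) in kept])
           for c, (circ, genes) in enumerate(genome)]
    return [(circ, genes) for circ, genes in out if genes]

def exemplar_distance(a, b):
    """Exhaustive exemplar DCJ: keep one copy per family in each genome, take the minimum."""
    oa, ob = occurrences(a), occurrences(b)
    if set(oa) != set(ob):
        raise ValueError("this sketch assumes both genomes share the same gene families")
    fams = sorted(oa)
    return min(dcj_distance(keep_only(a, set(pa)), keep_only(b, set(pb)))
               for pa in product(*(oa[f] for f in fams))
               for pb in product(*(ob[f] for f in fams)))

def rename(genome, label):
    """Rename each copy to label[(chromosome, position)], keeping its strand."""
    return [(circ, [(label[(c, p)], s) for p, (_, s) in enumerate(genes)])
            for c, (circ, genes) in enumerate(genome)]

def matching_distance(a, b):
    """Exhaustive matching DCJ: pair copies one-to-one across genomes, take the minimum."""
    oa, ob = occurrences(a), occurrences(b)
    if set(oa) != set(ob) or any(len(oa[f]) != len(ob[f]) for f in oa):
        raise ValueError("this sketch assumes equal copy numbers for every family")
    fams = sorted(oa)
    a_named = rename(a, {pos: (f, k) for f in fams for k, pos in enumerate(oa[f])})
    best = None
    for perms in product(*(permutations(range(len(ob[f]))) for f in fams)):
        # Copy k of family f in b is paired with copy perm[k] of family f in a.
        label = {pos: (f, perm[k])
                 for f, perm in zip(fams, perms) for k, pos in enumerate(ob[f])}
        d = dcj_distance(a_named, rename(b, label))
        best = d if best is None else min(best, d)
    return best

def median_score(candidate, genomes, distance):
    """Objective of the median problem: sum of distances from a candidate to the inputs."""
    return sum(distance(candidate, g) for g in genomes)
```

On single linear chromosomes A = [1, 1] and B = [1, −1], `exemplar_distance` returns 0, because only one copy of gene 1 survives and a one-gene linear chromosome reads the same in either direction, whereas `matching_distance` returns 1, because both copies must be kept and paired. Both functions enumerate every choice of copies, so their running time grows exponentially with the number of duplicates; exact methods such as the paper's branch-and-bound algorithms rely on pruning to avoid this.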

Figs. 1–7 (images not reproduced in this text)

Acknowledgments

This research was sponsored in part by NSF grants OCI-0904461 (Bader), OCI-0904179, IIS-1161586 (Tang) and OCI-0904166 (Schaeffer).

Author information

Correspondence to Jijun Tang.

About this article

Cite this article

Yin, Z., Tang, J., Schaeffer, S.W. et al. Exemplar or matching: modeling DCJ problems with unequal content genome data. J Comb Optim 32, 1165–1181 (2016). https://doi.org/10.1007/s10878-015-9940-4

  • DOI: https://doi.org/10.1007/s10878-015-9940-4
