Abstract
The edit distance under the DCJ model can be computed in linear time for genomes with equal content or with Indels (i.e., insertions and deletions). It becomes NP-hard, however, in the presence of duplications, a problem that remains largely unsolved, especially when Indels are considered. In this paper, we compare two mainstream methods for dealing with duplications and combine each with Indels: one by deletion, namely the DCJ-Indel-Exemplar distance, and the other by gene matching, namely the DCJ-Indel-Matching distance. We design branch-and-bound algorithms with a set of optimization methods to compute exact distances for both. Furthermore, we discuss the median problem under both distance measures, which is to find a median genome that minimizes the distances between itself and three given genomes. The Lin–Kernighan heuristic is leveraged and strengthened by sub-graph decomposition and search space reduction techniques to handle median computation. A wide range of experiments are conducted on synthetic and real data sets to show the pros and cons of these two distance metrics, both on their own and in the median computation scenario.
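For concreteness, the median objective described above can be written as follows. This is only a sketch that assumes the usual sum-of-distances formulation; the symbols $G_1, G_2, G_3$ (the three given genomes), $M$ (a candidate median) and $d$ (either of the two distances) are notation introduced here for illustration and are not taken from the paper.

```latex
\[
  M^{*} \;=\; \arg\min_{M} \sum_{i=1}^{3} d(M, G_i),
  \qquad
  d \in \bigl\{\, d_{\text{DCJ-Indel-Exemplar}},\; d_{\text{DCJ-Indel-Matching}} \,\bigr\}
\]
```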
Acknowledgments
This research was sponsored in part by the NSF under grants OCI-0904461 (Bader), OCI-0904179, IIS-1161586 (Tang) and OCI-0904166 (Schaeffer).
Cite this article
Yin, Z., Tang, J., Schaeffer, S.W. et al. Exemplar or matching: modeling DCJ problems with unequal content genome data. J Comb Optim 32, 1165–1181 (2016). https://doi.org/10.1007/s10878-015-9940-4
DOI: https://doi.org/10.1007/s10878-015-9940-4