Abstract
One of the key computational problems in comparative genomics is the reconstruction of genomes of ancestral species based on genomes of extant species. Since most dramatic changes in genomic architectures are caused by genome rearrangements, this problem is often posed as minimization of the number of genome rearrangements between extant and ancestral genomes. The basic case of three given genomes is known as the genome median problem. Whole genome duplications (WGDs) represent yet another type of dramatic evolutionary events and inspire the reconstruction of pre-duplicated ancestral genomes, referred to as the genome halving problem. Generalization of WGDs to whole genome multiplication events leads to the genome aliquoting problem.
In the present study, we provide polynomial-size integer linear programming formulations for the aforementioned problems. We further obtain such formulations for the restricted versions of the median and halving problems, which have been recently introduced for improving biological relevance.
P. Avdeyev and N. Alexeev contributed equally to this work.
Notes
1. The strand of a gene is typically encoded by a sign. When the strands are ignored, the genomes are represented as permutations of (unsigned) genes.
2. Here we view genome P as evolving and P-edges as changing.
3. Note that V is determined by the genes present in the genomes \(P_1, P_2, \dots , P_q\), and thus V does not depend on the choice of M.
4. By the inequality \(|a - b| \le c\), we mean the pair of linear inequalities \(a - b \le c\) and \(b - a \le c\).
5. In fact, they also define a genome \(X\in D_m(R)\) and a labeling of gene copies of A and X such that \(c(A,X)\) is maximized.
6. In fact, besides R they also define a genome \(X\in D_m(R)\) and a labeling of gene copies of A and X such that \(c(A,X)+c(R,B)\) is maximized.
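The convention in Note 4 is the standard way to express an absolute-value bound inside a linear program. The short sketch below checks by brute force that the two forms agree on a small grid of integers. The function names are illustrative and do not come from the paper.

```python
from itertools import product


def abs_constraint(a, b, c):
    """Nonlinear form of the bound: |a - b| <= c."""
    return abs(a - b) <= c


def linear_pair(a, b, c):
    """Linear form of the same bound: a - b <= c and b - a <= c."""
    return (a - b <= c) and (b - a <= c)


def forms_agree(values):
    """Return True if both forms agree on every (a, b, c) triple drawn from values."""
    return all(
        abs_constraint(a, b, c) == linear_pair(a, b, c)
        for a, b, c in product(values, repeat=3)
    )


if __name__ == "__main__":
    # Exhaustive check over all integer triples with entries in -3..3.
    print(forms_agree(range(-3, 4)))
```

Because each inequality in the pair is linear in a, b, and c, the pair can be passed directly to an ILP solver, whereas the absolute-value form cannot.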
Acknowledgements
The work of PA and MAA is supported by the National Science Foundation under the grant No. IIS-1462107. The work of NA and YR is partially supported by the National Science Foundation under the grant No. DMS-1406984.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Avdeyev, P., Alexeev, N., Rong, Y., Alekseyev, M.A. (2017). A Unified ILP Framework for Genome Median, Halving, and Aliquoting Problems Under DCJ. In: Meidanis, J., Nakhleh, L. (eds) Comparative Genomics. RECOMB-CG 2017. Lecture Notes in Computer Science, vol 10562. Springer, Cham. https://doi.org/10.1007/978-3-319-67979-2_9
DOI: https://doi.org/10.1007/978-3-319-67979-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67978-5
Online ISBN: 978-3-319-67979-2
eBook Packages: Computer Science (R0)