A Fast and Exact Algorithm for the Exemplar Breakpoint Distance

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9029)

Abstract

A fundamental problem in comparative genomics is to compute the distance between two genomes. For two genomes without duplicate genes, we can easily compute a variety of distance measures in linear time, but the problem is NP-hard under most models when genomes contain duplicate genes. Sankoff proposed the use of exemplars to tackle the problem of duplicates genes and gene families: each gene family is represented by a single gene (the exemplar for that family), chosen so as to optimize some metric. Unfortunately, choosing exemplars is itself an NP-hard problem. In this paper, we propose a very fast and exact algorithm to compute the exemplar breakpoint distance, based on new insights in the underlying structure of genome rearrangements and exemplars. We evaluate the performance of our algorithm on simulation data and compare its performance to the best effort to date (a divide-and-conquer approach), showing that our algorithm runs much faster and scales much better. We also devise a new algorithm for the generalized breakpoint distance problem, which can then be applied to assign orthologs. We compare our algorithm with the state-of-the-art method MSOAR by assigning orthologs among five well annotated mammalian genomes, showing that our algorithm runs much faster and is slightly more accurate than MSOAR.

Keywords

Exemplar Breakpoint distance ILP Orthology assignment 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Fertin, G., Labarre, A., Rusu, I., Tannier, E., Vialette, S.: Combinatorics of Genome Rearrangements. MIT Press (2009)Google Scholar
  2. 2.
    Hannenhalli, S., Pevzner, P.A.: Transforming cabbage into turnip (polynomial algorithm for sorting signed permutations by reversals). In: Proc. 27th Ann. ACM Symp. Theory of Comput. (STOC 1995), pp. 178–189. ACM Press, New York (1995)Google Scholar
  3. 3.
    Bader, D.A., Moret, B.M.E., Yan, M.: A fast linear-time algorithm for inversion distance with an experimental comparison. J. Comput. Biol. 8(5), 483–491 (2001)CrossRefGoogle Scholar
  4. 4.
    Yancopoulos, S., Attie, O., Friedberg, R.: Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics 21(16), 3340–3346 (2005)CrossRefGoogle Scholar
  5. 5.
    Bergeron, A., Mixtacki, J., Stoye, J.: A unifying view of genome rearrangements. In: Bücher, P., Moret, B.M.E. (eds.) WABI 2006. LNCS (LNBI), vol. 4175, pp. 163–173. Springer, Heidelberg (2006) CrossRefGoogle Scholar
  6. 6.
    Bailey, J.A., Eichler, E.E.: Primate segmental duplications: crucibles of evolution, diversity and disease. Nature Reviews Genetics 7(7), 552–564 (2006)CrossRefGoogle Scholar
  7. 7.
    Lynch, M.: The Origins of Genome Architecture. Sinauer (2007)Google Scholar
  8. 8.
    Sankoff, D.: Genome rearrangement with gene families. Bioinformatics 15(11), 909–917 (1999)CrossRefGoogle Scholar
  9. 9.
    Bryant, D.: The complexity of calculating exemplar distances. In: Sankoff, D., Nadeau, J.H. (eds.) Comparative Genomics. Computational Biology, vol. 1 (2000)Google Scholar
  10. 10.
    Chen, Z., Fu, B., Zhu, B.: The approximability of the exemplar breakpoint distance problem. In: Cheng, S.-W., Poon, C.K. (eds.) AAIM 2006. LNCS, vol. 4041, pp. 291–302. Springer, Heidelberg (2006) CrossRefGoogle Scholar
  11. 11.
    Blin, G., Fertin, G., Sikora, F., Vialette, S.: The Exemplar Breakpoint Distance for non-trivial genomes cannot be approximated. In: Das, S., Uehara, R. (eds.) WALCOM 2009. LNCS, vol. 5431, pp. 357–368. Springer, Heidelberg (2009) CrossRefGoogle Scholar
  12. 12.
    Nguyen, C.T., Tay, Y.C., Zhang, L.: Divide-and-conquer approach for the exemplar breakpoint distance. Bioinformatics 21(10), 2171–2176 (2005)CrossRefGoogle Scholar
  13. 13.
    Angibaud, S., Fertin, G., Rusu, I., Thévenin, A., Vialette, S.: A pseudo-boolean programming approach for computing the breakpoint distance between two genomes with duplicate genes. In: Tesler, G., Durand, D. (eds.) RECMOB-CG 2007. LNCS (LNBI), vol. 4751, pp. 16–29. Springer, Heidelberg (2007) CrossRefGoogle Scholar
  14. 14.
    Fu, Z., Chen, X., Vacic, V., Nan, P., Zhong, Y., Jiang, T.: MSOAR: A high-throughput ortholog assignment system based on genome rearrangement. Journal of Computational Biology 14(9), 1160–1175 (2007)CrossRefMathSciNetGoogle Scholar
  15. 15.
    Shao, M., Lin, Y., Moret, B.: An exact algorithm to compute the DCJ distance for genomes with duplicate genes. In: Sharan, R. (ed.) RECOMB 2014. LNCS, vol. 8394, pp. 280–292. Springer, Heidelberg (2014) CrossRefGoogle Scholar

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. 1.Laboratory for Computational Biology and BioinformaticsEPFLLausanneSwitzerland

Personalised recommendations