Skip to main content

Computing the Rearrangement Distance of Natural Genomes

  • Conference paper
  • First Online:
Research in Computational Molecular Biology (RECOMB 2020)

Abstract

The computation of genomic distances has been a very active field of computational comparative genomics over the last 25 years. Substantial results include the polynomial-time computability of the inversion distance by Hannenhalli and Pevzner in 1995 and the introduction of the double-cut and join (DCJ) distance by Yancopoulos, Attie and Friedberg in 2005. Both results, however, rely on the assumption that the genomes under comparison contain the same set of unique markers (syntenic genomic regions, sometimes also referred to as genes). In 2015, Shao, Lin and Moret relax this condition by allowing for duplicate markers in the analysis. This generalized version of the genomic distance problem is NP-hard, and they give an ILP solution that is efficient enough to be applied to real-world datasets. A restriction of their approach is that it can be applied only to balanced genomes, that have equal numbers of duplicates of any marker. Therefore it still needs a delicate preprocessing of the input data in which excessive copies of unbalanced markers have to be removed.

In this paper we present an algorithm solving the genomic distance problem for natural genomes, in which any marker may occur an arbitrary number of times. Our method is based on a new graph data structure, the multi-relational diagram, that allows an elegant extension of the ILP by Shao, Lin and Moret to count runs of markers that are under- or over-represented in one genome with respect to the other and need to be inserted or deleted, respectively. With this extension, previous restrictions on the genome configurations are lifted, for the first time enabling an uncompromising rearrangement analysis. Any marker sequence can directly be used for the distance calculation.

The evaluation of our approach shows that it can be used to analyze genomes with up to a few ten thousand markers, which we demonstrate on simulated and real data.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 49.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 64.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    The exclusive markers are not restricted to be singular, because it is mathematically trivial to transform them into singular markers when they occur in multiple copies.

References

  1. Angibaud, S., Fertin, G., Rusu, I., Thévenin, A., Vialette, S.: On the approximability of comparing genomes with duplicates. J. Graph Algorithms Appl. 13(1), 19–53 (2009). A preliminary version appeared in Proceedings of WALCOM 2008

    Article  MathSciNet  MATH  Google Scholar 

  2. Bergeron, A., Mixtacki, J., Stoye, J.: A unifying view of genome rearrangements. In: Bücher, P., Moret, B.M.E. (eds.) WABI 2006. LNCS, vol. 4175, pp. 163–173. Springer, Heidelberg (2006). https://doi.org/10.1007/11851561_16

    Chapter  Google Scholar 

  3. Bohnenkämper, L., Braga, M.D.V., Doerr, D., Stoye, J.: Computing the rearrangement distance of natural genomes. arXiv:2001.02139 (2020)

  4. Braga, M.D.V.: An overview of genomic distances modeled with indels. In: Bonizzoni, P., Brattka, V., Löwe, B. (eds.) CiE 2013. LNCS, vol. 7921, pp. 22–31. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-39053-1_3

    Chapter  Google Scholar 

  5. Braga, M.D.V., Willing, E., Stoye, J.: Double cut and join with insertions and deletions. J. Comput. Biol. 18(9), 1167–1184 (2011). A preliminary version appeared in Proceedings of WABI 2010

    Article  MathSciNet  Google Scholar 

  6. Bryant, D.: The complexity of calculating exemplar distances. In: Sankoff, D., Nadeau, J.H. (eds.) Comparative Genomics, pp. 207–211. Kluwer Academic Publishers, Dordrecht (2000)

    Chapter  Google Scholar 

  7. Bulteau, L., Jiang, M.: Inapproximability of (1,2)-exemplar distance. IEEE/ACM Trans. Comput. Biol. Bioinf. 10(6), 1384–1390 (2013). A preliminary version appeared in Proceedings of ISBRA 2012

    Article  Google Scholar 

  8. Chaisson, M.J.P., et al.: Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 10(1), 1–16 (2019)

    Article  Google Scholar 

  9. Compeau, P.E.C.: DCJ-indel sorting revisited. Algorithms Mol. Biol. 8, 6 (2013). A preliminary version appeared in Proceedings of WABI 2012

    Article  Google Scholar 

  10. Friedberg, R., Darling, A.E., Yancopoulos, S.: Genome rearrangement by the double cut and join operation. In: Keith, J.M. (ed.) Bioinformatics, Volume I: Data, Sequence Analysis, and Evolution, Methods in Molecular Biology, vol. 452, pp. 385–416. Humana Press, Totowa (2008)

    Chapter  Google Scholar 

  11. Hannenhalli, S., Pevzner, P.A.: Transforming men into mice (polynomial algorithm for genomic distance problem). In: Proceedings of the 36th Annual Symposium of the Foundations of Computer Science (FOCS 1995), pp. 581–592. IEEE Press (1995)

    Google Scholar 

  12. Hannenhalli, S., Pevzner, P.A.: Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. J. ACM 46(1), 1–27 (1999). A preliminary version appeared in Proceedings of STOC 1995

    Article  MathSciNet  MATH  Google Scholar 

  13. Lyubetsky, V., Gershgorin, R., Gorbunov, K.: Chromosome structures: reduction of certain problems with unequal gene contemnt and gene paralogs to integer linear programming. BMC Bioinform. 18, 537 (2017)

    Article  Google Scholar 

  14. Martinez, F.V., Feijão, P., Braga, M.D.V., Stoye, J.: On the family-free DCJ distance and similarity. Algorithms Mol. Biol. 10, 13 (2015). A preliminary version appeared in Proceedings of WABI 2014

    Article  Google Scholar 

  15. Sankoff, D.: Edit distance for genome comparison based on non-local operations. In: Apostolico, A., Crochemore, M., Galil, Z., Manber, U. (eds.) CPM 1992. LNCS, vol. 644, pp. 121–135. Springer, Heidelberg (1992). https://doi.org/10.1007/3-540-56024-6_10

    Chapter  Google Scholar 

  16. Sankoff, D.: Genome rearrangement with gene families. Bioinformatics 15(11), 909–917 (1999)

    Article  Google Scholar 

  17. Shao, M., Lin, Y., Moret, B.M.E.: An exact algorithm to compute the double-cut-and-join distance for genomes with duplicate genes. J. Comput. Biol. 22(5), 425–435 (2015). A preliminary version appeared in Proceedings of RECOMB 2014

    Article  MathSciNet  Google Scholar 

  18. Yancopoulos, S., Attie, O., Friedberg, R.: Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics 21(16), 3340–3346 (2005)

    Article  Google Scholar 

  19. Yancopoulos, S., Friedberg, R.: DCJ path formulation for genome transformations which include insertions, deletions, and duplications. J. Comput. Biol. 16(10), 1311–1338 (2009). A preliminary version appeared in Proceedings of RECOMB-CG 2008

    Article  MathSciNet  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jens Stoye .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Bohnenkämper, L., Braga, M.D.V., Doerr, D., Stoye, J. (2020). Computing the Rearrangement Distance of Natural Genomes. In: Schwartz, R. (eds) Research in Computational Molecular Biology. RECOMB 2020. Lecture Notes in Computer Science(), vol 12074. Springer, Cham. https://doi.org/10.1007/978-3-030-45257-5_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-45257-5_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-45256-8

  • Online ISBN: 978-3-030-45257-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics