Maximum Likelihood Estimates of Rearrangement Distance: Implementing a Representation-Theoretic Approach

Abstract

The calculation of evolutionary distance via models of genome rearrangement has an inherent combinatorial complexity. Various algorithms and estimators have been used to address this; however, many of these set quite specific conditions for the underlying model. A recently proposed technique, applying representation theory to calculate evolutionary distance between circular genomes as a maximum likelihood estimate, reduces the computational load by converting the combinatorial problem into a numerical one. We show that the technique may be applied to models with any choice of rearrangements and relative probabilities thereof; we then investigate the symmetry of circular genome rearrangement models in general. We discuss the practical implementation of the technique and, without introducing any bona fide numerical approximations, give the results of some initial calculations for genomes with up to 11 regions.
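To make the combinatorial baseline concrete, the Python sketch below computes path probabilities \(\alpha _k(\sigma )\) by brute-force enumeration over \(\mathcal {S}_N\) and then maximises a likelihood over the elapsed time T. Several things here are assumptions and not taken from the paper. Rearrangements are assumed to occur as a unit-rate Poisson process, so that \(L(T\mid \sigma )=\sum _k e^{-T}\frac{T^k}{k!}\alpha _k(\sigma )\). The sum is truncated at a hypothetical `k_max`, and dihedral symmetry is ignored. The model is also hypothetical: uniformly weighted swaps of circularly adjacent regions. All function names are illustrative only. Each step touches up to N! arrangements, so this approach is feasible only for small N. That cost is what the representation-theoretic technique is designed to avoid.

```python
import math

def compose(p, q):
    """Return p composed with q: apply q first, then p (p[i] is the image of i)."""
    return tuple(p[q[i]] for i in range(len(q)))

def circular_adjacent_swaps(n):
    """Hypothetical model: swap the regions at circular positions i and i+1 (mod n)."""
    gens = []
    for i in range(n):
        j = (i + 1) % n
        p = list(range(n))
        p[i], p[j] = p[j], p[i]
        gens.append(tuple(p))
    return gens

def path_probabilities(model, n, k_max):
    """alpha[k][sigma] = probability that k independently drawn rearrangements,
    each composed on the left, take the identity arrangement to sigma."""
    identity = tuple(range(n))
    alpha = [{identity: 1.0}]
    for _ in range(k_max):
        nxt = {}
        for sigma, prob in alpha[-1].items():
            for a, w in model.items():
                tau = compose(a, sigma)
                nxt[tau] = nxt.get(tau, 0.0) + prob * w
        alpha.append(nxt)
    return alpha

def likelihood(T, sigma, alpha):
    """Assumed Poisson-time likelihood, truncated at k_max = len(alpha) - 1."""
    return sum(math.exp(-T) * T**k / math.factorial(k) * a.get(sigma, 0.0)
               for k, a in enumerate(alpha))

def ml_time(sigma, alpha, grid):
    """Crude maximum-likelihood estimate of T by grid search."""
    return max(grid, key=lambda T: likelihood(T, sigma, alpha))

# Usage: N = 4 regions, uniform weights on the four circular adjacent swaps.
n = 4
gens = circular_adjacent_swaps(n)
model = {g: 1 / len(gens) for g in gens}
alpha = path_probabilities(model, n, k_max=30)
sigma = (1, 0, 2, 3)  # regions 0 and 1 exchanged
grid = [0.05 * i for i in range(1, 201)]
T_hat = ml_time(sigma, alpha, grid)
print(f"estimated T = {T_hat:.2f}")
```

Because \(\alpha _0(\sigma )=0\) for every \(\sigma \) other than the identity, the likelihood vanishes at T = 0. For this \(\sigma \) the maximum therefore lies at a positive T inside the grid.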
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Notes

  1. If we do not require that \(\mathcal {M}\) generates \(\mathcal {S}_N\), then it is certainly trivial!

  2. The result in Serdoz et al. (2017) is stated in terms of likelihoods; however, the likelihoods are for single elements of \(\mathcal {S}_N\), with dihedral symmetry not included in calculations until later in the paper.

  3. We note that the OEIS entry includes a characterisation of this sequence that is equivalent to our definition of genomes (namely, the number of necklaces that may be formed from N distinct beads).

  4. Simply defined via the matrices \(\rho _p(d)\), for \(d\in D_N\).

  5. \(\rho _{p^{*}}(\sigma ):=\mathrm {sgn}(\sigma )\rho _p(\sigma )\).

  6. Underlying code written by Franco Saliola.

  7. Further examples and discussion of this phenomenon are given below.

  8. The errors were easily identified by, for example, summing the projection matrices for a given irreducible representation.

  9. We computed the path probabilities \(\alpha _k(\sigma )\) both via partial traces (Eq. 4) and directly from the irreducible representations (Eq. 3), in the latter case avoiding eigenvalue/eigenvector estimation, and the two sets of results coincide. Additionally, in the cases predicted theoretically by the results of Sect. 4, we obtained zero partial trace values (within the expected numerical tolerance).
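Note 1 requires that the rearrangement set \(\mathcal {M}\) generates \(\mathcal {S}_N\). For small N this can be tested directly by computing the closure of \(\mathcal {M}\) under composition and comparing its size with N!. The sketch below does this with a simple graph search. Permutations are tuples in which p[i] is the image of i, and the function names are hypothetical.

```python
import math

def compose(p, q):
    """Return p composed with q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def generated_group(gens):
    """Closure of gens under composition, starting from the identity.
    In a finite group this closure is the subgroup generated by gens."""
    n = len(gens[0])
    identity = tuple(range(n))
    seen = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for a in gens:
            h = compose(a, g)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen

def generates_symmetric_group(gens):
    """True if the permutations in gens generate all of S_N."""
    n = len(gens[0])
    return len(generated_group(gens)) == math.factorial(n)

# Circularly adjacent swaps on 4 regions generate S_4; a rotation alone does not.
print(generates_symmetric_group([(1, 0, 2, 3), (0, 2, 1, 3), (0, 1, 3, 2), (3, 1, 2, 0)]))
print(generates_symmetric_group([(1, 2, 3, 0)]))
```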
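Note 3 identifies genomes with necklaces formed from N distinct beads. We assume here that two arrangements are the same genome when they differ by a rotation or a reflection, as the dihedral symmetry mentioned in note 2 suggests. Under that assumption the count for N ≥ 3 is N!/(2N) = (N−1)!/2. The brute-force check below counts orbits under \(D_N\) by taking a canonical (lexicographically smallest) representative of each arrangement. The function names are hypothetical.

```python
import math
from itertools import permutations

def dihedral_images(arr):
    """All rotations of a circular arrangement, together with their reversals."""
    n = len(arr)
    rotations = [arr[i:] + arr[:i] for i in range(n)]
    return rotations + [tuple(reversed(r)) for r in rotations]

def count_circular_genomes(n):
    """Number of arrangements of n distinct regions up to rotation and reflection."""
    return len({min(dihedral_images(p)) for p in permutations(range(n))})

for n in range(3, 8):
    print(n, count_circular_genomes(n), math.factorial(n - 1) // 2)
```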
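Note 5 defines \(\rho _{p^{*}}\) by multiplying \(\rho _p\) by the sign of the permutation. Because \(\mathrm {sgn}\) is a homomorphism onto \(\{\pm 1\}\), this twist always yields another representation. For irreducible \(\rho _p\), it gives the irreducible representation labelled by the conjugate partition \(p^{*}\) (see, e.g., Sagan 2001). The sketch below checks the homomorphism property exhaustively for the 3-dimensional permutation representation of \(\mathcal {S}_3\). That representation is used only as a convenient stand-in for \(\rho _p\), and all names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Return p composed with q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def sign(p):
    """Sign of a permutation: (-1)^(n - number of cycles)."""
    n, seen, cycles = len(p), [False] * len(p), 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = p[j]
    return (-1) ** (n - cycles)

def perm_matrix(p):
    """Permutation representation: column j has a 1 in row p[j]."""
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def twisted(rho):
    """Return the map sigma -> sgn(sigma) * rho(sigma)."""
    return lambda s: [[sign(s) * x for x in row] for row in rho(s)]

def is_homomorphism(rho, n):
    """Check rho(s o t) == rho(s) rho(t) for every pair s, t in S_n."""
    group = list(permutations(range(n)))
    return all(rho(compose(s, t)) == matmul(rho(s), rho(t))
               for s in group for t in group)

print(is_homomorphism(perm_matrix, 3), is_homomorphism(twisted(perm_matrix), 3))
```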
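The check in note 8 relies on the fact that a complete family of projection matrices must sum to the identity, so any numerical error shows up as a nonzero residual. The sketch below illustrates a check of this kind; it is not necessarily the authors' exact procedure. It uses exact rational arithmetic to build the isotypic projections \(P_\chi =\frac{\chi (e)}{|G|}\sum _g \chi (g)\rho (g)\) for the three irreducible characters of \(\mathcal {S}_3\) acting on the permutation representation. The characters are real, so no complex conjugation is needed. The character values are standard, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

N = 3
GROUP = list(permutations(range(N)))
IDENTITY = tuple(range(N))

def perm_matrix(p):
    """Permutation representation: column j has a 1 in row p[j]."""
    return [[1 if p[j] == i else 0 for j in range(N)] for i in range(N)]

def fixed_points(p):
    return sum(1 for i, x in enumerate(p) if i == x)

def sign(p):
    """Sign via the number of inversions."""
    return (-1) ** sum(1 for i in range(N) for j in range(i + 1, N) if p[i] > p[j])

# Irreducible characters of S_3, keyed by partition.
CHARACTERS = {
    (3,): lambda g: 1,                          # trivial
    (1, 1, 1): sign,                            # sign
    (2, 1): lambda g: fixed_points(g) - 1,      # standard (2-dimensional)
}

def projection(chi):
    """Isotypic projection P_chi = (chi(e)/|G|) * sum_g chi(g) rho(g)."""
    dim = chi(IDENTITY)
    P = [[Fraction(0)] * N for _ in range(N)]
    for g in GROUP:
        coeff = Fraction(dim * chi(g), len(GROUP))
        M = perm_matrix(g)
        for i in range(N):
            for j in range(N):
                P[i][j] += coeff * M[i][j]
    return P

projs = {label: projection(chi) for label, chi in CHARACTERS.items()}
total = [[sum(projs[lab][i][j] for lab in projs) for j in range(N)] for i in range(N)]
print(total)  # should be the 3x3 identity matrix
```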
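Note 9 compares two routes to \(\alpha _k(\sigma )\). Eqs. (3) and (4) are not reproduced here. As a stand-in for the route through the irreducible representations, the sketch below uses the standard Fourier inversion formula on a finite group, \(\alpha _k(\sigma )=\frac{1}{|G|}\sum _p d_p\,\mathrm {tr}\big (\rho _p(\sigma ^{-1})\rho _p(s)^k\big )\), with \(\rho _p(s)=\sum _{a\in \mathcal {M}}w_a\rho _p(a)\). It compares this formula exactly against direct enumeration. The example is \(\mathcal {S}_3\) with a hypothetical model: the three transpositions, each with weight 1/3. The standard irreducible representation is realised on the sum-zero subspace, with basis \(e_0-e_1, e_1-e_2\). All names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

N = 3
GROUP = list(permutations(range(N)))

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, x in enumerate(p):
        inv[x] = i
    return tuple(inv)

def sign(p):
    return (-1) ** sum(1 for i in range(N) for j in range(i + 1, N) if p[i] > p[j])

def std_matrix(p):
    """2-dim irrep of S_3 on {x : x0+x1+x2 = 0}, basis b1 = e0-e1, b2 = e1-e2."""
    def image(u, v):  # coordinates of p(e_u - e_v) in the basis (b1, b2)
        x = [0, 0, 0]
        x[p[u]] += 1
        x[p[v]] -= 1
        return (x[0], -x[2])
    c1, c2 = image(0, 1), image(1, 2)
    return [[c1[0], c2[0]], [c1[1], c2[1]]]

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matpow(A, k):
    R = [[Fraction(int(i == j)) for j in range(len(A))] for i in range(len(A))]
    for _ in range(k):
        R = matmul(R, A)
    return R

# Complete set of irreducible representations of S_3: (dimension, matrices).
IRREPS = [(1, lambda p: [[1]]), (1, lambda p: [[sign(p)]]), (2, std_matrix)]

def model_element(weights, rho):
    """rho(s) = sum_a w_a rho(a)."""
    d = len(rho(GROUP[0]))
    S = [[Fraction(0)] * d for _ in range(d)]
    for a, w in weights.items():
        M = rho(a)
        for i in range(d):
            for j in range(d):
                S[i][j] += w * M[i][j]
    return S

def alpha_fourier(sigma, k, weights):
    total = Fraction(0)
    for d, rho in IRREPS:
        M = matmul(rho(inverse(sigma)), matpow(model_element(weights, rho), k))
        total += d * sum(M[i][i] for i in range(len(M)))
    return total / len(GROUP)

def alpha_direct(sigma, k, weights):
    dist = {tuple(range(N)): Fraction(1)}
    for _ in range(k):
        nxt = {}
        for g, pr in dist.items():
            for a, w in weights.items():
                h = compose(a, g)
                nxt[h] = nxt.get(h, Fraction(0)) + pr * w
        dist = nxt
    return dist.get(sigma, Fraction(0))

weights = {t: Fraction(1, 3) for t in GROUP if sign(t) == -1}
print([alpha_fourier((1, 0, 2), k, weights) for k in range(4)])
```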

References

  • Bader D, Moret B, Yan M (2001) A linear-time algorithm for computing inversion distance between signed permutations with an experimental study. J Comput Biol 8(5):483–491

  • Bader M, Ohlebusch E (2006) Sorting by weighted reversals, transpositions, and inverted transpositions. In: Proceedings of the 10th annual international conference on research in computational molecular biology, RECOMB 2006, Venice, Italy, April 2–5, 2006, pp 563–577

  • Baudet C, Dias U, Dias Z (2014) Length and symmetry on the sorting by weighted inversions problem. In: Campos S (ed) Advances in bioinformatics and computational biology. Springer, Cham, pp 99–106

  • Bhatia S, Feijão P, Francis AR (2016) Position and content paradigms in genome rearrangements: the wild and crazy world of permutations in genomics. Preprint, arXiv:1610.00077

  • Caprara A, Lancia G (2000) Experimental and statistical analysis of sorting by reversals. In: Sankoff D, Nadeau JH (eds) Comparative genomics: empirical and analytical approaches to gene order dynamics, map alignment and the evolution of gene families. Springer Netherlands, Dordrecht, pp 171–183

  • Darling AE, Miklós I, Ragan MA (2008) Dynamics of genome rearrangement in bacterial populations. PLoS Genet 4(7):e1000128

  • Dobzhansky T, Sturtevant AH (1938) Inversions in the chromosomes of Drosophila pseudoobscura. Genetics 23(1):28–64

  • Egri-Nagy A, Gebhardt V, Tanaka MM, Francis AR (2014) Group-theoretic models of the inversion process in bacterial genomes. J Math Biol 69(1):243–265

  • Eriksen N, Hultman A (2004) Estimating the expected reversal distance after a fixed number of reversals. Adv Appl Math 32(3):439–453

  • Felsenstein J (2004) Inferring phylogenies. Sinauer Associates, Sunderland

  • Fertin G, Labarre A, Rusu I, Tannier É, Vialette S (2009) Combinatorics of genome rearrangements. Computational Molecular Biology. MIT Press, Cambridge

  • Francis AR (2014) An algebraic view of bacterial genome evolution. J Math Biol 69(6–7):1693–1718

  • Fulton W, Harris J (1991) Representation theory: a first course. Graduate Texts in Mathematics, vol 129. Readings in Mathematics. Springer, New York

  • Golomb SW, Welch LR (1960) On the enumeration of polygons. Am Math Mon 67:349–353

  • Hannenhalli S, Pevzner PA (1999) Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. J ACM 46(1):1–27

  • Kececioglu J, Sankoff D (1995) Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. Algorithmica 13(1–2):180–210

  • Larget B, Simon DL, Kadane JB, Sweet D (2005) A Bayesian analysis of metazoan mitochondrial genome arrangements. Mol Biol Evol 22(3):486–495

  • Lin Y, Moret BME (2008) Estimating true evolutionary distances under the DCJ model. Bioinformatics 24(13):i114–i122

  • Moret BM, Wang LS, Warnow T, Wyman SK (2001) New approaches for reconstructing phylogenies from gene order data. Bioinformatics (Oxford, England) 17(Suppl 1):S165–S173

  • R Core Team (2013) R: A language and environment for statistical computing

  • Sagan BE (2001) The symmetric group: representations, combinatorial algorithms, and symmetric functions, 2nd edn. Graduate Texts in Mathematics, vol 203. Springer, New York

  • Serdoz S, Egri-Nagy A, Sumner J, Holland BR, Jarvis PD, Tanaka MM, Francis AR (2017) Maximum likelihood estimates of pairwise rearrangement distances. J Theor Biol 423:31–40

  • Street AP, Day R (1982) Sequential binary arrays. II. Further results on the square grid. In: Combinatorial mathematics, IX (Brisbane, 1981). Lecture Notes in Mathematics, vol 952. Springer, Berlin, pp 392–418

  • Sturtevant AH, Tan CC (1937) The comparative genetics of Drosophila pseudoobscura and D. melanogaster. J Genet 34(3):415–432

  • Sumner JG, Jarvis PD, Francis AR (2017) A representation-theoretic approach to the calculation of evolutionary distance in bacteria. J Phys A 50(33):335601 (14 pp)

  • The On-Line Encyclopedia of Integer Sequences (2010) https://oeis.org. Accessed 4 May 2017

  • The Sage Developers. SageMath, the Sage Mathematics Software System (Version 7.5.1) (2017) http://www.sagemath.org. Accessed 3 Mar 2017

  • Wang L-S, Warnow T (2001) Estimating true evolutionary distances between genomes. In: Proceedings of the thirty-third annual ACM symposium on theory of computing, STOC’01. ACM, New York, NY, USA, pp 637–646

  • Wang L-S, Warnow T, Moret BME, Jansen RK, Raubeson LA (2006) Distance-based genome rearrangement phylogeny. J Mol Evol 63(4):473–483
Author information

Corresponding author

Correspondence to Venta Terauds.

Additional information

This work was supported by Australian Research Council Discovery Early Career Research Award DE130100423 to JS and by use of the Nectar Research Cloud, a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy. We would like to thank Andrew Francis for helpful discussions and for providing the inspiration to follow this line of research. We also thank the anonymous reviewers, whose comments assisted us in making substantial improvements to the manuscript.

Electronic supplementary material

Supplementary material 1 (txt 2 KB)

About this article

Cite this article

Terauds, V., Sumner, J. Maximum Likelihood Estimates of Rearrangement Distance: Implementing a Representation-Theoretic Approach. Bull Math Biol 81, 535–567 (2019). https://doi.org/10.1007/s11538-018-0511-6

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11538-018-0511-6

Keywords

  • Rearrangement models
  • Circular genomes
  • Maximum likelihood
  • Evolutionary distance
  • Group representations