## Abstract

The calculation of evolutionary distance via models of genome rearrangement has an inherent combinatorial complexity. Various algorithms and estimators have been used to address this; however, many of these set quite specific conditions for the underlying model. A recently proposed technique, applying representation theory to calculate evolutionary distance between circular genomes as a maximum likelihood estimate, reduces the computational load by converting the combinatorial problem into a numerical one. We show that the technique may be applied to models with any choice of rearrangements and relative probabilities thereof; we then investigate the symmetry of circular genome rearrangement models in general. We discuss the practical implementation of the technique and, without introducing any bona fide numerical approximations, give the results of some initial calculations for genomes with up to 11 regions.

## Notes

If we do not require that \(\mathcal {M}\) generates \(\mathcal {S}_N\), then it is certainly trivial!

The result in Serdoz et al. (2017) is stated in terms of likelihoods; however, the likelihoods are for single elements of \(\mathcal {S}_N\), with dihedral symmetry not included in calculations until later in the paper.

We note that the OEIS entry includes a characterisation of this sequence that is equivalent to our definition of genomes (namely, the number of necklaces that may be formed from *N* distinct beads).

Simply defined via the matrices \(\rho _p(d)\), for \(d\in D_N\).
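
As an illustrative check of the necklace characterisation of genomes noted above, the following sketch counts circular arrangements of *N* distinct beads by brute force. It is not taken from the paper: the function names are hypothetical, and it assumes that two arrangements are the same necklace when they differ by a rotation or a reflection (that is, by an element of \(D_N\)). Under that assumption the count for \(N\ge 3\) is \((N-1)!/2\).

```python
from itertools import permutations
from math import factorial


def canonical(arrangement):
    """Return the smallest tuple among all rotations and reflections.

    Assumption: arrangements related by the dihedral group are identified.
    """
    n = len(arrangement)
    images = []
    for seq in (arrangement, arrangement[::-1]):
        for shift in range(n):
            images.append(seq[shift:] + seq[:shift])
    return min(images)


def count_necklaces(n):
    """Count arrangements of n distinct beads up to rotation and reflection."""
    return len({canonical(p) for p in permutations(range(n))})


# Compare the brute-force count with (N-1)!/2 for small N.
for n in range(3, 8):
    assert count_necklaces(n) == factorial(n - 1) // 2
```

The brute-force enumeration visits all \(N!\) arrangements, so it is only practical for small *N*; it is meant as a sanity check rather than a counting method.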

\(\rho _{p^{*}}(\sigma ):=\mathrm {sgn}(\sigma )\rho _p(\sigma )\).
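
A small sketch may help to see why this definition still gives a representation: the sign is multiplicative, so the twisted matrices multiply in the same way as the originals. The code below is an assumption-laden illustration, not the paper's implementation. It uses the permutation matrices of \(\mathcal {S}_3\) as a stand-in for \(\rho _p\) (these are not irreducible), and all function names are hypothetical.

```python
from itertools import permutations


def sgn(perm):
    """Sign of a permutation (tuple of images), computed from its inversions."""
    n = len(perm)
    inversions = sum(
        1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def compose(a, b):
    """Composition a∘b: apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(b)))


def perm_matrix(perm):
    """Stand-in for rho_p: matrix M with M[perm[j]][j] = 1."""
    n = len(perm)
    return [[1 if perm[j] == i else 0 for j in range(n)] for i in range(n)]


def twisted(perm):
    """Sign-twisted matrix sgn(perm) * rho(perm)."""
    s = sgn(perm)
    return [[s * entry for entry in row] for row in perm_matrix(perm)]


def matmul(A, B):
    n = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# The twisted matrices respect composition for every pair in S_3.
group = list(permutations(range(3)))
assert all(
    twisted(compose(a, b)) == matmul(twisted(a), twisted(b))
    for a in group
    for b in group
)
```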

Underlying code written by Franco Saliola.

Further examples and discussion of this phenomenon are given below.

The errors were easily identified by, for example, summing the projection matrices for a given irreducible representation.
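
The kind of check described here can be sketched as follows: projections onto a complete set of eigenspaces must sum to the identity, so a missing or incorrect projection shows up immediately. This is a minimal illustration under stated assumptions and not the code used for the paper. The \(3\times 3\) symmetric matrix, its hand-supplied orthogonal eigenvectors and all function names are hypothetical, and exact rational arithmetic is used in place of numerical eigenvector estimation.

```python
from fractions import Fraction


def outer_projector(v):
    """Orthogonal projector v v^T / (v . v) onto the line spanned by v."""
    norm_sq = sum(Fraction(x) * x for x in v)
    return [[Fraction(a * b) / norm_sq for b in v] for a in v]


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def zeros(n):
    return [[Fraction(0) for _ in range(n)] for _ in range(n)]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def eigenprojectors(eigenspaces, n):
    """One projector per eigenvalue, built from mutually orthogonal eigenvectors."""
    projectors = {}
    for eigenvalue, vectors in eigenspaces.items():
        total = zeros(n)
        for v in vectors:
            total = mat_add(total, outer_projector(v))
        projectors[eigenvalue] = total
    return projectors


def sums_to_identity(projectors, n):
    total = zeros(n)
    for P in projectors.values():
        total = mat_add(total, P)
    return total == identity(n)


# Hypothetical example: [[2, 1, 0], [1, 2, 0], [0, 0, 3]] has eigenvalues 1, 3, 3.
complete = {1: [(1, -1, 0)], 3: [(1, 1, 0), (0, 0, 1)]}
assert sums_to_identity(eigenprojectors(complete, 3), 3)

# Dropping one eigenvector simulates an error, and the check fails.
incomplete = {1: [(1, -1, 0)], 3: [(1, 1, 0)]}
assert not sums_to_identity(eigenprojectors(incomplete, 3), 3)
```

With floating-point projections the final comparison would need a tolerance rather than exact equality.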

We computed path probabilities \(\alpha _k(\sigma )\) both via partial traces (4) and directly from the irreducible representations (3)—in the latter case, avoiding eigenvalue/eigenvector estimation—and these coincide. Additionally, for the cases predicted theoretically by the results of Sect. 4, we obtained zero partial trace values (within the expected numerical tolerance).

## References

Bader D, Moret B, Yan M (2001) A linear-time algorithm for computing inversion distance between signed permutations with an experimental study. J Comput Biol 8(5):483–491

Bader M, Ohlebusch E (2006) Sorting by weighted reversals, transpositions, and inverted transpositions. In: Proceedings of the 10th annual international conference on research in computational molecular biology, RECOMB 2006, Venice, Italy, April 2–5, 2006, pp 563–577

Baudet C, Dias U, Dias Z (2014) Length and symmetry on the sorting by weighted inversions problem. In: Campos S (ed) Advances in bioinformatics and computational biology. Springer, Cham, pp 99–106

Bhatia S, Feijão P, Francis AR (2016) Position and content paradigms in genome rearrangements: the wild and crazy world of permutations in genomics. Preprint, arXiv:1610.00077

Caprara A, Lancia G (2000) Experimental and statistical analysis of sorting by reversals. In: Sankoff D, Nadeau JH (eds) Comparative genomics: empirical and analytical approaches to gene order dynamics, map alignment and the evolution of gene families. Springer Netherlands, Dordrecht, pp 171–183

Darling AE, Miklós I, Ragan MA (2008) Dynamics of genome rearrangement in bacterial populations. PLoS Genet 4(7):e1000128

Dobzhansky T, Sturtevant AH (1938) Inversions in the chromosomes of *Drosophila pseudoobscura*. Genetics 23(1):28–64

Egri-Nagy A, Gebhardt V, Tanaka MM, Francis AR (2014) Group-theoretic models of the inversion process in bacterial genomes. J Math Biol 69(1):243–265

Eriksen N, Hultman A (2004) Estimating the expected reversal distance after a fixed number of reversals. Adv Appl Math 32(3):439–453

Felsenstein J (2004) Inferring phylogenies. Sinauer Associates, Sunderland

Fertin G, Labarre A, Rusu I, Tannier É, Vialette S (2009) Combinatorics of genome rearrangements. Computational Molecular Biology. MIT Press, Cambridge

Francis AR (2014) An algebraic view of bacterial genome evolution. J Math Biol 69(6–7):1693–1718

Fulton W, Harris J (1991) Representation theory: a first course. Graduate Texts in Mathematics, vol 129. Springer, New York

Golomb SW, Welch LR (1960) On the enumeration of polygons. Am Math Mon 67:349–353

Hannenhalli S, Pevzner PA (1999) Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. J ACM 46(1):1–27

Kececioglu J, Sankoff D (1995) Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. Algorithmica 13(1–2):180–210

Larget B, Simon DL, Kadane JB, Sweet D (2005) A Bayesian analysis of metazoan mitochondrial genome arrangements. Mol Biol Evol 22(3):486–495

Lin Y, Moret BME (2008) Estimating true evolutionary distances under the DCJ model. Bioinformatics 24(13):i114–i122

Moret BM, Wang LS, Warnow T, Wyman SK (2001) New approaches for reconstructing phylogenies from gene order data. Bioinformatics (Oxford, England) 17(Suppl 1):S165–S173

R Core Team (2013) R: A language and environment for statistical computing

Sagan BE (2001) The symmetric group: representations, combinatorial algorithms, and symmetric functions, 2nd edn. Graduate Texts in Mathematics, vol 203. Springer, New York

Serdoz S, Egri-Nagy A, Sumner J, Holland BR, Jarvis PD, Tanaka MM, Francis AR (2017) Maximum likelihood estimates of pairwise rearrangement distances. J Theor Biol 423:31–40

Street AP, Day R (1982) Sequential binary arrays. II. Further results on the square grid. In: Combinatorial mathematics, IX (Brisbane, 1981). Lecture Notes in Math., vol 952. Springer, Berlin-New York, pp 392–418

Sturtevant AH, Tan CC (1937) The comparative genetics of *Drosophila pseudoobscura* and *D. melanogaster*. J Genet 34(3):415–432

Sumner JG, Jarvis PD, Francis AR (2017) A representation-theoretic approach to the calculation of evolutionary distance in bacteria. J Phys A 50(33):335601

The On-Line Encyclopedia of Integer Sequences (2010) https://oeis.org. Accessed 4 May 2017

The Sage Developers (2017) SageMath, the Sage Mathematics Software System (Version 7.5.1). http://www.sagemath.org. Accessed 3 Mar 2017

Wang L-S, Warnow T (2001) Estimating true evolutionary distances between genomes. In: Proceedings of the thirty-third annual ACM symposium on theory of computing, STOC’01. ACM, New York, NY, USA, pp 637–646

Wang L-S, Warnow T, Moret BME, Jansen RK, Raubeson LA (2006) Distance-based genome rearrangement phylogeny. J Mol Evol 63(4):473–483

## Additional information

This work was supported by Australian Research Council Discovery Early Career Research Award DE130100423 to JS and by use of the Nectar Research Cloud, a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy. We would like to thank Andrew Francis for helpful discussions and for providing the inspiration to follow this line of research. We also thank the anonymous reviewers, whose comments assisted us in making substantial improvements to the manuscript.

## About this article

### Cite this article

Terauds, V., Sumner, J. Maximum Likelihood Estimates of Rearrangement Distance: Implementing a Representation-Theoretic Approach.
*Bull Math Biol* **81**, 535–567 (2019). https://doi.org/10.1007/s11538-018-0511-6

DOI: https://doi.org/10.1007/s11538-018-0511-6

### Keywords

- Rearrangement models
- Circular genomes
- Maximum likelihood
- Evolutionary distance
- Group representations