A tensorial approach to the inversion of group-based phylogenetic models

Sumner, Jeremy G; Jarvis, Peter D; Holland, Barbara R

doi:10.1186/s12862-014-0236-6

Research Article
Open access
Published: 04 December 2014

Volume 14, article number 236 (2014)

BMC Evolutionary Biology

Abstract

Background

Hadamard conjugation is part of the standard mathematical armoury in the analysis of molecular phylogenetic methods. For group-based models, the approach provides a one-to-one correspondence between the so-called “edge length” and “sequence” spectra on a phylogenetic tree. The Hadamard conjugation has been used in diverse phylogenetic applications, not only for inference but also as an important conceptual tool for thinking about molecular data, leading to generalizations beyond strictly tree-like evolutionary modelling.

Results

For general group-based models of phylogenetic branching processes, we reformulate the problem of constructing a one-one correspondence between pattern probabilities and edge parameters. This takes a classic result, previously shown through Fourier analysis, and presents it in the language of tensors and group representation theory. The derivation makes clear why the inversion is possible: under their usual definition, group-based models are defined for abelian groups only.

Conclusion

We provide an inversion of group-based phylogenetic models that can be implemented using matrix multiplication between rectangular matrices indexed by ordered partitions of varying sizes. Our approach provides additional context for the construction of phylogenetic probability distributions on network structures, and highlights the potential limitations of restricting to group-based models in this setting.

Background

Fundamental to evolutionary biology is the development and implementation of molecular phylogenetic methods [1]. These methods provide the means to reconstruct the past evolutionary history of biological entities given present-day molecular data, such as DNA. Considering Kimura’s neutral theory of molecular evolution, it is logical to apply a stochastic model at the level of DNA substitutions to construct a probabilistic description of which molecular alignments are expected to be observed, given a proposed evolutionary history (tree topology and edge lengths). This is commonly implemented by assuming that sites in the alignment evolve independently and identically (IID) under a Markov process of DNA substitution, leading to a model that has a continuous-time Markov chain at its core (see Semple and Steel [2] for an introduction to the mathematics underlying modern phylogenetic methodology).

In a series of papers, Hendy and colleagues introduced the Hadamard conjugation as a novel tool for phylogenetic analyses [3]–[5]. They found an invertible relationship between a phylogenetic tree, as characterized by its edge length spectrum, and the probability distribution of site patterns (referred to as the sequence spectrum). Originally introduced only for the 2-state symmetric model, the Hadamard conjugation was later extended to the K3ST model [6]–[8] and further to any of the so-called “group-based” models [9]. Hadamard conjugation has been used both as a tool for simulation [10] and to study statistical properties of methods, such as the inconsistency of parsimony under a molecular clock [5],[11]. For these sorts of applications, following the notation in Felsenstein [1], we can use the Hadamard transform $H$ to start with an edge length spectrum $γ$ and calculate the sequence spectrum $s = H^{-1} \exp(Hγ)$. The beauty of Hadamard conjugation is that one can also begin with an observed sequence spectrum $\hat{s}$ and perform the inverse of the conjugation to empirically obtain an edge length spectrum $\hat{γ} = H^{-1} \log(H\hat{s})$. Although the $\hat{γ}$ spectrum is not expected to match a tree precisely, Hendy [12] proposed using an optimisation criterion to map from $\hat{γ}$ to the “closest tree”.
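As a minimal sketch of the forward and inverse conjugation for the 2-state symmetric model, the following NumPy code builds a Sylvester-type Hadamard matrix by repeated Kronecker products and checks that $\hat{γ} = H^{-1}\log(H s)$ recovers $γ$. The bitmask encoding of splits (bit $k-1$ set when taxon $k$ lies on the opposite side of the split from taxon $n$), the convention $γ_{\emptyset} = -\sum_{e \neq \emptyset} γ_e$, the quartet and its edge weights are all illustrative assumptions, not taken from the cited papers.

```python
import numpy as np


def sylvester_hadamard(m):
    """2^m x 2^m Hadamard matrix h ⊗ h ⊗ ... ⊗ h, with entries (-1)^{|u & v|}."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]])
    H = np.ones((1, 1))
    for _ in range(m):
        H = np.kron(H, h)
    return H


def edge_to_sequence(gamma):
    """Forward conjugation s = H^{-1} exp(H gamma), using H^{-1} = H / 2^m."""
    H = sylvester_hadamard(len(gamma).bit_length() - 1)
    return H @ np.exp(H @ gamma) / len(gamma)


def sequence_to_edge(s):
    """Inverse conjugation gamma_hat = H^{-1} log(H s)."""
    H = sylvester_hadamard(len(s).bit_length() - 1)
    return H @ np.log(H @ s) / len(s)


# Hypothetical quartet 12|34 rooted at taxon 4; splits of {1, 2, 3} as bitmasks.
gamma = np.zeros(8)
gamma[0b001] = 0.10   # pendant edge to taxon 1
gamma[0b010] = 0.20   # pendant edge to taxon 2
gamma[0b100] = 0.15   # pendant edge to taxon 3
gamma[0b111] = 0.05   # pendant edge to taxon 4
gamma[0b011] = 0.08   # internal edge separating 12 from 34
gamma[0] = -gamma[1:].sum()

s = edge_to_sequence(gamma)
gamma_hat = sequence_to_edge(s)
```

Because $γ_{\emptyset}$ cancels the other weights, the first component of $Hγ$ is zero and $s$ sums to one; the round trip is exact up to floating-point error.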

Several authors have noted that a potentially useful feature of Hadamard conjugation is that the data are not forced onto a fixed tree. The conflicting information can be retained and interpreted in the form of a “lentoplot” [13] or a splits-graph [14], with both of these methods implemented in Spectronet [15]. Schliep [16] gives further statistical justification for such an approach by making a link to modern statistical techniques such as the lasso and ridge regression.

von Haeseler and Churchill [17] seems to be the first paper that explicitly suggests using Hadamard conjugation to provide a likelihood framework for networks. The principal idea in this work was to start with an edge length spectrum that encodes a set of incompatible splits, use the Hadamard transformation to get site pattern probabilities, and use these to determine a likelihood. This idea was further explored by Bryant [18], and Bryant [19] followed this through by defining the “n-taxon process” for group-based models. It should be noted that likelihoods calculated via Hadamard conjugation are not equivalent to likelihoods calculated by taking a mixture of trees. Indeed, Matsen and Steel [20] and Matsen et al. [21] used Hadamard methods in combination with phylogenetic invariants to show that mixtures of trees with the same topology can exactly mimic another tree under the 2-state model. Considering biological applications, thinking in terms of mixtures of trees, or of partitions where the data can be thought of as arising on a set of trees [22]–[24], seems more reasonable than using the Hadamard conjugation. Strimmer and Moulton [25] suggested using split networks as a springboard to likelihood-based analyses on DAGs, but later identified several problems with the approach [26]; most notably, in split networks internal nodes do not have a biological interpretation as ancestors.

In Sumner et al. [27], we gave some additional insight into the interpretation of applying the Hadamard conjugation in a network setting. We showed that the permutation group structure inherent to the Hadamard transformation, as for any group-based model, prevents the resulting process from reproducing truly convergent processes. This is a serious limitation, as one of the biological motivations for explicit network models is the ability to model convergent processes. We also presented an alternative algebraic formalism for the general Markov model, analogous to the n-taxon process, but capable of reproducing convergent processes.

From the point of view of group representation theory, the inversion of group-based models relies on the fact that the irreducible representations of an abelian group are one-dimensional, and the model structure essentially reduces to analysing group characters – hence the standard presentation of a Fourier inversion. In this article, we make this connection concrete. For the general Markov model, it is then immediately apparent that an analogous inversion is not possible because the algebraic structure underlying the model is not abelian and hence the irreducible representations are not one-dimensional. In fact, to obtain one-dimensional representations for the general Markov model, it is necessary to apply higher-degree polynomial maps (beyond the degree 1, linear case), and define “Markov invariants” [28]. These invariants present one-dimensional representations but at the cost of the higher degree – degree 5 in the case of the general Markov model with four states on quartet trees [29],[30]. This connection between Hadamard transformation and Markov invariants is an interesting one, but we do not discuss it further here.

In this paper we approach the inversion of group-based phylogenetic models by taking a representation-theoretic perspective and working explicitly with tensor indices. Our approach rests heavily on the formalism of “phylogenetic tensors”, as presented in Bashford et al. [31], for the binary-symmetric and K3ST model, and Sumner et al. [27],[28], for the general Markov model.

Although the main inversion results presented here are not more general than those in Székely et al. [7], we think it is important to reformulate them using the language of tensors and representation theory. This viewpoint has already led to new approaches for modeling convergent evolution [27] and for studying non-group-based models [28]. However, in none of our previous work was the link to Hadamard conjugation explicitly discussed. By presenting an old technique (Hadamard conjugation) in a new light we hope to introduce other researchers to the viewpoint of tensor analysis and representation theory.

Methods

Group-based models

We consider the continuous-time formulation of Markov processes, and show how to implement the inversion of a group-based phylogenetic model based on any abelian group $G$. Such an inversion requires a map from the tensor product space (whose elements are indexed by ordered partitions of the taxon set, with one part for each group element) to phylogenetic splits (whose elements are indexed by bipartitions). We achieve this by finding canonical maps from bipartitions to these ordered partitions.

For a group $G$ (not necessarily abelian) of order $|G| = d$, we write $G = \{σ_{1}, σ_{2}, \dots, σ_{d}\}$ and, when necessary, write $ε \in G$ for the identity element of $G$. We will discuss the “regular representation” of $G$ shortly but, skipping ahead, we find that any rate matrix $Q$ occurring in a group-based Markov model can be written in the form

\begin{array}{lcr} \begin{matrix} Q = - λ 1 + \sum_{ε \neq σ \in G} α^{σ} K_{σ}, \end{matrix} \end{array}

(1)

where each $0 \leq α^{σ} \in ℝ$, $λ = \sum_{ε \neq σ \in G} α^{σ}$, and the $K_{σ}$ are the permutation matrices corresponding to the (non-identity) group elements $σ \in G$.

For the reader interested in deriving this result, consider the $d$-dimensional vector space ${〈G〉}_{ℂ} \equiv {〈σ_{1}, σ_{2}, \dots, σ_{d}〉}_{ℂ} = \{v = v_{1} σ_{1} + v_{2} σ_{2} + \dots + v_{d} σ_{d} : v_{i} \in ℂ\}$, with scalar multiplication and vector addition defined via

\begin{array}{lcr} \begin{matrix} v + λ v^{'} & = (v_{1} σ_{1} + v_{2} σ_{2} + \dots + v_{d} σ_{d}) \\ + λ (v_{1}^{'} σ_{1} + v_{2}^{'} σ_{2} + \dots + v_{d}^{'} σ_{d}) \\ = (v_{1} + λ v_{1}^{'}) σ_{1} + (v_{2} + λ v_{2}^{'}) σ_{2} + \dots \\ + (v_{d} + λ v_{d}^{'}) σ_{d}, \end{matrix} \end{array}

for all $v, v^{'} \in {〈 G 〉}_{ℂ}$ and $λ \in ℂ$ . The regular representation, $ρ_{reg} : G \to GL (d, ℂ)$ , is then defined by setting the group action

σ : v \mapsto σv = v_{1} (σ σ_{1}) + v_{2} (σ σ_{2}) + \dots + v_{d} (σ σ_{d}),

for all $v \in {〈G〉}_{ℂ}$ and $σ \in G$. If we fix $\{σ_{1}, σ_{2}, \dots, σ_{d}\}$ as an ordered basis for ${〈G〉}_{ℂ}$, it is then clear, via Cayley’s theorem, that each group element $σ$ is mapped to a permutation matrix $K_{σ} := ρ_{reg}(σ)$, with $K_{σ} σ_{i} = \sum_{j} {[K_{σ}]}_{i}^{j} σ_{j} := σ σ_{i}$. Thus $K_{σ}$ has matrix elements

\begin{array}{lcr} \begin{matrix} {[K_{σ}]}_{i}^{j} = \{\begin{matrix} 1, if σ_{j} = σ σ_{i}, \\ 0, otherwise. \end{matrix} \end{matrix} \end{array}

(2)

Consider the unit column vectors

\begin{matrix} ξ_{1} & = {(1, 0, 0, \dots, 0)}^{T}, ξ_{2} = {(0, 1, 0, 0, \dots, 0)}^{T}, \dots \\ ξ_{d} & = {(0, 0, \dots, 0, 1)}^{T}; \end{matrix}

and identify each $σ_{i} \in {〈G〉}_{ℂ}$ with $ξ_{i} \in ℂ^{d}$, so that the group action becomes $σ : ξ_{i} \mapsto K_{σ} ξ_{i} = ξ_{j}$, where $σ_{j} = σ σ_{i}$. Thus the matrix elements ${[K_{σ}]}_{i}^{j}$ have $i$ as the column label and $j$ as the row label.
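To make (2) and this column/row convention concrete, the sketch below builds the regular-representation matrices $K_{σ}$ directly from a multiplication rule and checks the homomorphism property $K_{σ} K_{τ} = K_{στ}$. Because the construction does not require $G$ to be abelian, we use the symmetric group $S_3$ as an illustrative non-abelian example; the element ordering and the helper names `compose` and `regular_rep` are our own.

```python
import itertools

import numpy as np

# Elements of S_3 as tuples p, with p[x] the image of x; this fixes an ordering sigma_1, ..., sigma_6.
G = list(itertools.permutations(range(3)))
identity = tuple(range(3))


def compose(a, b):
    """Group product a b, acting as (a b)(x) = a(b(x))."""
    return tuple(a[b[x]] for x in range(3))


def regular_rep(s):
    """K_s with [K_s]_i^j = 1 iff sigma_j = s sigma_i (column i, row j), as in eq. (2)."""
    d = len(G)
    K = np.zeros((d, d))
    for i, g in enumerate(G):
        K[G.index(compose(s, g)), i] = 1.0
    return K
```

Each $K_{σ}$ is a permutation matrix, $K_{ε}$ is the identity, and for this non-abelian group some pairs of matrices fail to commute, which is exactly what is lost when the inversion is restricted to abelian $G$.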

A group-based Markov model is then obtained by taking a continuous-time Markov chain with state space $G = \{σ_{1}, σ_{2}, \dots, σ_{d}\}$ and using the group multiplication in $G$ to assign a rate $α^{σ}$ to all substitutions $σ_{1} \mapsto σ_{2}$ where $σ σ_{1} = σ_{2}$. Following this through (as is done in detail in [32]), we are led to the formula (1) for rate matrices in any group-based model.

The regular representation is one example of the general concept of a representation of $G$ on a vector space $V$, defined as a homomorphism $ρ : G \to GL(V)$ satisfying $ρ(g_{1} g_{2}) = ρ(g_{1}) ρ(g_{2})$ for all $g_{1}, g_{2} \in G$. A representation is said to be reducible if there exists a nonzero proper subspace $U \subset V$ satisfying $ρ(g) U \subseteq U$ for all $g \in G$, i.e. the matrices $ρ(G)$ send vectors in $U$ back to $U$. In this case, $U$ is called an invariant subspace. The representation $ρ$ is called irreducible if $V$ contains no invariant subspaces other than $\{0\}$ and $V$ itself.

The reader should note that the usual construction of a “group-based” model [2] stipulates that G be abelian. Although the construction just given using the regular representation allows for non-abelian G, we will nonetheless only consider the abelian case in this paper, because, as discussed in the introduction, it is only in the abelian case that a (linear) inversion of phylogenetic models is possible. In this case the irreducible representations of G are all one-dimensional [33], and hence the analysis reduces to computations with group characters, as is exploited in the previous approaches using Fourier analysis [9],[34].

Phylogenetic tensors

We denote $[d] := \{1, 2, \dots, d\}$ as the state space for a continuous-time Markov chain. Consider an $n$-taxon phylogenetic tree and a $d$-state phylogenetic pattern distribution $\{p_{i_{1} i_{2} \dots i_{n}}\}_{i_{1}, i_{2}, \dots, i_{n} \in [d]}$, with the interpretation that $p_{i_{1} i_{2} \dots i_{n}}$ is the probability that the observed state at the $k$-th leaf of the tree is $i_{k}$, for each $k = 1, 2, \dots, n$. As is shown in Sumner and Jarvis [35] and in more detail in Sumner et al. [27], such phylogenetic pattern distributions can be represented abstractly as tensors in the $n$-fold tensor product space $\otimes^{n} ℂ^{d} := ℂ^{d} \otimes ℂ^{d} \otimes \dots \otimes ℂ^{d}$, as follows. If we choose $\{ξ_{1}, ξ_{2}, \dots, ξ_{d}\}$ as an ordered basis for $ℂ^{d}$, and $\{ξ_{i_{1}} \otimes ξ_{i_{2}} \otimes \dots \otimes ξ_{i_{n}}\}_{i_{1}, i_{2}, \dots, i_{n} \in [d]}$ as an ordered basis for the tensor product space, a “phylogenetic tensor” $P \in \otimes^{n} ℂ^{d}$ is then defined as

\begin{array}{lcr} \begin{matrix} P = \sum_{i_{1}, i_{2}, \dots, i_{n} \in [d]} p_{i_{1} i_{2} \dots i_{n}} ξ_{i_{1}} \otimes ξ_{i_{2}} \otimes \dots \otimes ξ_{i_{n}} . \end{matrix} \end{array}

For readers who are unfamiliar with tensor products, it is possible to understand the general concept via the “Kronecker” product of an $n \times m$ matrix $A$ and an $n' \times m'$ matrix $B$, defined as the $nn' \times mm'$ block matrix

\begin{array}{lcr} \begin{matrix} A \otimes B = (\begin{array}{cccc} A_{11} B & A_{12} B & \dots & A_{1m} B \\ A_{21} B & A_{22} B & \dots & A_{2m} B \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1} B & A_{n2} B & \dots & A_{nm} B \end{array}) . \end{matrix} \end{array}

We can index the matrix $A \otimes B$ with row indices $i_{1} j_{1} = 11, 12, \dots, nn'$ and column indices $i_{2} j_{2} = 11, 12, \dots, mm'$, i.e. generically ${(A \otimes B)}_{i_{1} j_{1}, i_{2} j_{2}} = A_{i_{1} i_{2}} B_{j_{1} j_{2}}$ and specifically ${(A \otimes B)}_{12, 32} = A_{13} B_{22}$. This point of view is useful if one wants to write out specific matrix representations of tensors; however, the development that follows focuses heavily on the indexing of tensor components in the various cases discussed.
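The indexing convention can be checked numerically with NumPy's `np.kron`. In the sketch below (0-based indices; the matrix sizes and random integer entries are arbitrary choices), the row pair $(i_1, j_1)$ corresponds to the flat row $i_1 n' + j_1$ and the column pair $(i_2, j_2)$ to the flat column $i_2 m' + j_2$; the helper `kron_entry` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 3            # A is n x m
n_p, m_p = 3, 2        # B is n' x m'
A = rng.integers(1, 10, size=(n, m))
B = rng.integers(1, 10, size=(n_p, m_p))
AB = np.kron(A, B)     # shape (n n', m m')


def kron_entry(i1, j1, i2, j2):
    """(A ⊗ B)_{i1 j1, i2 j2} with 0-based indices, read off the flat Kronecker matrix."""
    return AB[i1 * n_p + j1, i2 * m_p + j2]


# The example (A ⊗ B)_{12,32} = A_{13} B_{22}, shifted to 0-based indices.
example = kron_entry(0, 1, 2, 1)
```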

Suppose $π = \sum_{i \in [d]} π_{i} ξ_{i} \in ℂ^{d}$ represents the state distribution of a single taxon, i.e. $π_{i}$ is the probability that a randomly chosen site in the sequence is in state $i$. Now suppose a phylogenetic branching event occurs and the sequence is copied. The corresponding phylogenetic tensor $P = \sum_{i_{1}, i_{2} \in [d]} p_{i_{1} i_{2}} ξ_{i_{1}} \otimes ξ_{i_{2}}$ representing the joint distribution of the two taxa just after the branching event then has the property that $p_{i_{1} i_{2}} = π_{i_{1}}$ if $i_{2} = i_{1}$ and is zero otherwise. Thinking in terms of tensor operations, we find that phylogenetic branching events can be generated by a linear operator $δ : ℂ^{d} \to ℂ^{d} \otimes ℂ^{d}$, determined by $δ(π) = P$ and defined in general using our chosen basis as

\begin{array}{lcr} \begin{matrix} δ (ξ_{i}) : = ξ_{i} \otimes ξ_{i}, δ (π) & = δ (\sum_{i} π_{i} ξ_{i}) = \sum_{i} π_{i} δ (ξ_{i}) \\ = \sum_{i} π_{i} ξ_{i} \otimes ξ_{i} . \end{matrix} \end{array}

The remarkable fact for group-based models, central to the present article, is that the permutation matrices “intertwine” particularly simply with the branching operator:

\begin{array}{lcr} \begin{matrix} δ (K_{σ} ξ_{i}) = δ (ξ_{σ (i)}) = ξ_{σ (i)} \otimes ξ_{σ (i)} = K_{σ} \otimes K_{σ} \cdot δ (ξ_{i}) . \end{matrix} \end{array}

Thus, for any rate matrix Q arising from a group-based model, we have (via the linearity of δ):

\begin{array}{lcr} \begin{matrix} δ \cdot Q = (- λ 1 \otimes 1 + \sum_{ε \neq σ \in G} α^{σ} K_{σ} \otimes K_{σ}) \cdot δ. \end{matrix} \end{array}

(3)

Iterating (3) gives $δ \cdot Q^{k} = (- λ 1 \otimes 1 + \sum_{ε \neq σ \in G} α^{σ} K_{σ} \otimes K_{σ})^{k} \cdot δ$ for every $k \geq 0$. (We also note that, since $Q$ can be expressed as a linear combination of permutation matrices representing elements of the group $G$, the matrix powers $Q^{2}, Q^{3}, Q^{4}, \dots$ are also expressible as linear combinations of the same permutation matrices, although precise expressions for the relevant coefficients may or may not be easily computable.) Summing the exponential series term by term, and noting that $- λ 1 \otimes 1$ commutes with every operator, this implies that, for any substitution matrix $e^{Qt}$ arising from matrix exponentiation,

\begin{array}{lcr} \begin{matrix} δ \cdot e^{Qt} = e^{- λ t} exp (t \sum_{ε \neq σ \in G} α^{σ} K_{σ} \otimes K_{σ}) \cdot δ. \end{matrix} \end{array}

(4)
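The identities (3) and (4) are easy to check numerically. The sketch below takes $G = ℤ_{3}$ with the permutation matrices $K_{1}$ (for $σ$) and $K_{2} = K_{1}^{2}$ (for $σ^{2}$), represents $δ$ as a $d^{2} \times d$ matrix whose $i$-th column is $ξ_{i} \otimes ξ_{i}$, and compares both sides for arbitrarily chosen rates and time. The `expm` helper (scaling and squaring with a truncated Taylor series) is our own NumPy-only stand-in for a library matrix exponential and is only intended for small, well-scaled matrices such as these.

```python
import numpy as np


def expm(A, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(A, 1)
    s = int(np.ceil(np.log2(norm))) + 1 if norm > 1 else 0
    X = A / 2 ** s
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


d = 3
K1 = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)  # sigma: xi_i -> xi_{i+1 mod 3}
K2 = K1 @ K1                                                    # sigma^2
I = np.eye(d)

# Branching operator delta as a d^2 x d matrix: column i is xi_i ⊗ xi_i.
delta = np.zeros((d * d, d))
for i in range(d):
    delta[i * d + i, i] = 1.0

alpha, beta, t = 0.4, 0.1, 0.7          # hypothetical rates and time
lam = alpha + beta
Q = -lam * I + alpha * K1 + beta * K2
Q_pair = alpha * np.kron(K1, K1) + beta * np.kron(K2, K2)

lhs3, rhs3 = delta @ Q, (-lam * np.eye(d * d) + Q_pair) @ delta               # eq. (3)
lhs4, rhs4 = delta @ expm(Q * t), np.exp(-lam * t) * expm(t * Q_pair) @ delta  # eq. (4)
```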

This relation shows that, mathematically and hence conceptually, “Markov evolution on a single taxon followed by a branching event” can be replaced with “a branching event on a single taxon followed by (correlated) Markov evolution of two taxa”. This equivalence is illustrated in Figure 1, and should be compared with the equivalent discussion of the “n-taxon process” given in [18] and [19].

In Sumner et al. [27] we showed how to generalise this intertwining action to the case of the general Markov model. Interestingly, for the general Markov model the appropriate intertwining has quite a different structure from what occurs in group-based models, and hence the simplicity of (4) is somewhat misleading in general. We refer the reader to Sumner et al. [27] for more discussion on this point.

Returning to the case of group-based models, for each subset $A \subseteq [n]$ we define a linear map on $\otimes^{n} ℂ^{d}$ as the tensor product $K_{σ}^{(A)} := K_{σ}^{a_{1}} \otimes K_{σ}^{a_{2}} \otimes \dots \otimes K_{σ}^{a_{n}}$, where $a_{i} = 1$ if $i \in A$ and $a_{i} = 0$ otherwise (with $K_{σ}^{0} = 1$). For example, if $n = 5$, we have

\begin{array}{lcr} \begin{matrix} K_{σ}^{(\{1, 2, 4\})} = K_{σ} \otimes K_{σ} \otimes 1 \otimes K_{σ} \otimes 1 . \end{matrix} \end{array}

To develop a phylogenetic tensor on a tree, we root the phylogenetic tree at taxon $n$ and label edges by subsets $\emptyset \neq e \subseteq [n - 1]$, where $i \in e$ if the path from taxon $n$ to taxon $i$ crosses the edge labelled by $e$. A five-taxon tree with this labelling is presented in Figure 2. To each edge labelled by $\emptyset \neq e \subseteq [n - 1]$, we assign the rate matrix

\begin{array}{lcr} \begin{matrix} Q_{e} : = - λ_{e} 1 + \sum_{ε \neq σ \in G} α_{e}^{σ} K_{σ}, \end{matrix} \end{array}

where each $α_{e}^{σ} \geq 0$ is the rate of substitution from any state $σ_{1}$ to the state $σ_{2}$ satisfying $σ = σ_{2} σ_{1}^{- 1}$, and $λ_{e} = \sum_{ε \neq σ \in G} α_{e}^{σ}$. Each edge is then assigned the substitution matrix $M_{e} = e^{Q_{e}}$, so that the time parameter for each edge is absorbed into the definition of $Q_{e}$.

Iterating (4) multiple times, Bashford et al. [31] and Sumner et al. [27] show that any phylogenetic tensor can be written as

\begin{array}{lcr} \begin{matrix} P = e^{- λ} exp (\sum_{\emptyset \neq e \subseteq [n - 1], ε \neq σ \in G} α_{e}^{σ} K_{σ}^{(e)}) \cdot δ^{n - 1} π, \end{matrix} \end{array}

(5)

where $λ = \sum_{\emptyset \neq e \subseteq [n - 1]} λ_{e} = \sum_{\emptyset \neq e \subseteq [n - 1], ε \neq σ \in G} α_{e}^{σ}$, and $δ^{n - 1} π$ is the $d \times d \times \dots \times d$ tensor that represents the “zero edge-length star tree” distribution on $n$ taxa. It is this form of phylogenetic tensors that will do much of the heavy lifting in the discussion that follows. The reader should note that under this representation there is no need for the edge parameters $\{α_{e}^{σ} : \emptyset \neq e \subseteq [n - 1], ε \neq σ \in G\}$ to be compatible with a particular tree; hence the possibilities for generalising to non-tree-like or network models, as discussed in the introduction.

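The stationary distribution for group-based models is uniform, because each substitution matrix $e^{Q}$ is doubly stochastic (the columns, as well as the rows, of each rate matrix $Q$ sum to zero). In this paper we always assume that the process starts from this stationary distribution, so that: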

\begin{array}{lcr} \begin{matrix} π = \frac{1}{d} {(1, 1, \dots, 1)}^{T}, \end{matrix} \end{array}

and $δ^{n - 1} π$ has tensor components

\begin{array}{lcr} \begin{matrix} {[δ^{n - 1} π]}_{i_{1} i_{2} \dots i_{n}} = \{\begin{matrix} \frac{1}{d}, & if i_{1} = i_{2} = \dots = i_{n}, \\ 0, & otherwise. \end{matrix} \end{matrix} \end{array}
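As a consistency check on (5), the following sketch compares it with a direct computation on the three-taxon tree rooted at taxon 3 (edges labelled $\{1\}$, $\{2\}$ and $\{1, 2\}$), for $G = ℤ_{3}$ with hypothetical, deliberately asymmetric rates $α_{e}$ (for $σ$) and $β_{e}$ (for $σ^{2}$). The direct computation draws the state of taxon 3 from the uniform distribution, evolves it along edge $\{1, 2\}$ to the internal vertex, and then along the pendant edges to taxa 1 and 2, using the column convention $[M]_{ji} = \Pr(i \to j)$ implied by (2). Flat indices follow the Kronecker ordering, with taxon 1 most significant. The helpers `expm` (a NumPy-only Taylor-series stand-in), `lift` and `M` are our own.

```python
from functools import reduce

import numpy as np


def expm(A, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = np.linalg.norm(A, 1)
    s = int(np.ceil(np.log2(norm))) + 1 if norm > 1 else 0
    X = A / 2 ** s
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ X / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E


d, n = 3, 3
K1 = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)
K2 = K1 @ K1
I = np.eye(d)


def lift(Mat, A):
    """Mat^{(A)}: Mat in tensor slot k if taxon k is in A, identity otherwise."""
    return reduce(np.kron, [Mat if k in A else I for k in range(1, n + 1)])


# Hypothetical edge rates (alpha_e, beta_e) for the tree rooted at taxon 3.
rates = {frozenset({1}): (0.10, 0.30), frozenset({2}): (0.25, 0.05), frozenset({1, 2}): (0.20, 0.15)}

# Eq. (5): P = e^{-lambda} exp(sum_e alpha_e K_1^{(e)} + beta_e K_2^{(e)}) . delta^{n-1} pi
lam = sum(a + b for a, b in rates.values())
X = sum(a * lift(K1, e) + b * lift(K2, e) for e, (a, b) in rates.items())
star = np.zeros(d ** n)
for i in range(d):
    star[i * d * d + i * d + i] = 1.0 / d          # zero edge-length star tree, uniform pi
P5 = np.exp(-lam) * expm(X) @ star


def M(e):
    """Substitution matrix e^{Q_e} for the edge labelled e."""
    a, b = rates[frozenset(e)]
    return expm(-(a + b) * I + a * K1 + b * K2)


# Direct: P[x1, x2, x3] = sum_r (1/d) M_{12}[r, x3] M_1[x1, r] M_2[x2, r]
P_direct = np.einsum("rc,ar,br->abc", M({1, 2}), M({1}), M({2})) / d
```

Because the $K^{(e)}$ operators all commute for an abelian group, the single exponential in (5) factorises edge by edge, which is why the two computations agree.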

This concludes our discussion of the tensor presentation of phylogenetic probability distributions under group-based models. It is important to note that everything discussed so far works for any group-based model, with no requirement that the underlying group G be abelian.

In what follows, we discuss the inversion of abelian group-based models. We present the simplest case with $G = ℤ_{2}$ ; the $G = ℤ_{3}$ case; the $G = ℤ_{2} \times ℤ_{2}$ case; the general $G = ℤ_{r}$ case; and finally we discuss the case of any abelian group.

Results

The binary-symmetric case

We begin with the inversion of the so-called “binary-symmetric” model. Consider $ℂ^{2}$ with standard basis

\begin{array}{lcr} \begin{matrix} \{ξ_{0} = (\begin{matrix} 1 \\ 0 \end{matrix}), ξ_{1} = (\begin{matrix} 0 \\ 1 \end{matrix})\} . \end{matrix} \end{array}

As a group-based model, the binary-symmetric model arises by taking the group

\begin{array}{lcr} \begin{matrix} G : = ℤ_{2} = \{0, 1\}_{+ (mod 2)} ≅ 〈σ | σ^{2} = ε〉, \end{matrix} \end{array}

with a generic rate matrix given by

\begin{array}{lcr} \begin{matrix} Q = (\begin{matrix} - 1 & 1 \\ 1 & - 1 \end{matrix}) = - 1 + K, \end{matrix} \end{array}

where $K = (\begin{matrix} 0 & 1 \\ 1 & 0 \end{matrix})$ is the permutation matrix representing σ in the standard basis.

Now $ρ_{reg} : ℤ_{2} \to M_{2} (ℂ)$ , with σ↦K, is the regular representation of $ℤ_{2}$ , and the character table of $ℤ_{2}$ given in Table 1 is easily recognised to be the Hadamard matrix

\begin{array}{lcr} \begin{matrix} h = (\begin{matrix} 1 & 1 \\ 1 & - 1 \end{matrix}) . \end{matrix} \end{array}

Table 1 The character table of $ℤ_{2}$

\begin{array}{c|cc} & ε & σ \\ \hline id & 1 & 1 \\ sgn & 1 & - 1 \end{array}

As $ℤ_{2}$ is an abelian group, the irreducible representations are one-dimensional.

The corresponding projection operators can be read off from the columns of the character table. That is, the operators

\begin{array}{lcr} \begin{matrix} Θ_{id} : = \frac{1}{2} (ε + σ), Θ_{sgn} : = \frac{1}{2} (ε - σ); \end{matrix} \end{array}

project $ρ_{reg} = id \oplus sgn$ onto the $id$ and $sgn$ representations of $ℤ_{2}$, respectively.

This observation prompts us to work in the alternative basis:

\begin{array}{lcr} \begin{matrix} f_{0} : & = 2 Θ_{id} \cdot ξ_{0} = 2 Θ_{id} \cdot ξ_{1} = h ξ_{0} = ξ_{0} + ξ_{1}, \\ f_{1} : & = 2 Θ_{sgn} \cdot ξ_{0} = - 2 Θ_{sgn} \cdot ξ_{1} = h ξ_{1} = ξ_{0} - ξ_{1} . \end{matrix} \end{array}

In this basis the permutation matrix is diagonal:

\begin{array}{lcr} \begin{matrix} \hat{K} : = hK h^{- 1} = (\begin{array}{ll} 1 & 0 \\ 0 & - 1 \end{array}), \hat{Q} : = - 1 + \hat{K} = (\begin{array}{ll} 0 & 0 \\ 0 & - 2 \end{array}) . \end{matrix} \end{array}

The representation-theoretic perspective on $\hat{K}$ is to observe that $id(σ) = 1$ and $sgn(σ) = -1$.
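A quick numerical check of this change of basis, as a minimal NumPy sketch using only the matrices $h$ and $K$ defined above (the variable names are ours):

```python
import numpy as np

h = np.array([[1.0, 1.0], [1.0, -1.0]])   # character table of Z_2 (Hadamard matrix)
K = np.array([[0.0, 1.0], [1.0, 0.0]])    # regular representation of sigma
h_inv = np.linalg.inv(h)                  # equals h / 2

K_hat = h @ K @ h_inv
Q_hat = h @ (-np.eye(2) + K) @ h_inv
f0, f1 = h[:, 0], h[:, 1]                 # new basis vectors h xi_0 and h xi_1
```

Here `K @ f0` returns `f0` and `K @ f1` returns `-f1`, matching $id(σ) = 1$ and $sgn(σ) = -1$.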

Referring to (5), we know that we can write a generic phylogenetic tensor as

\begin{array}{lcr} \begin{matrix} P = e^{- λ} exp (\sum_{\emptyset \neq e \subseteq [n - 1]} α_{e} K^{(e)}) \cdot δ^{n - 1} π, \end{matrix} \end{array}

where $λ = \sum_{\emptyset \neq e \subseteq [n - 1]} α_{e}$ .

We index matrix and tensor indices by using $i, j, k = 0, 1 \in ℤ_{2}$ and allow multiplication $×$ in the ring of integers $ℤ$. The Hadamard matrix then has matrix elements ${[h]}_{i}^{j} = {(- 1)}^{i \times j}$, where $j$ is the row index and $i$ is the column index. Observe that in the diagonal basis, the permutation matrix has elements

\begin{array}{lcr} \begin{matrix} {[\hat{K}]}_{i}^{j} = δ_{ij} {(- 1)}^{i} . \end{matrix} \end{array}

Thus we have expressions such as

\begin{array}{lcr} \begin{matrix} {[{\hat{K}}^{({2, 3})}]}_{i_{1} i_{2} i_{3}}^{j_{1} j_{2} j_{3}} = δ_{i_{1} j_{1}} δ_{i_{2} j_{2}} δ_{i_{3} j_{3}} {(- 1)}^{i_{2} + i_{3}}, \end{matrix} \end{array}

where ${\hat{K}}^{({2, 3})} = 1 \otimes \hat{K} \otimes \hat{K}$ .

As we are dealing with tensors of arbitrary size, it is convenient to represent a string such as $i_{1} i_{2} \dots i_{n}$ as an ordered bipartition $μ = μ_{0} : μ_{1}$ of the set $[n]$, where $μ_{0}, μ_{1} \subseteq [n]$ with $j \in μ_{k}$ if and only if $i_{j} = k$. For example, we have the following equivalences:

\begin{array}{lcr} \begin{matrix} 00110 & \equiv \{1, 2, 5\} : \{3, 4\}, 01111 \equiv \{1\} : \{2, 3, 4, 5\}, \\ 10001 & \equiv \{2, 3, 4\} : \{1, 5\} \end{matrix} \end{array}

and inequivalence:

\begin{array}{lcr} \begin{matrix} 01010 \equiv \{1, 3, 5\} : \{2, 4\} \neq \{2, 4\} : \{1, 3, 5\} \equiv 10101 . \end{matrix} \end{array}
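The correspondence between index strings and ordered partitions is purely combinatorial. A minimal Python sketch (the function name `to_ordered_partition` is ours) that handles ordered bipartitions with `r=2` and, with the same code, ordered tripartitions with `r=3`:

```python
def to_ordered_partition(string, r=2):
    """Map i_1 i_2 ... i_n (digits 0..r-1) to the ordered partition mu_0 : ... : mu_{r-1},
    where taxon j (counted from 1) lies in mu_k iff i_j = k."""
    parts = [set() for _ in range(r)]
    for j, digit in enumerate(string, start=1):
        parts[int(digit)].add(j)
    return tuple(frozenset(p) for p in parts)
```

Returning a tuple keeps the parts ordered, so that `"01010"` and `"10101"` map to different objects even though they describe the same split.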

We then have

\begin{array}{lcr} \begin{matrix} {[{\hat{K}}^{(e)}]}_{i_{1} i_{2} \dots i_{n}}^{j_{1} j_{2} \dots j_{n}} & = {[{\hat{K}}^{(e)}]}_{μ}^{ν} = {[{\hat{K}}^{(e)}]}_{μ_{0} : μ_{1}}^{ν_{0} : ν_{1}} \\ = δ_{μ_{0} ν_{0}} δ_{μ_{1} ν_{1}} {(- 1)}^{| e \cap μ_{1} |} . \end{matrix} \end{array}

Defining $h^{(n)} := h^{(n - 1)} \otimes h$, where $h^{(1)} := h$, we write the phylogenetic tensor in the diagonal basis as $\hat{P} := h^{(n)} \cdot P$. In our notation, $h^{(n)}$ has tensor components

\begin{array}{lcr} \begin{matrix} {[h^{(n)}]}_{μ}^{ν} = {[h^{(n)}]}_{μ_{0} : μ_{1}}^{ν_{0} : ν_{1}} & = {[h^{(n)}]}_{i_{1} i_{2} \dots i_{n}}^{j_{1} j_{2} \dots j_{n}} \\ = {(- 1)}^{i_{1} \times j_{1} + i_{2} \times j_{2} + \dots + i_{n} \times j_{n}} \\ = {(- 1)}^{| μ_{1} \cap ν_{1} |} . \end{matrix} \end{array}

The zero edge-length star-tree initial distribution has tensor components

\begin{array}{lcr} \begin{matrix} {[δ^{n - 1} π]}_{i_{1} i_{2} \dots i_{n}} = \frac{1}{2} δ_{i_{1} i_{2}} δ_{i_{1} i_{3}} \dots δ_{i_{1} i_{n}}, \end{matrix} \end{array}

(where, although it seems we have given preference to taxon 1 in this expression, there are many ways that this distribution can be expressed using the $δ_{ij}$). In the diagonal basis, with $\hat{δ^{n - 1} π} := h^{(n)} \cdot δ^{n - 1} π$, we have components

\begin{array}{lcr} \begin{matrix} {[\hat{δ^{n - 1} π}]}_{i_{1} i_{2} \dots i_{n}} \\ = \frac{1}{2} \sum_{j_{1}, j_{2}, \dots, j_{n}} {(- 1)}^{i_{1} \times j_{1} + i_{2} \times j_{2} + \dots + i_{n} \times j_{n}} δ_{j_{1} j_{2}} δ_{j_{1} j_{3}} \dots δ_{j_{1} j_{n}} \\ = \frac{1}{2} \sum_{j_{1}} {(- 1)}^{(i_{1} + i_{2} + \dots + i_{n}) \times j_{1}} = \frac{1}{2} (1 + {(- 1)}^{i_{1} + i_{2} + \dots + i_{n}}), \end{matrix} \end{array}

which is exactly the statement

\begin{array}{lcr} \begin{matrix} {[\hat{δ^{n - 1} π}]}_{μ} = {[\hat{δ^{n - 1} π}]}_{μ_{0} : μ_{1}} = \frac{1}{2} (1 + {(- 1)}^{| μ_{1} |}) . \end{matrix} \end{array}
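For a concrete check of this formula, the sketch below builds $δ^{n-1}π$ for $n = 4$ (an arbitrary choice) as a flat vector in the Kronecker ordering, with taxon 1 most significant, so that the number of 1-digits in the flat index equals $|μ_{1}|$, and then applies $h^{(n)}$:

```python
from functools import reduce

import numpy as np

n = 4
h = np.array([[1.0, 1.0], [1.0, -1.0]])
h_n = reduce(np.kron, [h] * n)               # h^{(n)} = h ⊗ ... ⊗ h

star = np.zeros(2 ** n)
star[0] = star[-1] = 0.5                     # strings 00...0 and 11...1, each with probability 1/2
star_hat = h_n @ star

# Predicted components (1/2)(1 + (-1)^{|mu_1|}), with |mu_1| = number of 1s in the index string.
predicted = np.array([0.5 * (1 + (-1) ** bin(idx).count("1")) for idx in range(2 ** n)])
```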

Since $\hat{K}$ is diagonal in the transformed basis, we can conclude that

\begin{array}{lcr} \begin{matrix} {[\hat{P}]}_{μ} & = {[\hat{P}]}_{μ_{0} : μ_{1}} \\ = e^{- λ} exp (\sum_{\emptyset \neq e \subseteq [n - 1]} α_{e} {[{\hat{K}}^{(e)}]}_{μ_{0} : μ_{1}}^{μ_{0} : μ_{1}}) \frac{1}{2} (1 + {(- 1)}^{| μ_{1} |}) . \end{matrix} \end{array}

Half of these tensor components, namely those with $|μ_{1}|$ odd, are zero, and we would like to ignore them.

Take $u = u_{0} : u_{1}$ as an ordered bipartition of the reduced set $[n - 1]$, so that $u \equiv i_{1} i_{2} \dots i_{n - 1}$ where $j \in u_{k}$ if and only if $i_{j} = k$, and define

\begin{array}{lcr} \begin{matrix} γ (u) & = \{\begin{array}{l} 0, if | u_{1} | is even, \\ 1, if | u_{1} | is odd; \end{array} \\ = 2 - (0 | u_{0} | + 1 | u_{1} |) (mod 2), \end{matrix} \end{array}

and interpret $u \cdot γ(u)$ as the string $i_{1} i_{2} \dots i_{n - 1} γ(u)$. By construction, $μ = u \cdot γ(u)$ has $|μ_{1}|$ even, so the star-tree factor $\frac{1}{2}(1 + (-1)^{|μ_{1}|})$ equals 1, and each nonzero component of $\hat{P}$ arises from exactly one $u$.

If we make the definitions

\begin{array}{lcr} \begin{matrix} P_{u} : = {[\hat{P}]}_{u \cdot γ (u)}, η_{u} : = \sum_{\emptyset \neq e \subseteq [n - 1]} α_{e} {[{\hat{K}}^{(e)}]}_{u \cdot γ (u)}^{u \cdot γ (u)}, \end{matrix} \end{array}

then we can write the non-zero components as

\begin{array}{lcr} \begin{matrix} P_{u} = e^{- λ} exp (η_{u}), \end{matrix} \end{array}

with inverses

\begin{array}{lcr} \begin{matrix} η_{u} = ln (P_{u}) + λ. \end{matrix} \end{array}

(6)

This is the first part of the inversion.

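We would like to go further and actually recover the individual edge weights $α_{e}$. To do this we define the (square) $2^{n - 1} \times 2^{n - 1}$ matrix $F$ with components

\begin{array}{lcr} \begin{matrix} {[F]}_{u}^{e} : = {[{\hat{K}}^{(e)}]}_{u \cdot γ (u)}^{u \cdot γ (u)} = {(- 1)}^{| e \cap u_{1} |} = {[h^{(n - 1)}]}_{u}^{e}, \end{matrix} \end{array}

with $e$ a subset and $u$ an ordered bipartition of $[n - 1]$ (on the right-hand side, $e$ is identified with the ordered bipartition $([n - 1] \setminus e) : e$). As ${(h^{(n - 1)})}^{2} = 2^{n - 1} 1$, we see that, up to the factor $2^{n - 1}$, $F$ is its own inverse, with $F^{-1}$ having components

\begin{array}{lcr} \begin{matrix} {[F^{- 1}]}_{e}^{u} : = \frac{1}{2^{n - 1}} {[F]}_{u}^{e} . \end{matrix} \end{array}

Defining the column vectors $\vec{α} = \{α_{e}\}$ and $\vec{η} = \{η_{u}\}$, indexed by all subsets $e \subseteq [n - 1]$ (with the convention $α_{\emptyset} := 0$) and all ordered bipartitions $u$ of $[n - 1]$, we can write the matrix equations

\begin{array}{lcr} \begin{matrix} \vec{η} = F \vec{α}, \vec{α} = F^{- 1} \vec{η} . \end{matrix} \end{array}

Together with the first part of the inversion (6), these equations give a one-one map between pattern probabilities and edge weights for the binary-symmetric model. The quantity $λ$ appearing in (6) need not be known in advance: since $F^{-1}$ maps the vector with all entries equal to 1 to the indicator vector of $e = \emptyset$, and $α_{\emptyset} = 0$, the $e = \emptyset$ component of $F^{-1} \{\ln(P_{u})\}$ equals $-λ$.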
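Putting the pieces together, here is a minimal end-to-end sketch of the binary-symmetric inversion: generate $P$ from edge weights via (5), transform to $\hat{P} = h^{(n)} P$, read off $P_{u} = [\hat{P}]_{u \cdot γ(u)}$, and apply $F^{-1} = 2^{-(n-1)} F$ to $\{\ln P_{u}\}$. The forward step uses $\exp(a K^{(e)}) = \cosh(a)\, 1 + \sinh(a) K^{(e)}$, valid because $(K^{(e)})^{2} = 1$, together with the fact that all the $K^{(e)}$ commute. Rather than assuming $λ$ is known, the code reads $-λ$ off the $e = \emptyset$ component (with $α_{\emptyset} = 0$, $F^{-1}$ sends the all-ones vector to the indicator of $e = \emptyset$). Subsets are encoded as bitmasks (taxon $k$ corresponds to bit $k-1$); the encoding, the helper names and the edge weights are illustrative assumptions, and the weights are deliberately not tree-compatible.

```python
from functools import reduce

import numpy as np

h = np.array([[1.0, 1.0], [1.0, -1.0]])   # Hadamard matrix
K = np.array([[0.0, 1.0], [1.0, 0.0]])    # generator of Z_2
I2 = np.eye(2)


def K_of(e, n):
    """K^{(e)}: K in tensor slot k if bit k-1 of the bitmask e is set, identity otherwise."""
    return reduce(np.kron, [K if (e >> (k - 1)) & 1 else I2 for k in range(1, n + 1)])


def forward(alpha, n):
    """Eq. (5) for Z_2; alpha maps nonempty bitmasks e of [n-1] to edge weights alpha_e."""
    P = np.zeros(2 ** n)
    P[0] = P[-1] = 0.5                      # delta^{n-1} pi with pi uniform
    for e, a in alpha.items():              # the K^{(e)} commute, so edge order is irrelevant
        P = (np.cosh(a) * np.eye(2 ** n) + np.sinh(a) * K_of(e, n)) @ P
    return np.exp(-sum(alpha.values())) * P


def invert(P, n):
    """Recover ({alpha_e : e nonempty}, lambda) from a pattern distribution P."""
    m = n - 1
    P_hat = reduce(np.kron, [h] * n) @ P
    log_P = np.empty(2 ** m)
    for u in range(2 ** m):
        bits = [(u >> (k - 1)) & 1 for k in range(1, n)]          # i_1 ... i_{n-1}
        gamma_u = sum(bits) % 2                                    # makes |mu_1| even
        index = sum(b << (n - k) for k, b in enumerate(bits, start=1)) + gamma_u
        log_P[u] = np.log(P_hat[index])
    F = np.array([[(-1) ** bin(u & e).count("1") for e in range(2 ** m)]
                  for u in range(2 ** m)], dtype=float)
    coeffs = F @ log_P / 2 ** m                                    # F^{-1} = F / 2^{n-1}
    return {e: coeffs[e] for e in range(1, 2 ** m)}, -coeffs[0]


# Hypothetical weights on four taxa, including a split {1,3} that conflicts with 12|34.
alpha = {0b001: 0.10, 0b010: 0.20, 0b100: 0.15, 0b111: 0.05, 0b011: 0.08, 0b101: 0.03}
P = forward(alpha, 4)
alpha_rec, lam_rec = invert(P, 4)
```

Subsets that carry no weight (here $\{2, 3\}$) are recovered as zero, so the same code can be applied to non-tree-like spectra.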

Inversion of the $ℤ_{3}$ model

Taking confidence from the previous case we now discuss the inversion of the group-based phylogenetic model with $G = ℤ_{3}$ . We take

ℤ_{3} = \{0, 1, 2\}_{+ (mod 3)} ≅ 〈 σ | σ^{3} = ε 〉,

and, by analogy to the $ℤ_{2}$ case, index tensors with indices $i, j = 0, 1, 2$ and allow multiplication $×$ by extending $ℤ_{3}$ to the ring $F_{3} = \{0, 1, 2\}_{+, \times (mod 3)}$.

In this case a generic rate matrix is given by

\begin{array}{lcr} \begin{matrix} Q & = (\begin{array}{lll} - (α + β) & β & α \\ α & - (α + β) & β \\ β & α & - (α + β) \end{array}) \\ = - (α + β) 1 + α K_{1} + β K_{2}, \end{matrix} \end{array}

where

\begin{array}{lcr} \begin{matrix} K_{1} = (\begin{array}{lll} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{array}), K_{2} = (\begin{array}{lll} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{array}), \end{matrix} \end{array}

are the matrices representing the permutations σ≅(123) and σ ²≅(132) under the regular representation, respectively.

We define $ω = e^{2πi/3}$, and present the character table of $ℤ_{3}$ in Table 2. The decomposition of the regular representation is $ρ_{reg} = id \oplus ω \oplus ω^{2}$, and the columns of the character table give the projection operators onto the (one-dimensional) irreducible subspaces:

\begin{array}{lcr} \begin{matrix} Θ_{id} : & = \frac{1}{3} (ε + σ + σ^{2}), \\ Θ_{ω} : & = \frac{1}{3} (ε + ωσ + ω^{2} σ^{2}), \\ Θ_{ω^{2}} : & = \frac{1}{3} (ε + ω^{2} σ + ω σ^{2}) . \end{matrix} \end{array}

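Table 2 The character table of $ℤ_{3}$

\begin{array}{c|ccc} & ε & σ & σ^{2} \\ \hline id & 1 & 1 & 1 \\ ω & 1 & ω & ω^{2} \\ ω^{2} & 1 & ω^{2} & ω \end{array}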

Therefore, the matrix

\begin{array}{lcr} \begin{matrix} f = (\begin{array}{lll} 1 & 1 & 1 \\ 1 & ω & ω^{2} \\ 1 & ω^{2} & ω \end{array}), \end{matrix} \end{array}

diagonalizes the generic rate matrix for this model:

\begin{array}{lcr} \begin{matrix} \hat{Q} = fQ f^{- 1} = (\begin{array}{lll} 0 & 0 & 0 \\ 0 & - (α + β) + αω + β ω^{2} & 0 \\ 0 & 0 & - (α + β) + α ω^{2} + βω \end{array}), \end{matrix} \end{array}

or, equivalently,

\begin{array}{lcr} \begin{matrix} {\hat{K}}_{1} & = {fK}_{1} f^{- 1} = (\begin{array}{lll} 1 & 0 & 0 \\ 0 & ω & 0 \\ 0 & 0 & ω^{2} \end{array}), \\ {\hat{K}}_{2} & = {fK}_{2} f^{- 1} = (\begin{array}{lll} 1 & 0 & 0 \\ 0 & ω^{2} & 0 \\ 0 & 0 & ω \end{array}) . \end{matrix} \end{array}
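The diagonalisation can be confirmed numerically. The following NumPy sketch, with arbitrarily chosen rates, checks $\hat{K}_{1}$, $\hat{K}_{2}$ and $\hat{Q}$, including the contribution of the $-(α + β) 1$ term to the diagonal of $\hat{Q}$; the variable names are ours.

```python
import numpy as np

w = np.exp(2j * np.pi / 3)                                          # omega
f = np.array([[w ** (i * j) for i in range(3)] for j in range(3)])  # [f]_i^j = omega^{i j}
K1 = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)
K2 = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
f_inv = np.linalg.inv(f)                                            # equals conj(f) / 3

alpha, beta = 0.3, 0.7                                              # hypothetical rates
Q = -(alpha + beta) * np.eye(3) + alpha * K1 + beta * K2
K1_hat, K2_hat, Q_hat = [f @ A @ f_inv for A in (K1, K2, Q)]
```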

We recall our basic result (5) that for group-based models, a generic phylogenetic tensor can be expressed as

\begin{array}{lcr} \begin{matrix} P = e^{- λ} exp (\sum_{\emptyset \neq e \subseteq [n - 1]} (α_{e} K_{1}^{(e)} + β_{e} K_{2}^{(e)})) \cdot δ^{n - 1} π, \end{matrix} \end{array}

where $λ = \sum_{\emptyset \neq e \subseteq [n - 1]} (α_{e} + β_{e})$ . We take the stationary distribution as initial distribution, so $π = {(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})}^{T}$ .

The matrix elements of $f$ can be expressed as ${[f]}_{i}^{j} = ω^{i \times j}$, where we regard $i, j \in ℤ_{3}$ as integers so that $i \times j$ is the ordinary integer product. Similarly,

\begin{array}{lcr} \begin{matrix} {[{\hat{K}}_{1}]}_{i}^{j} = δ_{ij} ω^{i}, {[{\hat{K}}_{2}]}_{i}^{j} = δ_{ij} {(ω^{2})}^{i} . \end{matrix} \end{array}

More generally, tensorial components can be expressed as

\begin{array}{lcr} \begin{matrix} {[1 \otimes \hat{K_{1}} \otimes \hat{K_{1}}]}_{i_{1} i_{2} i_{3}}^{j_{1} j_{2} j_{3}} = δ_{i_{1} j_{1}} δ_{i_{2} j_{2}} δ_{i_{3} j_{3}} ω^{i_{2} + i_{3}} . \end{matrix} \end{array}

We represent a string $i_{1} i_{2} \dots i_{n}$ as an ordered-tripartition, $i_{1} i_{2} \dots i_{n} \equiv μ = μ_{0} : μ_{1} : μ_{2}$, of the set $[n]$, where $j \in μ_{k}$ if and only if $i_{j} = k$. For example, if we take $n = 5$, we have:

\begin{array}{lcr} \begin{matrix} 00000 & \equiv \{1, 2, 3, 4, 5\} :∅:∅, 20120 \equiv \{2, 5\} : \{3\} : \{1, 4\}, \\ 01122 & \equiv \{1\} : \{2, 3\} : \{4, 5\} . \end{matrix} \end{array}
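The correspondence between strings and ordered partitions is purely combinatorial. A minimal sketch of both directions, assuming leaf labels are single digits (so $r \leq 10$) and leaves are numbered from 1 as in the text; the function names are hypothetical:

```python
def string_to_partition(s, r=3):
    """Ordered r-partition mu_0 : ... : mu_{r-1} of [n] encoded by the string
    s = i_1 i_2 ... i_n, with j in mu_k iff i_j = k (leaves numbered from 1)."""
    parts = [[] for _ in range(r)]
    for j, ch in enumerate(s, start=1):
        parts[int(ch)].append(j)
    return parts

def partition_to_string(parts):
    """Inverse map: recover i_1 ... i_n from mu_0 : ... : mu_{r-1}."""
    label = {j: k for k, part in enumerate(parts) for j in part}
    return "".join(str(label[j]) for j in range(1, len(label) + 1))

print(string_to_partition("20120"))                # [[2, 5], [3], [1, 4]]
print(partition_to_string([[1], [2, 3], [4, 5]]))  # 01122
```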

Taking n = 3, we have

\begin{array}{lcr} \begin{matrix} {[{\hat{K}}_{1}^{(\{2, 3\})}]}_{μ}^{ν} = {[1 \otimes {\hat{K}}_{1} \otimes {\hat{K}}_{1}]}_{μ}^{ν} & = {[1 \otimes {\hat{K}}_{1} \otimes {\hat{K}}_{1}]}_{μ_{0} : μ_{1} : μ_{2}}^{ν_{0} : ν_{1} : ν_{2}} \\ = δ_{μν} ω^{| μ_{1} \cap \{2, 3\} | + 2 | μ_{2} \cap \{2, 3\} |}, \end{matrix} \end{array}

and in general:

\begin{array}{lcr} \begin{matrix} {[{\hat{K}}_{1}^{(e)}]}_{μ}^{ν} & = δ_{μν} ω^{| e \cap μ_{1} | + 2 | e \cap μ_{2} |}, \\ {[{\hat{K}}_{2}^{(e)}]}_{μ}^{ν} & = δ_{μν} ω^{| e \cap μ_{2} | + 2 | e \cap μ_{1} |} . \end{matrix} \end{array}

Taking the uniform distribution as initial distribution, the initial star-tree distribution can be written as

\begin{array}{lcr} \begin{matrix} {[δ^{n - 1} π]}_{i_{1} i_{2} \dots i_{n}} = \frac{1}{3} δ_{i_{1} i_{2}} δ_{i_{1} i_{3}} \dots δ_{i_{1} i_{n}} . \end{matrix} \end{array}

Defining $f^{(n)} = f^{(n-1)} \otimes f$, where $f^{(1)} = f$, we have

\begin{array}{lcr} \begin{matrix} {[f^{(n)}]}_{μ}^{ν} & = {[f^{(n)}]}_{i_{1} i_{2} \dots i_{n}}^{j_{1} j_{2} \dots j_{n}} = {[f]}_{i_{1}}^{j_{1}} {[f]}_{i_{2}}^{j_{2}} \dots {[f]}_{i_{n}}^{j_{n}} \\ = ω^{i_{1} \times j_{1} + i_{2} \times j_{2} + \dots + i_{n} \times j_{n}}, \end{matrix} \end{array}

and in the transformed basis, where $\hat{δ^{n - 1} π} : = f^{(n)} \cdot δ^{n - 1} π$ , we have

\begin{array}{lcr} \begin{matrix} {[\hat{δ^{n - 1} π}]}_{i_{1} i_{2} \dots i_{n}} & = \frac{1}{3} \sum_{j_{1}, j_{2}, \dots, j_{n}} ω^{i_{1} \times j_{1} + i_{2} \times j_{2} + \dots + i_{n} \times j_{n}} \\ \times δ_{j_{1} j_{2}} δ_{j_{1} j_{3}} \dots δ_{j_{1} j_{n}} \\ = \frac{1}{3} \sum_{j_{1}} ω^{j_{1} \times (i_{1} + i_{2} + \dots + i_{n})} \\ = \frac{1}{3} (1 + ω^{i_{1} + i_{2} + \dots + i_{n}} + {(ω^{2})}^{i_{1} + i_{2} + \dots + i_{n}}) . \end{matrix} \end{array}

Indexing by ordered-tripartitions, we conclude that

\begin{array}{lcr} \begin{matrix} {[\hat{δ^{n - 1} π}]}_{μ} & = \frac{1}{3} (1 + ω^{i_{1} + i_{2} + \dots + i_{n}} + {(ω^{2})}^{i_{1} + i_{2} + \dots + i_{n}}) \\ = \frac{1}{3} (1 + ω^{| μ_{1} | + 2 | μ_{2} |} + {(ω^{2})}^{| μ_{1} | + 2 | μ_{2} |}) . \end{matrix} \end{array}

Now suppose $| μ_{1} | + 2 | μ_{2} | = 0 \pmod{3}$; then

\begin{array}{lcr} \begin{matrix} {[\hat{δ^{n - 1} π}]}_{μ} = \frac{1}{3} (1 + 1 + 1) = 1 . \end{matrix} \end{array}

If $| μ_{1} | + 2 | μ_{2} | = 1 \pmod{3}$, then

\begin{array}{lcr} \begin{matrix} {[\hat{δ^{n - 1} π}]}_{μ} = \frac{1}{3} (1 + ω + ω^{2}) = 0, \end{matrix} \end{array}

and if $| μ_{1} | + 2 | μ_{2} | = 2 \pmod{3}$, then

\begin{array}{lcr} \begin{matrix} {[\hat{δ^{n - 1} π}]}_{μ} = \frac{1}{3} (1 + ω^{2} + ω) = 0 . \end{matrix} \end{array}
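The three cases can also be confirmed by brute force. The following sketch (plain Python, $n = 3$ chosen only for illustration) evaluates $[f^{(n)} \cdot δ^{n-1} π]_{μ}$ directly from its definition, with $π$ uniform, and compares it with the rule "1 if $|μ_{1}| + 2|μ_{2}| = 0 \pmod 3$, else 0". Here a string $i_{1} \dots i_{n}$ is a tuple, so $|μ_{1}| + 2|μ_{2}|$ is simply the sum of its entries.

```python
import cmath
from itertools import product

# Sketch: brute-force [f^(n) . delta^{n-1} pi]_mu for Z_3 with uniform pi.
w = cmath.exp(2j * cmath.pi / 3)
n = 3
strings = list(product(range(3), repeat=n))   # i_1 i_2 ... i_n as tuples

def star(j):
    # [delta^{n-1} pi]_j = 1/3 if all leaf states agree, else 0
    return 1 / 3 if len(set(j)) == 1 else 0.0

def star_hat(i):
    # [f^(n)]_i^j = w^(i_1 j_1 + ... + i_n j_n)
    return sum(w ** sum(a * b for a, b in zip(i, j)) * star(j) for j in strings)

values = {i: star_hat(i) for i in strings}
nonzero = [i for i in strings if abs(values[i]) > 1e-9]
print(len(nonzero))  # 9 = 3^(n-1) strings with |mu_1| + 2|mu_2| = 0 (mod 3)
```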

Thus we have found a basis in which all the elements of the initial star-tree tensor are zero unless the tripartition $μ$ satisfies $| μ_{1} | + 2 | μ_{2} | = 0 \pmod{3}$. Crucially, this statement also holds for the phylogenetic tensor $\hat{P}$, because in this basis the rate matrices of this model are diagonal:

\begin{array}{lcr} \begin{matrix} {[\hat{P}]}_{μ} & = {[\hat{P}]}_{μ_{0} : μ_{1} : μ_{2}} \\ = e^{- λ} exp (\sum_{\emptyset \neq e \subseteq [n - 1]} {[α_{e} {\hat{K}}_{1}^{(e)} + β_{e} {\hat{K}}_{2}^{(e)}]}_{μ_{0} : μ_{1} : μ_{2}}^{μ_{0} : μ_{1} : μ_{2}}) \\ \times \frac{1}{3} (1 + ω^{| μ_{1} | + 2 | μ_{2} |} + {(ω^{2})}^{| μ_{1} | + 2 | μ_{2} |}) . \end{matrix} \end{array}

We deal with this condition on $μ$ by taking $u = u_{0} : u_{1} : u_{2}$ as an ordered-tripartition of the reduced set $[n-1]$ and setting $μ = u \cdot γ(u)$ (considered as the concatenation of strings), where

\begin{array}{lcr} \begin{matrix} γ (u) & = \{\begin{array}{ll} 0, & \text{if } | u_{1} | + 2 | u_{2} | = 0 \pmod{3}, \\ 1, & \text{if } | u_{1} | + 2 | u_{2} | = 2 \pmod{3}, \\ 2, & \text{if } | u_{1} | + 2 | u_{2} | = 1 \pmod{3}; \end{array} \\ = 3 - (0 | u_{0} | + 1 | u_{1} | + 2 | u_{2} |) (mod 3) . \end{matrix} \end{array}
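Computationally, $γ(u)$ is the residue that makes the completed string admissible. A small sketch, assuming $u$ is stored as a tuple of leaf states so that $0|u_{0}| + 1|u_{1}| + 2|u_{2}|$ is just the sum of its entries (the helper name `gamma` is hypothetical):

```python
from itertools import product

def gamma(u, r=3):
    """Final index gamma(u) = -(0|u_0| + 1|u_1| + ...) mod r, for u = i_1 ... i_{n-1}
    given as a tuple of leaf states (the weighted sum is then just sum(u))."""
    return (-sum(u)) % r

n = 4
completed = [u + (gamma(u),) for u in product(range(3), repeat=n - 1)]
# u -> u.gamma(u) should hit exactly the strings with |mu_1| + 2|mu_2| = 0 (mod 3)
admissible = {mu for mu in product(range(3), repeat=n) if sum(mu) % 3 == 0}
print(len(completed), set(completed) == admissible)  # 27 True
```

Since $u$ is recovered from $u \cdot γ(u)$ by dropping the last entry, this map is a bijection onto the $3^{n-1}$ non-vanishing components.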

If we make the definitions

\begin{array}{lcr} \begin{matrix} P_{u} & : = {[\hat{P}]}_{u \cdot γ (u)}, \\ η_{u} & : = {[\sum_{\emptyset \neq e \subseteq [n - 1]} (α_{e} {\hat{K}}_{1}^{(e)} + β_{e} {\hat{K}}_{2}^{(e)})]}_{u \cdot γ (u)}^{u \cdot γ (u)}, \end{matrix} \end{array}

we then have the first part of the inversion

\begin{array}{lcr} \begin{matrix} P_{u} = e^{- λ} exp (η_{u}), η_{u} = ln (P_{u}) + λ. \end{matrix} \end{array}

(7)

As in the $ℤ_{2}$ case, we would like to use $η_{u}$ to recover the rate parameters $α_{e}, β_{e}$ for all $\emptyset \neq e \subseteq [n-1]$, and thus complete the full inversion for this model. This time it is a little more difficult.

Recall that $μ = μ_{0} : μ_{1} : μ_{2}$ with $μ_{i} \subseteq [n]$, whereas $u = u_{0} : u_{1} : u_{2}$ with $u_{i} \subseteq [n-1]$, and $\emptyset \neq e \subseteq [n-1]$. Considering

\begin{array}{lcr} \begin{matrix} {[{\hat{K}}_{1}^{(e)}]}_{μ}^{μ} & = ω^{| e \cap μ_{1} | + 2 | e \cap μ_{2} |}, \end{matrix} \end{array}

it follows, since $e \subseteq [n-1]$ implies $e \cap μ_{k} = e \cap u_{k}$ whenever $μ = u \cdot γ(u)$, that

\begin{array}{lcr} \begin{matrix} {[{\hat{K}}_{1}^{(e)}]}_{u \cdot γ (u)}^{u \cdot γ (u)} & = ω^{| e \cap u_{1} | + 2 | e \cap u_{2} |}, \end{matrix} \end{array}

and similarly

\begin{array}{lcr} \begin{matrix} {[{\hat{K}}_{2}^{(e)}]}_{u \cdot γ (u)}^{u \cdot γ (u)} = ω^{| e \cap u_{2} | + 2 | e \cap u_{1} |} . \end{matrix} \end{array}

We make the observation that

\begin{array}{lcr} \begin{matrix} {[F_{1}]}_{u}^{e} : = {[f^{(n - 1)}]}_{u_{0} : u_{1} : u_{2}}^{e^{c} :e:∅} = ω^{| u_{1} \cap e | + 2 | u_{2} \cap e |} = {[{\hat{K}}_{1}^{(e)}]}_{u \cdot γ (u)}^{u \cdot γ (u)}, \end{matrix} \end{array}

and

\begin{array}{lcr} \begin{matrix} {[F_{2}]}_{u}^{e} : = {[f^{(n - 1)}]}_{u_{0} : u_{1} : u_{2}}^{e^{c} :∅:e} = ω^{| u_{2} \cap e | + 2 | u_{1} \cap e |} = {[{\hat{K}}_{2}^{(e)}]}_{u \cdot γ (u)}^{u \cdot γ (u)}, \end{matrix} \end{array}

where $F_{1}$ and $F_{2}$ are $3^{n-1} \times 2^{n-1}$ matrices.

Thus we may write

\begin{array}{lcr} \begin{matrix} η_{u} = \sum_{\emptyset \neq e \subseteq [n - 1]} α_{e} {[F_{1}]}_{u}^{e} + β_{e} {[F_{2}]}_{u}^{e} . \end{matrix} \end{array}

Defining the column vectors $\vec{α} = \{α_{e}\}$, $\vec{β} = \{β_{e}\}$ and $\vec{η} = \{η_{u}\}$, we can write

\begin{array}{lcr} \begin{matrix} \vec{η} = F_{1} \vec{α} + F_{2} \vec{β}, \end{matrix} \end{array}

and define two $2^{n-1} \times 3^{n-1}$ matrices $G_{1}$ and $G_{2}$ as

\begin{array}{lcr} \begin{matrix} {[G_{1}]}_{e}^{u} : = {[{f^{- 1}}^{(n - 1)}]}_{e^{c} :e:∅}^{u}, {[G_{2}]}_{e}^{u} : = {[{f^{- 1}}^{(n - 1)}]}_{e^{c} :∅:e}^{u}, \end{matrix} \end{array}

where

\begin{array}{lcr} \begin{matrix} f^{- 1} = \frac{1}{3} (\begin{array}{l} 1 & 1 & 1 \\ 1 & ω^{2} & ω \\ 1 & ω & ω^{2} \end{array}), \end{matrix} \end{array}

with $f f^{-1} = 1$.

Considering that

\begin{array}{lcr} \begin{matrix} \sum_{v} {[{f^{- 1}}^{(n - 1)}]}_{u}^{v} {[f^{(n - 1)}]}_{v}^{w} = δ_{uw}, \end{matrix} \end{array}

for all ordered-tripartitions $u, w$ of $[n-1]$, we have the matrix products

\begin{array}{lcr} \begin{matrix} \begin{array}{l} G_{1} F_{1} = 1, & G_{1} F_{2} = 0, \\ G_{2} F_{2} = 1, & G_{2} F_{1} = 0 . \end{array} \end{matrix} \end{array}

Thus the second part of the inversion for this model is

\begin{array}{lcr} \begin{matrix} \vec{α} = G_{1} \vec{η}, \vec{β} = G_{2} \vec{η} . \end{matrix} \end{array}

Together with (7), these equations give a one-one map between pattern probabilities and edge weights for the group-based model with $G = ℤ_{3}$ .
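The two parts of the inversion can be exercised end to end. The following is a minimal sketch for $ℤ_{3}$ with $n = 3$ in plain Python, not the authors' code: it builds the pattern distribution $P$ of (5) for some hypothetical edge rates, maps it to the diagonal basis, applies (7) and then $G_{1}$, $G_{2}$ to recover the rates. Each factor $\exp(α(K^{(e)} - 1))$ is evaluated by folding its power series with $K^{3} = 1$. The sketch assumes the principal branch of the complex logarithm is the correct one in (7), which holds when the rates are small enough that $|\mathrm{Im}\, η_{u}| < π$; the helpers `shift`, `evolve` and `recover` are illustrative names.

```python
import cmath
import math
from itertools import product

w = cmath.exp(2j * cmath.pi / 3)
n = 3
edges = [frozenset({1}), frozenset({2}), frozenset({1, 2})]   # non-empty e in [n-1]
alpha = {edges[0]: 0.10, edges[1]: 0.05, edges[2]: 0.20}     # hypothetical rates
beta = {edges[0]: 0.02, edges[1]: 0.15, edges[2]: 0.08}
strings = list(product(range(3), repeat=n))

def shift(v, e, m):
    """Apply (K_1^{(e)})^m, using (K_1 x)_i = x_{i-1} on each leaf l in e."""
    return {i: v[tuple((x - m) % 3 if l in e else x for l, x in enumerate(i, start=1))]
            for i in strings}

def evolve(v, e, rate, power):
    """v -> exp(rate (K^{(e)} - 1)) v with K = K_1^power, folding the series via K^3 = 1."""
    c, term = [0.0, 0.0, 0.0], 1.0
    for k in range(60):
        c[k % 3] += term
        term *= rate / (k + 1)
    out = {i: 0.0 for i in strings}
    for m in range(3):
        moved = shift(v, e, power * m)
        for i in strings:
            out[i] += math.exp(-rate) * c[m] * moved[i]
    return out

# Forward model (5): the K^{(e)} commute, so apply the factors one at a time.
P = {i: (1 / 3 if len(set(i)) == 1 else 0.0) for i in strings}
for e in edges:
    P = evolve(evolve(P, e, alpha[e], 1), e, beta[e], 2)

def P_hat(mu):
    """[f^(n) . P]_mu."""
    return sum(w ** sum(a * b for a, b in zip(mu, j)) * P[j] for j in strings)

# First part of the inversion, (7), with mu = u . gamma(u) and the principal log.
us = list(product(range(3), repeat=n - 1))
lam = sum(alpha.values()) + sum(beta.values())
eta = [cmath.log(P_hat(u + ((-sum(u)) % 3,))) + lam for u in us]

# Second part: [G_s]_e^u = conj([F_s]_u^e) / 3^(n-1), [F_s]_u^e = w^(s * sum_{l in e} u_l).
def recover(s):
    return {e: sum((w ** (s * sum(u[l - 1] for l in e))).conjugate() * x
                   for u, x in zip(us, eta)) / 3 ** (n - 1) for e in edges}

alpha_rec, beta_rec = recover(1), recover(2)
```

The recovered values agree with the input rates up to rounding (their imaginary parts are of order $10^{-16}$).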

Inversion of the K3ST model

We now consider the K3ST model [36] which occurs as the group-based model with

\begin{matrix} G & = ℤ_{2} \times ℤ_{2} = {\{(0, 0), (0, 1), (1, 0), (1, 1)\}}_{+ (mod 2)} \\ ≅ 〈 (12) (34), (13) (24) 〉 . \end{matrix}

In this model a generic rate matrix is given by

\begin{array}{lcr} \begin{matrix} Q = - (α + β + γ) 1 + α K_{01} + β K_{10} + γ K_{11}, \end{matrix} \end{array}

where

\begin{array}{lcr} \begin{matrix} K_{01} & = 1 \otimes K = (\begin{array}{l} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{array}), \\ K_{10} & = K \otimes 1 = (\begin{array}{l} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{array}), \\ K_{11} & = K \otimes K = (\begin{array}{l} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{array}) . \end{matrix} \end{array}

(8)

We already know that the 2×2 Hadamard matrix h diagonalizes K, so we see immediately that H=h⊗h diagonalizes this model:

\begin{array}{lcr} \begin{matrix} {\hat{K}}_{01} : & = {HK}_{01} H^{- 1} = 1 \otimes hK h^{- 1} = (\begin{array}{l} 1 & 0 & 0 & 0 \\ 0 & - 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & - 1 \end{array}), \\ {\hat{K}}_{10} : & = {HK}_{10} H^{- 1} = (\begin{array}{l} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & - 1 & 0 \\ 0 & 0 & 0 & - 1 \end{array}), \\ {\hat{K}}_{11} : & = {HK}_{11} H^{- 1} = (\begin{array}{l} 1 & 0 & 0 & 0 \\ 0 & - 1 & 0 & 0 \\ 0 & 0 & - 1 & 0 \\ 0 & 0 & 0 & 1 \end{array}) . \end{matrix} \end{array}

Of course, $H$ is the character table of $ℤ_{2} \times ℤ_{2}$, and the permutation matrices (8), together with $K_{00} := 1$, give the regular representation $ρ_{reg} ≅ id \otimes id ⊕ id \otimes sgn ⊕ sgn \otimes id ⊕ sgn \otimes sgn$, where we recall the basic result that the tensor product of two irreducible representations of a group $G$ gives an irreducible representation of $G \times G$.

Simplifying notation, for this model we index tensors with indices given as pairs: $i, j = 00, 01, 10, 11 \in ℤ_{2} \times ℤ_{2}$; and we express the individual parts using lower case Roman characters. For example, we write $i := ab = 01$, with $a = 0$ and $b = 1$. This gives matrix elements:

\begin{array}{lcr} \begin{matrix} {[{\hat{K}}_{01}]}_{ab}^{cd} & = δ_{ac} δ_{bd} {(- 1)}^{b}, {[{\hat{K}}_{10}]}_{ab}^{cd} = δ_{ac} δ_{bd} {(- 1)}^{a}, \\ {[{\hat{K}}_{11}]}_{ab}^{cd} & = δ_{ac} δ_{bd} {(- 1)}^{a + b}; \end{matrix} \end{array}

and more complicated tensor products such as

\begin{array}{lcr} \begin{matrix} {[{\hat{K}}_{01} \otimes {\hat{K}}_{01} \otimes 1]}_{a_{1} b_{1} a_{2} b_{2} a_{3} b_{3}}^{c_{1} d_{1} c_{2} d_{2} c_{3} d_{3}} \\ = δ_{a_{1} c_{1}} δ_{b_{1} d_{1}} δ_{a_{2} c_{2}} δ_{b_{2} d_{2}} δ_{a_{3} c_{3}} δ_{b_{3} d_{3}} {(- 1)}^{b_{1} + b_{2}} . \end{matrix} \end{array}

Again we interpret strings such as $μ \equiv a_{1} a_{2} \dots a_{n}$ and $ν \equiv b_{1} b_{2} \dots b_{n}$ as ordered-bipartitions $μ = μ_{0} : μ_{1}$ and $ν = ν_{0} : ν_{1}$ of the set $[n]$. We can then write matrix elements of tensor products as

\begin{array}{lcr} \begin{matrix} {[{\hat{K}}_{01}^{(e)}]}_{μ, ν}^{μ^{'}, ν^{'}} & = δ_{μ μ^{'}} δ_{ν ν^{'}} {(- 1)}^{| e \cap ν_{1} |}, \\ {[{\hat{K}}_{10}^{(e)}]}_{μ, ν}^{μ^{'}, ν^{'}} & = δ_{μ μ^{'}} δ_{ν ν^{'}} {(- 1)}^{| e \cap μ_{1} |}, \\ {[{\hat{K}}_{11}^{(e)}]}_{μ, ν}^{μ^{'}, ν^{'}} & = δ_{μ μ^{'}} δ_{ν ν^{'}} {(- 1)}^{| e \cap μ_{1} | + | e \cap ν_{1} |} . \end{matrix} \end{array}

Taking the stationary distribution $π = \frac{1}{4} {(1, 1, 1, 1)}^{T}$ as initial distribution, the zero edge-length star-tree distribution is given by

\begin{array}{lcr} \begin{matrix} {[δ^{n - 1} π]}_{i_{1} i_{2} \dots i_{n}} = \frac{1}{4} δ_{i_{1} i_{2}} δ_{i_{1} i_{3}} \dots δ_{i_{1} i_{n}}, \end{matrix} \end{array}

which in the finer index representation is

\begin{array}{lcr} \begin{matrix} {[δ^{n - 1} π]}_{a_{1} b_{1} a_{2} b_{2} \dots a_{n} b_{n}} \\ = \frac{1}{4} δ_{a_{1} a_{2}} δ_{a_{1} a_{3}} \dots δ_{a_{1} a_{n}} δ_{b_{1} b_{2}} δ_{b_{1} b_{3}} \dots δ_{b_{1} b_{n}} . \end{matrix} \end{array}

Recall that elements of the Hadamard matrix can be written as ${[h]}_{b}^{a} = {(- 1)}^{a \times b}$, where $a, b \in ℤ_{2}$ are regarded as integers so that $a \times b$ is the ordinary integer product. In the transformed basis, we have

\begin{array}{lcr} \begin{matrix} {[\hat{δ^{n - 1} π}]}_{a_{1} b_{1} a_{2} b_{2} \dots a_{n} b_{n}} & = {[\hat{δ^{n - 1} π}]}_{μ, ν} \\ = \frac{1}{4} \sum_{c_{1}, c_{2}, \dots, c_{n}}^{d_{1}, d_{2}, \dots, d_{n}} {[h]}_{c_{1}}^{a_{1}} {[h]}_{c_{2}}^{a_{2}} \dots {[h]}_{c_{n}}^{a_{n}} {[h]}_{d_{1}}^{b_{1}} {[h]}_{d_{2}}^{b_{2}} \dots \\ \times {[h]}_{d_{n}}^{b_{n}} δ_{c_{1} c_{2}} δ_{c_{1} c_{3}} \dots δ_{c_{1} c_{n}} δ_{d_{1} d_{2}} δ_{d_{1} d_{3}} \dots δ_{d_{1} d_{n}} \\ = \frac{1}{4} \sum_{c_{1}, d_{1}} {(- 1)}^{(a_{1} + a_{2} + \dots + a_{n}) \times c_{1} + (b_{1} + b_{2} + \dots + b_{n}) \times d_{1}} \\ = \frac{1}{4} (1 + {(- 1)}^{a_{1} + \dots + a_{n}} + {(- 1)}^{b_{1} + \dots + b_{n}} \\ + {(- 1)}^{a_{1} + \dots + a_{n} + b_{1} + \dots + b_{n}}) \\ = \{\begin{array}{ll} 0, & \text{if either } | μ_{1} | \text{ or } | ν_{1} | \text{ is odd}; \\ 1, & \text{if } | μ_{1} | \text{ and } | ν_{1} | \text{ are both even} . \end{array} \end{matrix} \end{array}
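The parity rule can be checked by brute force. In this sketch (plain Python, $n = 3$ for illustration) each leaf index is a pair $ab \in ℤ_{2} \times ℤ_{2}$, the transform uses $[h]_{c}^{a} = (-1)^{a c}$ on both components, and $|μ_{1}|$, $|ν_{1}|$ are the numbers of leaves with $a = 1$ and $b = 1$, respectively.

```python
from itertools import product

# Sketch: brute-force [H^(n) . delta^{n-1} pi] for K3ST with uniform pi on 4 states.
n = 3
pairs = list(product(range(2), repeat=2))   # leaf states ab
strings = list(product(pairs, repeat=n))    # (a_1 b_1, a_2 b_2, ..., a_n b_n)

def star(idx):
    # mass 1/4 on each constant string
    return 0.25 if len(set(idx)) == 1 else 0.0

def star_hat(idx):
    total = 0.0
    for jdx in strings:
        exponent = sum(a * c + b * d for (a, b), (c, d) in zip(idx, jdx))
        total += (-1) ** exponent * star(jdx)
    return total

nonzero = sum(1 for idx in strings if star_hat(idx) != 0)
print(nonzero)  # 16 = 4^(n-1)
```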

We recall (5), so under this model we can express a generic phylogenetic tensor as

\begin{array}{lcr} \begin{matrix} P = e^{- λ} exp (\sum_{\emptyset \neq e \subseteq [n - 1]} α_{e} K_{01}^{(e)} + β_{e} K_{10}^{(e)} + γ_{e} K_{11}^{(e)}) \cdot δ^{n - 1} π. \end{matrix} \end{array}

To exclude the vanishing components we define, for all ordered-bipartitions $u = u_{0} : u_{1}$ of the reduced set $[n-1]$,

\begin{array}{lcr} \begin{matrix} γ (u) & = \{\begin{array}{ll} 0, & \text{if } | u_{1} | \text{ is even}, \\ 1, & \text{if } | u_{1} | \text{ is odd}; \end{array} \\ = 2 - (0 | u_{0} | + 1 | u_{1} |) (mod 2), \end{matrix} \end{array}

and interpret $u \cdot γ(u)$ as the string $u \cdot γ(u) = a_{1} a_{2} \dots a_{n-1} γ(u)$. Then, for each pair $u, v$ of ordered-bipartitions of $[n-1]$, we define

\begin{array}{lcr} \begin{matrix} η_{u, v} : = {[\sum_{\emptyset \neq e \subseteq [n - 1]} (α_{e} {\hat{K}}_{01}^{(e)} + β_{e} {\hat{K}}_{10}^{(e)} + γ_{e} {\hat{K}}_{11}^{(e)})]}_{u \cdot γ (u), v \cdot γ (v)}^{u \cdot γ (u), v \cdot γ (v)}, \end{matrix} \end{array}

and

\begin{array}{lcr} \begin{matrix} P_{u, v} : = {[\hat{P}]}_{u \cdot γ (u), v \cdot γ (v)}. \end{matrix} \end{array}

This gives the inversion

\begin{array}{lcr} \begin{matrix} P_{u, v} = e^{- λ} exp (η_{u, v}), η_{u, v} = λ + ln (P_{u, v}) . \end{matrix} \end{array}

Consider the $4^{n-1} \times 2^{n-1}$ rectangular matrices $F_{01}$, $F_{10}$ and $F_{11}$ with components

\begin{array}{lcr} \begin{matrix} \begin{array}{l} {[F_{01}]}_{u, v}^{e} = {[{\hat{K}}_{01}^{(e)}]}_{u, v}^{u, v} = {(- 1)}^{| e \cap v_{1} |}, \\ {[F_{10}]}_{u, v}^{e} = {[{\hat{K}}_{10}^{(e)}]}_{u, v}^{u, v} = {(- 1)}^{| e \cap u_{1} |}, \\ {[F_{11}]}_{u, v}^{e} = {[{\hat{K}}_{11}^{(e)}]}_{u, v}^{u, v} = {(- 1)}^{| e \cap u_{1} | + | e \cap v_{1} |}; \end{array} \end{matrix} \end{array}

where $e \subseteq [n-1]$ and $u = u_{0} : u_{1}$ and $v = v_{0} : v_{1}$ are ordered-bipartitions of $[n-1]$. If we define the column vector $\vec{η} : = \{η_{u, v}\}$ indexed by pairs of ordered-bipartitions and the column vectors $\vec{α} : = \{α_{e}\}$, $\vec{β} : = \{β_{e}\}$ and $\vec{γ} : = \{γ_{e}\}$ indexed by subsets of $[n-1]$, we then have the matrix equation

\begin{array}{lcr} \begin{matrix} \vec{η} = F_{01} \vec{α} + F_{10} \vec{β} + F_{11} \vec{γ} . \end{matrix} \end{array}

Writing $H^{(n)} = H^{(n-1)} \otimes H$ with $H^{(1)} = H$, we note that

\begin{array}{lcr} \begin{matrix} {[F_{01}]}_{u, v}^{e} & = {[H^{(n - 1)}]}_{u, v}^{\emptyset, e}, {[F_{10}]}_{u, v}^{e} = {[H^{(n - 1)}]}_{u, v}^{e, \emptyset}, \\ {[F_{11}]}_{u, v}^{e} & = {[H^{(n - 1)}]}_{u, v}^{e, e}; \end{matrix} \end{array}

and define the $2^{n-1} \times 4^{n-1}$ rectangular matrices $G_{01}$, $G_{10}$ and $G_{11}$ as

\begin{array}{lcr} \begin{matrix} {[G_{01}]}_{e}^{u, v} & = {[{H^{- 1}}^{(n - 1)}]}_{\emptyset, e}^{u, v}, {[G_{10}]}_{e}^{u, v} = {[{H^{- 1}}^{(n - 1)}]}_{e, \emptyset}^{u, v}, \\ {[G_{11}]}_{e}^{u, v} & = {[{H^{- 1}}^{(n - 1)}]}_{e, e}^{u, v} . \end{matrix} \end{array}

Noting that

\begin{array}{lcr} \begin{matrix} \sum_{w, x} {[{H^{- 1}}^{(n - 1)}]}_{u, v}^{w, x} {[H^{(n - 1)}]}_{w, x}^{y, z} = δ_{u, y} δ_{v, z}, \end{matrix} \end{array}

for all u,v,y,z ordered-bipartitions of [n−1], we then have the matrix identities

\begin{array}{lcr} \begin{matrix} G_{01} F_{01} = 1, G_{10} F_{10} = 1, G_{11} F_{11} = 1, \end{matrix} \end{array}

and

\begin{array}{lcr} \begin{matrix} G_{01} F_{10} & = 0 = G_{01} F_{11} = G_{10} F_{11} = G_{10} F_{01} \\ = G_{11} F_{01} = G_{11} F_{10} . \end{matrix} \end{array}

Writing

\begin{array}{lcr} \begin{matrix} \vec{α} = G_{01} \vec{η}, \vec{β} = G_{10} \vec{η}, \vec{γ} = G_{11} \vec{η}, \end{matrix} \end{array}

completes the inversion for the K3ST model.
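The identities $G_{s} F_{t} = δ_{st} 1$ for K3ST can be checked directly. In this sketch (plain Python, $n = 3$ for illustration) an ordered-bipartition $u = u_{0} : u_{1}$ of $[n-1]$ is stored as a 0/1 string with $u_{1} = \{l : u_{l} = 1\}$, each non-empty $e$ as an indicator vector, and we use $H^{-1} = H/4$, so that $[G_{s}]_{e}^{u,v} = [F_{s}]_{u,v}^{e} / 4^{n-1}$.

```python
from itertools import product

n = 3
m = n - 1
edges = [e for e in product(range(2), repeat=m) if any(e)]   # non-empty e as 0/1 vectors
rows = list(product(product(range(2), repeat=m), repeat=2))   # 4^(n-1) pairs (u, v)

def dot(x, e):
    return sum(a * b for a, b in zip(x, e))   # size of e intersect x_1

F = {
    "01": [[(-1) ** dot(v, e) for e in edges] for u, v in rows],
    "10": [[(-1) ** dot(u, e) for e in edges] for u, v in rows],
    "11": [[(-1) ** (dot(u, e) + dot(v, e)) for e in edges] for u, v in rows],
}
# H^{-1} = H / 4 for H = h (x) h, hence G_s = F_s^T / 4^(n-1).
G = {s: [[F[s][k][c] / 4 ** m for k in range(len(rows))] for c in range(len(edges))]
     for s in F}

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

products = {(s, t): matmul(G[s], F[t]) for s in G for t in F}
```

Every entry of `products[(s, t)]` is the identity matrix when `s == t` and the zero matrix otherwise.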

Inversion of the $ℤ_{r}$ model

We now consider the group-based model for $ℤ_{r} = {\{0, 1, 2, \dots, (r - 1)\}}_{+ (mod r)} ≅ 〈 σ : σ^{r} = e 〉$. For this model the generic rate matrix has the form

\begin{array}{lcr} \begin{matrix} Q = - λ 1 + \sum_{i = 1}^{r - 1} α^{i} K_{σ^{i}}, \end{matrix} \end{array}

where $λ = \sum_{i = 1}^{r - 1} α^{i}$ and

\begin{array}{lcr} \begin{matrix} K_{σ} = (\begin{array}{l} 0 & 0 & \dots & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & \dots & 0 & 0 \\ ⋮ & ⋮ & ⋱ & ⋮ & ⋮ \\ 0 & 0 & \dots & 1 & 0 \end{array}), \end{matrix} \end{array}

so that $K_{σ^{i}} = K_{σ}^{i}$ .

Defining $ω = e^{2πi/r}$, we have $ω^{r} = 1$ and $1 + ω + ω^{2} + \dots + ω^{r-1} = 0$, and ${[f]}_{i}^{j} = ω^{ij}$, where $i, j = 0, 1, 2, \dots, r-1$. Of course, $f$ is the character table of $ℤ_{r}$ and ${[f^{- 1}]}_{j}^{i} = \frac{1}{r} ω^{- ij}$.

Lemma 1.

\begin{array}{lcr} \begin{matrix} \sum_{ν} {[f \otimes f \otimes \dots \otimes f]}_{μ}^{ν} {[f^{- 1} \otimes f^{- 1} \otimes \dots \otimes f^{- 1}]}_{ν}^{μ^{'}} = δ_{μ μ^{'}}, \end{matrix} \end{array}

where $μ, ν, μ^{'}$ are ordered-$r$-partitions of the set $[n]$ defined by the strings $i_{1} i_{2} \dots i_{n}$, $j_{1} j_{2} \dots j_{n}$ and $k_{1} k_{2} \dots k_{n}$, respectively.

Proof.

The result follows directly from $f f^{-1} = 1$ and the definition of the tensor product. Explicitly, however, we have

\begin{array}{lcr} \begin{matrix} \sum_{ν} {[f \otimes f \otimes \dots \otimes f]}_{μ}^{ν} & {[f^{- 1} \otimes f^{- 1} \otimes \dots \otimes f^{- 1}]}_{ν}^{μ^{'}} \\ = \frac{1}{r^{n}} \sum_{0 \leq j_{1}, j_{2}, \dots, j_{n} \leq (r - 1)} \\ \times ω^{i_{1} j_{1} + i_{2} j_{2} + \dots + i_{n} j_{n}} ω^{- (j_{1} k_{1} + j_{2} k_{2} + \dots + j_{n} k_{n})} \\ = \frac{1}{r^{n}} \sum_{0 \leq j_{1}, j_{2}, \dots, j_{n} \leq (r - 1)} \\ \times ω^{j_{1} (i_{1} - k_{1}) + j_{2} (i_{2} - k_{2}) + \dots + j_{n} (i_{n} - k_{n})} \end{matrix} \end{array}

which clearly equals 1 if $i_{ℓ} - k_{ℓ} = 0$ for all $ℓ$, and equals 0 otherwise, since $\sum_{j = 0}^{r - 1} ω^{jd} = 0$ for any $d \neq 0 \pmod{r}$.
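Lemma 1 is easy to confirm numerically for small cases. A sketch with $r = 4$ and $n = 2$ (illustrative sizes only), building $f \otimes f$ and $f^{-1} \otimes f^{-1}$ as explicit Kronecker products in plain Python:

```python
import cmath

r, n = 4, 2
w = cmath.exp(2j * cmath.pi / r)
f = [[w ** (i * j) for j in range(r)] for i in range(r)]
f_inv = [[w ** (-i * j) / r for j in range(r)] for i in range(r)]   # [f^{-1}]_j^i = w^(-ij)/r

def kron(A, B):
    """Kronecker product; rows and columns ordered lexicographically as (i, k), (j, l)."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def tensor_power(M, k):
    out = M
    for _ in range(k - 1):
        out = kron(out, M)
    return out

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

identity_check = matmul(tensor_power(f, n), tensor_power(f_inv, n))
```

`identity_check` is the $r^{n} \times r^{n}$ identity up to rounding error.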

The regular representation contains exactly one copy of every irreducible representation and the irreducible representations of $ℤ_{r}$ are given by the powers of ω:

ρ_{i} : \begin{array}{l} ℤ_{r} \to ℂ \\ σ \mapsto ω^{i} \end{array} .

Thus the change of basis $K_{σ^{i}} \mapsto {\hat{K}}_{σ^{i}} = {fK}_{σ^{i}} f^{- 1}$ will give diagonal matrices ${\hat{K}}_{σ^{i}}$ . Additionally,

Lemma 2.

In the diagonal basis, the matrices ${\hat{K}}_{σ^{i}} : = {fK}_{σ^{i}} f^{- 1}$ have matrix elements given by ${[{\hat{K}}_{σ^{s}}]}_{i}^{j} = ω^{is} δ_{ij}$ .

Proof.

Consider the matrix elements ${[K_{σ^{s}}]}_{i}^{j} = δ_{i σ^{s} (j)}$ . Thus

\begin{array}{lcr} \begin{matrix} {[{fK}_{σ^{s}} f^{- 1}]}_{i}^{j} & = \frac{1}{r} \sum_{k, l} ω^{ik} δ_{k σ^{s} (l)} ω^{- lj} = \frac{1}{r} \sum_{l} ω^{i σ^{s} (l) - lj} \\ = \frac{1}{r} \sum_{l} ω^{i (l + s) - lj} \\ = ω^{is} \frac{1}{r} \sum_{l} ω^{l (i - j)} = ω^{is} δ_{ij}, \end{matrix} \end{array}

where we have used $ω^{σ^{s} (m)} = ω^{m + s}$ .
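A direct numerical check of Lemma 2, sketched for $r = 5$ (an arbitrary choice). Here `K(s)` builds the regular-representation matrix of $σ^{s}$ from $[K_{σ^{s}}]_{i}^{j} = δ_{i σ^{s}(j)}$ with $σ^{s}(j) = j + s \pmod r$; for $s = 1$ its first row is $(0, \dots, 0, 1)$, as in the matrix $K_{σ}$ above.

```python
import cmath

r = 5
w = cmath.exp(2j * cmath.pi / r)
f = [[w ** (i * j) for j in range(r)] for i in range(r)]
f_inv = [[w ** (-i * j) / r for j in range(r)] for i in range(r)]

def K(s):
    """Matrix of sigma^s in the regular representation: [K]_i^j = delta_{i, j + s mod r}."""
    return [[1 if i == (j + s) % r else 0 for j in range(r)] for i in range(r)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

K_hat = {s: matmul(matmul(f, K(s)), f_inv) for s in range(r)}
```

Each `K_hat[s]` is diagonal with entries $ω^{is}$, $i = 0, \dots, r-1$, up to rounding.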

Now

\begin{array}{lcr} \begin{matrix} {[δ^{n - 1} π]}_{i_{1} i_{2} \dots i_{n}} = \frac{1}{r} δ_{i_{1} i_{2}} δ_{i_{1} i_{3}} \dots δ_{i_{1} i_{n}}, \end{matrix} \end{array}

and

\begin{array}{lcr} \begin{matrix} {[\hat{δ^{n - 1} π}]}_{i_{1} i_{2} \dots i_{n}} & = \frac{1}{r} \sum_{j_{1}, j_{2}, \dots, j_{n}} ω^{i_{1} j_{1} + i_{2} j_{2} + \dots + i_{n} j_{n}} \\ \times δ_{j_{1} j_{2}} δ_{j_{1} j_{3}} \dots δ_{j_{1} j_{n}} \\ = \frac{1}{r} \sum_{j_{1}} ω^{j_{1} (i_{1} + i_{2} + \dots + i_{n})} \\ = \{\begin{array}{ll} 1, & \text{if } i_{1} + i_{2} + \dots + i_{n} = 0 \pmod{r}, \\ 0, & \text{otherwise.} \end{array} \end{matrix} \end{array}

Translating this result using the ordered-r-partitions for indices, we have

Lemma 3.

In the diagonal basis, the uniform initial distribution on the star tree has components

\begin{array}{lcr} \begin{matrix} {[\hat{δ^{n - 1} π}]}_{μ} \\ = \{\begin{array}{ll} 1, & \text{if } 0 | μ_{0} | + 1 | μ_{1} | + 2 | μ_{2} | + \dots + (r - 1) | μ_{r - 1} | = 0 \pmod{r}, \\ 0, & \text{otherwise,} \end{array} \end{matrix} \end{array}

where $μ = μ_{0} : μ_{1} : μ_{2} : \dots : μ_{r-1}$ is an ordered-$r$-partition of the set $[n]$.

Again recall that for this model a generic phylogenetic tensor can be written as

\begin{array}{lcr} \begin{matrix} P = e^{- λ} exp (\sum_{\emptyset \neq e \subseteq [n - 1], s \in [r - 1]} α_{e}^{s} K_{σ^{s}}^{(e)}) δ^{n - 1} π, \end{matrix} \end{array}

where $π = \frac{1}{r} {(1, 1, \dots, 1)}^{T}$. In the diagonal basis, $\hat{P} : = f^{(n)} \cdot P$, and, as a consequence of Lemma 3, $\hat{P}$ will have many vanishing components. To avoid these we take $u = u_{0} : u_{1} : u_{2} : \dots : u_{r-1}$ as an ordered-$r$-partition of $[n-1]$ and set

\begin{array}{lcr} \begin{matrix} γ (u) & = r - (0 | u_{0} | + 1 | u_{1} | + 2 | u_{2} | + \dots \\ + (r - 1) | u_{r - 1} |) (mod r) . \end{matrix} \end{array}

If we define $P_{u} : = {[\hat{P}]}_{u \cdot γ (u)}$ and

\begin{array}{lcr} \begin{matrix} η_{u} : = {[\sum_{\emptyset \neq e \subseteq [n - 1], s \in [r - 1]} α_{e}^{s} {\hat{K}}_{σ^{s}}^{(e)}]}_{u \cdot γ (u)}^{u \cdot γ (u)}, \end{matrix} \end{array}

we then have the first part of the inversion for the $ℤ_{r}$ model:

\begin{array}{lcr} \begin{matrix} P_{u} & = e^{- λ} exp (η_{u}), \\ η_{u} & = ln (P_{u}) + λ. \end{matrix} \end{array}

For each $i \in [r-1]$, we define the column vectors ${\vec{α}}_{i} : = {\{α_{e}^{i}\}}_{\emptyset \neq e \subseteq [n - 1]}$, and, for each $\emptyset \neq e \subseteq [n-1]$ and $u$ an ordered-$r$-partition of $[n-1]$, we define the rectangular $r^{n-1} \times 2^{n-1}$ matrices

\begin{array}{lcr} \begin{matrix} \begin{array}{l} {[F_{1}]}_{u}^{e} & : = {[{\hat{K}}_{σ}^{(e)}]}_{u \cdot γ (u)}^{u \cdot γ (u)}, & {[F_{2}]}_{u}^{e} : = {[{\hat{K}}_{σ^{2}}^{(e)}]}_{u \cdot γ (u)}^{u \cdot γ (u)}, \dots \\ {[F_{r - 1}]}_{u}^{e} & : = {[{\hat{K}}_{σ^{r - 1}}^{(e)}]}_{u \cdot γ (u)}^{u \cdot γ (u)}, \end{array} \end{matrix} \end{array}

so we have the vector equation

\begin{array}{lcr} \begin{matrix} \vec{η} = F_{1} \vec{α_{1}} + F_{2} \vec{α_{2}} + \dots + F_{r - 1} {\vec{α}}_{r - 1} . \end{matrix} \end{array}

We claim that

Lemma 4.

\begin{array}{lcr} \begin{matrix} {[F_{1}]}_{u}^{e} & = {[f^{(n - 1)}]}_{u}^{e^{c} :e:∅:∅: \dots :∅}, \\ {[F_{2}]}_{u}^{e} & = {[f^{(n - 1)}]}_{u}^{e^{c} :∅:e:∅: \dots :∅}, \dots \\ \dots {[F_{r - 1}]}_{u}^{e} & = {[f^{(n - 1)}]}_{u}^{e^{c} :∅:∅:∅: \dots :e} . \end{matrix} \end{array}

Proof.

We recall that ${[{\hat{K}}_{σ^{s}}]}_{i}^{j} = ω^{is} δ_{ij}$, so, for $μ = μ_{0} : μ_{1} : μ_{2} : \dots : μ_{r-1}$ an ordered-$r$-partition of $[n]$ and $e$ a subset of $[n-1]$, we have

\begin{array}{lcr} \begin{matrix} {[{\hat{K}}_{σ^{s}}^{(e)}]}_{μ}^{μ} = ω^{s (0 | μ_{0} \cap e | + 1 | μ_{1} \cap e | + \dots + (r - 1) | μ_{r - 1} \cap e |)}, \end{matrix} \end{array}

so

\begin{array}{lcr} \begin{matrix} {[{\hat{K}}_{σ^{s}}^{(e)}]}_{u \cdot γ (u)}^{u \cdot γ (u)} = ω^{s (0 | u_{0} \cap e | + 1 | u_{1} \cap e | + \dots + (r - 1) | u_{r - 1} \cap e |)}, \end{matrix} \end{array}

because e⊆[n−1]. On the other hand ${[f]}_{i}^{j} = ω^{ij}$ , so

\begin{array}{lcr} \begin{matrix} {[f^{(n - 1)}]}_{u}^{e^{c} :∅: \dots :∅:e:∅: \dots :∅} = ω^{s (0 | u_{0} \cap e | + 1 | u_{1} \cap e | + \dots + (r - 1) | u_{r - 1} \cap e |)}, \end{matrix} \end{array}

where $e$ appears in the $s$th position.

Define, for $i \in [r-1]$, the rectangular $2^{n-1} \times r^{n-1}$ matrices

\begin{array}{lcr} \begin{matrix} {[G_{1}]}_{e}^{u} : & = {[{f^{- 1}}^{(n - 1)}]}_{e^{c} :e:∅:∅: \dots :∅}^{u}, \\ {[G_{2}]}_{e}^{u} : & = {[{f^{- 1}}^{(n - 1)}]}_{e^{c} :∅:e:∅: \dots :∅}^{u}, \\ ⋮ \\ {[G_{r - 1}]}_{e}^{u} : & = {[{f^{- 1}}^{(n - 1)}]}_{e^{c} :∅:∅:∅: \dots :e}^{u} . \end{matrix} \end{array}

By Lemma 1 together with Lemma 4, $G_{i} F_{j} = δ_{ij} 1$, so we now have the second part of the inversion:

\begin{array}{lcr} \begin{matrix} \vec{α_{i}} = G_{i} \vec{η}. \end{matrix} \end{array}

Inversion of any abelian group-based model

Lemma 5.

Any finite abelian group $G$ is isomorphic to a direct product of cyclic groups of prime-power order, i.e., $G ≅ ℤ_{r_{1}} \times ℤ_{r_{2}} \times \dots \times ℤ_{r_{q}}$, where each $r_{i} = p_{i}^{n_{i}}$ with $p_{i}$ prime and $n_{i}$ a positive integer.

Lemma 6.

The group-based model arising from $G$ depends on $G$ only up to group isomorphism.

Proof.

A generic rate matrix for the group-based model arising from $G$ is given by

\begin{array}{lcr} \begin{matrix} Q = - λ 1 + \sum_{e \neq σ \in G} α^{σ} K_{σ} . \end{matrix} \end{array}

Under a group isomorphism $ϕ : G \to G^{'}$, we have $ϕ(σ_{i} σ_{j}) = ϕ(σ_{i}) ϕ(σ_{j})$.

Recall (2): the matrix elements ${[K_{σ}]}_{i}^{j}$ are set via the action $σ_{i} \mapsto σ σ_{i} = σ_{j}$. If we consider the regular representation of $G^{'}$, with its elements ordered as $ϕ(σ_{1}), ϕ(σ_{2}), \dots$, we then have ${[K_{ϕ(σ)}]}_{i}^{j}$ defined by $ϕ(σ_{i}) \mapsto ϕ(σ) ϕ(σ_{i})$. Now $ϕ(σ) ϕ(σ_{i}) = ϕ(σ σ_{i}) = ϕ(σ_{j})$ and, because $ϕ$ is a group isomorphism, this occurs if and only if $σ σ_{i} = σ_{j}$. Thus ${[K_{ϕ(σ)}]}_{i}^{j} = {[K_{σ}]}_{i}^{j}$ for all $i$ and $j$.

This means that we can restrict attention to a single representative in the isomorphism class of $G$. For this purpose we choose the representative guaranteed by Lemma 5.

Thus, for any finite abelian group $G$ with generators $σ_{1}, σ_{2}, \dots, σ_{q}$, the corresponding group-based model has rate generators given by

\begin{array}{lcr} \begin{matrix} L_{σ} = - 1 + K_{σ_{1}^{m_{1}}} \otimes K_{σ_{2}^{m_{2}}} \otimes \dots \otimes K_{σ_{q}^{m_{q}}}, \end{matrix} \end{array}

for all $e \neq σ = (σ_{1}^{m_{1}}, σ_{2}^{m_{2}}, \dots, σ_{q}^{m_{q}}) \in G$ , where $K_{σ_{i}}$ is the permutation matrix representing the generator $σ_{i} \in ℤ_{r_{i}}$ . The character table f of G is simply the tensor product of the individual character tables of the $ℤ_{r_{i}}$ :

\begin{array}{lcr} \begin{matrix} f = f_{1} \otimes f_{2} \otimes \dots \otimes f_{q} . \end{matrix} \end{array}

In the diagonal basis we have matrix elements

\begin{array}{lcr} \begin{matrix} {[f_{k} K_{σ_{k}^{s}} f_{k}^{- 1}]}_{i}^{j} = {[{\hat{K}}_{σ_{k}^{s}}]}_{i}^{j} = {(ω_{k})}^{is} δ_{ij}, \end{matrix} \end{array}

where $ω_{k} = e^{2πi/r_{k}}$ is a primitive $r_{k}$th root of unity. Thus

\begin{array}{lcr} \begin{matrix} {[{\hat{K}}_{σ_{1}^{m_{1}}} \otimes {\hat{K}}_{σ_{2}^{m_{2}}} \otimes \dots \otimes {\hat{K}}_{σ_{q}^{m_{q}}}]}_{i_{1} i_{2} \dots i_{q}}^{j_{1} j_{2} \dots j_{q}} \\ = δ_{i_{1} j_{1}} δ_{i_{2} j_{2}} \dots δ_{i_{q} j_{q}} {(ω_{1})}^{i_{1} m_{1}} {(ω_{2})}^{i_{2} m_{2}} \dots {(ω_{q})}^{i_{q} m_{q}} . \end{matrix} \end{array}

We write phylogenetic tensors for this model in the form

P_{i_{11} i_{12} \dots i_{1 n}, i_{21} i_{22} \dots i_{2 n} \dots \dots i_{q 1} i_{q 2} \dots i_{qn}},

where $0 \leq i_{sj} \leq r_{s} - 1$ for all $1 \leq s \leq q$ and $1 \leq j \leq n$. We simplify notation by writing each group of indices as $μ^{(s)} := i_{s1} i_{s2} \dots i_{sn}$, where $μ^{(s)}$ is an ordered-$r_{s}$-partition of $[n]$.

Lemma 7.

In the diagonal basis, the uniform initial distribution on the star tree has components

\begin{array}{lcr} \begin{matrix} {[\hat{δ^{n - 1} π}]}_{μ^{(1)} μ^{(2)} \dots μ^{(q)}} \\ = \{\begin{array}{ll} 1, & \text{if } 0 | μ_{0}^{(i)} | + 1 | μ_{1}^{(i)} | + \dots + (r_{i} - 1) | μ_{r_{i} - 1}^{(i)} | = 0 \pmod{r_{i}} \; \forall i; \\ 0, & \text{otherwise.} \end{array} \end{matrix} \end{array}

A generic phylogenetic tensor for this model can be expressed as

\begin{array}{lcr} \begin{matrix} P = e^{- λ} exp (\sum_{\emptyset \neq e \subseteq [n - 1]}^{(s_{1}, \dots, s_{q}) \neq (0, \dots, 0)} α_{e}^{s_{1} s_{2} \dots s_{q}} K_{σ_{1}^{s_{1}}}^{(e)} \otimes K_{σ_{2}^{s_{2}}}^{(e)} \otimes \dots \otimes K_{σ_{q}^{s_{q}}}^{(e)}) \cdot δ^{n - 1} π, \end{matrix} \end{array}

where $π$ is the uniform distribution on $r_{1} r_{2} \cdots r_{q}$ states, i.e.

π = {(\prod_{i = 1}^{q} r_{i})}^{- 1} {(1, 1, \dots, 1)}^{T} .

In the diagonal basis $\hat{P} = {(f_{1} \otimes f_{2} \otimes \dots \otimes f_{q})}^{(n)} \cdot P$, and, as a consequence of the previous lemma, $\hat{P}$ has many vanishing components. To avoid these, for each $i \in [q]$ we take $u^{(i)} = u_{0}^{(i)} : u_{1}^{(i)} : u_{2}^{(i)} : \dots : u_{r_{i} - 1}^{(i)}$ as an ordered-$r_{i}$-partition of $[n-1]$ and set

\begin{array}{lcr} \begin{matrix} γ_{i} (u^{(i)}) & = r_{i} - (0 | u_{0}^{(i)} | + 1 | u_{1}^{(i)} | + 2 | u_{2}^{(i)} | + \dots \\ + (r_{i} - 1) | u_{r_{i} - 1}^{(i)} |) (mod r_{i}) . \end{matrix} \end{array}

We then define

\begin{array}{lcr} \begin{matrix} P_{u^{(1)} u^{(2)} \dots u^{(q)}} : = {[\hat{P}]}_{u^{(1)} \cdot γ_{1} (u^{(1)}) u^{(2)} \cdot γ_{2} (u^{(2)}) \dots u^{(q)} \cdot γ_{q} (u^{(q)})}, \end{matrix} \end{array}

and

\begin{array}{lcr} \begin{matrix} η_{u^{(1)} \dots u^{(q)}} \\ : = {[\sum_{\emptyset \neq e \subseteq [n - 1]}^{(s_{1}, \dots, s_{q}) \neq (0, \dots, 0)} α_{e}^{s_{1} \dots s_{q}} {\hat{K}}_{σ_{1}^{s_{1}}}^{(e)} \otimes \dots \otimes {\hat{K}}_{σ_{q}^{s_{q}}}^{(e)}]}_{u^{(1)} \cdot γ_{1} (u^{(1)}) \dots u^{(q)} \cdot γ_{q} (u^{(q)})}^{u^{(1)} \cdot γ_{1} (u^{(1)}) \dots u^{(q)} \cdot γ_{q} (u^{(q)})}, \end{matrix} \end{array}

so that we have the first part of the inversion

\begin{array}{lcr} \begin{matrix} P_{u^{(1)} u^{(2)} \dots u^{(q)}} & = e^{- λ} exp (η_{u^{(1)} u^{(2)} \dots u^{(q)}}), \\ η_{u^{(1)} u^{(2)} \dots u^{(q)}} & = λ + ln (P_{u^{(1)} u^{(2)} \dots u^{(q)}}) . \end{matrix} \end{array}

We define the column vectors ${\vec{α}}^{s_{1} s_{2} \dots s_{q}} : = {\{α_{e}^{s_{1} s_{2} \dots s_{q}}\}}_{\emptyset \neq e \subseteq [n - 1]}$ and $\vec{η} : = \{η_{u^{(1)} u^{(2)} \dots u^{(q)}}\}$, where $u^{(i)}$ is an ordered-$r_{i}$-partition of $[n-1]$, and we define the $(r_{1} r_{2} \dots r_{q})^{n-1} \times 2^{n-1}$ matrices

\begin{array}{lcr} \begin{matrix} {[F_{s_{1} s_{2} \dots s_{q}}]}_{u^{(1)} u^{(2)} \dots u^{(q)}}^{e} \\ : = {[{\hat{K}}_{σ_{1}^{s_{1}}}^{(e)}]}_{u^{(1)} \cdot γ_{1} (u^{(1)})}^{u^{(1)} \cdot γ_{1} (u^{(1)})} {[{\hat{K}}_{σ_{2}^{s_{2}}}^{(e)}]}_{u^{(2)} \cdot γ_{2} (u^{(2)})}^{u^{(2)} \cdot γ_{2} (u^{(2)})} \dots {[{\hat{K}}_{σ_{q}^{s_{q}}}^{(e)}]}_{u^{(q)} \cdot γ_{q} (u^{(q)})}^{u^{(q)} \cdot γ_{q} (u^{(q)})} \\ = {[f_{1}^{(n - 1)}]}_{u^{(1)}}^{e^{c} :∅: \dots :∅:e:∅: \dots :∅} {[f_{2}^{(n - 1)}]}_{u^{(2)}}^{e^{c} :∅: \dots :∅:e:∅: \dots :∅} \dots \\ \dots {[f_{q}^{(n - 1)}]}_{u^{(q)}}^{e^{c} :∅: \dots :∅:e:∅: \dots :∅}, \end{matrix} \end{array}

where in each term e appears in the $s_{i}^{th}$ position and the equality follows from Lemma 4.

We can then write the vector equation

\begin{array}{lcr} \begin{matrix} \vec{η} = \sum_{(s_{1}, s_{2}, \dots, s_{q}) \neq (0, \dots, 0), \, 0 \leq s_{i} \leq r_{i} - 1} F_{s_{1} s_{2} \dots s_{q}} {\vec{α}}^{s_{1} s_{2} \dots s_{q}} . \end{matrix} \end{array}

If we define the $2^{n-1} \times (r_{1} r_{2} \dots r_{q})^{n-1}$ matrices

\begin{array}{lcr} \begin{matrix} {[G_{s_{1} s_{2} \dots s_{q}}]}_{e}^{u^{(1)} u^{(2)} \dots u^{(q)}} \\ = {[{f_{1}^{- 1}}^{(n - 1)}]}_{u^{(1)}}^{e^{c} :∅: \dots :∅:e:∅: \dots :∅} {[{f_{2}^{- 1}}^{(n - 1)}]}_{u^{(2)}}^{e^{c} :∅: \dots :∅:e:∅: \dots :∅} \dots \\ \dots {[{f_{q}^{- 1}}^{(n - 1)}]}_{u^{(q)}}^{e^{c} :∅: \dots :∅:e:∅: \dots :∅}, \end{matrix} \end{array}

where in each term e appears in the $s_{i}^{th}$ position, we have the orthogonality relations

\begin{array}{lcr} \begin{matrix} G_{s_{1} s_{2} \dots s_{q}} F_{s_{1}^{'} s_{2}^{'} \dots s_{q}^{'}} = δ_{s_{1} s_{1}^{'}} δ_{s_{2} s_{2}^{'}} \dots δ_{s_{q} s_{q}^{'}} 1 . \end{matrix} \end{array}

This gives us the second part of the inversion of any group-based model:

\begin{array}{lcr} \begin{matrix} {\vec{α}}^{s_{1} s_{2} \dots s_{q}} = G_{s_{1} s_{2} \dots s_{q}} \vec{η} . \end{matrix} \end{array}
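The linear part of the general inversion can be sketched generically. The following plain-Python illustration (our own construction, not the authors' code) takes `moduli = (r_1, ..., r_q)`, represents each row index as one string $u^{(i)} \in ℤ_{r_{i}}^{n-1}$ per factor and each non-empty $e$ as an indicator vector, and uses $[F_{s}]_{u}^{e} = \prod_{i} ω_{i}^{s_{i} \sum_{l \in e} u^{(i)}_{l}}$ with $G_{s} = \bar{F}_{s}^{T} / (r_{1} \cdots r_{q})^{n-1}$. Labels $s$ run over all non-identity tuples, so for K3ST they are $01, 10, 11$. It checks only $\vec{η} \mapsto \vec{α}$, not the logarithm step.

```python
import cmath
from itertools import product

def inversion_matrices(moduli, n):
    """Sketch for G = Z_{r_1} x ... x Z_{r_q}: return F_s, G_s for every label s != 0."""
    m = n - 1
    omegas = [cmath.exp(2j * cmath.pi / r) for r in moduli]
    rows = list(product(*[list(product(range(r), repeat=m)) for r in moduli]))
    edges = [e for e in product(range(2), repeat=m) if any(e)]
    labels = [s for s in product(*[range(r) for r in moduli]) if any(s)]
    size = 1
    for r in moduli:
        size *= r ** m                      # (r_1 r_2 ... r_q)^(n-1) rows

    def entry(s, u, e):
        val = 1
        for om, s_i, u_i in zip(omegas, s, u):
            val *= om ** (s_i * sum(a * b for a, b in zip(u_i, e)))
        return val

    F, G = {}, {}
    for s in labels:
        F[s] = [[entry(s, u, e) for e in edges] for u in rows]
        G[s] = [[F[s][k][c].conjugate() / size for k in range(len(rows))]
                for c in range(len(edges))]
    return F, G, edges, labels

def forward(F, alpha):
    """eta = sum_s F_s alpha^s."""
    n_rows = len(next(iter(F.values())))
    return [sum(F[s][k][c] * alpha[s][c] for s in F for c in range(len(alpha[s])))
            for k in range(n_rows)]

def invert(G, eta):
    """alpha^s = G_s eta."""
    return {s: [sum(g * x for g, x in zip(row, eta)) for row in G[s]] for s in G}

F, G, edges, labels = inversion_matrices((2, 2), n=3)      # the K3ST case
alpha = {s: [0.01 * (1 + k + 3 * j) for k in range(len(edges))] for j, s in enumerate(labels)}
recovered = invert(G, forward(F, alpha))
```

The same functions apply unchanged to, say, `moduli=(3,)` for the $ℤ_{3}$ model.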

Conclusion

In this article we have given an alternative derivation of the inversion of group-based phylogenetic models. Primarily, our method relies on the remarkable intertwining relation between branching events and Markov evolution (4), and the resulting simplified expression of phylogenetic tensors given in (5). From there we took a representation-theoretic approach, concentrating on the structure of tensor indices.

Authors’ information

Jeremy G. Sumner is an ARC Research Fellow, Peter D. Jarvis is an Alexander von Humboldt Fellow and Barbara R. Holland is an ARC Future Fellow.

References

Felsenstein J: Inferring Phylogenies. 2004, Sinauer Associates, Sunderland
Semple C, Steel M: Phylogenetics. 2003, Oxford University Press, Oxford
Hendy MD, Penny D: A framework for the quantitative study of evolutionary trees. Syst Zool. 1989, 38: 297-309. 10.2307/2992396.
Hendy MD: The relationship between simple evolutionary tree models and observable sequence data. Syst Zool. 1989, 38: 310-321. 10.2307/2992397.
Hendy MD, Penny D: Spectral analysis of phylogenetic data. J Class. 1993, 10: 1-20. 10.1007/BF02638451.
Steel M, Hendy M, Székely L, Erdös P: Spectral analysis and a closest tree method for genetic sequences. Appl Math Lett. 1992, 5 (6): 63-67. 10.1016/0893-9659(92)90016-3.
Székely LA, Erdos P, Steel M, Penny D: A Fourier inversion formula for evolutionary trees. Appl Math Lett. 1993, 6 (2): 13-16. 10.1016/0893-9659(93)90004-7.
Hendy MD, Penny D, Steel M: A discrete Fourier analysis for evolutionary trees. Proc Natl Acad Sci. 1994, 91: 3339-3343. 10.1073/pnas.91.8.3339.
Székely LA, Steel MA, Erdős PL: Fourier calculus on evolutionary trees. Adv Appl Math. 1993, 14: 200-216. 10.1006/aama.1993.1001.
Hendy MD, Charleston MA: Hadamard conjugation: a versatile tool for modelling nucleotide sequence evolution. New Zeal J Bot. 1993, 31: 231-237. 10.1080/0028825X.1993.10419500.
Holland BR, Penny D, Hendy MD: Outgroup misplacement and phylogenetic inaccuracy under a molecular clock – a simulation study. Syst Biol. 2003, 52: 229-238. 10.1080/10635150390192771.
Hendy MD: A combinatorial description of the closest tree algorithm for finding evolutionary trees. Discrete Math. 1991, 96: 51-58. 10.1016/0012-365X(91)90469-I.
Lento GM, Hickson RE, Chambers GK, Penny D: Use of spectral analysis to test hypotheses on the origin of pinnipeds. Mol Biol Evol. 1995, 12: 28-52. 10.1093/oxfordjournals.molbev.a040189.
Huber KT, Watson EE, Hendy MD: An algorithm for constructing local regions in a phylogenetic network. Mol Phylogenet Evol. 2001, 19: 1-8. 10.1006/mpev.2000.0891.
Huber KT, Langton M, Penny D, Moulton V, Hendy M: Spectronet: a package for computing spectra and median networks. Appl Bioinform. 2002, 1: 2041-2059.
Schliep KP: Some applications of statistical phylogenetics. Ph.D. thesis. Massey University; 2009.
von Haeseler A, Churchill GA: Network models for sequence evolution. J Mol Evol. 1993, 37: 77-85. 10.1007/BF00170465.
Bryant D: Extending tree models to split networks. Algebraic Statistics for Computational Biology. Edited by: Pachter L, Sturmfels B. 2005, Cambridge University Press, Cambridge, 297-368.
Bryant D: Hadamard phylogenetic methods and the n-taxon process. Bull Math Biol. 2009, 71: 297-309. 10.1007/s11538-008-9364-8.
Matsen FA, Steel M: Phylogenetic mixtures on a single tree can mimic a tree of another topology. Syst Biol. 2007, 56: 767-775. 10.1080/10635150701627304.
Matsen FA, Mossel E, Steel M: Mixed-up trees: the structure of phylogenetic mixtures. Bull Math Biol. 2008, 70: 1115-1139. 10.1007/s11538-007-9293-y.
Griffiths RC, Marjoram P: Ancestral inference from samples of DNA sequences with recombination. J Comput Biol. 1996, 3: 479-502. 10.1089/cmb.1996.3.479.
Griffiths RC, Marjoram P: An ancestral recombination graph. Progress in Population Genetics and Human Evolution, Volume 87 of IMA volumes in mathematics and its applications. 1997, Springer Verlag, Berlin, 257-270.
Jin G, Nakhleh L, Snir S, Tuller T: Maximum likelihood of phylogenetic networks. Bioinformatics. 2006, 21: 2604-2611. 10.1093/bioinformatics/btl452.
Strimmer K, Moulton V: Likelihood analysis of phylogenetic networks using directed graphical models. Mol Biol Evol. 2000, 17: 875-881. 10.1093/oxfordjournals.molbev.a026367.
Strimmer K, Wiuf C, Moulton V: Recombination analysis using directed graphical models. Mol Biol Evol. 2001, 18: 97-99. 10.1093/oxfordjournals.molbev.a003725.
Sumner JG, Holland BR, Jarvis PD: The algebra of the general Markov model on trees and networks. Bull Math Biol. 2012, 74 (4): 858-880. 10.1007/s11538-011-9691-z.
Sumner JG, Charleston MA, Jermiin LS, Jarvis PD: Markov invariants, plethysms, and phylogenetics. J Theor Biol. 2008, 253: 601-615. 10.1016/j.jtbi.2008.04.001.
Sumner JG, Jarvis PD: Markov invariants and the isotropy subgroup of a quartet tree. J Theor Biol. 2009, 258: 302-310. 10.1016/j.jtbi.2009.01.021.
Holland BR, Jarvis PD, Sumner JG: Low-parameter phylogenetic inference under the general Markov model. Syst Biol. 2013, 62: 78-92. 10.1093/sysbio/sys072.
Bashford JD, Jarvis PD, Sumner JG, Steel MA: U(1)×U(1)×U(1) symmetry of the Kimura 3ST model and phylogenetic branching processes. J Phys A Math Gen. 2004, 37: L1-L9. 10.1088/0305-4470/37/8/L01.
Sumner JG, Fernández-Sánchez J, Jarvis PD: Lie Markov models. J Theor Biol. 2012, 298: 16-31. 10.1016/j.jtbi.2011.12.017.
Sagan BE: The Symmetric Group: Representations, Combinatorial Algorithms, and Symmetric Functions. 2001, Springer, New York
Evans SN, Speed T: Invariants of some probability models used in phylogenetic inference. Ann Stat. 1993, 21: 355-377. 10.1214/aos/1176349030.
Sumner JG, Jarvis PD: Entanglement invariants and phylogenetic branching. J Math Biol. 2005, 51: 18-36. 10.1007/s00285-004-0309-z.
Kimura M: Estimation of evolutionary distances between homologous nucleotide sequences. Proc Natl Acad Sci. 1981, 78: 1454-1458.

Acknowledgements

This research was supported by Australian Research Council Discovery Grants DP0877447 (JGS and PDJ) and FT100100031 (BRH).

Author information

Authors and Affiliations

School of Physical Sciences, University of Tasmania, Hobart, TAS, 7001, Australia
Jeremy G Sumner, Peter D Jarvis & Barbara R Holland

Corresponding author

Correspondence to Jeremy G Sumner.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JGS conducted much of the theoretical work presented, and was primarily responsible for drafting of the paper. PDJ contributed to the theoretical work and assisted in editing of the paper. BRH contributed to the theoretical work and contributed substantially to the writing of the paper. All authors read and approved the final manuscript.

Rights and permissions

This article is published under an open access license. Please check the 'Copyright Information' section either on this page or in the PDF for details of this license and what re-use is permitted. If your intended use exceeds what is permitted by the license or if you are unable to locate the licence and re-use information, please contact the Rights and Permissions team.

About this article

Cite this article

Sumner, J.G., Jarvis, P.D. & Holland, B.R. A tensorial approach to the inversion of group-based phylogenetic models. BMC Evol Biol 14, 236 (2014). https://doi.org/10.1186/s12862-014-0236-6

Received: 15 April 2014
Accepted: 06 November 2014
Published: 04 December 2014
DOI: https://doi.org/10.1186/s12862-014-0236-6
