The following theorem is the main result of this paper. It characterizes \((\mathcal {G},L)\)-embeddable transition matrices in terms of their eigenvalues.
Theorem 1
Fix a finite abelian group \(\mathcal {G}\), a finite set \(\mathcal {L}\), and a symmetric \(\mathcal {G}\)-compatible labeling function \(L: \mathcal {G} \rightarrow \mathcal {L}\). Let P be a \((\mathcal {G},L)\)-Markov matrix. Then P is \((\mathcal {G},L)\)-embeddable if and only if the vector \(\lambda \in \mathbb {R}^{\mathcal {G}}\) of eigenvalues of P is in the set
$$\begin{aligned} \{\lambda \in \mathbb {R}^{\mathcal {G}}:&\lambda _0=1, \prod _{h \in \mathcal {G}} \lambda _{h}^{\text {Re}((K)_{g,h})} \ge 1 \text { for all nonzero } g\in \mathcal {G},\\&\lambda _g>0 \text { for all } g \in \mathcal {G}, \text { and } \lambda _g=\lambda _h \text { whenever } L(g)=L(h)\}. \end{aligned}$$
Proof
We start by summarizing the idea of the proof. We consider the set \(\Psi _{\mathcal {G},L}\) that consists of vectors \(\psi \) that determine \((\mathcal {G},L)\)-rate matrices. Our goal is to characterize the set \(\check{F}_{\mathcal {G},L}\) of eigenspectra of Markov matrices that are matrix exponentials of \((\mathcal {G},L)\)-rate matrices determined by vectors \(\psi \) in \(\Psi _{\mathcal {G},L}\). The first step is to consider the discrete Fourier transform of the set \(\Psi _{\mathcal {G},L}\), which we denote by \(\check{\Psi }_{\mathcal {G},L}\). By Lemma 4, this set is the set of eigenvalues of the \((\mathcal {G},L)\)-rate matrices. The second step is to consider the image of the set \(\check{\Psi }_{\mathcal {G},L}\) under coordinatewise exponentiation. This set is precisely \(\check{F}_{\mathcal {G},L}\): by the discussion after Lemma 4, \((\mathcal {G},L)\)-rate matrices are diagonalized by the discrete Fourier transform matrix K. Thus, if a \((\mathcal {G},L)\)-rate matrix Q is determined by \(\psi \in \mathbb {R}^{\mathcal {G}}\), then
$$\begin{aligned} P=e^Q=K \cdot e^{\text {diag}(\check{\psi })} \cdot K^{-1}=K \cdot \text {diag}(e^{\check{\psi }}) \cdot K^{-1}, \end{aligned}$$
where \(\check{\psi }\) is the vector of eigenvalues of Q and \(e^{\check{\psi }}\) is the vector of eigenvalues of P.
More specifically, let
$$\begin{aligned} \Psi _{\mathcal {G},L}=\{\psi \in \mathbb {R}^{\mathcal {G}}:&\sum _{g \in \mathcal {G}} \psi (g)=0, \psi (g) \ge 0 \text { for all nonzero } g \in \mathcal {G}, \text {and}\\&\psi (g)=\psi (h) \text { whenever } L(g)=L(h)\}. \end{aligned}$$
The vectors in the set \(\Psi _{\mathcal {G},L}\) are in one-to-one correspondence with \((\mathcal {G},L)\)-rate matrices. The image of \(\Psi _{\mathcal {G},L}\) under the discrete Fourier transform is the set
$$\begin{aligned} \check{\Psi }_{\mathcal {G},L}= \{\check{\psi } \in \mathbb {R}^{\mathcal {G}}:&\check{\psi }(0)=0, (K^{-1}\check{\psi })(g)\ge 0 \text { for all nonzero } g\in \mathcal {G}, \text { and}\\ {}&\check{\psi }(g)=\check{\psi }(h) \text { whenever } L(g)=L(h)\}. \end{aligned}$$
By Lemma 4, this set is the set of eigenvalues of the \((\mathcal {G},L)\)-rate matrices.
The image of \(\check{\Psi }_{\mathcal {G},L}\) under coordinatewise exponentiation is the set of eigenvalue vectors of the \((\mathcal {G},L)\)-embeddable Markov matrices, which we denote by \(\check{F}_{\mathcal {G},L}\). We claim that \(\check{F}_{\mathcal {G},L}\) is equal to the set
$$\begin{aligned} \begin{aligned}&\{\check{f} \in \mathbb {R}^{\mathcal {G}}: \check{f}(0)=1, \prod _{h\in \mathcal {G}}(\check{f}(h))^{(K^{-1})_{g,h}}\ge 1 \text { for all nonzero } g\in \mathcal {G},\\&\check{f}(g)>0 \text { for all } g \in \mathcal {G}, \text { and } \check{f}(g)=\check{f}(h) \text { whenever } L(g)=L(h)\}. \end{aligned} \end{aligned}$$
(4.1)
Indeed, let \(\check{f}=\exp (\check{\psi })\). Then \(\check{f}>0\) because the image of the exponentiation map is positive. The inequality \(a^Tx\ge 0\) is equivalent to \(\exp (a^Tx)\ge 1.\) Hence, the equation \(\check{\psi }(0)=0\) gives \(\check{f}(0)=1\) and the inequalities \((K^{-1}\check{\psi })(g)\ge 0\) give
$$\begin{aligned} \prod _{h\in \mathcal {G}}(\check{f}(h))^{(K^{-1})_{g,h}}=\prod _{h\in \mathcal {G}}(e^{(\check{\psi }(h))})^{(K^{-1})_{g,h}}=e^{\sum _{h\in \mathcal {G}}\check{\psi }(h)(K^{-1})_{g,h}} = e^{(K^{-1}\check{\psi })(g)} \ge 1 \end{aligned}$$
(4.2)
for all nonzero \(g\in \mathcal {G}\). Hence \(\check{f}\) is in the set (4.1). Conversely, let \(\check{f}\) be a vector in the set (4.1). Since \(\check{f}>0\), the coordinatewise logarithm \(\log (\check{f})\) is well defined, and reading (4.2) with \(\check{\psi }=\log (\check{f})\) shows that \(\log (\check{f}) \in \check{\Psi }_{\mathcal {G},L}\). Moreover, \(\check{f}=\exp (\log (\check{f}))\). Hence \(\check{f}\) is in the image of \(\check{\Psi }_{\mathcal {G},L}\). Thus \(\check{F}_{\mathcal {G},L}\) is equal to the set (4.1).
It remains to rewrite the inequalities (4.2) as in the statement of the theorem. We have
$$\begin{aligned}&(K^{-1})_{g,-h}=\frac{1}{|\mathcal {G}|}\overline{K_{-h,g}}=\frac{1}{|\mathcal {G}|}\overline{\widehat{-h}(g)}=\frac{1}{|\mathcal {G}|}\overline{\widehat{h}(-g)}\\&=\frac{1}{|\mathcal {G}|}\widehat{h}(g) =\frac{1}{|\mathcal {G}|}K_{h,g} = \overline{(K^{-1})_{g,h}} \end{aligned}$$
for all \(g,h\in \mathcal {G}\). Here we use Lemma 1 and the definition of the discrete Fourier transform matrix. If \(-h=h\), then \((K^{-1})_{g,h}=(K^{-1})_{g,-h}=\overline{(K^{-1})_{g,h}}\), and hence \((K^{-1})_{g,h}=\text {Re}((K^{-1})_{g,h})\). If \(-h \ne h\), then \(\check{f}(h)=\check{f}(-h)\) by Lemma 2. Hence
$$\begin{aligned} {\begin{matrix} (\check{f}(h))^{(K^{-1})_{g,h}} (\check{f}(-h))^{(K^{-1})_{g,-h}} &{}= (\check{f}(h))^{(K^{-1})_{g,h}} (\check{f}(h))^{\overline{(K^{-1})_{g,h}}}\\ &{}=(\check{f}(h))^{2\text {Re}((K^{-1})_{g,h})}\\ &{}=(\check{f}(h))^{\text {Re}((K^{-1})_{g,h})} (\check{f}(-h))^{\text {Re}((K^{-1})_{g,-h})}. \end{matrix}} \end{aligned}$$
(4.3)
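Taking the product of (4.3) over all pairs \(\{h,-h\}\) with \(-h \ne h\), and recalling that the exponents are already real when \(-h=h\), we may replace each exponent \((K^{-1})_{g,h}\) in (4.2) by its real part. We then replace \(K^{-1}\) by \(1/|\mathcal {G}|\cdot \overline{K}\), note that \(\text {Re}(\overline{K})=\text {Re}(K)\), and raise both sides of the resulting inequality to the power \(|\mathcal {G}|\). Finally, making the substitution \(\lambda _{{h}}=\check{f}(h)\) gives the desired characterization.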
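The conditions of Theorem 1 can be checked numerically. The following is a minimal sketch for the cyclic group \(\mathbb {Z}_n\), under two assumptions that are not taken from the text: the discrete Fourier transform matrix is taken to be \(K_{g,h}=e^{2\pi i gh/n}\), so that \(\text {Re}(K_{g,h})=\cos (2\pi gh/n)\) regardless of the sign convention, and, when no labeling is supplied, the labeling that only identifies \(g\) with \(-g\) is used. The function name `cyclic_embeddable` and the tolerance `tol` are illustrative.

```python
import math


def cyclic_embeddable(eigs, labels=None, tol=1e-12):
    """Check the conditions of Theorem 1 for G = Z_n (illustrative sketch).

    eigs[g] is the eigenvalue lambda_g for g = 0, ..., n-1.
    labels[g] is the label L(g); if omitted, we ASSUME the labeling
    that identifies g with -g and nothing else.
    """
    n = len(eigs)
    if labels is None:
        labels = [min(g, (n - g) % n) for g in range(n)]
    # lambda_0 = 1 and lambda_g > 0 for all g
    if abs(eigs[0] - 1.0) > tol or any(x <= 0 for x in eigs):
        return False
    # lambda_g = lambda_h whenever L(g) = L(h)
    for g in range(n):
        for h in range(n):
            if labels[g] == labels[h] and abs(eigs[g] - eigs[h]) > tol:
                return False
    # prod_h lambda_h^{Re K_{g,h}} >= 1, tested after taking logarithms
    for g in range(1, n):
        s = sum(math.cos(2 * math.pi * g * h / n) * math.log(eigs[h])
                for h in range(n))
        if s < -tol:
            return False
    return True


# For Z_4 with eigenvalues (1, x, y, x) the conditions reduce to 1 >= y >= x^2.
print(cyclic_embeddable([1.0, 0.5, 0.3, 0.5]))
print(cyclic_embeddable([1.0, 0.5, 0.2, 0.5]))
```

Working with logarithms turns the product inequalities into linear ones, which mirrors the passage from \(\check{\Psi }_{\mathcal {G},L}\) to the set (4.1) in the proof.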
For \(\mathcal {G}\) cyclic, Theorem 1 has been independently proven by Baake and Sumner in the context of circulant matrices (Baake and Sumner 2020, Theorem 5.7). Moreover, they show that every embeddable circulant matrix is circulant embeddable (Baake and Sumner 2020, Corollary 5.2).
It follows from Lemma 3 that if a \((\mathcal {G},L)\)-Markov matrix P is \((\mathcal {G},L)\)-embeddable, then there exists a unique \((\mathcal {G},L)\)-rate matrix Q such that \(P=\exp (Q)\). Indeed, since both Q and P have real eigenvalues and the eigenvalues of P are the exponentials of the eigenvalues of Q, the eigenvalues of Q are uniquely determined by those of P. Consequently, the \((\mathcal {G},L)\)-rate matrix Q is the principal logarithm of P.
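To illustrate how the unique rate matrix can be recovered, the following sketch computes the logarithm of a K3P matrix through its eigenvalues. It assumes, as a convention not stated in this section, that the group \(\mathbb {Z}_2 \times \mathbb {Z}_2\) is ordered as \((0,0),(0,1),(1,0),(1,1)\), that \(P_{g,h}=f(g+h)\) with \(f=(a,b,c,d)\), and that the rate matrix has the analogous form \(Q_{g,h}=\psi (g+h)\). The helper names `k3p_log` and `expm` and the numerical entries are hypothetical.

```python
import math

# Character table of Z_2 x Z_2 in the ASSUMED order (0,0), (0,1), (1,0), (1,1).
H = [[1, 1, 1, 1],
     [1, -1, 1, -1],
     [1, 1, -1, -1],
     [1, -1, -1, 1]]


def hadamard(v):
    """Multiply the character table H by the vector v."""
    return [sum(H[g][h] * v[h] for h in range(4)) for g in range(4)]


def k3p_log(f):
    """Return psi such that exp(Q) = P for P[i][j] = f[i ^ j], Q[i][j] = psi[i ^ j].

    Returns None if some eigenvalue of P is not positive.
    """
    lam = hadamard(f)                      # eigenvalues of P
    if any(x <= 0 for x in lam):
        return None
    # eigenvalues of Q are log(lam); invert the transform with H^{-1} = H / 4
    return [x / 4 for x in hadamard([math.log(x) for x in lam])]


def expm(Q, terms=40):
    """Truncated Taylor series of the matrix exponential (adequate for small Q)."""
    n = len(Q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[sum(term[i][m] * Q[m][j] for m in range(n)) / k
                 for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


f = [0.7, 0.1, 0.15, 0.05]                 # hypothetical entries (a, b, c, d)
psi = k3p_log(f)
Q = [[psi[i ^ j] for j in range(4)] for i in range(4)]
P = expm(Q)                                # should reproduce P[i][j] = f[i ^ j]
```

In this sketch the matrix P is embeddable in the model exactly when the entries `psi[1]`, `psi[2]`, `psi[3]` are nonnegative, which is the defining condition of \(\Psi _{\mathcal {G},L}\).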
The inequalities \(\lambda _g >0\) in Theorem 1 imply \(\det (P)=\prod \lambda _{g}>0\). Hence the set of \((\mathcal {G},L)\)-embeddable matrices for a symmetric group-based model is a relatively closed subset of a connected component of the complement of \(\det (P)=0\). A relatively closed subset means here a set that can be written as the intersection of a closed subset of \(\mathbb {R}^{\mathcal {G} \times \mathcal {G}}\) and the connected component of the complement of \(\det (P)=0\).
In the rest of the current section and in Sect. 5, we will discuss applications of Theorem 1. We will recover known results about \((\mathcal {G},L)\)-embeddability and, as a novel application, characterize embeddability for three group-based models of hachimoji DNA.
Example 4
The CFN model is the group-based model associated to the group \(\mathbb {Z}_2\). The CFN Markov matrices have the form
$$\begin{aligned} P=\begin{pmatrix} a&{}\quad b\\ b&{}\quad a\\ \end{pmatrix}. \end{aligned}$$
The discrete Fourier transform matrix is
$$\begin{aligned} K=\begin{pmatrix} 1 &{}\quad 1 \\ 1 &{}\quad -\,1 \end{pmatrix}. \end{aligned}$$
The eigenvalues of P are \(\lambda _0=a+b=1\) and \(\lambda _1=a-b\). By Theorem 1, the Markov matrix P is CFN embeddable if and only if \(0 < \lambda _1 \le 1\) or equivalently \(0 < a-b \le 1\). The upper bound holds for every CFN Markov matrix, because \(b \ge 0\) gives \(a-b \le a+b=1\). Since \(\det (P)=(a+b)(a-b)=a-b\) and \(\text {tr}(P)=2a=1+(a-b)\), the condition is equivalent to \(\det (P)>0\), or equivalently \(\text {tr}(P)>1\). The result that a general \(2\times 2\) stochastic matrix is embeddable if and only if \(\det (P)>0\), or equivalently \(\text {tr}(P)>1\), goes back to Kingman (1962, Proposition 2). Hence a CFN Markov matrix P is CFN embeddable if and only if it is embeddable.
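For the CFN model the rate can be written in closed form. The sketch below assumes the rate matrix is parametrized as \(Q=\begin{pmatrix} -\alpha &{} \alpha \\ \alpha &{} -\alpha \end{pmatrix}\), whose eigenvalues are \(0\) and \(-2\alpha \), so that \(\lambda _1=e^{-2\alpha }\). The function names are illustrative.

```python
import math


def cfn_rate(a):
    """Rate alpha with exp([[-alpha, alpha], [alpha, -alpha]]) = [[a, 1-a], [1-a, a]].

    Returns None when lambda_1 = a - b = 2a - 1 does not lie in (0, 1].
    """
    lam1 = 2 * a - 1
    if not (0 < lam1 <= 1):
        return None
    return -math.log(lam1) / 2             # because lambda_1 = exp(-2 * alpha)


def cfn_diagonal(alpha):
    """Diagonal entry a of exp(Q) for the CFN rate matrix with rate alpha."""
    return (1 + math.exp(-2 * alpha)) / 2


alpha = cfn_rate(0.8)                      # hypothetical diagonal entry a = 0.8
print(alpha, cfn_diagonal(alpha))
```

The round trip through `cfn_rate` and `cfn_diagonal` recovers the diagonal entry, and `cfn_rate` returns `None` exactly when \(a \le 1/2\), in line with the condition \(\text {tr}(P)>1\).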
Example 5
Recall that the Kimura 3-parameter model is the group-based model associated to the group \(\mathcal {G}=\mathbb {Z}_2 \times \mathbb {Z}_2\) and a K3P Markov matrix P has the form (2.1). The eigenvalues of P are
$$\begin{aligned}&\lambda _{(0,0)}=a+b+c+d, \quad \lambda _{(0,1)}=a-b+c-d,\\&\lambda _{(1,0)}=a+b-c-d, \quad \lambda _{(1,1)}=a-b-c+d. \end{aligned}$$
By Theorem 1, a Markov matrix P is K3P embeddable if and only if
$$\begin{aligned} \begin{aligned}&\lambda _{(0,0)}=1,\lambda _{(0,1)}>0,\lambda _{(1,0)}>0,\lambda _{(1,1)}>0,\\&\lambda _{(0,1)} \ge \lambda _{(1,0)}\lambda _{(1,1)}, \lambda _{(1,0)} \ge \lambda _{(0,1)}\lambda _{(1,1)}, \lambda _{(1,1)} \ge \lambda _{(0,1)}\lambda _{(1,0)}. \end{aligned} \end{aligned}$$
(4.4)
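The conditions (4.4) translate directly into a test on the entries \(a,b,c,d\). The following sketch uses the eigenvalue formulas given above; the function name `k3p_embeddable`, the tolerance `tol`, and the sample entries are illustrative choices.

```python
def k3p_embeddable(a, b, c, d, tol=1e-12):
    """Test the conditions (4.4) for a K3P Markov matrix with entries a, b, c, d."""
    l00 = a + b + c + d
    l01 = a - b + c - d
    l10 = a + b - c - d
    l11 = a - b - c + d
    # lambda_(0,0) = 1 and strict positivity of the remaining eigenvalues
    if abs(l00 - 1) > tol or min(l01, l10, l11) <= 0:
        return False
    # each eigenvalue dominates the product of the other two
    return (l01 >= l10 * l11 - tol
            and l10 >= l01 * l11 - tol
            and l11 >= l01 * l10 - tol)


print(k3p_embeddable(0.7, 0.1, 0.15, 0.05))   # eigenvalues 1, 0.7, 0.6, 0.5
print(k3p_embeddable(0.55, 0.2, 0.2, 0.05))   # eigenvalues 1, 0.5, 0.5, 0.2
```

The second matrix has positive eigenvalues but violates \(\lambda _{(1,1)} \ge \lambda _{(0,1)}\lambda _{(1,0)}\), so positivity of the eigenvalues alone does not suffice.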
In the Kimura 2-parameter model \(b=c\) and \(\lambda _{(0,1)}=\lambda _{(1,0)}\). We obtain the conditions for K2P embeddability by setting \(\lambda _{(0,1)}=\lambda _{(1,0)}\) in (4.4). Hence a K2P Markov matrix is K2P embeddable if and only if
$$\begin{aligned} \lambda _{(0,0)}=1,\lambda _{(0,1)}>0,1 \ge \lambda _{(1,1)}\ge \lambda _{(0,1)}^2. \end{aligned}$$
In the Jukes–Cantor model \(b=c=d\) and \(\lambda _{(0,1)}=\lambda _{(1,0)}=\lambda _{(1,1)}\). A JC Markov matrix is JC embeddable if and only if
$$\begin{aligned} \lambda _{(0,0)}=1,1 \ge \lambda _{(0,1)}>0. \end{aligned}$$
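The K2P and JC conditions can be tested in the same way. In the sketch below the eigenvalues are obtained from the K3P formulas by setting \(c=b\) (K2P) and \(b=c=d\) (JC), which gives \(\lambda _{(0,1)}=a-d\) and \(\lambda _{(1,1)}=a-2b+d\) for K2P and \(\lambda _{(0,1)}=a-b\) for JC. The function names and the tolerance are illustrative.

```python
def k2p_embeddable(a, b, d, tol=1e-12):
    """K2P test with entries a (diagonal), b = c, and d."""
    l01 = a - d                  # equals lambda_(0,1) = lambda_(1,0) when b = c
    l11 = a - 2 * b + d
    return (abs(a + 2 * b + d - 1) <= tol
            and l01 > 0
            and l01 ** 2 - tol <= l11 <= 1 + tol)


def jc_embeddable(a, b, tol=1e-12):
    """JC test with entries a (diagonal) and b = c = d."""
    lam = a - b                  # the single nontrivial eigenvalue
    return abs(a + 3 * b - 1) <= tol and 0 < lam <= 1 + tol


print(k2p_embeddable(0.7, 0.1, 0.1))
print(k2p_embeddable(0.55, 0.2, 0.05))
print(jc_embeddable(0.7, 0.1))
```

The lower bound \(\lambda _{(1,1)} \ge \lambda _{(0,1)}^2\) is the only K2P condition that can fail for a Markov matrix with positive eigenvalues, since \(\lambda _{(1,1)} \le 1\) follows from \(b \ge 0\).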
The K3P embeddability of a K3P Markov matrix with no repeated eigenvalues is equivalent to the embeddability of the matrix. Similarly, the JC embeddability of a JC Markov matrix is equivalent to the embeddability of the matrix. The same is not true for K2P Markov matrices with exactly two coinciding eigenvalues. See Roca-Lacostena and Fernández-Sánchez (2018, Section 3) for similar computations and further discussion on the model embeddability of K3P, K2P, and JC Markov matrices.
Remark 3
By Kingman (1962, Corollary on page 18), the map from rate matrices to transition matrices is locally homeomorphic except possibly when the rate matrix has a pair of eigenvalues differing by a non-zero multiple of \(2 \pi i\). For symmetric group-based models, rate matrices are real symmetric, so all their eigenvalues are real and hence the map from rate matrices to transition matrices is a homeomorphism. Therefore the boundaries of embeddable transition matrices of symmetric group-based models are images of the boundaries of the rate matrices. For the general Markov model, the boundaries of embeddable transition matrices are characterized in Kingman (1962, Propositions 5 and 6).
Corollary 1
A \((\mathcal {G},L)\)-embeddable transition matrix lies on the boundary of the set of \((\mathcal {G},L)\)-embeddable transition matrices for a symmetric group-based model if and only if it satisfies at least one of the inequalities in Theorem 1 with equality.
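As an illustration of Corollary 1 for the K3P model, the sketch below measures how far each of the three product inequalities in (4.4) is from equality, after taking logarithms. Under the assumed ordering \((0,0),(0,1),(1,0),(1,1)\) of the group elements, these logarithmic slacks equal four times the off-diagonal rates, so a vanishing slack corresponds to a vanishing rate. The function names and the tolerance `tol` are hypothetical, and the eigenvalues are assumed to be positive.

```python
import math


def k3p_slacks(l01, l10, l11):
    """Logarithmic slacks of the three product inequalities in (4.4).

    Assumes l01, l10, l11 > 0. Each slack is nonnegative exactly when
    the corresponding inequality holds.
    """
    x, y, z = math.log(l01), math.log(l10), math.log(l11)
    return (x - y - z, y - x - z, z - x - y)


def k3p_on_boundary(l01, l10, l11, tol=1e-9):
    """True if the matrix is embeddable and some inequality is (numerically) tight."""
    smallest = min(k3p_slacks(l01, l10, l11))
    return -tol <= smallest <= tol


print(k3p_on_boundary(0.5, 0.5, 0.25))   # lambda_(1,1) = lambda_(0,1) * lambda_(1,0)
print(k3p_on_boundary(0.7, 0.6, 0.5))    # all three inequalities are strict
```

A numerical test of equality necessarily depends on the chosen tolerance, so the result should be read as "within `tol` of the boundary" rather than as an exact statement.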