In this section, we prove Lemmas 2.2 and 2.4. The proofs are based on analyzing stochastic difference equations satisfied by the Fourier transform of the proportion vector or matrix.
The Fourier transform for G
We first establish some notation and preliminaries for the Fourier transform. Let \(G^*\) be a complete set of non-trivial irreducible representations of G. In other words, for each \(\rho \in G^*\), we have a finite-dimensional complex vector space \(V_\rho \) such that \(\rho : G \rightarrow GL(V_\rho )\) is a non-trivial irreducible representation, and any non-trivial irreducible representation of G is isomorphic to a unique \(\rho \in G^*\). Moreover, we may equip each \(V_\rho \) with an inner product for which \(\rho \) is unitary.
For a configuration \(\sigma \in {{\mathcal {S}}}\) and for each \(\rho \in G^*\), we consider the matrix acting on \(V_\rho \) given by
$$\begin{aligned} x_\rho (\sigma ) := \sum _{a \in G} \frac{n_a(\sigma )}{n}\rho (a), \end{aligned}$$
so that \(x_\rho (\sigma )\) is the Fourier transform of the proportion vector at the representation \(\rho \). We write \(x(\sigma ):=(x_\rho (\sigma ))_{\rho \in G^*}\).
Let \(\widetilde{V} := \bigoplus _{\rho \in G^*}\mathrm{End}_{{\mathbb {C}}}(V_\rho )\), and write \(d_\rho := \dim _{{\mathbb {C}}}V_\rho \). For an element \(x = (x_\rho )_{\rho \in G^*} \in \widetilde{V}\), we define a norm \(\Vert \cdot \Vert _{\widetilde{V}}\) given by
$$\begin{aligned} \Vert x\Vert _{\widetilde{V}}^2 := \frac{1}{{{\mathcal {Q}}}}\sum _{\rho \in G^*}d_\rho \Vert x_\rho \Vert _{\mathrm{HS}}^2, \end{aligned}$$
where \(\langle A, B \rangle _{\mathrm{HS}} =\mathrm{Tr}\,(A^*B)\) denotes the Hilbert–Schmidt inner product on \(\mathrm{End}_{{\mathbb {C}}}(V_\rho )\) and \(\Vert \cdot \Vert _{\mathrm{HS}}\) denotes the corresponding norm. (Note that \(\langle \cdot , \cdot \rangle _{\mathrm{HS}}\) and \(\Vert \cdot \Vert _{\mathrm{HS}}\) depend on \(\rho \), but for the sake of brevity, we omit the \(\rho \) when there is no danger of confusion.)
The Peter–Weyl theorem [7, Chapter 2] says that
$$\begin{aligned} L^2(G) \cong {{\mathbb {C}}}\oplus \widetilde{V}, \end{aligned}$$
where the isomorphism is given by the Fourier transform. The Plancherel formula then reads
$$\begin{aligned} \Vert x(\sigma )\Vert _{\widetilde{V}}^2 = \left\| \left( \frac{n_a(\sigma )}{n}\right) _{a \in G} - \left( \frac{1}{{{\mathcal {Q}}}}\right) _{a \in G}\right\| ^2. \end{aligned}$$
(6)
Thus, in order to show that \(\sigma \in {{\mathcal {S}}}_*\left( \delta \right) \), it suffices to show that \(\Vert x(\sigma )\Vert _{\widetilde{V}}\) is small. A similar argument may be applied to the proportion matrix instead of the proportion vector.
Finally, for an element \(A \in \mathrm{End}_{{\mathbb {C}}}(V_\rho )\), we will at times also consider the operator norm \(\Vert A\Vert _{op} := \sup _{v \in V_\rho , v \ne 0} \Vert Av\Vert / \Vert v\Vert \). We will also sometimes use the following (equivalent) variational characterization of the operator norm:
$$\begin{aligned} \sup _{\begin{array}{c} X \in \mathrm{End}_{{\mathbb {C}}}(V_\rho ) \\ \Vert X\Vert _{\mathrm{HS}} = 1 \end{array}} \Vert XA\Vert ^2_{\mathrm{HS}}&= \sup _{\begin{array}{c} X \in \mathrm{End}_{{\mathbb {C}}}(V_\rho ) \\ \Vert X\Vert _{\mathrm{HS}} = 1 \end{array}} \mathrm{Tr}\,(XAA^*X^*) = \sup _{\begin{array}{c} X \in \mathrm{End}_{{\mathbb {C}}}(V_\rho ) \\ \Vert X\Vert _{\mathrm{HS}} = 1 \end{array}} \mathrm{Tr}\,(X^*XAA^*) \\&= \sup _{\begin{array}{c} Y \in \mathrm{End}_{{\mathbb {C}}}(V_\rho ) \\ Y \succeq 0, \;\; \mathrm{Tr}\,Y = 1 \end{array}} \langle Y , AA^* \rangle _{\mathrm{HS}} = \Vert AA^*\Vert _{op} = \Vert A\Vert _{op}^2. \end{aligned}$$
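As a numerical sanity check of this characterization (illustrative only, not used in the proofs), one can verify on a random matrix that the supremum over positive semidefinite, trace-one \(Y\) is attained at a rank-one projector onto a top eigenvector of \(AA^*\) and equals \(\Vert A\Vert _{op}^2\):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
# random complex matrix A acting on C^d
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

AAstar = A @ A.conj().T                  # AA*, hermitian positive semidefinite
op_norm_sq = np.linalg.norm(A, 2) ** 2   # ||A||_op^2, the largest squared singular value

# the supremum over PSD trace-one Y of <Y, AA*>_HS is attained at Y = v v*
# with v a top eigenvector of AA*, and equals the largest eigenvalue of AA*
eigvals, eigvecs = np.linalg.eigh(AAstar)  # eigenvalues in ascending order
v = eigvecs[:, -1]
Y = np.outer(v, v.conj())                  # PSD, trace one
inner = np.real(np.trace(Y.conj().T @ AAstar))  # <Y, AA*>_HS = Tr(Y* AA*)

assert np.isclose(inner, op_norm_sq)
assert np.isclose(eigvals[-1], op_norm_sq)
```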
The special case of \(G = {{\mathbb {Z}}}/q\)
On a first reading of this section, the reader may wish to consider everything for the special case of \(G = {{\mathbb {Z}}}/q\) for some integer \(q \ge 2\). In that case, each irreducible representation is one-dimensional, and the representations can be indexed by \(\ell = 0, 1, \ldots , q - 1\), with \(\ell = 0\) corresponding to the trivial representation. The Fourier transform is then particularly simple: the coefficients are the scalar values
$$\begin{aligned} x_\ell (\sigma ) = \sum _{a = 0}^{q - 1} \frac{n_a(\sigma )}{n} \omega ^{a \ell }, \end{aligned}$$
where \(\omega := e^{\frac{2\pi i}{q}}\) is a primitive q-th root of unity.
This special case already illustrates most of the main ideas while simplifying the estimates in some places (e.g. matrix inequalities we use will often be immediately obvious for scalars).
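To make this concrete, the following sketch (illustrative only; the choices of q and of the counts \(n_a\) are arbitrary) computes the coefficients \(x_\ell \) and verifies the Plancherel identity (6), which for \(G = {{\mathbb {Z}}}/q\) reads \(\frac{1}{q}\sum _{\ell = 1}^{q-1} |x_\ell |^2 = \sum _a \left( \frac{n_a}{n} - \frac{1}{q}\right) ^2\):

```python
import numpy as np

q, n = 5, 20
counts = np.array([8, 4, 3, 0, 5])       # n_a(sigma), summing to n
assert counts.sum() == n
p = counts / n                           # proportion vector
omega = np.exp(2j * np.pi / q)           # primitive q-th root of unity

# Fourier coefficients x_ell = sum_a (n_a / n) * omega^(a * ell)
x = np.array([sum(p[a] * omega ** (a * ell) for a in range(q))
              for ell in range(q)])

assert np.isclose(x[0], 1.0)             # trivial representation: x_0 = 1

# Plancherel (6): the non-trivial coefficients measure the squared
# distance of the proportion vector from the uniform distribution
lhs = sum(abs(x[ell]) ** 2 for ell in range(1, q)) / q
rhs = np.sum((p - 1.0 / q) ** 2)
assert np.isclose(lhs, rhs)
```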
A stochastic difference equation for the \(n_a\)
For \(a \in G\), we next analyze the behavior of \(n_a(\sigma _t)\) over time. For convenience, we write \(n_a(t) = n_a(\sigma _t)\). Let \({{\mathcal {F}}}_t\) denote the \(\sigma \)-field generated by the Markov chain \((\sigma _t)_{t \ge 0}\) up to time t. Then, our dynamics satisfy the equation
$$\begin{aligned} {{\mathbb {E}}}[n_a(t+1)-n_a(t) \mid {{\mathcal {F}}}_t] = \sum _{b \in G} \frac{n_{ab^{-1}}(t) n_b(t) }{2n(n-1)}+\sum _{b \in G} \frac{n_{ab}(t) n_b(t)}{2n(n-1)} - \frac{n_a(t)}{n}. \end{aligned}$$
(7)
Note that \(|n_a(t + 1) - n_a(t)| \le 1\) almost surely. Thus, for each \(a \in G\), we can write the above as a stochastic difference equation
$$\begin{aligned} n_a(t+1) - n_a(t)= & {} \sum _{b \in G} \frac{n_{ab^{-1}}(t) n_b(t)}{2n(n-1)}+\sum _{b \in G} \frac{n_{ab}(t) n_b(t)}{2n(n-1)} \nonumber \\&- \frac{n_a(t)}{n} + M_a(t+1), \end{aligned}$$
(8)
where \({{\mathbb {E}}}[M_a(t+1) \mid {{\mathcal {F}}}_t] = 0\) and \(|M_a(t)| \le 2\) almost surely.
It is easiest to analyze this equation through the Fourier transform. Writing \(x_\rho (t) = x_\rho (\sigma _t)\), we calculate from (8) that
$$\begin{aligned} x_\rho (t+1) - x_\rho (t) = \frac{1}{n-1}x_\rho (t) \left( \frac{x_\rho (t) + x_\rho (t)^*}{2} -\frac{n-1}{n}\right) + \widehat{M}_\rho (t+1), \end{aligned}$$
where \(\widehat{M}_\rho (t) := \frac{1}{n}\sum _{a \in G}M_a(t) \rho (a)\). For convenience, write
$$\begin{aligned} X_\rho (t) = \frac{1}{n - 1}\left( \frac{x_\rho (t) + x_\rho (t)^*}{2} - \frac{n-1}{n}\right) , \end{aligned}$$
so that our equation becomes
$$\begin{aligned} x_\rho (t+1) - x_\rho (t) = x_\rho (t) X_\rho (t) + \widehat{M}_\rho (t+1). \end{aligned}$$
(9)
Note that we have
$$\begin{aligned} \Vert x_\rho (t)\Vert _{\mathrm{HS}} \le \sqrt{d_\rho }, \qquad {{\mathbb {E}}}[\widehat{M}_\rho (t+1) \mid {{\mathcal {F}}}_t] = 0, \qquad \text {and}\qquad \Vert \widehat{M}_\rho (t)\Vert _{\mathrm{HS}} \le \frac{2 {{\mathcal {Q}}}\sqrt{d_\rho }}{n}, \end{aligned}$$
and thus,
$$\begin{aligned} \Vert x(t)\Vert _{\widetilde{V}} \le 1 \qquad \text {and}\qquad \Vert \widehat{M}(t)\Vert _{\widetilde{V}}\le \frac{2{{\mathcal {Q}}}}{n}, \end{aligned}$$
where \(\widehat{M}=(\widehat{M}_\rho )_{\rho \in G^*}\).
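As an elementary sanity check on the algebra leading to (9), the following sketch (illustrative only; it fixes arbitrary small values of q and n, and writes the group \({{\mathbb {Z}}}/q\) additively so that \(ab^{-1}\) becomes \(a - b\)) compares the Fourier transform of the drift in (8) with the closed form \(x_\rho (t)X_\rho (t)\):

```python
import numpy as np

q, n = 7, 30
rng = np.random.default_rng(1)
counts = rng.multinomial(n, np.ones(q) / q)   # n_a(t), summing to n
omega = np.exp(2j * np.pi / q)
x = np.array([sum(counts[a] / n * omega ** (a * ell) for a in range(q))
              for ell in range(q)])

# drift of n_a from (8), with the group law written additively
drift = np.array([
    sum(counts[(a - b) % q] * counts[b] for b in range(q)) / (2 * n * (n - 1))
    + sum(counts[(a + b) % q] * counts[b] for b in range(q)) / (2 * n * (n - 1))
    - counts[a] / n
    for a in range(q)
])

# Fourier transform of the drift: (1/n) sum_a drift_a * omega^(a * ell)
drift_hat = np.array([sum(drift[a] * omega ** (a * ell) for a in range(q)) / n
                      for ell in range(q)])

# closed form from (9): x_ell * X_ell, where for scalars
# X_ell = ((Re x_ell) - (n-1)/n) / (n-1)
closed = x * (x.real - (n - 1) / n) / (n - 1)

assert np.allclose(drift_hat, closed)
```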
A general estimate for stochastic difference equations
Before proving Lemma 2.2, we also need a technical lemma for controlling the behavior of stochastic difference equations, which will be used to analyze (9) as well as other similar equations.
Lemma 3.1
Let \((z(t))_{t \ge 0}\) be a sequence of [0, 1]-valued random variables adapted to a filtration \(({{\mathcal {F}}}_t)_{t \ge 0}\). Let \(\varepsilon \in (0, 1)\) be a small constant, and let \(\varphi : {{\mathbb {R}}}^+ \rightarrow (0,1]\) be a non-decreasing function.
Suppose that there are \({{\mathcal {F}}}_t\)-measurable random variables M(t) for which
$$\begin{aligned} z(t+1) - z(t) \le -\varepsilon \varphi (t+1) z(t) + M(t+1) \end{aligned}$$
(10)
and which, for some constant D, satisfy the bounds
$$\begin{aligned} {{\mathbb {E}}}[ M(t+1) \mid {{\mathcal {F}}}_t ] \le D\varepsilon \sqrt{\varepsilon }, \qquad |M(t)| \le D\varepsilon . \end{aligned}$$
Then, for each t and each \(\lambda > 0\), we have
$$\begin{aligned} {{\mathbb {P}}}\left( z(t) \ge \lambda \sqrt{\varepsilon } + e^{- \varepsilon \int _0^t \varphi (s)\,ds} \cdot z(0) \right) \le C_{D,\varphi } e^{-c_{D,\varphi } \lambda ^2} \end{aligned}$$
for constants \(c_{D,\varphi }, C_{D,\varphi }\) depending only on D and \(\varphi \).
Proof
Let us define for integers \(t \ge 1\),
$$\begin{aligned} \Phi (t) := \varepsilon ^{-1} \sum _{k = 1}^t \log \frac{1}{1-\varepsilon \varphi (k)}, \qquad \text {and} \qquad \Phi (0) := 0. \end{aligned}$$
Taking conditional expectations in the inequality relating \(z(t+1)\) to z(t), we have
$$\begin{aligned} {{\mathbb {E}}}[ z(t+1) \mid {{\mathcal {F}}}_t ] \le (1 - \varepsilon \varphi (t+1)) z(t) + D \varepsilon \sqrt{\varepsilon }. \end{aligned}$$
Rearranging and using the fact that \(\varphi (t)\) is non-decreasing, we have
$$\begin{aligned} {{\mathbb {E}}}[ z(t+1) \mid {{\mathcal {F}}}_t ] - \frac{D\sqrt{\varepsilon }}{\varphi (0)}&\le (1 - \varepsilon \varphi (t+1)) z(t) - \frac{D\sqrt{\varepsilon }(1 - \varepsilon \varphi (t+1))}{\varphi (0)} \\&\le (1 - \varepsilon \varphi (t+1)) \left( z(t) - \frac{D\sqrt{\varepsilon }}{\varphi (0)} \right) . \end{aligned}$$
Consequently,
$$\begin{aligned} Z_t := e^{\varepsilon \Phi (t)} \left( z(t) - \frac{D\sqrt{\varepsilon }}{\varphi (0)} \right) \end{aligned}$$
is a supermartingale, and its increments are bounded by
$$\begin{aligned} |Z_{t+1}-Z_t| \le e^{\varepsilon \Phi (t+1)}\left( |M(t+1)|+D \varepsilon \right) \le 2D\varepsilon e^{\varepsilon \Phi (t+1)}. \end{aligned}$$
(11)
Recall that \(\varphi \) is non-decreasing, so that for all \(t \ge s \ge 0\), we have
$$\begin{aligned} \Phi (t) = \Phi (s) + \varepsilon ^{-1} \sum _{k = s + 1}^t \log \frac{1}{1 - \varepsilon \varphi (k)} \ge \Phi (s) + (t - s) \varphi (0). \end{aligned}$$
Using this with (11), we see that the sum of the squares of the first t increments is at most
$$\begin{aligned} \sum _{s = 1}^{t} 4D^2 \varepsilon ^2 e^{2\varepsilon \Phi (s)}&\le 4D^2\varepsilon ^2 \sum _{s = 1}^t e^{2\varepsilon \Phi (t) - 2\varepsilon \varphi (0)(t - s)} \le 4D^2\varepsilon ^2 e^{2\varepsilon \Phi (t)} \cdot \frac{1}{1 - e^{-2\varepsilon \varphi (0)}} \\&\le 4D^2\varepsilon ^2 e^{2\varepsilon \Phi (t)} \cdot \frac{1}{1 - (1 - \frac{1}{2}\varepsilon \varphi (0))} = \frac{8D^2 \varepsilon }{\varphi (0)} \cdot e^{2\varepsilon \Phi (t)}. \end{aligned}$$
By the Azuma–Hoeffding inequality, this yields
$$\begin{aligned} {{\mathbb {P}}}\left( Z_t \ge \lambda \sqrt{\varepsilon } e^{\varepsilon \Phi (t)} + Z_0 \right) \le \exp \left( - \frac{\varphi (0) \lambda ^2 \varepsilon \cdot e^{2\varepsilon \Phi (t)}}{16D^2 \varepsilon \cdot e^{2\varepsilon \Phi (t)}} \right) = \exp \left( -\frac{\varphi (0) \lambda ^2}{16D^2} \right) , \end{aligned}$$
which in turn implies
$$\begin{aligned} {{\mathbb {P}}}\left( z(t) \ge \frac{D\sqrt{\varepsilon }}{\varphi (0)} + e^{-\varepsilon \Phi (t)} z(0) + \lambda \sqrt{\varepsilon } \right) \le \exp \left( -\frac{\varphi (0)\lambda ^2}{16D^2} \right) . \end{aligned}$$
Finally, observe that \(\Phi (t) \ge \sum _{k = 1}^t \varphi (k) \ge \int _0^t \varphi (s)\, ds\), where the first inequality uses \(\log \frac{1}{1-u} \ge u\) and the second uses that \(\varphi \) is non-decreasing. The result then follows upon shifting and rescaling of \(\lambda \). \(\square \)
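The final chain of inequalities, \(\Phi (t) \ge \sum _{k=1}^t \varphi (k) \ge \int _0^t \varphi (s)\,ds\), can be spot-checked numerically; the sketch below is illustrative only, with an arbitrary choice of non-decreasing \(\varphi \) and of \(\varepsilon \):

```python
import numpy as np

eps = 0.01
t = 200

def phi(s):
    # a sample non-decreasing phi, valued in (0, 1] on [0, t]
    return 0.2 + s / 400.0

# Phi(t) = eps^{-1} * sum_{k=1}^t log(1 / (1 - eps * phi(k)))
Phi = sum(np.log(1.0 / (1.0 - eps * phi(k))) for k in range(1, t + 1)) / eps
sum_phi = sum(phi(k) for k in range(1, t + 1))
integral = 0.2 * t + t ** 2 / 800.0   # exact integral of phi over [0, t] (phi is linear here)

# log(1/(1-u)) >= u gives Phi >= sum_phi;
# monotonicity of phi gives sum_phi >= integral (right-endpoint Riemann sum)
assert Phi >= sum_phi >= integral
```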
Proportion vector chain: Proof of Lemma 2.2
We first prove a bound for the Fourier coefficients \(x_\rho (t)\).
Lemma 3.2
Consider any \(\sigma \in {{\mathcal {S}}}_{non}\left( 1/3 \right) \) and any \(\rho \in G^*\). There is a constant \(c_G\) depending only on G such that
$$\begin{aligned} {{\mathbb {P}}}_\sigma \left( \bigcup _{t = 1}^{n^2} \left\{ \Vert x_\rho (t)\Vert _{\mathrm{HS}} \ge \frac{1}{n^{1/8}} + e^{-c_G t/n}\cdot \Vert x_\rho (0)\Vert _{\mathrm{HS}} \right\} \right) \le \frac{1}{n^3} \end{aligned}$$
for all large enough n.
This immediately implies Lemma 2.2.
Proof of Lemma 2.2
With \(c_G\) defined as in Lemma 3.2, take \(C_{G, \delta }\) large enough so that for any \(T \ge C_{G, \delta } n\),
$$\begin{aligned} \frac{1}{n^{1/8}} + e^{-c_G T/n}\sqrt{d_\rho } \le \delta . \end{aligned}$$
Then, Lemma 3.2 and Plancherel’s formula yield
$$\begin{aligned} {{\mathbb {P}}}_\sigma \left( \sigma _T \notin {{\mathcal {S}}}_*\left( \delta \right) \right)&\le {{\mathbb {P}}}_\sigma \left( \Vert x_\rho (T)\Vert _{\mathrm{HS}} \ge \delta \hbox { for some}\ \rho \in G^*\right) \\&\le \frac{{{\mathcal {Q}}}}{n^3} \le \frac{1}{n}, \end{aligned}$$
for large enough n, as desired. \(\square \)
We are now left with proving Lemma 3.2, which relies on the following bound on the operator norm.
Lemma 3.3
There exists a positive constant \(\gamma _G\) depending on G such that for any \(\rho \in G^*\) and any \(\sigma \in {{\mathcal {S}}}_{non}\left( 1/6 \right) \),
$$\begin{aligned} \Vert I_{d_\rho }+X_\rho (\sigma )\Vert _{op} \le 1-\frac{\gamma _G}{n}. \end{aligned}$$
Proof
Let \(\Delta _G\) denote the set of all probability distributions on G, and for \(c \in (0, 1)\), let \(\Delta _G(c) \subset \Delta _G\) denote the set of all probability distributions \(\mu \) such that \(\mu (H) \le 1 - c\) for all proper subgroups \(H \subset G\).
Consider a representation \(\rho \in G^*\), and consider the function \(h : \Delta _G(1/6) \rightarrow \mathrm{End}_{{\mathbb {C}}}(V_\rho )\) given by
$$\begin{aligned} h(\mu ) = \sum _{a \in G} \mu (a) \frac{\rho (a)+\rho (a)^*}{2}. \end{aligned}$$
Then, \(h(\mu )\) is hermitian, and since \(\rho \) is unitary, we clearly have
$$\begin{aligned} \lambda (\mu ):=\max _{v \in V_\rho , \Vert v\Vert =1}\langle h(\mu )v, v\rangle \le 1. \end{aligned}$$
We claim that \(\lambda (\mu ) < 1\) for each \(\mu \in \Delta _G(c)\). Indeed, suppose the contrary. Then, there exists a non-zero vector \(v \in V_\rho \) such that \(\mathrm{Re}\langle \rho (a)v, v \rangle =1\) for all \(a \in G\) with \(\mu (a)>0\). This implies that the support of \(\mu \) is included in the subgroup
$$\begin{aligned} H=\{a \in G \ : \ \rho (a)v=v\}. \end{aligned}$$
Since \(\rho \) is a (non-trivial) irreducible representation, H is a proper subgroup of G, and \(\mu (H)=1\), contradicting the assumption that \(\mu \in \Delta _G(c)\).
Note that \(\mu \mapsto \lambda (\mu )\) is continuous. We may define
$$\begin{aligned} \gamma _\rho :=\max _{\mu \in \Delta _G(1/6)}\lambda (\mu )< 1 \qquad \text {and}\qquad {{\tilde{\gamma }}}_G:=\max _{\rho \in G^*}\gamma _\rho <1. \end{aligned}$$
Then, we have for any \(\sigma \in {{\mathcal {S}}}_{non}\left( 1/6 \right) \),
$$\begin{aligned} \frac{x_\rho (\sigma ) + x_\rho (\sigma )^*}{2} = \sum _{a \in G} \frac{n_a(\sigma )}{n}\frac{\rho (a)+\rho (a)^*}{2} \preceq {{\tilde{\gamma }}}_G I_{d_\rho }. \end{aligned}$$
Taking \(0<\gamma _G < 1-{{\tilde{\gamma }}}_G\), and plugging this into the definition of \(X_\rho \) gives \(X_\rho (\sigma ) \preceq -\frac{\gamma _G}{n}I_{d_\rho }\). Note that \(X_\rho (\sigma ) \succeq -\frac{2}{n-1}I_{d_\rho }\). Combining these together gives the result. \(\square \)
Remark 3.4
A much more direct approach is possible in the case \(G = {{\mathbb {Z}}}/q\). Fix \(\ell \ne 0\), and let \(H_\ell := \{a \ : \ \omega ^{a\ell } = 1\}\), which is a proper subgroup of \({{\mathbb {Z}}}/q\). The condition \(\sigma \in {{\mathcal {S}}}_{non}\left( 1/6 \right) \) implies that \(\sum _{a \in H_\ell } \frac{n_a(\sigma )}{n} \le \frac{5}{6}\), while for each \(a \notin H_\ell \), the number \(\omega ^{a\ell }\) is a non-trivial q-th root of unity, so that \(\mathrm{Re}\,\omega ^{a \ell } \le \cos \frac{2\pi }{q}\). Combining these observations, we have
$$\begin{aligned} \mathrm{Re}\,x_\ell (\sigma ) = \mathrm{Re}\sum _{a = 0}^{q - 1} \frac{n_a(\sigma )}{n} \omega ^{a \ell } \le \frac{5}{6} + \frac{1}{6} \cos \frac{2\pi }{q} < 1 - \gamma _G \end{aligned}$$
for some positive \(\gamma _G\). Some rearranging of equations then yields the desired result.
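This bound can be spot-checked numerically. The sketch below is illustrative only; it takes \(q = 12\) (whose proper subgroups are the sets of multiples of each divisor \(d > 1\) of 12), samples random distributions, and keeps those satisfying the full condition \(\mu (H) \le 5/6\) for every proper subgroup \(H\), which is what \(\sigma \in {{\mathcal {S}}}_{non}\left( 1/6 \right) \) provides:

```python
import numpy as np

q = 12
omega = np.exp(2j * np.pi / q)
# proper subgroups of Z/12: multiples of d for each divisor d > 1 of 12
subgroups = [set(range(0, q, d)) for d in (2, 3, 4, 6, 12)]
bound = 5.0 / 6.0 + np.cos(2 * np.pi / q) / 6.0

rng = np.random.default_rng(2)
checked = 0
while checked < 100:
    mu = rng.dirichlet(np.ones(q))       # random distribution on Z/12
    if any(sum(mu[a] for a in H) > 5.0 / 6.0 for H in subgroups):
        continue                         # reject: too concentrated on a proper subgroup
    # non-trivial Fourier coefficients x_ell, ell = 1, ..., q - 1
    x = np.array([sum(mu[a] * omega ** (a * ell) for a in range(q))
                  for ell in range(1, q)])
    assert np.max(x.real) <= bound + 1e-12
    checked += 1
```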
Proof of Lemma 3.2
Fix \(\rho \in G^*\). Let \({{\mathcal {G}}}_t\) denote the event where for all \(0 \le s \le t\), we have \(\Vert I_{d_\rho }+X_\rho (s)\Vert _{op} \le 1 - \frac{\gamma _G}{n}\), where \(\gamma _G\) is taken as in Lemma 3.3. Since our chain starts at \(\sigma \in {{\mathcal {S}}}_{non}\left( 1/3 \right) \), Lemmas 2.1 and 3.3 together imply that
$$\begin{aligned} {{\mathbb {P}}}_\sigma ({{\mathcal {G}}}_{n^2}^{{\textsf {c}}}) \le C_G n^2 e^{-n/10}. \end{aligned}$$
Next, we turn to (9). Rearranging (9) and squaring, we have
$$\begin{aligned} \Vert x_\rho (t+1)\Vert _{\mathrm{HS}}^2&= \Vert x_\rho (t)(I_{d_\rho } + X_\rho (t))\Vert _{\mathrm{HS}}^2 + \Vert \widehat{M}_\rho (t+1)\Vert _{\mathrm{HS}}^2 \nonumber \\&\quad + 2\mathrm{Re}\langle x_\rho (t)(I_{d_\rho } + X_\rho (t)), \widehat{M}_\rho (t+1) \rangle _{\mathrm{HS}} \end{aligned}$$
(12)
Let \(z_t := \mathbf{1}_{{{\mathcal {G}}}_t} \Vert x_\rho (t)\Vert _{\mathrm{HS}}^2\) and
$$\begin{aligned} M'(t+1) := \Vert \widehat{M}_\rho (t+1)\Vert _{\mathrm{HS}}^2 + 2\mathrm{Re}\langle x_\rho (t)(I_{d_\rho } + X_\rho (t)), \widehat{M}_\rho (t+1) \rangle _{\mathrm{HS}}. \end{aligned}$$
Substituting into (12), we obtain
$$\begin{aligned} z_{t+1} \le \Vert I_{d_\rho } + X_\rho (t)\Vert _{op}^2 \cdot z_t + \mathbf{1}_{{{\mathcal {G}}}_t} M'(t+1) \le \left( 1 - \frac{\gamma _G}{n}\right) ^2 z_t + \mathbf{1}_{{{\mathcal {G}}}_t} M'(t+1). \end{aligned}$$
Note that we have the bounds
$$\begin{aligned} {{\mathbb {E}}}[ M'(t+1) \mid {{\mathcal {F}}}_t ]= & {} {{\mathbb {E}}}[ \Vert \widehat{M}_\rho (t+1)\Vert _{\mathrm{HS}}^2 \mid {{\mathcal {F}}}_t ] \le \frac{4{{\mathcal {Q}}}^2d_\rho }{n^2} \\ |M'(t+1)|\le & {} \Vert \widehat{M}_\rho (t+1)\Vert _{\mathrm{HS}}^2 + 2\sqrt{d_\rho }\left( 1+\frac{1}{n(n-1)}\right) \Vert \widehat{M}_\rho (t+1)\Vert _{\mathrm{HS}}\\\le & {} \frac{6{{\mathcal {Q}}}^2 d_\rho }{n}. \end{aligned}$$
We now apply Lemma 3.1 with \(\varepsilon = \frac{1}{n}\), \(\varphi (t) = \gamma _G\), \(D = 6{{\mathcal {Q}}}^2 d_\rho \), and \(\lambda = n^{1/4}\). This yields
$$\begin{aligned} {{\mathbb {P}}}\left( z_t \ge n^{-1/4} + e^{-\gamma _G t/n}\cdot z_0 \right) \le C'_G e^{-c'_G \sqrt{n}}. \end{aligned}$$
Consequently,
$$\begin{aligned} {{\mathbb {P}}}_\sigma \left( \Vert x_\rho (t)\Vert _{\mathrm{HS}} \ge n^{-1/8} + e^{-\gamma _G t/2n} \cdot \Vert x_\rho (0)\Vert _{\mathrm{HS}} \right) \le C'_G e^{-c'_G \sqrt{n}} + C_G n^2 e^{-n/10}. \end{aligned}$$
The lemma with \(c_G = \gamma _G/2\) then follows from union bounding over all \(1 \le t \le n^2\) and taking n sufficiently large. \(\square \)
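To visualize the contraction that drives Lemma 3.2, one can iterate the noise-free version of (9) for \(G = {{\mathbb {Z}}}/q\), dropping the martingale term \(\widehat{M}\). The sketch below (illustrative only; the choices of n and of the initial coefficient are arbitrary) confirms that the modulus of the coefficient then decays geometrically at rate of order \(1/n\):

```python
n = 200
x = 0.3 + 0.4j            # a scalar Fourier coefficient with |x| = 0.5 < 1
mags = [abs(x)]
for t in range(20 * n):
    X = (x.real - (n - 1) / n) / (n - 1)   # scalar version of X_rho(t)
    x = x * (1 + X)                        # noise-free version of (9)
    mags.append(abs(x))

# the modulus is non-increasing and shrinks geometrically at rate ~ c/n,
# so after O(n) steps it is a small fraction of its initial value
assert all(m2 <= m1 + 1e-15 for m1, m2 in zip(mags, mags[1:]))
assert mags[-1] < 0.01 * mags[0]
```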
Proportion matrix chain: Proof of Lemma 2.4
We carry out a similar albeit more refined strategy to analyze the proportion matrix. Throughout this section, we assume our Markov chain \((\sigma _t)_{t \ge 0}\) starts at an initial state \(\sigma _*\in {{\mathcal {S}}}_*\left( \frac{1}{4{{\mathcal {Q}}}} \right) \). We again write \(n_a(t)=n_a(\sigma _t)\) and \(n_{a, b}(t)=n^{\sigma _*}_{a, b}(\sigma _t)\), and similar to before, the \(n_{a, b}(t)\) satisfy the difference equation
$$\begin{aligned} n_{a, b}(t+1)-n_{a, b}(t)= & {} \sum _{c \in G}\frac{n_{a, bc^{-1}}(t)n_c(t)}{2n(n-1)}+\sum _{c \in G}\frac{n_{a, bc}(t)n_c(t)}{2n(n-1)} \nonumber \\&- \frac{n_{a, b}(t)}{n} + M_{a, b}(t+1), \end{aligned}$$
(13)
where \({{\mathbb {E}}}[M_{a, b}(t+1) \mid {{\mathcal {F}}}_t]=0\) and \(|M_{a, b}(t)| \le 2\) for all \(t \ge 0\).
We can again analyze this equation via the Fourier transform. In this case, for each \(a \in G\), we take the Fourier transform of \(\left( n_{a, b}(t)/n_a(\sigma _*)\right) _{b \in G}\). For \(\rho \in G^*\), let
$$\begin{aligned} y_{a,\rho }(t) = y_{a,\rho }^{\sigma _*}(t) := \sum _{b \in G}\frac{n_{a, b}(t)}{n_a(\sigma _*)}\rho (b) \end{aligned}$$
denote the Fourier coefficient at \(\rho \). Let \(\widehat{M}_{a, \rho }(t) := \frac{1}{n_a(\sigma _*)}\sum _{b \in G}M_{a, b}(t)\rho (b)\). Then, (13) becomes
$$\begin{aligned} y_{a, \rho }(t+1) - y_{a, \rho }(t) = y_{a, \rho }(t) X_\rho (t) + \widehat{M}_{a, \rho }(t+1). \end{aligned}$$
(14)
Note that \({{\mathbb {E}}}_\sigma [\widehat{M}_{a, \rho }(t+1) \mid {{\mathcal {F}}}_t]=0\). Also, since we assumed \(\sigma _*\in {{\mathcal {S}}}_*\left( \frac{1}{4{{\mathcal {Q}}}} \right) \), it follows that \(\frac{n_a(\sigma _*)}{n} \ge \frac{1}{2{{\mathcal {Q}}}}\). Thus, we also know \(\Vert \widehat{M}_{a, \rho }(t+1)\Vert _{\mathrm{HS}} \le \frac{4{{\mathcal {Q}}}^2\sqrt{d_\rho }}{n}\).
Again, our main step is a bound on the Fourier coefficients \(y_{a, \rho }(t)\), which will also be useful later in proving Lemma 2.5.
Lemma 3.5
Consider any \(\sigma _*, \sigma '_*\in {{\mathcal {S}}}_*\left( \frac{1}{4{{\mathcal {Q}}}} \right) \). There exist constants \(c_G, C_G > 0\) depending only on G such that for all large enough n, we have
$$\begin{aligned} {{\mathbb {P}}}_{\sigma '_*}\left( \Vert y^{\sigma _*}_{a, \rho }(t)\Vert _{\mathrm{HS}} \ge R \left( \frac{1}{\sqrt{n}} + e^{-t/n} \Vert y^{\sigma _*}_{a, \rho }(0)\Vert _{\mathrm{HS}} \right) \right) \le e^{-\Omega _G(R^2) + O_G(1)} + \frac{2}{n^3} \end{aligned}$$
for all t and \(R > 0\).
The above lemma directly implies Lemma 2.4.
Proof of Lemma 2.4
We apply Lemma 3.5 to each \(a \in G\) and \(\rho \in G^*\). Recall that \(T = \left\lceil \frac{1}{2} n \log n \right\rceil \), so that
$$\begin{aligned} \frac{1}{\sqrt{n}} + e^{-T/n} \Vert y^{\sigma _*}_{a, \rho }(0)\Vert _{\mathrm{HS}} \le \frac{2\sqrt{d_\rho }}{\sqrt{n}}. \end{aligned}$$
Then, Lemma 3.5 implies
$$\begin{aligned} {{\mathbb {P}}}_{\sigma '_*}\left( \Vert y^{\sigma _*}_{a, \rho }(T)\Vert _{\mathrm{HS}} \ge \frac{R}{\sqrt{n}} \right) \le e^{-\Omega _G(R^2) + O_G(1)} + \frac{2}{n^3}. \end{aligned}$$
Union bounding over all \(a \in G\) and \(\rho \in G^*\) and using the Plancherel formula, this yields
$$\begin{aligned} {{\mathbb {P}}}_{\sigma '_*}\left( \sigma _T \not \in {{\mathcal {S}}}_*\left( \sigma _*, \frac{R}{\sqrt{n}} \right) \right)&\le {{\mathbb {P}}}_{\sigma '_*}\left( \max _{a, \rho } \Vert y^{\sigma _*}_{a, \rho }(T)\Vert _{\mathrm{HS}} \ge \frac{R}{\sqrt{n}} \right) \\&\le e^{-\Omega _G(R^2) + O_G(1)} + \frac{2 {{\mathcal {Q}}}^2}{n^3} \le C_G e^{-R} + \frac{1}{n} \end{aligned}$$
for sufficiently large \(C_G\) and n. \(\square \)
We now prove Lemma 3.5. Before proceeding with the main proof, we need the following routine estimate as a preliminary lemma.
Lemma 3.6
Let \(\theta _n : {{\mathbb {R}}}^d \rightarrow {{\mathbb {R}}}^+\) be the function given by \(\theta _n(x) = \Vert x\Vert + \frac{1}{\sqrt{n}}e^{-\sqrt{n}\Vert x\Vert } - \frac{1}{\sqrt{n}}\). Then, we have the inequalities
$$\begin{aligned} \Vert \nabla \theta _n(x)\Vert \le 1, \qquad \theta _n(x + h) \le \theta _n(x) + \langle h, \nabla \theta _n(x) \rangle + \frac{\sqrt{n}}{2} \Vert h\Vert ^2. \end{aligned}$$
Proof
We can write \(\theta _n(x) = f(\Vert x\Vert )\), where \(f(r) = r + \frac{1}{\sqrt{n}} e^{-\sqrt{n} r} - \frac{1}{\sqrt{n}}\). By spherical symmetry, we have
$$\begin{aligned} \Vert \nabla \theta _n(x)\Vert = f'(\Vert x\Vert ) = 1 - e^{-\sqrt{n}\Vert x\Vert } \le 1, \end{aligned}$$
which is the first inequality. Again by spherical symmetry, the eigenvalues of the Hessian \(\nabla ^2 \theta _n(x)\) can be directly computed to be \(f''(\Vert x\Vert )\) and \(f'(\Vert x\Vert ) / \Vert x\Vert \). But these are bounded by
$$\begin{aligned} f''(r) = \sqrt{n} e^{-\sqrt{n}r} \le \sqrt{n}, \qquad f'(r)/r = \frac{1 - e^{-\sqrt{n}r}}{r} \le \sqrt{n}. \end{aligned}$$
Thus, \(\nabla ^2 \theta _n(x) \preceq \sqrt{n} I\), and the second inequality follows from Taylor expansion. \(\square \)
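Both inequalities of Lemma 3.6 can be spot-checked numerically on random samples (an illustrative sketch; the dimension and the choice of n are arbitrary):

```python
import numpy as np

n = 100
sqrt_n = np.sqrt(n)

def theta(x):
    # theta_n(x) = ||x|| + exp(-sqrt(n)||x||)/sqrt(n) - 1/sqrt(n)
    r = np.linalg.norm(x)
    return r + np.exp(-sqrt_n * r) / sqrt_n - 1.0 / sqrt_n

def grad_theta(x):
    # gradient f'(||x||) * x / ||x||, valid for x != 0
    r = np.linalg.norm(x)
    return (1.0 - np.exp(-sqrt_n * r)) * x / r

rng = np.random.default_rng(3)
for _ in range(1000):
    x = rng.standard_normal(4)           # ||x|| bounded away from 0 in practice
    h = 0.1 * rng.standard_normal(4)
    g = grad_theta(x)
    # first inequality: the gradient has norm at most 1
    assert np.linalg.norm(g) <= 1.0 + 1e-12
    # second inequality: second-order upper bound from Hessian <= sqrt(n) I
    assert theta(x + h) <= theta(x) + h @ g + 0.5 * sqrt_n * h @ h + 1e-12
```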
Proof of Lemma 3.5
Let \(\gamma _G\) and \(c_G\) be the constants from Lemmas 3.3 and 3.2, respectively. Define the events
$$\begin{aligned} \begin{aligned} {{\mathcal {G}}}_t&:= \bigcap _{s = 0}^t \left\{ X_\rho (\sigma _s) \preceq -\frac{\gamma _G}{n} \right\} , \\ {{\mathcal {G}}}'_t&:= \bigcap _{s = 0}^t \left\{ X_\rho (\sigma _s) \preceq -\frac{1 - \sqrt{d_\rho } e^{-c_G s/n} - 2n^{-1/8}}{n} \right\} . \end{aligned} \end{aligned}$$
Note that \(\sigma '_*\in {{\mathcal {S}}}_*\left( \frac{1}{4{{\mathcal {Q}}}} \right) \subseteq {{\mathcal {S}}}_{non}\left( 1/3 \right) \). Hence, by Lemmas 2.1 and 3.3, we have \({{\mathbb {P}}}({{\mathcal {G}}}^{{\textsf {c}}}_{n^2}) \le C_G n^2 e^{-n/10}\). We also have
$$\begin{aligned} X_\rho (s)&= \frac{1}{n - 1} \left( \frac{x_\rho (s)+x_\rho (s)^*}{2} - \frac{n - 1}{n}I_{d_\rho } \right) \preceq -\frac{1}{n}\left( 1 - \frac{n \Vert x_\rho (s)\Vert _{\mathrm{HS}}}{n - 1} \right) I_{d_\rho } \\&\preceq -\frac{1}{n}\left( 1 - \Vert x_\rho (s)\Vert _{\mathrm{HS}} - \frac{\sqrt{d_\rho }}{n-1} \right) I_{d_\rho }, \end{aligned}$$
where we have used the fact that \(\left\| \frac{x_\rho (s)+x_\rho (s)^*}{2}\right\| _{op} \le \Vert x_\rho (s)\Vert _{op} \le \Vert x_\rho (s)\Vert _{\mathrm{HS}}\).
Lemma 3.2 then implies that \({{\mathbb {P}}}({{\mathcal {G}}}'^{{\textsf {c}}}_{n^2}) \le \frac{1}{n^3}\). Thus, setting
$$\begin{aligned} \begin{aligned} \varphi (t)&:= \max (\gamma _G, 1 - \sqrt{d_\rho } e^{-c_G t/n} - 2 n^{-1/8}),\\ {{\mathcal {H}}}_t&:= {{\mathcal {G}}}_t \cap {{\mathcal {G}}}'_t = \bigcap _{s = 0}^t \left\{ X_\rho (\sigma _s) \preceq -\frac{\varphi (s)}{n} \right\} , \end{aligned} \end{aligned}$$
we conclude that
$$\begin{aligned} {{\mathbb {P}}}({{\mathcal {H}}}^{{\textsf {c}}}_{n^2}) \le {{\mathbb {P}}}({{\mathcal {G}}}^{{\textsf {c}}}_{n^2}) + {{\mathbb {P}}}({{\mathcal {G}}}'^{{\textsf {c}}}_{n^2}) \le \frac{2}{n^3} \end{aligned}$$
for all large enough n.
Next, we turn to (14) and apply \(\theta _n\) to both sides, where we identify \({{\mathbb {C}}}^{d_\rho ^2}\) with \({{\mathbb {R}}}^{2d_\rho ^2}\). Using Lemma 3.6 and taking the conditional expectation, we obtain
$$\begin{aligned} {{\mathbb {E}}}\left[ \theta _n\left( y_{a, \rho }(t+1) \right) \, \Big |\, {{\mathcal {F}}}_t \right]&\le \theta _n\left( y_{a, \rho }(t) (I_{d_\rho } + X_\rho (t))\right) + \frac{8 {{\mathcal {Q}}}^4 d_\rho }{n \sqrt{n}} \\&\le \theta _n(\Vert I_{d_\rho } + X_\rho (t)\Vert _{op} \cdot y_{a, \rho }(t)) + \frac{8 {{\mathcal {Q}}}^4 d_\rho }{n \sqrt{n}} \\&\le \Vert I_{d_\rho } + X_\rho (t)\Vert _{op} \cdot \theta _n(y_{a, \rho }(t)) + \frac{8 {{\mathcal {Q}}}^4 d_\rho }{n \sqrt{n}}, \end{aligned}$$
where the second inequality follows from the variational formula for operator norm (i.e. that \(\Vert BA\Vert _{\mathrm{HS}} \le \Vert A\Vert _{op} \Vert B\Vert _{\mathrm{HS}}\)), and the third inequality follows from the fact that \(\theta _n\) is convex with \(\theta _n(0) = 0\). Thus, we may write
$$\begin{aligned} \theta _n(y_{a, \rho }(t+1)) \le \Vert I_{d_\rho } + X_\rho (t)\Vert _{op} \cdot \theta _n(y_{a, \rho }(t)) + M'(t+1) \end{aligned}$$
where
$$\begin{aligned} {{\mathbb {E}}}[ M'(t+1) \mid {{\mathcal {F}}}_t ] \le \frac{8 {{\mathcal {Q}}}^4 d_\rho }{n\sqrt{n}}, \qquad |M'(t+1)| \le \frac{8{{\mathcal {Q}}}^2 \sqrt{d_\rho }}{n}. \end{aligned}$$
Now, let \(z_t := \mathbf{1}_{{{\mathcal {H}}}_t} \theta _n(y_{a, \rho }(t))\), and note that since \(X_\rho (\sigma ) \succeq -\frac{2}{n-1} I_{d_\rho }\), we have \(\Vert I_{d_\rho } + X_\rho (t)\Vert _{op} \le 1-\frac{\varphi (t)}{n}\) whenever \({{\mathcal {H}}}_t\) holds. Thus,
$$\begin{aligned} z_{t+1} \le \Vert I_{d_\rho } + X_\rho (t)\Vert _{op} \cdot z_t + \mathbf{1}_{{{\mathcal {H}}}_t}M'(t+1) \le \left( 1 - \frac{1}{n} \varphi (t) \right) z_t + \mathbf{1}_{{{\mathcal {H}}}_t}M'(t+1). \end{aligned}$$
We may then apply Lemma 3.1 with \(\varepsilon = \frac{1}{n}\) and \(D = 8{{\mathcal {Q}}}^4 d_\rho \). Note that
$$\begin{aligned} \int _0^t \varphi (s) \,ds&\ge \left( 1 - 2 n^{-\frac{1}{8}}\right) t - \sqrt{d_\rho } \int _0^\infty e^{-\frac{c_G s}{n}} \,ds \ge t - O_G(n) \end{aligned}$$
for all large enough n. Thus, Lemma 3.1 implies that
$$\begin{aligned} {{\mathbb {P}}}\left( z_t \ge \frac{\lambda }{\sqrt{n}} + C_G e^{-t/n} \cdot z_0 \right) \le C'_G e^{-c'_G \lambda ^2}. \end{aligned}$$
(15)
Consequently,
$$\begin{aligned}&{{\mathbb {P}}}\left( \Vert y_{a, \rho }(t)\Vert _{\mathrm{HS}} \ge R\left( \frac{1}{\sqrt{n}} + e^{-\frac{t}{n}}\Vert y_{a, \rho }(0)\Vert _{\mathrm{HS}} \right) \right) \\&\quad \le {{\mathbb {P}}}\left( \theta _n(y_{a, \rho }(t)) \ge \frac{R - 1}{\sqrt{n}} + Re^{-\frac{t}{n}}\Vert y_{a, \rho }(0)\Vert _{\mathrm{HS}} \right) \\&\quad \le {{\mathbb {P}}}\left( \theta _n(y_{a, \rho }(t)) \ge \frac{R - 1}{\sqrt{n}} + Re^{-\frac{t}{n}}\theta _n(y_{a, \rho }(0)) \right) \\&\quad \le {{\mathbb {P}}}\left( z_t \ge \frac{R - 1}{\sqrt{n}} + Re^{-\frac{t}{n}}z_0 \right) + {{\mathbb {P}}}({{\mathcal {H}}}^{{\textsf {c}}}_{n^2}) \\&\quad \le e^{-\Omega _G(R^2) + O_G(1)} + \frac{2}{n^3}, \end{aligned}$$
as desired. \(\square \)