## 1 Introduction

Let G be a finite group, and let $$[n] := \{1, 2, \ldots , n\}$$. We consider the set $$G^n$$ of all functions $$\sigma : [n] \rightarrow G$$ (or “configurations”). We may define a Markov chain $$(\sigma _t)_{t\ge 0}$$ on $$G^{n}$$ as follows: given the current state $$\sigma$$, choose an ordered pair (i, j) of distinct integers in [n] uniformly at random, and change the value of $$\sigma (i)$$ to $$\sigma (i) \sigma (j)^{\pm 1}$$, where the sign is chosen with equal probability.
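As a concrete illustration, one step of the chain can be sketched in Python; representing the group by multiplication and inverse functions is our own choice here, with $$G = {{\mathbb {Z}}}/5$$ (written additively) as a toy example:

```python
import random

def product_replacement_step(sigma, mul, inv, rng=random):
    """One step of the chain on G^n: pick an ordered pair (i, j) of
    distinct indices and set sigma(i) <- sigma(i) * sigma(j)^{+-1}."""
    n = len(sigma)
    i, j = rng.sample(range(n), 2)
    g = sigma[j] if rng.random() < 0.5 else inv(sigma[j])
    new_sigma = list(sigma)
    new_sigma[i] = mul(sigma[i], g)
    return new_sigma

# Toy example: G = Z/5 written additively, so "mul" is addition mod 5.
q = 5
mul = lambda a, b: (a + b) % q
inv = lambda a: (-a) % q
sigma = [1, 2, 3, 0]          # a generating 4-tuple of Z/5
for _ in range(200):
    sigma = product_replacement_step(sigma, mul, inv)
```

Note that each step replaces $$\sigma (i)$$ by an element of the subgroup generated by the current values, so the generating property is preserved and the chain indeed stays in the set of generating tuples.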

We will restrict the chain $$(\sigma _t)_{t\ge 0}$$ to the space of generating n-tuples, i.e. the set of $$\sigma$$ whose values generate G as a group:

\begin{aligned} {{\mathcal {S}}}:= \left\{ \sigma \in G^n \ : \ \langle \sigma (1), \ldots , \sigma (n) \rangle =G \right\} . \end{aligned}

It is not hard to see that for fixed G and large enough n, the chain on $${{\mathcal {S}}}$$ is irreducible (see [8, Lemma 3.2]). We will always assume n is large enough so that this irreducibility holds. Note that the chain is also symmetric, and it is aperiodic because it has positive holding probability at some states. Thus, the chain has the uniform stationary distribution $$\pi$$ with $$\pi (\sigma )=1/|{{\mathcal {S}}}|$$.

This Markov chain was first considered in the context of computational group theory—it models the product replacement algorithm for generating random elements of a finite group introduced in [6]. By running the chain for a long enough time t and choosing a uniformly random index $$k \in [n]$$, the element $$\sigma _t(k)$$ is a (nearly) uniformly random element of G. The product replacement algorithm has been found to perform well in practice [6, 10], but the question arises: how large does t need to be in order to ensure near uniformity?

One way of answering the question is to estimate the mixing time of the Markov chain. It was shown by Diaconis and Saloff-Coste that for any fixed finite group G, there exists a constant $$C_G$$ such that the $$\ell ^2$$-mixing time is at most $$C_G n^2 \log n$$ [8, 9] (see also Chung and Graham [3] for a simpler proof of this fact with a different value for $$C_G$$).

In another line of work, Lubotzky and Pak [12] analyzed the mixing of the product replacement chain in terms of Kazhdan constants (see also subsequent quantitative estimates for Kazhdan constants by Kassabov [11]). We also mention a result of Pak [14] which shows mixing in $$\text {polylog}(|G|)$$ steps when $$n = \Theta (\log |G| \log \log |G|)$$. The reader may consult the survey [15] for further background on the product replacement algorithm.

Diaconis and Saloff-Coste conjectured that the mixing time bound can be improved to $$C_G n \log n$$ [9, Remark 2, Section 7, p. 290], based on the observation that at least $$n \log n$$ steps are needed by the classical coupon-collector’s problem. This was confirmed in the case $$G = {{\mathbb {Z}}}/2$$ by Chung and Graham [4] and recently refined by Ben-Hamou and Peres, who showed that when $$G={{\mathbb {Z}}}/2$$, the chain in fact exhibits a cutoff at time $$\frac{3}{2}n \log n$$ in total variation with a window of order n [2].

In this paper, we extend the result of Ben-Hamou and Peres to all finite groups. Note that this also verifies the conjecture of Diaconis and Saloff-Coste for a fixed finite group. To state the result, let us denote the total variation distance between $${{\mathbb {P}}}_\sigma (\sigma _t \in \cdot \ )$$ and $$\pi$$ by

\begin{aligned} d_\sigma (t) := \max _{A \subseteq {{\mathcal {S}}}}|{{\mathbb {P}}}_\sigma (\sigma _t \in A)- \pi (A)|. \end{aligned}

### Theorem 1.1

Let G be a finite group. Then, the Markov chain $$(\sigma _t)_{t \ge 0}$$ on the set of generating n-tuples of G has a total-variation cutoff at time $$\frac{3}{2}n\log n$$ with window of order n. More precisely, we have

\begin{aligned} \lim _{\beta \rightarrow \infty } \limsup _{n\rightarrow \infty } \max _{\sigma \in {{\mathcal {S}}}} d_\sigma \left( \frac{3}{2}n\log n + \beta n\right) = 0 \end{aligned}
(1)

and

\begin{aligned} \lim _{\beta \rightarrow \infty } \liminf _{n\rightarrow \infty } \max _{\sigma \in {{\mathcal {S}}}} d_\sigma \left( \frac{3}{2}n\log n - \beta n\right) = 1. \end{aligned}
(2)

### 1.1 A connection to cryptography

We mention another motivation for studying the product replacement chain in the case $$G=({{\mathbb {Z}}}/q)^m$$ for a prime $$q \ge 2$$ and an integer $$m \ge 1$$. It comes from a public-key authentication protocol proposed by Sotiraki [16], which we now briefly describe. In the protocol, a verifier wants to check the identity of a prover based on the time needed to answer a challenge.

First, the prover runs the Markov chain with $$G = ({{\mathbb {Z}}}/q)^m$$ and $$n = m$$, which can be interpreted as performing a random walk on $$SL_n({{\mathbb {Z}}}/q)$$, where $$\sigma (k)$$ is viewed as the k-th row of an $$n \times n$$ matrix. (In each step, a random row is either added to or subtracted from another random row.)

After t steps, the prover records the resulting matrix $$A \in SL_n({{\mathbb {Z}}}/q)$$ and makes it public. To authenticate, the verifier gives the prover a vector $$x \in ({{\mathbb {Z}}}/q)^n$$ and challenges her to compute $$y := Ax$$. The prover can perform this calculation in O(t) operations by retracing the trajectory of the random walk.

Without knowing the trajectory, if t is large enough, an adversary will not be able to distinguish A from a random matrix and will be forced to perform the usual matrix-vector multiplication (using $$n^2$$ operations) to complete the challenge. Thus, the question is whether $$t \ll n^2$$ is large enough for the matrix A to become sufficiently random, so that the prover can answer the challenge much faster than an adversary.
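The O(t) retracing can be made concrete. The following sketch (our own illustration, not part of the protocol in [16]) records the row moves, builds A explicitly for comparison, and replays the moves directly on x in O(t) operations:

```python
import random

def random_row_moves(n, t, rng):
    """t random moves (i, j, s): row_i <- row_i + s * row_j, with s = +-1."""
    return [(*rng.sample(range(n), 2), rng.choice([1, -1])) for _ in range(t)]

def build_matrix(n, q, moves):
    """Apply the moves to the identity matrix: O(t n) work, yields A."""
    A = [[int(r == c) for c in range(n)] for r in range(n)]
    for i, j, s in moves:
        A[i] = [(a + s * b) % q for a, b in zip(A[i], A[j])]
    return A

def fast_apply(q, moves, x):
    """Compute y = A x in O(t) by replaying the trajectory on x:
    each elementary move costs O(1) on a vector."""
    y = list(x)
    for i, j, s in moves:
        y[i] = (y[i] + s * y[j]) % q
    return y

rng = random.Random(1)
n, q, t = 6, 7, 50
moves = random_row_moves(n, t, rng)
A = build_matrix(n, q, moves)
x = [rng.randrange(q) for _ in range(n)]
y_fast = fast_apply(q, moves, x)
y_slow = [sum(A[r][c] * x[c] for c in range(n)) % q for r in range(n)]
```

Since each elementary row operation is left-multiplication by $$I + s\,e_i e_j^{\mathsf{T}}$$, replaying the moves on x in the same order computes the same product $$A x$$ as the $$n^2$$-operation matrix-vector multiplication.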

Note that when $$n > m$$, the product replacement chain on $$G = ({{\mathbb {Z}}}/q)^m$$ amounts to the projection of the random walk on $$SL_n({{\mathbb {Z}}}/q)$$ onto the first m columns. Thus, Theorem 1.1 shows that when m is fixed and $$n \rightarrow \infty$$, the mixing time for the first m columns is around $$\frac{3}{2} n \log n$$. One then hopes that the mixing of several columns is enough to make it computationally intractable to distinguish A from a random matrix; this would justify the authentication protocol, as $$n \log n \ll n^2$$.

We remark that when t is much larger than the mixing time of the random walk on $$SL_n({{\mathbb {Z}}}/q)$$ generated by row additions and subtractions, it is information theoretically impossible for an adversary to distinguish A from a random matrix. However, the diameter of the corresponding Cayley graph on $$SL_n({{\mathbb {Z}}}/q)$$ is known to be of order $$\Theta \left( \frac{n^2}{\log _q n} \right)$$ [1, 5], so a lower bound of the same order necessarily holds for the mixing time. Diaconis and Saloff-Coste [8, Section 4, p. 420] give an upper bound of $$O(n^4)$$, which was subsequently improved to $$O(n^3)$$ by Kassabov [11]. Closing the gap between $$n^3$$ and $$\frac{n^2}{\log n}$$ remains an open problem.

### 1.2 Outline of proof

The proof of Theorem 1.1 analyzes the mixing behavior in several stages:

• an initial “burn-in” period lasting around $$n \log n$$ steps, after which the group elements appearing in the configuration are not mostly confined to any proper subgroup of G;

• an averaging period lasting around $$\frac{1}{2} n \log n$$ steps, after which the counts of group elements become close to their average value under the stationary distribution; and

• a coupling period lasting O(n) steps, after which our chain becomes exactly coupled to the stationary distribution with high probability.

The argument is in the spirit of [2], but a more elaborate analysis is required in the second and third stages. To analyze the first stage, for a fixed proper subgroup H, the number of group elements in H appearing in the configuration is a birth-and-death process whose transition probabilities are easy to estimate. The analysis of the resulting chain is the same as in [2], and we can then union bound over all proper subgroups H.

In the second stage, for a given starting configuration $$\sigma _0 \in {{\mathcal {S}}}$$, we consider quantities $$n_{a,b}(\sigma )$$ counting the number of sites k where $$\sigma _0(k) = a$$ and $$\sigma (k) = b$$. A key observation (which also appears in [2]) is that by symmetry, projecting the Markov chain onto the values $$(n_{a,b}(\sigma _t))_{a, b \in G}$$ does not affect the mixing behavior. Thus, it is enough to understand the mixing behavior of the counts $$n_{a,b}$$.

One expects these counts to evolve towards their expected value $${{\mathbb {E}}}_{\sigma \sim \pi } n_{a,b}(\sigma )$$ as the chain mixes. To carry out the analysis rigorously, we write down a stochastic difference equation for the $$n_{a,b}$$ and analyze it via the Fourier transform. Intuitively, as $$n \rightarrow \infty$$, the process approaches a “hydrodynamic limit” so that it becomes approximately deterministic. It turns out that after about $$\frac{1}{2} n \log n$$ steps, the $$n_{a,b}$$ are likely to be within $$O(\sqrt{n})$$ of their expected value. Our analysis requires a sufficiently “generic” initial configuration, which is why the first stage is necessary.

Finally, in the last stage, we show that if the $$(n_{a,b}(\sigma ))_{a,b\in G}$$ and $$(n_{a,b}(\sigma '))_{a,b\in G}$$ for two configurations are within $$O(\sqrt{n})$$ in $$\ell ^1$$ distance, they can be coupled to be exactly the same with high probability after O(n) steps of the Markov chain. A standard argument involving coupling to the stationary distribution then implies a bound on the mixing time.

The main idea to prove the coupling bound is that even if the $$\ell ^1$$ distance evolves like an unbiased random walk, there is a good chance that it will hit 0 due to random fluctuations. A similar argument is used to prove cutoff for lazy random walk on the hypercube [13, Chapter 18]. However, some careful accounting is necessary in our setting to ensure that in fact the $$\ell ^1$$ distance does not increase in expectation and to ensure sufficient fluctuations.

### 1.3 Organization of the paper

The rest of the paper is organized as follows. In Sect. 2, we state (without proof) the key lemmas describing the behavior in each of the three stages and use these to prove the upper bound (1) in Theorem 1.1. Sections 3 and 4 contain the proofs of these lemmas. Finally, in Sect. 5, we prove the lower bound (2) in Theorem 1.1; this is mostly a matter of verifying that the estimates used in the upper bound were tight.

### 1.4 Notation

Throughout this paper, we use $$c, C, C', \ldots$$ to denote absolute constants whose exact values may change from line to line; we also use them with subscripts, for instance writing $$C_G$$ for a constant depending only on G. We also use subscripts with big-O notation, e.g. we write $$O_G(\,\cdot \,)$$ when the implied constant depends only on G.

## 2 Proof of Theorem 1.1 (1)

Let us fix a finite group G and denote its cardinality by $${{\mathcal {Q}}}:= |G|$$. For a configuration $$\sigma \in {{\mathcal {S}}}$$, let $$n_a(\sigma )$$ denote the number of sites having group element a, i.e.,

\begin{aligned} n_a(\sigma ) := |\{i \in [n] \ : \ \sigma (i)=a\}|. \end{aligned}

### 2.1 The burn-in period

For a proper subgroup $$H \subseteq G$$, let

\begin{aligned} n_{non}^H(\sigma ) := \sum _{a \in G {\setminus } H} n_a(\sigma ) \end{aligned}

denote the number of sites not in H, and define for $$c \in (0, 1)$$ the set

\begin{aligned} {{\mathcal {S}}}_{non}\left( c \right) := \{\sigma \in {{\mathcal {S}}}\ : \ n_{non}^H(\sigma ) \ge cn \text { for all proper subgroups} \, H \subseteq G\}. \end{aligned}

Thus, $${{\mathcal {S}}}_{non}\left( c \right)$$ is the set of states $$\sigma$$ where the group elements appearing in $$\sigma$$ are not mostly confined to any particular proper subgroup of G. The next lemma shows that we reach $${{\mathcal {S}}}_{non}\left( 1/3 \right)$$ in about $$n \log n$$ steps, and once we reach $${{\mathcal {S}}}_{non}\left( 1/3 \right)$$, we remain in $${{\mathcal {S}}}_{non}\left( 1/6 \right)$$ for $$n^2$$ steps with high probability. Note that $$n^2$$ is much larger than the overall mixing time, so we may essentially assume that we are in $${{\mathcal {S}}}_{non}\left( 1/6 \right)$$ for all of the later stages.
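For a small group, membership in $${{\mathcal {S}}}_{non}\left( c \right)$$ can be checked directly; a sketch for $$G = {{\mathbb {Z}}}/6$$, whose proper subgroups we list by hand:

```python
def n_non(sigma, H):
    """Number of sites whose value lies outside the subgroup H."""
    return sum(1 for g in sigma if g not in H)

def in_S_non(sigma, proper_subgroups, c):
    """sigma is in S_non(c) iff at least c*n sites lie outside
    every proper subgroup H."""
    n = len(sigma)
    return all(n_non(sigma, H) >= c * n for H in proper_subgroups)

# G = Z/6; its proper subgroups are {0}, {0, 3}, and {0, 2, 4}.
proper_subgroups = [{0}, {0, 3}, {0, 2, 4}]
sigma = [1, 5, 2, 3, 4, 1, 0, 5, 2]
```

For instance, the configuration above has 8 of its 9 sites outside the trivial subgroup and at least 5 outside each of the other two, so it lies in $${{\mathcal {S}}}_{non}\left( 1/3 \right)$$.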

### Lemma 2.1

Let $$\tau _{1/3} := \min \{t \ge 0 : \sigma _t \in {{\mathcal {S}}}_{non}\left( 1/3 \right) \}$$ be the first time to hit $${{\mathcal {S}}}_{non}\left( 1/3 \right)$$. Then for all large enough n and for all large enough $$\beta > 0$$,

\begin{aligned} \max _{\sigma \in {{\mathcal {S}}}} {{\mathbb {P}}}_\sigma (\tau _{1/3} > n \log n + \beta n) \le \frac{120 {{\mathcal {Q}}}}{\beta ^2}. \end{aligned}

Moreover, there exists a constant $$C_G$$ depending only on G such that

\begin{aligned} \max _{\sigma \in {{\mathcal {S}}}_{non}\left( 1/3 \right) }{{\mathbb {P}}}_\sigma \left( \sigma _t \notin {{\mathcal {S}}}_{non}\left( 1/6 \right) \ \text {for some} \, t \le n^2 \right) \le C_G n^2e^{-n/10}. \end{aligned}

### Proof

Fix a proper subgroup $$H \subset G$$, and consider what happens to $$n_{non}^H(\sigma _t)$$ at time t. Suppose our next step is to replace $$\sigma (i)$$ with $$\sigma (i)\sigma (j)^{\pm 1}$$; since H is a subgroup, $$\sigma (j) \in H$$ if and only if $$\sigma (j)^{-1} \in H$$, so the case analysis below does not depend on the sign, and we may write $$\sigma (i)\sigma (j)$$.

If $$\sigma (j) \in H$$, then $$n_{non}^H(\sigma _{t+1}) = n_{non}^H(\sigma _t)$$. If $$\sigma (j) \not \in H$$ and $$\sigma (i) \in H$$, then $$n_{non}^H(\sigma _{t+1}) = n_{non}^H(\sigma _t) +1$$. Finally, if $$\sigma (j), \sigma (i) \not \in H$$, then $$\sigma (i)\sigma (j)$$ may or may not be in H, so $$n_{non}^H(\sigma _{t+1}) \ge n_{non}^H(\sigma _t) - 1$$.

Let $$(N_t)_{t \ge 0}$$ be the birth-and-death chain with the following transition probabilities for $$1 \le k \le n$$:

\begin{aligned} {{\mathbb {P}}}(N_{t+1} = k+1 \mid N_t = k)&= \frac{k(n-k)}{n(n-1)} \\ {{\mathbb {P}}}(N_{t+1} = k-1 \mid N_t = k)&= \frac{k(k-1)}{n(n-1)} \\ {{\mathbb {P}}}(N_{t+1} = k \mid N_t = k)&= \frac{n-k}{n}. \end{aligned}

We start this chain at $$N_0 = n^H_{non}(\sigma _0)$$; note that because the elements appearing in $$\sigma _0$$ generate G, we are guaranteed to have $$n^H_{non}(\sigma _0) > 0$$.

The above birth-and-death chain corresponds to the behavior of $$(n^H_{non}(\sigma _t))$$ if whenever $$\sigma (j), \sigma (i) \not \in H$$, it always happened that $$\sigma (i)\sigma (j) \in H$$. Thus, $$(n^H_{non}(\sigma _t))$$ stochastically dominates $$(N_t)$$.

The chain $$(N_t)$$ is precisely what is analyzed in [2] for the case $$G = {{\mathbb {Z}}}/2$$. Let

\begin{aligned} T_k := \min \{t \ge 0 : N_t=k\}. \end{aligned}

Then, we have $${{\mathbb {E}}}_{k-1}T_k \le \frac{n^2}{k(n-2k)}$$ [2, (2) in the proof of Lemma 1] and thus $${{\mathbb {E}}}_1 (T_{n/3}) =\sum _{k=2}^{n/3}{{\mathbb {E}}}_{k-1}T_k \le n \log n + n$$. On the other hand, setting $$v_k=\mathrm{Var}_{k-1}(T_k)$$, we have $$v_2 \le n^2$$,

\begin{aligned} v_{k+1}\le \frac{k}{n-k}v_k + \frac{54 n^2}{k^2}, \end{aligned}

and $$\mathrm{Var}_1 (T_{n/3}) = \sum _{k=2}^{n/3}v_k \le 110 n^2$$ [2, proof of Lemma 1]. Hence, by Chebyshev’s inequality, for all large enough $$\beta > 0$$,

\begin{aligned} {{\mathbb {P}}}_1(T_{n/3} > n \log n + \beta n) \le \frac{120}{\beta ^2}. \end{aligned}
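As a numerical sanity check (our own, not part of the proof), the expected hitting time $${{\mathbb {E}}}_1 (T_{n/3})$$ can be computed exactly from the transition rates above via the standard one-step recursion for birth-and-death chains, $$h_k := {{\mathbb {E}}}_k T_{k+1} = 1/p_k + (q_k/p_k)\, h_{k-1}$$; a sketch:

```python
import math

def expected_hitting_time(n, m):
    """E_1[T_m] for the birth-and-death chain with up-rate
    p_k = k(n-k)/(n(n-1)) and down-rate q_k = k(k-1)/(n(n-1)),
    via the recursion h_k = 1/p_k + (q_k/p_k) h_{k-1} (note q_1 = 0)."""
    total, h = 0.0, 0.0
    for k in range(1, m):
        p = k * (n - k) / (n * (n - 1))
        q = k * (k - 1) / (n * (n - 1))
        h = 1.0 / p + (q / p) * h
        total += h
    return total

n = 300
e = expected_hitting_time(n, n // 3)
```

For n = 300, the computed value falls below $$n \log n + n$$, consistent with the expectation bound above.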

Moreover, we have $${{\mathbb {P}}}_{n/3} \left( T_{n/6} \le n^2 \right) \le n^2e^{-n/10}$$. Indeed, this follows from the fact that for $$m<k$$, we have

\begin{aligned} {{\mathbb {P}}}_k(T_m \le n^2) \le n^2\frac{\pi _{\mathrm{BD}}(m)}{\pi _{\mathrm{BD}}(k)}, \end{aligned}

where $$\pi _{\mathrm{BD}}(k)=\binom{n}{k}/(2^n-1)$$ [2, (5) and the following in the proof of Proposition 2].

We now take a union bound over all the proper subgroups H. $$\square$$

### 2.2 The averaging period

In the next stage, the counts $$n_a(\sigma _t)$$ go toward their average value. We actually analyze this stage in two substages, looking at a “proportion vector” and “proportion matrix”, as described below.

#### 2.2.1 Proportion vector chain

For a configuration $$\sigma \in {{\mathcal {S}}}$$, we consider the $${{\mathcal {Q}}}$$-dimensional vector $$(n_a(\sigma )/n)_{a \in G}$$, which we call the proportion vector of $$\sigma$$. One may check that for a typical $$\sigma \in {{\mathcal {S}}}$$, each $$n_a(\sigma )/n$$ is about $$1/{{\mathcal {Q}}}$$. For each $$\delta > 0$$, we define the $$\delta$$-typical set

\begin{aligned} {{\mathcal {S}}}_*(\delta ) := \left\{ \sigma \in {{\mathcal {S}}}\ : \ \left\| \left( \frac{n_a(\sigma )}{n}\right) _{a \in G} - \left( \frac{1}{{{\mathcal {Q}}}}\right) _{a \in G}\right\| \le \delta \right\} , \end{aligned}

where $$\Vert \cdot \Vert$$ denotes the $$\ell ^2$$-norm in $${{\mathbb {R}}}^G$$.
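Membership in $${{\mathcal {S}}}_*\left( \delta \right)$$ is a direct computation; a minimal sketch, assuming G is enumerated as $$\{0, \ldots , {{\mathcal {Q}}}-1\}$$ (the function names are our own):

```python
import math
from collections import Counter

def proportion_vector(sigma, Q):
    """The Q-dimensional vector (n_a(sigma)/n)_{a in G}."""
    counts = Counter(sigma)
    n = len(sigma)
    return [counts.get(a, 0) / n for a in range(Q)]

def in_typical_set(sigma, Q, delta):
    """sigma is delta-typical iff its proportion vector is within
    ell^2 distance delta of the uniform vector (1/Q, ..., 1/Q)."""
    v = proportion_vector(sigma, Q)
    dist = math.sqrt(sum((p - 1 / Q) ** 2 for p in v))
    return dist <= delta

sigma = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]   # perfectly balanced, Q = 3
```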

The following lemma implies that starting from $$\sigma \in {{\mathcal {S}}}_{non}\left( 1/3 \right)$$, we reach $${{\mathcal {S}}}_*\left( \delta \right)$$ in $$O_\delta (n)$$ steps with high probability. The proof is given in Sect. 3.4.

### Lemma 2.2

Consider any $$\sigma \in {{\mathcal {S}}}_{non}\left( 1/3 \right)$$ and any constant $$\delta >0$$. There exists a constant $$C_{G, \delta }$$ depending only on G and $$\delta$$ such that for any $$T \ge C_{G, \delta } n$$, we have

\begin{aligned} {{\mathbb {P}}}_\sigma \left( \sigma _T \notin {{\mathcal {S}}}_*\left( \delta \right) \right) \le \frac{1}{n} \end{aligned}

for all large enough n.

#### 2.2.2 Proportion matrix chain

We actually need a more precise averaging than what is provided by Lemma 2.2. Fix a configuration $$\sigma _0 \in {{\mathcal {S}}}$$. For any $$\sigma \in {{\mathcal {S}}}$$ and for any $$a, b \in G$$, define

\begin{aligned} n_{a,b}^{\sigma _0}(\sigma ) := |\{i \in [n] \ : \ \sigma _0(i)=a, \sigma (i)=b \}|. \end{aligned}

If we run the Markov chain $$(\sigma _t)_{t\ge 0}$$ with initial state $$\sigma _0$$, then $$n_{a,b}^{\sigma _0}(\sigma _t)$$ is the number of sites that originally contained the element a (at time 0) but now contain b (at time t). Note that

\begin{aligned} \sum _{b \in G} n_{a,b}^{\sigma _0}(\sigma ) = n_a(\sigma _0) \quad \text {and} \quad \sum _{a \in G} n_{a,b}^{\sigma _0}(\sigma ) = n_b(\sigma ). \end{aligned}

We can then associate with $$(\sigma _t)_{t \ge 0}$$ another Markov chain $$\left( n_{a,b}^{\sigma _0}(\sigma _t)\right) _{a, b \in G}$$ for $$t \ge 0$$, which we call the proportion matrix chain (with respect to $$\sigma _0$$). The state space for the proportion matrix chain is $$\{0, 1, \ldots , n\}^{G \times G}$$, and the transition probabilities depend on $$\sigma _0$$.
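The counts $$n_{a,b}^{\sigma _0}$$ and the two marginal identities above can be sketched as follows (a toy example with $$G = {{\mathbb {Z}}}/3$$; the helper name is our own):

```python
from collections import Counter

def proportion_matrix(sigma0, sigma):
    """n_{a,b} = #{i : sigma0(i) = a and sigma(i) = b}, keyed by (a, b)."""
    return Counter(zip(sigma0, sigma))

sigma0 = [0, 0, 1, 1, 2, 2]
sigma  = [0, 1, 1, 2, 2, 0]
N = proportion_matrix(sigma0, sigma)

# Row sums recover the counts n_a(sigma0); column sums recover n_b(sigma).
row = {a: sum(v for (x, _), v in N.items() if x == a) for a in range(3)}
col = {b: sum(v for (_, y), v in N.items() if y == b) for b in range(3)}
```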

The proportion matrix acts like a “sufficient statistic” for analyzing our Markov chain started at $$\sigma _0$$, because of the permutation invariance of our dynamics. In fact, as the following lemma shows, the distance to stationarity of the proportion matrix chain is equal to the distance to stationarity of the original chain.

### Lemma 2.3

Let $$\sigma _*\in {{\mathcal {S}}}$$ be a configuration. For the Markov chain $$(\sigma _t)_{t \ge 0}$$ with initial state $$\sigma _*$$, we consider $$\left( n_{a, b}^{\sigma _*}(\sigma _t)\right) _{a, b \in G}$$. Let $${\overline{\pi }}^{\sigma _*}$$ be the stationary measure for the Markov chain $$\{(n_{a, b}^{\sigma _*}(\sigma _t))_{a, b \in G}\}_{t \ge 0}$$ on $$\left\{ 0, 1, \ldots , n\right\} ^{G \times G}$$. Then, for every $$t \ge 0$$, we have

\begin{aligned} \left\| {{\mathbb {P}}}_{\sigma _*}(\sigma _t \in \cdot \ ) - \pi \right\| _{\mathrm{TV}} = \left\| {{\mathbb {P}}}_{\sigma _*}\left( (n_{a, b}^{\sigma _*}(\sigma _t))_{a, b \in G} \in \cdot \ \right) - {\overline{\pi }}^{\sigma _*} \right\| _{\mathrm{TV}}. \end{aligned}

### Proof

For any matrix $$N = (N_{a,b})_{a,b \in G} \in \{0, 1, \ldots , n\}^{G \times G}$$, write

\begin{aligned} {{\mathcal {X}}}_{(N)} := \left\{ \sigma \in {{\mathcal {S}}}\ : \ n_{a, b}^{\sigma _*}(\sigma )=N_{a, b} \ \text {for all} \, a, b \in G \right\} \end{aligned}

for the set of configurations with N as their proportion matrix.

Since the distribution of $$\sigma _t$$ is invariant under permutations on sites $$i \in [n]$$ preserving the set $$\{ i : \sigma _*(i) = a\}$$ for every $$a \in G$$, the conditional probability measures $${{\mathbb {P}}}_{\sigma _*}\left( \sigma _t \in \cdot \mid \sigma _t \in {{\mathcal {X}}}_{(N)} \right)$$ and $$\pi ( \ \cdot \mid {{\mathcal {X}}}_{(N)})$$ are both uniform on $${{\mathcal {X}}}_{(N)}$$. This implies that for each $$\sigma \in {{\mathcal {X}}}_{(N)}$$,

\begin{aligned} |{{\mathbb {P}}}_{\sigma _*}(\sigma _t =\sigma ) - \pi (\sigma )| = \frac{1}{\left| {{\mathcal {X}}}_{(N)}\right| } \left| {{\mathbb {P}}}_{\sigma _*}\left( (n^{\sigma _*}_{a, b}(\sigma _t))_{a, b \in G}=N\right) - {\overline{\pi }}^{\sigma _*}(N) \right| , \end{aligned}

and summing over all $$\sigma \in {{\mathcal {X}}}_{(N)}$$ and all N, we obtain the claim. $$\square$$

For $$\sigma _0 \in {{\mathcal {S}}}$$ and $$r > 0$$, define the set of configurations

\begin{aligned} {{\mathcal {S}}}_*\left( \sigma _0, r \right) := \left\{ \sigma \in {{\mathcal {S}}}\ : \ \left\| \left( \frac{n^{\sigma _0}_{a, b}(\sigma )}{n_a(\sigma _0)}\right) _{b \in G} - \left( \frac{1}{{{\mathcal {Q}}}}\right) _{b \in G}\right\| \le r \text { for all} \, a \in G \right\} . \end{aligned}

Roughly speaking, the following lemma shows that starting from a typical configuration $$\sigma _*\in {{\mathcal {S}}}_*\left( \frac{1}{4{{\mathcal {Q}}}} \right)$$, we need about $$\frac{1}{2}n \log n$$ steps to reach $${{\mathcal {S}}}_*\left( \sigma _*, \frac{R}{\sqrt{n}} \right)$$, where R is a constant. We show this fact in a slightly more general form where the initial state need not be $$\sigma _*$$; the proof is given in Sect. 3.5.

### Lemma 2.4

Consider any $$\sigma _*, \sigma '_*\in {{\mathcal {S}}}_*\left( \frac{1}{4{{\mathcal {Q}}}} \right)$$, and let $$T := \left\lceil \frac{1}{2} n \log n \right\rceil$$. There exists a constant $$C_G > 0$$ depending only on G such that for any given $$R > 0$$, we have

\begin{aligned} {{\mathbb {P}}}_{\sigma '_*}\left( \sigma _T \notin {{\mathcal {S}}}_*\left( \sigma _*, \frac{R}{\sqrt{n}} \right) \right) \le C_G e^{-R} + \frac{1}{n} \end{aligned}

for all large enough n.

### 2.3 The coupling period

After reaching $${{\mathcal {S}}}_*\left( \sigma _*, \frac{R}{\sqrt{n}} \right)$$, we show that only O(n) additional steps are needed to mix in total variation distance. The main ingredient in the proof is a coupling of proportion matrix chains so that they coalesce in O(n) steps when they both start from configurations $$\sigma , {{\tilde{\sigma }}} \in {{\mathcal {S}}}_*\left( \sigma _*, \frac{R}{\sqrt{n}} \right)$$. We construct such a coupling and prove the following lemma in Sect. 4.

### Lemma 2.5

Consider any $$\sigma _*\in {{\mathcal {S}}}_*\left( \frac{1}{5{{\mathcal {Q}}}^3} \right)$$, and let $$R > 0$$. Suppose $$\sigma , {{\tilde{\sigma }}} \in {{\mathcal {S}}}_*\left( \sigma _*, \frac{R}{\sqrt{n}} \right)$$. Then, there exists a coupling $$(\sigma _t, {{\tilde{\sigma }}}_t)$$ of the Markov chains with initial states $$(\sigma , {{\tilde{\sigma }}})$$ such that for a given $$\beta > 0$$ and all large enough n,

\begin{aligned} {{\mathbb {P}}}_{\sigma , {{\tilde{\sigma }}}}(\tau > \beta n) \le \frac{32{{\mathcal {Q}}}^2 R}{\sqrt{\beta }}, \end{aligned}

where $$\tau :=\min \{t \ge 0 : n^{\sigma _*}_{a, b}(\sigma _t) = n^{\sigma _*}_{a, b}({{\tilde{\sigma }}}_t) \ \text {for all} \, a, b \in G\}$$.

To translate this coupling time into a bound on total variation distance, we need also the simple observation that the stationary measure $$\pi$$ concentrates on $${{\mathcal {S}}}_*\left( \sigma _*, \frac{R}{\sqrt{n}} \right)$$ except for probability $$O(1/R^2)$$, as given in the next lemma.

### Lemma 2.6

For the stationary distribution $$\pi$$ of the chain $$(\sigma _t)_{t \ge 0}$$, for every $$R>0$$ and for all $$n > m$$,

\begin{aligned} \pi \left( \sigma \notin {{\mathcal {S}}}_*\left( \frac{R}{\sqrt{n}} \right) \right) \le \frac{C_G}{R^2}. \end{aligned}

Moreover for every $$\delta <1/(2{{\mathcal {Q}}})$$, for every $$R>0$$ and for all $$n>m$$,

\begin{aligned} \max _{\sigma _*\in {{\mathcal {S}}}_*\left( \delta \right) }\pi \left( \sigma \notin {{\mathcal {S}}}_*\left( \sigma _*, \frac{R}{\sqrt{n}} \right) \right) \le \frac{2 C_G {{\mathcal {Q}}}}{R^2}, \end{aligned}

where $$C_G$$ and m are constants depending only on G.

### Proof

Observe that since the stationary distribution $$\pi$$ is uniform on $${{\mathcal {S}}}$$, it is given by the uniform distribution $$\mathrm{Unif}$$ on $$G^n$$ conditioned on $${{\mathcal {S}}}$$. Note that any configuration whose first |G| coordinates enumerate all the elements of G is generating, which yields the easy lower bound $$|{{\mathcal {S}}}| \ge |G|^{n - |G|}$$. Consequently, we have

\begin{aligned} \pi \left( \sigma \notin {{\mathcal {S}}}_*\left( \frac{R}{\sqrt{n}} \right) \right)&\le |G|^{|G|} \mathrm{Unif}\left( \sigma \notin {{\mathcal {S}}}_*\left( \frac{R}{\sqrt{n}} \right) \right) \\&\le |G|^{|G|} \sum _{a \in G}\mathrm{Unif}\left( \left| \frac{n_a(\sigma )}{n}-\frac{1}{{{\mathcal {Q}}}}\right| \ge \frac{R}{\sqrt{n}}\right) \\&\le \frac{|G|^{|G|}}{R^2}\left( 1-\frac{1}{{{\mathcal {Q}}}}\right) . \end{aligned}

Concerning the second assertion, we note that $$n_a(\sigma _*) \ge (1/{{\mathcal {Q}}}-\delta )n$$ for each $$a \in G$$; the rest follows similarly, so we omit the details. $$\square$$
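As a sanity check on the Chebyshev step in the proof above, the tail can be computed exactly for a single coordinate: under $$\mathrm{Unif}$$, $$n_a(\sigma )$$ is Binomial$$(n, 1/{{\mathcal {Q}}})$$, and the exact tail is dominated by the variance bound $$\frac{1}{{{\mathcal {Q}}}}(1-\frac{1}{{{\mathcal {Q}}}})/R^2$$. A sketch (the parameter values are our own choice):

```python
import math

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_tail(n, Q, R):
    """P(|Bin(n, 1/Q)/n - 1/Q| >= R/sqrt(n)), computed by summation."""
    p = 1 / Q
    thr = R / math.sqrt(n)
    return sum(binom_pmf(n, p, k) for k in range(n + 1)
               if abs(k / n - p) >= thr)

n, Q, R = 200, 4, 2.0
tail = exact_tail(n, Q, R)
chebyshev = (1 / Q) * (1 - 1 / Q) / R**2   # variance bound per coordinate
```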

### Remark 2.7

In Lemma 2.6 above, we have given a very loose bound on $$C_G$$ for the sake of simplicity. Actually, it is not hard to see that holding G fixed, we have $$\lim _{n\rightarrow \infty } |{{\mathcal {S}}}|/|G|^n = 1$$. See also [9, Section 6.B] for more explicit bounds for various families of groups.

Together, Lemmas 2.4, 2.5, and 2.6 imply the following bound for total variation distance.

### Lemma 2.8

Let $$\beta > 0$$ be given, and let $$T := \left\lceil \frac{1}{2} n \log n \right\rceil + \left\lceil \beta n \right\rceil$$. Then, for any $$\sigma _*\in {{\mathcal {S}}}_*\left( \frac{1}{5{{\mathcal {Q}}}^3} \right)$$, we have

\begin{aligned} \left\| {{\mathbb {P}}}_{\sigma _*}(\sigma _T \in \cdot \ ) - \pi \right\| _{\mathrm{TV}} \le \frac{C_G}{\beta ^{1/4}}, \end{aligned}

where $$C_G$$ is a constant depending only on G.

### Proof

Let $${{\tilde{\sigma }}}$$ be drawn from the stationary distribution $$\pi$$. Define

\begin{aligned} \tau = \min \left\{ t \ge 0 : n^{\sigma _*}_{a,b}(\sigma _t) = n^{\sigma _*}_{a,b}({{\tilde{\sigma }}}_t) \text { for all} \, a, b \in G \right\} , \end{aligned}

where $$({{\tilde{\sigma }}}_t)$$ is a Markov chain started at $${{\tilde{\sigma }}}$$. Let $${\overline{\pi }}^{\sigma _*}$$ denote the stationary distribution for the proportion matrix with respect to $$\sigma _*$$. Since $${{\tilde{\sigma }}}$$ was drawn from $$\pi$$, the proportion matrix of $${{\tilde{\sigma }}}_t$$ remains distributed as $${\overline{\pi }}^{\sigma _*}$$ for all t.

We first run $$\sigma$$ and $${{\tilde{\sigma }}}$$ independently up until time $$T_1 := \left\lceil \frac{1}{2} n \log n \right\rceil$$. For a parameter R to be specified later, consider the events

\begin{aligned} {{\mathcal {G}}}:= \left\{ \sigma _{T_1} \in {{\mathcal {S}}}_*\left( \sigma _*, \frac{R}{\sqrt{n}} \right) \right\} , \qquad {\tilde{{{\mathcal {G}}}}} := \left\{ {{\tilde{\sigma }}}_{T_1} \in {{\mathcal {S}}}_*\left( \sigma _*, \frac{R}{\sqrt{n}} \right) \right\} . \end{aligned}

Lemma 2.4 implies that $${{\mathbb {P}}}({{\mathcal {G}}}^{{\textsf {c}}}) \le C_G e^{-R} + \frac{1}{n}$$, and Lemma 2.6 implies that $${{\mathbb {P}}}({{\tilde{{{\mathcal {G}}}}}}^{{\textsf {c}}}) \le \frac{2 C_G {{\mathcal {Q}}}}{R^2}$$.

Let $$T_2 := \left\lceil \beta n \right\rceil$$. Starting from time $$T_1$$, as long as both $${{\mathcal {G}}}$$ and $${{\tilde{{{\mathcal {G}}}}}}$$ hold, we may use Lemma 2.5 to form a coupling $$(\sigma _t, {{\tilde{\sigma }}}_t)$$ so that

\begin{aligned} {{\mathbb {P}}}_{\sigma _*, \sigma _*} \Big ( n^{\sigma _*}_{a, b}(\sigma _{T_1+T_2}) \ne n^{\sigma _*}_{a, b}({{\tilde{\sigma }}}_{T_1+T_2}) \hbox { for some}\ a, b \in G \,\Big |\, {{\mathcal {G}}}\cap {{\tilde{{{\mathcal {G}}}}}} \Big ) \le \frac{C{{\mathcal {Q}}}^2 R}{\sqrt{\beta }}. \end{aligned}

Setting $$R = \beta ^{1/4}$$, we conclude that

\begin{aligned}&{{\mathbb {P}}}_{\sigma _*, \sigma _*} \Big ( n^{\sigma _*}_{a, b}(\sigma _{T_1+T_2}) \ne n^{\sigma _*}_{a, b}({{\tilde{\sigma }}}_{T_1+T_2}) \hbox { for some}\ a, b \in G \Big ) \\&\quad \le \frac{C{{\mathcal {Q}}}^2 R}{\sqrt{\beta }} + {{\mathbb {P}}}({{\mathcal {G}}}^{{\textsf {c}}}) + {{\mathbb {P}}}({{\tilde{{{\mathcal {G}}}}}}^{{\textsf {c}}}) \\&\quad \le \frac{C{{\mathcal {Q}}}^2 R}{\sqrt{\beta }} + \left( C_G e^{-R} + \frac{1}{n}\right) + \frac{2C_G {{\mathcal {Q}}}}{R^2} = O_G\left( \frac{1}{\beta ^{1/4}}\right) . \end{aligned}

We have $$T = T_1 + T_2$$, and recall that the proportion matrix for $${{\tilde{\sigma }}}$$ is stationary for all time. This yields

\begin{aligned} \left\| {{\mathbb {P}}}_{\sigma _*}\left( (n^{\sigma _*}_{a, b}(\sigma _T))_{a, b \in G} \in \cdot \ \right) - {\overline{\pi }}^{\sigma _*} \right\| _{\mathrm{TV}} = O_G\left( \frac{1}{\beta ^{1/4}}\right) . \end{aligned}

The result then follows by Lemma 2.3. $$\square$$

### 2.4 Proof of the main theorem

We now combine the lemmas from the burn-in, averaging, and coupling periods to complete the proof of the upper bound in Theorem 1.1.

### Proof of Theorem 1.1 (1)

Define $$T_1 := \left\lceil n \log n + \beta n \right\rceil$$, $$T_2 := \left\lceil \beta n \right\rceil$$, and $$T_3 := \left\lceil \frac{1}{2} n \log n \right\rceil + \left\lceil \beta n \right\rceil$$.

Let $$\tau _{1/3}$$ be the first time to hit $${{\mathcal {S}}}_{non}\left( 1/3 \right)$$ as in Lemma 2.1. Then, Lemma 2.1 implies that for any $$\sigma _1 \in {{\mathcal {S}}}$$ and any $$t \ge 0$$, we have

\begin{aligned} d_{\sigma _1}(T_1 + t)&\le {{\mathbb {P}}}_{\sigma _1}\left( \tau _{1/3} > T_1 \right) + \max _{\sigma \in {{\mathcal {S}}}_{non}\left( 1/3 \right) } d_\sigma (t) \nonumber \\&\le \frac{120 {{\mathcal {Q}}}}{\beta ^2} + \max _{\sigma \in {{\mathcal {S}}}_{non}\left( 1/3 \right) } d_\sigma (t). \end{aligned}
(3)

Next, by Lemma 2.2, for any $$\sigma _2 \in {{\mathcal {S}}}_{non}\left( 1/3 \right)$$ and when $$\beta$$ and n are sufficiently large, we have that $${{\mathbb {P}}}_{\sigma _2} \left( \sigma _{T_2} \not \in {{\mathcal {S}}}_*\left( \frac{1}{5{{\mathcal {Q}}}^3} \right) \right) \le \frac{1}{n}$$. Consequently, for $$\sigma _2 \in {{\mathcal {S}}}_{non}\left( 1/3 \right)$$, we have

\begin{aligned} d_{\sigma _2}(T_2 + t) \le \frac{1}{n} + \max _{\sigma _*\in {{\mathcal {S}}}_*\left( \frac{1}{5{{\mathcal {Q}}}^3} \right) } d_{\sigma _*}(t). \end{aligned}
(4)

Finally, Lemma 2.8 states that

\begin{aligned} \max _{\sigma _*\in {{\mathcal {S}}}_*\left( \frac{1}{5{{\mathcal {Q}}}^3} \right) } d_{\sigma _*}(T_3) \le \frac{C_G}{\beta ^{1/4}}. \end{aligned}
(5)

Thus, combining (3), (4), and (5), we obtain for any $$\sigma \in {{\mathcal {S}}}$$ that

\begin{aligned} d_\sigma \left( \frac{3}{2} n \log n + 4 \beta n \right)&\le d_\sigma \left( T_1 + T_2 + T_3 \right) \nonumber \\&\le \frac{120 {{\mathcal {Q}}}}{\beta ^2} + \frac{1}{n} + \frac{C_G}{\beta ^{1/4}} \end{aligned}

Sending $$n \rightarrow \infty$$ and then $$\beta \rightarrow \infty$$ yields (1). $$\square$$

## 3 Proofs for the averaging period

In this section, we prove Lemmas 2.2 and 2.4. The proofs are based on analyzing stochastic difference equations satisfied by the Fourier transform of the proportion vector or matrix.

### 3.1 The Fourier transform for G

We first establish some notation and preliminaries for the Fourier transform. Let $$G^*$$ be a complete set of non-trivial irreducible representations of G. In other words, for each $$\rho \in G^*$$, we have a finite-dimensional complex vector space $$V_\rho$$ such that $$\rho : G \rightarrow GL(V_\rho )$$ is a non-trivial irreducible representation, and every non-trivial irreducible representation of G is isomorphic to a unique $$\rho \in G^*$$. Moreover, we may equip each $$V_\rho$$ with an inner product for which $$\rho$$ is unitary.

For a configuration $$\sigma \in {{\mathcal {S}}}$$ and for each $$\rho \in G^*$$, we consider the matrix acting on $$V_\rho$$ given by

\begin{aligned} x_\rho (\sigma ) := \sum _{a \in G} \frac{n_a(\sigma )}{n}\rho (a), \end{aligned}

so that $$x_\rho (\sigma )$$ is the Fourier transform of the proportion vector at the representation $$\rho$$. We write $$x(\sigma ):=(x_\rho (\sigma ))_{\rho \in G^*}$$.

Let $$\widetilde{V} := \bigoplus _{\rho \in G^*}\mathrm{End}_{{\mathbb {C}}}(V_\rho )$$, and write $$d_\rho := \dim _{{\mathbb {C}}}V_\rho$$. For an element $$x = (x_\rho )_{\rho \in G^*} \in \widetilde{V}$$, we define a norm $$\Vert \cdot \Vert _{\widetilde{V}}$$ given by

\begin{aligned} \Vert x\Vert _{\widetilde{V}}^2 := \frac{1}{{{\mathcal {Q}}}}\sum _{\rho \in G^*}d_\rho \Vert x_\rho \Vert _{\mathrm{HS}}^2, \end{aligned}

where $$\langle A, B \rangle _{\mathrm{HS}} =\mathrm{Tr}\,(A^*B)$$ denotes the Hilbert–Schmidt inner product in $$\mathrm{End}_{{\mathbb {C}}}(V_\rho )$$ and $$\Vert \cdot \Vert _{\mathrm{HS}}$$ denotes the corresponding norm. (Note that $$\langle \cdot , \cdot \rangle _{\mathrm{HS}}$$ and $$\Vert \cdot \Vert _{\mathrm{HS}}$$ depend on $$\rho$$, but for sake of brevity, we omit the $$\rho$$ when there is no danger of confusion.)

The Peter–Weyl theorem [7, Chapter 2] says that

\begin{aligned} L^2(G) \cong {{\mathbb {C}}}\oplus \widetilde{V}, \end{aligned}

where the isomorphism is given by the Fourier transform. The Plancherel formula then reads

\begin{aligned} \Vert x(\sigma )\Vert _{\widetilde{V}}^2 = \left\| \left( \frac{n_a(\sigma )}{n}\right) _{a \in G} - \left( \frac{1}{{{\mathcal {Q}}}}\right) _{a \in G}\right\| ^2. \end{aligned}
(6)

Thus, in order to show that $$\sigma \in {{\mathcal {S}}}_*\left( \delta \right)$$, it suffices to show that $$\Vert x(\sigma )\Vert _{\widetilde{V}}$$ is small. A similar argument may be applied to the proportion matrix instead of the proportion vector.

Finally, for an element $$A \in \mathrm{End}_{{\mathbb {C}}}(V_\rho )$$, we will at times also consider the operator norm $$\Vert A\Vert _{op} := \sup _{v \in V_\rho , v \ne 0} \Vert Av\Vert / \Vert v\Vert$$. We will also sometimes use the following (equivalent) variational characterization of the operator norm:

\begin{aligned} \sup _{\begin{array}{c} X \in \mathrm{End}_{{\mathbb {C}}}(V_\rho ) \\ \Vert X\Vert _{\mathrm{HS}} = 1 \end{array}} \Vert XA\Vert ^2_{\mathrm{HS}}&= \sup _{\begin{array}{c} X \in \mathrm{End}_{{\mathbb {C}}}(V_\rho ) \\ \Vert X\Vert _{\mathrm{HS}} = 1 \end{array}} \mathrm{Tr}\,(XAA^*X^*) = \sup _{\begin{array}{c} X \in \mathrm{End}_{{\mathbb {C}}}(V_\rho ) \\ \Vert X\Vert _{\mathrm{HS}} = 1 \end{array}} \mathrm{Tr}\,(X^*XAA^*) \\&= \sup _{\begin{array}{c} Y \in \mathrm{End}_{{\mathbb {C}}}(V_\rho ) \\ Y \succeq 0, \;\; \mathrm{Tr}\,Y = 1 \end{array}} \langle Y , AA^* \rangle _{\mathrm{HS}} = \Vert AA^*\Vert _{op} = \Vert A\Vert _{op}^2. \end{aligned}

#### 3.1.1 The special case of $$G = {{\mathbb {Z}}}/q$$

On a first reading of this section, the reader may wish to consider everything for the special case of $$G = {{\mathbb {Z}}}/q$$ for some integer $$q \ge 2$$. In that case, each representation is one-dimensional, and the non-trivial representations can be indexed by $$\ell = 1, 2, \ldots , q - 1$$. The Fourier transform is then particularly simple: the coefficients are scalar values

\begin{aligned} x_\ell (\sigma ) = \sum _{a = 0}^{q - 1} \frac{n_a(\sigma )}{n} \omega ^{a \ell }, \end{aligned}

where $$\omega := e^{\frac{2\pi i}{q}}$$ is a primitive q-th root of unity.

This special case already illustrates most of the main ideas while simplifying the estimates in some places (e.g. matrix inequalities we use will often be immediately obvious for scalars).
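To make this concrete, the following Python sketch (our illustration, not part of the paper; the function names are ours) computes the scalar coefficients $$x_\ell (\sigma )$$ for $$G = {{\mathbb {Z}}}/q$$ and checks the Plancherel identity (6), which here reads $$\frac{1}{q}\sum _{\ell =1}^{q-1}|x_\ell |^2 = \sum _a (n_a/n - 1/q)^2$$.

```python
import cmath

def fourier_coeffs(counts):
    """Scalar Fourier coefficients x_ell = sum_a (n_a/n) * omega^(a*ell),
    for the non-trivial characters ell = 1, ..., q-1 of Z/q."""
    q, n = len(counts), sum(counts)
    omega = cmath.exp(2j * cmath.pi / q)
    return [sum(na / n * omega ** (a * ell) for a, na in enumerate(counts))
            for ell in range(1, q)]

def plancherel_gap(counts):
    """Difference of the two sides of (6); it should vanish identically."""
    q, n = len(counts), sum(counts)
    lhs = sum(abs(x) ** 2 for x in fourier_coeffs(counts)) / q
    rhs = sum((na / n - 1 / q) ** 2 for na in counts)
    return lhs - rhs

# Example: q = 5, n = 10; counts[a] is the number n_a of indices with value a.
assert abs(plancherel_gap([4, 3, 1, 1, 1])) < 1e-12
assert abs(plancherel_gap([2, 2, 2, 2, 2])) < 1e-12  # uniform: both sides are 0
```

Note that for the uniform proportion vector every $$x_\ell$$ vanishes, matching the fact that $$\sigma \in {{\mathcal {S}}}_*\left( \delta \right)$$ corresponds to all Fourier coefficients being small.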

### 3.2 A stochastic difference equation for the $$n_a$$

For $$a \in G$$, we next analyze the behavior of $$n_a(\sigma _t)$$ over time. For convenience, we write $$n_a(t) = n_a(\sigma _t)$$. Let $${{\mathcal {F}}}_t$$ denote the $$\sigma$$-field generated by the chain $$(\sigma _s)_{0 \le s \le t}$$. Then, our dynamics satisfy the equation

\begin{aligned} {{\mathbb {E}}}[n_a(t+1)-n_a(t) \mid {{\mathcal {F}}}_t] = \sum _{b \in G} \frac{n_{ab^{-1}}(t) n_b(t) }{2n(n-1)}+\sum _{b \in G} \frac{n_{ab}(t) n_b(t)}{2n(n-1)} - \frac{n_a(t)}{n}. \end{aligned}
(7)

Note that $$|n_a(t + 1) - n_a(t)| \le 1$$ almost surely. Thus, for each $$a \in G$$, we can write the above as a stochastic difference equation

\begin{aligned} n_a(t+1) - n_a(t)= & {} \sum _{b \in G} \frac{n_{ab^{-1}}(t) n_b(t)}{2n(n-1)}+\sum _{b \in G} \frac{n_{ab}(t) n_b(t)}{2n(n-1)} \nonumber \\&- \frac{n_a(t)}{n} + M_a(t+1), \end{aligned}
(8)

where $${{\mathbb {E}}}[M_a(t+1) \mid {{\mathcal {F}}}_t] = 0$$ and $$|M_a(t)| \le 2$$ almost surely.
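As a sanity check on (7), the snippet below (ours, written additively for $$G = {{\mathbb {Z}}}/q$$, where $$ab^{-1}$$ becomes $$a - b$$) compares the one-step conditional expectation, computed by enumerating all $$2n(n-1)$$ moves, against the right-hand side of (7); the two agree up to an $$O(1/n)$$ diagonal term coming from the constraint $$i \ne j$$, which is absorbed into the martingale term $$M_a$$ in (8).

```python
from collections import Counter

def exact_drift(sigma, q):
    """E[n_a(t+1) - n_a(t) | sigma], enumerating every move (i, j, s)."""
    n = len(sigma)
    drift = [0.0] * q
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for s in (1, -1):
                new = (sigma[i] + s * sigma[j]) % q
                drift[new] += 1.0
                drift[sigma[i]] -= 1.0
    return [d / (2 * n * (n - 1)) for d in drift]

def formula_drift(sigma, q):
    """Right-hand side of (7), written additively for Z/q."""
    n = len(sigma)
    cnt = Counter(sigma)
    return [sum((cnt[(a - b) % q] + cnt[(a + b) % q]) * cnt[b]
                for b in range(q)) / (2 * n * (n - 1)) - cnt[a] / n
            for a in range(q)]

sigma, q = [0, 1, 1, 2, 0, 1], 3
n = len(sigma)
for e, f in zip(exact_drift(sigma, q), formula_drift(sigma, q)):
    assert abs(e - f) <= 1.0 / (n - 1) + 1e-9  # O(1/n) diagonal correction
```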

It is easiest to analyze this equation through the Fourier transform. Writing $$x_\rho (t) = x_\rho (\sigma _t)$$, we calculate from (8) that

\begin{aligned} x_\rho (t+1) - x_\rho (t) = \frac{1}{n-1}x_\rho (t) \left( \frac{x_\rho (t) + x_\rho (t)^*}{2} -\frac{n-1}{n}\right) + \widehat{M}_\rho (t+1), \end{aligned}

where $$\widehat{M}_\rho (t) := \frac{1}{n}\sum _{a \in G}M_a(t) \rho (a)$$. For convenience, write

\begin{aligned} X_\rho (t) = \frac{1}{n - 1}\left( \frac{x_\rho (t) + x_\rho (t)^*}{2} - \frac{n-1}{n}\right) , \end{aligned}

so that our equation becomes

\begin{aligned} x_\rho (t+1) - x_\rho (t) = x_\rho (t) X_\rho (t) + \widehat{M}_\rho (t+1). \end{aligned}
(9)

Note that we have

\begin{aligned} \Vert x_\rho (t)\Vert _{\mathrm{HS}} \le \sqrt{d_\rho }, \qquad {{\mathbb {E}}}[\widehat{M}_\rho (t+1) \mid {{\mathcal {F}}}_t] = 0, \qquad \text {and}\qquad \Vert \widehat{M}_\rho (t)\Vert _{\mathrm{HS}} \le \frac{2 {{\mathcal {Q}}}\sqrt{d_\rho }}{n}, \end{aligned}

and thus,

\begin{aligned} \Vert x(t)\Vert _{\widetilde{V}} \le 1 \qquad \text {and}\qquad \Vert \widehat{M}(t)\Vert _{\widetilde{V}}\le \frac{2{{\mathcal {Q}}}}{n}, \end{aligned}

where $$\widehat{M}=(\widehat{M}_\rho )_{\rho \in G^*}$$.
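The algebra leading to (9) can be checked directly in the abelian case. The sketch below (ours) verifies, for $$G = {{\mathbb {Z}}}/q$$, that the Fourier transform of the drift term equals $$\frac{1}{n-1}x_\ell \left( \mathrm{Re}\,x_\ell - \frac{n-1}{n}\right)$$, which is the scalar form of $$x_\rho (t) X_\rho (t)$$ (here $$(x_\rho + x_\rho ^*)/2$$ reduces to $$\mathrm{Re}\,x_\ell$$).

```python
import cmath

def fourier_drift_gap(counts):
    """Max over ell != 0 of |FT(drift) - x_ell*(Re x_ell - (n-1)/n)/(n-1)|,
    where drift is the deterministic part of (8), written additively."""
    q, n = len(counts), sum(counts)
    omega = cmath.exp(2j * cmath.pi / q)
    drift = [sum((counts[(a - b) % q] + counts[(a + b) % q]) * counts[b]
                 for b in range(q)) / (2 * n * (n - 1)) - counts[a] / n
             for a in range(q)]
    gap = 0.0
    for ell in range(1, q):
        x = sum(counts[a] / n * omega ** (a * ell) for a in range(q))
        lhs = sum(drift[a] * omega ** (a * ell) for a in range(q)) / n
        rhs = x * (x.real - (n - 1) / n) / (n - 1)
        gap = max(gap, abs(lhs - rhs))
    return gap

assert fourier_drift_gap([4, 3, 1, 1, 1]) < 1e-12  # q = 5, n = 10
assert fourier_drift_gap([7, 2, 1]) < 1e-12        # q = 3, n = 10
```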

### 3.3 A general estimate for stochastic difference equations

Before proving Lemma 2.2, we also need a technical lemma for controlling the behavior of stochastic difference equations, which will be used to analyze (9) as well as other similar equations.

### Lemma 3.1

Let $$(z(t))_{t \ge 0}$$ be a sequence of [0, 1]-valued random variables adapted to a filtration $$({{\mathcal {F}}}_t)_{t \ge 0}$$. Let $$\varepsilon \in (0, 1)$$ be a small constant, and let $$\varphi : {{\mathbb {R}}}^+ \rightarrow (0,1]$$ be a non-decreasing function.

Suppose that there are $${{\mathcal {F}}}_t$$-measurable random variables M(t) for which

\begin{aligned} z(t+1) - z(t) \le -\varepsilon \varphi (t+1) z(t) + M(t+1) \end{aligned}
(10)

and which, for some constant D, satisfy the bounds

\begin{aligned} {{\mathbb {E}}}[ M(t+1) \mid {{\mathcal {F}}}_t ] \le D\varepsilon \sqrt{\varepsilon }, \qquad |M(t)| \le D\varepsilon . \end{aligned}

Then, for each t and each $$\lambda > 0$$, we have

\begin{aligned} {{\mathbb {P}}}\left( z(t) \ge \lambda \sqrt{\varepsilon } + e^{- \varepsilon \int _0^t \varphi (s)\,ds} \cdot z(0) \right) \le C_{D,\varphi } e^{-c_{D,\varphi } \lambda ^2} \end{aligned}

for constants $$c_{D,\varphi }, C_{D,\varphi }$$ depending only on D and $$\varphi$$.

### Proof

Let us define for integers $$t \ge 1$$,

\begin{aligned} \Phi (t) := \varepsilon ^{-1} \sum _{k = 1}^t \log \frac{1}{1-\varepsilon \varphi (k)}, \qquad \text {and} \qquad \Phi (0) := 0. \end{aligned}

Taking conditional expectations in (10), we have

\begin{aligned} {{\mathbb {E}}}[ z(t+1) \mid {{\mathcal {F}}}_t ] \le (1 - \varepsilon \varphi (t+1)) z(t) + D \varepsilon \sqrt{\varepsilon }. \end{aligned}

Rearranging and using the fact that $$\varphi (t)$$ is non-decreasing, we have

\begin{aligned} {{\mathbb {E}}}[ z(t+1) \mid {{\mathcal {F}}}_t ] - \frac{D\sqrt{\varepsilon }}{\varphi (0)}&\le (1 - \varepsilon \varphi (t+1)) z(t) - \frac{D\sqrt{\varepsilon }(1 - \varepsilon \varphi (t+1))}{\varphi (0)} \\&\le (1 - \varepsilon \varphi (t+1)) \left( z(t) - \frac{D\sqrt{\varepsilon }}{\varphi (0)} \right) . \end{aligned}

Consequently,

\begin{aligned} Z_t := e^{\varepsilon \Phi (t)} \left( z(t) - \frac{D\sqrt{\varepsilon }}{\varphi (0)} \right) \end{aligned}

is a supermartingale, and its increments are bounded by

\begin{aligned} |Z_{t+1}-Z_t| \le e^{\varepsilon \Phi (t+1)}\left( |M(t+1)|+D \varepsilon \right) \le 2D\varepsilon e^{\varepsilon \Phi (t+1)}. \end{aligned}
(11)

Recall that $$\varphi$$ is non-decreasing, so that for all $$t \ge s \ge 0$$, we have

\begin{aligned} \Phi (t) = \Phi (s) + \varepsilon ^{-1} \sum _{k = s + 1}^t \log \frac{1}{1 - \varepsilon \varphi (k)} \ge \Phi (s) + (t - s) \varphi (0). \end{aligned}

Using this with (11), we see that the sum of the squares of the first t increments is at most

\begin{aligned} \sum _{s = 1}^{t} 4D^2 \varepsilon ^2 e^{2\varepsilon \Phi (s)}&\le 4D^2\varepsilon ^2 \sum _{s = 1}^t e^{2\varepsilon \Phi (t) - 2\varepsilon \varphi (0)(t - s)} \le 4D^2\varepsilon ^2 e^{2\varepsilon \Phi (t)} \cdot \frac{1}{1 - e^{-2\varepsilon \varphi (0)}} \\&\le 4D^2\varepsilon ^2 e^{2\varepsilon \Phi (t)} \cdot \frac{1}{1 - (1 - \frac{1}{2}\varepsilon \varphi (0))} = \frac{8D^2 \varepsilon }{\varphi (0)} \cdot e^{2\varepsilon \Phi (t)}. \end{aligned}

By the Azuma–Hoeffding inequality, this yields

\begin{aligned} {{\mathbb {P}}}\left( Z_t \ge \lambda \sqrt{\varepsilon } e^{\varepsilon \Phi (t)} + Z_0 \right) \le \exp \left( - \frac{\varphi (0) \lambda ^2 \varepsilon \cdot e^{2\varepsilon \Phi (t)}}{16D^2 \varepsilon \cdot e^{2\varepsilon \Phi (t)}} \right) = \exp \left( -\frac{\varphi (0) \lambda ^2}{16D^2} \right) , \end{aligned}

which in turn implies

\begin{aligned} {{\mathbb {P}}}\left( z(t) \ge \frac{D\sqrt{\varepsilon }}{\varphi (0)} + e^{-\varepsilon \Phi (t)} z(0) + \lambda \sqrt{\varepsilon } \right) \le \exp \left( -\frac{\varphi (0)\lambda ^2}{16D^2} \right) . \end{aligned}

Finally, observe that $$\Phi (t) \ge \sum _{k = 1}^t \varphi (k) \ge \int _0^t \varphi (s)\, ds$$. The result then follows upon shifting and rescaling of $$\lambda$$. $$\square$$
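Stripped of the noise, the supermartingale computation above says: if $$z(t+1) = (1 - \varepsilon \varphi )z(t) + D\varepsilon \sqrt{\varepsilon }$$ with constant $$\varphi$$, then $$z(t) \le \frac{D\sqrt{\varepsilon }}{\varphi } + e^{-\varepsilon \varphi t} z(0)$$. The short script below (our illustration; the constants are arbitrary) confirms this deterministic envelope numerically.

```python
import math

def run_envelope(z0, eps, phi, D, T):
    """Iterate the noiseless recursion z <- (1 - eps*phi)*z + D*eps^(3/2)
    and check the envelope z(t) <= D*sqrt(eps)/phi + exp(-eps*phi*t)*z(0)."""
    z = z0
    for t in range(1, T + 1):
        z = (1 - eps * phi) * z + D * eps ** 1.5
        bound = D * math.sqrt(eps) / phi + math.exp(-eps * phi * t) * z0
        assert z <= bound + 1e-12, (t, z, bound)
    return z

z_final = run_envelope(z0=1.0, eps=0.01, phi=0.3, D=1.0, T=5000)
# For large T the iteration settles near its fixed point D*sqrt(eps)/phi.
assert abs(z_final - 0.01 ** 0.5 / 0.3) < 0.01
```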

### 3.4 Proportion vector chain: Proof of Lemma 2.2

We first prove a bound for the Fourier coefficients $$x_\rho (t)$$.

### Lemma 3.2

Consider any $$\sigma \in {{\mathcal {S}}}_{non}\left( 1/3 \right)$$ and any $$\rho \in G^*$$. We have a constant $$c_G$$ depending only on G for which

\begin{aligned} {{\mathbb {P}}}_\sigma \left( \bigcup _{t = 1}^{n^2} \left\{ \Vert x_\rho (t)\Vert _{\mathrm{HS}} \ge \frac{1}{n^{1/8}} + e^{-c_G t/n}\cdot \Vert x_\rho (0)\Vert _{\mathrm{HS}} \right\} \right) \le \frac{1}{n^3} \end{aligned}

for all large enough n.

This immediately implies Lemma 2.2.

### Proof of Lemma 2.2

With $$c_G$$ defined as in Lemma 3.2, take $$C_{G, \delta }$$ large enough so that for any $$T \ge C_{G, \delta } n$$,

\begin{aligned} \frac{1}{n^{1/8}} + e^{-c_G T/n}\sqrt{d_\rho } \le \delta . \end{aligned}

Then, Lemma 3.2 and Plancherel’s formula yield

\begin{aligned} {{\mathbb {P}}}_\sigma \left( \sigma _T \notin {{\mathcal {S}}}_*\left( \delta \right) \right)&\le {{\mathbb {P}}}_\sigma \left( \Vert x_\rho (T)\Vert _{\mathrm{HS}} \ge \delta \hbox { for some}\ \rho \in G^*\right) \\&\le \frac{{{\mathcal {Q}}}}{n^3} \le \frac{1}{n}, \end{aligned}

for large enough n, as desired. $$\square$$

We are now left with proving Lemma 3.2, which relies on the following bound on the operator norm.

### Lemma 3.3

There exists a positive constant $$\gamma _G$$ depending on G such that for any $$\rho \in G^*$$ and any $$\sigma \in {{\mathcal {S}}}_{non}\left( 1/6 \right)$$,

\begin{aligned} \Vert I_{d_\rho }+X_\rho (\sigma )\Vert _{op} \le 1-\frac{\gamma _G}{n}. \end{aligned}

### Proof

Let $$\Delta _G$$ denote the set of all probability distributions on G, and for $$c \in (0, 1)$$, let $$\Delta _G(c) \subset \Delta _G$$ denote the set of all probability distributions $$\mu$$ such that $$\mu (H) \le 1 - c$$ for all proper subgroups $$H \subset G$$.

Consider a representation $$\rho \in G^*$$, and consider the function $$h : \Delta _G(1/6) \rightarrow \mathrm{End}_{{\mathbb {C}}}(V_\rho )$$ given by

\begin{aligned} h(\mu ) = \sum _{a \in G} \mu (a) \frac{\rho (a)+\rho (a)^*}{2}. \end{aligned}

Then, $$h(\mu )$$ is hermitian, and since $$\rho$$ is unitary, we clearly have

\begin{aligned} \lambda (\mu ):=\max _{v \in V_\rho , \Vert v\Vert =1}\langle h(\mu )v, v\rangle \le 1. \end{aligned}

We claim that $$\lambda (\mu ) < 1$$ for each $$\mu \in \Delta _G(c)$$. Indeed, suppose the contrary. Then, there exists a unit vector $$v \in V_\rho$$ such that $$\mathrm{Re}\langle \rho (a)v, v \rangle =1$$ for all $$a \in G$$ with $$\mu (a)>0$$; since $$\rho (a)$$ is unitary and $$\Vert v\Vert = 1$$, the equality case of the Cauchy–Schwarz inequality forces $$\rho (a)v = v$$ for all such a. This implies that the support of $$\mu$$ is included in the subgroup

\begin{aligned} H=\{a \in G \ : \ \rho (a)v=v\}. \end{aligned}

Since $$\rho$$ is a (non-trivial) irreducible representation, H is a proper subgroup of G, and $$\mu (H)=1$$, contradicting the assumption that $$\mu \in \Delta _G(c)$$.

Note that $$\mu \mapsto \lambda (\mu )$$ is continuous. We may define

\begin{aligned} \gamma _\rho :=\max _{\mu \in \Delta _G(1/6)}\lambda (\mu )< 1 \qquad \text {and}\qquad {{\tilde{\gamma }}}_G:=\max _{\rho \in G^*}\gamma _\rho <1. \end{aligned}

Then, we have for any $$\sigma \in {{\mathcal {S}}}_{non}\left( 1/6 \right)$$,

\begin{aligned} \frac{x_\rho (\sigma ) + x_\rho (\sigma )^*}{2} = \sum _{a \in G} \frac{n_a(\sigma )}{n}\frac{\rho (a)+\rho (a)^*}{2} \preceq {{\tilde{\gamma }}}_G I_{d_\rho }. \end{aligned}

Taking $$0<\gamma _G < 1-{{\tilde{\gamma }}}_G$$, and plugging this into the definition of $$X_\rho$$ gives $$X_\rho (\sigma ) \preceq -\frac{\gamma _G}{n}I_{d_\rho }$$. Note that $$X_\rho (\sigma ) \succeq -\frac{2}{n-1}I_{d_\rho }$$. Combining these together gives the result. $$\square$$

### Remark 3.4

A much more direct approach is possible in the case $$G = {{\mathbb {Z}}}/q$$ (say, with q prime, so that the only proper subgroup is $$\{0\}$$). The condition $$\sigma \in {{\mathcal {S}}}_{non}\left( 1/6 \right)$$ then implies that $$n_0(\sigma ) \le \frac{5n}{6}$$. Then, we have

\begin{aligned} \mathrm{Re}\,x_\ell (\sigma ) = \mathrm{Re}\sum _{a = 0}^{q - 1} \frac{n_a(\sigma )}{n} \omega ^{a \ell } \le \frac{5}{6} + \frac{1}{6} \max _{1 \le a \le q - 1} \mathrm{Re}\,\omega ^{a \ell } = \frac{5}{6} + \frac{1}{6} \cos \frac{2\pi }{q} < 1 - \gamma _G \end{aligned}

for some positive $$\gamma _G$$ (for composite q, one instead bounds the proportion of indices whose value lies in the subgroup $$\{a : a\ell \equiv 0 \bmod q\}$$). Some rearranging of equations then yields the desired result.
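The bound in the remark is easy to probe numerically. The snippet below (ours) draws random proportion vectors on $${{\mathbb {Z}}}/5$$ with $$n_0/n \le 5/6$$ and confirms $$\mathrm{Re}\,x_\ell \le \frac{5}{6} + \frac{1}{6}\cos \frac{2\pi }{5}$$ for every non-trivial character.

```python
import math, random

def max_re_coeff(mu, q):
    """max over ell = 1..q-1 of sum_a mu[a] * cos(2*pi*a*ell/q)."""
    return max(sum(mu[a] * math.cos(2 * math.pi * a * ell / q) for a in range(q))
               for ell in range(1, q))

random.seed(0)
q = 5                # prime, so the only proper subgroup is {0}
cap = 5 / 6          # proportion allowed on the subgroup {0}
bound = cap + (1 - cap) * math.cos(2 * math.pi / q)
for _ in range(1000):
    mu0 = cap * random.random()             # mu(0) <= 5/6
    rest = [random.random() for _ in range(q - 1)]
    scale = (1 - mu0) / sum(rest)
    mu = [mu0] + [r * scale for r in rest]  # a random distribution on Z/5
    assert max_re_coeff(mu, q) <= bound + 1e-12
# The bound is attained at mu = (5/6) delta_0 + (1/6) delta_1.
assert abs(max_re_coeff([5 / 6, 1 / 6, 0, 0, 0], q) - bound) < 1e-12
```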

### Proof of Lemma 3.2

Fix $$\rho \in G^*$$. Let $${{\mathcal {G}}}_t$$ denote the event where for all $$0 \le s \le t$$, we have $$\Vert I_{d_\rho }+X_\rho (s)\Vert _{op} \le 1 - \frac{\gamma _G}{n}$$, where $$\gamma _G$$ is taken as in Lemma 3.3. Since our chain starts at $$\sigma \in {{\mathcal {S}}}_{non}\left( 1/3 \right)$$, Lemmas 2.1 and 3.3 together imply that

\begin{aligned} {{\mathbb {P}}}_\sigma ({{\mathcal {G}}}_{n^2}^{{\textsf {c}}}) \le C_G n^2 e^{-n/10}. \end{aligned}

Next, we turn to (9). Rearranging (9) and squaring, we have

\begin{aligned} \Vert x_\rho (t+1)\Vert _{\mathrm{HS}}^2&= \Vert x_\rho (t)(I_{d_\rho } + X_\rho (t))\Vert _{\mathrm{HS}}^2 + \Vert \widehat{M}_\rho (t+1)\Vert _{\mathrm{HS}}^2 \nonumber \\&\quad + 2\mathrm{Re}\langle x_\rho (t)(I_{d_\rho } + X_\rho (t)), \widehat{M}_\rho (t+1) \rangle _{\mathrm{HS}}. \end{aligned}
(12)

Let $$z_t := \mathbf{1}_{{{\mathcal {G}}}_t} \Vert x_\rho (t)\Vert _{\mathrm{HS}}^2$$ and

\begin{aligned} M'(t+1) := \Vert \widehat{M}_\rho (t+1)\Vert _{\mathrm{HS}}^2 + 2\mathrm{Re}\langle x_\rho (t)(I_{d_\rho } + X_\rho (t)), \widehat{M}_\rho (t+1) \rangle _{\mathrm{HS}}. \end{aligned}

Substituting into (12), we obtain

\begin{aligned} z_{t+1} \le \Vert I_{d_\rho } + X_\rho (t)\Vert _{op}^2 \cdot z_t + \mathbf{1}_{{{\mathcal {G}}}_t} M'(t+1) \le \left( 1 - \frac{\gamma _G}{n}\right) ^2 z_t + \mathbf{1}_{{{\mathcal {G}}}_t} M'(t+1). \end{aligned}

Note that we have the bounds

\begin{aligned} {{\mathbb {E}}}[ M'(t+1) \mid {{\mathcal {F}}}_t ]= & {} {{\mathbb {E}}}[ \Vert \widehat{M}_\rho (t+1)\Vert _{\mathrm{HS}}^2 \mid {{\mathcal {F}}}_t ] \le \frac{4{{\mathcal {Q}}}^2d_\rho }{n^2} \\ |M'(t+1)|\le & {} \Vert \widehat{M}_\rho (t+1)\Vert _{\mathrm{HS}}^2 + 2\sqrt{d_\rho }\left( 1+\frac{1}{n(n-1)}\right) \Vert \widehat{M}_\rho (t+1)\Vert _{\mathrm{HS}}\\\le & {} \frac{6{{\mathcal {Q}}}^2 d_\rho }{n}. \end{aligned}

We now apply Lemma 3.1 with $$\varepsilon = \frac{1}{n}$$, $$\varphi (t) = \gamma _G$$, $$D = 6{{\mathcal {Q}}}^2 d_\rho$$, and $$\lambda = n^{1/4}$$. This yields

\begin{aligned} {{\mathbb {P}}}\left( z_t \ge n^{-1/4} + e^{-\gamma _G t/n}\cdot z_0 \right) \le C'_G e^{-c'_G \sqrt{n}}. \end{aligned}

Consequently,

\begin{aligned} {{\mathbb {P}}}_\sigma \left( \Vert x_\rho (t)\Vert _{\mathrm{HS}} \ge n^{-1/8} + e^{-\gamma _G t/2n} \cdot \Vert x_\rho (0)\Vert _{\mathrm{HS}} \right) \le C'_G e^{-c'_G \sqrt{n}} + C_G n^2 e^{-n/10}. \end{aligned}

The lemma with $$c_G = \gamma _G/2$$ then follows from union bounding over all $$1 \le t \le n^2$$ and taking n sufficiently large. $$\square$$
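For intuition, one can watch the contraction of Lemma 3.2 in simulation. This sketch (ours; the parameters and seed are arbitrary) runs the product replacement chain on $$({{\mathbb {Z}}}/3)^n$$ from the all-ones configuration, where $$|x_1(0)| = 1$$, and checks that $$|x_1(t)|$$ has dropped well below 1 after order $$n \log n$$ steps.

```python
import cmath, random

def run_chain(n=200, q=3, steps=20000, seed=1):
    """Product replacement chain on (Z/q)^n; returns |x_1| after `steps` moves."""
    random.seed(seed)
    sigma = [1] * n                        # generates Z/q, and |x_1(0)| = 1
    omega = cmath.exp(2j * cmath.pi / q)
    for _ in range(steps):
        i, j = random.sample(range(n), 2)  # ordered pair of distinct indices
        s = random.choice((1, -1))
        sigma[i] = (sigma[i] + s * sigma[j]) % q
    return abs(sum(omega ** a for a in sigma)) / n

final = run_chain()
assert final < 0.5  # near stationarity |x_1| is typically of order 1/sqrt(n)
```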

### 3.5 Proportion matrix chain: Proof of Lemma 2.4

We carry out a similar albeit more refined strategy to analyze the proportion matrix. Throughout this section, we assume our Markov chain $$(\sigma _t)_{t \ge 0}$$ starts at an initial state $$\sigma _*\in {{\mathcal {S}}}_*\left( \frac{1}{4{{\mathcal {Q}}}} \right)$$. We again write $$n_a(t)=n_a(\sigma _t)$$ and $$n_{a, b}(t)=n^{\sigma _*}_{a, b}(\sigma _t)$$, and similar to before, the $$n_{a, b}(t)$$ satisfy the difference equation

\begin{aligned} n_{a, b}(t+1)-n_{a, b}(t)= & {} \sum _{c \in G}\frac{n_{a, bc^{-1}}(t)n_c(t)}{2n(n-1)}+\sum _{c \in G}\frac{n_{a, bc}(t)n_c(t)}{2n(n-1)} \nonumber \\&- \frac{n_{a, b}(t)}{n} + M_{a, b}(t+1), \end{aligned}
(13)

where $${{\mathbb {E}}}[M_{a, b}(t+1) \mid {{\mathcal {F}}}_t]=0$$ and $$|M_{a, b}(t)| \le 2$$ for all $$t \ge 0$$.

We can again analyze this equation via the Fourier transform. In this case, for each $$a \in G$$, we take the Fourier transform of $$\left( n_{a, b}(t)/n_a(\sigma _*)\right) _{b \in G}$$. For $$\rho \in G^*$$, let

\begin{aligned} y_{a,\rho }(t) = y_{a,\rho }^{\sigma _*}(t) := \sum _{b \in G}\frac{n_{a, b}(t)}{n_a(\sigma _*)}\rho (b) \end{aligned}

denote the Fourier coefficient at $$\rho$$. Let $$\widehat{M}_{a, \rho }(t) := \frac{1}{n_a(\sigma _*)}\sum _{b \in G}M_{a, b}(t)\rho (b)$$. Then, (13) becomes

\begin{aligned} y_{a, \rho }(t+1) - y_{a, \rho }(t) = y_{a, \rho }(t) X_\rho (t) + \widehat{M}_{a, \rho }(t+1). \end{aligned}
(14)

Note that $${{\mathbb {E}}}[\widehat{M}_{a, \rho }(t+1) \mid {{\mathcal {F}}}_t]=0$$. Also, since we assumed $$\sigma _*\in {{\mathcal {S}}}_*\left( \frac{1}{4{{\mathcal {Q}}}} \right)$$, it follows that $$\frac{n_a(\sigma _*)}{n} \ge \frac{1}{2{{\mathcal {Q}}}}$$. Thus, we also know $$\Vert \widehat{M}_{a, \rho }(t+1)\Vert _{\mathrm{HS}} \le \frac{4{{\mathcal {Q}}}^2\sqrt{d_\rho }}{n}$$.

Again, our main step is a bound on the Fourier coefficients $$y_{a, \rho }(t)$$, which will also be useful later in proving Lemma 2.5.

### Lemma 3.5

Consider any $$\sigma _*, \sigma '_*\in {{\mathcal {S}}}_*\left( \frac{1}{4{{\mathcal {Q}}}} \right)$$. There exist constants $$c_G, C_G > 0$$ depending only on G such that for all large enough n, we have

\begin{aligned} {{\mathbb {P}}}_{\sigma '_*}\left( \Vert y^{\sigma _*}_{a, \rho }(t)\Vert _{\mathrm{HS}} \ge R \left( \frac{1}{\sqrt{n}} + e^{-t/n} \Vert y^{\sigma _*}_{a, \rho }(0)\Vert _{\mathrm{HS}} \right) \right) \le e^{-\Omega _G(R^2) + O_G(1)} + \frac{2}{n^3} \end{aligned}

for all t and $$R > 0$$.

The above lemma directly implies Lemma 2.4.

### Proof of Lemma 2.4

We apply Lemma 3.5 to each $$a \in G$$ and $$\rho \in G^*$$. Recall that $$T = \left\lceil \frac{1}{2} n \log n \right\rceil$$, so that

\begin{aligned} \frac{1}{\sqrt{n}} + e^{-T/n} \Vert y^{\sigma _*}_{a, \rho }(0)\Vert _{\mathrm{HS}} \le \frac{2\sqrt{d_\rho }}{\sqrt{n}}. \end{aligned}

Then, Lemma 3.5 implies

\begin{aligned} {{\mathbb {P}}}_{\sigma '_*}\left( \Vert y^{\sigma _*}_{a, \rho }(T)\Vert _{\mathrm{HS}} \ge \frac{R}{\sqrt{n}} \right) \le e^{-\Omega _G(R^2) + O_G(1)} + \frac{2}{n^3}. \end{aligned}

Union bounding over all $$a \in G$$ and $$\rho \in G^*$$ and using the Plancherel formula, this yields

\begin{aligned} {{\mathbb {P}}}_{\sigma '_*}\left( \sigma _T \not \in {{\mathcal {S}}}_*\left( \sigma _*, \frac{R}{\sqrt{n}} \right) \right)&\le {{\mathbb {P}}}_{\sigma '_*}\left( \max _{a, \rho } \Vert y^{\sigma _*}_{a, \rho }(T)\Vert _{\mathrm{HS}} \ge \frac{R}{\sqrt{n}} \right) \\&\le e^{-\Omega _G(R^2) + O_G(1)} + \frac{2 {{\mathcal {Q}}}^2}{n^3} \le C_G e^{-R} + \frac{1}{n} \end{aligned}

for sufficiently large $$C_G$$ and n. $$\square$$

We now prove Lemma 3.5. Before proceeding with the main proof, we need the following routine estimate as a preliminary lemma.

### Lemma 3.6

Let $$\theta _n : {{\mathbb {R}}}^d \rightarrow {{\mathbb {R}}}^+$$ be the function given by $$\theta _n(x) = \Vert x\Vert + \frac{1}{\sqrt{n}}e^{-\sqrt{n}\Vert x\Vert } - \frac{1}{\sqrt{n}}$$. Then, we have the inequalities

\begin{aligned} \Vert \nabla \theta _n(x)\Vert \le 1, \qquad \theta _n(x + h) \le \theta _n(x) + \langle h, \nabla \theta _n(x) \rangle + \frac{\sqrt{n}}{2} \Vert h\Vert ^2. \end{aligned}

### Proof

We can write $$\theta _n(x) = f(\Vert x\Vert )$$, where $$f(r) = r + \frac{1}{\sqrt{n}} e^{-\sqrt{n} r} - \frac{1}{\sqrt{n}}$$. By spherical symmetry, we have

\begin{aligned} \Vert \nabla \theta _n(x)\Vert = f'(\Vert x\Vert ) = 1 - e^{-\sqrt{n}\Vert x\Vert } \le 1, \end{aligned}

which is the first inequality. Again by spherical symmetry, the eigenvalues of the Hessian $$\nabla ^2 \theta _n(x)$$ can be directly computed to be $$f''(\Vert x\Vert )$$ and $$f'(\Vert x\Vert ) / \Vert x\Vert$$. But these are bounded by

\begin{aligned} f''(r) = \sqrt{n} e^{-\sqrt{n}r} \le \sqrt{n}, \qquad f'(r)/r = \frac{1 - e^{-\sqrt{n}r}}{r} \le \sqrt{n}. \end{aligned}

Thus, $$\nabla ^2 \theta _n(x) \preceq \sqrt{n} I$$, and the second inequality follows from Taylor expansion. $$\square$$
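The two inequalities of Lemma 3.6 are straightforward to test numerically. Below is a small check (ours) in $$d = 3$$, using the explicit gradient $$\nabla \theta _n(x) = \left( 1 - e^{-\sqrt{n}\Vert x\Vert }\right) x/\Vert x\Vert$$.

```python
import math, random

def theta(x, n):
    r = math.sqrt(sum(c * c for c in x))
    return r + math.exp(-math.sqrt(n) * r) / math.sqrt(n) - 1 / math.sqrt(n)

def grad_theta(x, n):
    r = math.sqrt(sum(c * c for c in x))
    fp = 1 - math.exp(-math.sqrt(n) * r)   # f'(||x||)
    return [fp * c / r for c in x]

random.seed(0)
n = 100
for _ in range(1000):
    x = [random.uniform(-2, 2) for _ in range(3)]
    h = [random.uniform(-0.5, 0.5) for _ in range(3)]
    g = grad_theta(x, n)
    # First inequality: ||grad theta_n(x)|| <= 1.
    assert math.sqrt(sum(c * c for c in g)) <= 1 + 1e-12
    # Second inequality: theta_n(x+h) <= theta_n(x) + <h, grad> + (sqrt(n)/2)||h||^2.
    lhs = theta([a + b for a, b in zip(x, h)], n)
    rhs = (theta(x, n) + sum(a * b for a, b in zip(h, g))
           + math.sqrt(n) / 2 * sum(c * c for c in h))
    assert lhs <= rhs + 1e-9
```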

### Proof of Lemma 3.5

Let $$\gamma _G$$ and $$c_G$$ be the constants from Lemmas 3.3 and 3.2, respectively. Define the events

\begin{aligned} \begin{aligned} {{\mathcal {G}}}_t&:= \bigcap _{s = 0}^t \left\{ X_\rho (\sigma _s) \preceq -\frac{\gamma _G}{n} I_{d_\rho } \right\} , \\ {{\mathcal {G}}}'_t&:= \bigcap _{s = 0}^t \left\{ X_\rho (\sigma _s) \preceq -\frac{1 - \sqrt{d_\rho } e^{-c_G s/n} - 2n^{-1/8}}{n} I_{d_\rho } \right\} . \end{aligned} \end{aligned}

Note that $$\sigma '_*\in {{\mathcal {S}}}_*\left( \frac{1}{4{{\mathcal {Q}}}} \right) \subseteq {{\mathcal {S}}}_{non}\left( 1/3 \right)$$. Hence, by Lemmas 2.1 and 3.3, we have $${{\mathbb {P}}}({{\mathcal {G}}}^{{\textsf {c}}}_{n^2}) \le C_G n^2 e^{-n/10}$$. We also have

\begin{aligned} X_\rho (s)&= \frac{1}{n - 1} \left( \frac{x_\rho (s)+x_\rho (s)^*}{2} - \frac{n - 1}{n}I_{d_\rho } \right) \preceq -\frac{1}{n}\left( 1 - \frac{n \Vert x_\rho (s)\Vert _{\mathrm{HS}}}{n - 1} \right) I_{d_\rho } \\&\preceq -\frac{1}{n}\left( 1 - \Vert x_\rho (s)\Vert _{\mathrm{HS}} - \frac{\sqrt{d_\rho }}{n-1} \right) I_{d_\rho }, \end{aligned}

where we have used the fact that $$\left\| \frac{x_\rho (s)+x_\rho (s)^*}{2}\right\| _{op} \le \Vert x_\rho (s)\Vert _{op} \le \Vert x_\rho (s)\Vert _{\mathrm{HS}}$$.

Lemma 3.2 then implies that $${{\mathbb {P}}}({{\mathcal {G}}}'^{{\textsf {c}}}_{n^2}) \le \frac{1}{n^3}$$. Thus, setting

\begin{aligned} \begin{aligned} \varphi (t)&:= \max (\gamma _G, 1 - \sqrt{d_\rho } e^{-c_G t/n} - 2 n^{-1/8}),\\ {{\mathcal {H}}}_t&:= {{\mathcal {G}}}_t \cap {{\mathcal {G}}}'_t = \bigcap _{s = 0}^t \left\{ X_\rho (\sigma _s) \preceq -\frac{\varphi (s)}{n} I_{d_\rho } \right\} , \end{aligned} \end{aligned}

we conclude that

\begin{aligned} {{\mathbb {P}}}({{\mathcal {H}}}^{{\textsf {c}}}_{n^2}) \le {{\mathbb {P}}}({{\mathcal {G}}}^{{\textsf {c}}}_{n^2}) + {{\mathbb {P}}}({{\mathcal {G}}}'^{{\textsf {c}}}_{n^2}) \le \frac{2}{n^3} \end{aligned}

for all large enough n.

Next, we turn to (14) and apply $$\theta _n$$ to both sides, where we identify $${{\mathbb {C}}}^{d_\rho ^2}$$ with $${{\mathbb {R}}}^{2d_\rho ^2}$$. Using Lemma 3.6 and taking the conditional expectation, we obtain

\begin{aligned} {{\mathbb {E}}}\left[ \theta _n\left( y_{a, \rho }(t+1) \right) \, \Big |\, {{\mathcal {F}}}_t \right]&\le \theta _n\left( y_{a, \rho }(t) (I_{d_\rho } + X_\rho (t))\right) + \frac{8 {{\mathcal {Q}}}^4 d_\rho }{n \sqrt{n}} \\&\le \theta _n(\Vert I_{d_\rho } + X_\rho (t)\Vert _{op} \cdot y_{a, \rho }(t)) + \frac{8 {{\mathcal {Q}}}^4 d_\rho }{n \sqrt{n}} \\&\le \Vert I_{d_\rho } + X_\rho (t)\Vert _{op} \cdot \theta _n(y_{a, \rho }(t)) + \frac{8 {{\mathcal {Q}}}^4 d_\rho }{n \sqrt{n}}, \end{aligned}

where the second inequality follows from the variational formula for operator norm (i.e. that $$\Vert BA\Vert _{\mathrm{HS}} \le \Vert A\Vert _{op} \Vert B\Vert _{\mathrm{HS}}$$), and the third inequality follows from the fact that $$\theta _n$$ is convex with $$\theta _n(0) = 0$$. Thus, we may write

\begin{aligned} \theta _n(y_{a, \rho }(t+1)) \le \Vert I_{d_\rho } + X_\rho (t)\Vert _{op} \cdot \theta _n(y_{a, \rho }(t)) + M'(t+1) \end{aligned}

where

\begin{aligned} {{\mathbb {E}}}[ M'(t+1) \mid {{\mathcal {F}}}_t ] \le \frac{8 {{\mathcal {Q}}}^4 d_\rho }{n\sqrt{n}}, \qquad |M'(t+1)| \le \frac{8{{\mathcal {Q}}}^2 \sqrt{d_\rho }}{n}. \end{aligned}

Now, let $$z_t := \mathbf{1}_{{{\mathcal {H}}}_t} \theta _n(y_{a, \rho }(t))$$, and note that since $$X_\rho (\sigma ) \succeq -\frac{2}{n-1} I_{d_\rho }$$, we have $$\Vert I_{d_\rho } + X_\rho (t)\Vert _{op} \le 1-\frac{\varphi (t)}{n}$$ whenever $${{\mathcal {H}}}_t$$ holds. Thus,

\begin{aligned} z_{t+1} \le \Vert I_{d_\rho } + X_\rho (t)\Vert _{op} \cdot z_t + \mathbf{1}_{{{\mathcal {H}}}_t}M'(t+1) \le \left( 1 - \frac{1}{n} \varphi (t) \right) z_t + \mathbf{1}_{{{\mathcal {H}}}_t}M'(t+1). \end{aligned}

We may then apply Lemma 3.1 with $$\varepsilon = \frac{1}{n}$$ and $$D = 8{{\mathcal {Q}}}^4 d_\rho$$. Note that

\begin{aligned} \int _0^t \varphi (s) \,ds&\ge \left( 1 - 2 n^{-\frac{1}{8}}\right) t - \sqrt{d_\rho } \int _0^\infty e^{-\frac{c_G s}{n}} \,ds \ge t - O_G(n) \end{aligned}

for all large enough n. Thus, Lemma 3.1 implies that

\begin{aligned} {{\mathbb {P}}}\left( z_t \ge \frac{\lambda }{\sqrt{n}} + C_G e^{-t/n} \cdot z_0 \right) \le C'_G e^{-c'_G \lambda ^2}. \end{aligned}
(15)

Consequently,

\begin{aligned}&{{\mathbb {P}}}\left( \Vert y_{a, \rho }(t)\Vert _{\mathrm{HS}} \ge R\left( \frac{1}{\sqrt{n}} + e^{-\frac{t}{n}}\Vert y_{a, \rho }(0)\Vert _{\mathrm{HS}} \right) \right) \\&\quad \le {{\mathbb {P}}}\left( \theta _n(y_{a, \rho }(t)) \ge \frac{R - 1}{\sqrt{n}} + Re^{-\frac{t}{n}}\Vert y_{a, \rho }(0)\Vert _{\mathrm{HS}} \right) \\&\quad \le {{\mathbb {P}}}\left( \theta _n(y_{a, \rho }(t)) \ge \frac{R - 1}{\sqrt{n}} + Re^{-\frac{t}{n}}\theta _n(y_{a, \rho }(0)) \right) \\&\quad \le {{\mathbb {P}}}\left( z_t \ge \frac{R - 1}{\sqrt{n}} + Re^{-\frac{t}{n}}z_0 \right) + {{\mathbb {P}}}({{\mathcal {H}}}^{{\textsf {c}}}_{n^2}) \\&\quad \le e^{-\Omega _G(R^2) + O_G(1)} + \frac{2}{n^3}, \end{aligned}

as desired. $$\square$$

## 4 Construction of the coupling: Proof of Lemma 2.5

For each $$\delta >0$$, we define a subset of $$\{0, 1, \ldots , n\}^{G \times G}$$ by

\begin{aligned} {{\mathcal {M}}}_\delta :=\left\{ (n_{a, b})_{a, b \in G} \ : \ n_{a, b} \ge \frac{(1-\delta ) n}{{{\mathcal {Q}}}^2} \ \text {for every} \, a, b \in G \, \text {and}\ \sum _{a, b \in G}n_{a, b}=n\right\} . \end{aligned}

### Lemma 4.1

Consider a configuration $$\sigma _*\in {{\mathcal {S}}}$$ and a constant $$0<\delta \le \frac{1}{2{{\mathcal {Q}}}^2}$$, and assume that $$(1 - \delta )n/{{\mathcal {Q}}}^2$$ is an integer. Let $$(\sigma _t)_{t \ge 0}$$ and $$({{\tilde{\sigma }}}_t)_{t \ge 0}$$ be two product replacement chains started at arbitrary states $$\sigma , {{\tilde{\sigma }}} \in {{\mathcal {S}}}$$, respectively. Then, there exists a coupling $$(\sigma _t, {{\tilde{\sigma }}}_t)$$ of the Markov chains satisfying the following:

Let

\begin{aligned} D_t:=\frac{1}{2}\sum _{a, b \in G}|n^{\sigma _*}_{a, b}(\sigma _t) - n^{\sigma _*}_{a, b}({{\tilde{\sigma }}}_t)|. \end{aligned}

Then, on the event $$\{(n^{\sigma _*}_{a, b}(\sigma _t))_{a, b \in G}, (n^{\sigma _*}_{a, b}({{\tilde{\sigma }}}_t))_{a, b \in G} \in {{\mathcal {M}}}_\delta \}$$ and $$\{D_t > 0\}$$, one has

\begin{aligned} {{\mathbb {E}}}_{\sigma , {{\tilde{\sigma }}}}[D_{t+1}-D_t \mid \sigma _t, {{\tilde{\sigma }}}_t]&\le 0, \end{aligned}
(16)
\begin{aligned} {{\mathbb {P}}}_{\sigma , {{\tilde{\sigma }}}}\left( D_{t+1} - D_t \ne 0 \mid \sigma _t, {{\tilde{\sigma }}}_t \right)&\ge \frac{(1-\delta )^2}{4{{\mathcal {Q}}}^3}. \end{aligned}
(17)

### Proof

Let us abbreviate $$n_{a, b}(t) = n^{\sigma _*}_{a, b}(\sigma _t)$$ and $${\tilde{n}}_{a, b}(t) = n^{\sigma _*}_{a, b}({{\tilde{\sigma }}}_t)$$. Let $$m_{a, b}(t):=\min (n_{a, b}(t), {\tilde{n}}_{a, b}(t))$$. For each $$a \in G$$, we define the quantity

\begin{aligned} d_a(t) := \frac{1}{2}\sum _{b \in G} |n_{a, b}(t) - \tilde{n}_{a, b}(t)| = \sum _{b \in G} n_{a, b}(t) - \sum _{b \in G} m_{a, b}(t), \end{aligned}

so that $$D_t = \sum _{a \in G} d_a(t)$$.

For accounting purposes, it is helpful to introduce two sequences

\begin{aligned} (x_1, x_2, \ldots , x_n) \text { and } ({\tilde{x}}_1, {\tilde{x}}_2, \ldots , {\tilde{x}}_n) \end{aligned}

of elements of $$G \times G$$. These sequences are chosen so that the number of $$x_k$$ equal to $$(a, b)$$ is exactly $$n_{a, b}(t)$$, and similarly the number of $${\tilde{x}}_k$$ equal to $$(a, b)$$ is $${\tilde{n}}_{a, b}(t)$$. Moreover, we arrange their indices in a coordinated fashion, as described below.

We define three families of disjoint sets: $$P_{a, b}$$, $$Q_a$$, and $$R_a \subset [n]$$.

• For each $$a, b \in G$$, let $$P_{a, b}$$ be a set of size $$(1 - \delta )n/{{\mathcal {Q}}}^2$$ such that for any $$k \in P_{a, b}$$, we have $$x_k ={\tilde{x}}_k = (a, b)$$. (This is possible provided that $$(n_{a, b}(t)), ({{\tilde{n}}}_{a, b}(t)) \in {{\mathcal {M}}}_\delta$$ holds.)

• For each $$a \in G$$, let $$Q_a$$ be a set of size $$\sum _{b \in G}(m_{a, b} - |P_{a, b}|)$$ such that for any $$k \in Q_a$$, $$x_k ={\tilde{x}}_k= (a, b)$$ for some b. (Note that $$Q_a$$ may be empty.)

• For each $$a \in G$$, let $$R_a$$ be a set of size $$d_a$$ such that for any $$k \in R_a$$, $$x_k$$ and $${\tilde{x}}_k$$ both have a as their first coordinate. (This $$R_a$$ is well-defined since $$\sum _b n_{a, b} = \sum _b {{\tilde{n}}}_{a, b}$$ for each a; it may also be empty.)

Define

\begin{aligned} P := \bigsqcup _{a, b \in G} P_{a, b}, \qquad Q := \bigsqcup _{a \in G} Q_a, \qquad R := \bigsqcup _{a \in G} R_a. \end{aligned}

Suppose that $$D_t > 0$$, so that for some $$a_*, b_*, b_*' \in G$$ we have $$n_{a_*, b_*} > {\tilde{n}}_{a_*, b_*}$$ and $$n_{a_*, b'_*} < {\tilde{n}}_{a_*, b'_*}$$. Let us consider all possible ways to sample a pair of indices and a sign $$(k, l, s) \in \{1, 2, \ldots , n\}^2 \times \{\pm 1\}$$ with $$k \ne l$$.

Suppose $$x_k = (a_k, b_k)$$ and $$x_l = (a_l, b_l)$$. We think of $$(k, l, +1)$$ as corresponding to a move on $$(n_{a, b}(t))$$ where $$n_{a_k, b_k}$$ is decremented and $$n_{a_k, (b_k \cdot b_l)}$$ is incremented. Similarly, $$(k, l, -1)$$ corresponds to a move where $$n_{a_k, b_k}$$ is decremented and $$n_{a_k, (b_k \cdot b_l^{-1})}$$ is incremented. We may also think of $$(k, l, \pm 1)$$ as corresponding to moves on $$({{\tilde{n}}}_{a, b}(t))$$ in an analogous way.
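To fix ideas, here is a toy encoding (ours) of this bookkeeping for $$G = {{\mathbb {Z}}}/q$$: a move $$(k, l, +1)$$ with $$x_k = (a_k, b_k)$$ and $$x_l = (a_l, b_l)$$ decrements $$n_{a_k, b_k}$$ and increments $$n_{a_k, b_k + b_l}$$. It checks that any move preserves the total count and that applying the same move to both copies changes $$D_t$$ by at most 1.

```python
def apply_move(table, xk, xl, s, q):
    """Apply move (k, l, s) with x_k = (ak, bk), x_l = (al, bl), additively."""
    (ak, bk), (al, bl) = xk, xl
    table[ak][bk] -= 1
    table[ak][(bk + s * bl) % q] += 1

def distance(t1, t2):
    """D_t = (1/2) * sum |n_{a,b} - n~_{a,b}|."""
    return sum(abs(t1[a][b] - t2[a][b]) for a in range(len(t1))
               for b in range(len(t1))) // 2

q = 3
t1 = [[2, 1, 1], [1, 2, 0], [0, 1, 1]]   # counts n_{a,b}, total n = 9
t2 = [[2, 1, 1], [1, 1, 1], [0, 1, 1]]   # differs from t1 in row a = 1
n = sum(sum(row) for row in t1)
d0 = distance(t1, t2)
assert d0 == 1
# Same move applied to both copies: total preserved, D_t changes by at most 1.
apply_move(t1, (1, 1), (0, 1), +1, q)
apply_move(t2, (1, 1), (0, 1), +1, q)
assert sum(sum(row) for row in t1) == n
assert abs(distance(t1, t2) - d0) <= 1
```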

We now analyze four cases, as illustrated in Fig. 1.

(i) Case $$(k, l) \in (P \sqcup Q) \times (P \sqcup Q)$$. For all but an exceptional situation described below, we apply the move corresponding to $$(k, l, s)$$ to both states $$(n_{a, b}(t))$$ and $$({{\tilde{n}}}_{a, b}(t))$$. In these cases, $$D_{t+1}=D_t$$.

We now describe the exceptional situation. Define

\begin{aligned} S = P_{a_*, b_*} \times \left( \bigsqcup _{c \in G} P_{c, (b_*^{-1} \cdot b'_*)}\right) \qquad \text {and}\qquad S' = P_{a_*, b'_*} \times \left( \bigsqcup _{c \in G} P_{c, \mathrm{id}}\right) . \end{aligned}

Then, the exceptional situation occurs when $$s = +1$$ and $$(k, l) \in S \sqcup S'$$.

Take any bijection $$\tau$$ from $$S$$ to $$S'$$. If $$(k, l) \in S$$, then we apply $$(k, l, +1)$$ to $$(n_{a, b}(t))$$ while applying $$(\tau (k, l), +1)$$ to $$({{\tilde{n}}}_{a, b}(t))$$. This increments $$n_{a_*, b'_*}$$, decrements $$n_{a_*, b_*}$$, and has no effect on $$({\tilde{n}}_{a, b}(t))$$. The overall effect is that $$D_{t+1} = D_t - 1$$.

If instead $$(k, l) \in S'$$, then we apply $$(k, l, +1)$$ to $$(n_{a, b}(t))$$ and $$(\tau ^{-1}(k, l), +1)$$ to $$({{\tilde{n}}}_{a, b}(t))$$. A similar analysis shows that in this case $$D_{t+1} = D_t + 1$$.

The exceptional event occurs with probability $$\frac{(1 - \delta )^2}{2{{\mathcal {Q}}}^3}$$, and when it occurs, $$D_t$$ increases or decreases by 1 with equal probability. Thus, the exceptional situation introduces an unbiased fluctuation in $$D_t$$, which gives us (17).

(ii) Case $$(k, l) \in (Q \sqcup R) \times (Q \sqcup R)$$ but $$(k, l) \not \in Q \times Q$$. This occurs with probability

\begin{aligned} \frac{1}{n(n-1)}\left( (|Q|+|R|)(|Q|+|R|-1)-|Q|(|Q|-1)\right) \end{aligned}

which is at most

\begin{aligned} \frac{2}{n(n - 1)}(|Q| + |R|) |R| = \frac{2 \delta }{n - 1}D_t. \end{aligned}

Apply the move corresponding to $$(k, l, s)$$ to both states. This increases $$D_t$$ by at most 1. We will see later that the effect of this case is small compared to the other cases.
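The counting bound above is elementary. Expanding the products, and using the identifications implicit in the surrounding displays (namely $$|Q| + |R| = \delta n$$ and $$|R| = D_t$$):

```latex
\begin{aligned}
(|Q|+|R|)(|Q|+|R|-1) - |Q|(|Q|-1)
  &= |R|\,\bigl(2|Q| + |R| - 1\bigr)
   \;\le\; 2\,(|Q|+|R|)\,|R|, \\
\frac{2}{n(n-1)}\,(|Q|+|R|)\,|R|
  &= \frac{2\,(\delta n)\,D_t}{n(n-1)}
   = \frac{2\delta }{n-1}\,D_t .
\end{aligned}
```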

(iii) Case $$(k, l) \in P \times R$$. This occurs with probability

\begin{aligned} \frac{1}{n(n - 1)}|P| |R| =\frac{1 - \delta }{n - 1}D_t. \end{aligned}

Apply the move corresponding to $$(k, l, s)$$ to both states. Again, this increases $$D_t$$ by at most 1, but now there is also a chance that $$D_t$$ does not increase.

Suppose that $$x_l = (a_1, b_1)$$ and $${\tilde{x}}_l = (a_1, {\tilde{b}}_1)$$, and suppose that $$k \in P_{a_2, b_2}$$. Then the move has the effect of decreasing $$n_{a_2, b_2}$$ and $${\tilde{n}}_{a_2, b_2}$$ while increasing $$n_{a_2, (b_2 \cdot b_1^s)}$$ and $${\tilde{n}}_{a_2, (b_2\cdot {\tilde{b}}_1^s)}$$. Note that conditioned on this case happening, $$(a_2, b_2)$$ is distributed uniformly over $$G \times G$$. When $$(a_2, (b_2\cdot {\tilde{b}}_1^s)) = (a_*, b_*)$$ or $$(a_2, (b_2\cdot b_1^s)) = (a_*, b'_*)$$, the move does not increase $$D_t$$. Therefore there is at least a $$2/{{\mathcal {Q}}}^2$$ chance that $$D_t$$ is actually not increased. Hence, the probability that $$D_t$$ is increased by 1 is at most

\begin{aligned} \left( 1 - \frac{2}{{{\mathcal {Q}}}^2}\right) \frac{1 - \delta }{n-1}D_t. \end{aligned}

(iv) Case $$(k, l) \in R \times P$$. This occurs with probability

\begin{aligned} \frac{1}{n(n - 1)} |R| |P| = \frac{1 - \delta }{n - 1} D_t. \end{aligned}

Suppose that $$x_k = (a, b)$$ and $${\tilde{x}}_k = (a, {\tilde{b}})$$. Let $$\tau$$ be a permutation of P such that for $$l \in P_{a, c}$$, one has $$\tau (l) \in P_{a, {\tilde{b}}^{-1}\cdot b \cdot c^s}$$. Then apply $$(k, l, s)$$ to $$(n_{a, b}(t))$$ and apply $$(k, \tau (l), s)$$ to $$({{\tilde{n}}}_{a, b}(t))$$. This always decreases $$D_t$$ by 1.

Let us now summarize what we know when $$(n_{a, b}(t)), ({{\tilde{n}}}_{a, b}(t)) \in {{\mathcal {M}}}_\delta$$ and $$D_t>0$$. From Cases (i), (ii), and (iii), we have

\begin{aligned} {{\mathbb {P}}}_{\sigma , {{\tilde{\sigma }}}}(D_{t+1} = D_t + 1 \mid \sigma _t, {{\tilde{\sigma }}}_t) \le \left( 1-\frac{2(1-\delta )}{{{\mathcal {Q}}}^2}+\delta \right) \frac{D_t}{n-1} + \frac{(1 - \delta )^2}{4{{\mathcal {Q}}}^3}. \end{aligned}

From Cases (i) and (iv), we have

\begin{aligned} {{\mathbb {P}}}_{\sigma , {{\tilde{\sigma }}}}(D_{t+1} = D_t - 1 \mid \sigma _t, {{\tilde{\sigma }}}_t) \ge (1-\delta )\frac{D_t}{n-1} + \frac{(1 - \delta )^2}{4{{\mathcal {Q}}}^3}. \end{aligned}

Therefore, if $$0<\delta \le \frac{1}{2{{\mathcal {Q}}}^2}$$, then

\begin{aligned} {{\mathbb {E}}}_{\sigma , {{\tilde{\sigma }}}}[D_{t+1}-D_t \mid \sigma _t, {{\tilde{\sigma }}}_t] \le 0, \end{aligned}

verifying (16).
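Indeed, since every move changes $$D_t$$ by at most 1, the conditional drift is at most the difference of the two displayed probabilities; the $$\frac{(1 - \delta )^2}{4{{\mathcal {Q}}}^3}$$ terms cancel, leaving

```latex
\begin{aligned}
{{\mathbb {E}}}_{\sigma , {{\tilde{\sigma }}}}[D_{t+1}-D_t \mid \sigma _t, {{\tilde{\sigma }}}_t]
  &\le \left( 1 - \frac{2(1-\delta )}{{{\mathcal {Q}}}^2} + \delta \right) \frac{D_t}{n-1}
     - (1-\delta )\,\frac{D_t}{n-1} \\
  &= \left( 2\delta - \frac{2(1-\delta )}{{{\mathcal {Q}}}^2} \right) \frac{D_t}{n-1}
  \;\le\; 0,
\end{aligned}
```

where the last inequality holds because $$\delta \le \frac{1}{2{{\mathcal {Q}}}^2}$$ gives $$\delta {{\mathcal {Q}}}^2 \le \frac{1}{2} \le 1 - \delta$$.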

To fully define the coupling, when $$D_t = 0$$, we can couple $$\sigma _t$$ and $${{\tilde{\sigma }}}_t$$ to be identical, and if either $$(n_{a, b}(t)) \notin {{\mathcal {M}}}_\delta$$ or $$({{\tilde{n}}}_{a, b}(t)) \notin {{\mathcal {M}}}_\delta$$, we may run the two chains independently. $$\square$$

### Proof of Lemma 2.5

Since $$\sigma \in {{\mathcal {S}}}_*\left( \sigma _*, \frac{R}{\sqrt{n}} \right)$$, we must have for each $$a \in G$$ and $$\rho \in G^*$$ that $$\Vert y^{\sigma _*}_{a, \rho }(\sigma )\Vert _{\mathrm{HS}} \le \frac{R}{\sqrt{n}}$$. Note that for large enough n, we have $${{\mathcal {S}}}_*\left( \sigma _*, \frac{R}{\sqrt{n}} \right) \subseteq {{\mathcal {S}}}_*\left( \frac{1}{5{{\mathcal {Q}}}^3} \right)$$. Thus, we may apply Lemma 3.5 to obtain

\begin{aligned} {{\mathbb {P}}}\left( \bigcup _{t = 0}^{n^2} \left\{ \Vert y^{\sigma _*}_{a, \rho }(\sigma _t)\Vert _{\mathrm{HS}} \ge \frac{1}{5{{\mathcal {Q}}}^3} \right\} \right) \le n^2 \left( e^{-\Omega _{G}(n) + O_{G}(1)} + \frac{2}{n^3} \right) \le \frac{3}{n} \end{aligned}
(18)

for large enough n. Define the event

\begin{aligned} {{\mathcal {G}}}_t := \left\{ \sigma _s \in {{\mathcal {S}}}_*\left( \sigma _*, \frac{1}{5{{\mathcal {Q}}}^3} \right) \text { for all} \, 1 \le s \le t \right\} . \end{aligned}

A union bound applied to (18), over the at most $${{\mathcal {Q}}}^2$$ pairs $$(a, \rho )$$, implies that $${{\mathbb {P}}}({{\mathcal {G}}}^{{\textsf {c}}}_{n^2}) \le \frac{3{{\mathcal {Q}}}^2}{n}$$. We may analogously define an event $${{\tilde{{{\mathcal {G}}}}}}_t$$ for $${{\tilde{\sigma }}}$$ and let $${{\mathcal {A}}}_t := {{\mathcal {G}}}_t \cap {{\tilde{{{\mathcal {G}}}}}}_t$$. Thus, $${{\mathbb {P}}}({{\mathcal {A}}}_{n^2}^{{\textsf {c}}}) \le \frac{6{{\mathcal {Q}}}^2}{n}$$.

Pick $$\delta ' \in \left( \frac{2}{5{{\mathcal {Q}}}^2}, \frac{3}{7{{\mathcal {Q}}}^2}\right)$$ so that $$(1 - \delta ')n/{{\mathcal {Q}}}^2$$ is an integer. Note that when $${{\mathcal {A}}}_t$$ holds, we have

\begin{aligned} \sigma _t \in {{\mathcal {S}}}_*\left( \sigma _*, \frac{1}{5{{\mathcal {Q}}}^3} \right) \quad \text {and} \quad \sigma _*\in {{\mathcal {S}}}_*\left( \frac{1}{5{{\mathcal {Q}}}^3} \right) \implies (n_{a, b}(t)) \in {{\mathcal {M}}}_{\frac{2}{5{{\mathcal {Q}}}^2}} \subseteq {{\mathcal {M}}}_{\delta '}, \end{aligned}

and similarly $$({{\tilde{n}}}_{a, b}(t)) \in {{\mathcal {M}}}_{\delta '}$$.

Thus, we may invoke Lemma 4.1 (note that $$\delta ' \le \frac{3}{7{{\mathcal {Q}}}^2} < \frac{1}{2{{\mathcal {Q}}}^2}$$) to give a coupling between $$\sigma$$ and $${{\tilde{\sigma }}}$$ where on the event $${{\mathcal {A}}}_t$$, the quantity $$D_t$$ is more likely to decrease than increase. Letting $$\mathbf{D}_t:=\mathbf{1}_{{{\mathcal {A}}}_t}D_t$$, we see that $$(\mathbf{D}_t)$$ is a supermartingale with respect to $$({{\mathcal {F}}}_t)$$.

Define

\begin{aligned} \tau := \min \{ t \ge 0 : D_t=0 \}, \qquad {{{\tilde{\tau }}}} := \min \{ t \ge 0 : \mathbf{D}_t=0 \}. \end{aligned}

Then, Lemma 4.1 ensures that on the event $$\{{{\tilde{\tau }}}> t\}$$, we have $$\mathrm{Var}(\mathbf{D}_{t+1}\mid {{\mathcal {F}}}_t) \ge \alpha ^2$$, where $$\alpha ^2:=\left( 1- \frac{1}{{{\mathcal {Q}}}^2}\right) \frac{(1-\delta ')^2}{4{{\mathcal {Q}}}^3}$$. By [13, Proposition 17.20], for every $$u > 12/\alpha ^2$$,

\begin{aligned} {{\mathbb {P}}}({{\tilde{\tau }}} > u) \le \frac{4 \mathbf{D}_0}{\alpha \sqrt{u}}. \end{aligned}
(19)

Recall that $$T = \left\lceil \beta n \right\rceil$$ and $$D_0 \le \sqrt{{{\mathcal {Q}}}} R\sqrt{n}$$. Since $$\{\tau> T\} \cap {{\mathcal {A}}}_T \subseteq \{{{\tilde{\tau }}} > T\}$$, as long as $$\beta$$ is large enough, we may apply (19) with $$u = T$$ to get

\begin{aligned} {{\mathbb {P}}}_{\sigma , {{\tilde{\sigma }}}}(\tau > T) \le \frac{16 {{\mathcal {Q}}}^2 R}{(1-\delta ')\sqrt{\beta }} + {{\mathbb {P}}}({{\mathcal {A}}}_T^{{\textsf {c}}}) \le \frac{32 {{\mathcal {Q}}}^2 R}{\sqrt{\beta }} \end{aligned}

for all large enough n, as desired. $$\square$$
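For the record, here is a sketch of how the constant arises (assuming $${{\mathcal {Q}}} \ge 2$$, and using $$\mathbf{D}_0 \le \sqrt{{{\mathcal {Q}}}}R\sqrt{n}$$, $$T \ge \beta n$$, and the value of $$\alpha$$ above):

```latex
\begin{aligned}
\frac{4 \mathbf{D}_0}{\alpha \sqrt{T}}
  \;\le\; \frac{4 \sqrt{{{\mathcal {Q}}}}\,R\sqrt{n} \cdot 2 {{\mathcal {Q}}}^{3/2}}
               {(1-\delta ')\sqrt{1 - {{\mathcal {Q}}}^{-2}}\,\sqrt{\beta n}}
  \;=\; \frac{8 {{\mathcal {Q}}}^{2} R}{(1-\delta ')\sqrt{1 - {{\mathcal {Q}}}^{-2}}\,\sqrt{\beta }}
  \;\le\; \frac{16 {{\mathcal {Q}}}^{2} R}{(1-\delta ')\sqrt{\beta }},
\end{aligned}
```

using $$\sqrt{1 - {{\mathcal {Q}}}^{-2}} \ge \frac{1}{2}$$; the remaining slack up to $$\frac{32 {{\mathcal {Q}}}^2 R}{\sqrt{\beta }}$$ absorbs $${{\mathbb {P}}}({{\mathcal {A}}}_T^{{\textsf {c}}}) \le \frac{6{{\mathcal {Q}}}^2}{n}$$ for large enough n.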

## 5 Proof of Theorem 1.1 (2)

The lower bound is proved essentially by showing that the estimates of Lemmas 2.1 and 2.4 cannot be improved. Let $$a_1, a_2, \ldots , a_k$$ be a set of generators for G. Let $$\sigma _\star \in {{\mathcal {S}}}$$ be the configuration given by

\begin{aligned} \sigma _\star (i) = {\left\{ \begin{array}{ll} a_i &{}\quad \text {if} \, i \le k, \\ \mathrm{id} &{} \quad \text {otherwise}. \end{array}\right. } \end{aligned}

We will analyze the Markov chain started at $$\sigma _\star$$ and show that it does not mix too fast.

Recall from Sect. 2 the notation

\begin{aligned} n^{\{\mathrm{id}\}}_{non}(\sigma ) = |\{ i \in [n] : \sigma (i) \ne \mathrm{id}\}| \end{aligned}

for the number of sites in $$\sigma$$ that do not contain the identity. We first show that if we run the chain for slightly less than $$n \log n$$ steps, most of the sites will still contain the identity.

### Lemma 5.1

Let $$T := \left\lfloor n \log n - Rn \right\rfloor$$. Then,

\begin{aligned} {{\mathbb {P}}}_{\sigma _\star }\left( n^{\{\mathrm{id}\}}_{non}(\sigma _T) \ge \frac{n}{3} \right) \le \frac{4{{\mathcal {Q}}}^2}{R^2}. \end{aligned}

### Proof

Recall that in one step of our Markov chain, we pick two distinct indices $$i, j \in [n]$$ and replace $$\sigma (i)$$ with $$\sigma (i) \cdot \sigma (j)$$ or $$\sigma (i) \cdot \sigma (j)^{-1}$$. The only way for $$n^{\{\mathrm{id}\}}_{non}(\sigma _t)$$ to increase after this step is if $$\sigma _t(i) = \mathrm{id}$$ and $$\sigma _t(j) \ne \mathrm{id}$$. Thus,

\begin{aligned} {{\mathbb {P}}}( n^{\{\mathrm{id}\}}_{non}(\sigma _{t+1}) = n^{\{\mathrm{id}\}}_{non}(\sigma _t) + 1 \mid n^{\{\mathrm{id}\}}_{non}(\sigma _t)) \le \frac{n^{\{\mathrm{id}\}}_{non}(\sigma _t)}{n}. \end{aligned}
(20)

Let $$\tau := \min \{ t \ge 0 : n^{\{\mathrm{id}\}}_{non}(\sigma _t) \ge \frac{n}{3} \}$$ be the first time that $$n^{\{\mathrm{id}\}}_{non}(\sigma _t)$$ is at least $$\frac{n}{3}$$. We have that $$n^{\{\mathrm{id}\}}_{non}(\sigma _\star ) = k$$, so it follows from (20) that $$\tau$$ stochastically dominates the sum

\begin{aligned} G := \sum _{s = k}^{\left\lfloor n/3 \right\rfloor } G_s, \end{aligned}

where the $$G_s$$ are independent geometric variables with success probability $$\frac{s}{n}$$. Note that we have the bounds

\begin{aligned} {{\mathbb {E}}}G = \sum _{s = k}^{\left\lfloor n/3 \right\rfloor } \frac{n}{s} \ge n \left( \log \left\lfloor \frac{n}{3} \right\rfloor - \log k \right) , \qquad \mathrm{Var}(G) = \sum _{s = k}^{\left\lfloor n/3 \right\rfloor } \frac{n(n-s)}{s^2} \le n^2. \end{aligned}
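These follow from the standard moments of a geometric random variable (the number of trials until the first success) with success probability $$p = s/n$$, together with an integral comparison for the harmonic sum:

```latex
\begin{aligned}
{{\mathbb {E}}}G_s = \frac{1}{p} = \frac{n}{s}, \qquad
\mathrm{Var}(G_s) = \frac{1-p}{p^2} = \frac{n(n-s)}{s^2}, \qquad
\sum _{s=k}^{\left\lfloor n/3 \right\rfloor } \frac{1}{s}
  \;\ge\; \int _{k}^{\left\lfloor n/3 \right\rfloor } \frac{dx}{x}
  \;=\; \log \left\lfloor \frac{n}{3} \right\rfloor - \log k .
\end{aligned}
```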

Hence,

\begin{aligned} {{\mathbb {P}}}(\tau< T)&\;\le \; {{\mathbb {P}}}(G< T) \;\le \; {{\mathbb {P}}}(G < {{\mathbb {E}}}G + n\log (3k) - Rn ) \\&\;\le \; \frac{n^2}{n^2(R - \log (3k))^2} \le \frac{4}{R^2} \end{aligned}

for $$R \ge 2 {{\mathcal {Q}}}\ge 2 \log (3k)$$. On the other hand, the bound claimed in the lemma is trivial for $$R \le 2 {{\mathcal {Q}}}$$, so we have completed the proof. $$\square$$
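For completeness, the last two inequalities in the display above are Chebyshev's inequality for G together with an elementary estimate, valid in the regime $$R \ge 2\log (3k)$$ considered here:

```latex
\begin{aligned}
{{\mathbb {P}}}\bigl( G < {{\mathbb {E}}}G - (R - \log (3k))\,n \bigr)
  \;\le\; \frac{\mathrm{Var}(G)}{(R - \log (3k))^2\, n^2}
  \;\le\; \frac{1}{(R - \log (3k))^2}
  \;\le\; \frac{4}{R^2},
\end{aligned}
```

since $$R \ge 2 \log (3k)$$ gives $$R - \log (3k) \ge R/2$$.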

Next, we show that it really takes about $$\frac{1}{2}n \log n$$ steps for the Fourier coefficients $$x_\rho$$ to decay to $$O\left( \frac{1}{\sqrt{n}} \right)$$, as suggested by Lemma 2.4. Note that it suffices here to analyze the $$x_\rho$$ instead of the $$y_{a, \rho }$$, which simplifies our analysis. Actually, it suffices to consider (the real part of) the trace of $$x_\rho$$. Here the orthogonality of characters reads $$\frac{1}{{{\mathcal {Q}}}}\sum _{a \in G} \mathrm{Tr}\,\rho (a)=0$$, and it takes about $$\frac{1}{2}n \log n$$ steps for $$\mathrm{Re}\mathrm{Tr}\,x_\rho (t)$$ to decay to $$O\left( \frac{1}{\sqrt{n}} \right)$$.

### Lemma 5.2

Consider any $$\rho \in G^*$$ and any $$R > 5$$. Let $$T := \left\lfloor \frac{1}{2}n \log n - Rn \right\rfloor$$, and suppose that $$\sigma \in {{\mathcal {S}}}$$ satisfies $$n^{\{\mathrm{id}\}}_{non}(\sigma ) \le \frac{n}{3}$$. Then,

\begin{aligned} {{\mathbb {P}}}_{\sigma }\left( \Vert x_\rho (\sigma _T)\Vert _{\mathrm{HS}} \le \frac{R}{\sqrt{n}} \right) \le \frac{4{{\mathcal {Q}}}^2}{R^2}. \end{aligned}

### Proof

Let $$z(t) := (1/d_\rho ) \mathrm{Tr}\,(x_\rho (t)+x_\rho (t)^*)/2$$. Then, noting that (9) also holds for $$x_\rho (t)^*$$ since $$x_{\rho ^*}(t)=x_\rho (t)^*$$, we have

\begin{aligned} z(t+1)-z(t) = \frac{1}{n-1}\frac{1}{d_\rho } \mathrm{Tr}\,\left( \frac{x_\rho (t)+x_\rho (t)^*}{2}\right) ^2-\frac{1}{n}z(t) + M(t+1), \end{aligned}

where

\begin{aligned} {{\mathbb {E}}}[M(t+1) \mid {{\mathcal {F}}}_t]=0 \qquad \text {and} \qquad |M(t)| \le \frac{2{{\mathcal {Q}}}}{n}. \end{aligned}

Here, applying the Cauchy–Schwarz inequality to the $$d_\rho$$ real eigenvalues of the Hermitian matrix $$(x_\rho (t)+x_\rho (t)^*)/2$$, we have

\begin{aligned} \frac{1}{d_\rho } \mathrm{Tr}\,\left( \frac{x_\rho (t)+x_\rho (t)^*}{2}\right) ^2 \ge z(t)^2. \end{aligned}

We compare z(t) to another process $$(w(t))_{t \ge 0}$$ defined by $$w(0) := \frac{1}{3}$$ and

\begin{aligned} w(t+1) := \left( 1-\frac{1}{n}\right) w(t) + M(t+1). \end{aligned}
(21)

We will show by induction that $$z(t) \ge w(t)$$ for all t. For the base case, note that $$\frac{1}{d_\rho }\mathrm{Re}\,\mathrm{Tr}\,\rho (a) \ge -1$$ for every $$a \in G$$, while the identity contributes $$\frac{1}{d_\rho }\mathrm{Tr}\,\rho (\mathrm{id}) = 1$$; since $$n^{\{\mathrm{id}\}}_{non}(\sigma ) \le \frac{n}{3}$$, we have

\begin{aligned} z(0) = \frac{1}{d_\rho }\mathrm{Re}\mathrm{Tr}\,\sum _{a \in G} \frac{n_a(0)}{n} \cdot \rho (a) \ge \frac{2}{3} - \frac{1}{3} = \frac{1}{3}. \end{aligned}

Suppose now that $$z(t) \ge w(t)$$. Then,

\begin{aligned} z(t+1)&\ge z(t) + \frac{1}{n-1}z(t)^2-\frac{1}{n}z(t) + M(t+1) \\&\ge \left( 1 - \frac{1}{n}\right) w(t) + M(t+1) = w(t+1), \end{aligned}

completing the induction.

It now suffices to lower bound w(T). To this end, we first note that applying (21) repeatedly and taking expectations, we obtain

\begin{aligned} {{\mathbb {E}}}w(T) = \left( 1 - \frac{1}{n}\right) ^T \cdot \frac{1}{3} \ge \frac{e^R}{6\sqrt{n}} \ge \frac{2R}{\sqrt{n}}. \end{aligned}
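The two inequalities can be checked directly for large n: with $$T \le \frac{1}{2} n \log n - Rn$$,

```latex
\begin{aligned}
\left( 1 - \frac{1}{n} \right)^{T}
  \;\ge\; e^{-T/(n-1)}
  \;\ge\; \frac{1}{2}\, e^{-T/n}
  \;\ge\; \frac{1}{2}\, e^{R - \frac{1}{2} \log n}
  \;=\; \frac{e^{R}}{2 \sqrt{n}},
\end{aligned}
```

where the second inequality holds for large n since $$T/(n(n-1)) \rightarrow 0$$; and $$\frac{e^R}{6\sqrt{n}} \ge \frac{2R}{\sqrt{n}}$$ because $$e^R \ge 12R$$ for $$R > 5$$.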

In order to calculate the variance of w(T), we can also square (21) and take the expectation, which gives us

\begin{aligned} \mathrm{Var}(w(T))&= {{\mathbb {E}}}w(T)^2 - ({{\mathbb {E}}}w(T))^2 \\&\le \left( 1 - \frac{1}{n}\right) ^{2T} \cdot \frac{1}{9} + n \cdot \left( \frac{2{{\mathcal {Q}}}}{n} \right) ^2 - \left( \left( 1 - \frac{1}{n}\right) ^T \cdot \frac{1}{3} \right) ^2 \\&= \frac{4{{\mathcal {Q}}}^2}{n}. \end{aligned}
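In more detail, squaring (21) and taking expectations (the cross term vanishes since $${{\mathbb {E}}}[M(t+1) \mid {{\mathcal {F}}}_t] = 0$$), and using $$|M(t+1)| \le 2{{\mathcal {Q}}}/n$$, gives the recursion

```latex
\begin{aligned}
{{\mathbb {E}}}\,w(t+1)^2
  &\le \left( 1 - \frac{1}{n} \right)^{2} {{\mathbb {E}}}\,w(t)^2 + \frac{4{{\mathcal {Q}}}^2}{n^2}, \\
{{\mathbb {E}}}\,w(T)^2
  &\le \left( 1 - \frac{1}{n} \right)^{2T} \cdot \frac{1}{9}
     + \frac{4{{\mathcal {Q}}}^2}{n^2} \sum _{s=0}^{T-1} \left( 1 - \frac{1}{n} \right)^{2s}
  \;\le\; \left( 1 - \frac{1}{n} \right)^{2T} \cdot \frac{1}{9} + n \cdot \frac{4{{\mathcal {Q}}}^2}{n^2},
\end{aligned}
```

since $$\sum _{s \ge 0} (1 - 1/n)^{2s} = n^2/(2n-1) \le n$$. The first term equals $$({{\mathbb {E}}}w(T))^2$$ exactly, and so cancels in the variance.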

Then, by Chebyshev’s inequality (noting that $${{\mathbb {E}}}w(T) \ge \frac{2R}{\sqrt{n}}$$, so that $$\{ w(T) \le \frac{R}{\sqrt{n}} \} \subseteq \{ |w(T) - {{\mathbb {E}}}w(T)| \ge \frac{R}{\sqrt{n}} \}$$), we have

\begin{aligned} {{\mathbb {P}}}_{\sigma }\left( \Vert x_\rho (\sigma _T)\Vert _{\mathrm{HS}} \le \frac{R}{\sqrt{n}} \right)&\le {{\mathbb {P}}}\left( z(T) \le \frac{R}{\sqrt{n}} \right) \le {{\mathbb {P}}}\left( w(T) \le \frac{R}{\sqrt{n}} \right) \\&\le \frac{4{{\mathcal {Q}}}^2/n}{(R/\sqrt{n})^2} = \frac{4{{\mathcal {Q}}}^2}{R^2}, \end{aligned}

as desired. $$\square$$

### Proof of Theorem 1.1 (2)

Let $$T = T_1 + T_2$$, where $$T_1 := \left\lfloor n \log n - \beta n \right\rfloor$$ and $$T_2 := \left\lfloor \frac{1}{2} n \log n - \beta n \right\rfloor$$. Fix any $$\rho \in G^*$$. By Lemma 5.1 followed by Lemma 5.2 (applied at time $$T_1$$ via the Markov property), we have for large enough $$\beta$$ that

\begin{aligned} {{\mathbb {P}}}_{\sigma _\star } \left( \sigma _T \in {{\mathcal {S}}}_*\left( \frac{\beta }{\sqrt{n}} \right) \right) \le {{\mathbb {P}}}_{\sigma _\star }\left( \Vert x_\rho (T)\Vert _{\mathrm{HS}} \le \sqrt{\frac{{{\mathcal {Q}}}}{d_\rho }}\frac{\beta }{\sqrt{n}} \right) \le \frac{8{{\mathcal {Q}}}^2}{\beta ^2}. \end{aligned}

On the other hand, Lemma 2.6 tells us that

\begin{aligned} \pi \left( {{\mathcal {S}}}_*\left( \frac{\beta }{\sqrt{n}}\right) \right) \ge 1- \frac{c_G}{\beta ^2}. \end{aligned}

Consequently,

\begin{aligned} d_{\sigma _\star }(T) \ge 1 - \frac{c_G}{\beta ^2} - \frac{8{{\mathcal {Q}}}^2}{\beta ^2}, \end{aligned}

which tends to 1 as $$\beta \rightarrow \infty$$, establishing (2). $$\square$$