1 Introduction: TV-approximation lemmas and limit profiles

The cutoff phenomenon describes a situation where a Markov chain stays away from equilibrium for some time, but then converges to equilibrium very abruptly. In rare cases, one can find an explicit function which describes this sharp transition, called the limit profile; see, eg, [2, 13, 17].

In this paper, we develop a technique which allows us to approximate well the distance from equilibrium, and hence to study limit profiles. We consider the case of general reversible Markov chains, using a spectral decomposition, and that of random walks on homogeneous spaces, ie \(X = G/K\) with G a group and K a subgroup of G, using Fourier analysis. The method is an extension of one introduced by Teyssier [20] for random walks on Cayley graphs where the generating set is a union of conjugacy classes. We then apply these techniques to establish the limit profiles for the k-cycle shuffle, the multiple-urn Ehrenfest model and the Gibbs sampler with Binomial prior densities, sharpening results of Hough [14], Berestycki, Schramm and Zeitouni [3], Ceccherini-Silberstein, Scarabotti and Tolli [5], and Diaconis, Khare and Saloff-Coste [9].

1.1 Mixing times and limit profiles

Let \(\Omega \) be a finite set and P a transition matrix on \(\Omega \). Then \(P^t(x,y)\) is the probability of moving from x to y in t steps, for all \(x,y \in \Omega \) and all \(t \in \mathbb {N}_0\). If P is irreducible and aperiodic, then the basic limit theorem of Markov chains tells us that \(P^t(x,\cdot )\) converges to the (unique) invariant distribution \(\pi \) as \(t\rightarrow \infty \) with respect to the total variation (abbreviated TV) distance, defined by

$$\begin{aligned} d_\text {TV}(t, x) :=\max \nolimits _{A \subseteq \Omega } | P^t(x, A) - \pi (A) | = \tfrac{1}{2} \sum \nolimits _{y \in \Omega } | P^t(x,y) - \pi (y) |. \end{aligned}$$
The most common situation is to study the worst-case TV distance \( d_\text {TV}(t) :=\max \nolimits _{x \in \Omega } d_\text {TV}(t, x) \). There are other possibilities, such as the typical TV distance, where the starting point \(x\) is chosen according to \(\pi \): \( d_\text {typ}(\cdot ) :=\sum \nolimits _{x\in \Omega } \pi (x) d_\text {TV}(\cdot , x).\) The (worst-case) mixing time is then defined by

$$\begin{aligned} t_\text {mix}(\varepsilon ) :=\min \bigl \{ t \in \mathbb {N}_0 \mid d_\text {TV}(t) \le \varepsilon \bigr \} \quad \text {for} \quad \varepsilon \in (0,1). \end{aligned}$$
For a sequence of Markov chains indexed by N, if there exist \((t_\star ^{(N)})_{N\in \mathbb {N}}\) and \((w_\star ^{(N)})_{N\in \mathbb {N}}\) satisfying

$$\begin{aligned} t_\text {mix}^{(N)}(\varepsilon ) = t_\star ^{(N)} + \mathcal {O}_\varepsilon \bigl ( w_\star ^{(N)} \bigr ) \quad \text {for each fixed } \varepsilon \in (0,1), \qquad \text {with} \quad w_\star ^{(N)} \ll t_\star ^{(N)}, \end{aligned}$$
then the sequence of chains exhibits cutoff at \(t_\star \) with window \(\mathcal {O}( w_* )\).

One can look beyond just finding the cutoff time and window, and instead determine the profile inside the window: the aim is to choose \(t_\star \) and \(w_\star \) appropriately so that

$$\begin{aligned} d_\text {TV}^{(N)} \bigl ( t_\star ^{(N)} + \alpha w_\star ^{(N)} \bigr ) \rightarrow \Psi (\alpha ) \quad \text {for some function} \quad \Psi : \mathbb {R}\rightarrow [0,1]. \end{aligned}$$
The limit \(N\rightarrow \infty \) is taken for each fixed \(\alpha \in \mathbb {R}\).

Officially, when we look at \(d_\text {TV}(t)\), we need \(t \in \mathbb {N}\); in practice, we omit floor/ceiling signs.

1.2 TV convergence profile for random walks

In this paper we present three lemmas for obtaining the TV profile for random walks; see Lemmas A–C. They work by finding a decomposition of the TV distance as a sum, using either a spectral decomposition or Fourier analysis. One then separates out the ‘important’ terms in the sum to give a ‘main term’ (which asymptotically captures all the TV mass) and an ‘error term’. Lemmas A and C are original contributions; Lemma B is due to Teyssier [20]. For each lemma, we give an example application, establishing a limit profile of the TV convergence to equilibrium.

We denote the cdf of the standard normal distribution by \(\Phi \) throughout the paper.

1.2.1 Reversible Markov chains

First we consider general reversible Markov chains on an arbitrary finite set \(\Omega \). The following lemma is based on the well-known spectral decomposition for a reversible Markov chain P:

$$\begin{aligned} P^t(x,y) = \pi (y) \sum \nolimits _{i=1}^{|\Omega |} f_i(x) f_i(y) \lambda _i^t, \end{aligned}$$
where \(P^t(x,y)\) is the probability of moving from \(x\) to \(y\) in \(t\) steps, \(\pi \) is the invariant distribution and \( \{ f_i, \lambda _i \} _{i=1}^{|\Omega |}\) are the eigenstatistics; see [16, Lemma 12.2]. Recall that, for \(x\in \Omega \) and \(t\in \mathbb {N}_0\), we write \(d_\text {TV}(t,x)\) for the TV distance from \(\pi \) after t steps when started from \(x\).

We come to our first contribution: the TV-approximation lemma for reversible Markov chains.

Lemma A

(Reversible Markov Chains) Consider a reversible, irreducible and aperiodic Markov chain on a finite set \(\Omega \) with invariant distribution \(\pi \). Denote by \( -1< \lambda _{|\Omega |} \le \ldots \le \lambda _2 < \lambda _1 = 1 \) its eigenvalues and by \(f_{|\Omega |},\ldots , f_1\) its corresponding orthonormal (with respect to \(\pi \)) eigenvectors. For \(t\in \mathbb {N}_0\) and \(x \in \Omega \), denote by \(d_\text {TV}(t,x)\) the TV distance from equilibrium (ie \(\pi \)) of the Markov chain started from x.

For all \(t\in \mathbb {N}_0\), all \(x \in \Omega \) and all \(I\subseteq \{ 2, \ldots , |\Omega | \} \), we have

$$\begin{aligned} \Bigl | d_\text {TV} ( t,x ) - \tfrac{1}{2} \sum \nolimits _{y \in \Omega } \pi (y) \bigl | \sum \nolimits _{i\in I} f_i(x) f_i(y) \lambda _i^t \bigr | \Bigr | \le \tfrac{1}{2} \sum \nolimits _{i\notin I} | f_i(x) | \, | \lambda _i |^t. \end{aligned}$$

As an application of Lemma A, we determine the limit profile for a specific two-component Gibbs sampler, which is an important tool in statistical physics as explained in [9, §1].

Let \((\mathcal {X}, \mathscr {F}, \mu )\) and \((\Theta , \mathscr {G}, \pi )\) be two probability spaces. The probability measure \(\pi \) is called the prior. Let \( \{ f_\theta (\cdot ) \} _{\theta \in \Theta }\) be a family of probability densities on \(\mathcal {X}\) with respect to \(\mu \). These define a probability measure \(\Pr \) on \(\mathcal {X}\times \Theta \) by

$$\begin{aligned} \Pr ( dx, d\theta ) :=f_\theta (x) \, \mu (dx) \, \pi (d\theta ). \end{aligned}$$
The marginal density on \(\mathcal {X}\) is given by \( m(x) :=\int _{\theta } f_\theta (x) d\pi (\theta )\) for \(x \in \mathcal {X}\). The posterior density with respect to the prior \(\pi \) is defined by \( \pi (\theta \vert x) :=f_\theta (x) / m(x)\) for \((x, \theta ) \in \mathcal {X}\times \Theta \).

The (\(\mathcal {X}\)-chain) Gibbs sampler is defined informally as follows (each draw is independent):

$$\begin{aligned} \bullet \ \hbox {input } x; \quad \bullet \ \hbox {draw } \theta \sim \pi (\cdot \vert x); \quad \bullet \ \hbox {draw } x' \sim f_\theta (\cdot ); \quad \bullet \ \hbox {output } x'. \end{aligned}$$

Formally, it is the Markov chain defined by the transition kernel P given by

$$\begin{aligned} P(x, dx') :=\Bigl ( \int _\Theta f_\theta (x') \, \pi (\theta \vert x) \, d\pi (\theta ) \Bigr ) \mu (dx'). \end{aligned}$$
Observe that P is reversible with respect to m, ie the marginal density on \(\mathcal {X}\).

We consider the special case of location families: \( f_\theta (x) = g(x - \theta )\) for all \( (x, \theta ) \in \mathcal {X}\times \Theta \), for some function g; see [9, §5]. The Gibbs sampler can then be realised in the following way:

\(\bullet \)  input x; \(\bullet \)  draw \(\theta \sim \pi (\cdot \vert x)\); \(\bullet \)  draw \(\varepsilon \sim g\); \(\bullet \)  output \(x' :=\theta + \varepsilon \).

We consider the case where the prior \(\pi \) and g are each Binomial, which leads to a hypergeometric posterior.

Our next contribution is the limit profile for the two-component Gibbs sampler with Binomial priors, established as an application of Lemma A. A more refined statement is given in Theorem 2.1.

Theorem A

(Gibbs Sampler) Let \(n_1, n_2 \in \mathbb {N}\) and \(p \in (0,1)\); write \(n:=n_1 + n_2\) and \(\alpha :=p/(1-p)\). Let \(\pi \sim {{\,\mathrm{Bin}\,}}(n_1, p)\) and \(g \sim {{\,\mathrm{Bin}\,}}(n_2, p)\). For \(t\in \mathbb {N}_0\), write \(d_\text {TV}^{n_1,n_2,p}(t)\) for the TV distance of the (location family) Gibbs sampler after \(t\) steps, started from \(0 \in \mathcal {X}\), from its invariant distribution m.

Suppose that \(\min \{ p, 1-p \} \cdot n\gg 1\). Then, for all \(c\in \mathbb {R}\) (independent of \(n\)), we have

$$\begin{aligned} d_\text {TV}^{n_1,n_2,p}\bigl ( \bigl ( \tfrac{1}{2} \log (\alpha n) + c \bigr ) / \log \bigl ( \tfrac{1}{1 - n_2/n} \bigr ) \bigr ) \rightarrow 2 \, \Phi \bigl ( \tfrac{1}{2} e^{-c} \bigr ) - 1. \end{aligned}$$

The above set-up implicitly sets the sample spaces \(\mathcal {X}:= \{ 0, \ldots , n \} \) and \(\Theta := \{ 0, \ldots , n_1 \} \) and the event spaces to be the respective sets of all subsets. The sample spaces are finite, so this is natural.

Cutoff for the \(L_2\) mixing time of this Gibbs sampler was established by Diaconis, Khare and Saloff-Coste [9, §5.1]; these tools could likely be adapted to give cutoff for the usual TV (\(L_1\)) mixing time. However, the techniques of Diaconis, Khare and Saloff-Coste [9, §5.1] are not sufficiently refined to give access to the limit profile; a more detailed analysis is required.

1.2.2 Random walks on groups

We start by recalling some standard terminology from representation theory.

Definition

Let G be a finite group and V a finite dimensional vector space over \(\mathbb {C}\). A representation \(\rho \) of G over V is an action \( (g,v) \mapsto \rho (g) \cdot v : G \times V \rightarrow V\) such that \(\rho (g) : V \rightarrow V\) is an invertible linear map for all \(g \in G\). The Fourier transform of a function \(\mu : G \rightarrow \mathbb {C}\) with respect to the representation \((\rho , V)\) is the linear operator \( \widehat{\mu }(\rho ) : V \rightarrow V\) defined by \( \widehat{\mu }(\rho ) :=\sum \nolimits _{g \in G} \mu (g) \rho (g).\)

Using the Fourier inversion formula, for all probability measures \(\mu \) on G and all \(t \in \mathbb {N}_0\), we have

$$\begin{aligned} d_\text {TV}(\mu ^{*t}, \, {{\,\mathrm{Unif}\,}}_G) = \tfrac{1}{2} |G |^{-1} \sum \nolimits _{g \in G} \Bigl | \sum \nolimits _{\rho \in R^*} d_\rho {{\,\mathrm{tr}\,}}\bigl ( \widehat{\mu }(\rho )^t \rho (g^{-1}) \bigr ) \Bigr |, \end{aligned}$$

where \(\mu ^{*t}\) is the t-fold self-convolution of \(\mu \), \(R^*\) is the set of all non-constant irreducible representations (abbreviated irreps) of G and \(d_\rho \) is the dimension of the irrep \(\rho \); see [6, §3.10].

If \(\mu \) is the step distribution of a random walk on G, then this determines exactly the TV distance after t steps; cf the well-known spectral representation for reversible random walks. One must still control the Fourier transform at arbitrary irreps. There are two important special cases.

  • Suppose that \(\mu \) is conjugacy-invariant, ie \(\mu (g) = \mu (h^{-1} g h)\) for all \(g,h \in G\). By Schur’s lemma, \(\widehat{\mu }(\rho )\) is a multiple of the identity for each irrep \(\rho \). Then the key object in calculating the Fourier transform is the character: \( \chi _g(\rho ) :={{\,\mathrm{tr}\,}}(\rho (g))\) for \(g \in G\) and \(\rho \in R^*\). This is the case considered originally in [10], and then in [20], for random transpositions.

  • Suppose that the matrices \(\widehat{\mu }(\rho )\) have only one non-trivial entry which is in the first position (in an appropriate ‘spherical’ basis). This radical but frequent simplification occurs in the framework of Gelfand pairs, see §4 for details. Diaconis and Shahshahani [11] consider this in the set-up of the Bernoulli–Laplace urn model, and more generally.

Conjugacy-invariant random walks

In this subsection we state Teyssier’s lemma for conjugacy-invariant random walks.

Definition B

A random walk on G is conjugacy-invariant if there is a probability measure \(\mu \) which is constant on each conjugacy class of G for which the transition matrix P satisfies \(P(x,xg) = \mu (g)\) for all \(x \in G\). For a representation \(\rho \), define the character ratio \( s_\rho :=d_\rho ^{-1} \sum \nolimits _{g \in G} \mu (g) \chi _g(\rho ).\)

Teyssier’s lemma for conjugacy-invariant random walks states the following.

Lemma B

(Teyssier [20, Lemma 2.1]) Let G be a finite group; let \(\mu : G \rightarrow [0,1]\) be a conjugacy-invariant probability distribution on G. For \(t\in \mathbb {N}_0\), denote by \(d_\text {TV}(t)\) the TV distance to equilibrium of the random walk on G started from the identity with step distribution \(\mu \) and run for \(t\) steps.

Let \(t\in \mathbb {N}_0\) and \(I \subseteq R^*\), where \(R^*\) is the set of non-trivial irreps of G, as above. Then \( d_\text {TV}(t) = d_\text {TV}(\mu ^{*t}, \, {{\,\mathrm{Unif}\,}}_G)\)  and

$$\begin{aligned} \Bigl | d_\text {TV}(t) - \tfrac{1}{2} |G |^{-1} \sum \nolimits _{g \in G} \bigl | \sum \nolimits _{\rho \in I} d_\rho s_\rho ^t\chi _\rho (g) \bigr | \Bigr | \le \tfrac{1}{2} \sum \nolimits _{\rho \in R^* \setminus I} d_\rho |s_\rho |^t. \end{aligned}$$

We apply this lemma to the k-cycle random walk on the symmetric group \(\mathcal {S}_n\). In this walk, at each step a k-cycle is chosen uniformly at random and composed with the current location. We establish the limit profile for \(2 \le k \ll n\). There are parity constraints, which we handle following the set-up used by Hough [14]:

  • if k is odd, then the walk is supported on the set of even permutations;

  • if k is even and t is even, then the walk at time t is supported on the set of even permutations;

  • if k is even and t is odd, then the walk at time t is supported on the set of odd permutations.

We come to our next contribution: the limit profile for the random k-cycle shuffle, established as an application of Lemma B. A more refined statement is given in Theorem 3.1.

Theorem B

(Random k-Cycles) Let \(k, n \in \mathbb {N}\) with \(k \ge 2\). For \(t \in \mathbb {N}_0\), denote by \(d_\text {TV}^{n,k}(t)\) the TV distance of the k-cycle random walk on \(\mathcal {S}_n\) from the uniform distribution on the appropriate set of permutations of a fixed parity started from the identity and run for t steps.

Suppose that \(2 \le k \ll n\). Then, for all \(c\in \mathbb {R}\) (independent of n), we have

$$\begin{aligned} d_\text {TV}^{n,k}\bigl ( \tfrac{1}{k} n ( \log n + c ) \bigr ) \rightarrow d_\text {TV}\bigl ( {{\,\mathrm{Pois}\,}}(1 + e^{-c}), \, {{\,\mathrm{Pois}\,}}(1) \bigr ). \end{aligned}$$

Cutoff for this shuffle was already established, but not the limit profile. The case of random transpositions, ie \(k = 2\), was one of the first Markov chains studied using representation theory; cutoff was established by Diaconis and Shahshahani [10]. For general \(2 \le k \ll n\), cutoff was established by Hough [14] using representation theory. Berestycki, Schramm and Zeitouni [3] previously established the same result for k independent of n, using probabilistic arguments instead of representation theory. Berestycki and Şengül [4] studied a generalisation where one draws uniformly from a prescribed conjugacy class with support of size k with \(2 \le k \ll n\).

The limit profile, even for \(k = 2\), remained a famous open problem for a long time. A breakthrough came recently from Teyssier [20], using Lemma B above; we apply this lemma here. Also, we adapt and extend some character theory for the k-cycle walk developed by Hough [14]. Finally, we adapt and extend some of the analysis of Teyssier [20] from \(k = 2\) to general k.

Random walks on homogeneous spaces

Finally we turn our attention to random walks on homogeneous spaces \(X :=G/K\), where G is a finite group and K a subgroup of G. Where [10, 20] considered conjugacy-invariant \(\mu \) to simplify the calculation of the Fourier transforms, here we consider the case that \(\mu \) is K bi-invariant, ie \( \mu (k_1 g k_2) = \mu (g)\) for all \(g \in G\) and all \(k_1, k_2 \in K\), and that (G, K) is a Gelfand pair, ie the algebra of K bi-invariant functions (under convolution) is commutative; see Definition 4.1. In this case, for any K bi-invariant function \(\mu \) on G, if \((\rho , V)\) is a spherical irrep, defined in Definition 4.2, then the matrix \(\widehat{\mu }(\rho )\) has only one non-zero entry, which is in the top-left position; this entry is called the spherical Fourier transform of \(\mu \) with respect to \(\rho \) (rescaled by \(|K |\)). Moreover, if \((\tau , W)\) is a non-spherical irrep, then \(\widehat{\mu }(\tau ) = 0\) is the zero matrix.

Using this simplification, we prove the following lemma for random walks on homogeneous spaces corresponding to a Gelfand pair, started from some element \(\bar{x} \in X\) stabilised by K, ie \(k \bar{x} = \bar{x}\) for all \(k \in K\) (under the usual left coset action). The canonical quotient projection \(G \rightarrow G/K\) preserves the uniform distribution, so the invariant distribution of any such G-invariant random walk on a homogeneous space is uniform on that space.

Our next contribution is a TV-approximation lemma for random walks on homogeneous spaces.

Lemma C

(Homogeneous Spaces) Let (G, K) be a Gelfand pair and denote \(X :=G/K\). Let \(\bar{x}\) be an element of X whose stabiliser is K. Let \( \{ \varphi _i \} _{i=0}^N\) be the associated spherical functions, with \(\varphi _0(x) = 1\) for all \(x \in X\), considered as K-invariant functions on X, and \( \{ d_i \} _{i=0}^N\) the associated dimensions. Let P be a G-invariant stochastic matrix and set \(\mu _{\bar{x}}(\cdot ) :=P ( \bar{x}, \cdot ) \). For \(t\in \mathbb {N}_0\), denote by \(d_\text {TV}(t,\bar{x})\) the TV distance to equilibrium of the random walk on X started from \(\bar{x}\) with step distribution \(\mu _{\bar{x}}\) and run for \(t\) steps.

Let \(t\in \mathbb {N}_0\) and \(I \subseteq \{ 1, \ldots , N \} \). Then \( d_\text {TV}(t, \bar{x}) = d_\text {TV} ( \mu _{\bar{x}}^{*t}, \, {{\,\mathrm{Unif}\,}}_X ) \) and

$$\begin{aligned} \Bigl | d_\text {TV}(t, \bar{x}) - \tfrac{1}{2} |X |^{-1} \sum \nolimits _{x \in X} \bigl | \sum \nolimits _{i \in I} d_i \varphi _i(x) \widehat{\mu }_{\bar{x}}(i)^t \bigr | \Bigr | \le \tfrac{1}{2} \sum \nolimits _{i \notin I} \sqrt{d_i} | \widehat{\mu }_{\bar{x}}(i) |^t, \end{aligned}$$

where \(\widehat{\mu }_{\bar{x}} : i \mapsto \sum \nolimits _{x \in X} \mu _{\bar{x}}(x) \overline{\varphi _i(x)}\) is the spherical Fourier transform of \(\mu _{\bar{x}}\) with respect to \( \{ \varphi _i \} _{i=0}^N\).

We come to our final contribution: the limit profile for the multiple-urn Ehrenfest diffusion model, established as an application of Lemma C. A more refined statement is given in Theorem 4.9.

Theorem C

(Ehrenfest Urn) Let \(n,m \in \mathbb {N}\). Consider n labelled balls and \(m+1\) labelled urns. Consider the following Markov chain: at each step, choose a ball and an urn uniformly and independently; place said ball in said urn. For \(t\in \mathbb {N}_0\), denote by \(d_\text {TV}^{n,m}(t)\) the TV distance of this urn model started with all balls in a single urn from its invariant distribution and run for \(t\) steps.

Suppose that \(1 \le m \ll n\). Then, for all \(c\in \mathbb {R}\) (independent of n), we have

$$\begin{aligned} d_\text {TV}^{n,m}\bigl ( \tfrac{1}{2} n \log (nm) + cn \bigr ) \rightarrow 2 \, \Phi \bigl ( \tfrac{1}{2} e^{-c} \bigr ) - 1. \end{aligned}$$

Cutoff, but not the limit profile, was established for this multiple urn model by Ceccherini-Silberstein et al. [5, §6] using representation theory. To establish the profile, we apply the approximation lemma for random walks on homogeneous spaces, ie Lemma C, using the character theory developed by Ceccherini-Silberstein, Scarabotti and Tolli [5].

This model was originally introduced (with two urns) by Ehrenfest and Ehrenfest [12] in 1907. In this case, the model can be viewed as a TV-preserving projection of the simple random walk on the n-hypercube. There cutoff was established by Aldous [1, Example 3.19]. The limit profile is even known: see Salez [19, Theorem 18 in §6.2] (in French) for a ‘probabilistic’ argument using convergence theorems or Diaconis, Graham and Morrison [8, Theorem 1] for a Fourier analytical argument. We present a significantly simpler Fourier analytical argument, using only basic representation theory of the Abelian group \(\mathbb {Z}_2^d\) in the appendix of the arXiv version of this paper, namely [18, Theorem 5.1].

1.2.3 Corollaries to the TV-approximation lemma for reversible Markov chains (Lemma A)

We close this section with two simple corollaries of the general TV-approximation lemma for reversible Markov chains, Lemma A. The first is for transitive Markov chains; the second is for typical TV distance. For transitive chains, the starting point is irrelevant; that is, for each t, the map \(x \mapsto d_\text {TV}(t,x)\) is constant (ie does not depend on the input x). In particular, \( d_\text {TV}(\cdot ) = \sum \nolimits _{x \in \Omega } \pi (x) d_\text {TV}(\cdot ,x).\) Also, by transitivity, the invariant distribution \(\pi \) is uniform on \(\Omega \).

Corollary A.1

Consider the set-up of Lemma A; in addition, assume that the chain is transitive.

For all \(t\in \mathbb {N}_0\) and \(I\subseteq \{ 2, \ldots , |\Omega | \} \), we have

$$\begin{aligned} \Bigl | d_\text {TV}(t) - \tfrac{1}{2} |\Omega |^{-2} \sum \nolimits _{x,y \in \Omega } \bigl | \sum \nolimits _{i\in I} f_i(x) f_i(y) \lambda _i^t \bigr | \Bigr | \le \tfrac{1}{2} \sum \nolimits _{i\notin I} | \lambda _i |^t. \end{aligned}$$

Instead of looking at TV from a given starting point, we can also consider averaging over the starting point (with respect to the invariant distribution). This is sometimes known as typical TV distance (as opposed to worst-case). For \(t\in \mathbb {N}_0\), denote

$$\begin{aligned} d_\text {typ}(t) :=\sum \nolimits _{x \in \Omega } \pi (x) d_\text {TV}(t,x). \end{aligned}$$

Corollary A.2

Consider the set-up of Lemma A; no transitivity is necessary.

For all \(t\in \mathbb {N}_0\) and all \(I\subseteq \{ 2, \ldots , |\Omega | \} \), we have

$$\begin{aligned} \Bigl | d_\text {typ}(t) - \tfrac{1}{2} \sum \nolimits _{x,y \in \Omega } \pi (x) \pi (y) \bigl | \sum \nolimits _{i\in I} f_i(x) f_i(y) \lambda _i^t \bigr | \Bigr | \le \tfrac{1}{2} \sum \nolimits _{i\notin I} | \lambda _i |^t. \end{aligned}$$

1.3 Organisation of the paper

The remainder of the paper is organised as follows.

  • §2 Here we study general reversible Markov chains. We prove our TV-approximation lemma (Lemma A) via an application of the spectral decomposition for reversible Markov chains.

    As an application of Lemma A, we establish the limit profile for a two-component Gibbs sampler, which is a fundamental tool in statistical physics (see [9, §1]).

  • §3 Here we establish the limit profile of the random k-cycle walk on the symmetric group. We do this via an application of the TV-approximation lemma of Teyssier [20] (Lemma B), along with extending and applying character theory developed by Hough [14].

  • §4 Here we study random walks on homogeneous spaces corresponding to Gelfand pairs. We develop and apply (mostly classical) theory to prove our TV-approximation lemma (Lemma C).

    As an application of Lemma C, we establish the limit profile for the famous Ehrenfest urn diffusion with many urns, using some character theory developed by Ceccherini-Silberstein, Scarabotti and Tolli [5].

2 Reversible Markov chains

In this section general reversible Markov chains are considered. First we prove the lemma and corollaries from the introduction, then we apply them to a Gibbs sampler.

2.1 Proof of TV-approximation Lemmas for reversible Markov chains

Lemma A follows from the usual spectral representation of TV distance along with some algebraic manipulations and inequalities. Corollaries A.2 and A.1 follow, in an identical way to each other, from averaging both sides of Lemma A with respect to \(\pi \). We give the full details now.

Proof of Lemma A

As an immediate consequence of [16, Lemma 12.2], for \(x \in \Omega \), we have

$$\begin{aligned} d_\text {TV}(t,x) = \tfrac{1}{2} \sum \nolimits _{y \in \Omega } \pi (y) \bigl | \sum \nolimits _{i=2}^{|\Omega |} f_i(x) f_i(y) \lambda _i^t \bigr |. \end{aligned}$$

Let \(I\subseteq \{ 2, \ldots , |\Omega | \} \). Elementary manipulations using the triangle inequality (twice) then Cauchy–Schwarz and the fact that the eigenfunctions are orthonormal with respect to \(\pi \), give

$$\begin{aligned} \Bigl | d_\text {TV}(t,x) - \tfrac{1}{2} \sum \nolimits _{y \in \Omega } \pi (y) \bigl | \sum \nolimits _{i\in I} f_i(x) f_i(y) \lambda _i^t \bigr | \Bigr |&\le \tfrac{1}{2} \sum \nolimits _{y \in \Omega } \pi (y) \bigl | \sum \nolimits _{i\notin I} f_i(x) f_i(y) \lambda _i^t \bigr |\\&\le \tfrac{1}{2} \sum \nolimits _{i\notin I} | f_i(x) \lambda _i^t | \sum \nolimits _{y \in \Omega } \pi (y) | f_i(y) | \le \tfrac{1}{2} \sum \nolimits _{i\notin I} | f_i(x) \lambda _i^t |. \end{aligned}$$

\(\square \)

Proof of Corollaries A.1 and A.2

For a transitive chain, for each \(t\), the map \(x \mapsto d_\text {TV}(t,x)\) is constant (ie does not depend on the input x). So we may replace \(d_\text {TV}(t,x)\) by \(\sum \nolimits _{x \in \Omega } \pi (x) d_\text {TV}(t,x) = d_\text {typ}(t)\). The corollaries now follow by averaging the error term with respect to \(\pi \), using Cauchy–Schwarz and the normalisation of the eigenfunctions. \(\square \)

2.2 Application to Gibbs sampler with binomial priors

In this subsection, we consider the Gibbs sampler with Binomial priors, namely \(\pi \sim {{\,\mathrm{Bin}\,}}(n_1, p)\) and \(g \sim {{\,\mathrm{Bin}\,}}(n_2, p)\), as described in Theorem A. Here \(\mathcal {X}:= \{ 0, 1, \ldots , n \} \) where \(n:=n_1 + n_2\).

The following theorem is a restatement of Theorem A, but written more formally: cutoff is for a sequence of Markov chains; we make this sequence explicit.

Theorem 2.1

Let \(n_1, n_2 \in \mathbb {N}\) and \(p\in (0,1)\); write \(n:=n_1 + n_2\) and \(\alpha :=p/ (1 - p)\). Consider the (location family) Gibbs sampler with \(\pi \sim {{\,\mathrm{Bin}\,}}(n_1, p)\) and \(g \sim {{\,\mathrm{Bin}\,}}(n_2, p)\). For \(t\in \mathbb {N}_0\), let \(d_\text {TV}^{n_1, \, n_2, \, p}(t)\) denote the TV distance from equilibrium after \(t\) steps in this Gibbs sampler started from 0.

Let \((n_{1,N})_{N\in \mathbb {N}}, (n_{2,N})_{N\in \mathbb {N}}\in \mathbb {N}^\mathbb {N}\) and \((p_N) \in (0,1)^\mathbb {N}\); for each \({N\in \mathbb {N}}\), write \(n_N :=n_{1,N} + n_{2,N}\) and \(\alpha _N :=p_N / (1 - p_N)\). Suppose that \(\lim _N n_N \min \{ p_N, \, 1 - p_N \} = \infty \). Then, for all \(c\in \mathbb {R}\), we have

$$\begin{aligned} d_\text {TV}^{n_{1,N}, \, n_{2,N}, \, p_N}\bigl ( \bigl ( \tfrac{1}{2} \log (\alpha _N n_N) + c \bigr ) / \log \bigl ( \tfrac{1}{1 - n_{2,N}/n_N} \bigr ) \bigr ) \rightarrow 2 \, \Phi \bigl ( \tfrac{1}{2} e^{-c} \bigr ) - 1. \end{aligned}$$
For ease of presentation, we omit the N-subscripts in the proof. The technical calculations in this section are analogous to those in [9, §5.1]; the eigenfunctions are the same (after a reparametrisation), but the eigenvalues are slightly different.

It is straightforward to check that the invariant distribution m of the \(\mathcal {X}\)-chain is Binomial:

$$\begin{aligned} m = {{\,\mathrm{Bin}\,}}(n, p), \qquad \text {ie} \quad m(x) = \binom{n}{x} p^x (1-p)^{n-x} \quad \text {for} \quad x \in \mathcal {X}. \end{aligned}$$
The eigenfunctions are then the family of polynomials orthogonal to the Binomial. These are the Krawtchouk polynomials (appropriately rescaled), defined precisely now.

Definition 2.2

Define the Krawtchouk polynomials \( \{ K_i \} _{i \in \mathbb {N}}\) via

When the second two parameters are fixed, abbreviate

The Krawtchouk polynomials are orthogonal with respect to the Binomial measure.

Lemma 2.3

([15, §1.10]) The Krawtchouk polynomials satisfy the orthogonality relations

Thus the Krawtchouk polynomials are orthogonal with respect to the Binomial measure:

The following proposition describes the eigenstatistics of this model; it is taken from Diaconis, Khare and Saloff-Coste [9, §5.1].

Proposition 2.4

(Eigenstatistics; [9, §5.1]) The eigenvalues \( \{ \lambda _i \} _{i\in \mathcal {X}}\) and eigenfunctions \( \{ f_i \} _{i\in \mathcal {X}}\) are given by the following:

Note that \(\lambda _i \ge 0\) for all \(i\in \mathcal {X}\), and \(\lambda _i = 0\) for all \(i \ge n_1 + 1\).

Applying Proposition 2.4, we obtain the following expressions for the terms in Lemma A:

Our first aim is to use this to determine which are the ‘important’ eigenstatistics.

Lemma 2.5

(Error Term) For all \(\varepsilon > 0\) and all \(c\in \mathbb {R}\), there exists an \(M:=M(\varepsilon ,c)\) so that, for \(t:=\bigl ( \tfrac{1}{2} \log (\alpha n) + c \bigr ) / \log \bigl ( \tfrac{1}{1 - n_2/n} \bigr ) \), if \(I := \{ 1, \ldots , M \} \), then

Proof

Observe that \(0 \le \lambda _i\le \lambda _1^i= (1 - n_2/n)^i\) for all \(i\). The inequality \(\text {ET}\le \text {ET}'\) now follows. The equality in the definition of \(\text {ET}'\) is an immediate consequence of Proposition 2.4. For the inequality \(\text {ET}' \le \varepsilon \), choose \(M\) so that \(\sum \nolimits _{i> M} e^{-ci} / \sqrt{i!} < \varepsilon \); then, using Proposition 2.4 again, we have

$$\begin{aligned} \sum \nolimits _{i> M} | f_i(0) | \lambda _1^{it} \le \sum \nolimits _{i> M} \bigl ( \sqrt{\alpha n} ( 1 - n_2/n ) ^t \bigr )^i/ \sqrt{i!} = \sum \nolimits _{i> M} e^{-ci} / \sqrt{i!} < \varepsilon . \end{aligned}$$

\(\square \)

From now on, choose \(M:=M(c,\varepsilon )\) as in Lemma 2.5. Hence, for the main term, we need only deal with eigenstatistics with \(i \asymp 1\). We would then like to use the replacement \(\lambda _i\approx (1 - n_2/n)^i\).

Definition 2.6

(Adjusted Main Term) Recalling that \(t= \bigl ( \tfrac{1}{2} \log (\alpha n) + c \bigr ) / \log \bigl ( \tfrac{1}{1 - n_2/n} \bigr )\), define

The following pair of lemmas approximate \(\text {MT}\) by \(\text {MT}'\) and then evaluate (asymptotically) \(\text {MT}'\).

Lemma 2.7a

(Main Term: Approximation) For all \(\varepsilon > 0\) and \(c\in \mathbb {R}\), for \(M:=M(c,\varepsilon )\), we have

$$\begin{aligned} | \text {MT}- \text {MT}' | \le 2 \varepsilon . \end{aligned}$$

It thus suffices to work with \(\text {MT}'\), which has a significantly simpler form. This is the main power of the technique: it allows us to replace the complicated \(\lambda _i^t\) by the simpler \(\lambda _1^{it}\). Typically, this power is much easier to handle, particularly when melded with Binomial coefficients.

Lemma 2.7b

(Main Term: Evaluation) For all \(c\in \mathbb {R}\), with \(M:=M(c,\varepsilon )\), we have

$$\begin{aligned} \tfrac{1}{2} \text {MT}' \rightarrow 2 \, \Phi \bigl ( \tfrac{1}{2} e^{-c} \bigr ) - 1. \end{aligned}$$

Proof of Lemma 2.7a

Since \((1 - n_2/n)^t= (\alpha n)^{-1/2} e^{-c}\) and \(\lambda _1 = 1 - n_2/n\), we have

$$\begin{aligned} | \text {MT}- \text {MT}' |&= \Bigl | \sum \nolimits _{x\in \mathcal {X}} m(x) \bigl | \sum \nolimits _{i=1}^{M} f_i(0) f_i(x) \lambda _i^t \bigr | - \sum \nolimits _{x\in \mathcal {X}} m(x) \bigl | \sum \nolimits _{i=1}^{\infty } f_i(0) f_i(x) \lambda _1^{it} \bigr | \Bigr |\\&\le \sum \nolimits _{x\in \mathcal {X}} m(x) \sum \nolimits _{i=1}^{M} | f_i(0) f_i(x) | \, | \lambda _i^t- \lambda _1^{it} | + \sum \nolimits _{x\in \mathcal {X}} m(x) \sum \nolimits _{i> M} | f_i(0) f_i(x) | \lambda _1^{it}. \end{aligned}$$

We consider these two sums separately. Recall that \(M:=M(c,\varepsilon )\) is a constant.

For the first sum, which we denote \(S_1\), we use an easily derived bound on \(| \lambda _i^t - \lambda _1^{it} |\). Using Cauchy–Schwarz and the unit-normalisation of the eigenfunctions, as well as the relations \(\lambda _1^{it} = e^{-ci} (\alpha n)^{-i/2}\) and \(f_i(0) = \alpha ^{i/2} \binom{n}{i}^{1/2} \le (\alpha n)^{i/2} / \sqrt{i!}\), we see that

For the second sum, which we denote \(S_2\), using Cauchy–Schwarz and the unit-normalisation of the eigenfunctions again and then the error term bound of Lemma 2.5, we see that

$$\begin{aligned} S_2 = \sum \nolimits _{i> M} |f_i(0) | \lambda _1^{it} \bigl ( \sum \nolimits _{x\in \mathcal {X}} m(x) |f_i(x) | \bigr ) \le \sum \nolimits _{i> M} |f_i(0) | \lambda _1^{it} = \text {ET}' \le \varepsilon . \end{aligned}$$

In conclusion, we see that \( | \text {MT}- \text {MT}' | \le \varepsilon + o( 1 ) \le 2 \varepsilon \) (asymptotically), as desired. \(\square \)

Proof of Lemma 2.7b

Evaluating this requires some algebraic manipulation and then approximation.

For convenience, we drop some of the min/max from the limits in the sum defining \(f_i\); define \(\binom{N}{r} :=0\) whenever it is not the case that \(0 \le r \le N\). Abbreviate \(z:=e^{-c}/\sqrt{\alpha n}\). For \(\ell \in \{ 0, 1 , \ldots , n \} \), we have

We now need to take absolute values and average with respect to the weights \(m(\cdot )\).

Observe that, for any \(\zeta \in \mathbb {R}\), we have

$$\begin{aligned} \tfrac{\alpha }{\alpha +1}(1 - \zeta ) + \tfrac{1}{\alpha +1}(1 + \alpha \zeta ) = 1. \end{aligned}$$

So, setting \(p_x:=\tfrac{\alpha }{\alpha +1}(1 - x/ \sqrt{\alpha n})\) for \(x\in \mathbb {R}\), the above is a \({{\,\mathrm{Bin}\,}}(n, p_{e^{-c}})\)-type probability. Indeed,

It remains to compare these Binomials. We do precisely this via the local CLT in the appendix of the arXiv version of this paper, namely [18, Lemma 5.2]:

\(\square \)

We now have all the ingredients to establish the limit profile for this Gibbs sampler.

Proof of Theorem 2.1

Let us summarise what we have proved. The following are all evaluated at the target mixing time \( t= \bigl ( \tfrac{1}{2} \log (\alpha n) + c \bigr ) / \log \bigl ( \tfrac{1}{1 - n_2/n} \bigr ) \) with \(M:=M(c,\varepsilon )\) given by Lemma 2.5.

  • By Lemma 2.5, we have \( \text {ET}\le \varepsilon .\)

  • By Lemma 2.7a, we have \( |\text {MT}- \text {MT}' | \le 2 \varepsilon \) for \(n\) sufficiently large.

  • By Lemma 2.7b, we have \( \tfrac{1}{2} \text {MT}' \rightarrow 2 \, \Phi (\tfrac{1}{2} e^{-c}) - 1 \).

Since \(\varepsilon > 0\) is arbitrary, applying the TV-approximation lemma for reversible Markov chains, namely Lemma A, we immediately deduce the theorem. \(\square \)

3 Random k-cycle walk on the symmetric group

3.1 Walk definition and statement of result

We analyse the limit profile of the random k-cycle walk on the symmetric group \(\mathcal {S}_n\). This random walk starts (without loss of generality) from the identity permutation, and a step involves composing the current location with a uniformly chosen k-cycle. This extends the random transpositions shuffle studied by Teyssier [20]. We use representation theory for k-cycles, studied recently by Hough [14], who established cutoff for any \(2 \le k \ll n\), and found the order of the window under a further assumption on k. We determine the limit profile for any \(2 \le k \ll n\).

For \(\mathcal {S}_n\), the irreducible representations are indexed by partitions of n. As is common for card shuffles, the main contribution comes from those partitions with long first row; it is these we use as our set I. We sharpen some of Hough’s results slightly to determine the limit profile.

Theorem 3.1

(Random k-Cycle Walk) Let \(n,k \in \mathbb {N}\). Consider the random k-cycle walk on \(\mathcal {S}_n\): start at \(\text {id}\in \mathcal {S}_n\); at each step, choose a k-cycle \(\tau \) uniformly at random; move by right-multiplication. For \(t\in \mathbb {N}_0\), write \(d_\text {TV}^{n,k}(t)\) for the TV distance of the random k-cycle walk on \(\mathcal {S}_n\) from the uniform distribution on the appropriate set of permutations of a fixed parity, ie the odd ones if k is even and t is odd and the even ones otherwise.

Let \((n_N)_{N\in \mathbb {N}}, (k_N)_{N\in \mathbb {N}}\in (\mathbb {N}\setminus \{ 1 \} )^\mathbb {N}\). Suppose that \(\lim _N k_N / n_N = 0\). Then, for all \(c\in \mathbb {R}\),  we have

Throughout the proof, for notational ease, we drop the subscripts, just writing k and n, and assuming that \(2 \le k \ll n\). Write \(\mathcal {A}_{n;k,t}\) for the set of odd permutations in \(\mathcal {S}_n\) if k is even and t is odd and the even permutations otherwise. Then the k-cycle walk at time t is supported on \(\mathcal {A}_{n;k,t}\).

It is well-known that the irreducible representations for \(\mathcal {S}_n\) are parametrised by partitions of n; see [7]. We need to find a collection of irreducible representations which asymptotically contains all the total variation mass. As is often the case with card shuffle-type walks, it is the partitions with long first row which we use. More precisely, for a partition \(\lambda \) of n, write \(\lambda = (\lambda _1, \ldots , \lambda _n)\) with \(\lambda _1 \ge \cdots \ge \lambda _n\); let \(M \in \mathbb {N}\), and set

$$\begin{aligned} \mathcal {P}_n(M) :=\bigl \{ \lambda \text { partition of } n \mid n - M< \lambda _1 < n \bigr \}. \end{aligned}$$

The trivial representation, denoted \(\text {triv}^n\), corresponds to the partition with only one block, ie \(\text {triv}^n_1 = n\) and \(\text {triv}^n_i = 0\) for \(i \ge 2\). Write \(\mathcal {P}^*_n(M) :=\mathcal {P}_n(M) \cup \{ \text {triv}^n \} \).

We now phrase Teyssier’s lemma, ie Lemma B, in this set-up.

Lemma 3.2

For all \(t \in \mathbb {N}_0\) and all \(M \ge 1\), we have

$$\begin{aligned} \Bigl | d_\text {TV}(t) - \tfrac{1}{n!} \sum \nolimits _{\sigma \in \mathcal {A}_{n;k,t}} \bigl | \sum \nolimits _{\lambda \in \mathcal {P}_n(M)} d_\lambda s_\lambda (k)^t \chi _\lambda (\sigma ) \bigr | \Bigr | \le \sum \nolimits _{\lambda \notin \mathcal {P}^*_n(M)} d_\lambda |s_\lambda (k) |^t . \end{aligned}$$

Given k and t, the random walk is supported on the set of permutations with a fixed sign; half the permutations are odd and half are even. Hence the factor \(\tfrac{1}{2} |\mathcal {A}_{n;k,t} |^{-1} = \tfrac{1}{n!}\) in the lemma above. (We emphasise the dependence on k in the character ratio \(s_\lambda (k)\).)

Outline of Proof of Theorem 3.1

We show in Lemma 3.7 that, for all \(\varepsilon > 0\) and all \(c\in \mathbb {R}\), there exists a constant \(M :=M(\varepsilon ,c)\) so that the right-hand side of the display in Lemma 3.2 is at most \(\varepsilon \) when \(t = \tfrac{1}{k} n ( \log n + c ) \). Thus, for the main term, we are interested in partitions \(\lambda \) with \(n - \lambda _1 \asymp 1\).

Further, it is well-known that \(d_\lambda \le \binom{n}{r} d_{\lambda ^*}\), where \(\lambda ^* :=\lambda \setminus \lambda _1\) is the partition \(\lambda \) with the largest element removed. In fact, \(d_\lambda = \binom{n}{r} d_{\lambda ^*} ( 1 + o( 1 ) )\) when \(r :=n - \lambda _1 \asymp 1\); see [20, Proposition 3.2].

Hough [14, Theorem 5] states a rather general result on the character ratios \(s_\lambda (k)\). Manipulating this general formula in the special case of \(\lambda \in \mathcal {P}_n(M)\), ie \(r = n - \lambda _1 \asymp 1\), we show in Corollary 3.4 that \( s_\lambda (k) = e^{-kr/n} ( 1 + o( 1 ) ) .\) While the precise form of \(s_\lambda (k)\) is complicated, roughly, this allows us to replace \(s_\lambda (k)\) with \((e^{-k/n})^r\), which decays exponentially in \(r \in [1,M]\).

Altogether, by allowing us to replace

this converts an unmanageable main term sum into what is in essence a generating function. We then adapt results of Teyssier [20, §4] to control this generating function. \(\square \)

As stated above, to prove this theorem we use representation theory results on the k-cycle walk from [14]. We state these precisely in the next section; we have to sharpen some results slightly. Throughout this section, \(\lambda \) will always be a partition of n, written \(\lambda \vdash n\).

Following [14], we use the Frobenius notation for a partition:

where \(\lambda '\) is the transpose of the partition \(\lambda \). Writing \(r :=n - \lambda _1\), the following hold:

We use the following notation for the descending factorial: for \(z \in \mathbb {R}\) and \(k \in \mathbb {N}\), write

$$\begin{aligned} (z)_{k} :=z (z - 1) \cdots (z - k + 1). \end{aligned}$$

Without further ado, we quote the required results from [14] in the next subsection.

3.2 Statements of character ratio bounds

In this subsection, we state results from [14], and deduce some corollaries of these statements. We do not give any proofs at this stage; these are deferred to §3.4.

The first result which we quote determines asymptotically the character ratio for partitions with long first row—which, we recall, are the partitions of particular interest to us.

Theorem 3.3

([14, Theorem 5(a)]) Let \(0< \varepsilon < \tfrac{1}{2}\). Suppose that \(r + k + 1 < \tfrac{1}{3} n\). Then

Further, if \(r < k\), then the error term is actually 0.

In this article, we are interested in partitions with long first row, namely \(\mathcal {P}_n(M)\). We can apply this theorem to analyse asymptotics of partitions with long first row.

Corollary 3.4

(Long First Row) Let \(2 \le k \le \tfrac{1}{3} n\). Let \(r \in \mathbb {N}\). Let \(\lambda \vdash n\) with \(\lambda _1 = n - r\). Then

$$\begin{aligned} s_\lambda (k) = e^{-rk/n} \cdot \bigl ( 1 + \mathcal {O}( k/n^2 ) \bigr ). \end{aligned}$$

This covers the case where the first row is long. The next two results consider shorter first rows; the first is for \(2 \le k \le 6 \log n\) and the second for \(6 \log n < k \ll n\). These statements are not exactly the same as in [14], but are slight strengthenings; their proofs are given in §3.4.

Theorem 3.5

(cf [14, Theorem 5(b)]) Assume that \(2 \le k \le 6 \log n\). Let \(\theta :=0.68 > \tfrac{2}{3}\); so \(e^{-\theta } > 0.506\). Consider \(\lambda \) with \(b_1 \le a_1 \le e^{-\theta } n\). Then

Lemma 3.6

(cf [14, Lemmas 14 and 15]) Assume that \(6 \log n < k \ll n\). Let \(\lambda \vdash n\) with \(b_1 \le a_1\) and \(r :=n - \lambda _1 \in [\tfrac{1}{3} n, n]\). Then

From these statements, along with the standard bounds on \(d_\rho \), the dimension of an irreducible representation \(\rho \), we are able to control the two terms, which we call the main and error terms, in Lemma B. Our first port of call is to find a suitable M to bound the error term. Once we have determined this, for the main term we need only consider partitions \(\lambda \) with \(\lambda _1 \ge n - M\). We take M to be order 1 (but arbitrarily large); so \(\lambda _1 \ge n - M\) falls into the “long first row” case.

Lemma 3.7

(Error Term) Let \(c\in \mathbb {R}\) and \(t :=\tfrac{1}{k} n ( \log n + c ) \). For \(M \in \mathbb {N}\), let

$$\begin{aligned} \text {ET}_M :=\sum \nolimits _{\lambda : \lambda _1 \le n - M} d_\lambda |s_\lambda (k) |^t = \sum \nolimits _{r=M}^{\infty } \sum \nolimits _{\lambda : \lambda _1 = n - r} d_\lambda |s_\lambda (k) |^t. \end{aligned}$$

Then

This controls the error term. We now consider the main term in Lemma 3.2.

Lemma 3.8

(Main Term) Let \(c\in \mathbb {R}\) and \(t :=\tfrac{1}{k} n ( \log n + c ) \). For \(M \in \mathbb {N}\), let

$$\begin{aligned} \text {MT}_M :=\tfrac{1}{n!} \sum \nolimits _{\sigma \in \mathcal {A}_{n;k,t}} \big | \sum \nolimits _{\lambda \in \mathcal {P}_n(M)} d_\lambda s_\lambda ^t \chi _\lambda (\sigma ) \bigr |. \end{aligned}$$

Then

Proof of Theorem 3.1 Given Lemmas 3.7 and 3.8

Lemma 3.2 formulates Teyssier’s lemma, ie Lemma B, in the set-up of the random k-cycle walk. Lemmas 3.7 and 3.8 control the error and main terms, respectively. Combining these three ingredients establishes Theorem 3.1. \(\square \)

It remains to control the error and main terms, ie prove Lemmas 3.7 and 3.8 respectively.

3.3 Controlling the main and error terms

We control the main term in §3.3.1 and the error term in §3.3.2.

3.3.1 Controlling the main term

We analyse the main term, ie Lemma 3.8, first. The analysis follows similarly to the case of random transpositions (ie \(k = 2\)) considered by Teyssier [20]. We need only consider partitions \(\lambda \) with long first row, namely \(\lambda _1 = n - r\) with \(1 \le r \le M\), where M is some (arbitrarily large) constant. These are precisely the partitions considered in the results quoted from Hough [14].

Teyssier [20, §4.1 and §4.2] then has some technical lemmas to get the main term into the desired form. We summarise these now. Note that he considers time \(\tfrac{1}{2} n \log n + cn\), while we are considering \(\tfrac{1}{k} n ( \log n + c ) \); hence our two \(c\)-s differ by a factor 2.

Before digging into the details of his lemmas, we give the high-level reasons why his proof passes over to our case. When considering the main term, one need only study those partitions with long first row, ie \(\lambda \) with \(r :=n - \lambda _1 \asymp 1\). For such \(\lambda \), consider the difference between \(s_\lambda (2)\) and \(s_\lambda (k)\):

Teyssier needs \(s_\lambda (2)^t \approx n^{-r} e^{-rc}\), in our parametrisation of c. This goes some way to justifying why we expect \(\tfrac{1}{k} n \log n\) to be the mixing time, and why the cutoff window should scale down linearly with k.

We now proceed more formally. For each \(r \ge 1\), define the polynomials \(T_r\) by

For a partition \(\lambda \), write \(\lambda ^* :=\lambda \setminus \lambda _1\) for \(\lambda \) with the first row removed. For a permutation \(\sigma \in \mathcal {S}_n\), write \({{\,\mathrm{Fix}\,}}\sigma \) for the number of fixed points in \(\sigma \).

Lemma 3.9

([20, Lemma 4.3]) Let \(r \in \mathbb {N}\). Let \(\sigma \in \mathcal {S}_n\) be a permutation with at least one cycle of length greater than r. Then

$$\begin{aligned} \tfrac{1}{r!} \sum \nolimits _{\lambda \vdash n : \lambda _1 = n - r} d_{\lambda ^*} \chi _\lambda (\sigma ) = T_r({{\,\mathrm{Fix}\,}}\sigma ). \end{aligned}$$

The proof of this lemma is combinatorial and relies strongly on the Murnaghan–Nakayama rule. Observe that Lemma 3.9 is a statement purely about the representation theory of the symmetric group; it has nothing to do with the random walk.

Using this result, one can obtain the following approximation.

Lemma 3.10

([20, Lemma 4.2]) Set \(t :=\tfrac{1}{k} n ( \log n + c ) \). Let \(M \in \mathbb {N}\). Then

To prove this, one separates \(\mathcal {A}_{n;k,t}\) into the set of permutations with a cycle of length greater than M and those with all cycles of length at most M. Also, it is not difficult to check, using the hook-length formula (see, eg, [20, Propositions 3.1 and 3.2]) and Corollary 3.4, that

These results can be combined to prove Lemma 3.10 exactly as for [20, Lemma 4.2].

Given these, one can then neglect polynomials of high degree, in the following sense.

Lemma 3.11

([20, Lemma 4.4]) For all sufficiently large M, we have

We must next evaluate this infinite sum. For \(c\in \mathbb {R}\), define the function \(f_c\) by

Proposition 3.12

([20, Proposition 4.5]) Let \(m \in \mathbb {N}\). Then

$$\begin{aligned} \sum \nolimits _{r=1}^{\infty } e^{-rc} T_r(m) = f_c(m). \end{aligned}$$

Finally, we evaluate this function at \({{\,\mathrm{Fix}\,}}\sigma \) with \(\sigma \sim {{\,\mathrm{Unif}\,}}(\mathcal {A}_{n;k,t})\) and take the expectation.

Lemma 3.13

(cf [20, Lemma 4.6]) We have

The idea behind this lemma is simple: it is well-known that if \(\sigma \sim {{\,\mathrm{Unif}\,}}(\mathcal {S}_n)\) then \({{\,\mathrm{Fix}\,}}\sigma \rightarrow ^d {{\,\mathrm{Pois}\,}}(1)\); we show that the same is true when \(\sigma \) is restricted to having a prescribed parity.

Proof of Lemma 3.13

We claim that precisely half the permutations with a given number of fixed points are even (and hence half are odd): if \(\mathcal {A}_n\) is the alternating group of even permutations, then

Given this claim, the lemma follows easily, as in [20, Lemma 4.6].

We now justify our claim. First, we find the number of permutations in \(\mathcal {S}_n\) (of either parity) with exactly r fixed points, which we denote \(f_{n,r}\). Note that \( f_{n,r} = \binom{n}{r} f_{n-r,0}.\) Indeed: first select the r points to be fixed, for which there are \(\binom{n}{r}\) choices; then choose a permutation on the remaining \(n-r\) points with no fixed points. It remains to calculate \(f_{m,0}\), ie the number of derangements of m objects, for each \(m \in \{ 0, \ldots , n \} \). To do this, we use the inclusion–exclusion principle. For \(i \in [m]\), let \( \mathcal {S}_{m,i} := \{ \sigma \in \mathcal {S}_m \mid \sigma (i) = i \} \) denote the set of permutations on m objects that fix the i-th object. Observe that \( |\cap _{i \in I} \mathcal {S}_{m,i} | = |\mathcal {S}_{m - |I |} | = (m - |I |)! \) for all \(I \subseteq [m]\); for each \(\ell \in [m]\), there are \(\binom{m}{\ell }\) choices of \(I \subseteq [m]\) with \(|I | = \ell \). Hence, by inclusion–exclusion, we have

$$\begin{aligned} f_{m,0} = m! \sum \nolimits _{\ell =0}^{m} (-1)^\ell / \ell !. \end{aligned}$$

Combined with the fact that \(f_{n,r} = \binom{n}{r} f_{n-r,0}\), we thus deduce that

We now turn to even permutations, ie \(\mathcal {A}_n\). We apply an analogous method. Denote by \(f'_{n,r}\) the number of permutations in \(\mathcal {A}_n\) with exactly r fixed points. Since appending fixed points to a permutation does not change its parity, again we have \(f'_{n,r} = \binom{n}{r} f'_{n-r,0}.\) For \(m \in \mathbb {N}\) and \(i \in [m]\), define \( \mathcal {A}_{m,i} := \{ \sigma \in \mathcal {A}_m \mid \sigma (i) = i \} .\) Analogously to before, since appending fixed points does not affect the parity, we have \( |\cap _{i \in I} \mathcal {A}_{m,i} | = |\mathcal {A}_{m - |I |} | = \tfrac{1}{2} (m - |I |)! \) for all \(I \subseteq [m]\). This is a factor \(\tfrac{1}{2}\) different to \(|\cap _{i \in I} \mathcal {S}_{m,i} |\) from before. Using the inclusion–exclusion principle thus gives, as before,

Since half the permutations with a given number of fixed points are even, ie \(f'_{n,r} = \tfrac{1}{2} f_{n,r}\), the other half must be odd. \(\square \)

Observe that Lemmas 3.11 and 3.13 and Proposition 3.12 are statements purely about the representation theory of the symmetric group; they have nothing to do with the random walk.

Using standard applications of the triangle inequality, these lemmas can then be combined to deduce that the main term converges to the TV-distance in question; see [20, §4.4].

Proof of Lemma 3.8

Let \(\varepsilon > 0\) and let M and n be large enough so that all the approximations are true up to an additive error of \(\varepsilon \). The following inequalities hold:

Since \(\varepsilon > 0\) was arbitrary, the proof is now completed by the triangle inequality. \(\square \)

3.3.2 Controlling the error term

Finally, we control the error term, ie Lemma 3.7. As before, we consider only \(\lambda \) with \(\lambda _1 \ge \lambda '_1\). Consider first the dimensions of the irreducible representations, ie \(d_\lambda \).

Lemma 3.14

The following bounds hold:

(3.1)
(3.2)

Proof

It is well-known that \(d_\lambda \) is equal to the number of ways of placing the numbers 1 through n into the Young diagram of \(\lambda \) so that all rows and columns are increasing; see, eg, [10, Lemma 6]. From this, it is immediate that \(d_\lambda \le \binom{n}{\lambda _1} d_{\lambda ^*}\) where \(\lambda ^* :=\lambda \setminus \lambda _1\) is the partition obtained by removing the largest element of \(\lambda \). It is also standard that \(\sum _{\rho \vdash r} d_\rho ^2 = |\mathcal {S}_r | = r!\); see, eg, [6, Theorem 3.8.11]. (This last claim is true for any finite group, not just the symmetric group.) Associate to the partition \(\rho = (\rho _1, \ldots , \rho _r)\) of r, written in increasing order, the subset \(\tilde{\rho }:= \{ \rho _1, \rho _1 + \rho _2, \ldots , \rho _1 + \cdots + \rho _r \} \) of [r]. This mapping is injective, and so \( | \{ \rho \mid \rho \vdash r \} | \le | \{ \tilde{\rho }\mid \tilde{\rho }\subseteq [r] \} | = 2^r.\) Combining these bounds and using Cauchy–Schwarz gives

The second claim is a special case: \( n^r / r^{r/2} \le n^r / (\tfrac{1}{3} n)^{r/2} = 3^{r/2} n^{r/2} \le 2^r n^{r/2}\) when \( r \ge \tfrac{1}{3} n. \) \(\square \)

We split the summation \(\sum _{r=M}^{\infty }\) in the error term \(\text {ET}_M\) into two parts: \(r \le 0.495 n\) and \(r \ge 0.495 n\); the latter sum is separated according to whether or not \(k \le 6 \log n\).

Proof of Lemma 3.7

Throughout this proof, \( t :=\tfrac{1}{k} n ( \log n + c ) .\)

Consider first \(r \in [M,0.495n]\) with \(2 \le k \ll n\). Recall Corollary 3.4 which implies that

Note that \(tk \asymp n \log n \ll n^2\). Thus, for all \(c\in \mathbb {R}\), using (3.1), we have

(3.3a)
(3.3b)
(3.3c)

The summand is independent of n, and gives rise to a summable series; hence \(\mathcal {E}_1 \rightarrow 0\) as \(M\rightarrow \infty \).

Consider next \(r \in [0.495n, n]\) with \(2 \le k \le 6 \log n\). When \(r \ge 0.495n\) we have \(a_1 \le 0.505n < e^{-0.68}n\). Recall Theorem 3.5 which implies that

Now, \(tk = n(\log n + c)\). Hence, for all \(c\in \mathbb {R}\), using (3.2), we have

(3.4a)
(3.4b)
(3.4c)
(3.4d)

Consider finally \(r \in [0.495n, n]\) with \(6 \log n < k \ll n\). Note that \(r \ge 0.495 n \ge \tfrac{1}{3} n\). Recall Lemma 3.6 which implies that

Again, \(tk = n(\log n + c)\). Hence, for all \(c\in \mathbb {R}\), using (3.2), we have

(3.5a)
(3.5b)
(3.5c)
(3.5d)

The lemma follows immediately from these three considerations, namely (3.3, 3.4, 3.5): defining \(\mathcal {E}_2 :=\mathcal {E}_{2,<} \varvec{1}( 2 \le k \le 6 \log n ) + \mathcal {E}_{2,>} \varvec{1}( 6 \log n < k \ll n )\), we have \( \text {ET}_M = \mathcal {E}_1 + \mathcal {E}_2.\) \(\square \)

3.4 Proofs of character ratio bounds

In this section we give the deferred proofs from §3.2.

Proof of Corollary 3.4

Write \(P_0\), \(P_1\) and \(P_2\) for the three terms in the product from Theorem 3.3:

$$\begin{aligned}&P_0 :=\frac{(n-r-1)_{k}}{(n)_{k}}; \quad P_1 :=\prod _{i=2}^m \biggl ( 1 - \frac{k}{n - (1 + r + \lambda _i - i)} \biggr );\\&\quad P_2 :=\prod _{i=1}^m \biggl ( 1 - \frac{k}{n - (r - \lambda '_i + i)} \biggr )^{\!-1}. \end{aligned}$$

Then, by Theorem 3.3 (ie [14, Theorem 5(a)]), the main contribution to \(s_\lambda (k)\) is \(P_0 P_1 P_2\).

Since r is a constant, all the \( \{ a_i, b_i, \lambda _i, \lambda '_i \} \) are order 1, with the exception of \(\lambda _1 = n - r\) and \(a_1 = n - r - \tfrac{3}{2}\). Hence all the factors in the two products are very close to \(1-k/n\), and the factors in the first term are very close to \(1-r/n\). In particular, for \(1 \le j,\ell \le \tfrac{1}{2} n\), we have

$$\begin{aligned} 1 - \tfrac{\ell }{n-j} = 1 - \tfrac{\ell }{n} \tfrac{1}{1-j/n} = 1 - \tfrac{\ell }{n} + \tfrac{\ell }{n} \tfrac{j/n}{1-j/n} = \bigl ( 1 - \tfrac{\ell }{n} \bigr ) \bigl ( 1 + \mathcal {O}( j\ell /n^2 ) \bigr ). \end{aligned}$$
(3.6)

We turn first to \(P_0\). First note that \((n-r-1)_{k} = (n-r)_{k} \cdot ( 1 - k/(n-r) ) \). We have

(3.7)

Combining (3.6, 3.7), and using the fact that \(r \asymp 1\), we obtain

Now, \(P_0 = \tfrac{n-r-k}{n-r} \cdot (n-r)_{k} / (n)_{k}.\) So applying (3.6) again, we obtain

$$\begin{aligned} P_0 = ( 1 - r/n ) ^k ( 1 - k/n ) \bigl ( 1 + \mathcal {O}( k/n^2 ) \bigr ). \end{aligned}$$
(3.8)

We now turn to \(P_1\) and \(P_2\). Using the approximation to \(1 - \ell /(n-j)\), ie (3.6), the following hold:

(3.9)
(3.10)

Also \(\max \{ \lambda '_1, \{ \lambda _i,\lambda '_i \} _2^m, m \} \le r \asymp 1\). (Recall that \(\lambda = (a_1, \ldots , a_m \mid b_1, \ldots , b_m)\).) Hence

$$\begin{aligned} P_1 P_2 = ( 1 - k/n ) ^{-1} ( 1 + \mathcal {O}( k/n^2 ) ) . \end{aligned}$$
(3.11)

Combining the expressions for \(P_0\) and \(P_1 P_2\), ie (3.8, 3.11), we obtain

$$\begin{aligned} P_0 P_1 P_2 = ( 1 - r/n ) ^k \bigl ( 1 + \mathcal {O}( k/n^2 ) \bigr ) = e^{-rk/n} \bigl ( 1 + \mathcal {O}( k/n^2 ) \bigr ). \end{aligned}$$

This is the main contribution to \(s_\lambda (k)\); it remains to control the error in Theorem 3.3.

If \(k \gg 1\) then we necessarily have \(r < k\), and so the error term is 0; if \(k \asymp 1\), then the error term is \(\mathcal {O}( n^{-k} ) = \mathcal {O}( n^{-2} )\), as \(k \ge 2\). But \((1 - r/n)^k \asymp 1\), since \(k \le \tfrac{1}{3} n\) and \(r \asymp 1\), and so this additive \(\mathcal {O}( 1/n^2 )\) error is absorbed into the larger \(\mathcal {O}( k/n^2 )\) error.

In summary, we have shown the desired expression for the character ratio \(s_\lambda (k)\):

$$\begin{aligned} s_\lambda (k) = e^{-rk/n} \bigl ( 1 + \mathcal {O}( k/n^2 ) \bigr ). \end{aligned}$$

\(\square \)

Proof of Theorem 3.5

Choose \(\theta ' :=0.677\); then \(\theta ' - \tfrac{1}{60} > \tfrac{1}{2} + \tfrac{1}{10}\). Inspection of the proof of [14, Theorem 5(b)] gives the upper bound

\(\square \)

Proof of Lemma 3.6

Under the given assumptions, [14, Lemma 14] states that

Further, we claim that if \(r :=n - \lambda _1\) satisfies \(\tfrac{1}{3} n \le r \le n\) then

Since \(r \asymp n\), we have \(k r/n \asymp k \gtrsim 1\), and so these error terms can be absorbed. Lemma 3.6 then follows.

It remains to prove our claim, which is a slight sharpening of [14, Lemma 15]. The following claim comes from inspecting the proof of [14, Lemma 15]: in order to prove that

it suffices, writing \(\delta :=r/n \in [\tfrac{1}{3}, 1]\), to prove that

The worst case is clearly \(k = 2\), in which case \(k/(k-1) = 2\). Thus we need \(1 - \delta \le e^{-2c\delta }\). If one were to allow \(\delta \) all the way down to 0, then one would have to take \(c \le \tfrac{1}{2}\); however, we only need \(\delta \in [\tfrac{1}{3}, 1]\). One can check that it is then sufficient to take c so that

In particular, we may take \(c :=\tfrac{1}{2} + \tfrac{1}{10} + \tfrac{1}{200} = 0.605\). \(\square \)

4 Random walks on homogeneous spaces

Throughout this section, G will be a finite group and K a subgroup. Denote the homogeneous space consisting of the (right) cosets by \( X :=G/K := \{ g K \mid g \in G \} . \) Denote the set of complex-valued functions on X by \( L(X) := \{ f : X \rightarrow \mathbb {C} \} . \) We frequently identify this with the space of K invariant functions on G, ie those \(f : G \rightarrow \mathbb {C}\) for which \(f(gk) = f(g)\) for all \(g \in G\) and all \(k \in K\).

4.1 Gelfand pairs and spherical Fourier analysis for invariant random walks

The majority of this subsection—namely, the analysis leading up to Proposition 4.7—is an abbreviated exposition of [6, §4]; a related exposition can be found in [5, §2].

Let G be a finite group and let K be a subgroup. A function \(f : G \rightarrow \mathbb {C}\) is K bi-invariant if \(f(k g k') = f(g)\) for all \(g \in G\) and all \(k, k' \in K\).

Definition 4.1

Let G be a finite group and K be a subgroup. The pair (GK) is called a Gelfand pair if the algebra of K bi-invariant functions (under convolution) is commutative.

Equivalently, (GK) is a Gelfand pair if the permutation representation \(\lambda \) of G on X defined by \( ( \lambda (g) f ) (x) :=f(g^{-1} x) \) for \(g \in G\), \(f \in L(X)\) and \(x \in X\), is multiplicity-free.

This equivalence is shown in [6, Theorem 4.4.2]. From now on, assume that (GK) is a Gelfand pair. We next introduce spherical functions and spherical representations.
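For concreteness, here is a small brute-force illustration (ours, in Python) of Definition 4.1 for the Gelfand pair \((\mathcal {S}_3, \mathcal {S}_2)\): it checks that the convolution algebra of bi-invariant functions, spanned by the indicators of double cosets, is commutative.

```python
# Brute-force check that (S_3, S_2) is a Gelfand pair: the convolution
# algebra of K bi-invariant functions on G is commutative. We take G = S_3,
# acting on {0, 1, 2}, and K the copy of S_2 fixing the point 2.
from itertools import permutations, product

G = list(permutations(range(3)))                 # permutations as tuples
K = [g for g in G if g[2] == 2]

def compose(s, t):                               # (s t)(i) = s(t(i))
    return tuple(s[t[i]] for i in range(3))

def inverse(s):
    inv = [0, 0, 0]
    for i, si in enumerate(s):
        inv[si] = i
    return tuple(inv)

def convolve(f, h):                              # (f * h)(g) = sum_u f(u) h(u^{-1} g)
    return {g: sum(f[u] * h[compose(inverse(u), g)] for u in G) for g in G}

# Indicators of the double cosets K g K span the bi-invariant functions.
cosets = {frozenset(compose(compose(k1, g), k2) for k1, k2 in product(K, K))
          for g in G}
indicators = [{g: int(g in D) for g in G} for D in cosets]

assert all(convolve(f, h) == convolve(h, f)
           for f in indicators for h in indicators)
print(len(cosets), "double cosets; the bi-invariant algebra is commutative")
```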

Definition 4.2

A K bi-invariant function \(\varphi : G \rightarrow \mathbb {C}\) is said to be spherical if \(\varphi (\text {id}_G) = 1\) and \( \tfrac{1}{|K |} \sum \nolimits _{k \in K} \varphi (g k h) = \varphi (g) \varphi (h) \) for all \(g, h \in G\). For a spherical function \(\varphi \), the subspace of L(X) generated by the G-translates of \(\varphi \), ie \( V_\varphi :=\langle \lambda (g) \varphi \mid g \in G \rangle \) where \(\lambda \) is the permutation representation of G on X, is called the spherical representation.

For a representation \((\rho , V)\), write \( V^K := \{ v \in V \mid \rho (k) v = v \, \forall \, k \in K \} \) for the space of K invariant vectors in V. The following theorem is a culmination of statements from [6, §4.5 and §4.6].

Theorem 4.3

The number of distinct spherical functions equals the number of orbits of K on X. Denote by \( \{ \varphi _i \} _0^N\) the distinct spherical functions, with \(\varphi _0\) the constant function 1.

Then \( L(X) = \bigoplus \nolimits _{i=0}^N V_{\varphi _i}, \) which is a multiplicity-free decomposition into irreps. Further, \( \{ \varphi _i \} _0^N\) forms an orthogonal basis for the set of K bi-invariant functions on G with normalisation given, for each i, by \( \sum \nolimits _{x \in X} |\varphi _i(x) |^2 = |X | / d_i \) where \(d_i :=\dim V_{\varphi _i}\) is the dimension of the irrep \(V_{\varphi _i}\).

For any irrep V we have \(\dim V^K \le 1\) and \(\dim V^K = 1\) if and only if V is spherical.

This allows us to construct a ‘spherical basis’ in which the Fourier transform has a simple form.

Definition 4.4

The spherical Fourier transform \(\widetilde{\mu }\) of a K invariant function \(\mu \in L(X)\) is defined by

$$\begin{aligned} \widetilde{\mu }(i) :=\sum \nolimits _{x \in X} \mu (x) \overline{\varphi _i(x)} \quad \text {for } i \in \{ 0, 1, \ldots , N \} . \end{aligned}$$

Corollary 4.5

There exists an orthonormal basis of K invariant functions on G with the following property. Let \(\mu \) be a K bi-invariant function on G. If \((\tau , W)\) is a non-spherical irrep, then \(\widehat{\mu }(\tau ) = 0\). If \((\rho _i, V_{\varphi _i})\) is a spherical irrep (with \(i \in \{ 0, \ldots , N \} \)), then the matrix representing the operator \(\widehat{\mu }(\rho _i)\) has only one non-zero entry, which is in the first position and has value \(|K | \widetilde{\mu }(i)\).

As a consequence, a Fourier inversion formula holds:

$$\begin{aligned} \mu ^{*t}(x) = |X |^{-1} \sum \nolimits _{i=0}^{N} d_i \varphi _i(x) \widetilde{\mu }(i)^t \quad \text {for all } x \in X \text { and } t \in \mathbb {N}_0, \end{aligned}$$

where \(\mu ^{*t}\) is the t-fold self-convolution of \(\mu \).

From this we immediately obtain an expression for the TV distance between \(\mu ^{*t}\) and \({{\,\mathrm{Unif}\,}}_X\). To apply this to random walks on G, the step distribution must be K bi-invariant; this is the case if the stochastic transition matrix \(P = (p_{x,y})_{x,y \in X}\) is G-invariant: \( p_{x,y} = p_{gx,gy} \) for all \(x,y \in X\) and all \(g \in G\).

When looking at such random walks, we always start from a point which is stabilised by K.

Definition 4.6

Let G be a finite group and K be a subgroup. Let G act on the homogeneous space \(X :=G/K\) by the left coset action: \( g \cdot (hK) :=(gh) K. \) Say \(\bar{x} \in X\) is stabilised by K if \(k \cdot \bar{x} = \bar{x}\) for all \(k \in K\). Equivalently, \(\bar{g} K\) is stabilised by K if and only if \(K = \bar{g} K \bar{g}^{-1}\).

When starting a random walk with G-invariant transition matrix from \(\bar{x} \in X = G/K\) which is stabilised by K, one can then check \(P^t(\bar{x}, \cdot ) = \mu _{\bar{x}}^{*t}(\cdot )\) for all \(t \in \mathbb {N}_0\) where \(\mu _{\bar{x}}(\cdot ) :=P(\bar{x}, \cdot )\); that is, the probability of being at x after t steps when started from \(\bar{x}\) is \(\mu _{\bar{x}}^{*t}(x)\) for all \(x \in X\) and all \(t \in \mathbb {N}_0\). Altogether, we have now proved the following proposition.

Proposition 4.7

([6, Proposition 4.9.1]) Let (GK) be a Gelfand pair and denote \(X :=G/K\). Let \( \{ \varphi _i \} _{i=0}^N\) be the associated spherical functions, considered as K bi-invariant functions on X, and \( \{ d_i \} _{i=0}^N\) the associated dimensions; assume that \(\varphi _0(x) = 1\) for all \(x \in X\).

Let \(\bar{x}\) be an element of X stabilised by K. Let P be a G-invariant stochastic matrix and set \(\mu _{\bar{x}}(\cdot ) :=p_{\bar{x}, \cdot }\). Let \(t \in \mathbb {N}_0\) and \(x \in X\). Then

$$\begin{aligned} \mu _{\bar{x}}^{*t}(x) - |X |^{-1} = |X |^{-1} \sum \nolimits _{i=1}^{N} d_i \varphi _i(x) \widetilde{\mu }_{\bar{x}}(i)^t, \end{aligned}$$

where \(\widetilde{\mu }_{\bar{x}}\) is the spherical Fourier transform of \(\mu _{\bar{x}}\). As a corollary, we have

$$\begin{aligned} d_\text {TV}\bigl ( P^t(\bar{x}, \cdot ), \, {{\,\mathrm{Unif}\,}}_X \bigr ) = \tfrac{1}{2} |X |^{-1} \sum \nolimits _{x \in X} \big | \sum \nolimits _{i=1}^{N} d_i \varphi _i(x) \widetilde{\mu }_{\bar{x}}(i)^t \bigr |. \end{aligned}$$

We now have all the ingredients to prove our TV-approximation lemma for random walks on homogeneous spaces corresponding to Gelfand pairs, ie Lemma C; we restate it here for convenience.

Lemma 4.8

(TV Approximation Lemma) Let (GK) be a Gelfand pair and denote \(X :=G/K\). Let \(\bar{x}\) be an element of X stabilised by K. Let \( \{ \varphi _i \} _{i=0}^N\) be the associated spherical functions, considered as K bi-invariant functions on X, and \( \{ d_i \} _{i=0}^N\) the associated dimensions; assume that \(\varphi _0(x) = 1\) for all \(x \in X\). Let P be a G-invariant stochastic matrix and set \(\mu _{\bar{x}}(\cdot ) :=P ( \bar{x}, \cdot ) \).

Let \(t \in \mathbb {N}_0\) and \(I \subseteq \{ 1, \ldots , N \} \). Then

$$\begin{aligned} \Bigl |&d_\text {TV}\bigl ( P^t(\bar{x}, \cdot ), \, {{\,\mathrm{Unif}\,}}_X \bigr ) - \tfrac{1}{2} |X |^{-1} \sum \nolimits _{x \in X} \big | \sum \nolimits _{i \in I} d_i \varphi _i(x) \widetilde{\mu }_{\bar{x}}(i)^t \bigr | \Bigr |\\&\le \tfrac{1}{2} \sum \nolimits _{i \notin I} \sqrt{d_i} | \widetilde{\mu }_{\bar{x}}(i) |^t, \end{aligned}$$

where \(\widetilde{\mu }_{\bar{x}} : i \mapsto \sum \nolimits _{x \in X} \mu _{\bar{x}}(x) \overline{\varphi _i(x)}\) is the spherical Fourier transform of \(\mu _{\bar{x}}\).

Proof

First we apply Proposition 4.7 and the triangle inequality:

$$\begin{aligned}&\Bigl | d_\text {TV} \bigl ( \mu _{\bar{x}}^{*t}, \, {{\,\mathrm{Unif}\,}}_X \bigr ) - \tfrac{1}{2} |X |^{-1} \sum \nolimits _{x \in X} \bigl | \sum \nolimits _{i \in I} d_i \varphi _i(x) \widetilde{\mu }_{\bar{x}}(i)^t \bigr | \Bigr |\\&\quad \le \tfrac{1}{2} |X |^{-1} \sum \nolimits _{x \in X} \bigl | \sum \nolimits _{i \notin I} d_i \varphi _i(x) \widetilde{\mu }_{\bar{x}}(i)^t \bigr |\\&\quad \le \tfrac{1}{2} |X |^{-1} \sum \nolimits _{x \in X} \sum \nolimits _{i \notin I} d_i | \varphi _i(x) | \, | \widetilde{\mu }_{\bar{x}}(i) |^t \\&\quad = \tfrac{1}{2} \sum \nolimits _{i \notin I} d_i |\widetilde{\mu }_{\bar{x}}(i) |^t \cdot |X |^{-1} \sum \nolimits _{x \in X} |\varphi _i(x) |. \end{aligned}$$

Applying Cauchy–Schwarz and the standard spherical orthogonality relations (see, eg, [6, Proposition 4.7.1] or [5, Equation (2.11)]), we obtain

$$\begin{aligned} \bigl ( \sum \nolimits _{x \in X} |\varphi _i(x) | \bigr )^2 \le |X | \sum \nolimits _{x \in X} |\varphi _i(x) |^2 = |X | \cdot |X | / d_i. \end{aligned}$$

Plugging this into the previous bound, we deduce the lemma. \(\square \)

4.2 Limit profile for the multi-urn Ehrenfest diffusion

Suppose that one has \(n\) balls labelled 1 through \(n\) and \(m+1\) urns labelled 0 through m. The set of all configurations can be identified with the set \( X_{n,m+1} := \{ 0, 1, \ldots , m \} ^n: \) an element \(x = (x_1, \ldots , x_n) \in X_{n,m+1}\) indicates that the j-th ball is in the \(x_j\)-th urn. Initially, put all the balls in the first urn (labelled 0): this is the initial configuration, and corresponds to \(\bar{x} :=(0, 0, \ldots , 0)\).

We can endow X with a metric structure: for \(x,y \in X_{n,m+1}\), set

$$\begin{aligned} d(x,y) :=\big | \{ k \in [n] \mid x_k \ne y_k \} \bigr |. \end{aligned}$$

Thinking of x and y as configurations of balls, d(xy) is the number of balls which are not in the same urn in the two configurations.

We consider the random walk on \(X :=X_{n,m+1}\) described by the following step: choose uniformly at random a ball and an urn; put the chosen ball in the chosen urn. In terms of the transition matrix R, indexed by \(X \times X\), this is given by the following expressions, for \(x,y \in X\):

$$\begin{aligned} R(x,y) = \tfrac{1}{m+1} \text { if } x = y; \quad R(x,y) {=} \tfrac{1}{n(m+1)} \text { if } d(x,y) = 1; \quad R(x,y) {=} 0 \ \text {otherwise}. \end{aligned}$$
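By symmetry, the walk observed through \(\ell :=d(\bar{x}, \cdot )\) is itself a Markov chain. Although not needed for the proofs, the following minimal Python sketch (ours; \(n = 30\), \(m = 3\) are illustrative choices) builds this lumped birth-death chain and confirms numerically that its eigenvalues are \(1 - i/n\) for \(i \in \{ 0, \ldots , n \} \), foreshadowing Lemma 4.12 below.

```python
# Lumped chain on ell = number of balls outside urn 0. A step moves
# ell -> ell + 1 if a matched ball is sent to a wrong urn, and
# ell -> ell - 1 if a mismatched ball is returned to urn 0.
import numpy as np

n, m = 30, 3
Q = np.zeros((n + 1, n + 1))
for ell in range(n + 1):
    up = (n - ell) / n * m / (m + 1)
    down = ell / n / (m + 1)
    if ell < n:
        Q[ell, ell + 1] = up
    if ell > 0:
        Q[ell, ell - 1] = down
    Q[ell, ell] = 1 - up - down

eig = np.sort(np.linalg.eigvals(Q).real)
print(np.allclose(eig, [1 - i / n for i in reversed(range(n + 1))]))  # True
```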

The following theorem is a restatement of Theorem C, but written more formally: cutoff is for a sequence of Markov chains; we make this sequence explicit.

Theorem 4.9

(Limit Profile for Generalised Ehrenfest Urn) Let \(n,m \in \mathbb {N}\). Consider \(n\) balls labelled \(1, \ldots , n\) and \(m+1\) urns labelled \(0, 1, \ldots , m\). Consider the following Markov chain: at each step, choose a ball and an urn uniformly and independently; place said ball in said urn. For \(t\in \mathbb {N}_0\), write \(d_\text {TV}^{n,m}(t)\) for the TV distance of this Markov chain after t steps from its invariant distribution when started with all n balls initially in the urn labelled 0.

Let \((n_N)_{N\in \mathbb {N}}, (m_N)_{N\in \mathbb {N}}\in \mathbb {N}^\mathbb {N}\). Suppose that \(\lim _N m_N / n_N = 0\). Then, for all \(c\in \mathbb {R}\), we have

$$\begin{aligned} \lim \nolimits _{N} d_\text {TV}^{n_N,m_N}\bigl ( \tfrac{1}{2} n_N \log (m_N n_N) + c n_N \bigr ) = 2 \, \Phi \bigl ( \tfrac{1}{2} e^{-c} \bigr ) - 1. \end{aligned}$$

As in previous sections, for ease of presentation we omit the N-subscripts in the proof. We start by phrasing the Ehrenfest urn model in Gelfand pair language. To do this, we give a very abbreviated exposition of [5, §3]. Let \(\mathcal {S}_{m+1}\) and \(\mathcal {S}_n\) be the symmetric groups on \( \{ 0, 1, \ldots , m \} \) and \( \{ 1, \ldots , n \} \), respectively. Then \(X_{n,m+1} = \{ 0, 1, \ldots , m \} ^n\) is a homogeneous space for the wreath product \(\mathcal {S}_{m+1} \wr \mathcal {S}_n\) under the action \( ( \sigma _1, \ldots , \sigma _n; \theta ) \cdot ( x_1, \ldots , x_n ) := ( \sigma _1 x_{\theta ^{-1}(1)}, \ldots , \sigma _n x_{\theta ^{-1}(n)} ) , \) ie \(x_i\) is moved by \(\theta \) to the position \(\theta (i)\) and then changed by the action of \(\sigma _{\theta (i)}\). Note that the stabiliser of \(\bar{x} :=(0, 0, \ldots , 0) \in X_{n,m+1}\) coincides with the wreath product \(\mathcal {S}_m \wr \mathcal {S}_n\), where \(\mathcal {S}_m \le \mathcal {S}_{m+1}\) is the stabiliser of 0. Therefore we can write \( X_{n,m+1} = ( \mathcal {S}_{m+1} \wr \mathcal {S}_n ) / ( \mathcal {S}_m \wr \mathcal {S}_n ) . \) The group \(\mathcal {S}_{m+1} \wr \mathcal {S}_n\) acts isometrically on \(X_{n,m+1}\), and the action is distance transitive. It follows that \( ( \mathcal {S}_{m+1} \wr \mathcal {S}_n , \, \mathcal {S}_m \wr \mathcal {S}_n ) \) is a Gelfand pair; see [5, Example 2.5].

The associated spherical functions and dimensions are given by the following proposition.

Theorem 4.10

(Spherical Functions; [5, Theorem 3.1]) For each \(i \in \{ 0, 1, \ldots , n \} \), the dimension \(d_i\) satisfies

$$\begin{aligned} d_i = \left( {\begin{array}{c}n\\ i\end{array}}\right) m^i \end{aligned}$$

and the spherical function \(\varphi _i\), which depends on x only through \(\ell :=d(\bar{x}, x)\), satisfies

$$\begin{aligned} \varphi _i(x) = \left( {\begin{array}{c}n\\ i\end{array}}\right) ^{-1} m^{-i} \sum \nolimits _{j=0}^{i} (-1)^j m^{i-j} \left( {\begin{array}{c}\ell \\ j\end{array}}\right) \left( {\begin{array}{c}n-\ell \\ i-j\end{array}}\right) ; \end{aligned}$$

in particular, \(\varphi _i(x) = 1 - i(m+1)/(mn)\) when \(\ell = 1\).

Remark 4.11

The spherical functions are the Krawtchouk polynomials, given in Definition 2.6, using the notation there. By Lemma 2.3, they are orthogonal with respect to the Binomial measure. This can also be seen as a consequence of the orthogonality of spherical functions, ie Theorem 4.3.

We first determine the spherical Fourier transform of the step distribution \(\mu (\cdot ) :=R(\bar{x}, \cdot )\).

Lemma 4.12

(Spherical Fourier Transform) For all \(i\in \{ 0, 1, \ldots , n \} \), we have \( \widetilde{\mu }(i) = 1 - i/n. \)

Proof

Noting the slight laziness (the walk stays put with probability \(\tfrac{1}{m+1}\)) and that the \(mn\) configurations at distance 1 from \(\bar{x}\) each carry probability \(\tfrac{1}{n(m+1)}\), we have

$$\begin{aligned} \widetilde{\mu }(i) = \tfrac{1}{m+1} + \tfrac{m}{m+1} \varphi _i(1) = \tfrac{m}{m+1} \bigl ( \tfrac{1}{m} + \varphi _i(1) \bigr ). \end{aligned}$$

Using the expression for \(\varphi _i(1)\) given by Theorem 4.10, we obtain \( \widetilde{\mu }(i) = 1 - i/n. \) \(\square \)
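As a numerical cross-check of Proposition 4.7, Theorem 4.10 and Lemma 4.12 together, the following minimal Python sketch (ours, not part of the formal development; \(n = 5\), \(m = 2\), \(t = 7\) are illustrative choices) compares \(\mu _{\bar{x}}^{*t}(x) - |X |^{-1}\), computed by direct matrix powers, with the spherical expansion:

```python
# Verify mu^{*t}(x) - 1/|X| = (1/|X|) sum_{i>=1} d_i phi_i(x) (1 - i/n)^t
# on the full state space {0, ..., m}^n, using the Krawtchouk formula
# for phi_i from Theorem 4.10.
import numpy as np
from math import comb
from itertools import product

n, m, t = 5, 2, 7
states = list(product(range(m + 1), repeat=n))
idx = {x: i for i, x in enumerate(states)}
N = len(states)                                  # |X| = (m+1)^n

R = np.zeros((N, N))
for x in states:
    R[idx[x], idx[x]] = 1 / (m + 1)              # chosen urn = current urn
    for j in range(n):
        for u in range(m + 1):
            if u != x[j]:
                y = x[:j] + (u,) + x[j + 1:]
                R[idx[x], idx[y]] = 1 / (n * (m + 1))

lhs = np.linalg.matrix_power(R, t)[idx[(0,) * n]] - 1 / N

def phi(i, ell):                                 # spherical function at distance ell
    return sum((-1)**j * m**(i - j) * comb(ell, j) * comb(n - ell, i - j)
               for j in range(i + 1)) / (comb(n, i) * m**i)

rhs = np.array([
    sum(comb(n, i) * m**i * phi(i, sum(c != 0 for c in x)) * (1 - i / n)**t
        for i in range(1, n + 1)) / N
    for x in states])
print(np.max(np.abs(lhs - rhs)))                 # should be ~1e-15 (machine precision)
```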

There are \(m^\ell \left( {\begin{array}{c}n\\ \ell \end{array}}\right) \) different x with \(d(\bar{x},x) = \ell \). Applying Theorem 4.10, and writing \(\varphi _i(\ell )\) for the common value of \(\varphi _i\) on \( \{ x \mid d(\bar{x},x) = \ell \} \), we obtain the following expressions for the terms in Lemma C:

$$\begin{aligned} \text {MT}&:=(m+1)^{-n} \sum \nolimits _{\ell =0}^{n} m^\ell \left( {\begin{array}{c}n\\ \ell \end{array}}\right) \Bigl | \sum \nolimits _{i \in I} \left( {\begin{array}{c}n\\ i\end{array}}\right) m^i \varphi _i(\ell ) ( 1 - i/n ) ^t \Bigr |,\\ \text {ET}&:=\tfrac{1}{2} \sum \nolimits _{i \notin I} \sqrt{\left( {\begin{array}{c}n\\ i\end{array}}\right) m^i} \, ( 1 - i/n ) ^t, \end{aligned}$$

so that \( \bigl | d_\text {TV}^{n,m}(t) - \tfrac{1}{2} \text {MT} \bigr | \le \text {ET} \).

Our first aim is to use this to determine which are the ‘important’ spherical statistics.

Lemma 4.13

(Error Term) For all \(\varepsilon > 0\) and all \(c\in \mathbb {R}\), there exists an \(M:=M(c,\varepsilon )\) so that, for \(t:=\tfrac{1}{2} n \log (mn) + cn\), if \(I:= \{ 1, \ldots , M \} \), then

$$\begin{aligned} \text {ET}\le \text {ET}' :=\tfrac{1}{2} \sum \nolimits _{i > M} \sqrt{d_i} \, e^{-ti/n} = \tfrac{1}{2} \sum \nolimits _{i > M} \sqrt{\left( {\begin{array}{c}n\\ i\end{array}}\right) m^i} \, e^{-ti/n} \le \varepsilon . \end{aligned}$$

Proof

Using Lemma 4.12, we have \( |\widetilde{\mu }(i) | \le e^{-i/n} \) for all \(i\), and the inequality \(\text {ET}\le \text {ET}'\) follows. The equality in the definition of \(\text {ET}'\) is an immediate consequence of Theorem 4.10. For the inequality \(\text {ET}' \le \varepsilon \), choose \(M\) so that \( \sum \nolimits _{i> M} e^{-ci} / \sqrt{i!} \le \varepsilon . \) Using \(\left( {\begin{array}{c}n\\ i\end{array}}\right) \le n^i / i!\) and \(e^{-t/n} = (mn)^{-1/2} e^{-c}\), we then have

$$\begin{aligned} \text {ET}' \le \sum \nolimits _{i> M} \bigl ( (mn)^{1/2} e^{-t/n} \bigr )^i / \sqrt{i!} = \sum \nolimits _{i> M} e^{-ci} / \sqrt{i!} \le \varepsilon . \end{aligned}$$

\(\square \)
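To illustrate how \(M(c,\varepsilon )\) behaves, the following short Python sketch (ours; the tail criterion is the one used in the proof above, and the values of \(c\) and \(\varepsilon \) are illustrative) computes the smallest admissible \(M\):

```python
# Smallest M with sum_{i > M} e^{-c i} / sqrt(i!) <= eps. The factorial
# makes the tail summable for every real c, so M(c, eps) is finite.
from math import exp, factorial, sqrt

def smallest_M(c, eps, cap=100):                 # tail beyond cap is negligible
    terms = [exp(-c * i) / sqrt(factorial(i)) for i in range(1, cap)]
    return next(M for M in range(cap) if sum(terms[M:]) <= eps)

for c in (-1.0, 0.0, 1.0):
    print(c, smallest_M(c, 0.01))                # M grows as c decreases
```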

From now on, choose \(M:=M(c,\varepsilon )\) as in Lemma 4.13. Hence, for the main term, we need only deal with spherical statistics with \(i \asymp 1\). We would then like to use the replacement \(\widetilde{\mu }(i) = 1 - i/n \approx e^{-i/n}\).

Definition 4.14

(Adjusted Main Term) Recalling that \(t= \tfrac{1}{2} n\log (mn) + cn\), define

Conveniently, the adjusted main term \(\text {MT}'\) in this case (Definition 4.14) is exactly the same as that for the Gibbs sampler (see Definition 2.6) in §2.2; to match notation, replace \(m\) with \(\alpha \).

The following two lemmas are simply restatements of Lemmas 2.7a and 2.7b.

Lemma 4.15

(Main Term: Approximation) For all \(\varepsilon > 0\) and all \(c\in \mathbb {R}\), with \(M:=M(c,\varepsilon )\), we have

$$\begin{aligned} \big | \text {MT}- \text {MT}' \bigr | \le 2 \varepsilon . \end{aligned}$$

Lemma 4.16

(Main Term: Evaluation) For all \(\varepsilon > 0\) and all \(c\in \mathbb {R}\), with \(M:=M(c,\varepsilon )\), we have

$$\begin{aligned} \tfrac{1}{2} \text {MT}' \rightarrow 2 \, \Phi \bigl ( \tfrac{1}{2} e^{-c} \bigr ) - 1. \end{aligned}$$

We now have all the ingredients to establish the limit profile for the Ehrenfest urn model.

Proof of Theorem 4.9

Let us summarise what we have proved. These are all evaluated at the target mixing time \( t= \tfrac{1}{2} n \log (mn) + cn\) with \(M:=M(c,\varepsilon )\) given by Lemma 4.13.

  • By Lemma 4.13, the error term \(\text {ET}\) satisfies \( \text {ET}\le \varepsilon . \)

  • By Lemma 4.15, the original main term \(\text {MT}\) satisfies \( |\text {MT}- \text {MT}' | \le 2 \varepsilon . \)

  • By Lemma 4.16, the adjusted main term \(\text {MT}'\) satisfies \( \tfrac{1}{2} \text {MT}' \rightarrow 2 \, \Phi (\tfrac{1}{2} e^{-c}) - 1 \) .

Since \(\varepsilon > 0\) is arbitrary, applying the TV-approximation lemma for random walks on homogeneous spaces, namely Lemma C, we immediately deduce the theorem. \(\square \)
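Finally, as a numerical illustration of Theorem 4.9 (ours, not part of the proof; \(n = 400\), \(m = 2\) are illustrative choices), one can compare the exact TV distance at \(t = \tfrac{1}{2} n \log (mn) + cn\), computed via the lumped distance chain introduced after the definition of R above, with the limit profile \(2 \, \Phi (\tfrac{1}{2} e^{-c}) - 1\); the two columns should be close for n of this size.

```python
# Exact TV distance of the multi-urn Ehrenfest walk in the cutoff window
# versus the limit profile 2 Phi(e^{-c}/2) - 1 of Theorem 4.9.
import numpy as np
from math import comb, exp, log
from statistics import NormalDist

n, m = 400, 2
Q = np.zeros((n + 1, n + 1))                     # lumped chain on ell
for ell in range(n + 1):
    up = (n - ell) / n * m / (m + 1)
    down = ell / n / (m + 1)
    if ell < n:
        Q[ell, ell + 1] = up
    if ell > 0:
        Q[ell, ell - 1] = down
    Q[ell, ell] = 1 - up - down

pi = np.array([comb(n, l) * m**l / float((m + 1)**n) for l in range(n + 1)])

for c in (-1.0, 0.0, 1.0):
    t = round(n / 2 * log(m * n) + c * n)
    tv = 0.5 * float(np.abs(np.linalg.matrix_power(Q, t)[0] - pi).sum())
    profile = 2 * NormalDist().cdf(exp(-c) / 2) - 1
    print(f"c = {c:+.0f}: TV = {tv:.3f}, profile = {profile:.3f}")
```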