Limit Profiles for Reversible Markov Chains

In a recent breakthrough, Teyssier [Tey20] introduced a new method for approximating the distance from equilibrium of a random walk on a group. He used it to study the limit profile for the random transpositions card shuffle. His techniques were restricted to conjugacy-invariant random walks on groups; we derive similar approximation lemmas for random walks on homogeneous spaces and for general reversible Markov chains. We illustrate applications of these lemmas to some famous problems: the $k$-cycle shuffle, improving results of Hough [Hou16] and Berestycki, Schramm and Zeitouni [BSZ11]; the Ehrenfest urn diffusion with many urns, improving results of Ceccherini-Silberstein, Scarabotti and Tolli [CST07]; a Gibbs sampler, which is a fundamental tool in statistical physics, with Binomial prior and hypergeometric posterior, improving results of Diaconis, Khare and Saloff-Coste [DKS08].

The defining limit is required to exist for all α ∈ R; the limit N → ∞ is taken for each fixed α ∈ R. Strictly speaking, when we look at d_TV(t), we need t ∈ N; in practice, we omit floor/ceiling signs.

TV Convergence Profile for Random Walks
In this paper we present three lemmas for obtaining the TV profile for random walks; see Lemmas A to C. They work by finding a decomposition of the TV distance as a sum using either a spectral decomposition or Fourier analysis. One then separates out the 'important' terms in the sum to give a 'main term' (which asymptotically captures all the TV mass) and an 'error' term. Lemmas A and C are original contributions; Lemma B is due to Teyssier [Tey20]. For each lemma, we give an example application, establishing a limit profile of the TV convergence to equilibrium.
We denote the cdf of the standard normal distribution by Φ throughout the paper.

Reversible Markov Chains
First we consider general reversible Markov chains on a finite set Ω. The following lemma is based on the well-known spectral decomposition for a reversible Markov chain P: for all x, y ∈ Ω and t ∈ N_0,
$$ P^t(x, y) = \pi(y) \Bigl( 1 + \sum_{i=2}^{|\Omega|} \lambda_i^t f_i(x) f_i(y) \Bigr), $$
where P^t(x, y) is the probability of moving from x to y in t steps, π is the invariant distribution and (λ_i, f_i)_{i=2}^{|Ω|} are the eigenstatistics; see [LPW17, Lemma 12.2]. Recall that, for x ∈ Ω and t ∈ N_0, we write d_TV(t, x) for the TV distance from π after t steps when started from x.
We come to our first contribution: the TV-approximation lemma for reversible Markov chains.
Lemma A (Reversible Markov Chains). Consider a reversible, irreducible and aperiodic Markov chain on a finite set Ω with invariant distribution π. Denote by −1 < λ_{|Ω|} ≤ ⋯ ≤ λ_2 < λ_1 = 1 its eigenvalues and by f_{|Ω|}, ..., f_1 the corresponding orthonormal (with respect to π) eigenvectors. For t ∈ N_0 and x ∈ Ω, denote by d_TV(t, x) the TV distance from equilibrium (ie π) of the Markov chain started from x. For all t ∈ N_0, all x ∈ Ω and all I ⊆ {2, ..., |Ω|}, we have
$$ \Bigl| d_{TV}(t, x) - \tfrac12 \sum_{y \in \Omega} \pi(y) \Bigl| \sum_{i \in I} \lambda_i^t f_i(x) f_i(y) \Bigr| \Bigr| \le \tfrac12 \Bigl( \sum_{i \in \{2, ..., |\Omega|\} \setminus I} \lambda_i^{2t} f_i(x)^2 \Bigr)^{1/2}. $$
As an application of Lemma A, we determine the limit profile for a specific two-component Gibbs sampler, an important tool in statistical physics, as explained in [DKS08, §1].
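A minimal numerical sketch of the shape of Lemma A (ours, not from the paper): for a small randomly weighted reversible chain, we compute the π-orthonormal eigendecomposition and check that the TV distance equals the 'main term' over a subset I of eigenvalues up to the stated 'error term'. All sizes and weights here are illustrative.

```python
# Sketch of Lemma A for a toy reversible chain (illustrative parameters).
import numpy as np

rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n)) + np.eye(n)
W = W + W.T                                  # symmetric weights => reversible
P = W / W.sum(axis=1, keepdims=True)         # transition matrix
pi = W.sum(axis=1) / W.sum()                 # invariant distribution

# pi-orthonormal eigenvectors via the symmetrisation D^{1/2} P D^{-1/2}.
D = np.diag(np.sqrt(pi))
Dinv = np.diag(1 / np.sqrt(pi))
lam, U = np.linalg.eigh(D @ P @ Dinv)        # ascending; lam[-1] = 1
f = Dinv @ U                                 # columns: pi-orthonormal eigvecs

t, x = 5, 0
Pt = np.linalg.matrix_power(P, t)
d_tv = 0.5 * np.abs(Pt[x] - pi).sum()

I = [n - 2, n - 3]                           # two largest non-trivial eigvals
rest = [i for i in range(n - 1) if i not in I]
main = 0.5 * sum(pi[y] * abs(sum(lam[i]**t * f[x, i] * f[y, i] for i in I))
                 for y in range(n))
err = 0.5 * np.sqrt(sum(lam[i]**(2 * t) * f[x, i]**2 for i in rest))
assert abs(d_tv - main) <= err + 1e-12       # the inequality of Lemma A
```

The inequality holds exactly here (up to floating point), by the triangle inequality and Cauchy-Schwarz, as in the proof sketched in §2.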
Let (X, F, µ) and (Θ, G, π) be two probability spaces; the probability measure π is called the prior. Let {f_θ(·)}_{θ∈Θ} be a family of probability densities on X with respect to µ. These define a probability measure Pr on X × Θ by
$$ \Pr(dx, d\theta) := f_\theta(x) \, d\mu(x) \, d\pi(\theta). $$
The marginal density on X is given by m(x) := ∫_Θ f_θ(x) dπ(θ) for x ∈ X. The posterior density with respect to the prior π is defined by π(θ | x) := f_θ(x)/m(x) for (x, θ) ∈ X × Θ.
The (X-chain) Gibbs sampler is defined informally as follows (each draw is independent): input x; draw θ ∼ π(· | x); draw x′ ∼ f_θ(·); output x′. Formally, it is the Markov chain defined by the transition kernel P given by
$$ P(x, x') := \int_\Theta f_\theta(x') \, \pi(\theta \mid x) \, d\pi(\theta). $$
Observe that P is reversible with respect to m, ie the marginal density on X.
We consider the special case of location families: f_θ(x) = g(x − θ) for all (x, θ) ∈ X × Θ, for some function g; see [DKS08, §5]. The Gibbs sampler can then be realised in the following way: input x; draw θ ∼ π(· | x); draw ε ∼ g; output x′ := θ + ε. We take the prior π and g each to be Binomial, which leads to a hypergeometric posterior. Our next contribution is the limit profile for the two-component Gibbs sampler with Binomial priors, established as an application of Lemma A. A more refined statement is given in Theorem 2.1.
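The Binomial/hypergeometric structure can be made concrete with a small exact computation (ours; the sizes n1, n2, p are illustrative, not the paper's). With prior θ ∼ Bin(n2, p) and noise g = Bin(n1 − n2, p), the marginal is m = Bin(n1, p), the posterior π(·|x) is hypergeometric, and the resulting Gibbs kernel is reversible with respect to m:

```python
# Exact Gibbs kernel for the Binomial location family (illustrative sizes).
from math import comb

n1, n2, p = 8, 3, 0.4

def C(a, b):                    # binomial coefficient, zero out of range
    return comb(a, b) if 0 <= b <= a else 0

def g(e):                       # noise density: Bin(n1 - n2, p)
    c = C(n1 - n2, e)
    return c * p**e * (1 - p)**(n1 - n2 - e) if c else 0.0

def posterior(j, x):            # hypergeometric posterior pi(theta = j | x)
    return C(n2, j) * C(n1 - n2, x - j) / C(n1, x)

def P(x, xp):                   # Gibbs kernel: draw theta, then x' = theta + eps
    return sum(posterior(j, x) * g(xp - j) for j in range(n2 + 1))

m = [C(n1, x) * p**x * (1 - p)**(n1 - x) for x in range(n1 + 1)]  # Bin(n1, p)

for x in range(n1 + 1):
    assert abs(sum(P(x, xp) for xp in range(n1 + 1)) - 1) < 1e-12
    for xp in range(n1 + 1):    # detailed balance w.r.t. the marginal m
        assert abs(m[x] * P(x, xp) - m[xp] * P(xp, x)) < 1e-14
```

Detailed balance holds exactly since m(x)P(x, x′) = Σ_θ π(θ) g(x − θ) g(x′ − θ) is symmetric in (x, x′).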
The above set-up implicitly takes the sample spaces X := {0, ..., n_1} and Θ := {0, ..., n_2}, with the respective power sets as event spaces. The sample spaces are finite, so this is natural.
Cutoff for the L 2 mixing time of this Gibbs sampler was established by Diaconis, Khare and Saloff-Coste [DKS08, §5.1]; these tools could likely be adapted to give cutoff for the usual TV (L 1 ) mixing time. However, the techniques of Diaconis, Khare and Saloff-Coste [DKS08, §5.1] are not sufficiently refined to give access to the limit profile; a more detailed analysis is required.

Random Walks on Groups
We start by recalling some standard terminology from representation theory.
Definition. Let G be a finite group and V a finite-dimensional vector space over C. A representation ρ of G over V is an action (g, v) ↦ ρ(g) · v : G × V → V such that ρ(g) : V → V is an invertible linear map for each g ∈ G. The Fourier transform of a function µ : G → C with respect to the representation (ρ, V) is the linear operator µ̂(ρ) : V → V defined by µ̂(ρ) := Σ_{g∈G} µ(g) ρ(g).
Using the Fourier inversion formula, for all probability measures µ on G and all t ∈ N_0, we have
$$ \mu^{*t}(g) - \frac{1}{|G|} = \frac{1}{|G|} \sum_{\rho \in R^*} d_\rho \, \mathrm{tr}\bigl( \hat\mu(\rho)^t \rho(g^{-1}) \bigr) \quad \text{for all } g \in G, $$
where µ^{*t} is the t-fold self-convolution of µ, R^* is the set of all non-trivial irreducible representations (abbreviated irreps) of G and d_ρ is the dimension of the irrep ρ; see [CST08, §3.10].
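As a sanity sketch (ours), the inversion formula is easy to verify numerically on an abelian group such as Z_n, where every irrep is a 1-dimensional character χ_j(g) = e^{2πijg/n}; the step distribution below is a toy choice:

```python
# Fourier inversion for a lazy +/-1 walk on the cyclic group Z_n.
import cmath

n = 12
mu = [0.0] * n
mu[0], mu[1], mu[n - 1] = 0.5, 0.25, 0.25

def convolve(a, b):
    return [sum(a[g] * b[(h - g) % n] for g in range(n)) for h in range(n)]

t = 7
direct = mu
for _ in range(t - 1):          # mu^{*t} by direct convolution
    direct = convolve(direct, mu)

hat = [sum(mu[g] * cmath.exp(2j * cmath.pi * j * g / n) for g in range(n))
       for j in range(n)]       # Fourier transform at each character
inverted = [sum(hat[j]**t * cmath.exp(-2j * cmath.pi * j * g / n)
                for j in range(n)).real / n for g in range(n)]

assert max(abs(direct[g] - inverted[g]) for g in range(n)) < 1e-12
d_tv = 0.5 * sum(abs(direct[g] - 1 / n) for g in range(n))
assert 0 <= d_tv <= 1
```

Here \hat{µ}(j)^t is the Fourier transform of µ^{*t}, so inverting it recovers the t-step distribution, from which the TV distance to uniform is read off directly.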
If µ is the step distribution of a random walk on G, then this formula determines the TV distance after t steps exactly; cf the well-known spectral representation for reversible random walks. One must still control the Fourier transform at arbitrary irreps. There are two important special cases.
· Suppose that µ is conjugacy-invariant, ie µ(g) = µ(h −1 gh) for all g, h ∈ G. By Schur's lemma, µ(ρ) is a multiple of the identity for each irrep ρ. Then the key object in calculating the Fourier transform is the character : χ g (ρ) := tr(ρ(g)) for g ∈ G and ρ ∈ R * . This is the case considered originally in [DS81], and then in [Tey20], for random transpositions.
· Suppose that the matrices µ(ρ) have only one non-trivial entry which is in the first position (in an appropriate 'spherical' basis). This radical but frequent simplification occurs in the framework of Gelfand pairs; see §4 for details. Diaconis and Shahshahani [DS87] consider this in the set-up of the Bernoulli-Laplace urn model, and more generally.

Conjugacy-Invariant Random Walks
In this subsection we state Teyssier's lemma for conjugacy-invariant random walks.
Definition B. A random walk on G is conjugacy-invariant if there is a probability measure µ which is constant on each conjugacy class of G and for which the transition matrix P satisfies P(x, xg) = µ(g) for all x, g ∈ G. For a representation ρ, define the character ratio s_ρ := d_ρ^{−1} Σ_{g∈G} µ(g) χ_g(ρ).
Teyssier's lemma for conjugacy-invariant random walks states the following.
Lemma B (Teyssier [Tey20, Lemma 2.1]). Let G be a finite group and let µ : G → [0, 1] be a conjugacy-invariant probability distribution on G. For t ∈ N_0, denote by d_TV(t) the TV distance to equilibrium of the random walk on G started from the identity with step distribution µ and run for t steps. Let t ∈ N_0 and I ⊆ R^*, the set of non-trivial irreps of G. Then d_TV(t) = d_TV(µ^{*t}, Unif_G) and
$$ \Bigl| d_{TV}(t) - \frac{1}{2|G|} \sum_{g \in G} \Bigl| \sum_{\rho \in I} d_\rho \, s_\rho^t \, \chi_g(\rho) \Bigr| \Bigr| \le \frac12 \Bigl( \sum_{\rho \in R^* \setminus I} d_\rho^2 \, s_\rho^{2t} \Bigr)^{1/2}. $$

We apply this lemma to the k-cycle random walk on the symmetric group S_n. In this walk, at each step a k-cycle is chosen uniformly at random and composed with the current location. We establish the limit profile for 2 ≤ k ≪ n. There are parity constraints; to handle them, we follow the set-up used by Hough:
· if k is odd, then the walk is supported on the set of even permutations;
· if k is even and t is even, then the walk at time t is supported on the set of even permutations;
· if k is even and t is odd, then the walk at time t is supported on the set of odd permutations.
We come to our next contribution: the limit profile for the random k-cycle shuffle, established as an application of Lemma B. A more refined statement is given in Theorem 3.1.
Theorem B (Random k-Cycles). Let k, n ∈ N with k ≥ 2. For t ∈ N_0, denote by d^{n,k}_TV(t) the TV distance of the k-cycle random walk on S_n, started from the identity and run for t steps, from the uniform distribution on the appropriate set of permutations of a fixed parity.
Suppose that 2 ≤ k ≪ n. Then, for all c ∈ R (independent of n), we have
$$ d^{n,k}_{TV}\bigl( \tfrac1k n (\log n + c) \bigr) \to d_{TV}\bigl( \mathrm{Pois}(1 + e^{-c}), \, \mathrm{Pois}(1) \bigr) \quad \text{as } n \to \infty. $$
Cutoff for this shuffle was already established, but not the limit profile. The case of random transpositions, ie k = 2, was one of the first Markov chains studied using representation theory; cutoff was established by Diaconis and Shahshahani [DS81]. For general 2 ≤ k ≪ n, cutoff was established by Hough [Hou16] using representation theory. Berestycki, Schramm and Zeitouni [BSZ11] previously established the same result for k independent of n, using probabilistic arguments instead of representation theory. Berestycki and Şengül [BŞ19] studied a generalisation in which one draws uniformly from a prescribed conjugacy class of support k with 2 ≤ k ≪ n.
The limit profile, even for k = 2, remained a famous open problem for a long time. A breakthrough came recently by Teyssier [Tey20], using Lemma B above; we apply this lemma here. Also, we adapt and extend some character theory for the k-cycle walk developed by Hough [Hou16]. Finally, we adapt and extend some of the analysis of Teyssier [Tey20] from k = 2 to general k.

Random Walks on Homogeneous Spaces
Finally we turn our attention to random walks on homogeneous spaces X := G/K, where G is a finite group and K a subgroup of G. Where [DS81, Tey20] considered conjugacy-invariant µ to simplify the calculation of the Fourier transforms, here we consider the case in which µ is K bi-invariant, ie µ(k_1 g k_2) = µ(g) for all g ∈ G and all k_1, k_2 ∈ K, and (G, K) is a Gelfand pair, ie the algebra of K bi-invariant functions (under convolution) is commutative; see Definition 4.1. In this case, for any K bi-invariant function µ on G, if (ρ, V) is a spherical irrep, defined in Definition 4.2, then the matrix µ̂(ρ) has only one non-zero entry, which is in the top-left position; this entry is called the spherical Fourier transform of µ with respect to ρ (rescaled by |K|). Moreover, if (τ, W) is a non-spherical irrep, then µ̂(τ) = 0 is the zero matrix.
Using this simplification, we prove the following lemma for random walks on homogeneous spaces corresponding to a Gelfand pair, started from some element x̄ ∈ X stabilised by K, ie kx̄ = x̄ for all k ∈ K (under the usual left coset action). The canonical quotient projection G → G/K preserves the uniform distribution, so the invariant distribution of any such random walk on a homogeneous space is uniform on that space.
Our next contribution is a TV-approximation lemma for random walks on homogeneous spaces.
Lemma C (Homogeneous Spaces). Let (G, K) be a Gelfand pair and denote X := G/K. Let x̄ be an element of X whose stabiliser is K. Let {ϕ_i}_{i=0}^N be the associated spherical functions, with ϕ_0(x) = 1 for all x ∈ X, considered as K-invariant functions on X, and {d_i}_{i=0}^N the associated dimensions. Let P be a G-invariant stochastic matrix and set µ_x̄(·) := P(x̄, ·). For t ∈ N_0, denote by d_TV(t, x̄) the TV distance to equilibrium of the random walk on X started from x̄ with step distribution µ_x̄ and run for t steps. For all t ∈ N_0 and all I ⊆ {1, ..., N}, we have
$$ \Bigl| d_{TV}(t, \bar x) - \frac{1}{2|X|} \sum_{x \in X} \Bigl| \sum_{i \in I} d_i \, \hat\mu_{\bar x}(i)^t \, \phi_i(x) \Bigr| \Bigr| \le \frac12 \Bigl( \sum_{i \in \{1, ..., N\} \setminus I} d_i \, \hat\mu_{\bar x}(i)^{2t} \Bigr)^{1/2}, $$
where µ̂_x̄ : i ↦ Σ_{x∈X} µ_x̄(x) ϕ_i(x) is the spherical Fourier transform of µ_x̄ with respect to {ϕ_i}_{i=0}^N.
We come to our final contribution: the limit profile for the Ehrenfest urn diffusion with multiple urns, established as an application of Lemma C. A more refined statement is given in Theorem 4.9.
Theorem C (Ehrenfest Urn). Let n, m ∈ N. Consider n labelled balls and m + 1 labelled urns. Consider the following Markov chain: at each step, choose a ball and an urn uniformly and independently; place said ball in said urn. For t ∈ N_0, denote by d^{n,m}_TV(t) the TV distance of this urn model, started with all balls in a single urn and run for t steps, from its invariant distribution.
Cutoff, but not the limit profile, was established for this multiple urn model by Ceccherini-Silberstein, Scarabotti and Tolli [CST07, §6] using representation theory. To establish the profile, we apply the approximation lemma for random walks on homogeneous spaces, ie Lemma C, using the character theory developed by Ceccherini-Silberstein, Scarabotti and Tolli [CST07].
This model was originally introduced (with two urns) by Ehrenfest and Ehrenfest [EE07] in 1907. In that case, the model can be viewed as a TV-preserving projection of the simple random walk on the n-hypercube, for which cutoff was established by Aldous [Ald83, Example 3.19]. The limit profile is even known: see Salez [Sal18, Theorem 18 in §6.2] (in French) for a 'probabilistic' argument using convergence theorems, or Diaconis, Graham and Morrison [DGM90, Theorem 1] for a Fourier-analytic argument. We present a significantly simpler Fourier-analytic argument, using only basic representation theory of the Abelian group Z_2^d, in Theorem 5.1.
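The multi-urn dynamics are simple enough to check exactly on a tiny instance (ours; the sizes n, m are illustrative): states are assignments in {0, ..., m}^n, a step picks a ball and an urn uniformly and places the ball in the urn, and the invariant distribution is uniform.

```python
# Exact TV-to-equilibrium for a tiny multi-urn Ehrenfest model.
from itertools import product

n, m = 4, 2                      # 4 balls, 3 urns (illustrative)
states = list(product(range(m + 1), repeat=n))
idx = {s: i for i, s in enumerate(states)}
N = len(states)

P = [[0.0] * N for _ in range(N)]
for s in states:
    for ball in range(n):
        for urn in range(m + 1):
            s2 = s[:ball] + (urn,) + s[ball + 1:]
            P[idx[s]][idx[s2]] += 1 / (n * (m + 1))

dist = [0.0] * N
dist[idx[(0,) * n]] = 1.0        # all balls start in urn 0

tvs = []
for _ in range(15):
    dist = [sum(dist[i] * P[i][j] for i in range(N)) for j in range(N)]
    tvs.append(0.5 * sum(abs(q - 1 / N) for q in dist))

assert all(a >= b - 1e-12 for a, b in zip(tvs, tvs[1:]))   # TV is monotone
assert tvs[-1] < 0.1             # close to uniform once every ball is touched
```

A coupling that reuses the same (ball, urn) choices shows d_TV(t) ≤ P(some ball is never chosen by time t), which is already small after 15 steps here.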

Corollaries to TV Approximation Lemma for Reversible Markov Chains (Lemma A)
We close this section with two simple corollaries of the general TV-approximation lemma for reversible Markov chains, Lemma A. The first is for transitive Markov chains; the second is for typical TV distance. For transitive chains, the starting point is irrelevant; that is, for each t, the map x ↦ d_TV(t, x) is constant (ie does not depend on x). In particular, d_TV(·) = Σ_{x∈Ω} π(x) d_TV(·, x). Also, by transitivity, the invariant distribution π is uniform on Ω.
Corollary A.1. Consider the set-up of Lemma A; in addition, assume that the chain is transitive.
For all t ∈ N_0 and I ⊆ {2, ..., |Ω|}, we have
$$ \Bigl| d_{TV}(t) - \frac{1}{2|\Omega|} \sum_{y \in \Omega} \Bigl| \sum_{i \in I} \lambda_i^t f_i(x) f_i(y) \Bigr| \Bigr| \le \frac12 \Bigl( \sum_{i \in \{2, ..., |\Omega|\} \setminus I} \lambda_i^{2t} \Bigr)^{1/2} \quad \text{for any } x \in \Omega. $$
Instead of looking at TV distance from a given starting point, we can also average over the starting point (with respect to the invariant distribution). This is sometimes known as typical TV distance (as opposed to worst-case). For t ∈ N_0, denote
$$ \bar d_{TV}(t) := \sum_{x \in \Omega} \pi(x) \, d_{TV}(t, x). $$

Organisation of the Paper
The remainder of the paper is organised as follows.
§2 Here we study general reversible Markov chains. We prove our TV-approximation lemma (Lemma A) via an application of the spectral decomposition for reversible Markov chains. As an application of Lemma A, we establish the limit profile for a two-component Gibbs sampler, a fundamental tool in statistical physics (see [DKS08, §1]).
§3 Here we establish the limit profile of the random k-cycle walk on the symmetric group. We do this via an application of the TV-approximation lemma of Teyssier [Tey20] (Lemma B), along with extending and applying character theory developed by Hough [Hou16].
§4 Here we study random walks on homogeneous spaces corresponding to Gelfand pairs. We develop and apply (mostly classical) theory to prove our TV-approximation lemma (Lemma C). As an application of Lemma C, we establish the limit profile for the famous Ehrenfest urn diffusion with many urns, using character theory developed by Ceccherini-Silberstein, Scarabotti and Tolli [CST07].

Reversible Markov Chains
In this section general reversible Markov chains are considered. First we prove the lemma and corollaries from the introduction, then we apply them to a Gibbs sampler.

Proof of TV-Approximation Lemmas for Reversible Markov Chains
Lemma A follows from the usual spectral representation of TV distance along with some algebraic manipulations and inequalities. Corollaries A.1 and A.2 follow, in an identical way to each other, from averaging both sides of Lemma A with respect to π. We give the full details now.
Proof of Lemma A. As an immediate consequence of [LPW17, Lemma 12.2], for x ∈ Ω we have
$$ d_{TV}(t, x) = \tfrac12 \sum_{y \in \Omega} \pi(y) \Bigl| \sum_{i=2}^{|\Omega|} \lambda_i^t f_i(x) f_i(y) \Bigr|. $$
Let I ⊆ {2, ..., |Ω|}. Elementary manipulations using the triangle inequality (twice), then Cauchy-Schwarz and the fact that the eigenfunctions are orthonormal with respect to π, give
$$ \Bigl| d_{TV}(t, x) - \tfrac12 \sum_{y \in \Omega} \pi(y) \Bigl| \sum_{i \in I} \lambda_i^t f_i(x) f_i(y) \Bigr| \Bigr| \le \tfrac12 \sum_{y \in \Omega} \pi(y) \Bigl| \sum_{i \notin I} \lambda_i^t f_i(x) f_i(y) \Bigr| \le \tfrac12 \Bigl( \sum_{i \notin I} \lambda_i^{2t} f_i(x)^2 \Bigr)^{1/2}. $$

Proof of Corollaries A.1 and A.2. For a transitive chain, for each t, the map x ↦ d_TV(t, x) is constant. The corollaries now follow by averaging the error term with respect to π, using Cauchy-Schwarz and the normalisation of the eigenfunctions.
The following theorem is a restatement of Theorem A, but written more formally: cutoff is for a sequence of Markov chains; we make this sequence explicit.
As in previous sections, for ease of presentation we omit the N -subscripts in the proof. The technical calculations in this section are analogous to those in §2.2; the eigenfunctions are the same (after a reparametrisation) but the eigenvalues are slightly different.
It is straightforward to check that the invariant distribution m of the X-chain is Binomial:
$$ m(x) = \binom{n}{x} \frac{\alpha^x}{(1 + \alpha)^n} \quad \text{for } x \in \{0, ..., n\}, \quad \text{ie } m = \mathrm{Bin}\bigl(n, \tfrac{\alpha}{1+\alpha}\bigr). $$
The eigenfunctions are then the family of polynomials orthogonal with respect to this Binomial measure. These are the Krawtchouk polynomials (appropriately rescaled), defined precisely now.
When the final two parameters are fixed, we abbreviate K_i(x) := K_i(x; n, p). The Krawtchouk polynomials are orthogonal with respect to the Binomial measure:
$$ \sum_{x=0}^{n} \binom{n}{x} p^x (1 - p)^{n - x} K_i(x) K_j(x) = 0 \quad \text{for } i \ne j. $$
The following proposition describes the eigenstatistics of this model; it is taken from Diaconis, Khare and Saloff-Coste [DKS08, §5.1].
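This orthogonality is easy to verify exactly with rational arithmetic, using the standard hypergeometric representation K_k(x) = 2F1(−k, −x; −N; 1/p) (a sketch of ours; the parameters N and p are illustrative, and this normalisation may differ from the paper's rescaling by constants):

```python
# Exact orthogonality check for Krawtchouk polynomials w.r.t. Bin(N, p).
from fractions import Fraction
from math import comb, factorial

N, p = 7, Fraction(2, 5)
q = 1 - p

def poch(a, j):                 # rising factorial a(a+1)...(a+j-1)
    out = Fraction(1)
    for i in range(j):
        out *= a + i
    return out

def K(k, x):                    # terminating 2F1(-k, -x; -N; 1/p)
    return sum(poch(-k, j) * poch(-x, j) / (poch(-N, j) * factorial(j))
               * (1 / p)**j for j in range(k + 1))

def inner(a, b):                # inner product against Bin(N, p) weights
    return sum(comb(N, x) * p**x * q**(N - x) * K(a, x) * K(b, x)
               for x in range(N + 1))

assert all(inner(a, b) == 0 for a in range(4) for b in range(4) if a != b)
assert all(inner(a, a) != 0 for a in range(4))
```

Working over `Fraction` makes the orthogonality an exact identity rather than a floating-point approximation.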
Applying Proposition 2.4, we obtain the following expressions for the terms in Lemma A: Our first aim is to use this to determine which are the 'important' eigenstatistics.
Lemma 2.5 (Error Term). For all ε > 0 and all c ∈ R, there exists an M := M(ε, c) so that, for all n sufficiently large, the error term ET is at most ε at the target mixing time. From now on, choose M := M(ε, c) as in Lemma 2.5. Hence, for the main term, we need only deal with eigenstatistics with i ≍ 1. We would then like to use the replacement λ_i ≈ (1 − n_2/n)^i.
Lemma 2.7a (Main Term: Approximation). For all ε > 0 and c ∈ R, with M := M(ε, c), we have |MT − MT′| ≤ 2ε for all n sufficiently large. It thus suffices to work with MT′, which has a significantly simpler form. This is the main power of the technique: it allows us to replace the complicated λ_i^t by the simpler λ_1^{it}; the latter power is much easier to handle, particularly when melded with Binomial coefficients.
For the first sum, which we denote S_1, we use the relation max_{i∈[M]} |λ_i^t/λ_1^{it} − 1| = o(1), which is easy to derive. Using Cauchy-Schwarz and the unit-normalisation of the eigenfunctions, as well as the relations λ_1^{it} = e^{−ci}(αn)^{−i/2} and f_i(0) = α^{i/2} \binom{n}{i}^{1/2} ≤ (αn)^{i/2}/\sqrt{i!}, we see that S_1 = o(1). For the second sum, which we denote S_2, using Cauchy-Schwarz and the unit-normalisation of the eigenfunctions again, and then the error term bound of Lemma 2.5, we see that S_2 ≤ ε + o(1). In conclusion, |MT − MT′| ≤ ε + o(1) ≤ 2ε (asymptotically), as desired.
Proof of Lemma 2.7b. Evaluating this requires some algebraic manipulation, then approximation. For convenience, we drop some of the min/max from the limits in the sum in ϕ_i; define \binom{N}{r} := 0 whenever it is not the case that 0 ≤ r ≤ N. We then take absolute values and average with respect to the weights \binom{n}{ℓ} α^ℓ/(α + 1)^n. Setting p_x := \frac{α}{α+1}(1 − x/\sqrt{αn}) for x ∈ R, the resulting expression is a Bin(n, p_{e^{−c}})-type probability. It remains to compare these Binomials; we do precisely this via the local CLT in Lemma 5.2. We now have all the ingredients to establish the limit profile for this Gibbs sampler.
Proof of Theorem 2.1. Let us summarise what we have proved. The following are all evaluated at the target mixing time t = ½ log(αn) + cn, with M := M(ε, c) given by Lemma 2.5.
· By Lemma 2.5, we have ET ≤ ε.
· By Lemma 2.7a, we have |MT − MT′| ≤ 2ε for n sufficiently large.
· By Lemma 2.7b, we have ½ MT′ → 2Φ(½ e^{−c}) − 1 as n → ∞.
Since ε > 0 is arbitrary, applying the TV-approximation lemma for reversible Markov chains, namely Lemma A, we immediately deduce the theorem.
Random k-Cycle Walk on the Symmetric Group

Walk Definition and Statement of Result
We analyse the limit profile of the random k-cycle walk on the symmetric group S_n. This random walk starts (without loss of generality) from the identity permutation, and a step involves composing the current location with a uniformly chosen k-cycle. This extends the random transpositions walk studied by Teyssier [Tey20]. We use representation theory for k-cycles, studied recently by Hough [Hou16], who established cutoff for any 2 ≤ k ≪ n and found the order of the window when additionally k ≪ n/log n. We determine the limit profile for any 2 ≤ k ≪ n.
For S n , the irreducible representations are indexed by partitions of n. As is common for card shuffles, the main contribution comes from those partitions with long first row; it is these we use as our set I. We sharpen some of Hough's results slightly to determine the limit profile.
Theorem 3.1 (Random k-Cycle Walk). Let n, k ∈ N. Consider the random k-cycle walk on S_n: start at id ∈ S_n; at each step, choose a k-cycle τ uniformly at random; move by right-multiplication. For t ∈ N_0, write d^{n,k}_TV(t) for the TV distance of the random k-cycle walk on S_n from the uniform distribution on the appropriate set of permutations of a fixed parity, ie the odd ones if k is even and t is odd, and the even ones otherwise. Let c ∈ R and t := \tfrac1k n(\log n + c). Then
$$ d^{n,k}_{TV}(t) \to d_{TV}\bigl( \mathrm{Pois}(1 + e^{-c}), \, \mathrm{Pois}(1) \bigr) \quad \text{as } n \to \infty. $$

Throughout the proof, for notational ease, we drop the subscripts, writing simply k and n, and assume that 2 ≤ k ≪ n. Write A_{n;k,t} for the set of odd permutations in S_n if k is even and t is odd, and the even permutations otherwise. Then the k-cycle walk at time t is supported on A_{n;k,t}.
It is well-known that the irreducible representations of S_n are parametrised by partitions of n; see [Dia88]. We need to find a collection of irreducible representations which asymptotically contains all the total variation mass. As is often the case with card shuffle-type walks, it is the partitions with long first row which we use. More precisely, for a partition λ of n, write λ = (λ_1, ..., λ_n) with λ_1 ≥ ⋯ ≥ λ_n; let M ∈ N, and set P_n(M) := {λ a partition of n | n − M < λ_1 < n}.
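The standard facts used here are easy to check computationally (an illustrative sketch of ours): dimensions of S_n-irreps via the hook-length formula, the identity Σ_λ d_λ² = n!, and the set P_n(M) of partitions with long first row.

```python
# Hook-length dimensions of S_n-irreps and the long-first-row partitions.
from math import factorial

def partitions(n, largest=None):
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for head in range(min(n, largest), 0, -1):
        for tail in partitions(n - head, head):
            yield (head,) + tail

def dim(shape):                 # hook-length formula: n! / prod(hooks)
    n = sum(shape)
    prod = 1
    for i, row in enumerate(shape):
        for j in range(row):
            arm = row - j - 1
            leg = sum(1 for r in shape[i + 1:] if r > j)
            prod *= arm + leg + 1
    return factorial(n) // prod

n = 6
parts = list(partitions(n))
assert sum(dim(l)**2 for l in parts) == factorial(n)   # Burnside identity
assert dim((n - 1, 1)) == n - 1                        # standard rep
M = 3
P_n_M = [l for l in parts if n - M < l[0] < n]         # long first row
assert all(l[0] >= n - M + 1 for l in P_n_M)
```

For fixed M and large n, P_n(M) contains O(1) partitions, which is what makes the main term tractable.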
The trivial representation, denoted triv_n, corresponds to the partition with only one block, ie (triv_n)_1 = n and (triv_n)_i = 0 for i ≥ 2. Write P*_n(M) := P_n(M) ∪ {triv_n}. We now phrase Teyssier's lemma, ie Lemma B, in this set-up.
Lemma 3.2. For all t ∈ N_0 and all M ≥ 1, we have
$$ \Bigl| d_{TV}(t) - \frac{1}{n!} \sum_{\sigma \in A_{n;k,t}} \Bigl| \sum_{\lambda \in P_n(M)} d_\lambda \, s_\lambda(k)^t \, \chi_\sigma(\lambda) \Bigr| \Bigr| \le \frac12 \Bigl( \sum_{\lambda \notin P^*_n(M)} d_\lambda^2 \, s_\lambda(k)^{2t} \Bigr)^{1/2}. \qquad (3.2) $$
Given k and t, the random walk is supported on the set of permutations with a fixed sign; half the permutations are odd and half are even. Hence the factor ½|A_{n;k,t}|^{−1} = 1/n! in the lemma above. (We emphasise the dependence on k in the character ratio s_λ(k).)

Outline of Proof of Theorem 3.1. We show in Lemma 3.7 that, for all ε > 0 and all c ∈ R, there exists a constant M := M(ε, c) so that the right-hand side of (3.2) is at most ε when t = \tfrac1k n(\log n + c). Thus, for the main term, we are interested in partitions λ with n − λ_1 ≍ 1. Further, it is well-known that d_λ ≤ \binom{n}{\lambda_1} d_{λ*}, where λ* := λ \ λ_1 is the partition λ with the largest element removed. In fact, d_λ = \binom{n}{r} d_{λ*}(1 + o(1)) when r := n − λ_1 ≍ 1; see [Tey20, Proposition 3.2]. Hough [Hou16, Theorem 5] states a rather general result on the character ratios s_λ(k). Manipulating this general formula in the special case of λ ∈ P_n(M), ie r = n − λ_1 ≍ 1, we show in Corollary 3.4 that s_λ(k) = e^{−kr/n}(1 + o(1)). While the precise form of s_λ(k) is complicated, roughly, this allows us to replace s_λ(k) with (e^{−k/n})^r, which decays exponentially in r ∈ [1, M].
Altogether, replacing s_λ(k)^t by e^{−krt/n} converts an unmanageable main-term sum into what is, in essence, a generating function. We then adapt results of Teyssier [Tey20, §4] to control this generating function.
As stated above, to prove this theorem we use representation theory results on the k-cycle walk from [Hou16]. We state these precisely in the next section; we have to sharpen some results slightly. Throughout this section, λ will always be a partition of n, written λ ⊢ n.
Following [Hou16], we use the Frobenius notation for a partition, defined in terms of the transpose λ′ of the partition λ; we write r := n − λ_1 throughout. We use the following notation for the descending factorial: for z ∈ R and k ∈ N, write (z)_k := z(z − 1) ⋯ (z − k + 1).
Without further ado, we quote the required results from Hough [Hou16] in the next subsection.

Statements of Character Ratio Bounds
In this subsection, we state a result from [Hou16], and deduce some corollaries of these statements. We do not give any proofs at this stage; these are deferred to §3.4.
The first result which we quote determines, asymptotically, the character ratio for partitions with long first row, which, we recall, are the partitions of particular interest to us, namely those in P_n(M). The next two results consider shorter first rows: the first is for k ≥ 6 log n and the second for k ≤ 6 log n. These statements are not exactly the same as in [Hou16], but slight strengthenings; their proofs are given in §3.4. From these statements, along with standard bounds on d_ρ, the dimension of an irreducible representation ρ, we are able to control the two terms, which we call the main and error terms, in Lemma B. Our first port of call is to find a suitable M to bound the error term. Once we have determined this, for the main term we need only consider partitions λ with λ_1 ≥ n − M. We take M to be of order 1 (but arbitrarily large), so λ_1 ≥ n − M falls into the 'long first row' case.
Lemma 3.7 (Error Term). Let c ∈ R and t := \tfrac1k n(\log n + c). For M ∈ N, let ET_M denote the error term, ie the right-hand side in Lemma 3.2. Then lim sup_{n→∞} ET_M → 0 as M → ∞.
This controls the error term. We now consider the main term in Lemma 3.2.

Proof of Theorem 3.1 given Lemmas 3.7 and 3.8. Lemma 3.2 formulates Teyssier's lemma, ie Lemma B, in the set-up of the random k-cycle walk. Lemmas 3.7 and 3.8 control the error and main terms, respectively. Combining these three ingredients establishes Theorem 3.1.
It remains to control the error and main terms, ie prove Lemmas 3.7 and 3.8 respectively.

Controlling the Main and Error Terms
We control the main term in §3.3.1 and the error term in §3.3.2.

Controlling the Main Term
We analyse the main term, ie Lemma 3.8, first. The analysis follows similarly to the case of random transpositions (ie k = 2) considered by Teyssier [Tey20]. We need only consider partitions λ with long first row, namely λ 1 = n − r with 1 ≤ r ≤ M , where M is some (arbitrarily large) constant. These are precisely the partitions considered in the results quoted from Hough [Hou16].
Teyssier [Tey20, §4.1 and §4.2] then proves some technical lemmas to get the main term into the desired form; we summarise these now. Note that he considers time ½ n log n + cn, while we consider \tfrac1k n(\log n + c); hence our two c-s differ by a factor 2 when k = 2. Before digging into the details of his lemmas, we give the high-level reason why his proof passes over to our case. When considering the main term, one need only study those partitions with long first row, ie λ with r := n − λ_1 ≍ 1. For such λ, one compares s_λ(2) with s_λ(k); Teyssier needs s_λ(2)^t = n^{−r} e^{−c}. This goes some way to justifying why we expect t = \tfrac1k n(\log n + c) to be the mixing time, and why the cutoff window should scale down linearly with k.
We now proceed more formally. For each r ≥ 1, define the polynomials T_r as in [Tey20, §4]. For a partition λ, write λ* := λ \ λ_1 for λ with the first row removed. For a permutation σ ∈ S_n, write Fix_σ for the number of fixed points of σ.
The proof of this lemma is combinatorial and relies strongly on the Murnaghan-Nakayama rule. Observe that Lemma 3.9 is a statement purely about the representation theory of the symmetric group; it has nothing to do with the random walk.
Using this result, one can obtain the following approximation. To prove it, one separates A_{n;k,t} into the set of permutations with a cycle of length greater than M and those with all cycles of length at most M. Also, it is not difficult to check, using the hook-length formula (see, eg, [Tey20, Propositions 3.1 and 3.2]) and Corollary 3.4, that the required dimension and character-ratio estimates hold. These results can be combined to prove Lemma 3.10 exactly as for [Tey20, Lemma 4.2]. Given these, one can then neglect polynomials of high degree, in the following sense. Finally, we evaluate this function at Fix_σ with σ ∼ Unif(A_{n;k,t}) and take the expectation. The idea behind this lemma is simple: it is well-known that if σ ∼ Unif(S_n) then Fix_σ →_d Pois(1); we show that the same is true when σ is restricted to a prescribed parity.
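The Poisson limit under a parity restriction is easy to probe empirically (a quick enumeration of ours for one small n, which of course only illustrates the asymptotic statement):

```python
# Fixed-point distribution over the alternating group A_n vs Pois(1).
from itertools import permutations
from math import exp, factorial

n = 7
counts = {}
total = 0
for sigma in permutations(range(n)):
    # parity via the inversion count; keep even permutations only
    inv = sum(1 for i in range(n) for j in range(i + 1, n)
              if sigma[i] > sigma[j])
    if inv % 2 == 0:
        fix = sum(1 for i in range(n) if sigma[i] == i)
        counts[fix] = counts.get(fix, 0) + 1
        total += 1

assert total == factorial(n) // 2            # |A_n| = n!/2
for r in range(4):
    empirical = counts.get(r, 0) / total
    assert abs(empirical - exp(-1) / factorial(r)) < 0.05
```

Already at n = 7, the fixed-point distribution over A_n matches the Pois(1) masses e^{−1}/r! to within a few percent.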
Proof of Lemma 3.13. We claim that precisely half the permutations with a given number of fixed points are even (and hence half are odd): if A_n is the alternating group of even permutations, then |{σ ∈ A_n ⊆ S_n | Fix_σ = r}| = ½ |{σ ∈ S_n | Fix_σ = r}| for all r ≥ 0.
Given this claim, the lemma follows easily, as in [Tey20, Lemma 4.6]. We now justify our claim. First, we find the number of permutations in S_n (of either parity) with exactly r fixed points, which we denote f_{n,r}. Note that f_{n,r} = \binom{n}{r} f_{n−r,0}: first select the r points to be fixed, for which there are \binom{n}{r} choices, then derange the remaining n − r points, for which there are f_{n−r,0} choices. We now turn to even permutations, ie A_n, and apply an analogous method. Denote by f′_{n,r} the number of permutations in A_n with exactly r fixed points. Since appending fixed points to a permutation does not change its parity, again f′_{n,r} = \binom{n}{r} f′_{n−r,0}. For m ∈ N and i ∈ [m], define A_{m,i} := {σ ∈ A_m | σ(i) = i}. Analogously to before, since appending fixed points does not affect the parity, |∩_{i∈I} A_{m,i}| is a factor ½ different from |∩_{i∈I} S_{m,i}| from before. Using the inclusion-exclusion principle thus gives, as before, f′_{n,r} = ½ f_{n,r}. Since half the permutations with a given number of fixed points are even, the other half must be odd. Observe that Lemmas 3.11 and 3.13 and Proposition 3.12 are statements purely about the representation theory of the symmetric group; they have nothing to do with the random walk.
Using standard applications of the triangle inequality, these lemmas can then be combined to deduce that the main term converges to the TV-distance in question; see [Tey20,§4.4].
Since ε > 0 was arbitrary, the proof is now completed by the triangle inequality.

Controlling the Error Term
Finally, we control the error term, ie Lemma 3.7. As before, we consider only λ with λ 1 ≥ λ ′ 1 . Consider first the dimensions of the irreducible representations, ie d λ .
The second claim is a special case: n^r/r^{r/2} ≤ n^r/(\tfrac13 n)^{r/2} = 3^{r/2} n^{r/2} ≤ 2^r n^{r/2} when r ≥ \tfrac13 n. We split the summation Σ_{r=M}^{∞} in the error term ET_M into two parts, r ≤ 0.495n and r ≥ 0.495n; the latter sum is separated according to whether or not k ≤ 6 log n.

Proofs of Character Ratio Bounds
In this section we give the deferred proofs from §3.2.
This is the main contribution to s_λ(k); it remains to control the error in Theorem 3.3. If k ≫ 1, then we necessarily have r < k, and so the error term is 0; if k ≍ 1, then the error term is O(n^{−k}) = O(n^{−2}), as k ≥ 2. But (1 − r/n)^k ≍ 1, since k ≤ \tfrac13 n and r ≍ 1, and so this additive O(1/n²) error is absorbed into the larger O(k/n²) error.
Note that 1 + O(log n/n^{1/4}) ≤ exp(O(n^{−1/5})) and O(e^{−k}(log n)^4/n^{1/4}) = o(e^{−k}). Since r ≍ n, and hence kr/n ≍ k ≥ 2, these error terms can be absorbed by exp(−\tfrac1{200} kr/n); Lemma 3.6 then follows. It remains to prove our claim, which is a slight sharpening of [Hou16, Lemma 15]. Inspecting the proof of [Hou16, Lemma 15], it suffices, writing δ := r/n ∈ [\tfrac13, 1], to prove that 1 − δ ≤ e^{−cδk/(k−1)}. The worst case is clearly k = 2, in which case k/(k − 1) = 2; thus we need 1 − δ ≤ e^{−2cδ}. If one had to allow δ all the way down to 0, then one would need c ≤ ½; however, we only need δ ∈ [\tfrac13, 1], where a slightly larger c suffices. In particular, we may take c := \tfrac12 + \tfrac1{10} + \tfrac1{200} = 0.605.
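The final inequality is elementary to confirm numerically (a one-line check of ours): 1 − δ ≤ e^{−2cδ} holds on all of [1/3, 1] for c = 0.605, while a noticeably larger c would already fail at δ = 1/3.

```python
# Check 1 - delta <= exp(-2*c*delta) on [1/3, 1] with c = 0.605.
from math import exp

c = 0.605
deltas = [1/3 + i * (2/3) / 10000 for i in range(10001)]
assert all(1 - d <= exp(-2 * c * d) for d in deltas)
assert any(1 - d > exp(-2 * 0.75 * d) for d in deltas)   # c = 0.75 fails
```

The margin is tightest at δ = 1/3, where 2/3 ≈ 0.6667 against e^{−1.21/3} ≈ 0.6681, consistent with the choice c = 0.605 being nearly optimal on this range.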

Random Walks on Homogeneous Spaces
Throughout this section, $G$ will be a finite group and $K$ a subgroup. Denote the homogeneous space consisting of the (right) cosets by $X := G/K := \{gK \mid g \in G\}$. Denote the set of complex-valued functions on $X$ by $L(X) := \{f : X \to \mathbb{C}\}$. We frequently identify this with the space of $K$-invariant functions on $G$, ie those $f : G \to \mathbb{C}$ for which $f(gk) = f(g)$ for all $g \in G$ and all $k \in K$.

Gelfand Pairs and Spherical Fourier Analysis for Invariant Random Walks
The majority of this subsection, namely the analysis leading up to Proposition 4.7, is an abbreviated exposition of [CST08, §4]; a related exposition can be found in [CST07, §2].
Let $G$ be a finite group and let $K$ be a subgroup. A function $f : G \to \mathbb{C}$ is $K$-bi-invariant if $f(k_1 g k_2) = f(g)$ for all $g \in G$ and $k_1, k_2 \in K$.
Definition 4.1. Let $G$ be a finite group and $K$ be a subgroup. The pair $(G, K)$ is called a Gelfand pair if the algebra of $K$-bi-invariant functions (under convolution) is commutative. Equivalently, $(G, K)$ is a Gelfand pair if the permutation representation $\lambda$ of $G$ on $X$, defined by $(\lambda(g) f)(x) := f(g^{-1} x)$ for $g \in G$, $f \in L(X)$ and $x \in X$, is multiplicity-free.
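For a concrete instance, $(S_3, S_2)$ is a Gelfand pair, and the commutativity in Definition 4.1 can be checked numerically. The following sketch (illustrative only) builds random $K$-bi-invariant functions by averaging over $K$-double cosets and verifies that they commute under convolution:

```python
import random
from itertools import permutations

# Toy example: G = S_3 on {0,1,2}, K = stabiliser of the point 2 (so K = S_2).
G = list(permutations(range(3)))
compose = lambda s, t: tuple(s[t[i]] for i in range(3))   # (s.t)(i) = s(t(i))
inverse = lambda s: tuple(sorted(range(3), key=lambda i: s[i]))
K = [(0, 1, 2), (1, 0, 2)]

def random_bi_invariant():
    # averaging any function over K-double cosets yields f(k1 g k2) = f(g)
    raw = {g: random.random() for g in G}
    return {g: sum(raw[compose(k1, compose(g, k2))] for k1 in K for k2 in K)
            for g in G}

def convolve(a, b):
    # (a * b)(g) = sum_s a(s) b(s^{-1} g)
    return {g: sum(a[s] * b[compose(inverse(s), g)] for s in G) for g in G}

f, h = random_bi_invariant(), random_bi_invariant()
fh, hf = convolve(f, h), convolve(h, f)
assert all(abs(fh[g] - hf[g]) < 1e-9 for g in G)  # the bi-invariant algebra commutes
```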
This equivalence is shown in [CST08, Theorem 4.4.2]. From now on, assume that $(G, K)$ is a Gelfand pair. We next introduce spherical functions and spherical representations.
For a spherical function $\varphi$, the subspace of $L(X)$ generated by the $G$-translates of $\varphi$, ie $V_\varphi := \langle \lambda(g) \varphi \mid g \in G \rangle$ where $\lambda$ is the permutation representation of $G$ on $X$, is called the spherical representation.
The following theorem is a culmination of statements from [CST08, §4.5 and §4.6]. This allows us to construct a 'spherical basis' in which the Fourier transform has a simple form.
Definition 4.4. The spherical Fourier transform $\widehat\mu$ of a $K$-invariant function $\mu \in L(X)$ is defined by
$$ \widehat\mu(i) := \sum_{x \in X} \mu(x)\, \overline{\varphi_i(x)} \quad\text{for } i \in \{0, 1, \ldots, N\}. $$
Corollary 4.5. There exists an orthonormal basis of $K$-invariant functions on $G$ with the following property. Let $\mu$ be a $K$-bi-invariant function on $G$. If $\rho_i$ is the spherical representation associated with $\varphi_i$ (for $i \in \{0, 1, \ldots, N\}$), then the matrix representing the operator $\widehat\mu(\rho_i)$ has only one non-zero entry, which is in the first position and has value $|K|\, \widehat\mu(i)$.
As a consequence, a Fourier inversion formula holds:
$$ \mu^{*t}(x) = \frac{1}{|X|} \sum_{i=0}^{N} d_i\, \widehat\mu(i)^t\, \varphi_i(x) \quad\text{for } x \in X, $$
where $\mu^{*t}$ is the $t$-fold self-convolution of $\mu$.
From this we immediately obtain
$$ \bigl\| \mu^{*t} - \mathrm{Unif}_X \bigr\|_{\mathrm{TV}} = \frac{1}{2|X|} \sum_{x \in X} \Bigl| \sum_{i=1}^{N} d_i\, \widehat\mu(i)^t\, \varphi_i(x) \Bigr| $$
for the TV distance between $\mu^{*t}$ and $\mathrm{Unif}_X$. To apply this to random walks on $G$, the step distribution must be $K$-bi-invariant; this is the case if the stochastic transition matrix $P = (p_{x,y})_{x,y \in X}$ is $G$-invariant: $p_{x,y} = p_{gx,gy}$ for all $x, y \in X$ and all $g \in G$.
When looking at such random walks, we always start from a point which is stabilised by K.
Definition 4.6. Let $G$ be a finite group and $K$ be a subgroup. Let $G$ act on the homogeneous space $X := G/K$ by the left coset action: $g \cdot (hK) := (gh)K$. When starting a random walk with $G$-invariant transition matrix from $\bar x \in X = G/K$ which is stabilised by $K$, one can then check that $P^t(\bar x, \cdot) = \mu_{\bar x}^{*t}(\cdot)$ for all $t \in \mathbb{N}_0$, where $\mu_{\bar x}(\cdot) := P(\bar x, \cdot)$; that is, the probability of being at $x$ after $t$ steps when started from $\bar x$ is $\mu_{\bar x}^{*t}(x)$ for all $x \in X$ and all $t \in \mathbb{N}_0$. Altogether, we have now proved the following proposition.
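The identity $P^t(\bar x, \cdot) = \mu_{\bar x}^{*t}(\cdot)$ can be sanity-checked on a toy example. The sketch below (illustrative only) uses the Gelfand pair $(S_3, S_2)$ and an arbitrary $G$-invariant kernel on the three cosets: it lifts $\mu_{\bar x}$ to a $K$-bi-invariant probability on $G$, convolves on $G$, and compares with matrix powers of $P$:

```python
from itertools import permutations

# G = S_3 on {0,1,2}, K = stabiliser of the point 2; X = G/K has three cosets.
G = list(permutations(range(3)))
compose = lambda s, t: tuple(s[t[i]] for i in range(3))
inverse = lambda s: tuple(sorted(range(3), key=lambda i: s[i]))
K = [(0, 1, 2), (1, 0, 2)]

coset = lambda g: min(compose(g, k) for k in K)   # canonical representative of gK
X = sorted({coset(g) for g in G})
xbar = coset((0, 1, 2))                           # the coset K, stabilised by K

# A G-invariant kernel on X: here p_{x,y} depends only on whether x = y.
P = {x: {y: 0.5 if y == x else 0.25 for y in X} for x in X}
mu = {x: P[xbar][x] for x in X}                   # step distribution mu_xbar

# Lift mu to a K-bi-invariant probability on G and convolve there.
f = {g: mu[coset(g)] / len(K) for g in G}
conv = lambda a, b: {g: sum(a[s] * b[compose(inverse(s), g)] for s in G) for g in G}

row, ft = dict(mu), dict(f)
for t in range(2, 5):                             # compare P^t(xbar, .) with mu^{*t}
    row = {y: sum(row[z] * P[z][y] for z in X) for y in X}
    ft = conv(ft, f)
    assert all(abs(row[x] - len(K) * ft[x]) < 1e-12 for x in X)
```

Summing the lifted convolution over each coset (here, multiplying by $|K|$ at a coset representative) recovers the law of the walk on $X$ exactly.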
Proposition 4.7 ([CST08, Proposition 4.9.1]). Let $(G, K)$ be a Gelfand pair and denote $X := G/K$. Let $\{\varphi_i\}_{i=0}^N$ be the associated spherical functions, considered as $K$-bi-invariant functions on $X$, and $\{d_i\}_{i=0}^N$ the associated dimensions; assume that $\varphi_0(x) = 1$ for all $x \in X$. Let $\bar x$ be an element of $X$ stabilised by $K$. Let $P$ be a $G$-invariant stochastic matrix and set $\mu_{\bar x}(\cdot) := p_{\bar x, \cdot}$. Let $t \in \mathbb{N}_0$ and $x \in X$. Then
$$ P^t(\bar x, x) = \frac{1}{|X|} \sum_{i=0}^{N} d_i\, \widehat\mu_{\bar x}(i)^t\, \varphi_i(x), $$
where $\widehat\mu_{\bar x}$ is the spherical Fourier transform of $\mu_{\bar x}$. As a corollary, we have
$$ d_{\mathrm{TV}}(t, \bar x) = \frac{1}{2|X|} \sum_{x \in X} \Bigl| \sum_{i=1}^{N} d_i\, \widehat\mu_{\bar x}(i)^t\, \varphi_i(x) \Bigr|. $$
We now have all the ingredients to prove our TV-approximation lemma for random walks on homogeneous spaces corresponding to Gelfand pairs, ie Lemma C; we restate it here for convenience.
Lemma C (Approximation Lemma). Let $(G, K)$ be a Gelfand pair and denote $X := G/K$. Let $\bar x$ be an element of $X$ stabilised by $K$. Let $\{\varphi_i\}_{i=0}^N$ be the associated spherical functions, considered as $K$-bi-invariant functions on $X$, and $\{d_i\}_{i=0}^N$ the associated dimensions; assume that $\varphi_0(x) = 1$ for all $x \in X$. Let $P$ be a $G$-invariant stochastic matrix and set $\mu_{\bar x}(\cdot) := P(\bar x, \cdot)$. Then, for all $M \in \{0, 1, \ldots, N\}$ and all $t \in \mathbb{N}_0$,
$$ \Bigl| d_{\mathrm{TV}}(t, \bar x) - \frac{1}{2|X|} \sum_{x \in X} \Bigl| \sum_{i=1}^{M} d_i\, \widehat\mu_{\bar x}(i)^t\, \varphi_i(x) \Bigr| \Bigr| \le \frac12 \Bigl( \sum_{i=M+1}^{N} d_i\, \widehat\mu_{\bar x}(i)^{2t} \Bigr)^{1/2}. $$
Proof. First we apply Proposition 4.7 and the triangle inequality:
$$ \Bigl| d_{\mathrm{TV}}(t, \bar x) - \frac{1}{2|X|} \sum_{x \in X} \Bigl| \sum_{i=1}^{M} d_i\, \widehat\mu_{\bar x}(i)^t\, \varphi_i(x) \Bigr| \Bigr| \le \frac{1}{2|X|} \sum_{x \in X} \Bigl| \sum_{i=M+1}^{N} d_i\, \widehat\mu_{\bar x}(i)^t\, \varphi_i(x) \Bigr|. $$
Applying Cauchy–Schwarz and the standard spherical orthogonality relations (see, eg, [CST08, Proposition 4.7.1] or [CST07, Equation (2.11)]), we obtain
$$ \frac{1}{|X|} \sum_{x \in X} \Bigl| \sum_{i=M+1}^{N} d_i\, \widehat\mu_{\bar x}(i)^t\, \varphi_i(x) \Bigr| \le \Bigl( \sum_{i=M+1}^{N} d_i\, \widehat\mu_{\bar x}(i)^{2t} \Bigr)^{1/2}. $$
Plugging this into the previous bound, we deduce the lemma.

Limit Profile for Many Urn Ehrenfest Diffusion
Suppose that one has $n$ balls labelled $1$ through $n$ and $m+1$ urns labelled $0$ through $m$. The set of all configurations can be identified with the set $X_{n,m+1} := \{0, 1, \ldots, m\}^n$: an element $x = (x_1, \ldots, x_n) \in X_{n,m+1}$ indicates that the $j$-th ball is in the $x_j$-th urn. Initially, put all the balls in the first urn (labelled $0$): this is the initial configuration, and corresponds to $\bar x := (0, 0, \ldots, 0)$.
We can endow $X$ with a metric structure: for $x, y \in X_{n,m+1}$, set
$$ d(x, y) := \#\{ j \in \{1, \ldots, n\} \mid x_j \ne y_j \}. $$
Thinking of $x$ and $y$ as configurations of balls, $d(x, y)$ is the number of balls which are not in the same urn in the two configurations.
We consider the random walk on $X := X_{n,m+1}$ described by the following step: choose uniformly at random a ball and an urn; put the chosen ball in the chosen urn. In terms of a transition matrix $R$ on $X \times X$, this is given by the following expressions, for $x, y \in X$:
$$ R(x, y) = \begin{cases} 1/(m+1) & \text{if } y = x, \\ 1/(n(m+1)) & \text{if } d(x, y) = 1, \\ 0 & \text{otherwise}. \end{cases} $$
The following theorem is a restatement of Theorem C, but written more formally: cutoff is for a sequence of Markov chains; we make this sequence explicit. As in previous sections, for ease of presentation we omit the $N$-subscripts in the proof. We start by phrasing the Ehrenfest urn model in Gelfand pair language. To do this, we give a very abbreviated exposition of [CST07, §3]. Let $S_{m+1}$ and $S_n$ be the symmetric groups on $\{0, 1, \ldots, m\}$ and $\{1, \ldots, n\}$, respectively. Then $X_{n,m+1} = \{0, 1, \ldots, m\}^n$ is a homogeneous space for the wreath product $S_{m+1} \wr S_n$ under the action
$$ (\sigma_1, \ldots, \sigma_n; \theta) \cdot (x_1, \ldots, x_n) := (\sigma_1 x_{\theta^{-1}(1)}, \ldots, \sigma_n x_{\theta^{-1}(n)}), $$
ie $x_i$ is moved by $\theta$ to the position $\theta(i)$ and then it is changed by the action of $\sigma_{\theta(i)}$. Note that the stabiliser of $\bar x := (0, 0, \ldots, 0) \in X_{n,m+1}$ coincides with the wreath product $S_m \wr S_n$, where $S_m \le S_{m+1}$ is the stabiliser of $0$. Therefore we can write $X_{n,m+1} = (S_{m+1} \wr S_n)/(S_m \wr S_n)$. The group $S_{m+1} \wr S_n$ acts isometrically on $X_{n,m+1}$, and the action is distance transitive. It follows that $(S_{m+1} \wr S_n, S_m \wr S_n)$ is a Gelfand pair; see [CST07, Example 2.5].
The associated spherical functions and dimensions are given by the following proposition. This can also be seen as a consequence of the orthogonality of spherical functions, ie Theorem 4.3. △ We first determine the spherical Fourier transform of the step distribution $\mu(\cdot) := R(\bar x, \cdot)$. Using the expression for $\varphi_i(1)$ given by Theorem 4.10, we obtain $\widehat\mu(i) = 1 - i/n$.
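The spherical statistics $\widehat\mu(i) = 1 - i/n$, together with multiplicities $d_i = \binom{n}{i} m^i$, can be confirmed by directly diagonalising $R$ for a small instance. The following sketch (illustrative only) does this for $n = 3$, $m = 2$:

```python
import numpy as np
from itertools import product
from math import comb

n, m = 3, 2  # small instance: 3 balls, 3 urns; state space has (m+1)^n = 27 points
states = list(product(range(m + 1), repeat=n))
idx = {x: i for i, x in enumerate(states)}
N = len(states)

# R(x, y): pick a uniform ball and a uniform urn; move that ball there.
R = np.zeros((N, N))
for x in states:
    for j in range(n):
        for u in range(m + 1):
            y = x[:j] + (u,) + x[j + 1:]
            R[idx[x], idx[y]] += 1.0 / (n * (m + 1))

eig = np.sort(np.linalg.eigvalsh(R))  # R is symmetric, so eigvalsh applies
# spherical theory predicts eigenvalue 1 - i/n with multiplicity C(n,i) * m^i
expected = np.sort([1 - i / n for i in range(n + 1)
                    for _ in range(comb(n, i) * m ** i)])
assert np.allclose(eig, expected)
```

The multiplicities $1, 6, 12, 8$ of the eigenvalues $1, \tfrac23, \tfrac13, 0$ sum to $27$, as they must.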
There are $\binom{n}{\ell} m^{\ell}$ different $x$ with $d(\bar x, x) = \ell$. Applying Theorem 4.10, we obtain expressions for the main and error terms in Lemma C. Our first aim is to use this to determine which are the 'important' spherical statistics. Proof. Using Lemma 4.12, we have $|\widehat\mu(i)| \le e^{-i/n}$ for all $i$. The inequality $\mathrm{ET} \le \mathrm{ET}'$ now follows. The equality in the definition of $\mathrm{ET}'$ is now an immediate consequence of Theorem 4.10. For the inequality $\mathrm{ET}' \le \varepsilon$, choose $M$ so that $\sum_{i > M} e^{-ci} / \sqrt{i!} \le \varepsilon$. Then we have
$$ \mathrm{ET} \le \sum_{i > M} (mn)^{i/2}\, e^{-ti/n} / \sqrt{i!} = \sum_{i > M} e^{-ci} / \sqrt{i!} \le \varepsilon. $$
From now on, choose $M := M(c, \varepsilon)$ as in Lemma 4.13. Hence, for the main term, we need only deal with spherical statistics with $i \asymp 1$. We would then like to use the replacement $\widehat\mu(i) = 1 - i/n \approx e^{-i/n}$. Conveniently, the adjusted main term $\mathrm{MT}'$ in this case (Definition 4.14) is exactly the same as that for the Gibbs sampler (see Definition 2.6) in §2.2; to match notation, replace $m$ with $\alpha$.
The following two lemmas are simply restatements of Lemmas 2.7a and 2.7b. We now have all the ingredients to establish the limit profile for the Ehrenfest urn model.
Proof of Theorem 4.9. Let us summarise what we have proved; everything is evaluated at the target mixing time $t = \frac12 n \log(mn) + cn$, with $M := M(c, \varepsilon)$ given by Lemma 4.13.
· By Lemma 4.13, the error term $\mathrm{ET}$ satisfies $\mathrm{ET} \le \varepsilon$.
· By Lemma 4.15a, the original main term $\mathrm{MT}$ satisfies $|\mathrm{MT} - \mathrm{MT}'| \le 2\varepsilon$.
· By Lemma 4.15b, the adjusted main term $\mathrm{MT}'$ satisfies $\frac12 \mathrm{MT}' \to 2\,\Phi(\frac12 e^{-c}) - 1$ as $n \to \infty$.
Since $\varepsilon > 0$ is arbitrary, applying the TV-approximation lemma for random walks on homogeneous spaces, namely Lemma C, we immediately deduce the theorem.
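Numerically, the limit profile $c \mapsto 2\,\Phi(\frac12 e^{-c}) - 1$ decreases from $1$ (far before the mixing time) to $0$ (far after), which is the cutoff shape. A quick evaluation, using the error function to compute the normal cdf $\Phi$:

```python
from math import erf, exp, sqrt

Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))     # standard normal cdf
profile = lambda c: 2 * Phi(0.5 * exp(-c)) - 1   # limit profile for the Ehrenfest urn

for c in (-4, -2, 0, 2, 4):
    print(f"c = {c:+d}:  limit TV = {profile(c):.4f}")

# cutoff shape: TV near 1 far before the mixing time, near 0 far after
assert profile(10) < 0.01 < 0.99 < profile(-10)
```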
This proves the stated claim, and hence completes the proof of the lemma.