Cutoff for conjugacy-invariant random walks on the permutation group

We prove a conjecture raised by the work of Diaconis and Shahshahani (1981) about the mixing time of random walks on the permutation group induced by a given conjugacy class. To do this we exploit a connection with coalescence and fragmentation processes and control the Kantorovitch distance by using a variant of a coupling due to Oded Schramm. Recast in the language of Ricci curvature, our proof establishes the occurrence of a phase transition, which takes the following form in the case of random transpositions: at time $cn/2$, the curvature is asymptotically zero for $c\le 1$ and is strictly positive for $c>1$.


Main results
Let S_n denote the multiplicative group of permutations of {1, . . . , n}. Let Γ ⊂ S_n be a fixed conjugacy class in S_n, i.e., Γ = {gτg^{−1} : g ∈ S_n} for some fixed permutation τ ∈ S_n. Alternatively, Γ is the set of permutations in S_n having the same cycle structure as τ. Let X^σ = (X_0, X_1, . . .) be the discrete-time random walk on S_n induced by Γ, started from the permutation σ ∈ S_n, and let Y^σ be the associated continuous-time random walk. These are the processes defined by
$X^σ_t = σ ∘ γ_1 ∘ · · · ∘ γ_t$ (t = 0, 1, . . .) and $Y^σ_t = X^σ_{N_t}$ (t ≥ 0),
where γ_1, γ_2, . . . are i.i.d. random variables distributed uniformly in Γ, and (N_t, t ≥ 0) is an independent Poisson process with rate 1. Then Y is a Markov chain which converges to an invariant measure µ as t → ∞. If Γ ⊂ A_n (where A_n denotes the alternating group) then µ is uniform on A_n, and otherwise µ is uniform on S_n. The simplest and best-known example of a conjugacy class is the set T of all transpositions, or more generally the set of all cyclic permutations of length k ≥ 2. This set will play an important role in the rest of the paper. Note that Γ depends on n but we do not indicate this dependence in our notation.
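For concreteness, the dynamics just described can be simulated directly. The sketch below is ours (the helper names `uniform_conjugacy_element` and `random_walk` are not from the paper): a uniform element of Γ is obtained by placing the prescribed cycle structure on uniformly chosen points.

```python
import random

def compose(p, q):
    """Composition p ∘ q of permutations given as tuples: i -> p(q(i))."""
    return tuple(p[q[i]] for i in range(len(p)))

def uniform_conjugacy_element(n, cycle_type, rng):
    """Uniform element of the conjugacy class of S_n whose non-trivial
    cycle lengths are listed in `cycle_type` (e.g. [2] for transpositions):
    choose the support uniformly at random, then lay the cycles on it."""
    support = rng.sample(range(n), sum(cycle_type))
    perm = list(range(n))
    pos = 0
    for length in cycle_type:
        cycle = support[pos:pos + length]
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            perm[a] = b  # a -> b along the cycle
        pos += length
    return tuple(perm)

def random_walk(n, cycle_type, steps, rng, start=None):
    """X_t = X_0 ∘ γ_1 ∘ ... ∘ γ_t with γ_i i.i.d. uniform in Γ."""
    x = tuple(start) if start is not None else tuple(range(n))
    for _ in range(steps):
        x = compose(x, uniform_conjugacy_element(n, cycle_type, rng))
    return x
```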
The main goal of this paper is to study the cut-off phenomenon for the random walk X. More precisely, recall that the total variation distance ‖X − Y‖_TV between two random variables X, Y taking values in a set S is given by
$\|X - Y\|_{TV} = \sup_{A \subset S} |P(X \in A) - P(Y \in A)|.$
For 0 < δ < 1, the mixing time t_mix(δ) is by definition given by
$t_{mix}(\delta) = \inf\{t \ge 0 : \|X^{id}_t - \mu\|_{TV} \le \delta\},$
where µ is the invariant measure defined above.
In the case where Γ = T is the set of transpositions, a famous result of Diaconis and Shahshahani [9] is that the cut-off phenomenon takes place at time (1/2)n log n asymptotically as n → ∞. That is, t_mix(δ) is asymptotic to (1/2)n log n for any fixed value of 0 < δ < 1. It has long been conjectured that for a general conjugacy class such that |Γ| = o(n) (where here and in the rest of the paper, |Γ| denotes the number of non-fixed points of any permutation γ ∈ Γ), a similar result should hold at time (1/|Γ|)n log n. This has been verified for k-cycles with fixed k ≥ 2 by Berestycki, Schramm and Zeitouni [6]. This is a problem with a substantial history, which will be detailed below.
The primary purpose of this paper is to verify this conjecture. Hence our main result is as follows.

Theorem 1.1. Let Γ = Γ(n) ⊂ S_n be a conjugacy class with |Γ| = o(n). Then the random walk induced by Γ exhibits cutoff at time (1/|Γ|) n log n: for any fixed 0 < δ < 1, t_mix(δ) is asymptotic to (1/|Γ|) n log n as n → ∞.
Our main tool for this result is the notion of discrete Ricci curvature as introduced by Ollivier [17], for which we obtain results of independent interest. We briefly discuss this notion here; however we point out that it turns out to be equivalent to the better-known path coupling method and transportation metric introduced by Bubley and Dyer [7] and Jerrum [13] (see for instance Chapter 14 of the book [15] for an overview). We will nevertheless cast our results in the language of Ricci curvature because we find it more intuitive. Recall first that the L^1-Kantorovitch distance (sometimes also called Wasserstein or transportation metric) between two random variables X, Y taking values in a metric space (S, d) is given by
$W_1(X, Y) = \inf E[d(\hat X, \hat Y)],$
where the infimum is taken over all couplings $(\hat X, \hat Y)$ which are distributed marginally as X and Y respectively. Ollivier's definition of the Ricci curvature of a Markov chain (X_t, t ≥ 0) on a metric space (S, d) is as follows.

Definition 1.1. Let t > 0. The curvature between two points x, x′ ∈ S with x ≠ x′ is given by
$κ_t(x, x′) := 1 − \frac{W_1(X^x_t, X^{x′}_t)}{d(x, x′)},$ (7)
where X^x_t and X^{x′}_t denote the Markov chain started from x and x′ respectively. The curvature of X is by definition κ_t := inf_{x ≠ x′} κ_t(x, x′).
In the terminology of Ollivier [17], this is in fact the curvature of the discrete-time random walk whose transition kernel is given by m x (·) = P(X t = ·|X 0 = x). We refer the reader to [17] for an account of the elegant theory which can be developed using this notion of curvature, and point out that a number of classical properties of curvature generalise to this discrete setup.
For our results it will turn out to be convenient to view the symmetric group as a metric space equipped with the metric d which is the word metric induced by the set T of transpositions (we will do so even when the random walk is induced not by T but by a general conjugacy class Γ). That is, the distance d(σ, σ′) between σ, σ′ ∈ S_n is the minimal number of transpositions one must apply to get from one element to the other (one can check that this number is independent of whether right-multiplications or left-multiplications are used).
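A standard fact makes this metric easy to compute: d(σ, σ′) equals n minus the number of cycles of σ^{−1} ∘ σ′, counting fixed points as cycles. A short sketch in Python (the function name is ours):

```python
def transposition_distance(sigma, tau):
    """d(sigma, tau) = n - #cycles(sigma^{-1} ∘ tau), with fixed points
    counted as cycles; permutations are tuples mapping i -> p[i]."""
    n = len(sigma)
    inv = [0] * n
    for i, s in enumerate(sigma):
        inv[s] = i                          # inv = sigma^{-1}
    rel = [inv[tau[i]] for i in range(n)]   # sigma^{-1} ∘ tau
    seen = [False] * n
    cycles = 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = rel[j]
    return n - cycles
```

On small examples one can check directly that this is symmetric and that a single transposition is at distance 1 from the identity.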
For simplicity we focus in this introduction on the case where the random walk is induced by the set of transpositions T. (A more general result will be stated later in the paper.) For c > 0 and σ ≠ σ′, let
$κ_c(σ, σ′) = 1 − \frac{W_1(X^σ_{cn/2}, X^{σ′}_{cn/2})}{d(σ, σ′)}$ (8)
and define κ_c(σ, σ) = 1. That is, κ_c(σ, σ′) = κ_{cn/2}(σ, σ′) with our notation from (7). In particular, κ_c depends on n but this dependence does not appear explicitly in the notation. It is not hard to see that κ_c(σ, σ′) ≥ 0 (apply the same transpositions to both walks X^σ and X^{σ′}). For parity reasons it is obvious that κ_c(σ, σ′) = 0 if σ and σ′ do not have the same signature. Thus we only consider the curvature between elements at even distance. For c > 0 define
$κ_c := \inf κ_c(σ, σ′),$
where the infimum is taken over all σ, σ′ ∈ S_n such that d(σ, σ′) is even. Our main result states that κ_c experiences a phase transition at c = 1. More precisely, the curvature κ_c is asymptotically zero for c ≤ 1 but asymptotically strictly positive for c > 1. In order to state our result, we introduce the quantity θ(c), which is the largest solution in [0, 1] to the equation
$θ(c) = 1 − e^{−cθ(c)}.$
It is easy to see that θ(c) = 0 for c ≤ 1 and θ(c) > 0 for c > 1. In fact, θ(c) is nothing else but the survival probability of a Galton-Watson tree with Poisson offspring distribution with mean c.
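Numerically, θ(c) is conveniently computed by iterating the map x ↦ 1 − e^{−cx} starting from x = 1, which converges monotonically to the largest fixed point in [0, 1]. A minimal sketch (the iteration count is an arbitrary choice for illustration):

```python
import math

def theta(c, iters=200):
    """Largest solution in [0, 1] of x = 1 - exp(-c * x), i.e. the
    survival probability of a Poisson(c) Galton-Watson tree.
    Starting from x = 1, the iteration decreases to the largest root."""
    x = 1.0
    for _ in range(iters):
        x = 1.0 - math.exp(-c * x)
    return x
```

For c ≤ 1 the iterates tend to 0, while for instance θ(2) ≈ 0.797.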
Theorem 1.2. For c ≤ 1, $\lim_{n→∞} κ_c = 0$. On the other hand, for c > 1,
$\liminf_{n→∞} κ_c ≥ θ(c)^4$ (11)
and
$\limsup_{n→∞} κ_c ≤ θ(c)^2.$ (12)

A more general version of this theorem will be presented later on, which gives results for the curvature of a random walk induced by a general conjugacy class Γ. This will be stated as Theorem 2.2.
We believe that the upper bound is the sharp one here, and thus make the following conjecture: for every c > 0, κ_c → θ(c)^2 as n → ∞. Of course the conjecture is already established for c ≤ 1, and so it is only interesting for c > 1.

Relation to previous works and organisation of the paper
The study of mixing times of Markov chains was initiated independently by Aldous [1] and by Diaconis and Shahshahani [9]. In particular, as already mentioned, Diaconis and Shahshahani proved Theorem 1.1 in the case where Γ is the set T of transpositions. Their proof relies on some deep connections with the representation theory of S_n and bounds on so-called character ratios. The conjecture about the general case appears to have first been made formally in print by Roichman [18], but it has no doubt been asked privately before then. We shall see that the lower bound t_mix(δ) ≥ (1/|Γ|)n log n is fairly straightforward; the difficult part is the corresponding upper bound. Flatto, Odlyzko and Wales [12] built on the earlier work of Vershik and Kerov [23] to obtain that t_mix(δ) ≤ (1/2)n log n when |Γ| is bounded (as is noted in [8, p. 44-45]). This was done using character ratios, and this method was extended further by Roichman [18,19] to show an upper bound on t_mix(δ) which is sharp up to a constant when |Γ| = o(n) (and in fact, more generally, when |Γ| is allowed to grow to infinity as fast as (1 − δ)n for any δ ∈ (0, 1)). Again using character ratios, Lulov and Pak [16] show the cut-off phenomenon as well as t_mix = (1/|Γ|)n log n in the case when |Γ| ≥ n/2. Roussel [20,21] shows the correct mixing time as well as the cut-off phenomenon for the case when |Γ| ≤ 6. Finally, in a more recent article, Berestycki, Schramm and Zeitouni [6] showed using coupling arguments that the cut-off phenomenon occurs and t_mix = (1/k)n log n in the case when Γ consists only of cycles of length k, for any k ≥ 2 fixed.
The authors in Berestycki, Schramm and Zeitouni [6] remark that their proof can be extended to cover the case when Γ is a fixed conjugacy class and indicate that their methods can probably be pushed to cover the case when |Γ| = o( √ n). Their argument uses very delicate estimates about the mixing time of small cycles, together with a variant of a coupling due to Schramm [22] to deal with large cycles. The most technical part of the argument is to analyse the distribution of small cycles. While our approach in this paper bears some similarities with the paper [6], we shall see that our use of the L 1 -Kantorovitch distance (Ricci curvature) allows us to completely bypass the difficulty of ever working with small cycles. This is quite surprising given that the small cycles (in particular, the fixed points) are responsible for the occurrence of the cut-off at time t mix .
2 Curvature and mixing

Curvature theorem
The left part of (5) is relatively easy and is probably known. We give a proof in Appendix A. We now start the proof of the main results of this paper, namely the right-hand side of (5). We will show how our bounds on coarse Ricci curvature imply the desired upper bound on t_mix(δ). We first state the more general version of Theorem 1.2 discussed in the introduction. To begin, we define the cycle structure (k_2, k_3, . . .) of Γ to be the vector such that for each j ≥ 2 there are k_j cycles of length j in the cycle decomposition of any τ ∈ Γ (note that this is the same for every τ ∈ Γ). Then k_j = 0 for all j > n, and we have that $|Γ| = \sum_{j=2}^∞ j k_j$. In the case of the transposition random walk, the quantity θ(c) which appears in the bounds is the survival probability of a Galton-Watson process with offspring distribution given by a Poisson random variable with mean c. Our first task is to generalise θ(c). We do so via a fixed-point equation, which is more complex here (and we point out that the interpretation in terms of the survival probability of a certain Galton-Watson process does not hold in general). Firstly, notice that for each j ≥ 2 we have that jk_j/|Γ| ≤ 1. Thus the sequence $(jk_j/|Γ|)_{j≥2}$ takes values in a compact space for the product topology (the topology of pointwise convergence). Hence by extracting a subsequence we may, if we wish, assume without loss of generality that
$\frac{j k_j}{|Γ|} → \bar k_j$ for each j ≥ 2 (13)
pointwise as n → ∞. It follows that for each j ≥ 2, $\bar k_j ∈ [0, 1]$, and $\sum_{j=2}^∞ \bar k_j ≤ 1$ by Fatou's lemma.
Note that for each c > 0, x ↦ Ψ(x, c) is convex on [0, 1]. In the case when $\sum_{j≥2} \bar k_j < 1$, the function Ψ(·, c) is not the generating function of a random variable for any c > 0. On the other hand, if $\sum_{j≥2} \bar k_j = 1$, then for any c > 0 it is possible to write Ψ(·, c) as the generating function of a random variable.
There are two cases to consider. First suppose that $z = \sum_{j=2}^∞ \bar k_j < 1$. Since x ↦ f_c(x) is concave on [0, 1], it follows that there exists a unique θ(c) ∈ (0, 1) such that f_c(θ(c)) = 0. Next suppose that $\sum_{j=2}^∞ \bar k_j = 1$. Then for c > c_Γ we have that $\frac{d}{dx} f_c(x)\big|_{x=0} > 0$, and again by concavity it follows that there exists a unique θ(c) ∈ (0, 1) such that f_c(θ(c)) = 0.
For the rest of the statements, suppose that c > c_Γ. The fact that c ↦ θ(c) is increasing follows from the definition of Ψ(x, c) and the fact that θ(c) = 1 − Ψ(θ(c), c).
Let δ > 0; then it follows that for any x ∈ (0, 1), Ψ(x, c + δ) < Ψ(x, c). On the other hand, we have that Ψ(x, c + δ) → Ψ(x, c) as δ ↓ 0 uniformly in x ∈ [0, 1], and from this it follows that c ↦ θ(c) is continuous and differentiable on (c_Γ, ∞).
Notice that θ(c) ∈ [0, 1]; hence θ(c) has convergent subsequences as c ↓ c_Γ. Let L denote any subsequential limit of θ(c) as c ↓ c_Γ. Then it follows that L solves the equation L = 1 − Ψ(L, c_Γ). This equation has only the solution L = 0, and hence lim_{c↓c_Γ} θ(c) = 0. The limit as c ↑ ∞ follows from a similar argument.
In the case when Γ = T is the set of transpositions we have that $\bar k_2 = 1$ and $\bar k_j = 0$ for j ≥ 3, hence Ψ(x, c) = e^{−cx}, and thus the definition of θ(c) above agrees with the definition given in the introduction.
Having introduced θ(c), we now introduce the notion of Ricci curvature we will use in the general case. For c > 0 and σ ≠ σ′, let
$κ_c(σ, σ′) = 1 − \frac{W_1(X^σ_{cn/k}, X^{σ′}_{cn/k})}{d(σ, σ′)},$
where k = |Γ|, and define κ_c(σ, σ) = 1. Then let
$κ_c := \inf κ_c(σ, σ′),$
where the infimum is taken over all σ, σ′ ∈ S_n such that d(σ, σ′) is even. That is, κ_c(σ, σ′) = κ_{cn/k}(σ, σ′) with our notation from (7). We now state a more general form of Theorem 1.2, which covers Theorem 1.2 as a special case.
Theorem 2.2. Let Γ ⊂ S_n be a conjugacy class, and recall the definition of c_Γ from (15). Then for c ≤ c_Γ, $\lim_{n→∞} κ_c = 0$. On the other hand, for c > c_Γ,
$\liminf_{n→∞} κ_c ≥ θ(c)^4$
and
$\limsup_{n→∞} κ_c ≤ θ(c)^2,$ (19)
where θ(c) is the unique solution in (0, 1) of θ(c) = 1 − Ψ(θ(c), c), and where Ψ is given by (14).

Curvature implies mixing
We now show how Theorem 2.2 implies Theorem 1.1. Again fix ε > 0, define t = (1 + 2ε)(1/k) n log n, and let t′ = (1 + ε)(1/k) n log n, where k = |Γ|. We are left to prove that d_TV(t) → 0 as n → ∞. For s ≥ 0 let
$\bar d_{TV}(s) = \sup \|X^σ_s − X^{σ′}_s\|_{TV},$
where the sup is taken over all pairs of permutations σ, σ′ at even distance. We first claim that it suffices to prove that
$\bar d_{TV}(t′) → 0$ as n → ∞. (21)
Indeed, assume that $\bar d_{TV}(t′) → 0$ as n → ∞. Then there are two cases to consider. First assume that Γ ⊂ A_n. Then X_s ∈ A_n for all s ≥ 1 and µ is uniform on A_n, so by Lemma 4.11 in [15], $\|X^{id}_{t′} − µ\|_{TV} ≤ \bar d_{TV}(t′)$. Hence Theorem 1.1 follows from (21) in this case. In the second case, Γ ⊂ A_n^c. In this case X_s ∈ A_n for s even, and X_s ∈ A_n^c for s odd. Using the same lemma, we deduce that if s ≥ t′ is even then $\|X^{id}_s − µ_1\|_{TV} ≤ \bar d_{TV}(t′)$, where µ_1 is uniform on A_n; likewise, for s ≥ t′ odd, $\|X^{id}_s − µ_2\|_{TV} ≤ \bar d_{TV}(t′)$, where µ_2 is uniform on A_n^c. Let (N_t, t ≥ 0) be the Poisson clock of the random walk Y. Then P(N_s even) → 1/2 as s → ∞, µ = (1/2)(µ_1 + µ_2), and P(N_t ≥ t′) → 1 as n → ∞. Thus we deduce that $\|Y^{id}_t − µ\|_{TV} → 0$. Again, Theorem 1.1 follows. Hence it suffices to prove (21).
Note that for any two random variables X, Y on a metric space (S, d) we have the obvious inequality $\|X − Y\|_{TV} ≤ W_1(X, Y)$, provided that x ≠ y implies d(x, y) ≥ 1 on S. This is in particular the case when S = S_n and d is the word metric induced by the set T of transpositions. In other words, it suffices to prove mixing in the L^1-Kantorovitch distance.
By Corollary 21 in [17] we have that for each s ≥ 1,
$\sup_{d(σ,σ′) \text{ even}} W_1(X^σ_{s \lceil cn/k \rceil}, X^{σ′}_{s \lceil cn/k \rceil}) ≤ (1 − κ_c)^s (n − 1),$
since the diameter of S_n is equal to n − 1. Thus if u ≥ s⌈cn/k⌉, it suffices that
$(1 − κ_c)^s (n − 1) → 0$ as n → ∞. (24)
Now, Theorem 1.2 gives $\liminf_{n→∞} κ_c ≥ θ(c)^4$.

Proof. Using L'Hôpital's rule twice we have that $\lim_{c→∞} \frac{−\log(1 − θ(c))}{c} = 1$. Next we have that $\lim_{c→∞} θ(c) = 1$, and hence $\lim_{c→∞} \frac{−\log(1 − θ(c)^4)}{c} = 1$. Consequently we have that for u ≥ t′ = (1 + ε)(1/k) n log n, u satisfies (24) for some sufficiently large c > c_Γ. Hence $\limsup_{n→∞} \bar d_{TV}(t′) = 0$, and thus (21) holds, which finishes the proof.

Stochastic commutativity
To conclude this section on curvature, we state a simple but useful lemma. Roughly, this says that the random walk is "stochastically commutative". This can be used to show that the L^1-Kantorovitch distance is decreasing under the application of the heat kernel; in other words, initial discrepancies in the Kantorovitch metric between two permutations are only smoothed out by the application of the random walk.

Lemma 2.4. Let σ be a random permutation whose distribution is invariant under conjugation, and let σ_0 be a fixed permutation. Then σ_0 ∘ σ has the same distribution as σ ∘ σ_0.
Proof. Define σ′ = σ_0 ∘ σ ∘ σ_0^{−1}. Since the law of σ is invariant under conjugation, the law of σ′ is the same as the law of σ. Furthermore, we have σ_0 ∘ σ = σ′ ∘ σ_0, so the result is proved. This lemma will be used repeatedly in our proof, as it allows us to concentrate on events of high probability for our coupling.
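For small n the lemma can be verified exhaustively: taking σ uniform on a conjugacy class (here the transpositions of S_n), the multisets {σ_0 ∘ σ : σ ∈ Γ} and {σ ∘ σ_0 : σ ∈ Γ} coincide for any fixed σ_0. A small Python check (helper names ours):

```python
def compose(p, q):
    """Composition p ∘ q of permutations given as tuples: i -> p(q(i))."""
    return tuple(p[q[i]] for i in range(len(p)))

def transpositions(n):
    """All transpositions of S_n -- a conjugacy class, so the uniform
    distribution on it is invariant under conjugation."""
    ts = []
    for i in range(n):
        for j in range(i + 1, n):
            t = list(range(n))
            t[i], t[j] = t[j], t[i]
            ts.append(tuple(t))
    return ts

def lemma_check(n, sigma0):
    """Lemma 2.4 for sigma uniform on the transpositions of S_n:
    compare the laws of sigma0 ∘ sigma and sigma ∘ sigma0 as multisets."""
    cls = transpositions(n)
    left = sorted(compose(sigma0, s) for s in cls)
    right = sorted(compose(s, sigma0) for s in cls)
    return left == right
```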

Preliminaries on random hypergraphs
For the proof of Theorem 1.1 we rely on properties of certain random hypergraph processes. The reader who, in the first instance, is only interested in the case of random transpositions, and who is familiar with Erdős–Rényi random graphs and with the result of Schramm [22], may safely skip this section.

Hypergraphs
In this section we present some preliminaries which will be used in the proof of Theorem 2.2. Throughout, we let Γ ⊂ S_n be a conjugacy class and let (k_2, k_3, . . .) denote the cycle structure of Γ. Thus Γ consists of the permutations whose cycle decomposition contains k_2 transpositions, k_3 3-cycles, and so on. Note that we have suppressed the dependence of Γ and (k_2, k_3, . . .) on n. We assume that (13) is satisfied, so that for each j ≥ 2, $jk_j/|Γ| → \bar k_j$ as n → ∞. We also let k = |Γ|, so that $k = \sum_{j≥2} j k_j$.
We will need some results which generalise those of Schramm [22]. The framework which we will use is that of random hypergraphs. Consider the random walk X = (X_t : t = 0, 1, . . .) on S_n, where X_t = X^{id}_t with our notation from the introduction; hence $X_t = γ_1 ∘ · · · ∘ γ_t$. A given step of the random walk, say γ_s, can be broken down into cycles, say $γ_s = γ_{s,1} ∘ · · · ∘ γ_{s,r}$, where $r = \sum_j k_j$. We will say that a given cyclic permutation γ has been applied to X before time t if γ = γ_{s,i} for some s ≤ t and 1 ≤ i ≤ r.
To X we associate a certain hypergraph process H = (H_t : t = 0, 1, . . .) defined as follows. For t = 0, 1, . . ., H_t is a hypergraph on {1, . . . , n} in which a hyperedge {x_1, . . . , x_j} is present if and only if the cyclic permutation (x_1, . . . , x_j) has been applied to the random walk X prior to time t. For instance, H_1 has exactly k_j j-hyperedges for each j ≥ 2. Note that the hyperedges are not present independently of one another.
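A sketch of this hypergraph process in Python (names ours): each step of the walk contributes a group of hyperedges with disjoint supports matching the cycle structure of Γ (these groups are the "packets" of the proof below), and hyperedges from different steps are independent.

```python
import random

def packet(n, cycle_type, rng):
    """Hyperedges contributed by one uniform step from the conjugacy
    class: supports are disjoint within a packet, while distinct
    packets are independent."""
    pts = rng.sample(range(n), sum(cycle_type))
    out, pos = [], 0
    for length in cycle_type:
        out.append(frozenset(pts[pos:pos + length]))
        pos += length
    return out

def hypergraph_process(n, cycle_type, t, rng):
    """The packets of H_t; H_t itself is the union of the first t packets."""
    return [packet(n, cycle_type, rng) for _ in range(t)]
```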

Giant component of the hypergraph
In the case Γ = T, the set of transpositions, the hypergraph H_s is a realisation of an Erdős–Rényi graph. By analogy with Erdős–Rényi graphs, we first present a result about the sizes of the components of the hypergraph process H = (H_t : t = 0, 1, . . .) (where by size we mean the number of vertices in the component). For the next lemma, recall the definition of Ψ(x, c) in (14), and recall that for c > c_Γ, where c_Γ is given by (15), there exists a unique root θ(c) ∈ (0, 1) of the equation θ(c) = 1 − Ψ(θ(c), c).
Theorem 3.1. Consider the random hypergraph H_s and suppose that s = s(n) is such that sk/n → c as n → ∞ for some c > c_Γ. Then there is a constant β > 0, depending only on c, such that with probability tending to one all components but the largest have size at most β log n. Further, the size of the largest component, normalised by n, converges to θ(c) in probability as n → ∞.
Of course, this is the standard Erdős–Rényi theorem in the case where Γ = T is the set of transpositions; see for instance [11], in particular Theorem 2.3.2, for a proof. In the case of k-cycles with k fixed and finite, this is the case of random regular hypergraphs analysed by Karoński and Łuczak [14]. For the slightly more general case of bounded conjugacy classes, this was proved by Berestycki [4].
Remark 3.2. Note that the behaviour of H_s in Theorem 3.1 can deviate markedly from that of Erdős–Rényi graphs. The most obvious difference is that H_s can contain mesoscopic components, something which of course has negligible probability for Erdős–Rényi graphs. For example, suppose Γ consists of n^{1/2} transpositions and one cycle of length n^{1/3}. Then the giant component appears at time n^{1/2}/2 with a phase transition. Yet even at the first step there is a component of size n^{1/3}. (However, it will follow from the proof that, in the supercritical phase c > c_Γ, such a dichotomy still holds.) From a technical point of view this has nontrivial consequences, as proofs of the existence of a giant component are usually based on the dichotomy between microscopic components and giant components. Furthermore, when the conjugacy class is large and consists of many small or mesoscopic cycles, the hyperedges have a strong dependence, which makes the proof very delicate.
Proof of Theorem 3.1. Suppose that s = s(n) is such that sk/n → c as n → ∞ for some c > c_Γ. We reveal the vertices of the component containing a fixed vertex v ∈ {1, . . . , n} using a breadth-first search exploration, as follows. There are three states that each vertex can be in: unexplored, removed or active. Initially v is active and all the other vertices are unexplored. At each step of the iteration we select an active vertex w according to some prescribed rule among the active vertices at this stage (say, the one with the smallest label). The vertex w becomes removed, and every unexplored vertex which is joined to w by a hyperedge becomes active. We repeat this exploration procedure until there are no more active vertices.
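As a sanity check on Theorem 3.1 in the Γ = T case (where H_s is an Erdős–Rényi graph), the following standalone sketch (union-find rather than the breadth-first exploration used in the proof; naming ours) measures the largest component fraction after s ≈ cn/2 uniform edges. For c = 2 it should be close to θ(2) ≈ 0.797, and it should be tiny for c = 0.5.

```python
import random

def largest_component_fraction(n, n_edges, rng):
    """Fraction of vertices in the largest component after adding
    n_edges uniform edges on n vertices (union-find with path halving).
    Each transposition applied to the walk contributes one edge."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _ in range(n_edges):
        a, b = rng.sample(range(n), 2)
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            size[rb] += size[ra]
    return max(size[find(v)] for v in range(n)) / n
```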
We will need to keep track of the hyperedges we reveal and of where they came from, in order to deal with the dependencies mentioned in Remark 3.2. For t = 1, . . . , s we call the hyperedges which are in H_t but not in H_{t−1} the t-th packet. Note that each packet consists of k_j hyperedges of size j, j ≥ 2, which are sampled uniformly at random without replacement from {1, . . . , n}. Crucially, however, hyperedges from different packets are independent. For t = 1, . . . , s and j ≥ 2 let Y^{(t)}_j(i) be the number of j-hyperedges in the t-th packet that were revealed in the exploration process prior to step i. Let i ≥ 0 and let H_i denote the filtration generated by the exploration process up to stage i, including the information of which edge came from which packet. Our goal will be to give uniform stochastic bounds on the distribution of |A_{i+1} \ A_i|, so long as i is not too large. We will thus fix i, and in order to ease notation we will often suppress the dependence on i in Y^{(t)}_j(i): we will thus simply write Y^{(t)}_j. Let w be the vertex being explored at stage i + 1. For t = 1, . . . , s let M_t be the indicator that w is part of a hyperedge in the t-th packet; thus (M_t)_{1≤t≤s} are independent conditionally given H_i. If w is part of a hyperedge in the t-th packet, let V_t be the size of the (unique) hyperedge of that packet containing it, as in (27). Note that when M_t = 1 the denominator in (27) is non-zero and thus (27) is well defined. When M_t = 0 we simply put V_t = 1 by convention. Then we have the almost sure inequality (28); this would be an equality were it not for possible self-intersections, as hyperedges connected to w coming from different packets may share several vertices in common. In order to get a bound in the other direction, we simply truncate |A_{i+1} \ A_i| at n^{1/4}.
Let I_i be the indicator that, among the first n^{1/4} vertices, no such self-intersection occurs. Note that E(I_i) ≥ p_n = 1 − n^{−1/2}, by straightforward bounds on the birthday problem. We then have the corresponding upper bound (29). We will stop the exploration process once we have discovered enough vertices, or once the active set dies out, whichever comes first. Therefore we define
$T^{↑} = \inf\{i ≥ 0 : |A_i| + i ≥ n^{2/3}\}$ and $T^{↓} = \inf\{i ≥ 0 : |A_i| = 0\},$
and we set T = T^{↑} ∧ T^{↓}. The following lemma shows that the distribution of |A_{i+1} \ A_i| converges to a limit in distribution, uniformly for i < T. (Note however that the limit is improper if $\sum_j \bar k_j < 1$.)

Lemma 3.3. There exists a deterministic function w : N → R such that w(n) → 0 as n → ∞, with the following property: for each x ∈ (0, 1), the stated estimate holds almost surely.
Proof. Suppose T > i. In particular, from the definition of T^{↑} and (25), we have that $\sum_{t=1}^{s} \sum_{j≥2} j Y^{(t)}_j ≤ n^{2/3}$ almost surely. Combining (28) with (27) yields a lower bound of the desired form, where the error term w_1(n) vanishes at infinity. For the last inequality we have used the convergence of jk_j/k to $\bar k_j$ inside the sum, which is justified by the dominated convergence theorem, as jk_j/k is uniformly bounded by 1. Note that the above estimate is uniform in i ≥ 1.
For the upper bound, we use (29). Let ε_n → 0 sufficiently slowly that ε_n n^{1/3} → ∞; for concreteness take ε_n = n^{−1/6}. Define the set G of good packets accordingly and let I = G^c. Packets t ∈ I are the bad packets, for which a significant fraction of the mass (at least ε_n) was already discovered. In the case where the conjugacy class contains only one type of cycle, say k-cycles, I coincides with the set of hyperedges already revealed. At the other end of the spectrum, when the conjugacy class Γ is broken down into many small cycles, I is likely to be empty. But in all cases, |I| satisfies the trivial bound $|I| ≤ n^{2/3}/(ε_n k)$ by (30), and in particular $k|I| ≤ n^{5/6} = o(n)$. This turns out to be enough for our purposes.
Note that $E(x^{\sum_{t=1}^s M_t(V_t−1)})$ and $E(x^{n^{1/4} ∧ \sum_{t=1}^s M_t(V_t−1)})$ can differ by at most $x^{n^{1/4}}$, which is exponentially small, so we may neglect this difference. Then we may write, counting only hyperedges from good packets, using the fact that 1 − x ≤ e^{−x} for all x ∈ R, and (32), the desired upper bound, where the function w_2 : N → R vanishes at infinity, invoking again (31). The proof is complete.
We will need the following lemma, which tells us, among other things, the number of vertices in logarithmically large components.
Moreover, letting C_v denote the component containing v, for v ∈ {1, . . . , n}, we have
$\frac{1}{n} \sum_{v=1}^{n} 1_{\{|C_v| ≥ β \log n\}} → θ(c)$ (35)
in probability as n → ∞.
Proof. We start with the lower bound of (34). For simplicity write θ = θ(c). Let x = x_n ∈ (0, 1) be the solution of the fixed-point equation furnished by Lemma 3.3; it is easy to check that x_n is well-defined and that x_n → 1 − θ as n → ∞. From (33) we see that the corresponding estimate holds uniformly in i ≤ T, where the implied constant is nonrandom; consequently, $M_i := x_n^{|A_i|}$ is, up to this error, a supermartingale. We apply the optional stopping theorem at time r ∧ T_r ∧ T^{↓}, and we bound M from below by considering only its value on the event {T^{↓} < r}, in which case also T_r > T^{↓}, and hence r ∧ T_r ∧ T^{↓} = T^{↓}. Taking r = β log n, and recalling that x_n → 1 − θ, we deduce the lower bound of (34).

For the upper bound of (34), we make the following observation. Let m ≥ 1 be finite and arbitrary (eventually chosen to be large), and compare the exploration process, during its first m steps, with the corresponding branching process. Letting first n → ∞ and then m → ∞, the lim sup is bounded by the survival probability of this branching process. On the other hand, the right-hand side is easily shown, by standard random walk theory, to equal θ. Thus the upper bound of (34) follows.

We now turn to (35). Observe that |C_v| ≥ β log n precisely if T^{↓} ≥ β log n; hence if $Z = \sum_{v=1}^{n} 1_{\{|C_v| ≥ β \log n\}}$, we have that E(Z)/n → θ by (34). Hence if we show that Var(Z) = o(n^2), then (35) follows by Chebyshev's inequality. In particular, it suffices to show that for v ≠ w ∈ {1, . . . , n},
$P(|C_v| ≥ β \log n, |C_w| ≥ β \log n) → θ(c)^2,$ (36)
given that we already know from (34) that P(|C_v| ≥ β log n) → θ(c). On the other hand, (36) can be proved in exactly the same way as the upper bound of (34) above. Details are left to the reader.
We claim that we can choose x ∈ (0, 1) such that Ψ(1 − x, c)/x < 1. There are two cases to consider. If $\sum_{j≥2} \bar k_j < 1$ then the result is trivial. Otherwise, z(1) = 1, and it is not hard to argue that z′(1) > 1, by definition of c_Γ and since c > c_Γ. By Taylor's theorem it follows that z(x)/x < 1 for some 0 < x < 1 sufficiently close to 1. Hence, using Lemma 3.3, for x fixed as above, we can suppose that n is large enough so that the corresponding estimate holds almost surely on {T > i}, for some fixed ε > 0.
Step 1. We now fix x as in (37) and let r = β log n for some constant β > 0 to be chosen later. Our first task is to show that if |A_r| > 0, then T^{↑} < T^{↓} with high probability. If T^{↑} < r there is nothing to do, so we may assume that T ≥ r. To do this, let C > 3 log(1/x); we first show that if T ≥ r and |A_r| > 0, then |A_r| > C log n with high probability.
Thus we can choose β > 0 suitably large so that this failure probability is negligible.

Step 2. We now show that, under the assumption T > r, T^{↓} is unlikely to occur before T^{↑}.
hence by the optional stopping theorem (since M is bounded), the desired bound holds on the event {T > r}. We deduce that T^{↓} is indeed unlikely to occur before T^{↑}.

Step 3. Note that if T^{↑} < T^{↓} we necessarily have |A_{n^{2/3}}| > 0 (indeed, recall that |A_i| + i, the total number of vertices discovered by stage i, is monotone). In our third step we show that if |A_{n^{2/3}}| > 0, then with high probability |A_{n^{2/3}}| ≥ Kn^{2/3} for some constant K > 0. There are two cases to consider: either T^{↑} ≤ n^{2/3} or T^{↑} > n^{2/3}. In the first case, since |A_i| + i is the number of vertices discovered by stage i and is thus monotone, and since the second inequality is the definition of T^{↑}, the claim is satisfied with K = 1. Thus consider the second case, T^{↑} > n^{2/3}. Since we are also assuming that |A_{n^{2/3}}| > 0, we may thus assume that T > n^{2/3}. Now let K > 0 be chosen small enough that x^{−K} < 1 + ε, so that the relevant quantity decays exponentially in n^{2/3}, and in particular is smaller than n^{−3} for n sufficiently large. In either case the claim holds.

Step 4. Combining this with (38) we get
$P(|A_{n^{2/3}}| ≤ Kn^{2/3};\ T > r) ≤ P(|A_{n^{2/3}}| = 0;\ T > r) + P(|A_{n^{2/3}}| ≤ Kn^{2/3};\ T > n^{2/3}).$
In particular,
$P(|A_{T ∧ n^{2/3}}| < Kn^{2/3};\ T^{↓} > r) ≤ 2n^{−3}.$ (39)
Suppose v is a vertex and that |C_v| > β log n = r. Then observe that T^{↓} > r; accordingly, it is likely that |A_{T ∧ n^{2/3}}| ≥ Kn^{2/3} by (39). If v′ is another vertex and we assume that |C_{v′}| > r, we may likewise explore its component. We seek to show that v and v′ are likely to be connected. As we explore C_{v′} we may find a connection from C_{v′} to C_v before time T′ ∧ n^{2/3} (in the exploration of C_{v′}), in which case we are done. Otherwise, we can repeat the argument above and show that it is likely that the active vertex set of C_{v′} also reaches Kn^{2/3}, at a time T′ ∧ n^{2/3}, with obvious notations. Let A = A_{T ∧ n^{2/3}} (resp. A′ = A′_{T′ ∧ n^{2/3}}) denote the active vertex set of C_v (resp. C_{v′}) at time T ∧ n^{2/3} (resp. T′ ∧ n^{2/3}). Hence we may assume that A ∩ A′ = ∅ and |A|, |A′| ≥ Kn^{2/3}.
We now show that A and A′ are likely to be connected, by making use of the sprinkling technique. That is, suppose we add s′ packets, with $s′ = Dn^{2/3} \log n / k$ for some D > 0 to be chosen later. Note that s′k/n → 0, so that (s + s′)k/n → c. Since s = s(n) is an arbitrary sequence such that sk/n → c, it suffices to show that v and v′ are then connected at time s + s′. In fact we will check that A and A′ are connected using smaller edges than the hyperedges making up each packet, as follows. For each hyperedge of size j we will only reveal a subset of j/2 edges with disjoint supports. This gives us a total of at least k/2 edges for each packet, which are sampled uniformly at random without replacement from {1, . . . , n}. We will check that a connection occurs between A and A′ within these s′k/2 edges. Let us say that A (resp. A′) is left half-vacant by a given (sub)packet if the edges of the packet intersect no more than Kn^{2/3}/2 vertices of A (resp. A′), and call A or A′ half-full otherwise. It is obvious that if A is half-full then the probability that some edge joins A to A′ tends to one exponentially fast in n^{2/3}, so we restrict to the case where a given (sub)packet leaves both A and A′ half-vacant. In this case, each edge from the subpacket connects A to A′ with probability at least
$\frac{(Kn^{2/3}/2)^2}{(n − k)^2} ≥ \frac{K^2}{8 n^{2/3}},$
independently for each edge within a given (sub)packet, and hence in particular independently across all the s′k/2 edges we are adding in total. Consequently, the probability that no connection occurs during these s′k/2 trials is at most
$\Big(1 − \frac{K^2}{8n^{2/3}}\Big)^{s′k/2} ≤ \exp\Big(−\frac{K^2 D \log n}{16}\Big).$
For D > 0 sufficiently large this is less than n^{−3}.
Step 5. We are now ready to conclude that vertices are either in a small component at time s or connected at time s + s′. For v ∈ {1, . . . , n} let C_v(s) denote the component containing v at time s, and for v, v′ ∈ {1, . . . , n} write v ↔ v′ to indicate that v is connected to v′. Define the good event
$G(v, v′) = \{|C_v(s)| < β \log n\} ∪ \{|C_{v′}(s)| < β \log n\} ∪ \{v ↔ v′ \text{ at time } s + s′\}.$
Altogether we have just shown that P(G(v, v′)^c) ≤ 4n^{−3}: indeed, if |C_v(s)| and |C_{v′}(s)| both exceed β log n, then by (39) applied to each of v and v′, together with the sprinkling estimate above, v ↔ v′ at time s + s′ outside an event of probability at most 4n^{−3}. Hence we see by a union bound that, with probability tending to one, G(v, v′) holds simultaneously for all pairs v, v′. Let $V = \{v ∈ \{1, . . . , n\} : |C_v(s)| ≥ β \log n\}$. Then by (35), we know that |V|/n → θ(c) in probability as n → ∞. Moreover, all the vertices of V are connected with probability 1 − o(1) at time s + s′. Theorem 3.1 follows.

Poisson-Dirichlet structure
The renormalised cycle lengths X(σ) of a permutation σ ∈ S_n are the cycle lengths of σ divided by n, written in decreasing order. In particular, X(σ) takes values in
$Ω_∞ = \{(x_1, x_2, . . .) : x_1 ≥ x_2 ≥ · · · ≥ 0, \ \sum_i x_i ≤ 1\}.$
We equip Ω_∞ with the topology of pointwise convergence. If σ_n is uniformly distributed in S_n, then X(σ_n) → Z in distribution as n → ∞, where Z is known as a Poisson-Dirichlet random variable. It can be constructed as follows. Let U_1, U_2, . . . be i.i.d. uniform random variables on [0, 1], and set
$Z^*_1 = U_1, \qquad Z^*_i = U_i (1 − U_1) · · · (1 − U_{i−1}) \ \text{for } i ≥ 2.$
Then $\sum_i Z^*_i = 1$ almost surely, so (Z^*_1, Z^*_2, . . .) can be ordered in decreasing size, and the random variable Z has the same law as (Z^*_1, Z^*_2, . . .) ordered by decreasing size.
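The stick-breaking construction translates directly into code (a sketch; the truncation depth is an arbitrary choice, and the neglected mass after `depth` pieces is exponentially small in `depth`):

```python
import random

def poisson_dirichlet(rng, m=20, depth=200):
    """First m coordinates (in decreasing order) of the stick-breaking
    sequence Z*_i = U_i (1-U_1)...(1-U_{i-1}), truncated at `depth`
    pieces; the remaining mass is (1-U_1)...(1-U_depth)."""
    remaining = 1.0
    pieces = []
    for _ in range(depth):
        u = rng.random()
        pieces.append(u * remaining)
        remaining *= 1.0 - u
    return sorted(pieces, reverse=True)[:m]
```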
The next result is a generalisation of Theorem 1.1 in [22] to the case of general conjugacy classes. The proof is a simple adaptation of the proof of Schramm and we provide the details in an appendix.
Theorem 3.5. Suppose $s = s(n)$ is such that $sk/n \to c$ as $n \to \infty$ for some $c > c_\Gamma$. Then for any $m \in \mathbb{N}$ the first $m$ coordinates of $X(X_s)$ converge in distribution, as $n \to \infty$, to $\theta(c)(Z_1, \ldots, Z_m)$, where $Z = (Z_1, Z_2, \ldots)$ is a Poisson-Dirichlet random variable.
4 Proof of curvature theorem

Proof of the upper bound on curvature
We claim that it is enough to show the upper bound in (19) for $c > c_\Gamma$. Indeed, notice that $c \mapsto \kappa_c$ is increasing. Let $c \le c_\Gamma$ and assume that $\limsup_{n\to\infty} \kappa_{c'} \le \theta(c')^2$ holds for all $c' > c_\Gamma$. Then we have that $\limsup_{n\to\infty} \kappa_c \le \theta(c')^2$ for each $c' > c_\Gamma$. Taking $c' \downarrow c_\Gamma$ and using the fact that $\lim_{c' \downarrow c_\Gamma} \theta(c') = 0$ shows that $\lim_{n\to\infty} \kappa_c = 0$. Fix $c > c_\Gamma$ and let $t := \lfloor cn/|\Gamma| \rfloor$. We are left to show (19); in other words, we wish to prove the corresponding liminf lower bound for some $\sigma, \sigma' \in S_n$. We will choose $\sigma = id$ and $\sigma' = \tau_1 \circ \tau_2$, where $\tau_1, \tau_2$ are independent uniformly chosen transpositions.
To prove the lower bound on the Kantorovitch distance we use the dual representation of the distance $W_1(X, Y)$ between two random variables $X, Y$: namely, $W_1(X, Y) = \sup\{\mathbb{E}[f(X)] - \mathbb{E}[f(Y)] : f \text{ is 1-Lipschitz}\}$. Let $f(\sigma) = d(id, \sigma)$ be the distance to the identity (using only transpositions, as usual), and observe that $f$ is 1-Lipschitz. It thus suffices to show (42), which we now do by a coupling argument. Construct the two walks $X^{\tau_1 \circ \tau_2}$ and $X^{id}$ as follows. Let $\gamma_1, \gamma_2, \ldots$ be a sequence of i.i.d. random variables uniformly distributed on $\Gamma$, independent of $(\tau_1, \tau_2)$. Using Lemma 2.4 with $\sigma_0 = \tau_1 \circ \tau_2$, which is independent of $X^{id}$, we can construct the walk $X^{\tau_1 \circ \tau_2}$, and we then couple $X^{id}_t$ by constructing it from the same sequence of increments. We recall that a transposition can either induce a fragmentation or a coalescence of the cycles: a transposition involving two elements from the same cycle fragments that cycle, and one involving elements from two different cycles merges those cycles. (This property is the basic tool used in the probabilistic analysis of random transpositions; see e.g. [5] or [22].) Hence either $\tau_1$ fragments a cycle of $X_t$ or $\tau_1$ coagulates two cycles of $X_t$. In the first case $d(id, X_t \circ \tau_1) = d(id, X_t) - 1$, and in the second case $d(id, X_t \circ \tau_1) = d(id, X_t) + 1$. Let $F$ denote the event that $\tau_1$ causes a fragmentation. Using the Poisson-Dirichlet structure described in Theorem 3.5 it is not hard to show that $\mathbb{P}(F) \to \theta(c)^2/2$ (see, e.g., Lemma 8 in [3]). Applying the same reasoning to $X_t \circ \tau_1 \circ \tau_2$ and $X_t \circ \tau_1$, we deduce the claimed limit, from which the lower bound (43) and in turn (12) follow readily.
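The fragmentation/coagulation dichotomy, together with the standard identity $d(id, \sigma) = n - (\text{number of cycles of } \sigma)$, can be checked on a small example; the helper names below are our own.

```python
def cycle_count(perm):
    """Number of cycles (including fixed points) of a permutation
    given as a list: i -> perm[i]."""
    seen, count = set(), 0
    for i in range(len(perm)):
        if i not in seen:
            count += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return count

def times_transposition(perm, a, b):
    """Right-multiply by the transposition (a b): returns perm o (a b)."""
    new = list(perm)
    new[a], new[b] = perm[b], perm[a]
    return new

def dist_to_id(perm):
    """Transposition distance to the identity: n minus the number of cycles."""
    return len(perm) - cycle_count(perm)

sigma = [1, 2, 0, 4, 3, 5]              # cycles (0 1 2)(3 4)(5), so d = 6 - 3 = 3
frag = times_transposition(sigma, 0, 1)  # both points in one cycle: fragmentation
merg = times_transposition(sigma, 0, 3)  # points in two cycles: coagulation
```

Applying a transposition inside a cycle increases the cycle count by one (distance drops by 1); applying it across two cycles merges them (distance rises by 1).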

Proof of the lower bound on curvature
We now assume that $c > c_\Gamma$ and turn our attention to the lower bound on the Ricci curvature, which is the heart of the proof. Throughout we let $k = |\Gamma|$ and $t = \lfloor cn/k \rfloor$. With this notation in mind, we wish to prove (44) for some appropriate coupling of $X^\sigma$ and $X^{\sigma'}$, where the supremum is taken over all $\sigma, \sigma'$ at even distance. Note that we can make several reductions: first, by vertex transitivity we can assume that $\sigma = id$ is the identity permutation. Also, by the triangle inequality (since $W_1$ is a distance), we can assume that $\sigma' = (i, j) \circ (\ell, m)$ is the product of two distinct transpositions. There are two cases to consider: either the supports of the transpositions are disjoint, or they overlap in one vertex. We will focus in this proof on the first case, where the supports of the transpositions are disjoint; that is, $i, j, \ell, m$ are pairwise distinct. The other case is dealt with in very much the same way (and is in fact a bit easier). Clearly by symmetry $\mathbb{E}\,d(X^{id}_t, X^{\sigma'}_t)$ does not depend on the choice of $i, j, \ell$ and $m$, so long as they are pairwise distinct. Hence it is also equal to $\mathbb{E}\,d(X^{id}_t, X^{\tau_1 \circ \tau_2}_t)$ conditioned on the event $A$ that $\tau_1, \tau_2$ have disjoint supports, where $\tau_1$ and $\tau_2$ are independent uniform random transpositions. This event has overwhelming probability for large $n$, so it suffices to construct a coupling between $X^{id}$ and $X^{\tau_1 \circ \tau_2}$ for which the analogous bound holds; indeed, it then immediately follows that the same is true with the expectation replaced by the conditional expectation given $A$. Next, let $X$ be a random walk on $S_n$ which is the composition of i.i.d. uniform elements of the conjugacy class $\Gamma$. We decompose the random walk $X$ into a walk $\tilde X$ which evolves by applying transpositions at each step, as follows. For $t = 0, 1, \ldots$, write $X_t$ as a composition of $\gamma_1, \ldots, \gamma_t$, where $\gamma_1, \gamma_2, \ldots$ are i.i.d. uniformly distributed in $\Gamma$. As before, we decompose each step $\gamma_s$ of the walk into a product of $r$ cyclic permutations, where $r = \sum_{j \ge 2} k_j$.
The order of this decomposition is irrelevant and can be chosen arbitrarily; for concreteness, we start from the cycles of smaller sizes and progressively increase to cycles of larger sizes. We will further decompose each of these cyclic permutations into a product of transpositions: a cycle $c = (x_1, \ldots, x_j)$ is written as a product of $j - 1$ transpositions. This allows us to break any step $\gamma_s$ of the random walk $X$ into a number $\rho = \sum_{j \ge 2} (j-1)k_j$ of transpositions $\tau^{(1)}_s, \ldots, \tau^{(\rho)}_s$. Note that the vectors $(\tau^{(i)}_s; 1 \le i \le \rho)$ in (46) are independent and identically distributed for $s = 1, 2, \ldots$, and for a fixed $s$ and $1 \le i \le \rho$, $\tau^{(i)}_s$ is a uniform transposition, by symmetry. However, it is important to observe that they are not independent. Nevertheless, they obey a crucial conditional uniformity which we explain now. First we differentiate between the set of times when a new cycle starts and the set of times when we are continuing an old cycle: we say that $s$ is a refreshment time if the transposition being applied to $\tilde X$ at time $s$ is the start of a new cycle. Using this we can describe the law of the transpositions being applied to $\tilde X$. Note that in either case, the second marker $y$ is conditionally uniformly distributed among the vertices which have not been used so far. This conditional independence property is completely crucial, and allows us to make use of methods (such as that of Schramm [22], developed initially for random transpositions) for general conjugacy classes, so long as $|\Gamma| = o(n)$. Indeed, in that case the second marker $y$ itself is not very different from a uniform random variable on $\{1, \ldots, n\}$.
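One concrete choice of decomposition (any fixed convention works; the one below, with composition applied right to left, is our own choice for illustration) writes $(x_1, \ldots, x_j) = (x_1\,x_2)(x_2\,x_3)\cdots(x_{j-1}\,x_j)$, using exactly $j - 1$ transpositions, so a step with cycle type $(k_2, k_3, \ldots)$ uses $\rho = \sum_j (j-1)k_j$ of them.

```python
def transposition(n, a, b):
    """The transposition (a b) as a list on {0, ..., n-1}."""
    p = list(range(n))
    p[a], p[b] = b, a
    return p

def compose(f, g):
    """(f o g)(i) = f(g(i)): g is applied first."""
    return [f[g[i]] for i in range(len(g))]

def cycle_as_transpositions(n, xs):
    """Build (x1 x2)(x2 x3)...(x_{j-1} x_j), applied right to left."""
    out = list(range(n))
    for a, b in zip(xs, xs[1:]):      # exactly j - 1 transpositions
        out = compose(out, transposition(n, a, b))
    return out

def cycle_perm(n, xs):
    """The cycle (xs[0], ..., xs[-1]) directly: xs[i] -> xs[i+1]."""
    p = list(range(n))
    for a, b in zip(xs, xs[1:] + xs[:1]):
        p[a] = b
    return p
```

The two constructions agree, confirming that the product of the $j-1$ transpositions is the $j$-cycle.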
We will study this random walk using this new transposition time scale. We thus define a process $\tilde X = (\tilde X_u : u = 0, 1, \ldots)$ as follows. Let $u \in \{0, 1, \ldots\}$ and write $u = s\rho + i$, where $s, i$ are nonnegative integers and $i < \rho$. Then define $\tilde X_u = X_s \circ \tau^{(1)}_{s+1} \circ \cdots \circ \tau^{(i)}_{s+1}$. Thus it follows that for any $s \ge 0$, $\tilde X_{s\rho} = X_s$. Notice that $\tilde X$ evolves by successively applying transpositions with the above-mentioned conditional uniformity rules. Now consider our two random walks $X^{id}$ and $X^{\tau_1 \circ \tau_2}$, started respectively from $id$ and $\tau_1 \circ \tau_2$, and let $\tilde X^{id}$ and $\tilde X^{\tau_1 \circ \tau_2}$ be the associated processes constructed using (47), on the transposition time scale. Thus to prove (44) it suffices to construct an appropriate coupling between $\tilde X^{id}_{t\rho}$ and $\tilde X^{\tau_1 \circ \tau_2}_{t\rho}$. Next, recall that for a permutation $\sigma \in S_n$, $X(\sigma)$ denotes the renormalised cycle lengths of $\sigma$, taking values in $\Omega_\infty$ defined in (40). The walks $\tilde X^{id}$ and $\tilde X^{\tau_1 \circ \tau_2}$ are invariant under conjugacy and hence both are distributed uniformly on their conjugacy class. Thus ultimately it will suffice to couple $X(\tilde X^{id}_{t\rho})$ and $X(\tilde X^{\tau_1 \circ \tau_2}_{t\rho})$. Fix $\delta > 0$ and let $\Delta = \delta^{-9}$. Define the times $s_1 < s_2 < s_3$; our coupling consists of the three intervals $[0, s_1]$, $(s_1, s_2]$ and $(s_2, s_3]$.
Let us informally describe the coupling before we give the details. In what follows we will couple the random walks $\tilde X^{id}$ and $\tilde X^{\tau_1 \circ \tau_2}$ so that they keep their distance constant during the time intervals $[0, s_1]$ and $(s_2, s_3]$. In particular we will see that at time $s_1$ the walks $\tilde X^{id}$ and $\tilde X^{\tau_1 \circ \tau_2}$ will differ by two independent uniformly chosen transpositions. Thus at time $s_1$ most of the cycles of $\tilde X^{id}$ and $\tilde X^{\tau_1 \circ \tau_2}$ are identical, but some cycles may be different. We will show that if the cycles that differ at time $s_1$ are all reasonably large, then we can reduce the distance between the two walks to zero during the time interval $(s_1, s_2]$. Otherwise, if one of the differing cycles is not reasonably large, then we couple the two walks so as to keep their distance constant during each of the intervals $[0, s_1]$, $(s_1, s_2]$ and $(s_2, s_3]$. More generally, our coupling has the property that $d(X^{id}_t, X^{\tau_1 \circ \tau_2}_t)$ is uniformly bounded, so that it suffices to concentrate on events of high probability in order to get a bound on the Kantorovitch distance.

Coupling for [0, s 1 ]
First we describe the coupling during the time interval $[0, s_1]$. Let $\tilde X = (\tilde X_s : s \ge 0)$ be a walk with the same distribution as $\tilde X^{id}$, independent of the two uniform transpositions $\tau_1$ and $\tau_2$. Then by Lemma 2.4, for any $s \ge 0$, $\tilde X^{\tau_1 \circ \tau_2}_s$ has the same distribution as $\tilde X_s \circ \tau_1 \circ \tau_2$. Thus we can couple $X(\tilde X^{id}_{s_1})$ and $X(\tilde X^{\tau_1 \circ \tau_2}_{s_1})$ so that (48) holds.

Coupling for (s 1 , s 2 ]
For $s \ge 0$ define $\bar X_s = X(\tilde X^{id}_{s+s_1})$ and $\bar Y_s = X(\tilde X^{\tau_1 \circ \tau_2}_{s+s_1})$. Here we will couple $\bar X_s$ and $\bar Y_s$ for $s = 0, \ldots, \Delta$. We create a matching between $\bar X_s$ and $\bar Y_s$ by matching an element of $\bar X_s$ to at most one element of $\bar Y_s$ of the same size. At any time $s$ there may be several entries that cannot be matched. By parity the combined number of unmatched entries is even, and observe that this number cannot be equal to two. Now $\tilde X^{id}_{s_1}$ and $\tilde X^{\tau_1 \circ \tau_2}_{s_1}$ differ by two transpositions, as can be seen from (48). This implies in particular that initially (i.e., at the beginning of $(s_1, s_2]$) there are four, six or zero unmatched entries between $\bar X_0$ and $\bar Y_0$.
Fix $\delta > 0$ and let $A(\delta)$ denote the event that the smallest unmatched entry between $\bar X_0$ and $\bar Y_0$ has size greater than $\delta$. We will show that on the event $A(\delta)$ we can couple the walks such that $\bar X_\Delta = \bar Y_\Delta$ with high probability. On the complementary event $A(\delta)^c$, we couple the walks so that their distance remains constant during the time interval $(s_1, s_2]$, similarly to the coupling during $[0, s_1]$. It remains to define the coupling during the time interval $(s_1, s_2]$ on the event $A(\delta)$. We begin by estimating the probability of $A(\delta)$. Recall the hypergraph $H_{s_1/\rho}$ on $\{1, \ldots, n\}$ defined at the beginning of Section 3.1. Since $c > c_\Gamma$, $H_{s_1/\rho}$ has a (unique) giant component with high probability. Let $A_1$ be the event that the four points composing the transpositions $\tau_1, \tau_2$ fall within the largest component of the associated hypergraph $H_{s_1/\rho}$. It follows from Theorem 3.5 that conditionally on the event $A_1$, $A(\delta)$ has probability greater than $(1 - \delta)^4$. Also, since the relative size of the giant component converges in probability to $\theta(c)$ by Lemma 3.1, it is clear that $\mathbb{P}(A_1) \to \theta(c)^4$, and thus the lemma follows.
Recall that the transpositions which make up the walks $\tilde X^{id}$ and $\tilde X^{\tau_1 \circ \tau_2}$ obey what we called conditional uniformity in Proposition 4.1. For the duration of $(s_1, s_2]$ we will instead work under the relaxed conditional uniformity assumption, which we describe now: whether or not $s$ is a refreshment time, we take the second marker $y$ to be uniformly distributed on $\{1, \ldots, n\} \setminus \{x\}$. In making the relaxed conditional uniformity assumption we are disregarding the constraints on $(x, y)$ given in Proposition 4.1. However, the probability that we violate this constraint at any point during the interval $(s_1, s_2]$ is at most $2(s_2 - s_1)\rho/n = 2\Delta k/n$, and on the event that this constraint is violated the distance between the random walks can increase by at most $s_2 - s_1 = \Delta$. Hence we can without loss of generality assume that during the interval $(s_1, s_2]$ both $\tilde X^{id}$ and $\tilde X^{\tau_1 \circ \tau_2}$ satisfy the relaxed conditional uniformity assumption. Now we show that on the event $A(\delta)$ we can couple the walks such that $\bar X_\Delta = \bar Y_\Delta$ with high probability. The argument uses a coupling of Berestycki, Schramm and Zeitouni [6], itself a variant of a beautiful coupling introduced by Schramm [22]. We first introduce some notation. Let $\Omega_n$ be the set of nonincreasing sequences of nonnegative multiples of $1/n$ with sum equal to one. Notice that the walks $\bar X$ and $\bar Y$ both take values in $\Omega_n$.
Let us describe the evolution of the random walk $\bar X = (\bar X_s : s = 0, 1, \ldots)$. Suppose that $s \ge 0$ and $\bar X_s = \bar x = (x_1, \ldots, x_n)$. Now imagine the interval $(0, 1]$ tiled using intervals of lengths $x_1, \ldots, x_n$ (the specific tiling rule does not matter). Initially, for $s = 0$, we select $u \in \{1/n, \ldots, n/n\}$ uniformly at random and call the tile that $u$ falls into marked. Next, if $s \ge 1$ is not a refreshment time then we keep marked the tile which was marked in the previous step. Otherwise, if $s \ge 1$ is a refreshment time, we select a new marked tile by selecting $u \in \{1/n, \ldots, n/n\}$ uniformly at random and marking the tile which $u$ falls into.
Let $I$ be the marked tile. Select $v \in \{2/n, \ldots, n/n\}$ uniformly at random and let $I'$ be the tile that $v$ falls in. If $I' \ne I$ then we merge the tiles $I$ and $I'$, and the new tile we have created is now marked. If $I' = I$ then we split $I$ into two tiles, one of size $v - 1/n$ and the other of size $|I| - (v - 1/n)$; the tile of size $v - 1/n$ is now marked. Now $\bar X_{s+1}$ is given by the sizes of the tiles in the new tiling we have created, ordered in decreasing order.
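This split/merge rule can be sketched on integer tile sizes (in units of $1/n$); by convention, and as an assumption of this sketch only, the marked tile is kept first in the list, occupying the left end of the interval.

```python
def tiling_step(tiles, v_idx, n):
    """One move of the tiling dynamics. tiles: integer tile sizes (units of
    1/n) summing to n, with the marked tile listed first. v_idx is the marker
    v = v_idx/n, an integer in {2, ..., n}. Returns the new tiling, again
    with the marked tile first."""
    assert sum(tiles) == n and 2 <= v_idx <= n
    a = tiles[0]                      # size of the marked tile I
    if v_idx <= a:                    # v falls inside I: split
        left = v_idx - 1              # piece of size v - 1/n, newly marked
        return [left, a - left] + tiles[1:]
    # v falls in another tile I': locate it and merge it with I
    acc, j = a, 1
    while acc + tiles[j] < v_idx:
        acc += tiles[j]
        j += 1
    merged = a + tiles[j]
    return [merged] + [t for i, t in enumerate(tiles) if i not in (0, j)]
```

A split inside the marked tile increases the number of tiles by one; a merge decreases it by one, and in both cases the total mass $n$ is preserved.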
The evolution of $\bar X$ described above corresponds to the evolution of $X$ as follows. Suppose we apply the transposition $(x, y)$ to $\tilde X_s$ in order to obtain $\tilde X_{s+1}$. The marked tile at time $s$ corresponds to the cycle of $\tilde X_s$ containing $x$: if $s$ is a refreshment time then $x \in \{1, \ldots, n\}$ is chosen uniformly, and otherwise $x$ is the second marker from the previous step. We write the cycle containing $x$ as $(x, x_1, \ldots, x_m)$, so the point $x$ corresponds to $1/n$ in the tiling. Then we select the second marker $y \in \{1, \ldots, n\} \setminus \{x\}$ uniformly, which corresponds to the selection of the marker $v \in \{2/n, \ldots, n/n\}$.
Before we describe the coupling in detail, let us make a remark. In the course of the coupling several things may go wrong; for example, the size of the smallest unmatched component may become too small. We will estimate the probability of such unfortunate events and see that they tend to zero when we take $n \to \infty$ and then $\delta \to 0$. The coupling which we describe keeps the distance between the walks $\tilde X^{id}$ and $\tilde X^{\tau_1 \circ \tau_2}$ bounded by 4, hence we can safely ignore these unfortunate events.
We now recall the coupling of [6]. Let $s \ge 0$ and suppose that $\bar X_s = \bar x = (x_1, \ldots, x_n)$ and $\bar Y_s = \bar y = (y_1, \ldots, y_n)$. Then we can differentiate between the entries that are matched and those that are unmatched: recall that two entries from $\bar x$ and $\bar y$ are matched if they are of identical size. Our goal will be to create as many matched parts as possible, as quickly as possible. Let $Q$ be the total mass of the unmatched parts in either $\bar x$ or $\bar y$. When putting down the tilings $\tilde x$ and $\tilde y$ associated with $\bar x$ and $\bar y$ respectively, we will do so in such a way that all matched parts are at the right of the interval $(0, 1]$ and the unmatched parts occupy the left part of the interval. Initially, for $s = 0$, suppose that $u \in \{1/n, \ldots, n/n\}$ is chosen uniformly, and call the tile that $u$ falls into in each of $\tilde x$ and $\tilde y$ marked. As before, if $s \ge 1$ is not a refreshment time then we keep marked the tiles which were marked in the previous step; otherwise, if $s \ge 1$ is a refreshment time, we select new marked tiles in both $\tilde x$ and $\tilde y$ by selecting $u \in \{1/n, \ldots, n/n\}$ uniformly at random and marking the tiles which $u$ falls into in each of $\tilde x$ and $\tilde y$. Let $I_{\tilde x}$ and $I_{\tilde y}$ be the respective marked tiles of the tilings $\tilde x$ and $\tilde y$, and let $\hat x, \hat y$ be the tilings obtained by reordering $\tilde x, \tilde y$ so that $I_{\tilde x}$ and $I_{\tilde y}$ are put to the left of the interval $(0, 1]$. Let $a = |I_{\tilde x}|$ and $b = |I_{\tilde y}|$ be the respective lengths of the marked tiles, and assume without loss of generality that $a < b$. Let $v \in \{2/n, \ldots, n/n\}$ be chosen uniformly. We will apply $v$ to $\hat x$ as in the transition described above and obtain $\bar X_{s+1}$. We now describe how to construct another uniform random variable $v' \in \{2/n, \ldots, n/n\}$, which will be applied to $\hat y$. If $I_{\tilde x}$ is matched (which implies that $I_{\tilde y}$ is also matched) then we take $v' = v$, as in the coupling of Schramm [22]. In the case when $I_{\tilde x}$ is unmatched (which implies $I_{\tilde y}$ is also unmatched), the coupling of Schramm would again take $v' = v$; here we do not take them equal, and instead apply to $v$ a measure-preserving map $\Phi$, defined as follows.
For $w \in \{2/n, \ldots, n/n\}$ consider the map $\Phi$, where $\gamma_n := (an/2 - 1)/n$. It is not hard to check that $\Phi$ is measure preserving; thus, letting $v' = \Phi(v)$, we have that $v'$ has the correct marginal distribution.
If $v \notin I_{\tilde x}$ then we merge the tile containing $v$ and $I_{\tilde x}$; the new tile is now marked. If $v \in I_{\tilde x}$ we split the tile $I_{\tilde x}$ into two tiles, one of length $v - 1/n$ and one of length $a - (v - 1/n)$, and we mark the tile of length $v - 1/n$. Now $\bar X_{s+1}$ is given by the sizes of the tiles in the new tiling we have created, ordered in decreasing order. We obtain $\bar Y_{s+1}$ by the same procedure, but using $v'$ instead of $v$. We give an example of an evolution under this coupling in Figure 1.

Figure 1: The evolution under the coupling between $\bar X$ and $\bar Y$. The red entries represent the marked entries.

The somewhat remarkable property of this coupling is that the number of unmatched entries can only decrease. Unmatched entries disappear when they are coalesced; in particular they disappear quickly when their size is reasonably large. Hence it is particularly desirable to have a coupling in which unmatched components stay large. The second crucial property of this coupling is that it does not create arbitrarily small unmatched entries: even when an unmatched entry is fragmented,
the size of the smallest unmatched entry cannot decrease by more than a factor of two. This is summarised by the following, which is Lemma 19 from [6].

Lemma 4.3. Let $U$ be the size of the smallest unmatched entry in two partitions $\bar x, \bar y \in \Omega_n$, let $\bar x', \bar y'$ be the corresponding partitions after one transposition of the coupling, and let $U'$ be the size of the smallest unmatched entry in $\bar x', \bar y'$. Assume that $2^j/n \le U < 2^{j+1}/n$ for some $j \ge 0$. Then it is always the case that $U' \ge U/2 - 1/n$, and moreover,

We now explain our strategy. On $A(\delta)$ we expect the unmatched components to remain of size roughly of order at least $\delta$ for a while. In fact we will show that they stay at least as big as $O(\delta^2)$ for a long time. Unmatched entries disappear when they are merged together. If all unmatched entries are of size at least $\delta^2$, we will see that every 4 steps we have a chance, with probability at least $\delta^8$, to reduce the number of unmatched entries. Then a simple argument shows that after time $\Delta = \delta^{-9}$, $\bar X_\Delta$ and $\bar Y_\Delta$ are perfectly matched with probability tending to one as $\delta \to 0$.

Proof. Let $\delta_0 > 0$ be such that $(1 - \delta_0)^{9!} \ge \delta_0^{1/2}$ and assume that $\delta < \delta_0$; hence it is also true that $(1 - \delta)^{9!} \ge \delta^{1/2}$. Let $Z = (Z_1, \ldots)$ be a Poisson-Dirichlet random variable on $\Omega_\infty$ and let $(Z^*_1, \ldots)$ denote the size-biased ordering of $Z$. Recall that $Z^*_1$ is uniformly distributed over $[0, 1]$, $Z^*_2$ is uniformly distributed on $[0, 1 - Z^*_1]$, and so on. For the event $\{Z_1 \le \delta\}$ to occur it is necessary that $Z^*_1 \le \delta$, $Z^*_2 \le \delta/(1 - \delta)$, \ldots, $Z^*_{10} \le \delta/(1 - \delta)^9$. This has probability at most $\delta^{10}/(1 - \delta)^{9!}$. Note that since $\delta < \delta_0$, we have $(1 - \delta)^{9!} \ge \delta^{1/2}$.
Summing over $\Delta = O(\delta^{-9})$ steps, we see that the expected number of times during the interval $[0, \Delta]$ at which $\bar X_s$ or $\bar Y_s$ do not have a component of size at least $\theta(c)\delta$ is less than $\delta^{1/2}$ as $n \to \infty$, and is thus less than $2\delta^{1/2}$ for $n$ sufficiently large, by Theorem 3.5 (note that we can apply the result because this calculation involves only a finite number of components). The result follows.
We now check that all unmatched components really do stay greater than $\delta^2$ during $[0, \Delta]$. Let $T_\delta$ denote the first time $s$ at which either $\bar X_s$ or $\bar Y_s$ has no entry greater than $\delta\theta(c)$.
Proof. Say that a number $x$ is in scale $j$ if $2^j/n \le x < 2^{j+1}/n$. For $s \ge 0$, let $U(s)$ denote the scale of the smallest unmatched entry of $\bar X_s, \bar Y_s$. Let $j_0$ be the scale of $\delta$, and let $j_1$ be the integer immediately above the scale of $\delta^2$.
Suppose that for some time $s \le T_\delta$ we have $U(s) = j$ with $j_1 \le j \le j_0$, and that the marked tile at time $s$ corresponds to the smallest unmatched entry. Then after this transposition we have $U(s+1) \ge j - 1$ by the properties of the coupling (Lemma 4.3). Moreover, $U(s+1) = j - 1$ with probability at most $r_j = 2^{j+2}/n$. Furthermore, since $s \le T_\delta$, this marked tile merges with a tile of size at least $\theta(c)\delta$ with probability at least $\theta(c)\delta$ after the transposition. We call the first occurrence a failure and the second a mild success.
Once a mild success has occurred, there may still be a few other unmatched entries in scale $j$, but no more than five, since the total number of unmatched entries is non-increasing. Therefore, if six mild successes occur before a failure, we are guaranteed that $U(s+1) \ge j + 1$. We call such an event a good success, and note that the probability of a good success, given that $U(s)$ changes scale, is at least $p_j = 1 - 6r_j/(r_j + \theta(c)\delta)$. We set $q_j = 1 - p_j$.
Let $\eta_0 < \eta_1 < \cdots$ be the times at which the smallest unmatched entry changes scale, with $\eta_0$ being the first time the smallest unmatched entry is of scale $j_0$, and let $U_i$ denote the scale of the smallest unmatched entry at time $\eta_i$. Introduce a birth-death chain $(v_i)$ on the integers with $v_0 = j_0$ and transition probabilities given by (50) and (51). It is a consequence of the above observations that $(U_i)$ stochastically dominates $(v_i)$ until the first time scale $j_1$ is reached; let $\tau_j := \min\{i \ge 0 : v_i = j\}$. An analysis of the birth-death chain defined by (50), (51) (see, e.g., Theorem (3.7) in Chapter 5 of [10]), considering the 10 lowest terms in the resulting product (and noting that for $\delta > 0$ small enough there are at least 10 terms in this product), shows that $\mathbb{P}_{j_0}(\tau_{j_1} < \tau_{j_0})$ decays faster than $(16\delta/\theta(c))^{10}$. Since $T_\delta \wedge \Delta \le \Delta = O(\delta^{-9})$, we conclude that the probability that $U(s) = j_1$ before $T_\delta \wedge \Delta$ is at most $\delta(16/\theta(c))^{10}$.
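The exit-probability computation invoked here is the classical birth-death formula (as in the cited Theorem (3.7) of [10]); the function below is our own generic illustration, not the paper's specific chain.

```python
def exit_prob(p, a, x, b):
    """P_x(hit b before a) for a birth-death chain on {a, ..., b} which
    from state j steps to j+1 with probability p[j] and to j-1 otherwise.
    Classical formula: sum_{i < x-a} w_i / sum_{i < b-a} w_i, where
    w_0 = 1 and w_i = prod_{m <= i} q_{a+m}/p_{a+m}."""
    weights = [1.0]
    for j in range(a + 1, b):
        q = 1.0 - p[j]
        weights.append(weights[-1] * q / p[j])
    return sum(weights[: x - a]) / sum(weights)

# Symmetric chain: recovers the gambler's-ruin probability (x - a)/(b - a).
p_sym = {j: 0.5 for j in range(11)}
# Upward-biased chain with ratio q/p = 1/2: closed form (1-r^(x-a))/(1-r^(b-a)).
p_bias = {j: 2.0 / 3.0 for j in range(4)}
```

In the proof, the upward drift (good successes being much more likely than failures while $s \le T_\delta$) makes the analogous exit probability polynomially small in $\delta$.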
We are now going to prove that, on the event $A(\delta)$, after time $\Delta$ there are no unmatched entries with probability tending to one as $n \to \infty$ and $\delta \to 0$. The basic idea is that there are initially at most six unmatched parts, and this number cannot increase.

Proof. Suppose $\delta > 0$ is sufficiently small and condition throughout on the event $A(\delta)$. Let $T'_\delta$ be the first time one of the unmatched entries is smaller than $\delta^2$, or $T_\delta$, whichever comes first. By Lemma 4.5 and Lemma 4.6 we have that (52) holds for large $n$. Henceforth condition on the event $\{T'_\delta \ge \Delta\}$. Initially there are at most 6 unmatched entries. Due to parity there can be either 6, 4 or 0 unmatched entries (note in particular that 2 is excluded, as a quick examination shows that no configuration can give rise to two unmatched entries). Furthermore, by virtue of the coupling, the number of unmatched entries either remains the same or decreases. Once all the entries are matched they remain matched from then on. In order for the number of unmatched entries to decrease at time $s \in \{2, \ldots, \Delta\}$ it must be the case that both $\bar X_s$ and $\bar Y_s$ have at least 2 unmatched entries; call this a good configuration. Let $F_s$ be the event that at time $s$ the configuration is good and one of the two marked tiles at time $s$ is the smallest unmatched tile.
We now show that $\mathbb{P}(F_s) \ge \delta^4/2$ by considering different cases. • Suppose that at time $s - 1$ the configuration is good. Then placing the second marker ($v$ or $v'$) inside the smallest unmatched tile will guarantee that at time $s$ the configuration is still good. Suppose without loss of generality that $v$ lands in the smallest unmatched tile; then it could be the case that at time $s - 1$ the smallest unmatched tile was marked. In this case the smallest unmatched tile will fragment into two, the smaller of the two pieces will be matched, and the resulting tile on the left will be marked. If $a$ is the size of the smallest unmatched entry and $v \in [a/2, a]$, then both marked tiles at time $s$ will be unmatched and, furthermore, one of them will correspond to the smallest unmatched entry at time $s$. Hence the probability that $F_s$ holds in this case is at least $\delta^2/2$.
• Suppose that the configuration at time $s - 1$ is bad: that is, one copy has one unmatched entry and the other copy has either three or five unmatched entries. Suppose without loss of generality that $\bar X_{s-1}$ has one unmatched entry, which means that $\bar Y_{s-1}$ has at least three unmatched entries. To get to a good configuration at time $s$ it suffices to coagulate two of the unmatched entries of $\bar Y_{s-1}$ (as then automatically, by the properties of the coupling, the single unmatched entry in $\bar X_{s-1}$ fragments into two). In order for this to happen, the marked tiles at time $s - 1$ must be unmatched. We force the marked entries at time $s - 1$ to be unmatched as follows.
- If $s - 1$ is a refreshment time then we ask that the marker $u$ at time $s - 1$ falls inside an unmatched tile which is not the smallest unmatched tile. This happens with probability at least $\delta^2$.
- If $s - 1$ is not a refreshment time then we ask that the markers $v$ and $v'$ at time $s - 2$ fall inside an unmatched tile which is not the smallest unmatched tile. This happens with probability at least $\delta^2$. As before, once the markers $v$ and $v'$ fall inside unmatched tiles, the probability that the marked tile at time $s - 1$ is unmatched is $1/2$.
Suppose now that at time $s - 1$ the marked tiles are unmatched but neither is the smallest unmatched tile. If the marker $v$ or the marker $v'$ at time $s - 1$ falls inside the smallest unmatched tile then we are guaranteed that $F_s$ holds, and this happens with probability at least $\delta^2$. Hence we see that the probability that $F_s$ holds when the configuration at time $s - 1$ is bad is at least $\delta^4/2$.
We have just shown that $\mathbb{P}(F_s) \ge \delta^4/2$. Now suppose $F_s$ holds. With probability greater than $\delta^2$ one of the marked tiles at time $s$ is the smallest unmatched tile (in fact this probability is 1 if $s$ is not a refreshment time). Since there are at least 2 unmatched parts in each copy, let $R$ be the tile corresponding to a second unmatched tile in the copy that contains the larger of the two marked tiles. Then $|R| > \delta^2$, and moreover, when $v$ falls in $R$ we are guaranteed that a coagulation occurs in both copies, hence decreasing the total number of unmatched entries. Let $K_s$ denote this event and call this a success. Thus we have just shown that $\mathbb{P}(K_s \mid F_s) \ge \delta^4$.
Notice that for $s \in \{2, \ldots, \Delta\}$, the marked tiles and the markers $(v, v')$ used in the transition from time $s + 1$ to $s + 2$ are independent of $\mathcal{F}_s := \sigma((\bar X_\ell, \bar Y_\ell) : \ell \le s)$. Thus we can repeat the same argument as before to obtain that for any $s \in \{1, \ldots, \lfloor(\Delta - 1)/4\rfloor\}$ we have $\mathbb{P}(K_{4s} \cap F_{4s} \mid \mathcal{F}_{4s-2}) \ge \delta^8/2$. Hence the number of successes before time $\Delta$ stochastically dominates a random variable $H$ with the binomial distribution $\mathrm{Bin}(\lfloor(\Delta - 1)/4\rfloor, \delta^8/2)$. The event $\{\bar X_\Delta \ne \bar Y_\Delta\}$ implies that there has been at most one success, so for $\delta > 0$ small enough $\mathbb{P}(\bar X_\Delta \ne \bar Y_\Delta) \le \mathbb{P}(H \le 1)$. As $\Delta = O(\delta^{-9})$, the right-hand side converges to 0 as $\delta \downarrow 0$, and using (52) this finishes the proof.
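The final step can be checked numerically: with $m = \lfloor(\Delta-1)/4\rfloor$ trials of success probability $\delta^8/2$ and $\Delta = \delta^{-9}$, the mean number of successes is of order $1/(8\delta)$, so $\mathbb{P}(H \le 1)$ vanishes as $\delta \downarrow 0$. The following is a numerical sketch with our own function name.

```python
def prob_at_most_one(m, p):
    """P(H <= 1) for H ~ Bin(m, p): no success or exactly one success."""
    return (1 - p) ** m + m * p * (1 - p) ** (m - 1)

probs = []
for delta in (0.2, 0.1, 0.05):
    m = int((delta ** -9 - 1) / 4)   # number of 4-step blocks before Delta
    p = delta ** 8 / 2               # success probability per block
    probs.append(prob_at_most_one(m, p))
# probs decreases as delta shrinks, reflecting P(H <= 1) -> 0.
```

Here $mp \approx 1/(8\delta)$ grows without bound, which is exactly why at most one success becomes increasingly unlikely.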

Coupling for (s 2 , s 3 ]
The walks $\tilde X^{id}$ and $\tilde X^{\tau_1 \circ \tau_2}$ are uniformly distributed on their conjugacy classes. Thus, using Lemma 4.7, one can couple $\tilde X^{id}$ and $\tilde X^{\tau_1 \circ \tau_2}$ during $(s_2, s_3]$ so that the required properties hold, and the theorem follows immediately: using Lemma 4.8 we see that (44) holds, which finishes the proof.

A Lower bound on mixing
In this section we give a proof of the lower bound on $t_{mix}(\delta)$ for arbitrary $\delta \in (0, 1)$. This is for the most part a well-known argument, which shows that the number of fixed points at time $(1 - \epsilon)t_{mix}$ is large. In the case of random transpositions, or more generally of a conjugacy class $\Gamma$ such that $|\Gamma|$ is bounded, this follows easily from the coupon collector problem. When $|\Gamma|$ is allowed to grow with $n$, we present here a self-contained argument for completeness. Let $\Gamma \subset S_n$ be a conjugacy class and set $k = k(n) = |\Gamma|$.
Lemma A.1. We have that for any $\epsilon \in (0, 1)$,

Proof. Let $K_m \subset S_n$ be the set of permutations which have at least $m$ fixed points. Recall that $\mu$ is the invariant measure, which is the uniform probability measure on $S_n$ or $A_n$ depending on the parity of $\Gamma$; let $U$ denote the uniform measure on $S_n$. Fix $\beta > 0$, let $t_\beta := (n/k)\log(n/\beta)$, and assume that $\beta$ is such that $t_\beta$ is an integer. Consider for $1 \le i \le n$ the event $A_i$ that the $i$-th card is not collected by time $t_\beta$, that is, $i \notin \bigcup_{\ell=1}^{t_\beta} N(\gamma_\ell)$. Thus for $1 \le i_1 < \cdots < i_\ell \le n$ and $\ell \le n - k$, the probability $\mathbb{P}(A_{i_1} \cap \cdots \cap A_{i_\ell})$ can be written explicitly. Let $N = N(n) \in \mathbb{N}$ be increasing to infinity such that $N^2 = o(n)$ and $N = o(n^2 k^{-2})$. Then, by the inclusion-exclusion formula, writing out the fraction of binomials on the right-hand side and noting that $ne^{-t_\beta k/n} = \beta$, we can combine (55) and (56) to get (57). Let us lower bound the error term on the left-hand side of (57): since $(1 - \ell/n)^\ell \ge e^{-\ell^2/(n-\ell)}$, it is easy to see that the relevant factor converges to 1 as $n \to \infty$, and using this together with (57) a lower bound on the probability of many uncollected cards follows. For integers $a < b$, let $K_{[a,b]}$ denote the set of permutations having at least one fixed point $i$ with $a \le i < b$; a similar bound then holds likewise for any $j < n/N$. Let $\epsilon > 0$. Then for any $\beta > 0$, if $t = (1 - \epsilon)t_{mix}$ then $t < t_\beta$ for $n$ sufficiently large. But it is obvious that $\cap_{j=1}^m K_{[jN,(j+1)N]} \subset K_m$, and hence the bound holds for $t = (1 - \epsilon)t_{mix}$. Comparing with (53), the result follows.
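The choice of $t_\beta$ can be sanity-checked numerically: if each step touches a uniform set of $k$ labels (an idealisation of the support of $\gamma_\ell$), a given label is missed by all $t$ steps with probability $(1 - k/n)^t$, and $t_\beta = (n/k)\log(n/\beta)$ makes the expected number of missed labels approximately $\beta$. The numbers below are our own illustrative choices.

```python
import math

def expected_missed(n, k, t):
    """Expected number of labels in {1, ..., n} missed by t independent
    uniform k-subsets (each label is missed per step w.p. 1 - k/n)."""
    return n * (1 - k / n) ** t

n, k, beta = 10_000, 3, 2.0
t_beta = (n / k) * math.log(n / beta)   # chosen so that n * exp(-t*k/n) = beta
approx = n * math.exp(-t_beta * k / n)  # exactly beta, by construction
```

For $t$ slightly below $t_\beta$ the expected count exceeds $\beta$, which is the mechanism behind the lower bound on the mixing time.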

B Proof of Theorem 3.5
Let $\Gamma \subset S_n$ be a conjugacy class with cycle structure $(k_2, k_3, \ldots)$. Let $X = (X_t : t = 0, 1, \ldots)$ be a random walk on $S_n$ which at each step applies an independent uniformly random element of $\Gamma$. Let $\rho = \sum_j (j-1)k_j$ and let $\tilde X$ be the transposition walk associated to the walk $X$ using (47); in particular, for $t \ge 0$, $\tilde X_{t\rho} = X_t$. Finally, let $Z = (Z_1, Z_2, \ldots)$ denote a Poisson-Dirichlet random variable. For convenience we restate Theorem 3.5 here.
Theorem B.1. Let $s \ge 0$ be such that $sk/(n\rho) \to c$ for some $c > c_\Gamma$. Then for each $m \in \mathbb{N}$, the first $m$ coordinates of $X(\tilde X_s)$ converge in distribution, as $n \to \infty$, to $\theta(c)(Z_1, \ldots, Z_m)$, where $\theta(c)$ is given by (20).
The proof of this result is very similar to the proof of Theorem 1.1 in [22]. We give the details here.
Recall the hypergraph process $H = (H_t : t = 0, 1, \ldots)$ associated with the walk $X$ defined in Section 3.1. Analogously, let $\tilde G = (\tilde G_t : t = 0, 1, \ldots)$ be a process of graphs on $\{1, \ldots, n\}$ such that the edge $\{x, y\}$ is present in $\tilde G_t$ if and only if the transposition $(x, y)$ has been applied to $\tilde X$ prior to and including time $t$. Hence we have that for each $t = 0, 1, \ldots$, $\tilde G_{t\rho} = H_t$.
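The relation between the transposition walk and its graph process (each applied transposition $(x, y)$ contributes the edge $\{x, y\}$) can be illustrated with a union-find structure. The fact checked below, that every cycle of the resulting permutation lies inside a single component, is the standard observation underlying this construction; the specific transpositions are our own example.

```python
class DSU:
    """Union-find tracking the components of the graph process."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

n = 8
transpositions = [(0, 1), (2, 3), (1, 2), (4, 5), (5, 6)]
perm = list(range(n))
dsu = DSU(n)
for a, b in transpositions:      # apply (a b) on the right; add edge {a, b}
    perm[a], perm[b] = perm[b], perm[a]
    dsu.union(a, b)

# every cycle of perm is contained in a single component of the graph
for i in range(n):
    j, root = perm[i], dsu.find(i)
    while j != i:
        assert dsu.find(j) == root
        j = perm[j]
```

Here the walk ends at the permutation with cycles $(0\,1\,3\,2)(4\,5\,6)(7)$, and the graph has components $\{0,1,2,3\}$, $\{4,5,6\}$, $\{7\}$ containing them.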
Recall that $\tilde X$ satisfies conditional uniformity as described in Proposition 4.1. Using the graph process $\tilde G$ above and the conditional uniformity of $\tilde X$, the following lemma, which is the analogue of Lemma 2.4 in [22], follows almost verbatim from Schramm's arguments. For $t = 0, \ldots, \Delta$ define $\bar X_t = X(\tilde X_{s_0 + t})$. We can assume that for $t \le \Delta$, $\tilde X_{s_0 + t}$ satisfies the relaxed conditional uniformity assumption described in Definition 4.2: by making this assumption we are disregarding the constraint on the transpositions described in Proposition 4.1 applied to $\bar X_t$ for $t = s_0, \ldots, s$, but the probability that we violate this constraint is at most $2\Delta k/n$.
Colour an element of $\bar X_0 = X(\tilde X_{s_0})$ green if the cycle whose renormalised cycle length is this element lies in the giant component of $\tilde G_{s_0}$; we colour all the other elements of $\bar X_0$ red. Thus, asymptotically in $n$, the sum of the green elements is $\theta(c)$ and the sum of the red elements is $1 - \theta(c)$. In the evolution of $(\bar X_t : t = 0, 1, \ldots)$ we keep the colour scheme as follows. If an element fragments, both fragments retain its colour. If we coagulate two elements of the same colour then the new element retains the colour of the previous two elements; if we coagulate a green element and a red element, then the colour of the resulting element is green. Define $\hat X = (\hat X_t : t = 0, \ldots, \Delta)$ and $\check X = (\check X_t : t = 0, \ldots, \Delta)$ as follows. Initially $\hat X_0 = \check X_0 = \bar X_0$, and we apply the same colouring scheme to $\hat X$ and $\check X$ as we did to $\bar X$. The walks then evolve as follows.
• $\hat X_t$: evolves in the same way as $\bar X$, except that we ignore any transition which involves a red entry.
• $\check X_t$: evolves in the same way as $\hat X$, except that the markers $u, v$ used in the transitions are distributed uniformly on $[0, 1]$.
Lemma 3.1 states that the second largest component of $\tilde G_{s_0}$ has size $o(n)$. Hence, initially each red element has size $o(1)$ as $n \to \infty$. Now $\Delta$ does not increase with $n$; hence for any $s = 0, 1, \ldots, \Delta$, we are unlikely to make a coagulation (or fragmentation) in $\bar X_s$ without coagulating (or fragmenting) entries of $\hat X_s$ of similar size. Similar considerations for the processes $\hat X$ and $\check X$ lead to the following lemma.
We define a process $\bar Z = (\bar Z_t : t = 0, 1, \ldots, \sqrt{\Delta})$ as follows. Initially $\bar Z_0$ has the distribution of a Poisson-Dirichlet random variable, independent of $\bar Y$. Then for $t = 1, \ldots, \sqrt{\Delta}$, define $\bar Z_t$ by applying the coupling in Section 4.2.2 to $\bar Y$ and $\bar Z$, but with the following modifications:
• the markers $u, v \in [0, 1]$ are taken uniformly at random;
• we always take $v' = v$;
• we modify the definition of a refreshment time: $s$ is a refreshment time if either $J_{s-1} + 2 \le J_s$ or $J_s + s_0$ is a refreshment time in the sense of Definition 4.1;
• when a marked tile of size $a$ fragments, it creates a tile of length $v$ and a tile of length $a - v$, and we mark the tile of length $a - v$.
It is not hard to check that the Poisson-Dirichlet distribution is invariant under this evolution, and hence for each $t = 0, 1, \ldots, \sqrt{\Delta}$, $\bar Z_t$ has the law of a Poisson-Dirichlet random variable. Our coupling agrees with the coupling in [22, Section 3] when $\Gamma = T$ is the set of all transpositions: in this case every time $s$ is a refreshment time, and hence the marked tile at time $s$ is always chosen by the marker $u$. One can adapt the arguments in Chapter 3 of Schramm's paper to our case using the following idea. Note first that all the estimates of Schramm apply at time $s$ when $s$ is a refreshment time. When $s$ is not a refreshment time and Schramm considers the event that the marker $u$ at time $s$ falls inside an unmatched tile, we instead consider the event that the marker $v$ at time $s - 1$ falls inside an unmatched tile. By the properties of the coupling, this guarantees that at time $s$ the marked tile is unmatched.
Now we bound $\mathbb{P}(B_2^c)$. Firstly we use an elementary bound, after which we are left to bound $N_0$ from above. Using the stick-breaking construction of Poisson-Dirichlet random variables (see for example [