Cutoff for conjugacy-invariant random walks on the permutation group

We prove a conjecture raised by the work of Diaconis and Shahshahani (Z Wahrscheinlichkeitstheorie Verwandte Geb 57(2):159–179, 1981) about the mixing time of random walks on the permutation group induced by a given conjugacy class. To do this we exploit a connection with coalescence and fragmentation processes and control the Kantorovich distance by using a variant of a coupling due to Oded Schramm as well as contractivity of the distance. Recast in the language of Ricci curvature, our proof establishes the occurrence of a phase transition, which takes the following form in the case of random transpositions: at time $cn/2$, the curvature is asymptotically zero for $c \le 1$ and is strictly positive for $c > 1$.


Main results
Let $S_n$ denote the multiplicative group of permutations of $\{1, \ldots, n\}$. Let $\Gamma \subset S_n$ be a fixed conjugacy class in $S_n$, i.e., $\Gamma = \{g \gamma g^{-1} : g \in S_n\}$ for some fixed permutation $\gamma \in S_n$. Alternatively, $\Gamma$ is the set of permutations in $S_n$ having the same cycle structure as $\gamma$. Let $X^\sigma = (X_0, X_1, \ldots)$ be the discrete-time random walk on $S_n$ induced by $\Gamma$, started in the permutation $\sigma \in S_n$, and let $Y^\sigma$ be the associated continuous-time random walk. These are the processes defined by
$$X^\sigma_t = \sigma \circ \gamma_1 \circ \cdots \circ \gamma_t, \qquad Y^\sigma_t = X^\sigma_{N_t},$$
where $\gamma_1, \gamma_2, \ldots$ are i.i.d. random variables which are distributed uniformly in $\Gamma$, and $(N_t, t \ge 0)$ is an independent Poisson process with rate 1. Then $Y$ is a Markov chain which converges to an invariant measure $\mu$ as $t \to \infty$. If $\Gamma \subset A_n$ (where $A_n$ denotes the alternating group) then $\mu$ is uniformly distributed on $A_n$, and otherwise $\mu$ is uniformly distributed on $S_n$. The simplest and best-known example of a conjugacy class is the set $T$ of all transpositions, or more generally the set of all cyclic permutations of length $k \ge 2$. The set $T$ will play an important role in the rest of the paper. Note that $\Gamma$ depends on $n$ but we do not indicate this dependence in our notation.
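To make the definition of the walk concrete, the following is a minimal Python sketch of the discrete-time walk started from the identity. It is an illustration only: the function names (`random_class_element`, `compose`, `walk`), the representation of a permutation as a tuple of images on $\{0, \ldots, n-1\}$, and the convention of multiplying the new step on the right are all assumptions of the sketch. A uniform element of the class is obtained by placing a uniformly shuffled list of points into cycles of the prescribed lengths, which amounts to conjugating a fixed $\gamma$ by a uniform permutation.

```python
import random

def random_class_element(n, cycle_lengths, rng):
    """Uniform element of the conjugacy class of S_n whose non-trivial
    cycles have the given lengths (hypothetical helper for illustration)."""
    points = list(range(n))
    rng.shuffle(points)              # uniform relabelling = conjugation by a uniform g
    perm = list(range(n))            # start from the identity; unused points stay fixed
    pos = 0
    for length in cycle_lengths:
        cycle = points[pos:pos + length]
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            perm[a] = b              # a is mapped to the next point of its cycle
        pos += length
    return tuple(perm)

def compose(sigma, gamma):
    """Return sigma o gamma, i.e. i -> sigma(gamma(i))."""
    return tuple(sigma[g] for g in gamma)

def walk(n, cycle_lengths, steps, seed=0):
    """Run the discrete-time walk from the identity for the given number of steps."""
    rng = random.Random(seed)
    x = tuple(range(n))
    for _ in range(steps):
        x = compose(x, random_class_element(n, cycle_lengths, rng))
    return x
```

For instance, `walk(100, [2], 50)` runs 50 steps of random transpositions on $S_{100}$, and `walk(100, [2, 2, 3], 50)` uses the class with two 2-cycles and one 3-cycle.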
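The main goal of this paper is to study the cut-off phenomenon for the random walk $X$. More precisely, recall that the total variation distance $\|X - Y\|_{TV}$ between two random variables $X, Y$ taking values in a set $S$ is given by
$$\|X - Y\|_{TV} = \sup_{A \subset S} |P(X \in A) - P(Y \in A)|.$$
For $0 < \delta < 1$, the mixing time $t_{\mathrm{mix}}(\delta)$ is by definition given by
$$t_{\mathrm{mix}}(\delta) = \inf\{t \ge 0 : d_{TV}(t) \le \delta\}, \qquad \text{where } d_{TV}(t) = \sup_{\sigma \in S_n} \|Y^\sigma_t - \mu\|_{TV}$$
and $\mu$ is the invariant measure defined above.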
In the case where $\Gamma = T$ is the set of transpositions, a famous result of Diaconis and Shahshahani [10] is that the cut-off phenomenon takes place at time $(1/2) n \log n$ asymptotically as $n \to \infty$. That is, $t_{\mathrm{mix}}(\delta)$ is asymptotic to $(1/2) n \log n$ for any fixed value of $0 < \delta < 1$. It has long been conjectured that for a general conjugacy class such that $|\Gamma| = o(n)$ (where here and in the rest of the paper, $|\Gamma|$ denotes the number of non-fixed points of any permutation $\gamma \in \Gamma$), a similar result should hold at time $(1/|\Gamma|) n \log n$. This has been verified for $k$-cycles with a fixed $k \ge 2$ by Berestycki et al. [6]. This is a problem with a substantial history which will be detailed below. The primary purpose of this paper is to verify this conjecture. Hence our main result is as follows.

Theorem 1.1 Let $\Gamma \subset S_n$ be a conjugacy class and suppose that $|\Gamma| = o(n)$. Define
$$t_{\mathrm{mix}} := \frac{1}{|\Gamma|}\, n \log n.$$
Then for any $\epsilon > 0$,
$$\lim_{n \to \infty} d_{TV}((1 - \epsilon)\, t_{\mathrm{mix}}) = 1 \quad \text{and} \quad \lim_{n \to \infty} d_{TV}((1 + \epsilon)\, t_{\mathrm{mix}}) = 0. \tag{5}$$

The first limit of (5) is proved in Appendix A. The rest of the paper focuses on the second limit. Our main tool for this result is the notion of discrete Ricci curvature as introduced by Ollivier [18], for which we obtain results of independent interest. We briefly discuss this notion here; we point out that it turns out to be equivalent to the better-known path coupling method and transportation metric introduced by Bubley and Dyer [8] and Jerrum [14] (see for instance Chapter 14 of the book [16] for an overview). However, we will cast our results in the language of Ricci curvature because we find it more intuitive.

Recall first that the $L^1$-Kantorovich distance (sometimes also called Wasserstein or transportation metric) between two random variables $X, Y$ taking values in a metric space $(S, d)$ is given by
$$W_1(X, Y) = \inf E[d(\hat X, \hat Y)],$$
where the infimum is taken over all couplings $(\hat X, \hat Y)$ which are distributed marginally as $X$ and $Y$ respectively. Ollivier's definition of Ricci curvature of a Markov chain $(X_t, t \ge 0)$ on a metric space $(S, d)$ is as follows.

Definition 1.1 Let $t > 0$. The curvature between two points $x, x' \in S$ with $x \ne x'$ is given by
$$\kappa_t(x, x') := 1 - \frac{W_1(X^x_t, X^{x'}_t)}{d(x, x')}, \tag{7}$$
where $X^x_t$ and $X^{x'}_t$ denote Markov chains started from $x$ and $x'$ respectively. The curvature of $X$ is by definition equal to
$$\kappa_t := \inf_{x \ne x'} \kappa_t(x, x').$$

In the terminology of Ollivier [18], this is in fact the curvature of the discrete-time random walk whose transition kernel is given by $m_x(\cdot) = P(X_t = \cdot \mid X_0 = x)$. We refer the reader to [18] for an account of the elegant theory which can be developed using this notion of curvature, and point out that a number of classical properties of curvature generalise to this discrete setup.
For our results it will turn out to be convenient to view the symmetric group as a metric space equipped with the metric $d$ which is the word metric induced by the set $T$ of transpositions (we will do so even when the random walk is not induced by $T$ but by a general conjugacy class $\Gamma$). That is, the distance $d(\sigma, \sigma')$ between $\sigma, \sigma' \in S_n$ is the minimal number of transpositions one must apply to get from one element to the other (one can check that this number is independent of whether right-multiplications or left-multiplications are used).
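The word metric above can be computed explicitly. The sketch below relies on a classical fact which is not stated in the text: the minimal number of transpositions needed to write a permutation of $\{1, \ldots, n\}$ equals $n$ minus its number of cycles (fixed points counted as cycles), so that $d(\sigma, \sigma') = n - \#\mathrm{cycles}(\sigma^{-1} \sigma')$. The function names and the tuple representation on $\{0, \ldots, n-1\}$ are assumptions made for illustration.

```python
def count_cycles(perm):
    """Number of cycles of perm, counting fixed points as cycles of length 1."""
    seen = [False] * len(perm)
    cycles = 0
    for i in range(len(perm)):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:       # walk around the cycle containing i
                seen[j] = True
                j = perm[j]
    return cycles

def inverse(perm):
    """Inverse permutation: inverse(perm)[perm[i]] == i."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return tuple(inv)

def transposition_distance(sigma, sigma2):
    """Word distance d(sigma, sigma2) with respect to the set of all transpositions."""
    n = len(sigma)
    inv = inverse(sigma)
    quotient = tuple(inv[sigma2[i]] for i in range(n))   # sigma^{-1} o sigma2
    return n - count_cycles(quotient)
```

For example, a 3-cycle is at distance 2 from the identity, and two permutations have the same signature exactly when their distance is even.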
For simplicity we focus in this introduction on the case where the random walk is induced by the set of transpositions $T$. (A more general result will be stated later in the paper.) For $c > 0$ and $\sigma \ne \sigma'$, let
$$\kappa_c(\sigma, \sigma') := 1 - \frac{W_1(X^\sigma_{cn/2}, X^{\sigma'}_{cn/2})}{d(\sigma, \sigma')},$$
and define $\kappa_c(\sigma, \sigma) = 1$. That is, $\kappa_c(\sigma, \sigma') = \kappa_{cn/2}(\sigma, \sigma')$ with our notation from (7). In particular, $\kappa_c$ depends on $n$ but this dependency does not appear explicitly in the notation. It is not hard to see that $\kappa_c(\sigma, \sigma') \ge 0$ (apply the same transpositions to both walks $X^\sigma$ and $X^{\sigma'}$). For parity reasons it is obvious that $\kappa_c(\sigma, \sigma') = 0$ if $\sigma$ and $\sigma'$ do not have the same signature. Thus we only consider the curvature between elements at even distance. For $c > 0$ define
$$\kappa_c := \inf \kappa_c(\sigma, \sigma'),$$
where the infimum is taken over all $\sigma, \sigma' \in S_n$ such that $d(\sigma, \sigma')$ is even. Our main result states that $\kappa_c$ experiences a phase transition at $c = 1$. More precisely, the curvature $\kappa_c$ is asymptotically zero for $c \le 1$, but for $c > 1$ the curvature is asymptotically strictly positive. In order to state our result, we introduce the quantity $\theta(c)$, which is the largest solution in $[0, 1]$ to the equation
$$\theta(c) = 1 - e^{-c\theta(c)}.$$
It is easy to see that $\theta(c) = 0$ for $c \le 1$ and $\theta(c) > 0$ for $c > 1$. In fact, $\theta(c)$ is nothing else but the survival probability of a Galton-Watson tree with Poisson offspring distribution with mean $c$.
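The quantity $\theta(c)$ is easy to evaluate numerically. The following sketch, with the hypothetical function name `theta` and an arbitrary iteration count, iterates the map $x \mapsto 1 - e^{-cx}$ starting from $x = 1$. Since this map is increasing and sends 1 to a value below 1, the iterates decrease monotonically to the largest fixed point in $[0, 1]$. Convergence is geometric away from $c = 1$ and only polynomial at the critical value itself.

```python
import math

def theta(c, iterations=500):
    """Largest solution in [0, 1] of x = 1 - exp(-c x), by monotone iteration from 1.

    The iteration count is an arbitrary choice for this sketch; near c = 1
    many more iterations are needed for an accurate value.
    """
    x = 1.0
    for _ in range(iterations):
        x = 1.0 - math.exp(-c * x)
    return x
```

For $c = 2$ this gives a value close to $0.797$, while for $c = 1/2$ the iterates collapse to $0$, illustrating the transition at $c = 1$.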

Theorem 1.2
For any c > 0, we have: In particular, lim n→∞ κ c = 0 if and only if c ≤ 1, while lim inf n→∞ κ c > 0 otherwise.
A more general version of this theorem will be presented later on, which gives results for the curvature of a random walk induced by a general conjugacy class . This will be stated as Theorem 2. 3.
We believe that the upper bound is the sharp one here, and thus make the following conjecture. Of course the conjecture is already established for c ≤ 1 and so is only interesting for c > 1.

Relation to previous works on the geometry of random transpositions
The transition described by Theorem 1.2 says that the discrete Ricci curvature increases abruptly (asymptotically) from zero to a positive quantity as $c$ increases past the critical value $c = 1$, that is, as we consider longer portions of the random walk. It is related to a result proved by the first author in [2]. There it was shown that the triangle formed by the identity and two independent samples $X_t$ and $X'_t$ from the random walk run for time $t = cn/4$ is thin (in the sense of Gromov hyperbolicity) if and only if $c < 1$. Note that by reversibility, the path running from $X_t$ to $X'_t$ (via the identity) is a random walk run for time $cn/2$. In other words, the result from [2] implies that the permutation group appears Gromov hyperbolic from the point of view of a random walker so long as it takes fewer than $cn/2$ steps with $c < 1$.
Hence, in both Theorem 1.2 and [2], there is a change of geometry (as perceived by a random walker) from low to high curvature after running for exactly t = cn/2 steps with c = 1. At this point, we do not know of a formal way to relate these two observations, so they simply seem analogous. In a private conversation with the first author in 2005, Gromov had suggested that the hyperbolicity transition of [2] could be translated more canonically into the language of Ricci curvature and was an effect of the global positive curvature of S n rather than a breakdown in hyperbolicity. In a sense, Theorem 1.2 can be seen as a formalisation and justification of his prediction.

Relation to previous works on mixing times
The study of mixing times of Markov chains was initiated independently by Aldous [1] and by Diaconis and Shahshahani [10]. In particular, as already mentioned, Diaconis and Shahshahani proved Theorem 1.1 in the case where $\Gamma$ is the set $T$ of transpositions. Their proof relies on some deep connections with the representation theory of $S_n$ and bounds on so-called character ratios. The conjecture about the general case appears to have first been made formally in print by Roichman [20], but it has no doubt been asked privately before then. We shall see that the lower bound $t_{\mathrm{mix}}(\delta) \ge (1 + o(1))(1/|\Gamma|)\, n \log n$ is fairly straightforward (it is carried out in Appendix A and is as usual based on a coupon-collector type argument); the difficult part is the corresponding upper bound.
Flatto et al. [13] built on the earlier work of Vershik and Kerov [25] to obtain that $t_{\mathrm{mix}}(\delta) \le (1/2 + o(1))\, n \log n$ when $|\Gamma|$ is bounded (as is noted in [9, pp. 44-45]). This was done using character ratios, and this method was extended further by Roichman [20,21] to show an upper bound on $t_{\mathrm{mix}}(\delta)$ which is sharp up to a constant when $|\Gamma| = o(n)$ (and in fact, more generally when $|\Gamma|$ is allowed to grow to infinity as fast as $(1 - \delta) n$ for any $\delta \in (0, 1)$). Again using character ratios, Lulov and Pak [17] showed the cut-off phenomenon as well as $t_{\mathrm{mix}} = (1/|\Gamma|)\, n \log n$ in the case when $|\Gamma| \ge n/2$. Roussel [22,23] obtains the correct value of the mixing time and establishes the cut-off phenomenon for the case $|\Gamma| \le 6$.
Finally, let us discuss two more recent papers to which this work is most closely related. Berestycki et al. [6] show, using coupling arguments and a connection to coalescence-fragmentation processes, that the cutoff phenomenon occurs at $t_{\mathrm{mix}} = (1/k)\, n \log n$ in the case when $\Gamma$ consists only of cycles of length $k$ for any fixed $k \ge 2$.
Shortly after, Bormashenko [7] devised a path coupling argument for the coagulation-fragmentation process associated to random transpositions to obtain a new proof of a slightly weaker version of the Diaconis-Shahshahani result: her argument implies that the mixing time of random transpositions is O(n log n) (unfortunately the implicit multiplicative constant is not sharp, so this is not sufficient to obtain cutoff). See also [19] for another discussion of her results together with a reformulation in the language of coarse Ricci curvature. In a way her approach is very similar to ours, to the point that it can be considered a precursor to our work, since our method is also based on a certain path coupling for the coagulation-fragmentation process which exploits certain remarkable properties of Schramm's coupling [6,24].
Comparison with [6] The authors in Berestycki et al. [6] remark that their proof can be extended to cover the case when $\Gamma$ is a fixed conjugacy class and indicate that their methods can probably be pushed to cover the case when $|\Gamma| = o(n^{1/2})$, but it is clear that new ideas are needed if $|\Gamma|$ is larger. Indeed, their argument uses very delicate estimates about the behaviour of small cycles, together with a variant of a coupling due to Schramm [24] to deal with large cycles. The most technical part of their argument is to analyse the distribution of small cycles, using delicate couplings and carefully bounding the error made in these couplings.
However, when $k = |\Gamma|$ is larger than $n^{1/2}$, we can no longer think of the points in the conjugacy class as being sampled independently (with replacement) from $\{1, \ldots, n\}$, by the birthday problem. This introduces many more ways in which errors in the above coupling arguments could occur. These seem quite hard to control, and hence new ideas are required for the general case.
The proof in this paper relies on similar observations as [6], and in particular the connection with coalescence-fragmentation process as well as Schramm's coupling argument play a crucial role. The key new idea however, is to try to prove mixing not just in the total variation sense but in the stronger sense of the L 1 -Kantorovich distance (Ricci curvature) and to estimate it at a time well before the mixing time, roughly O(n/k) instead of O(n(log n)/k). This may seem counterintuitive initially, however studying the random walk at this time scale allows us to make precise comparisons between the random walk and an associated random graph process. It turns out the random graph at these time scales can be described rather precisely. Furthermore, due to the contraction properties of the Kantorovich distance, somehow (and rather miraculously, we find), the estimate we obtain can be bootstrapped with sufficient precision to yield mixing exactly at the time t mix = (1/k)n log n.
In particular, since the heart of the proof consists in studying the situation at a time well before mixing, and purely to take advantage of the giant component at such times, we never have to study the distribution of small cycles. This is really quite surprising, given that the small cycles (in particular, the fixed points) are responsible for the occurrence of the cutoff at time t mix .

Organisation of the paper
We stress that compared to [6], the main arguments are quite elementary. The heart of the proof is contained in Sects. 4.2 and 2. Readers who are familiar with [6] are encouraged to concentrate on these two short sections.
The paper is organised as follows. In Sect. 2 we state and discuss Theorem 2.3, which is a general curvature theorem (of which Theorem 1.2 is the prototype). We also discuss why this implies the main theorem (Theorem 1.1). In Sect. 3.1 we study the associated random hypergraph process. The main result in that section is Theorem 3.1, which proves the existence and uniqueness of the giant component. Curiously this is the most technical aspect of the paper, and really the only place where the myriad of ways in which the conjugacy class might be really big plays a role and needs to be controlled. Section 4 contains a proof of the main curvature theorem (Theorem 2.3), starting with the easy upper bound on curvature (Sect. 4.1) and following up with the slightly more complex lower bound (Sect. 4.2), which really is the heart of the proof. The two appendices contain respectively a proof of the lower bound on the mixing time (certainly known in the folklore, essentially a version of the coupon collector lemma); and an adaptation of Schramm's argument [24] for the Poisson-Dirichlet structure of cycles inside the giant component, which is needed in the proof.

Curvature theorem
As discussed above, the lower bound (5) is relatively easy and is probably known in the folklore; we give a proof in Appendix A. We now start the proof of the main result of this paper, which is the upper bound (the right hand side) of (5). In this section, we first state the more general version of Theorem 1.2 discussed in the introduction, and we will then show how this implies the desired result for the upper bound on $t_{\mathrm{mix}}(\delta)$. To begin, we define the cycle structure $(k_2, k_3, \ldots)$ of $\Gamma$ to be a vector such that for each $j \ge 2$, there are $k_j$ cycles of length $j$ in the cycle decomposition of any $\gamma \in \Gamma$ (note that this does not depend on the choice of $\gamma \in \Gamma$). Then $k_j = 0$ for all $j > n$ and we have that $k := |\Gamma| = \sum_{j=2}^{\infty} j k_j$.
In the case of the transposition random walk, the quantity $\theta(c)$ which appears in the bounds is the survival probability of a Galton-Watson process with offspring distribution given by a Poisson random variable with mean $c$. Our first task is to generalise $\theta(c)$. We do so via a fixed point equation, which is more complex here. Define
$$\alpha_j := \frac{j k_j}{k}, \qquad j \ge 2,$$
and note that $\alpha_j \in [0, 1]$ ($\alpha_j$ is the proportion of the mass in cycles of size $j$ for any $\gamma \in \Gamma$). Thus $(\alpha_j)_{j \ge 2}$ takes values in $[0, 1]^\infty$, which is compact in the product topology (the topology of pointwise convergence). Suppose that the limit
$$\bar\alpha_j := \lim_{n \to \infty} \alpha_j, \qquad j \ge 2, \tag{C}$$
exists, where the limit is taken to be pointwise. It follows that for each $j \ge 2$, $\bar\alpha_j \in [0, 1]$ and $\sum_{j=2}^{\infty} \bar\alpha_j \le 1$ by Fatou's lemma. Note that the sum is strictly less than 1 when a positive fraction of the mass of the conjugacy class comes from cycles whose size tends to $\infty$. This will be an important distinction in what follows. For $x \in [0, 1]$ and $c > 0$ define
$$\psi(x, c) := \exp\Big\{-c\Big(1 - \sum_{j=2}^{\infty} \bar\alpha_j (1 - x)^{j-1}\Big)\Big\}. \tag{11}$$
Note that for each $c > 0$, $z \mapsto \psi(1 - z, c)$ is the generating function of a random variable whose law depends on $c$ and is degenerate if $\sum_{j \ge 2} \bar\alpha_j < 1$. Note that in the case $\Gamma = T$ of transpositions, $\psi(x, c) = e^{-cx}$, so that this random variable is simply Poisson($c$).
Next suppose that ∞ j=2ᾱ j = 1, then Hence for c > c we have that d dx f c (x)| x=0 > 0 and again by concavity it follows that there exists a unique θ(c) ∈ (0, 1) such that f c (θ (c)) = 0.
For the rest of the statements suppose that $c > c_\Gamma$. The fact that $c \mapsto \theta(c)$ is increasing follows from the definition of $\psi(x, c)$ and the fact that $\theta(c) = 1 - \psi(\theta(c), c)$. Continuity and differentiability for $c > c_\Gamma$ is a straightforward application of the inverse function theorem.
Notice that $\theta(c) \in [0, 1]$ and is monotone, hence $\theta(c)$ converges as $c \downarrow c_\Gamma$ to a limit $L$. Then it follows that $L$ solves the equation $L = 1 - \psi(L, c_\Gamma)$. This equation has only the zero solution, thus $L = 0$ and hence $\lim_{c \downarrow c_\Gamma} \theta(c) = 0$. The limit as $c \uparrow \infty$ follows from a similar argument.

Remark 2.2
In the case when $\Gamma = T$ is the set of transpositions we have that $k_2 = 1$ and $\bar\alpha_j = 0$ for $j \ge 3$, hence $\psi(x, c) = e^{-cx}$ and thus the definition of $\theta(c)$ above agrees with the definition given in the introduction.
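Having introduced $\theta(c)$ we now introduce the notion of Ricci curvature we will use in the general case. For $c > 0$ and $\sigma \ne \sigma'$, let
$$\kappa_c(\sigma, \sigma') := 1 - \frac{W_1(X^\sigma_{cn/k}, X^{\sigma'}_{cn/k})}{d(\sigma, \sigma')},$$
where $d$ is the graph distance associated with transpositions (even in the case $\Gamma \ne T$). Define $\kappa_c(\sigma, \sigma) = 1$. Then let
$$\kappa_c := \inf \kappa_c(\sigma, \sigma'),$$
where the infimum is taken over all $\sigma, \sigma' \in S_n$ such that $d(\sigma, \sigma')$ is even. That is, $\kappa_c(\sigma, \sigma') = \kappa_{cn/k}(\sigma, \sigma')$ with our notation from (7). We now state a more general form of Theorem 1.2, which contains Theorem 1.2 as the special case $\Gamma = T$.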

Theorem 2.3
Let ⊂ S n be a conjugacy class such that k = | | = o(n) and the convergence in (C) holds. Recall the definition of c from (12). Then for c ≤ c , On the other hand, for c > c where θ(c) is the unique solution in (0, 1) of where ψ is given by (11).

Curvature implies mixing
We now show how Theorem 2.3 implies the second limit in Theorem 1.1. First suppose that = (n) is a sequence of conjugacy classes for which the limit (C) holds and | | = o(n). Again fix > 0 and define t = (1 + 2 )(1/k)n log n and let where the sup is taken over all permutations at even distances. We first claim that it suffices to prove thatd Indeed, assume thatd T V (t ) → 0 as n → ∞. Then there are two cases to consider. Assume that ⊂ A n . Then X s ∈ A n for all s ≥ 1 and μ is uniform on A n . Then by Lemma 4.11 in [16], sup Hence Theorem 1.1 (or more precisely the second limit in that theorem) follows from (17) in this case. In the second case, ⊂ A c n . In this case X s ∈ A n for s even, and X s ∈ A c n for s odd. Using the same lemma, we deduce that if s ≥ t is even, where μ 1 is uniform on A n . However, if s ≥ t is odd, where this time μ 2 is uniform on A c n . Let N = (N s : s ≥ 0) be the Poisson clock of the random walk Y . Then P(N s even) → 1/2 as s → ∞, μ = (1/2)(μ 1 + μ 2 ), and P(N t ≥ t ) → 1 as n → ∞. Thus we deduce that Again, the second limit in Theorem 1.1 follows. Hence it suffices to prove (17).
Note that for any two random variables $X, Y$ on a metric space $(S, d)$ we have the obvious inequality $\|X - Y\|_{TV} \le W_1(X, Y)$ provided that $x \ne y$ implies $d(x, y) \ge 1$ on $S$. This is in particular the case when $S = S_n$ and $d$ is the word metric induced by the set $T$ of transpositions. In other words it suffices to prove mixing in the $L^1$-Kantorovich distance.
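Note that by definition of $\kappa_c$, if $\sigma, \sigma'$ are at an even distance then
$$W_1(X^\sigma_{cn/k}, X^{\sigma'}_{cn/k}) \le (1 - \kappa_c)\, d(\sigma, \sigma'),$$
so that, iterating as in Corollary 21 of [18] (and noting that the distance between $X^\sigma_{cn/k}$ and $X^{\sigma'}_{cn/k}$ is again even), we have for each $s \ge 1$,
$$W_1(X^\sigma_{s cn/k}, X^{\sigma'}_{s cn/k}) \le (1 - \kappa_c)^s\, d(\sigma, \sigma').$$
Now, Theorem 2.3 gives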

Lemma 2.4 We have that
Proof Using L'Hopital's rule twice we have that Next we have that lim c→∞ θ(c) = 1 and hence Consequently we have that for u ≥ t = (1 + )(1/k)n log n u satisfies (20) for some sufficiently large c > c . Hence lim sup n→∞d T V (t ) → 0 and thus (17) holds, which shows Theorem 1.1 for conjugacy classes such that the limit in (C) exists and | | = o(n). Now suppose that is a conjugacy class such that | | = o(n). Let t = (1 + )(1/| |)n log n and notice that d T V (t ) is bounded. Along any subsequence {n i } i≥1 such that lim n i →∞ d T V (t ) exists, we can extract a further sub-sequence {n i j } j≥1 such that (C) holds since (α j ) j≥2 ∈ [0, 1] ∞ which is compact under the product topology. Then we see that lim n i j →∞ d T V (t ) = 0 and consequently lim n i →∞ d T V (t ) = 0. Since d T V (t ) is bounded and converges to 0 along any convergent subsequence, we conclude that lim n→∞ d T V (t ) = 0, thus concluding the proof.

Stochastic commutativity
To conclude this section on curvature, we state a simple but useful lemma. Roughly, this says that the random walk is "stochastically commutative". This can be used to show that the L 1 -Kantorovich distance is decreasing under the application of the heat kernel. In other words, initial discrepancies for the Kantorovich metric between two permutations are only smoothed out by the application of random walk.

Lemma 2.5 Let $\sigma$ be a random permutation with distribution invariant by conjugacy. Let $\sigma_0$ be a fixed permutation. Then $\sigma_0 \circ \sigma$ has the same distribution as $\sigma \circ \sigma_0$.

Proof Define $\sigma' = \sigma_0^{-1} \circ \sigma \circ \sigma_0$. Then since $\sigma$ is invariant under conjugacy, the law of $\sigma$ is the same as the law of $\sigma'$. Furthermore, we have $\sigma_0 \circ \sigma' = \sigma \circ \sigma_0$, so the result is proved.

This lemma will be used repeatedly in our proof, as it allows us to concentrate on events of high probability for our coupling.
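The statement of the lemma can be checked exhaustively on a small example. The sketch below is an illustration under assumptions of its own: permutations are tuples on $\{0, \ldots, n-1\}$, the random permutation is taken uniform on a single conjugacy class, and the helper names are hypothetical. It compares the multisets $\{\sigma_0 \circ \gamma\}$ and $\{\gamma \circ \sigma_0\}$ as $\gamma$ ranges over the class; equality of these multisets is exactly equality in law for uniform $\gamma$.

```python
from collections import Counter
from itertools import permutations

def compose(a, b):
    """Return a o b, i.e. i -> a(b(i))."""
    return tuple(a[x] for x in b)

def cycle_type(perm):
    """Sorted tuple of cycle lengths of perm, fixed points included."""
    seen, lengths = set(), []
    for i in range(len(perm)):
        if i not in seen:
            length, j = 0, i
            while j not in seen:
                seen.add(j)
                j = perm[j]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths))

def same_law(n, target_type, sigma0):
    """Check that sigma0 o gamma and gamma o sigma0 have the same law
    when gamma is uniform on the class of S_n with the given cycle type."""
    cls = [p for p in permutations(range(n)) if cycle_type(p) == target_type]
    left = Counter(compose(sigma0, g) for g in cls)
    right = Counter(compose(g, sigma0) for g in cls)
    return left == right
```

For instance, `same_law(4, (1, 3), (1, 0, 3, 2))` tests the class of 3-cycles in $S_4$ against the fixed permutation $(0\,1)(2\,3)$. The enumeration is only feasible for very small $n$.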

Preliminaries on random hypergraphs
For the proof of Theorem 1.1 we rely on properties of certain random hypergraph processes. The reader who is interested, in the first instance, only in the case of random transpositions, and who is familiar with Erdős-Rényi random graphs and with the result of Schramm [24], may safely skip this section.

Hypergraphs
In this section we present some preliminaries which will be used in the proof of Theorem 2.3. Throughout we let $\Gamma \subset S_n$ be a conjugacy class and let $(k_2, k_3, \ldots)$ denote the cycle structure of $\Gamma$. Thus $\Gamma$ consists of permutations whose cycle decomposition has $k_2$ transpositions, $k_3$ 3-cycles and so on. Note that we have suppressed the dependence of $\Gamma$ and $(k_2, k_3, \ldots)$ on $n$. We assume that (C) is satisfied so that for each $j \ge 2$, $j k_j / |\Gamma| \to \bar\alpha_j$ as $n \to \infty$. We also let $k = |\Gamma|$ so that $k = \sum_{j \ge 2} j k_j$, as usual.

Definition 3.1 A hypergraph $H = (V, E)$ is given by a set $V$ of vertices and a set $E \subset \mathcal{P}(V)$ of edges, where $\mathcal{P}(V)$ denotes the set of all subsets of $V$. An element $e \in E$ is called a hyperedge and we call it a $j$-hyperedge if $|e| = j$.
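Consider the random walk $X = (X_t : t = 0, 1, \ldots)$ on $S_n$ where $X_t = X^{\mathrm{id}}_t$ with our notation from the introduction. Hence $X_t = \gamma_1 \circ \cdots \circ \gamma_t$, where the $\gamma_i$ are i.i.d. and uniform on $\Gamma$. A given step of the random walk, say $\gamma_s$, can be broken down into cycles, say $\gamma_s = \gamma_{s,1} \circ \cdots \circ \gamma_{s,r}$ where $r = \sum_j k_j$. We will say that a given cyclic permutation $\gamma$ has been applied to $X$ before time $t$ if $\gamma = \gamma_{s,i}$ for some $s \le t$ and $1 \le i \le r$.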
To $X$ we associate a certain hypergraph process $H = (H_t : t = 0, 1, \ldots)$ defined as follows. For $t = 0, 1, \ldots$, $H_t$ is a hypergraph on $\{1, \ldots, n\}$ where a hyperedge $\{x_1, \ldots, x_j\}$ is present if and only if a cyclic permutation consisting of the points $x_1, \ldots, x_j$ in some arbitrary order has been applied to the random walk $X$ as part of one of the $\gamma_i$ for some $i \le t$. Thus at every step, we add to $H_t$, for each $j \ge 2$, $k_j$ hyperedges of size $j$ sampled uniformly at random without replacement, and these edges are independent from step to step. However, note that the presence of the hyperedges themselves is not in general independent.
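The hypergraph process is straightforward to simulate, since only the vertex sets of the cycles matter for connectivity. The following sketch is a minimal illustration, not part of the argument: the function name `largest_component`, the labelling of vertices by $\{0, \ldots, n-1\}$ and the use of a union-find structure are assumptions of the sketch. At each step it draws $k$ distinct points, cuts them into hyperedges of the prescribed sizes and merges the corresponding components.

```python
import random

def largest_component(n, cycle_lengths, steps, seed=0):
    """Size of the largest component of the hypergraph after the given number
    of steps, for a class whose non-trivial cycles have the given lengths."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra                     # attach the smaller tree to the larger
        size[ra] += size[rb]

    k = sum(cycle_lengths)
    for _ in range(steps):
        points = rng.sample(range(n), k)    # distinct points: sampling without replacement
        pos = 0
        for length in cycle_lengths:
            edge = points[pos:pos + length]
            for v in edge[1:]:
                union(edge[0], v)           # all vertices of a hyperedge are connected
            pos += length
    return max(size[find(v)] for v in range(n))
```

With `cycle_lengths=[2]` this is a simulation of the random graph obtained by adding one uniform edge per step, and running it for about $cn/2$ steps with $c$ on either side of 1 gives a rough picture of the emergence of a giant component.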

Giant component of the hypergraph
In the case $\Gamma = T$, the set of transpositions, the hypergraph $H_s$ is a realisation of an Erdős-Rényi graph. By analogy with Erdős-Rényi graphs, we first present a result about the size of the components of the hypergraph process $H = (H_t : t = 0, 1, \ldots)$ (where by size, we mean the number of vertices in the component). For the next result recall the definition of $\psi(x, c)$ in (11). Recall that for $c > c_\Gamma$, where $c_\Gamma$ is given by (12), there exists a unique root $\theta(c) \in (0, 1)$ of the equation $\theta(c) = 1 - \psi(\theta(c), c)$.

Theorem 3.1 Consider the random hypergraph $H_s$ and suppose that $s = s(n)$ is such that $sk/n \to c$ as $n \to \infty$ for some $c > c_\Gamma$. Then there is a universal constant $D > 0$ such that with probability tending to one all components but the largest have size at most $D n^{2/3} (\log n)^3$. Furthermore, the size of the largest component, normalised by $n$, converges to $\theta(c)$ in probability as $n \to \infty$.
Of course, this is the standard Erdős-Rényi theorem in the case where $\Gamma = T$ is the set of transpositions. See for instance [12], in particular Theorem 2.3.2, for a proof. In the case of $k$-cycles with $k$ fixed and finite, this is the case of random regular hypergraphs analysed by Karoński and Łuczak [15]. For the slightly more general case of bounded conjugacy classes, this was proved by Berestycki [4].
Discussion Note that the behaviour of $H_s$ in Theorem 3.1 can deviate markedly from that of Erdős-Rényi graphs. The most obvious difference is that $H_s$ can contain mesoscopic components, something which has of course negligible probability for Erdős-Rényi graphs. For example, suppose $\Gamma$ consists of $n^{1/2}$ transpositions and one cycle of length $n^{1/3}$. Then the giant component appears at time $n^{1/2}/2$ with a phase transition (i.e., $c_\Gamma > 0$, because in this case $\sum_j \bar\alpha_j = 1$, as most of the mass comes from microscopic cycles). Yet even at the first step there is a component of size $n^{1/3}$. Nevertheless we will see that once there is a giant component there is a limit to how big the non-giant components can be (we show this is less than $O(n^{2/3})$ up to logarithmic terms; this is certainly not optimal).
From a technical point of view this has nontrivial consequences, as proofs of the existence of a giant component are usually based on the dichotomy between microscopic components and giant components. Furthermore, when the conjugacy class is large and consists of many small or mesoscopic cycles, the hyperedges have a strong dependence, which makes the proof very delicate.
In effect, perhaps surprisingly, this will be the only place in the proof where all the possible ways in which the conjugacy class might be big (potentially of size very close to $n$) need to be handled. The difficulty of the proof below is to find an argument which works no matter how $\Gamma$ is made up, so long as $k = |\Gamma| = o(n)$. This is of course also the problem in the original question of studying the mixing time of the random walk induced by $\Gamma$. However, what we have gained here compared to this original question is the monotonicity of component sizes when hyperedges are added to $H_s$.
Preliminaries: exploration Suppose that s = s(n) is such that sk/n → c for some c > 0 as n → ∞ for some c ≥ 0. We reveal the vertices of the component containing a fixed vertex v ∈ {1, . . . , n} using breadth-first search exploration, as follows. There are three states that each vertex can be: unexplored, removed or active. Initially v is active and all the other vertices are unexplored. At each step of the iteration we select an active vertex w according to some prescribed rule among the active vertices at this stage (say with the smallest label). The vertex w becomes removed and every unexplored vertex which is joined to w by a hyperedge becomes active. We repeat this exploration procedure until there are no more active vertices. At stage i = 0, 1, . . . of this exploration process, we let A i , R i and U i denote the set of active, removed and unexplored vertices respectively. Thus initially A 0 = {v}, U 0 = {1, . . . , n}\{v} and R 0 = ∅. We will let a i = |A i |, For t = 1, . . . , s we call the hyperedges which are associated with the permutation γ t the t-th packet of hyperedges. Thus note that each packet consists of k j hyperedges of size j, j ≥ 2, which are sampled uniformly at random without replacement from {1, . . . , n}. In particular, within a given packet, hyperedges are not independent. However, crucially, hyperedges from different packets are independent. We will need to keep track of the hyperedges we reveal and where they "came from" (i.e., which packet they were part of), in order to deal with these dependencies. More precisely, as we explore the hypergraph H s , we discover various hyperedges of various sizes in H s and this may affect the likelihood of other types of hyperedges in subsequent steps of the exploration process. To account for this, we introduce for t = 1, . . . , s and for j ≥ 2, the random subset of {1, . . . 
, n}, Y (t) j (i), which is defined to be the hyperedges of size j in the t-th packet that were revealed in the exploration process prior to step i. We let y (t) Additional notations Let i ≥ 0 and let H i denote the filtration generated by the exploration process up to stage i, including the information of the number of hyperedges of each size in each packet that were revealed up to step i of the exploration process. That is, Our first goal will be to give uniform stochastic bounds on the distribution of a i+1 −a i , so long as i is not too large. We will thus fix i (a step in the exploration process) and in order to ease notations we will often suppress the dependence on i, in Y where the right hand side counts the total number of vertices explored by stage i, while the left hand side counts the sum of the sizes of all hyperedges revealed by stage i, so the ≥ sign accounts for possible intersections between the hyperedges. Let w be the vertex being explored for stage i + 1. For t = 1, . . . , s let M t be the indicator that w is part of an (unrevealed) hyperedge in the t-th packet. Thus, (M t ) 1≤t≤s are independent conditionally given H i , and j counts the number of hyperedges of size j still unrevealed in the t-th packet. If w is part of a hyperedge in the t-th packet, let V t be the size of the (unique) hyperedge of that packet containing it. Then Note that when M t = 1 it implies that the denominator above is non-zero and thus (23) is well defined. When M t = 0 we simply put V t = 1 by convention. Then we have the following almost sure inequality: This would be an equality if it were not for possible self-intersections, as hyperedges connected to w coming from different packets may share several vertices in common.
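The exploration procedure described above (active, removed and unexplored vertices, with the active vertex of smallest label explored at each stage) can be sketched as follows for a given deterministic list of hyperedges. This is an illustration only: the function name `explore_component`, the input format (a list of tuples of vertices in $\{0, \ldots, n-1\}$) and the returned values are assumptions of the sketch, and the bookkeeping of packets and of the random variables $M_t$, $V_t$ is deliberately left out.

```python
def explore_component(n, hyperedges, v):
    """Explore the component of v; return the set of removed vertices and
    the list (a_0, a_1, ...) of sizes of the active set after each stage."""
    incident = {u: [] for u in range(n)}     # hyperedges containing each vertex
    for e in hyperedges:
        for u in e:
            incident[u].append(e)

    active, removed = {v}, set()
    unexplored = set(range(n)) - {v}
    active_sizes = [1]                       # a_0 = 1
    while active:
        w = min(active)                      # prescribed rule: smallest label
        active.remove(w)
        removed.add(w)                       # w becomes removed
        for e in incident[w]:
            for u in e:
                if u in unexplored:          # newly discovered vertices become active
                    unexplored.remove(u)
                    active.add(u)
        active_sizes.append(len(active))
    return removed, active_sizes
```

For example, with hyperedges `[(0, 1, 2), (2, 3), (4, 5)]` on 6 vertices and `v = 0`, the procedure removes the vertices 0, 1, 2, 3 in that order and the active set has sizes 1, 2, 1, 1, 0. The increments of this sequence are the quantities $a_{i+1} - a_i$ studied in the text.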
In order to get a bound in the other direction, we simply truncate the a i+1 − a i at n 1/4 . Let I i be the indicator that among the first n 1/4 vertices to which w is connected, no selfintersection or intersection with the past occurs. Note that E(I i ) ≥ p n = 1 − n −1/2 , by straightforward bounds on the birthday problem. We then have Organisation of proof of Theorem 3.1 We will stop the exploration process once we have discovered enough vertices, or if the active set dies out, whichever comes first. We aim to show that starting from a given vertex v, with probability approximately θ(c) the cluster of v contains about order n vertices. However, we proceed in stages as different arguments are needed in order to reach so many vertices. In Step 1, we first show that the cluster contains about (log n) 2 vertices with probability approximately θ(c). Then in Step 2, given that the exploration of the cluster has discovered (log n) 2 vertices, we show that with high probability the exploration will in fact discover n 2/3 vertices. Finally, in Step 3 we show using the sprinkling technique that any two clusters that reach a size of about n 2/3 can be connected using only very few additional edges, which implies the result.
Main quantitative lemma
We define We set T = T^↑ ∧ T^↓. Hence our first goal (which we will show at the end of Step 2) will be to show that T = T^↑ with probability θ(c): in fact we will show that T^↓ occurs before T^↑ or n^{2/3} with probability approximately 1 − θ(c). Either way, this means that the component is greater than n^{2/3} with probability approximately θ(c). To do this we need to study the distribution of a_{i+1} − a_i; the next lemma shows that these random variables converge in distribution to a sequence of i.i.d. (possibly degenerate) random variables, uniformly for i < T: the limit is improper if Σ_j j ᾱ_j < 1. Equivalently, the active process |A_i| converges (at least for finite dimensional marginals) to the exploration process of a Galton-Watson tree whose offspring distribution is given by the limit of a_{i+1} − a_i + 1 and thus has a moment generating function given by ψ(1 − ·, c).
It is perhaps surprising that the lemma below is sufficient for the proof of Theorem 3.1: it essentially only records whether a cycle is microscopic (finite) or "more than microscopic"; in particular, whether the mass of the conjugacy class comes from many small mesoscopic cycles or from fewer big cycles makes no difference.

Lemma 3.2
For each q 0 ∈ [0, 1), there exists some deterministic function w : N → R such that w(n) → 0 as n → ∞ with the following property: Recall from (23) that and from (22) that Hence, since sk/n = O(1) in the regime we are concerned with, where the o(1) term is non random and independent of i, and for the last inequality we have used that which follows from the fact that q ≤ q 0 < 1 and the dominated convergence theorem, as jk j /k is uniformly bounded by 1. Note that the above estimate is uniform in i ≥ 1.
For the upper bound, we use (25). Let ε_n → 0 sufficiently slowly that ε_n n^{1/3} → ∞. For concreteness take ε_n = n^{−1/6}. Define G := {t ∈ {1, . . . , s} : Σ_{m≥2} m y^{(t)}_m ≤ ε_n k}, and let I = G^c. Packets t ∈ I are the bad packets for which a significant fraction of the mass corresponding to that packet (at least ε_n) was already discovered at step i; by contrast packets t ∈ G are those for which a fraction at least (1 − ε_n) remains to be discovered in the exploration. In the case where the conjugacy class contains only one type of cycles, say k-cycles, then I coincides with the set of hyperedges already revealed. At the other end of the spectrum, when the conjugacy class is broken down into many small cycles, then I is likely to be empty. But in all cases, |I| satisfies the trivial bound |I| ≤ 2n^{2/3}/(ε_n k) by definition of T^↑, and in particular This turns out to be enough for our purposes.
) can only differ by at most q^{n^{1/4}}, which is exponentially small in n^{1/4} for a fixed q ≤ q_0 < 1, so we can neglect this difference. Then we may write, counting only hyperedges from good packets, using the fact that 1 − x ≤ e^{−x} for all x ∈ R, and (30) (recalling that I_i is the indicator of the event that no self-intersection occurs among the first n^{1/4} vertices connected to w): where the o(1) term again is non random and uniform in i ≥ 1, but might depend on q (the last inequality again comes from (29)). The proof is complete.
Lemma 3.2 above tells us that, at the level of generating functions, the distribution of a i+1 − a i behaves very much like a sequence of i.i.d. random variables with distribution determined by ψ, even if we don't ignore self-intersections. It is thus easy to build martingales from quantities of the form q a i , which behave as if the increments of a i were i.i.d., at least until we reach size n 2/3 . Hence this will allow us to reach a size of n 2/3 for a i almost as if there were no self-intersections, and so with probability approximately θ(c). Fundamentally, this is because even if self-intersections do occur, they are rare and do not cause a significant loss of mass. Technically, it is easier to have a separate argument for bringing the cluster to a polylogarithmic size before using this information to show that the cluster reaches size n 2/3 with essentially the same probability. This is what we achieve in Step 1, which we are now ready for.
Step 1. We show that the cluster containing a given vertex v is at least logarithmically large with probability approximately θ(c), and furthermore the number of vertices for which this occurs is approximately nθ(c) in the sense of convergence in probability.
Proof We start with the upper bound of (32), for which we simply make a comparison with a Galton-Watson process: to reach size log n the exploration process must survive more than a finite number of steps. More precisely, we make the following observation. Let m ≥ 1 be some arbitrary fixed large integer, and observe P( where the Poisson random variables are independent. Consequently, let W be the total progeny of a Galton-Watson branching process with this offspring distribution (note in particular that W = ∞ as soon as one node in the tree has infinitely many offspring). We conclude that P(|C_v| ≥ m) → P(W ≥ m), and hence, taking the limsup and letting m → ∞, This proves the upper bound in (32). We now discuss the lower bound to (32), which is essentially the same argument, together with the observation that self-intersections are unlikely to occur before (log n)^2 vertices have been explored. For this we can assume without loss of generality that θ(c) > 0, otherwise there is nothing to prove. Let T_1 be the first time i at which the active set has size a_i at least (log n)^2; we will prove the slightly stronger result that lim inf_{n→∞} P(T_1 < T^↓) ≥ θ(c). (This is slightly stronger, because |C_v| could in principle be greater than (log n)^2 without the active set ever reaching that size). Let X_i be i.i.d. random variables with generating function given by so that, by (23), a_1 − a_0 has the same distribution as X_1 when A_0 = {v} (see e.g. (28) where a similar calculation is carried out). We can use the random variables X_i to generate the breadth-first exploration of C_v until we find a self-intersection. Thus let Ỹ_i be a collection of randomly chosen vertices of {1, . . . , n} of size X_i, and at each time step, add to the active set Ã_{i+1} the set Ỹ_i and remove the currently explored vertex. Then we can couple A_i and Ã_i so that A_i = Ã_i until the first time T_inter such that Ỹ_i ∩ (Ỹ_j ∪ {v}) ≠ ∅ for some i ≠ j ≤ T_inter. Furthermore, until T_inter, Ã_i is the breadth-first search exploration of a branching process with offspring distribution (33).
It becomes extinct with a probability q n , and we claim that q n satisfies q n → 1 − θ(c) as n → ∞ by (29). Indeed, ψ n clearly converges uniformly to ψ(·, c) on [0, x 0 ] for x 0 < 1 by Lemma 3.2 and this is the regime we are interested in since by assumption θ(c) > 0. Hence, it is clear that if W n is the total progeny of this branching process, then P(W n ≥ (log n) 2 ) ≥ P(W n = ∞) = 1 − q n → θ(c), and combining with the argument in the upper bound on (32) we deduce that P(W n ≥ (log n) 2 ) → θ(c). On the other hand, T 1 < T inter with probability tending to 1 as n → ∞ by the birthday problem, and so in fact P(T 1 < T ↓ ) = P(W n ≥ (log n) 2 ) + o(1), so we are done.
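As a numerical illustration of the extinction probability appearing above, the following minimal sketch treats only the special case of random transpositions, where we assume the limiting offspring distribution is Poisson with mean c (the general generating function ψ of the paper is not reproduced here). The function name survival_probability and the iteration count are hypothetical choices made for this sketch. The extinction probability is obtained as the smallest fixed point of the offspring generating function by iterating from 0.

```python
import math


def survival_probability(c, iterations=10_000):
    """Survival probability of a Galton-Watson tree with Poisson(c) offspring.

    Assumption of this sketch: Poisson(c) offspring, which corresponds to the
    random transposition case only.
    """
    q = 0.0  # probability of extinction by generation 0
    for _ in range(iterations):
        # Generating function of Poisson(c) evaluated at q: exp(c (q - 1)).
        q = math.exp(c * (q - 1.0))
    return 1.0 - q


if __name__ == "__main__":
    for c in (0.5, 1.0, 1.5, 2.0):
        print(c, survival_probability(c))
```

Iterating from 0 converges monotonically to the smallest fixed point, so the returned value vanishes (up to numerical error) for c ≤ 1 and is strictly positive for c > 1. At c = 1 the convergence is only polynomial, which is why many iterations are used.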
It is important to note that self-intersections may occur at the very step that a i exceeds (log n) 2 (for instance, think about the case when the conjugacy class has some of its mass coming from cycles larger than n 1/2 : discovering such a cycle would immediately produce a self-intersection). Even so, the active set reaches size (log n) 2 before such a self-intersection is discovered.
As announced at the beginning of Step 1, we complement this with a law of large numbers: (1/n) Σ_{v=1}^n 1{|C_v| ≥ (log n)^2} → θ(c) in probability as n → ∞.
Proof Let Z = Σ_{v=1}^n 1{|C_v| ≥ (log n)^2}, so by the previous lemma we know that E(Z)/n → θ(c) by (32). Hence if we show that Var(Z) ≤ εn^2 for any ε > 0 and any n sufficiently large, then (34) follows by Chebyshev's inequality. In particular, it suffices to show that for v ≠ w ∈ {1, . . . , n}, On the other hand, (35) can be proved in exactly the same way as the upper bound of (32) above: for both |C_v| and |C_w| to be larger than (log n)^2, both must be greater than m where m ≥ 1 is fixed. This is an event which depends on a finite number of steps (at most 2m) in the explorations of C_v and C_w, and so can be approximated by Lemma 3.2 by the same event for two independent branching processes. Letting m → ∞ finishes the proof.
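The law of large numbers can be checked empirically in the simplest case. The sketch below is an illustration only: it assumes the random transposition case (hyperedges of size 2, so that cn/2 uniformly chosen pairs are added), uses a union-find structure to track clusters, and reports the fraction of vertices whose cluster has size at least (log n)^2. The function name, the seed and the values of n and c used in the check are hypothetical.

```python
import math
import random


def fraction_in_large_clusters(n, c, seed=0):
    """Fraction of vertices in clusters of size >= (log n)^2 after cn/2 random edges."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Find the root of x with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _ in range(int(c * n / 2)):
        i, j = rng.sample(range(n), 2)  # a uniform pair of distinct vertices
        ri, rj = find(i), find(j)
        if ri != rj:
            if size[ri] < size[rj]:
                ri, rj = rj, ri
            parent[rj] = ri  # union by size
            size[ri] += size[rj]

    threshold = math.log(n) ** 2
    large = sum(1 for v in range(n) if size[find(v)] >= threshold)
    return large / n


if __name__ == "__main__":
    print(fraction_in_large_clusters(4000, 2.0, seed=1))
```

For c = 2 the output should be close to the survival probability of a Poisson(2) branching process, and for c < 1 it should be close to zero; the agreement is only approximate at finite n.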
For the rest of the proof we now assume that c is supercritical, i.e., that c exceeds the critical value, so that θ(c) > 0. Hence fix q ∈ [0, 1) such that ψ(1 − q, c)/q < 1, and note that using Lemma 3.2, we can suppose that, for some fixed ε > 0, n is large enough so that almost surely on {T > i}.
Step 2. We now extrapolate the information obtained in the previous step to show that, still with probability approximately θ(c), the active set of C_v can reach a size of order at least n^{2/3}. To do so we suppose our exploration from Step 1 yields an active set of size at least (log n)^2 (which, as discussed, occurs with probability θ(c) + o(1)). We will restart the exploration from that point on, calling this time i = 0 again. Hence the setup is the same as before, except that at time i = 0 we have a_0 = (log n)^2: we only keep the first (log n)^2 of the active vertices discovered at time T_1, and declare all further active vertices at time T_1 to be removed at time i = 0 in the exploration of Step 2. Recall our notations for T^↓ and T^↑ in (26) and (27). Our goal in this step is to show the following control:

Lemma 3.5
Suppose that given H_0, it is a.s. the case that a_0 = (log n)^2, and r_0 ≤ n^{2/3}. Then

Proof Set S = n^{2/3} ∧ T^↑ ∧ T^↓ and for i ≥ 0, let is a supermartingale in the filtration (H_0, H_1, . . .). Observe that S ≤ n^{2/3} so M is bounded. Note that on the event {S = T^↓}, hence by the optional stopping theorem (since M is bounded), given H_0 and under the assumptions of the lemma on H_0, as desired.
Consequently, since the error bound in Eq. (37) is o(n^{−1}), we deduce that if then G = G̃ with high probability, and hence in particular in probability as n → ∞.
Step 3. We now show that if v and v′ are two vertices such that C_v = C_v(s) and C_{v′} = C_{v′}(s) are both larger at time s than n^{2/3} then they are highly likely to be connected at some slightly later time s + s′. This follows from a so-called "sprinkling" argument, as follows. Suppose we add s′ packets, with s′ = D n^{2/3} (log n)/k for some D > 0 to be chosen later on. Note that s′k/n → 0 so that (s + s′)k/n → c. Since s = s(n) is an arbitrary sequence such that sk/n → c it suffices to show that v and v′ are then connected at time s + s′. In fact we will check that the two clusters can be connected using smaller edges than the hyperedges making up each packet, as follows.
For each hyperedge of size j we will only reveal a subset of ⌊j/2⌋ edges (of size 2) with disjoint support. Since ⌊j/2⌋ ≥ j/3 for any j ≥ 2, this gives us at least k/3 edges for each packet; these are sampled uniformly at random without replacement from {1, . . . , n}. We will check that a connection occurs between the two clusters within these s′k/3 edges, with high probability. Call the two clusters A and A′ for simplicity; these are two arbitrary sets of size n^{2/3} which we can assume to be disjoint, otherwise there is nothing to prove. Call a packet of edges good if its intersections with each of A and A′ contain at most n^{2/3}/2 vertices, and call it bad otherwise. We reveal the edges in a given packet one by one, sampling without replacement. Note that so long as a packet of edges has not been observed to be bad, the probability that the next edge connects A and A′ is at least n^{4/3}/(16(n − k)^2) ≥ n^{−2/3}/32. (Note that if k ≤ n^{2/3}/2 then every packet is necessarily good). Hence the probability that no connection between A and A′ occurs for a good packet is at most Now, each packet is bad independently of each other, with probability tending to 0 by Markov's inequality (since the expected intersection of a packet of edges with A is at most |A|k/n = o(|A|)) and hence less than 1/2 say.
However, if k ≤ n^{2/3}, then every packet is good, and so (40) holds without the second term on the right hand side. Either way,

Proof of Theorem 3.1 We are now ready to conclude that vertices are either in a small component at time s or connected at time s + s′. Recall our notation G = {v : |C_v(s)| > n^{2/3}}. Then by (39), we know that |G|/n → θ(c) in probability as n → ∞.
We now aim to show that G is connected at time s + s′, with high probability. For v, v′ ∈ {1, . . . , n}, write v ↔ v′ to indicate that v is connected to v′. Then by Step 3 (more specifically, (41)), Hence G is entirely connected at time s + s′ with probability tending to 1. This proves that H_{s+s′} contains a component of relative size converging to θ(c) in probability.
Let us now check that every other component at time s + s′ is small. Note that since G = G̃ with probability tending to one (where G̃ is defined in (38)), any component disjoint from G at time s + s′ must have been smaller than (log n)^2 at time s. Since at most s′k connections are added, this means that, on the event G = G̃, the maximal size of a component at time s + s′ disjoint from G is smaller than s′k(log n)^2 ≤ D n^{2/3}(log n)^3. This shows that every other component has size O(n^{2/3}(log n)^3) on an event of high probability.
The proof of Theorem 3.1 is complete, since s + s′ is an arbitrary sequence such that (s + s′)k/n → c.
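To see why a logarithmic factor in the number of sprinkled packets is enough in Step 3, one can evaluate the failure probability in a simplified setting. The sketch below makes two assumptions that are not in the text: every packet is good, and the s′k/3 revealed edges are treated as independent trials, each connecting the two clusters with probability exactly n^{−2/3}/32. Under these assumptions the probability of no connection is (1 − n^{−2/3}/32)^{s′k/3}, which is at most n^{−D/96}. The function name and the values n = 10^6, D = 200 are hypothetical.

```python
import math


def sprinkling_failure_bound(n, D):
    """Return ((1 - p)^m, n^(-D/96)) for the simplified sprinkling estimate."""
    # Lower bound on the chance that one revealed edge joins the two clusters.
    p = n ** (-2.0 / 3.0) / 32.0
    # Number of revealed edges s'k/3 with s' = D n^(2/3) log(n) / k.
    m = D * n ** (2.0 / 3.0) * math.log(n) / 3.0
    exact = math.exp(m * math.log1p(-p))  # (1 - p)^m, computed in log space
    closed_form = n ** (-D / 96.0)        # exp(-p m) = n^(-D/96)
    return exact, closed_form


if __name__ == "__main__":
    print(sprinkling_failure_bound(10 ** 6, 200))
```

In this simplified model, taking D larger than 192 makes the failure probability for a fixed pair of clusters smaller than n^{−2}, which leaves room for a union bound over pairs of vertices.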

Poisson-Dirichlet structure
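The renormalised cycle lengths X(σ) of a permutation σ ∈ S_n are the cycle lengths of σ divided by n, written in decreasing order. In particular we have that X(σ) takes values in We equip this space with the topology of pointwise convergence. If σ_n is uniformly distributed in S_n then X(σ_n) → Z in distribution as n → ∞ where Z is known as a Poisson-Dirichlet random variable. It can be described by stick-breaking: let U_1, U_2, . . . be i.i.d. uniform random variables on [0, 1], and set Z*_1 = U_1 and Z*_{i+1} = U_{i+1}(1 − Z*_1 − . . . − Z*_i) for i ≥ 1. Then (Z*_1, Z*_2, . . . ) can be ordered in decreasing size and the random variable Z has the same law as (Z*_1, Z*_2, . . . ) ordered by decreasing size.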
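The two objects in this paragraph can be made concrete with a short sketch. It assumes the standard stick-breaking description of the Poisson-Dirichlet law (each new piece is a uniform fraction of what remains of the unit interval), truncated to finitely many pieces, and computes the renormalised cycle lengths of a permutation given as a list perm with perm[i] the image of i. Both function names are hypothetical.

```python
import random


def stick_breaking(num_pieces, rng):
    """First num_pieces stick-breaking pieces, returned in decreasing order."""
    remaining, pieces = 1.0, []
    for _ in range(num_pieces):
        piece = rng.random() * remaining  # uniform fraction of what is left
        pieces.append(piece)
        remaining -= piece
    return sorted(pieces, reverse=True)


def renormalised_cycle_lengths(perm):
    """Cycle lengths of perm divided by n, in decreasing order."""
    n = len(perm)
    seen = [False] * n
    lengths = []
    for start in range(n):
        if not seen[start]:
            length, x = 0, start
            while not seen[x]:  # walk around the cycle containing start
                seen[x] = True
                x = perm[x]
                length += 1
            lengths.append(length / n)
    return sorted(lengths, reverse=True)


if __name__ == "__main__":
    rng = random.Random(0)
    perm = list(range(1000))
    rng.shuffle(perm)
    print(renormalised_cycle_lengths(perm)[:5])
    print(stick_breaking(50, rng)[:5])
```

For large n the leading entries of the two printed vectors have approximately the same distribution, although a single sample of each will of course differ.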
The next result is a generalisation of Theorem 1.1 in [24] to the case of general conjugacy classes. The proof is a simple adaptation of the proof of Schramm and we provide the details in an appendix.

Proof of curvature theorem

Proof of the upper bound on curvature
We claim that it is enough to show the upper bound in (15) for supercritical c, that is, for c strictly above the critical value. Indeed, notice that c → κ_c is nondecreasing. Hence let c be at most the critical value and suppose we know that lim sup_{n→∞} κ_{c′} ≤ θ(c′)^2 holds for all supercritical c′. Then we have that lim sup_{n→∞} κ_c ≤ θ(c′)^2 for each supercritical c′. Letting c′ decrease to the critical value and using the fact that θ(c′) → 0 in this limit shows that lim_{n→∞} κ_c = 0. Fix a supercritical c and let t := cn/k. We need to show the upper bound in (15). In other words, we wish to prove that for some σ, σ′ ∈ S_n lim inf We will choose σ = id and σ′ = τ_1 ∘ τ_2, where τ_1, τ_2 are independent uniformly chosen transpositions. To prove the lower bound on the Kantorovich distance we use the dual representation of the distance W_1(X, Y) between two random variables X, Y: W_1(X, Y) = sup{E f(X) − E f(Y) : f is 1-Lipschitz}. Let f(σ) = d(id, σ) be the distance to the identity (using only transpositions, as usual). Then observe that f is 1-Lipschitz. It suffices to show We will now show (44) by a coupling argument. Construct the two walks X^{τ_1∘τ_2} and X^{id} as follows. Let γ_1, γ_2, . . . be a sequence of i.i.d. random variables uniformly distributed on the conjugacy class, independent of (τ_1, τ_2). Using Lemma 2.5 with σ_0 = τ_1 ∘ τ_2, which is independent of X^{id}, we can construct Next we couple X^{id}_t by constructing it as Thus under this coupling we have that X^{τ_1∘τ_2}_t = X^{id}_t ∘ τ_1 ∘ τ_2. Let X = X^{id}, then from (44) the problem reduces to showing We recall that a transposition can either induce a fragmentation or a coalescence of the cycles. Indeed, a transposition involving elements from the same cycle generates a fragmentation of that cycle, and one involving elements from different cycles results in the cycles being merged. (This property is the basic tool used in the probabilistic analysis of random transpositions, see e.g. [5] or [24]). Hence either τ_1 fragments a cycle of X_t or τ_1 coagulates two cycles of X_t. In the first case, d(id, X_t ∘ τ_1) = d(id, X_t) − 1, and in the second case we have d(id, X_t ∘ τ_1) = d(id, X_t) + 1.
Let F denote the event that τ 1 causes a fragmentation. Then Using the Poisson-Dirichlet structure described in Theorem 3.6 it is not hard to show that P(F) → θ(c) 2 /2 (see, e.g., Lemma 8 in [6]). Applying the same reasoning to X t • τ 1 • τ 2 and X t • τ 1 we deduce that from which the lower bound (45) and in turn the upper bound in (10) follow readily.
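The fragmentation and coalescence dichotomy used in this argument is easy to verify by direct computation. In the sketch below a permutation is a list perm with perm[i] the image of i, the distance to the identity is n minus the number of cycles, and multiplying on the right by the transposition (i j) amounts to swapping the images of i and j. All function names are hypothetical; the check confirms that the distance decreases by one exactly when i and j lie in the same cycle and increases by one otherwise.

```python
import random


def count_cycles(perm):
    """Number of cycles of perm, fixed points included."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            x = start
            while not seen[x]:
                seen[x] = True
                x = perm[x]
    return cycles


def distance_to_identity(perm):
    """Minimal number of transpositions whose product is perm."""
    return len(perm) - count_cycles(perm)


def times_transposition(perm, i, j):
    """Return perm o (i j): the images of i and j are swapped."""
    new = list(perm)
    new[i], new[j] = new[j], new[i]
    return new


def in_same_cycle(perm, i, j):
    """True if the distinct points i and j lie in the same cycle of perm."""
    x = perm[i]
    while x != i:
        if x == j:
            return True
        x = perm[x]
    return False


if __name__ == "__main__":
    rng = random.Random(0)
    perm = list(range(8))
    rng.shuffle(perm)
    for i, j in [(0, 1), (2, 5)]:
        change = distance_to_identity(times_transposition(perm, i, j)) - distance_to_identity(perm)
        print(i, j, in_same_cycle(perm, i, j), change)
```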

Proof of lower bound on curvature
We now assume that c is supercritical and turn our attention to the lower bound on the Ricci curvature, which is the heart of the proof. Throughout we let k = |Γ|, where Γ denotes the conjugacy class, and t = cn/k. With this notation in mind we wish to prove that lim sup for some appropriate coupling of X^σ and X^{σ′}, where the supremum is taken over all σ, σ′ with even distance. Note that we can make several reductions: first, by vertex transitivity we can assume σ = id is the identity permutation. Also, by the triangle inequality (since W_1 is a distance), we can assume that σ′ = (i, j) ∘ (ℓ, m) is the product of two distinct transpositions. There are two cases to consider: either the supports of the transpositions are disjoint, or they overlap on one vertex. We will focus in this proof on the first case where the supports of the transpositions are disjoint; that is, i, j, ℓ, m are pairwise distinct. The other case is dealt with very much in the same way (and is in fact a bit easier). Clearly by symmetry E d(X^{id}_t, X^{(i,j)∘(ℓ,m)}_t) is independent of i, j, ℓ and m, so long as they are pairwise distinct. Hence it is also equal to E d(X^{id}_t, X^{τ_1∘τ_2}_t) conditioned on the event A that τ_1, τ_2 have disjoint support, where τ_1 and τ_2 are independent uniform random transpositions. This event has an overwhelming probability for large n, thus it suffices to construct a coupling between X^{id} and X^{τ_1∘τ_2} such that Indeed, it then immediately follows from stochastic commutativity (Lemma 2.5) that the same is true with the expectation replaced by the conditional expectation given A, since the distance is bounded by two. Next, let X be a random walk on S_n which is the composition of i.i.d. uniform elements of the conjugacy class Γ. We decompose the random walk X into a walk X̃ which evolves by applying transpositions at each step as follows. For t = 0, 1, . . . , write out where γ_1, γ_2, . . . are i.i.d. uniformly distributed in Γ.
As before we decompose each step γ_s of the walk into a product of cyclic permutations, say where r = Σ_{j≥2} k_j. The order of this decomposition is irrelevant and can be chosen arbitrarily. For concreteness, we decide that we start from the cycles of smaller sizes and progressively increase to cycles of larger sizes. We will further decompose each of these cyclic permutations into a product of transpositions, as follows: for a cycle c = (x_1, . . . , x_j), write This allows us to break any step γ_s of the random walk X into a number of elementary transpositions, and hence we can write where τ We see that s is a refreshment time if the transposition being applied to X̃ at time s is the start of a new cycle. Using this we can describe the law of the transpositions being applied to X̃. Note that in either case, the second marker y is conditionally uniformly distributed among the vertices which have not been used so far. This conditional independence property is completely crucial, and allows us to make use of methods developed initially for random transpositions (such as that of Schramm [24]) for general conjugacy classes, so long as k = o(n). Indeed in that case the second marker y itself is not very different from a uniform random variable on {1, . . . , n}.
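The decomposition of a cycle into transpositions can be illustrated as follows. Since the displayed formula is not available here, the sketch uses one standard convention, which is an assumption: (x_1, . . . , x_j) = (x_1 x_2) ∘ (x_2 x_3) ∘ . . . ∘ (x_{j−1} x_j), with composition read from right to left and the cycle sending x_1 to x_2, x_2 to x_3, and so on. Consecutive transpositions in this product share a point, in the spirit of the markers discussed above. All function names are hypothetical.

```python
def compose(f, g):
    """Composition f o g of two permutations given as lists: x -> f[g[x]]."""
    return [f[g[x]] for x in range(len(g))]


def transposition(n, a, b):
    """The transposition (a b) as a permutation of {0, ..., n-1}."""
    perm = list(range(n))
    perm[a], perm[b] = b, a
    return perm


def cycle(n, points):
    """The cycle sending points[0] -> points[1] -> ... -> points[-1] -> points[0]."""
    perm = list(range(n))
    for a, b in zip(points, points[1:] + points[:1]):
        perm[a] = b
    return perm


def cycle_as_transpositions(points):
    """A list of len(points) - 1 transpositions whose product is the cycle."""
    return [(points[i], points[i + 1]) for i in range(len(points) - 1)]


def product(n, transpositions):
    """The product t_1 o t_2 o ... o t_m of a list of transpositions."""
    result = list(range(n))
    for a, b in transpositions:
        result = compose(result, transposition(n, a, b))
    return result


if __name__ == "__main__":
    points = [3, 0, 5, 2]
    print(cycle_as_transpositions(points))
    print(product(6, cycle_as_transpositions(points)) == cycle(6, points))
```

A cycle of length j is thus broken into j − 1 elementary transpositions, which is the count needed to pass to the transposition time scale.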
We will study this random walk using this new transposition time scale. We thus define a process X̃ = (X̃_u : u = 0, 1, . . .) as follows. Let u ∈ {0, 1, . . .} and write u = sρ + i where s, i are nonnegative integers and i < ρ. Then define Thus it follows that for any s ≥ 0, X̃_{sρ} = X_s. Notice that X̃ evolves by applying successively transpositions with the above mentioned conditional uniformity rules. Now consider our two random walks, X^{id} and X^{τ_1∘τ_2}, started respectively from id and τ_1 ∘ τ_2, and let X̃^{id} and X̃^{τ_1∘τ_2} be the associated processes constructed using (49), on the transposition time scale. Thus to prove (46) it suffices to construct an appropriate coupling between X̃^{id}_{tρ} and X̃^{τ_1∘τ_2}_{tρ}. Next, recall that for a permutation σ ∈ S_n, X(σ) denotes the renormalised cycle lengths of σ, taking values in the space defined in (42). The walks X̃^{id} and X̃^{τ_1∘τ_2} are invariant by conjugacy and hence both are distributed uniformly on their conjugacy class. Thus ultimately it will suffice to couple X(X̃^{id}_{tρ}) and X(X̃^{τ_1∘τ_2}_{tρ}). Fix δ > 0 and let Δ = δ^{−9}. Define Our coupling consists of three intervals [0, s_1], (s_1, s_2] and (s_2, s_3].
Let us informally describe the coupling before we give the details. In what follows we will couple the random walksX id andX τ 1 •τ 2 such that they keep their distance constant during the time intervals [0, s 1 ] and (s 2 , s 3 ]. In particular we will see that at time s 1 , the walksX id andX τ 1 •τ 2 will differ by two independently uniformly chosen transpositions. Thus at time s 1 most of the cycles ofX id andX τ 1 •τ 2 are identical but some cycles may be different. We will show that given that the cycles that differ at time s 1 are all reasonably large, then we can reduce the distance between the two walks to zero during the time interval More generally, our coupling has the property that d( is uniformly bounded, so that it will suffice to concentrate on events of high probability in order to get a bound on the L 1 -Kantorovich distance W (X id t , X τ 1 •τ 2 t ).

Coupling for [0, s 1 ]
First we describe the coupling during the time interval [0, s 1 ]. LetX = (X s : s ≥ 0) be a walk with the same distribution asX id , independent of the two uniform transpositions τ 1 and τ 2 . Then we have that by Lemma 2.5 for any s ≥ 0,X τ 1 •τ 2 s has the same distribution asX s • τ 1 • τ 2 . Thus we can couple X(X id s 1 ) and X(X τ 1 •τ 2

Coupling for (s 1 , s 2 ]
For s ≥ 0 define X̄_s = X(X^{id}_{s+s_1}) and Ȳ_s = X(X^{τ_1∘τ_2}_{s+s_1}). Here we will couple X̄_s and Ȳ_s for s = 0, . . . , Δ. During this time we aim to show that the discrepancies between X̄_0 and Ȳ_0 resulting from performing the transpositions τ_1 and τ_2 at the end of the previous phase can be resolved. Our main tool for doing this will be a variant of a coupling of Schramm [24], which was already used in [6].
At each step s we try to create a matching between X̄_s and Ȳ_s by matching an element of X̄_s to at most one element of Ȳ_s of the same size. At any time s there may be several entries that cannot be matched. By parity the combined number of unmatched entries is an even number, and observe that this number cannot be equal to two. Now X̃^{id}_{s_1} and X̃^{τ_1∘τ_2}_{s_1} differ by two transpositions as can be seen from (50). This implies that in particular initially (i.e., at the beginning of (s_1, s_2]), there are zero, four or six unmatched entries between X̄_0 and Ȳ_0.
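The count of unmatched entries can be checked by brute force on small examples. In the sketch below, which works with unnormalised cycle lengths, the unmatched entries between two permutations are taken to be the symmetric difference of their multisets of cycle lengths; this reading of the matching, and all function names, are assumptions of the sketch. For a permutation and its product with two transpositions, the combined number of unmatched entries should always be zero, four or six.

```python
import random
from collections import Counter


def cycle_lengths(perm):
    """List of cycle lengths of perm (perm[i] is the image of i)."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if not seen[start]:
            length, x = 0, start
            while not seen[x]:
                seen[x] = True
                x = perm[x]
                length += 1
            lengths.append(length)
    return lengths


def times_transposition(perm, i, j):
    """Return perm o (i j): the images of i and j are swapped."""
    new = list(perm)
    new[i], new[j] = new[j], new[i]
    return new


def unmatched_entries(perm_a, perm_b):
    """Combined number of cycle lengths that cannot be matched one to one."""
    a = Counter(cycle_lengths(perm_a))
    b = Counter(cycle_lengths(perm_b))
    # Counter subtraction keeps only positive counts (multiset difference).
    return sum((a - b).values()) + sum((b - a).values())


def observed_counts(n, trials, seed=0):
    """Set of unmatched counts seen over random perms and pairs of transpositions."""
    rng = random.Random(seed)
    seen = set()
    for _trial in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        other = list(perm)
        for _step in range(2):
            i, j = rng.sample(range(n), 2)
            other = times_transposition(other, i, j)
        seen.add(unmatched_entries(perm, other))
    return seen


if __name__ == "__main__":
    print(sorted(observed_counts(12, 500)))
```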
Fix δ > 0 and let A(δ) denote the event that the smallest unmatched entry between X 0 andȲ 0 has size greater than δ > 0. We will show that on the event A(δ) we can couple the walks such thatX =Ȳ with high probability. On the complementary event A(δ) c , couple the walks so that their distance remains O(1) during the time interval (s 1 , s 2 ], similar to the coupling during [0, s 1 ]. It remains to define the coupling during the time interval (s 1 , s 2 ] on the event A(δ). We begin by estimating the probability of A(δ). Furthermore, it follows from Theorem 3.6 that conditionally on the event A 1 , the asymptotic relative size of the cycles containing the four points making the transpositions τ 1 , τ 2 can be thought of as the size of four independent samples from a Poisson-Dirichlet distribution, multiplied by θ(c). Hence the lemma is proved with p(δ) being the probability that one of the four samples has a size smaller than δ/θ(c). Clearly p(δ) → 0 so the result is proved.
Recall that the transpositions which make up the walksX id andX τ 1 •τ 2 obey what we called conditional uniformity in Proposition 4.1. For the duration of (s 1 , s 2 ] we will assume the relaxed conditional uniformity assumption, which we describe now. In both cases we take y to be uniformly distributed on {1, . . . , n}\{x}. In making the relaxed conditional uniformity assumption we are disregarding the constraints on (x, y) given in Proposition 4.1. However the probability we violate this constraint at any point during the interval (s 1 , s 2 ] is at most 2(s 2 − s 1 )ρ/n = 2 k/n → 0; and on the event that this constraint is violated the distance between the random walks can increase by at most (s 2 − s 1 ) = . Hence we can without a loss of generality assume that during the interval (s 1 , s 2 ] bothX id andX τ 1 •τ 2 satisfy the relaxed conditional uniformity assumption. Now we show that on the event A(δ) we can couple the walks such thatX =Ȳ with high probability. The argument uses a coupling of Berestycki et al. [6], itself a variant of a beautiful coupling introduced by Schramm [24]. We first introduce some notation. Let Notice that the walksX andȲ both take values in n .
Marginal evolution Let us describe the evolution of the random walk X̄ = (X̄_s : s = 0, 1, . . . ). Suppose that s ≥ 0 and X̄_s = (x_1, . . . , x_n). Now imagine the interval (0, 1] tiled using the intervals (0, x_1], . . . , (0, x_n] (the specific tiling rule does not matter). Initially for s = 0, and more generally if s is a refreshment time, we select u ∈ {1/n, . . . , n/n} uniformly at random and then call the tile that contains u the marked tile. If s ≥ 1 is not a refreshment time then the marked tile is the one containing the second marker y of Proposition 4.1 from the previous step. Either way, we have a distinguished tile (the tile containing the 'first marker') at the beginning of each step s = 0, 1, . . .

We now describe the marginal evolution of this tiling for one step. In fact this evolution takes as an input a tiling X̄_s and a marked tile I. The output will be another tiling X̄_{s+1} and a new marked tile for the next step. Let I be the tile containing the first marker at the beginning of the step, and place I first from left. (I represents the cycle containing the first marker u and we imagine that u is the leftmost point of that tile, i.e., in position 1/n). Select v ∈ {2/n, . . . , n/n} uniformly at random and let I′ be the tile that v falls into. Then there are two possibilities:
• If I′ ≠ I then we merge the tiles I and I′. The new tile we created is now marked for the next step.
• If I′ = I then we split I into two fragments, corresponding to where v falls. Thus, one is of size v − 1/n and the other of size |I| − (v − 1/n). The rightmost one of these two tiles, containing v, is now marked for the next step.
Now X̄_{s+1} is the sizes of the tiles in the new tiling we have created (with additional reordering of tiles in decreasing order). This defines a transformation T(X̄_s, I, v).

The evolution described above has the law of the projection of the walk X̃ onto its renormalised cycle lengths. Indeed, suppose we apply the transposition (x, y) to X̃_s in order to obtain X̃_{s+1}. The marked tile at time s corresponds to the cycle of X̃_s containing x: if s is a refreshment time then x ∈ {1, . . . , n} is chosen uniformly, otherwise x is the second marker from the previous step.
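One step of the tiling evolution described above can be sketched in code. To avoid floating point issues the sketch works in units of 1/n, so tile sizes are positive integers summing to n and the position v is an integer in {2, . . . , n}. The function name tiling_step and the choice to return the tiles unsorted together with the index of the newly marked tile are hypothetical conventions of this sketch; sorting the returned sizes in decreasing order gives the next state.

```python
def tiling_step(sizes, marked, v):
    """One merge-or-split step of the tiling.

    sizes  : list of positive integers (tile sizes in units of 1/n)
    marked : index in sizes of the marked tile
    v      : integer position in {2, ..., n}, where n = sum(sizes)
    Returns (new_sizes, index_of_new_marked_tile).
    """
    n = sum(sizes)
    if not 2 <= v <= n:
        raise ValueError("v must lie in {2, ..., n}")
    # Place the marked tile first from the left, the others in their given order.
    order = [marked] + [t for t in range(len(sizes)) if t != marked]
    first = sizes[marked]
    rest = order[1:]
    if v <= first:
        # v falls in the marked tile: split it at v.
        left = v - 1            # piece to the left of v
        right = first - left    # rightmost piece, containing v
        return [left, right] + [sizes[t] for t in rest], 1
    # Otherwise find the tile containing v and merge it with the marked tile.
    position = first
    target = rest[-1]
    for t in rest:
        position += sizes[t]
        if v <= position:
            target = t
            break
    merged = first + sizes[target]
    return [merged] + [sizes[t] for t in rest if t != target], 0


if __name__ == "__main__":
    print(tiling_step([3, 2, 1], 0, 2))  # split of the marked tile
    print(tiling_step([3, 2, 1], 0, 4))  # merge with the second tile
```

In both cases the total mass is conserved and the number of tiles changes by exactly one, in line with the fragmentation and coalescence picture.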
Coupling We now recall the coupling of [6]. Let s ≥ 0. Suppose that X̄_s = X̄ = (x_1, . . . , x_n) and Ȳ_s = Ȳ = (y_1, . . . , y_n). Then we can differentiate between the entries that are matched and those that are unmatched: we say that two entries from X̄ and Ȳ are matched if they are of identical size. Our goal will be to create as many matched parts as possible and as quickly as possible. When putting down the tilings associated with X̄ and Ȳ respectively, we will do so in such a way that all matched parts are to the right of the interval (0, 1] and the unmatched parts occupy the left part of the interval. Let I_X̄ and I_Ȳ be the respective marked tiles of the tilings X̄ and Ȳ at some step s ≥ 0, and let X̂, Ŷ be the tilings which are the reorderings of X̄, Ȳ in which I_X̄ and I_Ȳ have been put to the left of the interval (0, 1]. We assume that at the start of the step, either I_X̄ and I_Ȳ are both matched to each other, or they are both unmatched. (We will then verify that this property is preserved by the coupling). Let a = |I_X̄| and let b = |I_Ȳ| be the respective lengths of the marked tiles, and assume without loss of generality that a < b. Let v ∈ {2/n, . . . , n/n} be chosen uniformly. We will apply T(X̂, I_X̄, v) to X̂ as we did before, and obtain X̄_{s+1}. To obtain Ȳ_{s+1} we will also apply the transformation T to Ŷ, but with another uniform random variable v′ ∈ {2/n, . . . , n/n} which may differ from v. To construct v′ we proceed as follows.
If I_X̄ is matched (so that I_Ȳ is matched to it by assumption) then we take v′ = v, as in the coupling of Schramm [24]. In the case when I_X̄ is unmatched (which also implies that I_Ȳ is unmatched), we apply to v a measure-preserving map, defined as follows: for w ∈ {2/n, . . . , n/n} consider the map where γ n := an/2−1 /n. (This is in contrast with Schramm's original coupling, where v′ = v no matter what). See Fig. 1 (top right corner) for an illustration of this map, from which it should be clear in particular that it is a bijection and hence measure-preserving; this is easy to check. Thus letting v′ be the image of v under this map, we have that v′ has the correct marginal distribution. Before checking that the coupling is well defined (in the sense that our assumption on the marked tiles, which is needed for the definition of the coupling, remains true throughout), we briefly add a few words of motivation for this definition.
Motivation for the coupling The coupling defined above is, as already mentioned above, the same as the one used in [6], which is a modification of a coupling due to Schramm [24]. In Schramm's original coupling, the map was taken to be the identity, which is natural enough. However this leads to the undesirable property that it is possible for very small unmatched pieces to appear; once these small unmatched pieces appear they remain in the system for a very long time which could prevent coupling. The reason for introducing the map here and in [6] (where it was one of the main innovations) is that it prevents the occurrence of small unmatched pieces: as we will see in Lemma 4.4, the crucial property is that the worst thing that can happen is for the smallest unmatched piece to become smaller by a factor of two, and this only happens with small probability. This means all unmatched pieces remain relatively large, and so they disappear quite quickly (leading in turn to a coupling of the two copies). We start by a proof that the coupling is well defined: Proof The proof consists of examining several cases. If the first marker u was in a matched tile, then whether v falls in the matched or unmatched part, the property holds (if v is unmatched then we attach two unmatched tiles to two matched tiles, so they both become unmatched. If it falls in the matched part, either two tiles of the same size are being attached, or the marked tile splits into two tiles of same size, and the rightmost piece which is the new marked tile matches in both copies).
A similar analysis can be done if the first marked tile was unmatched. The only case which requires an observation is when v falls in the same tile as I_X̄ (where we assume, as in the figure, that this is the shorter of the two unmatched pieces). If v falls in the first half of the tile, this results in two matched pieces which are unmarked for the next step and two unmatched pieces which are both marked. If however v falls in the second half of I_X̄ then this results in two matched pieces which are marked, and two unmatched pieces which are not marked. (Recall that the marked tile at the next step is the one containing the rightmost fragment).
This coupling has several remarkable deterministic properties, as already observed in [6]. Chief among those is the fact that the number of unmatched entries can only decrease. Unmatched entries disappear when they are coalesced; in particular, they disappear quickly when their size is reasonably large. Hence it is particularly desirable to have a coupling in which unmatched components stay large. The second crucial property of this coupling is that it does not create arbitrarily small unmatched entries: even when an unmatched entry is fragmented, the size of the smallest unmatched entry cannot decrease by more than a factor of two. (As these properties hold deterministically given the marked tiles, they do not need to be proved again.) A direct consequence of these properties is the following lemma, which is Lemma 19 from [6].

Lemma 4.4
Let $U$ be the size of the smallest unmatched entry in two partitions $\bar x, \bar y \in \Omega_n$, let $\bar x', \bar y'$ be the corresponding partitions after one transposition of the coupling, and let $U'$ be the size of the smallest unmatched entry in $\bar x', \bar y'$. Assume that $2^j \le nU < 2^{j+1}$ for some $j \ge 0$. Then it is always the case that $U' \ge \frac{1}{n}\lceil nU/2 \rceil$, and moreover, […] Finally, the combined number of unmatched parts may only decrease.

Remark 4.5 In particular, it holds that $U' \ge 2^{j-1}/n$.
We now explain our strategy. On $A(\delta)$ we expect the unmatched components to remain of size at least of order $\delta$ for a while. In fact we will show that they stay at least of order $\delta^2$ for a long time. Unmatched entries disappear when they are merged together. If all unmatched entries are of size at least $\delta^2$, we will see that in every 3 steps we have a chance, with probability at least $\delta^8$, to reduce the number of unmatched entries. A simple argument then shows that after time $\Delta = \delta^{-9}$, $\bar X$ and $\bar Y$ are perfectly matched with probability tending to one as $\delta \to 0$.

Lemma 4.6
There is $\delta_0$ such that if $\delta < \delta_0$, then during $[0, \Delta]$ both $\bar X_s$ and $\bar Y_s$ always have an entry of size greater than $\delta\theta(c)$, with probability at least $1 - 2\delta^{1/2}$ for all $n$ sufficiently large.

Summing over $\Delta = \delta^{-9}$ steps, we see that the expected number of times during the interval $[0, \Delta]$ at which $\bar X_s$ or $\bar Y_s$ does not have a component of size at least $\theta(c)\delta n$ is less than $\delta^{1/2}$ as $n \to \infty$, and is thus less than $2\delta^{1/2}$ for $n$ sufficiently large, by Theorem 3.6 (note that we can apply the result because this calculation involves only a finite number of components). The result follows.
We now check that all unmatched components really do stay greater than $\delta^2$ during $[0, \Delta]$. Let $T_\delta$ denote the first time $s$ at which either $\bar X_s$ or $\bar Y_s$ has no cycle greater than $\delta\theta(c)n$ (suppose without loss of generality that $\delta$ is small enough that $\delta^2 \le \delta\theta(c)$).

Lemma 4.7
On $A(\delta)$, for all $s \le T_\delta \wedge \Delta$, all unmatched components stay greater than $\delta^2$ with probability at least $1 - O(\delta)$, where the implied constant in $O(\delta)$ can depend on $c$ but not on $\delta$.
Proof Say that a number $x \in [0, 1]$ is in scale $j$ if $2^j/n \le x < 2^{j+1}/n$. For $s \ge 0$, let $U(s)$ denote the scale of the smallest unmatched entry of $\bar X_s, \bar Y_s$. Let $j_0$ be the scale of $\delta$, and let $j_1$ be the integer immediately above the scale of $\delta^2$.
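The notion of scale can be made concrete with a short sketch. It assumes that tile sizes are multiples of $1/n$, so that a tile is described by an integer number of elements `m` (with normalised size `m / n`); the function name `scale` is illustrative and does not come from the paper.

```python
def scale(m: int) -> int:
    """Scale j of a tile containing m elements, i.e. 2**j <= m < 2**(j + 1).

    With normalised size x = m / n this is the condition
    2**j / n <= x < 2**(j + 1) / n from the text.
    """
    if m < 1:
        raise ValueError("a tile must contain at least one element")
    # bit_length() - 1 is the exact integer floor of log2(m).
    return m.bit_length() - 1


def scale_after_halving(m: int) -> int:
    """Scale of a tile of m elements after it is halved (rounding up)."""
    return scale((m + 1) // 2)
```

Halving a tile lowers its scale by at most one, which is the mechanism behind the statement that the scale of the smallest unmatched entry can drop by at most one in a single step.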
Suppose that for some time $s \le T_\delta$ we have $U(s) = j$ with $j_1 \le j \le j_0$, and the marked tile at time $s$ corresponds to the smallest unmatched entry. Then after this transposition we have $U(s+1) \ge j-1$ by the properties of the coupling (Lemma 4.4). Moreover, $U(s+1) = j-1$ with probability at most $r_j = 2^{j+2}/n$. Furthermore, since $s \le T_\delta$, this marked tile merges with a tile of size at least $\theta(c)\delta$ with probability at least $\theta(c)\delta$ after the transposition. We call the first occurrence a failure and the second a mild success.
Once a mild success has occurred, there may still be a few other unmatched entries in scale $j$, but no more than five, since the total number of unmatched entries is non-increasing and there were at most six initially. Therefore, if six mild successes occur before a failure, we are guaranteed that $U(s+1) \ge j+1$. We call such an event a good success, and note that the probability of a good success, given that $U(s)$ changes scale, is at least $p_j = 1 - 6r_j/(r_j + \theta(c)\delta)$. We set $q_j = 1 - p_j$.
Let $\{q_i\}_{i \ge 0}$ be the times at which the smallest unmatched entry changes scale, with $q_0$ being the first time the smallest unmatched entry is of scale $j_0$. Let $\{U_i\}$ denote the scale of the smallest unmatched entry at time $q_i$. Introduce a birth-death chain on the integers, denoted $v_n$, such that $v_0 = j_0$ and […] (52) and […] (53). Then it is a consequence of the above observations that […] An analysis of the birth-death chain defined by (52), (53) gives that […] (see, e.g., Theorem (3.7) in Chapter 5 of [11]). Note also that $q_j \le 6r_j/(\delta\theta(c))$, so $q_j/p_j \lesssim r_j/(\delta\theta(c))$. Moreover, all the $r_j$ in this product satisfy $r_j \lesssim \delta$. Thus, by considering the 10 terms with lowest index in the product above (note that for $\delta > 0$ small enough, there are at least 10 terms in this product), we deduce that $P_{j_0}(\tau_{j_1} < \tau_{j_0})$ decays faster than $O(\delta)^{10}$. Since $T_\delta \wedge \Delta \le \Delta = \delta^{-9}$, we conclude that the probability that $U(s) = j_1$ before $T_\delta \wedge \Delta$ is at most $O(\delta)$.
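The estimate for the birth-death chain rests on the classical gambler's ruin computation, in which hitting probabilities are ratios of sums of products of the odds $q_i/p_i$. The following is a minimal sketch of that generic computation, not a reproduction of the paper's displays (52) and (53); the function name `hit_low_before_high` and the callable `down`, which returns the probability of stepping from $j$ to $j-1$, are hypothetical.

```python
from fractions import Fraction


def hit_low_before_high(x, low, high, down):
    """P_x(hit `low` before `high`) for a birth-death chain on {low, ..., high}.

    From an interior state j the chain moves to j - 1 with probability
    down(j) and to j + 1 with probability 1 - down(j).
    """
    if not low <= x <= high:
        raise ValueError("starting point must lie in [low, high]")
    # rho[k - low] = prod_{i = low + 1}^{k} down(i) / (1 - down(i)), rho_low = 1.
    rho = [Fraction(1)]
    for i in range(low + 1, high):
        d = Fraction(down(i))
        rho.append(rho[-1] * d / (1 - d))
    total = sum(rho)            # sum over k = low, ..., high - 1
    tail = sum(rho[x - low:])   # sum over k = x, ..., high - 1
    return tail / total
```

Started one step below `high`, the returned value is at most the product of the odds over the interior states, which is why small ratios $q_j/p_j$ at the lowest scales force the chain to be very unlikely to reach the bottom.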
We are now going to prove that on the event $A(\delta)$, after time $\Delta$ there are no unmatched entries with probability tending to one as $n \to \infty$ and $\delta \to 0$. The basic idea is again to exploit the fact that there are initially at most six unmatched parts, and that this number cannot increase. We need a few preparatory lemmas which construct a scenario leading, in three steps, to a coagulation of two unmatched entries in each copy. Let $T_\delta$ be the first time one of the unmatched entries is smaller than $\delta^2$. Let $\mathcal F_s$ denote the filtration generated by $(\bar X_1, \dots, \bar X_s)$, including the marked tiles at the end of each step up to time $s$. Let $K_s$ be the event that step $s$ results in two unmatched entries being merged in both copies; our first goal (achieved in Lemma 4.10) will be to get a lower bound on the probability of $K_s$.
Step 1. We show that with good probability both marked tiles are unmatched at the end of a step (and thus also at the beginning of the next step).
Proof Let $u, v$ be the two markers for step $s$. If the tile containing $u$ was matched, then it suffices for $v$ to fall in an unmatched tile (then $M_s$ occurs), which again occurs with probability at least $\delta^2$. If however the tile containing $u$ was unmatched, the copy which contains the smaller of these two unmatched tiles necessarily contains at least one other unmatched tile. It then suffices for $v$ to fall in that tile. Indeed, if we are very lucky and the other copy also happens to have two unmatched entries, this might lead to a reduction in the number of unmatched entries, in which case $K_s$ has occurred. Otherwise we have simply shuffled the unmatched entries and $v$ is now in an unmatched entry, so $M_s$ holds. Either way, the conditional probability is at least $\delta^2$.
Step 2. We show that if the marked tiles are unmatched, with good probability we can get to a "balanced configuration" where both copies contain at least two unmatched tiles, and that the marked tiles at the end of the step are both unmatched.

Lemma 4.9
Suppose $s$ is not a refreshment time. Let $B_s$ denote the event that $\bar X_s$ and $\bar Y_s$ contain at least two unmatched entries each, and that the second marker is in one of these unmatched tiles for both $\bar X_s, \bar Y_s$ at the end of the step (i.e., $M_s$ holds). Then […] (54). Suppose now that $s$ is a refreshment time. Then […] (55).

Proof Let $u, v$ be the two markers for step $s$. Suppose first that $s$ is not a refreshment time, so we aim to prove (54). We treat several cases, according to whether $B_{s-1}$ holds or not.

We start by assuming that $B_{s-1}$ does not hold. The idea is that in that case, at time $s-1$, one copy (say $\bar Y_{s-1}$) has one unmatched entry, while the other one has at least three. It then suffices to fragment the unmatched entry in $\bar Y_{s-1}$ and to coagulate the other two entries in $\bar X_{s-1}$. Since $M_{s-1}$ holds and $s$ is not a refreshment time, it suffices for $v$ (the marker corresponding to the copy which has the smallest unmatched entry, which is necessarily $\bar X_{s-1}$) to fall in any of the other unmatched entries of $\bar X_{s-1}$: this necessarily results in a balanced configuration. Note also that this always results in both marked tiles being unmatched at the end of the step, so $B_s$ indeed holds in that case. Moreover, this event has probability at least $\delta^2$ since $T_\delta \ge s$.

Suppose now that $B_{s-1}$ holds. Then let us show directly that $K_s$ can occur with good probability. Indeed, if the second marker $v' = \Phi(v)$ (this is the marker associated with the copy, say $\bar Y_s$, that contains the larger of the two marked unmatched tiles) falls in another unmatched tile of $\bar Y_s$, then a coagulation of two unmatched entries is guaranteed to occur in both copies. Hence $K_s$ occurs with probability at least $\delta^2$. Either way, (54) is proved.

Now suppose that $s$ is a refreshment time. In that case it suffices to require that the first marker falls in an unmatched tile (which occurs with probability at least $\delta^2$), and from then on we argue exactly as in the proof of (54) to obtain a proof of (55). All in all the lemma is proved.
We point out that, combining Lemmas 4.8 and 4.9, regardless of whether $s$ is a refreshment time, $P(B_s \mid \mathcal F_{s-1}) \ge \delta^4$.
Step 3. Having reached a balanced configuration with one marked unmatched entry in both copies, we show that a coagulation of two unmatched entries in both copies has a good chance of occurring. In that case, the number of unmatched entries has decreased by two or four.

Lemma 4.10
We have […]

Proof We again need to distinguish between the cases where $s$ is a refreshment time or not. If not, then since $B_{s-1}$ holds, the first marker is in an unmatched tile for both copies. If the second marker $v$ corresponding to the copy with the larger of these two unmatched tiles falls in a different unmatched tile (which has probability at least $\delta^2$), then a coagulation is guaranteed to occur in both copies, so $K_s$ holds.
If $s$ is a refreshment time, then the same argument applies, but the first marker must first fall in an unmatched component (which has probability at least $\delta^2$ since $T_\delta \ge s$). This gives a lower bound of $\delta^4$ on the probability of $K_s$, as desired.
Combining these three steps, it is now relatively easy to deduce the following:

Proof Initially there are at most 6 unmatched entries. Due to parity there can be either 6, 4 or 0 unmatched entries (note in particular that 2 is excluded, as a quick examination shows that no configuration can give rise to two unmatched entries). Furthermore, from the properties of the coupling, the number of unmatched entries either remains the same or decreases at each step. Once all the entries are matched they remain matched thereafter.
We have just shown that in any sequence of three transpositions, the probability that the number of unmatched entries decreases is at least $\delta^8$, unless $T_\delta$ occurs during this sequence. Let $Z$ be a binomial random variable with parameters $m = (\Delta - 1)/3$ and $p = \delta^8$. The event $\{\bar X_\Delta \ne \bar Y_\Delta\}$ implies that either there has been at most one success (i.e., $Z \le 1$) or $T_\delta \le \Delta$, so

$$P(\bar X_\Delta \ne \bar Y_\Delta \mid A(\delta)) \le P(Z \le 1) + P(T_\delta \le \Delta \mid A(\delta)).$$

Since $\Delta = \delta^{-9}$ and $p = \delta^8$, both terms on the right-hand side tend to 0 as $\delta \to 0$, which proves Lemma 4.11.
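To see why the binomial term vanishes, note that the mean number of successes is $mp \approx 1/(3\delta)$, which diverges as $\delta \to 0$. A small numerical sketch, assuming the horizon $\delta^{-9}$ is rounded to an integer and $m$ is taken as the integer part of one third of the horizon minus one (the function names are illustrative):

```python
import math


def prob_at_most_one_success(m: int, p: float) -> float:
    """P(Z <= 1) for Z ~ Binomial(m, p), evaluated in log-space for stability."""
    log_q = math.log1p(-p)  # log(1 - p)
    return math.exp(m * log_q) + m * p * math.exp((m - 1) * log_q)


def binomial_term(delta: float) -> float:
    """P(Z <= 1) with horizon delta**-9, m = (horizon - 1) // 3, p = delta**8."""
    horizon = round(delta ** -9)
    m = (horizon - 1) // 3
    p = delta ** 8
    return prob_at_most_one_success(m, p)


for delta in (0.1, 0.05, 0.02):
    print(delta, binomial_term(delta))
```

The printed values decrease rapidly with $\delta$, in line with the Poisson approximation $P(Z \le 1) \approx e^{-mp}(1 + mp)$.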

Coupling for $(s_2, s_3]$
The walks $\bar X^{\mathrm{id}}$ and $\bar X^{\tau_1 \circ \tau_2}$ are uniformly distributed on their conjugacy class. Thus one can couple $\bar X^{\mathrm{id}}$ and $\bar X^{\tau_1 \circ \tau_2}$ so that […] The theorem now follows immediately: […] Thus, using Lemma 4.12, we see that (46) holds, which finishes the proof.

Fix $\beta > 0$ and let […] Assume that $\beta$ is such that $t_\beta$ is an integer. Consider for $1 \le i \le n$ the event $A_i$ that the $i$-th card is not collected by time $t_\beta$, that is, $i \notin \bigcup_{\ell=1}^{t_\beta} N(\gamma_\ell)$. Thus for $1 \le i_1 < \dots < i_\ell \le n$ and $\ell \le n - k$, […] Let $N = N(n) \in \mathbb{N}$ be increasing to infinity such that $N^2 = o(n)$ and $N = o(n^2 k^{-2})$. Then by the inclusion-exclusion formula we have that […] Let us lower bound the error term on the left hand side of (60). First, $(1 - \ell/n) \ge e^{-\ell^2/(n-\ell)}$, hence it follows that […] It is easy to see that the right hand side above converges to 1 as $n \to \infty$. Using this and (60), it follows that […] Likewise, for any $j < n/N$, […] Hence $\liminf$ […]

Let $\varepsilon > 0$. Then for any $\beta > 0$, if $t = (1 - \varepsilon) t_{\mathrm{mix}}$ then $t < t_\beta$ for $n$ sufficiently large, and hence $\liminf$ […] But it is obvious that $\bigcap_{j=1}^{m} K_{[jN, (j+1)N]} \subset K_m$, and hence for $t = (1 - \varepsilon) t_{\mathrm{mix}}$, $\liminf_{n \to \infty} P(X_t \in K_m) = 1$.
Comparing with (56) the result follows.
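The card-collecting events $A_i$ admit an explicit probability under one natural reading, sketched below. The assumptions are that $N(\gamma)$ denotes the set of points moved by $\gamma$, that every element of the conjugacy class moves exactly $k$ points, and that, by conjugacy invariance, the set of moved points at each step is a uniformly chosen $k$-subset of $\{1, \dots, n\}$. The function name `prob_all_uncollected` is hypothetical, and the formula is a standard computation rather than a reconstruction of the paper's display.

```python
from fractions import Fraction
from math import comb


def prob_all_uncollected(n: int, k: int, num_cards: int, t: int) -> Fraction:
    """Probability that `num_cards` fixed cards are all untouched after t steps.

    Each step touches a uniformly random k-subset of the n cards,
    independently of the other steps.
    """
    if num_cards > n - k:
        # A k-subset cannot avoid more than n - k given cards.
        return Fraction(0)
    one_step = Fraction(comb(n - num_cards, k), comb(n, k))
    return one_step ** t
```

For a single card the one-step probability reduces to $1 - k/n$, the usual coupon-collector rate when $k$ coupons are drawn at a time.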

B Proof of Theorem 3.6
Let $\Gamma \subset S_n$ be a conjugacy class with cycle structure $(k_2, k_3, \dots)$. Let $X = (X_t : t = 0, 1, \dots)$ be a random walk on $S_n$ which at each step applies an independent uniformly random element of $\Gamma$. Let $\rho = \sum_j (j-1) k_j$ and let $\bar X$ be the transposition walk associated to the walk $X$ using (49). In particular, for $t \ge 0$, $\bar X_{t\rho} = X_t$. Finally, let $Z = (Z_1, Z_2, \dots)$ denote a Poisson-Dirichlet random variable. For convenience we restate Theorem 3.6 here. […] in distribution, where $\theta(c)$ is given by (16).
The proof of this result is very similar to the proof of Theorem 1.1 in [24]. We give the details here.
Recall the hypergraph process $H = (H_t : t = 0, 1, \dots)$ associated with the walk $X$ defined in Sect. 3.1. Analogously, let $\bar G = (\bar G_t : t = 0, 1, \dots)$ be a process of graphs on $\{1, \dots, n\}$ such that the edge $\{x, y\}$ is present in $\bar G_t$ if and only if the transposition $(x, y)$ has been applied to $\bar X$ prior to and including time $t$. Hence we have that for each $t = 0, 1, \dots$, $\bar G_{t\rho} = H_t$.
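The component structure of such a graph process can be tracked incrementally with a union-find structure. The sketch below is only an illustration: it assumes points are labelled $0, \dots, n-1$ rather than $1, \dots, n$, and the function name `component_sizes` is hypothetical.

```python
def component_sizes(n, transpositions):
    """Sizes (in decreasing order) of the components of the graph on
    {0, ..., n - 1} whose edges are the transpositions applied so far."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in transpositions:
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            # Attach the smaller tree below the larger one.
            if size[root_x] < size[root_y]:
                root_x, root_y = root_y, root_x
            parent[root_y] = root_x
            size[root_x] += size[root_y]

    return sorted((size[r] for r in range(n) if find(r) == r), reverse=True)
```

Feeding in the first $t$ transpositions of the walk gives the component sizes at time $t$; the largest entry divided by $n$ is the quantity that is compared with $\theta(c)$ in the supercritical regime.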
Recall thatX satisfies conditional uniformity as described in Proposition 4.1. Using the graph processG above and the conditional uniformity ofX the following lemma, which is the analogue of Lemma 2.4 in [24], follows almost verbatim from Schramm's arguments. For t = 0, . . . , defineX t = X(X s 0 +t ). We can assume that for t ≤ ,X s 0 +t satisfies the relaxed conditional uniformity assumption described in Definition 4.2. Indeed by making this assumption we are disregarding the constraint on the transpositions described in Proposition 4.1 applied toX t for t = s 0 , . . . , s. However the probability that we violate this constraint is at most 2 k/n.
Colour an element ofX 0 = X(X s 0 ) green if the cycle whose renormalised cycle length of this element lies in the giant component ofG s 0 . We colour all the other elements ofX 0 red. Thus asymptotically in n, the sum of the green elements is θ(c) and the sum of the red elements is 1 − θ(c). In the evolution of (X t : t = 0, 1, . . . ) we keep the colour scheme as follows. If an element fragments, both fragments retain the same colour. If we coagulate two elements of the same colour then the new element retains the colour of the previous two elements. If we coagulate a green element and a red element, then the colour of the resulting element is green. DefineX = (X t : t = 0, . . . , ) andX = (X t : t = 0, . . . , ) as follows. InitiallyX 0 =X 0 =X 0 . Apply the same colouring scheme toX andX as we did toX . Each step evolution is described as follows. Then the walks evolve as follows.
•X t : Evolves the same asX except we ignore any transition which involves a red entry. •X t : Evolves the same asX except that the markers u, v used in the transitions of X are distributed uniformly on [0, 1].
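The colouring rules can be summarised in a few lines of code. In this sketch a piece is represented as a pair `(size, colour)`; this representation and the function names `coagulate` and `fragment` are illustrative choices, not notation from the paper.

```python
from fractions import Fraction

GREEN, RED = "green", "red"


def coagulate(a, b):
    """Merge two pieces; the result is green as soon as one of them is green."""
    (size_a, colour_a), (size_b, colour_b) = a, b
    colour = GREEN if GREEN in (colour_a, colour_b) else RED
    return (size_a + size_b, colour)


def fragment(piece, cut):
    """Split a piece at `cut`; both fragments keep the colour of the parent."""
    size, colour = piece
    if not 0 < cut < size:
        raise ValueError("cut must lie strictly inside the piece")
    return (cut, colour), (size - cut, colour)
```

Under these rules the total green mass can only grow over time, since green never turns red and any merge involving a green piece produces a green piece.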
Lemma 3.1 states that the second largest component ofG s 0 has size o(n). Hence, initially each red element has size o(1) as n → ∞. Now does not increase with n, hence for any s = 0, 1, . . . , , we are unlikely to make a coagulation (or fragmentation) inX s without coagulating (or fragmenting) entries ofX s of similar size. Similar considerations for the processesX andX leads to the following lemma.

Lemma B.3
There exists a coupling between the walksX andX , and betweenX andX such that for each η > 0, Using the preceding lemma, it suffices now to find an appropriate coupling between X and Z . To do this we modify Schramm's coupling in [24]. First we let {J 1 , . . . , J L } be the set of times s ∈ {0, . . . , } such thatX s−1 =X s . It is easy to see that lim n→∞ P(L > √ ) = 1 and henceforth we will condition on the event that {L > √ } and set = √ . Define a processȲ = (Ȳ t : t = 0, . . . , ) as follows. InitiallyȲ 0 =X 0 . For t = 1, . . . , we letȲ t be X J t renormalised so that iȲ i (t) = 1 whereȲ i (t) is the i-th element ofȲ t .
We define a process $\bar Z = (\bar Z_t : t = 0, 1, \dots, \sqrt{\Delta})$ as follows. Initially $\bar Z_0$ has the distribution of a Poisson-Dirichlet random variable, independent of $\bar Y$. Then for $t = 1, \dots, \sqrt{\Delta}$, define $\bar Z_t$ by applying the coupling in Sect. 4.2.2 to $\bar Y$ and $\bar Z$, but with the following modifications:
• the markers $u, v \in [0, 1]$ are taken uniformly at random,
• we always take $v' = v$,
• we modify the definition of a refreshment time: $s$ is a refreshment time if either $J_{s-1} + 2 \le J_s$ or $J_s + s_0$ is a refreshment time in the sense of Definition 4.1,
• when a marked tile of size $a$ fragments, it creates a tile of length $v$ and a tile of length $a - v$. We mark the tile of length $a - v$.
It is not hard to check that the Poisson-Dirichlet distribution is invariant under this evolution, and hence for each $t = 0, 1, \dots, \sqrt{\Delta}$, $\bar Z_t$ has the law of a Poisson-Dirichlet random variable. Our coupling agrees with the coupling in [24, Section 3] when $\Gamma = T$ is the set of all transpositions. In this case every time $s$ is a refreshment time, and hence the marked tile at time $s$ is always chosen by the marker $u$. One can adapt the arguments in Section 3 of Schramm's paper [24] to our case by using the following idea. Note first that all the estimates of Schramm apply at $s$ when $s$ is a refreshment time. When $s$ is not a refreshment time and Schramm considers the event that the marker $u$ at time $s$ falls inside an unmatched tile, we instead consider the event that the marker $v$ at time $s - 1$ falls inside an unmatched tile. By the properties of the coupling, this guarantees that at time $s$ the marked tile is unmatched.
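For readers who wish to experiment, an approximate Poisson-Dirichlet sample can be generated by stick breaking. The sketch below assumes that the Poisson-Dirichlet law in question is the one with parameter 1 (the limiting law of the normalised cycle lengths of a uniform random permutation), for which the stick proportions are independent uniform variables; the truncation level `num_sticks` and the function name `sample_pd1` are illustrative.

```python
import random


def sample_pd1(num_sticks=200, seed=None):
    """Approximate Poisson-Dirichlet(1) sample via truncated stick breaking.

    Repeatedly break off a uniform fraction of the remaining stick, then
    return the pieces in decreasing order. The mass left after `num_sticks`
    breaks is discarded; it is a product of `num_sticks` uniform variables.
    """
    rng = random.Random(seed)
    remaining = 1.0
    pieces = []
    for _ in range(num_sticks):
        v = rng.random()
        pieces.append(remaining * v)
        remaining *= 1.0 - v
    return sorted(pieces, reverse=True)
```

The returned list plays the role of the initial partition of $[0, 1]$ into tiles, up to the negligible discarded mass.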
Adapting Schramm's arguments leads to the following lemma, which is the analogue of [24,Corollary 3.4].   for some constant C > 0. Hence it follows that for > 0 small P(B c 1 ) = P(¯ > 2 −5/4 5/8 ) ≤ P(¯ > 5/6 ) ≤ 1/6 + C 1/6 . which shows that P(B 1 ) → 1 as ↓ 0. Now we bound P(B c 2 ). Firstly we use the bound¯ ≥ and so we are left to bound N 0 from above. Using the stick breaking construction of Poisson-Dirichlet random variables (see for example [