Cutoff for conjugacy-invariant random walks on the permutation group
Abstract
We prove a conjecture raised by the work of Diaconis and Shahshahani (Z Wahrscheinlichkeitstheorie Verwandte Geb 57(2):159–179, 1981) about the mixing time of random walks on the permutation group induced by a given conjugacy class. To do this we exploit a connection with coalescence and fragmentation processes and control the Kantorovich distance by using a variant of a coupling due to Oded Schramm as well as contractivity of the distance. Recast in the language of Ricci curvature, our proof establishes the occurrence of a phase transition, which takes the following form in the case of random transpositions: at time \(cn/2\), the curvature is asymptotically zero for \(c\le 1\) and is strictly positive for \(c>1\).
Keywords
Random walks · Symmetric group · Mixing times · Random transpositions · Conjugacy classes · Coalescence and fragmentation · Coarse Ricci curvature
Mathematics Subject Classification
60B15 60G50 60C05
1 Introduction
1.1 Main results
In the case where \(\Gamma = T\) is the set of transpositions, a famous result of Diaconis and Shahshahani [10] is that the cutoff phenomenon takes place at time \((1/2) n \log n\) asymptotically as \(n\rightarrow \infty \). That is, \({{\mathrm{\textit{t}_{mix}}}}(\delta )\) is asymptotic to \((1/2) n \log n\) for any fixed value of \(0< \delta <1\). It has long been conjectured that for a general conjugacy class such that \(|\Gamma | = o(n)\) (where here and in the rest of the paper, \(|\Gamma |\) denotes the number of non-fixed points of any permutation \(\gamma \in \Gamma \)), a similar result should hold at a time \((1/|\Gamma |) n \log n\). This has been verified for \(k\)-cycles with a fixed \(k\ge 2\) by Berestycki et al. [6]. This is a problem with a substantial history which will be detailed below. The primary purpose of this paper is to verify this conjecture. Hence our main result is as follows.
Theorem 1.1
Definition 1.1
In the terminology of Ollivier [18], this is in fact the curvature of the discrete-time random walk whose transition kernel is given by \(m_x(\cdot ) = \mathbb {P}(X_t = \cdot \mid X_0 = x)\). We refer the reader to [18] for an account of the elegant theory which can be developed using this notion of curvature, and point out that a number of classical properties of curvature generalise to this discrete setup.
For our results it will turn out to be convenient to view the symmetric group as a metric space equipped with the metric d which is the word metric induced by the set T of transpositions (we will do so even when the random walk is not induced by T but by a general conjugacy class \(\Gamma \)). That is, the distance \(d(\sigma , \sigma ')\) between \(\sigma , \sigma ' \in {\mathcal {S}}_n\) is the minimal number of transpositions one must apply to get from one element to the other (one can check that this number is independent of whether right-multiplications or left-multiplications are used).
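As an illustrative aside (not part of the original argument), the word metric can be computed through the classical identity \(d(\sigma , \sigma ') = n - \#\text{cycles}(\sigma ^{-1}\sigma ')\), where fixed points count as cycles. The sketch below assumes permutations are stored as 0-indexed tuples; the function names are hypothetical and chosen only for this example. It also checks by brute force that left and right multiplication give the same distance.

```python
from itertools import permutations


def compose(p, q):
    """Return the permutation p∘q, i.e. x -> p[q[x]] (0-indexed tuples)."""
    return tuple(p[q[x]] for x in range(len(p)))


def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)


def num_cycles(p):
    """Count the cycles of p, fixed points included."""
    seen, count = [False] * len(p), 0
    for start in range(len(p)):
        if not seen[start]:
            count += 1
            x = start
            while not seen[x]:
                seen[x] = True
                x = p[x]
    return count


def transposition_distance(sigma, sigma_prime):
    """Word distance w.r.t. transpositions: n - #cycles(sigma^{-1} sigma')."""
    return len(sigma) - num_cycles(compose(inverse(sigma), sigma_prime))


def left_right_distances_agree(n):
    """Brute-force check on S_n that sigma^{-1} sigma' and sigma' sigma^{-1}
    have the same number of cycles (they are conjugate)."""
    perms = list(permutations(range(n)))
    return all(
        num_cycles(compose(inverse(s), t)) == num_cycles(compose(t, inverse(s)))
        for s in perms
        for t in perms
    )


identity = (0, 1, 2, 3)
three_cycle = (1, 2, 0, 3)  # 0 -> 1 -> 2 -> 0, with 3 fixed
print(transposition_distance(identity, three_cycle))
```

A 3-cycle is a product of two transpositions and not fewer, which is what the formula returns for the example above.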
Theorem 1.2
A more general version of this theorem will be presented later on, which gives results for the curvature of a random walk induced by a general conjugacy class \(\Gamma \). This will be stated as Theorem 2.3.
We believe that the upper bound is the sharp one here, and thus make the following conjecture.
Conjecture 1.3
Of course the conjecture is already established for \(c\le 1\) and so is only interesting for \(c>1\).
1.2 Relation to previous works on the geometry of random transpositions
The transition described by Theorem 1.2 says that the discrete Ricci curvature increases abruptly (asymptotically) from zero to a positive quantity as c increases past the critical value \(c=1\), that is, as we consider longer portions of the random walk. It is related to a result proved by the first author in [2]. There it was shown that the triangle formed by the identity and two independent samples \(X_t\) and \(X'_t\) from the random walk run for time \(t=cn/4\) is thin (in the sense of Gromov hyperbolicity) if and only if \(c<1\). Note that by reversibility, the path running from \(X_t\) to \(X'_t\) (via the identity) is a random walk run for time \(cn/2\). In other words, the result from [2] implies that the permutation group appears Gromov hyperbolic from the point of view of a random walker so long as it takes fewer than \(cn/2\) steps with \(c< 1\).
Hence, in both Theorem 1.2 and [2], there is a change of geometry (as perceived by a random walker) from low to high curvature after running for exactly \(t = cn/2\) steps with \(c=1\). At this point, we do not know of a formal way to relate these two observations, so they simply seem analogous. In a private conversation with the first author in 2005, Gromov had suggested that the hyperbolicity transition of [2] could be translated more canonically into the language of Ricci curvature and was an effect of the global positive curvature of \(\mathcal {S}_n\) rather than a breakdown in hyperbolicity. In a sense, Theorem 1.2 can be seen as a formalisation and justification of his prediction.
1.3 Relation to previous works on mixing times
The study of mixing times of Markov chains was initiated independently by Aldous [1] and by Diaconis and Shahshahani [10]. In particular, as already mentioned, Diaconis and Shahshahani proved Theorem 1.1 in the case where \(\Gamma \) is the set T of transpositions. Their proof relies on some deep connections with the representation theory of \({\mathcal {S}}_n\) and bounds on so-called character ratios. The conjecture about the general case appears to have first been made formally in print by Roichman [20] but it has no doubt been asked privately before then. We shall see that the lower bound \({{\mathrm{\textit{t}_{mix}}}}(\delta ) \ge (1+ o(1))(1/|\Gamma |)n \log n\) is fairly straightforward (it is carried out in Appendix A and is as usual based on a coupon-collector type argument); the difficult part is the corresponding upper bound.
Flatto et al. [13] built on the earlier work of Vershik and Kerov [25] to obtain that \({{\mathrm{\textit{t}_{mix}}}}(\delta ) \le (1/2 + o(1))n \log n\) when \(|\Gamma |\) is bounded (as is noted in [9, pp. 44–45]). This was done using character ratios and this method was extended further by Roichman [20, 21] to show an upper bound on \({{\mathrm{\textit{t}_{mix}}}}(\delta )\) which is sharp up to a constant when \(|\Gamma | = o(n)\) (and in fact, more generally when \(|\Gamma |\) is allowed to grow to infinity as fast as \((1-\delta )n\) for any \(\delta \in (0,1)\)). Again using character ratios, Lulov and Pak [17] showed the cutoff phenomenon as well as \({{\mathrm{\textit{t}_{mix}}}}= (1/|\Gamma |)n \log n\) in the case when \(|\Gamma | \ge n/2\). Roussel [22, 23] obtains the correct value of the mixing time and establishes the cutoff phenomenon for the case \(|\Gamma |\le 6\).
Finally, let us discuss the two more recent papers to which this work is most closely related. Berestycki et al. [6] show, using coupling arguments and a connection to coalescence–fragmentation processes, that the cutoff phenomenon occurs at \({{\mathrm{\textit{t}_{mix}}}}= (1/k)n \log n\) in the case when \(\Gamma \) consists only of cycles of length k for any fixed \(k\ge 2\).
Shortly after, Bormashenko [7] devised a path coupling argument for the coagulation-fragmentation process associated to random transpositions to obtain a new proof of a slightly weaker version of the Diaconis–Shahshahani result: her argument implies that the mixing time of random transpositions is \(O( n \log n)\) (unfortunately the implicit multiplicative constant is not sharp, so this is not sufficient to obtain cutoff). See also [19] for another discussion of her results together with a reformulation in the language of coarse Ricci curvature. In a way her approach is very similar to ours, to the point that it can be considered a precursor to our work, since our method is also based on a certain path coupling for the coagulation-fragmentation process which exploits certain remarkable properties of Schramm’s coupling [6, 24].
Comparison with [6]. The authors of Berestycki et al. [6] remark that their proof can be extended to cover the case when \(\Gamma \) is a fixed conjugacy class and indicate that their methods can probably be pushed to cover the case when \(|\Gamma | =o(n^{1/2})\), but it is clear that new ideas are needed if \(|\Gamma |\) is larger. Indeed, their argument uses very delicate estimates about the behaviour of small cycles, together with a variant of a coupling due to Schramm [24] to deal with large cycles. The most technical part of their argument is to analyse the distribution of small cycles, using delicate couplings and carefully bounding the error made in these couplings.
However, when \(k = |\Gamma |\) is larger than \(n^{1/2}\), we can no longer think of the points in the conjugacy class as being sampled independently (with replacement) from \(\{1, \ldots , n\}\), by the birthday problem. This introduces many more ways in which errors in the above coupling arguments could occur. These seem quite hard to control, and hence new ideas are required for the general case.
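To make the birthday-problem threshold concrete, the following small sketch (an illustration only, with a hypothetical function name) computes the exact probability that k independent uniform draws from \(\{1, \ldots , n\}\) are pairwise distinct, namely \(\prod _{i=0}^{k-1}(1-i/n)\). This probability is close to 1 when k is much smaller than \(n^{1/2}\), of order one when k is comparable to \(n^{1/2}\), and close to 0 beyond that scale.

```python
def prob_all_distinct(k, n):
    """P(k i.i.d. uniform draws from {1,...,n} are pairwise distinct)."""
    p = 1.0
    for i in range(k):
        p *= 1.0 - i / n
    return p


n = 10**6
# k well below, at, and well above the scale n^{1/2} = 1000
for k in (63, 1000, 15849):
    print(k, prob_all_distinct(k, n))
```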
The proof in this paper relies on similar observations as [6], and in particular the connection with coalescence–fragmentation processes as well as Schramm’s coupling argument play a crucial role. The key new idea, however, is to try to prove mixing not just in the total variation sense but in the stronger sense of the \(L^1\)-Kantorovich distance (Ricci curvature) and to estimate it at a time well before the mixing time, roughly \(O(n/k)\) instead of \(O(n(\log n) / k)\). This may seem counterintuitive initially; however, studying the random walk at this time scale allows us to make precise comparisons between the random walk and an associated random graph process. It turns out the random graph at these time scales can be described rather precisely. Furthermore, due to the contraction properties of the Kantorovich distance, somehow (and rather miraculously, we find), the estimate we obtain can be bootstrapped with sufficient precision to yield mixing exactly at the time \({{\mathrm{\textit{t}_{mix}}}}= (1/k) n \log n\).
In particular, since the heart of the proof consists in studying the situation at a time well before mixing, and purely to take advantage of the giant component at such times, we never have to study the distribution of small cycles. This is really quite surprising, given that the small cycles (in particular, the fixed points) are responsible for the occurrence of the cutoff at time \({{\mathrm{\textit{t}_{mix}}}}\).
1.4 Organisation of the paper
We stress that compared to [6], the main arguments are quite elementary. The heart of the proof is contained in Sects. 4.2 and 2. Readers who are familiar with [6] are encouraged to concentrate on these two short sections.
The paper is organised as follows. In Sect. 2 we state and discuss Theorem 2.3, which is a general curvature theorem (of which Theorem 1.2 is the prototype). We also discuss why this implies the main theorem (Theorem 1.1). In Sect. 3.1 we study the associated random hypergraph process. The main result in that section is Theorem 3.1, which proves the existence and uniqueness of the giant component. Curiously this is the most technical aspect of the paper, and really the only place where the myriad of ways in which the conjugacy class \(\Gamma \) might be really big plays a role and needs to be controlled. Section 4 contains a proof of the main curvature theorem (Theorem 2.3), starting with the easy upper bound on curvature (Sect. 4.1) and following up with the slightly more complex lower bound (Sect. 4.2), which really is the heart of the proof. The two appendices contain respectively a proof of the lower bound on the mixing time (certainly known in the folklore, essentially a version of the coupon collector lemma); and an adaptation of Schramm’s argument [24] for the Poisson–Dirichlet structure of cycles inside the giant component, which is needed in the proof.
2 Curvature and mixing
2.1 Curvature theorem
As discussed above, the lower bound (5) is relatively easy and is probably known in the folklore; we give a proof in Appendix A. We now start the proof of the main results of this paper, which is the upper bound (the right hand side) of (5). In this section, we first state the more general version of Theorem 1.2 discussed in the introduction, and we will then show how this implies the desired result for the upper bound on \({{\mathrm{\textit{t}_{mix}}}}(\delta )\). To begin, we define the cycle structure \((k_2,k_3, \dots )\) of \(\Gamma \) to be a vector such that for each \(j \ge 2\), there are \(k_j\) cycles of length j in the cycle decomposition of any \(\gamma \in \Gamma \) (note that this does not depend on the choice of \(\gamma \in \Gamma \)). Then \(k_j=0\) for all \(j>n\) and we have that \(k :=|\Gamma | = \sum _{j=2}^\infty j k_j\).
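For readers who like to experiment, the cycle structure and the quantity \(k = \sum _j j k_j\) can be read off from any representative \(\gamma \in \Gamma \). The sketch below is only an illustration: it assumes permutations stored as 0-indexed tuples, and the function names are hypothetical.

```python
from collections import Counter


def cycle_lengths(p):
    """Lengths of all cycles of the permutation p (0-indexed tuple)."""
    seen, lengths = [False] * len(p), []
    for start in range(len(p)):
        if not seen[start]:
            length, x = 0, start
            while not seen[x]:
                seen[x] = True
                x = p[x]
                length += 1
            lengths.append(length)
    return lengths


def cycle_structure(p):
    """Return {j: k_j} for j >= 2; fixed points are ignored."""
    return dict(Counter(length for length in cycle_lengths(p) if length >= 2))


def support_size(p):
    """k = sum_j j * k_j, which equals the number of non-fixed points of p."""
    return sum(j * k_j for j, k_j in cycle_structure(p).items())


# One transposition (0 1), one 3-cycle (2 3 4), and two fixed points 5, 6.
gamma = (1, 0, 3, 4, 2, 5, 6)
print(cycle_structure(gamma), support_size(gamma))
```

Since conjugation only relabels the points, every element of the same conjugacy class returns the same dictionary and the same value of k.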
Lemma 2.1
Proof
For the rest of the statements suppose that \(c>c_\Gamma \). The fact that \(c \mapsto \theta (c)\) is increasing follows from the definition of \(\psi (x,c)\) and the fact that \(\theta (c)=1-\psi (\theta (c),c)\). Continuity and differentiability for \(c> c_\Gamma \) are a straightforward application of the inverse function theorem.
Notice that \(\theta (c) \in [0,1]\) and is monotone, hence \(\theta (c)\) converges as \(c \downarrow c_\Gamma \) to a limit L. Then it follows that L solves the equation \(L=1-\psi (L,c_\Gamma )\). This equation has only a zero solution and thus \(L=0\) and hence \(\lim _{c \downarrow c_\Gamma }\theta (c)=0\). The limit as \(c \uparrow \infty \) follows from a similar argument. \(\square \)
Remark 2.2
In the case when \(\Gamma =T\) is the set of transpositions we have that \(k'_2 = 1\) and \({\bar{\alpha }}_j = 0\) for \(j\ge 3\), hence \(\psi (x,c)=e^{-cx}\) and thus the definition of \(\theta (c)\) above agrees with the definition given in the introduction.
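In the transposition case, \(\theta (c)\) is therefore the largest root in \([0,1)\) of \(x = 1 - e^{-cx}\), which is zero for \(c \le 1\) and positive for \(c>1\). The following sketch computes it numerically by bisection. It is an illustration under the assumption \(\psi (x,c)=e^{-cx}\) only; the general \(\psi \) of (11) is not implemented here, and the function name is hypothetical.

```python
import math


def theta(c, iterations=200):
    """Largest root in [0, 1) of x = 1 - exp(-c x) (transposition case).

    The map f(x) = 1 - exp(-c x) - x is concave with f(0) = 0, so for c > 1
    it is positive on (0, theta) and negative on (theta, 1]; bisection on
    the sign of f therefore converges to theta.
    """
    if c <= 1.0:
        return 0.0  # only the trivial root exists
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        # -expm1(-c x) equals 1 - exp(-c x), computed stably for small x
        if -math.expm1(-c * mid) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for c in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(c, theta(c))
```

The printed values increase with c and start from zero at \(c = 1\), in line with the monotonicity and limit statements of Lemma 2.1.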
Theorem 2.3
2.2 Curvature implies mixing
Note that for any two random variables X, Y on a metric space (S, d) we have the obvious inequality \(\Vert X - Y\Vert _{TV} \le W_1(X,Y)\) provided that \(x\ne y \) implies \(d(x,y) \ge 1\) on S. This is in particular the case when \(S= {\mathcal {S}}_n\) and d is the word metric induced by the set T of transpositions. In other words it suffices to prove mixing in the \(L^1\)-Kantorovich distance.
Lemma 2.4
Proof
Consequently we have that for \(u\ge t'=\lfloor (1+\epsilon )(1/k)n \log n\rfloor \), \(u\) satisfies (20) for some sufficiently large \(c>c_\Gamma \). Hence \(\limsup _{n \rightarrow \infty } {\bar{d}}_{TV}(t') = 0\) and thus (17) holds, which shows Theorem 1.1 for conjugacy classes such that the limit in (C) exists and \(|\Gamma | =o(n)\).
Now suppose that \(\Gamma \) is a conjugacy class such that \(|\Gamma | =o(n)\). Let \(t' = \lfloor (1+\epsilon )(1/|\Gamma |)n \log n\rfloor \) and notice that \(d_{TV}(t')\) is bounded. Along any subsequence \(\{n_i\}_{i \ge 1}\) such that \(\lim _{n_i\rightarrow \infty }d_{TV}(t')\) exists, we can extract a further subsequence \(\{n_{i_j}\}_{j \ge 1}\) such that (C) holds since \((\alpha _j)_{j \ge 2} \in [0,1]^\infty \) which is compact under the product topology. Then we see that \(\lim _{n_{i_j}\rightarrow \infty }d_{TV}(t')=0\) and consequently \(\lim _{n_i\rightarrow \infty }d_{TV}(t')=0\). Since \(d_{TV}(t')\) is bounded and converges to 0 along any convergent subsequence, we conclude that \(\lim _{n\rightarrow \infty }d_{TV}(t')=0\), thus concluding the proof.
2.3 Stochastic commutativity
To conclude this section on curvature, we state a simple but useful lemma. Roughly, this says that the random walk is “stochastically commutative”. This can be used to show that the \(L^1\)-Kantorovich distance is decreasing under the application of the heat kernel. In other words, initial discrepancies for the Kantorovich metric between two permutations are only smoothed out by the application of the random walk.
Lemma 2.5
Let \(\sigma \) be a random permutation with distribution invariant by conjugacy. Let \(\sigma _0\) be a fixed permutation. Then \(\sigma _0 \circ \sigma \) has the same distribution as \(\sigma \circ \sigma _0 \).
Proof
Define \(\sigma ' = \sigma _0 \circ \sigma \circ \sigma _0^{-1}\). Then since \(\sigma \) is invariant under conjugacy, the law of \(\sigma '\) is the same as the law of \(\sigma \). Furthermore, we have \( \sigma _0 \circ \sigma = \sigma ' \circ \sigma _0\) so the result is proved. \(\square \)
This lemma will be used repeatedly in our proof, as it allows us to concentrate on events of high probability for our coupling.
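Lemma 2.5 can be checked by brute force on a small symmetric group. The sketch below is an illustration with hypothetical function names; it takes \(\sigma \) uniform on a conjugacy class, which is one instance of a conjugacy-invariant law, and compares the laws of \(\sigma _0 \circ \sigma \) and \(\sigma \circ \sigma _0\) for every fixed \(\sigma _0 \in {\mathcal {S}}_4\).

```python
from collections import Counter
from itertools import permutations


def compose(p, q):
    """Return the permutation p∘q, i.e. x -> p[q[x]] (0-indexed tuples)."""
    return tuple(p[q[x]] for x in range(len(p)))


def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)


def conjugacy_class(gamma):
    """All conjugates g∘gamma∘g^{-1}, g in S_n (brute force, small n only)."""
    n = len(gamma)
    return {
        compose(compose(g, gamma), inverse(g)) for g in permutations(range(n))
    }


def left_right_laws_agree(gamma, sigma0):
    """Compare the laws of sigma0∘sigma and sigma∘sigma0, sigma uniform on
    the conjugacy class of gamma. Each law is stored as a Counter of outcomes."""
    cls = conjugacy_class(gamma)
    left = Counter(compose(sigma0, s) for s in cls)
    right = Counter(compose(s, sigma0) for s in cls)
    return left == right


transposition = (1, 0, 2, 3)
three_cycle = (1, 2, 0, 3)
print(all(left_right_laws_agree(transposition, s0) for s0 in permutations(range(4))))
print(all(left_right_laws_agree(three_cycle, s0) for s0 in permutations(range(4))))
```

Note that \(\sigma _0 \circ \sigma \) and \(\sigma \circ \sigma _0\) are in general different permutations; only their distributions coincide.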
3 Preliminaries on random hypergraphs
For the proof of Theorem 1.1 we rely on properties of certain random hypergraph processes. The reader who is, in the first instance, only interested in the case of random transpositions, and who is familiar with Erdős–Rényi random graphs and with the result of Schramm [24], may safely skip this section.
3.1 Hypergraphs
In this section we present some preliminaries which will be used in the proof of Theorem 2.3. Throughout we let \(\Gamma \subset {\mathcal {S}}_n\) be a conjugacy class and let \((k_2,k_3,\dots )\) denote the cycle structure of \(\Gamma \). Thus \(\Gamma \) consists of permutations such that in their cycle decomposition they have \(k_2\) many transpositions, \(k_3\) many 3-cycles and so on. Note that we have suppressed the dependence of \(\Gamma \) and \((k_2,k_3,\dots )\) on n. We assume that (C) is satisfied so that for each \(j\ge 2\), \(j k_j/|\Gamma | \rightarrow {\bar{\alpha }}_j\) as \(n \rightarrow \infty \). We also let \(k=|\Gamma |\) so that \(k=\sum _{j \ge 2} j k_j\), as usual.
Definition 3.1
A hypergraph \(H=(V,E)\) is given by a set V of vertices and a set \(E \subset \mathcal {P}(V)\) of edges, where \(\mathcal {P}(V)\) denotes the set of all subsets of V. An element \(e \in E\) is called a hyperedge and we call it a \(j\)-hyperedge if \(|e|=j\).
To X we associate a certain hypergraph process \(H=(H_t:t=0,1, \ldots )\) defined as follows. For \(t =0,1 ,\ldots \), \(H_t\) is a hypergraph on \(\{1,\dots ,n\}\) where a hyperedge \(\{x_1,\dots ,x_j\}\) is present if and only if a cyclic permutation consisting of the points \(x_1,\dots ,x_j\) in some arbitrary order has been applied to the random walk X prior to time t as part of one of the \(\gamma _i\)’s for some \(i \le t\). Thus at every step, we add to \(H_t\), for each \(j \ge 2\), \(k_j\) hyperedges of size j sampled uniformly at random without replacement, and these edges are independent from step to step. However, note that the hyperedges themselves are not in general independent of one another.
3.2 Giant component of the hypergraph
In the case \(\Gamma =T\), the set of transpositions, the hypergraph \(H_s\) is a realisation of an Erdős–Rényi graph. Analogous to Erdős–Rényi graphs, we first present a result about the size of the components of the hypergraph process \(H=(H_t:t= 0,1,\dots )\) (where by size, we mean the number of vertices in this component). For the next result recall the definition of \(\psi (x,c)\) in (11). Recall that for \(c> c_\Gamma \), where \(c_\Gamma \) is given by (12), there exists a unique root \(\theta (c)\in (0,1)\) of the equation \(\theta (c)=1-\psi (\theta (c),c)\).
Theorem 3.1
Consider the random hypergraph \(H_s\) and suppose that \(s=s(n)\) is such that \(sk/n \rightarrow c\) as \(n \rightarrow \infty \) for some \(c> c_\Gamma \). Then there is a universal constant \(D>0\) such that with probability tending to one all components but the largest have size at most \( D n^{2/3} (\log (n))^3\). Furthermore, the size of the largest component, normalised by n, converges to \(\theta (c)\) in probability as \(n\rightarrow \infty \).
Of course, this is the standard Erdős–Rényi theorem in the case where \(\Gamma = T\) is the set of transpositions. See for instance [12], in particular Theorem 2.3.2 for a proof. In the case of \(k\)-cycles with k fixed and finite, this is the case of random regular hypergraphs analysed by Karoński and Łuczak [15]. For the slightly more general case of bounded conjugacy classes, this was proved by Berestycki [4].
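The statement of Theorem 3.1 is easy to explore numerically. The sketch below is an illustration only and plays no role in the proof: it builds the hypergraph \(H_s\) packet by packet with a union-find structure and returns the normalised size of the largest component. The names `UnionFind`, `add_packet` and `largest_component_fraction` are hypothetical, vertices are labelled from 0, and the cycle structure is passed as a dictionary `{j: k_j}`. For transpositions with \(c = 2\) the output can be compared with the root of \(x = 1 - e^{-2x}\), which is approximately 0.797.

```python
import random


class UnionFind:
    """Disjoint sets with union by size and path halving."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        self.size[rx] += self.size[ry]


def add_packet(uf, n, cycle_structure, rng):
    """Add one packet: k_j hyperedges of size j on k distinct uniform points."""
    k = sum(j * k_j for j, k_j in cycle_structure.items())
    points = rng.sample(range(n), k)  # sampling without replacement
    pos = 0
    for j, k_j in cycle_structure.items():
        for _ in range(k_j):
            edge = points[pos:pos + j]
            pos += j
            for v in edge[1:]:
                uf.union(edge[0], v)


def largest_component_fraction(n, cycle_structure, c, seed=0):
    """Run s = round(c n / k) steps and return (largest component size) / n."""
    rng = random.Random(seed)
    k = sum(j * k_j for j, k_j in cycle_structure.items())
    steps = round(c * n / k)
    uf = UnionFind(n)
    for _ in range(steps):
        add_packet(uf, n, cycle_structure, rng)
    return max(uf.size[uf.find(v)] for v in range(n)) / n


# Random transpositions: supercritical (c = 2) versus subcritical (c = 0.5).
print(largest_component_fraction(20000, {2: 1}, 2.0))
print(largest_component_fraction(20000, {2: 1}, 0.5))
```

Other cycle structures, for example `{2: 3, 5: 1}`, can be passed in the same way, although the corresponding limit \(\theta (c)\) then depends on the general function \(\psi \), which this sketch does not compute.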
Discussion. Note that the behaviour of \(H_s\) in Theorem 3.1 can deviate markedly from that of Erdős–Rényi graphs. The most obvious difference is that \(H_s\) can contain mesoscopic components, something which has of course negligible probability for Erdős–Rényi graphs. For example, suppose \(\Gamma \) consists of \(n^{1/2}\) transpositions and one cycle of length \(n^{1/3}\). Then the giant component appears at time \(n^{1/2}/2\) with a phase transition (i.e., \(c_\Gamma >0\), because in this case \(\sum {\bar{\alpha }}_j =1\), as most of the mass comes from microscopic cycles). Yet even at the first step there is a component of size \(n^{1/3}\). Nevertheless we will see that once there is a giant component there is a limit to how big the non-giant components can be (we show this is less than \(O(n^{2/3})\) up to logarithmic terms; this is certainly not optimal).
From a technical point of view this has nontrivial consequences, as proofs of the existence of a giant component are usually based on the dichotomy between microscopic components and giant components. Furthermore, when the conjugacy class is large and consists of many small or mesoscopic cycles, the hyperedges have a strong dependence, which makes the proof very delicate.
In effect, perhaps surprisingly, this will be the only place in the proof where all the possible ways in which the conjugacy class \(\Gamma \) might be big (potentially of size very close to n) need to be handled. The difficulty of the proof below is to find an argument which works no matter how \(\Gamma \) is made up, so long as \(k = |\Gamma | = o(n)\). This is of course also the problem in the original question of studying the mixing time of the random walk induced by \(\Gamma \). However, what we have gained here compared to this original question is the monotonicity of component sizes when hyperedges are added to \(H_s\).
Preliminaries: exploration. Suppose that \(s=s(n)\) is such that \(sk/n \rightarrow c\) as \(n \rightarrow \infty \) for some \(c> 0\). We reveal the vertices of the component containing a fixed vertex \(v \in \{1,\dots ,n\}\) using breadth-first search exploration, as follows. There are three states that each vertex can be in: unexplored, removed or active. Initially v is active and all the other vertices are unexplored. At each step of the iteration we select an active vertex w according to some prescribed rule among the active vertices at this stage (say with the smallest label). The vertex w becomes removed and every unexplored vertex which is joined to w by a hyperedge becomes active. We repeat this exploration procedure until there are no more active vertices. At stage \(i = 0,1, \ldots \) of this exploration process, we let \(A_i\), \(R_i\) and \(U_i\) denote the set of active, removed and unexplored vertices respectively. Thus initially \(A_0=\{v\}\), \(U_0=\{1,\dots ,n\}\backslash \{v\}\) and \(R_0=\emptyset \). We will let \(a_i = |A_i|, u_i = |U_i|, r_i = |R_i|\).
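The exploration procedure just described can be sketched as follows. This is a minimal illustration on a fixed (deterministic) hypergraph, with the smallest-label rule for choosing the next active vertex; the function name `explore_component` and the representation of hyperedges as Python sets are assumptions made for this example.

```python
def explore_component(n, hyperedges, v):
    """Explore the component of v in a hypergraph on {1, ..., n}.

    Returns the set of removed vertices at the end of the exploration
    (the component of v) and the history [(a_i, r_i, u_i)] for i = 0, 1, ...
    """
    # Index the hyperedges by the vertices they contain.
    incident = {x: [] for x in range(1, n + 1)}
    for e in hyperedges:
        for x in e:
            incident[x].append(e)

    active, removed = {v}, set()
    unexplored = set(range(1, n + 1)) - {v}
    history = [(len(active), len(removed), len(unexplored))]

    while active:
        w = min(active)  # prescribed rule: smallest label
        active.remove(w)
        removed.add(w)
        # Every unexplored vertex sharing a hyperedge with w becomes active.
        for e in incident[w]:
            for x in e:
                if x in unexplored:
                    unexplored.remove(x)
                    active.add(x)
        history.append((len(active), len(removed), len(unexplored)))

    return removed, history


edges = [{1, 2, 3}, {3, 4}, {5, 6}]
component, history = explore_component(7, edges, 1)
print(component)
print(history)
```

At every stage the three counts add up to n, and the exploration stops at the first stage at which the active set is empty.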
For \(t =1,\dots ,s\) we call the hyperedges which are associated with the permutation \(\gamma _t\) the \(t\)th packet of hyperedges. Thus note that each packet consists of \(k_j\) hyperedges of size j, \(j \ge 2\), which are sampled uniformly at random without replacement from \(\{1, \ldots ,n\}\). In particular, within a given packet, hyperedges are not independent. However, crucially, hyperedges from different packets are independent. We will need to keep track of the hyperedges we reveal and where they “came from” (i.e., which packet they were part of), in order to deal with these dependencies. More precisely, as we explore the hypergraph \(H_s\), we discover various hyperedges of various sizes in \(H_s\) and this may affect the likelihood of other types of hyperedges in subsequent steps of the exploration process. To account for this, we introduce for \(t=1,\dots ,s\) and for \(j \ge 2\), the random subset of \(\{1, \ldots , n\}\), \(Y^{(t)}_{j} (i)\), which is defined to be the hyperedges of size j in the \(t\)th packet that were revealed in the exploration process prior to step i. We let \(y_j^{(t)}(i) = |Y_j^{(t)}(i)|\) denote the number of such hyperedges.
Equivalently, the active process \(A_i\) converges (at least for finite dimensional marginals) to the exploration process of a Galton–Watson tree whose offspring distribution is given by the limit of \(a_{i+1} - a_i + 1\) and thus has a moment generating function given by \(\psi (1-\cdot , c)\).
It is perhaps surprising that the lemma below is sufficient for the proof of Theorem 3.1: the lemma below essentially only records whether a cycle is microscopic (finite) or “more than microscopic”; in particular, whether the mass of \(\Gamma \) comes from many small mesoscopic or fewer big cycles makes no difference.
Lemma 3.2
Proof
Lemma 3.2 above tells us that, at the level of generating functions, the distribution of \(a_{i+1} - a_i\) behaves very much like a sequence of i.i.d. random variables with distribution determined by \(\psi \), even if we don’t ignore self-intersections. It is thus easy to build martingales from quantities of the form \(q^{a_i}\), which behave as if the increments of \(a_i\) were i.i.d., at least until we reach size \(n^{2/3}\). Hence this will allow us to reach a size of \(n^{2/3}\) for \(a_i\) almost as if there were no self-intersections, and so with probability approximately \(\theta (c)\). Fundamentally, this is because even if self-intersections do occur, they are rare and do not cause a significant loss of mass. Technically, it is easier to have a separate argument for bringing the cluster to a polylogarithmic size before using this information to show that the cluster reaches size \(n^{2/3}\) with essentially the same probability. This is what we achieve in Step 1, which we are now ready for.
Step 1. We show that the cluster containing a given vertex v is at least logarithmically large with probability approximately \(\theta (c)\), and furthermore the number of vertices for which this occurs is approximately \(n \theta (c)\) in the sense of convergence in probability.
Lemma 3.3
Proof
Hence, it is clear that if \(W_n\) is the total progeny of this branching process, then \(\mathbb {P}( W_n \ge (\log n)^2) \ge \mathbb {P}( W_n = \infty ) = 1 - q_n \rightarrow \theta (c)\), and combining with the argument in the upper bound on (32) we deduce that \(\mathbb {P}( W_n \ge (\log n)^2) \rightarrow \theta (c)\). On the other hand, \(T_1 < T_{\text {inter}}\) with probability tending to 1 as \(n \rightarrow \infty \) by the birthday problem, and so in fact \(\mathbb {P}( T_1 < T_\downarrow ) = \mathbb {P}( W_n \ge (\log n)^2 ) + o(1)\), so we are done. \(\square \)
It is important to note that self-intersections may occur at the very step that \(a_i\) exceeds \((\log n)^2\) (for instance, think about the case when the conjugacy class has some of its mass coming from cycles larger than \(n^{1/2}\): discovering such a cycle would immediately produce a self-intersection). Even so, the active set reaches size \((\log n)^2\) before such a self-intersection is discovered.
As announced at the beginning of Step 1, we complement this with a law of large numbers:
Lemma 3.4
Proof
Step 2. We now extrapolate the information obtained in the previous step to show that, still with probability approximately \(\theta (c)\), the active set of \(\mathcal {C}_v\) can reach a size of at least \(O(n^{2/3})\). To do so we suppose our exploration from Step 1 yields an active set of size at least \((\log n)^2\) (which, as discussed, occurs with probability \(\theta (c) + o(1)\)). We will restart the exploration from that point on, calling this time \(i=0\) again. Hence the setup is the same as before, except that at time \(i=0\) we have \(a_0 = \lfloor (\log n)^2 \rfloor \): we only keep the first \((\log n)^2\) of the active vertices discovered at time \(T_1\), and declare all further active vertices at time \(T_1\) to be removed at time \(i=0\) in the exploration of Step 2.
Recall our notations for \(T^\downarrow \) and \(T^\uparrow \) in (26) and (27). Our goal in this step is to show the following control:
Lemma 3.5
Proof
Proof of Theorem 3.1
The proof of Theorem 3.1 is complete, since \(s+s'\) is an arbitrary sequence such that \((s+s')k/n \rightarrow c\). \(\square \)
3.3 Poisson–Dirichlet structure
The next result is a generalisation of Theorem 1.1 in [24] to the case of general conjugacy classes. The proof is a simple adaptation of the proof of Schramm and we provide the details in an appendix.
Theorem 3.6
4 Proof of curvature theorem
4.1 Proof of the upper bound on curvature
We claim that it is enough to show the upper bound for \(c>c_\Gamma \) in (15). Indeed, notice that \(c \mapsto \kappa _c\) is nondecreasing. Hence let \(c\le c_\Gamma \) and suppose we know that \(\limsup _{n\rightarrow \infty }\kappa _{c'} \le \theta (c')^2\) holds for all \(c' > c_\Gamma \). Then we have that \(\limsup _{n \rightarrow \infty }\kappa _c \le \theta (c')^2\) for each \(c'> c_\Gamma \). Taking \(c' \downarrow c_\Gamma \) and using the fact that \(\lim _{c' \downarrow c_\Gamma } \theta (c')=0\) shows that \(\lim _{n\rightarrow \infty }\kappa _c=0\).
4.2 Proof of lower bound on curvature
Definition 4.1
(Refreshment Times) We call a time s a refreshment time if s is of the form \(s=\rho \ell + \sum _{j=2}^m (j-1) k_j\) for some \(\ell \in \mathbb {N}\cup \{0\}\) and \(m \in \mathbb {N}\backslash \{1\}\).
We see that s is a refreshment time if the transposition being applied to \({\tilde{X}}\) at time s is the start of a new cycle. Using this we can describe the law of the transpositions being applied to \({\tilde{X}}\).
Proposition 4.1
 (i)
\(s\rho + i\) is a refreshment time and thus \(\tau ^{(i)}_s\) corresponds to the start of a new cycle
 (ii)
\(s\rho + i\) is not a refreshment time and so \(\tau ^{(i)}_s\) is the continuation of a cycle.
Note that in either case, the second marker y is conditionally uniformly distributed among the vertices which have not been used so far. This conditional independence property is completely crucial, and allows us to make use of methods (such as that of Schramm [24], developed initially for random transpositions) for general conjugacy classes, so long as \(|\Gamma | = o(n)\). Indeed in that case the second marker y itself is not very different from a uniform random variable on \(\{1, \ldots , n\}\).
Now consider our two random walks, \(X^{{{\mathrm{id}}}}\) and \(X^{\tau _1\circ \tau _2}\), started respectively from id and \(\tau _1 \circ \tau _2\), and let \({\tilde{X}}^{{{\mathrm{id}}}}\) and \({\tilde{X}}^{\tau _1 \circ \tau _2}\) be the associated processes constructed using (49), on the transposition time scale. Thus to prove (46) it suffices to construct an appropriate coupling between \({\tilde{X}}^{{{\mathrm{id}}}}_{t \rho }\) and \({\tilde{X}}^{\tau _1 \circ \tau _2}_{t\rho }\). Next, recall that for a permutation \(\sigma \in {\mathcal {S}}_n\), \(\mathfrak {X}(\sigma )\) denotes the renormalised cycle lengths of \(\sigma \), taking values in \(\Omega _\infty \) defined in (42). The walks \({\tilde{X}}^{{{\mathrm{id}}}}\) and \({\tilde{X}}^{\tau _1\circ \tau _2}\) are invariant by conjugacy and hence both are distributed uniformly on their conjugacy class. Thus ultimately it will suffice to couple \(\mathfrak {X}({\tilde{X}}^{{{\mathrm{id}}}}_{t \rho })\) and \(\mathfrak {X}({\tilde{X}}^{\tau _1\circ \tau _2}_{t \rho })\).
Let us informally describe the coupling before we give the details. In what follows we will couple the random walks \({\tilde{X}}^{{{\mathrm{id}}}}\) and \({\tilde{X}}^{\tau _1\circ \tau _2}\) such that they keep their distance constant during the time intervals \([0,s_1]\) and \((s_2,s_3]\). In particular we will see that at time \(s_1\), the walks \({\tilde{X}}^{{{\mathrm{id}}}}\) and \({\tilde{X}}^{\tau _1\circ \tau _2}\) will differ by two independently uniformly chosen transpositions. Thus at time \(s_1\) most of the cycles of \({\tilde{X}}^{{{\mathrm{id}}}}\) and \({\tilde{X}}^{\tau _1\circ \tau _2}\) are identical but some cycles may be different. We will show that, given that the cycles that differ at time \(s_1\) are all reasonably large, we can reduce the distance between the two walks to zero during the time interval \((s_1,s_2]\). Otherwise, if one of the differing cycles is not reasonably large, we couple the two walks to keep their distance constant during the time intervals \([0,s_1]\), \((s_1,s_2]\) and \((s_2,s_3]\).
More generally, our coupling has the property that \(d(X^{{{\mathrm{id}}}}_t, X^{\tau _1 \circ \tau _2}_t)\) is uniformly bounded, so that it will suffice to concentrate on events of high probability in order to get a bound on the \(L^1\)-Kantorovich distance \(W(X^{{{\mathrm{id}}}}_t, X^{\tau _1 \circ \tau _2}_t )\).
4.2.1 Coupling for \([0,s_1]\)
4.2.2 Coupling for \((s_1,s_2]\)
For \(s \ge 0\) define \({\bar{X}}_s=\mathfrak {X}({\tilde{X}}^{{{\mathrm{id}}}}_{s+s_1 })\) and \({\bar{Y}}_s = \mathfrak {X}({\tilde{X}}^{\tau _1\circ \tau _2}_{s+s_1 })\). Here we will couple \({\bar{X}}_s\) and \({\bar{Y}}_s\) for \(s =0,\dots , \Delta \). During this time we aim to show that the discrepancies between \({\bar{X}}_0\) and \({\bar{Y}}_0\) resulting from performing the transpositions \(\tau _1\) and \(\tau _2\) at the end of the previous phase can be resolved. Our main tool for doing this will be a variant of a coupling of Schramm [24], which was already used in [6].
At each step s we try to create a matching between \({\bar{X}}_s\) and \({\bar{Y}}_s\) by matching an element of \({\bar{X}}_s\) to at most one element of \({\bar{Y}}_s\) of the same size. At any time s there may be several entries that cannot be matched. By parity the combined number of unmatched entries is an even number, and observe that this number cannot be equal to two. Now \({\tilde{X}}^{{{\mathrm{id}}}}_{s_1}\) and \({\tilde{X}}^{\tau _1\circ \tau _2}_{s_1}\) differ by two transpositions as can be seen from (50). This implies that in particular initially (i.e., at the beginning of \((s_1,s_2]\)), there are four, six or zero unmatched entries between \({\bar{X}}_0\) and \({\bar{Y}}_0\).
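The matching described above can be made concrete with a short sketch. It assumes, purely for illustration, that cycle lengths are stored as positive integers (sizes in units of \(1/n\)); the helper name `unmatched_entries` is hypothetical and not taken from the paper. Entries are paired off by equal size, which is all the definition requires.

```python
from collections import Counter

def unmatched_entries(x, y):
    """Return the entries of x and of y left over after pairing equal sizes.

    x, y: lists of positive integers (cycle lengths in units of 1/n).
    Each entry of x is matched to at most one entry of y of the same size.
    """
    cx, cy = Counter(x), Counter(y)
    # Counter subtraction keeps only positive counts, i.e. the surplus entries.
    only_x = sorted((cx - cy).elements(), reverse=True)
    only_y = sorted((cy - cx).elements(), reverse=True)
    return only_x, only_y

# Example: the entry 5 is matched, the remaining four entries are not.
left_x, left_y = unmatched_entries([5, 3, 2], [5, 4, 1])
```

In this example there are four unmatched entries in total. Since both lists sum to \(n\), a configuration with exactly one unmatched entry on each side would force those two entries to have the same size, which is one way to see why the count two is excluded.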
Fix \(\delta >0\) and let \(A(\delta )\) denote the event that the smallest unmatched entry between \({\bar{X}}_{0}\) and \({\bar{Y}}_0\) has size greater than \(\delta \). We will show that on the event \(A(\delta )\) we can couple the walks such that \({\bar{X}}_{\Delta }={\bar{Y}}_{\Delta }\) with high probability. On the complementary event \(A(\delta )^c\), we couple the walks so that their distance remains O(1) during the time interval \((s_1,s_2]\), similar to the coupling during \([0,s_1]\).
It remains to define the coupling during the time interval \((s_1,s_2]\) on the event \(A(\delta )\). We begin by estimating the probability of \(A(\delta )\).
Lemma 4.2
Proof
Recall that by construction \({\bar{X}}_0\) and \({\bar{Y}}_0\) only differ because of the two transpositions \(\tau _{1} \) and \(\tau _{2}\) appearing in (50).
Recall the hypergraph \(H_{s_1/\rho }\) on \(\{1,\dots ,n\}\) defined in the beginning of Sect. 3.1. Since \(c>c_\Gamma \), by Theorem 3.1, \(H_{s_1/\rho }\) has a (unique) giant component with high probability. Let \(A_1\) be the event that the four points composing the transpositions \(\tau _{1}, \tau _{2}\) fall within the largest component of the associated hypergraph \(H_{s_1/\rho }\). Since the relative size of the giant component converges in probability to \(\theta (c)\) by Lemma 3.1, note that \(\mathbb {P}(A_1) \rightarrow \theta (c)^4\).
Furthermore, it follows from Theorem 3.6 that conditionally on the event \(A_1\), the asymptotic relative size of the cycles containing the four points making the transpositions \(\tau _1, \tau _2\) can be thought of as the size of four independent samples from a Poisson-Dirichlet distribution, multiplied by \(\theta (c)\). Hence the lemma is proved with \(p(\delta )\) being the probability that one of the four samples has a size smaller than \(\delta /\theta (c)\). Clearly \(p(\delta ) \rightarrow 0\) so the result is proved. \(\square \)
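As a rough numerical illustration of why \(p(\delta ) \rightarrow 0\), the heuristic in the proof can be made concrete under two assumptions that are not stated in the text: each "sample" is read as a size-biased pick from the Poisson-Dirichlet(0, 1) distribution (such a pick is uniform on (0, 1)), and the four picks are treated as exactly independent. The function name `p_delta` and the argument `theta`, standing for \(\theta (c)\), are illustrative only.

```python
def p_delta(delta, theta):
    """Probability that at least one of four independent Uniform(0,1)
    samples is smaller than delta / theta.

    Assumption of this sketch: a 'sample' is a size-biased pick from
    Poisson-Dirichlet(0, 1), which is uniform on (0, 1).
    """
    x = min(1.0, delta / theta)      # threshold delta / theta(c), capped at 1
    return 1.0 - (1.0 - x) ** 4      # complement of "all four exceed x"

# The value shrinks linearly in delta: p_delta <= 4 * delta / theta.
example = p_delta(0.01, 0.5)
```

Under these assumptions the union bound gives \(p(\delta ) \le 4\delta /\theta (c)\), which tends to zero with \(\delta \) for fixed \(c\).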
Recall that the transpositions which make up the walks \({\tilde{X}}^{{{\mathrm{id}}}}\) and \({\tilde{X}}^{\tau _1\circ \tau _2}\) obey what we called conditional uniformity in Proposition 4.1. For the duration of \((s_1,s_2]\) we will assume the relaxed conditional uniformity assumption, which we describe now.
Definition 4.2
 (i)
if s is a refreshment time then x is chosen uniformly in \(\{1,\dots ,n\}\),
 (ii)
if s is not a refreshment time then x is taken to be the second marker of the transposition applied at time \(s-1\).
In making the relaxed conditional uniformity assumption we are disregarding the constraints on (x, y) given in Proposition 4.1. However the probability that we violate this constraint at any point during the interval \((s_1,s_2]\) is at most \(2(s_2-s_1)\rho /n=2\Delta k/n \rightarrow 0 \); and on the event that this constraint is violated the distance between the random walks can increase by at most \((s_2-s_1)=\Delta \). Hence we can without loss of generality assume that during the interval \((s_1,s_2]\) both \({\tilde{X}}^{{{\mathrm{id}}}}\) and \({\tilde{X}}^{\tau _1\circ \tau _2}\) satisfy the relaxed conditional uniformity assumption.
Marginal evolution Let us describe the evolution of the random walk \({\bar{X}}=({\bar{X}}_s: s = 0,1,\dots )\). Suppose that \(s\ge 0\) and \({\bar{X}}_{s}= (x_1,\dots ,x_n)\). Now imagine the interval (0, 1] tiled using the intervals \((0,x_1],\dots ,(0, x_n]\) (the specific tiling rule does not matter). Initially for \(s=0\), and more generally if s is a refreshment time, we select \(u \in \{1/n,\dots ,n/n\}\) uniformly at random and then call the tile that contains u the marked tile. If \(s \ge 1\) is not a refreshment time then the marked tile is the one containing the second marker y of Proposition 4.1 from the previous step. Either way, we have a distinguished tile I (the tile containing the ‘first marker’) at the beginning of each step \(s = 0, 1, \ldots \). We then choose a second marker \(v \in \{2/n,\dots ,n/n\}\) uniformly at random, let \(I'\) denote the tile containing v, and proceed as follows:

If \(I' \ne I\) then we merge the tiles I and \(I'\). The new tile we created is now marked for the next step.

If \(I=I'\) then we split I into two fragments, corresponding to where v falls: one of size \(v-1/n\) and the other of size \(|I|-(v-1/n)\). The rightmost of these two tiles, containing v, is now marked for the next step. Now \({\bar{X}}_{s+1}\) consists of the sizes of the tiles in the new tiling we have created (with additional reordering of tiles in decreasing order).
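The merge and split rules above can be sketched in a few lines. This is only an illustration under stated assumptions: tile sizes are stored as positive integers (units of \(1/n\)), the marked tile is laid out leftmost so that it occupies positions \(1,\dots ,|I|\), and the second marker is an integer in \(\{2,\dots ,n\}\). The names `apply_marker` and `step` are hypothetical; refreshment times are not modelled.

```python
import random

def apply_marker(size_I, others, v):
    """One merge/split step given the second marker v (integer in 2..n).

    size_I: size of the marked tile, assumed to occupy positions 1..size_I.
    others: sizes of the remaining tiles, laid out to the right of it.
    Returns (tiles in decreasing order, index of the newly marked tile).
    """
    if v <= size_I:
        # v falls in the marked tile: split it at v.
        left, right = v - 1, size_I - (v - 1)
        new_tiles = others + [left, right]
        marked_size = right          # the rightmost fragment contains v
    else:
        # v falls in another tile: locate it and merge.
        pos = size_I
        for idx, length in enumerate(others):
            pos += length
            if v <= pos:
                break
        merged = size_I + others[idx]
        new_tiles = others[:idx] + others[idx + 1:] + [merged]
        marked_size = merged         # the merged tile is marked
    new_tiles.sort(reverse=True)
    # Tiles of equal size are interchangeable, so the first match is used.
    return new_tiles, new_tiles.index(marked_size)

def step(tiles, marked, rng=random):
    """Draw v uniformly from {2, ..., n} and apply it to the marked tile."""
    n = sum(tiles)
    others = tiles[:marked] + tiles[marked + 1:]
    return apply_marker(tiles[marked], others, rng.randint(2, n))
```

For instance, with a marked tile of size 3 next to tiles of sizes 4 and 1, the marker 2 splits the marked tile into pieces of sizes 1 and 2, while the marker 5 merges it with the tile of size 4.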
Coupling We now recall the coupling of [6]. Let \(s \ge 0\). Suppose that \({\bar{X}}_{s}={\bar{X}}=(x_1,\dots ,x_n)\) and \({\bar{Y}}_s= {\bar{Y}}=(y_1,\dots ,y_n)\). Then we can differentiate between the entries that are matched and those that are unmatched: we say that two entries from \({\bar{X}}\) and \({\bar{Y}}\) are matched if they are of identical size. Our goal will be to create as many matched parts as possible and as quickly as possible. When putting down the tilings associated with \({\bar{X}}\) and \({\bar{Y}}\), we will do so in such a way that all matched parts are at the right of the interval (0, 1] and the unmatched parts occupy the left part of the interval.
Let \(I_{{\bar{X}}}\) and \(I_{{\bar{Y}}}\) be the respective marked tiles of the tilings \({\bar{X}}\) and \({\bar{Y}}\) at some step \(s \ge 0\), and let \({\hat{X}}, {\hat{Y}}\) be the tilings obtained by reordering \({\bar{X}},{\bar{Y}}\) so that \(I_{{\bar{X}}}\) and \(I_{{\bar{Y}}}\) have been put to the left of the interval (0, 1]. We assume that at the start of the step, either \(I_{{\bar{X}}}\) and \(I_{{\bar{Y}}}\) are matched to each other, or they are both unmatched. (We will then verify that this property is preserved by the coupling.) Let \(a=|I_{{\bar{X}}}|\) and let \(b=|I_{{\bar{Y}}}|\) be the respective lengths of the marked tiles, and assume without loss of generality that \(a<b\). Let \(v \in \{2/n,\dots ,n/n\}\) be chosen uniformly. We will apply \(T({\bar{X}}, I_{{\hat{X}}}, v)\) to \({\hat{X}}\) as we did before, and obtain \({\bar{X}}_{s+1}\). To obtain \({\bar{Y}}_{s+1}\) we will also apply the transformation T to \({\hat{Y}}\), but with another uniform random variable \(v' \in \{2/n,\dots ,n/n\}\) which may differ from v. To construct \(v'\) we proceed as follows.
Before checking that the coupling is well defined (in the sense that our assumption on the marked tiles, which is needed for the definition of the coupling, remains true throughout), we briefly add a few words of motivation for this definition.
Motivation for the coupling The coupling defined above is, as already mentioned, the same as the one used in [6], which is a modification of a coupling due to Schramm [24]. In Schramm’s original coupling, the map \(\Phi \) was taken to be the identity, which is natural enough. However this leads to the undesirable property that it is possible for very small unmatched pieces to appear; once these small unmatched pieces appear they remain in the system for a very long time, which could prevent coupling. The reason for introducing the map \(\Phi \) here and in [6] (where it was one of the main innovations) is that it prevents the occurrence of small unmatched pieces: as we will see in Lemma 4.4, the crucial property is that the worst thing that can happen is for the smallest unmatched piece to become smaller by a factor of two, and this only happens with small probability. This means all unmatched pieces remain relatively large, and so they disappear quite quickly (leading in turn to a coupling of the two copies). We start with a proof that the coupling is well defined:
Lemma 4.3
At the end of a step the two marked tiles are either matched to each other or both unmatched.
Proof
The proof consists of examining several cases. If the first marker u was in a matched tile, then whether v falls in the matched or unmatched part, the property holds (if v is unmatched then we attach two unmatched tiles to two matched tiles, so they both become unmatched. If it falls in the matched part, either two tiles of the same size are being attached, or the marked tile splits into two tiles of the same size, and the rightmost piece which is the new marked tile matches in both copies).
A similar analysis can be done if the first marked tile was unmatched. The only case which requires an observation is if v falls in the same tile as \(I_{{\bar{X}}}\) (where we assume, as in the figure, that this is the shorter of the two unmatched pieces), then if v falls in the first half of the tile this results in two matched pieces which are unmarked for the next step and two unmatched pieces which are both marked. If however v falls in the second half of \(I_{{\bar{X}}}\) then this results in two matched pieces which are marked, and two unmatched pieces which are not marked. (Recall that the marked tile at the next step is the one containing the rightmost fragment). \(\square \)
This coupling has several remarkable deterministic properties, as already observed in [6]. Chief among those is the fact that the number of unmatched entries can only decrease. Unmatched entries disappear when they are coalesced. In particular they disappear quickly when their size is reasonably large. Hence it is particularly desirable to have a coupling in which unmatched components stay large. The second crucial property of this coupling is that it does not create arbitrarily small unmatched entries: even when an unmatched entry is fragmented, the size of the smallest unmatched entry cannot decrease by more than a factor of two. (As these properties hold deterministically given the marked tiles, they do not need to be proved again). A direct consequence of these properties is the following lemma, which is Lemma 19 from [6].
Lemma 4.4
Remark 4.5
In particular, it holds that \(U' \ge 2^{j-1}/n\).
We now explain our strategy. On \(A(\delta )\) we will expect that the unmatched components will remain of a size roughly of order at least \(\delta \) for a while. In fact we will show that they will stay at least as big as \(O(\delta ^2)\) for a long time. Unmatched entries disappear when they are merged together. If all unmatched entries are of size at least \(\delta ^2\), we will see that with probability at least \(\delta ^8\), we have a chance to reduce the number of unmatched entries in every 3 steps. Then a simple argument shows that after time \(\Delta =\lceil \delta ^{-9}\rceil \), \({\bar{X}}_\Delta \) and \({\bar{Y}}_\Delta \) are perfectly matched with a probability tending to one as \(\delta \rightarrow 0\).
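The arithmetic behind this strategy can be checked numerically. The sketch below is only an illustration under assumptions not made explicit in the text: the horizon is taken to be \(\Delta =\lceil \delta ^{-9}\rceil \), time is cut into blocks of 3 steps, each block succeeds independently with probability exactly \(\delta ^8\), and at most `needed = 3` successes are required to remove six unmatched entries. If each block succeeds with conditional probability at least \(\delta ^8\) given the past, the binomial tail computed here is an upper bound on the failure probability. The function name is hypothetical.

```python
import math

def failure_probability(delta, needed=3):
    """P(Binomial(blocks, delta**8) < needed), with blocks = ceil(delta**-9) // 3.

    This is the chance that fewer than `needed` of the 3-step blocks
    succeed before the (assumed) horizon ceil(delta**-9).
    """
    horizon = math.ceil(delta ** -9)
    blocks = horizon // 3
    p = delta ** 8
    log_q = math.log1p(-p)           # log(1 - p), accurate for tiny p
    total = 0.0
    for i in range(needed):
        # Binomial point mass at i, with (1 - p)**(blocks - i) via logs.
        total += math.comb(blocks, i) * p ** i * math.exp((blocks - i) * log_q)
    return total
```

The expected number of successful blocks is about \(1/(3\delta )\), so the value returned decreases towards zero as \(\delta \) is made smaller.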
Lemma 4.6
There is \(\delta _0\) such that if \(\delta < \delta _0\), during \([0, \Delta ]\), both \({\bar{X}}_s \) and \({\bar{Y}}_s\) always have an entry of size greater than \( \delta \theta (c) \) with probability at least \(1- 2\delta ^{1/2}\) for all n sufficiently large.
Proof
We now check that all unmatched components really do stay greater than \(\delta ^2\) during \([0, \Delta ]\). Let \(T_\delta \) denote the first time s that either \({\bar{X}}_s\) or \({\bar{Y}}_s\) have no cycles greater than \( \delta \theta (c) n\) (suppose without loss of generality that \(\delta \) is small enough that \(\delta ^2 \le \delta \theta (c)\)).
Lemma 4.7
On \(A(\delta )\), for all \(s \le T_\delta \wedge \Delta \), all unmatched components stay greater than \(\delta ^2\) with probability at least \(1-O(\delta ) \), where the constant implied in \(O(\delta )\) can depend on c but not on \(\delta \).
Proof
Say that a number \(x \in [0,1]\) is in scale j if \(2^j/n \le x < 2^{j+1}/n\). For \(s\ge 0\), let U(s) denote the scale of the smallest unmatched entry of \({\bar{X}}_s, {\bar{Y}}_s\). Let \(j_0\) be the scale of \(\delta \), and let \(j_1\) be the integer immediately above the scale of \(\delta ^2\).
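The notion of scale is easy to compute. The following sketch assumes, for illustration, that an entry is given as an integer \(m\) with \(x = m/n\), and that \(\delta ^2 n \ge 1\); the helper names are hypothetical.

```python
import math

def scale_of_entry(m):
    """Scale j of the entry x = m/n, i.e. the j with 2**j <= m < 2**(j+1).

    m is a positive integer (the entry size in units of 1/n).
    """
    return m.bit_length() - 1

def scale_bounds(delta, n):
    """Return (j0, j1): the scale of delta, and the integer immediately
    above the scale of delta**2. Assumes delta**2 * n >= 1."""
    j0 = math.floor(math.log2(delta * n))
    j1 = math.floor(math.log2(delta ** 2 * n)) + 1
    return j0, j1

# Example with n = 1024 and delta = 1/4.
j0, j1 = scale_bounds(0.25, 1024)
```

With \(n = 1024\) and \(\delta = 1/4\), the scale of \(\delta \) is 8 and the scale of \(\delta ^2\) is 6, so \(j_1 = 7\).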
Suppose for some time \(s\le T_\delta \), we have \(U(s) = j\) with \(j_1 \le j \le j_0\), and the marked tile at time s corresponds to the smallest unmatched entry. Then after this transposition we have \(U(s+1) \ge j-1\) by the properties of the coupling (Lemma 4.4). Moreover, \(U(s+1) = j-1\) with probability at most \(r_j = 2^{j+2} / n\). Furthermore, since \(s \le T_\delta \), we have that this marked tile merges with a tile of size at least \(\theta (c) \delta \) with probability at least \(\theta (c) \delta \) after the transposition. We call the first occurrence a failure and the second a mild success.
Once a mild success has occurred, there may still be a few other unmatched entries in scale j, but no more than five since the total number of unmatched entries is decreasing, and there were at most six initially. Therefore if six mild successes occur before a failure, we are guaranteed that \(U(s+1) \ge j+1\). We call such an event a good success, and note that the probability of a good success, given that U(s) changes scale, is at least \( p_j = 1- 6 r_j / (r_j + \theta (c) \delta )\). We call \(q_j = 1-p_j\).
We are now going to prove that on the event \(A(\delta )\), after time \(\Delta \) there are no unmatched entries with probability tending to one as \(n\rightarrow \infty \) and \(\delta \rightarrow 0\). The basic idea is again to exploit that there are initially at most six unmatched parts, and this number cannot increase. We need a few preparatory lemmas which construct a scenario which leads to a coagulation of two unmatched entries in each copy, in three steps. Let \(T'_\delta \) be the first time one of the unmatched entries is smaller than \(\delta ^2\). Let \(\mathcal {F}_s\) denote the filtration generated by \(({\bar{X}}_1, \ldots , {\bar{X}}_s)\) including the marked tiles at the end of each step up to time s. Let \(\mathcal {K}_s\) be the event that step s results in two unmatched entries being merged in both copies, so our first goal (achieved in Lemma 4.10) will be to get a lower bound on the probability of \(\mathcal {K}_s\).
Step 1. We show that with good probability both marked tiles are unmatched at the end of a step (and thus also at the beginning of the next step).
Lemma 4.8
Proof
Let u, v be the two markers for step s. If the tile containing u was matched, then it suffices for v to fall in an unmatched tile (then \(\mathcal {M}_s\) occurs), which occurs again with probability at least \(\delta ^2\). If however the tile containing u was unmatched, the copy which contains the smallest of these two unmatched tiles necessarily contains at least another unmatched tile. It then suffices for v to fall in that tile. Indeed if we are very lucky and the other copy also just happens to have two unmatched entries this might lead to a reduction in the number of unmatched entries, in which case \(\mathcal {K}_{s}\) has occurred. Otherwise we have simply shuffled the unmatched entries and v is now in an unmatched entry, so \(\mathcal {M}_s\) holds. Either way, the conditional probability is at least \(\delta ^2\). \(\square \)
Step 2. We show that if the marked tiles are unmatched, with good probability we can get to a “balanced configuration” where both copies contain at least two unmatched tiles, and that the marked tiles at the end of the step are both unmatched.
Lemma 4.9
Proof
Let u, v be the two markers for step s. Suppose first that s is not a refreshment time, so we aim to prove (54). We treat several cases, according to whether \(\mathcal {B}_{s-1}\) holds or not. We start by assuming that \(\mathcal {B}_{s-1}\) does not hold. The idea is that in that case, at time \(s-1\), one copy (say \({\bar{Y}}_{s-1}\)) has one unmatched entry, while the other one has at least three. It then suffices to fragment the unmatched entry in \({\bar{Y}}_{s-1}\) and to coagulate the other two entries in \({\bar{X}}_{s-1}\). Since \(\mathcal {M}_{s-1}\) holds, and s is not a refreshment time, it suffices for v (the marker corresponding to the copy which has the smallest unmatched entry, which is necessarily \({\bar{X}}_{s-1}\)) to fall in any of the other unmatched entries of \({\bar{X}}_{s-1}\): this necessarily results in a balanced configuration. Note also that this always results in both marked tiles being unmatched at the end of the step, so \(\mathcal {B}_s\) indeed holds in that case. Moreover, this event has probability at least \(\delta ^2\) since \(T'_\delta \ge s\).
Suppose now that \(\mathcal {B}_{s-1}\) holds. Then let us show directly that \(\mathcal {K}_s\) can occur with good probability. Indeed, if the second marker \(v' = \Phi (v)\) (this is the marker associated with the copy, say \({\bar{Y}}_s\), that contains the larger of the two marked unmatched tiles) falls in another unmatched tile of \({\bar{Y}}_s\), then in this case a coagulation of two unmatched entries is guaranteed to occur in both copies. Hence \(\mathcal {K}_{s}\) occurs with probability at least \(\delta ^2\). Either way, (54) is proved.
Now suppose that s is a refreshment time. In that case it suffices to require that the first marker falls in an unmatched tile (which occurs with probability \(\delta ^2\)) and from then on we argue exactly as in the proof of (54) to obtain a proof of (55). All in all the lemma is proved. \(\square \)
We point out that, combining Lemmas 4.8 and 4.9, regardless of whether s is a refreshment time, \(\mathbb {P}( \mathcal {B}_s \mid \mathcal {F}_{s-1}) \ge \delta ^4\).
Step 3. Having reached a balanced configuration with one marked unmatched entry in both copies, we show that a coagulation of two unmatched entries in both copies has a good chance of occurring. In that case, the number of unmatched entries has decreased by two or four.
Lemma 4.10
Proof
We again need to distinguish between the cases where s is a refreshment time or not. If not, then since \(\mathcal {B}_{s-1}\) holds, the first marker is in an unmatched tile for both copies. If the second marker \(v'\) corresponding to the copy with the larger of these two unmatched tiles falls in a different unmatched tile (which has probability at least \(\delta ^2\)) then a coagulation is guaranteed to occur in both copies so \(\mathcal {K}_{s}\) holds.
If s is a refreshment time, then the same argument applies, but the first marker must first fall in an unmatched component (which has probability at least \(\delta ^2\) since \(T'_\delta \ge s\)). This gives a lower bound of \(\delta ^4\) on the probability of \(\mathcal {K}_s\), as desired. \(\square \)
Combining these three steps, it is now relatively easy to deduce the following:
Lemma 4.11
Proof
Initially there are at most 6 unmatched entries. Due to parity there can be either 6, 4 or 0 unmatched entries (note in particular that 2 is excluded, as a quick examination shows that no configuration can give rise to two unmatched entries). Furthermore, from the properties of the coupling, the number of unmatched entries either remains the same or decreases at each step. Once all the entries are matched they remain matched thereafter.
4.2.3 Coupling for \((s_2,s_3]\)

on the event \(A(\delta )^c\) we have that \(d({\tilde{X}}^{{{\mathrm{id}}}}_{s_2}, {\tilde{X}}^{\tau _1\circ \tau _2}_{s_2})=2\),
 using Lemma 4.11, we have that $$\begin{aligned} \liminf _{\delta \downarrow 0}\liminf _{n \rightarrow \infty }\mathbb {P}\left( {\tilde{X}}^{{{\mathrm{id}}}}_{s_2}= {\tilde{X}}^{\tau _1\circ \tau _2}_{s_2} \mid A(\delta )\right) =1, \end{aligned}$$

on the event \(\{{\tilde{X}}^{{{\mathrm{id}}}}_{s_2}\ne {\tilde{X}}^{\tau _1\circ \tau _2}_{s_2}\}\), note that the walks \({\bar{X}}\) and \({\bar{Y}}\) have at most 6 unmatched entries. Hence the coupling is such that \(d({\tilde{X}}^{{{\mathrm{id}}}}_{s_2}, {\tilde{X}}^{\tau _1\circ \tau _2}_{s_2})\le 4\) no matter what.
Lemma 4.12
The theorem now follows immediately:
Proof of Theorem 1.2
Notes
Acknowledgements
We thank Yuval Peres and Spencer Hughes for useful discussions on discrete Ricci curvature.
References
1. Aldous, D.: Random walks on finite groups and rapidly mixing Markov chains. In: Seminar on Probability, XVII, Volume 986 of Lecture Notes in Mathematics, pp. 243–297. Springer, Berlin (1983)
2. Berestycki, N.: The hyperbolic geometry of random transpositions. Ann. Probab. 34(2), 429–467 (2006)
3. Berestycki, N.: Recent Progress in Coalescent Theory, Volume 16 of Ensaios Matemáticos [Mathematical Surveys]. Sociedade Brasileira de Matemática, Rio de Janeiro (2009)
4. Berestycki, N.: Emergence of giant cycles and slowdown transition in random transpositions and \(k\)-cycles. Electron. J. Probab. 16, 152–173 (2011)
5. Berestycki, N., Durrett, R.: A phase transition in the random transposition random walk. Probab. Theor. Relat. Fields 136, 203–233 (2006)
6. Berestycki, N., Schramm, O., Zeitouni, O.: Mixing times for random \(k\)-cycles and coalescence–fragmentation chains. Ann. Probab. 39(5), 1815–1843 (2011)
7. Bormashenko, O.: A coupling argument for the random transposition walk. arXiv preprint arXiv:1109.3915
8. Bubley, R., Dyer, M.: Path coupling: a technique for proving rapid mixing in Markov chains. In: 38th Annual Symposium on Foundations of Computer Science, 1997. Proceedings, pp. 223–231. IEEE (1997)
9. Diaconis, P.: Group Representations in Probability and Statistics. Institute of Mathematical Statistics Lecture Notes—Monograph Series, 11. Institute of Mathematical Statistics, Hayward (1988)
10. Diaconis, P., Shahshahani, M.: Generating a random permutation with random transpositions. Z. Wahrscheinlichkeitstheorie Verwandte Geb. 57(2), 159–179 (1981)
11. Durrett, R.: Probability: Theory and Examples. Cambridge Series in Statistical and Probabilistic Mathematics, 4th edn. Cambridge University Press, Cambridge (2010)
12. Durrett, R.: Random Graph Dynamics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge (2010)
13. Flatto, L., Odlyzko, A.M., Wales, D.B.: Random shuffles and group representations. Ann. Probab. 13(1), 154–178 (1985)
14. Jerrum, M.: A very simple algorithm for estimating the number of \(k\)-colorings of a low-degree graph. Random Struct. Algorithms 7(2), 157–165 (1995)
15. Karoński, M., Łuczak, T.: The phase transition in a random hypergraph. J. Comput. Appl. Math. 142(1), 125–135 (2002)
16. Levin, D.A., Peres, Y., Wilmer, E.L.: Markov Chains and Mixing Times. With a Chapter by James G. Propp and David B. Wilson. American Mathematical Society, Providence (2009)
17. Lulov, N., Pak, I.: Rapidly mixing random walks and bounds on characters of the symmetric group. J. Algebr. Combin. 16(2), 151–163 (2002)
18. Ollivier, Y.: Ricci curvature of Markov chains on metric spaces. J. Funct. Anal. 256(3), 810–864 (2009)
19. Paulin, D.: Mixing and concentration by Ricci curvature. J. Funct. Anal. 270(5), 1623–1662 (2016)
20. Roichman, Y.: Upper bound on the characters of the symmetric groups. Invent. Math. 125(3), 451–485 (1996)
21. Roichman, Y.: Characters of the symmetric groups: formulas, estimates and applications. In: Emerging Applications of Number Theory (Minneapolis, MN, 1996), Volume 109 of The IMA Volumes in Mathematics and its Applications, pp. 525–545. Springer, New York (1999)
22. Roussel, S.: Marches aléatoires sur le groupe symétrique. Ph.D. thesis, Toulouse (1999)
23. Roussel, S.: Phénomène de cutoff pour certaines marches aléatoires sur le groupe symétrique. Colloq. Math. 86(1), 111–135 (2000)
24. Schramm, O.: Compositions of random transpositions. Israel J. Math. 147, 221–243 (2005)
25. Vershik, A.M., Kerov, S.V.: Asymptotic theory of the characters of a symmetric group. Funkt. Anal. Prilozh. 15(4), 15–27, 96 (1981)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.