Large deviations for the interchange process on the interval and incompressible flows

We use the framework of permuton processes to show that large deviations of the interchange process are controlled by the Dirichlet energy. This establishes a rigorous connection between processes of permutations and one-dimensional incompressible Euler equations. While our large deviation upper bound is valid in general, the lower bound applies to processes corresponding to incompressible flows, studied in this context by Brenier. These results imply the Archimedean limit for relaxed sorting networks and allow us to asymptotically count such networks.


Introduction
In this paper we investigate the large deviation principle for a model of random permutations called the one-dimensional interchange process. The process can be roughly described as follows. We put $N$ particles, labelled from $1$ to $N$, on the line $\{1, \dots, N\}$ and at each time step perform the following procedure: an edge is chosen at random and the particles at its endpoints are swapped. By comparing the particles' initial positions with their positions after a given time $t$ we obtain a random permutation from the symmetric group $S_N$ on $N$ elements.
The interchange process on the interval (whose discrete time analog is known as the adjacent transposition shuffle) and on more general graphs has attracted considerable attention in probability theory, for example with regard to the analysis of mixing times. It is natural to ask whether, after proper rescaling and as N → ∞, the permutations obtained in the interchange process converge in distribution to an appropriately defined limiting process.
Such limits have been recently studied ([HKM+13], [RVV19]) under the name of permutons and permuton processes. These notions have been inspired by the theory of graph limits ([Lov12]), where the analogous notion of a graphon as a limit of dense graphs appears. A permuton is a Borel probability measure on $[0,1]^2$ with uniform marginals on each coordinate. A sequence of permutations $\sigma_N \in S_N$ is said to converge to a permuton $\mu$ as $N \to \infty$ if the corresponding empirical measures converge weakly to $\mu$. A permuton process is a stochastic process $X = (X_t, 0 \le t \le T)$ taking values in $[0,1]$, with continuous sample paths and having uniform marginals at each time $t \in [0,T]$. A permutation-valued path, such as a sample from the interchange process, is said to converge to $X$ if the trajectory of a randomly chosen particle converges in distribution to $X$.

Depending on the time scale considered, one observes different asymptotic structure in the permutations arising from the interchange process. If the average number of all swaps is greater than $\sim N^3 \log N$, the process will be close to its stationary distribution ([Ald83], [Lac16]), which is the uniform distribution on $S_N$. For $\sim N^3$ swaps each particle has displacement of order $N$ and the whole process converges, in the sense of permuton processes, to a Brownian motion on $[0,1]$ ([RV17]).
Here we will be interested in yet shorter time scales, corresponding to $\sim N^{2+\varepsilon}$ swaps for fixed $\varepsilon \in (0,1)$. In this scaling each particle has displacement $\ll N$, so the resulting permutations will be close to the identity permutation. Nevertheless, in the spirit of large deviation theory one can still ask questions about rare events, for example "what is the probability that starting from the identity permutation we are close to a fixed permuton after time $t$?" or, more generally, "what is the probability that the interchange process behaves like a given permuton process $X$?". We expect such probabilities to decay exponentially in $N^\gamma$ for some $\gamma > 0$, with the decay rate given by a rate function on the space of permuton processes.
The large deviation principle we obtain in this paper can be informally summarized as follows: for a class of permuton processes solving a natural energy minimization problem, the probability $\mathbb{P}(A)$ that the interchange process is close in distribution to a process $X$ satisfies asymptotically
$$\frac{1}{N^\gamma} \log \mathbb{P}(A) \approx -I(X), \tag{1}$$
where $\gamma = 2 - \varepsilon$ and $I(X)$ is the energy of $X$, defined as the expected Dirichlet energy of a path sampled from $X$. Apart from a purely probabilistic interest, the result is relevant to two other seemingly unrelated subjects, namely the study of Euler equations in fluid dynamics and the study of sorting networks in combinatorics. Let us first state the energy minimization problem in question, which is as follows: given a permuton $\mu$, find
$$\inf_X I(X), \tag{2}$$
where the infimum is over all permuton processes $X$ such that $(X_0, X_T)$ has distribution $\mu$.
As it happens, such energy-minimizing processes have been considered in fluid dynamics in the study of incompressible Euler equations, under the name of generalized incompressible flows. This connection is discussed in more detail in Section 2.2. Very roughly speaking, Euler equations in a domain $D \subseteq \mathbb{R}^d$ describe motion of fluid particles whose trajectories satisfy the equation
$$x''(t) = -\nabla p(t, x(t)) \tag{3}$$
for some function $p$ called the pressure. The incompressibility constraint means that the flow defined by the equation has to be volume-preserving. Classical, smooth solutions to Euler equations correspond to flows which are diffeomorphisms of $D$. Generalized incompressible flows are a stochastic variant of such solutions in which each particle can choose its initial velocity independently from a given probability distribution. It turns out that, under additional regularity assumptions, such generalized solutions to Euler equations (3) for $D = [0,1]$ correspond exactly to permuton processes solving the energy minimization problem (2) for some permuton $\mu$. Our large deviation result (1) is valid precisely for such energy-minimizing permuton processes (again, under certain regularity assumptions).
As it happens, the original motivation for our work came from a different direction, namely from the study of sorting networks in combinatorics. This connection is explained in more detail below. Using our large deviation principle (1), we are able to prove novel results on a variant of the model we call relaxed sorting networks. Thus the large deviation principle presented in this paper provides a rather unexpected link between problems in combinatorics (sorting networks) and fluid dynamics (incompressible Euler equations), along with a quite general framework for analyzing permuton processes which we hope will find further applications.
Main results. Let us now state our main results more formally, still with complete definitions and discussion of assumptions deferred until Sections 2.1 and 3. Let $\mathcal{D} = \mathcal{D}([0,T],[0,1])$ be the space of càdlàg paths from $[0,T]$ to $[0,1]$ and let $\mathcal{M}(\mathcal{D})$ be the space of Borel probability measures on $\mathcal{D}$. Let $\mathcal{P} \subseteq \mathcal{M}(\mathcal{D})$ denote the space of permuton processes and their approximations by permutation-valued processes. For $\pi \in \mathcal{M}(\mathcal{D})$ we denote by $I(\pi)$ the expected Dirichlet energy of the process $X$ whose distribution is $\pi$.
Let $\eta^N$ denote the interchange process in continuous time on the interval $\{1, \dots, N\}$, sped up by $N^\alpha$ for some $\alpha \in (1,2)$. Let $\gamma = 3 - \alpha$. We have the following large deviation principle.

Theorem A (Large deviation lower bound). Let $\mathbb{P}^N$ be the law of the interchange process $\eta^N$ and let $\mu^{\eta^N} \in \mathcal{M}(\mathcal{D})$ be the empirical distribution of its trajectories. Let $\pi$ be a permuton process which is a generalized solution to Euler equations (19).

Theorem B (Large deviation upper bound). Let $\mathbb{P}^N$ be the law of the interchange process $\eta^N$ and let $\mu^{\eta^N} \in \mathcal{M}(\mathcal{D})$ be the empirical distribution of its trajectories. For any closed set $C \subseteq \mathcal{P}$ we have
$$\limsup_{N \to \infty} \frac{1}{N^\gamma} \log \mathbb{P}^N\left(\mu^{\eta^N} \in C\right) \le -\inf_{\pi \in C} I(\pi).$$

The results are referred to as respectively Theorem 7.3 and Theorem 8.4 in the following sections. Here the large deviation upper bound is valid for all permuton processes, without any additional assumptions. On the other hand, in the proof of the lower bound we exploit rather heavily the special structure possessed by generalized solutions to Euler equations. We expect the lower bound to hold for arbitrary permuton processes as well, since one can locally approximate any permuton process by energy minimizers. However, for our techniques to apply one would need to understand in more detail the regularity of the associated velocity distributions and pressure functions, which falls outside the scope of our work.
The reader may notice that the rate function, which is the energy $I(\pi)$, is similar to the one appearing in the analysis of large deviations for independent random walks. In fact, the crux of our proofs lies in proving that particles in the interchange process and its perturbations are in a certain sense almost independent. The main techniques used here come from the field of interacting particle systems. A comprehensive introduction to the subject can be found in [KL99]. The novelty in our approach is in applying tools usually used to study hydrodynamic limits to a setting which is in some respects more involved, since the limiting objects we consider, permuton processes, are stochastic processes instead of deterministic objects like solutions of PDEs appearing, for example, for exclusion processes.
Sorting networks and the sine curve process. The large deviation bounds can be applied to obtain results on a model related to sorting networks. A sorting network on $N$ elements is a sequence of $M = \binom{N}{2}$ transpositions $(\tau_1, \tau_2, \dots, \tau_M)$ such that each $\tau_i$ is a transposition of adjacent elements and $\tau_M \circ \dots \circ \tau_1 = \mathrm{rev}_N$, where $\mathrm{rev}_N = (N \, \dots \, 2 \, 1)$ denotes the reverse permutation. It is easy to see that any sequence of adjacent transpositions giving the reverse permutation must have length at least $\binom{N}{2}$, hence sorting networks can be thought of as shortest paths joining the identity permutation and the reverse permutation in the Cayley graph of $S_N$ generated by adjacent transpositions.
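The definition of a sorting network can be checked mechanically on small examples. The following is a minimal sketch, not taken from the paper: the helper names `apply_swaps`, `is_sorting_network` and `bubble_network` are hypothetical, and a transposition is encoded (as an assumption of this sketch) by the index $i$ of the left position of the swapped pair.

```python
def apply_swaps(n, swaps):
    """Apply adjacent swaps to the arrangement 1, 2, ..., n.

    swaps[k] = i means: exchange the entries at positions i and i + 1 (1-based).
    """
    config = list(range(1, n + 1))
    for i in swaps:
        config[i - 1], config[i] = config[i], config[i - 1]
    return config


def is_sorting_network(n, swaps):
    """True if swaps has length n(n-1)/2 and produces the reverse arrangement."""
    return len(swaps) == n * (n - 1) // 2 and apply_swaps(n, swaps) == list(range(n, 0, -1))


def bubble_network(n):
    """One explicit sorting network: repeated left-to-right passes of shrinking length."""
    swaps = []
    for top in range(n - 1, 0, -1):
        swaps.extend(range(1, top + 1))
    return swaps


print(bubble_network(4), is_sorting_network(4, bubble_network(4)))
```

The bubble-sort-like network is only one of many sorting networks; it is used here because its correctness is easy to verify by hand.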
A random sorting network is obtained by sampling a sorting network uniformly at random among all sorting networks on $N$ elements. Let us work in continuous time, assuming each transposition $\tau_i$ happens at time $\frac{i}{M+1}$. It was conjectured in [AHRV07] and recently proved in [Dau22] that the trajectory of a randomly chosen particle in a random sorting network has a remarkable limiting behavior as $N \to \infty$, namely it converges in the sense of permuton processes to a deterministic limit, which is the sine curve process described below.
Here it will be more natural to consider the square $[-1,1]^2$ and processes with values in $[-1,1]$ instead of $[0,1]$ (with the obvious changes in the notion of a permuton and a permuton process which we leave implicit). The Archimedean law is the measure on $[-1,1]^2$ obtained by projecting the normalized surface area of a 2-dimensional half-sphere to the plane or, equivalently, the measure supported inside the unit disk $\{x^2 + y^2 \le 1\}$ whose density is given by $\frac{1}{2\pi\sqrt{1 - x^2 - y^2}} \, dx \, dy$. Observe that thanks to the well-known plank property each strip $[a,b] \times [-1,1]$ has measure proportional to $b - a$, hence the Archimedean law defines a permuton.
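The plank property can be checked numerically. The sketch below is an illustration only: it samples the Archimedean law by projecting a uniform point of the full unit sphere (which, by symmetry, has the same projection as the half-sphere) and estimates the mass of a few vertical strips. The function names and the sample size are assumptions of this sketch.

```python
import math
import random


def sample_archimedean(rng):
    """Return (x, y): the first two coordinates of a uniform point on the unit sphere."""
    while True:
        gx, gy, gz = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(gx * gx + gy * gy + gz * gz)
        if norm > 0.0:
            # A normalized standard Gaussian vector is uniform on the sphere.
            return gx / norm, gy / norm


def strip_mass(samples, a, b):
    """Fraction of samples whose first coordinate lies in [a, b]."""
    return sum(1 for x, _ in samples if a <= x <= b) / len(samples)


rng = random.Random(0)
samples = [sample_archimedean(rng) for _ in range(100_000)]
for a, b in [(-1.0, -0.5), (-0.5, 0.0), (0.2, 0.9)]:
    # For a uniform marginal on [-1, 1] the strip mass should be close to (b - a) / 2.
    print(a, b, round(strip_mass(samples, a, b), 3), "target", (b - a) / 2)
```

Since the total mass is 1 and the first coordinate ranges over $[-1,1]$, the proportionality constant in the plank property is $\frac12$.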
The sine curve process is the permuton process $A = (A_t, 0 \le t \le 1)$ with the following distribution: we sample $(X, Y)$ from the Archimedean law and then follow the path $A_t = X \cos \pi t + Y \sin \pi t$.
One can directly check that $A_t$ has uniform distribution on $[-1,1]$ at each time $t$, hence $A$ indeed defines a permuton process. Observe that $(A_0, A_0) = (X, X)$ and $(A_0, A_1) = (X, -X)$, thus the sine curve process defines a path between the identity permuton and the reverse permuton.
An equivalent way of describing the sine curve process consists of choosing a pair $(R, \theta)$ at random, where the angle $\theta$ is uniform on $[0, 2\pi]$ and, independently, $R$ has density $\frac{r}{\sqrt{1 - r^2}} \, dr$ on $[0,1]$ (so that the pair $(R, \theta)$ has joint density $\frac{r}{2\pi\sqrt{1 - r^2}} \, dr \, d\theta$), and following the path $A_t = R \cos(\pi t + \theta)$. Thus the trajectories of this process are sine curves with random initial phase and amplitude: the path of a random particle is determined by its initial position $X$ and velocity $V$, given by $(X, V) = (R \cos \theta, -\pi R \sin \theta)$.
Recall now the energy minimization problem (18). The sine curve process is the unique minimizer of energy among all permuton processes joining the identity to the reverse permuton ([Bre89], see also [RVV19]), with the minimal energy equal to $I(A) = \frac{\pi^2}{6}$. It is one of the few examples where the solution to the problem (18) can be explicitly calculated for a target permuton $\mu$. It also seems to play a special role in constructing generalized incompressible flows which are non-unique solutions to the energy minimization problem in dimensions greater than one, see, e.g., [BFS09].
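The value $\frac{\pi^2}{6}$ can be sanity-checked numerically. The sketch below assumes the normalization $E(\gamma) = \frac12 \int_0^1 \dot\gamma(t)^2 \, dt$ for the energy of a path (an assumption of this sketch, chosen because it reproduces the stated value). Under it, the path $A_t = X \cos \pi t + Y \sin \pi t$ has energy $\frac{\pi^2}{4}(X^2 + Y^2)$, and averaging over the Archimedean law gives $\frac{\pi^2}{4} \cdot \frac23 = \frac{\pi^2}{6}$. All function names are hypothetical.

```python
import math
import random


def path_energy_closed_form(x, y):
    """Energy (1/2) * integral of the squared velocity of t -> x cos(pi t) + y sin(pi t) on [0, 1]."""
    return (math.pi ** 2 / 4.0) * (x * x + y * y)


def path_energy_numeric(x, y, steps=2000):
    """Same energy computed by the midpoint rule, as an independent check."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        velocity = math.pi * (-x * math.sin(math.pi * t) + y * math.cos(math.pi * t))
        total += 0.5 * velocity ** 2 * h
    return total


def estimate_sine_curve_energy(n_samples, seed=0):
    """Monte Carlo average of the path energy over (X, Y) drawn from the Archimedean law."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        norm_sq = 0.0
        while norm_sq == 0.0:
            gx, gy, gz = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            norm_sq = gx * gx + gy * gy + gz * gz
        # (gx, gy) / sqrt(norm_sq) is the projection of a uniform point on the sphere.
        total += path_energy_closed_form(gx / math.sqrt(norm_sq), gy / math.sqrt(norm_sq))
    return total / n_samples


print(estimate_sine_curve_energy(100_000), math.pi ** 2 / 6)
```

The Monte Carlo estimate only illustrates the value of $I(A)$; it says nothing about minimality, which is the content of the cited results.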
The sine curve process is a generalized solution to Euler equations with the pressure function $p(x) = \frac{x^2}{2}$, which unsurprisingly leads to each particle satisfying the harmonic oscillator equation $x'' = -x$. The reader may check that the sine curve process satisfies the Assumptions (3.1) (with the velocity distribution being time-independent), thus providing a non-trivial and explicit example for which our large deviation bounds hold. To the best of our knowledge the connection between sorting networks on the one hand and Euler equations on the other hand was first observed in the literature in [Dau22].
Let us now describe the results on relaxed sorting networks. Fix $\delta > 0$ and $N \ge 1$. We define a $\delta$-relaxed sorting network of length $M$ on $N$ elements to be a sequence of $M$ adjacent transpositions $(\tau_1, \dots, \tau_M)$ such that the permutation $\sigma_M = \tau_M \circ \dots \circ \tau_1$ is $\delta$-close to the reverse permutation $\mathrm{rev} = (N \, \dots \, 2 \, 1)$ in the Wasserstein distance on the space $\mathcal{M}([0,1]^2)$ of Borel probability measures on $[0,1]^2$ (see Section 2.1 for the definition). For fixed $\kappa \in (0,1)$ we define a random $\delta$-relaxed sorting network on $N$ elements by choosing $M$ from a Poisson distribution with mean $\lfloor \frac12 N^{1+\kappa}(N-1) \rfloor$ and then sampling a $\delta$-relaxed sorting network of length $M$ uniformly at random.
Our first result is that the analog of the sorting network conjecture holds for relaxed sorting networks, that is, in a random relaxed sorting network the trajectory of a random particle is with high probability close in distribution to the sine curve process. Precisely, we have the following.

Theorem 1.1. Fix $\kappa \in (0,1)$ and let $\pi^N_\delta$ denote the empirical distribution of the permutation process (as defined in (5)) associated to a random $\delta$-relaxed sorting network on $N$ elements. Let $\pi_A$ denote the distribution of the sine curve process. Given any $\varepsilon > 0$ we have for all sufficiently small $\delta > 0$
$$\lim_{N \to \infty} \mathbb{P}\left(\pi^N_\delta \in B(\pi_A, \varepsilon)\right) = 1,$$
where $B(\pi_A, \varepsilon)$ is the $\varepsilon$-ball in the Wasserstein distance on $\mathcal{P}$.
Here for consistency of notation we assume that the sine curve process is rescaled so that it is supported on [0, 1] rather than [−1, 1].
The second result is more combinatorial and concerns the problem of enumerating sorting networks. A remarkable formula due to Stanley ([Sta84]) says that the number of all sorting networks on $N$ elements is equal to
$$\frac{\binom{N}{2}!}{1^{N-1} \, 3^{N-2} \cdots (2N-3)^{1}},$$
which is asymptotic to $\exp\left(\frac{N^2}{2} \log N + \left(\frac14 - \log 2\right) N^2 + O(N \log N)\right)$. For relaxed sorting networks we have the following asymptotic estimate.

Theorem 1.2. For any $\kappa \in (0,1)$ let $S^N_{\kappa,\delta}$ be the number of $\delta$-relaxed sorting networks on $N$ elements of length $M = \lfloor \frac12 N^{1+\kappa}(N-1) \rfloor$. We have

The asymptotics is analogous to that of Stanley's formula: the first term in the exponent corresponds simply to the number of all paths of required length, and, crucially, the factor $\frac{\pi^2}{6}$ corresponds to the energy of the sine curve process.
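Stanley's formula, $\binom{N}{2}! \,/\, \left(1^{N-1} 3^{N-2} \cdots (2N-3)^1\right)$, is easy to evaluate and to compare against brute-force enumeration for very small $N$. The sketch below is illustrative; the function names are hypothetical and the brute-force routine is exponential in $\binom{N}{2}$, so it is only usable for $N \le 4$ or so.

```python
from itertools import product
from math import factorial


def stanley_count(n):
    """Number of sorting networks on n elements via Stanley's product formula."""
    m = n * (n - 1) // 2
    denominator = 1
    for k in range(1, n):
        # The k-th odd number 2k - 1 appears with exponent n - k.
        denominator *= (2 * k - 1) ** (n - k)
    return factorial(m) // denominator


def brute_force_count(n):
    """Count sequences of n(n-1)/2 adjacent swaps that reverse 1, ..., n."""
    m = n * (n - 1) // 2
    target = list(range(n, 0, -1))
    count = 0
    for swaps in product(range(n - 1), repeat=m):
        config = list(range(1, n + 1))
        for i in swaps:
            config[i], config[i + 1] = config[i + 1], config[i]
        if config == target:
            count += 1
    return count


for n in (2, 3, 4):
    print(n, stanley_count(n), brute_force_count(n))
```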
The proofs of Theorem 1.1 and Theorem 1.2 are given in Section 9. It would be an interesting problem to obtain analogous results for relaxed sorting networks reaching exactly the reverse permutation, not only being δ-close in the permuton topology. This case is not covered by the results of this paper, since the set of permuton processes reaching exactly the reverse permuton is not open, hence the lower bound of Theorem A does not apply.

Permutons and stochastic processes
Permutons. Consider the space $\mathcal{M}([0,1]^2)$ of all Borel probability measures on the unit square $[0,1]^2$, endowed with the weak topology. A permuton is a probability measure $\mu \in \mathcal{M}([0,1]^2)$ with uniform marginals. In other words, $\mu$ is the joint distribution of a pair of random variables $(X, Y)$, with $X, Y$ taking values in $[0,1]$ and having marginal distribution $X, Y \sim U[0,1]$. We will sometimes call the pair $(X, Y)$ itself a permuton if there is no risk of ambiguity. A few simple examples of permutons are the identity permuton $(X, X)$, the uniform permuton (the distribution of two independent copies of $X$, which is the uniform measure on the square) or the reverse permuton $(X, 1 - X)$.
Permutons can be thought of as continuous limits of permutations in the following sense. Let $S_N$ be the symmetric group on $N$ elements and let $\sigma \in S_N$. We associate to $\sigma$ its empirical measure
$$\mu_\sigma = \frac1N \sum_{i=1}^N \delta_{\left(\frac{i}{N}, \frac{\sigma(i)}{N}\right)}, \tag{4}$$
which is an element of $\mathcal{M}([0,1]^2)$. By a slight abuse of terminology we will sometimes identify $\sigma$ with $\mu_\sigma$. Since every such measure has uniform marginals on $\left\{\frac1N, \frac2N, \dots, 1\right\}$, it is not difficult to see that if a sequence of empirical measures converges weakly, the limiting measure will be a permuton. Conversely, every permuton can be realized as a limit of finite permutations, in the sense of weak convergence of empirical measures (see [HKM+13]). We will consider $\mathcal{M}([0,1]^2)$ endowed with the Wasserstein distance corresponding to the Euclidean metric on $[0,1]^2$, under which the distance of measures $\mu$ and $\nu$ is given by where the infimum is over all couplings of $(X, Y)$ and $(X', Y')$ such that $(X, Y) \sim \mu$, $(X', Y') \sim \nu$.

The path space $\mathcal{D}$ and stochastic processes. A natural setting for analyzing trajectories of particles in random permutation sequences is to consider $\mathcal{D} = \mathcal{D}([0,T],[0,1])$, the space of all càdlàg paths from $[0,T]$ to $[0,1]$. We endow it with the standard Skorokhod topology, metrized by a metric $\rho$ under which $\mathcal{D}$ is separable and complete. By $\mathcal{M}(\mathcal{D})$ we will denote the space of all Borel probability measures on $\mathcal{D}$, endowed with the weak topology. It will be convenient to metrize $\mathcal{M}(\mathcal{D})$ by the Wasserstein distance, under which the distance between measures $\mu$ and $\nu$ is given by where the infimum is over all couplings $(X, Y)$ such that $X \sim \mu$, $Y \sim \nu$. We will also make use of the Wasserstein distance associated to the supremum norm, given by where $\|\cdot\|_{\sup}$ is the supremum norm on $\mathcal{D}$ and again the infimum is over all couplings $(X, Y)$ as above.
Given two times $0 \le s \le t \le T$ and a stochastic process $X = (X_t, 0 \le t \le T)$ with distribution $\mu \in \mathcal{M}(\mathcal{D})$, by $\mu_{s,t} \in \mathcal{M}([0,1]^2)$ we will denote the distribution of the marginal $(X_s, X_t)$. Note that the projection $\mu \mapsto \mu_{s,t}$ is continuous as a map from $\mathcal{M}(\mathcal{D})$ to $\mathcal{M}([0,1]^2)$ as long as paths $X \sim \mu$ sampled from $\mu$ have almost surely no jumps at times $s$ and $t$. We will sometimes implicitly identify the stochastic process with its distribution when there is no risk of misunderstanding.
Permutation processes and permuton processes. Consider a permutation-valued path $\eta^N = (\eta^N_t, 0 \le t \le T)$, with $\eta^N_t$ taking values in the symmetric group $S_N$. We will always assume that $\eta^N$ is càdlàg as a map from $[0,T]$ to $S_N$. Let $\eta^N(i) = (\eta^N_t(i), 0 \le t \le T)$ be the trajectory of $i$ under $\eta^N$ and let $X^{\eta^N}(i) = \frac1N \eta^N(i)$ be the rescaled trajectory. We define the empirical measure
$$\mu^{\eta^N} = \frac1N \sum_{i=1}^N \delta_{X^{\eta^N}(i)}, \tag{5}$$
where $\delta_{X^{\eta^N}(i)}$ is the delta measure concentrated on the trajectory $X^{\eta^N}(i)$.
The associated permutation process $X^{\eta^N} = (X^{\eta^N}_t, 0 \le t \le T)$ is obtained by choosing $i = 1, \dots, N$ uniformly at random and following the path $X^{\eta^N}(i)$. In other words, $X^{\eta^N}$ is a random path with values in $[0,1]$ whose distribution is $\mu^{\eta^N} \in \mathcal{M}(\mathcal{D})$. If $\eta^N$ is fixed, the only randomness here comes from the random choice of the particle $i$. Note that at each time $t$ the marginal distribution of $X^{\eta^N}_t$ is uniform on $\left\{\frac1N, \frac2N, \dots, 1\right\}$.
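The construction of rescaled trajectories and the uniformity of the time marginals can be illustrated on a toy path. In the sketch below a permutation-valued path is represented, as an assumption of the example, by its values at finitely many sampled times; the names `rescaled_trajectories` and `marginal_is_uniform` are hypothetical.

```python
from fractions import Fraction


def rescaled_trajectories(path):
    """path[k][i - 1] is the position of particle i at the k-th sampled time.

    Returns one list per particle: its positions divided by N at each sampled time.
    """
    n = len(path[0])
    return [[Fraction(perm[i], n) for perm in path] for i in range(n)]


def marginal_is_uniform(trajectories, k):
    """Check that the positions at the k-th time are exactly {1/N, 2/N, ..., 1}."""
    n = len(trajectories)
    return sorted(tr[k] for tr in trajectories) == [Fraction(j, n) for j in range(1, n + 1)]


# A toy path on N = 3 elements: identity, then swap sites 1-2, then swap sites 2-3.
toy_path = [(1, 2, 3), (2, 1, 3), (3, 1, 2)]
trajs = rescaled_trajectories(toy_path)
print(trajs[0], all(marginal_is_uniform(trajs, k) for k in range(len(toy_path))))
```

Sampling a uniformly random index into `trajs` then gives one draw from the empirical measure of this toy path.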
A permuton process is a stochastic process $X = (X_t, 0 \le t \le T)$ taking values in $[0,1]$, with continuous sample paths and such that for every $t \in [0,T]$ the marginal $X_t$ is uniformly distributed on $[0,1]$. The name is justified by observing that if $\pi$ is the distribution of $X$, then for any fixed $s, t \in [0,T]$ the joint distribution $\pi_{s,t} \in \mathcal{M}([0,1]^2)$ of $(X_s, X_t)$ defines a permuton. As explained in the next subsection, permuton processes arise naturally as limits of permutation processes defined above.
Since every permutation process has marginals uniform on $\left\{\frac1N, \frac2N, \dots, 1\right\}$, we will call it an approximate permuton process. By $\mathcal{P}$ we will denote the space of all permuton processes and approximate permuton processes, treated as a subspace of $\mathcal{M}(\mathcal{D})$ (with the same topology and the metric $d_W$).
Random permutation and permuton processes. A random permuton process is a permuton process chosen from some probability distribution on the space of all permuton processes, i.e., a random variable $X$, defined on a probability space $\Omega$, such that $X(\omega)$ is a permuton process for $\omega \in \Omega$. By identifying the random variable with its distribution we can also think of a random permuton process as an element of $\mathcal{M}(\mathcal{P})$. In this setting, with weak topology on $\mathcal{M}(\mathcal{P})$, one can consider convergence in distribution of random permuton processes $X_n$ to a (possibly also random) permuton process $X$.
One can prove (see [RVV19]) that if a sequence of random permutation processes $X^{\eta^N}$ converges in distribution, then the limit is a permuton process (in general also random). Of particular interest will be sequences of random permutation-valued paths $\eta^N$ (coming for example from the interchange process) such that the corresponding permutation processes $X^{\eta^N}$ converge in distribution to a deterministic permuton process (for example the sine curve process described below).
For any random permuton process $X$ we define its associated random particle process $\bar{X} = \mathbb{E}_\omega X(\omega)$, which is a process with a deterministic distribution, obtained by first sampling a permuton process $X(\omega)$ and then sampling a random path according to $X(\omega)$.
To elucidate the difference between random and deterministic permuton processes, consider a random permuton process $X$ and its associated random particle process $\bar{X}$. If we sample an outcome $X(\omega)$ and then a path from $X(\omega)$, then obviously the distribution of paths will be the same as for $\bar{X}$. However, consider now sampling an outcome $X(\omega)$ and then sampling independently two paths from $X(\omega)$. The distribution of a pair of paths obtained in this way will not in general be the same as the distribution of two independent copies sampled from $\bar{X}$, since the paths might be correlated within the outcome $X(\omega)$. The following general lemma will be useful later for showing that limits of certain random permutation processes are in fact deterministic ([RV17, Lemma 3]).

Lemma 2.1. Let $K$ be a compact metric space and let $\mu$ be a random probability measure on $K$, i.e., a random variable with values in $\mathcal{M}(K)$. Let $X$ and $Y$ be two independent samples from an outcome of $\mu$ and let $Z$ be a sample from an outcome of an independent copy of $\mu$. If $(X, Y)$, as a $K^2$-valued random variable, has the same distribution as $(X, Z)$, then $\mu$ is in fact deterministic, i.e., there exists $\nu \in \mathcal{M}(K)$ such that $\mu = \nu$ almost surely.
Energy. Here we introduce several related notions of energy for paths, permutations, permutons and permuton processes.
Given a path $\gamma : [0,T] \to [0,1]$ and a finite partition $\Pi = \{0 = t_0 < t_1 < \dots < t_k = T\}$ we define the energy of $\gamma$ with respect to $\Pi$ as
$$E^\Pi(\gamma) = \frac12 \sum_{i=1}^{k} \frac{|\gamma(t_i) - \gamma(t_{i-1})|^2}{t_i - t_{i-1}}$$
and the energy of $\gamma$ as
$$E(\gamma) = \sup_\Pi E^\Pi(\gamma),$$
where the supremum is over all finite partitions $\Pi = \{0 = t_0 < t_1 < \dots < t_k = T\}$. For a path which is not absolutely continuous the supremum is equal to $+\infty$. If a path $\gamma$ is differentiable, its energy is equal to
$$E(\gamma) = \frac12 \int_0^T \dot\gamma(t)^2 \, dt.$$

For a permutation $\sigma \in S_N$ we define its energy as Likewise, for a permuton $\mu \in \mathcal{M}([0,1]^2)$ its energy is defined by where the pair $(X, Y)$ has distribution $\mu$. If $\mu = \mu_\sigma$ is the empirical measure of a permutation $\sigma \in S_N$, defined by (4), then we have $I(\mu_\sigma) = I(\sigma)$. Note also that $I = I(\mu)$ is a continuous function of $\mu$ in the weak topology on $\mathcal{M}([0,1]^2)$. Finally, we define the energy of a permuton process $\pi$ as
$$I(\pi) = \mathbb{E}_{\gamma \sim \pi} E(\gamma), \tag{10}$$
where the expectation is over paths $\gamma$ sampled from $\pi$. We can extend this definition to any process $\pi \in \mathcal{M}(\mathcal{D})$ by adopting the convention that $I(\pi) = +\infty$ if paths sampled from $\pi$ are not absolutely continuous almost surely. The function $I$ will turn out to correspond to the rate function in large deviation bounds for random permuton processes. It can be checked that $I$ is lower semicontinuous (in the weak topology on $\mathcal{P}$) and its level sets $\{\pi \in \mathcal{P} : I(\pi) \le C\}$ are compact. We will also use the notation
$$I^\Pi(\pi) = \mathbb{E}_{\gamma \sim \pi} E^\Pi(\gamma)$$
to denote the approximation of energy of $\pi$ associated to the finite partition $\Pi$. The following lemma will be useful in characterizing the large deviation rate function in terms of these approximations.

Lemma 2.2. For any process $\pi \in \mathcal{M}(\mathcal{D})$ we have
$$I(\pi) = \sup_\Pi I^\Pi(\pi),$$
where the supremum is taken over all finite partitions $\Pi = \{0 = t_0 < t_1 < \dots < t_k = T\}$.
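Note that if $\Pi'$ is a refinement of $\Pi$, then we have $E^\Pi(\gamma) \le E^{\Pi'}(\gamma)$, thus along a sequence of successive refinements $\Pi_n$ with mesh tending to zero we have $E^{\Pi_n}(\gamma) \to E(\gamma)$ monotonically as $n \to \infty$. Now we apply the monotone convergence theorem to get the same convergence for the expectations $\mathbb{E}_{\gamma \sim \pi} E^{\Pi_n}(\gamma)$.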
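The monotonicity under refinement is easy to observe numerically. The sketch below assumes the normalization $E^\Pi(\gamma) = \frac12 \sum_i |\gamma(t_i) - \gamma(t_{i-1})|^2 / (t_i - t_{i-1})$ for the partition energy (an assumption of this example) and evaluates it for the hypothetical test path $\gamma(t) = t^2$ on $[0,1]$, whose energy under this normalization is $\frac12 \int_0^1 (2t)^2 \, dt = \frac23$.

```python
def partition_energy(path, times):
    """Half the sum of squared increments divided by the time steps, over one partition."""
    return 0.5 * sum(
        (path(t1) - path(t0)) ** 2 / (t1 - t0) for t0, t1 in zip(times, times[1:])
    )


def uniform_partition(t_max, k):
    """Partition of [0, t_max] into k equal steps."""
    return [t_max * i / k for i in range(k + 1)]


def gamma(t):
    """Test path: gamma(t) = t^2, with energy 2/3 on [0, 1]."""
    return t * t


# Dyadic partitions are successive refinements, so the values should increase.
energies = [partition_energy(gamma, uniform_partition(1.0, 2 ** j)) for j in range(7)]
print(energies)
```

For this particular path one can compute by hand that the uniform partition with $k$ steps gives $\frac23 - \frac{1}{6k^2}$, which increases to $\frac23$.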
The interchange process. The interchange process on the interval {1, . . . , N} is a Markov process in continuous time defined in the following way. Consider particles labelled from 1 to N on a line with N vertices. Each edge has an independent exponential clock that rings at rate 1. Whenever a clock rings, the particles at the endpoints of the corresponding edge swap places. By comparing the initial position of each particle with its position after time t we obtain a random permutation of {1, . . . , N}.
Formally, we define the state space of the process as consisting of permutations η ∈ S N , with the notation η = (x 1 , . . . , x N ) indicating that the particle with label i is at the position x i , or in other words, x i = η(i). The dynamics is given by the generator where η x,x+1 is the configuration η with particles at locations x and x + 1 swapped and α ∈ (1, 2) is a fixed parameter (introduced so that we will be able to consider the limit N → ∞). Since we will also be considering variants of this process with modified rates, we will often refer to the process with generator L as the unbiased interchange process.
The interchange process defines a probability distribution on permutation-valued paths $\eta^N = (\eta^N_t, 0 \le t \le T)$ for any $T \ge 0$. Consider now the permutation process $X^{\eta^N}$ associated to $\eta^N$, that is, sample $\eta^N$ according to the interchange process, pick a particle uniformly at random and follow its trajectory in $\eta^N$. The distribution $\mu^{\eta^N}$ of $X^{\eta^N}$, defined by (5), is then a random element of $\mathcal{M}(\mathcal{D})$.
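The dynamics described above can be simulated directly. The following is a minimal sketch using the standard fact that independent exponential clocks of equal rate on the $N - 1$ edges are equivalent to a single clock of rate $(N-1) \cdot \texttt{rate}$ followed by a uniform choice of edge. The function name, the 0-based indexing and the parameter `rate` (standing in for the sped-up per-edge rate) are assumptions of this sketch, not notation from the paper.

```python
import random


def simulate_interchange(n, t_max, rate=1.0, seed=None):
    """Simulate the interchange process on n >= 2 sites up to time t_max.

    Returns (position, history): position[i] is the final site of particle i
    (0-based), and history is the list of (time, left endpoint of swapped edge).
    """
    rng = random.Random(seed)
    occupant = list(range(n))  # occupant[x] = label of the particle at site x
    position = list(range(n))  # position[i] = site of the particle with label i
    history = []
    total_rate = rate * (n - 1)
    t = 0.0
    while True:
        t += rng.expovariate(total_rate)  # waiting time until the next ring
        if t > t_max:
            break
        x = rng.randrange(n - 1)  # the edge {x, x + 1} whose clock rang
        a, b = occupant[x], occupant[x + 1]
        occupant[x], occupant[x + 1] = b, a
        position[a], position[b] = x + 1, x
        history.append((t, x))
    return position, history


final_position, swaps = simulate_interchange(10, 5.0, seed=1)
print(final_position, len(swaps))
```

Dividing the recorded positions of one uniformly chosen particle by $N$ gives a sample path of the associated permutation process.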
The position of a random particle in the interchange process will be distributed as the stationary simple random walk (in continuous time) on the line $\{1, \dots, N\}$. If we look at timescales much shorter than $N^2$, typically each particle will have distance $o(N)$ from its origin, so the permutation obtained at time $t$ such that $t N^\alpha \ll N^2$ will be close (in the sense of permutons) to the identity permutation. As mentioned in the introduction, we will be interested in large deviation bounds for rare events such as seeing a nontrivial permutation after a short time.

Euler equations and generalized incompressible flows
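Let us now discuss the connection to fluid dynamics and incompressible flows (the discussion here follows [AF09] and [BFS09]). The Euler equations describe the motion of an incompressible fluid in a domain $D \subseteq \mathbb{R}^d$ in terms of its velocity field $u(t, x)$, which is assumed to be divergence-free. The evolution of $u$ is given in terms of the pressure field $p$:
$$\begin{aligned} \partial_t u + (u \cdot \nabla) u &= -\nabla p && \text{in } [0,T] \times D, \\ \operatorname{div} u &= 0 && \text{in } [0,T] \times D, \\ u \cdot n &= 0 && \text{on } [0,T] \times \partial D, \end{aligned}$$
where $n$ denotes the unit normal to $\partial D$, the second equation encodes the incompressibility constraint and the third equation means that $u$ is parallel to the boundary $\partial D$.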
Assuming $u$ is smooth, the trajectory $g(t, x)$ of a fluid particle initially at position $x$ is obtained by solving the equation
$$\dot g(t, x) = u(t, g(t, x)), \qquad g(0, x) = x.$$
Since $u$ is assumed to be divergence-free, the flow map $\Phi^t_g : D \to D$ given by $\Phi^t_g(x) = g(t, x)$ is a measure-preserving diffeomorphism of $D$ for each $t \in [0,T]$. This means that
$$(\Phi^t_g)_* \mu_D = \mu_D,$$
where from now on by $f_*$ we denote the pushforward map on measures, associated to $f$, and $\mu_D$ is the Lebesgue measure inside $D$. Denoting by $\mathrm{SDiff}(D)$ the space of all measure-preserving diffeomorphisms of $D$, we can rewrite the Euler equations in terms of $g$ as
$$\ddot g(t, x) = -\nabla p(t, g(t, x)), \qquad g(0, x) = x, \qquad g(t, \cdot) \in \mathrm{SDiff}(D). \tag{13}$$

Arnold proposed an interpretation according to which the equation above can be viewed as a geodesic equation on $\mathrm{SDiff}(D)$. Thus one can look for solutions to (13) by considering the variational problem
$$\text{minimize} \quad \frac12 \int_0^T \int_D |\dot g(t, x)|^2 \, d\mu_D(x) \, dt \tag{14}$$
among all paths $g(t, \cdot) : [0,T] \to \mathrm{SDiff}(D)$ such that $g(0, \cdot) = f$, $g(T, \cdot) = h$ for some prescribed $f, h \in \mathrm{SDiff}(D)$ (by right invariance without loss of generality $f$ can be assumed to be the identity). The pressure $p$ then arises as a Lagrange multiplier coming from the incompressibility constraint.

Shnirelman proved ([Shn87]) that in dimensions $d \ge 3$ the infimum in this minimization problem is not attained in general and in dimension $d = 2$ there exist diffeomorphisms $h = g(T, \cdot)$ which cannot be connected to the identity map by a path with finite action. This motivated Brenier ([Bre89]) to consider the following relaxation of this problem. With $C(D)$ denoting the space of continuous paths from $[0,T]$ to $D$ and $\mathcal{M}(C(D))$ the set of probability measures on $C(D)$, the variational problem is
$$\text{minimize} \quad \int_{C(D)} \left( \frac12 \int_0^T |\dot\gamma(t)|^2 \, dt \right) d\pi(\gamma) \tag{15}$$
over all $\pi \in \mathcal{M}(C(D))$ satisfying the constraints
$$\pi_{0,T} = (\mathrm{id}, h)_* \mu_D, \qquad \pi_t = \mu_D \ \text{ for all } t \in [0,T], \tag{16}$$
where $\pi_{0,T}$, $\pi_t$ denote the marginals of $\pi$ at times respectively $0, T$ and at time $t$. Following Brenier, a probability measure $\pi \in \mathcal{M}(C(D))$ satisfying constraints (16) is called a generalized incompressible flow between the identity $\mathrm{id}$ and $h$.
To see that indeed (15) is a relaxation of (14), note that any sufficiently regular path $g(t, \cdot) : [0,T] \to \mathrm{SDiff}(D)$, for example corresponding to a solution of (13), induces a generalized incompressible flow given by $\pi = (\Phi_g)_* \mu_D$, where as before $\Phi_g(x) = g(\cdot, x)$. As evidenced by the sine curve process mentioned in the introduction, the converse is false: trajectories of particles sampled from a generalized flow can cross each other or split at a later time when starting from the same position, which is not possible for classical, smooth flows. We refer the reader to [Bre08] for an interesting discussion of the physical relevance of this phenomenon.
The problem admits a natural further relaxation in which the target map is "nondeterministic", in the sense that we have $\pi_{0,T} = \mu$ with $\mu$ being an arbitrary probability measure supported on $D \times D$ and having uniform marginals on each coordinate, not necessarily of the form $\mu = (\mathrm{id}, h)_* \mu_D$ for some map $h$. From now on whenever we refer to problem (15) or generalized incompressible flows we will always be considering this more general variant.
The connection between the generalized problem (15) and the original Euler equations (13) is provided by a theorem due to Ambrosio and Figalli ([AF09]), with earlier weaker results by Brenier ([Bre99]). Roughly speaking, they showed that given a measure $\mu$ with uniform marginals there exists a pressure function $p(t, x)$ such that the following holds: one can replace the problem of minimizing the functional (15) over incompressible flows satisfying $\pi_{0,T} = \mu$ by an easier problem in which the incompressibility constraint is dropped, provided one adds to the functional a Lagrange multiplier given by $p$. We refer the reader to [AF09, Section 6] for a precise formulation and further results on regularity of $p$.
In particular, if $\pi$ is optimal for (15) and the corresponding pressure $p$ is smooth enough, their result implies that almost every path $\gamma$ sampled from $\pi$ minimizes the functional
$$\gamma \mapsto \int_0^T \left( \frac12 |\dot\gamma(t)|^2 - p(t, \gamma(t)) \right) dt. \tag{17}$$
In that case the equation $\ddot g(t, x) = -\nabla p(t, g(t, x))$ from (13) is nothing but the Euler-Lagrange equation for extremal points of the functional (17). We can therefore, at least under some regularity assumptions on $p$, think of generalized incompressible flows as solutions to (13) in which instead of having a diffeomorphism we assume random initial conditions for each particle.

From now on let us restrict the discussion to $D = [0,1]$, which will be most directly relevant to the results of this paper. In this case the original problem (14) is somewhat uninteresting, since the only measure-preserving diffeomorphisms of $[0,1]$ are $f(x) = x$ and $f(x) = 1 - x$. However, the relaxed problem (15) is non-trivial and indeed for the target map $h(x) = 1 - x$ and $T = 1$ the unique optimal solution is given by the sine curve process.
In this setting, the reader may recognize that generalized incompressible flows are in fact the same objects as permuton processes. The term measure-preserving plans is used in [AF09] for what we call permutons. The functional minimized in (15) is the energy I(π) of a permuton process, defined in (10). In this language the optimization problem we are interested in can be rephrased as follows: find

$$\inf_{\pi \in \mathcal{P},\ \pi_{0,T} = \mu} I(\pi), \qquad (18)$$

where the infimum is over all permuton processes π ∈ P satisfying π 0,T = µ for a given permuton µ ∈ M([0, 1] 2 ).
Generalized solutions to Euler equations. We will say that a permuton process π is a generalized solution to Euler equations if there exists a function p : [0, T ] × [0, 1] → R such that almost every path x sampled from π satisfies the ODE system

$$x'(t) = v(t), \qquad v'(t) = -\partial_x p(t, x(t)) \qquad (19)$$

for t ∈ [0, T ]. This is of course equivalent to x ′′ (t) = −∂ x p(t, x(t)). By the remarks above, if π minimizes the energy in (18) and the associated pressure p is smooth enough, then π is always a generalized solution to Euler equations. However, this is only a necessary condition; for a discussion of corresponding sufficient conditions see [BFS09].

Proof outline and structure of the paper
Let us now give a brief outline of the proof strategy for Theorem A and Theorem B. For the lower bound, given a process X we construct a perturbation of the interchange process (defined by introducing asymmetric jump rates based on (19)) for which a law of large numbers holds, namely, the distribution of the path of a random particle converges to a deterministic limit (which is the distribution of X). The large deviation principle is then proved by estimating the Radon-Nikodym derivative between the biased process and the original one.
The key property which makes this construction possible is that the process X satisfies a second order ODE given by (19), so its trajectories are fully specified by the particle's position and velocity (the latter chosen initially from a mean zero distribution). The biased process is then constructed by endowing each particle with an additional parameter keeping track of its velocity, but we perform an additional change of variables, working with a variable we call color instead of velocity. The advantage of this is that the uniform distribution of colors is stationary when the jump rates are properly chosen, which will greatly facilitate the analysis. An additional technical difficulty arises if the velocity distribution of X is time-dependent or not regular enough near the boundary, in which case we first approximate X by a process with a sufficiently regular and piecewise time-homogeneous velocity distribution.
To prove the law of large numbers we need to show that in the biased interchange process particles' trajectories behave approximately like independent samples from X. This requires proving that their velocities remain uncorrelated when averaged over time and is accomplished by means of a local mixing result called the one block estimate. It is here that we rely on stationarity of the uniform distribution of colors in the biased process and the fact that X has velocity zero on average.
The strategy for proving the upper bound is somewhat simpler. We consider a family of exponential martingales similar to the one employed in analyzing independent random walks and use the one block estimate to show that the particles' velocities are typically nonnegatively correlated. This enables us to prove the large deviation upper bound for compact sets and the extension to closed sets is done by proving exponential tightness.
Structure of the paper. The rest of the paper is structured as follows. In Section 3 we introduce the change of variables needed to define the process with colors and prove the approximation result for X mentioned above (Proposition 3.7). In Section 4 we define the biased interchange process and derive the conditions on its rates which guarantee stationarity. Section 5 contains the proof of the law of large numbers for the biased interchange process (Theorem 5.1). In Section 6 we prove two variants of the one block estimate: one needed for the large deviation upper bound (Lemma 6.2) and a more involved one needed for the proof of the law of large numbers (Lemma 5.4). In Section 7 these pieces are then used to prove the large deviation lower bound (Theorem 7.3). Section 8 is devoted to the proof of the large deviation upper bound (Theorem 8.4) and is independent of the previous sections (apart from the use of Lemma 6.2). Finally, in Section 9 we prove Theorem 1.1 and Theorem 1.2 on relaxed sorting networks.

ODEs and generalized solutions to Euler equations
Regularity assumptions and properties of generalized solutions. Suppose π is a generalized solution to Euler equations (19) and let X be a process with distribution π. For the proof of the large deviation lower bound we will need to impose additional regularity assumptions on π. For t ∈ [0, T ] let µ t denote the joint distribution of (x(t), x ′ (t)) when x is sampled according to π. In particular, µ 0 is the joint distribution of the initial conditions of the ODE (19).
We will assume that each µ t has a density ρ t (x, v) with respect to the Lebesgue measure on [0, 1] × R. For x ∈ [0, 1] and t ∈ [0, T ] let µ t,x denote the conditional distribution of v, given x, at time t. In addition we assume that for x = 0 or 1 the distribution µ t,x is a delta mass at 0, as otherwise the process X cannot stay confined to [0, 1] and have mean velocity zero everywhere (see the discussion of incompressibility below).
Let F t,x denote the cumulative distribution function of µ t,x and let V t (x, ·) : [0, 1] → R be the quantile function of µ t,x , defined for x ∈ [0, 1] and φ ∈ (0, 1] by

$$V_t(x, \phi) = \inf \{ v \in \mathbb{R} : F_{t,x}(v) \ge \phi \}.$$

Assumption 3.1. Throughout the paper, we will assume that for a generalized solution to Euler equations π the following properties are satisfied: (4) the density ρ t is continuously differentiable in t, x and v for each t ∈ [0, T ] and x, v in the interior of the support of ρ t .

Let us comment on the relevance of these assumptions. Assumption (1) will guarantee uniqueness of solutions to (19). Assumption (2) implies that the velocity of a particle moving along a path sampled from π stays uniformly bounded in time. Assumption (3) implies that for any x ∈ (0, 1) and φ ∈ [0, 1] we have F t,x (V t (x, φ)) = φ, i.e., V t (x, ·) is the inverse function of F t,x . Assumptions (3) and (4) imply that V t (x, φ) is a continuous function of t, x, φ and it is continuously differentiable in all variables for x ∈ (0, 1).
Note that for V t (x, φ) to be differentiable at φ = 0, 1, the distribution function F t,x necessarily has to be non-differentiable at corresponding v such that F t,x (v) = φ. This is why we can require the density ρ t to be smooth only in the interior of its support and not at the boundary.
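To make the role of the quantile function concrete, the following minimal sketch computes it by bisection for one fixed pair (t, x). The conditional distribution used here, with density (3/4)(1 − v²) on [−1, 1], is a hypothetical choice made only for illustration and is not taken from the paper; the names `cdf`, `quantile` and `mean_velocity` are likewise ours. The sketch checks the inverse relation F(V(φ)) = φ and that the velocity has mean zero when φ is uniform on [0, 1].

```python
import math

def cdf(v):
    """Hypothetical conditional CDF F_{t,x}: density (3/4)(1 - v^2) on [-1, 1]."""
    v = max(-1.0, min(1.0, v))
    return 0.5 + (3.0 * v - v ** 3) / 4.0

def quantile(phi, lo=-1.0, hi=1.0, tol=1e-12):
    """V(phi) = inf{v : F(v) >= phi}, found by bisection (F is continuous and increasing)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) >= phi:
            hi = mid
        else:
            lo = mid
    return hi

def mean_velocity(n=2000):
    """Midpoint rule for the integral of V(phi) over phi in [0, 1]."""
    return sum(quantile((k + 0.5) / n) for k in range(n)) / n

if __name__ == "__main__":
    print(cdf(quantile(0.3)))   # close to 0.3
    print(mean_velocity())      # close to 0 for this symmetric example
```

Bisection is used here only because it needs nothing beyond monotonicity of F; for a distribution with an explicit inverse one would of course use the closed form.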
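From now on we assume that π is a fixed generalized solution to Euler equations, satisfying Assumption 3.1. Almost every path x : [0, T ] → [0, 1] sampled from π then satisfies the ODE

$$x'(t) = v(t), \qquad v'(t) = -\partial_x p(t, x(t)). \qquad (20)$$

Note that since π is a permuton process, each measure µ t satisfies the incompressibility condition, meaning that its projection onto the first coordinate is equal to the uniform measure on [0, 1]. This is equivalent to the property that for any test function f : [0, 1] → R we have

$$\int_{[0,1] \times \mathbb{R}} f(x)\, \mu_t(dx, dv) = \int_0^1 f(x)\, dx.$$

An important consequence of the incompressibility assumption is that under µ t the velocity has mean zero at each x, that is, we have the following

Lemma 3.2. For any t ∈ [0, T ] and x ∈ [0, 1] we have

$$\int_{\mathbb{R}} v\, \mu_{t,x}(dv) = 0.$$

Proof. Consider any test function f : [0, 1] → R and write

$$\mathbb{E} f(X_s) = \int_{[0,1] \times \mathbb{R}} f(x)\, \mu_s(dx, dv).$$

By incompressibility the integral above is always equal to $\int_0^1 f(x)\, dx$, in particular does not depend on time. On the other hand its derivative with respect to s is

$$\mathbb{E}\left[ f'(X_s) X'_s \right] = \int_{[0,1] \times \mathbb{R}} f'(x)\, v\, \mu_s(dx, dv),$$

so the right-hand side has to vanish. Since $\int g(x, v)\, \mu_s(dx, dv) = \int_0^1 \int_{\mathbb{R}} g(x, v)\, \mu_{s,x}(dv)\, dx$ for any measurable g and f was an arbitrary test function, the claim of the lemma holds for almost every x. Since we have assumed that µ t has a continuous density, the claim in fact holds for all x, which ends the proof.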
We will also make use of an explicit evolution equation that the densities ρ t have to satisfy. This is the content of the following lemma.
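Lemma 3.3. For every t ∈ [0, T ] and x, v in the interior of the support of ρ t the densities satisfy

$$\partial_t \rho_t(x, v) = -v\, \partial_x \rho_t(x, v) + \partial_x p(t, x)\, \partial_v \rho_t(x, v).$$

Proof. Let f : [0, 1] × R → R be any test function and consider the integral

$$\mathbb{E} f(x(s), v(s)) = \int f(x, v)\, \rho_s(x, v)\, dx\, dv.$$

On the one hand, since paths sampled from π satisfy x ′ = v and v ′ = −∂ x p, its derivative with respect to s is equal to

$$\int \left( v\, \partial_x f(x, v) - \partial_x p(s, x)\, \partial_v f(x, v) \right) \rho_s(x, v)\, dx\, dv.$$

Performing integration by parts with respect to x for the first term and with respect to v for the second term gives (noting that f has compact support so the boundary terms vanish)

$$\int f(x, v) \left( -v\, \partial_x \rho_s(x, v) + \partial_x p(s, x)\, \partial_v \rho_s(x, v) \right) dx\, dv.$$

On the other hand, we have

$$\frac{d}{ds} \int f(x, v)\, \rho_s(x, v)\, dx\, dv = \int f(x, v)\, \partial_s \rho_s(x, v)\, dx\, dv.$$

Since the test function f was arbitrary, the equation from the statement of the lemma must hold for every t, x, v as assumed.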
The colored trajectory process. Let X = (X t , 0 ≤ t ≤ T ) be the permuton process with distribution π. For the large deviation lower bound we will need to construct a suitable interacting particle system in which the behavior of a random particle mimics that of the permuton process X. A crucial ingredient will be a property analogous to Lemma 3.2, i.e., having velocity distribution whose mean is locally zero. Instead of working with velocity v, whose distribution ρ t (x, v) at a given site x may change in time, it will be more convenient to perform a change of variables and use another variable φ, which we call color, whose distribution will be invariant in time.
Recall that under Assumption 3.1 the distribution function F t,x (·) and the quantile function V t (x, ·) are related by

$$F_{t,x}(V_t(x, \phi)) = \phi. \qquad (21)$$

The reason for introducing the variable φ is the following elementary property: if φ is sampled from the uniform distribution on [0, 1], then V t (x, φ) is distributed according to µ t,x . Thus instead of working with (x, v) variables in the ODE (20), where the distribution of v evolves in time, we can set up an ODE for x and φ such that the joint distribution of (x, φ) will be uniform on [0, 1] 2 at each time. The velocity v and its distribution can then be recovered via the equation

$$v = V_t(x, \phi).$$

Let (x(t), v(t)) be a solution to (20) such that x(t) ≠ 0, 1 and let

Lemma 3.3 implies that
and upon integrating by parts in the last integral we obtain Now, differentiating (21) with respect to x and φ gives If We also note that Lemma 3.2 expressed in terms of (x, φ) variables states that for each t ∈ [0, T ] and x ∈ [0, 1] we have

$$\int_0^1 V_t(x, \phi)\, d\phi = 0.$$

From now on we work exclusively with (23). We will need to make two approximations necessary for the interacting particle system analysis later on. One is necessitated by the fact that the function V t (x, φ) might not be smooth with respect to x at the boundaries x = 0, 1 (this happens, for example, for the sine curve process). We will therefore replace the function by its smooth approximation in a β-neighborhood of the boundary and in the end take β → 0. The other approximation consists in dividing the time interval [0, T ] into intervals of length δ and approximating V t (x, φ) for given x, φ with a piecewise-constant function of t. This will enable us to give a simple stationarity condition for the corresponding interacting particle system and in the end take δ → 0 as well.
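The time discretization described above can be sketched as follows. We assume, consistently with the later use of the approximation, that on each interval [t_k, t_{k+1}) the field is frozen at its value at the left endpoint t_k; the helper `freeze_in_time` and the velocity field `V_example` are hypothetical and serve only as an illustration.

```python
import math

def freeze_in_time(V, T, M):
    """Return (t, x, phi) -> V(t_k, x, phi), where t_k = k*T/M is the left endpoint
    of the partition interval [t_k, t_{k+1}) containing t (assumed convention)."""
    delta = T / M
    def V_frozen(t, x, phi):
        # Clamp so that t = T falls into the last interval. Floor division of floats
        # can misplace t when it sits exactly on a grid point that is not
        # representable in binary, so exact grid times deserve care in real use.
        k = min(int(t // delta), M - 1)
        return V(k * delta, x, phi)
    return V_frozen

def V_example(t, x, phi):
    """Hypothetical smooth field, zero at x = 0, 1 and with zero mean in phi."""
    return (1.0 + t) * math.sin(math.pi * x) * math.cos(math.pi * phi)

if __name__ == "__main__":
    V_delta = freeze_in_time(V_example, T=1.0, M=4)
    print(V_example(0.6, 0.5, 0.0), V_delta(0.6, 0.5, 0.0))
```

For a field that is Lipschitz in t the two values differ by at most δ times the Lipschitz constant, which is the sense in which δ → 0 recovers the original dynamics.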
Let β ∈ (0, 1 The existence of such a function V β t is proved at the end of this section. By (x β (t), φ β (t)) we will denote the solution to the ODE Take any δ > 0 (to simplify notation we will assume that T is an integer multiple of δ, this will not influence the argument in any substantial way) and consider a partition We can now define the piecewise-stationary process which will be our main tool in subsequent arguments. Consider the ODE where Solutions to (27) exist and are unique as usual for any initial conditions, provided we interpret (y ′ (t), φ ′ (t)) above as right-handed derivatives at t = 0, t 1 , t 2 , . . . , t M −1 (we adopt this convention from now on). Let P β,δ = (X β,δ t , Φ β,δ t ), 0 ≤ t ≤ T be the stochastic process with values in [0, 1] 2 with the following distribution: choose (X β,δ 0 , Φ β,δ 0 ) uniformly at random from [0, 1] 2 and then take (X β,δ t , Φ β,δ t ) = (y(t), φ(t)), where (y, φ) is the solution of the system (27) with initial conditions given by (y(0), φ(0)) = (X β,δ 0 , Φ β,δ 0 ). We will call this process the colored trajectory process associated to (27).
We also define the process P β = (X β t , Φ β t ), 0 ≤ t ≤ T , which is obtained in the same way as P β,δ except that we follow solutions to (25) instead of (27), i.e., make no piecewise approximation in time of V β t . The key property of the process P β,δ is the following Proof. First we show that the process stays confined to [0, 1] 2 . Because of uniqueness of solutions to (27) it is enough to show that if a solution starts in the interior of [0, 1] 2 , it never reaches the boundary, or, equivalently, that if a solution is at the boundary at some t, it is actually at the boundary for all s ∈ [0, T ]. If y(t) = 0 or 1 for any t, then y ′ (t) = 0, since V β,δ (t, 0, φ) = V β,δ (t, 1, φ) = 0 for any φ. By uniqueness of solutions we then have y(t) ≡ 0 or 1. If φ(t) = 0 for any t, then R β,δ (t, y, 0) = 0 regardless of y, so as before φ ′ (t) = 0 and φ(t) ≡ 0. Finally, if φ(t) = 1, then using the property (c) of the function S β t (x, φ) we have so as before φ ′ (t) = 0 and φ(t) ≡ 1. Now we observe that the form of V β,δ and R β,δ in (27) implies that the vector field (V β,δ (t, ·, ·), R β,δ (t, ·, ·)) is divergence-free at each t, so by Liouville's theorem the uniform measure on [0, 1] 2 is invariant for the corresponding flow map.
It is readily seen that the statements above also hold for P β instead of P β,δ , hence with a slight abuse of notation we can allow δ = 0 and write P β,0 = P β , X β,0 = X β etc.
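The invariance argument can be checked numerically on a toy example. The sketch below assumes that the vector field has the form V = ∂F/∂φ, R = −∂F/∂x for a function F vanishing at φ = 0 and φ = 1; this form is our reading of the construction and the particular F is hypothetical. It verifies that the divergence vanishes and that the Jacobian determinant of the time-one flow map, computed with a standard RK4 integrator, is close to 1, as Liouville's theorem predicts.

```python
import math

def stream(x, phi):
    """Hypothetical F(x, phi); vanishes at phi = 0 and phi = 1."""
    return math.sin(math.pi * x) * math.sin(math.pi * phi) / math.pi

def field(x, phi):
    """(V, R) with V = dF/dphi and R = -dF/dx (assumed form of the dynamics)."""
    V = math.sin(math.pi * x) * math.cos(math.pi * phi)
    R = -math.cos(math.pi * x) * math.sin(math.pi * phi)
    return V, R

def divergence(x, phi, h=1e-5):
    """Central-difference approximation of dV/dx + dR/dphi."""
    dV_dx = (field(x + h, phi)[0] - field(x - h, phi)[0]) / (2 * h)
    dR_dphi = (field(x, phi + h)[1] - field(x, phi - h)[1]) / (2 * h)
    return dV_dx + dR_dphi

def rk4_flow(x, phi, t_end=1.0, steps=1000):
    """Integrate x' = V, phi' = R with the classical Runge-Kutta scheme."""
    dt = t_end / steps
    for _ in range(steps):
        k1 = field(x, phi)
        k2 = field(x + 0.5 * dt * k1[0], phi + 0.5 * dt * k1[1])
        k3 = field(x + 0.5 * dt * k2[0], phi + 0.5 * dt * k2[1])
        k4 = field(x + dt * k3[0], phi + dt * k3[1])
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        phi += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return x, phi

def flow_jacobian_det(x, phi, h=1e-5):
    """Determinant of the finite-difference Jacobian of the time-one flow map."""
    xp, pp = rk4_flow(x + h, phi)
    xm, pm = rk4_flow(x - h, phi)
    xq, pq = rk4_flow(x, phi + h)
    xr, pr = rk4_flow(x, phi - h)
    a, c = (xp - xm) / (2 * h), (pp - pm) / (2 * h)
    b, d = (xq - xr) / (2 * h), (pq - pr) / (2 * h)
    return a * d - b * c

if __name__ == "__main__":
    print(divergence(0.3, 0.2))          # close to 0
    print(flow_jacobian_det(0.3, 0.2))   # close to 1
```

In this time-homogeneous example F is also conserved along trajectories, so a path started in the interior of the square stays on a level set of F and never reaches the boundary.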
Our goal in the remainder of this section is to show that, as β, δ → 0, the processes X and X β,δ typically stay close to each other and have approximately the same Dirichlet energy, so in the probabilistic part of the arguments it will be enough to work with the process (X β,δ , Φ β,δ ), which is more convenient thanks to piecewise stationarity.
First we prove a simple lemma, showing that X β is unlikely to ever be close to the boundary (so that approximation of X with X β is meaningful as β → 0).
Lemma 3.5. Let P denote the law of the process X β . Let Proof. We will prove that X β t ∉ [0, β] with high probability as β → 0 (the proof for [1 − β, 1] is analogous). Suppose that y is a solution of (27) with initial condition y(0) ∉ [0, 2β] and that y(t) ∈ [0, β] for some t ∈ [0, T ]. Then there exists a time interval [s, s ′ ] such that y(s) = 2β, y(s ′ ) = β and y(u) ∈ [β, 2β] for every u ∈ [s, s ′ ]. Without loss of generality we can assume that [s, s ′ ] ⊆ [t k , t k+1 ) for some k (the other case is easily dealt with by further subdividing [β, 2β] into two equal subintervals and repeating the argument for each of them). By the mean value theorem Taking expectation yields Since X β is a permuton process, X β s has uniform distribution for each s, which gives Together with the inequality above this implies Since X β 0 has uniform distribution, we have P(X β 0 ∈ [0, 2β]) = 2β. Thus Since f (β) → 0 as β → 0, the claim is proved.
Proposition 3.7. Let π ∈ M(D) be the distribution of the process X and let π β,δ ∈ M(D) be the distribution of the process X β,δ . Then we have where I(µ) is the energy of the process µ defined in (10).
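Proof. For the first convergence it is enough to show that $\mathbb{E}\|X - X^{\beta,\delta}\|_{\sup} \to 0$ in the coupling between X and X β,δ considered before. We have

$$\mathbb{E}\|X - X^{\beta,\delta}\|_{\sup} \le \mathbb{E}\|X - X^{\beta}\|_{\sup} + \mathbb{E}\|X^{\beta} - X^{\beta,\delta}\|_{\sup}.$$

Let B β be the event from the statement of Lemma 3.5. Since the supremum norm is bounded by 1, we have

$$\mathbb{E}\|X - X^{\beta}\|_{\sup} \le \mathbb{P}(B_\beta) + \mathbb{E}\left[ \|X - X^{\beta}\|_{\sup} \mathbf{1}_{(B_\beta)^c} \right].$$

On the event (B β ) c we have X β = X, so the second term above is equal to 0. As for $\mathbb{E}\|X^{\beta} - X^{\beta,\delta}\|_{\sup}$, by Proposition 3.6 for fixed β > 0 we have with probability one $\|X^{\beta} - X^{\beta,\delta}\|_{\sup} \to 0$ as δ → 0, which together with the estimate on $\mathbb{E}\|X - X^{\beta}\|_{\sup}$ proves the first claim of the proposition.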
Consider β ′ < β/2 to be fixed later and let f be a smooth approximation of a step function which has values in [0, 1], is equal to 0 on [0, β − 2β ′ ], equal to 1 on [β − β ′ , β] and is increasing and We will check that V β t (x, φ) indeed satisfies the desired properties.
Let us first check that the property (c) is satisfied for x ∈ [0, β]. We have As the functions in the formula above are continuously differentiable at x = β, V β t (x, φ) is continuously differentiable at x = β as well.
To see that property (d) is satisfied, we note that by continuity of V t (x, φ) and ∂V t /∂x (x, φ) for x ≠ 0, 1 we can take β ′ in the definition of f (x) above to be arbitrarily small (depending on V t , ∂V t /∂x and β) so that on [β − 2β ′ , β] the function V t (x, φ) is less than |V t (β, φ)| + 1 in absolute value. Since on [0, β − 2β ′ ] we have V β t (x, φ) = 0, the desired bound on |V β t (x, φ)| follows.
Finally, to prove that property (e) holds it is enough to show that The claim follows immediately from property (d), since the integrand is bounded independently of β.

The biased interchange process and stationarity
The biased interchange process. For the sake of proving a large deviation lower bound, we will need to perturb the interchange process to obtain dynamics which typically exhibits (otherwise rare) behavior of a fixed permuton process. Let us introduce the biased interchange process. Its configuration space E consists of sequences η = ((x i , φ i )) N i=1 , where as before (x 1 , . . . , x N ) is a permutation of {1, . . . , N} and φ i has N possible values, 1, . . . , N. Here x i will be the position of the particle with label i and φ i will be its color.
By a slight abuse of notation we will write η −1 (x) to denote the label (number) of the particle at position x in configuration η (so that η −1 (x i ) = i). For a position x we will often write φ x as a shorthand for φ η −1 (x) (the positions will be always denoted by x or y and labels by i, so there is no risk of ambiguity). In this way we can treat any configuration η as a function which assigns to each site x a pair (η −1 (x), φ x ), the label and the color of the particle present at x. The configuration at time t will be denoted by η N t (or simply η t ), and likewise by x i (η N t ) and φ i (η N t ) we denote the position and the color of the particle number i at time t. We will use notation

$$X_i(\eta^N_t) = \frac{x_i(\eta^N_t)}{N}, \qquad \Phi_i(\eta^N_t) = \frac{\phi_i(\eta^N_t)}{N}$$

for the rescaled positions and colors. By the same convention as above Φ x (η N t ) will denote the rescaled color of the particle at site x at time t.
Let ε = N 1−α , with the same α ∈ (1, 2) as in (12). Suppose we are given functions v, r : [0, T ] × {1, . . . , N} × {1, . . . , N} → R. The dynamics of the corresponding biased interchange process is defined by the (time-inhomogeneous) generator Here η x,x+1 is the configuration η with particles at locations x and x + 1 swapped, and η y,± is the configuration η with φ y changed by ±1 (with the convention that η y,+ = η y if φ y = N and likewise η y,− = η y if φ y = 1). We will often use the abbreviated notation v x (t, η) = v(t, x, φ x (η)) (with the convention v 0 (t, η) = v N +1 (t, η) = 0). In other words, at each time neighboring particles make a swap at rate close to 1, with bias proportional to the difference of their velocities v(t, x, φ x ), and each particle independently changes its color by ±1, also at rate close to 1 with bias proportional to ±r(t, x, φ x ). The parameter ε has been chosen so that we expect particles to have displacement of order N at macroscopic times.
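The verbal description of the dynamics can be turned into a small simulation. The sketch below is only an illustration under explicit assumptions: we take the swap rate across the edge (x, x + 1) to be 1 + ε(v_x − v_{x+1}) and the color rates to be 1 ± ε r, we omit any overall time scaling, and we evaluate time-dependent rates at the time of the last jump, which is an approximation. The function `simulate_biased_interchange` and the fields `v_example`, `r_example` are hypothetical names introduced here.

```python
import math
import random

def simulate_biased_interchange(N, alpha, t_end, v, r, seed=0):
    """Gillespie-style sketch of a biased interchange process with colors.
    Assumed rates: swap at 1 + eps*(v_x - v_{x+1}), color +-1 at 1 +- eps*r_x."""
    rng = random.Random(seed)
    eps = N ** (1.0 - alpha)
    labels = list(range(1, N + 1))      # labels[x-1] = label of the particle at site x
    rng.shuffle(labels)
    colors = [rng.randint(1, N) for _ in range(N)]   # colors[x-1] = color at site x
    t = 0.0
    while True:
        events = []                     # (rate, kind, site)
        for x in range(1, N):
            bias = v(t, x, colors[x - 1]) - v(t, x + 1, colors[x])
            events.append((1.0 + eps * bias, "swap", x))
        for x in range(1, N + 1):
            bias = r(t, x, colors[x - 1])
            if colors[x - 1] < N:       # color N cannot increase
                events.append((1.0 + eps * bias, "up", x))
            if colors[x - 1] > 1:       # color 1 cannot decrease
                events.append((1.0 - eps * bias, "down", x))
        assert all(rate >= 0.0 for rate, _, _ in events), "rates must be nonnegative"
        total = sum(rate for rate, _, _ in events)
        t += rng.expovariate(total)
        if t > t_end:
            return labels, colors
        u = rng.random() * total        # pick an event proportionally to its rate
        for rate, kind, x in events:
            u -= rate
            if u <= 0.0:
                break
        if kind == "swap":              # colors travel together with the particles
            labels[x - 1], labels[x] = labels[x], labels[x - 1]
            colors[x - 1], colors[x] = colors[x], colors[x - 1]
        elif kind == "up":
            colors[x - 1] += 1
        else:
            colors[x - 1] -= 1

N_EXAMPLE = 10

def v_example(t, x, phi):
    return math.sin(math.pi * x / N_EXAMPLE) * math.cos(math.pi * phi / N_EXAMPLE)

def r_example(t, x, phi):
    return -math.cos(math.pi * x / N_EXAMPLE) * math.sin(math.pi * phi / N_EXAMPLE)

if __name__ == "__main__":
    print(simulate_biased_interchange(N_EXAMPLE, 1.5, 2.0, v_example, r_example))
```

Recomputing all rates after every jump costs O(N) per event; this is acceptable for a sketch, while a serious simulation would update only the rates affected by the last jump.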
Since the interchange process is a pure jump Markov process, for each particle its rescaled position X i (η N ) and color Φ i (η N ) will be càdlàg paths from [0, T ] to [0, 1] and thus elements of D. In the same way we can consider the joint trajectory P i (η N ) = (X i (η N ), Φ i (η N )) as an element of D̃ = D([0, T ], [0, 1] 2 ), the space of càdlàg paths from [0, T ] to [0, 1] 2 (equipped with the Skorokhod topology). By M(D̃) we will denote the space of Borel probability measures on D̃, endowed with the weak topology, and by a slight abuse of notation the corresponding Wasserstein distance will be denoted by d W , as for M(D).
If η N is the trajectory of the biased interchange process, then by analogy with the permutation process X η N we can define the colored permutation process P η N = (X η N , Φ η N ), obtained by choosing a particle i at random and following the path (X i (η N t ), Φ i (η N t )). Thus we keep track both of the position and the color of a random particle. Since η N is random, the distribution ν η N of P η N , given by

$$\nu^{\eta^N} = \frac{1}{N} \sum_{i=1}^{N} \delta_{P_i(\eta^N)},$$

is a random element of M( D).
Stationarity conditions. Let us now connect the discussion of the interchange process with deterministic permuton processes and generalized solutions to Euler equations considered in Section 3. Recall the colored trajectory process P β,δ = (X β,δ , Φ β,δ ) defined in Section 3. From now on we consider β ∈ (0, 1/4) and δ > 0 to be fixed and we suppress them in the notation, writing Note that this should not be confused with the actual generalized solution to Euler equations, which was also denoted by X, but does not appear in this and the following sections except in Theorem 7.3.
Our goal is to set up a biased interchange process so that typically trajectories of particles will behave like trajectories of the process X. We would also like to preserve the stationarity of the uniform distribution of colors, which will greatly facilitate parts of the argument. To find the correct rates v(t, x, φ) and r(t, x, φ) in (28), recall that by definition the trajectories of the colored trajectory process P = (X, Φ) satisfy the equation with the functions V and R satisfying for $F(t, X, \Phi) = \int_0^{\Phi} V(t, X, \psi)\, d\psi$. Note that F (t, X, 0) = 0 and F (t, X, 1) = 0, where the latter equality follows from property (c) of V β t (x, φ) (and thus of V = V β,δ ). It is clear that v and r should be chosen so that approximately we have To analyze the stationarity condition, consider the uniform distribution on configurations of the biased interchange process, i.e., a distribution in which the labelling of particles is a uniformly random permutation and each particle has a uniformly random color, chosen independently from {1, . . . , N} for each of them. We want to find a condition on rates v(t, x, φ) and r(t, x, φ) such that this measure will be invariant for the dynamics of L t .
Note that since V (t, X, Φ), R(t, X, Φ) are piecewise-constant as functions of t, the dynamics induced by L t is time-homogeneous on each interval [t k , t k+1 ) from the definition (26) of V . Thus the stationarity condition for the uniform measure is that for each state (i.e., each configuration η) the sums of outgoing and incoming jump rates have to be equal. We write down this condition as follows. For any given configuration η, with particle at location x having color φ x = φ x (η), there are the following possible outgoing jumps:
• for some x ∈ {1, . . . , N − 1} the particles at locations x and x + 1 swap, at rate
• for some x ∈ {1, . . . , N} the particle at x changes its color from φ x to φ x ± 1, at rate 1 ± εr(t, x, φ x )
and incoming jumps:
• for some x ∈ {1, . . . , N − 1} the particles at locations x and x + 1 swap, at rate
• for some x ∈ {1, . . . , N} the particle at x changes its color from φ x ± 1 to φ x , at rate 1 ∓ εr(t, x, φ x ± 1)
Thus the condition on the sums of jump rates is where we adopt the convention r(t, x, 0) = r(t, x, N + 1) = 0. This implies Since we would like this equation to be satisfied for any configuration, regardless of the choice of φ x for each x, we want each term in the sum and each of the boundary terms to vanish. This gives us a set of equations which have to be satisfied for every φ = 1, . . . , N.
Note that with this choice of rates we have for any uniformly in x, φ and t, because of smoothness of F (t, X, Φ) in X and Φ variables. In particular the rates v and r are uniformly bounded for all N. From now on we will always assume that the biased interchange process has rates v(t, x, φ) and r(t, x, φ) given by (32) and is started from the uniform distribution (which by the discussion above is stationary). The properties of v and r which will be relevant to our analysis are that they are bounded and approximately equal to some smooth functions V , R, that the corresponding dynamics has the uniform measure as the stationary distribution and, crucially, that in stationarity the velocities are independent and have mean zero. This last property, which should be thought of as the particle system analog of Lemma 3.2, is conveniently summarized in the following proposition.
which by definition of v is equal to Recalling the definition of F below (30), the right-hand side is equal to 0.

Law of large numbers
Throughout this section P N will denote the probability law of the biased interchange process on N particles, started in stationarity, associated to the equation (29) (with all the assumptions from the previous section). To simplify notation we will usually write η = η N . Whenever we use o(·) or O(·) asymptotic notation the implicit constants will depend only on the rates v, r and possibly on T . Let P = (X, Φ) be the colored trajectory process associated to the equation (29) and let P η N be the colored permutation process defined in Section 4. Let us denote the distributions of P and P η N respectively by ν and ν η N , with ν, ν η N ∈ M( D). We will prove the following theorem In other words, the random processes P η N converge in distribution to the process P whose distribution is deterministic. The theorem above can be thought of as a law of large numbers for random permuton processes and it will be useful for establishing the large deviation lower bound.
Remark 5.2. Since the limiting measure ν is deterministic and supported on continuous trajectories, Theorem 5.1 implies that the convergence ν η N → ν in fact holds in a stronger sense, namely in probability when M( D) is endowed with the Wasserstein distance d sup W associated to the supremum norm on D.
To prove Theorem 5.1, we will show that typically trajectories of most particles approximately follow the same ODE (29) as trajectories of the limiting process. In other words, if a given particle is at site x, it should locally move according to its velocity v(t, x, φ x ). However, because of swaps between particles the actual jump rates of the particle will be influenced by velocities of its neighbors. Nevertheless, since velocity at each site has mean 0 in stationarity, we will be able to show that the contribution from velocities of the particle's neighbors cancels out when averaged over time; this will be the content of the one block estimate proved in the next section.
Note that to prove that the random processes indeed converge to a deterministic process, it is not enough to look only at single path distributions, as explained in Section 2.1. Nevertheless, we will show that in the interchange process typically any two particles (in fact almost all of them) behave like independent random walks, which by Lemma 2.1 will be enough to establish a deterministic limit.
Throughout this and the following sections we will make extensive use of martingales associated to Markov processes (see [KL99] for a comprehensive treatment of such techniques applied to interacting particle systems). For any Markov process with generator L and a bounded function F : E → R, where E is the configuration space of the process, the following processes are mean zero martingales ([KL99, Lemma A1.5.1])

$$M_t = F(\eta_t) - F(\eta_0) - \int_0^t (LF)(\eta_s)\, ds, \qquad (34)$$

$$N_t = M_t^2 - \int_0^t \left( (LF^2)(\eta_s) - 2 F(\eta_s) (LF)(\eta_s) \right) ds. \qquad (35)$$

Furthermore, for any F as above the following process is a mean one positive martingale (see discussion following [KL99, Lemma A1.7.1])

$$\mathbb{M}_t = \exp\left\{ F(\eta_t) - F(\eta_0) - \int_0^t e^{-F(\eta_s)} (L e^{F})(\eta_s)\, ds \right\}.$$

In the following sections we will also consider the case when F is not necessarily bounded, in which case $M_t$, $N_t$, $\mathbb{M}_t$ are only local martingales.

Our first goal is to prove that with high probability almost all particles move according to their local velocity v(t, x i , φ i ). Recall that X i (η t ) and Φ i (η t ) are respectively the rescaled position and color of the particle with label i. We will prove the following

Proposition 5.3. For any fixed t ∈ [0, T ] and ε > 0 we have in the biased interchange process As a starting point let us rewrite X i (η t ) in a more useful form. Recall from (28) that L denotes the generator of the biased interchange process. By the formula (34) applied to F (η s ) = X i (η s ) we have

$$X_i(\eta_t) = X_i(\eta_0) + \int_0^t (L X_i)(\eta_s)\, ds + M^i_t,$$

where M i t is a mean zero martingale with respect to P N . Recall that v x (t, η) = v(t, x, φ x (η)) denotes the velocity of the particle at site x in configuration η at time t. For simplicity we will also write v x i (t, η) = v(t, x i (η), φ i (η)) for the velocity of the particle with label i. We have since the position of the particle i changes by ±1 depending on whether it makes a swap with its left or right neighbor. Thus we obtain or in other words For the sake of proving the first part of Proposition 5.3 it will be enough to show that as N → ∞. First we prove that for most particles the martingale term M i t will be small with high probability. Let us define .
By the martingale formula (35) we have that is a mean zero martingale. A quick calculation gives so these two quantities are the same up to terms of order o(1). Thus Q i s = o(1) (uniformly in s and i) and, since EN i t = 0, we obtain from (39) that E(M i t ) 2 = o(1) as well. Incidentally, a similar calculation (only simpler, since it does not involve correlations between adjacent particles) and the martingale argument gives us that for any fixed particle i. This proves the second part of Proposition 5.3.
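The mean zero property of the martingale F(η_t) − F(η_0) − ∫_0^t (LF)(η_s) ds, which drives the argument above, can be illustrated by simulation on a toy chain. The sketch below uses a hypothetical biased nearest-neighbor walk on {0, ..., 4} rather than the interchange process itself; the names `rates`, `generator_applied`, `dynkin_martingale` and `mean_martingale` are ours. Since the path is piecewise constant, the time integral is computed exactly along each sample.

```python
import random

STATES = 5    # toy state space {0, ..., 4}
BIAS = 0.3    # hypothetical bias of the walk

def rates(x):
    """Jump rates of the toy chain: right at 1 + BIAS, left at 1 - BIAS."""
    out = {}
    if x < STATES - 1:
        out[x + 1] = 1.0 + BIAS
    if x > 0:
        out[x - 1] = 1.0 - BIAS
    return out

def generator_applied(F, x):
    """(LF)(x) = sum over y of rate(x, y) * (F(y) - F(x))."""
    return sum(q * (F(y) - F(x)) for y, q in rates(x).items())

def dynkin_martingale(F, x0, t_end, rng):
    """One sample of M_t = F(X_t) - F(X_0) - int_0^t (LF)(X_s) ds at t = t_end."""
    x, t, integral = x0, 0.0, 0.0
    while True:
        out = rates(x)
        total = sum(out.values())
        hold = rng.expovariate(total)           # exponential holding time
        if t + hold >= t_end:
            integral += (t_end - t) * generator_applied(F, x)
            return F(x) - F(x0) - integral
        integral += hold * generator_applied(F, x)
        t += hold
        u = rng.random() * total                # choose the jump target
        for y, q in out.items():
            u -= q
            if u <= 0.0:
                break
        x = y

def mean_martingale(samples=20000, seed=1):
    """Monte Carlo estimate of E[M_1] for F(x) = x started from x0 = 2."""
    rng = random.Random(seed)
    F = lambda x: float(x)
    return sum(dynkin_martingale(F, 2, 1.0, rng) for _ in range(samples)) / samples

if __name__ == "__main__":
    print(mean_martingale())   # small, of the order of the Monte Carlo error
```

Here the compensator ∫ (LF) ds removes exactly the drift created by the bias, which is the same mechanism by which the integral of the local velocities is separated from the martingale term in the decomposition of X_i(η_t).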
Recalling (37) and (38), to finish the proof of the first part of Proposition 5.3 we only need to show that Recall from (26) that V β,δ (s, x, φ) was defined in terms of a partition 0 = t 0 < t 1 < . . . < t M = T . We would like to take advantage of the fact that on each interval the dynamics of the biased interchange process is time-homogeneous. Suppose that t ∈ [t l , t l+1 ) for some l ≤ M − 1 and let us write For any t ≥ 0 let Since M is fixed, it is enough to show that for any fixed as N → ∞.
To keep the notation simple we will prove the desired statement just for k = 0, with the general case being exactly analogous. Recall that t 0 = 0. By definition of the piecewise-constant in time approximation of V β,δ , for s ∈ [0, t 1 ) we have v x (s, η s ) = v x (0, η s ). Let us define v x (η) = v x (0, η). Fix any t ∈ [0, t 1 ] and let us look at We will have four cross-terms here, it is enough to show that each of them is small in expectation. The argument will be similar in all cases, so we will only present the proof for one of them. Let us focus on For each particle i we are looking at the correlation of the velocity of its left neighbor at time u 1 with the velocity of its left neighbor at time u 2 . By averaging over particles i = 1, . . . , N and using the symmetry between u 1 and u 2 we can write the contribution to the second moment of Y t,0 Since the rates v are bounded, it is enough to show that for each fixed u 1 ∈ [0, t] the expression inside the bracket is close to 0 as N → ∞. Let us look at Since the average here depends only on the configuration at time u 1 and its evolution from that point on (and not otherwise on the trajectory of the process before time u 1 ), by stationarity of the biased interchange process it will be the same as since the dynamics of the process is time-homogeneous on [0, t 1 ). Thus we have to prove that for a random particle the velocity of its initial left neighbor is uncorrelated (when averaged over time) with the velocity of its current left neighbor. Let us introduce the following setup: we can rewrite the average above in terms of a sum over sites (for y = x i (η s )) instead of over particles To analyze this average we introduce the following extension of the biased interchange process. Consider the extended configuration space E consisting of sequences ((x i , φ i , L i )) N i=1 , with L i ∈ {1, . . . , N}.
Here each particle, in addition to its color φ i , also has an additional color L i in which we keep information about the velocity of its left neighbor at time 0, that is The dynamics is given by the same generator (28) as before, i.e., labels (together with their corresponding colors φ i and L i ) are exchanged by swaps of adjacent particles, each φ i has its own evolution and L i does not evolve. For a site x let L x (η) be the additional color at site x in configuration η, i.e., L x (η) = L η −1 (x) . We can now treat η as a function which assigns to each site x a triple (η −1 (x), φ x , L x ) or simply a pair (φ x , L x ), since we are not interested in particles' labels at this point, only in the distribution of colors. In this setup the average (41) can be written as where f y (η) = L y (η)v y−1 (η). Let Λ x,l = {x − l, x − l + 1, . . . , x + l}, denote a box of size l around x (with the convention that the box is truncated if the endpoints x − l or x + l exceed 1 or N, but this will not influence the argument in any substantial way) and let µ η x,l be the empirical distribution of colors in Λ x,l in configuration η, given for any (L, φ) by Consider the associated i.i.d. distribution on configurations restricted to Λ x,l , given by In other words, under the measure µ η x,l the probability of seeing a color pair (L, φ) at site y ∈ Λ x,l is proportional to the number of sites in Λ x,l with the color pair (L, φ), independently for each site.
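The local objects introduced above are easy to compute explicitly. The following sketch, with hypothetical helper names and a made-up configuration, builds the truncated box Λ_{x,l}, the empirical distribution of color pairs in it, and the expectation of L_y v(φ_{y−1}) under the associated product measure. It also checks that this expectation factorizes into a product of two marginal means, which is the property the argument exploits.

```python
from collections import Counter

def box(x, l, N):
    """Sites of the box Lambda_{x,l}, truncated to {1, ..., N}."""
    return range(max(1, x - l), min(N, x + l) + 1)

def empirical_distribution(config, x, l):
    """config[y-1] = (L_y, phi_y). Returns {(L, phi): frequency inside the box}."""
    sites = box(x, l, len(config))
    counts = Counter(config[y - 1] for y in sites)
    return {pair: c / len(sites) for pair, c in counts.items()}

def product_expectation(mu, v):
    """E[L_y * v(phi_{y-1})] when sites y and y-1 carry independent samples from mu."""
    return sum(p * q * L * v(phi)
               for (L, _), p in mu.items()
               for (_, phi), q in mu.items())

def factorized_expectation(mu, v):
    """Product of the marginal means of L and of v(phi) under mu."""
    mean_L = sum(p * L for (L, _), p in mu.items())
    mean_v = sum(p * v(phi) for (_, phi), p in mu.items())
    return mean_L * mean_v

if __name__ == "__main__":
    # Hypothetical configuration on N = 5 sites, each entry a pair (L, phi).
    config = [(1, 2), (3, 1), (2, 2), (1, 3), (3, 3)]
    v = lambda phi: phi - 2.0
    mu = empirical_distribution(config, 1, 1)   # truncated box {1, 2}
    print(product_expectation(mu, v), factorized_expectation(mu, v))
```

The double sum and the product of means agree by independence; the second form makes it visible that the whole expression is small as soon as the local average of the velocities is small.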
The superexponential one block estimate says that on an event of high probability we can replace f y (η s ) in the time average (42) by its average E µ ηs y,l (f ) with respect to the local i.i.d. distribution over a sufficiently large box. In other words, due to local mixing the distribution of colors in a microscopic box can be approximated by an i.i.d. distribution for large l.
The lemma is proved in the next section. Let us see how it enables us to finish the proof of Proposition 5.3. By the one block estimate, in (42) we can replace with the difference going to 0 in expectation as first N → ∞ and then l → ∞, so we only need to show that the latter expression goes to 0 in the same limit.
Observe that in f y (η) = L y (η)v y−1 (η) = L y (η)v(y − 1, φ y−1 (η)) the colors φ y−1 and L y depend on different sites, so they are independent under µ ηs y,l , since the measure is product. Thus in the average above we can simply write E µ ηs y,l f y (η s ) = E µ ηs y,l [L y (η)v y−1 (η)] = E σ∼µ ηs y,l L y (σ) E σ∼µ ηs y,l v y−1 (σ) , where by a slight abuse of notation we have denoted by σ the local configuration of colors in a box Λ y,l and considered L y , v y−1 as functions of σ. The average (43) now becomes Since the distribution of η s in the biased interchange process without the additional colors L i is stationary, the distribution of the average E σ∼µ ηs y,l v y−1 (σ) does not depend on s. So we only need to show that E σ∼µ η 0 y,l v y−1 (σ) is small, since L y is bounded. Recall that in stationarity φ y has uniform distribution, so for any y the expectation of v y−1 (σ) = v(0, y − 1, φ y−1 (σ)) with respect to µ η 0 y,l is simply equal to where φ j are independent and uniformly distributed on {1, . . . , N}. As for each x the random variables v(0, x, φ j ) are independent, bounded and have mean 0 (see Proposition 4.1), an easy application of Hoeffding's inequality gives that for fixed y the sum above goes to 0 in probability as l → ∞. This finishes the proof of Proposition 5.3.
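The Hoeffding step can be spelled out as follows. This is only a sketch, assuming that the sum in question is an average of n = 2l + 1 terms, one per site of the box; the symbols n, ξ_j and δ are introduced here for illustration and do not appear in the original.

```latex
% Assumption: \xi_j = v(0,x,\phi_j), j = 1,\dots,n, with n = 2l+1 (number of sites in the box).
% The \xi_j are independent, have mean zero, and satisfy |\xi_j| \le \|v\|_\infty.
% Hoeffding's inequality for variables with range of length 2\|v\|_\infty gives, for any \delta > 0,
\[
  \mathbb{P}\Bigl( \Bigl| \frac{1}{n} \sum_{j=1}^{n} \xi_j \Bigr| > \delta \Bigr)
  \;\le\; 2 \exp\Bigl( - \frac{n\,\delta^{2}}{2\,\|v\|_\infty^{2}} \Bigr)
  \;\xrightarrow[\,l \to \infty\,]{}\; 0 .
\]
```

The bound is uniform in N and in y, which is what allows the limit l → ∞ to be taken after N → ∞.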
We can now prove the law of large numbers.
Proof of Theorem 5.1. Consider the random particle processP N = (X N ,Φ N ), obtained by first sampling η = η N and then following the trajectory P i (η t ) = (X i (η t ), Φ i (η t )) of a randomly chosen particle i. We will first show that the (deterministic) distributionν N converges to ν, the distribution of P (in the metric d sup W ). Let us start by proving that the estimate from Proposition 5.3 holds not only at each time t, but also with the supremum over all times t ≤ T under the sum over particles. Consider the process (A N , B N ) defined as where i is a random particle and η = η N comes from the biased interchange process. Proposition 5.3 implies that all finite-dimensional marginals of (A N , B N ) converge to 0. To obtain convergence to 0 for the whole process in the supremum norm we only need to check tightness in the Skorokhod topology (which will imply convergence in the supremum norm, since the limiting process is continuous). We will use the following stopping time criterion ([KL99, Proposition 4.1.6]). Let Y N be a family of stochastic processes with sample paths in D such that for each time t ∈ [0, T ] the marginal distribution of Y N t is tight. If for every ε > 0 we have lim where the supremum is over all stopping times τ bounded by T , then the family Y N is tight.
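For reference, the stopping time criterion is usually stated in the following form. This is the standard formulation, written here as a sketch; the symbol θ_0 for the size of the time window is our own choice, and the notation of the display referred to as (44) may differ.

```latex
% Aldous-type tightness criterion: for every \varepsilon > 0,
\[
  \lim_{\theta_0 \to 0} \; \limsup_{N \to \infty} \;
  \sup_{\tau,\; \theta \le \theta_0}
  \mathbb{P}\bigl( \bigl\| Y^{N}_{\tau+\theta} - Y^{N}_{\tau} \bigr\| > \varepsilon \bigr) = 0 ,
\]
% where the supremum runs over all stopping times \tau bounded by T
% and over all deterministic increments 0 \le \theta \le \theta_0.
```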
Here ‖ · ‖ denotes the Euclidean distance on [0, 1] 2 and for simplicity we write τ + θ instead of (τ + θ) ∧ T . Let τ be any stopping time bounded by T . We have from formula (37) Since v(·, ·, ·) is bounded, the integral is bounded by Cθ for some constant C > 0, regardless of τ , so it goes to 0 as θ → 0 (deterministically and for every i). Thus it only remains to bound the martingale term. As τ is a stopping time, by formula (39) we have for each i As in the calculation of E(M i t ) 2 following (39) we have that for fixed θ the right hand side is o(1) as N → ∞. Since M i t is bounded, we obtain E M i τ +θ − M i τ → 0 as N → ∞, for any θ and i (independently of τ ). The calculation for B N is analogous.
This shows that the family (A N , B N ) satisfies the tightness criterion (44). In particular it converges to 0 in the supremum norm as N → ∞. Thus for any ε > 0 we have as N → ∞. Now we can prove thatν N converges to ν. Recalling the definition of the Wasserstein distance d sup W , it is enough to construct for each N a coupling (P N , P ) such that Let us couple these two processes in the following way: first we letP N = X N t ,Φ N t , 0 ≤ t ≤ T be a path sampled according toν η N , starting at (X N 0 ,Φ N 0 ) (whose distribution is uniform on 1 N , . . . , 1 × 1 N , . . . , 1 ). We then take P (t) = (X(t), Φ(t)) to be the solution of the ODE (29) started from an initial condition (X(0), Φ(0)) chosen uniformly at random from (so the two processes start close to each other). Because the initial condition is distributed uniformly on [0, 1] 2 , the path P = (P (t), 0 ≤ t ≤ T ) will be distributed according to ν.
Thus (X N ,Φ N ) approximately satisfies the same ODE as (X, Φ) and an application of Grönwall's inequality gives that for any ε > 0 with probability approaching 1 as N → ∞ we have for some C > 0, where K > 0 depends only on the Lipschitz constants of V and R. By definition of the processesP N and P the initial conditionsX N (0), X(0) andΦ N (0), Φ(0) differ by at most 1 N , which implies that E P N − P sup → 0 as N → ∞. Thus the distributionν η N of the random particle processP N converges to ν in the d sup W metric as desired.
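The form of Grönwall's inequality used in such arguments is the integral one. The statement below is generic; the symbols e(t) and a are illustrative and are not taken from the original.

```latex
% Generic integral form of Gronwall's inequality.
% Assumption: e is a bounded nonnegative measurable function on [0,T], a \ge 0, K \ge 0.
\[
  e(t) \;\le\; a + K \int_0^t e(s)\,ds \quad \text{for all } t \in [0,T]
  \qquad \Longrightarrow \qquad
  \sup_{0 \le t \le T} e(t) \;\le\; a\, e^{K T} .
\]
% In the present setting e(t) would be the distance between the particle path and the
% ODE solution at time t, and a would collect the initial discrepancy (at most of order 1/N)
% together with the uniformly small error terms controlled before.
```

Since a can be made arbitrarily small as N → ∞ and K depends only on the Lipschitz constants, the supremum distance between the two paths is small with high probability.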
Now we can show that the random measures ν η N converge in distribution to the deterministic measure ν. By the characterization of tightness for random measures (see, e.g., [Kal21,Theorem 23.15]) the family ν η N will be tight, as a family of M( D)-valued random variables, if for any ε > 0 there exists a compact set K ⊆ D such that lim sup or, more simply put, lim sup N →∞ P N P η N ∈ K ≥ 1 − ε. Exactly the same calculation as for the processes (A N , B N ) before shows the processes P η N satisfy the tightness criterion (44), which guarantees the existence of desired compact sets K and in turn tightness of ν η N . Now to finish the proof we only need to show uniqueness of subsequential limits for the family ν η N . Since any such (possibly random) limit must have the associated random particle process distributed according to ν, it is enough to show that the limit is deterministic.
Consider an outcome of ν η N , which is a measure from M( D), and sample independently two paths P N 1 , P N 2 from it. This corresponds to sampling η N according to the biased interchange process, then choosing uniformly at random a pair of particles i, j (possibly with i = j, but this event has vanishing probability) and following their trajectories in η N . By the already established convergence ν N → ν in M( D), each path P N 1 and P N 2 separately has distribution converging to ν. Moreover, due to stationarity of η N the initial colors φ i (η N 0 ), φ j (η N 0 ) of any two particles i, j are chosen uniformly at random, in particular they are independent for i ≠ j. Thus the joint distribution of (P N 1 , P N 2 ) converges to the distribution of two independent paths sampled from ν, as a path P sampled from ν is uniquely determined by its initial conditions. Since we already have tightness, applying Lemma 2.1 gives that any limit of a subsequence has to be deterministic, which finishes the proof.

One block estimate
In this section we prove the one block estimate of Lemma 5.4, needed for the proof of Theorem 5.1. Since another, simpler variant of this estimate will also be needed for the proof of the large deviation upper bound (Lemma 8.2), we prove the result in generality suited for both of these applications.
Let us fix a continuous function w : [0, 1] 2 → R and let . . , 1 . Consider the interchange process on an extended configuration space E ′ in which each particle in addition to its label i has two colors (a i , φ i ), with a i ∈ I N w , φ i ∈ I N . The dynamics is given by the usual generator L: adjacent particles swap at rate (1/2) N α and the colors a i , φ i of the particle i do not evolve in time. Since the one block estimate concerns only the distribution of colors, from now on we ignore the labels of the particles altogether. As before we use the notation a x = a x (η), φ x = φ x (η) to denote the colors of the particle at site x in configuration η. The configuration at time s is denoted by η s .
Consider a continuous function g : . As in the previous section let Λ x,l = {x − l, x − l + 1, . . . , x + l} denote the box of size l around x (with an appropriate truncation if the endpoints x − l or x + l exceed 1 or N, which we neglect in the notation from now on) and let µ η x,l be the empirical distribution of colors in Λ x,l in configuration η, given for any (α, ϕ) ∈ I N w × I N by Consider the associated i.i.d. distribution on configurations restricted to Λ x,l , given for (α y , ϕ y ) x+l y=x−l ∈ I N w × I N 2l+1 by Since h x depends on η only through the colors at x and x − 1, we will slightly abuse notation by writing E µ η x,l (h x ) for the expectation of h x with respect to µ η x,l . Let ψ : [0, 1] → R be a continuous function and let Let µ denote the uniform distribution on E ′ . Note that the dynamics given by L is reversible with respect to µ and the associated Dirichlet form is given by Lemma 6.1. With µ denoting the uniform distribution on E ′ , we have for any C 0 > 0 lim sup where γ = 3 − α and the supremum is over all densities f with respect to µ such that Proof. Let us decompose a x = a x (η) and b x = b x (η) into their positive and negative parts, x−1 , by the triangle inequality it is enough to prove the lemma with h x replaced by one of the terms in the sum above, say, a + x b + x−1 . Let K = max{1, w ∞ } and let us write where the inequality comes from pulling the integrals over λ and θ outside the absolute value. Let us denote the expression under the integrals on the right hand side by U N l,λ,θ . Since it is nonnegative and bounded, we can write where the supremum is over all densities f satisfying D N (f ) ≤ C 0 N γ . By the same token, when taking the lim sup first over N and then over l, we can bound the resulting limit from above by one with the integral over λ and θ outside the lim sup. 
Thus we see that it is enough to prove for fixed λ, θ ∈ [0, K] we have reduced the problem to proving the one block estimate for the interchange process in which each particle has only four possible colors, corresponding to the possible values of the pair of indicators (1 {a i (η)>λ} , 1 {b i (η)>θ} ). This in turn follows by essentially the same argument as for the simple exclusion process, which can be thought of as the interchange process with just two colors (see, e.g., [KL99, Lemma 5.3.1]). Since the argument is by now standard and used in several places in the literature (see, e.g., [FT04] for the case of three possible colors), let us only explain why the bound on the Dirichlet form under the supremum is of the right order. The argument for the simple exclusion process goes through (see the remark following the proof of [KL99, Lemma 5.4.2]) if we assume that the Dirichlet form corresponding to the generator without time scaling is o(N) and the process is sped up by N 2 . In our case the generator L has a scaling factor of N α , so if N −α D N (f ) is the Dirichlet form corresponding to the process without time scaling, then our bound on this Dirichlet form is ≤ C 0 N γ−α = C 0 N 3−2α . Since α ∈ (1, 2), this is o(N), which agrees with the assumptions for the simple exclusion process.
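The exponent bookkeeping in the last step is short enough to write out. The display below only restates the arithmetic from the text, with γ = 3 − α; the numerical value of α at the end is just an example.

```latex
\[
  N^{-\alpha} D^{N}(f) \;\le\; C_0\, N^{\gamma-\alpha} \;=\; C_0\, N^{3-2\alpha},
  \qquad
  \alpha \in (1,2) \;\Longrightarrow\; 3 - 2\alpha \in (-1,1),
\]
\[
  \text{so}\quad \frac{C_0\, N^{3-2\alpha}}{N} \;=\; C_0\, N^{2-2\alpha} \;\longrightarrow\; 0
  \quad \text{as } N \to \infty .
\]
% Example: for \alpha = 3/2 one has \gamma = 3/2 and the unscaled Dirichlet form
% is bounded by C_0 N^{0} = C_0, which is trivially o(N).
```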
Lemma 6.2. Let P N denote the law of the interchange process on E ′ with an arbitrary initial distribution. With the notation as above we have for any t ≥ 0 and δ > 0 lim sup Proof. Let µ 0 be an arbitrary initial distribution. Let P N 0,µ , resp. P N 0,µ 0 , denote the distribution of the process started from µ, resp. µ 0 , and let E µ , resp. E µ 0 , denote the corresponding expectation.
By Chebyshev's inequality we have for any c > 0 We also have Since M ≤ N 2 and under µ each initial configuration has probability (MN) N = e o(N γ ) , the supremum norm of the Radon-Nikodym derivative above is e o(N γ ) as well, so to prove (47) it is in fact enough to show that for any c > 0 lim sup and then take c → ∞. An application of the Feynman-Kac formula to the semigroup generated by L shows (see, e.g., [KL99, Theorem 10.3.1 and Section A1.7]) that to obtain (48) it is sufficient to prove for any c > 0 lim sup where the supremum is taken over all densities with respect to µ. Since U N l is bounded by a constant C > 0 depending only on ψ and g, the expression under the supremum becomes negative if D N (f ) > cCN γ . Thus it is enough to show that for any constant C 0 > 0 we have which is exactly the statement of Lemma 6.1.
This estimate will be enough for application in the proof of Lemma 8.2. As for the proof of Lemma 5.4, we will first show that the one block estimate holds for the unbiased process with color evolution, but with all rates equal to 1, i.e., the process with state space E ′ and the generator Here as usual η x,± denotes the configuration obtained from η by changing the color φ x of the particle at site x to φ x ± 1 (note that the colors a i do not evolve in time here). We will then transfer the result to the biased process by estimating its Radon-Nikodym derivative.
Lemma 6.3. Let P N 0 be the law of the unbiased process with rates 1 described above (with an arbitrary initial distribution). With the notation from Lemma 6.2, we have for any t ≥ 0 and δ > 0 Proof. Let us write L 0 = L + L c , where L is the first term in the definition of L 0 and L c is the second term. The dynamics induced by L and by L c is reversible with respect to µ, so the Dirichlet forms associated respectively to L c and L 0 can be written as By repeating the argument from the proof of Lemma 6.2 with the generator L 0 instead of L we obtain that it is enough to prove that for any c > 0 lim sup where the supremum is taken over all densities with respect to µ. Now observe that since D N c (f ) ≥ 0 for any nonnegative f , it is in fact enough to prove the statement above with D N 0 (f ) replaced by D N (f ). Thus we have eliminated color evolution and the conclusion follows as in the proof of Lemma 6.2.
We can now prove the superexponential estimate for the biased process.
Proof of Lemma 5.4. Recall that f x (η) = L x (η)v x−1 (η). Since we can uniformly approximate v(0, x, φ) by finite sums of terms which are product in x and φ, by using the triangle inequality we can without loss of generality assume that v x (η) = ψ(x)g(φ x ) for some continuous functions ψ : [0, 1] → R, g : [0, 1] → [−1, 1]. Applying Lemma 6.3 with w(x, φ) = v(0, x, φ), a i = L i and h x = L x g(φ x−1 ) provides us with the superexponential estimate for the process P N 0 . To transfer the estimate to the biased process P N we will need to estimate the Radon-Nikodym derivative of the two processes.
If P is a Markov process with jump rates λ(x)p(x, y) and P̃ is another process on the same state space with rates λ̃(x)p̃(x, y), the Radon-Nikodym derivative of P̃ with respect to P up to time t is given by (see, e.g., [KL99, Proposition A1.2.6]) where the sum is over jump times s ≤ t.
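For orientation, the standard form of this change-of-measure formula for jump processes reads as follows. It is written here as a sketch, with tildes marking the rates of the second process and X_s denoting the state at time s (both notational choices are ours); the display referred to as (49) may be arranged differently.

```latex
% Radon-Nikodym derivative of the process with rates \tilde\lambda(x)\tilde p(x,y)
% with respect to the process with rates \lambda(x) p(x,y), restricted to [0,t].
\[
  \frac{d\tilde P}{dP}\bigg|_{[0,t]}
  \;=\; \exp\Biggl\{ - \int_0^t \bigl[ \tilde\lambda(X_s) - \lambda(X_s) \bigr]\,ds
  \;+\; \sum_{s \le t} \log
  \frac{\tilde\lambda(X_{s-})\,\tilde p(X_{s-}, X_s)}{\lambda(X_{s-})\,p(X_{s-}, X_s)} \Biggr\} .
\]
% The integral compares total jump intensities; the sum runs over jump times s \le t
% and compares the rates of the jumps that actually occurred.
```

When the total intensities of the two processes coincide, as in the computation that follows, the integral term vanishes and only the sum over jumps remains.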
Let us look at d P N dP N 0 . By the form (28) of the generator of P N the sum of outgoing rates for any η is equal to Since the sum of ε(v x − v x+1 ) telescopes, the rates v x are 0 at the boundaries x = 1, N and rates r x for the color change cancel out, the intensities λ and λ cancel out as well. The Radon-Nikodym derivative takes the form where j s is the label of the particle which makes a swap at time s and j s ± is the label of the particle that changes its color by ±1 at time s ± .
To simplify this formula we will use the fact that empirical currents across edges can be approximated by their averages, modulo a small martingale. More precisely, let us denote for simplicity We will sometimes use this notation with x = N, in which case we assume (∇ x v)(η) = 0. For brevity of notation whenever sums involving both r x and −r x appear, we will write them as one term with a ± sign, that is, with x (1 ± εr x ) serving as a shorthand for (1 − εr x ) and so on.
We introduce the following extension of the dynamics under P N 0 : for any functions h(x, η), h ± (x, η), x ∈ {1, . . . , N} consider the extended state space E ′ , consisting of pairs (η, J), J ∈ R, and the generator L ′ acting by In other words, in the evolution of the extended configuration (η t , J t ) each time the process makes a jump, J t is increased by h(x, η t ), h + (x, η t ) or h − (x, η t ), depending on the type of the jump (swap or color change). Now if we take we see that J t is simply equal to the sum over jumps appearing in the exponent in (50). Thus to bound the Radon-Nikodym derivative we only need to bound J t . This is done by use of an exponential martingale: for any λ > 0 the following process is a local martingale with respect to P N 0 . We will actually only need to consider λ = 2. Writing out the action of L ′ on the function g(η, J) = e 2J we obtain e 2 log(1+ε(∇xv)(ηs)) − 1 + e 2 log(1±εrx(ηs)) − 1 ds .

Now we have
The sum of terms linear in ε vanishes -the rates r for ±1 color change have opposite sign and the sum involving ∇ x v telescopes. Recalling that ε = N 1−α and γ = 3 − α, so N α+1 ε 2 = N γ , we can then write Since the rates v and r are bounded, we have Z t = e 2Jt−N γ Xt , where |X t | ≤ C for some constant C > 0 depending only on v, r and T . In particular we get Since Z t is a local martingale bounded from below, it is a supermartingale, so we have EZ t ≤ EZ 0 = 1 and thus Now we can transfer the superexponential bound of Lemma 6.3 from P N 0 to P N . Let O N,l be the event from the statement of the lemma and let us write simply d P N Denoting by E the expectation with respect to P N we have Applying the Cauchy-Schwarz inequality gives Recalling that d P N dP N 0 = e J T and applying the bound (51) we obtain and taking lim sup as l → ∞ together with an application of Lemma 6.3 finishes the proof.
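The chain of inequalities in this transfer argument can be summarized as below. This is a sketch assembled from the steps described in the text; E_0 denotes expectation under P^N_0, O stands for the event O_{N,l}, and the constants are not claimed to match those of the bound referred to as (51).

```latex
% Exponent check: \varepsilon = N^{1-\alpha}, \gamma = 3-\alpha.
\[
  N^{\alpha+1}\varepsilon^{2} = N^{\alpha+1} N^{2-2\alpha} = N^{3-\alpha} = N^{\gamma}.
\]
% Second moment of the density: Z_T = e^{2J_T - N^\gamma X_T}, |X_T| \le C, E_0 Z_T \le 1.
\[
  \mathbb{E}_0\, e^{2 J_T}
  = \mathbb{E}_0\bigl[ Z_T\, e^{N^{\gamma} X_T} \bigr]
  \le e^{C N^{\gamma}}\, \mathbb{E}_0 Z_T
  \le e^{C N^{\gamma}} .
\]
% Cauchy-Schwarz transfer from P^N_0 to the biased law \tilde P^N, using d\tilde P^N / dP^N_0 = e^{J_T}.
\[
  \tilde P^{N}(O)
  = \mathbb{E}_0\bigl[ e^{J_T} \mathbf{1}_{O} \bigr]
  \le \bigl( \mathbb{E}_0\, e^{2J_T} \bigr)^{1/2}\, P^{N}_0(O)^{1/2}
  \le e^{C N^{\gamma}/2}\, P^{N}_0(O)^{1/2},
\]
\[
  \frac{1}{N^{\gamma}} \log \tilde P^{N}(O)
  \;\le\; \frac{C}{2} + \frac{1}{2 N^{\gamma}} \log P^{N}_0(O).
\]
```

Because the right-hand side tends to −∞ by Lemma 6.3, the additive constant C/2 is harmless: a superexponential bound survives a change of measure whose density has second moment of order e^{O(N^γ)}.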

Large deviation lower bound
In this section we prove the large deviation lower bound of Theorem A. Let us assume that the permuton process X satisfies equations (29). Since we already know how to construct a biased interchange process that will typically display the behavior of X, to bound the probability that the trajectory of a random particle in the interchange process is close in distribution to X we only need to compare the unbiased process with the biased one by means of calculating their Radon-Nikodym derivative. Since these two processes have different configuration spaces, for convenience we introduce the unbiased interchange process with colors, which has the same configuration space as the biased process associated to (29) and the generator L u obtained by putting all velocities v to 0 Since here the colors do not influence the dynamics of swaps, the corresponding permutation process X η N will be the same as for the ordinary unbiased interchange process (and we will never be interested in the distribution of Φ η N for the unbiased process with colors). Let us start by deriving the formula for the Radon-Nikodym derivative of the unbiased process with colors with respect to the biased one. Recall that v x (s, η s ) = v(s, x, φ x (η s )) denotes the velocity at time s of the particle at site x. Let P N u denote the law of the unbiased process with colors. We will prove the following statement Lemma 7.1. We have where the o(1) term goes to 0 in probability as N → ∞.
Proof. The calculation is similar to that in the proof of Lemma 5.4, with the difference that we are using the generator L instead of L 0 . By the analog of formula (49) for time-inhomogeneous processes we have where the sum is over jump times s ≤ t.
Denoting the sum in the exponent by J t , we obtain by (34) (by considering as before the generator L acting on an extended configuration space) that where M t is a local martingale with respect to P N . Expanding all terms up to order ε 2 allows us to write As before the term linear in ε vanishes. Recalling that ε = N 1−α and γ = 3 − α we have The martingale term will be typically o(N γ ). To see this, we use formula (35): by performing a calculation similar to the one above we get that is a local martingale with respect to P N . By expanding the log terms up to ε 2 we see that the second term above is bounded by CN α+1 ε 2 = CN γ for some C > 0. In particular N t is bounded from below, so it is a supermartingale. Thus EN T ≤ EN 0 = 0 and EM 2 T ≤ CN γ , so Chebyshev's inequality implies that M T = o(N γ ) with high probability.
The second sum in the exponent in (53) will be small by invariance of the uniform distribution of colors in the biased process. More precisely, at fixed time s for each x the correlation term v x (s, η s )v x+1 (s, η s ) has mean 0, since η s has stationary distribution and by Proposition 4.1 in stationarity velocities at different sites are independent with mean 0. Moreover, for the same reason these terms are uncorrelated for different x, so by the weak law of large numbers we get that for any s ≤ T and δ > 0 Since this holds for any fixed s and the random variables are bounded, we also have as N → ∞, which proves that the correlation term is o(N γ ) with high probability. Together with the bound on M t this proves the desired formula for the Radon-Nikodym derivative.
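The weak law of large numbers invoked here reduces to a variance computation. The sketch below uses the shorthand ξ_x for the correlation term at site x, which is our notation, and relies only on the properties stated in the text (mean zero, pairwise uncorrelated, bounded).

```latex
% Assumption: \xi_x = v_x(s,\eta_s)\, v_{x+1}(s,\eta_s), with E\xi_x = 0,
% E[\xi_x \xi_y] = 0 for x \ne y, and |\xi_x| \le \|v\|_\infty^{2}.
\[
  \operatorname{Var}\Bigl( \frac{1}{N} \sum_{x=1}^{N-1} \xi_x \Bigr)
  = \frac{1}{N^{2}} \sum_{x=1}^{N-1} \mathbb{E}\,\xi_x^{2}
  \;\le\; \frac{\|v\|_\infty^{4}}{N},
\]
\[
  \mathbb{P}\Bigl( \Bigl| \frac{1}{N} \sum_{x=1}^{N-1} \xi_x \Bigr| > \delta \Bigr)
  \;\le\; \frac{\|v\|_\infty^{4}}{N\,\delta^{2}}
  \;\longrightarrow\; 0 \quad \text{as } N \to \infty .
\]
```

The bound does not depend on s, so it passes to the time integral over [0, T] by bounded convergence.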
We can now use Lemma 7.1 and the law of large numbers established in Theorem 5.1 to prove a large deviation lower bound for the interchange process. As the formula from the lemma suggests, the large deviation rate function will be related to the energy of the process to which the biased interchange process converges.
Recall from (10) that for any process π ∈ P its energy was defined by where E(γ) is the Dirichlet energy of the path γ defined by (7). We have the following large deviation lower bound Theorem 7.2. Let P N be the law of the unbiased interchange process η N and let µ η N be the (random) distribution of the corresponding permutation process X η N . Let P = (X, Φ) be the colored trajectory process associated to the equation ( Proof. It will be enough to show the bound above for O being any open ball B(µ, ε) in P around µ. Let P N u be the distribution of the unbiased process with colors, ν η N the distribution of the colored permutation process P η N = (X η N , Φ η N ) associated to η N . Let ν denote the distribution of P = (X, Φ) and B(ν, ε) an open ball around ν in M( D). Since the projection (X, Φ) → X is continuous as a map from D to D, the corresponding projection from M( D) to M(D) is also continuous. As µ η N has the same law under P N and P N u (remember that in the latter process the colors do not influence the dynamics of swaps), we have that for any ε > 0 there exists ε ′ > 0 such that P N µ η N ∈ B(µ, ε) ≥ P N u ν η N ∈ B(ν, ε ′ ) . Thus to prove the large deviation bound it is sufficient to prove the local lower bound Recall that P N denotes the distribution of the biased process associated to (29) and consider the Radon-Nikodym derivative dP N u d P N (t). By Lemma 7.1 we have where Y N goes to 0 in probability as N → ∞. Now by the law of large numbers from Theorem 5.1 and Remark 5.2 the distributions ν η N converge in probability in the d sup W metric to ν when η N is sampled according to P N . Thus for any ε > 0 and an open ball Since convergence in d sup W implies convergence in d W , to prove (54) it is enough to analyze the probability P N u ν η N ∈ B ε .
Fix arbitrary δ > 0 and let With E denoting the expectation with respect to P N u and E with respect to P N we have for any ε > 0 and sufficiently large N We have lim N →∞ P N (V N,ε ) = 1 and on the event U N we have

This implies
where Now it is not difficult to see that the infimum on the right hand side of (56) converges to I(µ) as N → ∞ and then ε → 0. When (X, Φ) is sampled from ν, X is the solution of (29) with a uniformly random initial condition, so the energy I(µ) is simply equal to where the expectation is with respect to the choice of (X(0), Φ(0)). Recall the notation In light of (33) what we need to show is that as N → ∞ and then ε → 0. Consider the trajectory η N and for any particle i let (X i (t), Φ i (t)) denote the solution of (29) corresponding to the initial condition (X i (η N 0 ), Φ i (η N 0 )). Since the velocities V are bounded, we can write for some C, K > 0 depending on the bound on V and the Lipschitz constant of V . Now note that if ν η N ∈ B ε , then by considering the same coupling as in the proof of Theorem 5.1 we have for some ε ′ > 0 satisfying ε ′ → 0 as ε → 0. Since {ν η N ∈ B ε } ⊆ V N,ε , combining this with (58) we obtain that the left hand side of (57) converges to as N → ∞ and then ε → 0.
Since (X i (t), Φ i (t)) is a solution of (29) and V is the derivative of X, the integral is equal simply to the energy of the path X i (t). Since for each i the initial condition Φ i (η N 0 ) has uniform distribution on 1 N , . . . , 1 , independently for all i, it follows easily that this expression converges with high probability to the expected energy on the right hand side of (57). This implies inf η∈V N,ε I N (η) → I(ν) as N → ∞ and then ε → 0. Since in (56) we can take δ to be arbitrarily small, this proves (54) and finishes the proof of the lower bound.
With this theorem the large deviation lower bound for generalized solutions to Euler equations, announced as Theorem A in the introduction, is now an easy corollary.
Theorem 7.3. Let P N be the law of the interchange process η N and let µ η N be the (random) distribution of the corresponding permutation process X η N . Let π be a permuton process which is a generalized solution to Euler equations (19). Provided π satisfies Assumptions (3.1), for any open set O ⊆ P such that π ∈ O we have Proof. Let π β,δ be the distribution of the process X β,δ defined in Section 3. By the first part of Proposition 3.7 we have d sup W (π, π β,δ ) → 0 as first δ and then β → 0, in particular for small enough δ and β we have π β,δ ∈ O. Then Theorem 7.2 implies that Since by the second part of Proposition 3.7 we have lim β→0 lim δ→0 I(π β,δ ) = I(π), the lower bound is proved.

Large deviation upper bound
In this section we prove Theorem B, a large deviation upper bound for the distribution of the interchange process (we will drop the term "unbiased" from now on). As a first step we will bound the probability that after a (possibly short) time t > 0 we see a fixed permutation in the interchange process. This is summarized in the following Proposition 8.1. Let P N be the law of the interchange process, with η = η N denoting the trajectory of the process. Let σ N ∈ S N be a sequence of permutations. For any t > 0 we have where I(σ) is the energy of the permutation σ defined in (8).
In other words, the large deviation rate of seeing a permutation σ at time t in the interchange process is asymptotically bounded from above by 1 t times the energy of the permutation σ.
To prove Proposition 8.1 we will employ exponential martingales. The idea is as follows: if M S (η) is a function of the process (depending on some set of parameters S) which is a positive mean one martingale, then for any permutation σ ∈ S N we can write where the supremum is over all deterministic permutation-valued paths χ = (χ s , 0 ≤ s ≤ T ) satisfying χ −1 0 χ t = σ and the last inequality comes from the fact that M S (χ) is a positive mean one martingale. If M S depends only on the increment χ −1 0 χ t , we obtain a particularly simple expression P N (η −1 0 η t = σ) ≤ M S (σ) −1 . We can then optimize over the set of parameters S to obtain a large deviation upper bound. The family of martingales we will use is similar to the one used in analyzing large deviations for a simple random walk.
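The martingale bound just described amounts to the following chain of inequalities. It is written as a sketch, with A denoting the event that the increment equals σ (our shorthand) and M_S evaluated at time t.

```latex
% A = \{ \eta_0^{-1}\eta_t = \sigma \}; M_S > 0 and E[M_S(\eta)] = 1.
\[
  P^{N}(A)
  = \mathbb{E}\bigl[ M_S(\eta)\, M_S(\eta)^{-1} \mathbf{1}_{A} \bigr]
  \;\le\; \sup_{\chi \,:\, \chi_0^{-1}\chi_t = \sigma} M_S(\chi)^{-1}\;
          \mathbb{E}\bigl[ M_S(\eta)\, \mathbf{1}_{A} \bigr]
  \;\le\; \sup_{\chi \,:\, \chi_0^{-1}\chi_t = \sigma} M_S(\chi)^{-1} .
\]
% The first inequality bounds M_S^{-1} on A by its supremum over paths in A;
% the second uses E[M_S 1_A] \le E[M_S] = 1.
```

This is the same tilting idea as in Cramér-type upper bounds for random walks: the sharper the lower bound on M_S along paths ending at σ, the smaller the probability.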
Fix t > 0 and a sequence S = (s 1 , . . . , s N ), with We will think of s i as "velocity" assigned to the particle i. Consider the function where x i (η t ) is the position of the particle i in the configuration η t . If L is the generator of the interchange process, given by (12), then by the formula (36) for exponential martingales we obtain that is a mean one positive martingale with respect to P N . For simplicity we will use the same notation s x (η) = s η −1 (x) as for velocities v x of particles in the previous sections (with the convention that i denotes labels of particles and x denotes the positions), although bear in mind that now s x are just parameters, not related in any way to the biased interchange process considered in the preceding sections. We have Expanding up to order ε 2 we get where the constants in the O(·) notation depend on t (which is fixed). Observe that the sum of s x − s x+1 telescopes, leaving only terms with s 1 and s N , which are O(N α ε) = o(N γ ).
Rescaling by appropriate powers of N and expressing the exponents in terms of the large deviation exponent γ we get Expanding (s x − s x+1 ) 2 we obtain (after adding and subtracting the boundary terms s 2 1 , s 2 N which are only o(1) after rescaling) twice the sum of s 2 x and the sum of mixed terms s x s x+1 . Since s 2 i does not depend on time, we can write As in the proof of the law of large numbers we want to use the one block estimate to get rid of the sum involving correlations between s x for adjacent x. This time the correlation term might not be small, since s i are arbitrary, but typically it will be nonnegative, so we can neglect it for the sake of the upper bound. More precisely, we have the following Lemma 8.2. Let P N be the law of the interchange process. Fix t > 0 and let t . Then, with notation as above, we have for any δ > 0 lim sup Proof. We employ Lemma 6.2 with w(x, φ) = 2x−1 t , a i = s i and b x (η) = a x (η), in particular h x (η) = s x (η)s x−1 (η). As in the lemma consider E µ ηs x,l (s x (η)s x+1 (η)), where µ ηs x,l is the empirical distribution of a i in a box Λ x,l . Let us write Since under µ ηs x,l the colors are i.i.d. random variables, we have E µ ηs x,l [s x (η)s x+1 (η)] = (E µ ηs x,l s x (η)) 2 ≥ 0, so the second term on the right hand side of (62) is nonnegative for every l. Lemma 6.2 guarantees that for any δ > 0 lim sup Since the left hand side of (62) does not depend on l, this finishes the proof.
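The nonnegativity used in this proof is simply the factorization of expectations under a product measure. In the sketch below m denotes the common one-site marginal of the local i.i.d. distribution, a symbol introduced only for this illustration.

```latex
% Under the local i.i.d. measure the colors at sites x and x+1 are independent
% with the same marginal m (the empirical distribution of the a_i in the box).
\[
  \mathbb{E}\bigl[ s_x s_{x+1} \bigr]
  = \mathbb{E}[s_x]\;\mathbb{E}[s_{x+1}]
  = \Bigl( \sum_{a} a\, m(a) \Bigr)^{2} \;\ge\; 0 .
\]
```

No assumption on the sign or size of the individual s_i is needed, which is why the correlation term can be discarded in an upper bound even though it need not be small.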
With this lemma the proof of Proposition 8.1 is rather straightforward.
Proof of Proposition 8.1. Lemma 8.2 implies that for any a > 0 there exist sets O N,a such that on O N,a we have Now we can use the strategy outlined earlier with the positive mean one martingale M S t (η). We write On O N,a we can use the bound (63) obtained above. Note also that on the event {η −1 which together with (60) leads us to where To optimize over the choice of S = (s 1 , . . . , s N ), observe that I S (σ N ) is quadratic in s i , so an easy calculation shows that the optimal choice is which is valid, since we assumed This gives the maximal value of I S (σ N ) equal to 1 2 which is exactly the energy I(σ N ) rescaled by t. Inserting this into (64) gives us Since lim sup n→∞ 1 n log(a n + b n ) = max{lim sup n→∞ 1 n log a n , lim sup The second lim sup is −∞ and by taking a → 0 we arrive at as desired.
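The optimization over S is a one-variable quadratic problem for each particle. The following schematic version assumes that the functional has the form shown, with d_i standing for the rescaled displacement of particle i under σ^N; both the symbol d_i and the normalization are assumptions made for illustration and may differ from the definitions in (8) and (64).

```latex
% Assumed schematic form: I_S(\sigma) = \frac{1}{N}\sum_i \bigl( s_i d_i - \tfrac{t}{2} s_i^{2} \bigr),
% where d_i is the (rescaled) displacement of particle i under \sigma.
\[
  \frac{\partial}{\partial s_i}\Bigl( s_i d_i - \frac{t}{2}\, s_i^{2} \Bigr) = d_i - t\, s_i = 0
  \quad \Longrightarrow \quad s_i^{*} = \frac{d_i}{t},
\]
\[
  \max_{S} I_S(\sigma)
  = \frac{1}{N} \sum_{i=1}^{N} \Bigl( \frac{d_i^{2}}{t} - \frac{d_i^{2}}{2t} \Bigr)
  = \frac{1}{t} \cdot \frac{1}{2N} \sum_{i=1}^{N} d_i^{2} .
\]
```

Under this assumed form the maximal value is one half of the mean squared displacement divided by t, in line with the statement that it equals the energy of the permutation rescaled by t.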
We can readily extend the bound from Proposition 8.1 to all finite-dimensional distributions of the interchange process as follows. Fix a finite set of times 0 ≤ t 0 < t 1 < . . . < t k ≤ T and for clarity of notation let us write η t 0 ,...,t k = (η −1 t 0 η t 1 , . . . , η −1 t k−1 η t k ) for the corresponding sequence of increments of η. Suppose we want to bound the probability P N ( η t 0 ,...,t k = (σ N 1 , . . . , σ N k )), where (σ N 1 , . . . , σ N k ) is a fixed sequence of permutations for each N, σ N j ∈ S N . Recall that the interchange process has independent increments, i.e., the permutations η t j −1 η t j+1 for any family of non-overlapping intervals [t j , t j+1 ) are independent.
Therefore we can write As the interchange process is stationary, we have P N (η −1 Recall that µ η N denotes the distribution of the random permutation process associated to η N (defined by (5)) and for a finite partition Π by I Π (µ η N ) we denote the approximation of energy of µ η N associated to Π (defined by (11)). From equation (65) we obtain the following corollary which will be useful later Corollary 8.3. For any C > 0 and any finite partition Proof. Consider the set A N C of all sequences of permutations (σ N 1 , . . . , σ N k ), σ N j ∈ S N , such that k j=1 1 t j −t j−1 I(σ N j ) ≥ C. By performing a union bound over all such sequences we get Now it is enough to observe that for fixed k we have log(N! k ) = o(N γ ) and apply (65).
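The counting step in the proof of Corollary 8.3 can be spelled out as follows. This is a sketch under the assumption γ > 1, which holds for γ = 3 − α with α ∈ (1, 2) as in the setting of relaxed sorting networks. The symbol p N below is shorthand introduced only for this illustration.

```latex
% Union bound over at most (N!)^k sequences of permutations, assuming gamma > 1.
% p_N denotes the largest probability of a single sequence in A^N_C
% (notation introduced here for illustration only).
\[
  P^N\bigl(\eta_{t_0,\dots,t_k} \in A^N_C\bigr)
  \;\le\; |A^N_C| \, p_N \;\le\; (N!)^k \, p_N ,
\]
and, since $N! \le N^N$,
\[
  \frac{\log\bigl((N!)^k\bigr)}{N^{\gamma}}
  \;\le\; \frac{k\,N \log N}{N^{\gamma}}
  \;=\; k\,\frac{\log N}{N^{\gamma-1}}
  \;\xrightarrow[N\to\infty]{}\; 0 .
\]
Consequently
\[
  \frac{1}{N^{\gamma}} \log P^N\bigl(\eta_{t_0,\dots,t_k} \in A^N_C\bigr)
  \;\le\; o(1) + \frac{1}{N^{\gamma}} \log p_N ,
\]
so the number of sequences does not affect the exponential rate.
```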
Now we can proceed to prove a general large deviation upper bound, announced as Theorem B in the introduction. Recall that P ⊆ M(D) denotes the space of all permuton and approximate permuton processes.
Theorem 8.4. Let P N be the law of the interchange process η N and let µ η N be the (random) distribution of the corresponding random permutation process X η N . For any closed set C ⊆ P we have lim sup where I(π) is the energy of the process π defined by (10).
Proof. It is standard (see, e.g., [Var16, Lemma 2.3]) that the large deviation upper bound for closed sets follows from a local upper bound for open balls and exponential tightness of the sequence µ η N . The exponential tightness part will be proved in Proposition 8.5 below, so here we focus on the first part, that is, we will prove that for any π ∈ P we have lim sup where B(π, ε) denotes the open ε-ball around π in the Wasserstein distance d W on P.
Fix a finite set of times 0 = t 0 < t 1 < . . . < t k = T . Since almost surely the interchange process does not make jumps at any of the prescribed times t 0 , t 1 , . . . , t k , by continuity of projections for any ε > 0 there exists ε ′ > 0 such that where d denotes the Wasserstein distance on M([0, 1] 2 ). Furthermore, note that the permutation process with distribution µ η N has independent increments, i.e., the permutations for any family of non-overlapping intervals [t j , t j+1 ) are independent. Thus we can write In this way we have reduced the problem to bounding the probability that the random measure µ η N t i ,t i+1 is close to a fixed permuton π t i ,t i+1 . Fix i and consider all permutations σ ∈ S N such that the empirical measure µ σ satisfies d(µ σ , π t i ,t i+1 ) < ε. As there are at most N! such permutations, by performing a union bound over this set we obtain where on the right-hand side we have the probability that the random measure µ η N t i ,t i+1 is equal to µ σ . This probability is simply equal to P N ((η N t i ) −1 η N t i+1 = σ) and by stationarity of the interchange process this is the same as P N ((η N 0 ) −1 η N t i+1 −t i = σ). By employing Proposition 8.1, with σ N ∈ S N being any permutation attaining the supremum above, and noticing that log Now observe that for any σ such that d(µ σ , π t i ,t i+1 ) < ε the energy I(σ) = I(µ σ ) has to be close to I(π t i ,t i+1 ), the energy of the permuton π t i ,t i+1 (recall definition (9)), since I is continuous in the weak topology on M([0, 1] 2 ). Thus upon taking ε → 0 we obtain Applying this estimate to the product in (68) and observing that in (67) without loss of generality we can assume ε ′ ≤ ε, we arrive at the following bound lim sup ε→0 lim sup N →∞ Since t 0 , t 1 , . . . , t k were arbitrary, by optimizing over all finite partitions Π = {0 = t 0 < t 1 < . . . < t k = T } we obtain lim sup ε→0 lim sup N →∞ Recalling the definitions (6), (10) and (11), to prove (66) it remains to show that we have I(π) = sup Π I Π (π), which is exactly the statement of Lemma 2.2.
Proposition 8.5. The family of measures µ η N is exponentially tight, that is, there exists a sequence of compact sets K m ⊆ P such that and K m,r = K w m,r ∩ K 0 m,r ∩ K T m,r , where ε k = 4^{−k} and δ k (m, r) will be appropriately chosen later. We will assume that for fixed m, r we have lim k→∞ δ k (m, r) = 0 and that for any k ≥ 1 both T /δ k and δ k (m, r)/δ k+1 (m, r) are integers (the latter assumption is for simplicity of notation only). Note that by the aforementioned compactness conditions each set K m,r has compact closure in D. Let K m = {µ ∈ M(D) : µ(K m,r ) ≥ 1 − 1/r for every r ≥ 1}.
We claim that K m has compact closure in M(D). Indeed, by Prokhorov's theorem it is enough to prove that K m is tight. If µ ∈ K m , then for any r ≥ 1 we have µ(K c m,r ) ≤ 1 r , so the sets K m,r form the family of compact sets needed for tightness of K m .
The sets K m (possibly after taking their closures) will form the family of compact sets needed for exponential tightness. Thus our goal is to bound P N (µ η N ∉ K m ). Let us write It is enough to show that for any m, r ≥ 1 and any N ≥ 1 we have where C > 0 is some global constant. For any given m and r, observe that µ η N (K c m,r ) ≥ 1/r means that in η N we have at least N/r particles with paths f ∉ K m,r . Since K m,r = K w m,r ∩ K 0 m,r ∩ K T m,r , clearly it is enough to estimate separately the probabilities that at least N/(3r) particles have paths respectively not in K w m,r , K 0 m,r or K T m,r . The argument for K 0 m,r and K T m,r is much simpler, so we skip it and concentrate only on the case of K w m,r . For simplicity we will write α(r) = 1/(3r). For fixed m and r we will call a path f bad if w ′′ δ k (m,r) (f ) > ε k (m, r) for some k ≥ 1. We will call f bad exactly at scale k if w ′′ δ k (m,r) (f ) > ε k (m, r), but w ′′ δ j (m,r) (f ) ≤ ε j (m, r) for all j ≥ k + 1. Recalling the definition of the set K w m,r , the event whose probability we would like to bound is A m,r N = {there exist ≥ α(r)N particles with bad paths}.
Consider now the events B m,r,k N = {there exist ≥ α(r)N/2^k particles whose paths are bad exactly at scale k}.
Note that if f is a bad path with jumps of fixed size 1/N, then there exists k ≥ 1 such that f is bad exactly at scale k (since all paths we are considering are càdlàg). Thus we have Thus it is enough to show that for any m, r, k ≥ 1 and any N ≥ 1 we have From now on we fix m, r, k and N. All paths we are considering are assumed to come from the interchange process η N , in particular they have jumps of fixed size 1/N. For the sake of brevity we will simply write δ k = δ k (m, r).
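The reduction from the event A m,r N to the events B m,r,k N rests on a geometric-series count. A sketch of this count, using only the definitions of the two events given above:

```latex
% If no event B^{m,r,k}_N occurs, fewer than alpha(r) N particles are bad.
Suppose that for every $k \ge 1$ fewer than $\alpha(r) N / 2^{k}$ particles
have paths that are bad exactly at scale $k$. Every bad path is bad exactly
at one scale, so
\[
  \#\{\text{particles with bad paths}\}
  \;<\; \sum_{k \ge 1} \frac{\alpha(r)\,N}{2^{k}}
  \;=\; \alpha(r)\,N .
\]
Hence
\[
  A^{m,r}_N \;\subseteq\; \bigcup_{k \ge 1} B^{m,r,k}_N ,
  \qquad
  P^N\bigl(A^{m,r}_N\bigr) \;\le\; \sum_{k \ge 1} P^N\bigl(B^{m,r,k}_N\bigr).
\]
```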
∑ ℓ=0 ∆ ℓ (f ) > ε k /4. From this we obtain ε 2 where the right-hand side estimate follows from the Cauchy-Schwarz inequality. Now let us suppose that the event B m,r,k N holds. Then there exist at least α(r)N/2^k paths f i for which the estimate (71) holds. Consider the partition Π = {0 = t 0 < t 1 < . . . < t n = T } where n = T /δ k+1 , t j = jδ k+1 for j = 0, . . . , n. Recalling that f i = (1/N) η N (i), the definition of ∆ ℓ (f ) and the definition (11) of the energy I Π (µ η N ) we obtain that on B m,r,k N we have Writing again δ k = δ k (m, r), we have thus obtained the bound Recalling ε k = 4^{−k} , α(r) = 1/(3r), we see that to prove (70) it is sufficient to take δ k (m, r) small enough so that (4^{−2k} /δ k (m, r)) · 1/(3r 2^{k+5}) ≥ 2mrk. By applying Corollary 8.3 we obtain that for N large enough. By taking δ k (m, r) even smaller if necessary we can make this estimate true for all values of N ≥ 1, which proves (70) and finishes the proof of exponential tightness.
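The condition on δ k (m, r) can be made explicit. The computation below reads the displayed requirement as the product of 4^{−2k}/δ k (m, r) and 1/(3r 2^{k+5}); this reading is an interpretation of the extracted formula and should be checked against the original display.

```latex
% Solving the sufficient condition for delta_k(m, r), under the reading above.
\[
  \frac{4^{-2k}}{\delta_k(m,r)} \cdot \frac{1}{3r\,2^{k+5}} \;\ge\; 2mrk
  \quad\Longleftrightarrow\quad
  \delta_k(m,r) \;\le\; \frac{2^{-4k}}{3r\,2^{k+5}\cdot 2mrk}
  \;=\; \frac{1}{3\,m\,r^{2}\,k\;2^{5k+6}} .
\]
The right-hand side tends to $0$ as $k \to \infty$ for fixed $m, r$, which is
compatible with the standing assumption $\lim_{k\to\infty}\delta_k(m,r) = 0$.
```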

Asymptotics of relaxed sorting networks
In this section we prove the limiting behavior of random relaxed sorting networks, given by Theorem 1.1, and the asymptotic counting formula of Theorem 1.2. With the large deviation bounds obtained in the preceding sections both of the proofs are now rather straightforward.
Proof of Theorem 1.1. Let R ⊆ P be the set of permuton processes X reaching exactly the reverse permuton at time 1, i.e., such that (X 0 , X 1 ) ∼ (X, 1 − X), and likewise let R N be the set of permutation processes on N elements reaching exactly the reverse permutation rev N = (N . . . 2 1) at time 1. Let R δ denote the δ-neighborhood in the Wasserstein distance on P of the set R ∪ ⋃ N ≥1 R N .
Let η N be the interchange process with α = 1 + κ ∈ (1, 2) and let µ η N be the distribution of the corresponding permutation process. By definition of a random relaxed sorting network, for any given δ > 0 we have for sufficiently large N P N (π N δ ∈ B(π A , ε)) = P N (µ η N ∈ B(π A , ε) | µ η N ∈ R δ ).
This is because if the process has done k swaps up to time T k and µ η N 0,T k ∈ S δ , then with high probability µ η N 0,T M ∈ S δ as well, since S δ is an open set in M([0, 1] 2 ) and the additional number of steps done between T k and T M is ≤ 1 2 N α (N − 1) = o(N 3 ), so typically almost all particles have negligible displacement.
On the other hand, since in the interchange process each sequence of swaps of given length is equally likely, we have (1 + o(1)).
Since under P N the variable J has Poisson distribution with mean (1/2)N^α (N − 1), we have P N (J ≤ M) → 1/2 as N → ∞.
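The convergence P N (J ≤ M) → 1/2 can be checked numerically. The sketch below assumes that M is the integer part of the Poisson mean, as suggested by the expression for |P N M | later in the proof; the function names and the choice α = 1.5 are illustrative and do not come from the paper.

```python
import math


def poisson_cdf(lam: float, m: int) -> float:
    """Return P(J <= m) for J ~ Poisson(lam), summing the pmf term by term."""
    log_lam = math.log(lam)
    total = 0.0
    for j in range(m + 1):
        # log of the pmf at j: j*log(lam) - lam - log(j!)
        total += math.exp(j * log_lam - lam - math.lgamma(j + 1))
    return total


def swap_count_mean(n: int, alpha: float) -> float:
    """Mean number of swaps (1/2) * N^alpha * (N - 1), as in the text."""
    return 0.5 * n**alpha * (n - 1)


if __name__ == "__main__":
    # alpha = 1.5 is an arbitrary illustrative value in (1, 2).
    for n in (5, 10, 20, 40):
        lam = swap_count_mean(n, 1.5)
        m = math.floor(lam)  # assumed choice of M
        print(n, round(lam, 1), poisson_cdf(lam, m))
```

The printed probabilities should approach 1/2 as N grows, reflecting the fact that a Poisson variable with large mean is approximately symmetric around its mean.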
To estimate the left-hand side, let P̃ N be the law of the biased interchange process corresponding to the sine curve process π A . Recall Lemma 7.1 and for fixed ε > 0 let A be the event that the o(1) term in the formula for dP N u d P N (T ) is at most ε. Let us write By Theorem 5.1 the event µ η N 0,T ∈ S δ has high probability under P̃ N and, since the particle swap rates for the biased process sum up to (1/2)N^α (N − 1) (recall (28)), we have, similarly as for the unbiased process, P̃ N (J ≤ M) → 1/2 as N → ∞. By Lemma 7.1 A is a high probability event under P̃ N as well.
To estimate the remaining probabilities, we employ the formula for the Radon-Nikodym derivative from Lemma 7.1. Since in the biased process with high probability the energy term in the derivative is close to I(π A ) = π^2/6, we obtain P N ({µ η N 0,T ∈ S δ } ∩ {J ≤ M} ∩ A) / P̃ N ({µ η N 0,T ∈ S δ } ∩ {J ≤ M} ∩ A) ≥ e^{−N^γ (π^2/6 + ε) + o(N^γ)} , where γ = 3 − α.
Since |P N M | = (N − 1)^M = e^{⌊(1/2)N^α (N −1)⌋ log(N −1)} and ε was arbitrary, we obtain the asymptotic lower bound on |S N κ,δ | as claimed. For the upper bound, let R δ be as in the previous theorem. By the large deviation upper bound of Theorem 8.4 we have Since I is lower semi-continuous, given any ε > 0 we have for all sufficiently small δ > 0 inf µ∈R δ where again we have used the energy minimization property of π A . Since I(π A ) = π^2/6, this implies that for any ε > 0 and sufficiently small δ > 0 P N (µ η N ∈ R δ ) ≤ e^{−N^γ (I(π A )−ε+o(1))} .

Now we estimate
and use the same asymptotic estimate for |P N M | as in the lower bound. Since J is Poisson with mean (1/2)N^α (N − 1) under P N , the second term on the right-hand side is e^{O(log N)} . Altogether we obtain which proves the desired asymptotic upper bound on |S N κ,δ |.