Multi-state condensation in Berlin-Kac spherical models

We consider the Berlin-Kac spherical model for supercritical densities under a periodic lattice energy function which has finitely many non-degenerate global minima. Energy functions arising from nearest neighbour interactions on a rectangular lattice have a unique minimum, and in that case the supercritical fraction of the total mass condenses to the ground state of the energy function. We prove that for any sufficiently large lattice size this also happens in the case of multiple global minima, although the precise distribution of the supercritical mass and the structure of the condensate mass fluctuations may depend on the lattice size. However, in all of these cases, one can identify a bounded number of degrees of freedom forming the condensate in such a way that their fluctuations are independent from the rest of the fluid. More precisely, the original Berlin-Kac measure may be replaced by a measure where the condensate and normal fluid degrees of freedom become independent random variables, and the normal fluid part converges to the critical Gaussian free field. The proof is based on a construction of a suitable coupling between the two measures, proving that their Wasserstein distance is small enough for the error in any finite moments of the field to vanish as the lattice size is increased to infinity.


Introduction
Berlin and Kac proposed [1] in 1952 a spherical model as a modification of the Ising model of a ferromagnet. In their model, discrete spin variables are replaced by continuum variables, i.e., by real numbers, while keeping a constraint that the total length of the continuum vector equals that of the discrete spin vector. This enforces the continuum spin vectors to remain on the surface of a fixed high-dimensional sphere, hence the name "spherical model." Their motivation was to find simple models were phase transitions could be studied fairly explicitly, in particular, in the physically relevant case of three dimensions.
Although the partition function of the spherical model cannot be explicitly solved for fixed finite lattices, it has an integral representation which allows studying the properties of its infinite volume limit when restricted to nearest neighbour interactions. The limiting partition function is sufficiently explicit that standard thermal equilibrium properties of the model can be derived from it and, as shown in [1], the spherical model in three dimensions has a phase transition corresponding to spontaneous magnetisation. The reference also contains estimates for the second and fourth moments of the field, implying that the fluctuations at small temperatures, when there is spontaneous magnetisation, cannot be Gaussian.
On a technical level, the spontaneous magnetisation found in [1] is analogous to Bose-Einstein condensation in quantum statistical mechanics. For instance, Yan and Wannier [2] extend the analysis in [1] to compute also the single site distribution (one-point function) in the infinite volume limit. They find that in the subcritical case the distribution is Gaussian whereas in the supercritical case it is not Gaussian but instead corresponds to a random variable which is a sum of a random constant and a Gaussian variable. The appearance of the constant is analogous to the effect of condensation for ideal Bose gas.
To elucidate the connection further, let us begin with more detailed definitions. The spherical model in d dimensions is defined as the random field of "continuous spin" s x ∈ R, x ∈ Λ, where Λ ⊂ R d is a finite lattice of points. The main purpose of using a lattice to label the spins is to define the interaction energy of a spin configuration: one assumes that there is given a coupling function J x,y , x, y ∈ Λ, such that the energy is given by where s * x denotes the complex conjugate, added here for later use. Often one takes J x,y = v(x− y) for a function v which decays sufficiently rapidly with increasing |x− y|. For instance, the rectangular nearest neighbour case with Dirichlet boundary conditions would have Λ ⊂ Z d and v(x) = 0 for |x| ∞ ≥ 2, where |x| ∞ := max i |x i |. We will use both |x| ∞ and the Euclidean norm on R d , |x|, frequently in the following.
Denoting the lattice size by V = |Λ| < ∞, the probability measure for the spin field s at inverse temperature β > 0 is given by (1.1) The first factor is the standard canonical Gibbs weight for the given temperature and energy function. The second "factor" is a δ-function constraint which enforces the assumption that the length of the spin vector divided by the number of particles is equal to one. We will use such δ-functions liberally in the following, and the discussion about their mathematical definition and properties is given in Appendix A. In particular, it follows that under the above measure x∈Λ s 2 x = V almost surely. Here Z BK,Λ,β > 0 is a constant which normalizes the positive measure into a probability measure, and it is also equal to the earlier mentioned finite volume partition function of the spherical model.
Here, we generalize the above spherical model slightly by complexifying the spin field s x and allowing for arbitrary spin-densities ρ > 0. Explicitly, we consider here complex fields φ x ∈ C, x ∈ Λ, whose values are distributed according to the measure where dφ * x dφ x := d Re φ x d Im φ x . The measure (1.2) is a "classical field" version of the ideal gas of bosonic particles in the canonical ensemble where the total particle number is fixed to ρV but energy is allowed to fluctuate according to the canonical Gibbs ensemble. In fact, it follows from our main result that the mechanism behind the spherical model phase transition is identical to that found for Bose-Einstein condensation of non-interacting bosons: if d ≥ 3, we show that for all sufficiently large densities ρ it is possible to separate a finite number of Fourier modes from the field, called the condensate, and these will carry all of the excess mass above criticality. The fluctuations of the remaining degrees of freedom, the normal fluid, are shown to become Gaussian and independent from the condensate fluctuations in the large volume limit.
An important consequence of the analysis here is to observe that the condensate cannot always be composed out of a unique Fourier mode. In fact, the number of relevant modes and their fluctuations might even depend on the precise shape and size of Λ. For spin interactions, and even more so for dispersion relations arising from tight binding approximation or for phonons in solid state physics, it would be important to be able to consider fairly general interaction potentials. A number of example lattice interactions are discussed in Sec. 4. One of these is given by a dispersion relation which has a unique global minimum but its restrictions to periodic rectangular lattices with L particles on each side have a unique condensate mode for odd L but 2 d condensate modes for even L. This is in sharp contrast to the standard ideal Bose gas example [3,Theorem 5.2.30] where L → ∞ limiting behaviour is unique and all excess mass condenses into the (unique) ground state, corresponding to the Fourier mode with wave number zero.
Our main result, Theorem 2.2, provides explicit bounds which may be used to estimate the accuracy of any proposed splitting of the Fourier modes into condensate and normal fluid modes. One of the main goals of the present contribution has been to find methods which would be able to identify the condensate modes properly for general, finite range lattice interactions. This has resulted in the bounds given in Theorem 2.2; as we discuss in Sec. 4, these bounds are indeed sufficiently refined to distinguish the condensate modes correctly not only in the above odd and even L cases, but also in all other examples considered in Sec. 4.
Bose-Einstein condensation has been much more extensively studied in the literature than the spherical model. Although the analysis is complicated by the replacement of the complex field φ x by non-commutative bosonic creation and annihilation operators on the Fock space, the findings are not dissimilar from the above observations. For example, in [4] the properties of the condensate in the so-called imperfect Bose gas are shown to depend on which lattices are used to approach the infinite volume limit, by varying the anisotropy of the lattices. Even more extreme examples for the ideal Bose gas are given in [5]. Multi-state condensation has also been shown to occur in similar models in [6] and its introduction contains a summary of other earlier findings. In contrast, if one adds a one-particle energy gap, single-state condensation occurs for bosons interacting via superstable two-body potentials [7]. The role the explicit gap plays in the result is discussed in the paper but, since the gap is not allowed to depend on the system size, it is not possible to draw conclusions about the minimal gap size needed. Indeed, our results indicate that this dependence could be fairly complex in general.
A second motivation to study the measure (1.2) comes from statistical mechanics of discrete wave equations. Considering (2 1 2 Re φ x , 2 1 2 Im φ x ) to form a pair of canonical variables for each x, one may use the function E Λ [φ] to define Hamiltonian evolution under which it is conserved and may be identified physically as the total energy. Requiring the symmetry condition J * y,x = J x,y from the coupling, the evolution equations are equivalent to In particular, if J x,y = α(x − y; L) where α is L-periodic, this corresponds to a discrete wave equation with periodic boundary conditions and with a dispersion relation ω which is given by the Fourier transform of α. In addition, one may check by differentiation that the ℓ 2 -norm is conserved by the time-evolution, i.e., that x∈Λ |φ x | 2 is also a conserved quantity. By Liouville's theorem, the Lebesgue measure is invariant under the Hamiltonian evolution and thus the measure (1.2) yields a family of stationary measures for the discrete wave equation corresponding to the Hamiltonian E Λ [φ]. Therefore, our result can also be viewed as a proof of "Bose-Einstein" condensation for the equilibrium measures of these discrete wave equations.
To mention one additional motivation for the measures in (1.2), let us point out that they can also be obtained as a weak coupling limit of fixed density, i.e., "canonical", equilibrium measures of the discrete nonlinear Schrödinger equation. In [8], we study the discrete nonlinear Schrödinger evolution with random initial data distributed according to a grand canonical ensemble, aiming at rigorous control of the related kinetic theory. However, the assumptions used in [8] require that the weak coupling measure in the thermodynamic limit becomes Gaussian, hence excluding a range of densities which correspond to the supercritical case studied here. The above results could provide the first step towards understanding kinetic theory for weakly nonlinear waves in presence of a condensate.
The main technique for controlling the error arising from the separation of the condensate degrees of freedom is very different from the previous estimates in [1,2]. Instead of trying to represent the δ-function in terms of oscillatory integrals, we think of it as a constraint defining a positive measure, and aim at minimizing the effect of the separation with a flexible choice of which modes are included in the condensate. It turns out that there are cases in which the condensate degrees of freedom have somewhat irregular fluctuations but the main achievement here is to show that it is possible to make the separation in such a manner that the number of condensate modes always remains bounded and the rest of the modes become independent Gaussian random variables. After the approximate measure has been chosen, we check that it is close to the original one by constructing a coupling between the two measures, borrowing ideas from [9]. This controls the Wasserstein distance between the measures, and together with their translation invariance, we conclude that there is a power p ′ > 0 such that all finite moments of the field φ x are O(L −p ′ ) close to each other as L → ∞.
Couplings and Wasserstein metric are basic tools for optimal transport problems [10]. They have also been used for studies of condensation phenomena in stochastic particle systems, although in models such as zero-range processes the condensation occurs at isolated lattice sites instead of Fourier modes as in the cases discussed above. We refer to [11] and references therein for an up-to-date discussion and examples related to the topic.
In the following sections, we first define the complexified spherical model and describe the main results in more detail in Sec. 2. The fixed finite lattice case for supercritical densities is discussed in Theorem 2.2 while the conclusions for the case where a given dispersion relation is studied in the infinite volume limit are given in Corollary 2.6. These results give bounds for the Wasserstein distance between the spherical model measure and the approximation where the condensate and normal fluid modes have been separated. The bounds typically diverge, but in Sec. 3 we explain how they nevertheless imply that the approximation errors of finite moments vanish in the infinite volume limit. Various scenarios for the formation of the condensate for a number of example continuum dispersion relations are discussed in Sec. 4. In the technical part, we first prove Theorem 2.2 in Sec. 5, and a statement in item 3 of Proposition 2.3 which uses a number of components from the proof. The main estimates allowing to control the infinite volume limit of fixed dispersion relations are given in Sec. 6, in particular, completing the missing proof of Lemma 2.5. In the two Appendices, we first clarify the precise mathematical interpretation of the δ-function constraints and recall the definition and basic properties of the Wasserstein distance.

Acknowledgements
This work has greatly benefited from the input from two colleagues: Herbert Spohn, who in a personal communication suggested the splitting of the condensate fluctuations in the infinite volume limit for the nearest neighbour interactions, similarly to what is stated in item 1 of Proposition 2.3, and asked the question about its proof and generalizations, and Eero Saksman, who generously took the time to explain the technical details of their method for generating efficient couplings between two probability measures which might be mutually singular but approximate each other well with high probability. I am also grateful to Stefan Großkinsky and Stefano Olla for their comments on coupling techniques used in stochastic particle systems. I also thank the anonymous reviewer and Kalle Koskinen for helpful comments on the manuscript.
The work has been supported by the Academy of Finland via the Centre of Excellence in Analysis and Dynamics Research (projects 271983 and 307333), and it has also benefited from the support of the project EDNHS ANR-14-CE25-0011 of the French National Research Agency (ANR) and from the discussions during workshops organized by the Institut Henri Poincaré -CentreÉmile Borel, Paris, France, (IHP trimester "Stochastic Dynamics Out of Equilibrium"), and by the Mathematisches Forschungsinstitut Oberwolfach, Germany (MFO mini-workshop: "Gibbs Measures for Nonlinear Dispersive Equations").
2 Separation of condensate in the spherical model

Notations and definition of the spherical model measure
We begin with the probability measure for a finite complex field φ x , x ∈ Λ, defined by the complexified spherical model of Berlin and Kac given in (1.2). For simplicity, we only consider d-dimensional periodic lattices of fixed side length L, which we parameterize as follows If L is even, Λ L contains those x ∈ Z d for which |x| ∞ ≤ L 2 and x i = − L 2 for all i. We further simplify the discussion by restricting to energy functions satisfying periodic boundary conditions. Without loss of generality, we also include the inverse temperature to the definition, and thus assume that where α : Λ L → C determines the interaction energies. Here, and in the following, we use periodic arithmetic on Λ L , setting x ′ ± x := (x ′ ±x) mod Λ L and −x : The above definition implies that the energies remain invariant under periodic translations of the field configuration, i.e., In fact, we can now "diagonalize" the interaction by using discrete Fourier transform. We define the Fourier transform on Λ = Λ L by first setting as the dual lattice Λ * (L) := Λ L /L ⊂ − 1 2 , 1 2 d and then denoting the Fourier transform of a function f : Λ → C by f : The inverse transform is given by It is straightforward to check that both transforms are pointwise invertible for all f and g, The standard convolution results hold for the discrete Fourier transform, and thus we have where Φ = φ : Λ * → C and ω = α. In this formulation, it is now obvious that if we wish to satisfy the physical requirement of the energy H L being real for all field configurations, it is necessary that ω(k) ∈ R for all k ∈ Λ * . In addition, by the inversion formula Therefore, it is possible to simplify the study of the infinite volume limit L → ∞ by considering a "target" function ω : T d → R, parameterizing the torus using − 1 2 , 1 2 d , and defining α using the formula (2.5). For reasons explained in the Introduction, we call such functions ω dispersion relations. In the following, some of the results concern the limiting behaviour as L → ∞ for some given dispersion relation ω on the torus, while others assume that L is fixed and ω(k), k ∈ Λ * , are some given real numbers.
We also denote and thus arrive at the following expression for the spherical model measure where dΦ * k dΦ k := d Re Φ k d Im Φ k , Z ρ normalizes the integral to one, and As the norm in which to measure the Wasserstein distance, we choose the ℓ 2 -metric on the x-space. By the Plancherel theorem for discrete Fourier transform, this means using the following norm for the field Φ k , and N [Φ] = Φ 2 . We also need spherical coordinates in these variables. We denote the radial distance coordinate by |Φ|, and it is then related to the above norm by

Factorized supercritical measures
Our goal is to study the spherical model for parameter values which lead to generation of a condensate. Since this is a physical, macroscopic notion, we first need to quantify mathematically what it could mean for finite lattice systems such as the spherical model measure introduced in the previous subsection. After this, we will separately consider the large L behaviour of systems whose energy eigenvalues ω(k), k ∈ Λ * , arise from a continuum dispersion relation ω : T d → R as explained earlier.
To quantify condensates and supercriticality, it will be necessary to identify a sufficiently large energy gap separating the modes which belong to the condensate from the rest. To this end, we divide the wave numbers in Λ * into a condensate wave number set Λ * 0 and a normal fluid wave number set Λ * + = Λ * \ Λ * 0 in such a manner that the energies occurring in these sets are separated by a non-empty interval. An important parameter of the split turns out to be the proportional size of the gap, after normalizing the lowest energy to zero; the following item collects the related definitions and terminology. Definition 2.1 Consider Λ * for some fixed L and suppose ω(k) ∈ R, k ∈ Λ * , are given. Define ω 0 := min k∈Λ * ω(k) and e k := ω(k) − ω 0 ≥ 0, k ∈ Λ * . A split of Λ * is a pair (Λ * 0 , Λ * + ) of nonempty disjoint subsets of Λ * whose union covers the whole Λ * . Given 0 ≤ a < b and a split (Λ * 0 , Λ * + ), we say that the split is separated by the energy interval [a, b] if e k ≤ a for all k ∈ Λ * 0 and e k ≥ b for all k ∈ Λ * + . In this case, the relative energy gap of the split is defined as δ −1 where δ := max k∈Λ * 0 e k min k∈Λ * + e k ≤ a b < 1 .
We denote the number of elements in the two subsets of the split by V 0 := |Λ * 0 | and V + := |Λ * + |.
Since V = |Λ * |, for such a split we clearly have 0 < V 0 , V + < V and V = V 0 + V + . Also, every global lattice minima, a point k ∈ Λ * at which ω(k) = ω 0 , belongs to Λ * 0 . Hence, Λ * 0 contains all k for which e k = 0, and thus e k > 0 for all k ∈ Λ * + . Given such a split, we call the field Φ + composed out of modes with k ∈ Λ + the normal fluid while the field Φ 0 resulting from the remaining modes is called the condensate. The goal is to quantify under which assumptions the condensate field can be composed out of a small fraction of the modes, V 0 V ≪ 1, so that they nevertheless carry a substantial fraction of the total mass ρV . Analogously to the Bose-Eistein condensation, one could then expect the normal fluid to fluctuate according to the critical thermal, grand canonical ensemble. Indeed, under the assumptions made in the main theorem we can prove that the normal fluid Φ + follows very accurately Gaussian statistics given by the following distribution This measure is well-defined since ω(k) > ω 0 for all k ∈ Λ * + . The expectation of norm density, Φ + 2 /V , under such a measure is equal to The standard deviation of the norm density is proportional to 1/ √ V = L − d 2 , and thus for large L the normal fluid under this measure cannot carry much more of the density fixed by the condition N [Φ] = ρV as soon as ρ > ρ c . Since N [Φ] = Φ + 2 + Φ 0 2 , then the extra norm density ρ − ρ c will be contained in the condensate modes.
Based on the above analogy, we say the the spherical model is supercritical if ρ > ρ c for a split which has sufficiently large relative energy gap and only a few condensate modes (the precise conditions are given in Theorem 2.2). The above formal discussion will then turn out to give the correct picture for fairly general energy functions ω(k). In fact, the separation between the two sets of modes is so strong that even the fluctuations of condensate and of the normal fluid will become statistically independent. However, if the condensate is degenerate, the fluctuations of the condensate can be nontrivial. In the main result we will compare the spherical model measure µ 0 to the probability measure µ 1 defined by where ∆ := ρ − ρ c > 0, Z 1 is a constant normalizing the integral to one and, using e k := ω(k) − ω 0 , we define Clearly, µ 1 is a product of µ + and a measure for the condensate modes, and the total norm density is split between the normal fluid and condensate in the manner described above.
The structure of the condensate fluctuations under µ 1 may indeed be fairly complicated. However, there are certain situations where they can be replaced by simpler uniform distribution of the excess mass over the condensate modes, i.e., by using the measure instead of µ 1 above. Some sufficient conditions for using the simpler measure are discussed later in Proposition 2.3. As we show there, using µ ′ 1 is allowed at least if a single mode condensate can be used, i.e., if V 0 = 1. We call both µ 1 and µ ′ 1 factorized supercritical measures.

Main results
Our main result is to state conditions under which µ 0 and µ 1 are so close to each other that the expectations of all local observables will agree with each other, up to some error which is proportional to a negative power of L, hence vanishing when L → ∞. The precise conditions are contained in the following Theorem implying a bound for the Wasserstein distance between µ 0 and µ 1 . The proof of Theorem is given in Section 5.
The Wasserstein distance estimate is sufficiently strong that local expectations of the original field, φ =Φ, generated by these two measures agree up to errors which vanish as L → ∞. Namely, if I ⊂ Λ is finite in the sense that |I|/V ≪ 1 and φ I := x∈I φ x , then the bound given in Theorem 2.2 implies the existence of p ′ > 0 such that The proof of this statement will rely on translation invariance of the random field generated by the measures µ 0 and µ 1 and it is given later as Theorem 3.2 in Sec. 3. Therefore, if a split with sufficiently large gap can be found, then the spherical model is well approximated by a critical Gaussian field and a few independent condensate Fourier modes, as determined by µ 1 .
Theorem 2.2 Consider a fixed L and some given ω(k) ∈ R, k ∈ Λ * . Suppose (Λ * 0 , Λ * + ) is a split of Λ * which is separated by the energy interval [a L , b L ], 0 ≤ a L < b L , and has a relative energy gap δ −1 L , as specified in Definition 2.1. We recall also the definitions of the total system size V , the number of the condensate modes V 0 , and the critical norm density ρ c (L) in (2.9).
As shown later in Lemma 2.5, for energies arising from many common continuum dispersion relations a sequence of splits can be found for which ε L → 0 as L → ∞ while V 0 and ρ c (L) remain bounded, implying C 2 = O(1) if ρ > sup L ρ c (L). However, the speed of convergence of ε L is usually not sufficient for the bound of the Wasserstein distance W 2 (µ 0 , µ 1 ) to go to zero, so we cannot state any convergence result in the above (unscaled) L 2 -norm. Nevertheless, as we show in Sec. 3, for errors in local correlation functions the bound can be improved by a factor of L − d 2 which shows that these errors vanish in the limit of large lattices. The precise statement is given in Theorem 3.2, and as discussed in Sec. 3, the main simplification from the replacement of µ 0 by µ 1 is given by the vastly simpler fluctuation properties of the normal fluid under the measure µ 1 .
There are a few special cases for which also the condensate fluctuations have simple structure, summarized in Proposition 2.3. In the statements below, we say for instance that where X is a random variable independent of Φ + and uniformly distributed on the unit sphere S 2V 0 −1 ". There it is implicitly assumed that the first term refers to normal fluid components and the second to the condensate components using the standard isomorphism between C Λ * 0 and R 2V 0 : for k ∈ Λ * + , we then have

If
where θ is a random variable independent of Φ + and uniformly distributed on the interval [0, 2π].

If
where X is a random variable independent of Φ + and uniformly distributed on the unit sphere where X is a random variable independent of Φ + and uniformly distributed on the unit sphere S 2V 0 −1 .
Proof: The assumptions in the first two items imply that E 0 [Φ] = 0 (note that by definition of the split, we necessarily have ω(k) = ω 0 for some, and hence for all, k ∈ Λ * 0 ). Thus the weight related to k ∈ Λ * 0 is equal to one.
is uniformly distributed on the unit sphere S 2V 0 −1 : for any continuous bounded function and the normalization condition fixes the overall constant correctly. If V 0 = 1, X is uniformly distributed on the unit circle and thus equals e iθ in distribution. The proof of the last item uses techniques from the proof of the main Theorem, and it can be found at the end of Section 5.
To study infinite volume limits, we assume that the weights ω(k) are given by an Lindependent dispersion relation, satisfying the following conditions. Assumption 2.4 Suppose d ≥ 3 and consider a function ω : T d → R which is C 2 and has only finitely many non-degenerate minima. More precisely, we assume that both of the following statements hold: 1. The periodic extension of ω into a function R d → R is twice continuously differentiable.
2. By the first assumption and compactness of T d , ω attains a minimum value ω min ∈ R.
We assume that the collection of all global minima in T d , is finite and that the Hessian matrix D 2 ω(k 0 ) is invertible for all k 0 ∈ T 0 Note that these assumptions are invariant if ω is multiplied by any positive constant, and thus they remain invariant in changes of the implicit inverse temperature factor β. It turns out that in the presence of a condensate, the distribution around the degrees of freedom with minimum energy may vary with the lattice size L without converging towards any limiting behaviour as L → ∞. For example, in Section 4.3 we present an example with different number of condensate modes for odd and even L. We illustrate via explicit examples why the split can have nontrivial dependence on the lattice size L in Section 4.
The following Lemma shows that for dispersion relations satisfying Assumption 2.4 a split with the desired properties can be found.
Then there are constants L 0 , M 0 ∈ N + and c 0 , c 2 > 0, depending only on d, the function ω, and the choice of κ, such that for all L ≥ L 0 we can find a split (Λ * 0 , Λ * + ) of Λ * with the following properties: (2.16) 2. The split is separated by an energy interval [a L , b L ] and has a relative energy gap the following positive integral is finite, and, as L → ∞, The proof of the Lemma is postponed to Sec. 6, and it contains ways to construct some constants for which the Theorem holds. However, these constructions are not always optimal since they need to take into account extreme cases such as very anisotropic dispersion relations. Hence, if optimal decay estimates are desired, it is better to optimise the values case by case instead of using, e.g., the worst case estimate in (6.7) for M 0 .
As a straightforward application, we obtain the following consequences for systems where the infinite lattice dispersion relation is kept fixed and L is taken large.
Corollary 2.6 Suppose that d ≥ 3 and ω satisfies Assumption 2.4, and take some cutoff parameters for the minimum distance from criticality, ∆ 0 > 0, and for a maximal density, Then there are L ′ , M 0 , and C ′ > 0 such that for any L ≥ L ′ we can find a split (Λ * 0 , Λ * + ) of Λ * satisfying all properties stated in Lemma 2.5 and for which the Wasserstein distance between the measures µ 0 and µ 1 defined in Theorem 2.2 satisfies for all densities ρ on the interval Proof: Since the assumptions of Lemma 2.5 are satisfied, ρ c (L) → ρ ∞ , as L → ∞, and thus there is In addition, we can conclude from the Lemma that there is M 0 ≥ 1 such that for any appropriately chosen κ, the split ( 2 > 0, uniformly in L. Therefore, we may find L ′ ≥ max(L 0 , L ′ 0 ) such that both inequalities in (2.13) hold for all L ≥ L ′ and all ρ satisfying (2.22).
Thus we may use the conclusions of the main Theorem for these values of parameters, and the constant C ′ = C 2 may be adjusted to work for all allowed values of κ, L, and ρ. Since also M 0 is independent of κ, we can maximize the decay of ε L by setting κ = d−2 2M 0 +1 < d 2 which satisfies κ < 1 for d = 3. This results in the bound stated in the Corollary.

Local correlation estimates from Wasserstein bounds
In the main result, a bound is derived for the Wasserstein distance between two measures µ 0 and µ 1 which are both gauge invariant in the sense that (Φ k ) k∈Λ * and (e iϕ k Φ k ) k∈Λ * have the same distribution for any choice of the constant phase shifts ϕ k ∈ R, k ∈ Λ * . This is a consequence of the geometric identification between C and R 2 which implies that a multiplication Φ k → e iϕ k Φ k corresponds to a rotation by an angle ϕ k and thus it leaves the The weight functions only depend on |Φ k | 2 and thus also they are left invariant. However, in applications, one is usually mainly interested in the corresponding fields φ x , x ∈ Λ L , obtained by inverse Fourier transform from Φ k : we consider the collection of for x ∈ Λ L . The above gauge invariance of the Fourier components is reflected in translation invariance of the field φ x . Namely, for any y ∈ Λ L , we have and thus the field (φ x+y ) x∈Λ has the same distribution as the field (φ x ) x∈Λ . This translation invariance is sufficient to lift the earlier usually divergent Wasserstein bounds to vanishing error estimates for moments of the field φ x . To see this, consider a sequence I of length n ≥ 1 of pairs ( We use the index τ to determine complex conjugation: we set φ x,1 = φ x and φ x,−1 = φ * x , and use the shorthand notation φ I := α∈I φ α := n i=1 φ x i ,τ i for the monomial corresponding to the above sequence I. The expectation of such local observables will get an improvement by a factor L − d 2 for the Wasserstein distance from translation invariance, as stated in the following Lemma. Lemma 3.1 Suppose µ and µ ′ are gauge invariant measures for the Fourier components, field Φ(k), k ∈ Λ * . Given x ∈ Λ, define A 1 (x) := 1 and for n > 1 set Consider the random field φ =Φ and suppose n ≥ 1 is such that A n (x) < ∞ for some x ∈ Λ. Then A n (x) does not depend on the choice of x and for any sequence I of length n as above, we have an estimate Proof: Under either of the measures µ and µ ′ the field φ x is translation invariant, φ I = φ I+y for any y ∈ Λ L , where I + y := ((x i + y, τ i )) n i=1 . Therefore, for any coupling γ between µ and µ ′ the difference of their moments satisfies In particular, if n = 1, by using Cauchy-Schwarz inequality we obtain Since the left hand side does not depend on the coupling γ, taking an infimum yields the bound in (3.3); cf. the definition of the Wasserstein distance in Appendix B.
Consider then the case n > 1. The difference of products in (3.4) can be "telescoped" as follows yielding an estimate Note that the absolute values on the right hand side cancel the effect of any possible complex conjugations on the left hand side. Taking an expectation over γ and then using Cauchy-Schwarz inequality and the natural order in I to simplify the notations, we obtain where in the last step we have used the generalized Hölder's inequality with exponent q ′ = 2(n − 1) for which indeed x ′ ∈I;x ′ =x 1 q ′ = 1 2 for all x ∈ I. We may now conclude that the error X is bounded by Here, only the first factor depends on γ, since all the other factors may be computed using the fixed marginal measures µ and µ ′ . Using the translation invariance of the marginal measures we obtain We next use the assumption that A n < ∞ for the moments given in (3.2). By translation invariance A n is independent of the choice of x ∈ Λ, and thus by applying the Schwarz inequality to the sum over y, we obtain Since the left hand side does not depend on the coupling γ, taking an infimum yields the bound in (3.3), as before. This concludes the proof of the Lemma.
In the bound (3.3) a factor L d 2 gets cancelled from the Wasserstein distance. Combined with the earlier results, the bound thus goes to zero if n is not allowed to increase when taking L → ∞, as long as the constants A n remain bounded in the limit. As proven in Lemma 3.3 at the end of the section, this holds for the measures considered here. Hence, we may conclude that (3.3) combined with the Wasserstein estimates stated in the main results in Sec. 2 implies that a supercritical spherical model measure and µ ′ is a compatible factorized supercritical measure. Summarizing all assumptions in one place, we obtain the following result as an immediate corollary of Corollary 2.6, Lemma 3.1, and Lemma 3.3.
Theorem 3.2 Suppose that d ≥ 3 and ω satisfies Assumption 2.4, and consider any supercritical ρ as in Corollary 2.6. Fix a maximum order n ≥ 1 of the local moment. Then there are C ′ , p ′ , L ′ > 0 for which the following holds: if L ≥ L ′ , we may find a split (Λ * 0 , Λ * + ) of Λ * and define the corresponding factorized supercritical measure µ 1 by (2.10) so that

5)
for any sequence I from Λ L × {±1} of length at most n.
Using the constants occurring in Corollary 2.6, we may use p ′ = d/2−1 2M 0 +1 in (3.5). However, as discussed before the Corollary, this value might not always be optimal, i.e., the result could hold also for larger values of p ′ .
For applications of the approximation result, perhaps the most important consequence is the simplification of the structure of fluctuations. Namely, apart from the few condensate degrees of freedom, the field becomes Gaussian and translation invariant. In fact, as we will show next, its infinite volume statistics are given by the critical lattice field ψ x , x ∈ Z d , which has zero mean and covariance with E[ψ x ψ y ] = 0 and for all x, y ∈ Z d . More precisely, for all of the factorized supercritical measures in Sec. 2.2, the field φ x can be written as a sum of two independent random fields of which the normal fluid component where Φ + is distributed according to the measure µ + in (2.8). Therefore, for any compactly supported test function J : Z d → C, we can define the random variable as soon as L is large enough so that Λ L contains the support of J. Then J, φ + has mean zero and a variance for which J, φ + 2 = 0 and The function J : T d → C is continuous, hence also bounded. We assume that the split (Λ * 0 , Λ * + ) for all L has the properties listed in Lemma 2.5. Then it is possible to partition T d into boxes of side length 1 L so that 1 e k is bounded in the corresponding box by a constant times 1 ω−ω min , apart possibly from a finite number of boxes. Due to the lower bound for e k valid for all k ∈ Λ * + , we may ignore the exceptional boxes, and for the remaining ones use dominated convergence theorem to conclude that for any fixed J Details of this construction, as well as explicit estimates in L for the size of the error, can be found in the proof of (2.20) given at the end of Sec. 6. Then an application of the polarization identity proves that for any two test functions J 1 and J 2 with a compact support we have Restricted to single site test functions, we may thus conclude that (3.6) is indeed the limit of any pointwise covariances. Since both the finite volume and the limit field are Gaussian, these results also immediately imply the convergence of all finite moments. We conclude the section by showing that both the original and factorized fields have uniformly bounded moments. Then to each m ≥ 0 there is an L-independent constant c m such that for the random variable φ x defined by (3.1) for any x ∈ Λ L .
Split φ x into a condensate and normal fluid component as follows Then φ x = φ 0 x + φ + x , and the condensate component may be bound by using the upper bound M 0 from Lemma 2.5, Under the measure µ 0 , ρ 0 [Φ] ≤ ρ almost surely, and under either of the measures µ 1 or µ ′ 1 we have ρ 0 [Φ] = ∆ ≤ ρ almost surely. Therefore, in all of the three cases the condensate field is almost surely uniformly bounded in L, |φ 0 x | ≤ √ M 0 ρ. We then employ Hölder's inequality for the dual pair (2m, 2m/(2m − 1)) to bound the moment The condensate term on the right hand side is now bounded by (M 0 ρ) m , so it only remains to estimate the normal fluid term.
Let us begin with the case where µ is µ 1 or µ ′ 1 . Since φ + x only depends on Φ + , the product structure of these two measures implies that The remaining expectation is over independent, mean zero, Gaussian complex random variables. By the Wick rule and gauge invariance, the expectation is zero unless there is a permutation π of {1, 2, . . . , m} such that k ′ i = k π(i) for all i. Therefore, For any nonzero term in the sum, Therefore, for these two measures, we may use c m = 2 2m−1 (m! + M m 0 )ρ m . It remains to consider the normal fluid contribution for µ = µ 0 . As above, we have and by gauge invariance of µ 0 , the remaining expectation is zero unless for each k ∈ Λ * + there are the same number of Φ k and Φ * k terms in the product, in which case the product yields a positive number. Thus for the nonzero terms also here we can find a permutation π of {1, 2, . . . , m} such that k ′ i = k π(i) for all i. Therefore, Continuing as above, and observing that ρ /V is bounded by ρ almost surely under µ 0 , we find an upper bound Therefore, also for µ = µ 0 , we may use c m = 2 2m−1 (m! + M m 0 )ρ m . Let us point out that by Lemma 2.5 ρ c (L) is bounded in L and thus it is not a contradiction to assume that ρ is fixed and supercritical for all L ≥ L 0 .

Example lattice dispersion relations
As an application, we consider explicitly a number of dispersion relations ω : T d → R, all of which are continuous (periodic) functions. Let us first recall that, once we define φ x by (3.1), the energy and norm satisfy Taking L → ∞ thus shows that α(x; L) → α(x) = T d dk ω(k)e i2πk·x for each x ∈ Z d . Here α(x) are the Fourier coefficients of ω and they are ℓ 2 -summable since ω ∈ L 2 (T d ). In particular, α(x) → 0 as |x| → ∞. Furthermore, if ω is a restriction of an analytic function, we may conclude that its Fourier coefficients α(x) are exponentially decreasing in |x| → ∞, and all such functions correspond to "short-range" interactions for the field φ x .

Nearest neighbour interactions
In the original Berlin-Kac paper nearest neighbour interactions where considered which for a rectangular lattice would correspond to using a dispersion relation where a ∈ R and b > 0. (Since 2 sin 2 (πy) = 1 − cos(2πy) = 1 − 1 2 (e i2πy + e −i2πy ), it is straightforward to check that then |α(x; L)| = 0 if |x| ∞ > 1, i.e., for points which are not nearest neighbour on a rectangular lattice.) Clearly, ω is twice continuously differentiable and k = 0 is the unique minimum point on T d and ω min = ω(0) = a. Also, D 2 ω(0) = 2π 2 b 1 is proportional to the unit matrix 1 and strictly positive. Thus ω satisfies Assumption 2.4 with T 0 = {0}.
For fixed L ≥ 2, let us parameterize the dual lattice Λ * by k = n L where n ∈ Λ L , in particular, |n| ∞ ≤ L 2 . Since 0 ∈ Λ * , we have ω 0 = ω min = a, and thus the excess energies satisfy Therefore, defining Λ * 0 = {0} and Λ * + = Λ * \ {0}, results in a split of Λ * which is separated by the energy interval [0, 4bL −2 ] and thus has δ L = 0. We also have By a Riemann sum approximation (see Sec. 6 for details) we find that the right hand side is Hence, also ε L satisfies these bounds, and we may apply Theorem 2.2 for all large enough L. We conclude that for d ≥ 5, any p ′ < 1, for d = 4, and p ′ = 1 2 , for d = 3. Since V 0 = 1 and k = 0 is the unique condensate Fourier mode, we can then apply Proposition 2.3 and Lemma 3.1 to conclude that for any finite moment, i.e., for index sets I whose length is less than some arbitrary cut-off, we can approximate where ψ x = φ + x + φ 0 x and φ 0 x = ρ − ρ c (L)e iθ is a constant field with a random phase. As shown in Sec. 3, φ + x behaves like the critical Gaussian field.

Acoustic phonon type interactions
Although not covered by Assumption 2.4, we can also apply Theorem 2.2 directly by explicit estimates to the following dispersion relation which would appear in the theory of acoustic phonons: By the computations in the previous subsection, then again k = 0 is the unique minimum also on finite lattices and the excess energies satisfy Hence, for all d ≥

Dispersion relation with several minima
Let which has 2 d global minima at points with k i ∈ {0, 1 2 } for i = 1, 2, . . . , d. All of these are non-degenerate and thus ω satisfies Assumption 2.4. Also, 0 is a minimum and thus for all L the minimum value is reached, ω min = 0 = ω 0 .
Suppose first that L is odd, say L = 2m + 1 with m ∈ N + . Then if k 0 ∈ T 0 is not zero, it has some component i such that k i = 1 2 . For such i and any n ∈ Z d , we have Therefore, e k ≥ L −2 for all k = 0, and one may modify the earlier estimates to prove that the split with Λ * 0 = {0} has ε L = O(L −4p ′ ) with p ′ chosen as for the nearest neighbour interactions. Thus for odd L one finds a single-component condensate, even though |T 0 | = 2 d .
If L is even, say L = 2m with m ∈ N + , we have 1 2 = m L , and thus T 0 ⊂ Λ L /L. Defining Λ * 0 = T 0 results in a split for which ρ c (L) = O(1) and ε L = O(L −4p ′ ) as above but now the condensate is 2 d -fold degenerate. In addition, e k = 0 for each k ∈ Λ * 0 , so it is not possible to decrease Λ * 0 without reducing the gap size to zero. We can also apply item 2 of Proposition 2.3 and conclude that in the condensate the Fourier modes k ∈ T 0 are distributed uniformly on a sphere and hence the condensate field φ 0 x has strong oscillations in x. In summary, the odd and even lattice sizes behave differently, and it does not really make sense to talk about L → ∞ limit of the measure µ 0 , at least not without first removing the condensate modes. This result becomes more transparent if one computes the coupling function α(x; L): these correspond to next-to-nearest neighbour couplings where α(x) = 0 unless x = 0 or |x| ∞ = 2. Considering each of the d directions separately, one observes that if L is even, the odd and even sites become disconnected, and thus the system decouples into 2 d independent nearest neighbour systems. On the other hand, if L is odd, odd and even sites are coupled by "going around the circle once". In fact, this system corresponds to a single nearest neighbour lattice where the particle labels have been permuted. Since the estimates in Theorem 2.2 are sufficiently strong to distinguish between the two cases, we find that they provide a reliable, relatively simple method of isolating the condensate modes also in this somewhat pathological setup.

Dispersion relations with varying condensate energy
As a straightforward generalization of the above dispersion relations, one can have any point ζ ∈ T d as the global minimum, for instance using Even though the minimum point is unique on the torus, if ζ = 0, it does not need to belong to Λ * and then there might be several minimum points in Λ * .

2L
, and thus in this case ω 0 = d sin 2 π 2L and it is reached whenever n i = ±m for all i. Thus to the unique continuum minimum ζ there are 2 d minimum points in Λ * . In fact, in this case one should choose Λ * 0 to consist of these 2 d points, since then for k ∈ Λ * + the excess energies e k increase like |n| 2 /L 2 where |n| denotes the number of "lattice steps" from k to the set Λ * 0 , leading to similar estimates as in the nearest neighbour case. If L is even for this dispersion relation, ζ ∈ Λ * , ω 0 = 0, Λ * 0 = {ζ}, and the behaviour is identical to the nearest neighbour case.
Considering irrational minimum points ζ can lead to much more complicated situations. For example, suppose r is an irrational number between 0 and 1 2 which has a binary representation b j ∈ {0, 1}, j ∈ N, i.e., suppose that r = ∞ j=2 b j 2 −j where the sequence (b j ) does not converge to zero or one. Set ζ 1 = r and ζ i = 0, for i ≥ 2, and consider the following dispersion relation obtained as a product of two previous ones, with global minima at 0 and ζ. Then for each L, 0 ∈ Λ * , ω 0 = 0, and this value can only be reached at k = 0 on Λ * . However, for values of k = n/L, with n i = 0 for i ≥ 2, we have ω(k) ≤ sin 2 π(n 1 − Lr) L .

and for this value
Hence, by considering a binary sequence with ever less frequent ones and sufficiently large N , the bound can be made proportional to L −p for any p ≥ 2. Depending on how small the term is, the above point k = n/L might or might not belong to the condensate modes Λ * 0 . In particular, there are instances for which e k > 0 but e k ≤ 1 2ρ L −2d , and thus item 3 of Proposition 2.3 can be applied without increasing the magnitude of the error. Hence, the system behaves like a uniformly distributed two-component condensate even though e k , k ∈ Λ * 0 , is not identically zero.

Anisotropic dispersion relations
Another generalization of the above condensate cases is to consider anisotropic dispersion relations. For instance, in addition to shifting the global minimum to ζ ∈ T d we may take any finite collection of points M (ℓ) ∈ Z d , ℓ = 1, 2, . . . , N , choose some weights b ℓ > 0 for them, and define If there is a sufficient variety of points in the collection, for instance, if all unit vectors are included, there is only one global minimum for this dispersion relation, located at k = ζ. The Hessian at this point is equal to so that the second derivative into a direction v ∈ S d−1 at k = ζ is given by Hence, essentially arbitrary asymmetries between different directions may be generated near the minimum point by varying m and b.
In the proof of Lemma 2.5 given in Sec. 6, the uniform upper bound for the number of degrees included in the condensate, M 0 , depends on the dimension but also on the ratio between the maximal and minimal eigenvalue of the Hessian of ω at its minima, i.e., on the maximal anisotropy at these points. The value appearing in the proof typically overestimates the true number of degrees of freedom needed. Let us conclude with two examples which highlight the problems which arise when trying to improve on such general uniform bounds.
For simplicity, let us consider anisotropy in the first two components only. To borrow results from the previous computations, assume that L = 2m + 1 is odd and take ζ = ( 1 2 , 0, 0, . . . , 0). We reparameterize the first component using m 1 := m − L|k 1 | ∈ Z and the sign σ 1 of k 1 . Then 0 ≤ m 1 ≤ m and k 1 = σ 1 |k 1 | = σ 1 (m − m 1 )/L, implying also that | sin(π(k 1 − 1 2 ))| = | sin(π(2m 1 + 1)/(2L))| ≥ (2m 1 + 1)/L. We first consider the nearest neighbour case where the first component has unit weight but the rest have a much smaller weight 1/B, where B ≫ 1. Then for k ∈ Λ L , and denoting n i = Lk i , for i = 2, 3, . . . , d, we find an approximation ω(k) ≈ π 2 (m 1 + 1/2) 2 + n 2 /B L 2 , valid for m 1 /L, |n|/L ≪ 1. Thus the minimum value is reached at the two points where m 1 = 0 and n = 0. However, if m 1 = 0, we then also have e k ≈ π 2 n 2 BL 2 whenever |n|/L ≪ 1. Suppose that we wish to include in the condensate Λ * 0 at least all k with e k ≤ L 1 2 −d (corresponding roughly to the choice κ = 1 2 in Lemma 2.5). Since for some finite L it can happen that B ≥ L d−1 , the number of condensate modes can temporarily be very large. This effect can be traced back to the flatness of constant level surfaces of ω caused by the strong anisotropy.
In the second example, we take also the first direction to have a small weight 1/B but add one more point to the collection: set b d+1 = 1 and M (d+1) = (M 1 , M 2 , 0, . . . , 0) where M 1 , M 2 ∈ N are such that M 2 is odd and M 1 is even. Suppose also that L is large enough, satisfying L ≫ M 1 , M 2 . Then for m 1 /L, |n|/L ≪ 1 we have where, using the assumption that M 1 is an integer, Since M 1 is even, M 2 is odd, and both are positive, we may set n 2 = σ 1 M 1 2 and choose m 1 so that 2m 1 + 1 = M 2 . Setting also n i = 0 for i ≥ 3, we obtain two points in Λ * L for which K = 0 and ω(k) ≈ π 2 M 2 1 + M 2 2 4BL 2 . However, for any point for which K = 0, for instance, if m 1 = 0 = n 2 , we have Therefore, if the system is sufficiently anisotropic, e.g., B ≥ M 2 1 + M 2 2 , it can happen that the minimum point is not the nearest lattice point to the minimum on T d , but it could be found many lattice steps away from it. In contrast to the first example, this effect does not disappear when L → ∞, but will persists for all sufficiently large odd L in the present case.

Proof of the main result, Theorem 2.2
Proof of Theorem 2.2: Consider a fixed L and a split (Λ * 0 , Λ * + ) of Λ * which is separated by the energy interval [a, b], 0 ≤ a < b, and has a relative energy gap δ −1 . We aim at separation in the degrees of freedom related to these two sets.
We begin by simplifying the representation of the Berlin-Kac measure µ 0 . Starting from the simplified form, we then construct a change of variables which will bring it closer to the measure µ 1 . We first shift the position of the δ-constraint to match that in µ 1 . This will introduce a shift in the normal fluid energies which we will need to repair back to the critical ones by a second change of variables. Even after these changes, the measures will differ by a weight function which, however, is close to one with high probability. This property is checked quantitatively in a technical Lemma 5.1, resulting in the estimates in Corollary 5.2. To make the final comparison, we use the change of variables to construct a coupling between µ 0 and µ 1 which, together with Corollary 5.2, will result in the stated bound on their Wasserstein distance.
To begin, let us collect the field values for k ∈ Λ * + into a vector Φ + , corresponding to the normal fluid, and those for k ∈ Λ * 0 into a vector Φ 0 , corresponding to the condensate. We denote .
and we may conclude that in the integrand, in which almost surely N [Φ] = ρV , we have Therefore, we may rewrite where the new normalization constant is related to the one given in (2.7) by Z 0 = V e ω 0 ρV Z ρ . Let ρ c > 0 denote the critical density, measured as an expectation of ρ + over the probability measure (2.8), i.e., over By assumption, e k ≥ b > 0 for each k ∈ Λ * + , and thus this is a well-defined Gaussian measure under which Re Φ k , Im Φ k , k ∈ Λ * + , form a collection of jointly independent random variables, with a zero mean and a variance V 2e k . Therefore, as defined in (2.9). Set then ∆ := ρ − ρ c , which is strictly positive by assumption. Then we define the target measure µ 1 as a product between µ + and a suitably chosen condensate measure: we set whereε depends only on the condensate components Φ 0 , Thus, for any k ∈ Λ * To construct a suitable coupling between the measures µ 0 and µ 1 , we rely on a change of variables and the diagonal concentration trick which we learned from Saksman and Webb, from the proof of Lemma B.1 in [9,Appendix B]. The trick is to construct an explicit coupling between two probability measures by concentrating as much of their common mass as possible in the diagonal of the coupling (Φ ′ = Φ) and distributing any remaining mass as a product on the off-diagonal (Φ ′ = Φ). Although this coupling is seldom optimal, it can provide a good estimate of the Wasserstein distance of the two measures in case most of the mass can be concentrated in the diagonal: note that the diagonal mass does not contribute to the value of the integral defining the Wasserstein distance in (B.1).
In our application of the trick, we first need to change into variables using which the two measures share enough common mass. To find new variables better adapted to compare the measures µ 0 and µ 1 , let us start from the measure µ 0 and denote its integration variable by Ψ. The goal is to find a change of variables Ψ = G[Φ] which would yield a measure close to µ 1 : we try to construct G so that for any observable f we would have ) for some function g which is close to one with high µ 1 -probability. Some preliminary estimates and definitions will be needed to find the right choice, and we postpone the precise construction of the coupling later, until Eq. (5.11).
First, let us recall that ∆ = ρ − ρ c > 0 and define Note that α[Ψ] depends only on Ψ + , and − ρc ∆ ≤ α[Ψ] < 1. Consider the expectation of some continuous function f (Ψ + , Ψ 0 ) with a compact support under the original measure µ 0 [dΨ]. The mass constraint function can be written as On the other hand, the set Ψ + ρ + [Ψ] = ρ has a measure zero, and if ρ + [Ψ] > ρ, the mass constraint cannot be satisfied for any Ψ 0 . Hence, the collection of Ψ with ρ + [Ψ] ≥ ρ has zero measure with respect to µ 0 . Since α[Ψ] < 1 depends only on Ψ + , it is straightforward to make a change of variables More detailed discussion about the validity of this formula can be found in Appendix A.
In particular, we are allowed to apply the formal rule for δ-functions to take out the factor (1 − α[Ψ]) here since the δ-function can be integrated out using Ψ 0 while keeping Ψ + , and hence also α[Ψ], fixed.
In the above change of variables, , and therefore we obtain We then use Fubini's theorem to change the order of Ψ and Φ integrals. Then we can simplify the integral by making a change of variables for Ψ + using a fixed E 0 = E 0 [Φ] and assuming ρ 0 = ∆. In particular, for ρ + [Ψ] < ρ, we have Therefore, We now make a second change of variables to correct for the shift of energies here: Φ k = 1 −ε/e k Ψ k for k ∈ Λ * + . As pointed out above, hereε/e k < 1 and we can resolve the change of variables as easily as in the first case. We find that , and we need to substitute in the integrand which are functions of both Φ + and Φ 0 .
To summarize the result, let us define the functions and, using these, the weight function and the change of variables Then the above computation shows that we can then use dominated convergence theorem to conclude that in fact (5.4) holds for all bounded continuous functions f . Note that due to the change of variables implied by G there is a shift in the position of the δ-weight. Therefore, the formula does not imply that µ 0 or µ 1 would be absolutely continuous with respect to each other (in fact, they are not: the collection of Φ with ρ + [Φ] > ρ has zero measure with respect to µ 0 but its measure is non-zero with respect to µ 1 ; conversely, the collection of Φ with ρ 0 [Φ] ≤ ∆ 2 has zero measure with respect to µ 1 but non-zero measure with respect to µ 0 ). However, as we will prove next in Lemma 5.2, the weight g is close to one with high µ 1 -probability, and although there can be regions where it deviates significantly from one, g remains always uniformly bounded. These estimates will provide sufficient control for using the diagonal coupling trick at the end of the section, in (5.11).
Lemma 5.1 Using the above definitions, we have whereδ (5.8) Proof: Using f = 1 in (5.4), we find that g µ 1 = 1, and thus On the left hand side, the integrand is zero unless − ρc ∆ ≤ α ′ < 0. Thus either V 0 = 1 and the term is always zero, or we may bound in the integrand ( Thus the expectation is bounded from above by , and for ρ ′ ≥ ρ, it holds that α ′ ≥ 1. Therefore, We have obtained the bounds which imply also that where (r) + := r½ {r>0} . We may use this result and similar techniques to derive an upper bound for It remains to estimate The remaining Gaussian expectations can be computed explicitly, yielding for k = k ′ (5.9) Therefore, using the definition in (5.8) and the assumption ρ > ρ c . Together with the earlier estimates this completes the proof of the Lemma.
The assumptions made in the Theorem indeed guarantee that δ ≤ 1 2 andδ ≤ ∆ 2 2 4 V 2 0 ρ 2 , sincẽ δ ≤ 2ε. Hence, we may continue the proof of the Theorem assuming that all of the conclusions in Corollary 5.2 are valid.
The above representation allows to construct a coupling γ between µ 0 and µ 1 by combining the change of variables G with the diagonal concentration trick mentioned earlier. Together with the estimates in Corollary 5.2 this will prove the bound stated for the Wasserstein distance between µ 0 and µ 1 in the Theorem. Explicitly, we define a positive Borel measure γ by its action on bounded continuous functions F (Φ, Ψ)¸as follows: Here (r) + := r½ {r>0} and the normalization factor Z ′ is given by where the second equality follows from the identity g = 1 + (g − 1) + − (1 − g) + and the earlier made observation that g µ 1 = 1 by (5.4). The final equality is then a consequence of the identity |g − 1| = (g − 1) + + (1 − g) + . If f is bounded and continuous and F (Φ, Ψ) = f (Φ), a straightforward computation shows that F γ = f µ 1 . If F (Φ, Ψ) = f (Ψ), a similar computation and using the representation in (5.4) proves that F γ = f µ 0 . Therefore, γ is indeed a coupling between µ 0 and µ 1 .
Using this coupling, we can now conclude that In particular, in the case p = 2, we can simplify the computations by first using the upper Let us begin with the second term on the right hand side. The integrand is zero unless g(Ψ) > 1. In particular, then we must have ρ ′ [Ψ] < ρ, implying that Ψ On the other hand, under the measure µ 1 , it holds almost surely that Ψ 0 2 = V ∆. Therefore, almost surely in the above integrand Taking into account the definition of Z ′ , we find an estimate Using the definitions, we find that Φ 2 = Φ + 2 + Φ 0 2 . Therefore, and using the expectations computed in (5.9) By assumption, this term is bounded by 2V 2 ρ 2 , and we may conclude that By Corollary 5.2, (1 − g) 2 1 2 µ 1 ≤ 2 3 V 0 (ρ/∆) V 0 2δ, and thus the second term is bounded by a constant 3 · 2 6 (ρ + ∆)V 0 (ρ/∆) V 0 times L d √ ε. In addition, using the definition (5.3) and Corollary 5.2, we find for the first term Here, whenever ρ ′ [Φ] < ρ and k ∈ Λ * + , we may use the identity valid for all c > 0, and definition of the relative energy gap, to estimate Therefore, Similarly, we have 1 − √ c = (1 − c)/(1 + √ c) for all c ≥ 0, and thus Since the weight is the same for all components k ∈ Λ * 0 , we find using Corollary 5.2 Therefore, since δ ≤ 1 2 and δ ≤ ε 2 , we can add up and simplify the above bounds to arrive at the bound The assumptions about ε allow to simplify this slightly to make the weight comparable to that of the first term. Namely, since now √ ε ≤ ∆/(4ρ) ≤ 2 −2 , we have proven that Taking the square root, we conclude that the claim in the Theorem follows from the assumptions for the measure µ 1 defined in (5.1) and the explicit form for the constant C 2 stated in the Theorem.
Proof of Proposition 2.3, item 3: If e k = 0 for all k ∈ Λ * 0 , we are back to the case in item 2, and since then µ ′ 1 = µ 1 , its conclusions imply also the conclusions of item 3 whenever 0 ≤ε ≤ 1.
Suppose thus that there is some k ∈ Λ * 0 for which e k > 0 and that there isε ≤ 1 for which e k ≤ 1 2ρ L −dε for all k ∈ Λ * 0 . Clearly, thenε > 0. Comparing the definitions of µ 1 and µ ′ 1 , we have Here g 2 depends only on Φ 0 and satisfies g 2 µ ′ 1 = Z 1 Z ′ 1 . As before, the assumptions are tailored to guarantee that g 1 remains close to one, and then an explicit good coupling can be found between µ 1 and µ ′ 1 . As the small parameter we use here In particular, we now have almost surely under µ ′ Since − ln(1 − c) ≤ 2c for 0 ≤ c ≤ 1 2 , we find using the earlier assumption δ ≤ 1 2 that almost surely under µ ′ for all k ∈ Λ * + . Therefore, Similarly, E 0 [Φ] ρc ∆ ≤ δ ′ , and thus we have obtained almost sure bounds Taking expectation over µ ′ 1 we find also that Combining these two results shows that almost surely under µ ′ Since δ ′ ≤ 1 2 , this yields an almost sure bound We define a measure γ 1 by setting for bounded continuous functions F (Φ, Ψ) Note that, since E 0 is not a constant function, g 1 cannot be a constant function, and hence Z ′′ > 0. As before, it is then straightforward to check that the first marginal equals µ ′ 1 and the second marginal equals µ 1 .
Therefore, γ 1 is a coupling between µ 1 and µ ′ 1 , and we have Again, we estimate Φ − Ψ 2 ≤ 2( Φ 2 + Ψ 2 ), and use the symmetry and definition of Z ′′ to obtain a bound Combined with the almost sure bound in (5.12), we find that Here, Φ 2 Note that we obtained a better dependence onε than on ε in the earlier estimate since we did not need to use the Schwarz inequality above. This was possible here since the weight g 1 is almost surely close to one unlike the weight g which is close to one only with high probability.
Since the Wasserstein metric satisfies the triangle inequality, we can now combine the above bound with the one proved in Theorem 2.2, and conclude that L +ε 1 2 , as claimed in the Proposition.
6 Proof of the existence of the energy gap, Lemma 2.5 Here we suppose d ≥ 3 and consider a dispersion relation ω which satisfies Assumption 2.4. For each L, define ω 0 and e k , k ∈ Λ * , as in Definition 2.1. We choose κ such that 0 < κ < d 2 , if d ≥ 4, and 0 < κ < 1, if d = 3, and fix its value for the rest of the proof. In principle, only the local behaviour of ω around its global minima will matter, but the proof is complicated by the fact that the local behaviour in a neighbourhood of each minima can be different and the values of e k can become mixed between the minima.
The proof will be composed out of several steps. The steps are not completely independent, and each step may use estimates and notations accumulated from the previous steps. Although the proof is not isolated into technical Lemmas, the steps highlight its structure by each having a specific goal, listed in the following: 1. Isolate sufficiently small neighbourhoods in T d around each minimum of ω so that second order Taylor series bounds its behaviour in the neighbourhood.
2. Choose sufficiently large L so that the rectangular grid Λ * (L) has some points in each neighbourhood.
3. Construct a condensate candidate set Λ * 1 by isolating all small energies, with an energy difference from the lowest energy proportional to L −2 . Show that the number of points in this set is bounded by some M 0 which does not depend on κ nor on L 4. Use a "pigeon hole" argument to show that this set must contain a large enough relative energy gap. This will fix the condensate wave number set Λ * 0 , hence also Λ * + , and complete the proof of item 1 of the Lemma. 5. Check that the relative energy gap of the construction satisfies item 2 of the Lemma. 6. Use the previous estimates to find a constant c 2 for the bound (2.18), separately for d = 3, d = 4, and d ≥ 5.
7. Using an approximation with suitable Riemann sums, prove the estimates (2.19) and (2.20) for the continuum limit L → ∞.
(Step 1) Consider a point k 0 ∈ T 0 where ω(k 0 ) = ω min . Since k 0 is a non-degenerate minimum of a twice continuously differentiable function ω, we have ∇ω(k 0 ) = 0 and the eigenvalues of D 2 ω(k 0 ) are strictly positive. Let λ − and λ + denote the smallest and, respectively, the largest of these eigenvalues as k 0 varies through the elements in T 0 . Then 0 < λ − ≤ λ + . By continuity of D 2 ω there is δ > 0 such that δ < 1 2 , and whenever 1 k 0 ∈ T 0 , |k − k 0 | < δ, and p ∈ R d we have As T 0 is finite, we can also assume that the balls B(k 0 , δ) are disjoint, by choosing a smaller δ if this is not true initially. Since the set k ∈ T d |k − k 0 | ≥ δ, for all k 0 ∈ T 0 is compact, the continuous function ω has a minimum value ω 2 which is attained within the set. Then we must have ω 2 > ω min since else the point k at which ω(k) = ω 2 would belong to T 0 . Furthermore, by a Taylor expansion up to second order around k 0 , we find that if k 0 ∈ T 0 and |k − k 0 | < δ, then Step 2) We are going to define a cut-off size L 0 , and consider lattices with L ≥ L 0 . We begin by assuming that L 0 ∈ N + satisfies where c 0 is an L-independent constant depending on ω via λ + , For any such Λ * (L), let us first isolate the minimum value of ω on these points, i.e., set as in the Lemma ω 0 (L) := min k∈Λ * ω(k) . As shown by the examples in Sec. 4, ω 0 may then depend on L, and even if ω would have more than one minimum point on T d , the value of ω 0 could be unique.
Since Λ * forms a rectangular grid with side length 1 L on T d , to any point k ∈ T d there is In particular, ω 0 (L) → ω min as L → ∞. ( Step 3) We recall that e k = ω(k) − ω 0 for k ∈ Λ * , and consider the following set of k which have an energy close to the ground state: Clearly, any minimum point has ω(k) = ω 0 and thus it belongs to Λ * 1 . Hence, Λ * 1 is not empty. In addition, the second inequality in (6.2) implies that if k ∈ Λ * 1 , then ω(k) − ω min = e k + ω 0 − ω min < c 0 L −2 ≤ ω 2 − ω min . Therefore, to each k ∈ Λ * 1 , we can find a unique k 0 ∈ T 0 such that |k − k 0 | < δ and the inequalities (6.1) hold.
For each k 0 ∈ T 0 , let us next consider the values in the subset By the same reasoning as above, we can find n 0 ∈ Z d for which |n 0 − Lk 0 | ∞ ≤ 1 2 . Therefore, is it possible to reparameterize the values in Λ * (k 0 ; L) defining m(k) = (Lk − n 0 ) mod Λ L for each k ∈ Λ * (k 0 ; L). Note that then for all k ∈ Λ * (k 0 ; L) we have Lk = (n 0 + m(k)) mod Λ L and and thus also where ⌊x⌋ denotes the smallest integer in Z less than or equal to x ∈ R. Then M ≥ 0, and there are at most (2M + 1) d values m ∈ Z d which can satisfy |m| ∞ ≤ M . Even if the maximal number of points occur in Λ * (k 0 ; L) ∩ Λ * 1 at each k 0 ∈ T 0 , we conclude that there are at most points in Λ * 1 . (Step 4) We are next going to construct Λ * 0 as a subset of Λ * 1 , and then also |Λ * 0 | ≤ M 0 and 0 ≤ ω(k) − ω min < c 0 L −2 for all k ∈ Λ * 0 . Let us stress that M 0 is indeed independent of L and κ, as required in the Lemma. For simplicity, we now add one more requirement for L 0 : we assume that L d 0 ≥ M 0 + 1, so that if L ≥ L 0 , the complement of Λ * 1 cannot be empty. To isolate those Fourier modes which behave as a condensate, recall that κ has been fixed to satisfy the requirements of the Lemma. Define b ′ L = 1 2 c 0 L −d+κ and r L := L − d−2−κ M 0 , to denote the two bounds appearing in item 2 of the Lemma. Then r L ≤ 1, since L ≥ 1, and the assumptions imply that κ < d − 2. We also have Therefore, if e k < b ′ L , also e k < c 0 2 L −2 , and thus k ∈ Λ * 1 . All of these values of k will be included in Λ * 0 but to find a suitable gap, we might need to include also some values from the remainder set, If Λ * 2 = ∅, we can conclude that e k < b ′ L for each k ∈ Λ * 1 and, if k ′ ∈ Λ * \ Λ * 1 , we have Therefore, we may then define Λ * 0 = Λ * 1 and the corresponding split is separated by [a L , b L ] and has an energy gap δ −1 L , where δ L < r L , Suppose thus that N 2 := |Λ * 2 | > 0, and enumerate the elements k i ∈ Λ * 2 , i = 1, 2, . . . , N 2 , so that o i = e k i form an increasing sequence, o i+1 ≥ o i for all i. Define also o N 2 +1 := min k∈Λ * \Λ * 1 e k ≥ c 0 2 L −2 and o 0 := max k∈Λ * 1 \Λ * 2 e k < b ′ L . Note that at least all minimum points belong to Λ * 1 \ Λ * 2 and our L is large enough so that Λ * \ Λ * 1 cannot be empty. Clearly, also the new sequence of o i , i = 0, 1, . . . , N 2 + 1, is increasing. Therefore, we can use a pigeon hole argument to the relative energies: We have The right hand side is equal to ln r −M 0 L = M 0 ln r −1 L , and since N 2 Let j denote the smallest of such i, and define Therefore, neither Λ * 0 nor its complement Λ * + can be empty, and |Λ * 0 | ≤ M 0 . In addition, 0 ≤ ω(k) − ω min < c 0 L −2 for all k ∈ Λ * 0 , and thus (Λ * 0 , Λ * + ) forms a split of Λ * which satisfies item 1 of the Lemma.
Also by construction, if k ′ ∈ Λ * + , then k ′ ∈ Λ * 2 or k ′ ∈ Λ * \ Λ * 1 , and in both cases e k ′ ≥ b ′ L . Thus we may define b L := min k∈Λ * + e k for which b L ≥ b ′ L . In addition, for any k ∈ Λ * 0 we have Therefore, setting a L := o j , we find that this choice results in a split which is separated by [a L , b L ] and has an energy gap δ −1 L , where δ L ≤ r L . (Step 6) We have now shown that the split (Λ * 0 , Λ * + ) constructed above satisfies also item 2 of the Lemma, and thus only the bounds stated in item 3 remain to be proven. We only need to consider values of e k for k ∈ Λ * + for which we have proven a lower bound e k ≥ 1 2 c 0 L −d+κ . In addition, we may also further divide these values into the sets F (k 0 ) := Λ * (k 0 ; L) ∩ Λ * + , k 0 ∈ T 0 , and F ′ := Λ * + \ (∪ k 0 ∈T 0 F (k 0 )). If k ∈ F ′ , we have by construction a lower bound e k ≥ ω 2 − ω 0 which by (6.2) and item 1 of the Lemma is bounded from below by Let us then consider a fixed k 0 ∈ T 0 and the values k ∈ F (k 0 ). As explained above, we may parameterize these using integers m(k) ∈ Λ L . If |m(k)| ∞ ≥ 1, we have then L|k −k 0 | ∞ ≥ |m(k)| ∞ − 1 2 ≥ 1 2 |m(k)| ∞ . On the other hand, then also This implies that whenever |m(k)| 2 For the remaining values we use the bound in item 2 of the Lemma, and taking into account that |m(k)| ∞ ≤ L 2 , we may conclude that The remaining sum satisfies a bound If d ≥ 5, the terms in the sum over n form an increasing sequence and its value is bounded by L d Collecting the above bounds together we find that there is a constant c > 0, which may vary with d but can be chosen independently of L, such that, if d = 3, where d−2κ > 0. In each of the three cases, the first term in the parenthesis on the right hand side dominates over the second term as L → ∞. Therefore, we can always find a constant c 2 so that the bound in (2.18) holds for the fixed choice of κ.
(Step 7) For the final estimates (2.19) and (2.20), let us first recall the bounds (6.1) satisfied by ω(k) − ω min in a δ-neighbourhood of any of its zeroes. Using the bounds and spherical coordinates shows that the integral (2.19) defining ρ ∞ is finite for all d ≥ 3. Denote the integrand by f (k) := 1 ω(k)−ω min for k ∈ T d \ T 0 , and choose arbitrarily f (k) to be zero otherwise. Suppose that L ≥ L 0 , so that we may use all of the above results, in particular, let us continue to use the split (Λ * 0 , Λ * + ) defined above. Cover T d with closed boxes with side length 1 L and with k ∈ Λ * at the centre of each box, i.e., set for each k ∈ Λ * Clearly, then D k dk ′ 1 = L −d , and thus On the other hand, the points on the torus which correspond to a point in more than one box form a set of zero measure, so we may write Therefore, We estimate the error in two parts: First, the sum over k ∈ Λ * 0 and those k ∈ Λ * + which are sufficiently close to some k 0 ∈ T 0 can be estimate similarly. For the remaining k ∈ Λ * + we use differentiability of f and decay of the error with distance from the singular set T 0 .
We first recall the above split of Λ * + into F ′ and F (k 0 ), and consider the sum over k ∈ F (k 0 ) for some fixed k 0 ∈ T 0 . Computing directly from the definitions, we find that Here k ′ ∈ D k , and thus |k ′ − k| ∞ ≤ 1 2L . Hence, by convexity of D k , Using again the parameterization of k by m(k) for which L|k − k 0 | ∞ ≤ |m(k)| ∞ + 1 2 , by the second bound in (6.1) we may estimate for all ξ ∈ D k and sufficiently large L |∇ω(ξ)| ≤ 2λ + |ξ − k 0 | ≤ 2λ Therefore, if k is close enough to k 0 so that |m(k)| 2 ∞ < 2 4 c 0 λ − + 4, we can conclude that there is an L and k-independent constant c ′ such that for all k ′ ∈ D k |ω(k) − ω(k ′ ) − ω 0 + ω min | ≤ c ′ L −2 .
Thus the contribution from such k satisfies In addition, then |k − k 0 | ≤ √ d|k − k 0 | ∞ ≤ L −1 c ′′ , for an L-independent constant c ′′ > 0. Therefore, the sum of the error terms over these k is bounded by 2c ′ c 0 L d−2−κ times This proves that the error from these terms is O(L −κ ) as L → ∞.
Since for each k ∈ Λ * 0 we know that L|k − k 0 | ∞ ≤ 4c 0 /λ − , an identical argument may be used to conclude that, as L → ∞, Let us next estimate terms k ∈ F (k 0 ) with |m(k)| 2 ∞ ≥ 2 4 c 0 λ − + 4. By the earlier computations, we know that then e k ≥ λ − 2 5 |m(k)| 2 ∞ L −2 . On the other hand, since |m(k)| ∞ ≥ 2, we also have |m(k)| ∞ − 1 ≥ 1 2 |m(k)| ∞ , and thus, if k ′ ∈ D k , we may estimate |k ′ − k 0 | ∞ ≥ |k − k 0 | ∞ − |k ′ − k| ∞ ≥ 1 L (|m(k)| ∞ − 1) ≥ 1 2L |m(k)| ∞ . Thus by (6.1) and both 1/e k and f (k ′ ) have similar upper bounds. It is now useful to expand the difference further and integrate the identity Since D k dk ′ (k ′ i − k i ) = 0 for any i = 1, 2, . . . , d, we have and, therefore, Since ω is twice continuously differentiable, together with (6.9) this shows that there is an L-independent constant C ′ > 0 such that Therefore, denoting m = m(k), using (6.10) to estimate the derivative, and recalling the earlier upper bounds for 1/e k and f (k ′ ), we find that where the constant C ′′ is independent of L. Estimating the sum over possible values of m as above, we thus find that the contribution from these terms is O(L −1 ), for d = 3, it is O(L −2 (1 + ln L)), for d = 4, and O(L −2 ), for d ≥ 5. The first two cases are O(L −κ ), and thus we have proven that as required by the Lemma. It remains to estimate the contribution from the values with k ∈ F ′ . Since then e k ≥ (ω 2 − ω min )/2 > 0 uniformly in k and L, we may simply use the uniform bound for the gradient in (6.11), and conclude that Combining all of the above results, we have thus proven that ρ c (L) = ρ ∞ + O(L − min(κ,2) ) , which completes the proof of the Lemma.

A Definition and basic properties of the δ-constraints
In the text, we often use measures which are defined on R n , n ≥ 2, by the formula µ[ds] = w(s) δ |s| 2 − N d n s . (A.1) where N > 0, w : R n → R is a strictly positive continuous function, and d n s denotes the Lebesgue measure on R n . We first move to spherical coordinates to formally integrate out the δ-function. Then for any continuous bounded non-negative function f : R n → R we would have where we have used shorthand notations s = rΩ = √ tΩ and the assumption that N > 0. Here dΩ denotes the solid angle integration and thus its total mass is finite. On the other hand, the values √ N Ω cover the sphere with radius √ N and centre at the origin, which is a compact set. Since the continuous function f w is non-negative and has a maximum on this sphere, we may conclude that the map from f to the right hand side of (A.2) is a positive linear functional on the space of bounded continuous functions on R n . Since R n is a locally compact Hausdorff space, Riesz representation theorem implies that there is a unique regular Borel measure µ on R n for which (A.2) holds for all continuous f with a compact support, and hence obviously also for all bounded continuous f . This yields the definition of µ as a positive Radon measure. The argument also shows that R n µ[ds]1 = 1 2 N n 2 −1 S n−1 dΩ w( √ N Ω) < ∞. Since w > 0 by assumption, and S n−1 is compact, there is c > 0 such that w( √ N Ω) ≥ c. Thus the value of the integral is greater than zero, and it is always possible to normalize µ into a probability measure by multiplying w with a positive constant, as was assumed in the text.
Consider the open set E := s ∈ R n |s| = √ N , and define for all j ∈ N + the closed sets E j := s ∈ R n ||s| − √ N | ≥ 1 j . Clearly, ∪ j E j = E, and by Urysohn's lemma to each j there exists a continuous function f j such that f j (s) = 1 if s ∈ E j , and f j (s) = 0 if s ∈ E. We can use (A.2) to compute R n µ[ds] f j (s) and since f j ( √ N Ω) = 0 for all Ω, it follows that Therefore, µ(E) = 0 and |s| 2 = N almost surely under µ, as claimed in the text. Finally, let us point out that many ordinary properties of Lebesgue measures are inherited by the measure µ. For instance, we are mainly interested in situations where w and f are continuous bounded functions on R n . Then for any sequence ε j > 0 for which ε j → 0, we can approximate the value of µ[ds]f (s) by replacing the δ-function by a Gaussian function with a standard deviation ε j , i.e., if we define for y ∈ R Then, it is possible to perform a change of variables as usual to the Lebesgue integrals on the right hand side, and compute the limit to get the value of the left hand side. Similarly, one may check that, if w is invariant under permutation of the labels of the vector s or rotations of the space R n , then so is µ. In addition, the following two observations arising from the above limits are used in the text. First, if one makes a scaling of the field s, the result follows standard formal rules of δ-functions: given R > 0, make a change of variables s = Rs ′ , yielding Secondly, if I ⊂ {1, 2, . . . , n}, 2 ≤ |I| < n, we may use Fubini's theorem and spherical coordinates in R I to integrate out the δ-constraint. Let J denote the complement of I, set m = |I|, and apply Fubini's theorem to show that Under the above assumptions, the measures µ 1 and µ 2 have a finite p:th Wasserstein distance W p (µ 1 , µ 2 ) which is defined via the formula where the infimum is taken over couplings γ between µ 1 and µ 2 . There is always at least one such coupling, namely µ 1 × µ 2 . Since x 1 − x 2 ≤ x 1 − a 1 + a 1 − a 2 + a 2 − x 2 , the expectation over γ is finite for this coupling, X×X γ(dx 1 , dx 2 ) x 1 − x 2 p < ∞.