Structure of the condensed phase in the inclusion process

We establish a complete picture of condensation in the inclusion process in the thermodynamic limit with vanishing diffusion, covering all scaling regimes of the diffusion parameter and including large deviation results for the maximum occupation number. We make use of size-biased sampling to study the structure of the condensed phase, which can extend over more than one lattice site and exhibit an interesting hierarchical structure characterized by the Poisson-Dirichlet distribution. While this approach is established in other areas including population genetics or random permutations, we show that it also provides a powerful tool to analyse homogeneous condensation in stochastic particle systems with stationary product distributions. We discuss the main mechanisms beyond inclusion processes that lead to the interesting structure of the condensed phase, and the connection to other generic particle systems. Our results are exact, and we present Monte-Carlo simulation data and recursive numerics for partition functions to illustrate the main points.


Introduction
Condensation phenomena in stochastic particle systems (SPS) continue to be a topic of major research interest. They can be caused by spatial inhomogeneities (see e.g. [1,2] and references therein) or attractive particle interaction in spatially homogeneous systems, which is the focus of this paper. If the total density of particles exceeds a critical value, the system phase separates into a homogeneous bulk and a condensed phase, with a finite fraction of the total mass concentrating in a vanishing volume fraction. First introduced in [3], zero-range processes and related models provided a first example of condensation in homogeneous SPS [4,5,6]. On the level of stationary distributions condensation is characterized by heavy-tail behaviour of stationary weights as first noted in [7,8], which has been used to study the phenomenon in the context of equivalence of ensembles and large deviations [9,10,11].
The inclusion process has been introduced in [12] as a discrete dual to a model of heat conduction, and has later been studied as an interesting model of stochastic transport on its own [13,14,15]. It is a natural bosonic counterpart to the exclusion process where particles are subject to an attractive inclusion interaction in addition to independent diffusive motion. It can also be interpreted as a multi-species version of the Moran model of population genetics [16], where the inclusion interaction corresponds to selection, and diffusion to mutation dynamics. The inclusion process is part of a larger class of models introduced in [17] that exhibit factorized stationary distributions, which has recently been extended [18]. Condensation in the inclusion process has first been studied in [19] for inhomogeneous systems. Condensation in homogeneous systems only occurs if the diffusion strength vanishes with the system size. While such scaling of system parameters can lead to non-equivalence of ensembles and discontinuous behaviour as established for a toy zero-range model in [20,21], this is not the case for the inclusion process and small diffusion or mutation rates are in fact very natural in many applications. The dynamics on various time scales have been established on a rigorous level in [22,23], restricted to finite lattices in the limit of diverging particle density. In the thermodynamic limit with a finite limiting density there are only heuristic results so far, covering the dynamics of condensation in the inclusion process [24] and extensions with stronger particle interactions and instantaneous condensation [25,26].
In particular, the stationary behaviour of the inclusion process in the thermodynamic limit has not been characterized so far, which is the main aim of this paper. We establish the equivalence of ensembles, and show that for vanishing diffusion strength the inclusion process exhibits condensation for any positive particle density. While the bulk of the system is empty, the condensed phase can exhibit an interesting hierarchical structure following the Poisson-Dirichlet distribution. The latter was originally introduced in the context of population genetics [27,28], and has later been identified as the generic stationary distribution of split-merge dynamics [29,30], which is related to its appearance in cycle length distributions of random permutations [31,32,33]. It has further been observed (though not identified) more recently in systems of interacting diffusions [34,35], but to our knowledge is a novelty in the context of condensation in SPS. In general, the condensed phase in SPS with stationary product distributions concentrates on a single lattice site [7,8,10,36]. A spread over multiple sites has only been observed in versions of zero-range processes which include an effective (soft) cut-off for site occupation numbers [37,38], or in models with pair-factorized stationary states [39,40] where it occurs naturally due to spatial correlations. Poisson-Dirichlet statistics arise when the diffusion parameter in the inclusion process scales with the inverse system size, and we also establish complete condensation for smaller diffusion where all particles concentrate on a single site, and a universal exponential law for intermediate scales.
Our main results on the structure of the condensed phase are derived using size-biased sampling of occupation numbers, which is related in a natural way to the Poisson-Dirichlet distribution as reviewed in Section 3.2. While this point of view is standard in population genetics (see e.g. [41]), this approach also provides a strong tool to study the condensed phase in SPS where it has not been used so far. After introducing the basic notation and concepts in Section 2, we derive our main results on condensation and the typical structure of the condensed phase for the inclusion process in Section 3. Our results are rigorous and derivations are presented in a general, transferable way, and we show simulation data for illustration. We include results on large deviations of the condensed phase in Section 4, and conclude with a discussion of the main points and relations to other models in Section 5. In Appendix A we show that under a general definition of condensation the system phase separates into a homogeneous bulk and a condensed phase, and that condensation implies divergence of higher moments. In Appendix B we comment on Monte-Carlo dynamics to generate stationary samples, and on differences between one-dimensional and mean-field geometries.
2 Mathematical setting 2.1 Condensation in homogeneous particle systems We study stochastic particle systems (SPS) on a finite set of spatial locations/sites Λ of size |Λ| = L, which can for example be a regular lattice with periodic or closed boundaries. The system has a fixed, finite number of N particles, and we denote configurations by η = (η x : x ∈ Λ), η x ∈ N 0 , and the state space E L,N = η : x∈Λ η x = N denotes the set of all configurations. The dynamics should be irreducible on E L,N , so that the process has a unique (canonical) stationary distribution π L,N . We assume that π L,N is spatially homogeneous, i.e. the single-site marginals π L,N [η x ∈ .] do not depend on site x, and in particular this implies that the density (the expected number of particles per site) is given as We are interested in large-scale condensation phenomena of the system in the thermodynamic limit L, N → ∞ such that the density converges as N/L → ρ ≥ 0, which in the following we often denote by lim N/L→ρ to simplify notation. We assume that in this limit finite marginals of π L,N converge, and we denote the limiting single site marginal as a distribution on N 0 by This convergence of distribution functions is equivalent to weak convergence, i.e.
for all x ∈ Λ and bounded, continuous test functions f ∈ C b (N 0 ). With (1) the first moment η x L,N → ρ converges in the thermodynamic limit, and by Fatou's Lemma this implies for the first moment of the limiting distribution that This is usually called the background or bulk density (indicated by the subscript) as is explained below. Strict inequality above is possible since f (η x ) = η x is an unbounded function on N 0 , and implies that locally the system loses mass in the limit, providing the following standard definition of condensation.
Definition 1. A system with canonical distributions π L,N exhibits condensation in the thermodynamic limit N/L → ρ with background density ρ b as in (4), if ν ρ exists as defined in (2) and i.e. typically all particles in the system concentrate on a single lattice site. If ν ρ exists for all ρ ≥ 0, the systems is said to exhibit a condensation transition with critical density ρ c ≥ 0, if Condensation in the above setting has been established in various SPS, including zero-range processes and related models (see e.g. [42,43] and references therein). It has been shown on a case-by-case basis that ρ b is monotone increasing with ρ and there exists a unique critical density ρ c ∈ [0, ∞] in the sense of (6). One sufficient general condition is monotonicity of the dynamics for the underlying particle system. But in principle more complicated behaviour such as nonmonotonicity of ρ b cannot be ruled out, even though we are not aware of any generic examples in the thermodynamic limit. For condensation on finite lattices possible non-monotonicity of ρ b has been established and discussed e.g. in [44,45] and references therein.
As is discussed in more detail in Appendix A, the interpretation of ρ b < ρ is that the system phase separates into a homogeneous bulk phase and a condensed phase. The latter concentrates on a vanishing volume fraction but contains a non-zero fraction ρ − ρ b > 0 of the total mass in the system, and is usually simply called the condensate. Depending on the specific example and the nature of π L,N the condensate may cover only a single lattice site (see e.g. [10,36]) or a subextensive volume [39,40]. In most cases the bulk density ρ b = ρ c is equal to critical one, but there are also models with ρ b < ρ c , such as zero-range toy models with size-dependent rates [20,21] which introduce an effective long-range interaction and lead to non-equivalence of ensembles. Complete condensation has been established for particular zero-range processes in [7,46] and for inclusion processes in a fixed volume in [23].
As we show in Appendix A in Proposition 6, condensation as defined above implies in particular divergence of higher moments η a x L,N with a > 1. This has been used in some papers as a definition of condensation often using a = 2 [47,48]. The converse does not hold, since moments of limiting distributions ν ρ with heavy tails can diverge also in the absence of phase separation, so we stick to Definition 1 to characterize condensation. For condensing systems, divergence of higher moments is due to the contribution of diverging occupation numbers in the condensed phase which is not described by the limiting distribution ν ρ .

Models with stationary product measures
From now on we focus on stochastic particle systems which are defined by a generator of the form for continuous test functions f ∈ C(E L,N ). This defines a continuous-time Markov process on the state space E L,N jumping from configurations η to η xy where one particle moves from site x to y. The spatial dependence of the rates is given by a multiplicative factor p(x, y), which we take to be an irreducible transition kernel for a single particle on Λ. The interaction between particles is determined by the function u which depends only on the occupation numbers of departure and target site of a jump event. To ensure irreducibility of the process on E L,N we assume u(m, n) ≥ 0 and u(m, n) = 0 if and only if m = 0 .
To ensure spatial homogeneity at stationarity we assume x∈Λ p(x, y) = x∈Λ p(y, x) for all y ∈ Λ , which is a slight generalization of translation invariance on regular lattices. This type of models have first been introduced in the seminal paper [17]. It is well known (see also [2,18]) that they exhibit stationary product measures if and only if and either p(·, ·) is symmetric, or u(n, m) − u(m, n) = u(n, 0) − u(m, 0) for all n, m ≥ 0 .
In this case, normalizing the weights w(n) = n k=1 u(1, k) u(k, 0) leads to product distributions which are stationary for all φ ≥ 0 such that the normalizing partition function z(φ) < ∞. Note that these 'grand-canonical' distributions are supported on the extended state space E L = η : η x ≥ 0 without fixing the total number of particles. The expected number of particles per site is given as a monotone increasing function of φ as For such processes we have explicit representations of the canonical distributions as conditional grand-canonical distributions which in fact do not depend on the choice of φ > 0. This leads to the useful form with canonical partition function Z L,N . This implies in particular that for ρ < ρ c the limits (2) of single-site marginals are given by the marginal ν 1 φ with φ ≥ 0 such that R(φ) = ρ. For models of the above type, the condensation transition as given in Definition 1 is equivalent to existence of φ c < ∞ such that z(φ) = ∞ for all φ > φ c , and R(φ) → ρ * < ∞ as φ → φ c (see e.g. [2] for a detailed discussion). Examples of this type studied so far include zero-range processes with u(m, n) = u(m) and decreasing rates u(m) [8,5,36]., where ρ * = ρ c = ρ b . If the rates can depend on the system size, the transition can also be discontinuous with ρ b < ρ c < ρ * where grand-canonical distributions with densities in the range (ρ c , ρ * ) are metastable [20,21]. More recently, condensation has also been studied for inclusion processes [19] and explosive condensation models [25,43,26] with rates of the form If γ > 2 the system exhibits a condensation transition for all d > 0 with ρ c > 0. For inclusion processes we have γ = 1, and this case is covered in more detail in Section 3.1. In all generic systems with stationary product measures studied so far, we have and the condensed phase concentrates on a single lattice site. In Section 3 we will see for the inclusion process that the condensed phase can extend over more than one site and have an interesting hierarchical structure, which has not been observed for condensing particle systems so far.

Size-biased sampling
Since the condensed phase concentrates on a vanishing volume fraction, the limiting marginal probabilities for a fixed number k of occupation numbers converge to the distribution of the bulk in a condensed system. As explained above, for models with stationary product measures this is usually given by the maximal product measure with critical density ρ c = R(φ c ) and we have (cf. [10]) for all x 1 , . . . , x k ∈ Λ and n 1 , . . . , n k ≥ 0. This asymptotic equivalence of canonical and grand canonical ensembles (distributions) has been established for a large class of models [9,2], and implies weak convergence w.r.t. local, bounded test functions as in (3).
Since it contains a non-zero fraction of all particles, the distribution of the condensed phase can be accessed via size-biased permutations of particle configurations. This can be interpreted as picking a particle uniformly at random and sampling the occupation number η x at its location x. The larger η x , the more likely it is to pick site x in this way. Formally, this can be defined recursively (see e.g. [41], Section 2.4).
Definition 2. For given η ∈ E L,N pick a random permutation σ : Λ → Λ of the lattice indices as . . . and so on .
For models with canonical distributions of the form (12), the distribution of the first size-biased marginal is given by where the stationary weight w(n) is re-weighted proportional to n and re-normalized. Here and in the following we use the convention Z L,k = 0 for all k < 0, so we can omit indicator functions of the form 1 n≤N to simplify notation. Note that the first identity in (14) with the re-weighted marginal probability holds in general, but the second one only because π L,N is a conditional product measure of the form (12). For a two-site size-biased marginal we then have Generalizing to the k-site case we get which includes k = L to get the full distribution ofη with Z 0,n = 1 for all n ∈ {0, . . . , N }. Note that due to size-biased re-ordering, the distribution ofη and its marginals is of course not spatially homogeneous.
To our knowledge, essentially all previous studies of condensation in homogeneous particle systems focus instead on the (decreasing) order statisticŝ and in particular the maximum occupation number η (1) (see e.g. [49,10,21,50]). We will see below how this is related to size-biased sampling, and that the latter is very suitable to study condensation in systems with ρ b = 0 such as the inclusion process and related models. A sizebiased sampling approach can also be useful in models with ρ b > 0 to study the dynamics of the condensed phase and phase separation as recently shown in [51].

The Poisson-Dirichlet and GEM distribution
The Poisson-Dirichlet distribution has been introduced by Kingman in the context of population genetics [27,28] and has since occurred in a variety of applications, such as split-merge dynamics [29,30] and random permutations [31,32,33]. It is a one-parameter family of probability measures defined on the set of ordered partitions of the unit interval It can be characterized for instance as a scaling limit of Dirichlet random variables which form a finite partition of [0, 1], or via scale invariant Poisson processes (see Chapter 2 in [41] for details).
One of the most accessible characterization in terms of practical use is related to the GEM distribution, named in [52] after Griffiths [53,54], Engen [55] and McCloskey [56], which is defined as follows.
, and the uniform distribution as a special case for α = 1. On the set of (unordered) partitions which corresponds intuitively to breaking off a fraction 1−U 1 from the unit interval and continuing this process recursively with the remaining interval. The law of V on ∆ is called the Griffiths-Engen-McCloskey distribution GEM(α), and the corresponding order statisticsV on ∇ has Poisson-Dirichlet distribution PD(α). Alternatively, given a PD(α) distributed partition V on ∇, its size-biased permutationṼ has GEM(α) distribution on ∆ (see e.g. [41] for details). Note that the construction (17) leads to a hierarchical structure of a GEM(α) partition V , and the paramter α > 0 controls the expected size of the components. The expectation of Beta(1, α)distributed random variables U i is 1 1+α , so for small α the size of the first component V 1 is larger and the hierarchy stronger. For larger α the expected sizes of the components are more similar, but always show a strict order since This shows that in fact V ∈ ∆ and that the expected component sizes of V k vanish as k → ∞, and is also a useful relation to numerically test for GEM distributions (see Section 3.4). Carrying over the product topology from [0, 1] ∞ , weak convergence of probability distributions on ∆ and ∇ is equivalent to convergence in distribution of finite marginals (V 1 , . . . , V k ) of partitions. By Theorem 2 in [57], convergence in distribution of a sequence of size biased parti-tionsṼ i → V on ∆, implies convergence in distribution of the corresponding ordered partitionŝ V i →V , and V is a size-biased permutation ofV . In Section 3.2 we will use this fact and that rescaled particle configurations 1 N η ∈ ∆ can be interpreted as finite partitions of the unit interval, to derive our main results. Note that in a condensing system with ρ b < ρ (4) the partitions 1 N η in the thermodynamic limit only converge on the extended space which allows for the loss of mass due to phase separation (see Proposition 6 in Appendix A). On the other hand, size-biased permutations capture the condensed phase and the full mass of the system, and 1 Nη converge on ∆, as we will establish in the next Section.

Condensation in the inclusion process
The inclusion process is a stochastic particle system of type (7) with rates which was first introduced in [12] in the context of energy/mass transport. Another important interpretation of this model is as a multi-species version of the Moran model of population genetics, which describes the selection-mutation dynamics of a population of N individuals which can take L different types [58]. Here the parameter d describes the mutation rate, which is small compared to the reproduction rate of the system and is often taken to depend on the system size d = d L > 0 and vanish as L → ∞. Results in [23] show that for fixed L as N → ∞, complete condensation occurs if d = d N 1/ log N . The thermodynamic limit has not been studied so far, and in this section we will establish a complete picture covering all densities ρ > 0 and possible scaling regimes of the parameter d.
The inclusion process satisfies conditions (8) and (9) and has stationary product measures of the form (10) with weights This also leads to an explicit formula for the canonical distributions which can be identified as a Dirichlet multinomial distribution (cf. [41], Chapter 1). These have been studied in detail in the context of urn models and have interesting structural properties and symmetries, but in the following we only make use of the asymptotic form of the partition function so that our results can be more easily translated to other systems. Our main results in the thermodynamic limit N, L → ∞, N/L → ρ ≥ 0 are derived in the next subsections, and can be summarized as follows: we have asymptotic equivalence of canonical measures and stationary product distributions (10) with φ ∈ [0, 1) such that R(φ) = ρ (11), and there is no condensation.
2. d → 0: the inclusion process exhibits a condensation transition with ρ c = 0 as follows: (a) d → 0 and dL log L → 0: complete condensation (b) d → 0 and dL → α ∈ (0, ∞): the condensed phase exhibits a hierarchical structure on the scale N given by the PD(α) distribution.
(c) d → 0 and dL → ∞: the condensed phase consists of order dL sites with independent occupation numbers of order ρ/d and exponential distribution.
We will make use of the asymptotic behaviour of w(n) (20) and the partition function Z L,N , which can be derived by standard Stirling approximations from (22). Particularly useful in the following is the asymptotic behaviour of the ratio 2 which holds for all sequences a = a L and b

Equivalence of ensembles and condensation
We assume d > 0 constant or d L → d > 0. In this case (21) implies that there exist grandcanonical distributions for any density ρ ≥ 0, by choosing such that R(φ) = ρ. In this case the equivalence of ensembles can be established most naturally in terms of the specific relative entropy between canonical and grand-canonical distributions (see e.g. [9,2]) Computing the leading order terms of Z L,N from (22) with standard Stirling formula we get (24) we see that (25) vanishes in the thermodynamic limit since log z(φ) = −d log(1 − φ). Convergence in specific relative entropy implies convergence of finite marginals [2], i.e. for any fixed k > 0 and n 1 , . . . , n k ≥ 0 The latter limit could also be computed directly in analogy to other results below, but the route via the equivalence of ensembles is more robust since only the logarithm of the partition function has to be controlled to leading order.
An alternative representation of the specific relative entropy is given by (see e.g. [9]) Since the second moment of the single-site marginal ν φ is finite when φ(ρ) = ρ/(ρ + d) < 1, one can show that this vanishes in the thermodynamic limit even without computing the asymptotics of Z L,N , by applying a local central limit theorem to the right hand side (see for example [59,60]).
In the case d → 0, (21) implies that there are no grand-canonical distributions for any positive density and therefore we expect a condensation transition, following the discussion after (12). We summarize this in the following result proved by a direct computation. Proposition 1. Provided that d → 0 as L → ∞, the inclusion process exhibits a condensation transition as given in Definition 1 with ρ c = ρ b = 0, i.e. we have for all fixed n ≥ 0 and ρ ≥ 0 Proof. We have for any n ≥ 0 fixed since with the scaling (27) given below for the partition function in the case The same holds with (33) in the case dL → ∞. From (20)  So locally the system appears empty in the limit, and a further investigation of the condensed phase will be given below in terms of size-biased samples. Note that in the proof we only use the asymptotic behaviour of ratios of partition functions and the fact that w(n) = O(d) for all n > 0.

GEM scaling limit and complete condensation
We study the distribution of the condensed phase by computing size-biased marginals in the case dL → α ≥ 0. Using (23), the leading order behaviour of the partition function is given by Recall from Section 2.4 that 1 N (η 1 , . . . , η L ) is a (finite) partition of the unit interval.
Equivalently, size-biased samples converge as 1 Proof. Following the discussion in Section 2.4 it suffices to show that for all k ≥ 1, With the characterization in (17) this establishes convergence in distribution of size-biased permutations to GEM(α), which is equivalent to (28).
For α → 0 the above limiting distribution PD(α) degenerates, with the mass fraction of the maximal occupation number tending to 1. Under a mild additional assumption dL 1/ log L on the scaling, this statement can be significantly strengthened to ensure complete condensation in analogy with results in [23] for fixed L as N → ∞.
Proposition 2. In the thermodynamic limit L, N → ∞ such that N/L → ρ with dL log L → 0, we have complete condensation in the sense of (5), i.e. π L,N max x∈Λ η x = N → 1.
Proof. It suffices to show for the first size-biased marginal that π L,N [η 1 = N − n] → δ n,0 for all n ≥ 0 , which implies the same for the maximal occupation number. Using again (14), (20) and (27) we have for all n ≥ 0 The first term tends to 1 for all n ≥ 0 and the second scales like Then Z L−1,0 = 1 and Z L−1,n dL/n → 0 for n ≥ 1, which implies (31).

Intermediate scales
Assuming that d → 0 with dL → ∞ we cannot easily apply (23) for asymptotic estimates, and after a slightly more involved Stirling approximation the leading order of the partition function (12) is While in principle this scaling together with that of the weights (20) fully determines the asymptotics of size-biased distributions, it turns out to be more useful to use particular cancellations when estimating ratios of partition functions to proof our main result below. The above scaling implies for all fixed n ≥ 0 that which we have used to prove Proposition 1.
Theorem 2. In the thermodynamic limit L, N → ∞ such that N/L → ρ, d → 0 and dL → ∞, we have for any ρ > 0 and fixed k ∈ N i.e. marginals of rescaled size-biased samplesη converge in distribution to independent exponential random variables with mean ρ.
Proof. To establish convergence of the joint density we have to show for all n 1 , . . . , n k such that In an analogous computation to (30), we get where we used the asymptotic behaviour of the stationary weights (20), and arranged the contributions of the ratio of partition functions in a convenient way. Since d → 0 we have A → 1 and D (dL) dk using (23). The latter does not apply to the other two terms since dL → ∞, and a more careful (but straightforward) analysis leads to and analogously, using Therefore we get and inserting into (35) implies (34). So the condensed phase for any intermediate scale with dL → ∞ has a non-hierarchical structure, locally consisting of independent clusters of average size ρ/d. This general behaviour across a large range of scaling regimes is quite remarkable. However, since dN → ∞, the rescaled size-biased samples dη do not form a partition of a compact interval (as in the previous case of dL → α). So our result on convergence of finite marginals does not imply weak convergence of the full sequence dη, and we only get a local characterization of the condensed phase. Since the total mass of the condensed phase is N , and k in the above result can be chosen arbitrarily large, this at least implies that the volume fraction covered by the condensed phase scales at least as d to leading order.
Note also that the limiting exponential distribution of a rescaled cluster in the condensed phase is not itself the size-biased distribution of a random variable, since this would have density ρ x This cannot be normalized due to divergence at x = 0, and suggests that the condensed phase does not simply consist of O(1/d) clusters with i.i.d. occupation numbers. If, conditional on the volume covered by the condensed phase, one could probe a cluster size without size bias, it would vanish on the scale 1/d. This suggests that the volume fraction covered by the condensed phase could indeed be larger than d with many clusters on smaller scales that do not contribute to the total mass to leading order. Details of this behaviour are most likely depending on the particular scaling of d, and are very hard to access analytically or even to observe numerically.

Simulation results
We illustrate our main results with Monte Carlo simulations of the inclusion process at stationarity. Recall that with (7) and (19) the generator describing the dynamics is given by We initialize the system by distributing N particles independently, uniformly at random on the lattice. The stationary distributions π L,N (22) are conditional product measures for all translation invariant or symmetric choices of p(x, y). On the complete graph with p(x, y) ≡ 1 L−1 one can implement a simple rejection based algorithm to simulate the dynamics, which we summarize in Appendix B and call CG dynamics in the following. We also implemented the standard Gillespie algorithm [61] to simulate totally asymmetric dynamics on a one-dimensional lattice with periodic boundary conditions, i.e. p(x, y) = δ y,x+1modL , which we call TA dynamics.
In both geometries, the number of empty sites grows in time and the particles concentrate in clusters, which exchange particles. Smaller clusters disappear and the average cluster size increases, driving a coarsening process. This leads to stationary distributions where either a balance between cluster aggregation and break-up is reached, which is the case for d → 0 and dL → α ∈ (0, ∞], or the system saturates with a single cluster remaining for dL → 0. While for CG dynamics clusters can directly exchange particles, for TA dynamics the clusters are isolated and the coarsening process is limited by particle transport, which has been studied in [24]. Still, once stationarity is reached (see Appendix B for more details on this), both dynamics provide samples from the same stationary distributions π L,N which do not have any spatial correlations. Two typical stationary configurations for CG and TA dynamics are illustrated in Figure 1.
Since the complete condensation regime dL → 0 has been studied numerically before [24], we focus on the hierarchical results in Theorem 1 with dL → α ∈ (0, ∞), and comment on intermediate scales with dL → ∞ from Theorem 2 later. There are no particularly useful results for marginals of Poisson Dirichlet random variables, so we compare size-biased samples of stationary configurationsη to the GEM(α) distribution. For each k ≥ 1, we define the mass fraction remaining on all sites with index > k in the size-biased sampleη. With the representation (17) of the GEM distribution, Theorem 1 implies that for each k ≥ 1 the random variable R k converges in distribution to a product of i.i.d. random variables 1 − U i , where U i ∼ Beta(1, α). With (18) this implies that which is illustrated in Figure 2 for various values of α and ρ. We see good agreement for small values of k, but in addition to statistical errors there are large systematic finite-size effects (illustrated for α = 10 in Fig. 2 right). These are related to the small amount of non-zero occupation numbers #(η) in typical stationary configurations, leading to a systematic underestimation of R k L,N . This can be derived from Ewen's sampling formula (see e.g. [41], Theorem 2.8), where #(η) corresponds to the number of different types in a finite sample of size N from a Poisson-Dirichlet population, and can be shown to scale as This logarithmic scaling can be seen in Figure 2 (right). Convergence of #(η)/ log N to α is very slow on the scale 1/ √ log N (see [41], Theorem 2.11), so this is not a good estimator for α, and the comparison based on (38) in Figure 2 is more useful.
For small values of d and finite system size L there is a data cross-over to the condensed regime, with very few occupied sites. This is very hard to access numerically, but theoretically, a single condensate site is fully consistent with the limit α → 0 in (38). For large values of d there is a data cross-over to the intermediate regime d → 0 with dL → ∞, which is covered by Theorem 2. This cross-over is illustrated in Figure 3 (left), where we plot the empirical tail distribution of dη i for i = 1, 2, 3 based on 5 size-biased re-samplesη of 100 independent samples of η from π L,N using CG dynamics. We pick small values for i in order to use the same procedure for all values of d including 1/L. For larger d, larger values for i lead to the same behaviour, and tests reveal that the samplesη i are indeed uncorrelated. For fixed density ρ = 1 we see that agreement with the exponential tail, e −u/ρ predicted by Theorem 2, improves with increasing d up to d = 32/L = 1/ √ L. In Figure 3 (right) for this value of d we see good agreement with the predicted tail for several densities ρ.
If we increase d further the system crosses over to the behaviour for constant d > 0, where we have equivalence of ensembles to grand canonical measures ν φ as explained in Section 3.1. Rescaled size-biased variables dη i will then take discrete values in dN given by the size-biased version of ν 1 φ (10), i.e.
Note that for d → 0, we have from the right-hand side of (39) that since nw(n)/d → 1, z(φ(ρ)) → 1 and φ(ρ) n → e −u/ρ . So the size-biased grand-canonical distributions scale consistently with the result in Theorem 2.

Large deviations
In Section 3 we derived the typical stationary behaviour in the condensed phase, and will now study the statistics of large deviations of the maximum occupation number. The most interesting case of complete condensation is covered in Section 4.3, for completeness and to introduce the main concepts of large deviations we first cover the non-condensing and intermediate regime.
Note that in the hierarchical regime with dL → α ∈ (0, ∞), the typical size of the maximum is of order L and it can take any value on that scale with non-vanishing probability.

Non-condensing regime
We first treat the case d → d > 0 as L → ∞ for which we have equivalence of ensembles. We find that the probability of observing maximum site occupations of order L decays exponentially in L, as would be the case under the grand-canonical measures ν φ (10) where the site occupations are i.i.d. with finite mean and variance. We characterise this decay in terms of the large deviation rate function I ρ (m), which is informally defined as This is made precise in the following result which characterizes the local large deviations, and provides an explicit form for the rate function. The results in this section imply large deviation principles in the usual sense, see for example [62,59] and references therein for details.
Proposition 3. If d → d > 0 and m ∈ [0, ρ), then in the thermodynamic limit where Proof. The proof follows a standard tilting argument which we only sketch here, more details can be found in [59]. First note that for grand-canonical measures (10) with φ, φ ∈ [0, 1) and recall that ν 1 φ [η 1 = n] = w(n)φ n /z(φ) with weights w(n) given in (20) and normalization Since the grand canonical single site marginals ν φ have finite exponential moments for each φ ∈ [0, 1), we may choose a sequence of φ such that the expected number of particles per site under in the thermodynamic limit, with Φ given in (24) as the inverse of R(φ) (21). Since ν φ has second moment which converges to η 2 x Φ(ρ−m) < ∞, we may then apply a standard local limit theorem for triangular arrays (see e.g. [60]) to show that with this choice of φ the first term on the second line vanishes. The same is true for the term in the third line choosing φ = Φ(ρ) = ρ/(ρ + d) by equivalence of ensembles proved in Section 3.1, and we can conclude using (42) and taking limits.

Intermediate scales
For the intermediate scale, d → 0 with dL → ∞, we cannot directly apply a local limit theorem for triangular arrays as in the previous case, since with (21) there are no grand-canonical measures with positive densities. Here we will make use of Stirling's approximation of the partition function (32) and truncation arguments to derive the large deviations behaviour of the maximum η (1) . In this regime the probability of observing a maximum site occupation of order L has asymptotic decay rate dL.
Note that this rate function is consistent with the limit d → 0 of I ρ (m)/d in (41), but the case d = 0 is not covered by Proposition 3 and needs a separate proof.
Proof. We firstly extract the contribution due to the maximum site occupation by observing that where Z where we use monotone decay in N of the weights w(N ) (20) and the partition function Z L,N (12), which holds since dL > 1 for L sufficiently large. This leads to a lower bound on Z By applying (32) together with (20) we find that in the thermodynamic limit if M/L → m > 0. We conclude by taking logarithms, and again applying (32) together with (20).
We illustrate the rate function for this and the following case of complete condensation in Figure 4 and compare to exact numerics obtained for finite system size. The latter are generated using the right-hand side of (44) and the recursive structure of the canonical partition functions The same relation holds for truncated partition functions (see [59] for details). With initial condition Z 1,n = w(n), n = 0, . . . N and choosing k = L/2 this can be used effectively in an iteration to reach large system sizes.

Complete condensation
In the case dL 1/ log L we have complete condensation as stated in Proposition 2. We characterise the large deviations of the maximum on the scale L, which turn out to be dominated by the probability of observing the smallest number of occupied sites required to realise a given size of the maximum. To derive this result, it is easier to first understand probabilities of size-biased configurations in analogy to Theorem 1.
Proposition 5. In the thermodynamic limit N, L → ∞ such that N/L → ρ, with dL log L → 0 we have x k ∈ (0, 1). Furthermore, in the same limit Proof. In analogy to (30) in the proof of Theorem 1 we get where we also used The remaining mass, N − k i=1 n i , is of order L since x 1 , x 2 , . . . , x k ∈ (0, 1). Therefore, applying (27) to the ratio of partition functions we find where we used N dL , (N − k i=1 n i ) dL → 1, since dL log L → 0. This completes the proof of (46).
Proof. The result follows rather directly from the previous proposition, and we sketch the main calculations required. From (47) in Proposition 5 we see that π L,N η (1) = M ;η 3 > 0 decays to zero faster than d.
If we take d = L −γ with γ > 1 then we may summarize Corollary 1 in terms of a large deviation rate function (with speed log L), as follows This is illustrated in Figure 4 (right) for γ = 2.

Summary
We have established a complete picture for condensation in the inclusion process in the thermodynamic limit, and characterized the condensed phase in several regimes using size-biased sampling of configurations. Our results cover the full scaling regime of the diffusion parameter d, only excluding some narrow bands of size log L/L for complete condensation and large deviations. A particularly interesting regime is the hierarchical structure discussed in Section 3.2 related to the GEM and the Poisson-Dirichlet distribution. This is well established in the context of population genetics [41], where the full structure of Dirichlet multinomials has been exploited to derive very detailed results for Moran models, which can be interpreted as inclusion processes. We derived our results using only the most general properties of inclusion processes so that our approach can be easily transferred to other systems, and we give more details in the next subsection. The Poisson-Dirichlet distribution has been identified as the unique stationary distribution of split-merge dynamics of clusters [29,30], where split and merge rates are proportional to cluster sizes. Our results show that the inclusion process can be seen as a generic 'monomer exchange' version of such dynamics, where now only single particles are exchanged but with the same proportionality of rates in the inclusion interaction. It would be very interesting to investigate this connection in detail in the context of Poisson-Dirichlet diffusions in analogy to [63]. The crucial prerequisite to see Poisson-Dirichlet statistics in particle systems such as the inclusion process is the asymptotic behaviour of the stationary weights (20), w(0) = 1 , w(n) = O(d) for all n ≥ 1 as L → ∞ , and w(n)/d n −1 as n → ∞ .
The fact that w(n) vanishes proportionally to d as L → ∞ for all n > 0 leads to ρ b = ρ c = 0 and condensation with an empty bulk. The structure of the condensed phase is determined by the 1/n decay of stationary weights for large occupation numbers. This is quite robust, as is discussed in the next subsection. There we summarize some previous results and connections to other particle systems with Poisson-Dirichlet statistics.

Other particle systems with Poisson-Dirichlet statistics
The model studied in [34] consists of N particles moving diffusively on a one-dimensional torus of length L, subject to a logarithmic attractive potential and short-range hard-core exclusion. The weak attraction leads to the formation of large gaps between groups of particles, and the distances y = (y 1 , . . . , y N ) between particles have a stationary distribution of the form (12) with weights w(y) = y −β , where β < 1 corresponds to a dimensionless inverse temperature controlling the strength of the noise. So the rescaled distances 1 L y provide a partition of the unit interval and follow a Dirichlet(1 − β, . . . , 1 − β) distribution. Of particular interest in [34] is the temperature [41] directly applies so that the order statistics converges in distribution to a Poisson-Dirichlet partition of [0, 1]. Indeed, the corresponding Beta(1, b − 1) distribution of the first size-biased marginalỹ 1 as in (17) is established independently in [34] without mentioning the connection to the Poisson-Dirichlet distribution. Note that in this model gaps between particles correspond to cluster sizes, and the average cluster size is therefore L/N . A related paper with a hierarchical clustering phenomenon for interacting diffusions on a ring is [35], and to our knowledge these continuous models are the only particle systems where a connection to Poisson-Dirichlet statistics has been recognized so far. The Brownian energy process introduced in [12,13] as a dual model to the inclusion process exhibits stationary product measures with chi-squared marginals, and conditioning on the total sum of occupation numbers leads to the same canonical distributions as the model in [34].
To test the robustness of our results against small changes in the stationary weights w(n), it is useful to consider zero-range processes. For any given w(n) it is well known that a process with the jump rate for a cluster of size n to lose a particle given by exhibits stationary product measures of the form (10) (see e.g. [3,17] and references therein).
Using the weights (20) for the inclusion process this leads to jump rates so that u(1) = 1/d diverges in a scaling limit with d → 0. All other rates are bounded and converge as u(n) → n n − 1 as L → ∞ for all n ≥ 2.
A zero-range process with rates (51) has exactly the same stationary distributions (12) as the inclustion process and all our results apply. Condensation in zero-range processes has been a major research area in recent years (see e.g. [5,9,2]), where decreasing rates u(n) 1 + b/n lead to stationary weights of order n −b , so that φ c = 1 and the critical density is given by (see discussion in Section 2.2) In such models, condensation is driven by strong enough on-site attraction between particles. The rates (51) have asymptotic behaviour u(n) n n − 1 1 + 1 n as n → ∞ (52) and the attraction between particles is not strong enough. Instead, cluster coarsening and condensation is driven by divergence of u(1) = 1/d, which ensures that ρ b = 0 in the bulk of the system and the remaining mass concentrates on a number of lattice sites decreasing in time.
We have checked numerically that the particular form of the rates (51) is in fact not important, and choices of the form u(n) = n/(n − 1) or u(n) = 1 + 1/n for n ≥ 2 lead to the expected Poisson-Dirichlet statistics at stationarity for u(1) = 1/d L/α with α > 0. This can be checked analytically on a case-by-case basis, but it is known that in general the asymptotic behaviour of the partition function and condensation behaviour may depend sensitively on perturbations of the rates (see e.g. [46] and [64,65]), so we are currently not able to prove a general result analogous to Theorem 1 based only on asymptotics of stationary weights or jump rates. on a vanishing volume fraction but contains a non-zero fraction of the total mass. In the limit L, N → ∞, N/L → ρ and then K → ∞ we get condensed bulk/background mass fraction This follows simply from convergence for bounded test functions in the bulk and conservation of total probability and mass. It implies in particular that in this ordered limit for the average occupation numbers in the bulk and condensed phase, respectively. A further interesting property that is often used is that condensation leads to the divergence of higher order moments, due to the contribution of the condensed phase. This is implied by the following general result. Proposition 6. Assume that a system exhibits condensation as in Definition 1 in the thermodynamic limit with density ρ. Then for all x ∈ Λ and any positive function f : as L, N → ∞, and N/L → ρ.
Proof. For any fixed K > 0 we have as L, N → ∞, N/L → ρ. This holds for all K > 0 and ρ − η x 1 ηx≤K ρ → ρ − ρ b > 0 as K → ∞ with (53), so there exists C > 0 such that Then f (n) → ∞ implies min n>K f (n) → ∞ as K → ∞, which proves the first statement. Essentially the same argument works for the second statement, we have for all K > 0 fixed as L, N → ∞, N/L → ρ, because min n>K f (n) diverges and 1 ηx>K η x L,N is uniformly bounded since it converges to ρ − ρ b as K → ∞ (53). In that limit, the right-hand side converges to η x /f (η x ) ρ which implies This result implies in particular, that for condensing systems all higher moments η a x L,N with a > 1 diverge in the thermodynamic limit due to contributions from the condensed phase. Lower moments with a < 1 converge to η a x ρ , and the first moment with a = 1 is the boundary case, converging to a strictly larger value ρ > ρ b = η x ρ than the bulk density. We stress again that we have only used Definition 1 and weak convergence of single-site marginals of the canonical measures to derive these results. So they hold very generally, and do not depend on the existence of stationary product measures or any other particular structure.

B Some details on dynamics and Monte Carlo simulations
Heuristic results for TA dynamics of the inclusion process [24] show that the equilibration time scales like L/d, and is dominated by a coarsening process with a transport limited mass exchange dynamics between isolated clusters: On a time scale of order 1 the mass in the system concentrates on isolated cluster sites which are separated by at least one empty site. Each cluster of size m then performs an effective totally asymmetric random walk with rate dm. So larger clusters move faster and overtake smaller ones, and during the overtake both clusters exchange mass. This leads to fluctuations in cluster sizes and drives the coarsening process, where smaller clusters disappear and the average cluster size grows as a power law in time. From the point of view of an individual cluster, coarsening determines the time scale τ a on which it aggregates a macroscopic amount of mass, and on the fragmentation time scale τ f it loses a non-zero mass fraction which forms a new cluster on a previously empty site. For TA dynamics, the latter only happens if during a step when a cluster extends over two sites (which takes only a time fraction of order d), a further particle breaks away, which happens again at rate proportional to d (see discussion in [24] for more details). In summary both time scales are τ a = L/d and τ f = d −2 , and we see that they agree exactly in the case dL → α ∈ (0, ∞), leading to a balance of aggregation and fragmentation for macroscopic clusters at stationarity, and the interesting hierarchical structures of Theorem 1. If dL → 0 then τ a τ f and the balance cannot be reached, rather the system saturates in a single remaining cluster consistent with complete condensation results Proposition 2. On the other hand if dL → ∞, fragmentation dominates with τ f τ a for macroscopic clusters, and a balance is reached at sizes of scale 1/d instead (consistent with Theorem 2), which includes the case of no condensation with d = O(1). This heuristic provides useful insight on the level of the dynamics into our rigorous results which only depend on the form of the stationary distributions (12), and also implies that TA dynamics have to be simulated on times of order τ a = L/d to reach stationarity.
A similar argument can be made for the complete graph geometry, where the dynamics is entirely different. Cluster sites are in direct contact, and exchange single particles with a rate of order m 2 /L, where we understand m 1 to be a 'typical' cluster size. Since the exchange is symmetric, it takes of order m 2 exchange events to change cluster sizes by a finite fraction, leading The fragmentation time scale τ f follows since particles jump onto empty sites with rate dm and of order m jumps are needed to fragment a finite fraction of a cluster's mass. Here we used that due to m 1 cluster sites only cover a vanishing volume fraction. Even though both time scales are different from TA dynamics, an aggregation fragmentation balance is again reached for dL → α. Since we only care about the mass distribution and not the spatial location of clusters, equilibration time is now faster of order τ a = L. This is a crucial difference to TA dynamics, where the coarsening process is transport limited and clusters have to move in order to exchange particles. Due to the particular form of the jump rates for the inclusion process (36), CG dynamics can be implemented in a rejection-based algorithm summarized in Alg. 1, and this provides a very simple and efficient way to produce Monte Carlo samples from the distribution π L,N (12).