Structure of the Condensed Phase in the Inclusion Process

We establish a complete picture of condensation in the inclusion process in the thermodynamic limit with vanishing diffusion, covering all scaling regimes of the diffusion parameter and including large deviation results for the maximum occupation number. We make use of size-biased sampling to study the structure of the condensed phase, which can extend over more than one lattice site and exhibit an interesting hierarchical structure characterized by the Poisson–Dirichlet distribution. While this approach is established in other areas including population genetics or random permutations, we show that it also provides a powerful tool to analyse homogeneous condensation in stochastic particle systems with stationary product distributions. We discuss the main mechanisms beyond inclusion processes that lead to the interesting structure of the condensed phase, and the connection to other generic particle systems. Our results are exact, and we present Monte-Carlo simulation data and recursive numerics for partition functions to illustrate the main points.

phase separates into a homogeneous bulk and a condensed phase, with a finite fraction of the total mass concentrating in a vanishing volume fraction. First introduced in [3], zero-range processes and related models provided a first example of condensation in homogeneous SPS [4][5][6]. On the level of stationary distributions condensation is characterized by heavy-tail behaviour of stationary weights as first noted in [7,8], which has been used to study the phenomenon in the context of equivalence of ensembles and large deviations [9][10][11].
The inclusion process has been introduced in [12] as a discrete dual to a model of heat conduction, and has later been studied as an interesting model of stochastic transport on its own [13][14][15]. It is a natural bosonic counterpart to the exclusion process where particles are subject to an attractive inclusion interaction in addition to independent diffusive motion. It can also be interpreted as a multi-species version of the Moran model of population genetics [16], where the inclusion interaction corresponds to selection, and diffusion to mutation dynamics. The inclusion process is part of a larger class of models introduced in [17] that exhibit factorized stationary distributions, which has recently been extended [18]. Condensation in the inclusion process has first been studied in [19] for inhomogeneous systems. Condensation in homogeneous systems only occurs if the diffusion strength vanishes with the system size. While such scaling of system parameters can lead to non-equivalence of ensembles and discontinuous behaviour as established for a toy zero-range model in [20,21], this is not the case for the inclusion process and small diffusion or mutation rates are in fact very natural in many applications. The dynamics on various time scales have been established on a rigorous level in [22,23], restricted to finite lattices in the limit of diverging particle density. In the thermodynamic limit with a finite limiting density there are only heuristic results so far, covering the dynamics of condensation in the inclusion process [24] and extensions with stronger particle interactions and instantaneous condensation [25,26].
In particular, the stationary behaviour of the inclusion process in the thermodynamic limit has not been characterized so far, which is the main aim of this paper. We establish the equivalence of ensembles, and show that for vanishing diffusion strength the inclusion process exhibits condensation for any positive particle density. While the bulk of the system is empty, the condensed phase can exhibit an interesting hierarchical structure following the Poisson-Dirichlet distribution. The latter was originally introduced in the context of population genetics [27,28], and has later been identified as the generic stationary distribution of split-merge dynamics [29,30], which is related to its appearance in cycle length distributions of random permutations [31][32][33]. It has further been observed (though not identified) more recently in systems of interacting diffusions [34,35], but to our knowledge is a novelty in the context of condensation in SPS. In general, the condensed phase in SPS with stationary product distributions concentrates on a single lattice site [7,8,10,36]. A spread over multiple sites has only been observed in versions of zero-range processes which include an effective (soft) cut-off for site occupation numbers [37,38], or in models with pair-factorized stationary states [39,40] where it occurs naturally due to spatial correlations. Poisson-Dirichlet statistics arise when the diffusion parameter in the inclusion process scales with the inverse system size, and we also establish complete condensation for smaller diffusion where all particles concentrate on a single site, and a universal exponential law for intermediate scales.
Our main results on the structure of the condensed phase are derived using size-biased sampling of occupation numbers, which is related in a natural way to the Poisson-Dirichlet distribution as reviewed in Sect. 3.2. While this point of view is standard in population genetics (see e.g. [41]), this approach also provides a strong tool to study the condensed phase in SPS where it has not been used so far. After introducing the basic notation and concepts in Sect. 2, we derive our main results on condensation and the typical structure of the condensed phase for the inclusion process in Sect. 3. Our results are rigorous and derivations are presented in a general, transferable way, and we show simulation data for illustration. We include results on large deviations of the condensed phase in Sect. 4, and conclude with a discussion of the main points and relations to other models in Sect. 5. In Appendix 1 we show that under a general definition of condensation the system phase separates into a homogeneous bulk and a condensed phase, and that condensation implies divergence of higher moments. In Appendix 2 we comment on Monte-Carlo dynamics to generate stationary samples, and on differences between one-dimensional and mean-field geometries.

Condensation in Homogeneous Particle Systems
We study stochastic particle systems (SPS) on a finite set of spatial locations/sites of size | | = L, which can for example be a regular lattice with periodic or closed boundaries. The system has a fixed, finite number of N particles, and we denote configurations by η = (η x : x ∈ ), η x ∈ N 0 , and the state space E L,N = η : x∈ η x = N denotes the set of all configurations. The dynamics should be irreducible on E L,N , so that the process has a unique (canonical) stationary distribution π L,N . We assume that π L,N is spatially homogeneous, i.e. the single-site marginals π L,N [η x ∈ .] do not depend on site x, and in particular this implies that the density (the expected number of particles per site) is given as We are interested in large-scale condensation phenomena of the system in the thermodynamic limit L, N → ∞ such that the density converges as N /L → ρ ≥ 0, which in the following we often denote by lim N /L→ρ to simplify notation. We assume that in this limit finite marginals of π L,N converge, and we denote the limiting single site marginal as a distribution on N 0 by This convergence of distribution functions is equivalent to weak convergence, i.e.
for all x ∈ and bounded, continuous test functions f ∈ C b (N 0 ). With (1) the first moment η x L,N → ρ converges in the thermodynamic limit, and by Fatou's Lemma this implies for the first moment of the limiting distribution that This is usually called the background or bulk density (indicated by the subscript) as is explained below. Strict inequality above is possible since f (η x ) = η x is an unbounded function on N 0 , and implies that locally the system loses mass in the limit, providing the following standard definition of condensation.

Definition 1
A system with canonical distributions π L,N exhibits condensation in the thermodynamic limit N /L → ρ with background density ρ b as in (4), if ν ρ exists as defined in (2) and ρ b < ρ. A system with ρ b = 0 is said to exhibit complete condensation if i.e. typically all particles in the system concentrate on a single lattice site. If ν ρ exists for all ρ ≥ 0, the systems is said to exhibit a condensation transition with critical density ρ c ≥ 0, if Condensation in the above setting has been established in various SPS, including zerorange processes and related models (see e.g. [42,43] and references therein). It has been shown on a case-by-case basis that ρ b is monotone increasing with ρ and there exists a unique critical density ρ c ∈ [0, ∞] in the sense of (6). One sufficient general condition is monotonicity of the dynamics for the underlying particle system. But in principle more complicated behaviour such as non-monotonicity of ρ b cannot be ruled out, even though we are not aware of any generic examples in the thermodynamic limit. For condensation on finite lattices possible non-monotonicity of ρ b has been established and discussed e.g. in [44,45] and references therein.
As is discussed in more detail in Appendix 1, the interpretation of ρ b < ρ is that the system phase separates into a homogeneous bulk phase and a condensed phase. The latter concentrates on a vanishing volume fraction but contains a non-zero fraction ρ − ρ b > 0 of the total mass in the system, and is usually simply called the condensate. Depending on the specific example and the nature of π L,N the condensate may cover only a single lattice site (see e.g. [10,36]) or a sub-extensive volume [39,40]. In most cases the bulk density ρ b = ρ c is equal to critical one, but there are also models with ρ b < ρ c , such as zero-range toy models with size-dependent rates [20,21] which introduce an effective long-range interaction and lead to non-equivalence of ensembles. Complete condensation has been established for particular zero-range processes in [7,46] and for inclusion processes in a fixed volume in [23].
As we show in Appendix 1 in Proposition 6, condensation as defined above implies in particular divergence of higher moments η a x L,N with a > 1. This has been used in some papers as a definition of condensation often using a = 2 [47,48]. The converse does not hold, since moments of limiting distributions ν ρ with heavy tails can diverge also in the absence of phase separation, so we stick to Definition 1 to characterize condensation. For condensing systems, divergence of higher moments is due to the contribution of diverging occupation numbers in the condensed phase which is not described by the limiting distribution ν ρ .

Models with Stationary Product Measures
From now on we focus on stochastic particle systems which are defined by a generator of the form for continuous test functions f ∈ C(E L,N ). This defines a continuous-time Markov process on the state space E L,N jumping from configurations η to η xy where one particle moves from site x to y. The spatial dependence of the rates is given by a multiplicative factor p(x, y), which we take to be an irreducible transition kernel for a single particle on . The interaction between particles is determined by the function u which depends only on the occupation numbers of departure and target site of a jump event. To ensure irreducibility of the process on E L,N we assume u(m, n) ≥ 0 and u(m, n) = 0 if and only if m = 0.
To ensure spatial homogeneity at stationarity we assume which is a slight generalization of translation invariance on regular lattices. This type of models have first been introduced in the seminal paper [17]. It is well known (see also [2,18]) that they exhibit stationary product measures if and only if and either p(·, ·) is symmetric, or In this case, normalizing the weights w(n) = n k=1 u(1, k) u(k, 0) leads to product distributions which are stationary for all φ ≥ 0 such that the normalizing partition function z(φ) < ∞. Note that these 'grand-canonical' distributions are supported on the extended state space E L = η : η x ≥ 0 without fixing the total number of particles. The expected number of particles per site is given as a monotone increasing function of φ as For such processes we have explicit representations of the canonical distributions as conditional grand-canonical distributions which in fact do not depend on the choice of φ > 0. This leads to the useful form with canonical partition function Z L,N . This implies in particular that for ρ < ρ c the limits (2) of single-site marginals are given by the marginal ν 1 φ with φ ≥ 0 such that R(φ) = ρ. For models of the above type, the condensation transition as given in Definition 1 is equivalent to existence of φ c < ∞ such that z(φ) = ∞ for all φ > φ c , and R(φ) → ρ * < ∞ as φ → φ c (see e.g. [2] for a detailed discussion). Examples of this type studied so far include zero-range processes with u(m, n) = u(m) and decreasing rates u(m) [5,8,36], where ρ * = ρ c = ρ b . If the rates can depend on the system size, the transition can also be discontinuous with ρ b < ρ c < ρ * where grand-canonical distributions with densities in the range (ρ c , ρ * ) are metastable [20,21]. More recently, condensation has also been studied for inclusion processes [19] and explosive condensation models [25,26,43] with rates of the form If γ > 2 the system exhibits a condensation transition for all d > 0 with ρ c > 0. For inclusion processes we have γ = 1, and this case is covered in more detail in Sect. 3.1. In all generic systems with stationary product measures studied so far, we have and the condensed phase concentrates on a single lattice site. In Sect. 3 we will see for the inclusion process that the condensed phase can extend over more than one site and have an interesting hierarchical structure, which has not been observed for condensing particle systems so far.

Size-Biased Sampling
Since the condensed phase concentrates on a vanishing volume fraction, the limiting marginal probabilities for a fixed number k of occupation numbers converge to the distribution of the bulk in a condensed system. As explained above, for models with stationary product measures this is usually given by the maximal product measure with critical density ρ c = R(φ c ) and we have (cf. [10]) for all x 1 , . . . , x k ∈ and n 1 , . . . , n k ≥ 0. This asymptotic equivalence of canonical and grand canonical ensembles (distributions) has been established for a large class of models [2,9], and implies weak convergence w.r.t. local, bounded test functions as in (3).
Since it contains a non-zero fraction of all particles, the distribution of the condensed phase can be accessed via size-biased permutations of particle configurations. This can be interpreted as picking a particle uniformly at random and sampling the occupation number η x at its location x. The larger η x , the more likely it is to pick site x in this way. Formally, this can be defined recursively (see e.g. [41,Sect. 2.4]).
Definition 2 For given η ∈ E L,N pick a random permutation σ : → of the lattice indices as . . . and so on.
For models with canonical distributions of the form (12), the distribution of the first size-biased marginal is given by where the stationary weight w(n) is re-weighted proportional to n and re-normalized. Here and in the following we use the convention Z L,k = 0 for all k < 0, so we can omit indicator functions of the form 1 n≤N to simplify notation. Note that the first identity in (14) with the re-weighted marginal probability holds in general, but the second one only because π L,N is a conditional product measure of the form (12). For a two-site size-biased marginal we then have Generalizing to the k-site case we get which includes k = L to get the full distribution ofη with Z 0,n = 1 for all n ∈ {0, . . . , N }.
Note that due to size-biased re-ordering, the distribution ofη and its marginals is of course not spatially homogeneous. To our knowledge, essentially all previous studies of condensation in homogeneous particle systems focus instead on the (decreasing) order statisticŝ and in particular the maximum occupation number η (1) (see e.g. [10,21,49,50]). We will see below how this is related to size-biased sampling, and that the latter is very suitable to study condensation in systems with ρ b = 0 such as the inclusion process and related models. A size-biased sampling approach can also be useful in models with ρ b > 0 to study the dynamics of the condensed phase and phase separation as recently shown in [51].

The Poisson-Dirichlet and GEM Distribution
The Poisson-Dirichlet distribution has been introduced by Kingman in the context of population genetics [27,28] and has since occurred in a variety of applications, such as split-merge dynamics [29,30] and random permutations [31][32][33]. It is a one-parameter family of probability measures defined on the set of ordered partitions of the unit interval It can be characterized for instance as a scaling limit of Dirichlet random variables which form a finite partition of [0, 1], or via scale invariant Poisson processes (see Chap. 2 in [41] for details). One of the most accessible characterization in terms of practical use is related to the GEM distribution, named in [52] after Griffiths [53,54], Engen [55] and McCloskey [56], which is defined as follows. Let U 1 , U 2 , . . . be i.i.d. Beta(1, α) random variables with α > 0, which take values on [0, 1] with PDF α(1 − x) α−1 , and the uniform distribution as a special case for α = 1. On the set of (unordered) partitions which corresponds intuitively to breaking off a fraction 1 − U 1 from the unit interval and continuing this process recursively with the remaining interval. The law of V on is called the Griffiths-Engen-McCloskey distribution GEM(α), and the corresponding order statisticŝ V on ∇ has Poisson-Dirichlet distribution PD(α). Alternatively, given a PD(α) distributed partition V on ∇, its size-biased permutationṼ has GEM(α) distribution on (see e.g. [41] for details). Note that the construction (17) leads to a hierarchical structure of a GEM(α) partition V , and the parameter α > 0 controls the expected size of the components. The expectation of Beta(1, α)-distributed random variables U i is 1 1+α , so for small α the size of the first component V 1 is larger and the hierarchy stronger. For larger α the expected sizes of the components are more similar, but always show a strict order since This shows that in fact V ∈ and that the expected component sizes of V k vanish as k → ∞, and is also a useful relation to numerically test for GEM distributions (see Sect. 3.4).
Carrying over the product topology from [0, 1] ∞ , weak convergence of probability distributions on and ∇ is equivalent to convergence in distribution of finite marginals (V 1 , . . . , V k ) of partitions. By Theorem 2 in [57], convergence in distribution of a sequence of size biased partitionsṼ i → V on , implies convergence in distribution of the corresponding ordered partitionsV i →V , and V is a size-biased permutation ofV . In Sect. 3.2 we will use this fact and that rescaled particle configurations 1 N η ∈ can be interpreted as finite partitions of the unit interval, to derive our main results. Note that in a condensing system with ρ b < ρ (4) the partitions 1 N η in the thermodynamic limit only converge on the extended space which allows for the loss of mass due to phase separation (see Proposition 6 in Appendix 1). On the other hand, size-biased permutations capture the condensed phase and the full mass of the system, and 1 Nη converge on , as we will establish in the next Section.

Condensation in the Inclusion Process
The inclusion process is a stochastic particle system of type (7) with rates which was first introduced in [12] in the context of energy/mass transport. Another important interpretation of this model is as a multi-species version of the Moran model of population genetics, which describes the selection-mutation dynamics of a population of N individuals which can take L different types [58]. Here the parameter d describes the mutation rate, which is small compared to the reproduction rate of the system and is often taken to depend on the system size d = d L > 0 and vanish as L → ∞. Results in [23] show that for fixed L as N → ∞, complete condensation occurs if d = d N 1/ log N . The thermodynamic limit has not been studied so far, and in this section we will establish a complete picture covering all densities ρ > 0 and possible scaling regimes of the parameter d.
The inclusion process satisfies conditions (8) and (9) and has stationary product measures of the form (10) with weights This also leads to an explicit formula for the canonical distributions which can be identified as a Dirichlet multinomial distribution (cf. [41, Chap. 1]). These have been studied in detail in the context of urn models and have interesting structural properties and symmetries, but in the following we only make use of the asymptotic form of the partition function so that our results can be more easily translated to other systems. Our main results in the thermodynamic limit N , L → ∞, N /L → ρ ≥ 0 are derived in the next subsections, and can be summarized as follows: We will make use of the asymptotic behaviour of w(n) (20) and the partition function Z L,N , which can be derived by standard Stirling approximations from (22). Particularly useful in the following is the asymptotic behaviour of the ratio

Equivalence of Ensembles and Condensation
We assume d > 0 constant or d L → d > 0. In this case (21) implies that there exist grand-canonical distributions for any density ρ ≥ 0, by choosing such that R(φ) = ρ. In this case the equivalence of ensembles can be established most naturally in terms of the specific relative entropy between canonical and grand-canonical distributions (see e.g. [2,9]) Computing the leading order terms of Z L,N from (22) with standard Stirling formula we get (24) we see that (25) vanishes in the thermodynamic limit since log z(φ) = −d log(1 − φ). Convergence in specific relative entropy implies convergence of finite marginals [2], i.e. for any fixed k > 0 and n 1 , . . . , n k ≥ 0 The latter limit could also be computed directly in analogy to other results below, but the route via the equivalence of ensembles is more robust since only the logarithm of the partition function has to be controlled to leading order.
An alternative representation of the specific relative entropy is given by (see e.g. [9]) Since the second moment of the single-site marginal ν φ is finite when φ(ρ) = ρ/(ρ +d) < 1, one can show that this vanishes in the thermodynamic limit even without computing the asymptotics of Z L,N , by applying a local central limit theorem to the right hand side (see for example [59,60]). In the case d → 0, (21) implies that there are no grand-canonical distributions for any positive density and therefore we expect a condensation transition, following the discussion after (12). We summarize this in the following result proved by a direct computation.

Proposition 1
Provided that d → 0 as L → ∞, the inclusion process exhibits a condensation transition as given in Definition 1 with ρ c = ρ b = 0, i.e. we have for all fixed n ≥ 0 and ρ ≥ 0 Proof We have for any n ≥ 0 fixed since with the scaling (27) given below for the partition function in the case The same holds with (33)  So locally the system appears empty in the limit, and a further investigation of the condensed phase will be given below in terms of size-biased samples. Note that in the proof we only use the asymptotic behaviour of ratios of partition functions and the fact that w(n) = O(d) for all n > 0.

GEM Scaling Limit and Complete Condensation
We study the distribution of the condensed phase by computing size-biased marginals in the case d L → α ≥ 0. Using (23), the leading order behaviour of the partition function is given by Recall from Sect. 2.4 that 1 N (η 1 , . . . , η L ) is a (finite) partition of the unit interval. Theorem 1 In the thermodynamic limit L, N → ∞ such that N /L → ρ with d L → α > 0, the rescaled order statistics of η (16) converge in distribution to Poisson Dirichlet, i.e.
Proof Following the discussion in Sect. 2.4 it suffices to show that for all k ≥ 1, With the characterization in (17) this establishes convergence in distribution of size-biased permutations to GEM(α), which is equivalent to (28).
For α → 0 the above limiting distribution PD(α) degenerates, with the mass fraction of the maximal occupation number tending to 1. Under a mild additional assumption d L 1/ log L on the scaling, this statement can be significantly strengthened to ensure complete condensation in analogy with results in [23] for fixed L as N → ∞.

Proposition 2
In the thermodynamic limit L, N → ∞ such that N /L → ρ with d L log L → 0, we have complete condensation in the sense of (5), i.e. π L,N max x∈ η x = N → 1.
Proof It suffices to show for the first size-biased marginal that which implies the same for the maximal occupation number. Using again (14), (20) and (27) we have for all n ≥ 0 The first term tends to 1 for all n ≥ 0 and the second scales like Then Z L−1,0 = 1 and Z L−1,n d L/n → 0 for n ≥ 1, which implies (31).

Intermediate Scales
Assuming that d → 0 with d L → ∞ we cannot easily apply (23) for asymptotic estimates, and after a slightly more involved Stirling approximation the leading order of the partition function (12) is While in principle this scaling together with that of the weights (20) fully determines the asymptotics of size-biased distributions, it turns out to be more useful to use particular cancellations when estimating ratios of partition functions to proof our main result below. The above scaling implies for all fixed n ≥ 0 that which we have used to prove Proposition 1. i.e. marginals of rescaled size-biased samplesη converge in distribution to independent exponential random variables with mean ρ.
Proof To establish convergence of the joint density we have to show for all n 1 , . . . , n k such that In an analogous computation to (30), we get where we used the asymptotic behaviour of the stationary weights (20), and arranged the contributions of the ratio of partition functions in a convenient way. Since d → 0 we have A → 1 and D (d L) dk using (23). The latter does not apply to the other two terms since d L → ∞, and a more careful (but straightforward) analysis leads to and analogously, using Therefore we get and inserting into (35) implies (34).
So the condensed phase for any intermediate scale with d L → ∞ has a non-hierarchical structure, locally consisting of independent clusters of average size ρ/d. This general behaviour across a large range of scaling regimes is quite remarkable. However, since d N → ∞, the rescaled size-biased samples dη do not form a partition of a compact interval (as in the previous case of d L → α). So our result on convergence of finite marginals does not imply weak convergence of the full sequence dη, and we only get a local characterization of the condensed phase. Since the total mass of the condensed phase is N , and k in the above result can be chosen arbitrarily large, this at least implies that the volume fraction covered by the condensed phase scales at least as d to leading order.
Note also that the limiting exponential distribution of a rescaled cluster in the condensed phase is not itself the size-biased distribution of a random variable, since this would have density This cannot be normalized due to divergence at x = 0, and suggests that the condensed phase does not simply consist of O(1/d) clusters with i.i.d. occupation numbers. If, conditional on the volume covered by the condensed phase, one could probe a cluster size without size bias, it would vanish on the scale 1/d. This suggests that the volume fraction covered by the condensed phase could indeed be larger than d with many clusters on smaller scales that do not contribute to the total mass to leading order. Details of this behaviour are most likely depending on the particular scaling of d, and are very hard to access analytically or even to observe numerically.

Simulation Results
We illustrate our main results with Monte Carlo simulations of the inclusion process at stationarity. Recall that with (7) and (19) the generator describing the dynamics is given by We initialize the system by distributing N particles independently, uniformly at random on the lattice. The stationary distributions π L,N (22) are conditional product measures for all translation invariant or symmetric choices of p(x, y). On the complete graph with p(x, y) ≡ 1 L−1 one can implement a simple rejection based algorithm to simulate the dynamics, which we summarize in Appendix 2 and call CG dynamics in the following. We also implemented the standard Gillespie algorithm [61] to simulate totally asymmetric dynamics on a onedimensional lattice with periodic boundary conditions, i.e. p(x, y) = δ y,x+1modL , which we call TA dynamics.
In both geometries, the number of empty sites grows in time and the particles concentrate in clusters, which exchange particles. Smaller clusters disappear and the average cluster size increases, driving a coarsening process. This leads to stationary distributions where either a balance between cluster aggregation and break-up is reached, which is the case for d → 0 and d L → α ∈ (0, ∞], or the system saturates with a single cluster remaining for d L → 0. While for CG dynamics clusters can directly exchange particles, for TA dynamics the clusters are isolated and the coarsening process is limited by particle transport, which has been studied in [24]. Still, once stationarity is reached (see Appendix 2 for more details on this), both dynamics provide samples from the same stationary distributions π L,N which do not have any spatial correlations. Two typical stationary configurations for CG and TA dynamics are illustrated in Fig. 1. Since the complete condensation regime d L → 0 has been studied numerically before [24], we focus on the hierarchical results in Theorem 1 with d L → α ∈ (0, ∞), and comment on intermediate scales with d L → ∞ from Theorem 2 later. There are no particularly useful results for marginals of Poisson Dirichlet random variables, so we compare size-biased samples of stationary configurationsη to the GEM(α) distribution. For each k ≥ 1, we define which is illustrated in Fig. 2 for various values of α and ρ. We see good agreement for small values of k, but in addition to statistical errors there are large systematic finite-size effects (illustrated for α = 10 in Fig. 2 right). These are related to the small amount of nonzero occupation numbers #(η) in typical stationary configurations, leading to a systematic This logarithmic scaling can be seen in Fig. 2 (right). Convergence of #(η)/ log N to α is very slow on the scale 1/ √ log N (see [41,Theorem 2.11]), so this is not a good estimator for α, and the comparison based on (38) in Fig. 2 is more useful.
For small values of d and finite system size L there is a data cross-over to the condensed regime, with very few occupied sites. This is very hard to access numerically, but theoretically, a single condensate site is fully consistent with the limit α → 0 in (38). For large values of d there is a data cross-over to the intermediate regime d → 0 with d L → ∞, which is covered by Theorem 2. This cross-over is illustrated in Fig. 3 (left), where we plot the empirical tail distribution of dη i for i = 1, 2, 3 based on 5 size-biased re-samplesη of 100 independent samples of η from π L,N using CG dynamics. We pick small values for i in order to use the same procedure for all values of d including 1/L. For larger d, larger values for i lead to the same behaviour, and tests reveal that the samplesη i are indeed uncorrelated. For fixed density ρ = 1 we see that agreement with the exponential tail, e −u/ρ predicted by Theorem 2, improves with increasing d up to d = 32/L = 1/ √ L. In Fig. 3 (right) for this value of d we see good agreement with the predicted tail for several densities ρ.
If we increase d further the system crosses over to the behaviour for constant d > 0, where we have equivalence of ensembles to grand canonical measures ν φ as explained in Sect. 3.1. Rescaled size-biased variables dη i will then take discrete values in dN given by the size-biased version of ν 1 φ (10), i.e.

Large Deviations
In Sect. 3 we derived the typical stationary behaviour in the condensed phase, and will now study the statistics of large deviations of the maximum occupation number. The most interesting case of complete condensation is covered in Sect. 4.3, for completeness and to introduce the main concepts of large deviations we first cover the non-condensing and intermediate regime. Note that in the hierarchical regime with d L → α ∈ (0, ∞), the typical size of the maximum is of order L and it can take any value on that scale with non-vanishing probability.

Non-condensing Regime
We first treat the case d → d > 0 as L → ∞ for which we have equivalence of ensembles. We find that the probability of observing maximum site occupations of order L decays exponentially in L, as would be the case under the grand-canonical measures ν φ (10) where the site occupations are i.i.d. with finite mean and variance. We characterise this decay in terms of the large deviation rate function I ρ (m), which is informally defined as This is made precise in the following result which characterizes the local large deviations, and provides an explicit form for the rate function. The results in this section imply large deviation principles in the usual sense, see for example [59,62] and references therein for details.
Since ν φ has second moment which converges to η 2 x (ρ−m) < ∞, we may then apply a standard local limit theorem for triangular arrays (see e.g. [60]) to show that with this choice of φ the first term on the second line vanishes. The same is true for the term in the third line choosing φ = (ρ) = ρ/(ρ + d) by equivalence of ensembles proved in Sect. 3.1, and we can conclude using (42) and taking limits.

Intermediate Scales
For the intermediate scale, d → 0 with d L → ∞, we cannot directly apply a local limit theorem for triangular arrays as in the previous case, since with (21) there are no grandcanonical measures with positive densities. Here we will make use of Stirling's approximation of the partition function (32) and truncation arguments to derive the large deviations behaviour of the maximum η (1) . In this regime the probability of observing a maximum site occupation of order L has asymptotic decay rate d L.

Proposition 4 If d → 0 and d L
log L, then in the thermodynamic limit we have as N /L → ρ and M/L → m ∈ [0, ρ).
Note that this rate function is consistent with the limit d → 0 of I ρ (m)/d in (41), but the case d = 0 is not covered by Proposition 3 and needs a separate proof.
Proof We firstly extract the contribution due to the maximum site occupation by observing that where Z  By applying (32) together with (20) we find that in the thermodynamic limit if M/L → m > 0. We conclude by taking logarithms, and again applying (32) together with (20).
We illustrate the rate function for this and the following case of complete condensation in Fig. 4 and compare to exact numerics obtained for finite system size. The latter are generated using the right-hand side of (44) and the recursive structure of the canonical partition functions The same relation holds for truncated partition functions (see [59] for details). With initial condition Z 1,n = w(n), n = 0, . . . N and choosing k = L/2 this can be used effectively in an iteration to reach large system sizes.

Complete Condensation
In the case d L 1/ log L we have complete condensation as stated in Proposition 2. We characterise the large deviations of the maximum on the scale L, which turn out to be dominated by the probability of observing the smallest number of occupied sites required to realise a given size of the maximum. To derive this result, it is easier to first understand probabilities of size-biased configurations in analogy to Theorem 1. x k ∈ (0, 1). Furthermore, in the same limit

Proposition 5 In the thermodynamic limit N
Proof In analogy to (30) in the proof of Theorem 1 we get where we also used The remaining mass, N − k i=1 n i , is of order L since x 1 , x 2 , . . . , x k ∈ (0, 1). Therefore, applying (27) to the ratio of partition functions we find This completes the proof of (46).
Finally, for (47), we let n k+1 = N − k i=1 n i . Then using (15) and the fact that Z L,0 = 1 for all L ≥ 1 we have which tends to one by Proposition 2.

Corollary 1 In the thermodynamic limit N
where 0 < C(x) < ∞ is an x dependent constant.
Proof The result follows rather directly from the previous proposition, and we sketch the main calculations required. First fix M ∈ [N /2, N ) ∩ N, then conditioned on the event {η (1) =M} the configuration must contain at least two non-empty sites. Observe that {η (1) =M} is given by the disjoint union From (47) in Proposition 5 we see that π L,N η (1) = M ;η 3 > 0 decays to zero faster than d. Applying (46) to the probability of the remaining two events we find More generally, fix k ∈ N and M ∈ [N /(k + 1), N /k), let n 1 = M, then we can again decompose as a disjoint union as follows where S k+1 is the set of permutations of {1, 2, . . . , k + 1} and n k+1 = N − k i=1 n i . In order for n 1 ≥ n 2 ≥ . . . ≥ n k+1 to hold we must have that (k (47), the probability of the event {η (1) = n 1 ,η k+2 > 0 decays faster than d k L k−1 . Applying (46) yields , and (49) follows.
If we take d = L −γ with γ > 1 then we may summarize Corollary 1 in terms of a large deviation rate function (with speed log L), as follows This is illustrated in Fig. 4 (right) for γ = 2.

Summary
We have established a complete picture for condensation in the inclusion process in the thermodynamic limit, and characterized the condensed phase in several regimes using sizebiased sampling of configurations. Our results cover the full scaling regime of the diffusion parameter d, only excluding some narrow bands of size log L/L for complete condensation and large deviations. A particularly interesting regime is the hierarchical structure discussed in Sect. 3.2 related to the GEM and the Poisson-Dirichlet distribution. This is well established in the context of population genetics [41], where the full structure of Dirichlet multinomials has been exploited to derive very detailed results for Moran models, which can be interpreted as inclusion processes. We derived our results using only the most general properties of inclusion processes so that our approach can be easily transferred to other systems, and we give more details in the next subsection. The Poisson-Dirichlet distribution has been identified as the unique stationary distribution of split-merge dynamics of clusters [29,30], where split and merge rates are proportional to cluster sizes. Our results show that the inclusion process can be seen as a generic 'monomer exchange' version of such dynamics, where now only single particles are exchanged but with the same proportionality of rates in the inclusion interaction. It would be very interesting to investigate this connection in detail in the context of Poisson-Dirichlet diffusions in analogy to [63]. The crucial prerequisite to see Poisson-Dirichlet statistics in particle systems such as the inclusion process is the asymptotic behaviour of the stationary weights (20), for all n ≥ 1 as L → ∞, and w(n)/d n −1 as n → ∞.
The fact that w(n) vanishes proportionally to d as L → ∞ for all n > 0 leads to ρ b = ρ c = 0 and condensation with an empty bulk. The structure of the condensed phase is determined by the 1/n decay of stationary weights for large occupation numbers. This is quite robust, as is discussed in the next subsection. There we summarize some previous results and connections to other particle systems with Poisson-Dirichlet statistics.

Other Particle Systems with Poisson-Dirichlet Statistics
The model studied in [34] consists of N particles moving diffusively on a one-dimensional torus of length L, subject to a logarithmic attractive potential and short-range hard-core exclusion. The weak attraction leads to the formation of large gaps between groups of particles, and the distances y = (y 1 , . . . , y N ) between particles have a stationary distribution of the form (12) with weights w(y) = y −β , where β < 1 corresponds to a dimensionless inverse temperature controlling the strength of the noise. So the rescaled distances 1 L y provide a partition of the unit interval and follow a Dirichlet(1 − β, . . . , 1 − β) distribution. Of particular interest in [34] is the temperature scaling β = N −b N −1 1 as N → ∞ with b > 1, where Theorem 2.1 in [41] directly applies so that the order statistics converges in distribution to a Poisson-Dirichlet partition of [0, 1]. Indeed, the corresponding Beta(1, b − 1) distribution of the first size-biased marginalỹ 1 as in (17) is established independently in [34] without mentioning the connection to the Poisson-Dirichlet distribution. Note that in this model gaps between particles correspond to cluster sizes, and the average cluster size is therefore L/N . A related paper with a hierarchical clustering phenomenon for interacting diffusions on a ring is [35], and to our knowledge these continuous models are the only particle systems where a connection to Poisson-Dirichlet statistics has been recognized so far. The Brownian energy process introduced in [12,13] as a dual model to the inclusion process exhibits stationary product measures with chi-squared marginals, and conditioning on the total sum of occupation numbers leads to the same canonical distributions as the model in [34].
To test the robustness of our results against small changes in the stationary weights w(n), it is useful to consider zero-range processes. For any given w(n) it is well known that a process with the jump rate for a cluster of size n to lose a particle given by exhibits stationary product measures of the form (10) (see e.g. [3,17] and references therein).
Using the weights (20) for the inclusion process this leads to jump rates so that u(1) = 1/d diverges in a scaling limit with d → 0. All other rates are bounded and converge as u(n) → n n − 1 as L → ∞ for all n ≥ 2.
A zero-range process with rates (51) has exactly the same stationary distributions (12) as the inclusion process and all our results apply. Condensation in zero-range processes has been a major research area in recent years (see e.g. [2,5,9]), where decreasing rates u(n) 1 + b/n lead to stationary weights of order n −b , so that φ c = 1 and the critical density is given by (see discussion in Sect. 2.2) In such models, condensation is driven by strong enough on-site attraction between particles. The rates (51) have asymptotic behaviour u(n) n n − 1 1 + 1 n as n → ∞ (52) and the attraction between particles is not strong enough. Instead, cluster coarsening and condensation is driven by divergence of u(1) = 1/d, which ensures that ρ b = 0 in the bulk of the system and the remaining mass concentrates on a number of lattice sites decreasing in time.
We have checked numerically that the particular form of the rates (51) is in fact not important, and choices of the form u(n) = n/(n − 1) or u(n) = 1 + 1/n for n ≥ 2 lead to the expected Poisson-Dirichlet statistics at stationarity for u(1) = 1/d L/α with α > 0. This can be checked analytically on a case-by-case basis, but it is known that in general the asymptotic behaviour of the partition function and condensation behaviour may depend sensitively on perturbations of the rates (see e.g. [46,64,65]), so we are currently not able to prove a general result analogous to Theorem 1 based only on asymptotics of stationary weights or jump rates.

Appendix A: Condensation and Phase Separation
For completeness we summarize some implications of Definition 1 on phase separation and divergence of higher moments, using only the definition itself without any further assumptions on the canonical measures. Assume that we have a condensing particle system on the state space E L,N according to Definition 1, with canonical distributions π L,N and limiting singlesite marginal ν ρ as defined in (2). Weak convergence of π L,N to ν ρ in the thermodynamic limit N , L → ∞, N /L → ρ is equivalent to convergence of expectations of bounded test functions, so that for any K > 0 η x 1 η x ≤K L,N → η x 1 η x ≤K ρ . Now taking a second limit K → ∞ the right-hand side converges to ρ b = η x ρ , which is strictly smaller than ρ in a condensing system (so that both limits do not commute).
The two limits in this order can be used to characterize phase separation as explained in Sect. 2 on the level of single-site marginals, where η x 1 η x ≤K describes the bulk part of the distribution and η x 1 η x >K the condensed part. Definition 1 implies that the condensed phase is supported on a vanishing volume fraction but contains a non-zero fraction of the total mass. In the limit L, N → ∞, N /L → ρ and then K → ∞ we get condensed bulk/background mass fraction This follows simply from convergence for bounded test functions in the bulk and conservation of total probability and mass. It implies in particular that in this ordered limit η x |η x ≤ K L,N → ρ b and η x |η x > K L,N → ∞ for the average occupation numbers in the bulk and condensed phase, respectively. A further interesting property that is often used is that condensation leads to the divergence of higher order moments, due to the contribution of the condensed phase. This is implied by the following general result. Proposition 6 Assume that a system exhibits condensation as in Definition 1 in the thermodynamic limit with density ρ. Then for all x ∈ and any positive function f : N 0 → R + with f (n) → ∞ as n → ∞ we have and we see that they agree exactly in the case d L → α ∈ (0, ∞), leading to a balance of aggregation and fragmentation for macroscopic clusters at stationarity, and the interesting hierarchical structures of Theorem 1. If d L → 0 then τ a τ f and the balance cannot be reached, rather the system saturates in a single remaining cluster consistent with complete condensation results Proposition 2. On the other hand if d L → ∞, fragmentation dominates with τ f τ a for macroscopic clusters, and a balance is reached at sizes of scale 1/d instead (consistent with Theorem 2), which includes the case of no condensation with d = O (1). This heuristic provides useful insight on the level of the dynamics into our rigorous results which only depend on the form of the stationary distributions (12), and also implies that TA dynamics have to be simulated on times of order τ a = L/d to reach stationarity. A similar argument can be made for the complete graph geometry, where the dynamics is entirely different. Cluster sites are in direct contact, and exchange single particles with a rate of order m 2 /L, where we understand m 1 to be a 'typical' cluster size. Since the exchange is symmetric, it takes of order m 2 exchange events to change cluster sizes by a finite fraction, leading to τ a = L m 2 m 2 = L and τ f = The fragmentation time scale τ f follows since particles jump onto empty sites with rate dm and of order m jumps are needed to fragment a finite fraction of a cluster's mass. Here we used that due to m 1 cluster sites only cover a vanishing volume fraction. Even though both time scales are different from TA dynamics, an aggregation fragmentation balance is again reached for d L → α. Since we only care about the mass distribution and not the spatial location of clusters, equilibration time is now faster of order τ a = L. This is a crucial difference to TA dynamics, where the coarsening process is transport limited and clusters have to move in order to exchange particles. Due to the particular form of the jump rates for the inclusion process (36), CG dynamics can be implemented in a rejection-based algorithm summarized in Algorithm 1, and this provides a very simple and efficient way to produce Monte Carlo samples from the distribution π L,N (12).