Abstract
We study an inhomogeneous sparse random graph, \({\mathcal G }_N\), on \([N]=\{1,\dots ,N\}\) as introduced in a seminal paper by Bollobás et al. (Random Struct Algorithms 31(1):3–122, 2007): vertices have a type (here in a compact metric space \({\mathcal S }\)), and edges between different vertices occur randomly and independently over all vertex pairs, with a probability depending on the two vertex types. In the limit \(N\rightarrow \infty \), we consider the sparse regime, where the average degree is O(1). We prove a large-deviations principle with explicit rate function for the statistics of the collection of all the connected components, registered according to their vertex type sets, and distinguished according to being microscopic (of finite size) or macroscopic (of size \(\asymp N\)). In doing so, we derive explicit logarithmic asymptotics for the probability that \({\mathcal G }_N\) is connected. We present a full analysis of the rate function including its minimizers. From this analysis we deduce a number of limit laws, conditional and unconditional, which provide comprehensive information about all the microscopic and macroscopic components of \({\mathcal G }_N\). In particular, we recover the criterion for the existence of the phase transition given in Bollobás et al. (2007).
1 Introduction
In this paper, we study the inhomogeneous random graph model as introduced in the seminal paper [7], that is, an Erdős–Rényi graph whose vertices have types. We consider the limit of a large number of vertices and concentrate on the sparse setting, where each vertex has a number of edges that is of order one. This setting is famous for the emergence of a giant cluster. This phase transition was detected and characterized in [7] with the help of a branching process. We consider the case in which the type set is a compact metric space, but our analysis builds on the proof for the type set being any finite set.
In the present paper, we analyze the model from the viewpoint of large deviations in a detailed way. We go beyond existing results by (1) considering the joint statistics of all the clusters, both microscopic and macroscopic, (2) registering the types within the clusters (not only their sizes), and (3) giving a joint large-deviations principle (LDP) for all this information. In particular, we recover the limiting quantities and the resulting phase transition in great detail, giving a lot of additional information. Our main results are Theorems 1.1 and 3.1 (the LDPs for the type set being a compact metric space and a finite type set, respectively) and Theorems 2.3 and 2.1, where we deduce consequences for the phase transition. A building block for our study is Theorem 3.6, which gives explicit logarithmic asymptotics for the probability of a macroscopic subgraph being connected and which is of independent interest.
The remainder of this section is organized as follows. In Sect. 1.1 we introduce the inhomogeneous random graph, in Sect. 1.2 we present our first main result, the large-deviations principle, and in Sect. 1.3 we give an interpretation of the result. Asymptotic results on the connectivity of graphs are highlighted in Sect. 1.4. In Sect. 1.5 we compare our results with the existing literature.
The structure of the rest of the paper is as follows. Our second main result concerns consequences of the large-deviations principle for the limiting behaviour of the model, i.e., conditional and unconditional laws of large numbers. This relies on an explicit variational analysis of the rate function of Theorem 1.1 and is explained in Sect. 2, together with the giant-cluster phase transition and a comparison to the results of [7]. Additionally, we explain the deep connection with an important inhomogeneous coagulation process and derive a solution to a spatial version of the Flory equation. In Sect. 3 we present the proof of our main result, the LDP, in the case of a finite type set. One key ingredient is the asymptotics of connection probabilities for different cluster sizes. These require more extensive proofs and are therefore contained separately in Sect. 4. In Sect. 3 we introduce the notation and the results in the framework of a finite type set, where many objects of interest have a simpler representation. Reading only this section gives a gentle, yet complete, introduction to our results; it is certainly suitable for readers with limited time or for those interested only in graphs with finite type sets. In Sect. 5, we derive the LDP in the general setting, i.e., for compact metric type spaces, via a discrete approximation in the spirit of the Dawson–Gärtner theorem. In Sect. 6 we derive a full characterization of the minimizers of the microscopic part of the rate function, and in Sect. 7 we analyze all the other parts of the rate function.
1.1 Inhomogeneous random graphs
We are going to define the random graph model that we study in this paper. It is called an inhomogeneous random graph in [7], while at full length it is sometimes named inhomogeneous random Erdős–Rényi graph.
Let \(N \in \mathbb {N}\); we consider a random graph on the vertex set \([N]=\{1,\dots ,N\}\) and fix a nonvoid set \({\mathcal S }\), the type set. We take a type vector \({{\textbf {x}}}={{\textbf {x}}}^{{{{({N}})}}}=(x_1,\dots ,x_N)\in {\mathcal S }^N\) and interpret \(x_i\) as the type of vertex i for any \(i\in [N]\). The edges of this graph are undirected and randomly drawn; self and multiple edges are excluded. The \(\left( {\begin{array}{c}N\\ 2\end{array}}\right) \) possible edges are sampled independently. The probability to draw an edge between two vertices with types r and s, is called \(p_{r,s}\); this defines a map \(p:{\mathcal S }\times {\mathcal S }\rightarrow [0,1]\). The resulting random graph is denoted \({\mathcal G }(N,{{\textbf {x}}},p)\) and is called the inhomogeneous random graph on [N] with type vector \({{\textbf {x}}}\) and function of probabilities p.
In this paper, we are interested in the limit as \(N\rightarrow \infty \) in the sparse case, i.e., in the case where the number of edges per vertex is of finite order. This is the case if the probabilities \(p_{r,s}\) are of order 1/N. More precisely, we impose from now on that they are given by
\[ p_{r,s}=\frac{1}{N}\kappa _N(r,s),\qquad r,s\in {\mathcal S }, \]
where \(\kappa _N:{\mathcal S }\times {\mathcal S }\rightarrow [0,\infty )\) is a symmetric nonnegative bounded function, called a kernel. Without loss of generality, we are from now on assuming that \(\frac{1}{N}\kappa _N\le 1\), and hence \(\kappa _N(r,s)/N\) is a probability for any \(N\in \mathbb {N}\) and any \(r,s\in {\mathcal S }\). We will study the graph \({\mathcal G }_N={\mathcal G }(N,{{\textbf {x}}},\frac{1}{N}\kappa _N)\) in the limit \(N\rightarrow \infty \).
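The construction of \({\mathcal G }_N\) can be sketched in a few lines of code. The following is a minimal illustration only, assuming a finite type set encoded by integers; the function name `sample_graph` and the two-type kernel `example_kappa` are hypothetical choices made for this sketch and do not appear in the paper.

```python
import random


def sample_graph(types, kappa, rng):
    """Sample the edge list of G(N, x, kappa/N) for the type vector `types`.

    Each of the N(N-1)/2 possible edges {i, j} is drawn independently with
    probability kappa(x_i, x_j) / N; self-loops and multiple edges never occur.
    """
    n = len(types)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = kappa(types[i], types[j]) / n  # assumed to lie in [0, 1]
            if rng.random() < p:
                edges.append((i, j))
    return edges


# Hypothetical example: two types 0 and 1 with an illustrative symmetric kernel.
def example_kappa(r, s):
    return 2.0 if r == s else 0.5


rng = random.Random(0)
types = [i % 2 for i in range(200)]  # empirical measure: mass 1/2 on each type
edges = sample_graph(types, example_kappa, rng)
print("average number of edges per vertex:", len(edges) / len(types))
```

The double loop costs \(O(N^2)\) operations, which is adequate for illustration; it makes no attempt to exploit sparsity.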
We are interested in the structure of all the components^{Footnote 1} of \({\mathcal G }_N\), depending on the types, but not the indices of the vertices. Hence it is sufficient to consider the empirical measure \(\mu _N =\frac{1}{N}\sum _{i=1}^N\delta _{x_i}\) of the type vector \({{\textbf {x}}}\) and \(N \mu _N(R)\) is the number of vertices with type in \(R\subset {\mathcal S }\). We will assume that, as \(N\rightarrow \infty \), \(\mu _N\) converges weakly to a given probability measure \(\mu \) on \({\mathcal S }\). One can conceive \({\mathcal G }_N\) as a graph on \({\mathcal S }\) where in each point \(x\in {\mathcal S }\) there sit precisely \(N\mu _N(\{x\})\in \mathbb {N}\) vertices, which are all distinguished and labelled. Further, we will work under the assumption that \(\kappa _N\) converges to a limiting kernel \(\kappa :{\mathcal S }\times {\mathcal S }\rightarrow [0,\infty )\) that is continuous.
We denote by \(\{{\mathcal C }_j\}_{j}\) the collection of all the vertex sets of the connected components of \({\mathcal G }_N\). This collection is a random decomposition of [N]. Since we are only interested in the statistics of these components, the labeling of the vertices in a component is irrelevant. Actually, we are even only interested in the statistics of the types of all the vertices, counted with multiplicity. To this end, we introduce the type-registering empirical-measure function
\[ \eta _{{{\textbf {x}}}}:{\mathcal P }([N])\rightarrow {\mathcal M }_{\mathbb {N}_0}({\mathcal S }),\qquad \eta _{{{\textbf {x}}}}(A)=\sum _{i\in A}\delta _{x_i}, \]
where \({\mathcal P }([N])\) is the set of subsets of [N], and \({\mathcal M }_{\mathbb {N}_0}({\mathcal S })\) is the space of finite measures on \({\mathcal S }\) with values in \(\mathbb {N}_0\). We call every element of \({\mathcal M }_{\mathbb {N}_0}({\mathcal S })\) a type-configuration and will denote it by k. In words, \(\eta _{{\textbf {x}}}(A)(R)\) is, for any set \(R\subset {\mathcal S }\), the number of vertices in \(A\subset [N]\) with type in R. In particular, \(\eta _{{{\textbf {x}}}}([N])=N\mu _N\). We write \({\mathcal M }({\mathcal {X}})\) for the set of finite measures on a set \({\mathcal {X}}\); the measurable structure on \({\mathcal {X}}\) will be clear from the context.
We will study the empirical measure of the collection \((\eta _{{{\textbf {x}}}}({\mathcal C }_j))_j\), i.e., the statistics of how many times a given type-configuration appears as the type-configuration of a component of \({\mathcal G }_N\). We will pay particular attention to the scale of the size of the component, more precisely, whether it is finite or has a size \(\asymp N\). We will call the first scale microscopic and the second scale macroscopic (macroscopic components are usually called giant components). Hence, the quantities of interest in our study are the microscopic and the macroscopic empirical measures of the type-configurations of the components, which we define as follows:
\[ \text {Mi}_N=\frac{1}{N}\sum _j\delta _{\eta _{{{\textbf {x}}}}({\mathcal C }_j)}\qquad \text {and}\qquad \text {Ma}_N=\sum _j\delta _{\frac{1}{N}\eta _{{{\textbf {x}}}}({\mathcal C }_j)}. \tag{1.2} \]
It is clear that both \(\text {Mi}_N\) and \(\text {Ma}_N\) only depend on \({{\textbf {x}}} \) through its empirical measure \(\mu _N\). Both \(\text {Mi}_N\) and \(\text {Ma}_N\) are random measures on \({\mathcal M }({\mathcal S })\), i.e., they are elements of \({\mathcal M }({\mathcal M }({\mathcal S }))\). More precisely, \(\text {Mi}_N\) is a measure on the set \({\mathcal M }_{\mathbb {N}_0}({\mathcal S })\) of measures on \({\mathcal S }\) with values in \(\mathbb {N}_0\), and \(\text {Ma}_N\) is a measure that takes values in \(\mathbb {N}_0\) and is defined on the set \({\mathcal M }({\mathcal S }){\setminus }\{0\}\) of nontrivial measures on \({\mathcal S }\). Both \(\text {Mi}_N\) and \(\text {Ma}_N\) contain precisely the same information for fixed N, but in the limit \(N\rightarrow \infty \), \(\text {Mi}_N\) asymptotically registers only the microscopic components and \(\text {Ma}_N\) only the macroscopic ones. Indeed, the non-microscopic components leave the state space \({\mathcal M }_{\mathbb {N}_0}({\mathcal S })\) via the set of measures with unbounded total mass, and the non-macroscopic ones (with prefactor 1/N) leave \({\mathcal M }({\mathcal S })\) via the measures with vanishing total mass, i.e., via \(\{0\}\). The topologies that we will introduce below reflect this effect; it lies at the heart of the phase transition of the emergence of a giant cluster, which we are also interested in here. On the other hand, these topologies are very natural, as they reduce to the pointwise topology, respectively to the usual vague topology, for a finite set \({\mathcal S }\).
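For a finite type set, the two empirical measures can be computed directly from an edge list. The sketch below is an illustration under stated assumptions: types are encoded as integers `0, ..., num_types - 1`, a type-configuration is stored as a tuple of counts per type, and the helper names `components` and `empirical_measures` are hypothetical. It follows the description above: \(\text {Mi}_N\) carries the prefactor 1/N on the point masses, while \(\text {Ma}_N\) rescales the configurations themselves by 1/N.

```python
from collections import Counter


def components(n, edges):
    """Vertex sets of the connected components of a graph on {0, ..., n-1}."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


def empirical_measures(types, edges, num_types):
    """Return (Mi_N, Ma_N) as dictionaries keyed by type-configurations.

    Mi_N puts mass 1/N on the configuration k of every component;
    Ma_N puts mass 1 on the rescaled configuration k/N of every component.
    """
    n = len(types)
    mi, ma = Counter(), Counter()
    for comp in components(n, edges):
        k = [0] * num_types
        for v in comp:
            k[types[v]] += 1
        mi[tuple(k)] += 1 / n
        ma[tuple(c / n for c in k)] += 1
    return dict(mi), dict(ma)
```

For instance, for the type vector `[0, 1, 0, 1]` with the single edge `(0, 1)`, both integrated measures assign mass 1/2 to type 0, which is the mass that the empirical measure of the types gives to that type. Using floating-point tuples as keys for \(\text {Ma}_N\) is a shortcut of this sketch; exact fractions would be more robust.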
The effect of a diverging or vanishing total mass can also be observed in terms of integrated versions of \(\text {Mi}_N\), respectively \(\text {Ma}_N\). Indeed, observe that for any measurable \(R\subset {\mathcal S }\) and any \(N\in \mathbb {N}\),
\[ \int k(R)\,\text {Mi}_N(\textrm{d}k)=\int y(R)\,\text {Ma}_N(\textrm{d}y)=\mu _N(R). \tag{1.3} \]
According to our assumption that \(\mu _N\) weakly converges towards \(\mu \), the right-hand side of (1.3) converges to \(\mu (R)\), as \(N\rightarrow \infty \), if R is a \(\mu \)-continuity set, i.e., if \(\mu (\partial R) =0\). However, the topology on the state space for \(\text {Mi}_N\) will be chosen as the vague one, and in this topology any accumulation point \(\lambda \) of \((\text {Mi}_N)_{N\in \mathbb {N}}\) satisfies a priori only \(\int k(R)\,\lambda (\textrm{d}k)\le \mu (R)\), since the map \(k\mapsto k(R)\) is unbounded in general. The same holds for \((\text {Ma}_N)_{N\in \mathbb {N}}\).
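To take into account the possibility of a loss of mass, we introduce for any \(\lambda \in {\mathcal M }({\mathcal M }_{\mathbb {N}_0}({\mathcal S }))\) and any \(\alpha \in {\mathcal M }_{\mathbb {N}_0}({\mathcal M }({\mathcal S }){\setminus }\{0\})\) the following measures on \({\mathcal S }\):
\[ c_\lambda (\textrm{d}r)=\int _{{\mathcal M }_{\mathbb {N}_0}({\mathcal S })}\lambda (\textrm{d}k)\,k(\textrm{d}r)\qquad \text {and}\qquad c_\alpha (\textrm{d}r)=\int _{{\mathcal M }({\mathcal S }){\setminus }\{0\}}\alpha (\textrm{d}y)\,y(\textrm{d}r). \]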
We call \(c_\lambda \) and \(c_\alpha \) the integrated type-configurations of \(\lambda \), respectively of \(\alpha \). If one sees \(\lambda \) as a (not normalized) ‘distribution’ of finite point measures on \({\mathcal S }\), then \(c_\lambda \) registers the ‘expected’ total mass of particles in a given subset of \({\mathcal S }\) that appear in this distribution \(\lambda \); an analogous statement holds for \(\alpha \). The total masses of \(c_\lambda \) and \(c_\alpha \) are equal to the integrals of \(k\mapsto k({\mathcal S })\) under \(\lambda \), respectively under \(\alpha \); they are \(\le 1\) for any accumulation point of \((\text {Mi}_N)_{N\in \mathbb {N}}\), respectively of \((\text {Ma}_N)_{N\in \mathbb {N}}\), according to the above.
The natural state space containing \(\text {Mi}_N\) for any \(N\in \mathbb {N}\), is the set
The condition \(\lambda (\{0\})=0\) is clearly satisfied by any empirical measure \(\text {Mi}_N\); indeed, we could have identified it as an element of \({\mathcal M }({\mathcal M }_{\mathbb {N}_0}({\mathcal S }){\setminus } \{0\})\). However, for later notational convenience we do not exclude \(\{0\}\) but add the constraint \(\lambda (\{0\})=0\). Notice that with this constraint and the conditions on \(c_\lambda \), any \(\lambda \in {\mathcal L }\) is a subprobability measure. Any k in the support of \(\lambda \) is a finite and nonzero point measure \(k=\sum _i\delta _{z_i}\) with \(z_i\in {\mathcal S }\) (with possible repetitions) and stands for the empirical measure of the types of a vertex set of a component, its type-configuration. Informally, the event \(\{\text {Mi}_N= \lambda \}\) is the event that, for every \(k\in {\mathcal M }_{\mathbb {N}_0}({\mathcal S })\), \({\mathcal G }_N\) has \( N \lambda (\{k\})\) components with type-configuration k.
The natural state space containing \(\text {Ma}_N\) for any \(N\in \mathbb {N}\), is the set
One can write \(\alpha \in {\mathcal A}\) as a finite or at most countable point measure \(\alpha =\sum _n\delta _{y_n}\) on measures \(y_n\) on \({\mathcal S }\) (with possible repetitions). Note that \(c_\alpha = \sum _n y_n\) and, since \(c_\alpha ({\mathcal S }) \le 1\), the total masses \(y_n({\mathcal S })\) of the measures \(y_n\) can accumulate only at zero. Another consequence is that every \(\alpha \in {\mathcal A}\) is concentrated on the set of subprobability measures. For each \(y_n\) we interpret \(N y_n\) as the type-configuration of a giant component. Informally, the event \(\{\text {Ma}_N=\alpha \}\) is the event that \({\mathcal G }_N\) has, for any n, a macroscopic component with type-configuration \( N y_n\).
We will be working only with two particular choices of the type set: \({\mathcal S }\) as a finite set (equipped with the power set as topology and as sigma-field) and \({\mathcal S }\) equal to a compact metric space (equipped with the topology induced by the metric and the corresponding Borel sigma-field). We equip \({\mathcal M }({\mathcal S })\) with the weak topology that is generated by all the test integrals against continuous bounded functions \({\mathcal S }\rightarrow \mathbb {R}\). However, on the sets \({\mathcal L }\) and \({\mathcal A}\), the appropriate topologies for our purposes are the vague topologies, the ones that are induced by all the test integrals against compactly supported continuous test functions \({\mathcal M }_{\mathbb {N}_0}({\mathcal S })\rightarrow \mathbb {R}\), respectively \({\mathcal M }({\mathcal S }){\setminus }\{0\}\rightarrow \mathbb {R}\). If \(|{\mathcal S }|<\infty \), then \({\mathcal M }_{\mathbb {N}_0}({\mathcal S })\) can be identified with \(\mathbb {N}^{{\mathcal S }}_0\) and vague convergence in \({\mathcal M }({\mathcal M }_{\mathbb {N}_0}({\mathcal S }))\) is the same as pointwise convergence on \({\mathcal M }(\mathbb {N}_0^{{\mathcal S }})\). On \({\mathcal L }\times {\mathcal A}\) we use the product topology. We will show in Lemma 5.2 that both \({\mathcal L }\) and \({\mathcal A}\) are compact, hence also \({\mathcal L }\times {\mathcal A}\) is.
The convergence in this topology is the natural one that reflects the possible loss of mass that is of interest in view of the phase transition. The crucial point is that mass of \(\text {Mi}_N\) can leak out only via the unboundedness of \(k \mapsto k({\mathcal S })\), i.e., via having larger and larger connected components, while mass of \(\text {Ma}_N\) can leak out due to the fact that \(y \mapsto y({\mathcal S })\) is not bounded away from zero. In other words, mass of \(\text {Ma}_N\) leaks out only via the zero measure, where every non-giant component leaves. See Sect. 5 for details. By the definitions of \({\mathcal L }\), respectively \({\mathcal A}\), the integrated type-configurations \(c_\lambda \) and \(c_\alpha \) for \(\lambda \in {\mathcal L }\) and \(\alpha \in {\mathcal A}\) are subprobability measures, while for fixed \(N\in \mathbb {N}\) both \(c_{\text {Mi}_N}\) and \(c_{\text {Ma}_N}\) are even probability measures. The total mass one of \(c_{\text {Mi}_N}\) can partially get lost and \(c_{\text {Ma}_N}\) may not lose all its mass in the limit \(N\rightarrow \infty \). This is precisely the phase transition that we are after.
1.2 The large-deviations principle for the cluster statistics
In this section we formulate the main result of this paper. We assume that \({\mathcal S }\) is a compact metric space.
We need some notation. For any measure \(\nu \) on \({\mathcal S }\) and any function \(\kappa :{\mathcal S }\times {\mathcal S }\rightarrow [0,\infty )\), we write \(\kappa \nu (r)=\int _{\mathcal S }\kappa (r,s)\,\nu (\textrm{d}s)\). The total mass of a measure \(\nu \) on a measurable space \({\mathcal {X}}\) is denoted by \(|\nu | = \nu ({\mathcal X })\). The relative entropy of two (possibly non-normalized) finite measures \(\nu ,{\widetilde{\nu }}\) on \({\mathcal X }\) is denoted by
We write \(\langle \nu ,f\rangle \) for the integral of a function f with respect to a measure \(\nu \), and we write \(f\,\nu \) for the measure that has the density f with respect to a measure \(\nu \).
An important reference measure is the distribution \(\mathbb {Q}_\nu \) of a Poisson point process \(\mathbb {X}\) on \({\mathcal S }\) with intensity measure \(\nu \in {\mathcal M }({\mathcal S })\); thus \(\mathbb {Q}_\nu \) is a measure on \({\mathcal M }_{\mathbb {N}_0}({\mathcal S })\). Note that we do not assume that \(\nu \) has a density, hence \(\mathbb {X}\) is not necessarily simple.
We define a function \(\tau \) on \({\mathcal M }_{\mathbb {N}_0}({\mathcal S })\) by
\[ \tau (k)=\sum _{T\in {\mathcal T }(k)}\ \prod _{\{i,j\}\in E(T)}\kappa (x_i,x_j), \tag{1.9} \]
where \((x_1,\dots ,x_{|k|})\in {\mathcal S }^{|k|}\) is any vector that is compatible with k, i.e., \(k=\sum _{i=1}^{|k|}\delta _{x_i}\), \({\mathcal T }(k)\) is the set of spanning trees on \([|k|]\), and \(E(T)\) is the edge set of the tree T. Notice that \(\tau \) depends on \((x_1,\dots ,x_{|k|})\) only through k. We use the convention that \({\mathcal T }(0) =\emptyset \) and hence \(\tau (0)=0\), since the sum is empty. As we will see in Lemma 3.4, up to a factor of \(N^{-|k|+1}\), \(\tau (k)\) is equal to the large-N asymptotics of the probability that the graph \({\mathcal G }(|k|, (x_1, \ldots , x_{|k|}),\frac{1}{N} \kappa )\), which can be seen as a subgraph of \({\mathcal G }(N,{{\textbf {x}}}^{{{{({N}})}}}, \frac{1}{N} \kappa )\), is connected.
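A weighted sum over spanning trees of this kind can be evaluated without enumerating trees. The sketch below uses Kirchhoff's matrix-tree theorem, which is a standard tool and not a method taken from the paper: the sum over all spanning trees of the product of the edge weights equals the determinant of the weighted graph Laplacian with one row and the corresponding column removed. The function name `tau` and the exact-arithmetic implementation are choices made for this illustration.

```python
from fractions import Fraction


def tau(xs, kappa):
    """Sum over spanning trees T on {0, ..., n-1} of the product of
    kappa(xs[i], xs[j]) over the edges {i, j} of T, where n = len(xs)."""
    n = len(xs)
    if n == 0:
        return Fraction(0)  # convention: the sum over an empty set of trees
    m = n - 1  # reduced Laplacian: drop the last row and column
    lap = [[Fraction(0)] * m for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if i == j:
                continue
            w = Fraction(kappa(xs[i], xs[j]))
            lap[i][i] += w
            if j < m:
                lap[i][j] -= w
    # Determinant by Gaussian elimination in exact rational arithmetic.
    det = Fraction(1)
    for c in range(m):
        pivot = next((r for r in range(c, m) if lap[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            lap[c], lap[pivot] = lap[pivot], lap[c]
            det = -det
        det *= lap[c][c]
        for r in range(c + 1, m):
            factor = lap[r][c] / lap[c][c]
            for cc in range(c, m):
                lap[r][cc] -= factor * lap[c][cc]
    return det
```

For a constant kernel equal to c, Cayley's formula gives \(n^{n-2}c^{n-1}\) for a configuration of n vertices, so `tau([0, 0, 0], lambda r, s: 2)` evaluates to 12. A single vertex has exactly one spanning tree with empty edge set, so the value is 1.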
We say that \(\kappa :{\mathcal S }\times {\mathcal S }\rightarrow [0,\infty )\) is irreducible with respect to a measure \(\mu \in {\mathcal M }({\mathcal S })\) if
Otherwise \(\kappa \) is called reducible.
Here is the main result of this paper.
Theorem 1.1
(LDP for \((\text {Mi}_N,\text {Ma}_N)\)). Fix a probability measure \(\mu \) on a compact metric space \({\mathcal S }\) and a kernel \(\kappa \) on \({\mathcal S }\times {\mathcal S }\) that is nonnegative, continuous and irreducible with respect to \(\mu \). Assume that \({{\textbf {x}}}={{\textbf {x}}}^{{{{({N}})}}}\in {\mathcal S }^N\) is such that its empirical measure \(\mu _N\) converges weakly towards \(\mu \) as \(N\rightarrow \infty \). Assume that \(\kappa _N\) is a nonnegative and continuous kernel for any \(N\in \mathbb {N}\) such that \(\kappa _N\) converges uniformly towards \(\kappa \) as \(N\rightarrow \infty \). Let \(\text {Mi}_N\) and \(\text {Ma}_N\) be, respectively, the microscopic and macroscopic empirical measure of the connected components of \({\mathcal G }_N={\mathcal G }(N,{{\textbf {x}}}^{{{{({N}})}} },\frac{1}{N}\kappa _N)\), for any \(N\in \mathbb {N}\), as defined in (1.2).
Then \((\text {Mi}_N,\text {Ma}_N)\) satisfies a large-deviations principle with speed N and rate function I defined by
where, for \(\lambda \in {\mathcal L }\), \(\alpha \in {\mathcal A}\) and \(\nu \in {\mathcal M }({\mathcal S })\),
As in the definition of \(\mathbb {H}\) in (1.8), we define \(I_\text {Ma}(\alpha )=\infty \) if it is not true that, \(\alpha \)-almost everywhere, \(\frac{\textrm{d}y}{(1-{{\text {e}} }^{-\kappa y})\, \textrm{d}\mu }\) exists. Likewise we define \(I_\text {Me}(\nu )=\infty \) if \(\frac{\textrm{d}\nu }{\kappa \nu \,\textrm{d}\mu }\) does not exist. We use the convention that \(\log 0=-\infty \) and \(0\log 0=0\).
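Let us recall the notion of an LDP: Theorem 1.1 says that \(I(\cdot )\) is lower semicontinuous and, for any open set \(G\subset {\mathcal L }\times {\mathcal A}\) and any closed set \(F\subset {\mathcal L }\times {\mathcal A}\),
\[ \liminf _{N\rightarrow \infty }\frac{1}{N}\log \mathbb {P}_N\big ((\text {Mi}_N,\text {Ma}_N)\in G\big )\ge -\inf _{G}I\qquad \text {and}\qquad \limsup _{N\rightarrow \infty }\frac{1}{N}\log \mathbb {P}_N\big ((\text {Mi}_N,\text {Ma}_N)\in F\big )\le -\inf _{F}I, \]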
where we wrote \(\mathbb {P}_N\) for the probability measure under the random graph \({\mathcal G }_N\). For the theory of large deviations, see e.g. [19].
An intuitive explanation of Theorem 1.1 is given in Sect. 1.3. The proof of Theorem 1.1 is in Sect. 5. It relies heavily on the special case of Theorem 1.1 for finite sets \({\mathcal S }\), see Theorem 3.1, whose proof we present first in Sect. 3. Our main strategy there is to identify the joint distribution of all the clusters in \({\mathcal G }(N,{{\textbf {x}}}^{{{{({N}})}} },\frac{1}{N}\kappa _N)\) in a combinatorial way and then to explicitly extract the exponential rates. The proof of Theorem 1.1 in Sect. 5 carries out an approximation procedure of \({\mathcal S }\) with finite state spaces in the spirit of the Dawson–Gärtner theorem.
Theorem 1.1 is an extension of [2, Theorem 1.1] from the special case \(\mu =\delta _0\) and constant \(\kappa \) (that is, from the standard Erdős–Rényi graph) to an inhomogeneous Erdős–Rényi graph. Note that this LDP is also highly nontrivial, interesting, and new in the case of an arbitrary \(\mu \) and constant \(\kappa \), to the best of our knowledge.
Remark 1.2
(Quenched and annealed LDPs) One possible application of Theorem 1.1 is to the situation where the vertex types \(x_1,\dots ,x_N\) are themselves random and independent with distribution \(\mu \) each. Then Theorem 1.1 can be seen as a conditional LDP given \({{\textbf {x}}}\), sometimes called a quenched LDP. The rate function turns out to be nonrandom and to depend only on \(\mu \). One can then obtain an annealed version of the LDP, i.e., when the probabilities are also taken with respect to the vertices \(x_1,\dots ,x_N\). The annealing follows from a standard mixture argument when \({\mathcal S }\) is a finite set of points; for general \({\mathcal S }\) the construction of a discretization suitable for use in our proof is a delicate matter that we do not explore here. One possible formulation of the annealed result would be that the triple consisting of the empirical measure of the vertices, \(\text {Mi}_N\) and \(\text {Ma}_N\) satisfies an LDP with rate function equal to \((\nu ,\lambda ,\alpha )\mapsto \mathbb {H}(\nu |\mu )+I_{\nu }(\lambda ,\alpha )\), where we now wrote \(I_\nu \) for the rate function I of Theorem 1.1 with \(\nu \) the limiting empirical measure of the type vector (instead of \(\mu \)).
Remark 1.3
(Detailedness) We decided to register any component of the graph \({\mathcal G }_N\) only through its type-configuration, neglecting all the information about the internal connection structure. It is a natural wish to have also a more detailed analysis, for example by distinguishing each component as a subgraph instead of the type-configuration. From such a refined LDP, one could derive Theorem 1.1 via the contraction principle.
For the microscopic components it is indeed not too difficult to derive a refined version of the LDP of Theorem 1.1, since for each type-configuration \(k\in {\mathcal M }_{\mathbb {N}_0}({\mathcal S })\), the statistics of the \(\approx N \lambda (k)\) components with type-configuration k follow an explicit multinomial distribution. The form of the term \(\tau (k)\) gives the hint that only spanning trees survive. The macroscopic components are much more involved and it is not clear which type of structure gives the decisive contribution in the limit.
Here is a standard corollary from the LDP in Theorem 1.1 about separate LDPs for \(\text {Mi}_N\) and \(\text {Ma}_N\).
Corollary 1.4
(Separate LDPs for \(\text {Mi}_N\) and \(\text {Ma}_N\)) Under the assumptions of Theorem 1.1, the empirical measures \((\text {Mi}_N)_{N\in \mathbb {N}}\) and \((\text {Ma}_N)_{N\in \mathbb {N}}\) each satisfy an LDP on \({\mathcal L }\), respectively on \({\mathcal A}\), with rate functions
The LDP assertion directly follows from the contraction principle (see [19]), since both projections \((\lambda ,\alpha )\mapsto \lambda \) and \((\lambda ,\alpha )\mapsto \alpha \) are continuous. The identification of the two contracted rate functions is formulated in Theorem 2.3 and discussed in Sect. 2.1.
Remark 1.5
(LDP for the mesoscopic part) Analogously to the corresponding result in [2], we could formulate and prove also a corollary about the mesoscopic part of the configuration \(({\mathcal C }_j)_j\) of the components of \({\mathcal G }_N\), i.e., about those components whose cardinalities satisfy \(R<|{\mathcal C }_j|<\varepsilon N\) in the limit \(N\rightarrow \infty \), followed by \(R\rightarrow \infty \) and \(\varepsilon \downarrow 0\). It is clear that we cannot consider the empirical measure of all these components anymore, but only the empirical measure of the total number of vertices of a given type in any of the mesoscopic components. Our conjecture is that (similarly to [2, Corollary 1.4]) this measure on \({\mathcal S }\) satisfies an LDP as \(N\rightarrow \infty \) for fixed \(R\in \mathbb {N}\) and \(\varepsilon >0\) with a rate that converges towards \(I_\text {Me}\) defined in (1.13) as \(R\rightarrow \infty \) and \(\varepsilon \downarrow 0\). We abstained from writing out the details.
1.3 Interpretation of the LDP
Our main result, the LDP of Theorem 1.1, is highly compressed and contains a number of interesting results as special cases, so let us comment on the impact and draw some conclusions from it. We will restrict here to the large-deviations issues; the limiting issues and the consequences for the giant-cluster phase transition are deferred to Sect. 2.
We are examining the probability of the event \(\{\text {Mi}_N\approx \lambda ,\text {Ma}_N\approx \alpha \}\), asymptotically for large N, for any \(\lambda \in {\mathcal L }\) and \(\alpha =\sum _n\delta _{y_n}\in {\mathcal A}\). Indeed, we want to heuristically argue that, as \(N\rightarrow \infty \), one has
Recall from Sect. 1.1 that this is the event that \({\mathcal G }_N\) has \(\sim N \lambda (k)\) components with type-configuration k, for any \(k\in {\mathcal M }_{\mathbb {N}_0}({\mathcal S })\), and a macroscopic component with type-configuration \(\sim N y_n\), for any n. We can clearly restrict to the case that \(c_\lambda +c_\alpha \le \mu \), since otherwise the number of vertices of some type in all the microscopic or macroscopic components together would be larger than the number of existing vertices of that type. However, it might be that the difference \(\nu = \mu -c_\lambda -c_\alpha \) is a positive measure; this means that there are \(\sim N \nu (R)\) of the vertices with type in R in mesoscopic components for any \(R\subset {\mathcal S }\), e.g., in components with N-dependent cardinalities like \(\log N\) or \(N^{1/3}\), or any mixture.
Recall that the types of all the vertices of \({\mathcal G }_N\) are approximately distributed as \(\frac{1}{N}\sum _{i=1}^N\delta _{x_i}\approx \mu \). The key point in our proof is that the probability of \(\{\text {Mi}_N\approx \lambda ,\text {Ma}_N\approx \alpha \}\) consists of a number of terms that are more or less independent, i.e., lead to a sum of exponential rates. These terms are the following:

a combinatorial term that expresses the number of decompositions of [N] into the collection of subsets as above (respecting all the types),

the probability that each of these subsets is connected (this depends on the type-configurations k of the microscopic components and on the type-configuration \(N y_n\) of the nth macroscopic component, respectively),

the probability that any two of all these vertex sets are not connected.
The above decomposition is the starting point of our combinatorial analysis in Lemma 3.3. The crucial point in our LDP proof is that we can get precise asymptotics for most of the terms in the decomposition of \(\mathbb {P}_N(\text {Mi}_N\approx \lambda ,\text {Ma}_N\approx \alpha )\). In particular, we rearrange the terms to isolate the contributions of the microscopic and of the macroscopic components, and we collect everything else in a remaining term, which we identify as the contribution of the mesoscopic part. In this rearrangement, the probability that each subset is connected is the hardest term to handle. However, it turns out that it is enough to prove sharp asymptotics only for components of finite size and for components of size of order N; for the remaining terms, upper bounds are sufficient. Let us mention, in particular, that the asymptotics of the probability of connectedness are far from trivial for macroscopic components; obtaining them is not only a crucial step in our LDP proof but also an interesting result in its own right. We comment more on this in Sect. 1.4. Given this rearrangement, the large-deviations rate terms that we finally obtain are organized and interpreted in a slightly different fashion, as follows.
Let us first consider the microscopic part. The first two terms in \(I_\text {Mi}(\lambda )\), \(\mathbb {H}(\lambda |{\mathbb {Q}}_\mu )-1\), describe the number of labellings of \(N c_{\lambda }\) vertices into microscopic subsets, according to \(\lambda \) and respecting the type-configurations. A priori, it is only notationally convenient to write this as an entropy, but this interpretation allowed us to make the step from the discrete to the continuous setting. The next three terms, \(-\langle \lambda ,\log \tau \rangle +|c_\lambda |-|\lambda |\), describe the probability that all the considered subsets are connected, see the comment below (1.9). The last term, \(\frac{1}{2} \langle c_\lambda , \kappa \mu \rangle \), collects the costs of isolating each microscopic component from the rest of the system.
For the macroscopic part, given a macroscopic measure \(\alpha =\sum _{n}\delta _{y_n}\), the term \(\sum _{n}\langle y_n,\log \frac{\textrm{d}y_n}{\textrm{d}\mu }\rangle \) comes from the number of labellings of \(N c_{\alpha }\) vertices into macroscopic subsets, according to \(\alpha \) and to the type-configurations. The term \(-\sum _{n}\langle y_n,\log (1-{{\text {e}} }^{-\kappa y_n})\rangle \) summarizes the connection probabilities of all the macroscopic components, see Theorem 3.6. Finally, for each n, the term \(\frac{1}{2}\langle y_n, \kappa (\mu -y_n)\rangle \) is the cost of isolating the macroscopic component from the rest of the system.
In the mesoscopic part of the rate function, we see only the dependence on the measure \(\nu \) of all the vertices of mesoscopic components, without any information about the components themselves. Again, the term \(\langle \nu , \log \frac{\textrm{d}\nu }{\textrm{d}\mu }\rangle \) is a result of the relabelling of the \(N\nu \) vertices according to the type configuration, and the integral of \(\log (\kappa \nu )\) with respect to \(\nu \) describes in a summarizing way that each of the vertex sets is connected. The term \(\frac{1}{2} \langle \nu , \kappa \mu \rangle \) represents the cost of isolating such vertices from the rest of the graph.
1.4 Connectivity of inhomogeneous graphs
On our way to a proof of Theorem 1.1, we derive some interesting formulas for quantities that are of general interest in the theory of multitype Erdős–Rényi graphs. Indeed, in Lemma 3.3 we give a closed formula for the joint distribution of the entire collection of the vertex sets of all the components of \({\mathcal G }_N\). One important ingredient there is the probability for a given subset of vertices \(\subset [N]\) to be connected. If the size of the subset is kept fixed, then it is straightforward to get sharp estimates for the limit of this probability, as \(N\rightarrow \infty \) (see Lemma 4.8). However, if the size of the subset is of order N, then it is much harder to derive sharp asymptotics for this probability. Our LDP relies precisely on this kind of asymptotics, which we derive in Theorem 3.6 in the case of a finite type space. We want to stress that finding Theorem 3.6 was crucial for proving the LDP and that we could not find any similar result in the literature that holds for inhomogeneous graphs. It requires a rather extensive proof, which is given in Sect. 4. Having established the LDP in its full generality one gets the same result about the connection probability in the more general setting of a compact type space.
Corollary 1.6
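(Connectivity probability of \({\mathcal G }_N\)) In the situation of Theorem 1.1 we have that
$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N}\log \mathbb {P}_N\big ({\mathcal G }_N \text{ is connected}\big )=\big \langle \mu ,\log (1-{{\text {e}} }^{-\kappa \mu })\big \rangle . \end{aligned}$$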
The question about the connection probability of a random graph has attracted quite some interest. Let us mention [20], which studies for the inhomogeneous random graph the regime where the edge probability is of order \(\frac{\log N}{N}\) and proves a phase transition: the probability of the graph to be connected either converges to 1 or to 0, depending on the parameters \(\mu \) and \(\kappa \) (in our notation). In our case, where the edge probability is of order \(\frac{1}{N}\), we are in the case in which the probability of the graph being connected is always going to 0. Corollary 1.6 above identifies the exponential rate of its decay, which we think is of independent interest.
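To illustrate the exponential decay numerically, here is a minimal sketch for the one-type special case (a single type, \(\kappa \equiv t\)), where the classical decay rate of the connection probability is \(\log (1-{{\text {e}} }^{-t})\). The function name `connected_probability` and the parameter choices are ours and purely illustrative. The sketch uses the elementary recursion over the size of the component containing vertex 1, which is not the method of proof used in this paper.

```python
import math


def connected_probability(n: int, p: float) -> float:
    """Exact P(G(n, p) is connected) for the homogeneous graph G(n, p).

    Uses the recursion obtained by conditioning on the size k of the
    component of vertex 1: that component is connected and has no edge
    to the remaining n - k vertices.
    """
    q = 1.0 - p
    conn = [0.0, 1.0]  # conn[k] = P(G(k, p) connected); index 0 is unused
    for m in range(2, n + 1):
        not_connected = sum(
            math.comb(m - 1, k - 1) * conn[k] * q ** (k * (m - k))
            for k in range(1, m)
        )
        conn.append(1.0 - not_connected)
    return conn[n]


t, n = 2.0, 100
empirical_rate = math.log(connected_probability(n, t / n)) / n
limiting_rate = math.log(1.0 - math.exp(-t))
print(empirical_rate, limiting_rate)
```

For moderate N the two rates already agree up to a correction of order 1/N, which comes from subexponential prefactors that are invisible on the large-deviations scale.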
1.5 Related literature
This paper is a natural generalization to the inhomogeneous setting of [2], where we derived an LDP for all the cluster sizes of the Erdős–Rényi random graph in the sparse setting. Indeed, the classical sparse Erdős–Rényi graph corresponds to the case where \({\mathcal S }\) has only one element and \(\kappa \) takes only one value, meaning that the result in [2] is a special case of the results in this paper. As mentioned in [2], the literature on the Erdős–Rényi graph is rich, but very few results on large deviations in the sparse regime are available. To the best of our knowledge, our paper is the first to prove a large-deviations statement in the framework of the inhomogeneous graphs. Inhomogeneous random graphs were introduced in [34] and the first mathematical treatment of the model was presented in the seminal paper [7]. In [7] events that happen with high probability are studied, while the focus of the present paper is on rare events. In this section we list some earlier results concerning large deviations for random graphs and comment on their relation to our work.
In [2, Section 1.4], we gave a broad survey on known results on LDPs for sparse random graphs; summarizing, there are indeed some results on particular statistics of the graph \({\mathcal G }_N\), many of which are a posteriori contained in [2, Theorem 1.1] as special cases. For example in [32] two LDPs for the size of the largest component and for the number of isolated vertices have been derived. These quantities are continuous functionals of our measures \(\text {Ma}_N\), respectively of \(\text {Mi}_N\), and the contraction principle gives these results as consequences of ours.
It is important to mention the paper [8], where the authors proved an LDP for the empirical measure of the components rooted at each vertex in the sparse Erdős–Rényi graph. This object is a detailed size-biased version of our \(\text {Mi}_N\) and contains information about the edges that establish the connectedness of the components. However, in [8, Theorem 1.8], the authors show that their rate function is concentrated on trees, therefore any feasible microscopic component of size \(k\in \mathbb {N}\) is indeed a spanning tree on k vertices. This is also implicitly proven in our Lemma 3.4: the element \(\tau \), defined in (1.9), ensures precisely that, when computing the probability that a microscopic component is formed on a certain finite set of vertices, the only important contribution is given by realizing a spanning tree on those vertices. This implies that a refinement of our proofs would give an LDP for a microscopic empirical measure on finite size components, as we mention in Remark 1.3. A size-biased version of such an empirical measure would correspond to a generalization to the inhomogeneous setting of the empirical neighborhood distribution in [8]. In this direction, in the very recent preprint [3], the authors deal with such an object in the case where the vertices of the graph have a type but the kernel \(\kappa \) is constant. They prove an LDP using the techniques coming from [8] and relying on the notion of entropy for stochastic processes on marked rooted graphs introduced in [17]. Further investigation is needed (and desirable) to understand connections between this entropy, which comes out as the large-deviations rate function, and the rate function from our LDP.
In the framework of the sparse Erdős–Rényi graph, i.e., when the connection probability of \({\mathcal G }(N,p)\) satisfies \(p \asymp N^{-1}\), recent progress has been made on the tails of triangle counts [12, 21], while we are not aware of any study of other rare events for the inhomogeneous graphs in this regime.
The case of large deviations in the dense Erdős–Rényi graph, i.e., for \({\mathcal G }(N,p)\) with fixed \(p\in (0,1)\), has been completely covered thanks to Chatterjee and Varadhan [14], see [13] for an overview. In [9, 28] extensions of the LDP to the framework of the dense inhomogeneous graphs are given. This regime is rather different from the sparse one and the LDP relies on the fact that each graph can be associated with a two-dimensional symmetric function, called a graphon. The limit of any sequence of sparse graphs is the graphon which is identically zero, showing that the space of graphons is not the right space in the sparse setting. Recently, extensions of the concept of graphons to the sparse setting have been introduced, see [10], but, to our knowledge, connections to the sparse Erdős–Rényi graph are not yet known. Let us mention that the literature on LDPs for various statistics of different types of random graphs outside the sparse regime has grown since the seminal work [14], see for example [15, 18]. However, these graphs are by their nature so different from the setting of the present paper that we do not go into further details here.
For inhomogeneous graphs in the sparse regime, there are only a few results in the literature, starting with the seminal paper [7], which introduced the model and investigated the giant-cluster phase transition in detail. Furthermore, clusters of critical sizes of order \(N^{\alpha }\) with some \(\alpha \in (0,1)\) around the phase transition have been studied for some types of inhomogeneous random graphs in [5, 6] and [36] under certain moment assumptions on the (scalar) types. Let us finally mention some results just outside the sparse setting: [11] analyzes the eigenvalues of the adjacency matrix of an inhomogeneous Erdős–Rényi random graph with vanishing edge probabilities of order \(\gg \frac{1}{N}\). In [20] the authors study the probability of the graph to be connected when the edge probabilities are of order \(\frac{\log N}{N}\). This compares to our result in Corollary 1.6, where we obtain precise asymptotics for such a probability in the sparse regime.
Finally, we have recently become aware of the preprint [27], where the authors study the fixed point equation (2.4) using combinatorial identities that we also rely on in Sect. 4. The focus of that paper is to prove that a multitype version of the Marcus–Lushnikov coagulation model with multiplicative kernel converges to the solution of the multitype Flory equation (2.14). The particle masses of such a coagulation system are in one-to-one correspondence with the connected components of the inhomogeneous random graph, linking their convergence result to our Lemma 2.7.
2 Limiting consequences
In this section we present and discuss the second part of our main results, some consequences of the LDP of Theorem 1.1 that imply detailed and comprehensive limiting assertions about the inhomogeneous random graph. These results rely on involved variational analysis, using recursive formulas and elements of combinatorial power series analysis as methods to explicitly construct minimizers.
Like many large-deviations principles, Theorem 1.1 implies a law of large numbers. This is particularly interesting here, since it implies and illustrates the well-known phase transition about the emergence of a giant cluster that was established in [7], and which we record and discuss in Sect. 2.1. There we also reveal our identification and interpretation of that phase transition in terms of the minimizer(s) of the rate function of our LDP in Theorem 1.1. In Sect. 2.2, we compare to the description of this phase transition that was given in [7] in terms of a strongly related multitype branching process. Furthermore, in Sect. 2.3 we comment on the irreducibility of \(\kappa \) and explain what the LDP looks like if this assumption is dropped. One of our main motivations for the present work is explained in Sect. 2.4, where we map the inhomogeneous Erdős–Rényi graph onto a particular particle process with a random coagulating mechanism and discuss the consequences of our results for this process, in particular the phase transition of gelation type, i.e., the emergence of a gel, a macroscopic particle.
2.1 The phase transition
We now give comprehensive information about the well-known phase transition of the emergence of a giant component for the inhomogeneous Erdős–Rényi graph using the LDP of Theorem 1.1. Indeed, we derive a detailed picture of all the limiting microscopic and macroscopic clusters, according to their type-configurations. In this way, we go substantially beyond the work [7], to which we compare in Sect. 2.2 below. Unlike [7], our point of departure is not a multitype branching process, but the variational analysis of the rate function. This leads us in a natural way to deal with a transformed Poisson point process, which indeed shows deep connections with the multitype branching process.
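We first introduce the minimizing micro-cluster distribution. Let us fix a kernel \(\kappa \) as in Theorem 1.1. Recall that, for any measure \(\nu \) on \({\mathcal S }\), we denote by \(\mathbb {Q}_\nu \) the distribution of a Poisson point process on \({\mathcal S }\) with intensity measure \(\nu \). For \(c\in {\mathcal M }({\mathcal S })\), we introduce the measure \(\lambda _c\) on \({\mathcal M }_{\mathbb {N}_0}({\mathcal S })\) by
$$\begin{aligned} \lambda _c(\textrm{d}k)={{\text {e}} }^{\theta _c({\mathcal S })}\,\tau (k)\,{\mathbb {Q}}_{\theta _c}(\textrm{d}k),\qquad \text{ where } \theta _c(\textrm{d}r)={{\text {e}} }^{-\kappa c(r)}\,c(\textrm{d}r). \end{aligned}$$(2.1)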
In words, \(\lambda _c\) is obtained by transforming the Poisson point process with intensity measure \(\theta _c(\textrm{d}r)={{\text {e}} }^{-\kappa c(r)}\, c(\textrm{d}r)\) with the function \(\tau \). It will turn out in the subcritical case (and is implicit in the following theorem) that this measure possesses the integrated type configuration c, that is, the choice of the measure \(\theta _c\) implies the crucial property that \(c_{\lambda _c}=c\), where we recall the notation \(c_\lambda (\textrm{d}r)=\int \lambda (\textrm{d}k) \, k(\textrm{d}r)\).
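We now introduce an important quantity (which was shown to be crucial in [7]) that we use for separating the sub- and supercritical regimes. For any measure \(\nu \) on \({\mathcal S }\), we denote by \(L^2(\nu )\) the usual \(L^2\)-space of functions \({\mathcal S }\rightarrow \mathbb {R}\) with respect to the measure \(\nu \). We introduce the operator
$$\begin{aligned} T_{\kappa ,\nu }f(r)=\int _{\mathcal S }\kappa (r,s)f(s)\,\nu (\textrm{d}s),\qquad f\in L^2(\nu ),\ r\in {\mathcal S }, \end{aligned}$$(2.2)
and its operator norm
$$\begin{aligned} \varSigma (\kappa ,\nu )=\Vert T_{\kappa ,\nu }\Vert =\sup \big \{\Vert T_{\kappa ,\nu }f\Vert _{L^2(\nu )}:f\in L^2(\nu ),\,\Vert f\Vert _{L^2(\nu )}\le 1\big \}. \end{aligned}$$(2.3)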
Informally (an argument will follow in Sect. 4.1), in the special case that the support of \(\nu \) is finite (i.e., the case of a finite set \({\mathcal S }\)), then \(T_{\kappa ,\nu }\) can be identified with the matrix \((\kappa (r,s)\nu (s))_{r,s\in {\mathcal S }}\), and \(\varSigma (\kappa ,\nu )\) is its spectral radius.
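For a finite type set, the quantity \(\varSigma (\kappa ,\nu )\) can be approximated numerically. The following is a minimal sketch that applies power iteration to the matrix \((\kappa (r,s)\nu (s))_{r,s}\). The function name `spectral_radius` and the example kernel are our own illustrative choices, and we assume that all entries of the matrix are strictly positive, so that the iteration converges to the largest eigenvalue.

```python
def spectral_radius(kappa, nu, iterations=500):
    """Power iteration for the matrix M[r][s] = kappa[r][s] * nu[s].

    kappa: symmetric matrix (list of lists) with non-negative entries.
    nu:    list of non-negative masses, one per type.
    Assumes M is entrywise positive (hence primitive), which guarantees
    convergence to the Perron-Frobenius eigenvalue.
    """
    n = len(nu)
    v = [1.0] * n
    value = 0.0
    for _ in range(iterations):
        w = [sum(kappa[r][s] * nu[s] * v[s] for s in range(n)) for r in range(n)]
        value = max(w)  # current eigenvalue estimate (v has sup-norm 1)
        if value == 0.0:
            return 0.0
        v = [x / value for x in w]
    return value


# Two types with equal mass; the example is supercritical if the result exceeds 1.
kappa_example = [[2.0, 1.0], [1.0, 3.0]]
nu_example = [0.5, 0.5]
print(spectral_radius(kappa_example, nu_example))
```

In this example the matrix is \(\big ({\begin{matrix}1&{}0.5\\ 0.5&{}1.5\end{matrix}}\big )\), whose largest eigenvalue is \((5+\sqrt{5})/4\approx 1.809\), so the example lies in the supercritical regime.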
Recall the rate function I and the reference measure \(\mu \) from Theorem 1.1. We now describe the minimizer of I.
Theorem 2.1
(Minimizers of the rate function) Suppose that \(\kappa \) and \(\mu \) are as in Theorem 1.1. Then the following hold.

(i)
If \(\varSigma (\kappa , \mu )\le 1\), then the unique minimizer of I is equal to \((\lambda _\mu ,0)\).

(ii)
If \(\varSigma (\kappa , \mu )> 1\), then the unique minimizer of I is equal to \((\lambda _{c^*},\delta _{\mu -c^*})\), where the subprobability measure \(c^*\) is the only solution to the characteristic equation
$$\begin{aligned} {{\text {e}} }^{-\kappa c(r)}\,c(\textrm{d}r)={{\text {e}} }^{-\kappa \mu (r)}\,\mu (\textrm{d}r)\qquad \text{ on } {\mathcal S }, \end{aligned}$$(2.4)
that satisfies both \(c^*\le \mu \) and \(c^* \ne \mu \). It further holds that \(\varSigma (\kappa , c^*)< 1\).
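In particular, in probability as \(N\rightarrow \infty \),
$$\begin{aligned} (\text {Mi}_N,\text {Ma}_N)\longrightarrow {\left\{ \begin{array}{ll} (\lambda _\mu ,0)&{}\text{ if } \varSigma (\kappa ,\mu )\le 1,\\ (\lambda _{c^*},\delta _{\mu -c^*})&{}\text{ if } \varSigma (\kappa ,\mu )>1. \end{array}\right. } \end{aligned}$$(2.5)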
The proof of Theorem 2.1 is in Sect. 7. Most of it is original research, but we take from [7] the discussion of the solutions of the fixed point equation (2.4), see Lemma 4.1 where we summarize it.
The law of large numbers in (2.5) is a standard consequence of an LDP with a unique minimizer for the rate function. This is a very precise and detailed formulation of the famous giant-cluster phase transition in the graph \({\mathcal G }_N\). Indeed, the following happens with probability tending to one exponentially fast:

(i)
In the subcritical phase \(\varSigma (\kappa , \mu )<1\), all vertices (meaning all up to o(N)) are in microscopic components, more precisely in the unique optimal configuration encoded by \(\lambda _\mu \). That is, for any \(k\in {\mathcal M }_{\mathbb {N}_0}({\mathcal S })\), the number of components with vertex set given by k is asymptotically \(N {{\text {e}} }^{\theta _\mu ({\mathcal S })}\tau (k)\,{\mathbb {Q}}_{\theta _\mu }(\textrm{d}k)\), with \(\theta _\mu (\textrm{d}r)={{\text {e}} }^{-\kappa \mu (r)}\,\mu (\textrm{d}r)\). We have \(c_{\lambda _\mu }=\mu \), no giant component appears, and the number of vertices in mesoscopic components is o(N).

(ii)
In the case \(\varSigma (\kappa , \mu )> 1\), a unique giant cluster appears with \(\sim N (1-c^*({\mathcal S }))\) vertices and type-configuration asymptotically equal to \(N(\mu -c^*)\) with \(c^*\) characterized by (2.4), since \(\theta _\mu =\theta _{c^*}\). The microscopic components are distributed according to the optimal distribution \(\lambda _{c^*}\), and the number of vertices in mesoscopic components is o(N). Note that this microscopic distribution is not saturated, that is \(\varSigma (\kappa , c^*)<1\), as in the one-type setting [2]. That is, we encounter a phase transition of explosion type, rather than of saturation type, see Remark 2.2 and [2, Section 1.6].
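The two regimes above can be checked numerically in the one-type special case (a single type, \(\kappa \equiv t\), \(\mu \) a unit mass). The sketch below assumes that \(\tau (k)\) then reduces to the weighted tree count \(k^{k-2}t^{k-1}\); definition (1.9) is not restated in this section, so this is an assumption of the sketch. Under it, the limiting number of size-k components per vertex is \(k^{k-2}t^{k-1}{{\text {e}} }^{-tk}/k!\). The function names and the truncation level `k_max` are illustrative choices of ours.

```python
import math


def micro_density(k: int, t: float) -> float:
    """Limiting number of components of size k per vertex (one-type case),
    k^(k-2) t^(k-1) e^(-t k) / k!, evaluated in log space to avoid overflow."""
    log_value = (
        (k - 2) * math.log(k)
        + (k - 1) * math.log(t)
        - t * k
        - math.lgamma(k + 1)
    )
    return math.exp(log_value)


def microscopic_mass(t: float, k_max: int = 5000) -> float:
    """Fraction of vertices in microscopic components (truncated series)."""
    return sum(k * micro_density(k, t) for k in range(1, k_max + 1))


print(microscopic_mass(0.5))        # subcritical: all mass is microscopic
print(1.0 - microscopic_mass(2.0))  # supercritical: fraction in the giant cluster
```

In the subcritical example the microscopic mass is one, in line with \(c_{\lambda _\mu }=\mu \). In the supercritical example the missing mass g satisfies \(g=1-{{\text {e}} }^{-tg}\), the one-type form of the characteristic equation.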
Remark 2.2
(Phase transition: saturation versus explosion) Here is an explanation of the phase transition in terms of a dynamical process. Consider a process of Erdős–Rényi graphs in increasing connection probability, i.e., by adding more and more bonds between the vertices, such that components grow. A suitable growth parameter is \(\varSigma (\kappa , c)\), where c stands for the rescaled type-configuration of all the vertices; however, we consider \(\varSigma (\kappa , c)\) as a growing function of \(\kappa \), the bond density. As we explained in [2, Section 1.6], the well-known giant-cluster phase transition (see also the discussion below Theorem 1.1) is an explosion phase transition in the sense that, when crossing the threshold one, a positive fraction of finite-size clusters merges rapidly into one giant cluster and, at any time, every cluster keeps participating in merge events. In particular, the total microscopic mass starts decreasing at the phase transition. In contrast, in condensation phase transitions like the famous Bose–Einstein condensation, first all microscopic components reach their maximal size (the saturated state), before a macroscopic component, the condensate, appears, and then the microscopic ones do not change anymore, but all of the additional mass goes exclusively into the condensate.
Now we give the description of the two rate functions for the contracted LDPs of \((\text {Mi}_N)_{N\in \mathbb {N}}\) and \((\text {Ma}_N)_{N\in \mathbb {N}}\), respectively, from Corollary 1.4.
Theorem 2.3
(Minimizers of the contracted rate function) Suppose that \(\kappa \) and \(\mu \) are as in Theorem 1.1. Then the following hold for \(\lambda \in {\mathcal L }\) and for \(\alpha \in {\mathcal A}\), respectively.
and
where for \(c\in {\mathcal M }({\mathcal S })\)
where \(b^* = b^*(c)\in {\mathcal M }({\mathcal S })\) is the minimal nontrivial (i.e., not equal to c) solution to
and it holds that \(\varSigma (\kappa , b^*) =1\).
The proof of Theorem 2.3 is in Sect. 6 for \({\mathcal I }_\text {Mi}\) and in Sect. 7 for \({\mathcal I }_\text {Ma}\). The above theorem suggests a conditional law of large numbers. Informally, if we fix \(\alpha \in {\mathcal A}\) and a sequence of \(\alpha _N\in {\mathcal A}\) such that \(\alpha _N\rightarrow \alpha \), then, under the probability \(\mathbb {P}_N(\cdot \mid \text {Ma}_N=\alpha _N)\), we have
Notice that when \(\varSigma (\kappa ,\mu -c_\alpha )> 1\), then \(b^*\) is not equal to \(\mu -c_\alpha \), therefore \(b^*+c_\alpha \ne \mu \) and the missing mass is interpreted as being mesoscopic.
Remark 2.4
(Conditional limit with saturation phase transition) In formula (2.8) we encounter a phase transition of saturation type in a conditional limit, in contrast to the transition of explosion type in the unconditional one, see Remark 2.2. We refer back to Sect. 1.3 for the interpretation. Recall that \({\mathcal I }_{\text {Ma}}(\alpha )\) is the negative exponential rate of the probability of the event \(\{\text {Ma}_N \approx \alpha \}\). When \(\varSigma (\kappa , \mu -c_\alpha )\le 1\), (2.8) shows that \({\mathcal I }_{\text {Ma}}(\alpha )=I_\text {Ma}(\alpha )+I_\text {Mi}(\lambda _{\mu -c_\alpha })\), implying that \(\mathbb {P}_N(\text {Ma}_N\approx \alpha )=\mathbb {P}_N(\text {Mi}_N\approx \lambda _{\mu -c_\alpha },\,\text {Ma}_N\approx \alpha ){{\text {e}} }^{o(N)}\). The interpretation of this is that, conditionally on the event \(\{\text {Ma}_N\approx \alpha \}\), the non-macroscopic mass optimally organizes according to the microscopic measure \(\lambda _{\mu -c_\alpha }\). In contrast, in the case \(\varSigma (\kappa , \mu -c_\alpha )>1\), from (2.8) we see that \({\mathcal I }_{\text {Ma}}(\alpha )=I_\text {Ma}(\alpha )+I_\text {Mi}(\lambda _{b^*})+I_\text {Me}(\mu -c_\alpha -{b^*})\). This means that, conditionally on the event \(\{\text {Ma}_N\approx \alpha \}\), the non-macroscopic mass cannot be organized fully in microscopic clusters, but it is organized microscopically according to \(\lambda _{b^*}\) and the remaining vertices, with type-configuration \(N(\mu -c_\alpha -b^*)\), are put in mesoscopic components. The particular rescaled type-configuration \(b^*\) is saturated in the sense that \(\varSigma (\kappa , b^*)=1\). This means, given a fixed macroscopic type-configuration, if more bonds are thrown into the graph, then first all microscopic clusters grow until they reach the saturated state \(\lambda _{b^*}\), and then this is frozen and only mesoscopic clusters grow.
The latter effect is known in the literature under the name of frozen percolation, see for example [16, 29, 33, 37], and the difference between the two phase transitions is reflected in substantial differences of the hydrodynamic limit, as we summarize in Sect. 2.4.
2.2 Comparison to [7]: branching-process interpretation
Our description of the limiting quantities that we presented in Sect. 2.1 is based on and derived from our analysis of the minimizer of I. Therefore we found it most suitable to present them in terms of transformed Poisson point processes. However, in the analysis of finite-size components of random graphs, it is common and was often successful to employ well-adapted branching processes for the description. The main idea is that a component can be efficiently (sampled and) analyzed by exploring it via such a branching algorithm. This idea was also a cornerstone in the seminal paper [7], and it produced a description of the limiting macroscopic component in terms of the extinction probability of a crucial branching process. In this section, we recall this description and compare it to our Poisson point process description, also including the microscopic components.
The main tool that is utilized in [7] is a multitype branching process with type space \({\mathcal S }\), in which each particle of type \(r\in {\mathcal S }\) has offspring with distribution that is a Poisson process with intensity measure \(\kappa (r,s)\,\mu (\textrm{d}s)\). We define \(\rho (r)\in [0,1]\) as the probability of nonextinction of the branching process, if it starts with precisely one particle that has type \(r\in {\mathcal S }\). We summarize the most important facts from [7] that have relevance for our comparison as follows (see [7, Th. 3.1, Th. 3.12, Th. 6.1, Th. 9.10]).
Theorem 2.5
(Existence of a giant component, [7]) Suppose the situation of Theorem 1.1 is given. Abbreviate \({\mathcal G }_N={\mathcal G }(N,{{\textbf {x}}},\frac{1}{N} \kappa _N)\). Then the following hold.

(i)
\(\rho :{\mathcal S }\rightarrow [0,\infty )\) is the maximal solution of
$$\begin{aligned} \rho =1-{{\text {e}} }^{-T_{\kappa ,\mu }\rho }. \end{aligned}$$(2.11)
(ii)
If \(\varSigma (\kappa , \mu )\le 1\), then the largest component of \({\mathcal G }_N\) has size \(O(\log N)\) as \(N\rightarrow \infty \) with high probability.

(iii)
If \(\varSigma (\kappa , \mu )> 1\), then the largest component \({\mathcal C }_1\) of \({\mathcal G }_N\) has size \(\asymp N\). More precisely, its normalized empirical measure \(\frac{1}{N}\eta _{{{\textbf {x}}}}({\mathcal C }_1)\) (recall (1.1)) converges weakly towards the measure \(\rho (r)\,\mu (\textrm{d}r)\), and \(\rho \) is positive \(\mu \)-almost everywhere in \({\mathcal S }\).
Part (iii) identifies the limiting type-configuration of the giant component as N times the measure with density \(\rho \) with respect to \(\mu \), and part (i) characterizes \(\rho \) via the functional identity (2.11). It is easily seen to be equivalent to the characteristic equation (2.4) that we use via the substitution \(\rho (r)\,\mu (\textrm{d}r)=\mu (\textrm{d}r) -c^*(\textrm{d}r)\) or \(c^*(\textrm{d}r)=(1-\rho (r))\,\mu (\textrm{d}r)\). In our analysis of the minimizer of I, (2.4) arose via the Euler–Lagrange equations, while (2.11) emerged in [7, Lemma 5.4] via a standard formula for mixed moments of the offspring of the branching process (which itself uses standard Poisson point process theory).
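The equivalence between the two characterizations can be checked numerically for a finite type set. The following sketch iterates the map \(\rho \mapsto 1-{{\text {e}} }^{-T_{\kappa ,\mu }\rho }\) starting from \(\rho \equiv 1\), which decreases monotonically to the maximal solution, then sets \(c^*=(1-\rho )\mu \) and compares both sides of the characteristic equation. The function names, the iteration count and the two-type example are illustrative assumptions of ours, not part of the analysis in this paper.

```python
import math


def survival_probability(kappa, mu, iterations=2000):
    """Maximal solution of rho = 1 - exp(-T rho) on a finite type set,
    where (T rho)(r) = sum_s kappa[r][s] * mu[s] * rho[s].
    Iterating downwards from rho = 1 converges to the maximal fixed point."""
    n = len(mu)
    rho = [1.0] * n
    for _ in range(iterations):
        rho = [
            1.0 - math.exp(-sum(kappa[r][s] * mu[s] * rho[s] for s in range(n)))
            for r in range(n)
        ]
    return rho


def theta(kappa, c):
    """Masses of the measure exp(-kappa c(r)) c(dr) for a measure c on the types."""
    n = len(c)
    return [
        math.exp(-sum(kappa[r][s] * c[s] for s in range(n))) * c[r]
        for r in range(n)
    ]


kappa = [[2.0, 1.0], [1.0, 3.0]]  # supercritical two-type example
mu = [0.5, 0.5]
rho = survival_probability(kappa, mu)
c_star = [(1.0 - rho[r]) * mu[r] for r in range(len(mu))]
print(rho, c_star)
print(theta(kappa, c_star), theta(kappa, mu))  # the two sides of the characteristic equation
```

The two printed vectors in the last line agree up to rounding, and `c_star` lies strictly below `mu`, as required of \(c^*\).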
The statement in part (ii) about the order of the largest component is out of reach of our large-deviations ansatz, which implies that all but o(N) vertices are in components of finite size.
There is, however, no explicit result in [7] about the distribution of the microscopic clusters of \({\mathcal G }_N\). Nevertheless, we can give a description in terms of the above branching process as well. We derive this description now from our form of the minimizer \(\lambda _\mu \) defined in (2.1). Let \(\varXi (\textrm{d}r)\) be the total progeny of type r of the branching process; then \(\varXi \) is a random measure on \({\mathcal S }\). Then, if \(\texttt{P}_{r}\) denotes the law of the process started from one individual of type r at time 0, we have
In Remark 4.6 we explain this relation in the setting where \({\mathcal S }\) is a finite set. In words, the empirical statistics of the microscopic components in \({\mathcal G }_N\) in the subcritical case approximate the distribution of the total offspring of the characteristic branching process.
2.3 The reducible case
Let us briefly comment on the case where the kernel \(\kappa \) is reducible with respect to \(\mu \), i.e., \({\mathcal S }\) is composed of at least two irreducible classes (maximal irreducible subsets). Then the graph \({\mathcal G }_N\) decomposes into disconnected subgraphs with types in only one of these classes. Accordingly, the collection of all the connected components can be decomposed into collections for each subgraph. In principle one can apply the LDP of Theorem 1.1 to each of the micro/macro empirical measures of the subgraphs. However, one might have the wish to have a joint LDP for the entire collection. Here one might expect that the same LDP holds true, and the decomposition into the subgraphs reappears in the rate function in a natural way.
It turns out that this expectation is fulfilled as far as the microscopic part is concerned, but not for the macroscopic part. In fact, the formulation of the LDP changes slightly. The main point is that two macroscopic components can very cheaply be connected to form a significantly larger macroscopic component, just by throwing in one connecting edge, which cannot be seen on the exponential scale. Hence, the macroscopic part is very unstable on the exponential scale under adding edges. This argument fails for the microscopic part, since here we are talking about \(\asymp N\) independent copies of a component of a finite size; if one wants to connect them such that the microscopic statistics change, then one needs to change \(\asymp N\) edges, whose probability is clearly seen on an exponential scale.
As a consequence of this effect, we will see that the rate function is finite only if the macroscopic measure \(\alpha =\sum _n\delta _{y_n}\) is such that for each n the rescaled type-configuration \(y_n\) is supported in one of the irreducible classes. To be precise, we say that a measure \(y\in {\mathcal M }({\mathcal S })\) is connectable (with respect to \(\kappa \) and \(\mu \)) if its support is contained in an irreducible class. Furthermore, a measure \(\alpha \in {\mathcal A}\) is called connectable if each of its atoms is connectable.
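For a finite type set, connectability can be tested mechanically. The sketch below assumes that the irreducible classes are the connected components of the graph on the support of \(\mu \) in which two types r and s are joined whenever \(\kappa (r,s)>0\); this is our reading of the definition in the finite setting, and the function names and the example are illustrative only.

```python
def irreducible_classes(kappa, mu):
    """Connected components of the 'type graph' on supp(mu), where types
    r and s are adjacent iff kappa[r][s] > 0 (finite-type assumption)."""
    support = [r for r in range(len(mu)) if mu[r] > 0]
    seen = set()
    classes = []
    for start in support:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:  # depth-first search from `start`
            r = stack.pop()
            component.add(r)
            for s in support:
                if s not in seen and kappa[r][s] > 0:
                    seen.add(s)
                    stack.append(s)
        classes.append(component)
    return classes


def is_connectable(y, kappa, mu):
    """True iff the support of the measure y lies in a single irreducible class."""
    supp = {r for r in range(len(y)) if y[r] > 0}
    return any(supp <= cls for cls in irreducible_classes(kappa, mu))


# Types 0 and 1 interact with each other; type 2 interacts only with itself.
kappa = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
mu = [0.3, 0.3, 0.4]
print(irreducible_classes(kappa, mu))
print(is_connectable([0.1, 0.2, 0.0], kappa, mu))
print(is_connectable([0.1, 0.0, 0.1], kappa, mu))
```

A macroscopic measure \(\alpha \) is then connectable when `is_connectable` holds for each of its atoms.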
For the microscopic configurations \(\lambda \), connectability is implicitly ensured by the fact that the measure \(\lambda \) has to be absolutely continuous with respect to \(\tau \,\mathbb {Q}_\mu \) in order to have a finite value of the rate function; notice that \(\tau (k) = 0\) if \(\text {supp}(k)\) is not concentrated on an irreducible class of \({\mathcal S }\). We also have to restrict to a sequence of random graphs that are defined with respect to the same kernel \(\kappa \), rather than an approximating sequence \(\kappa _N\), since each \(\kappa _N\) might be irreducible.
Theorem 2.6
(LDP in the reducible case) Suppose the setting as in Theorem 1.1, with the exception that \(\kappa _N=\kappa \) for all \(N\in \mathbb {N}\) and the kernel \(\kappa \) on \({\mathcal S }\times {\mathcal S }\) is now assumed to be reducible, and suppose that \({\mathcal S }\) is equal to the support of \(\mu \). Then \((\text {Mi}_N,\text {Ma}_N)\) satisfies an LDP with rate function \({\widetilde{I}}\) defined by
$$\begin{aligned} {\widetilde{I}}(\lambda ,\alpha )={\left\{ \begin{array}{ll} I(\lambda ,\alpha )&{}\text{ if } \alpha \text{ is connectable},\\ +\infty &{}\text{ otherwise}, \end{array}\right. } \end{aligned}$$
where I is as in Theorem 1.1.
The proof follows in a straightforward way from our results, see Remark 3.11 for the finite type case. The intuitive reason is that, for \(\alpha \) that is not connectable, on the event \(\{\text {Ma}_N\approx \alpha \}\), there is a macroscopic component that has two nontrivial parts (i.e., each with \(\asymp N\) types) in two different irreducible classes, even though there cannot be any edge between these sets. Hence this event has probability zero.
We made the choice to state the theorem under the additional assumption that \({\mathcal S }\) corresponds to the support of \(\mu \). If this were not the case, one could still approach a non-connectable measure \(y\in {\mathcal M }({\mathcal S })\) with a finite exponential cost if, for each finite N, the support of y is contained in an irreducible class of \(\kappa \) with respect to \(\mu ^{(N)}\). The proof would again be an extension of our finite-type results, but covering this particular framework is beyond our scope.
2.4 Motivation: an inhomogeneous coagulation process
With the present work, we actually continue our study of random particle models with coagulation in the light of large-deviation arguments initiated in [2]. Indeed, we make here the first step towards a spatial model.
The model that we are interested in is the following. Fix N atoms \(1,\dots ,N\) at the locations \(x_1,\dots ,x_N\) in a compact metric space \({\mathcal S }\). We consider a Markov process in continuous time on the set of partitions of \([N]=\{1,\dots ,N\}\). Starting with the monodispersed configuration \(M(0)=(\{i\})_{i\in [N]}\), at any time any two subsets A, B in the current partition are replaced by their union \(A\cup B\) after an exponentially distributed random time with parameter
$$\begin{aligned} \frac{1}{N}\sum _{i\in A}\sum _{j\in B}\kappa (x_i,x_j), \end{aligned}$$
where a symmetric \(\kappa :{\mathcal S }\times {\mathcal S }\rightarrow [0,\infty )\) is given. All these random times are supposed to be independent. If M(t) denotes the partition at time t, then M(s) is a refinement of M(t) for any \(s<t\). Hence, the number of elements of M(t) is a nonincreasing (random) process starting at N. The special case of a singleton \({\mathcal S }\) and \(\kappa \equiv 1\) (the homogeneous case) is the case of the Marcus–Lushnikov model that we studied in [2]. There we also explained how the Erdős–Rényi model can be mapped onto the Marcus–Lushnikov model, and this works also in the inhomogeneous setting. Indeed, to any unordered pair \(\{i,j\}\subset [N]\) with \(i\ne j\) we associate an exponential random time e(i, j) with parameter \(\frac{1}{N} \kappa (x_i,x_j)\). These random times are independent and we put a bond between i and j as soon as e(i, j) elapses. At a fixed time \(t\in (0,\infty )\), this graph has the distribution of the inhomogeneous random graph \({\mathcal G }_{t,N}={\mathcal G }([N],(x_1,\dots ,x_N),\frac{1}{N}\kappa _{t,N})\) with type space \({\mathcal S }\) and
$$\begin{aligned} \kappa _{t,N}(x,y)=N\big (1-{{\text {e}} }^{-\frac{t}{N}\kappa (x,y)}\big ),\qquad x,y\in {\mathcal S }. \end{aligned}$$
Notice that the random partition M(t) of the above coagulation model is equal in distribution to the collection \(({\mathcal C }_j)_j\) of the vertex sets of the components of \({\mathcal G }_{t,N}\). The two main reasons for this fact are the memorylessness of the exponential distribution and the property that the minimum of independent exponential times is also exponential with a parameter that is the sum of all the parameters. The only difference between the two models is that the graph model registers all the bonds that arrive within each of the components (and do not change anything in the connectedness), while the coagulation model just registers that a given set is connected.
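The coupling between the coagulation process and the graph can be made concrete in a short simulation sketch. It draws the independent exponential clocks e(i, j) with parameter \(\frac{1}{N}\kappa (x_i,x_j)\) once and reads off the partition at any time t as the component structure of the graph formed by the bonds whose clocks have rung. The function names, the use of a union-find structure, and the example kernel \(1+xy\) on \({\mathcal S }=[0,1]\) are illustrative assumptions of ours.

```python
import random


def bond_times(types, kappa, rng):
    """Independent exponential clocks e(i, j) with rate kappa(x_i, x_j) / N."""
    n = len(types)
    times = {}
    for i in range(n):
        for j in range(i + 1, n):
            rate = kappa(types[i], types[j]) / n
            if rate > 0:  # pairs with zero rate never get a bond
                times[(i, j)] = rng.expovariate(rate)
    return times


def partition_at(times, n, t):
    """Vertex sets of the components of the graph containing every bond
    whose clock has rung by time t (computed with union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (i, j), e in times.items():
        if e <= t:
            parent[find(i)] = find(j)
    blocks = {}
    for i in range(n):
        blocks.setdefault(find(i), set()).add(i)
    return list(blocks.values())


rng = random.Random(0)
n = 40
types = [rng.random() for _ in range(n)]  # locations x_1, ..., x_N in [0, 1]
times = bond_times(types, lambda x, y: 1.0 + x * y, rng)
early = partition_at(times, n, 0.5)
late = partition_at(times, n, 2.0)
print(len(early), len(late))
```

Because the same clocks are used for all times, the partition at the earlier time is automatically a refinement of the partition at the later time, which is the monotonicity property of M(t) noted above.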
We are interested in an LDP for the micro and the macro empirical measure of the partition sets of M(t) in the limit \(N\rightarrow \infty \), assuming the initial locations of particles are such that \(\frac{1}{N}\sum _{i=1}^N\delta _{x_i}\rightarrow \mu \) for some measure \(\mu \) on \({\mathcal S }\). Since \(\kappa _{t,N}\rightarrow t \kappa \), Theorem 1.1 applies also to the above inhomogeneous coagulation process under the appropriate assumptions at a fixed time t. Furthermore, from Theorem 2.1, we obtain that the process \((M(t))_{t\in [0,\infty )}\) has a phase transition at the time
$$\begin{aligned} t=\frac{1}{\varSigma (\kappa ,\mu )}, \end{aligned}$$
and we have a limiting distribution of the empirical micro and macro measures. This phase transition is of explosion type, as described in Remark 2.2, and in the coagulation literature is usually called gelation. Further consequences for the limiting distribution of M(t) as \(N\rightarrow \infty \) follow in a natural way, but we refrain from writing them down.
Interestingly, we can deduce that the minimizing process of microscopic cluster sizes satisfies the multitype version of the Flory equation, which is a modification of the well-known Smoluchowski equation. The classical (single-type) Smoluchowski and Flory equations are ubiquitous in the literature on coagulation processes, see for example [1]. A multitype extension of the Flory equation can be formulated as follows. We think of an inhomogeneous deterministic coagulation process \((\lambda _t)_{t\in [0,\infty )}\), conceived as a process in \({\mathcal L }\). Each particle \(k\in {\mathcal M }_{\mathbb {N}_0}({\mathcal S })\) consists of \(k({\mathcal S })\) atoms, \(k(\{r\})\) of which have the type r for any \(r\in {\mathcal S }\). Coagulation is nothing but addition of measures in this formulation, i.e., two particles k and \({\widetilde{k}}\) coagulate to a particle \(k+{\widetilde{k}}\). The kernel of this process is given as
Then the weak formulation of the Flory equation is, for any test function \(f\in {\mathcal C }_{\textrm{c}}({\mathcal M }_{\mathbb {N}_0}({\mathcal S }))\),
where \(\lambda _0(\textrm{d}k)=\int _{\mathcal S }\mu (\textrm{d}r)\,\delta _{\delta _r}(\textrm{d}k)\) is the initial condition, which expresses that \(\mu \) is the atom type distribution. In words, the time evolution of \((\lambda _t)_{t\in [0,\infty )}\) is described by saying that any coagulation of two particles k and \({\widetilde{k}}\) (i.e., replacement of k and \({\widetilde{k}}\) by \(k+{\widetilde{k}}\)) happens with rate \(K(k,{\widetilde{k}})\). In our model the last term on the right-hand side can be rewritten as \(\int _{{\mathcal M }_{\mathbb {N}_0}({\mathcal S })} f(k) \langle k,\kappa (\mu - c_{\lambda _t})\rangle \,\lambda _t(\textrm{d}k)\). It captures the interaction between the microscopic particles and the gel (the macroscopic mass) once it forms.
See [30, Section 3] for a mathematical discussion of the (well-known) homogeneous version of the Flory equation and [31, Section 2] for an introduction of the inhomogeneous version of the equation. We now identify a solution \((\lambda _t)_{t\in [0,\infty )}\) to (2.14).
Lemma 2.7
(Solution to the Flory equation) Assume that \({\mathcal S }\) is a finite state space and \(\kappa \) an irreducible nonnegative symmetric matrix on \({\mathcal S }\). Let \(\lambda _0(k)=\sum _{r\in {\mathcal S }} \mu _r\,\delta _{\delta _r}(k)\) and for \(t\in (0,\infty )\), define \(\lambda _t\) to be the first component of the minimizer appearing in Theorem 2.1 with \(\kappa \) replaced by \(t\kappa \).
Then \(t\mapsto \lambda _t\) is a solution to the Flory equation (2.14) on \([0,\infty )\).
The proof of Lemma 2.7, as well as an explicit expression for \((\lambda _t)_{t\ge 0}\), is given in Sect. 7.4. We are confident that Lemma 2.7 is also true in the general setting of Theorem 2.1.
The Flory equation is closely related to the Smoluchowski equation, which we write in its multitype version:
The Smoluchowski equation only considers the microscopic clusters, that is, it excludes any interaction with a possible gel, which we see in the third line of the Flory equation (2.14). The solutions of equations (2.14) and (2.15) coincide until the gelation time \(t_{\textrm{c}}=1/\varSigma (\kappa , \mu )\), after which differences appear.
At the level of the underlying stochastic microscopic models this difference is seen in terms of the type of the phase transitions, as cited in Remark 2.2. The microscopic models of frozen-percolation type, as in [16, 29, 33, 37], correspond to (2.15), while models like ours correspond to (2.14).
3 Proof of Theorem 1.1 for a finite type set
In this section we assume that \({\mathcal S }\) is a finite set and derive the large-deviations principle (LDP) of Theorem 1.1 for this case, Theorem 3.1. This is not only an important special case that is worth being formulated and studied on its own, but it is also the first step in the proof of Theorem 1.1, which is completed in Sect. 5. The formulation in the discrete case is notationally quite different from the formulation in the general setting, and many objects simplify because of the finiteness of \({\mathcal S }\). Therefore we are going to formulate the setting and the LDP from scratch in Sect. 3.1.
The organization of this section is as follows. In Sect. 3.2 we derive a formula for the distribution of \(\text {Mi}_N\). The more involved terms that appear in our formula are certain connection probabilities whose asymptotics are stated in Sect. 3.3. In Sect. 3.4 we will decompose the distribution of \(\text {Mi}_N\) into a micro-, meso- and macroscopic part and derive the exponential rates for each part. The proof of the LDP of Theorem 3.1 is finally finished in Sect. 3.5.
3.1 Formulation of the LDP
Let us recall the objects that we need to formulate the LDP for a finite type space \({\mathcal S }\). Fix a probability measure \(\mu \) on \({\mathcal S }\), which we will denote as a vector \(\mu = (\mu _s)_{s\in {\mathcal S }}\). For any \(N\in \mathbb {N}\) let \({{\textbf {x}}}^{(N)} = (x^{(N)}_1, \ldots , x^{(N)}_N) \in {\mathcal S }^N\) be a type vector such that the normalized empirical measure, \(\mu ^{(N)}=\frac{1}{N}\sum _{i=1}^N\delta _{x^{(N)}_i}\), converges to \(\mu \) as \(N\rightarrow \infty \). Let \(\kappa =(\kappa (r,s))_{r,s\in {\mathcal S }}\in [0,\infty )^{{\mathcal S }\times {\mathcal S }}\) be a nonnegative and symmetric matrix. For any \(N\in \mathbb {N}\), let \(\kappa _N\) be another such matrix, and assume that the sequence \(\kappa _N\), \(N\in \mathbb {N}\), converges pointwise to \(\kappa \) as \(N\rightarrow \infty \). Throughout the section we will work under the assumptions that we just stated for \(\kappa _N\), \(N\in \mathbb {N}\), and \(\mu ^{(N)}\), \(N\in \mathbb {N}\).
Recall that the random graph \({\mathcal G }_N={\mathcal G }(N,{{\textbf {x}}}^{(N)},\frac{1}{N}\kappa _N)\) consists of N vertices and that the vertex \(i\in [N]\) has type \(x^{(N)}_i \in {\mathcal S }\). The undirected edges of the graph are set independently for each pair of vertices, and two vertices of type r and s are connected via an edge with probability \(1\wedge \frac{1}{N}\kappa _N(r,s)\).
We denote by \(\{{\mathcal C }_j\}_j\) the collection of the vertex sets of all the connected components of \({\mathcal G }_N={\mathcal G }(N,{{\textbf {x}}}^{(N)},\frac{1}{N}\kappa _N)\). We want to study empirical measures depending on the random collection \( \{{\mathcal C }_j\}_j\). For this we introduce the type-registering mapping \(\eta :{\mathcal {P}}([N]) \rightarrow \mathbb {N}_0^{\mathcal S }\) that gives the (type) composition of an arbitrary vertex set \(A\subset [N]\), i.e., \(\eta (A)=(\eta _s(A))_{s\in {\mathcal S }}\), and \(\eta _s(A)=\#\{i\in A:x_i^{(N)}=s\}\) is the number of vertices in A with type s. Note that the mapping \(\eta \) depends on the entire type vector \({{\textbf {x}}}^{(N)}\), not only on its normalized empirical measure, \(\mu ^{(N)}=\frac{1}{N}\sum _{i=1}^N\delta _{x^{(N)}_i}\).
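The type-registering mapping is easy to make concrete. The sketch below is an illustration only: the function name `eta`, the zero-based vertex labels and the string type labels are assumptions of this example, not notation from the text.

```python
from collections import Counter

def eta(vertex_set, types, type_space):
    """Type composition of a vertex set A.

    `types[i]` is the type of vertex i (vertices are labeled 0..N-1 here),
    and the result lists, for each s in `type_space`, the number of
    vertices in A that have type s.
    """
    counts = Counter(types[i] for i in vertex_set)
    return tuple(counts.get(s, 0) for s in type_space)

type_space = ("a", "b")
# Two type vectors with the same empirical measure (two "a", one "b") ...
types_1 = ["a", "b", "a"]
types_2 = ["a", "a", "b"]
# ... assign different compositions to the same vertex set A = {0, 1}.
composition_1 = eta({0, 1}, types_1, type_space)
composition_2 = eta({0, 1}, types_2, type_space)
```

The two compositions differ although the empirical measures agree, which is the dependence on the entire type vector noted above.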
We identify \({\mathcal M }_{\mathbb {N}_0}({\mathcal S })\) with \(\mathbb {N}_0^{{\mathcal S }}\) and will work in \([0,1]^{{\mathcal S }}\) instead of \({\mathcal M }({\mathcal S })\). Now we recall the definition of the main objects, the empirical measures of the connected components of \({\mathcal G }_N\), in microscopic, respectively macroscopic, registration. The microscopic empirical measure \(\text {Mi}_N\) is defined as a measure on \(\mathbb {N}_0^{\mathcal S }\) via
Since \(\mathbb {N}_0^{\mathcal S }\) is a discrete space, we will abbreviate \(\text {Mi}_N(k)=\text {Mi}_N(\{k\})\) for any \(k\in \mathbb {N}_0^{\mathcal S }\). The macroscopic empirical measure is defined as a measure on \([0,1]^{\mathcal S }{\setminus }\{0\}\) via
Therefore our state spaces for \(\text {Mi}_N\) and \(\text {Ma}_N\) are now
and
respectively, where
One can easily verify that for any fixed N we have \(c_r(\text {Mi}_N)= \mu ^{{{({N}})}}_r\) as well as \(c_r(\text {Ma}_N)= \mu ^{{{({N}})}}_r\) for any \(r\in {\mathcal S }\), so indeed \(\text {Mi}_N\in {\mathcal L }\) and \(\text {Ma}_N \in {\mathcal A}\). However, the idea is that due to the topologies that we choose, some of the (rescaled) vertex mass specified by \(c(\lim \text {Mi}_N)\) may get lost when we take the limit for \(N\rightarrow \infty \).
We equip \({\mathcal L }\) and \({\mathcal A}\) with the vague topologies that we introduced in Sect. 1.2. On \({\mathcal L }\), this is identical with the topology of pointwise convergence (i.e., \(\lim _{N\rightarrow \infty } \lambda ^{(N)} =\lambda \) if and only if \(\lim _{N\rightarrow \infty } \lambda _k^{(N)} =\lambda _k\) for any \(k\in \mathbb {N}_0^{\mathcal S }\)). The vague topology on \({\mathcal A}\) is formulated by saying that \(\lim _{N\rightarrow \infty } \alpha ^{(N)} =\alpha \) if and only if \(\lim _{N \rightarrow \infty }\int \alpha ^{(N)}(\textrm{d}y) f(y) = \int \alpha (\textrm{d}y) f(y)\) for any continuous and compactly supported function \(f:[0,1]^{\mathcal S }{\setminus }\{0\}\rightarrow \mathbb {R}\); note that for every such function f there is an \(\varepsilon >0\) such that \(f=0\) on \(\{x\in [0,1]^{\mathcal S }:|x|\le \varepsilon \}\).
Recall that we write \(\langle a,f\rangle =\sum _r a_r f_r\) for the integral of a function f with respect to a measure a on \({\mathcal S }\) and also recall the notation \(|a| = \sum _{s\in {\mathcal S }} a_s\). Further, recall the combinatorial quantity \(\tau (k)\) that collects the weight of all spanning trees on a vertex set with type configuration \(k\in \mathbb {N}_0^{\mathcal S }\), i.e.,
where \({{\textbf {x}}}\in {\mathcal S }^{|k|}\) is a type vector compatible with k, i.e., \(\sum _{i=1}^{|k|} \delta _{x_i}= k\), and \({\mathcal T }(k)\) is the set of spanning trees on \([|k|]\). We use the convention that \({\mathcal T }(0)=\emptyset \) and hence \(\tau (0) = 0\).
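For small type configurations the tree weight can be evaluated exactly. The sketch below assumes that \(\tau (k)\) is the sum, over all spanning trees T on \([|k|]\), of the product of \(\kappa (x_i,x_j)\) over the edges \(\{i,j\}\) of T; under that assumption the weighted matrix-tree theorem expresses it as a cofactor of the weighted Laplacian. The function names and the encoding of types as indices 0, 1, ... are hypothetical.

```python
from fractions import Fraction

def type_vector(k):
    """A type vector compatible with k: type index s repeated k[s] times."""
    return [s for s, count in enumerate(k) for _ in range(count)]

def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def tau(k, kappa):
    """Total weight of spanning trees for the type configuration k."""
    x = type_vector(k)
    n = len(x)
    if n == 0:
        return Fraction(0)  # convention tau(0) = 0
    # Weighted Laplacian of the complete graph with edge weights kappa(x_i, x_j).
    lap = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w = Fraction(kappa[x[i]][x[j]])
                lap[i][j] = -w
                lap[i][i] += w
    # Matrix-tree theorem: delete the first row and column, take the determinant.
    minor = [row[1:] for row in lap[1:]]
    return determinant(minor)
```

With a single type and \(\kappa \equiv 1\) this reduces to Cayley's count of labeled trees, and a kernel that forbids all edges between two types gives weight 0 to any configuration containing both types.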
Here is the main result of Sect. 3:
Theorem 3.1
(LDP for \((\text {Mi}_N,\text {Ma}_N)\) with finitely many types) Assume that the empirical measure \(\mu ^{(N)}\) of the type sequence \((x_1^{(N)},\dots ,x_N^{(N)})\) converges weakly towards a positive probability vector \(\mu \in (0,1]^{\mathcal S }\) as \(N\rightarrow \infty \) and that the kernel \(\kappa _N\) converges on \({\mathcal S }\times {\mathcal S }\) towards a \(\mu \)-irreducible kernel \(\kappa \in [0,\infty )^{{\mathcal S }\times {\mathcal S }}\).
Then \((\text {Mi}_N,\text {Ma}_N)\) satisfies a large deviations principle (LDP) with speed N and rate function \((\lambda , \alpha ) \mapsto I(\lambda , \alpha )\) defined by
where
and where we always use the convention that \(\log 0 = -\infty \) and \(0\log 0 = 0\).
Theorem 3.1 is indeed nothing but the special case of Theorem 1.1 for a finite set \({\mathcal S }\). Indeed, it is clear that the setting and the two rate functions \(I_\text {Ma}\) and \(I_\text {Me}\) are the discrete-space versions, but (3.7) looks a bit different from the formula for \(I_\text {Mi}\) in Theorem 1.1. But from substituting the notation of the entropy in (3.7) and noting that the distribution of a Poisson point process with intensity measure \(\mu \) can here be identified as
one sees that (3.7) is indeed a discrete analog of (1.11). Furthermore, one can also write
where \(\lambda (\mu )\) is defined as the discrete analog of (2.1) with \(c=\mu \), i.e.,
This formula will be helpful in Sect. 6 when we will identify minimizers of \(I_\text {Mi}\).
We will now give an extension of Theorem 3.1 for kernels \(\kappa \) that are not \(\mu \)-irreducible. Recall the notion of connectability for \(\alpha \in {\mathcal A}\) that was introduced in Sect. 2.3.
Theorem 3.2
(Finite-type LDP for \((\text {Mi}_N,\text {Ma}_N)\) without irreducibility) For \((\lambda , \alpha )\in {\mathcal L }\times {\mathcal A}\) define

(i)
Given all the assumptions from Theorem 3.1, except the assumption that \(\kappa \) is \(\mu \)-irreducible, the pair \((\text {Mi}_N,\text {Ma}_N)\) satisfies the lower large-deviations bound (1.14) with speed N and rate function \({\widetilde{I}}\).

(ii)
Given all the assumptions from Theorem 3.1, except the assumption that \(\kappa \) is \(\mu \)-irreducible and with the additional assumption that \(\kappa _N= \kappa \) for all but finitely many \(N\in \mathbb {N}\), the pair \((\text {Mi}_N,\text {Ma}_N)\) satisfies an LDP with speed N and rate function \({\widetilde{I}}\).
The proof is given in Remark 3.11.
We refrain from restating the finite-\({\mathcal S }\) analogs of Theorems 2.3, 2.1 and all the related corollaries, as they can be deduced as special cases. For the critical quantity \(\varSigma (\kappa ,\mu )\), we refer to (4.2), and we recall that in this setting it is equal to the spectral radius of the matrix \((\kappa (r,s)\mu _s)_{(r,s)\in {\mathcal S }^2}\).
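In the finite-type setting the critical quantity can be computed numerically. The following is a minimal sketch, assuming an irreducible kernel; it uses a shifted power iteration, which is a generic numerical technique chosen here for illustration and not a method taken from the text. The function names are hypothetical.

```python
def spectral_radius(kappa, mu, iterations=2000):
    """Spectral radius of the matrix (kappa(r,s) * mu_s)_{r,s}.

    The iteration runs on A + I instead of A: the shift makes the
    nonnegative irreducible matrix primitive, so the power method
    converges even for periodic A; the shift is subtracted at the end.
    """
    n = len(mu)
    shifted = [
        [kappa[r][s] * mu[s] + (1.0 if r == s else 0.0) for s in range(n)]
        for r in range(n)
    ]
    v = [1.0] * n
    estimate = 1.0
    for _ in range(iterations):
        w = [sum(shifted[r][s] * v[s] for s in range(n)) for r in range(n)]
        estimate = max(w)          # v is normalized so that max(v) == 1
        v = [entry / estimate for entry in w]
    return estimate - 1.0

def gelation_time(kappa, mu):
    """Critical time 1 / Sigma(kappa, mu) of the coagulation process."""
    return 1.0 / spectral_radius(kappa, mu)
```

For a single type with \(\kappa \equiv c\) the spectral radius is c, so the sketch returns the familiar Erdős–Rényi threshold \(t_{\textrm{c}}=1/c\).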
3.2 The distribution of \(\varvec{\text {Mi}_N}\)
Let us identify the distribution of \(\text {Mi}_N\) for any N in explicit terms. Note that as long as \(N\in \mathbb {N}\) is fixed, the measure \(\text {Ma}_N\) contains exactly the same information as \(\text {Mi}_N\), hence, we are also deriving its distribution. We start by noting that \(N\text {Mi}_N\) takes values in
Let \(k\in \mathbb {N}_0^{\mathcal S }\) and let \({{\textbf {x}}} = (x_1, \ldots , x_{|k|}) \in {\mathcal S }^{|k|}\) be a type vector which is compatible with k, meaning that \(\sum _{i=1}^{|k|} \delta _{x_i} = k\). We define the connection probability of the graph \({\mathcal G }(|k|,{{\textbf {x}}}, \frac{1}{N}\kappa _N)\) by
and \(p_N(0) = 0\). In the following lemma we write down the distribution of \(\text {Mi}_N\) in terms of the quantities \(p_N(k)\), \(k\in \mathbb {N}_0^{\mathcal S }\).
Lemma 3.3
(The distribution of \(\text {Mi}_N\)) Let \(N\in \mathbb {N}\) and assume that \(\kappa _N(r,s) \le N\) for all \(r,s\in {\mathcal S }\). Then for any \(\ell \in {\mathcal L }_N\) we have that
where
and \(p_N(k)\) is defined in (3.14).
Proof
This is proved in an analogous way to [2, Corollary 2.2]; we omit the details. \(\square \)
The formula in (3.15) is easy to understand. Indeed, the combinatorial term on the right (with an additional factor 1/N!) is equal to the inverse of the number of possible labelings of all the vertices; the event \(\{N\text {Mi}_N(k)=\ell _k\,\forall k\}\) means that \(\ell _k\) is equal to the number of clusters in [N] whose vertex set has the type configuration k, for any multi-index k. The product of \(p_N(k)^{\ell _k}\) over k is the probability that all these clusters are connected, the product over the powers of \(1-\kappa _N(r,s)/N\) is equal to the probability that no two of them are connected to each other, and the product of the two remaining combinatorial terms (with an additional factor N!) is equal to the number of ways to decompose all the types into clusters having the prescribed vertex structure.
3.3 Asymptotics for the connection probabilities
Recall the connection probability \(p_N(k)\), for \(k\in \mathbb {N}_0^{\mathcal S }\), that we defined in (3.14). When going to the limit for \(N\rightarrow \infty \), it will be crucial to distinguish different cases depending on the asymptotic behaviour of k. In the first case, we keep \(k\in \mathbb {N}_0^{\mathcal S }\) fixed, whereas \(N\rightarrow \infty \) and refer to \(p_N(k)\) as the connection probability of a microscopic cluster. In the second case we consider a sequence \(k^{{{({N}})}}\in \mathbb {N}_0^{\mathcal S }\) where \(k^{{{({N}})}}\) is of order N. More precisely, we assume that for any \(s\in {\mathcal S }\) the limit \(\lim _{N\rightarrow \infty }\frac{k^{{{({N}})}}_s}{N} = y_s\) exists and that the vector \(y=(y_s)_{s\in {\mathcal S }}\) is nontrivial. In that case we refer to \(p_N(k^{{{({N}})}})\) as the connection probability of a macroscopic cluster. In between the microscopic and the macroscopic regime there are the cases, in which the sequence \(k^{{{({N}})}}\) diverges, but is in o(N). Those are summarized under the notion of mesoscopic clusters.
In this section the results for the different cases will only be stated. Their proofs are collected in Sect. 4, since the derivation of the asymptotics for the macroscopic case is rather cumbersome. Recall the definition (3.6) for \(\tau \). By \(\tau _N\) we will denote the same quantity, but defined with respect to the kernel \(\kappa _N\). Further, recall that we are working under the assumption that \(\kappa _N\) converges pointwise to \(\kappa \), which will be used in Lemma 3.4 and Theorem 3.6.
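For very small clusters the microscopic regime can be checked by brute force. The sketch below enumerates all graphs on a fixed type vector, computes the exact connection probability with edge probabilities \(1\wedge \frac{1}{N}\kappa (x_i,x_j)\), and compares it with \(N^{1-|k|}\) times the spanning-tree weight, the leading order used later in the proof of Lemma 3.8. The function names, the kernel values and the assumption that the tree weight is a sum over spanning trees of products of \(\kappa \) along edges are illustrative.

```python
from itertools import combinations

def is_connected(n, edges):
    """Union-find check that `edges` connect all of 0..n-1."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(v) for v in range(n)}) == 1

def connection_probability(x, kappa, big_n):
    """Exact probability that the graph on type vector x is connected."""
    n = len(x)
    pairs = list(combinations(range(n), 2))
    total = 0.0
    for size in range(len(pairs) + 1):
        for edges in combinations(pairs, size):
            if not is_connected(n, edges):
                continue
            chosen = set(edges)
            prob = 1.0
            for i, j in pairs:
                p = min(1.0, kappa[x[i]][x[j]] / big_n)
                prob *= p if (i, j) in chosen else 1.0 - p
            total += prob
    return total

def tree_weight(x, kappa):
    """Sum over spanning trees of the product of kappa along the tree edges."""
    n = len(x)
    pairs = list(combinations(range(n), 2))
    total = 0.0
    for edges in combinations(pairs, n - 1):
        if is_connected(n, edges):
            weight = 1.0
            for i, j in edges:
                weight *= kappa[x[i]][x[j]]
            total += weight
    return total

x = [0, 0, 1, 1]                       # hypothetical configuration k = (2, 2)
kappa = [[1.0, 2.0], [2.0, 0.5]]       # hypothetical kernel
big_n = 10**6
ratio = connection_probability(x, kappa, big_n) * big_n ** (len(x) - 1) / tree_weight(x, kappa)
```

The enumeration is exponential in the number of vertex pairs, so it is only usable for a handful of vertices; its purpose is to show that `ratio` is close to 1 once N is large compared with the kernel values.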
Lemma 3.4
(Asymptotics for the connection probability of microscopic clusters) Fix \(k\in \mathbb {N}_0^{\mathcal S }\). Then, as \(N\rightarrow \infty \),
$$\begin{aligned} p_N(k) = N^{1-|k|}\,\tau _N(k)\,(1+o(1)). \end{aligned}$$
Lemma 3.5
(Estimate for the connection probability of mesoscopic clusters) Fix \(k\in \mathbb {N}_0^{\mathcal S }\) and choose any \(r\in {\mathcal S }\) such that \(k_r>0\). Then
where \(S_k:= \text {supp}(k)\).
The following theorem concerns the connection probabilities of macroscopic clusters. This result is, to the best of our knowledge, new and might be of independent interest in the theory of random graphs. Note that it is the multitype version of a result from [35], see [2, Lemma 2.4].
Theorem 3.6
(Asymptotics of the connection probability of macroscopic clusters) Fix \(y\in [0,1]^{\mathcal S }\), \(y\ne 0\). Let \(\{k^{{{({N}})}}\}_{N\in \mathbb {N}}\) be a sequence in \(\mathbb {N}_0^{\mathcal S }\) such that \(\lim _{N\rightarrow \infty }{\textstyle {\frac{k^{{{{({N}})}}}_r}{N}}}=y_r\) for all \(r\in {\mathcal S }\).

(i)
Then it holds that
$$\begin{aligned} \limsup _{N\rightarrow \infty } \frac{1}{N} \log p_N(k^{(N)}) \le \sum _{r\in {\mathcal S }} y_r \log \big (1 - {{\text {e}} }^{-(\kappa y)_r} \big ), \end{aligned}$$(3.19)
where the right-hand side takes the value \(-\infty \) when \(y\not \ll \kappa y\).

(ii)
Assume that \(\tau (k^{{{{({N}})}}})>0\) for all but finitely many \(N\in \mathbb {N}\) and that \(\{k^{{{({N}})}}_r\}_N\) is bounded in N for all \(r\notin \text {supp}(y)\). Then
$$\begin{aligned} \lim _{N\rightarrow \infty } \frac{1}{N} \log p_N(k^{(N)}) = \sum _{r\in {\mathcal S }} y_r \log \big (1 - {{\text {e}} }^{-(\kappa y)_r} \big ) \in [-\infty ,0], \end{aligned}$$(3.20)
where the right-hand side takes the value \(-\infty \) when \(y\not \ll \kappa y\).
The additional assumption \(\tau (k^{(N)})>0\) ensures that the connection probability is indeed strictly positive since otherwise the left-hand side of (3.20) is \(-\infty \). The assumption about the boundedness of \(\{k^{(N)}_r\}_N\) for \(r\notin \text {supp}(y)\) might be weakened. However, for our purposes the statement will suffice in this form.
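The exponential rate of Theorem 3.6 is simple to evaluate. The sketch below computes \(\sum _{r} y_r \log (1-{{\text {e}} }^{-(\kappa y)_r})\) with the convention \(0\log 0=0\) and returns \(-\infty \) when some type r has \(y_r>0\) but \((\kappa y)_r=0\). The function name and the encoding of types as list indices are assumptions of this example.

```python
import math

def macro_connection_rate(y, kappa):
    """Rate sum_r y_r * log(1 - exp(-(kappa y)_r)) for a macroscopic cluster."""
    total = 0.0
    for r, y_r in enumerate(y):
        if y_r == 0:
            continue  # convention 0 * log 0 = 0
        kappa_y_r = sum(kappa[r][s] * y[s] for s in range(len(y)))
        if kappa_y_r == 0:
            return -math.inf  # y is not absolutely continuous w.r.t. kappa y
        # -expm1(-z) equals 1 - exp(-z) and is accurate for small z.
        total += y_r * math.log(-math.expm1(-kappa_y_r))
    return total
```

In the homogeneous single-type case with y = 1 and \(\kappa \equiv c\), the value is \(\log (1-{{\text {e}} }^{-c})\), the known exponential rate for an Erdős–Rényi graph with edge probability c/N to be connected.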
The proof can be found in Sect. 4. The main idea is to construct a sequence of graphs with the same connection parameter \(\kappa _N\), but a different number of vertices in such a way that it contains with high probability a macroscopic component \(k^{{{({N}})}}\).
3.4 Exponential rates for micro, meso and macro parts
The proof of the LDP in Theorem 3.1 is carried out in the same way as in [2, Section 3]. The main idea is to split the distribution that we obtained in Lemma 3.3 into three parts, which we will call micro, meso and macroscopic part. These parts roughly give the terms \({{\text {e}} }^{-NI_\text {Mi}(\lambda )}\), \({{\text {e}} }^{-NI_\text {Me}(\mu - c(\lambda ) - c(\alpha ))}\) and \({{\text {e}} }^{-NI_\text {Ma}(\alpha )}\), if a properly rescaled version of \(\ell = \ell ^{(N)} \in {\mathcal L }_N\) is close to \((\lambda , \alpha )\). In the next lemma we give the decomposition into the three parts. Afterwards, we derive their exponential asymptotics in Lemmas 3.8–3.10.
Lemma 3.7
(Decomposition into three contributions) Fix \(\ell \in {\mathcal L }_N\). For \(k\in \mathbb {N}_0^{{\mathcal S }}\) define
and for any two numbers \(A,B \in [0,\infty )\) write
Then, for any fixed \(R\in \mathbb {N}\) and \(\varepsilon >0\) we have that, as \(N\rightarrow \infty \)
Proof
On the right-hand side of (3.15) we apply Stirling’s formula \(n!= {{\text {e}} }^{o(n)}(n/{{\text {e}} })^n\) to the terms \((N\mu ^{(N)}_r)!\) and use that \(N\mu ^{(N)}_r = \sum _k \ell _k k_r\). Note that we also used that \(\mu ^{(N)}_r \rightarrow \mu _r\). \(\square \)
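The form of Stirling's formula used here only needs that the correction to \((n/{{\text {e}} })^n\) is subexponential. A quick numerical sanity check, written as a sketch with a hypothetical function name, evaluates \(\frac{1}{n}\big (\log n! - n\log n + n\big )\), which should tend to 0.

```python
import math

def stirling_log_error_per_n(n):
    """(log n! - log((n/e)^n)) / n, i.e. the o(n) exponent divided by n."""
    log_factorial = math.lgamma(n + 1)          # log(n!)
    log_leading = n * math.log(n) - n           # log((n/e)^n)
    return (log_factorial - log_leading) / n

errors = [stirling_log_error_per_n(n) for n in (10, 10**3, 10**6)]
```

The entries of `errors` are positive and decrease towards 0, in line with the correction being of order \(\log n\) in the exponent.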
Lemma 3.8
(Asymptotics of the micro part) Fix \(R\in \mathbb {N}\) and let \(\lambda \in {\mathcal L }\). Define
where
and where we use the convention that \(\log 0 =-\infty \) and \(0\log 0 =0\). In particular, the right-hand side of (3.22) is equal to \(+\infty \) if there is some \(k\in \mathbb {N}_0^{\mathcal S }\) with \(|k| \le R\) such that \(\tau (k) =0 \), but \(\lambda _k>0\). Otherwise, the right-hand side is finite.
Then for all \(\ell ^{(N)}=(\ell ^{(N)}_k)_{k \in \mathbb {N}_0^{\mathcal S }} \in {\mathcal L }_N\) satisfying \(\lambda _k = \lim _{N\rightarrow \infty } \frac{1}{N} \ell ^{(N)}_k\) for all k with \(|k| \le R\) we have
Proof
We use Stirling’s formula for the terms \(\ell ^{(N)}_k!\) as well as the fact that for any \(k\in \mathbb {N}_0^{\mathcal S }\) with \(|k|\le R\) we have that \(p_N(k) = {{\text {e}} }^{o(1)}N^{1-|k|}\tau _N(k)\) as \(N \rightarrow \infty \), by Lemma 3.4. Therefore, as \(N\rightarrow \infty \), we have
where we have used that \(0\le \sum _{k:|k|\le R}\log \ell ^{(N)}_k \le |\{k:|k|\le R\}|\log \big (\frac{\sum _{k:|k|\le R}\ell ^{(N)}_k}{|\{k:|k|\le R\}|}\big ) =o(N)\). Further, we use that \(\lim _{N\rightarrow \infty }(1+\frac{x}{N})^N = {{\text {e}} }^x\) to get that, as \(N\rightarrow \infty \),
where in the last step we used that
together with the fact that the right-hand side converges to 0 as \(N\rightarrow \infty \). Combining the asymptotics from (3.25) and (3.26) gives the claim. \(\square \)
Lemma 3.9
(Asymptotics of the macro part) Fix \(\alpha \in {\mathcal A}\), and note that \(\alpha \) can be written as \(\alpha = \sum _{j \in J} \delta _{y^{(j)}}\) where \(y^{(j)} \in [0,1]^{\mathcal S }{\setminus }\{0\}\) for all \(j \in J\) and J is a countable set. Fix any \(\varepsilon >0\) with \(\varepsilon \notin \{|y^{(j)}|:j\in J\}\). Define \(J_{\varepsilon }(\alpha ):=\{j\in J :|y^{(j)}|>\varepsilon \}\), which is a finite set, and
where we use the convention that \(\log 0 = -\infty \) and \(0\log 0 = 0\). In particular, the right-hand side of (3.27) is equal to \(+\infty \) if there is some \(i\in J_\varepsilon (\alpha )\) such that the condition \(y^{(i)}\ll \kappa y^{(i)}\) fails. Then we have the following.

(i)
For any sequence \(\ell ^{(N)} \in {\mathcal L }_N\) denote \(\alpha ^{(N)} = \sum _k \ell ^{(N)}_k \delta _{\frac{k}{N}}\) and assume that \(\alpha ^{(N)}\) restricted to \(\{y :|y|>\varepsilon \}\) converges to \(\alpha \) restricted to \(\{y :|y|>\varepsilon \}\), as \(N\rightarrow \infty \). Then it holds that
$$\begin{aligned} \limsup _{N \rightarrow \infty } \frac{1}{N}\log z_{\varepsilon N,N}^{(N)} (\ell ^{(N)}) \le - I_\text {Ma}^{(\varepsilon )}(\alpha ). \end{aligned}$$(3.28)
(ii)
For all \(j\in J_{\varepsilon }(\alpha )\), let \(\{k^{(j,N)}\}_{N\in \mathbb {N}}\) be a sequence in \(\mathbb {N}_0^{\mathcal S }\) such that \(\tau (k^{(j,N)}) > 0\) for all \(N\in \mathbb {N}\), \(\lim _{N\rightarrow \infty }\frac{k^{(j,N)}}{N}=y^{(j)}\) and \(\{k^{(j,N)}_s\}_{N\in \mathbb {N}}\) is bounded for all \(s\notin \text {supp}(y^{(j)})\). Let \(\ell ^{(N)}\) be an element of \({\mathcal L }_N\) such that \(\ell ^{(N)}_k = \#\{j\in J:k^{(j,N)} =k\}\) for \(|k|>\varepsilon N\). Denote \(\alpha ^{(N)} = \sum _k \ell ^{(N)}_k \delta _{\frac{k}{N}}\), then \(\alpha ^{(N)}\) restricted to \(\{y :|y|>\varepsilon \}\) converges to \(\alpha \) restricted to \(\{y :|y|>\varepsilon \}\), and
$$\begin{aligned} \lim _{N \rightarrow \infty } \frac{1}{N}\log z_{\varepsilon N,N}^{(N)} (\ell ^{(N)}) = - I_\text {Ma}^{(\varepsilon )}(\alpha ). \end{aligned}$$(3.29)
Proof
We start with the first statement. Let us first turn to the first term on the right of (3.21). We apply Stirling’s bound \(n!\ge n^n {{\text {e}} }^{-n}\) to each of the terms \(k_s!\) and the simple bound \(\ell ^{(N)}_k! \ge 1\) for all \(k\in \mathbb {N}_0^{{\mathcal S }}\). Using the upper bound (3.19) for \(p_N(k)\) from Theorem 3.6 we obtain, as \(N\rightarrow \infty \),
where we used that \(\sum _{|k|>\varepsilon N}\ell ^{(N)}_k\le \frac{1}{\varepsilon }\). The second term on the right of (3.21) is estimated using \(1-x\le {{\text {e}} }^{-x}\) for \(x=\kappa _N(r,s)/N\) as follows for \(N\rightarrow \infty \),
Note that the product of the right-hand sides of (3.30) and (3.31) is equal to \({{\text {e}} }^{o(N)} {{\text {e}} }^{-N I_\text {Ma}^{(\varepsilon )}(\alpha ^{(N)})}\). Using the convergence assumption on \(\alpha ^{(N)}\) and the fact that \(\varepsilon < \inf _{j\in J_\varepsilon } |y^{(j)}|\) one can verify that \(I_\text {Ma}^{(\varepsilon )}(\alpha ^{(N)}) \rightarrow I_\text {Ma}^{(\varepsilon )}(\alpha )\) as \(N\rightarrow \infty \). This gives the result.
To show the second statement, let us first notice that it is clear from the definition that \(\alpha ^{(N)}\) restricted to \(\{y :|y|>\varepsilon \}\) converges to \(\alpha \) restricted to \(\{y :|y|>\varepsilon \}\). Therefore we can apply the first assertion and obtain the upper bound (3.28). In order to get also the lower bound, we bound \(z_{\varepsilon N,N}^{(N)} (\ell ^{(N)})\) from below in terms of the \(k^{(j,N)}\) with j in the finite set \(J_\varepsilon (\alpha )\) and note that these are the only k with \(|k|>\varepsilon N\) such that \(\ell ^{(N)}_k>0\). For each such j we apply the asymptotics (3.20) from Theorem 3.6 and obtain the corresponding lower bound, also noting that \(\prod _{|k|> \varepsilon N}(\ell ^{(N)}_k!)^{-1}\ge {{\text {e}} }^{-\frac{1}{\varepsilon }\log N}={{\text {e}} }^{o(N)}\) and, by estimating \(k_s\le N\), that \((k_s)^{-\frac{\ell ^{(N)}_k}{2}} \ge {{\text {e}} }^{-\frac{1}{2\varepsilon }\log N}\) for any \(s\in {\mathcal S }\). To derive the lower-bound analog of (3.31) we use that \((1-\frac{c}{N})^N = {{\text {e}} }^{-c}(1+o(1))\). This proves the claim. \(\square \)
In the second statement of Lemma 3.9 we restrict to sequences \(\ell ^{(N)}\) such that each type configuration k with \(\ell ^{(N)}_k>0\) is connectable in the sense that \(\tau (k)>0\). We will see in the proof of Theorem 3.1 that for each \(\alpha \in {\mathcal A}\) such that \(I_\text {Ma}(\alpha )<\infty \) and for each \(\varepsilon >0\), we will always be able to find such a sequence. Our second restriction \(\varepsilon \notin \{|y^{(j)}|:j\in J\}\) is purely technical and causes no problem when we later take the limit as \( \varepsilon \downarrow 0\); it frees us from unwanted terms.
Lemma 3.10
(Asymptotics of the meso part) Fix \(R\in \mathbb {N}\), \(\varepsilon >0\) and \(\nu \in [0,1]^{\mathcal S }\). For a sequence \(\ell ^{(N)} \in {\mathcal L }_N\), \(N\in \mathbb {N}\), we write \(\nu ^{(N)}_s:= \frac{1}{N}\sum _{R<|k|\le \varepsilon N}\ell ^{(N)}_k k_s\), for \(s\in {\mathcal S }\), and assume that \(\lim _{N\rightarrow \infty } \nu ^{(N)}_s = \nu _s\) holds for all \(s\in {\mathcal S }\). Then
where the term \(C(R,\varepsilon ,\nu )\) is continuous in \(\nu \) and converges to 0 as \(R \rightarrow \infty \) and \(\varepsilon \rightarrow 0\).
Proof
For fixed \(k\in \mathbb {N}_0^{\mathcal S }\) denote \(S_k:=\text {supp}(k)\). For \(p_N(k)\) and \(r\in S_k\) we use the upper bound (3.18) from Lemma 3.5. Also, we apply the Stirling bound \(n! \ge n^n {{\text {e}} }^{-n}\) to the terms \(\ell ^{(N)}_k!\) as well as the terms \(k_s!\) for all \(s\in {\mathcal S }\). This gives that
where \(C= 1\wedge (({\left\Vert {\kappa }\right\Vert }_\infty +1)|{\mathcal S }|)^{|{\mathcal S }|-1}\) and where we assumed that N is large enough such that \({\left\Vert {\kappa _N}\right\Vert }_\infty \le {\left\Vert {\kappa }\right\Vert }_\infty +1\). The first term on the right-hand side of (3.33) is clearly equal to \({{\text {e}} }^{o(N)}{{\text {e}} }^{N\langle \nu , \log \mu \rangle }\). We now take a look at the second one. For \(s\in {\mathcal S }\) we put
Then \(\frac{1}{N}C_s(\ell ^{(N)})\in [\nu _s^{(N)}-{\textstyle {\frac{1}{R}}},\nu _s^{(N)}]\) since \(\sum _{k:R<|k|\le \varepsilon N} \ell ^{(N)}_k \le \frac{1}{R} \sum _{k:R<|k|} |k|\ell ^{(N)}_k\le N/R\). Next, we apply Jensen’s inequality and the fact that \(x\mapsto \log x \) is concave to get that
where
which is continuous in \(\nu \) and converges to 0 as \(R\rightarrow \infty \).
It remains to argue that the large-N exponential scale of the last term on the right-hand side of (3.33) vanishes when taking \(R\rightarrow \infty \) afterwards. Recall that the choice of r may depend on k; we will denote it as \(r_k\). We first use that
Abbreviating \(D:= \sum _{R<|k|<\varepsilon N} \ell ^{(N)}_k/N\) and using Jensen’s inequality we get that
It is easy to see that choosing \(r_k\) such that \(k_{r_k} = \max _{s\in {\mathcal S }} k_s\) ensures the convergence of \(\sum _{k} k_{r_k}^{-2|{\mathcal S }|}\) and the fact that \(\sum _{|k|\ge R} k_{r_k}^{-2|{\mathcal S }|}\) is polynomial in R. Further, as we remarked below (3.34), \(D\le 1/R\) and therefore the right-hand side of (3.35) is bounded by \(\exp (N \delta ^\prime _R)\) for some \(\delta ^\prime _R\) that vanishes as \(R\rightarrow \infty \). Next, we use that \(R/\log R \le |k| /\log |k|\) holds on our summation area of k if R is large enough and therefore
Noting that \(\frac{1}{R}\log R\rightarrow 0\) as \(R\rightarrow \infty \), we have shown that the last term on the right-hand side of (3.33) can be bounded by \({{\text {e}} }^{N\delta ''_R}\) for some \(\delta ''_R\) that vanishes as \(R\rightarrow \infty \). So far, we have handled the first term in the definition of \(z_{R,\varepsilon N}^{(N)}(\ell ^{(N)})\), and we saw that its exponential rate is not larger than the first term in \(I_\text {Me}(\nu )\).
Let us now handle the second and last part of \(z_{R,\varepsilon N}^{(N)}(\ell ^{(N)})\). We use \(1-x \le {{\text {e}} }^{-x}\) for \(x=\kappa _N(r,s)/N\) as well as \(\frac{1}{2} \sum _{R<|k|\le \varepsilon N} \ell ^{(N)}_k \langle k, \kappa k\rangle \le \frac{1}{2} {\left\Vert {\kappa }\right\Vert }_\infty \varepsilon \sum _{R<|k|\le \varepsilon N} \ell ^{(N)}_k |k| \le \frac{N}{2}{\left\Vert {\kappa }\right\Vert }_\infty \varepsilon \) to get that
Note that the exponential rate of the last factor on the right-hand side vanishes as \(\varepsilon \rightarrow 0\). Collecting all our estimates we have shown that the estimate (3.32) holds with \(C(R,\varepsilon ,\nu ) = \delta _R(\nu ) + \delta ^\prime _R + \delta ^{\prime \prime }_R + \frac{1}{2} {\left\Vert {\kappa }\right\Vert }_\infty \varepsilon \). \(\square \)
3.5 Proof of Theorem 3.1
We now finish the proof of the LDP of Theorem 3.1. Let \(d_{\mathcal L }\) and \(d_{\mathcal A}\) be, respectively, metrics that induce the vague topologies on the state spaces \({\mathcal L }\) and \({\mathcal A}\) (see (3.3) and (3.4)). Notice that, thanks to the constraints \(c(\lambda )\le 1\) and \(c(\alpha )\le 1\), the spaces \({\mathcal L }\) and \({\mathcal A}\) endowed with the vague topologies are compact by the Bolzano–Weierstrass theorem and Fatou’s lemma. Moreover, the rate function I is lower semicontinuous on \({\mathcal L }\times {\mathcal A}\), which makes it a good rate function. Hence, a weak LDP implies the claim of Theorem 3.1 and it suffices to prove that
where \(B_\delta (\lambda )\) and \(B_\rho (\alpha )\) are closed balls centered at \(\lambda \) and \(\alpha \) with radii \(\delta \) and \(\rho \), respectively.
Fix \(\lambda \in {\mathcal L }\) and \(\alpha \in {\mathcal A}\).
Step 1: Cardinality of \({\mathcal L }_N\). Recall the definition of \({\mathcal L }_N\) that was given in (3.13). Each \(\ell \in {\mathcal L }_N\) has a unique representation as a product measure \(\ell = \bigotimes _{r\in {\mathcal S }} \ell ^{{{{({r}})}}}\), where \(\ell ^{{{{({r}})}}} = (\ell ^{{{{({r}})}}}_j)_{j \in \mathbb {N}_0}\) and \(\sum _j \ell ^{{{{({r}})}}}_j j = N\mu ^{{{({N}})}}_r\) for all \(r\in {\mathcal S }\). By the same argument as in [2, Lemma 3.2] there are at most \(e^{o(N\mu ^{{{({N}})}}_r)}\) ways to choose the marginal \(\ell ^{(r)}\). Consequently we have
Fix \(R\in \mathbb {N}\) and \(\varepsilon >0\). We denote by \(d^R_{\mathcal L }\) and \(d^\varepsilon _{\mathcal A}\) the distance of the projections of measures on \(\{k:|k|\le R\}\) and \(\{y:|y|>\varepsilon \}\), respectively. Then \(d_{\mathcal L }\ge d^R_{\mathcal L }\) and \(d_{\mathcal A}\ge d^\varepsilon _{\mathcal A}\). For \(\delta >0\) and \(\rho >0\) denote by \({\mathcal L }^{{{{({R,\varepsilon }})}}}_N(\delta , \rho )\) the set of all \(\ell \in {\mathcal L }_N\) with \(d^R_{\mathcal L }(\frac{1}{N} \ell ,\lambda )< \delta \) and \(d^\varepsilon _{\mathcal A}(\ell _{\lfloor N\cdot \rfloor },\alpha ) < \rho \). Note that
According to the preceding, also
Step 2: Case \(c(\lambda ) + c(\alpha ) \not \le \mu \). Assume that there is some \(r \in {\mathcal S }\) such that \(c_r(\lambda ) + c_r(\alpha ) > \mu _r\). Then it is easy to see that \({\mathcal L }_N(\delta , \rho ) = \emptyset \) for sufficiently large N and hence
which is proved with the same argument as in [2, Lemma 3.7].
Step 3: Proof of the upper bound in (3.37) for \(c(\lambda ) + c(\alpha ) \le \mu \). For any \(R \in \mathbb {N}\) and any \(\varepsilon \in (0,1]\) we have by Step 1 and Lemma 3.7 that
For \({\tilde{\lambda }} \in {\mathcal L }\) and \({\tilde{\alpha }} \in {\mathcal A}\) we define
We require that \(\varepsilon \in (0,1]{\setminus }\{|y|:y \in \text {supp}(\alpha )\}\) (recall that \(\text {supp}(\alpha )\) is countable). This is a prerequisite for applying Lemma 3.9 to \(\tilde{\alpha }\). Now, applying Lemmas 3.8, 3.9(1) and 3.10 to the right-hand side of (3.38) we get
where we used the cutoff versions of the rate functions defined in Lemmas 3.8, 3.9 and 3.10. Also, we abbreviated \({\tilde{\nu }}^{{{({R,\varepsilon }})}} := \mu - c^{{{({R}})}}({\tilde{\lambda }}) - c^{{{({\varepsilon }})}}({\tilde{\alpha }})\) and recall that \(C(R,\varepsilon , {\tilde{\nu }}^{{{({R,\varepsilon }})}})\) is the term given in Lemma 3.10. Note that the functions \(I_{\text {Mi}}^{{{({R}})}}\) and \(c^{{{({R}})}}\) are continuous (at every point) and the functions \(I_{\text {Ma}}^{{{{({\varepsilon }})}}}\) and \(c^{{{({\varepsilon }})}}\) are continuous in \(\alpha \) due to our requirement that \(\varepsilon \notin \{|y|:y \in \text {supp}(\alpha )\}\). This also implies that \({\tilde{\nu }}^{{{({R,\varepsilon }})}}\) converges to \(\nu ^{{{({R,\varepsilon }})}} := \mu - c^{{{({R}})}}(\lambda ) - c^{{{({\varepsilon }})}}(\alpha )\), as \(\delta , \rho \rightarrow 0\), and due to the continuity of \(I_\text {Me}\) and \(C(R,\varepsilon , \cdot )\) the respective terms converge. Altogether, we get that
Observe that the right-hand side converges to \(I(\lambda , \alpha )\) if we let \(R\rightarrow \infty \) and \(\varepsilon \rightarrow 0\), which proves the upper bound in (3.37). Notice that the requirement that \(\varepsilon \notin \{|y|:y \in \text {supp}(\alpha )\}\) is not a problem since \(\text {supp}(\alpha )\) is countable.
Step 4: Construction of a recovery sequence. In this step, we prepare for the proof of the lower bound in (3.37) (see Step 5) by constructing an almost optimal sequence of \(\ell \)’s. We handle here only the case that \(\kappa \) is irreducible; see Remark 3.11 for hints on how to handle the case of a reducible \(\kappa \). We may assume that \(c(\lambda )+c(\alpha )\le \mu \), since the rate function is equal to \(\infty \) otherwise. For the same reason, we may also assume that the mesoscopic mass \(\nu := \mu - c(\lambda ) - c(\alpha )\) satisfies \(\nu \ll \kappa \nu \), since otherwise \(I_\text {Me}(\nu )=\infty \).
For \(R \in \mathbb {N}\) and \(\varepsilon \in (0,1]\) we construct a suitable recovery sequence \(\ell ^{{{({N}})}}=\ell ^{{{({N}})}}(R,\varepsilon )\) that will turn out in Step 5 to be asymptotically optimal. To this end, we construct it in such a way that it puts all mesoscopic mass \(\nu := \mu - c(\lambda ) - c(\alpha )\) into several components that can all be described by the same configuration \(k^{{{({\text {Me},N}})}}\) and which are actually on the lower end of the macroscopic scale, such that (3.20) of Theorem 3.6 can be applied. Our construction also ensures that all components of macroscopic scale have a strictly positive probability of being connected. Denote by \(\mathbbm {1}\) the element of \(\mathbb {N}_0^{\mathcal S }\) that is equal to 1 in each entry. Define \(k^{{{({\text {Me},N}})}}:= \lfloor N\varepsilon \nu \rfloor + \mathbbm {1}\). Using the representation \(\alpha = \sum _i \delta _{y^{{{({i}})}}}\) put \(k^{{{({i,N}})}} := \lfloor Ny^{{{({i}})}} \rfloor + \mathbbm {1}\) for \(i\in J_\varepsilon (\alpha ) := \{i :|y^{{{({i}})}}| > \varepsilon \}\).
Let us check now that we can actually apply Theorem 3.6 to \(p_N(k^{{{({i,N}})}})\) for all \(i\in J_\varepsilon (\alpha )\cup \{\text {Me}\}\). To check that \(\tau (k^{{{({i,N}})}})>0\) for all \(i\in J_\varepsilon (\alpha )\cup \{\text {Me}\}\) note the following: Depending on the support of \(y^{{{({i}})}}\) we might not have that \(\kappa \) is irreducible with respect to \(y^{{{({i}})}}\). However, the vectors \(k^{{{({i,N}})}}\) have support on the full type space \({\mathcal S }\) for all \(N\in \mathbb {N}\) by construction. Consequently, \(\tau (k^{{{({i,N}})}})>0\) is implied by the fact that \(\kappa \) is irreducible with respect to \(\mu \). Secondly, since \(\nu \ll \kappa \nu \), we also have that \(y^{{{({i}})}}\ll \kappa y^{{{({i}})}}\) for all \(i\in J_\varepsilon (\alpha )\): if not, \(I_\text {Ma}(\alpha )=\infty \) by definition and the lower bound \(\infty \) in (3.37) trivially holds. By construction it holds for all \(i\in J_\varepsilon (\alpha )\) that \(k^{{{({i,N}})}}_s = 1\) for \(s\notin \text {supp}(y^{{{({i}})}})\) and for all \(N\in \mathbb {N}\), which ensures the boundedness condition. The same holds for \(k^{{{({\text {Me}, N}})}}\).
Define now
where the last line ensures that \(\ell ^{{{({N}})}} \in {\mathcal L }_N\). Observe that
which implies that \(\lim _{R\rightarrow \infty ,\varepsilon \rightarrow 0} \lim _{N \rightarrow \infty }d_{\mathcal L }(\frac{1}{N} \ell ^{{{({N}})}},\lambda ) = 0\). Similarly, \(\lim _{\varepsilon \rightarrow 0}\lim _{N \rightarrow \infty }d_{\mathcal A}(\alpha ^{{{({N}})}},\alpha ) = 0\), where \(\alpha ^{{{({N}})}} = \sum _k \ell ^{{{({N}})}}_k \delta _{\frac{k}{N}}\).
Step 5: Proof of the lower bound in (3.37). Now we finish the proof of the lower bound by showing that the recovery sequence \((\ell ^{{{{({N}})}}})_{N\in \mathbb {N}}\) that we constructed in Step 4 is giving the right asymptotics.
Fix \(\delta , \rho > 0\). Then by choosing \(R \in \mathbb {N}\) large enough and \(\varepsilon >0\) small enough, we have that \(\frac{1}{N}\ell ^{{{({N}})}}\in B_\delta (\lambda )\) and \(\alpha ^{{{({N}})}}\in B_\rho (\alpha )\) for all but finitely many \(N \in \mathbb {N}\). Hence, Lemma 3.7 implies that
Next, we want to use Lemmas 3.8 and 3.9(2) to get the exponential rates for the z-terms. Note that we do not have \(\lim _{N\rightarrow \infty }\frac{1}{N}\ell ^{{{({N}})}} = \lambda \) on \(\{k :|k|\le R\}\), so we will instead apply Lemma 3.8 to the sequence \(\lfloor \lambda N\rfloor \), which gives
Using the definition (3.21), it is easy to verify that
for some constant \(\widetilde{C}(R,\varepsilon )\) vanishing as \(R \rightarrow \infty \) and \(\varepsilon \rightarrow 0\). For the mesoscopic part, we use the asymptotic formula (3.20) of Theorem 3.6 for \(p_N(k^{{{({\text {Me}, N}})}})\). We proved in Step 4 that we can actually apply Theorem 3.6 to \(p_N(k^{{{({\text {Me}, N}})}})\). This, together with Stirling’s formula, gives us
The right-hand side clearly converges to \(I_\text {Me}(\nu )\) as \(\varepsilon \rightarrow 0\). The analysis of the macroscopic part in Step 4 shows that we can directly use Lemma 3.9(2) to see that
Taking the limits \(R \rightarrow \infty \) and \(\varepsilon \rightarrow 0\) we obtain the lower bound in (3.37). This finishes the proof of Theorem 3.1.
Remark 3.11
(The reducible case: proof of Theorem 3.2) Recall from Sect. 2.3 that in the case that \(\kappa \) is reducible, one can decompose the type space \({\mathcal S }= \bigcup _j {\mathcal S }_j\) in such a way that \(\kappa \) restricted to \({\mathcal S }_j\times {\mathcal S }_j\) is irreducible and \(\left. \kappa \right| _{{\mathcal S }_i\times {\mathcal S }_j}= 0\) for \(i\ne j\). Assume that \(\alpha \in {\mathcal A}\) is connectable as defined in Sect. 2.3: for each \(y\in \text {supp}(\alpha )\) there is some j such that \(\text {supp}( y)\subset {\mathcal S }_j\). Then we can argue that the lower bound in the proof of (3.37) holds. For any j we can construct a recovery sequence as before, where we approximate the macroscopic components by defining \(k^{{{({i,N}})}} := \lfloor Ny^{{{({i}})}} \rfloor + \mathbbm {1}_{{\mathcal S }_j}\) for any \(y^{{{({i}})}}\in \text {supp}(\alpha )\) to ensure that \(\tau (k^{{{({i,N}})}})>0\). In the same way we approximate the mesoscopic mass by defining \(\nu ^{{{({j}})}} := \nu \mathbbm {1}_{{\mathcal S }_j}\) and \(k^{{{({\text {Me},j,N}})}}:= \lfloor N\varepsilon \nu ^{{{({j}})}} \rfloor + \mathbbm {1}_{{\mathcal S }_j}\). The rest will work out as in the proof above. The only additional observation needed is that for any j we have that \((\kappa \nu ^{{{({j}})}})_s = (\kappa \nu )_s\) if \(s\in {\mathcal S }_j\).
On the other hand, if \(\alpha \in {\mathcal A}\) is not connectable, then we can argue that the upper bound of the LDP holds with \(I(\lambda , \alpha ) = \infty \) by additionally requiring that \(\kappa _N= \kappa \) for all but finitely many \(N\in \mathbb {N}\): Indeed, there exists \(y\in \text {supp}(\alpha )\) such that for any sequence \(k^{{{({N}})}}\) with \(\lim _{N\rightarrow \infty }\frac{k^{{{({N}})}}}{N} = y\) we have that \(\tau (k^{{{({N}})}}) =0\) and by using that \(\kappa _N=\kappa \) we get that \(p_N(k^{{{({N}})}}) \le \tau (k^{{{({N}})}})N^{-(|k^{{{({N}})}}|-1)} = 0\), which gives that \(I(\lambda , \alpha ) = \infty \).
4 Proofs of the results from Sect. 3.3
The aim of this section is to prove the results about the connection probabilities that were formulated in Sect. 3.3, namely Lemmas 3.4, 3.5 and Theorem 3.6. In our presentation we focus on deriving Theorem 3.6, while the lemmas will be a by-product of the procedure. Their proofs can be found after Lemma 4.8.
The idea for the proof of Theorem 3.6 is to construct a sequence of graphs that contain with high probability a macroscopic component with the desired type configuration. In order to understand how to choose the parameters correctly we have to understand the characteristic equation (2.4) and how it emerges from the generating function of weighted trees. This will be done in Sect. 4.1. In Sect. 4.2 we will collect estimates that provide the link between the weighted trees and the connection probabilities of the graph and finally prove Theorem 3.6, which is a consequence of Lemmas 4.10 and 4.13.
4.1 The characteristic equation and tree combinatorics
In this section we discuss a power series representation of the solution of the fixed-point equation (2.4). This will be crucial both in the proof of Theorem 3.6 and in the analysis of the minimizers of the rate function in Sect. 6.
We rewrite the equation in the following way. Fix \(\nu = (\nu _s)_{s\in {\mathcal S }} \in [0,\infty )^{\mathcal S }\). We say that \(\nu ^*=(\nu ^*_s)_{s\in {\mathcal S }}\) is a solution to the characteristic equation with respect to \(\nu \) if
Recall that this is equivalent to the characteristic equation (2.11) that was studied in [7] via the substitution \(\nu = \mu \) and \(\rho = 1 - \frac{\textrm{d}\nu ^*}{\textrm{d}\nu }\). However, we will need Eq. (4.1) on different occasions and for several choices of \(\nu \). Note that any solution \(\nu ^*\) to (4.1) is necessarily nonnegative and for any \(s\in {\mathcal S }\) we have that \(\nu ^*_s>0\) if and only if \(\nu _s>0\). Also note that \(\nu \) itself is always a solution. Whether there exists a nontrivial solution \(\nu ^*\) (i.e., \(\nu ^*\ne \nu \)) or not will turn out to depend on the quantity \(\varSigma (\kappa ,\nu )\), introduced in (2.2) and (2.3):
where we write \(\Vert \cdot \Vert _\nu \) for the norm on \(L^2(\mathbb {R}^{\mathcal S },\nu )\) and also for the corresponding operator norm. We note that \(\varSigma (\kappa , \nu )\) is equal to the spectral radius of the matrix \(T_{ \kappa ,\nu }=(\kappa (r,s)\nu _s)_{r,s\in {\mathcal S }}\), as is seen from an elementary analysis of (4.2), also using Frobenius eigenvalue theory. Indeed, the variational equations for the maximizer f in (4.2) (with \(\sum _r \nu _r f(r)^2=1\) instead of \(\le 1\)) say that f is a positive eigenvector of the matrix \(T_{\kappa ,\nu }^2\). Since also the (up to positive multiples, unique) positive eigenvector of \(T_{\kappa ,\nu }\) is a positive eigenvector of \(T_{\kappa ,\nu }^2\), we have that f is the Frobenius eigenvector of \(T_{\kappa ,\nu }\). The corresponding Frobenius eigenvalue, i.e., the spectral radius, is equal to \(\varSigma (\kappa ,\nu )\).
Another elementary application of Frobenius eigenvalue theory yields that the map \(\nu \mapsto \varSigma (\kappa , \nu )\) is nondecreasing with respect to componentwise ordering.
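As a purely numerical illustration of these two facts (not part of the argument), the spectral radius of \(T_{\kappa ,\nu }\) can be approximated by power iteration. The following is a minimal sketch under the assumption that \(T_{\kappa ,\nu }\) is irreducible and aperiodic; the function name `spectral_radius` and the sample kernel and measures are hypothetical choices made only for this example.

```python
def spectral_radius(kappa, nu, iterations=2000):
    """Approximate the spectral radius of T = (kappa[r][s] * nu[s])_{r,s}.

    Plain power iteration; assumes T is irreducible and aperiodic so that
    the iteration converges to the Frobenius eigenvalue.
    """
    n = len(nu)
    f = [1.0] * n  # strictly positive start vector
    value = 0.0
    for _ in range(iterations):
        g = [sum(kappa[r][s] * nu[s] * f[s] for s in range(n)) for r in range(n)]
        value = max(g)
        if value == 0.0:
            return 0.0
        f = [x / value for x in g]  # renormalise so that max(f) == 1
    return value


# Hypothetical two-type example (symmetric kernel, as in the paper).
kappa = [[1.0, 2.0], [2.0, 0.5]]
nu_small = [0.3, 0.4]
nu_large = [0.3, 0.6]  # componentwise larger than nu_small

sigma_small = spectral_radius(kappa, nu_small)
sigma_large = spectral_radius(kappa, nu_large)
```

In this example `sigma_small` is below 1 and `sigma_large` is above 1, in line with the monotonicity of \(\nu \mapsto \varSigma (\kappa , \nu )\).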
Now we cite the results from [7] regarding the solutions of the characteristic equation.
Lemma 4.1
Let \(\nu \in [0, \infty )^{\mathcal S }\).

(i)
If \(\varSigma (\kappa ,\nu ) \le 1\), then the only solution \(\nu ^*\) to the characteristic equation (4.1) satisfying \(\nu ^*\le \nu \) is given by \(\nu ^*= \nu \).

(ii)
If \(\varSigma ( \kappa , \nu ) > 1\), then there exists a solution \(\nu ^*\) to (4.1) that satisfies \(\nu ^* \le \nu \) and \(\nu ^*\ne \nu \). If additionally \(\kappa \) is irreducible, then \(\nu ^*\) is the only solution to (4.1) that satisfies \(\nu ^* \le \nu \) and \(\nu ^* \ne \nu \). Further, \(\varSigma (\kappa , \nu ^*) < 1\).
Proof
See Theorem 6.2 and Theorem 6.7 in [7] and substitute \(\rho = 1-\frac{\textrm{d}\nu ^*}{\textrm{d}\nu }\). \(\square \)
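For readers who want to experiment with Lemma 4.1, the smallest solution can be approximated numerically. The sketch below assumes that (4.1) reads componentwise \(\nu ^*_s {{\text {e}} }^{-(\kappa \nu ^*)_s} = \nu _s {{\text {e}} }^{-(\kappa \nu )_s}\), which is the form used in the proof of Proposition 4.2 below. It iterates the monotone map \(x\mapsto (\theta _s {{\text {e}} }^{(\kappa x)_s})_s\) with \(\theta _s = \nu _s {{\text {e}} }^{-(\kappa \nu )_s}\), starting from 0; the iterates increase and stay below \(\nu \), so they converge to the smallest solution. The name `smallest_solution` and the sample data are hypothetical.

```python
import math


def smallest_solution(kappa, nu, tol=1e-13, max_iter=100000):
    """Monotone fixed-point iteration for nu*_s exp(-(kappa nu*)_s) = theta_s."""
    n = len(nu)
    theta = [
        nu[s] * math.exp(-sum(kappa[s][t] * nu[t] for t in range(n)))
        for s in range(n)
    ]
    x = [0.0] * n  # start below every solution
    for _ in range(max_iter):
        y = [
            theta[s] * math.exp(sum(kappa[s][t] * x[t] for t in range(n)))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


# Hypothetical two-type example.
kappa = [[1.0, 2.0], [2.0, 0.5]]
nu_super = [1.0, 1.0]  # spectral radius of T_{kappa,nu} exceeds 1
nu_sub = [0.1, 0.1]    # spectral radius of T_{kappa,nu} is below 1

star_super = smallest_solution(kappa, nu_super)  # strictly below nu_super
star_sub = smallest_solution(kappa, nu_sub)      # numerically equal to nu_sub
```

In the supercritical example the iteration returns a solution strictly below \(\nu \), while in the subcritical example it returns \(\nu \) itself, matching cases (ii) and (i) of the lemma.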
Our aim is to verify the following.
Proposition 4.2
Let \(\nu \in [0,\infty )^{\mathcal S }\). Then for any \(r\in {\mathcal S }\)
where \(\nu ^*\) is the smallest solution to (4.1).
The result of Proposition 4.2 will be used in Sect. 4.2 to derive the asymptotics of the probability that a macroscopic set of vertices is connected. Further, it will be used in Sect. 6 to optimize the microscopic part of the rate function. The left-hand side of Eq. (4.3), when divided by \(\nu _r\), is equal to the extinction probability of the branching process that we mentioned in Sect. 2.2. This observation, together with additional results from [7], is already enough to prove Proposition 4.2. However, we will provide a different proof that mainly uses the structure of the power series and additionally gives us a refined control of the convergence of the series, thanks to the estimates given in Lemma 4.5. They will be used later to bound probabilistic terms that we derive from components of mesoscopic sizes, but also in Sect. 6 where we need uniform convergence for a certain family of power series.
We now prepare for the proof of Proposition 4.2. We define a function \(\varGamma =(\varGamma _r)_{r\in {\mathcal S }}:[0,\infty )^{\mathcal S }\rightarrow [0,\infty )^{\mathcal S }\) by putting, for \(\theta = (\theta _s)_{s\in {\mathcal S }} \in [0,\infty )^{\mathcal S }\),
The idea of the proof is to show that \(\theta \mapsto \varGamma (\theta )\) is the inverse of the function \(\nu \mapsto \theta (\nu ) := \nu {{\text {e}} }^{-\kappa \nu }\) on the domain \(\{\nu \in [0,\infty )^{\mathcal S }:\varSigma (\kappa , \nu ) \le 1\}\). It turns out that this is an easy example of a technique known as Lagrange inversion, where directed trees are used to extract the variables of a power series; see [25, 26] for more general results on this topic. The relation of our combinatorial quantity \(\tau \) to directed trees is given in the following lemma. The second statement will be useful to derive a criterion for analyticity of the power series defined in (4.4).
Lemma 4.3
Let \(k\in \mathbb {N}_0^{\mathcal S }\) and \(r \in {\mathcal S }\). Then the following holds.

1.
Let \(\overrightarrow{{\mathcal T }}_{\!\!r}(k)\) denote the set of directed trees with vertex set \(\{1,\dots ,|k|\}\), root of type r and edges directed away from the root. We use the convention that \(\overrightarrow{{\mathcal T }}_{\!\!r}(0)=\emptyset \). Then
$$\begin{aligned} \tau (k)k_r = \sum _{T \in \overrightarrow{{\mathcal T }}_{\!\!r}(k)}\prod _{(i,j)\in E(T)}\kappa (x_i,x_j). \end{aligned}$$(4.5) 
2.
Write \(S_0 := \text {supp}(k) \subset {\mathcal S }\) and \(\overrightarrow{T}_{\!\!r}(S_0)\) for the set of all directed trees with vertex set \(S_0\), root vertex given by r and edges directed away from the root. We set \(\overrightarrow{T}_{\!\!r}(\emptyset )=\emptyset \). Then
$$\begin{aligned} \tau (k) k_r = \left( \prod _{s\in S_0} (\kappa k)_s^{k_s-1}\right) \times \varDelta _{r}(k) \end{aligned}$$(4.6) where
$$\begin{aligned} \varDelta _{r}(k) = \sum _{A\in \overrightarrow{T}_{\!\!r}(S_0)} \prod _{(s,s^\prime )\in E(A)}\kappa (s,s^\prime ) k_s. \end{aligned}$$(4.7)
Proof
(1) Given some vertex \(i\in [|k|]\) of type r, we can turn any undirected tree from \({\mathcal T }(k)\) uniquely into a directed tree that has vertex i as its root by giving to each edge the direction away from the root. For each undirected tree T in \({\mathcal T }(k)\) there are \(k_r\) ways to choose the root, hence the weight of T appears \(k_r\) times on the right-hand side of (4.5).
(2) We use the formula derived in [4, Theorem 2] with \(x_{s,s^\prime ,i}:= \kappa (s,s^\prime ) = \kappa (s^\prime ,s)\). Note that in [4] the edges of the trees are directed towards the root. Adapted to our notation their formula reads as
By (4.5) we have that
so together with (4.8) we have proven equation (4.6). Note that in the case \(r\notin S_0\), we have that \(\overrightarrow{T}_{\!\!r}(S_0) = \emptyset \), so both sides of (4.8) are 0. \(\square \)
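The identity (4.6) can be checked exactly on small examples. The following sketch computes \(\tau (k)\) as the weighted number of spanning trees of the complete graph on \(|k|\) typed vertices via Kirchhoff's matrix-tree theorem (a technique swapped in here for convenience; it is not used in the paper) and compares \(\tau (k)k_r\) with the right-hand side of (4.6), where \(\varDelta _r(k)\) is evaluated by enumerating all directed trees on \(\text {supp}(k)\) rooted at r. All function names and the sample kernel are hypothetical, and the code assumes \(|k|\ge 1\).

```python
from fractions import Fraction
from itertools import product


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    det = Fraction(1)
    for c in range(len(m)):
        pivot = next((r for r in range(c, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, len(m)):
            factor = m[r][c] / m[c][c]
            for j in range(c, len(m)):
                m[r][j] -= factor * m[c][j]
    return det


def tau(k, kappa):
    """Weighted spanning-tree count via the matrix-tree theorem (|k| >= 1)."""
    types = [s for s, ks in enumerate(k) for _ in range(ks)]
    n = len(types)
    # Weighted Laplacian with the last row and column deleted.
    lap = [[Fraction(0)] * (n - 1) for _ in range(n - 1)]
    for i in range(n - 1):
        for j in range(n):
            if i != j:
                w = Fraction(kappa[types[i]][types[j]])
                lap[i][i] += w
                if j < n - 1:
                    lap[i][j] -= w
    return determinant(lap)


def is_tree(parent, root):
    """True if following parent pointers from every vertex reaches the root."""
    for s in parent:
        seen = set()
        while s != root:
            if s in seen:
                return False
            seen.add(s)
            s = parent[s]
    return True


def delta(k, kappa, r):
    """Delta_r(k) from (4.7): sum over directed trees on supp(k) rooted at r."""
    support = [s for s, ks in enumerate(k) if ks > 0]
    if r not in support:
        return Fraction(0)
    others = [s for s in support if s != r]
    total = Fraction(0)
    for choice in product(support, repeat=len(others)):
        parent = dict(zip(others, choice))
        if is_tree(parent, r):
            weight = Fraction(1)
            for child, par in parent.items():
                weight *= Fraction(kappa[par][child]) * k[par]
            total += weight
    return total


def rhs_4_6(k, kappa, r):
    """Right-hand side of (4.6)."""
    value = delta(k, kappa, r)
    for s, ks in enumerate(k):
        if ks > 0:
            kappa_k_s = sum(Fraction(kappa[s][t]) * k[t] for t in range(len(k)))
            value *= kappa_k_s ** (ks - 1)
    return value


# Hypothetical symmetric integer kernel on three types.
kappa3 = [[3, 5, 1], [5, 7, 2], [1, 2, 4]]
k3 = (2, 1, 2)
checks = [tau(k3, kappa3) * k3[r] == rhs_4_6(k3, kappa3, r) for r in range(3)]
```

Exact rational arithmetic is used so that the comparison is an equality test rather than a floating-point approximation.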
Lemma 4.4
If \(\varGamma \) is analytic in \(\theta \) (i.e., \(\varGamma _r\) is analytic in \(\theta \) for all \(r\in {\mathcal S }\)), then for all \(r \in {\mathcal S }\)
Proof
Using formula (4.5) we can rewrite the power series using directed trees. Given a tree T in \(\overrightarrow{\mathcal T }_{\!\!r}(k)\) and some \(h\in \mathbb {N}_0^{\mathcal S }\) we say that a vertex \(i\in [|k|]\) is of parent-type (r, h) if i is of type r and points to exactly \(h_s\) vertices of type s for any \(s\in {\mathcal S }\). In that case, its weight is defined by \(w(i) =\prod _{s\in {\mathcal S }}\kappa (r,s)^{h_s} =:g_{r,h}\). The weight of the tree T is defined by \(w(T) := \prod _{i =1}^{|k|} w(i)\). Defining \(g_r(x) = \sum _{h \in \mathbb {N}_0^{\mathcal S }}g_{r,h}\prod _{s\in {\mathcal S }} \frac{x_s^{h_s}}{h_s!}\), for \(x \in \mathbb {R}^{\mathcal S }\), we can apply the multinomial theorem to see that \(g_r(x)=\exp \{(\kappa x)_r\}\). Altogether, we have that
which is an exponential generating function, evaluated in \(\theta \in \mathbb {R}^{\mathcal S }\) and shall be understood as a formal power series. We now cite a fact from [23] (see the proof of Theorem 1):
where all equations denote equality of formal power series, i.e., we have equality of the coefficients of \(\theta _s^{k_s}\). In particular, \(\varGamma _r(\theta )\) might be infinite. However, by our assumption that \(\varGamma \) is analytic in \(\theta \), Eq. (4.10) holds in the usual sense, i.e., all series in (4.10) converge absolutely. Recalling that \(g_r(x)=\exp \{(\kappa x)_r\}\), we have verified that (4.9) holds.
The idea behind proving (4.10) is to decompose the set of trees with root of type r into sets of trees with root of parent-type (r, h) for each \(h \in \mathbb {N}_0^{\mathcal S }\). For fixed \(h\in \mathbb {N}_0^{\mathcal S }\), one then decomposes a tree with root of parent-type (r, h) into a single vertex of type r and exactly \(h_s\) many trees with root of type s for each \(s\in {\mathcal S }\). By studying how this decomposition affects the exponential generating function, one obtains the formula. \(\square \)
Next, we derive a criterion for analyticity of the map \(\theta \mapsto \varGamma (\theta )\), i.e., about the domain of convergence of the corresponding power series. We need to introduce the quantity
where \({\mathcal M }_1({\mathcal S })\) denotes the set of probability measures on \({\mathcal S }\). One can easily see that \(\theta \mapsto \chi (\kappa , \theta )\) is lower semicontinuous.
Lemma 4.5

1.
For any \(\theta \in [0,\infty )^{\mathcal S }\) and \(r\in {\mathcal S }\),
$$\begin{aligned} \sum _{k\in \mathbb {N}_0^{\mathcal S }:|k| = n}\tau (k) k_r \prod _{s\in {\mathcal S }}\frac{\theta _s^{k_s}}{k_s!} = {{\text {e}} }^{o(n)}{{\text {e}} }^{-n[\chi (\kappa , \theta )-1]},\qquad n\rightarrow \infty . \end{aligned}$$(4.12) 
2.
Fix \(\theta \in [0,\infty )^{\mathcal S }\). If \(\chi (\kappa , \theta )>1\), then for any \(r\in {\mathcal S }\) the series \(\varGamma _r\) defined in (4.4) is analytic in \(\theta \). If on the other hand \(\chi (\kappa , \theta )<1\), then \(\varGamma _r(\theta )\) diverges.

3.
For arbitrary \(\nu \in (0,\infty )^{\mathcal S }\) and \(\theta _s(\kappa , \nu )=\nu _s {{\text {e}} }^{-(\kappa \nu )_s}\), \(s\in {\mathcal S }\), we have that
$$\begin{aligned} \chi (\kappa , \theta (\kappa ,\nu ))= \varSigma (\kappa ,\nu ) - \log \varSigma (\kappa , \nu )\ge 1 \end{aligned}$$(4.13) with equality if and only if \(\varSigma (\kappa , \nu ) =1\).
Proof
(1) We will use the identity (4.6) from Lemma 4.3. Let \(n\in \mathbb {N}\). We write \({\mathcal M }_1^{{{({n}})}}({\mathcal S })\) for the set of all probability measures v on \({\mathcal S }\) satisfying \(n v \in \mathbb {N}_0^{\mathcal S }\). For any \(k\in \mathbb {N}_0^{\mathcal S }\) with \(|k| = n\) we substitute \(k = n v\) to get
where we used Stirling’s formula \(N! = N^N{{\text {e}} }^{-N}{{\text {e}} }^{o(N)}\) as \(N\rightarrow \infty \) and assumed throughout that \(k \ll \theta \) and \(v \ll \theta \), respectively. Note that \(|{\mathcal M }_1^{{{({n}})}}({\mathcal S })| = {{\text {e}} }^{o(n)}\) and that the terms \(n^{|{\mathcal S }|}\) and \(\varDelta _r(nv)\) are only polynomial in n, thus also of order \({{\text {e}} }^{o(n)}\). Collecting everything, we arrive at (4.12).
(2) As a consequence of (1) we get the following. If \(\chi (\kappa , \theta ) >1\), then the series \(\varGamma _r(\theta )\) converges, and since the mapping \(\theta \mapsto \chi (\kappa , \theta )\) is lower semicontinuous, we get that \(\varGamma _r\) is analytic in \(\theta \). If \(\chi (\kappa , \theta ) <1\), then \(\varGamma _r(\theta )\) diverges.
(3) Put \(\phi (x) = x \log x\) for \(x\ge 0\). Inserting the definition \(\theta _s(\nu ) = \nu _s {{\text {e}} }^{-(\kappa \nu )_s}\), \(s\in {\mathcal S }\), we can estimate for any \(v\in {\mathcal M }_1({\mathcal S })\)
where the estimate is due to Jensen’s inequality applied to the convex function \(\phi \). Further, we used the fact that v is a probability measure and the symmetry of \(\kappa \). Since \(\phi \) is even strictly convex, the application of Jensen’s inequality gives an equality if and only if v is such that the map \({\mathcal S }\ni r\mapsto v_r/((\kappa v)_r\nu _r)\) is constant, i.e., if there is some \(a\in \mathbb {R}\) such that \((\kappa v)_r = a \frac{v_r}{\nu _r}\) for any \(r\in {\mathcal S }\). Clearly, \(a>0\). It follows that \(\langle \nu , \kappa v \rangle = a\) and that \(w=(v_r/\nu _r)_{r\in {\mathcal S }}\) is an eigenvector of the matrix \(T_{\kappa ,\nu } = (\kappa (r,s)\nu _s)_{r,s\in {\mathcal S }}\) with eigenvalue a. Our assumption \(\nu >0\) and the irreducibility of \(\kappa \) imply that \(T_{\kappa ,\nu }\) is also irreducible. Hence the Perron–Frobenius theorem gives that, if a were smaller than the spectral radius \(\varSigma (\kappa , \nu )\) of \(T_{\kappa ,\nu }\), then w could not be nonnegative, implying that \(v= (w_r \nu _r)_r\) would not be in \({\mathcal M }_1({\mathcal S })\). Hence, a is necessarily equal to \(\varSigma (\kappa , \nu )\) and Eq. (4.13) holds. Since \(x-1\ge \log x\) holds for any \(x\ge 0\) with equality if and only if \(x=1\), the rest of the statement in (3) holds. \(\square \)
Now, we can give the proof of Proposition 4.2.
Proof of Proposition 4.2
We will proceed in three steps and weaken the assumptions on \(\nu \) gradually.
(1) Assume that \(\nu \in (0,\infty )^{\mathcal S }\) and that \(\kappa \) is irreducible with respect to \(\nu \). Consider the case \(\varSigma (\kappa ,\nu ) \ne 1\), then by Lemma 4.5 we have that \(\chi (\kappa ,\theta (\nu )) >1\) and thus for any \(r\in {\mathcal S }\) the power series \(\varGamma _r\) defined as in (4.4) is analytic in \(\theta (\nu )\). Applying Lemma 4.4 and using the definition of \(\theta (\nu )\) we get that
for any \(r\in {\mathcal S }\). In other words, \(\varGamma (\theta (\nu ))\) is a solution of (4.1). Now, Lemma 4.1 gives that \(\nu = \varGamma (\theta (\nu ))\) if \(\varSigma (\kappa ,\nu ) \le 1\) and the claim follows. If \(\varSigma (\kappa ,\nu ) > 1\), then by Lemma 4.1 there exists (a strictly positive) \(\nu ^* \le \nu \), \(\nu ^* \ne \nu \), solving Eq. (4.1) and satisfying \(\varSigma (\kappa ,\nu ^*) < 1\). So, \(\varGamma (\theta (\nu )) = \varGamma (\theta (\nu ^*)) = \nu ^*\) follows by applying the previous case.
Now, consider the case \(\varSigma (\kappa ,\nu ) =1\), which is equivalent to \(\chi (\theta (\nu )) = 1\). Let \(\nu ^{{{({n}})}} \nearrow \nu \), in particular \(\varSigma ( \kappa ,\nu ^{{{({n}})}}) <1\) for all n. Then by Fatou’s lemma and the first case we get that for any \(r\in {\mathcal S }\)
Put \(\nu ^*:= \varGamma (\theta (\nu ))\) and assume towards a contradiction that \(\nu ^* \ne \nu \). Then there exists \(s\in {\mathcal S }\) such that \(\nu ^*_s<\nu _s\) and by irreducibility of \(\kappa \) there exists at least one \(s^\prime \) such that \(\kappa (s^\prime ,s) > 0\), and thus \(\kappa (s^\prime , s) \nu ^*_s < \kappa (s^\prime , s) \nu _s\). Hence the Perron–Frobenius theorem implies that \(\varSigma (\kappa ,\nu ^*) <1\). Therefore the power series is analytic in \(\theta (\nu )\), so by Lemma 4.4 and the definition of \(\theta (\nu )\) we get that \(\nu ^* \exp (-\kappa \nu ^*)= \nu \exp (-\kappa \nu )\). Applying Lemma 4.1 yields \(\nu ^* = \nu \) in contradiction to our assumption.
(2) Let \(\nu \in (0,\infty )^{\mathcal S }\) and let \(\kappa \) be arbitrary. Then we can decompose \({\mathcal S }\) into disjoint sets \(S_j\), \(j\in J\), such that \(\kappa ^{{{({j}})}} = \left. \kappa \right| _{S_j\times S_j}\) is irreducible with respect to \(\nu ^{{{({j}})}} = \left. \nu \right| _{S_j}\) for any \(j\in J\). For any \(j\in J\) we can apply (1) to get that
where \(\nu ^{{{({j}})}}\) solves Eq. (4.1) on \(S_j\). Observe that if \(k\in \mathbb {N}_0^{\mathcal S }\) is such that \(\text {supp}(k) \not \subset S_j \) holds for any \(j\in J\), then \(\tau (k) = 0\). Consequently for fixed r and j such that \(r\in S_j\) we get that
where we also used that \((\kappa \nu )_s = (\kappa ^{{{({j}})}} \nu ^{{{({j}})}})_s\) for \(s\in S_j\) holds by construction. Define \(\nu ^*=(\nu ^*_s)_{s\in {\mathcal S }}\) by putting \(\nu ^*_s := \nu ^{{{({*,j}})}}_s\) if \(s\in S_j\). Then it is easy to verify that \(\nu ^*\) is the smallest solution to (4.1).
(3) Let \(\nu \in [0, \infty )^{\mathcal S }\). Observe that for \(r\notin \text {supp}(\nu )\) we have \(\varGamma _r(\theta (\nu )) = 0\). The rest follows by restricting to \(\text {supp}(\nu )\) and applying (2). \(\square \)
Remark 4.6
(Connection with branching processes) For the reader who is familiar with the results and techniques used in [7], a natural question that arises is about the connection between the power series that we study in Proposition 4.2 and the multitype branching process that is used in [7] to explore the clusters of the random graph, as we mention in Sect. 2.2. Indeed, both objects carry the same information, as is shown in the following lemma.
Lemma 4.7
Fix \(\nu \in (0,\infty )^{\mathcal S }\). Let \({\mathcal X }\) be a multitype branching process, where the individuals are equipped with types from \({\mathcal S }\) and an individual of type \(r\in {\mathcal S }\) gives birth to a number of individuals of type \(s\in {\mathcal S }\) that is Poisson distributed with parameter \(\kappa (r,s)\nu _s\), independently for each \(s\in {\mathcal S }\). Let \(\varXi \) be the vector in \(\mathbb {N}_0^{\mathcal S }\) that counts the total progeny of the process \({\mathcal X }\) according to their types. For \(r\in {\mathcal S }\) let \(\texttt{P}_r\) be the probability measure under which \({\mathcal X }\) starts with one single individual of type r. Then
We omit the proof of the lemma, which comes from combinatorial manipulations and properties of the Poisson distribution of the offspring.
As a consequence, the result of Proposition 4.2 can be reformulated in terms of the branching process \({\mathcal X }\). If \(\rho ^*_r\) denotes the probability that the process \({\mathcal X }\) goes extinct under \(\texttt{P}_r\), then
Furthermore, by substituting \(\rho ^*:= \frac{\nu ^*}{\nu }\), the statement of Proposition 4.2 is equivalent to the fact that the survival probability \(\rho = 1-\rho ^*\) is the maximal nonnegative solution to
as stated in Theorem 2.5.
Recall that, in the single-type case \(|{\mathcal S }|=1\), the right-hand side of (4.18) is identical to the Borel distribution on \(\mathbb {N}_0\). Hence, also in the general case where \({\mathcal S }\) is a finite set, we call it the multitype Borel distribution and denote it by \({\textrm{Bo}}_{\kappa ,\nu }\). It is a probability measure on \(\mathbb {N}_0^{\mathcal S }\) if and only if \(\rho ^* = 1\).
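The single-type case of this remark is easy to check numerically. The sketch below assumes the standard form of the Borel distribution with parameter \(c=\kappa \nu \), namely weight \({{\text {e}} }^{-ck}(ck)^{k-1}/k!\) at \(k\ge 1\), and compares its total mass with the extinction probability of a Poisson(c) branching process, obtained as the smallest fixed point of \(q = {{\text {e}} }^{-c(1-q)}\). The function names and the truncation level are hypothetical choices; the truncated sum is accurate only away from the critical value \(c=1\), where the tail decays polynomially.

```python
import math


def borel_weight(k, c):
    """Weight exp(-c k) (c k)^(k-1) / k!, computed in log-space for stability."""
    return math.exp(-c * k + (k - 1) * math.log(c * k) - math.lgamma(k + 1))


def borel_total_mass(c, terms=5000):
    """Truncated total mass of the Borel weights (reliable for c away from 1)."""
    return sum(borel_weight(k, c) for k in range(1, terms + 1))


def extinction_probability(c, iterations=10000):
    """Smallest fixed point of q = exp(-c (1 - q)), by iteration from q = 0."""
    q = 0.0
    for _ in range(iterations):
        q = math.exp(-c * (1.0 - q))
    return q


mass_sub = borel_total_mass(0.5)    # subcritical: total mass should be 1
mass_super = borel_total_mass(2.0)  # supercritical: total mass is below 1
q_super = extinction_probability(2.0)
```

In the supercritical example the total mass agrees with the extinction probability, which in the notation above is \(\rho ^* = \nu ^*/\nu \).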
4.2 Proofs for the asymptotics for the connection probabilities
Using the results from Sect. 4.1 we can now study the connection probabilities for the different cases, namely for microscopic, mesoscopic and macroscopic clusters.
The first result provides the link between the connection probabilities of microscopic clusters and the weight of spanning trees. Its proof is elementary and uses well-known arguments. As a consequence we can prove Lemmas 3.4 and 3.5. Recall the definition of \(\tau (k)\) from (3.6) and that \(\kappa \in [0,\infty )^{{\mathcal S }\times {\mathcal S }}\) is the limiting matrix of the sequence \(\kappa _N\) as \(N\rightarrow \infty \). We define \(\tau _N(k)\) as in (3.6) with respect to \(\kappa _N\) instead of \(\kappa \). We will assume that \(\kappa _N(r,s)\le N\) for any \(r,s\in {\mathcal S }\), which holds in the finite type setting if \(N\in \mathbb {N}\) is large enough.
Lemma 4.8
(Bounds for the connection probability of microscopic clusters) For any \(N\in \mathbb {N}\) and \(k\in \mathbb {N}_0^{\mathcal S }{\setminus }{\{0\}}\),
Proof
We start with the upper bound. For \(T \in {\mathcal T }(k)\) we denote by \(\varOmega _T\) the event that the edge set E(T) of T is contained in the edge set of \({\mathcal G }(|k|,{{\textbf {x}}}, {\textstyle {\frac{1}{N}}}\kappa _N)\). Since \({\mathcal G }(|k|,{{\textbf {x}}}, {\textstyle {\frac{1}{N}}}\kappa _N)\) has to contain at least one spanning tree in order to be connected, we have
where we used the fact that each \(T \in {\mathcal T }(k)\) has exactly \(|k|-1\) edges.
Now we continue with the lower bound. For \(T \in {\mathcal T }(k)\) we denote by \(\widetilde{\varOmega }_T\) the event that the edge set of \({\mathcal G }(|k|,{{\textbf {x}}}, {\textstyle {\frac{1}{N}}}\kappa _N)\) is equal to E(T). Note that the events \(\widetilde{\varOmega }_T\), \(T \in {\mathcal T }(k)\), are disjoint and therefore
\(\square \)
Proof of Lemma 3.4
Equation (3.17) is a corollary of the estimate (4.20) and the fact that the left-hand side of (4.20) converges to 1, as \(N\rightarrow \infty \), since k does not depend on N and \({\left\Vert {\kappa _N}\right\Vert }_\infty \) is bounded in N. \(\square \)
Proof of Lemma 3.5
For fixed \(k\in \mathbb {N}_0^{\mathcal S }\) denote \(S_k:=\text {supp}(k)\). For \(p_N(k)\) and \(r\in S_k\) we use the upper bound in (4.20) from Lemma 4.8 and also the formula for \(\tau _N(k)k_r\) given in (4.6) to obtain
where \(\varDelta _{N,r}(k)\) is defined as in (4.7) but with respect to the kernel \(\kappa _N\). Now, observe that
and by Cayley’s formula we have that \(|{\mathcal T }_r(S_k)| = |S_k|^{|S_k|-1}\). Hence, combining Eq. (4.21) with the inequality (4.22) gives (3.18). \(\square \)
Now we turn to the proof of Theorem 3.6, which comes as a consequence of the following Lemmas 4.10–4.13. The intuitive idea of the proof is to embed \({\mathcal G }(k^{{{({N}})}},{{\textbf {x}}},\frac{1}{N}\kappa _N)\) in a larger random graph with vertex set given by some \(m^{{{({N}})}} \in \mathbb {N}_0^{\mathcal S }\) such that the component \(k^{{{({N}})}}\) appears with high probability as the typical giant component in this graph. We make heavy use of the expansion in Lemma 4.9, which is a straightforward multitype generalization of equation (4) from [24].
Lemma 4.9
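Fix \(m\in \mathbb {N}_0^{\mathcal S }\) and \(r\in {\mathcal S }\) such that \(m_r\ge 1\). Then the following formula holds:
$$\begin{aligned} 1=\sum _{h\in \mathbb {N}_0^{\mathcal S }:{{\textbf {e}}}_r\le h\le m}\ \prod _{s\in {\mathcal S }}\left( {\begin{array}{c}m_s-\delta _{r,s}\\ h_s-\delta _{r,s}\end{array}}\right) \, p_N(h)\prod _{s,{{\tilde{s}}}\in {\mathcal S }}\Big (1-\frac{\kappa _N(s,{{\tilde{s}}})}{N}\Big )^{h_s(m_{{{\tilde{s}}}}-h_{{{\tilde{s}}}})}. \end{aligned}$$(4.23)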
Proof
Consider the graph \({\mathcal G }(m,x,\frac{1}{N}\kappa _N)\) where x is a fixed vector compatible with m. Fix a vertex \(i\in \{1, \ldots , |m|\}\) of type r. For fixed \(h\in \mathbb {N}_0^{\mathcal S }\) with \({{\textbf {e}}}_r \le h \le m\) denote by \(\varOmega (h)\) the event that the connected component containing i is given by some set \(C\subset \{1, \ldots , |m|\}\) that contains exactly \(h_s\) vertices of type s for each \(s\in {\mathcal S }\). We claim that the summand on the right-hand side of (4.23) is the probability of \(\varOmega (h)\). Indeed, there are \(\prod _{s\in {\mathcal S }} \left( {\begin{array}{c}m_s - \delta _{r,s}\\ h_s - \delta _{r,s}\end{array}}\right) \) possibilities to choose a set \(C{\setminus }\{i\}\) from \(\{1, \ldots , |m|\}{\setminus }\{i\}\). The probability that C is the connected component containing vertex i is given as the product of \(p_N(h)\), i.e., the probability that it is connected, and the probability that no edge exists between C and its complement \(\{1, \ldots , |m|\} {\setminus } C\), which is easily seen to be equal to \(\prod _{s,{{\tilde{s}}}\in {\mathcal S }}( 1 - \frac{\kappa _N(s, \tilde{s})}{N})^{h_s(m_{{{\tilde{s}}}}-h_{{{\tilde{s}}}})}\). Equation (4.23) follows by the observation that the events \(\varOmega (h)\), \({{\textbf {e}}}_r \le h \le m\), form a decomposition of the underlying probability space. \(\square \)
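The decomposition used in this proof can be checked by brute force in the single-type case, where the component of a fixed vertex in a graph on m vertices has size h with probability \(\left( {\begin{array}{c}m-1\\ h-1\end{array}}\right) p(h)(1-q)^{h(m-h)}\). The following is a minimal sketch under the assumption of a single type and a constant edge probability q (standing in for \(\kappa _N/N\)); all function names are hypothetical, and the connection probability is obtained by enumerating all graphs, which is feasible only for very small m.

```python
from itertools import combinations, product
from math import comb

def is_connected(n, edges):
    """Union-find connectivity test for a graph on vertices 0..n-1."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(a) for a in range(n)}) == 1

def connection_probability(h, q):
    """Probability that the Erdos-Renyi graph on h vertices with edge probability q is connected."""
    pairs = list(combinations(range(h), 2))
    total = 0.0
    for present in product((0, 1), repeat=len(pairs)):
        edges = [e for e, bit in zip(pairs, present) if bit]
        if is_connected(h, edges):
            total += q ** len(edges) * (1 - q) ** (len(pairs) - len(edges))
    return total

def component_size_law(m, q):
    """Probabilities that the component of a fixed vertex has size h = 1..m."""
    return [
        comb(m - 1, h - 1) * connection_probability(h, q) * (1 - q) ** (h * (m - h))
        for h in range(1, m + 1)
    ]

if __name__ == "__main__":
    law = component_size_law(5, 0.3)
    print(law, sum(law))
```

The entries of the returned list are the single-type summands of the decomposition, and their sum equals one up to floating-point error, since the events "the component of the fixed vertex has size h" partition the probability space.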
The idea is now to pick \(m=m^{(N)}\) in such a way that the summand for \(h=k^{(N)}\) is maximal, so that, on the exponential scale, the right-hand side of (4.23) can be replaced by just this summand. It will turn out that the correct choice is \(m^{(N)}\sim N \nu \) with \(\nu _r=y_r/(1-{{\text {e}} }^{-(\kappa y)_r})\) for \(r\in {\mathcal S }\). Intuitively, this choice of \(\nu \) comes from inverting equation (2.11): indeed one can see that \(y=\rho \nu \) where \(\rho \) solves \(\rho _r = 1-{{\text {e}} }^{-(\kappa (\rho \nu ))_r}\) for \(r\in {\mathcal S }\).
Notice that the assumption \(y\ll \kappa y\) is crucial for \(\nu \) to be well-defined. While in the upper bound of Lemma 4.10 this is not important (in that case we see from (4.25) that the upper bound of \(p_N(k^{(N)})\) converges to 0), we need it in order to get a nontrivial lower bound in Lemma 4.12.
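The inversion behind the choice of \(\nu \) can be made concrete for a finite type set. The following is a minimal sketch, assuming that \(\kappa \) is given as a square list of lists and y as a list of nonnegative numbers; the function names and the convention \(\nu _r=0\) when \(y_r=(\kappa y)_r=0\) are assumptions made here for illustration. It computes \(\nu _r=y_r/(1-{{\text {e}} }^{-(\kappa y)_r})\) and rejects inputs violating \(y\ll \kappa y\).

```python
from math import exp

def kappa_times(kappa, v):
    """Matrix-vector product (kappa v)_r = sum_s kappa[r][s] * v[s]."""
    return [sum(kappa[r][s] * v[s] for s in range(len(v))) for r in range(len(v))]

def nu_from_y(kappa, y):
    """Return nu with nu_r = y_r / (1 - exp(-(kappa y)_r)).

    Raises ValueError if y_r > 0 while (kappa y)_r = 0, i.e. if y << kappa y fails.
    """
    nu = []
    for y_r, ky_r in zip(y, kappa_times(kappa, y)):
        if ky_r == 0.0:
            if y_r > 0.0:
                raise ValueError("y is not absolutely continuous w.r.t. kappa y")
            nu.append(0.0)  # convention chosen for this sketch only
        else:
            nu.append(y_r / (1.0 - exp(-ky_r)))
    return nu

if __name__ == "__main__":
    kappa = [[2.0, 1.0], [1.0, 3.0]]
    y = [0.3, 0.4]
    nu = nu_from_y(kappa, y)
    rho = [a / b for a, b in zip(y, nu)]
    print(nu, rho)
```

By construction, \(\rho =y/\nu \) then satisfies \(\rho _r=1-{{\text {e}} }^{-(\kappa (\rho \nu ))_r}\) and \(y\le \nu \) holds componentwise, which is the relation used in the text.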
Lemma 4.10
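(Upper bound in (3.20)) Let \(k\in \mathbb {N}_0^{{\mathcal S }}\) with \(k_r\ge 1\) for some \(r\in {\mathcal S }\). Let \(m\in \mathbb {N}_0^ {\mathcal S }\) be such that \(m\ge k\). Then
$$\begin{aligned} p_N(k)\le \prod _{s\in {\mathcal S }}\left( {\begin{array}{c}m_s-\delta _{r,s}\\ k_s-\delta _{r,s}\end{array}}\right) ^{-1}\prod _{s,{{\tilde{s}}}\in {\mathcal S }}\Big (1-\frac{\kappa _N(s,{{\tilde{s}}})}{N}\Big )^{-k_s(m_{{{\tilde{s}}}}-k_{{{\tilde{s}}}})}. \end{aligned}$$(4.24)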
Fix \(y\in [0,1]^{\mathcal S }{\setminus }\{0\}\). Let \(\{k^{{{({N}})}}\}_{N\in \mathbb {N}}\) be a sequence in \(\mathbb {N}_0^{\mathcal S }\) such that \(\lim _{N\rightarrow \infty }{\textstyle {\frac{k^{{{{({N}})}}}_r}{N}}}=y_r\) for all \(r\in {\mathcal S }\). Then
where the right-hand side takes the value \(-\infty \) when \(y\not \ll \kappa y\).
Proof
The inequality (4.24) is a direct consequence of (4.23), where from the right-hand side we pick only the summand for \(h=k\), which is present since \(m\ge k\).
Let us focus now on (4.25). Let \(m^{(N)}\in \mathbb {N}_0^{\mathcal S }\) be such that \(m^{(N)}_r :=\lfloor k^{(N)}_r (1-{{\text {e}} }^{-\frac{1}{N} (\kappa k^{(N)})_r })^{-1}\rfloor \) for all \(r\in {\mathcal S }\), so that \(m^{(N)}\ge k^{(N)}\). Fix \(r\in {\mathcal S }\) such that \(k^{(N)}_r\ge 1\) and consider (4.24).
For brevity we will write k and m instead of \(k^{{{({N}})}}\) and \(m^{{{({N}})}}\). With the help of Stirling’s formula \(n!=(n/{{\text {e}} })^n{{\text {e}} }^{o(n)}\) and the exponential limit theorem \(\lim _{n\rightarrow \infty }(1+\frac{c}{n})^n={{\text {e}} }^{c}\) and some elementary calculation, we get that, as \(N\rightarrow \infty \),
where in the last step we used that \(k/m = 1-{{\text {e}} }^{-\frac{1}{N}\kappa k}\) (by the definition of m). Now, we insert the asymptotics in (4.24) and see that
Consequently, the claim in (4.25) follows, including the convention for \(y\not \ll \kappa y\). \(\square \)
In order to prove the lower bound we need the following auxiliary lemma.
Lemma 4.11
For \(h^\prime ,h \in \mathbb {N}_0^{\mathcal S }\) with \(h^\prime \le h\) we have, for any \(N \in \mathbb {N}\),
Proof
Let \(I:=[h]\) and let \(x= (x_i)_{i\in I}\) be compatible with h, i.e., \(\sum _{i\in I} \delta _{x_i}= h\). Consider the graph \(G:= {\mathcal G }(h,x,\frac{1}{N}\kappa _N)\) on the vertex set I. There exists \(I^\prime \subset I\) such that \(x^\prime = (x_i)_{i\in I^\prime }\) is compatible with \(h^\prime \). The subgraph \(G^\prime \) of G that is induced by \(I^\prime \) can be identified with \({\mathcal G }(h^\prime , x^\prime ,\frac{1}{N}\kappa '_N)\), where \(\kappa '\) denotes the restriction of \(\kappa \) to \(I'\times I'\). If \(G^\prime \) is connected and for any \(i\in I{\setminus } I^\prime \) there is an edge \(\{i,j\}\in E(G)\) with \(j\in I^\prime \), then G is also connected. Therefore, using that \(1-x\le {{\text {e}} }^{-x}\) for any \(x\in \mathbb {R}\), we have
\(\square \)
In the following lemmas we give the lower bound for the connection probabilities in the macroscopic setting. It is sufficient to restrict to the case \(y\ll \kappa y\), since otherwise the limit is already ensured to be \(-\infty \) by Lemma 4.10.
Lemma 4.12
(Lower bound in (3.20) for irreducible \(\kappa \)) Fix \(y\in (0,1]^{\mathcal S }\) satisfying \(y\ll \kappa y\) and assume that \(\kappa \) is irreducible with respect to y. Let \(\{k^{{{({N}})}}\}_{N\in \mathbb {N}}\) be a sequence in \(\mathbb {N}_0^{\mathcal S }\) such that \(\lim _{N\rightarrow \infty }{\textstyle {\frac{k^{{{{({N}})}}}_r}{N}}}=y_r\) for all \(r\in {\mathcal S }\) and assume that \(\tau (k^{{{({N}})}}) >0\) holds for all \(N\in \mathbb {N}\). Then
Proof
For \(\delta \ge 0\) satisfying \(2\delta < \inf \{y_s :s\in {\mathcal S }\}\) define \(y^{(\delta )} := y-\delta \) as well as
For purely technical reasons that will become apparent later, we cannot work directly with \(\nu := \nu ^{(0)}\). Note that for all \(\delta \ge 0\) our construction ensures that with respect to \(\nu ^{(\delta )}\) the characteristic equation (4.1) has a nontrivial solution, which is equivalent to \(\varSigma (\kappa ,\nu ^{(\delta )}) > 1\) by Lemma 4.1. Let \(m^{(N,\delta )}\in \mathbb {N}_0^{\mathcal S }\) be such that \(m^{(N,\delta )}_r=\lfloor ( k^{(N)}_r-N\delta )(1-{{\text {e}} }^{-\frac{1}{N} (\kappa (k^{(N)}-N\delta ))_r})^{-1}\rfloor \). In particular, \(\lim _{N\rightarrow \infty }{\textstyle {\frac{m^{(N,\delta )}}{N}}}=\nu ^{(\delta )}\). Clearly, we have that \(\nu ^{(\delta )} \rightarrow \nu \) as \(\delta \rightarrow 0\). Note that \(y \le \nu \). Moreover, there exist \(\delta ^*>0\) and \(N_0\) such that \(m^{(N, \delta )}\ge k^{(N)}\) for all \(\delta \le \delta ^*\) and \(N\ge N_0\), which we will assume from now on. For brevity we will write k and \(m^{(\delta )}\) instead of \(k^{(N)}\) and \(m^{(N, \delta )}\).
Fix \(r\in {\mathcal S }\) with \(y_r>0\) and note that by the assumption \(y\ll \kappa y\) this implies \((\kappa y)_r>0\) and thus \(y_r<\nu _r\). From here on, h will always denote an element in the set \(\{h\in \mathbb {N}_0^{\mathcal S }:{{\textbf {e}}}_r \le h\le m^{{{({\delta }})}}\}\). For such h abbreviate
Recall from Lemma 4.9 that the sum of \(a^{(\delta )}_N(h)\) over the mentioned h’s is equal to one. In the following, we will split this sum into the three parts where \(|h|\le R\), \(R<|h|\le \varepsilon N\) and \(|h|>\varepsilon N\), for some large \(R\in \mathbb {N}\) and some small \(\varepsilon >0\). We will show that the first part converges towards something in (0, 1), that the second part vanishes, and we will identify the exponential rate of the third part explicitly via a Laplace approximation. Explicitly, we will prove the following three claims:
Claim 2: If \(\varepsilon \) is small enough, for some sufficiently small \(\delta _0>0\) and some sufficiently large \(N_0\in \mathbb {N}\),
Claim 3: For any \(\delta >0\),
for some sequence \((\varepsilon _N)_N= (\varepsilon _N(\delta ))_N\) that converges to 0, where \({\widetilde{f}}(y,\nu ) = \langle y, \log (1-{{\text {e}} }^{-\kappa y})\rangle \), and \(C(\delta )\) is a constant only depending on \(\delta \) and vanishing as \(\delta \downarrow 0\).
Let us first explain how the assertion of the lemma follows from these three claims. We start by using Lemma 4.9 and Claim 3 to obtain, in the limit as \(N\rightarrow \infty \),
By (4.30) and (4.31), the left-hand side is not smaller than \(y_r/\nu _r\), when taking the limits as \(N\rightarrow \infty \), followed by \(R\rightarrow \infty \) and \(\delta \downarrow 0\), if \(\varepsilon \) is small enough. In particular, the left-hand side is bounded away from zero when taking these limits. Hence, the exponential rate of the right-hand side as \(N\rightarrow \infty \) is nonnegative. This implies that \(\liminf _{N\rightarrow \infty }\frac{1}{N} \log p_N(k)\ge {\widetilde{f}}(y,\nu )\), which is the assertion of the lemma. It remains to prove the three claims.
Proof of Claim 3: As we explained in the proof of Lemma 4.10, the exponential rate of the two products in the definition of \(a^{{{({\delta }})}}_N(h)\) can easily be identified with the help of Stirling’s formula and the exponential limit theorem \(\lim _{n\rightarrow \infty }(1+\frac{c}{n})^n={{\text {e}} }^{c}\) and elementary calculations. Indeed, for any sequence \(h^{{{({N}})}}\) such that \(\lim _{N\rightarrow \infty }\frac{h^{{{({N}})}}}{N} = x\) exists and \(x \in [0,1]^{\mathcal S }{\setminus }\{0\}\), and for any \(\delta >0\),
where we introduce \(f(x,{{\tilde{x}}}) = \langle x, \log \frac{x}{\tilde{x}}\rangle + \langle {{\tilde{x}}}-x, \log \frac{{{\tilde{x}}}-x}{{{\tilde{x}}}} \rangle + \langle {{\tilde{x}}}-x, \kappa x\rangle \) for \(x,{{\tilde{x}}} \in [0,1]^{\mathcal S }{\setminus }\{0\}\) and \(x\le {{\tilde{x}}}\). We can write
where we write \(H=(H_s)_{s\in {\mathcal S }}\) and for \(s\in {\mathcal S }\) we write \(H_s(x;{{\tilde{x}}})\) for the entropy of the Bernoulli distribution with parameter \(x_s/{{\tilde{x}}}_s\) with respect to the one with parameter \(1-{{\text {e}} }^{-(\kappa x)_s}\).
The second term in the second line of (4.34) will be handled jointly with the exponential rate of \(p_N(h^{(N)})\), so let us discuss here the minimum of the first. Since every component of H is an entropy between probability measures, we have that \(H(x,\nu ^{(\delta )}) \ge 0\) pointwise for all \(x\le \nu ^{(\delta )}\) with equality if and only if \(\frac{x}{\nu ^{(\delta )}} = 1 - {{\text {e}} }^{-\kappa x}\). As \(\kappa \) is irreducible with respect to \(\nu ^{(\delta )}\) the latter condition is true if and only if \(x= y^{(\delta )}\) or \(x=0\), see Lemma 4.1. Hence, we have seen that, for \(\varepsilon \) sufficiently small, \(\min _{x:|x|\ge \varepsilon } \langle \nu ^{(\delta )}, H(x;\nu ^{(\delta )})\rangle =\langle \nu ^{(\delta )}, H(y^{(\delta )};\nu ^{(\delta )})\rangle \).
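The role of the entropy term can be illustrated in the single-type case. The following is a minimal sketch, assuming a scalar kernel c and a scalar \(\nu \) with \(c\nu >1\); the function names are hypothetical. It evaluates the relative entropy of the Bernoulli distribution with parameter \(x/\nu \) with respect to the one with parameter \(1-{{\text {e}} }^{-cx}\), and locates the positive zero by iterating \(x\mapsto \nu (1-{{\text {e}} }^{-cx})\).

```python
from math import exp, log

def bernoulli_relative_entropy(p, q):
    """Relative entropy of Bernoulli(p) with respect to Bernoulli(q), for 0 < q < 1."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * log(a / b)
    return term(p, q) + term(1.0 - p, 1.0 - q)

def entropy_term(x, nu, c):
    """Single-type analogue of H(x; nu): Bernoulli(x / nu) against Bernoulli(1 - exp(-c x))."""
    return bernoulli_relative_entropy(x / nu, 1.0 - exp(-c * x))

def positive_zero(nu, c, iterations=500):
    """Positive solution of x / nu = 1 - exp(-c x), by iteration started at x = nu."""
    x = nu
    for _ in range(iterations):
        x = nu * (1.0 - exp(-c * x))
    return x

if __name__ == "__main__":
    nu, c = 1.0, 2.0
    x_star = positive_zero(nu, c)
    print(x_star, entropy_term(x_star, nu, c), entropy_term(0.4, nu, c))
```

In this example the entropy term vanishes (up to rounding) at the positive fixed point and is strictly positive at other values of x in \((0,\nu ]\), which is the single-type version of the statement that the minimum over \(|x|\ge \varepsilon \) is attained at \(y^{(\delta )}\).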
Let us now prove (4.32). The work we still have to do is to combine (4.33) with estimates for the \(p_N\)-term from Lemmas 4.10 and 4.11, and we have to distinguish the cases that h is left of k but close to k (then Lemma 4.11 applies), or bounded away from k (then Lemma 4.10 suffices). Let \(h^{(N)}\) be such that \(|h^{(N)}|>\varepsilon N\) and assume that \(x:=\lim _{N\rightarrow \infty }\frac{h^{(N)}}{N}\) exists. Note that \(x\in [0,1]^{\mathcal S }{\setminus }\{0\}\). We first examine the case where \(h^{(N)}\in [k-2N\delta ,k]\) for large N and hence \(|x-y|\le 2\delta \). With the help of (4.33) and using the estimate from Lemma 4.11 we get that
where
and it can be easily verified that \(C(\delta ) \rightarrow 0\) as \(\delta \rightarrow 0\). Now, let us examine the case \(h^{(N)}\notin [k-2N\delta ,k]\) for large N, i.e., \(|x-y^{(\delta )}|\ge \delta \). We start with (4.33) and use the estimate (4.25) from Lemma 4.10 to get
where the number
is strictly positive since the function \(x\mapsto H(x,\nu ^{(\delta )})\) is continuous and its only zeros are at \(x=0\) and \(x= y^{(\delta )}\). Now, it is not hard to see that \(\#\{h\in \mathbb {N}_0^{\mathcal S }:h \le m^{(\delta )}\}\le (|\nu | N)^{|{\mathcal S }|} = {{\text {e}} }^{o(N)}\), hence a simple Laplace approximation argument implies that (4.32) holds with \(\varepsilon _N:= {{\text {e}} }^{o(N)}{{\text {e}} }^{-N{\widetilde{C}}(\delta )}\), which vanishes as \(N\rightarrow \infty \) since \({\widetilde{C}}(\delta )>0\).
Proof of Claim 1: Applying Lemma 4.8 to \(p_N(h)\) we get, for any \(N,R\in \mathbb {N}\) and \(\delta >0\),
where we used that \(1-x\le {{\text {e}} }^{-x}\) for any \(x\in \mathbb {R}\) in the second line. Hence,
Observe that by Proposition 4.2 and by the definition of \(\nu ^{(\delta )}\) given in (4.28), we have that \(\lim _{R\rightarrow \infty }\varLambda ^{(\delta )}_{R} = \frac{\nu ^{(\delta )}_r-y^{(\delta )}_r}{\nu ^{(\delta )}_r} \in [0,1)\) and clearly \(\lim _{\delta \rightarrow 0}\lim _{R\rightarrow \infty } \varLambda ^{(\delta )}_R = \frac{\nu _r-y_r}{\nu _r}\in (0,1)\).
Proof of Claim 2: For the sum on h satisfying \(R< |h|\le \varepsilon N\), we can use basically the same estimates as in (4.39), but we modify the second line by now estimating
since we do not have that \(m^{{{{({\delta }})}}}\ge h\) on this sum. This gives
This is the main part of a power series with coefficients \(\tau \), which we studied in Lemma 4.5. Note that the term in round brackets converges to \(\nu _s {{\text {e}} }^{-(\kappa \nu )_s}\) as \(N \rightarrow \infty \), followed by \(\delta \downarrow 0\). Furthermore, this vector \(\theta (\kappa ,\nu )= \nu {{\text {e}} }^{-\kappa \nu }\) satisfies \( \chi (\kappa ,\theta (\kappa ,\nu ))>1\), since \(\varSigma (\kappa ,\nu )>1\) by construction (see Lemma 4.5(3)). It will turn out that this is sufficient to estimate the sum on n against some convergent geometric series.
We pick a small threshold \(\varepsilon '>0\). We may pick \(\delta _0\) so small and then \(N_0 \in \mathbb {N}\) so large that
Additionally, we can also assume that \(N/m^{{{{({\delta }})}}}_r\le 1+1/\nu _r\) for these \(\delta \) and N. By \(\tau \) we denote the function defined as in (3.6) for the matrix \(\kappa \), then, since \(\lim _{N\rightarrow \infty }\kappa _N=\kappa \), we may also pick \(N_0\) so large that additionally
We also may and will assume that \(\Vert \kappa _N\Vert _\infty \le 1+\Vert \kappa \Vert _\infty \) for \(N \ge N_0\). Therefore, we can now estimate, for \(\delta \in (0,\delta _0] \) and \(N\ge N_0\), and any \(R\in \mathbb {N}\),
Now we can apply (4.12) in Lemma 4.5 to estimate the sum on h against \({{\text {e}} }^{o(n)}{{\text {e}} }^{-n[\chi (\kappa ,\theta (\kappa ,\nu ))-1]}\). We recall that \(\chi (\kappa ,\theta (\kappa ,\nu ))-1>0\) and pick now \(\varepsilon \) and \(\varepsilon '\) so small that the sum on n can be estimated against a convergent geometric series. Hence, the entire sum vanishes as \(R\rightarrow \infty \), which ends the proof. \(\square \)
In the following we want to verify the statement of Lemma 4.12 without assuming that \(\kappa \) is irreducible with respect to y but still under the assumption that \(\tau (k^{{{({N}})}})> 0\). What we have in mind are components \(k^{{{({N}})}}\) where for a certain set of types \({{\tilde{S}}} \subset {\mathcal S }\) only o(N) vertices of those types are available, i.e., \(k^{{{({N}})}}_s \in o(N)\) for all \(s\in {{\tilde{S}}}\). However, these types might be necessary in order to connect the vertices with types in \({\mathcal S }{\setminus } {{\tilde{S}}}\). For that case we prove in the following that on the exponential scale the asymptotics of the connection probability \(p_N(k^{{{({N}})}})\) are the same as in the previous lemma.
Lemma 4.13
(Lower bound in (3.20) under generalized assumptions) Fix \(y\in [0,1]^{\mathcal S }{\setminus }\{0\}\) satisfying \(y\ll \kappa y\). Let \(\{k^{{{({N}})}}\}_{N\in \mathbb {N}}\) be a sequence in \(\mathbb {N}_0^{\mathcal S }\) such that \(\lim _{N\rightarrow \infty }{\textstyle {\frac{k^{{{{({N}})}}}_r}{N}}}=y_r\) for all \(r\in {\mathcal S }\) and assume that \(\{k^{{{({N}})}}_s\}_N\) is bounded for all \(s\notin \text {supp}(y)\). Assume further that \(\tau (k^{{{({N}})}}) >0\) holds for all but finitely many \(N\in \mathbb {N}\). Then
Proof
Throughout the proof we will assume without loss of generality that \({\mathcal S }= \bigcap _{N}\text {supp}(k^{{{({N}})}})\). If \(y>0\) on \({\mathcal S }\) and \(\kappa \) is irreducible with respect to y, then we are in the setting of Lemma 4.12, which yields the claim. In the other case, it always holds that \({{\tilde{S}}} := {\mathcal S }{\setminus } \text {supp}(y) \ne \emptyset \). We decompose \(\text {supp}(y)\) into disjoint sets \(S_j\), \(j\in J\), satisfying that the restriction of \(\kappa \) to \(S_j \times S_j\) is irreducible with respect to y restricted to \(S_j\).
We will start with the special case where \(k^{{{({N}})}}_s = 1\) for all \(s\in {{\tilde{S}}}\) and \(N\in \mathbb {N}\). It is easily seen that the assumption \(\tau (k^{{{({N}})}}) > 0\) for all but finitely many \(N\in \mathbb {N}\) implies that there exists a tree T on the type set \({\mathcal S }\) such that \(\prod _{\{r,s\}\in E(T)}\kappa (r,s) > 0\). Define
For every \(r\in {\mathcal S }\) we now fix a vertex \(i_r\in [k^{{{({N}})}}]\) that is of type r. Note that \(\{i_r:r\in {{\tilde{S}}}\}\) already contains all the vertices with types in \({{\tilde{S}}}\). We will abbreviate \({\mathcal G }_N = {\mathcal G }(k^{{{({N}})}},x,\frac{1}{N}\kappa _N)\). Note that the event \(\{\{i_r, i_s\}\in E({\mathcal G }_N) :\{r,s\}\in {{\tilde{E}}}\}\) is independent of the existence of edges \(\{i,i^\prime \}\), if the types of i and \(i^\prime \) are in \(S_j\) for any \(j\in J\). Therefore,
Clearly the second product is \( \ge {{\text {e}} }^{o(N)}\), since \(\kappa _N\) is bounded away from zero, uniformly in all large N. Applying Lemma 4.12 to \(k^{{{({N}})}}\mathbbm {1}_{S_j}\) for every \(j\in J\), we get that
where in the last equation we used that with fixed \(j\in J\), for any \(r\in S_j\) and \(s\in {\mathcal S }{\setminus } S_j\) we have either \(\kappa (r,s) =0\) or \(y_s=0\). Thus, for \(r\in S_j\) we get
This implies that Eq. (4.43) holds in the special case.
Now, consider the general case, where \(\{k^{{{({N}})}}_s\}_N\) is bounded in N for all \(s\in {{\tilde{S}}}\). Then by Lemma 4.11
Observe that the last product is \({{\text {e}} }^{o(N)}\) by assumption, hence the claim is implied by the treatment of the first case. \(\square \)
5 Proof of Theorem 1.1 for a general type set
The aim of this section is to finally prove our main result, Theorem 1.1, for the general case, where the type space \({\mathcal S }\) is some compact metric space. The main idea is to approximate the general graph model introduced in Sect. 1.1 by a discretized model with a finite type space and discretized kernels. To derive the LDP we will use the LDP of Theorem 3.1 for finite type spaces, together with the Dawson–Gärtner theorem, which we will slightly modify for our purposes.
In Sect. 5.1 we introduce the approximation scheme and derive lower and upper approximations for the distribution of the empirical measures of the general graph via a comparison with certain discretized graph models. In order to lift statements about the distribution of the discretized model to the general case we introduce in Sect. 5.2 a projective system. There, we also identify the projective limit spaces with the state spaces of our empirical measures, i.e., with \({\mathcal L }\times {\mathcal A}\), and verify that the projective limit topology is strong enough to imply an LDP result also with respect to our chosen topology. In Sect. 5.3 we conclude with the derivation of our main result by using the Dawson–Gärtner approach, adapted to our purposes: we have to use different distributions for the lower and the upper bound and will have to deal with some additional technical difficulty concerning the lower bound. Finally, Theorem 1.1 is implied by combining the results of Lemma 5.16, where we prove the upper bound, and Lemma 5.18, where we prove the lower bound.
The following objects will be fixed for the remainder of the section. Let \({\mathcal S }\) be a compact metric space. Fix a probability measure \(\mu \) on \({\mathcal S }\). For any N fix a type vector \({{\textbf {x}}} = {{\textbf {x}}}^{{{({N}})}}=(x_1,\dots ,x_N)\in {\mathcal S }^N\) such that its empirical measure \(\mu _N =\frac{1}{N}\sum _{i=1}^N\delta _{x_i}\) weakly converges to \(\mu \) as \(N\rightarrow \infty \). Furthermore, let a continuous kernel \(\kappa :{\mathcal S }\times {\mathcal S }\rightarrow [0,\infty )\) be given that is irreducible w.r.t. \(\mu \), as well as a sequence of continuous kernels \(\kappa _N\) that converges to \(\kappa \) uniformly on \({\mathcal S }\times {\mathcal S }\) as \(N\rightarrow \infty \). By \({\mathcal G }_N={\mathcal G }([N],{\textbf {x}},\frac{1}{N}\kappa _N)\) we denote the inhomogeneous random graph introduced in Sect. 1.1 and by \(({\mathcal C }_i)_i\) we denote the vertex sets of the connected components of \({\mathcal G }_N\). We will denote the probability measure corresponding to the graph \({\mathcal G }_N\) by \(\mathbb {P}_N\). Recall the definition of the microscopic and macroscopic empirical measures \(\text {Mi}_N\) and \(\text {Ma}_N\) given in (1.2). The goal is to derive the LDP for the pair \((\text {Mi}_N,\text {Ma}_N)\) that is formulated in Theorem 1.1.
Before choosing a discretization scheme, we would like to collect some properties of our state space \({\mathcal L }\times {\mathcal A}\). Recall that
and that both are equipped with vague topologies, i.e., a sequence \((\lambda _n)_{n\in \mathbb {N}}\) in \({\mathcal L }\) converges to \(\lambda \), if for any continuous, compactly supported test function \(g:{\mathcal M }_{\mathbb {N}_0}({\mathcal S })\rightarrow \mathbb {R}\) the integrals \(\int g\,\textrm{d}\lambda _n\) converge to \(\int g \,\textrm{d}\lambda \), as \(n\rightarrow \infty \). In the same way, a sequence \((\alpha _n)_{n\in \mathbb {N}}\) in \({\mathcal A}\) converges to \(\alpha \), if for any continuous, compactly supported test function \(g:{\mathcal M }({\mathcal S }){\setminus }\{0\}\rightarrow \mathbb {R}\) the integrals \(\int g\,\textrm{d}\alpha _n\) converge to \(\int g \,\textrm{d}\alpha \), as \(n\rightarrow \infty \). Both on \({\mathcal M }_{\mathbb {N}_0}({\mathcal S })\) and \({\mathcal M }({\mathcal S }){\setminus }\{0\}\) we consider the topologies of weak convergence. In the next lemma, we give a short characterization of compactness that is implied by this choice; a verification is left to the reader.
Lemma 5.1
The following assertions hold:

1. A subset \({\mathcal N }\subset {\mathcal M }_{\mathbb {N}_0}({\mathcal S })\) is compact if and only if it is closed and \(\sup \left\{ \nu ({\mathcal S }) :\nu \in {\mathcal N }\right\} < \infty \).

2. A subset \({\mathcal N }\subset {\mathcal M }_{\le 1}({\mathcal S }){\setminus }\{0\}\) is compact if and only if it is closed and \(\inf \left\{ \nu ({\mathcal S }) :\nu \in {\mathcal N }\right\} > 0\).
The following lemma implies that any vague accumulation points of \(\text {Mi}_N\) and \(\text {Ma}_N\) are indeed elements of \({\mathcal L }\) and \({\mathcal A}\) if they exist.
Lemma 5.2
Both \({\mathcal L }\) and \({\mathcal A}\) are compact spaces.
Proof
Denote \(M:= \{\mu \} \cup \{\mu _N:N\in \mathbb {N}\}\).
(1) Compactness of \({\mathcal L }\): For any \(\lambda \in {\mathcal L }\) note that \(|\lambda | \le c_\lambda ({\mathcal S })\le \sup _{{\tilde{\mu }}\in M}{\tilde{\mu }}({\mathcal S }) = 1\). The set \(B_1:= \{\lambda \in {\mathcal M }({\mathcal M }_{\mathbb {N}_0}({\mathcal S }) ):|\lambda |\le 1\}\) is a bounded subset of the dual of \(C_0({\mathcal M }_{\mathbb {N}_0}({\mathcal S }) {\setminus }\{0\})\) (the space of continuous functions g on \({\mathcal M }_{\mathbb {N}_0}({\mathcal S }) \) with \(\lim _{k({\mathcal S })\rightarrow \infty }g(k)=0\)), so by applying the Banach–Alaoglu theorem we get that \(B_1\) is compact w.r.t. the vague topology. Since \({\mathcal L }\subset B_1\) it remains to argue that \({\mathcal L }\) is closed. Let \((\lambda _n)_{n\in \mathbb {N}}\) be a sequence in \({\mathcal L }\) with vague limit \(\lambda \). Then for each \(n\in \mathbb {N}\) there exists \({\tilde{\mu }}_n \in M\) such that \(c_{\lambda _n}\le \tilde{\mu }_n\). Since M is compact w.r.t. the weak topology, we can find \({\tilde{\mu }}\in M\) and a subsequence (which we will also denote by \(({\tilde{\mu }}_n)_{n\in \mathbb {N}}\)) such that \({\tilde{\mu }}_n\rightarrow {\tilde{\mu }}\) weakly, as \(n\rightarrow \infty \). We now argue that this implies that \(c_{\lambda }\le {\tilde{\mu }}\). Let \(f:{\mathcal S }\rightarrow [0,\infty )\) be a continuous and bounded function. For any \(R\in \mathbb {N}\), let \(\chi _R:[0,\infty ) \rightarrow [0,\infty )\) be a smooth function satisfying \(\mathbbm {1}_{[0,R]}\le \chi _R\le \mathbbm {1}_{[0,R+1]}\), such that \(R\mapsto \chi _R\) is increasing pointwise. Define
and note that \(\varPhi _R^f(k) \nearrow \int f(s)\,k(\textrm{d}s)\), as \(R\rightarrow \infty \), pointwise for any k. By Lemma 5.1 we have that \(\varPhi _R^f\) is compactly supported and it can be easily seen that continuity of \(\chi _R\) and f imply that \(\varPhi _R^f\) is continuous. Therefore
Since this holds for any \(f\ge 0\), we can conclude that \(c_\lambda \le {\tilde{\mu }}\) and thus \(\lambda \in {\mathcal L }\).
(2) We only sketch the construction of a vague limit point. Fix a sequence \((\varepsilon _i)_{i\in \mathbb {N}}\), with \(\varepsilon _i\searrow 0\) as \(i\rightarrow \infty \), and define \(N_{\varepsilon _i}:= \{y\in {\mathcal M }_{\le 1}({\mathcal S }){\setminus }\{0\}:|y|\ge \varepsilon _i\}\). Note that \(\varepsilon _i \alpha _n(N_{\varepsilon _i}) \le c_{\alpha _n}({\mathcal S }) \le 1\) implies that \(\alpha _n(N_{\varepsilon _i})\le 1/\varepsilon _i\). Thus, for fixed \(i\in \mathbb {N}\), the restricted measures \((\left. \alpha _n\right| _{N_{\varepsilon _i}})_{n\in \mathbb {N}}\) are bounded and thus have a vaguely converging subsequence. By diagonalization we can construct for each \(i\in \mathbb {N}\) a subsequence of \((\alpha _n)_{n\in \mathbb {N}}\) such that the restrictions to \(N_{\varepsilon _i}\) converge vaguely to some \(\alpha ^{(i)} \in {\mathcal M }(N_{\varepsilon _i})\) and in such a way that \(\left. \alpha ^{(i+1)}\right| _{N_{\varepsilon _i}} = \alpha ^{(i)}\) holds for all \(i\in \mathbb {N}\). Thus the monotone limit \(\lim _{i \rightarrow \infty } \alpha ^{(i)} =:\alpha \) on \({\mathcal M }_{\le 1}({\mathcal S }){\setminus }\{0\}\) is a countably additive extension of the measures \(\alpha ^{(i)}\), \(i\in \mathbb {N}\). Since every compactly supported test function has its support contained in \(N_{\varepsilon _i}\) for some i by Lemma 5.1, one sees that in the vague topology \(\alpha _n \rightarrow \alpha \).
To see that \(\alpha \in {\mathcal A}\) (i.e., that \({\mathcal A}\) is closed) we can proceed as in the proof for \({\mathcal L }\). Note that for \(\varepsilon >0\) we can find a smooth function \(\chi _\varepsilon :[0,\infty ) \rightarrow [0, \infty )\) satisfying \(\mathbbm {1}_{[2\varepsilon , \infty )}\le \chi _\varepsilon \le \mathbbm {1}_{[\varepsilon ,\infty )}\), which allows us to define \(\varPhi _\varepsilon ^f\) as above but by truncating with \(\chi _\varepsilon \). \(\square \)
5.1 Discretization and approximation
Let \((P_m)_{m\in \mathbb {N}}\) be a sequence of finite partitions of \({\mathcal S }\) into nonempty sets. For \(m\in \mathbb {N}\) we denote \(P_m = \{A_{m,i}:i=1, \ldots , |P_m|\}\). We say that \((P_m)_{m\in \mathbb {N}}\) is nested if for \(m\le n\) and any \(A_{m,i}\in P_m\) there is \(J\subset \{1, \dots , |P_n|\}\), such that \(A_{m,i} = \bigcup _{j\in J} A_{n,j}\). For any subset A we write \(\text {diam}(A) :=\sup \{d(x,y):x,y\in A\}\). For any measure \(\nu \) on \({\mathcal S }\) a set \(A\subset {\mathcal S }\) is called a continuity set of \(\nu \) if \(\nu (\partial A)=0\). The following lemma is a modification of Lemma 7.1 from [7].
Lemma 5.3
There exists a sequence of finite partitions \((P_m)_{m\in \mathbb {N}}\) of \({\mathcal S }\) with the following properties:

1. For any \(m\in \mathbb {N}\) and any \(i=1, \ldots , |P_m|\) we have that \(A_{m,i}\) is measurable and a continuity set of \(\mu \) and \(\mu _N\) for all \(N\in \mathbb {N}\).

2. \((P_m)_{m\in \mathbb {N}}\) is nested.

3. It holds that
$$\begin{aligned} \lim _{m\rightarrow \infty }\max _{i = 1, \ldots , |P_m|}\text {diam}(A_{m,i}) = 0. \end{aligned}$$(5.4)
Proof
The proof can be done as in the proof of Lemma 7.1 in [7] with two small modifications. Let \(A\subset {\mathcal S }\) be the set of points that are atoms of \(\mu \) or \(\mu _N\) for some \(N\in \mathbb {N}\). Then A is still a countable set, so we may pick the balls \(B_{mi}\) in a way such that they are continuity sets of \(\mu \) and \(\mu _N\) for all \(N\in \mathbb {N}\). Since in our case the set \({\mathcal S }\) is compact, we can cover it with finitely many of these balls, hence we get the stronger property formulated in (5.4). \(\square \)
In the following we always assume that \((P_m)_{m\in \mathbb {N}}\) has all the properties given in Lemma 5.3. For \(m\in \mathbb {N}\) and any \(A_{m,i}\in P_m\) we pick exactly one point \(x_{m,i}\) from the set \(A_{m,i}\), which we call the representative of \(A_{m,i}\). We define
and define the projection
where i is such that \(A_{m,i}\) is the unique set containing x.
Further we can lift the projection to the space \({\mathcal M }({\mathcal S })\), the space of finite measures on \({\mathcal S }\). For any \(m\in \mathbb {N}\) we introduce (by abuse of notation)
Going one level further, we also define
We can now apply the projections to all the levels of our graph setting. On the first level of discretization, i.e., from \({\mathcal S }\) to \({\mathcal S }_m\), we approximate the type of a vertex by some type from the discrete set \({\mathcal S }_m\). On the second level, we approximate the type configuration of a vertex set which, depending on the context, may or may not be a cluster. On the third level, we approximate measures that count the multiplicities of type configurations and are therefore suited to register the number of clusters described by the different type configurations.
Fix \(m\in \mathbb {N}\). We write \(\pi _m({{\textbf {x}}}) =(\pi _m(x_1), \ldots , \pi _m(x_N))\in ({\mathcal S }_m)^N\) for the discretized type sequence and denote by \(\mu _N^{{{({m}})}}=\frac{1}{N}\sum _{i=1}^N\delta _{\pi _m(x_i)}\) its empirical measure. It is easy to check that \(\mu _N^{{{({m}})}}=\pi _m(\mu _N)\). Since our partition \(P_m\) has been chosen such that the sets \(A_{m,i}\in P_m\) are continuity sets of \(\mu \), it holds that \(\mu _N^{{{({m}})}}\) converges weakly to \(\mu ^{{{({m}})}} :=\pi _m(\mu )\), as \(N\rightarrow \infty \). We also define \(\eta _{\pi _m({{\textbf {x}}})} :{\mathcal P }([N]) \rightarrow {\mathcal M }_{\mathbb {N}_0}({\mathcal S }_m)\) as the discrete analog of the type-registering measure \(\eta _{{{\textbf {x}}}}\) from (1.1), i.e., for any \(A\subset [N]\) we have that \(\eta _{\pi _m({{\textbf {x}}})}(A) = \sum _{i\in A}\delta _{\pi _m(x_i)} \in {\mathcal M }_{\mathbb {N}_0}({\mathcal S }_m)\). Abbreviating \(\eta _m = \eta _{\pi _m({{\textbf {x}}})}\) we can now define
It is straightforward to show that \(\text {Mi}_N^{{{({m}})}} = \pi _m(\text {Mi}_N)\) and \(\text {Ma}_N^{{{({m}})}} = \pi _m(\text {Ma}_N)\). Both empirical measures \(\text {Mi}_N^{{{({m}})}}\) and \(\text {Ma}_N^{{{({m}})}}\) evaluate type information given by the vertices of the clusters only roughly by approximating the type of each vertex by an element from \({\mathcal S }_m\). Note that \(c_{\text {Mi}_N^{{{({m}})}}} = \pi _m(c_{\text {Mi}_N}) \le \mu ^{{{({m}})}}_N\) and \(c_{\text {Ma}_N^{{{({m}})}}} = \pi _m(c_{\text {Ma}_N}) \le \mu ^{{{({m}})}}_N\). Hence, the natural state spaces for the discretized empirical measures are given by
and we endow them with vague topologies.
At this point it is important to note that we cannot apply the discrete LDP from Theorem 3.1 directly to get an LDP for the measure \(\mathbb {P}_N((\text {Mi}_N^{{{({m}})}},\text {Ma}_N^{{{({m}})}})\in \cdot )\), since the edges of the random graph \({\mathcal G }_N\) are drawn according to the nondiscretized types of the vertices. Before applying Theorem 3.1 one has to approximate the underlying graph itself by a discrete version.
Let \(\kappa _N^{{{({m}})}}:{\mathcal S }_m \times {\mathcal S }_m \rightarrow [0,\infty )\), \(N\in \mathbb {N}\), be a sequence of kernels on \({\mathcal S }_m\). We will specify later in which sense they should approximate \(\kappa _N\). Consider the inhomogeneous random graph \({\mathcal G }_N^{{{{({m}})}}}={\mathcal G }([N],\pi _m({{\textbf {x}}}),\frac{1}{N}\kappa _N^{{{{({m}})}}})\) and denote by \(({\mathcal C }^{{{({m}})}}_i)_i\) the collection of the vertex sets of its connected components.
Instead of choosing just one approximating kernel, we will consider a lower and an upper approximation for \(\kappa _N\), which will allow us to find upper and lower bounds for the distribution \(\mathbb {P}_N((\text {Mi}_N^{{{({m}})}},\text {Ma}_N^{{{({m}})}})\in \cdot )\). For fixed \(m\in \mathbb {N}\) let \(\kappa _N^{{{({m,-}})}}\) and \(\kappa _N^{{{({m,+}})}}\), \(N\in \mathbb {N}\), be two sequences of kernels on \({\mathcal S }_m\) satisfying
An obvious choice is given by
Defining \(\kappa ^{{{({m, *}})}} = \lim _{N\rightarrow \infty }\kappa _N^{{{({m, *}})}}\) for both \(*\in \{+,-\}\) it is obvious that
and we see that
For both \(*\in \{+,-\}\) we will denote by \(\mathbb {P}_N^{{{({m,*}})}}\) the probability measure corresponding to the graph \({\mathcal G }_N^{{{({m,*}})}}={\mathcal G }([N],\pi _m({{\textbf {x}}}),\frac{1}{N}\kappa _N^{{{({m,*}})}})\).
The following comparison lemma shows how we can estimate the distribution \(\mathbb {P}_N((\text {Mi}_N^{{{({m}})}},\text {Ma}_N^{{{({m}})}})\in \cdot )\) from below and above.
Lemma 5.4
Fix \(m\in \mathbb {N}\) and let \(\kappa _N^{{{({m,-}})}}\) and \(\kappa _N^{{{({m,+}})}}\), \(N\in \mathbb {N}\), be two sequences of kernels on \({\mathcal S }_m\) satisfying (5.12). Assume that \(N\in \mathbb {N}\) is large enough such that \(\frac{1}{N}\kappa _N^{{{({m,+}})}} \le 1\). Then, for any \(\ell \in {\mathcal M }_{\mathbb {N}_0}({\mathcal M }_{\mathbb {N}_0}({\mathcal S }_m))\),
where
Proof
As in the discrete setting of Sect. 3.1 we can identify elements from \({\mathcal M }_{\mathbb {N}_0}({\mathcal S }_m)\) with elements from \(\mathbb {N}_0^{{\mathcal S }_m}\) and identify \(\ell \) with \((\ell _k)_{k\in \mathbb {N}_0^{{\mathcal S }_m}}\). For \(*\in \{+,-\}\) and any \(k\in \mathbb {N}_0^{{\mathcal S }_m}\) we write
with \({{\textbf {x}}}_k\in {\mathcal S }_m^{|k|}\) any \(|k|\)-dimensional vector compatible with k. The main idea is that we identify in the exact formula (3.15) in Lemma 3.3 those terms that are increasing in \(\kappa \) and those that are decreasing in \(\kappa \). First, the connection probabilities are increasing in \(\kappa \), since the event of being connected is increasing in the edge parameter. Indeed, for any \(k\in \mathbb {N}_0^{{\mathcal S }_m}\) and \({{\textbf {x}}}\in {\mathcal S }^{|k|}\) such that \(\pi _m(\sum _{i=1}^{|k|}\delta _{x_i})=k\),
Furthermore, the powers of \((1-\frac{1}{N}\kappa (\cdot ))\) that describe the probabilities of not being connected are increasing in \(\kappa \) for negative powers, and decreasing for positive powers. With those observations and combinatorial factors we can estimate
The lower bound and its proof are analogous. \(\square \)
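The monotonicity of the connection probabilities used in this proof can be checked by brute force on a small vertex set. The following Python sketch is only an illustration and not part of the argument: it enumerates all edge configurations on a few vertices and computes the exact probability of connectedness when each edge \(\{i,j\}\) is present independently with probability \(\frac{1}{N}\kappa (x_i,x_j)\). The function names and the two toy kernels are assumptions made for this example.

```python
from itertools import combinations, product

def is_connected(n, edges):
    """Check connectivity of a graph on vertices 0..n-1 by depth-first search."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n

def connection_probability(types, kernel, N):
    """Exact probability that the graph on len(types) vertices is connected,
    when edge {i,j} is present independently with probability kernel(x_i, x_j)/N."""
    n = len(types)
    pairs = list(combinations(range(n), 2))
    total = 0.0
    # Sum the probabilities of all edge configurations that are connected.
    for present in product([False, True], repeat=len(pairs)):
        prob = 1.0
        for (i, j), on in zip(pairs, present):
            p = kernel(types[i], types[j]) / N
            prob *= p if on else 1.0 - p
        if is_connected(n, [e for e, on in zip(pairs, present) if on]):
            total += prob
    return total

# Two toy kernels with lower <= upper pointwise (hypothetical choices).
types = [0.2, 0.5, 0.9, 0.9]
lower = lambda x, y: x * y
upper = lambda x, y: x * y + 0.5
p_lower = connection_probability(types, lower, N=4)
p_upper = connection_probability(types, upper, N=4)
```

For these kernels one finds `p_lower <= p_upper`, in line with the fact that connectedness is an increasing event in the edge probabilities.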
5.2 Projective system
Using our discretization scheme from Sect. 5.1 we now introduce a projective system that fits into the framework of [19, Section 4.6]. Recall the notions introduced at the beginning of Sect. 5.1, in particular the family of finite partitions \((P_m)_{m\in \mathbb {N}}\) that has the properties formulated in Lemma 5.3 and the definition (5.5) of the discretized type spaces \({\mathcal S }_m\). For any pair \(m,n \in \mathbb {N}\) with \(m \le n \) we define
where \(x_{m,i}\) is the representative of \(A_{m,i}\) and \(A_{m,i}\) is the unique set in \(P_m\) containing \(x_{n,j}\). As before, we lift this definition to the measure spaces by defining (with abuse of notation)
as well as
It is easy to check that restricting the latter mapping to \({\mathcal L }_n\) and \({\mathcal A}_n\), respectively, gives us two well-defined mappings \(\pi _{m,n}:{\mathcal L }_n \rightarrow {\mathcal L }_m\) and \(\pi _{m,n}:{\mathcal A}_n \rightarrow {\mathcal A}_m\), respectively. Now, \(({\mathcal L }_m, \pi _{m,n})_{m\le n}\) (or \(({\mathcal A}_m, \pi _{m,n})_{m\le n}\)) is called a projective system, if

for any \(m\in \mathbb {N}\) the space \({\mathcal L }_m\) is a Hausdorff topological space,

for any \(m,n\in \mathbb {N}\) with \(m\le n\) the mapping \(\pi _{m,n}:{\mathcal L }_n \rightarrow {\mathcal L }_m\) is continuous,

for any \(m,n,p \in \mathbb {N}\) with \(m\le n\le p\) we have \(\pi _{m,p} = \pi _{m,n} \circ \pi _{n,p}\).
The projective limit of the projective system \(({\mathcal L }_m, \pi _{m,n})_{m\le n}\) is denoted by \(\varprojlim {\mathcal L }_m\) and is defined as the subset of the product space \(\prod _{m\in \mathbb {N}} {\mathcal L }_m\) that contains all elements \((\lambda _m)_{m\in \mathbb {N}}\) that satisfy \(\pi _{m,n}(\lambda _n) = \lambda _m\) for any pair \(m,n\in \mathbb {N}\) with \(m\le n\). The projective limit \(\varprojlim {\mathcal L }_m\) is equipped with the topology that is induced by the product topology on \(\prod _{m\in \mathbb {N}} {\mathcal L }_m\). We call this the projective limit topology. In particular, a sequence \(\lambda ^{{{({n}})}}\in \varprojlim {\mathcal L }_m\) converges to \(\lambda \in \varprojlim {\mathcal L }_m\) as \(n\rightarrow \infty \) if it holds for any \(m\in \mathbb {N}\) that \(\lambda ^{{{({n}})}}_m\) converges to \(\lambda _m\) as \(n\rightarrow \infty \).
Lemma 5.5
\(({\mathcal L }_m, \pi _{m,n})_{m\le n}\) and \(({\mathcal A}_m, \pi _{m,n})_{m\le n}\) are projective systems.
Proof
The fact that \({\mathcal L }_m\) and \({\mathcal A}_m\) are Hausdorff topological spaces is a consequence of the equivalence of their topologies with the discrete topologies that we described in Sect. 3.1. For \(m\le n\), the continuity of \(\pi _{m,n}\) is easily verified for the lowest level, i.e., on \({\mathcal S }_n \rightarrow {\mathcal S }_m\), and then lifted to the higher levels. Indeed, for each open set \({\mathcal O }\subset {\mathcal M }({\mathcal S }_m)\) (which can be identified as an open set in \(\mathbb {R}_{\ge 0}^{{\mathcal S }_m}{\setminus }\{0\}\)), we see that \(\pi ^{-1}_{m,n}({\mathcal O })\) is an open set in \({\mathcal M }({\mathcal S }_n)\). The same is true at the higher level: the preimages under \(\pi _{m,n}\) of open sets in \({\mathcal L }_m\) and \({\mathcal A}_m\) are open sets in \({\mathcal L }_n\) and \({\mathcal A}_n\). \(\square \)
Throughout this section we will denote \({\mathcal L }_\infty = \varprojlim {\mathcal L }_m\) and \({\mathcal A}_\infty = \varprojlim {\mathcal A}_m\). The aim of this section is to prove the following
Proposition 5.6
The following assertions hold.

1.
The set \({\mathcal L }\times {\mathcal A}\) can be identified with \({\mathcal L }_\infty \times {\mathcal A}_\infty \).

2.
The projective limit topology on \({\mathcal L }_\infty \times {\mathcal A}_\infty \) is equivalent to the vague topology on \({\mathcal L }\times {\mathcal A}\).
The proof will be a consequence of the following lemmas. In Lemma 5.8 we explain how to project from \({\mathcal L }\times {\mathcal A}\) to \({\mathcal L }_\infty \times {\mathcal A}_\infty \). As a direct consequence of Lemma 5.9 we get that this operation is continuous. Afterwards, we deal with the inverse operation. In Lemma 5.10 we show the existence of the inverse and in Lemma 5.12 we argue that it is continuous.
Here is a basic property of the mappings \(\pi _m\) and \(\pi _{m,n}\), which we need to prepare for Lemma 5.8.
Lemma 5.7
Let \(m\le n\). Then the equality \(\pi _m = \pi _{m,n}\circ \pi _n\) holds on all levels, i.e., for all mappings that were defined in (5.6)–(5.8) and (5.21)–(5.23).
Proof
On the lowest level, i.e., on \({\mathcal S }\rightarrow {\mathcal S }_m\), the equality is a direct consequence of the fact that \((P_m)_{m\in \mathbb {N}}\) is nested. On the higher levels, the equality follows by the definition of the image measure. \(\square \)
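The consistency relation \(\pi _m = \pi _{m,n}\circ \pi _n\) can be verified numerically in a toy model. The Python sketch below assumes \({\mathcal S }=[0,1]\), dyadic cells and midpoint representatives (all hypothetical choices, as are the function names), and checks the relation on the level of points and on the level of image measures of a finitely supported measure.

```python
from collections import Counter
from fractions import Fraction

def project(x, m):
    """Toy pi_m on [0,1]: dyadic cells [i/2^m, (i+1)/2^m), midpoint representatives."""
    i = min(int(x * 2 ** m), 2 ** m - 1)
    return Fraction(2 * i + 1, 2 ** (m + 1))

def project_between(x_n, m):
    """Toy pi_{m,n}: sends a level-n representative to the representative of the
    level-m cell containing it; in this model it is pi_m restricted to S_n."""
    return project(x_n, m)

def push_forward(measure, mapping):
    """Image measure of a finitely supported measure {point: mass} under mapping."""
    image = Counter()
    for point, mass in measure.items():
        image[mapping(point)] += mass
    return dict(image)

m, n = 2, 5
nu = {Fraction(1, 10): 2, Fraction(3, 10): 1, Fraction(7, 8): 3}

# Direct projection to level m.
direct = push_forward(nu, lambda x: project(x, m))
# Projection to level n first, then from level n down to level m.
via_n = push_forward(push_forward(nu, lambda x: project(x, n)),
                     lambda y: project_between(y, m))
```

Since the dyadic partitions are nested, `direct` and `via_n` coincide; the same pattern (apply `push_forward` once more) gives the relation one level higher.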
Lemma 5.8
The following holds.

1.
Let \(\lambda \in {\mathcal L }\). Then for any \(m\in \mathbb {N}\) we have that \(\pi _m(\lambda ) \in {\mathcal L }_m\). Further, we have that the sequence \((\pi _m(\lambda ))_{m\in \mathbb {N}}\) is an element of the projective limit \({\mathcal L }_\infty \).

2.
Let \(\alpha \in {\mathcal A}\). Then for any \(m\in \mathbb {N}\) we have that \(\pi _m(\alpha ) \in {\mathcal A}_m\). Further, we have that the sequence \((\pi _m(\alpha ))_{m\in \mathbb {N}}\) is an element of the projective limit \({\mathcal A}_\infty \).
Consequently, the mapping \(\varPi :{\mathcal L }\times {\mathcal A}\rightarrow {\mathcal L }_\infty \times {\mathcal A}_\infty \), \((\lambda ,\alpha ) \mapsto \big ((\pi _m(\lambda ))_{m\in \mathbb {N}}, (\pi _m(\alpha ))_{m\in \mathbb {N}}\big )\) is well-defined.
Proof
(1) For any \(\lambda \in {\mathcal L }\) and \(m\in \mathbb {N}\) we have that \(c_{\pi _m(\lambda )} = \pi _m(c_\lambda ) \le \pi _m({\tilde{\mu }})\), where \({\tilde{\mu }}\in \{\mu \}\cup \{\mu _N:N\in \mathbb {N}\}\), hence \(\pi _m(\lambda )\in {\mathcal L }_m\). As a direct consequence of Lemma 5.7 the sequence \((\pi _m(\lambda ))_{m\in \mathbb {N}}\) satisfies the consistency condition and is therefore an element of \({\mathcal L }_\infty \).
(2) For \(\alpha \in {\mathcal A}\) the proof is analogous. \(\square \)
Lemma 5.9
Fix any \(m\in \mathbb {N}\). The mappings \(\pi _m:{\mathcal L }\rightarrow {\mathcal L }_m\) and \(\pi _m:{\mathcal A}\rightarrow {\mathcal A}_m\) are continuous w.r.t. the vague topologies. Consequently, the mapping \(\varPi \) defined as in Lemma 5.8 is continuous.
Proof
On the lowest level, i.e., for \(\pi _m:{\mathcal S }\rightarrow {\mathcal S }_m\), it is obvious that \(\pi _m\) is continuous on the set \({\mathcal S }{\setminus } \bigcup _{i=1}^{|P_m|}\partial A_{m,i}\). Consider the second level, i.e., \(\pi _m:{\mathcal M }({\mathcal S }) \rightarrow {\mathcal M }({\mathcal S }_m)\). We claim that \(\pi _m\) is continuous on
w.r.t. weak convergence: Given some \(\nu \in {\mathcal N }_0\) and a sequence \((\nu _n)_{n\in \mathbb {N}}\) in \({\mathcal M }({\mathcal S })\) such that \(\nu _n\rightarrow \nu \) weakly as \(n\rightarrow \infty \) and some continuous bounded function \(f:{\mathcal S }_m \rightarrow \mathbb {R}\) we clearly have that
as \(n\rightarrow \infty \), since \(f\circ \pi _m\) is continuous \(\nu \)-almost everywhere. Hence, \(\pi _m(\nu _n) \rightarrow \pi _m(\nu )\) weakly as \(n\rightarrow \infty \).
Now, consider \(\pi _m:{\mathcal L }\rightarrow {\mathcal L }_m\). For \(\lambda \in {\mathcal L }\) we have that \(c_\lambda \le {\tilde{\mu }}\), where \({\tilde{\mu }}\in \{\mu \}\cup \{\mu _N:N\in \mathbb {N}\}\). This implies that for any \(i=1, \dots , |P_m|\) we have that
by Lemma 5.3. In other words, \(\lambda \) is concentrated on a subset of the set \({\mathcal N }_0\) given in (5.24). Now, let \((\lambda _n)_{n\in \mathbb {N}}\) be a sequence in \({\mathcal L }\) that converges vaguely to \(\lambda \) as \(n\rightarrow \infty \). Then for any function \(g:{\mathcal M }_{\mathbb {N}_0}({\mathcal S }_m)\rightarrow \mathbb {R}\) that is continuous and compactly supported, we have that
as \(n\rightarrow \infty \), since \(g\circ \pi _m\) is compactly supported and continuous \(\lambda \)-almost everywhere. Hence, \(\pi _m(\lambda _n) \rightarrow \pi _m(\lambda )\) vaguely as \(n\rightarrow \infty \).
The proof for \(\pi _m:{\mathcal A}\rightarrow {\mathcal A}_m\) is analogous. \(\square \)
In the next lemmas we will deal with the construction of the inverse of the projection mapping \(\varPi \) and verify its continuity. This requires, for any \(m\in \mathbb {N}\), to identify measures \(\lambda _m \in {\mathcal L }_m\) with measures \({\bar{\lambda }}_m\in {\mathcal L }\). To prepare for this identification we now define for any \(m\in \mathbb {N}\)
where for any measure \(\nu \) on some measure space \({\mathcal X }\) and any measurable \({\mathcal U }\subset {\mathcal X }\) we say that \(\nu \) is concentrated on \({\mathcal U }\) if \(\nu ({\mathcal X }{\setminus } {\mathcal U }) = 0\). It is clear that \({\mathcal M }_m({\mathcal S })\) can be identified with \({\mathcal M }({\mathcal S }_m)\). Observe that this is possible because \({\mathcal S }_m\subset {\mathcal S }\), as we have defined it via the representatives of each partition. For any \(\nu _m\in {\mathcal M }({\mathcal S }_m)\) we will denote the corresponding element by \({\bar{\nu }}_m \in {\mathcal M }_m({\mathcal S })\) and we will write \({\bar{\pi }}_m:{\mathcal M }({\mathcal S }) \rightarrow {\mathcal M }_m({\mathcal S })\) for the mapping that we obtain by concatenating \(\pi _m:{\mathcal M }({\mathcal S }) \rightarrow {\mathcal M }({\mathcal S }_m)\) as defined in (5.7) with the operation \(\nu _m\mapsto {\bar{\nu }}_m\). Then we can identify the space \({\mathcal L }_m\) that was defined in (5.10) with
For any \(\lambda _m \in {\mathcal L }_m\) we will denote the corresponding element by \({\bar{\lambda }}_m\in {\bar{{\mathcal L }}}_m\) and we will write \({\bar{\pi }}_m :{\mathcal L }\rightarrow {\bar{{\mathcal L }}}_m\) for the mapping that we obtain by concatenating \(\pi _m:{\mathcal L }\rightarrow {\mathcal L }_m\) with the operation \(\lambda _m \mapsto {\bar{\lambda }}_m\). In the same way, we identify the space \({\mathcal A}_m\) that was defined in (5.11) with
Now we construct the inverse of the projection mapping \(\varPi \) that was defined in Lemma 5.8.
Lemma 5.10
The following assertions hold.

1.
Let \((\lambda _m)_{m\in \mathbb {N}}\in {\mathcal L }_\infty \). Then there exists a unique \(\lambda \in {\mathcal L }\) such that \(\lambda _m = \pi _m(\lambda )\) holds for all \(m\in \mathbb {N}\).

2.
Let \((\alpha _m)_{m\in \mathbb {N}}\in {\mathcal A}_\infty \). Then there exists a unique \(\alpha \in {\mathcal A}\) such that \(\alpha _m = \pi _m(\alpha )\) holds for all \(m\in \mathbb {N}\).
Consequently, the mapping \(\varPi \) defined in Lemma 5.8 is bijective with inverse \(\varPi ^{-1}\).
Proof
Fix \((\lambda _m)_{m\in \mathbb {N}}\in {\mathcal L }_\infty \). The idea is to identify for any \(m\in \mathbb {N}\) the measure \(\lambda _m \in {\mathcal L }_m\) uniquely with the element \({\bar{\lambda }}_m\in {\bar{{\mathcal L }}}_m\) and to prove that the sequence \(({\bar{\lambda }}_m)_{m\in \mathbb {N}}\) has a vague limit point in \({\mathcal L }\), which we will denote by \(\lambda \). It then remains to show that \(\pi _m(\lambda ) = \lambda _m\) holds for any \(m\in \mathbb {N}\).
Next, we will argue for the existence of a vague limit point of \(({\bar{\lambda }}_m)_{m\in \mathbb {N}}\). As an element of \({\mathcal L }_\infty \) the sequence \((\lambda _m)_{m\in \mathbb {N}}\) satisfies the consistency condition \(\lambda _m = \pi _{m,n}(\lambda _n)\) for any \(m\le n\). Abbreviating \({\mathcal M }_{\mathbb {N}_0}:= {\mathcal M }_{\mathbb {N}_0}({\mathcal S }) {\setminus }\{0\}\) and \({\mathcal M }_{\mathbb {N}_0,m}:= {\mathcal M }_{\mathbb {N}_0}({\mathcal S }_m) {\setminus }\{0\}\), we get that
and consequently, \({\bar{\lambda }}_m\), \(m\in \mathbb {N}\), is of constant total variation. Note that the measures \({\bar{\lambda }}_m\), \(m\in \mathbb {N}\), are in the dual of \(C_0({\mathcal M }_{\mathbb {N}_0})\) and that, due to the Banach–Alaoglu theorem, they form a relatively compact set w.r.t. the \(\hbox {weak}^*\)-topology, which implies relative compactness w.r.t. the vague topology on \({\mathcal L }\). Hence, there exists a vague limit point \(\lambda \in {\mathcal M }({\mathcal M }_{\mathbb {N}_0})\) and a subsequence \(({\bar{\lambda }}_{m_i})_{i\in \mathbb {N}}\) in \({\mathcal L }\) converging vaguely to \(\lambda \). Since \({\mathcal L }\) is compact by Lemma 5.2, we also have that \(\lambda \in {\mathcal L }\).
Next, we fix \(m\in \mathbb {N}\) and our goal is to show that \(\lambda _m= \pi _m(\lambda )\). Observe that as a consequence of our identification between \(\lambda _n\) and \({\bar{\lambda }}_n\) we have that \(\pi _n({\bar{\lambda }}_n) = \lambda _n\) for any \(n\in \mathbb {N}\). Together with the consistency and Lemma 5.7 we get for \(n\ge m\) that
Choosing a subsequence \(({\bar{\lambda }}_{n_i})_{i\in \mathbb {N}}\) that converges vaguely to \(\lambda \), we get that \( \lambda _m = \lim _{i\rightarrow \infty } \pi _m({\bar{\lambda }}_{n_i}) = \pi _m(\lambda ) \) where we used the continuity of the mapping \(\pi _m\) that we showed in Lemma 5.9.
For a given \((\alpha _m)_{m\in \mathbb {N}} \in {\mathcal A}_\infty \) the proof is analogous. \(\square \)
To prepare for the proof of the continuity of \(\varPi ^{-1}\) we need the following lemma.
Lemma 5.11
On bounded subsets of \({\mathcal M }({\mathcal S })\), the mapping \({\mathcal M }({\mathcal S }) \rightarrow {\mathcal M }({\mathcal S })\), \(\nu \mapsto {\bar{\pi }}_m(\nu )\), converges uniformly to the identity as \(m\rightarrow \infty \).
Proof
Recall that we equip \({\mathcal M }({\mathcal S })\) with the weak topology, which is generated by all the test integrals against continuous bounded functions \({\mathcal S }\rightarrow \mathbb {R}\). The weak topology on \({\mathcal M }({\mathcal S })\) admits a number of metrisations, especially since \({\mathcal S }\) is compact. We introduce the dual bounded-Lipschitz distance given by
where \({\left\Vert {\phi }\right\Vert }_\text {BL}= {\left\Vert {\phi }\right\Vert }_\infty + \text {Lip}(\phi )\) and \(\text {Lip}(\phi )\) is the infimum of all Lipschitz constants of \(\phi :{\mathcal S }\rightarrow \mathbb {R}\). Let \(\phi :{\mathcal S }\rightarrow \mathbb {R}\) satisfy \({\left\Vert {\phi }\right\Vert }_\text {BL} \le 1\). Then
where \(x_{m,i}\) is the representative of \(A_{m,i}\) as defined right before definition (5.5). Now, if \({\mathcal N }\subset {\mathcal M }({\mathcal S })\) is bounded, i.e., \(\sup _{\nu \in {\mathcal N }} \nu ({\mathcal S }) < \infty \), then \(d_\text {BL}(\nu , {\bar{\pi }}_m(\nu ))\) vanishes as \(m\rightarrow \infty \) uniformly on \({\mathcal N }\) by assumption (5.4). \(\square \)
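The estimate behind this proof can be illustrated numerically. The Python sketch below assumes the toy setting \({\mathcal S }=[0,1]\) with dyadic cells and midpoint representatives, and it evaluates the error \(|\int \phi \,\textrm{d}\nu - \int \phi \,\textrm{d}{\bar{\pi }}_m(\nu )|\) for one fixed test function with \({\left\Vert {\phi }\right\Vert }_\text {BL}\le 1\) rather than the supremum over all such functions. The measure, the test function and all names are hypothetical choices for illustration.

```python
import math

def project(x, m):
    """Midpoint representative of the dyadic cell of width 2**-m containing x."""
    i = min(int(x * 2 ** m), 2 ** m - 1)
    return (2 * i + 1) / 2 ** (m + 1)

def integral(phi, measure):
    """Integral of phi against a finitely supported measure {point: mass}."""
    return sum(mass * phi(x) for x, mass in measure.items())

def projection_error(phi, measure, m):
    """|int phi d(nu) - int phi d(projected nu)|, where the second integral
    equals the integral of phi(pi_m(x)) against nu."""
    projected = sum(mass * phi(project(x, m)) for x, mass in measure.items())
    return abs(integral(phi, measure) - projected)

# Test function with sup norm <= 1/2 and Lipschitz constant <= 1/2, so ||phi||_BL <= 1.
phi = lambda x: 0.5 * math.sin(x)
nu = {0.05: 1.5, 0.31: 0.7, 0.62: 2.0, 0.99: 0.3}
total_mass = sum(nu.values())

levels = range(1, 9)
errors = [projection_error(phi, nu, m) for m in levels]
# Each point moves by at most the cell diameter 2**-m, which gives the bound below.
bounds = [total_mass * 2 ** -m for m in levels]
```

Each entry of `errors` is dominated by the corresponding entry of `bounds`, that is, by the total mass times the maximal cell diameter, which is the quantity that vanishes uniformly on bounded sets by (5.4).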
In the next lemma we verify that the mapping \(\varPi ^{-1}\) constructed in Lemma 5.10 is continuous.
Lemma 5.12
The following assertions hold.

1.
Let \(\lambda ^{{{({n}})}}\), \(n\in \mathbb {N}\), be a sequence in \({\mathcal L }\). Assume for all \(m\in \mathbb {N}\) that \(\pi _m(\lambda ^{{{({n}})}})\) converges to \(\pi _m(\lambda )\) vaguely in \({\mathcal L }_m\) as \(n\rightarrow \infty \). Then \(\lambda ^{{{({n}})}}\) converges vaguely to \(\lambda \).

2.
Let \(\alpha ^{{{({n}})}}\), \(n\in \mathbb {N}\), be a sequence in \({\mathcal A}\). Assume for all \(m\in \mathbb {N}\) that \(\pi _m(\alpha ^{{{({n}})}})\) converges to \(\pi _m(\alpha )\) vaguely in \({\mathcal A}_m\) as \(n\rightarrow \infty \). Then \(\alpha ^{{{({n}})}}\) converges vaguely to \(\alpha \).
Consequently, the mapping \(\varPi ^{-1}\) constructed in Lemma 5.10 is continuous.
Proof
(1) Let \(\lambda ^{{{({n}})}}\), \(n\in \mathbb {N}\), be a sequence in \({\mathcal L }\), such that for all \(m\in \mathbb {N}\) it holds that \(\pi _m(\lambda ^{{{({n}})}})\) converges to \(\pi _m(\lambda )\) vaguely in \({\mathcal L }_m\) as \(n\rightarrow \infty \). Abbreviate \({\mathcal M }_{\mathbb {N}_0}:= {\mathcal M }_{\mathbb {N}_0}({\mathcal S }) {\setminus }\{0\}\) and fix a continuous and compactly supported function \(g:{\mathcal M }_{\mathbb {N}_0}\rightarrow \mathbb {R}\). Recall the identification of \({\mathcal L }_m\) and \({\bar{{\mathcal L }}}_m\) introduced before Lemma 5.10. Clearly, we have for any \(m\in \mathbb {N}\) that
Note that \(\lambda ^{{{({n}})}}\in {\mathcal L }\) implies that \(|\lambda ^{{{({n}})}}| \le c_{\lambda ^{{{({n}})}}}({\mathcal S }) \le \sup _{N\in \mathbb {N}}\mu _N({\mathcal S }) \vee \mu ({\mathcal S }) = 1\) for all \(n\in \mathbb {N}\), and in the same way \(|\lambda | \le 1\). Further, the support of the function g is compact, and thus bounded. So by Lemma 5.11 we have that the mappings \({\bar{\pi }}_m\) restricted to the support of g converge uniformly to the identity, as \(m\rightarrow \infty \). Hence, we can first choose \(m\in \mathbb {N}\) sufficiently large such that \(\big (|\lambda ^{{{({n}})}}|+ |\lambda |\big ) {\left\Vert {g-g\circ {\bar{\pi }}_m}\right\Vert }_{\infty }\) is arbitrarily small, uniformly in n. Then we can use that \({\bar{\pi }}_m(\lambda ^{{{({n}})}}) \rightarrow {\bar{\pi }}_m(\lambda )\), as \(n\rightarrow \infty \), holds by assumption, so the second summand on the right-hand side of (5.31) vanishes as \(n\rightarrow \infty \).
(2) Let \(\alpha ^{{{({n}})}}\), \(n\in \mathbb {N}\), be a sequence in \({\mathcal A}\), such that for all \(m\in \mathbb {N}\) it holds that \(\pi _m(\alpha ^{{{({n}})}})\) converges to \(\pi _m(\alpha )\) vaguely in \({\mathcal A}_m\) as \(n\rightarrow \infty \). Recall that for all \(n\in \mathbb {N}\) the measures \(\alpha ^{{{({n}})}}\) are concentrated on \({\mathcal M }_{\le 1}({\mathcal S })\), i.e., on subprobability measures on \({\mathcal S }\), since for any \(y\in \text {supp}(\alpha ^{{{({n}})}})\), we have that \(y({\mathcal S }) \le c_{\alpha ^{{{({n}})}}}({\mathcal S }) \le \sup _{N\in \mathbb {N}}\mu _N({\mathcal S }) \vee \mu ({\mathcal S }) = 1\). The same holds for \(\alpha \). Hence, without loss of generality we can show vague convergence by considering any continuous compactly supported test function \(g:{\mathcal M }_{\le 1}({\mathcal S }){\setminus }\{0\} \rightarrow \mathbb {R}\). By Lemma 5.1 there exists \(\varepsilon >0\) such that the support of g is contained in \(N_\varepsilon :=\{\nu \in {\mathcal M }_{\le 1}({\mathcal S }){\setminus }\{0\}:\nu ({\mathcal S }) \ge \varepsilon \}\). Analogously to (5.31) we get
To bound the first summand observe the following: Since \(\alpha ^{{{({n}})}}\in {\mathcal A}\) one has that \(\varepsilon \alpha ^{{{({n}})}}(N_\varepsilon ) \le c_{\alpha ^{{{({n}})}}}({\mathcal S }) \le 1\) and hence \(\alpha ^{{{({n}})}}(N_\varepsilon ) \le 1/\varepsilon \). The same holds for \(\alpha \) and hence \(\alpha ^{{{({n}})}}(N_\varepsilon )+ \alpha (N_\varepsilon )\le 2/\varepsilon \). Also, by Lemma 5.11 the supremum in (5.32) vanishes, as \(m\rightarrow \infty \). So, by first choosing m large enough and then using that \({\bar{\pi }}_m(\alpha ^{{{({n}})}}) \rightarrow {\bar{\pi }}_m(\alpha )\), as \(n\rightarrow \infty \), the right-hand side of (5.32) vanishes. \(\square \)
5.3 Dawson–Gärtner and identification of the rate function
As an easy consequence of Lemma 5.4 and the LDP of Theorem 3.1, we obtain upper and lower LDP bounds for \((\text {Mi}_N^{{{{({m}})}}},\text {Ma}_N^{{{{({m}})}}})\) with different rate functions. In order to formulate this (in particular, to identify the rate functions) we need to introduce additional notation.
For dealing with the microscopic clusters, we need the discretized version of the connection parameter \(\tau \) defined in (1.9), which now has to be understood with respect to the discretized kernels. For \(*\in \{+,-\}\) and \(k\in {\mathcal M }({\mathcal S }_m){\setminus } \{0\}\) we write
where \((r_i)_{i=1, \ldots , |k|} \in {\mathcal S }_m^{|k|}\) is such that \(k = \sum _{i=1}^{|k|} \delta _{r_i} \in {\mathcal M }_{\mathbb {N}_0}({\mathcal S }_m)\) and \({\mathcal T }(k)\) is the set of spanning trees on \([|k|]\) and we recall that \(|k|= k({\mathcal S }_m)\). Further, we define for \(*\in \{+,-\}\)
where we recall that \(\mathbb {Q}_{\mu ^{{{({m}})}}}\) is the distribution of a Poisson point process on \({\mathcal S }_m\) with intensity measure \(\mu ^{{{({m}})}}\) and that \(\mathbb {H}\) denotes relative entropy between nonnormalized measures, defined as in (1.8). Again, we adopt the convention that \(I^{{{({m, *}})}}_\text {Me}(\nu ) =\infty \) if \(\frac{\textrm{d}\nu }{(\kappa ^{{{({m,*}})}}\nu ) \,\textrm{d}\mu ^{{{({m}})}}}\) does not exist. We have to be more careful in the definition of the macroscopic rate. For \(\alpha \in {\mathcal A}_m\) we define
where again we define \(I_\text {Ma}(\alpha )=\infty \), if it is not true that \(\alpha \)-almost everywhere the density in the \(\log \)-term exists. The definition of \({\hat{I}}_\text {Ma}\) also takes into account the possibility that the discretized kernel might not be irreducible. In particular, even if \(\kappa \) is irreducible, \(\kappa ^{{{({m,-}})}}\) need not be irreducible. In that case we have to additionally assume that \(\alpha \) is connectable with respect to \(\kappa ^{{{({m,*}})}}\) to get a finite rate, as it is formulated in the generalized version of Theorem 3.1, i.e., Theorem 3.2. We will comment on this in detail below.
When estimating the distribution of the pair \((\text {Mi}^{{{({m}})}},\text {Ma}^{{{({m}})}})\) under the measure \(\mathbb {P}_N\) by Lemma 5.4 we will get an additional error term. To deal with that we define for \((\lambda , \alpha ) \in {\mathcal L }_m\times {\mathcal A}_m\)
Corollary 5.13
(LDP bounds for \((\text {Mi}_N^{{{{({m}})}}},\text {Ma}_N^{{{{({m}})}}})\) under \({\mathcal G }_N\)) Assume that \(\mu _N\) converges to \(\mu \) as \(N\rightarrow \infty \). Let \(\kappa _N\) converge to a continuous kernel \(\kappa \) that is irreducible w.r.t. \(\mu \). Fix \(m\in \mathbb {N}\) and let \(\kappa _N^{{{({m,-}})}}\) and \(\kappa _N^{{{({m,+}})}}\), \(N\in \mathbb {N}\), be two sequences of kernels on \({\mathcal S }_m\) satisfying (5.12). Then the distribution of \((\text {Mi}_N^{{{{({m}})}}},\text {Ma}_N^{{{{({m}})}}})\) under \(\mathbb {P}_N\) satisfies, as \(N\rightarrow \infty \), the upper large-deviations bound with rate function \(I^{{{{({m,+}})}}}\) and the lower large-deviations bound with rate function \(I^{{{{({m,-}})}}}\).
Proof
It is easy to verify that
where \(\varDelta _N^{{{({m}})}}\) is defined as in (5.18). Note that irreducibility of \(\kappa \) implies irreducibility of \(\kappa ^{{{({m,+}})}}\). Hence, for the upper large-deviations bound we can apply Theorem 3.1 after using the upper bound from Lemma 5.4. For the lower bound we use the lower bound from Lemma 5.4 and the lower bound from Theorem 3.2 that also applies if \(\kappa ^{{{({m,-}})}}\) is reducible. \(\square \)
Of course, the basic idea is to use Corollary 5.13 for very large m and the hope is that the discretized rate functions approximate the rate function I given in Theorem 1.1. However, the problem is that the lower bound can be arbitrarily bad due to the following issue. Observe that although \(\kappa \) is assumed to be irreducible with respect to \(\mu \), as a consequence of the definition of the lower approximation \(\kappa ^{{{({m,-}})}}\) given in (5.13) we have to deal with the possibility that \(\kappa ^{{{({m,-}})}}\) is not irreducible with respect to \(\mu \circ \pi _m^{-1}\). This might even be the case for all \(m\in \mathbb {N}\). To illustrate this, we give a brief example.
Example 5.14
Choose \({\mathcal S }= [0,1]\), let \(\mu \) be the Lebesgue measure and \(\kappa _N(x,x^\prime )=\kappa (x,x^\prime ) = x x^\prime \) for \(x, x^\prime \in {\mathcal S }\) and all \(N\in \mathbb {N}\). Let \(\{P_m\}_{m\in \mathbb {N}}\) be any nested partition for \({\mathcal S }\) and let \({\mathcal S }_m\), \(m\in \mathbb {N}\), be any choice for the sets of representative points. Then for any \(m\in \mathbb {N}\) there exists some set \(A_m\in P_m\) such that \(0\in \partial A_m\) and \(\mu (A_m)>0\). Let \(x_m\) be the representative of \(A_m\). It can be directly verified that for any \(x^\prime _m\in {\mathcal S }_m\) we have that \(\kappa ^{{{({m,-}})}}(x_m,x^\prime _m) = 0\), although \(\mu ^{{{({m}})}}(\{x_m\}) = \mu (A_m) > 0\). On the other hand, assumption (5.4) implies that \(\mu (A_m) \rightarrow 0\) as \(m\rightarrow \infty \) so there exists \(m_0\) such that for \(m\ge m_0\) we also have that \(\mu ^{{{({m}})}}({\mathcal S }_m{\setminus }\{x_m\})>0\) and hence \(\kappa ^{{{({m,-}})}}\) is reducible with respect to \(\mu ^{{{({m}})}}\) for all \(m\ge m_0\). \(\Diamond \)
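The phenomenon in this example can be made explicit for dyadic partitions. The Python sketch below works under the assumption that the lower approximation of \(\kappa (x,x^\prime ) = xx^\prime \) is obtained as the infimum of \(\kappa \) over pairs of closed cells; this reading of the lower kernel, the dyadic cells and all function names are assumptions made for illustration only.

```python
from fractions import Fraction

def lower_kernel(i, j, m):
    """Infimum of kappa(x, y) = x*y over the closed dyadic cells with indices i
    and j at level m. Since kappa is nondecreasing in both arguments on [0,1],
    the infimum is attained at the left endpoints i/2^m and j/2^m."""
    return Fraction(i, 2 ** m) * Fraction(j, 2 ** m)

def cell_mass(i, m):
    """Lebesgue mass of the i-th dyadic cell at level m."""
    return Fraction(1, 2 ** m)

def isolated_cells(m):
    """Cells of positive mass whose lower kernel vanishes against every cell."""
    n = 2 ** m
    return [i for i in range(n)
            if cell_mass(i, m) > 0
            and all(lower_kernel(i, j, m) == 0 for j in range(n))]

# The cell touching 0 is isolated at every level, although its mass is positive.
isolated_by_level = {m: isolated_cells(m) for m in range(1, 7)}
```

At every level the cell adjacent to the origin is the only isolated one, so for \(m\ge 1\) the discretized lower kernel is reducible in this toy model, while the mass of that cell, \(2^{-m}\), tends to zero.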
Regarding the approximation of the rate function I by its discrete version the problem of (missing) irreducibility enters our analysis only via the function \({\hat{I}}_\text {Ma}^{{{({m,-}})}}\). Indeed, for \({\hat{I}}_\text {Ma}^{{{({m,+}})}}\) the additional case distinction given in (5.37) is not necessary, since \(\kappa ^{{{({m,+}})}}\) is always irreducible and hence every \(\alpha \) is connectable with respect to \(\kappa ^{{{({m,+}})}}\). In order to make the lower approximation work, we formulate additional assumptions on \(\alpha \). Since they are not satisfied for every \(\alpha \), we have to find a way to deal with the remaining cases.
Lemma 5.15
(Identification of the rate function) Let \(\kappa :{\mathcal S }\times {\mathcal S }\rightarrow [0,\infty )\) be continuous and irreducible with respect to \(\mu \). Assume that \(\kappa ^{{{{({m,+}})}}},\kappa ^{{{{({m,-}})}}}:{\mathcal S }_m\times {\mathcal S }_m\rightarrow [0,\infty )\) are given such that \(\kappa ^{{{{({m,-}})}}}\le \kappa ^{{{{({m,+}})}}}\) and
monotonically decreasing for \(*=+\) and monotonically increasing for \(*=-\). We abbreviate \( \varPi _m(\lambda , \alpha ):= (\pi _m(\lambda ), \pi _m(\alpha ))\) for \(m\in \mathbb {N}\), \((\lambda , \alpha ) \in {\mathcal L }\times {\mathcal A}\). Then the rate function I introduced in Theorem 1.1 satisfies the following.

1.
For any \((\lambda ,\alpha )\in {\mathcal L }\times {\mathcal A}\) it holds that \(I^{{{{({m,+}})}}}(\varPi _m(\lambda ,\alpha ))\nearrow I(\lambda ,\alpha )\) as \(m\rightarrow \infty \).

2.
Let \((\lambda ,\alpha )\in {\mathcal L }\times {\mathcal A}\) and assume that \(\alpha \) satisfies the following: there exists some \(m_0\in \mathbb {N}\) such that for all \(m\ge m_0\) the measures \(\pi _m(\alpha )\) are connectable with respect to \(\kappa ^{{{({m,-}})}}\). Then \(I^{{{{({m,-}})}}}(\varPi _m(\lambda ,\alpha ))\rightarrow I(\lambda ,\alpha )\) as \(m\rightarrow \infty \).
Proof
Fix \(\lambda \in {\mathcal L }\) and \(\alpha \in {\mathcal A}\). Also, we will denote \(\lambda _m = \pi _m(\lambda )\) and \(\alpha _m = \pi _m(\alpha )\).
We first assume that \(c_\lambda +c_\alpha \le \mu \). It is straightforward to show that this implies that \(c_{\lambda _m}+ c_{\alpha _m} \le \mu ^{{{({m}})}}\) for all \(m\in \mathbb {N}\). We denote \(\nu _m = \mu ^{{{({m}})}} - c_{\lambda _m} - c_{\alpha _m}\).
We will see in the proof that in each of the terms that we handle, the main part is an entropy between two image measures under \(\pi _m\) plus a perturbation and other terms that will turn out to converge monotonically as \(m\rightarrow \infty \). Then we will use [22, Prop. 15.6], which says that the named entropy converges, as \(m\rightarrow \infty \), to its supremum over m.
For this, we will handle each of the four terms in (5.34)–(5.36) separately.
Step 1: term \(C^{{{({*}})}}_m\). It is easy to deduce that \(\lim _{m\rightarrow \infty }C^{{{({+}})}}_m=\frac{1}{2}\langle \mu ,\kappa \mu \rangle =\lim _{m\rightarrow \infty }C^{{{({-}})}}_m\) and that \(C^{{{({*}})}}_{m}\) is decreasing in m for \(*=+\) and increasing for \(*=-\). Note that \(I^{{{{({m,+}})}}}\) is defined with \(C^{{{({-}})}}_{m}\), so it has the right direction of monotonicity.
Step 2: term \({\hat{I}}^{{{({m,*}})}}_{\text {Mi}}(\lambda _m)\). Let us turn to the part \({\hat{I}}^{{{({m,*}})}}_{\text {Mi}}(\lambda _m)\). Recall that \(\lambda _m=\lambda \circ \pi _m^{-1}\).
Since \(\lambda _m=\lambda \circ \pi _m^{-1}\) and \(\mathbb Q_{\mu ^{{{({m}})}}}={\mathbb {Q}}_{\mu }\circ \pi _m^{-1}\) by the mapping theorem for Poisson point processes, [22, Prop. 15.6] implies that \(\mathbb {H}(\lambda _m|{\mathbb {Q}}_{\mu ^{{{({m}})}}})\) converges towards \(\mathbb {H}(\lambda |{\mathbb {Q}}_\mu )\), and \(\mathbb {H}(\lambda _m|\mathbb Q_{\mu ^{{{({m}})}}})\) is increasing in m. According to our assumption in (5.16), using the monotone convergence theorem, we see that the term \(\langle \lambda _m, \log \tau _m^{{{({*}})}}\rangle \) converges towards \(\langle \lambda ,\log \tau \rangle \) (also if \(\langle \lambda ,\log \tau \rangle = \infty \)). For \(*=+\) the convergence is from below, as desired. It is easy to verify that \(|c_{\lambda _m}|-|\lambda _m|= |c_{\lambda }|-|\lambda |\) holds for all m. Hence, we have shown that \({\hat{I}}_\text {Mi}^{{{{({m,+}})}}}(\pi _m(\lambda ))\nearrow {\hat{I}}_\text {Mi}(\lambda )\) and \({\hat{I}}_\text {Mi}^{{{{({m,-}})}}}(\pi _m(\lambda ))\rightarrow {\hat{I}}_\text {Mi}(\lambda )\) as \(m\rightarrow \infty \).
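The monotonicity of the entropy of image measures along a nested partition can be illustrated on a finite toy model. The following is a minimal sketch, assuming nested dyadic partitions of a finite set and the entropy \(\langle p, \log (p/q)\rangle + |q| - |p|\) for finite measures; it does not reproduce [22, Prop. 15.6], and the helper names `relative_entropy` and `coarsen` are hypothetical.

```python
import numpy as np

def relative_entropy(p, q):
    """H(p | q) = <p, log(p/q)> + |q| - |p| for finite measures with p << q."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])) + q.sum() - p.sum())

def coarsen(v, m, finest):
    """Image of a measure on 2**finest cells under the projection onto
    2**m cells: each coarse cell collects 2**(finest - m) neighbouring cells."""
    return v.reshape(2 ** m, 2 ** (finest - m)).sum(axis=1)

rng = np.random.default_rng(0)
finest = 8
q = rng.random(2 ** finest) + 0.1   # reference measure with positive weights
p = rng.random(2 ** finest) * q     # p is absolutely continuous w.r.t. q

# Entropy of the image measures, from the coarsest to the finest partition.
entropies = [relative_entropy(coarsen(p, m, finest), coarsen(q, m, finest))
             for m in range(finest + 1)]
print(entropies)
```

By the log-sum inequality the printed sequence is nondecreasing in m, and its last entry is the entropy on the finest level, which plays the role of the supremum in this finite setting.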
Step 3: term \({\hat{I}}^{{{({m,*}})}}_\text {Me}(\nu _m)\). Now we turn to the mesoscopic term \({\hat{I}}^{{{({m,*}})}}_\text {Me}(\nu _m)\). Note that for \(\nu = \mu - c_\lambda - c_\alpha \) the definition of the image measure implies that \(\nu _m=\nu \circ \pi _m^{-1}\). Further, we have that
Hence, we may apply [22, Prop. 15.6] to the first term and see that \(\langle \nu _m, \log \frac{\textrm{d}\nu _m}{\textrm{d}\mu ^{{{({m}})}}}\rangle \) converges as \(m\rightarrow \infty \) to its supremum over m, and for the second term we use (5.16) to see that \(\lim _{m\rightarrow \infty }\langle \nu _m,\log (\kappa ^{{{{({m,*}})}}}\nu _m)\rangle =\langle \nu ,\log (\kappa \nu )\rangle \) holds by the monotone convergence theorem (also if \(\langle \nu , \log (\kappa \nu )\rangle = \infty \)). For \(*=+\) the sequence is increasing, as desired.
Step 4: term \({\hat{I}}^{{{({m,*}})}}_\text {Ma}(\alpha _m)\). Finally, we turn to the macroscopic term \({\hat{I}}^{{{({m,*}})}}_\text {Ma}(\alpha _m)\). Note that irreducibility of \(\kappa \) with respect to \(\mu \) implies irreducibility of \(\kappa ^{{{({m,+}})}}\) with respect to \(\mu ^{{{({m}})}}\). Hence the measures \(\pi _m(\alpha )\), \(m\in \mathbb {N}\), are connectable. In the setting of statement (2) this is true for \(m\ge m_0\) by assumption. Therefore, without loss of generality, we assume \(m\ge m_0\). Applying the definition of the image measure we have that
where we wrote \(y_m = y\circ \pi _m^{-1}\). Now the convergence (including the desired monotonicity) can be argued in the same way as for the microscopic and the mesoscopic terms above, using the monotonicity of the entropy and our assumption in (5.16).
This finishes the proof of (1) and (2) in the case \(c_\lambda + c_\alpha \le \mu \).
Finally, we consider the case that \(\lambda \in {\mathcal L }\) and \(\alpha \in {\mathcal A}\) do not satisfy \(c_\lambda + c_\alpha \le \mu \). Then there exists a \(\mu \)-continuity set \(A\subset {\mathcal S }\) such that \(c_\lambda (A) + c_{\alpha }(A) > \mu (A)\). Then with \(A_m= \pi _m(A)\) we have that \(\pi _m^{-1}(A_m) \supseteq A\) and hence \( c_{\lambda _m}(A_m) + c_{\alpha _m}(A_m) \ge c_\lambda (A) + c_\alpha (A)\). Now, let \(0<\varepsilon < c_\lambda (A)+c_\alpha (A)-\mu (A)\). By Lemma 5.11 we have that \({\bar{\pi }}_m(\mu )\rightarrow \mu \) weakly, as \(m\rightarrow \infty \). Hence, we can choose \(m_0\) large enough such that \(\mu (A) \ge {\bar{\pi }}_m(\mu )(A) - \varepsilon /2 = \pi _m(\mu )(A_m) - \varepsilon /2\) holds for all \(m\ge m_0\). Consequently, \(c_{\lambda _m}(A_m) + c_{\alpha _m}(A_m) \ge \pi _m(\mu )(A_m) + \varepsilon /2\). Therefore \(I^{{{({m,*}})}}(\varPi _m(\lambda , \alpha )) = \infty \) for any \(m\ge m_0\) and \(*\in \{+,-\}\). \(\square \)
Now we derive an upper and a lower bound LDP for the distribution of \((\text {Mi}_N,\text {Ma}_N)\) under \({\mathcal G }_N\) by following the two parts of the proof of the Dawson–Gärtner theorem, [19, Theorem 4.6.1] and using the fact that due to Proposition 5.6 it is sufficient to work with open and closed sets from the projective limit topology.
Lemma 5.16
(LDP—upper bound) Suppose that all assumptions of Theorem 1.1 are satisfied. Then the distribution of \((\text {Mi}_N,\text {Ma}_N)\) under \(\mathbb {P}_N\) satisfies the upper bound part of the LDP with rate function I as defined in Theorem 1.1.
Proof
Fix a set \(F\subset {\mathcal L }\times {\mathcal A}\) that is closed with respect to the vague topology. Then by Proposition 5.6 the set F is also closed with respect to the projective limit topology. For any \(m\in \mathbb {N}\) we use the notation \(\varPi _m(\lambda , \alpha ) := (\pi _m(\lambda ), \pi _m(\alpha ))\) and recall that \((\text {Mi}_N^{{{{({m}})}}}, \text {Ma}_N^{{{{({m}})}}}) = \varPi _m(\text {Mi}_N,\text {Ma}_N)\). Therefore,
where we used Corollary 5.13 for the second inequality. Since the left-hand side does not depend on m, we can proceed with the supremum over \(m\in \mathbb {N}\) on the right-hand side. By Lemma 5.15 we have that \(\sup _{m}I^{{{({m,+}})}}\circ \varPi _m = I\), which implies the claim. \(\square \)
It remains to prove the lower bound of the LDP formulated in Theorem 1.1. Since the approximation of I from below via \(I^{{{({m,-}})}}\) works only in the case where \(\alpha \) satisfies the assumptions given in the second statement of Lemma 5.15, we have to do some additional work. The idea is the following: given some \(\alpha \) that does not fulfill the assumptions, we will first approximate \(\alpha \) by some suitable choice for which the assumptions hold. Then we can apply Lemma 5.15.
Lemma 5.17
Let \(\alpha \in {\mathcal A}\) with \(c_\alpha \le \mu \) and \(I_\text {Ma}(\alpha ) < \infty \). Then there exists a sequence \((\alpha ^{{{({\delta , \varepsilon }})}})_{\delta> 0, \varepsilon >0}\) in \({\mathcal A}\) such that the following properties hold:

(1)
for fixed \(\delta >0\) and \(\varepsilon >0\) there exists \(m_0 = m_0(\delta )\) such that for all \(m\ge m_0\) the measures \(\pi _m(\alpha ^{{{({\delta , \varepsilon }})}})\) are connectable with respect to \(\kappa ^{{{({m,-}})}}\);

(2)
\(\alpha ^{{{({\delta , \varepsilon }})}} \rightarrow \alpha \) as \(\delta \rightarrow 0\) and \(\varepsilon \rightarrow 0\) with respect to the vague topology;

(3)
for any \(\lambda \in {\mathcal L }\) we have that \(I(\lambda , \alpha ^{{{({\delta , \varepsilon }})}}) \rightarrow I(\lambda , \alpha )\) as \(\delta \rightarrow 0\) and \(\varepsilon \rightarrow 0\).
Proof
We always write \(\alpha = \sum _{i\in J}\delta _{y_i}\), where J is a countable set. The idea is to pick some type \(x\in {\mathcal S }\) and to restrict the measures \(y_i\in {\mathcal M }_{\le 1}({\mathcal S }){\setminus }\{0\}\), \(i\in J\), to a subset \(S_{\delta }\) of \({\mathcal S }\) that contains all types \(x^\prime \in {\mathcal S }\) that can be connected to x by using a finite sequence of intermediate types \(x_{h-1}, x_{h}\) for which we have \(\kappa (x_{h-1},x_h)\ge \delta \). Then for large enough m connectivity is preserved with respect to \(\kappa ^{{{({m,-}})}}\), which will imply (1). The parameter \(\varepsilon >0\) is only introduced to deal with the fact that J might be infinite and ensures the convergence claimed in (3).
Fix some \(x\in \text {supp}(\mu )\). For \(\delta >0\) define
We first show that \(\mu ({\mathcal S }{\setminus } S_{\delta }) \rightarrow 0 \) as \(\delta \rightarrow 0\), which we will need for the proof of (2) and (3). Observe that \(S_{\delta } \subset S_{\delta ^\prime }\) if \(\delta > \delta ^\prime \) and put \(S_0 := \bigcup _{\delta >0}S_{\delta }\). Since \(\kappa \) is irreducible with respect to \(\mu \) and \(x\in \text {supp}(\mu )\) it is easy to see that for \(\delta \) small enough we have that \(\mu (S_{\delta }) >0\) and hence \(\mu (S_0)>0\). We now argue that \(\kappa = 0\) \(\mu \)-almost everywhere on \(S_0 \times {\mathcal S }{\setminus } S_0\). Assume the contrary, i.e., \(\int _{S_0}\int _{{\mathcal S }{\setminus } S_0}\kappa (x^\prime , x^{\prime \prime })\mu (\textrm{d}x^\prime ) \mu (\textrm{d}x^{\prime \prime }) > 0\). Then by continuity of \(\kappa \) we find sets of positive measure \(A\subset S_0\) and \(B\subset {\mathcal S }{\setminus } S_0\) where \(\delta ^\prime := \inf _{x^\prime \in A, x^{\prime \prime }\in B}\kappa (x^\prime , x^{\prime \prime }) >0\), which is a contradiction to the fact that \(B\subset {\mathcal S }{\setminus } S_0\). By the irreducibility of \(\kappa \) with respect to \(\mu \), the facts that \(\mu (S_0)>0\) and \(\kappa = 0\) holds \(\mu \)-almost everywhere on \(S_0 \times {\mathcal S }{\setminus } S_0\) imply that \(\mu ({\mathcal S }{\setminus } S_0) = 0\), so by continuity of measures we have that \(\mu ({\mathcal S }{\setminus } S_{\delta }) \rightarrow 0 \) as \(\delta \rightarrow 0\).
Now for any \(\delta \ge 0\) and \(\varepsilon \ge 0\) we define
and where \(J_\varepsilon = \{i\in J :|y_i|>\varepsilon \}\). Note that \(|c_\alpha | \le 1\) implies that for \(\varepsilon >0\) the set \(J_\varepsilon \) is finite.
Now we show that \((\alpha ^{{{({\delta , \varepsilon }})}})_{\delta ,\varepsilon >0}\) has the three properties.
(1) Fix \(\varepsilon >0\) and \(\delta >0\). Now take \(m_0\) such that \(\Vert \kappa - \kappa ^{{{({m,-}})}} \circ \pi _m \Vert _{\infty } \le \delta /2\) holds for all \(m\ge m_0\). Let \(m\ge m_0\). Then we have that \(\kappa ^{{{({m,-}})}} \circ \pi _m\) is irreducible on \(S_\delta \), since \(\kappa (x_{i-1}, x_i) \ge \delta \) implies \(\kappa ^{{{({m,-}})}}\circ \pi _m(x_{i-1}, x_i) \ge \delta /2\). In other words, \(\kappa ^{{{({m,-}})}}\) is irreducible on \(\pi _m^{-1}(S_\delta )\). For any \(i \in J\) we have that \(\text {supp}(y_i^{{{({\delta }})}})\subset S_\delta \) by construction and hence \(\text {supp}(\pi _m(y_i^{{{({\delta }})}})) \subset \pi _m^{-1}(S_\delta )\). Therefore, we get for all \(m\ge m_0\) that \(\pi _m(\alpha ^{{{({\delta ,\varepsilon }})}})\) is connectable with respect to \(\kappa ^{{{({m,-}})}}\).
(2) It is straightforward to show that for all \(i\in J\) and \(\delta >0\) we have \(d_\text {BL}(y_i,y_i^{{{({\delta }})}}) \le y_i({\mathcal S }{\setminus } S_{\delta })\le \mu ({\mathcal S }{\setminus } S_{\delta })\), where \(d_\text {BL}\) is the metric defined in (5.29) that induces the weak topology on \({\mathcal M }({\mathcal S })\). For any continuous compactly supported test function \(f:{\mathcal M }_{\le 1}({\mathcal S }){\setminus } \{0\} \rightarrow \mathbb {R}\) there exists \(\varepsilon _f>0\) such that \(f=0\) on \(\{y\in {\mathcal M }_{\le 1}({\mathcal S }){\setminus } \{0\} :|y| \le \varepsilon _f \}\), so for \(\varepsilon \le \varepsilon _f\) (including the case \(\varepsilon = 0\)) we have
Observe that the right-hand side converges to 0 as \(\delta \rightarrow 0\). This implies (2).
(3) By lower semicontinuity of the rate function it is clear that \(I(\lambda , \alpha ) \le \lim _{\delta , \varepsilon \rightarrow 0}I(\lambda , \alpha ^{{{({\delta , \varepsilon }})}})\). So we only have to deal with the other estimate, i.e., we need to find an upper bound for
Let \(\gamma > 0\). It is straightforward to verify that \(c_{\alpha ^{{{({\delta , \varepsilon }})}}} \rightarrow c_\alpha \) weakly as \(\delta , \varepsilon \rightarrow 0\). Since \(I_\text {Me}\) is continuous we can choose \(\delta _0\) and \(\varepsilon _0\) such that \(|I_\text {Me}(\mu - c_\lambda - c_\alpha ) - I_\text {Me}(\mu - c_\lambda - c_{\alpha ^{{{({\delta , \varepsilon }})}}})|\le \gamma /3\) for all \(\delta \le \delta _0\), \(\varepsilon \le \varepsilon _0\).
Recall that
and where we interpret the \(\log \) term as equal to \(+\infty \) if the density does not exist.
We have that
where the inequality follows from (7.1), which is proved in Lemma 7.1. Note that \(\sum _{i\notin J_\varepsilon }|y_i| \rightarrow 0\) as \(\varepsilon \rightarrow 0\) and \(f_\text {Ma}(y) \rightarrow 0\) as \(y \rightarrow 0\). So we can choose \(\varepsilon \le \varepsilon _0\) such that \(I_\text {Ma}(\alpha ^{{{({0,\varepsilon }})}}) - I_\text {Ma}(\alpha ) \le \gamma /3\).
As a last step we want to choose \(\delta = \delta (\varepsilon )\le \delta _0\) such that \(I_\text {Ma}(\alpha ^{{{({\delta , \varepsilon }})}}) - I_\text {Ma}(\alpha ^{{{({0, \varepsilon }})}}) \le \gamma /3\), so it remains to show that
Notice that the definition of \(y^{{{({\delta }})}}\) given in (5.41) implies that \(|y| - \mu ({\mathcal S }{\setminus } S_\delta ) \le |y^{{{({\delta }})}}|\le |y|\). Therefore, we can choose \(\delta \) small enough such that \(\mu ({\mathcal S }{\setminus } S_\delta ) \le \frac{1}{2}\min _{i\in J_\varepsilon }(|y_i| - \varepsilon )\) to ensure that for all \(i\in J_\varepsilon \) we have that \(|y_i^{{{({\delta }})}}| > \varepsilon \). Now, we choose a function \(\chi _\varepsilon :{\mathcal M }_{\le 1}({\mathcal S }){\setminus }\{0\}\rightarrow \mathbb {R}\) that is equal to one on \(\{y\in {\mathcal M }_{\le 1}({\mathcal S }){\setminus }\{0\}:|y|>\varepsilon \}\), equal to zero on \(\{y\in {\mathcal M }_{\le 1}({\mathcal S }){\setminus }\{0\}:|y|\le \varepsilon /2\}\) and continuous. In particular, the function \(\chi _\varepsilon f_\text {Ma}\) is then compactly supported and we have that for all \(\delta ^\prime \le \delta \)
Technically, we still have the problem that \(f_\text {Ma}\) can take values in \(\mathbb {R}\cup \{+\infty \}\). But our assumption \(I_\text {Ma}(\alpha )<\infty \) implies that \(f_\text {Ma}(y_i) < \infty \) for all \(i\in J_\varepsilon \), so by continuity of \(f_\text {Ma}\) and the finiteness of \(J_\varepsilon \) we can tune \(\delta ^\prime \le \delta \) such that uniformly for all \(i\in J_\varepsilon \) we have \(f_\text {Ma}(y_i^{{{({\delta }})}})\le C\) for some constant C. We already showed in (2) that \(\alpha ^{{{({\delta ^\prime ,0}})}} \rightarrow \alpha \), so having established continuity and compact support of the function in the integral we get that
Altogether the right-hand side of (5.43) can be bounded by \(\gamma \), which proves the claim. \(\square \)
Lemma 5.18
(LDP—lower bound) Suppose that all the assumptions of Theorem 1.1 are satisfied. Then the distribution of \((\text {Mi}_N,\text {Ma}_N)\) under \(\mathbb {P}_N\) satisfies the lower bound part of the LDP with rate function I.
Proof
Fix a set \(G\subset {\mathcal L }\times {\mathcal A}\) that is open with respect to the vague topology and a point \((\lambda ,\alpha )\in G\). We will show that for any \(\gamma >0\) we have that
Note that in the case where \(I(\lambda , \alpha ) = \infty \), the claimed estimate always holds, so we assume that \(I(\lambda , \alpha ) < \infty \). Let \(\gamma >0\). We will use the approximating sequence \((\alpha ^{{{({\delta , \varepsilon }})}})_{\delta>0, \varepsilon >0}\) constructed in Lemma 5.17. Let \(\varepsilon >0\) and \(\delta >0\) be small enough such that \((\lambda , \alpha ^{{{({\delta , \varepsilon }})}}) \in G\) and \(|I(\lambda , \alpha )-I(\lambda , \alpha ^{{{({\delta , \varepsilon }})}})| \le \gamma /2\). Again, we write \(\varPi _m({\tilde{\lambda }}, {\tilde{\alpha }}) = (\pi _m({\tilde{\lambda }}), \pi _m({\tilde{\alpha }}))\) and also \(\varPi _{m,n}({\tilde{\lambda }}, {\tilde{\alpha }}) = (\pi _{m,n}({\tilde{\lambda }}), \pi _{m,n}({\tilde{\alpha }}))\) for any \(n\ge m\). By Proposition 5.6 the set G is also open with respect to the projective limit topology. It is a general fact that the set
is a basis of the projective limit topology. From now on, we fix some large \(m_0\in \mathbb {N}\) which we will specify later. We claim that also the set
is a basis of the projective limit topology and argue this as follows: note that for any \(m<m_0\) we can take any \(n\ge m_0\) and use that \(\pi _m = \pi _{m,n}\circ \pi _n\) holds by Lemma 5.7 to derive that \(\varPi _m^{-1}(U_m) = \varPi _n^{-1}(\varPi _{m,n}^{-1}(U_m))\) for any open set \(U_m\). Since the set \(\varPi _{m,n}^{-1}(U_m)\) is again open due to the continuity of \(\pi _{m,n}\), we get that \(\varPi _m^{-1}(U_m) \in {\mathcal B}_{m_0}\). Altogether we have that \({\mathcal B}_1\subset {\mathcal B}_{m_0}\).
Having established that \({\mathcal B}_{m_0}\) is a basis for the projective limit topology, we may pick \(m\ge m_0\) and an open set \(U_m\subset {\mathcal L }_m\times {\mathcal A}_m\) such that \((\lambda ,\alpha ^{{{({\delta , \varepsilon }})}})\in \varPi _m^{-1}(U_m)\subset G\). Therefore, we see that
where we used Corollary 5.13 for the second inequality. Using Lemmas 5.15 and 5.17 we can pick \(m_0\) large enough such that for all \(m\ge m_0\) the measures \(\pi _m(\alpha ^{{{({\delta , \varepsilon }})}})\) are connectable and \(I^{{{({m,-}})}}(\varPi _m(\lambda , \alpha ^{{{({\delta , \varepsilon }})}})) - I(\lambda , \alpha ^{{{({\delta , \varepsilon }})}}) \le \gamma /2\). Altogether, this gives (5.46). \(\square \)
6 The minimizers of \(I_\text {Mi}\)
In this section, we derive an explicit description of the minimizer(s) \(\lambda \) of \(I_\text {Mi}\) under the constraint \(c_\lambda = c\) for any \(c\in {\mathcal M }({\mathcal S })\) satisfying \(c\le \mu \). This will allow us to solve the optimization problem in (2.6), i.e., to identify the minimizer(s) of the rate function for the LDP for \(\text {Mi}_N\) in Theorem 2.3. It will also be used as an important intermediate step in deriving the full optimization of the rate function I of the LDP in Theorem 2.1, our main result. Recall the notation that we introduced in Sect. 2.1, in particular the definition of \(\varSigma (\kappa ,c)\) from (2.3). Here is the main result of this section.
Proposition 6.1
(Minimizers of \(I_\text {Mi}\)) Fix a probability measure \(\mu \) on \({\mathcal S }\) and a kernel \(\kappa \) on \({\mathcal S }\times {\mathcal S }\) that is nonnegative and continuous.
Let \(c\in {\mathcal M }({\mathcal S })\) be a measure such that \(c\le \mu \).

(i)
Assume that \(\varSigma (\kappa ,c)\le 1\). Then
$$\begin{aligned} \inf _{\lambda \in {\mathcal L }:c_\lambda =c}I_\text {Mi}(\lambda )=\Big \langle c,\log \frac{\textrm{d}c}{\textrm{d}\mu }\Big \rangle +\frac{1}{2}\langle c,\kappa (\mu - c)\rangle ,\end{aligned}$$(6.1) and the infimum is attained in the unique minimizer \(\lambda _c\) defined in (2.1).

(ii)
Assume that \(\varSigma (\kappa ,c)>1\). Then
$$\begin{aligned} \inf _{\lambda \in {\mathcal L }:c_\lambda = c} {I}_\text {Mi}(\lambda ) \ge \inf _{\lambda \in {\mathcal L }:c_\lambda = b^*} I_\text {Mi}(\lambda ) + I_\text {Me}(c-b^*) \end{aligned}$$(6.2) where \(b^* = b^*(c) \in {\mathcal M }({\mathcal S })\) is the minimal, nontrivial (i.e., not equal to c) solution to (2.9) and satisfies \(\varSigma (\kappa ,b^*) = 1\).
It is interesting to notice that one can see the phase transition already from the sole consideration of \(I_\text {Mi}\). We will refer to (i) and (ii) as the sub- and supercritical cases, respectively.
In the case where \({\mathcal S }\) is finite we can actually prove an equality in (6.2). In the general case we also expect this to be true, but did not attempt a proof, since the inequality will be enough to prove our main results, Theorems 2.3 and 2.1.
The proof is naturally divided into Sects. 6.1–6.4 according to the distinctions between finite \({\mathcal S }\) (the discrete case) or general compact \({\mathcal S }\) and between the sub- and supercritical cases. In Sect. 6.1 we construct minimizers for subcritical measures c for finite \({\mathcal S }\); we analyze if the only candidate \(\lambda \) (coming from the Euler–Lagrange equations) satisfies the constraint \(c_\lambda =c\), which requires the result about the multivariate power series from Sect. 4.1. In Sect. 6.2 we generalize the results to a general compact type space via an approximation argument. In order to deal with supercritical measures c for \({\mathcal S }\) a finite set, we also rely on combinatorial results in Sect. 6.3. Afterwards, we handle the general supercritical case in Sect. 6.4.
6.1 The discrete, subcritical case
In this section, we formulate and prove the main assertions about the minimizers of \(I_\text {Mi}\) in the discrete case, i.e., the case of a finite type space \({\mathcal S }\). Here we will be using the notation of linear algebra, i.e., measures \(\lambda \in {\mathcal L }\) on \({\mathcal M }_{\mathbb {N}_0}({\mathcal S })\) will be written as sequences \((\lambda _k)_{k\in \mathbb {N}_0^{\mathcal S }}\).
Recall the definition of the rate function \(I_\text {Mi}\) from Theorem 3.1 as well as the notation for the integrated type configuration \(c_r(\lambda )=\sum _{k\in \mathbb {N}_0^{\mathcal S }}\lambda _k\, k_r\) for \(r\in {\mathcal S }\) introduced in (3.5). We write \([0,\mu ]\) for the set of all \(c\in [0,\infty )^{\mathcal S }\) satisfying \(0\le c_r\le \mu _r\) for any \(r\in {\mathcal S }\).
For \(c\in [0,\infty )^{\mathcal S }\), define \(\lambda (c)=(\lambda _k(c))_{k\in \mathbb {N}_0^{\mathcal S }}\) by
and note that this definition is the discrete analog of the general form of the minimizer in (2.1).
The aim of the present Sect. 6.1 is to verify the subcritical case of Proposition 6.1 in the discrete setting, which we restate here quickly.
Proposition 6.2
Let \(c=(c_s)_{s\in {\mathcal S }}\) be in \([0,\mu ]\). Assume that \(\varSigma (\kappa ,c) \le 1\). Then
and the infimum is attained in the unique minimizer \(\lambda (c)\) defined in (6.3).
To derive the form of the minimizer given in (6.3), we will start by giving a short heuristic. First note that \(I_\text {Mi}\) is a strictly convex function and that \(\{\lambda :c(\lambda ) = c\}\) is a convex set, which implies that there is at most one minimizer. Assume that a minimizer \(\lambda ^*\) exists in the interior of \(\{\lambda :c(\lambda ) = c\}\). Then by formally writing down the Euler–Lagrange equations, one can see that
where \(\theta = (\theta _s)_{s\in {\mathcal S }}\) is some nonnegative real-valued vector. Note that \(\theta \) has to be chosen in such a way that \(c(\lambda ^*) = c\), i.e., for every \(r\in {\mathcal S }\) the multivariate power series
converges with limit \(c_r\). We already encountered in Sect. 4.1 that for \(\theta = c{{\text {e}} }^{-\kappa c}\) the power series on the right-hand side of (6.6) has the right value. The following is just a reformulation of the results from Lemma 4.1 and Proposition 4.2 using the notation of the present section.
Corollary 6.3
Let \(c = (c_s)_{s\in {\mathcal S }}\) in \([0,\mu ]\) and assume that \(\varSigma (\kappa ,c) \le 1\). Then for \(\lambda ^* = \lambda (c)\) we have that \(c(\lambda ^*) = c\).
A rigorous argument showing that this choice uniquely minimizes \(I_\text {Mi}\) can be found at the end of this section. The identification of the optimal rate \(I_\text {Mi}(\lambda (c))\) needs an additional property of the minimizer, namely a formula of its total mass, which we derive in Lemma 6.5. For this we use the following recursive formula.
Lemma 6.4
Let \(k\in \mathbb {N}_0^{\mathcal S }\). Then we have the recursion
Proof
Let \((x_i)_{i\in [|k|]} \in {\mathcal S }^{|k|}\) be a vector compatible with k, i.e., \(k = \sum _i \delta _{x_i}\) and recall the definition of \(\tau (k)\) from (3.6). For \(i,j \in [|k|]\) with \(i\ne j\) define
i.e., \(W_{i,j}\) is the total weight of trees containing the edge \(\{i,j\}\). Observe that each tree T on \([|k|]\) contains exactly \(|k|-1\) edges and, for each edge \(\{i,j\} \in E(T)\), the weight of T appears once in \(W_{i,j}\) and once in \(W_{j,i}\). Thus, the weight of T is counted \(2(|k|-1)\) times in the sum \(\sum _{i\ne j} W_{i,j}\), which implies that
Now, for a fixed pair of types \(r,s \in {\mathcal S }\) consider the weights of trees containing an edge connecting some type r with some type s vertex, i.e., consider \(\sum _{i\ne j :x_i = r, x_j =s} W_{i,j}\). Notice that each tree contributing to this weight can be decomposed into an edge of weight \(\kappa (r,s)\) and two trees \(T_r\) and \(T_s\) with roots of type r and s respectively. (The term ’root’ is here only used to mark a certain vertex, not to give some directed structure.) This implies the formula
Here we used formula (4.5) to collect the weight coming from the possible choices of \(T_r\) and \(T_s\), which is \(\tau (m)m_r\) and \(\tau ({\widetilde{m}}){\widetilde{m}}_s\) respectively. Formula (6.7) now follows by summing over all possible pairs \(r,s \in {\mathcal S }\). \(\square \)
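The double-counting step in this proof can be checked by brute force on a small example. The following is a minimal sketch, assuming that \(\tau (k)\) is the sum over all trees on \([|k|]\) of the product of \(\kappa (x_i,x_j)\) over the tree edges (the definition in (3.6) is not restated here); the two-type kernel, the type vector and the helper names are hypothetical.

```python
import itertools
import math

def spanning_trees(n):
    """Yield all trees on vertices 0..n-1 as tuples of edges (i, j), i < j."""
    all_edges = list(itertools.combinations(range(n), 2))
    for edges in itertools.combinations(all_edges, n - 1):
        # n - 1 edges form a tree iff they contain no cycle (union-find check).
        parent = list(range(n))

        def find(v):
            while parent[v] != v:
                v = parent[v]
            return v

        acyclic = True
        for i, j in edges:
            ri, rj = find(i), find(j)
            if ri == rj:
                acyclic = False
                break
            parent[ri] = rj
        if acyclic:
            yield edges

def tree_weight(edges, types, kappa):
    """Product of kappa over the tree edges, evaluated at the vertex types."""
    return math.prod(kappa[types[i]][types[j]] for i, j in edges)

# Hypothetical example: symmetric kernel on two types and a type vector x.
kappa = [[0.5, 2.0], [2.0, 1.5]]
types = [0, 0, 1, 1, 1]
n = len(types)

trees = list(spanning_trees(n))
tau = sum(tree_weight(t, types, kappa) for t in trees)

# W[i][j]: total weight of the trees that contain the edge {i, j}.
W = [[0.0] * n for _ in range(n)]
for t in trees:
    w = tree_weight(t, types, kappa)
    for i, j in t:
        W[i][j] += w
        W[j][i] += w

total = sum(W[i][j] for i in range(n) for j in range(n) if i != j)
print(len(trees), total, 2 * (n - 1) * tau)
```

The two printed sums agree up to rounding, which reflects that each tree contributes its weight once for every ordered pair of endpoints of each of its \(|k|-1\) edges.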
Lemma 6.5
Let \(c=(c_s)_{s\in {\mathcal S }}\) be nonnegative with \(\varSigma (\kappa ,c) \le 1\). Then for \(\lambda (c)\) defined as in (6.3) we have that
Proof
Writing \(\lambda ^* = \lambda (c)\) and using that \(c(\lambda ^*) = c\) we show the equivalent equation
For fixed \(k\in \mathbb {N}_0^{\mathcal S }\) the recursive equation (6.7) for \(\tau (k)\) easily implies
With the assumption \(\varSigma (\kappa ,c)\le 1\) all series in the next equations converge (absolutely) by Corollary 6.3, so by rearranging terms we get that
\(\square \)
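For a single type, the statements of Corollary 6.3 and of the total-mass formula can be checked numerically. The following is a minimal sketch under several assumptions that are not restated in this section: the candidate has the form \(\lambda _k = \tau (k)\theta ^k/k!\) with \(\theta = c{{\text {e}} }^{-\kappa c}\), the tree weight is \(\tau (k)=k^{k-2}\kappa ^{k-1}\) by Cayley's formula, the subcriticality condition reads \(\kappa c\le 1\), and the total mass equals \(c - \frac{1}{2}\kappa c^2\) as in the single-type case of (6.16). The function name and the truncation level are hypothetical.

```python
import math

def lambda_single_type(c, kappa, k_max=400):
    """Candidate minimizer for one type: lambda_k = tau(k) * theta**k / k!,
    with tau(k) = k**(k-2) * kappa**(k-1) and theta = c * exp(-kappa * c).
    Each term is computed in log space to avoid overflow."""
    theta = c * math.exp(-kappa * c)
    lam = []
    for k in range(1, k_max + 1):
        log_term = ((k - 2) * math.log(k) + (k - 1) * math.log(kappa)
                    + k * math.log(theta) - math.lgamma(k + 1))
        lam.append(math.exp(log_term))
    return lam

c, kappa = 0.6, 1.2  # kappa * c = 0.72 < 1, i.e. the subcritical regime
lam = lambda_single_type(c, kappa)

# Mass contained in the components, sum_k k * lambda_k, should recover c.
mass_in_components = sum(k * l for k, l in enumerate(lam, start=1))
# Total mass, sum_k lambda_k, should equal c - kappa * c**2 / 2.
total_mass = sum(lam)

print(mass_in_components, c)
print(total_mass, c - 0.5 * kappa * c * c)
```

The truncation at `k_max` is harmless here because the terms decay geometrically when \(\kappa c<1\); close to criticality the decay is only polynomial and a much larger truncation level would be needed.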
Combining the results from Corollary 6.3 and Lemma 6.5 we can now give the proof of Proposition 6.2:
Proof of Proposition 6.2
Assume that \(\varSigma (\kappa ,c)\le 1\). Define \(\lambda ^* = \lambda (c)\) as in (6.3). By Corollary 6.3 we have that \(c_s(\lambda ^*)=\sum _k \lambda ^*_k k_s = c_s\) for all \(s\in {\mathcal S }\). Now, take any \(\lambda \) satisfying \(c_s(\lambda )=\sum _k \lambda _k k_s =c_s\) for all \(s\in {\mathcal S }\). Then from (3.7), using the formula from Lemma 6.5, we get
where we wrote \(\mathbb {H}(\lambda |\lambda ^*) = \langle \lambda , \log \frac{\lambda }{\lambda ^*}\rangle + |\lambda ^*| - |\lambda | \) for the entropy and used that \(\mathbb {H}(\lambda |\lambda ^*)\ge 0\) with equality if and only if \(\lambda = \lambda ^*\). \(\square \)
6.2 The general subcritical case
In this section we derive Proposition 6.1(i). The proof is similar to the proof of the discrete variant in Sect. 6.1. Again, there is an explicit candidate for the minimizer, but one has to prove that it is admissible, and we need to identify its total mass. This is done in the analogs of Corollary 6.3 and Lemma 6.5, see Lemmas 6.7 and 6.8, whose proofs proceed via a discrete approximation based on the material of Sect. 5.1.
In the current case of a general compact metric type space \({\mathcal S }\), the candidate for a minimizer is given in terms of a Poisson point process, see (6.14). Recall that we write \(\mathbb {Q}_\theta \) for the distribution of a Poisson point process \(\mathbb {X}=(X_i)_{i\in I}\) in \({\mathcal S }\) with intensity measure \(\theta \in {\mathcal M }({\mathcal S })\). We write \(k =\sum _i \delta _{X_i} \in {\mathcal M }_{\mathbb {N}_0}({\mathcal S })\) for the measure induced by the random point cloud. Note that the points \(X_i\) do not have to be distinct with positive probability, if \(\theta \) has no Lebesgue density. We start by noting a simple fact about the densities between absolutely continuous Poisson point processes.
Lemma 6.6
Let \(\theta , {\hat{\theta }} \in {\mathcal M }({\mathcal S })\) with \({\hat{\theta }} \ll \theta \). Then \(\mathbb {Q}_{{\hat{\theta }}} \ll \mathbb {Q}_{\theta }\) and
Recall the definition of \(\tau (k)\) introduced in (1.9). Also recall that for a fixed \(c\in {\mathcal M }\) according to definition (2.1) the candidate for the minimizer of \(I_\text {Mi}\) under the constraint \(c_\lambda = c\) has the form
We first provide a generalized version of Corollary 6.3 and Lemma 6.5 under the stricter condition \(\varSigma (\kappa ,c) < 1\).
Lemma 6.7
Let \(c\in {\mathcal M }({\mathcal S })\) with \(c\le \mu \). Assume that \(\varSigma (\kappa ,c)<1\). Then the following holds.

1.
For any continuous test function \(f:{\mathcal S }\rightarrow [0,\infty )\) we have that
$$\begin{aligned} \int _{{\mathcal M }_{\mathbb {N}_0}({\mathcal S })}\lambda _c(\textrm{d}k) \langle k,f \rangle = \langle c, f \rangle . \end{aligned}$$(6.15) 
2.
The total mass of \(\lambda _c\) is given by
$$\begin{aligned} |\lambda _c|=\int _{{\mathcal M }_{\mathbb {N}_0}({\mathcal S })}\lambda _c(\textrm{d}k) = c({\mathcal S }) - \frac{1}{2} \langle c, \kappa c\rangle . \end{aligned}$$(6.16)
Proof
We focus on showing Eq. (6.15); the proof of (6.16) is similar (see the end of the proof). Abbreviating \(\theta := \theta _c\) and inserting the definition of \(\lambda _c\) we have to prove that
where we conceive k as an \({\mathcal M }_{\mathbb {N}_0}({\mathcal S })\)-valued random variable on the left-hand side. The idea is to deduce the equality from the one that we have in the finite-type case by using the discretization scheme from Sect. 5.1. Recall the notation from Sect. 5.1, where we discretized the compact metric space \({\mathcal S }\) into finite spaces \({\mathcal S }_m\), \(m\in \mathbb {N}\), and defined the projections \(\pi _m\), \(m\in \mathbb {N}\), on different spaces in equations (5.6)–(5.8). For \(k\in {\mathcal M }_{\mathbb {N}_0}({\mathcal S })\) we will again identify the discretized measure \(\pi _m(k)\) with an element of \(\mathbb {N}_0^{{\mathcal S }_m}{\setminus }\{0\}\). Via \({\mathcal S }_m\subset {\mathcal S }\) the function f can be restricted to \({\mathcal S }_m\), and we write \(f_m = f|_{{\mathcal S }_m}\). Also, we write \(c_m := \pi _m(c)\) and identify it with a vector \((c_m(r))_{r\in {\mathcal S }_m}\). Recall the definitions of the discretized kernels. Let \(\kappa _m\in \{\kappa ^{{{({m,+}})}}, \kappa ^{{{({m,-}})}}\}\), where \(\kappa ^{{{({m,\star }})}}\) for \(\star =\pm \) is defined as in (5.13) and (5.14). Denote \(\theta _m(r):={{\text {e}} }^{-(\kappa _m c_m)(r)}c_m(r)\), \(r\in {\mathcal S }_m\). For \(k\in \mathbb {N}_0^{{\mathcal S }_m}\) let \(\tau _m(k)\) be defined as in (3.6), but with respect to \(\kappa _m\). Fix a continuous function \(f:{\mathcal S }\rightarrow \mathbb {R}\). Our aim is to show that
which finishes the proof of (6.15).
We start with proving the second equality of (6.18). It is straightforward to show that
and hence \(\varSigma (\kappa _m,c_m)\rightarrow \varSigma (\kappa ,c)\), as \(m\rightarrow \infty \) and so we will have for large \(m\in \mathbb {N}\) that \(\varSigma (\kappa _m,c_m) <1\). Then we get
where the second equation only holds if m is large enough, and thus \(\varSigma (\kappa _m, c_m) \le 1\) due to Lemma 4.1(i) and Proposition 4.2.
Now we show the first equation of (6.18). Note that \(\mathbb {Q}_{\theta _m}\) is a point process on \({\mathcal S }_m\), whereas \(\mathbb {Q}_\theta \) is a point process on \({\mathcal S }\). However, by defining an intensity measure on \({\mathcal S }\) by \({\bar{\theta }}_m (\textrm{d}x) := {{\text {e}} }^{-(\kappa _m c_m)(\pi _m(x))}c(\textrm{d}x)\) we have that \(\theta _m = {\bar{\theta }}_m \circ \pi _m^{-1}\) and hence \(\mathbb {Q}_{\theta _m} = \mathbb {Q}_{\bar{\theta }_m} \circ \pi _m^{-1}\) holds by the mapping theorem for Poisson point processes. Therefore, according to Lemma 6.6,
with
Note that \(\varPsi ^f_m\) converges pointwise to \(\varPsi ^f\), where \(\varPsi ^f(k) = \tau (k) \langle f, k \rangle {{\text {e}} }^{\theta ({\mathcal S })}\), \(k\in {\mathcal M }_{\mathbb {N}_0}({\mathcal S })\). Hence, the first equation of (6.18) immediately follows as soon as we have given an argument for interchanging the limit as \(m\rightarrow \infty \) and the integration with respect to \( \mathbb {Q}_{\theta }\). We will be using Lebesgue’s theorem about dominated convergence for that. Let us introduce a majorant. Recalling the definition (5.33) of \(\tau _m^{{{({+}})}}\) we define \({\widehat{\varPsi }}_m\) by
Then, since \(\kappa ^{{{({m,-}})}}\le \kappa _m\le \kappa ^{{{({m,+}})}}\) we clearly have for any \(k\in {\mathcal M }_{\mathbb {N}_0}({\mathcal S })\) that \(\varPsi ^f_m(k) \le {\left\Vert {f}\right\Vert }_\infty {\widehat{\varPsi }}_m(k)\) and \({\widehat{\varPsi }}_m(k) \le {\widehat{\varPsi }}_{m_0}(k)\) if \(m\ge m_0\). Hence, \({\widehat{\varPsi }}_{m_0}\) is a majorant. It remains to show that there exists \(m_0\) such that \(\mathbb {Q}_\theta ({\widehat{\varPsi }}_{m_0}) < \infty \); then the majorant \({\widehat{\varPsi }}_{m_0}\) is integrable. Arguing as in (6.20) we have that
where \(\theta _m^{{{({-}})}}(s) = {{\text {e}} }^{-(\kappa ^{{{({m,-}})}}c_m)(s)}c_m(s)\), for \(s\in {\mathcal S }_m\). Let \(\chi _m:=\chi (\kappa ^{{{({m,+}})}},\theta _m^{{{({-}})}})\) be defined as in (4.11). We can argue as in the proof of Lemma 4.5 to get that, for any \(n\in \mathbb {N}\),
(where the \({{\text {e}} }^{o(n)}\)-term is actually given by \({\mathcal M }_1^{{{({n}})}}({\mathcal S }_m)\sum _{r\in {\mathcal S }_m}\varDelta _r(n\nu )\)). Abbreviating \(\delta _m= \Vert \kappa ^{{{({m,+}})}}-\kappa ^{{{({m,-}})}}\Vert _{\infty }\), we further have that
where \(\varSigma _m:= \varSigma (\kappa ^{{{({m,+}})}},c_m)\). Now, choose \(\varepsilon >0\) small enough such that \(\varSigma (\kappa ,c) + \varepsilon <1\) and \(\varSigma (\kappa ,c) - \log (\varSigma (\kappa ,c)) > 1+\varepsilon \). It holds that \(\varSigma _m \rightarrow \varSigma (\kappa ,c)\), as \(m\rightarrow \infty \), and clearly we have that \(\delta _m \rightarrow 0\), as \(m\rightarrow \infty \). Additionally, we use that the function \(x \mapsto \phi (x) :=x-\log x\) is continuous and decreasing on [0, 1]. Hence, we find \(m_0\) such that \(\delta _{m_0} \le \varepsilon /2\), as well as \(\varSigma _{m_0} \le \varSigma (\kappa , c)+\varepsilon <1\) and \(\phi (\varSigma _{m_0})\ge \phi (\varSigma (\kappa ,c)) - \varepsilon /2\). Consequently,
which altogether implies that
Thus, Lebesgue’s theorem of dominated convergence is applicable and (6.18) follows.
Equation (6.16) can be shown in the same way and relies on the discrete version of the equation, derived in Lemma 6.5. \(\square \)
Lemma 6.8
The statement of Lemma 6.7 is also true under the assumption \(\varSigma (\kappa ,c) = 1\).
Proof
The idea is to construct a sequence \(c^{{{({n}})}}\in {\mathcal M }({\mathcal S })\) with \(\varSigma (\kappa ,c^{{{({n}})}})<1\) such that \(\theta ^{{{({n}})}} := \theta _{c^{{{({n}})}}} \nearrow \theta _c =: \theta \) monotonically as \(n\rightarrow \infty \).
Recall that \({\mathcal S }\) is compact and \(\kappa \) is continuous, hence a standard argument (see e.g. [7, Lemma 5.15]) shows that the operator \(T_{\kappa ,c}\) is a positive Hilbert–Schmidt operator and therefore has a nonnegative eigenfunction corresponding to the eigenvalue \(\varSigma (\kappa ,c)\). By the assumption \(\varSigma (\kappa ,c) =1\) we can find a function \(g:{\mathcal S }\rightarrow (0,\infty )\) such that \(T_{\kappa ,c} g = g\). For any \(n\in \mathbb {N}\) define \(c^{{{({n}})}} \in {\mathcal M }({\mathcal S })\) via \(\frac{\textrm{d}c^{{{({n}})}}}{\textrm{d}c}:= 1- \frac{1}{n}g\). Then for n large enough \(c^{{{({n}})}}(A)< c(A)\) for any measurable \(A\subset {\mathcal S }\) with \(\int _A g\,\textrm{d}c>0\). In particular \(\varSigma (\kappa ,c^{{{({n}})}}) <1\). (An ad hoc argument in the case that \(\kappa \) is irreducible is as follows: Pick an \(L^2(c^{{{{({n}})}}})\)-normalized positive eigenfunction \(g_n\) of \(T_{\kappa ,c^{{{{({n}})}}}}\) corresponding to the eigenvalue \(\varSigma (\kappa ,c^{{{({n}})}})\) and observe that \(\widetilde{g}_n(x)=g_n(x)(1-\frac{1}{n} g(x))^{1/2}\) is \(L^2(c)\)-normalized and that \(\varSigma (\kappa ,c^{{{({n}})}})= \Vert T_{\kappa ,c^{{{{({n}})}}}} g_n\Vert _{L^2(c^{{{{({n}})}}})}<\Vert T_{\kappa ,c}{\widetilde{g}}_n\Vert _{L^2(c)}\le \Vert T_{\kappa ,c}\Vert =\varSigma (\kappa ,c)\). If \(\kappa \) is reducible, then apply this argument to the irreducible components.) Now, observe that for any \(n\in \mathbb {N}\) we have
and the right-hand side converges pointwise monotonically to 1, as \(n\rightarrow \infty \).
Now, fix any continuous test function \(f:{\mathcal S }\rightarrow [0,\infty )\). Then, by monotone convergence and the fact that we can apply Lemma 6.7 to \(c^{{{({n}})}}\) for all \(n\in \mathbb {N}\), we get that
The same argument shows that \(\mathbb {Q}_{\theta }\Big [\tau (k)e^{\theta ({\mathcal S })}\Big ] = c({\mathcal S }) -\frac{1}{2} \langle c, \kappa c\rangle \). \(\square \)
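The perturbation used in the proof of Lemma 6.8 can be illustrated numerically in the finite-type case. The following is a minimal sketch under stated assumptions, not part of the proof: the type space has two points, the symmetric kernel `kappa` and the measure `c0` are arbitrary illustrative choices, and \(\varSigma (\kappa ,c)\) is computed as the Perron eigenvalue of the matrix \((\kappa (r,s)c(s))_{r,s}\) by power iteration. The helper names `apply_T`, `perron` and `c_n` are hypothetical.

```python
def apply_T(kappa, c, h):
    """(T_{kappa,c} h)(r) = sum_s kappa[r][s] * c[s] * h[s] on a finite type space."""
    n = len(c)
    return [sum(kappa[r][s] * c[s] * h[s] for s in range(n)) for r in range(n)]


def perron(kappa, c, iters=5000):
    """Power iteration (assumes kappa > 0 and c > 0 entrywise).

    Returns the largest eigenvalue of T_{kappa,c} and an eigenfunction
    normalized so that its maximum equals 1.
    """
    v = [1.0] * len(c)
    lam = 0.0
    for _ in range(iters):
        w = apply_T(kappa, c, v)
        lam = max(w)
        v = [x / lam for x in w]
    return lam, v


# Illustrative data (assumption): two types, symmetric positive kernel.
kappa = [[2.0, 1.0], [1.0, 3.0]]
c0 = [0.6, 0.4]

# Rescale c0 so that the resulting measure c is critical, Sigma(kappa, c) = 1.
lam0, _ = perron(kappa, c0)
c = [x / lam0 for x in c0]
sigma_c, g = perron(kappa, c)  # g > 0 with T_{kappa,c} g = g


def c_n(n):
    """Perturbed measure with density 1 - g/n with respect to c."""
    return [(1.0 - g[r] / n) * c[r] for r in range(len(c))]


sigmas = [perron(kappa, c_n(n))[0] for n in (2, 10, 100, 1000)]
print(sigma_c, sigmas)
```

In this example the values in `sigmas` stay strictly below 1 and increase towards 1, in line with the claim \(\varSigma (\kappa ,c^{{{({n}})}})<1\) and with the approximation of the critical case by subcritical ones.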
Now we can identify the minimizers of \(I_{\text {Mi}}\). The following is a variant of Proposition 6.2 in the general setting. Once having established Lemmas 6.7 and 6.8, the proof in the general setting follows the ones in the discrete setting. Recall that \(\mu \) is the reference probability measure on \({\mathcal S }\).
Lemma 6.9
(Minimizers of \(I_\text {Mi}\)) Assume that \(c\in {\mathcal M }({\mathcal S })\) with \(c \le \mu \) satisfying \(\varSigma (\kappa ,c)\le 1\). Then the unique minimizer \(\lambda \) of \(I_\text {Mi}\) under the assumption \(c_\lambda =c\) is equal to \(\lambda _c\) defined in (2.1) and
Proof
Note that \(\lambda _c\) is admissible, according to Lemmas 6.7 and 6.8, since \(c_{\lambda ^*}=c\).
Using Lemma 6.6 and the fact that \(\mu \) is a probability measure we can rewrite the measure \(\mathbb {Q}_\mu \) as
Now, writing \({\mathcal M }= {\mathcal M }_{\mathbb {N}_0}({\mathcal S })\) we get for any \(\lambda \in {\mathcal L }\) satisfying \(c_\lambda =c\) that
We used the fact that \(\lambda _c({\mathcal M }) = c({\mathcal S }) - \frac{1}{2} \langle c, \kappa c\rangle \), which was derived in the last statement of Lemma 6.7. Since \(\mathbb {H}(\lambda | \lambda _c) \ge 0\) and \(\mathbb {H}(\lambda | \lambda _c) = 0\) if and only if \(\lambda = \lambda _c\), the claim follows. \(\square \)
6.3 The discrete, supercritical case
In this section, we assume again that \({\mathcal S }\) is a finite space and investigate the case where the measure c (the one that formulates the constraint) is supercritical, meaning \(\varSigma (\kappa ,c) >1\). The aim of this section is to verify the following result.
Proposition 6.10
(Discrete, supercritical case) Let \(c\in [0,\mu ]\) with \(\varSigma (\kappa ,c) >1\). Then
where \(b^*=b^*(c)\) is the minimal nontrivial (i.e., not equal to c) solution to
and satisfies \(\varSigma (\kappa , b^*) = 1\).
Indeed, one possible realization of the rate (6.26) is given by constructing a minimizer as in formula (6.3) with respect to the (sub)critical parameter \(b^*\) and realizing the remaining part \(c-b^*\) by means of a diverging sequence \(k^{{{({n}})}}\), such that the mesoscopic rate term appears.
The proof will be a consequence of the next lemmas. In Lemma 6.11 we derive an upper and a lower bound for \(\inf _{\lambda :c_\lambda =c}I_\text {Mi}(\lambda )\) and in Lemma 6.12 we show that they coincide, if there are solutions to the fixed point equation (6.27). We will postpone the proof about existence of solutions to Sect. 6.4, Lemma 6.14.
Lemma 6.11
Let \(c\in [0,\mu ]\) with \(\varSigma (\kappa ,c) >1\). For \(b\in [0,c]\) with \(\varSigma (\kappa ,b) \le 1\) we put
Then
Proof
We first show the first inequality in (6.30). Fix \(b \in [0, c]\) with \(\varSigma (\kappa ,b) \le 1\). Let \(\lambda ^*:= \lambda (b)\) be given as in (6.3). We proceed in the same way as in the proof of Proposition 6.2, but use that this time \(|\lambda ^*| = |b| - \frac{1}{2} \langle b, \kappa b\rangle \). Then for any \(\lambda \) with \(c(\lambda ) = c\) we have
We now prove the upper bound in (6.30). Fix \(b \in [0, c]\) with \(\varSigma (\kappa ,b) \le 1\).
Case 1: First, we assume that \(\kappa \) is irreducible with respect to \(c-b\). Let \(\lambda ^*:=\lambda (b)\) be defined as in (6.3). For \(n \in \mathbb {N}\) define \(k^{{{({n}})}}:= \lfloor n(c-b)\rfloor \) and write \(R_n:={\left| {k^{{{({n}})}}}\right| }\) and \(b^{{{({R_n}})}}:=\sum _{{\left| {k}\right| }\le R_n}\lambda ^*_k k\). We define
with \(\varepsilon := c-b^{{{({R_n}})}}- \frac{1}{n}k^{{{({n}})}}\), which ensures that \(c(\lambda ^{{{({n}})}})=c\) holds for all \(n\in \mathbb {N}\), but is negligible in the limit, i.e., \(\lim _{n\rightarrow \infty }\lambda ^{{{({n}})}}_{{{\textbf {e}}}_s} = \lambda ^*_{{{\textbf {e}}}_s}\) for all \(s\in {\mathcal S }\). Note that due to the irreducibility assumption we have that \(\tau (k^{{{({n}})}}) >0\) if n is large enough, which ensures that \(I_\text {Mi}(\lambda ^{{{({n}})}})\) is finite.
Using the notation \(I_\text {Mi}^{{{({R}})}}(\lambda )\) introduced in (3.22) we get
Denote the last summand as \(A_n\). By using the formula (4.6) from Lemma 4.3 for some \(r \in \text {supp}(c)\) as well as Stirling’s formula for the factorial terms, we have that, as \(n\rightarrow \infty \)
where \(\varDelta _r\) is defined in (4.7), which can be easily extended to arguments in \([0,\infty )^{\mathcal S }\). Note that by construction \(\varDelta _r(c-b) >0\). Clearly, the last summand in (6.32) is of order o(1). By the construction of \(\lambda ^{{{({n}})}}\) it is immediate that \(\lim _{n\rightarrow \infty } I_\text {Mi}^{{{({R_n-1}})}}(\lambda ^{{{({n}})}}) = \langle b,\log (b/\mu )\rangle + \frac{1}{2} \langle b, \kappa (\mu -b)\rangle \), so altogether we get that
Case 2: If \(\kappa \) is reducible with respect to \(c-b\), we can find a decomposition of \(\text {supp}(c-b)\) into disjoint sets \({\mathcal S }_j\) such that \(\kappa \) restricted to \({\mathcal S }_j\times {\mathcal S }_j\) is irreducible and \(\left. \kappa \right| _{{\mathcal S }_i\times {\mathcal S }_j}=0\) for \(i\ne j\). Then we have to modify the construction of \(\lambda ^{{{({n}})}}\) given above by putting mass \(\frac{1}{n}\) on each of the mesoparticles \(k^{{{({j,n}})}} := \lfloor n(c-b)\rfloor \mathbbm {1}_{{\mathcal S }_j}\). We omit the details. \(\square \)
The following lemma completes the proof of Proposition 6.10. Its assumptions are verified later and in more generality in Lemma 6.14.
Lemma 6.12
Let \(c\in [0,\mu ]\) with \(\varSigma (\kappa ,c) >1\). If there exists a nontrivial solution \(b^*\in [0,c]\) to (6.27) and \(\varSigma (\kappa ,b^*) = 1\), then \(F(b^*) = G(b^*)\). Consequently, Eq. (6.26) holds.
Proof
Using the fixed point equation (6.27) we can substitute \(|c-b^*| = \langle b^*, \kappa (c-b^*)\rangle \) to rewrite \(F_c(b^*)\) and \(\langle c-b^*, \log [(c-b^*)/\kappa (c-b^*)]\rangle = \langle c-b^*, \log b^*\rangle \) to rewrite \(G_c(b^*)\). Then \(F_c(b^*) = G_c(b^*)\). \(\square \)
6.4 The general supercritical case
Building on the results of the previous subsection we derive a slightly weaker result than Proposition 6.10 for the general case, which will still be enough to derive the optimal rates for the contraction principle as well as to fully optimize the rate function I.
Lemma 6.13
Fix \(c \in {\mathcal M }({\mathcal S })\) with \(c\le \mu \) and \(\varSigma (\kappa ,c) > 1\). Then
where \(b^* = b^*(c) \in {\mathcal M }({\mathcal S })\) is the minimal, nontrivial (i.e., not equal to c) solution to (2.9) and satisfies \(\varSigma (\kappa ,b^*) = 1\).
Sketch of proof
We can generalize the proof of the lower bound of Lemma 6.11 and the definition of \(F_c\) to obtain
for any \(b \in {\mathcal M }({\mathcal S })\) with \(b\le c\) and \(\varSigma (\kappa ,b) \le 1\). This relies on the admissibility of the (auxiliary) minimizers \(\lambda _b\) proved in Lemmas 6.7 and 6.8. The lower bound is obtained by writing everything with entropies as in the proof of Lemma 6.9.
Observe that if \(b^*\) is a solution of (2.9), then one can argue as in the proof of Lemma 6.12 to see that
\(\square \)
Lemma 6.14
(Solutions to (2.9)) Fix \(c\in {\mathcal M }({\mathcal S })\).

(i)
Assume that \(\kappa \) is irreducible w.r.t. c and \(\varSigma (\kappa ,c) > 1\). Then there exists exactly one solution \(b^*\) to (2.9) that satisfies \(b^* \ne c\). Further, it holds that \(\varSigma (\kappa ,b^*)=1\).

(ii)
If \(\varSigma (\kappa ,c) \le 1\), then the only solution \(b^*\) to (2.9) is given by the trivial solution \(b^*=c\).

(iii)
Assume that \(\kappa \) is reducible w.r.t. c and \(\varSigma (\kappa ,c) > 1\). Then there exists at least one solution \(b^*\) to (2.9) with \(b^*\ne c\). Moreover, there exists a unique minimal solution \(b_*\) to (2.9) (which is minimal in the sense that \(b_*\le b^*\) holds c-almost everywhere for all solutions \(b^*\) of (2.9)). Further, we have that \(\varSigma (\kappa , b^*)>1\) for all solutions \(b^*\) with \(b^*\ne b_*\) and \(\varSigma (\kappa , b_*) =1\).
Proof
We will study the existence and uniqueness of nontrivial solutions \(f^*:{\mathcal S }\rightarrow [0,1)\) to
By substituting \(b^* = (1-f^*)c\) it is easily seen that solving (6.34) is equivalent to solving (2.9).
(i) Existence: We once more reformulate (6.34) by substituting \(g^*=f^*/(1-f^*)\) (which is equivalent to \(f^*= g^*/(1+g^*)\)). Then (6.34) is equivalent to
i.e., we are searching for a fixed point of U. Note that \(g^*(s)/(1+g^*(s)) \le 1\) for all \(s\in {\mathcal S }\). Together with the fact that \(\kappa \) is nonnegative this implies that any solution \(g^*\) of (6.35) satisfies \(g^*\le T_{\kappa ,c}\mathbbm {1}\). Hence, it suffices to study the operator U on the domain \(D=\{g:{\mathcal S }\rightarrow \mathbb {R}:0\le g\le T_{\kappa ,c}\mathbbm {1}\}\). We construct a solution iteratively by defining \(g_0 := T_{\kappa ,c}\mathbbm {1}\) and \(g_m:=U(g_{m-1})\) for \(m\in \mathbb {N}\). Since the function \(x\mapsto x/(1+x)\) is strictly increasing on \([0,\infty )\) and \(\kappa \) is nonnegative, we have that \(g\le g^\prime \) implies \(U(g) \le U(g^\prime )\). Since \(g_1 \le g_0\) we can iterate this argument to show that \(g_m \le g_{m-1}\) holds for any \(m\in \mathbb {N}\). Therefore the limit \(g^*:=\lim _{m\rightarrow \infty }g_m \in D\) exists and by the continuity of U it satisfies (6.35). We claim that our assumptions on \(\kappa \) and c imply that \(g^*>0\): By the assumptions that \(\kappa \) is irreducible w.r.t. c, \({\mathcal S }\) is compact and \(\kappa \) is continuous, \(T_{\kappa ,c}\) is a positive, irreducible and compact operator. Therefore there exists a strictly positive eigenfunction v of \(T_{\kappa ,c}\) with eigenvalue \(\varSigma (\kappa ,c)>1\). Note that the function \(T_{\kappa ,c} v\) is continuous (by compactness of \({\mathcal S }\) and continuity of \(\kappa \)), hence v is continuous and by compactness of \({\mathcal S }\) it is also bounded. So without loss of generality we can pick v such that \(v(s)\in (0,1]\) for any \(s\in {\mathcal S }\). Observe that by the irreducibility assumption \(g_0=T_{\kappa ,c}\mathbbm {1}>0\). Now, pick \(\delta >0\) such that \(\varSigma (\kappa ,c)\ge 1+\delta \) and \(g_0\ge \delta v\). Observe that for any g with \(g\ge \delta v\) we have that
Hence \(g_m \ge \delta v\) holds for all \(m\in \mathbb {N}\). Consequently, \(g^* \ge \delta v>0\).
Additionally, we claim that \(g^*\) is the maximal solution to (6.35). Let \({{\tilde{g}}}^* \) be any other solution to (6.35), then we necessarily have that \({{\tilde{g}}}^* \le T_{\kappa ,c}\mathbbm {1}=g_0\). It follows by the monotonicity of U that \({{\tilde{g}}}^* = U({{\tilde{g}}}^* ) \le g_1\) and, iteratively, \(\tilde{g}^* \le g_m\) for any \(m\in \mathbb {N}\). Hence \({{\tilde{g}}}^* \le g^*\). By the equivalence of (6.34) and (6.35) and monotonicity of \(x\mapsto x/(1+x)\) we have that \(f^* = g^*/(1+g^*)\) is the maximal solution to (6.34).
Uniqueness: Assume towards a contradiction that \(f^*\) and \({{\tilde{f}}}^*\) are nontrivial solutions of (6.34) and \(f^*\ne {{\tilde{f}}}^*\) on a set \(A\subset {\mathcal S }\) with \(c(A)>0\). Without loss of generality we can assume that \(f^*\) is the maximal solution (as constructed in the existence part), i.e., \({{\tilde{f}}}^* \le f^*\) on \({\mathcal S }\) and \({{\tilde{f}}}^* < f^*\) everywhere on A. For any \(h:{\mathcal S }\rightarrow [0,1]\) put \(\varPsi (h) := (1-h)T_{\kappa ,c}h\). Then \(\varPsi (f^*) = f^*\) and \(\varPsi ({{\tilde{f}}}^*) = {{\tilde{f}}}^*\) and we have that
where the inequality relies on the fact that \(\kappa \) is nonnegative and \({{\tilde{f}}}^*\le f^*\). Note that we even have a strict inequality on the set A. Now, define \(h:= (f^*-\tilde{f}^*)/2\), then \({{\tilde{f}}}^*+h = ({{\tilde{f}}}^*+f^*)/2\) and we already argued that \(\varPsi ({{\tilde{f}}}^*+h) \ge {{\tilde{f}}}^*+h\) where the inequality is strict on A. On the other hand, as \(h\ge 0\), we get
so altogether \((1-{{\tilde{f}}}^*)T_{\kappa ,c}({{\tilde{f}}}^*+h) \ge \tilde{f}^*+h\) with strict inequality on A. Using (6.34) for \({{\tilde{f}}}^*\) and the symmetry of \(\kappa \) we get
which is a contradiction. Hence the solution to (6.34) is unique up to sets that have measure zero w.r.t. c, which implies uniqueness of \(b^*\).
We now argue that \(\varSigma (\kappa ,b^*) = 1\). Our procedure is very similar to the one used in the proof of Lemma 6.6 in [7]. Let \(w :{\mathcal S }\rightarrow \mathbb {R}\) be an eigenfunction of \(T_{\kappa , b^*}\) with eigenvalue a. Then using (6.34) and the symmetry of \(\kappa \), we get that
Hence, we either have that \(\langle c, w f^* \rangle = 0\) or \(a=1\). By the Krein–Rutman Theorem (the extension of the Perron–Frobenius Theorem to positive compact operators) the eigenfunction w corresponding to the largest eigenvalue of \(T_{\kappa , b^*}\) is nonnegative and nontrivial, so \(\langle c, w f^* \rangle > 0\) and hence \(\varSigma (\kappa , b^*)=1\). (Interestingly, we have constructed \(b^*\) in such a way that all other eigenfunctions \({\widetilde{w}}\) of \(T_{\kappa , b^*}\) satisfy \(\langle c, {\widetilde{w}} f^* \rangle = 0\).)
(ii) Let \(\varSigma (\kappa ,c)\le 1\). Assume towards a contradiction that \(f^*\) is a solution to (6.34) and \(\int _A f^*\, \textrm{d}c>0\) for some open set \(A\subset {\mathcal S }\) such that \(c(A)>0\). Using the substitution \(f^*= g/(1+g)\) we have that Eq. (6.34) is equivalent to \(T_{\kappa ,c}(g/(1+g)) =g\) and since \(\kappa \) is continuous and \({\mathcal S }\) is compact the left-hand side \(T_{\kappa ,c}(g/(1+g))\) is a continuous function, which implies that both g and \(f^*\) are continuous functions. So we can find an \(\varepsilon >0\) and a set \(A_\varepsilon \subset A\) such that \(f^* \ge \varepsilon \) on \(A_\varepsilon \) and \(c(A_\varepsilon ) >0\). Therefore,
holds on \(A_\varepsilon \) and \(T_{\kappa ,c} f^* \ge f^*\) holds on \({\mathcal S }\). This implies that \(\Vert T_{\kappa ,c} f^*\Vert _{L^2(c)} >\Vert f^*\Vert _{L^2(c)}\) and hence \(\varSigma (\kappa , c) > 1\) in contradiction to our assumption.
(iii) Since \(\kappa \) is reducible w.r.t. c, we find a decomposition of \(\text {supp}(c)\) into (countably many) disjoint sets \(S_j\), \(j\in J\), such that \(\kappa ^{{{({j}})}} = \left. \kappa \right| _{S_j\times S_j}\) is irreducible with respect to \(c^{{{({j}})}}\), the restriction of c to \(S_j\) for any \(j\in J\), and \(\left. \kappa \right| _{S_i\times S_j} =0\) holds c-almost everywhere, if \(i\ne j\). Let \(J^\prime :=\{j\in J:\varSigma (\kappa ^{{{({j}})}},c^{{{({j}})}}) > 1\}\) and note that \(J^\prime \ne \emptyset \). By (i) we get that for any \(j\in J^\prime \) there exists a function \(f^{{{({j}})}}:S_j \rightarrow [0,1)\) that solves (6.34) on \(S_j\) and \(f^{{{({j}})}} > 0 \). By (ii) we get that for any \(j\in J{\setminus } J^\prime \) the only function \(f:S_j \rightarrow [0,1)\) that solves (6.34) on \(S_j\) is equal to 0 \(c^{{{({j}})}}\)-almost everywhere. Now for \(\sigma = (\sigma _j)_{j\in J^\prime } \in \{0,1\}^{J^\prime }\) define \(f^{{{({\sigma }})}}:{\mathcal S }\rightarrow [0,1)\) by \(f^{{{({\sigma }})}}(s)= \sigma _j f^{{{({j}})}}(s)\), if \(s\in S_j\) for some \(j\in J^\prime \) and \(f^{{{({\sigma }})}}(s) = 0\) for \(s\in {\mathcal S }{\setminus }\bigcup _{j\in J^\prime }S_j\). It can now be easily checked that all solutions to (6.34) are given by
and that \({\mathcal F }\) contains at least one nontrivial solution. Write \(b^{{{({\sigma }})}} = (1-f^{{{({\sigma }})}})c\) and note that all possible solutions of (2.9) are of this form. Clearly, the minimal solution \(b^*\) of (2.9) is given via the maximal solution in \({\mathcal F }\), i.e., by choosing \(\sigma \equiv 1\).
We will now investigate the quantities \(\varSigma (\kappa , b^{{{({\sigma }})}})\) for any choice of \(\sigma \). First, let \(\sigma \) be such that there exists some \(j\in J^\prime \) with \(\sigma _j=0\). Then for \(r\in S_j\) and any function \(h:{\mathcal S }\rightarrow \mathbb {R}\)
Therefore, given an eigenfunction \(g^{{{({j}})}}\) of \(T_{\kappa ^{{{({j}})}},c^{{{({j}})}}}\) that corresponds to the eigenvalue \(\varSigma (\kappa ^{{{({j}})}}, c^{{{({j}})}})\), we can construct an eigenfunction g for \(T_{\kappa ,b^{{{({\sigma }})}}}\) with the same eigenvalue by choosing \(g=g^{{{({j}})}}\) on \(S_j\) and \(g=0\) on \({\mathcal S }{\setminus } S_j\). Hence, \(\varSigma (\kappa , b^{{{({\sigma }})}}) = \varSigma (\kappa ^{{{({j}})}}, c^{{{({j}})}})>1\). Now, consider \(\sigma \equiv 1\). Then we can argue as in (6.36) to show that \(\varSigma (\kappa ,b^{{{({\sigma }})}})=1\). \(\square \)
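The monotone iteration from part (i) of the proof can be tried out numerically in the finite-type case. The sketch below is an illustration under stated assumptions and not part of the argument: it takes (6.35) in the form \(g = T_{\kappa ,c}(g/(1+g)) =: U(g)\), as described in the text, uses an arbitrary two-type example with a positive symmetric kernel, and computes \(\varSigma \) as the Perron eigenvalue of the matrix \((\kappa (r,s)c(s))_{r,s}\) by power iteration. The function names are hypothetical.

```python
def apply_T(kappa, c, h):
    """(T_{kappa,c} h)(r) = sum_s kappa[r][s] * h[s] * c[s] on a finite type space."""
    n = len(c)
    return [sum(kappa[r][s] * h[s] * c[s] for s in range(n)) for r in range(n)]


def maximal_fixed_point(kappa, c, tol=1e-12, max_iter=100_000):
    """Iterate g_0 = T 1 and g_m = U(g_{m-1}), where U(g) = T(g / (1 + g))."""
    g = apply_T(kappa, c, [1.0] * len(c))
    for _ in range(max_iter):
        g_next = apply_T(kappa, c, [x / (1.0 + x) for x in g])
        if max(abs(a - b) for a, b in zip(g, g_next)) < tol:
            return g_next
        g = g_next
    return g


def spectral_radius(kappa, c, iters=5000):
    """Power iteration for the largest eigenvalue (assumes kappa > 0, c > 0 entrywise)."""
    v = [1.0] * len(c)
    lam = 0.0
    for _ in range(iters):
        w = apply_T(kappa, c, v)
        lam = max(w)
        v = [x / lam for x in w]
    return lam


# Illustrative supercritical example (assumption): two types.
kappa = [[2.0, 1.0], [1.0, 3.0]]
c = [0.6, 0.4]

g_star = maximal_fixed_point(kappa, c)
f_star = [x / (1.0 + x) for x in g_star]               # f* = g* / (1 + g*)
b_star = [(1.0 - f) * cr for f, cr in zip(f_star, c)]  # b* = (1 - f*) c

sigma_c = spectral_radius(kappa, c)       # larger than 1 in this example
sigma_b = spectral_radius(kappa, b_star)  # numerically close to 1
print(sigma_c, sigma_b, b_star)
```

In this example the iterates decrease to a strictly positive limit, and the resulting measure \(b^*\) has \(\varSigma (\kappa ,b^*)\) equal to 1 up to numerical error, consistent with part (i) of Lemma 6.14.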
7 Analysis of minimizers of the rate function in Theorem 1.1
In this section we provide the final steps needed for the optimization of the rate function and prove Theorems 2.3 and 2.1. Since the arguments for the remaining steps do not rely on discrete combinatorics (as was the case in Sect. 6), we will immediately work in the general setting. In Sect. 7.1 we study a constrained optimization problem for the functions \(I_\text {Me}\) and \(I_\text {Ma}\). In Sect. 7.2 we prove the explicit form of the rate functions that is derived by applying the contraction principle and formulated in Theorem 2.3. Section 7.3 presents the last step for a full optimization of the rate function I that gives Theorem 2.1. In Sect. 7.4 we derive the Flory equation that we formulated in Proposition 2.7.
7.1 The minimizers of \(I_\text {Ma}\) and \(I_\text {Me}\)
Complementary to what was done in Sect. 6 for the function \(I_\text {Mi}\) we will solve the analogous optimization problems for the functions \(I_\text {Me}\) and \(I_\text {Ma}\) defined in (1.12) and (1.13). To optimize the function \(I_\text {Me}\) it is beneficial to combine it with \(I_\text {Ma}\). We will again fix a measure \(c\in {\mathcal M }({\mathcal S })\) to formulate the constraint. In contrast to the result of Proposition 6.1 it will turn out that we do not have to distinguish between the cases \(\varSigma (\kappa ,c) \le 1\) and \(\varSigma (\kappa ,c) >1\).
Lemma 7.1
(Minimizers of \(I_\text {Me}\) and \(I_\text {Ma}\)) Let \(c\in {\mathcal M }({\mathcal S })\) be such that \(c\le \mu \). Then
If \(\kappa \) is irreducible with respect to c, then the minimizers are unique.
In order to prove the result above, we need the following lemma.
Lemma 7.2
Let \(c\in {\mathcal M }({\mathcal S })\) and let \(\kappa \) be irreducible with respect to c. Let \(\alpha \in {\mathcal A}\) be such that \(c(\alpha ) = c\) and assume that \(\alpha = \sum _{i\in I} \delta _{y^{{{({i}})}}}\) with \({\left| {I}\right| } \ge 2\) and \(I_\text {Ma}(\alpha )<\infty \). Then for any fixed \(i\in I\) there is a measurable set \(A\subset {\mathcal S }\) such that \(y^{{{({i}})}}(A) >0\) and \(\kappa (c-y^{{{({i}})}})(x)>0\) for all \(x\in A\).
Proof
Denote \(y := y^{{{({i}})}}\), \(S_1 = \text {supp}(y)\) and \(S_2 = \text {supp}(c(\alpha ) - y)\). We first study the case where the sets \(S_1\) and \(S_2\) are disjoint. Assume towards a contradiction that for y-almost every r we have that \(\kappa (c- y)(r) =0\). Then \(\left. \kappa \right| _{S_1 \times S_2} = 0\) which is equivalent to saying that \(\left. \kappa \right| _{S_1 \times {\mathcal S }{\setminus } S_1} = 0\) holds c-almost everywhere. Since \(\kappa \) is irreducible with respect to c we get that either \(c(S_1) = 0\) or \(c({\mathcal S }{\setminus } S_1) = 0\). Consequently, either \(y(S_1) = 0\) or \((c-y)({\mathcal S }{\setminus } S_1) = 0\), i.e., either \(y = 0\) or \(c - y = 0\) in contradiction to our assumptions.
Now, assume that \(S_1\) and \(S_2\) are not disjoint. In that case we can pick \(j\in I{\setminus }\{i\}\) in such a way that with \({\hat{y}} := y^{{{({j}})}}\) there exists \(r_0\in \text {supp}(y) \cap \text {supp}({\hat{y}})\), which implies that for any open neighborhood \(A_0\subset {\mathcal S }\) of \(r_0\) we have that \(y(A_0) > 0\) and \({\hat{y}} (A_0) >0\). By our assumption \(I_\text {Ma}(\alpha ) < \infty \), we have that \(-\infty < \langle {\hat{y}}, \log (1-{{\text {e}} }^{-\kappa {\hat{y}}})\rangle \). This, together with the uniform continuity of \(\kappa \) implies that we can find a neighborhood \(A_0\) of \(r_0\) such that \(\left. \kappa {\hat{y}} \right| _{A_0} >0\). Now the claim follows, since \(\kappa (c-y) \ge \kappa {\hat{y}}\) and \(y(A_0) >0\). \(\square \)
Now we are ready to prove Lemma 7.1.
Proof of Lemma 7.1
Writing \({{\widehat{I}}}_\text {Ma}(\alpha ) = I_\text {Ma}(\alpha ) - \frac{1}{2} \langle c(\alpha ), \kappa \mu \rangle \) and \({{\widehat{I}}}_\text {Me}(\nu ) = I_\text {Me}(\nu ) - \frac{1}{2} \langle \nu , \kappa \mu \rangle \) it suffices to prove Eqs. (7.1) and (7.2) for \({{\widehat{I}}}_\text {Ma}\) and \({{\widehat{I}}}_\text {Me}\) since the difference does not depend on c. Observe that \({{\widehat{I}}}_\text {Ma}(\alpha ) =A(\alpha ) + B(\alpha )\), where
Let \(\alpha \in {\mathcal A}\) with \(c(\alpha ) = c\). Note that \(\int \alpha (\textrm{d}y)\, \kappa y (r)=\kappa c(r)\) for \(r\in {\mathcal S }\). With \(\phi (u) = u\log u\) we use Jensen’s inequality to get that
where we used that \(\int \alpha (\textrm{d}y) \frac{\textrm{d}y}{\textrm{d}\mu }(r) = \frac{\textrm{d}c}{\textrm{d}\mu }(r)\). We now derive a corresponding lower bound for \(B(\alpha )\). Note that the function \(u\mapsto u/\sinh (u)=: \psi (u)\) is strictly decreasing on \([0,\infty )\) and that for any \(y\in \text {supp}(\alpha )\) we have that \(\kappa y\le \kappa c\). Therefore
This implies Eq. (7.1). Now, let \(\alpha \in {\mathcal A}\) and \(\nu \in {\mathcal M }({\mathcal S })\) be such that \(c(\alpha ) + \nu = c\). Note that
Since \(c(\alpha + \delta _\nu ) = c\) holds, we can use the estimate from before to get that \(A(\alpha + \delta _\nu ) \ge A(\delta _c)\). To get the estimate for \(B(\alpha )\), observe that we still have that \(\kappa y \le \kappa c\) for any \(y\in \text {supp}(\alpha )\). So,
where the second estimate is due to the fact that \(\psi (u) \in [0,1]\) for \(u \ge 0\) and thus \(\log \psi (\frac{1}{2} \kappa c(r)) \le 0\). Combining the estimates gives Eq. (7.2).
For both uniqueness claims we rely on Lemma 7.2 above.
To show uniqueness of the minimizer in Eq. (7.1) assume that \(\alpha \in {\mathcal A}\) with \(c(\alpha ) = c\) and \(\alpha \ne \delta _c\). Without loss of generality we can assume that \(I_\text {Ma}(\alpha ) < \infty \). Now, we only have to note that the inequality in the estimate (7.6) is a strict inequality, since by Lemma 7.2 we have that \(\kappa y < \kappa c\) holds on some set A, for which \(y(A)>0\) and the function \(\psi \) is strictly decreasing.
To show uniqueness of the minimizer in Eq. (7.2) assume that \(\alpha \in {\mathcal A}\) and \(\nu \in {\mathcal M }({\mathcal S })\) with \(\nu \ne 0\) and \(c(\alpha ) + \nu = c\). Then by the same arguments as before
holds, if \(\kappa c > \kappa c(\alpha )\) on some measurable set \(A\subset {\mathcal S }\) for which \(c(\alpha )(A) > 0\). To see that the latter condition is satisfied, we apply Lemma 7.2 to the measure \(\delta _{c(\alpha )}+ \delta _\nu \). This proves the claim. \(\square \)
7.2 Minimization for the contraction principles
Here, we will exploit the work of Sects. 6 and 7.1 to prove Theorem 2.3, which is an application of the contraction principle but also provides an explicit solution for the optimization problem. When studying the optimization problem in the large deviation principle for \(\text {Ma}_N\), we encounter a functional that combines rates coming from the microscopic and the mesoscopic part. Its optimization is derived in the following lemma.
Lemma 7.3
Fix \(c\in {\mathcal M }({\mathcal S })\) with \(c\le \mu \). For \(b\in {\mathcal M }({\mathcal S })\) with \(b\le c\) and \(\varSigma (\kappa ,b) \le 1\) let \(G_c\) be as in (6.29)
Then the following holds.

1.
If \(\varSigma (\kappa ,c) \le 1\), then
$$\begin{aligned} \min \left\{ G_c(b) :b\in {\mathcal M }({\mathcal S }), b\le c, \varSigma (\kappa ,b) \le 1\right\} = G_c(c), \end{aligned}$$(7.7) and c is the unique minimizer.

2.
If \(\varSigma (\kappa ,c) >1\), then
$$\begin{aligned} \min \left\{ G_c(b) :b\in {\mathcal M }({\mathcal S }), b\le c, \varSigma (\kappa ,b) \le 1\right\} = G_c(b^*(c)) \end{aligned}$$(7.8) and \(b^*(c)\) is the unique minimizer, which is given as the minimal, nontrivial (i.e., not equal to c) solution of (2.9), and it satisfies \(\varSigma (\kappa , b^*(c))=1\).
Proof
(1) We use that \(\frac{1}{2} \langle c, \kappa c \rangle - \frac{1}{2} \langle b, \kappa b\rangle = \langle b, \kappa (c-b)\rangle + \frac{1}{2} \langle c-b,\kappa (c-b)\rangle \) holds by the symmetry of \(\kappa \). Therefore,
We claim that \(\kappa (c-b){{\text {e}} }^{-\frac{1}{2} \kappa (c-b)} \le 1-{{\text {e}} }^{-\kappa (c-b)}\) holds pointwise. Indeed, the function \(\psi (z) := 1-{{\text {e}} }^{-z}-z{{\text {e}} }^{-\frac{z}{2}}\), \(z\ge 0\), satisfies that \(\psi (0) =0\) and \(\psi ' (z) = {{\text {e}} }^{-\frac{z}{2}}({{\text {e}} }^{-\frac{z}{2}} -(1-\frac{z}{2}))\ge 0\) for any \(z\ge 0\), implying that \(\psi (z) \ge 0\) for any \(z\ge 0\). So the claim holds, since \(\kappa (c-b) \ge 0\) holds pointwise. Therefore, we can estimate
with equality if and only if \(b=c\). The last inequality can be seen by applying Jensen’s inequality to the function \(x \mapsto x\log x\) or by noting the following: For any point \(r\in {\mathcal S }\) the term in brackets in the last line is an entropy between the Bernoulli distribution with (success) parameter \(\frac{\textrm{d}b}{\textrm{d}c}(r)\) and the Bernoulli distribution with parameter \({{\text {e}} }^{-\kappa (c-b)(r)}\) and therefore nonnegative.
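The two elementary facts used in part (1) can be checked numerically on a grid. The following is only a sanity check under stated assumptions, not a proof: it takes \(\psi (z) = 1-{{\text {e}} }^{-z}-z{{\text {e}} }^{-z/2}\) as the function from the proof, and it uses the standard relative entropy between two Bernoulli distributions with the convention \(0\log 0=0\). The function names and grids are illustrative choices.

```python
import math


def psi(z):
    """psi(z) = 1 - exp(-z) - z * exp(-z / 2)."""
    return 1.0 - math.exp(-z) - z * math.exp(-z / 2.0)


def psi_prime(z):
    """Derivative of psi: exp(-z/2) * (exp(-z/2) - (1 - z/2))."""
    return math.exp(-z / 2.0) * (math.exp(-z / 2.0) - (1.0 - z / 2.0))


def bernoulli_entropy(p, q):
    """Relative entropy of Bernoulli(p) w.r.t. Bernoulli(q), for 0 < q < 1."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log(a / b)
    return term(p, q) + term(1.0 - p, 1.0 - q)


# Grid on [0, 20] for psi, and grids on [0, 1] x (0, 1) for the entropy.
z_grid = [i / 100.0 for i in range(0, 2001)]
min_psi = min(psi(z) for z in z_grid)
min_psi_prime = min(psi_prime(z) for z in z_grid)

p_grid = [i / 50.0 for i in range(0, 51)]
q_grid = [j / 50.0 for j in range(1, 50)]
min_entropy = min(bernoulli_entropy(p, q) for p in p_grid for q in q_grid)

print(min_psi, min_psi_prime, min_entropy)
```

On these grids all three minima are nonnegative up to rounding, which matches the pointwise estimate \(\psi \ge 0\) and the nonnegativity of the entropy term.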
(2) Define \(F_c\) as the generalized analog of (6.28), i.e., for \(b\le c\)
Let \(b,b^\prime \le c\) with \(\varSigma (\kappa ,b) \le 1\) and \(\varSigma (\kappa , b^\prime ) \le 1\). We want to show that \(F_c(b^\prime ) \le G_c(b)\). By rearranging terms one can see that
Now, given the signed measure \(b-b^\prime \) we use the Hahn decomposition theorem to decompose \({\mathcal S }\) into two disjoint sets \(S_+\), \(S_-\) with \(S_+ \cup S_- = {\mathcal S }\) such that \(\delta _+ (\cdot ):=(b-b^\prime )(\cdot \cap S_+)\) and \(\delta _-(\cdot ):= -(b-b^\prime )(\cdot \cap S_-)\) are nonnegative measures and \(b-b^\prime = \delta _+ - \delta _-\). Observe that
We write \(f_{b^\prime , b} := \frac{\textrm{d}b^\prime }{\textrm{d}b} \mathbbm {1}_{S_+}\) and denote by \(\langle \cdot , \cdot \rangle _b\) the inner product on \(L^2(b)\), i.e. \(\langle f, g \rangle _b = \int f(s) g(s)\, b(\textrm{d}s)\). Note that by the symmetry of \(\kappa \) we have \(\langle f, T_{\kappa , b} g \rangle _b = \langle g, T_{\kappa ,b}f\rangle _b\), so we have that \(\varSigma (\kappa , b) = \sup _{f\ne 0} \frac{\langle f, T_{\kappa ,b}f\rangle _b}{\langle f, f \rangle _b}\le 1\). Then
An elementary analysis shows that \(\frac{1}{2} (1-x)^2\le -\log x +x -1\) for \(x\in (0,1]\) with equality if and only if \(x=1\), and since \(b^\prime \le b\) on \(S_+\) implies that \(f_{b^\prime , b}(s)\in (0,1]\) for \(s\in S_+\), we get that
Denote \(f_{b,b^\prime }:= \frac{\textrm{d}b}{\textrm{d}b^\prime }\mathbbm {1}_{S_-}\). Interchanging the roles of \(b\) and \(b^\prime \) and replacing \(S_+\) by \(S_-\), one can argue as in (7.10) to show that
An elementary analysis shows that \(\frac{1}{2} (1-x)^2 \le x\log x +1 -x\) for \(x\in (0,1]\) with equality if and only if \(x=1\), and since \(b\le b^\prime \) on \(S_-\) implies that \(f_{b, b^\prime }(s)\in [0,1]\) for \(s\in S_-\), we get that
Note that the two expressions on the right-hand sides of (7.11) and (7.12) sum up to \(\mathbb {H}(b\mid b^\prime )\), hence we have shown that
and we have equality if and only if \(b=b^\prime \). Using this in Eq. (7.9) and the fact that the first entropy term in (7.9) is always nonnegative, we get that
Note that \(G_c(b)-F_c(b^\prime )=0\) if and only if the following conditions are satisfied: (i) \(b=b^\prime \), (ii) \(b\) is a solution of (2.9) and (iii) \(\varSigma (\kappa ,b)=1\). By Lemma 6.14 the only choice is given by \(b=b^\prime =b_*\), where \(b_*\) is the unique minimal solution of (2.9). Hence, the uniqueness claim holds. \(\square \)
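The two elementary inequalities used in part (2) can be checked numerically. The minimal Python sketch below does so on a grid in \((0,1]\) and then illustrates, on a three-point type space, the kind of quadratic lower bound on a relative entropy that they yield when summed over \(S_+\) and \(S_-\). The definition \(\mathbb {H}(b\mid b^\prime ) = \langle b, \log \frac{\textrm{d}b}{\textrm{d}b^\prime }\rangle - |b| + |b^\prime |\), the vectors `b` and `b_prime`, and the exact form of the bound are assumptions of this sketch. The displayed estimates (7.11) and (7.12) may be stated differently.

```python
import math
from typing import List

def gap_plus(x: float) -> float:
    """(-log x + x - 1) - (1 - x)^2 / 2, expected to be nonnegative on (0, 1]."""
    return (-math.log(x) + x - 1.0) - 0.5 * (1.0 - x) ** 2

def gap_minus(x: float) -> float:
    """(x log x + 1 - x) - (1 - x)^2 / 2, expected to be nonnegative on (0, 1]."""
    return (x * math.log(x) + 1.0 - x) - 0.5 * (1.0 - x) ** 2

def relative_entropy(b: List[float], b_prime: List[float]) -> float:
    """Assumed definition: sum_r b_r log(b_r / b'_r) - b_r + b'_r (positive entries)."""
    return sum(p * math.log(p / q) - p + q for p, q in zip(b, b_prime))

def quadratic_lower_bound(b: List[float], b_prime: List[float]) -> float:
    """Sum of (b_r - b'_r)^2 / (2 max(b_r, b'_r)).

    On S_+ (b_r >= b'_r) this uses the first inequality with x = b'_r / b_r,
    on S_- (b_r < b'_r) the second one with x = b_r / b'_r.
    """
    return sum(0.5 * (p - q) ** 2 / max(p, q) for p, q in zip(b, b_prime))

grid = [i / 1000.0 for i in range(1, 1001)]  # x in (0, 1]
min_gap_plus = min(gap_plus(x) for x in grid)
min_gap_minus = min(gap_minus(x) for x in grid)

# Hypothetical measures on a three-point type space.
b = [0.3, 0.5, 0.2]
b_prime = [0.1, 0.6, 0.2]
entropy = relative_entropy(b, b_prime)
bound = quadratic_lower_bound(b, b_prime)

print("min gaps on grid:", min_gap_plus, min_gap_minus)
print("relative entropy:", entropy, "quadratic lower bound:", bound)
```

Both gaps vanish only at \(x=1\) on the grid, which matches the equality statement, and the relative entropy dominates the quadratic bound in this example.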
Proof of Theorem 2.3
The projection \((\lambda ,\alpha ) \mapsto \lambda \) is continuous with respect to the vague topology, so the contraction principle gives that the LDP for \(\text {Mi}_N\) holds with rate function
assuming \(c(\lambda ) \le \mu \) (the other case is trivial). By Eq. (7.2) of Lemma 7.1 we immediately get the representation for \({\mathcal I }_\text {Mi}\) claimed in Eq. (2.6).
The projection \((\lambda ,\alpha ) \mapsto \alpha \) is continuous with respect to the chosen topology, so the contraction principle gives that the LDP for \(\text {Ma}_N\) holds with rate function
assuming \(c(\alpha ) \le \mu \) (the other case is trivial). Define \(\mu _\alpha = \mu - c(\alpha )\).
Now, assume that \(\varSigma (\kappa ,\mu _\alpha )\le 1\). Then for any fixed \(c\in {\mathcal M }({\mathcal S })\) with \(c\le \mu _\alpha \) we have that \(\varSigma (\kappa ,c)\le 1\), so according to Eq. (6.1) of Proposition 6.1 we have
with \(G_{\mu _\alpha }(c)\) defined as in (6.29). By Lemma 7.3 we have that \(\min _{c\le \mu _\alpha }G_{\mu _\alpha }(c)= G_{\mu _\alpha }(\mu _\alpha )\), which implies Eq. (2.7) under the assumption \(\varSigma (\kappa ,\mu _\alpha )\le 1\).
Now, assume that \(\varSigma (\kappa ,\mu _\alpha )> 1\). Then by Proposition 6.1 we have
In the case \(\varSigma (\kappa ,c) >1\), one can use the same argument as in (7.5) to show that \(G_{c}(b^*(c)) + \langle \mu _\alpha -c,\log \frac{\mu _\alpha -c}{\kappa (\mu _\alpha -c)}\rangle > G_{\mu _\alpha }(b^*(c))\), where \(b^*= b^*(c)\) is given as in (2.9). In particular, \(\varSigma (\kappa ,b^*(c))=1\) holds, so any possible minimizer has to be in the set \(\{c:\varSigma (\kappa ,c) \le 1\}\). Now, recall that due to Lemma 7.3
This proves the claim of equation (2.7) under the assumption \(\varSigma (\kappa ,\mu _\alpha )> 1\). \(\square \)
7.3 The minimizers of I
Proof of Theorem 2.1
Note that \(\inf I(\lambda ,\alpha ) = \inf _{c\in {\mathcal M }({\mathcal S }):c\le \mu } {\mathcal J }(c)\) where for fixed \(c\in {\mathcal M }({\mathcal S })\) with \(c\le \mu \) we define
By Proposition 6.1 and Lemma 7.1 we have that
with \(b^*= b^*(c)\) characterized in (2.9).
We will start by minimizing the function \({\mathcal J }_{\le 1}\) over all \(c\in {\mathcal M }({\mathcal S })\) with \(c\le \mu \). Rearranging terms, we get that
where for \(r\in {\mathcal S }\) we defined \(\beta ^{(r)}\) and \(\gamma ^{(r)}\) to be Bernoulli distributions with success rate \(\frac{\textrm{d}c}{\textrm{d}\mu }(r)\) and \({\text {e}}^{-\kappa (\mu -c)(r)}\), respectively (note that \(c\le \mu \) implies that \(\frac{\textrm{d}c}{\textrm{d}\mu }(r) \le 1\) for all \(r\in {\mathcal S }\)). By Jensen’s inequality we have that \(H(\beta ^{(r)}\mid \gamma ^{(r)}) \ge 0\) for all choices of \(\beta ^{(r)}\) and \(\gamma ^{(r)}\), and \(H(\beta ^{(r)}\mid \gamma ^{(r)}) = 0\) if and only if \(\beta ^{(r)}=\gamma ^{(r)}\), that is, if and only if \(\frac{\textrm{d}c}{\textrm{d}\mu }(r)={\text {e}}^{-\kappa (\mu -c)(r)}\). The minimizer of \({\mathcal J }_{\le 1}\) is therefore characterized by (2.4).
In the case \(\varSigma (\kappa ,\mu ) \le 1\) (which implies \(\varSigma (\kappa ,c) \le 1\) for all \(c\in {\mathcal M }({\mathcal S })\) with \(c\le \mu \)), Lemma 4.1 states that \(\mu \) is the only solution to (2.4). Note that even though we formulated Lemma 4.1 only for the case of a finite type space, it is valid also under our general assumptions due to Theorem 6.2 and Theorem 6.7 in [7].
Now assume that \(\varSigma (\kappa ,\mu ) > 1\). Then for any \(c\in {\mathcal M }({\mathcal S })\) with \(c\le \mu \) and \(\varSigma (\kappa ,c) >1\), Lemma 7.1 implies that \(I_\text {Me}(c-b^*) + I_\text {Ma}(\delta _{\mu -c}) > I_\text {Ma}(\delta _{\mu -b^*})\), where \(b^* = b^*(c)\) is given as in (2.9) and satisfies \(\varSigma (\kappa ,b^*) = 1\). Therefore, \({\mathcal J }_{>1}(c) > {\mathcal J }_{\le 1}(b^*(c))\), which implies that the minimizer of \({\mathcal J }\) lies in the set \(\{c:\varSigma (\kappa ,c) \le 1\}\). (Note that in this way, \(\mu \) is ruled out as a minimizer, although it solves Eq. (2.4).) By the analysis of \({\mathcal J }_{\le 1}\) above, the minimizer of \({\mathcal J }\) is given by a solution to (2.4) satisfying \(\varSigma (\kappa ,c)\le 1\). Applying the second part of Lemma 4.1 (again under generalized assumptions) finishes the proof. \(\square \)
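For a finite type space the characterization of the minimizer can be explored numerically. The minimal Python sketch below assumes that (2.4) reads \(c_r = \mu _r\,{\text {e}}^{-(\kappa (\mu -c))_r}\), as suggested by the Bernoulli parameters in the proof, and that \(\varSigma (\kappa ,b)\) is the largest eigenvalue of the symmetric matrix \(\big (\sqrt{b_r}\,\kappa _{rs}\sqrt{b_s}\big )_{r,s}\). It iterates \(y \mapsto \mu (1-{\text {e}}^{-\kappa y})\) from \(y=\mu \), which decreases monotonically to the largest solution \(y\), and sets \(c=\mu -y\). The two-type kernel, the measure `mu`, the iteration counts and all function names are hypothetical choices for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def apply_matrix(a: Matrix, v: Vector) -> Vector:
    """Matrix-vector product (a v)_r = sum_s a[r][s] v[s]."""
    return [sum(a_rs * v_s for a_rs, v_s in zip(row, v)) for row in a]

def macroscopic_mass(kappa: Matrix, mu: Vector, iterations: int = 500) -> Vector:
    """Iterate y <- mu * (1 - exp(-kappa y)) starting from y = mu."""
    y = list(mu)
    for _ in range(iterations):
        ky = apply_matrix(kappa, y)
        y = [m * (1.0 - math.exp(-k)) for m, k in zip(mu, ky)]
    return y

def spectral_radius(kappa: Matrix, b: Vector, iterations: int = 2000) -> float:
    """Largest eigenvalue of sqrt(b_r) kappa_rs sqrt(b_s), by power iteration.

    The iteration runs on A + I (shifted by the identity) to avoid
    oscillation for bipartite-like kernels; the shift is removed at the end.
    """
    n = len(b)
    shifted = [
        [math.sqrt(b[r]) * kappa[r][s] * math.sqrt(b[s]) + (1.0 if r == s else 0.0)
         for s in range(n)]
        for r in range(n)
    ]
    v = [1.0 / math.sqrt(n)] * n
    top = 1.0
    for _ in range(iterations):
        w = apply_matrix(shifted, v)
        top = math.sqrt(sum(x * x for x in w))  # norm of (A + I) v with |v| = 1
        v = [x / top for x in w]
    return top - 1.0

# Hypothetical two-type example with a symmetric kernel.
kappa = [[3.0, 1.0], [1.0, 3.0]]
mu = [0.5, 0.5]

y_star = macroscopic_mass(kappa, mu)
c_star = [m - y for m, y in zip(mu, y_star)]
sigma_mu = spectral_radius(kappa, mu)
sigma_c = spectral_radius(kappa, c_star)

print("Sigma(kappa, mu) =", sigma_mu)
print("c* =", c_star, "Sigma(kappa, c*) =", sigma_c)
```

In this example \(\varSigma (\kappa ,\mu )\) exceeds 1 while the computed \(c\) solves the assumed fixed-point equation and has \(\varSigma (\kappa ,c)\) below 1, consistent with the statement that \(\mu \) itself is ruled out as a minimizer in the supercritical case.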
Remark 7.4
(Reducibility) Theorems 2.1 and 2.3 are proved under the assumptions of Theorem 1.1, in particular when \(\kappa \) is irreducible with respect to \(\mu \). We see, however, that this condition does not play any role in the minimization of Proposition 6.1 and of Lemma 7.1. It is indeed Theorem 2.6 that excludes the admissibility of a minimizer of the form \((\lambda _{c^*},\delta _{\mu -c^*})\) when \(\varSigma (\kappa ,\mu )>1\) for \(\widetilde{I}\), as \(\alpha ^*=\delta _{\mu -c^*}\) may not be connectable. It is straightforward to see that the optimal macroscopic mass in this case takes the form \(\widetilde{\alpha }^*=\sum _n \delta _{y^{(n)}}\), with \(y^{(n)}(\cdot )=(\mu -c^*)(\cdot \cap S^{(n)})\), for each irreducible class \(S^{(n)}\).
7.4 The Flory equation
As we explained in Sect. 2.4 the graph model studied in this paper has an important connection with a certain inhomogeneous coagulation process. In this section we prove that the statistics of the limiting microscopic cluster distribution, i.e., the minimizer of the rate function I, satisfy the Flory equation (2.14), the related deterministic PDE, over the entire time interval \([0,\infty )\), before and after the gelation time \(t_\textrm{c}=1/\varSigma (\kappa ,\mu )\). We prove it in the case of a finite type set \({\mathcal S }\).
We fix an irreducible symmetric matrix \(\kappa \) on the finite type space \({\mathcal S }\). Recall from Proposition 6.2 the explicit formula
for the minimizer of the microscopic rate function \(I_\text {Mi}\) (see also (6.3)). With this notation we stressed the dependence on \(\kappa \), since we consider now \(t\kappa \) instead of \(\kappa \), where \(t\in [0,\infty )\) is a time instant, writing \(\lambda (\mu ;t\kappa )\). Note that \(\lambda (\mu ; t \kappa )\) is the minimizer both for \(t<t_{\textrm{c}}\) and for \(t\ge t_{\textrm{c}}\), as the characteristic equation \(c^*_r(t)\,{\text {e}}^{-t(\kappa c^*(t))_r}=\mu _r\,{\text {e}}^{-t(\kappa \mu )_r}\) (where \(c^*(t)\) is the \(c^*\) of Proposition 6.2 for \(t\kappa \) instead of \(\kappa \)) ensures that \(\lambda (c^*(t); t\kappa ) = \lambda (\mu ; t\kappa )\). Note that \(c(\lambda (c^*(t); t\kappa )) = c^*(t)\). We now show that the function \(t\mapsto \lambda (\mu ; t \kappa )\) solves the Flory equation that corresponds to our model.
Lemma 7.5
The function \(t\mapsto \lambda (\mu ; t \kappa )\) is a solution of
with initial condition \(\lambda (\mu ; 0) = \sum _{r\in {\mathcal S }} \mu _r \mathbbm {1}_{\{{{\textbf {e}}}_r\}}\).
Proof
The initial condition is easily checked.
Since any tree contributing to the term \(\tau (k;t\kappa )\) has exactly \(\left| k\right| -1\) edges, we can rewrite \(\lambda (\mu ;t\kappa )\) for any \(t\ge 0\) as
Abbreviating \(\lambda (t) := \lambda (\mu ;t \kappa )\), we get that
Now, we study the first summand of the r.h.s. of (7.17). By first inserting (7.18) and then using the recursive Eq. (6.7) from Lemma 4.3, we have that
Furthermore, we have that
which implies the claim. \(\square \)
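The statement of Lemma 7.5 can be illustrated in the classical single-type case. The minimal Python sketch below assumes \({\mathcal S }\) is a single point with \(\mu =1\) and \(\kappa =1\), that the minimizer then reduces to the classical profile \(\lambda _k(t) = k^{k-2}t^{k-1}{\text {e}}^{-kt}/k!\), and that the Flory equation reduces to \(\frac{\textrm{d}}{\textrm{d}t}\lambda _k = \frac{1}{2}\sum _{i+j=k} ij\,\lambda _i\lambda _j - k\lambda _k\). These specializations are assumptions of the sketch and are not taken from the displayed formulas of this section. The time derivative is approximated by a central finite difference, and the function names are hypothetical.

```python
import math

def flory_profile(k: int, t: float) -> float:
    """Assumed single-type profile k^{k-2} t^{k-1} e^{-k t} / k! (mu = 1, kappa = 1)."""
    return k ** (k - 2) * t ** (k - 1) * math.exp(-k * t) / math.factorial(k)

def flory_rhs(k: int, t: float) -> float:
    """Right-hand side (1/2) sum_{i+j=k} i j lambda_i lambda_j - k lambda_k."""
    gain = 0.5 * sum(
        i * (k - i) * flory_profile(i, t) * flory_profile(k - i, t)
        for i in range(1, k)
    )
    loss = k * flory_profile(k, t)  # the total mass, gel included, is taken to be 1
    return gain - loss

def time_derivative(k: int, t: float, h: float = 1e-6) -> float:
    """Central finite-difference approximation of d/dt lambda_k(t)."""
    return (flory_profile(k, t + h) - flory_profile(k, t - h)) / (2.0 * h)

def max_residual(t: float, k_max: int = 8) -> float:
    """Largest |d/dt lambda_k - rhs_k| over cluster sizes k = 1, ..., k_max."""
    return max(
        abs(time_derivative(k, t) - flory_rhs(k, t)) for k in range(1, k_max + 1)
    )

# The gelation time in this single-type case is t_c = 1.
for t in (0.5, 1.0, 2.0):
    print("t =", t, "max residual =", max_residual(t))
```

The residual stays at the level of the finite-difference error both before and after \(t_{\textrm{c}}=1\), which is the single-type instance of the claim that the equation holds over the entire time interval.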
Notes
We use “cluster” and “component” synonymously.
References
Aldous, D.J.: Deterministic and stochastic models for coalescence (aggregation and coagulation): a review of the mean-field theory for probabilists. Bernoulli 5(1), 3–48 (1999)
Andreis, L., König, W., Patterson, R.I.: A large-deviations principle for all the cluster sizes of a sparse Erdős–Rényi graph. Random Struct. Algorithms (2019)
Baldasso, R., Oliveira, R.I., Pereira, A., Reis, G.: Large deviations for marked sparse random graphs with applications to interacting diffusions (2022). https://arxiv.org/pdf/2204.08789.pdf
Bernardi, O., Morales, A.H.: Counting trees using symmetries. J. Comb. Theory Ser. A 123(1), 104–122 (2014). https://doi.org/10.1016/j.jcta.2013.12.001
Bhamidi, S., van der Hofstad, R., van Leeuwaarden, J.S.H.: Scaling limits for critical inhomogeneous random graphs with finite third moments. Electron. J. Probab. 15(54), 1682–1703 (2010). https://doi.org/10.1214/EJP.v15-817
Bhamidi, S., van der Hofstad, R., van Leeuwaarden, J.S.H.: Novel scaling limits for critical inhomogeneous random graphs. Ann. Probab. 40(6), 2299–2361 (2012). https://doi.org/10.1214/11-AOP680
Bollobás, B., Janson, S., Riordan, O.: The phase transition in inhomogeneous random graphs. Random Struct. Algorithms 31(1), 3–122 (2007)
Bordenave, C., Caputo, P.: Large deviations of empirical neighborhood distribution in sparse random graphs. Probab. Theory Relat. Fields 163(1–2), 149–222 (2015)
Borgs, C., Chayes, J., Gaudio, J., Petti, S., Sen, S.: A large deviation principle for block models. arXiv preprint arXiv:2007.14508 (2020)
Borgs, C., Chayes, J.T., Dhara, S., Sen, S.: Limits of sparse configuration models and beyond: graphexes and multigraphexes. Ann. Probab. 49(6), 2830–2873 (2021)
Chakrabarty, A., Chakraborty, S., Hazra, R.S.: Eigenvalues outside the bulk of inhomogeneous Erdős–Rényi random graphs. J. Stat. Phys. 181(5), 1746–1780 (2020). https://doi.org/10.1007/s10955-020-02644-7
Chakraborty, S., van der Hofstad, R., Hollander, F.D.: Sparse random graphs with many triangles. arXiv preprint arXiv:2112.06526 (2021)
Chatterjee, S.: An introduction to large deviations for random graphs. Bull. Am. Math. Soc. 53(4), 617–642 (2016). https://doi.org/10.1090/bull/1539
Chatterjee, S., Varadhan, S.: The large deviation principle for the Erdős–Rényi random graph. Eur. J. Comb. 32(7), 1000–1017 (2011). https://doi.org/10.1016/j.ejc.2011.03.014
Cook, N.A., Dembo, A.: Large deviations of subgraph counts for sparse Erdős–Rényi graphs. Adv. Math. 373, 107289 (2020). https://doi.org/10.1016/j.aim.2020.107289
Crane, E., Ráth, B., Yeo, D.: Age evolution in the mean field forest fire model via multitype branching processes. Ann. Probab. 49(4), 2031–2075 (2021)
Delgosha, P., Anantharam, V.: A notion of entropy for stochastic processes on marked rooted graphs. arXiv preprint arXiv:1908.00964 (2019)
Dembo, A., Lubetzky, E.: A large deviation principle for the Erdős–Rényi uniform random graph. Electron. Commun. Probab. 23, 1–13 (2018)
Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications, Stochastic Modelling and Applied Probability, vol. 38. Springer, Berlin (2010) (Corrected reprint of the second (1998) edition). https://doi.org/10.1007/978-3-642-03311-7
Devroye, L., Fraiman, N.: Connectivity of inhomogeneous random graphs. Random Struct. Algorithms 45(3), 408–420 (2014)
Ganguly, S., Hiesmayr, E., Nam, K.: Upper tail behavior of the number of triangles in random graphs with constant average degree. arXiv preprint arXiv:2202.06916 (2022)
Georgii, H.O.: Gibbs Measures and Phase Transitions. De Gruyter Studies in Mathematics, vol. 9. Walter de Gruyter & Co., Berlin (1988). https://doi.org/10.1515/9783110850147
Gessel, I.M.: A combinatorial proof of the multivariable Lagrange inversion formula. J. Comb. Theory Ser. A 45(2), 178–195 (1987)
Gilbert, E.N.: Random graphs. Ann. Math. Statist. 30(4), 1141–1144 (1959)
Jansen, S., Kuna, T., Tsagkarogiannis, D.: Virial inversion and density functionals. J. Funct. Anal. 284(1), 109731 (2023). https://doi.org/10.1016/j.jfa.2022.109731
Jansen, S., Kuna, T., Tsagkarogiannis, D.: Lagrange inversion and combinatorial species with uncountable color palette. In: Annales Henri Poincaré, pp. 1–36. Springer (2021)
Kovchegov, Y., Otto, P.T.: Multidimensional Lambert–Euler inversion and vector-multiplicative coalescent processes. arXiv preprint arXiv:2107.13162 (2021)
Markering, M.: The large deviation principle for inhomogeneous Erdős–Rényi random graphs. J. Theor. Probab. (2022). https://doi.org/10.1007/s10959-022-01181-1
Merle, M., Normand, R.: Self-organized criticality in a discrete model for Smoluchowski’s equation. arXiv preprint arXiv:1410.8338 (2014)
Normand, R., Zambotti, L.: Uniqueness of post-gelation solutions of a class of coagulation equations. Ann. Inst. H. Poincaré Anal. Non Linéaire 28(2), 189–215 (2011). https://doi.org/10.1016/j.anihpc.2010.10.005
Norris, J.R.: Cluster coagulation. Commun. Math. Phys. 209(2), 407–435 (2000). https://doi.org/10.1007/s002200050026
O’Connell, N.: Some large deviation results for sparse random graphs. Probab. Theory Relat. Fields 110(3), 277–285 (1998)
Ráth, B., Tóth, B., et al.: Erdős–Rényi random graphs + forest fires = self-organized criticality. Electron. J. Probab. 14, 1290–1327 (2009)
Söderberg, B.: General formalism for inhomogeneous random graphs. Phys. Rev. E 66(6), 066121 (2002)
Stepanov, V.E.: On the probability of connectedness of a random graph \(\cal{G} _m(t)\). Theory Probab. Appl. 15(1), 55–67 (1970)
van der Hofstad, R.: Critical behavior in inhomogeneous random graphs. Random Struct. Algorithms 42(4), 480–508 (2013). https://doi.org/10.1002/rsa.20450
Yeo, D.: Frozen percolation on inhomogeneous random graphs. arXiv preprint arXiv:1810.02750 (2018)
Acknowledgements
This research has been funded by the Deutsche Forschungsgemeinschaft (DFG) through grant CRC 1114 “Scaling Cascades in Complex Systems”, Project C08, and Grant SPP2265 “Random Geometric Systems”, Project P01.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Andreis, L., König, W., Langhammer, H. et al. A large-deviations principle for all the components in a sparse inhomogeneous random graph. Probab. Theory Relat. Fields 186, 521–620 (2023). https://doi.org/10.1007/s00440-022-01180-7
DOI: https://doi.org/10.1007/s00440-022-01180-7
Keywords
 Inhomogeneous random graph
 Erdős–Rényi random graph
 Sparse random graph
 Empirical measures of components
 Large deviations
 Projective limits
 Giant cluster phase transition
 Asymptotics for connection probabilities
 Spatial coagulation model
 Flory equation
 Stochastic block model