1 Introduction

Consider the following system of a countable number of ordinary differential equations

$$\begin{aligned} \frac{\textrm{d}w_n}{\textrm{d}t} = \frac{1}{2} \sum \limits _{k+l = n} \textrm{K}(k, l) w_k w_l - \sum \limits _{k=1}^\infty \textrm{K}(n, k) w_n w_k, \qquad n \in \mathbb {N}, \end{aligned}$$
(1.1)

subject to initial conditions \(\{w_n(0)\}_{n \in {\mathbb N}}\). Originally introduced by Smoluchowski [24] as a model for the evolution of the size distribution \(w_n(t)\) of merging particles (also called colloids or clusters), this equation became one of the standard equations of mathematical physics. If we interpret n as the particle size and \(w_n\) as a (non-normalised) size distribution, the two terms on the right-hand side of equation (1.1) represent, respectively, the rate of production of new particles of size \(n=k+l\) through the merging of two smaller particles of sizes \(k, l \in {\mathbb N}\), and the rate of consumption of particles of size n as they are themselves incorporated into larger particles [3, 10]. The vast popularity of equation (1.1) for modelling (see, for example, the review articles [3, 22] and the references therein) is due to the flexibility one gains by specifying the kernel \(\textrm{K}(k,l)\) to reflect a particular physical mechanism for the merging of particles. Among the most popular are the constant \(\textrm{K}(k, l) = 1,\) additive \(\textrm{K}(k, l) = k+l\) and multiplicative \(\textrm{K}(k, l) = kl\) kernels, though kernels with fractional exponents also appear [3].
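For numerical intuition, one can truncate system (1.1) at a maximal particle size and integrate it directly. The following illustrative sketch (assuming NumPy and SciPy; the truncation size and time horizon are arbitrary choices, not taken from the paper) does this for the multiplicative kernel \(\textrm{K}(k, l) = kl\) with monodisperse initial data \(w_1(0) = 1\):

```python
# Illustrative sketch: integrate a finite truncation of (1.1) with the
# multiplicative kernel K(k, l) = k*l and monodisperse initial data w_1(0) = 1.
# The truncation size N_MAX and the integration horizon are arbitrary choices.
import numpy as np
from scipy.integrate import solve_ivp

N_MAX = 200                       # largest particle size retained
sizes = np.arange(1, N_MAX + 1)

def rhs(t, w):
    # loss term: sum_k K(n, k) w_n w_k = n * w_n * sum_k k * w_k
    loss = sizes * w * np.sum(sizes * w)
    # gain term: (1/2) sum_{k+l=n} K(k, l) w_k w_l
    gain = np.zeros_like(w)
    for n in range(2, N_MAX + 1):
        k = np.arange(1, n)
        gain[n - 1] = 0.5 * np.sum(k * (n - k) * w[k - 1] * w[n - k - 1])
    return gain - loss

w0 = np.zeros(N_MAX)
w0[0] = 1.0                       # monodisperse initial condition
sol = solve_ivp(rhs, (0.0, 0.9), w0, rtol=1e-8, atol=1e-12)
# the first moment stays close to 1 before the gelation time t = 1
print("first moment at t = 0.9:", np.sum(sizes * sol.y[:, -1]))
```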

In this paper, we study a more general multicomponent (i.e. multidimensional) Smoluchowski equation in which the particle sizes are vector-valued and the coagulation kernel is a symmetric bilinear form, \(\textrm{K}(\textbf{k}, \textbf{l}) = \textbf{k}^\top A \textbf{l}\). This kernel generalizes the multiplicative kernel of the monocomponent setting. The multicomponent coagulation equation then reads:

$$\begin{aligned} \frac{\textrm{d}w_{\textbf{n}}}{\textrm{d}t} = \frac{1}{2} \sum \limits _{\textbf{k}+\textbf{l} = \textbf{n}} \textrm{K}(\textbf{k}, \textbf{l}) w_{\textbf{k}} w_{\textbf{l}} - \sum \limits _{\textbf{k} \in \mathbb {N}_0^m} \textrm{K}(\textbf{n}, \textbf{k}) w_{\textbf{n}} w_{\textbf{k}}, \qquad \textbf{n} \in \mathbb {N}_0^m, \end{aligned}$$
(1.2)

with \(\{w_{\textbf{n}}(0)\}\) being the initial distribution. Even though system (1.2) consists of ordinary differential equations (ODEs), the fact that it contains countably infinitely many equations means that some aspects of its behaviour are closer to those of partial differential equations (PDEs). This similarity explains why we observe phenomena such as gelation and localization in the Smoluchowski equation, and it justifies the need for more subtle analysis techniques. The similarity to PDEs can be made formal: as discussed in the next section, it is possible to map our system to a vector-valued inviscid Burgers' PDE in \(1+d\) dimensions by using a functional transform. Although Burgers' equation does not, in general, admit explicit analytical solutions even in \(1+1\) dimensions, we have shown in our previous work [17] that its solutions can generally be represented using a branching process from probability theory. In the present paper we extend the branching process representation technique to multidimensional PDEs and apply it to the multicomponent Smoluchowski coagulation equation. The approach provides a probabilistic route to the analysis of the Smoluchowski equation with multiplicative kernel and yields short proofs.

Equation (1.2) has appeared in many studies with different kernels, such as additive [8], rank-one multiplicative [9], and strictly sublinear ones [11,12,13, 25]. In earlier studies [2, 20, 21], the multiplicative multicomponent kernel was treated through a combinatorial approach. The continuum limits for coagulation with multiplicative kernel were established in [1, 18, 23], which also show that the gelation time can be formulated as a simple eigenvalue problem.

More recently, an overarching combinatorial study of coalescence with multiplicative kernel [2] described the evolution of clusters of all sizes by pointing out the analogy with sparse inhomogeneous random graphs. Intuitively, if particles are viewed as nodes of the graph, the clusters become its connected components. The graph is random because each coagulation event places an additional random edge joining two components. The solution of the equation is then obtained through a large deviations principle for the connected components of the graph in the limit of a large number of vertices. Alternatively, one can compute the size of a connected component via a depth- or breadth-first search, which converges to a branching process as the number of vertices tends to infinity [5, Theorem 11.6.1]. This connection between the multicomponent Smoluchowski equation with multiplicative kernel and the multi-type branching process was also noticed in [2, Lemma 4.7], which anticipated some of our current results, for instance, our Corollaries 2.6 (see [2, Remark 4.6]) and 2.8 (see [2, Lemma 4.5]). Finally, it is worth mentioning that both coagulation and branching processes are also being studied in the continuous setting [6, 19], and generalising our representation method to continuous branching remains an interesting direction for a follow-up study.

The structure of this paper is as follows. After establishing the equivalence between equation (1.2) and a multidimensional inviscid Burgers' equation in Theorem 2.2, we show that the solution of the former can be written as an expectation with respect to a certain multi-type branching process in Theorem 2.3, which constitutes the core of our analysis technique. This equivalence holds up to a certain critical time \(T_c\), which marks the breakdown of mass conservation in the coagulation equation, a phenomenon known as gelation. We characterise the gelation time \(T_c\) by linking it to the criticality of the multi-type branching process. Finally, in Corollary 2.8, we show that large clusters feature localization: their composition in terms of fractions of particles of different types concentrates on a specific vector, which we characterise through a variational problem. Section 2 discusses the results and assumptions in detail, while the proofs follow in Sect. 3.

2 Definitions and Results

Let \(m \in {\mathbb N}\) and consider an \(m \times m\) symmetric, irreducible matrix A. Let

$$\begin{aligned} \textrm{K}(\textbf{k}, \textbf{l}) := \textbf{k}^\top A \textbf{l}, \qquad \text { for any } \textbf{k}, \textbf{l} \in {\mathbb N}_0^m \end{aligned}$$
(2.1)

be the coagulation kernel associated to A. We also require basic assumptions on the initial condition:

  • \(w_{\textbf{n}}(0) \ge 0\) for all \(\textbf{n} \in {\mathbb N}_0^m\),

  • \(\sum _{\textbf{n} \in {\mathbb N}_0^m} |\textbf{n}| w_{\textbf{n}}(0) = 1\),

  • \(\{w_{\textbf{n}}(0)\}_{\textbf{n}\in {\mathbb N}_0^m}\) has non-empty, finite support,

  • \(w_{\textbf{0}}(0) = 0\).

Additionally, we require the initial conditions to be monodisperse. That is, \(w_{\textbf{n}}(0)\) is non-zero only for the multi-indices \(\textbf{n} = \textbf{e}_i\), where \(\textbf{e}_i\) is the i-th standard basis vector. In this case, we define \(p_{i}:=w_{\textbf{e}_i}(0)\). We comment further on this assumption in Sect. 2.1.
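For instance, for \(m = 2\) the monodisperse assumption means that \(w_{(1,0)}(0) = p_1\) and \(w_{(0,1)}(0) = p_2\), with \(p_1 + p_2 = 1\) by the normalisation assumption above, while \(w_{\textbf{n}}(0) = 0\) for every other \(\textbf{n}\).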

Having set the assumptions, we refer to equation (1.2) supplied with kernel \(\textrm{K}(\textbf{k}, \textbf{l})\) and initial condition \(w_{\textbf{n}}(0)\) as the multicomponent Smoluchowski equation. Next, we refer to the vector

$$\begin{aligned} (\textbf{m}(t))_i := \sum \limits _{\textbf{n}\in {\mathbb N}_0^m} n_i w_{\textbf{n}}(t), \; i \in [m], \end{aligned}$$
(2.2)

as the total mass vector of the multicomponent system at time t, and we say that equation (1.2) exhibits gelation at time \(T_c \in (0, \infty )\) if \(\textbf{m}(t) = \textbf{m}(0)\) for all \(t \in [0, T_c)\) and \((\textbf{m}(t))_i < (\textbf{m}(0))_i\) for some \(i \in [m]\) and some \(t > T_c\).
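For example, in the classical monocomponent case \(m = 1\) with multiplicative kernel \(A = 1\) and monodisperse initial condition \(p_1 = 1\), gelation occurs at the well-known critical time \(T_c = 1\), consistent with the characterisation \(T_c = \Vert AP\Vert _2^{-1}\) obtained in Theorem 2.3 below.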

Using the fact that the mass vector is conserved before gelation, we can simplify the coagulation equation (1.2) to obtain

$$\begin{aligned} \frac{\textrm{d}w_{\textbf{n}}}{\textrm{d}t} = \frac{1}{2} \sum \limits _{\textbf{k}+\textbf{l} = \textbf{n}} \textrm{K}(\textbf{k}, \textbf{l}) w_{\textbf{k}} w_{\textbf{l}} - \textbf{n}^\top A \textbf{m}(0) w_{\textbf{n}}, \text { for all }\textbf{n} \in \mathbb {N}_0^m, \end{aligned}$$
(2.3)

which we call the reduced multicomponent Smoluchowski equation.

Proposition 2.1

The solutions of equations (1.2) and (2.3) coincide for \(t < T_c\) and both equations undergo gelation at \(T_c\).

The proposition is proved in Sect. 3.1. Our task is now to analyse the reduced equation (2.3), which we do by mapping it to a multidimensional PDE. Let \(\{w_{\textbf{n}}(t)\}_{\textbf{n} \in {\mathbb N}_0^m}\) be the solution of the multicomponent Smoluchowski’s equation (2.3) before gelation. Consider the multivariate generating function (GF)

$$\begin{aligned} U(t, \textbf{x}) := \sum \limits _{\textbf{n} \in \mathbb {N}_0^m} w_{\textbf{n}}(t) e^{-\textbf{x}\cdot \textbf{n}}, \end{aligned}$$
(2.4)

where \(\textbf{x} \in [0, \infty )^m\) and \(t \ge 0\). Let \(\textbf{u} = -\nabla U\), where \(\nabla \) is the gradient with respect to the coordinates \(x_1, \ldots , x_m\). We write \(\nabla \textbf{u}\) for the Jacobian of \(\textbf{u}\) with respect to the spatial variables \(\textbf{x}\). Finally, observe that \( \textbf{m}(t) = -\nabla U(t, \textbf{0}). \) We have the following theorem:

Theorem 2.2

The vector \(\textbf{u}(t,\textbf{x}) = -\nabla U(t,\textbf{x})\), with U the above-defined GF of the solution of the multicomponent coagulation equation (2.3), solves the following PDE:

$$\begin{aligned} \frac{\partial }{\partial t} \textbf{u}(t, \textbf{x}) = -(\nabla \textbf{u}(t, \textbf{x})) A(\textbf{u}(t, \textbf{x})-\textbf{m}(t)), \qquad \textbf{x} \in [0, \infty )^m \end{aligned}$$
(2.5)

for \(t < T_c\), with initial condition \(\textbf{u}(0, \textbf{x}) = -\nabla U (0, \textbf{x})\), where \(T_c\) is the gelation time of (2.3). Moreover, for any collection of coefficients \(\{w_{\textbf{n}}(t)\}_{\textbf{n} \in {\mathbb N}_0^m}\) such that

$$\begin{aligned} \textbf{u}(t, \textbf{x}) = \sum \limits _{\textbf{n} \in {\mathbb N}_0^m} \textbf{n} w_{\textbf{n}}(t) e^{-\textbf{n} \cdot \textbf{x}} \end{aligned}$$
(2.6)

solves (2.5), \(\{w_{\textbf{n}}(t)\}_{\textbf{n} \in {\mathbb N}_0^m}\) is a solution of (2.3) with initial condition \(\{w_{\textbf{n}}(0)\}_{\textbf{n} \in {\mathbb N}_0^m}\).

Note that the theorem does not imply uniqueness of the solution. However, we do establish uniqueness for equation (2.3) later on; see Corollary 2.6 and Remark 2.7.

This theorem is proven in Sect. 3.2. The theorem states that if we manage to find a solution of the PDE (2.5) that has a power series expansion of the form (2.6), we can transform it into a solution of the multicomponent Smoluchowski equation (2.3). We will now show that such a power series solution is given by the overall total progeny distribution of a certain multi-type branching process determined by the initial conditions and the kernel matrix A.
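As an illustration of Theorem 2.2, in the monocomponent case \(m = 1\) with \(A = 1\), equation (2.5) reduces to the scalar equation

$$\begin{aligned} \frac{\partial u}{\partial t}(t, x) = -\frac{\partial u}{\partial x}(t, x)\,\big (u(t, x) - m(t)\big ), \end{aligned}$$

which is the inviscid Burgers' equation for \(u - m(t)\) (recall that \(m(t) = m(0)\) before gelation), in line with the monocomponent setting of [17].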

First, we give the working definition of the branching process, as it appears, for example, in [16]. Let \(X_{n,i}\) be independent and identically distributed copies of a non-negative discrete random variable X, indexed by \(n,i\in \mathbb N_0\). We refer to the probability mass function \(\mathbb P[X=k]\) as the offspring distribution. Then the recurrence equation

$$Z_{n+1} = \sum _{i=1}^{Z_n} X_{n,i}$$

with \(Z_0 = 1\) is called a simple branching process (\(Z_n\) being the number of individuals in generation n), and the random variable \(T:=\sum _{n=0}^{\infty } Z_n\) is called its total progeny. A more general multi-type branching process is defined in a similar fashion, by replacing the random variable X with a collection of random vectors indexed by \(k\in [m]:=\{1,2,\dots ,m\}\). For a fixed type k, let \(\textbf{X}_{k,(n,i)}\) be a collection of independent and identically distributed copies of a non-negative discrete random vector \(\textbf{X}_{k}\), which we call the random offspring of the i-th individual of type k in generation n. The multi-type branching process started from type i is given by the recurrence equation

$$\begin{aligned} \textbf{Z}_{n+1} = \sum \limits _{k=1}^m\sum \limits _{i=1}^{Z_{n, k}}\textbf{X}_{k,(n, i)}, \end{aligned}$$

initialized with the standard unit vector, \(\textbf{Z}_0 = \textbf{e}_i\), where \(\textbf{Z}_n = (Z_{n, 1}, Z_{n, 2}, \ldots ,Z_{n, m})\). Again, \(\textbf{Z}_n\) records the number of individuals of each type in generation n. Then, the random variable \(\textbf{T}^{(i)}:= \sum _{n=0}^\infty \textbf{Z}_n\) is called the total progeny of the multi-type branching process started from type i. By choosing the starting type k at random with probability \(p_k\), we obtain the overall total progeny \(\textbf{T}\), the mixture with law \({\mathbb P}(\textbf{T} = \textbf{n}) = \sum _{k = 1}^m p_k {\mathbb P}(\textbf{T}^{(k)} = \textbf{n})\).

In general, \(\textbf{T}\) is not a proper random variable, in the sense that it may take the value \(\infty \) with positive probability. We hence define the extinction probability \(\xi = {\mathbb P}(|\textbf{T}|<\infty )\). When \(\xi =1\) we say that the branching process goes extinct almost surely (a.s.). Consider the matrix of expected offspring with elements \(M_{ij} = {\mathbb E}[(\textbf{X}_{i})_j]\), the expected number of type-j children of a type-i individual. It can be shown [4, 16] that the branching process goes extinct a.s. if and only if \(\Vert M\Vert _2 \le 1\). When \(\Vert M\Vert _2 = 1\), the branching process is said to be critical.
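For instance, for a single-type process (\(m=1\)) with \(\textrm{Poi}(\mu )\) offspring distribution, \(M = \mu \) and the process goes extinct a.s. precisely when \(\mu \le 1\).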

Now we are ready to define the specific multi-type branching process that will be used in the rest of the paper. Let the root have type \(k \in [m]\) with probability \(p_{k}\). We define the offspring vector \(\textbf{X}_k\) by giving its probability generating function

$$\begin{aligned} G_{\textbf{X}_k}(\textbf{s}) = \prod \limits _{l=1}^m \exp (t A_{kl} p_l (s_l-1)), \end{aligned}$$
(2.7)

incorporating the kernel matrix A and the initial conditions. Note that, by varying the time parameter t, we can move between extinction and non-extinction of the branching process. We define \(T_c\) to be the time at which the branching process becomes critical. The overall total progeny of this branching process is denoted by \(\textbf{T}\), and the total progeny conditioned on starting from type \(k \in [m]\) is denoted by \(\textbf{T}^{(k)}\).
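By (2.7), an individual of type k produces a \(\textrm{Poi}(t A_{kl} p_l)\)-distributed number of children of type l, independently over \(l \in [m]\). This makes the total progeny straightforward to sample; the following minimal Monte Carlo sketch (assuming NumPy, with arbitrary illustrative values of A, p and t that are not taken from the paper) estimates its probability mass function:

```python
# Illustrative sketch: sample the total progeny of the multi-type branching
# process with Poisson offspring law (2.7).  The values of A, p and t below
# are arbitrary examples.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])          # symmetric, irreducible kernel matrix
p = np.array([0.5, 0.5])            # monodisperse initial condition
t = 0.2                             # a time below the critical time T_c = 2/3

def total_progeny(root_type):
    """Explore the branching tree with a stack and return the progeny vector."""
    counts = np.zeros(len(p), dtype=int)
    stack = [root_type]
    while stack:
        k = stack.pop()
        counts[k] += 1
        # number of type-l children of a type-k individual: Poi(t * A[k, l] * p[l])
        children = rng.poisson(t * A[k] * p)
        for l, c in enumerate(children):
            stack.extend([l] * c)
    return counts

# overall total progeny: the root type is drawn with probabilities p
samples = np.array([total_progeny(rng.choice(len(p), p=p)) for _ in range(10_000)])
print("estimate of P(T = (2, 1)):", np.mean(np.all(samples == (2, 1), axis=1)))
```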

Theorem 2.3

Consider the multi-type branching process as defined above. The PDE (2.5) is solved by \(\textbf{u}(t, \textbf{x})\) where \(u_i(t, \textbf{x})\) are defined as

$$\begin{aligned} u_i(t, \textbf{x}) := p_i G_{\textbf{T}^{(i)}}(e^{-x_1}, \ldots , e^{-x_m}) \end{aligned}$$
(2.8)

for each \(i \in [m]\). This solution is smooth up until the critical time \(T_c= \Vert A P\Vert _2^{-1}\), where \(P = \textrm{diag}(p_1, \ldots , p_m)\).

The proof is given in Sect. 3.3.
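For concreteness, the critical time can be evaluated numerically as the reciprocal of the spectral norm of AP; a short sketch (assuming NumPy, with the same illustrative A and p as in the sketch above) is:

```python
# Illustrative computation of the critical (gelation) time T_c = ||A P||_2^{-1}.
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])                        # example kernel matrix
p = np.array([0.5, 0.5])                          # example initial condition
T_c = 1.0 / np.linalg.norm(A @ np.diag(p), ord=2) # spectral norm of A P
print(T_c)                                        # 2/3 for this example
```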

Remark 2.4

If one sets \(\textbf{m}(t) = \textbf{m}(0)\) for all \(t \in {\mathbb R}\), the solution (2.8) will be a valid solution of equation (2.3) also for \(t > T_c\). In this case, the equation (2.3) is known as Flory’s equation.

Corollary 2.5

The gelation time of the multicomponent Smoluchowski equation (2.3) is given by the critical time \(T_c\) of the multi-type branching process.

Corollary 2.6

For any \(\textbf{n} \in {\mathbb N}^m_0\), fix an arbitrary \(i \in [m]\) such that \(n_i>0\). The solution of the multicomponent Smoluchowski equation  (2.3) at \(\textbf{n}\) is given by

$$\begin{aligned} w_{\textbf{n}}(t) = \frac{p_i}{n_i} {\mathbb P}(\textbf{T}^{(i)} = \textbf{n}) \end{aligned}$$
(2.9)

for \(t \in (0, T_c).\)

Remark 2.7

Note that the above solution is unique, because of the triangular structure of the system of ODEs (2.3), while Theorem 2.2 does not provide uniqueness on its own.

The corollary above can be combined with an arborescent form of Lagrange’s inversion formula (see [7]) to obtain a combinatorial formula for \(w_{\textbf{n}}(t)\) that is similar to the results derived in [2, 20].
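For example, in the monocomponent case \(m = 1\) with \(p_1 = 1\), the offspring distribution (2.7) is \(\textrm{Poi}(t)\), the total progeny follows the Borel distribution \({\mathbb P}(\textbf{T}^{(1)} = n) = e^{-tn}(tn)^{n-1}/n!\), and formula (2.9) recovers the classical solution of the monocomponent Smoluchowski equation with multiplicative kernel,

$$\begin{aligned} w_n(t) = \frac{1}{n}\, {\mathbb P}(\textbf{T}^{(1)} = n) = \frac{n^{n-2} t^{n-1} e^{-nt}}{n!}, \qquad t \in (0, T_c) = (0, 1). \end{aligned}$$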

Finally, we discuss the concentration-like behaviour of the cluster size distribution \(w_{\textbf{n}}(t)\) observed when \(N:=|\textbf{n}|\) is large. In the multicomponent coagulation equation, each cluster is characterised by a vector \(\textbf{n}\), and we can interpret N as the overall cluster size, irrespective of its composition. Since large clusters can only be generated by a large number of merging events, the mass in the distribution should concentrate on the ‘mean’ vector of all possible combinations when we condition on N being large. We refer to this phenomenon as localization. This terminology is inspired by the observation of a similar behaviour at asymptotically large times in multicomponent coagulation equations that do not lead to gelation [11, 13]. In contrast, our notion of localization is an asymptotic property in the size of the clusters; however, it is defined for any \(t\in (0, T_c)\).

Let \(P:= \textrm{diag}(p_1, \ldots , p_m)\), let \(\Delta _m:= \{\varvec{\rho } \in {\mathbb R}_{\ge 0}^m: |\varvec{\rho }| = 1\}\) denote the probability simplex, and consider the function \(\Gamma : \Delta _m \rightarrow {\mathbb R}\),

$$\begin{aligned} \Gamma (\rho _1, \ldots , \rho _m) = \sum \limits _{l=1}^m \left( \rho _l \log \left( \frac{\rho _l}{t(AP\varvec{\rho })_l}\right) + t (AP\varvec{\rho })_l\right) - 1. \end{aligned}$$
(2.10)

Corollary 2.8

Consider the multicomponent Smoluchowski equation (2.3) with monodisperse initial conditions \(p_i \ne 0\) for \(i \in [m]\). For any \(t \in (0, T_c)\) and stochastic vector \(\varvec{\rho }\in {\mathbb Q}^m\) with strictly positive elements,

$$\begin{aligned} \lim \limits _{N\rightarrow \infty }\frac{1}{N} \log w_\textbf{n}(t)\Big |_{\textbf{n}=N \varvec{\rho }} = -\Gamma (\varvec{\rho }). \end{aligned}$$
(2.11)

Moreover, \(\Gamma (\varvec{\rho })\) is convex on \(\Delta _m\).

Since \(\Gamma (\varvec{\rho })\) is convex, it attains a finite minimum at \({\varvec{\rho }}^*(t)=\mathop {\textrm{argmin}}_{{\varvec{\rho }} \in \Delta _m}\Gamma (\varvec{\rho })\), which generally depends on the time t, the initial conditions \(p_k\) and the kernel matrix A. The vector \(\varvec{\rho }^*\) gives the limiting configuration of clusters as their overall size N tends to infinity.
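In general the minimiser has no closed form, but it can be approximated numerically by minimising \(\Gamma \) over the simplex; a minimal sketch (assuming NumPy and SciPy, with arbitrary illustrative values of A, p and t) reads:

```python
# Illustrative sketch: minimise the rate function Gamma of (2.10) over the
# probability simplex to approximate the localisation direction rho*.
# A, p and t are arbitrary example values.
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
p = np.array([0.5, 0.5])
t = 0.5                                    # some t in (0, T_c), here T_c = 2/3

def Gamma(rho):
    s = t * (A @ (p * rho))                # t * (A P rho)_l
    return float(np.sum(rho * np.log(rho / s) + s) - 1.0)

res = minimize(Gamma, x0=np.full(len(p), 1.0 / len(p)),
               bounds=[(1e-9, 1.0)] * len(p),
               constraints=[{"type": "eq", "fun": lambda r: np.sum(r) - 1.0}])
print("approximate localisation direction rho*:", res.x)
```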

Remark 2.9

Consider the special case when AP is a right stochastic matrix, i.e. all of its row sums equal 1. Then the minimizer giving the localisation direction, \(\varvec{\rho }^*\), can be found explicitly: namely, \(\varvec{\rho }^*\) is the normalized eigenvector corresponding to the leading eigenvalue of AP. In particular, this implies that the minimizer depends on the initial condition and the kernel but is independent of time for stochastic AP.

Even though our localization result is not applicable at the critical point \(t=T_c\), the case of a stochastic matrix shows that the minimizer \(\varvec{\rho }^*\) can depend on AP even as \(t \rightarrow T_c\). The proofs of the above corollaries are given in Sect. 3.4.

2.1 Relaxation of the Assumptions

The requirements that A be symmetric and irreducible are made without loss of generality. When A is not irreducible, the system of ODEs can be broken into several independent subsystems that can be studied separately with our approach. This also means that there may be several critical points, one for each subsystem; this is the case, for example, for the kernel \(A = I\). Furthermore, even when A is not symmetric, its action is equivalent to that of \(\frac{1}{2}(A+A^\top )\), since the (multidimensional) convolution is a commutative operation.

The assumption of monodisperse initial conditions is restrictive; however, we see a possible avenue for relaxing it. Our branching process representation theory is applicable to initial conditions with full support. For instance, in [17] we show that one can consider appropriately decaying initial conditions for the characteristic PDE of the monocomponent multiplicative Smoluchowski equation. Hence, we expect that the proofs can be generalised to this setting too.

3 Proofs

3.1 Proof of Proposition 2.1

Proof

By definition, \(\textbf{m}(t) = \textbf{m}(0)\) for all \(t \in (0, T_c)\) with \(\textbf{m}\) being the mass vector of (1.2). Therefore, we can rewrite the second term in the RHS of (1.2) as

$$\begin{aligned} \sum \limits _{\textbf{k} \in \mathbb {N}_0^m} \textrm{K}(\textbf{n}, \textbf{k}) w_{\textbf{n}} w_{\textbf{k}} = \sum \limits _{\textbf{k} \in {\mathbb N}_0^m} \textbf{n}^\top A \textbf{k} w_{\textbf{n}} w_{\textbf{k}} = \textbf{n}^\top A \textbf{m}(t) w_{\textbf{n}} = \textbf{n}^\top A \textbf{m}(0)w_{\textbf{n}}. \end{aligned}$$

This shows that the reduced equation is equivalent to the original equation before gelation. Hence, if the reduced equation gels at \(T_c\), so does the original equation. To show the reverse implication, suppose that the original equation gels at \(T_c\) and that the reduced equation gels at \(T_c' > T_c\). Let \(\textbf{m}'(t)\) be the mass vector of the reduced equation. By definition, \(\textbf{m}'(t) = \textbf{m}'(0)\) for all \(t \in (0, T_c')\) and \(\textbf{m}'(0) = \textbf{m}(0)\). However, this means that on \((0, T_c')\) the reduced equation can be turned back into the original equation, so the original equation conserves mass on \((0, T_c')\), which contradicts the assumption that \(T_c' > T_c\). \(\square \)

3.2 Proof of Theorem 2.2

Proof

The idea of the proof is to first derive a PDE for \(U(t, \textbf{x})\) using the multicomponent Smoluchowski equation and then use the PDE for \(U(t, \textbf{x})\) to derive a PDE for \(\textbf{u}(t, \textbf{x})\).

Consider \(U(t, \textbf{x})\) as defined in (2.4). First, observe that

$$\begin{aligned} \nabla U(t, \textbf{x}) = -\sum \limits _{\textbf{n}\in {\mathbb N}_0^m} \textbf{n} w_{\textbf{n}}(t) e^{-\textbf{x} \cdot \textbf{n}}, \end{aligned}$$
(3.1)

where the gradient is taken with respect to \(\textbf{x}\). Differentiating with respect to t and substituting the right-hand side of the reduced multicomponent Smoluchowski equation (2.3), we obtain

$$\begin{aligned} \frac{\partial }{\partial t}U(t, \textbf{x})= & {} \sum \limits _{\textbf{n} \in {\mathbb N}_0^m} \frac{\mathrm dw_{\textbf{n}}}{\mathrm dt} e^{-\textbf{x}\cdot \textbf{n}}\nonumber \\= & {} \sum \limits _{\textbf{n}\in {\mathbb N}_0^m} \left( \frac{1}{2}\sum \limits _{\textbf{k}+\textbf{l} = \textbf{n}} \textbf{k}^\top A \textbf{l} w_{\textbf{k}} w_{\textbf{l}} - \textbf{n}^\top A \textbf{m}(0) w_{\textbf{n}}\right) e^{-\textbf{x} \cdot \textbf{n}}\nonumber \\= & {} \underbrace{\frac{1}{2}\sum \limits _{\textbf{n}\in {\mathbb N}_0^m} \sum \limits _{\textbf{k}+\textbf{l} = \textbf{n}} \textbf{k}^\top A \textbf{l} w_{\textbf{k}} w_{\textbf{l}} e^{-\textbf{x}\cdot \textbf{n}}}_{\boxed {W_1}} - \underbrace{\sum \limits _{\textbf{n}\in {\mathbb N}_0^m} \textbf{n}^\top A \textbf{m}(0) w_{\textbf{n}} e^{-\textbf{x} \cdot \textbf{n}}}_{\boxed {W_2}}. \end{aligned}$$
(3.2)

We will first look at term \(\boxed {W_1}\). We use that \(\textbf{k}+\textbf{l} = \textbf{n}\), the Cauchy product and the observation (3.1) to obtain that

$$\begin{aligned} \begin{aligned} \boxed {W_1}&= \frac{1}{2}\sum \limits _{\textbf{n} \in {\mathbb N}_0^m}\sum \limits _{\textbf{k}+\textbf{l} = \textbf{n}} (\textbf{k} w_{\textbf{k}} e^{-\textbf{x}\cdot \textbf{k}})^\top A (\textbf{l}w_{\textbf{l}} e^{-\textbf{x}\cdot \textbf{l}}) = \frac{1}{2}(\nabla U)^\top A (\nabla U). \end{aligned} \end{aligned}$$
(3.3)

Next, we manipulate term \(\boxed {W_2}\). Again, using observation (3.1), we obtain

$$\begin{aligned} \boxed {W_2}= & {} \sum \limits _{\textbf{n} \in {\mathbb N}_0^m} (\textbf{n} w_{\textbf{n}} e^{-\textbf{x} \cdot \textbf{n}})^\top A \textbf{m}(0) = - (\nabla U)^\top A \textbf{m}(0). \end{aligned}$$
(3.4)

Combining both (3.3) and (3.4) in (3.2) results in the following PDE for \(U(t, \textbf{x})\):

$$\begin{aligned} \frac{\partial }{\partial t} U(t, \textbf{x}) = \frac{1}{2}(\nabla U)^\top A (\nabla U) + (\nabla U)^\top A \textbf{m}(0). \end{aligned}$$
(3.5)

Next, use \(\textbf{u} = - \nabla U\) to obtain

$$\begin{aligned} \frac{\partial \textbf{u}}{\partial t}= & {} - \nabla \left( \frac{\partial U}{\partial t}\right) \nonumber \\= & {} - \nabla \left( \frac{1}{2}\textbf{u}^\top A \textbf{u} - \textbf{u}^\top A \textbf{m}(0)\right) \nonumber \\= & {} - ((\nabla \textbf{u}(t, \textbf{x})) A \textbf{u} - (\nabla \textbf{u}) A \textbf{m}(0))\nonumber \\= & {} - (\nabla \textbf{u}) A (\textbf{u}-\textbf{m}(0)). \end{aligned}$$
(3.6)

This finishes the first part of the theorem.

Next, we will show that any power series solution of (2.5) of the form

$$\begin{aligned} \textbf{u}(t, \textbf{x}) = \sum \limits _{\textbf{n} \in {\mathbb N}_0^m} \textbf{n} w_{\textbf{n}} e^{-\textbf{n} \cdot \textbf{x}} \end{aligned}$$
(3.7)

solves the multicomponent Smoluchowski equation. By uniform convergence, we observe that

$$\begin{aligned} \nabla \textbf{u} = -\sum \limits _{\textbf{n} \in {\mathbb N}_0^m} \textbf{n} \textbf{n}^\top w_{\textbf{n}} e^{-\textbf{n} \cdot \textbf{x}}. \end{aligned}$$
(3.8)

Therefore, by substituting (3.7) and (3.8) into the PDE (2.5) we obtain:

$$\begin{aligned}{} & {} (\nabla \textbf{u}) A \textbf{u} = -\sum \limits _{\textbf{k}\in {\mathbb N}_0^m} \sum \limits _{\textbf{l} \in {\mathbb N}_0^m} \textbf{k} \textbf{k}^\top A \textbf{l} w_{\textbf{k}} w_{\textbf{l}} e^{-\textbf{k} \cdot \textbf{x}} e^{-\textbf{l} \cdot \textbf{x}}\nonumber \\{} & {} \quad = -\frac{1}{2}\left( \sum \limits _{\textbf{k}\in {\mathbb N}_0^m} \sum \limits _{\textbf{l} \in {\mathbb N}_0^m} (\textbf{k} \textbf{k}^\top ) A \textbf{l} w_{\textbf{k}} w_{\textbf{l}} e^{-\textbf{k} \cdot \textbf{x}} e^{-\textbf{l} \cdot \textbf{x}} + \sum \limits _{\textbf{k}\in {\mathbb N}_0^m} \sum \limits _{\textbf{l} \in {\mathbb N}_0^m} (\textbf{l} \textbf{l}^\top ) A \textbf{k} w_{\textbf{k}} w_{\textbf{l}} e^{-\textbf{k} \cdot \textbf{x}} e^{-\textbf{l} \cdot \textbf{x}}\right) ,\nonumber \\ \end{aligned}$$
(3.9)

where we have split the sum into two equal pieces and relabelled the vectors. Next, we use the symmetry of A to obtain that

$$\begin{aligned} (\nabla \textbf{u}) A \textbf{u} = -\frac{1}{2}\left( \sum \limits _{\textbf{k}\in {\mathbb N}_0^m} \sum \limits _{\textbf{l} \in {\mathbb N}_0^m} (\textbf{k}+\textbf{l}) (\textbf{k}^\top A \textbf{l}) w_{\textbf{k}} w_{\textbf{l}} e^{-\textbf{k} \cdot \textbf{x}} e^{-\textbf{l} \cdot \textbf{x}} \right) . \end{aligned}$$
(3.10)

Partitioning the sum, we may write

$$\begin{aligned} (\nabla \textbf{u}) A \textbf{u} = -\frac{1}{2}\left( \sum \limits _{\textbf{n} \in {\mathbb N}_0^m} \textbf{n} \sum \limits _{\textbf{k} + \textbf{l} = \textbf{n}} \textbf{k}^\top A \textbf{l} w_{\textbf{k}} w_{\textbf{l}} e^{-\textbf{n} \cdot \textbf{x}}\right) . \end{aligned}$$
(3.11)

For the second term in (2.5), substituting (3.8) yields

$$\begin{aligned} (\nabla \textbf{u}) A \textbf{m}(0) = -\sum \limits _{\textbf{n} \in {\mathbb N}_0^m} \textbf{n} (\textbf{n}^\top A \textbf{m}(0)) w_{\textbf{n}}e^{- \textbf{n} \cdot \textbf{x}}. \end{aligned}$$
(3.12)

Combining all terms and using that \(\textbf{u}\) solves (2.5), we obtain the equality

$$\begin{aligned} \frac{\partial \textbf{u}}{\partial t}= & {} \sum \limits _{\textbf{n} \in {\mathbb N}_0^m} \textbf{n} \frac{\mathrm dw_{\textbf{n}}}{\mathrm dt} e^{-\textbf{n} \cdot \textbf{x}}\nonumber \\= & {} \sum \limits _{\textbf{n} \in {\mathbb N}_0^m} \textbf{n} \left( \frac{1}{2}\sum \limits _{\textbf{k} + \textbf{l} = \textbf{n}} \textbf{k}^\top A \textbf{l} w_{\textbf{k}} w_{\textbf{l}} - \textbf{n}^\top A \textbf{m}(0) w_{\textbf{n}}\right) e^{-\textbf{n} \cdot \textbf{x}}\nonumber \\= & {} -(\nabla \textbf{u}) A (\textbf{u}-\textbf{m}(0)). \end{aligned}$$
(3.13)

Therefore,

$$\begin{aligned} 0 = \sum \limits _{\textbf{n} \in {\mathbb N}_0^m} \textbf{n} e^{-\textbf{n} \cdot \textbf{x}} \left( \frac{\mathrm dw_{\textbf{n}}}{\mathrm dt} - \frac{1}{2} \sum \limits _{\textbf{k} + \textbf{l} = \textbf{n}} \textbf{k}^\top A \textbf{l} w_{\textbf{k}} w_{\textbf{l}} + \textbf{n}^\top A \textbf{m}(0) w_{\textbf{n}}\right) . \end{aligned}$$
(3.14)

Since equality (3.14) holds for all \(\textbf{x} \in [0, \infty )^m\) and the functions \(e^{-\textbf{n} \cdot \textbf{x}}\), \(\textbf{n} \in {\mathbb N}_0^m\), are linearly independent, it follows that \(\{w_{\textbf{n}}(t)\}_{\textbf{n} \in {\mathbb N}_0^m}\) satisfies the multicomponent Smoluchowski equation (2.3). \(\square \)

3.3 Proof of Theorem 2.3

Before proving the theorem, we formulate the following consequence of the implicit function theorem that is appropriate for our setting.

Lemma 3.1

Let \(U \subseteq {\mathbb R}^{2m+1}\) be an open set and \(\textbf{F}: U \rightarrow {\mathbb R}^m\) be a continuously differentiable map. Let \(V \subseteq {\mathbb R}^{m+1}\) be an open set and consider a continuously differentiable map \(\textbf{g}: V \rightarrow {\mathbb R}^m\) such that \(\textbf{F}((t, \textbf{x}), \textbf{g}(t, \textbf{x})) = 0\) and such that the \(m \times m\) matrix \(\frac{\partial \textbf{F}}{\partial \textbf{g}}(t, \textbf{x}, \textbf{g}(t, \textbf{x}))\) is invertible. Then,

$$\begin{aligned} \frac{\partial \textbf{g}}{\partial t}(t, \textbf{x}) = - \left[ \frac{\partial \textbf{F}}{\partial \textbf{g}}(t, \textbf{x}, \textbf{g}(t, \textbf{x}))\right] _{m \times m}^{-1} \cdot \frac{\partial \textbf{F}}{\partial t}(t, \textbf{x}, \textbf{g}(t, \textbf{x})) \end{aligned}$$
(3.15)

and

$$\begin{aligned} \nabla (\textbf{g})(t, \textbf{x}) = -\left[ \frac{\partial \textbf{F}}{\partial \textbf{g}}(t, \textbf{x}, \textbf{g}(t, \textbf{x}))\right] _{m\times m}^{-1} \cdot \frac{\partial \textbf{F}}{\partial \textbf{x}}(t, \textbf{x}, \textbf{g}(t, \textbf{x})). \end{aligned}$$
(3.16)

Proof of Theorem 2.3

Define \(\textbf{G}(t, \textbf{x}):= (G_{\textbf{T}^{(1)}}, \ldots , G_{\textbf{T}^{(m)}})\), where \(G_{\textbf{T}^{(i)}} = G_{\textbf{T}^{(i)}}(e^{-x_1}, \ldots , e^{-x_m})\) is the probability generating function of \(\textbf{T}^{(i)}\) evaluated at \((e^{-x_1}, \ldots , e^{-x_m})\); its dependence on t enters through the offspring distribution (2.7). Moreover, let \(P:= \textrm{diag}(p_1, \ldots , p_m)\). Observe that

$$\begin{aligned} \textbf{u}(t, \textbf{x}) = P \textbf{G}(t, \textbf{x}). \end{aligned}$$
(3.17)

Therefore, since \(\textbf{m}(0) = P\textbf{1}\) for monodisperse initial conditions, it suffices to show that

$$\begin{aligned} \frac{\partial \textbf{G}}{\partial t}(t, \textbf{x}) = -(\nabla \textbf{G}(t, \textbf{x})) A P (\textbf{G}(t, \textbf{x}) - \textbf{1}), \end{aligned}$$
(3.18)

with \(\textbf{1}\) being a vector of ones of length m, is solved by \(\textbf{G}(t, \textbf{x})\) as defined above.

From the theory of branching processes [14], the following implicit system holds for vector \(\textbf{G}(t, \textbf{x}) = (G_{\textbf{T}^{(1)}}, \ldots , G_{\textbf{T}^{(m)}})\):

$$\begin{aligned} \begin{aligned} G_{\textbf{T}^{(1)}}&= e^{-x_1} G_{\textbf{X}_1}(G_{\textbf{T}^{(1)}}, \ldots , G_{\textbf{T}^{(m)}})\\&\,\vdots \\ G_{\textbf{T}^{(m)}}&= e^{-x_m} G_{\textbf{X}_m}(G_{\textbf{T}^{(1)}}, \ldots , G_{\textbf{T}^{(m)}}). \end{aligned} \end{aligned}$$
(3.19)

We will make use of the implicit differentiation lemma above to show that \(\textbf{G}(t, \textbf{x})\), as defined via the branching process, solves the PDE (3.18).

Define \(\textbf{F}: {\mathbb R}^{2m+1} \rightarrow {\mathbb R}^m\) to be the map given by

$$\begin{aligned} F_k(t, \textbf{x}, \textbf{y}) = y_k - e^{-x_k} G_{\textbf{X}_k}(y_1, \ldots , y_m), \qquad k \in [m]. \end{aligned}$$
(3.20)

\(\textbf{F}\) is continuously differentiable. Let \(\textbf{G}(t, \textbf{x}) = (G_{\textbf{T}^{(1)}}, \ldots , G_{\textbf{T}^{(m)}})\) play the role of \(\textbf{g}\) in Lemma 3.1. By the implicit function theorem, \(\textbf{G}(t, \textbf{x})\) is continuously differentiable as long as \(\frac{\partial \textbf{F}}{\partial \textbf{g}}\) is invertible. Furthermore, \(\textbf{F}(t, \textbf{x}, \textbf{G}(t, \textbf{x})) = 0\) for all \(t \in (0, T_c)\) and \(\textbf{x} \in [0, \infty )^m\). Therefore, we can apply the above lemma to implicitly differentiate (3.19). To this end, we calculate the partial derivatives of \(\textbf{F}\) with respect to t and \(\textbf{x}\). For any \(i \in [m]\), we have

$$\begin{aligned} \frac{\partial F_i}{\partial t}(t, \textbf{x}, \textbf{G}(t, \textbf{x})) = - e^{-x_i} G_{\textbf{X}_{i}}(G_{\textbf{T}^{(1)}}, \ldots , G_{\textbf{T}^{(m)}})\sum \limits _{l=1}^m A_{il} p_l \left( G_{\textbf{T}^{(l)}}-1\right) . \end{aligned}$$
(3.21)

Similarly, for any \(i, j \in [m]\),

$$\begin{aligned} \begin{aligned} \frac{\partial F_i}{\partial x_j}(t, \textbf{x}, \textbf{G}(t, \textbf{x}))&= \delta _{i, j} e^{-x_i} G_{\textbf{X}_i}(G_{\textbf{T}^{(1)}}, \ldots , G_{\textbf{T}^{(m)}}) \end{aligned} \end{aligned}$$
(3.22)

which is a diagonal matrix. Observe that by combining (3.21) and (3.22), we obtain

$$\begin{aligned} \frac{\partial \textbf{F}}{\partial t}(t, \textbf{x}, \textbf{G}(t, \textbf{x})) = - \frac{\partial \textbf{F}}{\partial \textbf{x}} A P (\textbf{G}(t, \textbf{x}) - \textbf{1}). \end{aligned}$$
(3.23)

Multiplying both sides by \(-\left[ \frac{\partial \textbf{F}}{\partial \textbf{G}}(t, \textbf{x}, \textbf{G}(t, \textbf{x}))\right] _{m \times m}^{-1}\), we obtain

$$\begin{aligned} \frac{\partial \textbf{G}}{\partial t}(t, \textbf{x}) = -(\nabla \textbf{G}(t, \textbf{x})) A P (\textbf{G}(t, \textbf{x}) - \textbf{1}), \end{aligned}$$
(3.24)

which is exactly (3.18), finishing the proof of the first part of the theorem.

In order to show that the solution is smooth up until the critical time \(T_c = \Vert AP\Vert _2^{-1}\), we look at

$$\begin{aligned} \frac{\partial \textbf{F}}{\partial \textbf{G}}(t, \textbf{x}, \textbf{G}(t, \textbf{x})) = I - t \,\textrm{diag}(G_1(t, \textbf{x}), \ldots , G_m(t, \textbf{x}))\, A P, \end{aligned}$$

where we used that \(e^{-x_i} G_{\textbf{X}_i}(\textbf{G}(t, \textbf{x})) = G_i(t, \textbf{x})\) by (3.19). Since \(\textrm{diag}(G_1(t, \textbf{x}), \ldots , G_m(t, \textbf{x}))\, A P\) is a matrix with exclusively non-negative entries, by the Perron–Frobenius theorem the above matrix becomes singular for \((t, \textbf{x})\) as soon as \(t = \Vert \textrm{diag}(G_1(t, \textbf{x}), \ldots , G_m(t, \textbf{x}))\, A P\Vert _2^{-1}\). Since \(G_i(t, \textbf{x}) \le 1\) for each \(i \in [m]\), it follows that \(T_c = \Vert AP\Vert _2^{-1}\) is the critical time. \(\square \)

3.4 Proofs of Corollaries 2.5, 2.6 and 2.8

Proof of Corollary 2.5

Theorem 2.3 shows that the gelation time of the multicomponent Smoluchowski equation is given by \(T_c = \Vert AP\Vert _2^{-1}\). To show that it is equal to the critical time of the branching process, recall that

$$G_{\textbf{X}_k}(\textbf{s}) = \prod \limits _{l=1}^m \exp (t A_{kl} p_l (s_l-1)).$$

Therefore, differentiating with respect to \(s_j\) and evaluating at \(\textbf{s} = \textbf{1}\), the matrix of expected offspring, denoted by \(T_{\kappa }\), is given by

$$\begin{aligned} (T_{\kappa })_{i,j \in [m]} = t A_{ij} p_j. \end{aligned}$$
(3.25)

A multi-type branching process is critical when \(\Vert T_{\kappa }\Vert _2 = 1\), see [4, 16]. This is equivalent to \(t\Vert AP\Vert _2 = 1\), which is satisfied at \(t=T_c\). \(\square \)

Proof of Corollary 2.6

Combining Theorem 2.3 and the second part of Theorem 2.2, we immediately obtain the result of the corollary. \(\square \)

Before proving Corollary 2.8, we introduce the following lemma, which is a consequence of the Lagrange-Good inversion formula [15].

Lemma 3.2

Given a multi-type branching process satisfying the implicit system of equations (3.19), for any \(\textbf{n} \in {\mathbb N}^m\) and \(i \in [m],\)

$$\begin{aligned} {\mathbb P}(\textbf{T}^{(i)} = \textbf{n}) = [\textbf{r}^{\textbf{n}}]r_i \textrm{det}(K(\textbf{r})) (G_{\textbf{X}_1}(\textbf{r}))^{n_1} \cdot \ldots \cdot (G_{\textbf{X}_m}(\textbf{r}))^{n_m}, \end{aligned}$$
(3.26)

where

$$\begin{aligned} K(\textbf{r}):=\left[ \delta _{ij} - \frac{r_i}{G_{\textbf{X}_i}(\textbf{r})} \frac{\partial G_{\textbf{X}_i}}{\partial r_j}\right] _{1 \le i, j \le m}. \end{aligned}$$
(3.27)

Proof of Corollary 2.8

Note that by Corollary 2.6, we have that for any \(\textbf{n} \in {\mathbb N}^m\) and \(i \in [m]\),

$$\begin{aligned} w_{\textbf{n}} = \frac{p_i}{n_i}{\mathbb P}(\textbf{T}^{(i)} = \textbf{n}). \end{aligned}$$
(3.28)

We will analyze \({\mathbb P}(\textbf{T}^{(i)} = \textbf{n})\) further. Using Lemma 3.2, we obtain that

$$\begin{aligned} {\mathbb P}(\textbf{T}^{(i)} = \textbf{n}) = [\textbf{r}^{\textbf{n}}]r_i \textrm{det}(K(\textbf{r})) G_{\textbf{X}_1}^{n_1} \cdot \ldots \cdot G_{\textbf{X}_m}^{n_m}. \end{aligned}$$
(3.29)

It can be readily verified that

$$\begin{aligned} K(\textbf{r}) = [\delta _{ij} - r_i t A_{ij} p_j]_{1 \le i, j \le m}. \end{aligned}$$
(3.30)

Moreover, expanding the determinant, notice that \(\det K(\textbf{r})\) can be written as a sum over subsets of \([m]\),

$$\begin{aligned} \det K(\textbf{r}) = \sum \limits _{I \in \mathcal {P}([m])} c_I r_I, \end{aligned}$$
(3.31)

where \(\mathcal {P}([m])\) denotes the power set of \([m]\), \(r_I = \prod _{i\in I} r_i\), and \(c_I\) is a coefficient depending on t, A and \(\{p_i\}_{i \in [m]}\). Therefore, we may write (3.29) as

$$\begin{aligned} {\mathbb P}(\textbf{T}^{(i)} = \textbf{n}) = \sum \limits _{I \in \mathcal {P}([m])} c_I [\textbf{r}^{\textbf{n}}]r_I r_i G_{\textbf{X}_1}^{n_1} \cdot \ldots \cdot G_{\textbf{X}_m}^{n_m}. \end{aligned}$$
(3.32)

For the next step in our analysis, let \(Z_l \sim \textrm{Poi}\left( t \sum _{k=1}^m n_k A_{kl} p_l\right) \) for each \(l \in [m]\). By defining

$$\begin{aligned} \sigma _l:= \sum \limits _{k=1}^m \rho _k A_{kl} p_l, \end{aligned}$$
(3.33)

we see that \(Z_l \sim \textrm{Poi}\left( t N \sigma _l\right) \). Observe that the probability generating function of \(Z_l\) is given by

$$\begin{aligned} G_{Z_l}(r) = \exp \left( t N \sigma _l (r-1)\right) \end{aligned}$$
(3.34)

Then, we note that the product \(G_{\textbf{X}_1}(\textbf{r})^{n_1} \cdot \ldots \cdot G_{\textbf{X}_m}(\textbf{r})^{n_m}\) can be written as

$$\begin{aligned} G_{\textbf{X}_1}(\textbf{r})^{n_1} \cdot \ldots \cdot G_{\textbf{X}_m}(\textbf{r})^{n_m}= & {} \prod \limits _{k=1}^m \prod \limits _{l=1}^m \exp \left( t n_k A_{kl} p_l (r_l-1)\right) \nonumber \\= & {} \exp \left( t \sum \limits _{l=1}^m \sum \limits _{k=1}^m n_k A_{kl} p_l (r_l-1)\right) \nonumber \\= & {} \prod \limits _{l=1}^m \exp \left( t N \sigma _l (r_l-1)\right) \nonumber \\= & {} \prod \limits _{l=1}^m G_{Z_l}(r_l). \end{aligned}$$
(3.35)

Combining (3.32) and (3.35) yields

$$\begin{aligned} {\mathbb P}(\textbf{T}^{(i)} = \textbf{n})= & {} \sum \limits _{I \in \mathcal {P}([m])} c_I [\textbf{r}^{\textbf{n}}] r_I r_i G_{Z_1}(r_1) \cdot \ldots \cdot G_{Z_m}(r_m)\nonumber \\= & {} \sum \limits _{I \in \mathcal {P}([m])} c_I [\textbf{r}^{\textbf{n}}] \left( \prod \limits _{l=1}^m r_l^{\mathbb {1}_I(l) + \delta _{li}}\right) G_{Z_1}(r_1) \cdot \ldots \cdot G_{Z_m}(r_m)\nonumber \\= & {} \sum \limits _{I \in \mathcal {P}([m])} c_I \prod \limits _{l=1}^m [r_l^{n_l}] r_l^{\mathbb {1}_I(l) + \delta _{li}} G_{Z_l}(r_l)\nonumber \\= & {} \sum \limits _{I \in \mathcal {P}([m])} c_I \prod \limits _{l=1}^m {\mathbb P}(Z_l = n_l - (\mathbb {1}_I(l) + \delta _{li})) \end{aligned}$$
(3.36)

The last line implies that we need to analyze

$$\begin{aligned} {\mathbb P}(Z_l = n_l), \qquad {\mathbb P}(Z_l = n_l-1), \qquad {\mathbb P}(Z_l = n_l - 2). \end{aligned}$$
(3.37)

We are going to relate the latter two terms to the first term, making explicit use of the fact that \(Z_l \sim \textrm{Poi}\left( t N \sigma _l\right) \) for each \(l \in [m]\). We obtain

$$\begin{aligned} {\mathbb P}(Z_l = n_l-1) = e^{-t N \sigma _l} \frac{(t N \sigma _l)^{n_l-1}}{(n_l-1)!} = \frac{\rho _l}{t \sigma _l} {\mathbb P}(Z_l = n_l) \end{aligned}$$
(3.38)

and, writing \(\lambda _l:= t N \sigma _l\),

$$\begin{aligned} {\mathbb P}(Z_l = n_l-2) = e^{-\lambda _l} \frac{\lambda _l^{n_l-2}}{(n_l-2)!} = \left( \frac{\rho _l^2}{t^2 \sigma _l^2} - \frac{\rho _l}{t^2 N \sigma _l^2}\right) {\mathbb P}(Z_l = n_l) \end{aligned}$$
(3.39)

Therefore, we obtain that

$$\begin{aligned} \log {\mathbb P}(\textbf{T}^{(i)} = \textbf{n}) = \sum \limits _{l=1}^m \log {\mathbb P}(Z_l = n_l) + \log \left( \sum \limits _{I \in \mathcal {P}([m])} c_I\, \mathcal {O}\left( 1\right) \right) . \end{aligned}$$
(3.40)

Observe that the second term remains bounded in N, and hence vanishes after dividing by N and letting \(N \rightarrow \infty \). Using the explicit formula for the Poisson distribution together with Stirling's formula

$$ n! = (1+o(1)) \sqrt{2\pi n} n^n e^{-n},$$

we obtain

$$\begin{aligned} \sum \limits _{l=1}^m \log {\mathbb P}(Z_l = N \rho _l)= & {} \sum \limits _{l=1}^m \log \left( \frac{(t N \sigma _l)^{N \rho _l}}{(N \rho _l)!} e^{-t N \sigma _l}\right) \nonumber \\= & {} N \sum \limits _{l=1}^m \left( \rho _l \log (t N \sigma _l) - \rho _l \log (N \rho _l) + \rho _l - t \sigma _l \right) + \mathcal {O}\left( \log (N)\right) \nonumber \\= & {} N \sum \limits _{l=1}^m \left( \rho _l \log \left( \frac{t \sigma _l}{\rho _l}\right) - t \sigma _l + \rho _l\right) + \mathcal {O}\left( \log (N)\right) \end{aligned}$$
(3.41)

Dividing both sides by N and sending \(N \rightarrow \infty \) finishes the first part of the proof.

It remains to prove that

$$\begin{aligned} \Gamma (\varvec{\rho }) = \sum \limits _{l=1}^m \left( \rho _l \log \left( \frac{\rho _l}{t \sigma _l}\right) + t \sigma _l\right) - 1 \end{aligned}$$
(3.42)

is convex. The idea of the proof is to use objects reminiscent of those appearing in large deviations theory. Consider, for each \(l \in [m]\), the function

$$\begin{aligned} \Lambda _l(\varvec{\rho }) = \sup \limits _{\lambda \in {\mathbb R}} \{\lambda \rho _l - t\sigma _l(e^{\lambda }-1)\}. \end{aligned}$$
(3.43)

Now, we claim that the equality \(\Gamma (\varvec{\rho }) = \sum _{l=1}^m \Lambda _l(\varvec{\rho })\) holds and that \(\Lambda _l(\varvec{\rho })\) is convex for each \(l \in [m]\). The first claim follows by differentiating the expression inside the supremum with respect to \(\lambda \), equating it to zero and solving for \(\lambda \); the maximiser is \(\lambda = \log (\rho _l/(t\sigma _l))\), and it can be readily checked that it yields a unique maximum. The second claim follows by considering \(\varvec{\rho }^{(1)}, \varvec{\rho }^{(2)} \in \Delta _m\) and \(\nu \in [0, 1]\): since \(\sigma _l\) is linear in \(\varvec{\rho }\), the expression inside the supremum is affine in \(\varvec{\rho }\) for each fixed \(\lambda \), and by subadditivity of the supremum we obtain that

$$\begin{aligned} \Lambda _l(\nu \varvec{\rho }^{(1)} + (1-\nu ) \varvec{\rho }^{(2)}) \le \nu \Lambda _l(\varvec{\rho }^{(1)}) + (1-\nu ) \Lambda _l(\varvec{\rho }^{(2)}). \end{aligned}$$
(3.44)

Therefore, \(\Lambda _l\) is convex for each \(l \in [m]\). Since \(\Gamma \) is a sum of convex functions, it is convex as well. \(\square \)