The tail does not determine the size of the giant

The size of the giant component in the configuration model, measured by the asymptotic fraction of vertices in the component, is given by a well-known expression involving the generating function of the degree distribution. In this note, we argue that the distribution over small degrees is more important for the size of the giant component than the precise distribution over very large degrees. In particular, the tail behavior of the degree distribution does not play the same crucial role for the size of the giant as it does for many other properties of the graph. Upper and lower bounds for the component size are derived for an arbitrary given distribution over small degrees $d\leq L$ and given expected degree, and numerical implementations show that these bounds are close already for small values of $L$. On the other hand, examples illustrate that, for a fixed degree tail, the component size can vary substantially depending on the distribution over small degrees.


Introduction and results
The configuration model is one of the simplest and most well-known models for generating a random graph with a prescribed degree distribution. It takes a probability distribution with support on the non-negative integers as input and gives a graph with this degree distribution as output. The model is very well studied and there are precise answers to many questions concerning properties of the model, such as the threshold for the occurrence of a giant component [9,11], the asymptotic fraction of vertices in the largest component [9,12], diameter and distances in the supercritical regime [5,6,7], criteria for the graph to be simple [8] etc; see [3, Chapter 7] and [4] for detailed overviews. Empirical networks often exhibit power law distributions, that is, the number of vertices with degree d decays as an inverse power of d for large degrees. For this reason, there has been a lot of attention on properties of the configuration model with this type of degree distribution. Here we focus on the size of the largest component in the supercritical regime, specifically, the asymptotic fraction of vertices in the giant component, as a functional of the degree distribution. Our main message is that the distribution over small degrees is more important for the size of the largest component than the tail behavior of the degree distribution. While this is not surprising, in view of the general focus on degree tails in the literature, we think it deserves to be pointed out and elaborated on.

The model and its phase transition
To define the model, fix the number n of vertices in the graph and let F = {p_d}_{d≥0} be a probability distribution with support on the non-negative integers. Assign a random number D_i of half-edges independently to each vertex i = 1, . . . , n, with D_i ∼ F. If the total number of half-edges is odd, one extra half-edge is added to a uniformly chosen vertex. Then pair half-edges uniformly at random to create edges, that is, first pick two half-edges uniformly at random and join them into an edge, then pick two half-edges from the set of remaining half-edges and create another edge, and so on until all half-edges have been paired. The construction allows for self-loops and multiple edges between the same pair of vertices. However, if the degree distribution has finite mean, such edges can be removed without changing the asymptotic degree distribution, and if the second moment is finite, there is a strictly positive probability that the graph is simple; see e.g. [1,8].
Write µ = E[D] and ν = E[D(D − 1)]/µ, and assume throughout that p_2 ≠ 1. It is well-known that the threshold for the occurrence of a giant component in the configuration model is given by ν = 1: if ν > 1, then there is with high probability a unique giant component occupying a positive fraction ξ of the vertices as n → ∞, while if ν < 1, then the largest component grows sublinearly in n; see [11,9]. To see this, consider an exploration of the graph starting from a uniformly chosen vertex and then proceeding via nearest neighbors. For large n, such an exploration can be approximated by a branching process, where the offspring (=degree) of the first vertex has distribution F. For vertices in later generations, their degrees are distributed according to a size biased version of F. Indeed, by construction of the graph, these vertices constitute the end-points of uniformly chosen half-edges, and the probability of encountering a vertex with degree d is therefore proportional to d. Since we arrive at a vertex from one of its neighbors, the remaining number of neighbors, corresponding to the offspring of the vertex, has a down-shifted size biased distribution F̃ = {p̃_d}_{d≥0}, defined by

$$\tilde{p}_d = \frac{(d+1)\,p_{d+1}}{\mu}, \qquad d \geq 0. \tag{1}$$

Infinite survival in the approximating branching process corresponds to a giant component in the graph, and the critical parameter ν is easily identified as the mean of the distribution (1). Let ξ denote the asymptotic fraction of vertices in the largest component, throughout referred to as the size of the largest component. The asymptotic size ξ is given by the survival probability in the two-stage branching process (this can fail when p_2 = 1; see Remark 2.7 in [9]). Write g(s) for the probability generating function of the degree distribution F and note that the probability generating function of F̃ is given by g′(s)/µ.
Let z̃ denote the probability that a branching process with offspring distribution F̃ goes extinct. Then z̃ is the smallest non-negative solution to the equation s = g′(s)/µ, and

$$\xi = 1 - g(\tilde{z}). \tag{2}$$

A comprehensive description of the above exploration process can be found e.g. in [4, Chapter 4]. As for notation, when we want to emphasize the role of a given distribution F for the above quantities, we write ξ_F, z̃_F etc. Furthermore, we always equip quantities related to down-shifted size biased distributions with a tilde.
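These two relations are easy to evaluate numerically. The sketch below (illustrative code, not part of the original text) computes z̃ by fixed-point iteration of s ↦ g′(s)/µ starting from s = 0; since this map is increasing, the iterates converge monotonically to the smallest non-negative solution, after which ξ = 1 − g(z̃).

```python
def giant_component_size(p, tol=1e-12, max_iter=100_000):
    """Asymptotic fraction xi of vertices in the giant component of a
    configuration model with degree distribution p = {degree: probability}.

    Solves s = g'(s)/mu by monotone fixed-point iteration from s = 0,
    which converges to the smallest non-negative solution z_tilde,
    and returns xi = 1 - g(z_tilde), as in (2)."""
    mu = sum(d * q for d, q in p.items())
    g = lambda s: sum(q * s ** d for d, q in p.items())
    gp = lambda s: sum(d * q * s ** (d - 1) for d, q in p.items() if d >= 1)
    s = 0.0
    for _ in range(max_iter):
        s_new = gp(s) / mu
        if abs(s_new - s) < tol:
            break
        s = s_new
    return 1.0 - g(s)
```

For instance, a 3-regular distribution (all mass at degree 3) gives ξ = 1, while Poisson(2) degrees recover the classical Erdős–Rényi value ξ ≈ 0.797.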

Basic examples
We will be interested in how the size ξ of the giant component depends on properties of the degree distribution F . Despite the large interest in the configuration model in the context of network modeling, there has been surprisingly little work on this issue. One recent example however is [10], where component sizes are compared when degree distributions are ordered according to various concepts of stochastic domination. We also mention [2], where a distribution is identified that maximizes the size of the largest component in a percolated configuration graph for a given mean degree: this is achieved by putting all mass at 0 and two consecutive integers. Here, we will throughout restrict to the class of distributions with p 0 = 0, that is, to graphs without isolated vertices. We hence require that all vertices have a chance of being included in a giant component (if such a component exists), and do not investigate cases where the component size can be tuned by removing some fraction of the vertices.
First note that, when the mean µ is fixed, the critical parameter ν increases as the variance of the distribution increases, making it easier for a giant component to form. This might lead one to suspect that the size of the giant component is also increasing in ν. This, however, is not true; in fact, it is typically the other way around, as elaborated on in [10]. To understand this, note that fixing the mean and increasing the variance implies that there will be more vertices with small degree in the graph. Vertices with small degree are precisely those that risk being left out of the giant component, which then becomes smaller. Consider a very simple example with D ∈ {1, 2, 3}, where the probability p_1 of degree 1 is varied and the probabilities p_2 and p_3 are tuned so that the mean is kept fixed. As p_1 increases, the probability p_3 also increases, implying a larger variance. Figure 1(a) shows a plot of the component size and the critical parameter against p_1 when µ = 2.1, and we see that the giant component shrinks from occupying all vertices to a fraction 0.85 of them, while the critical parameter increases linearly. Figure 1(b) shows a similar plot (with only the component size) when D ∈ {1, 2, 10} and again µ = 2.1, and we see that the component size decreases from 1 to less than 0.65. Note that these examples also illustrate that the mean in itself does not determine the component size, since the mean is constant in both pictures.
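The first of these examples is easy to reproduce (a sketch, not the paper's code): with µ = 2.1, the constraints p_1 + p_2 + p_3 = 1 and p_1 + 2p_2 + 3p_3 = 2.1 force p_2 = 0.9 − 2p_1 and p_3 = 0.1 + p_1, and ξ is obtained from the fixed-point characterization (2).

```python
def xi(p):
    """Giant component fraction: xi = 1 - g(z), with z the smallest
    non-negative solution of s = g'(s)/mu."""
    mu = sum(d * q for d, q in p.items())
    s = 0.0
    for _ in range(100_000):
        s_new = sum(d * q * s ** (d - 1) for d, q in p.items()) / mu
        if abs(s_new - s) < 1e-12:
            break
        s = s_new
    return 1.0 - sum(q * s ** d for d, q in p.items())

mu = 2.1
for p1 in [0.0, 0.15, 0.30, 0.45]:
    p2, p3 = 0.9 - 2 * p1, 0.1 + p1        # keeps the mean fixed at 2.1
    p = {1: p1, 2: p2, 3: p3}
    nu = sum(d * (d - 1) * q for d, q in p.items()) / mu
    print(f"p1 = {p1:.2f}:  nu = {nu:.3f},  xi = {xi(p):.3f}")
```

As p_1 grows, ν increases linearly while ξ decreases, in line with Figure 1(a).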
In the example we see that the component size ξ decreases as the fraction of degree 1 vertices increases. This is natural since degree 1 vertices serve as dead ends in the component. If P(D ≥ 2) = 1 (and p_2 ≠ 1), then the extinction probability z̃ equals 0, implying that ξ = 1. The size of the giant is hence determined by the balance between degree 1 vertices and vertices of larger degree. Increasing the variance in a distribution with a fixed mean typically implies an increase in the number of low degree vertices, and our main message is that the distribution over small degrees is in fact more important for the size of the giant component than the precise distribution over very large degrees. In particular, the tail behavior of the degree distribution does not play the same crucial role for the size of the giant as it does for certain other quantities, such as e.g. the scaling of the distances in the giant component [5,6].
That the distribution over small degrees can play a significant role is illustrated in Figure 2, where the degrees have a fixed tail distribution and the remaining probability is allocated at small degrees in different mean-preserving ways. In Figure 2(a), the degree distribution is fixed for d ≥ 4 (we consider a Poisson(2) distribution and a power law with exponent −3) and the remaining probability is allocated at the degrees 1, 2 and 3. Specifically, the probability p_1 is varied, and p_2 and p_3 are then adjusted so that the mean is kept fixed at µ = 2.2. Figure 2(a) shows plots of the component size against p_1, and we see that, although the tails remain the same, the component size changes with p_1 in both cases. Figure 2(b) shows a similar plot when the tail is fixed for d ≥ 11 (Poisson and power law) and the mean is equal to 3.5.
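The Poisson case of Figure 2(a) can be sketched in code (illustrative, with an assumed grid for p_1 restricted so that p_2, p_3 ≥ 0): the Poisson(2) probabilities are kept for d ≥ 4, p_0 = 0, and the constraints on total mass and mean determine p_2 and p_3 from p_1.

```python
from math import exp, factorial

def xi(p):
    """Giant component fraction via xi = 1 - g(z), s = g'(s)/mu."""
    mu = sum(d * q for d, q in p.items())
    s = 0.0
    for _ in range(100_000):
        s_new = sum(d * q * s ** (d - 1) for d, q in p.items() if d >= 1) / mu
        if abs(s_new - s) < 1e-12:
            break
        s = s_new
    return 1.0 - sum(q * s ** d for d, q in p.items())

MU = 2.2
tail = {d: exp(-2) * 2 ** d / factorial(d) for d in range(4, 80)}  # fixed Poisson(2) tail
S1 = 1.0 - sum(tail.values())                    # mass left for degrees 1, 2, 3
Smu = MU - sum(d * q for d, q in tail.items())   # mean left for degrees 1, 2, 3

# From p1 + p2 + p3 = S1 and p1 + 2 p2 + 3 p3 = Smu:
#   p3 = Smu - 2*S1 + p1  and  p2 = 3*S1 - Smu - 2*p1.
results = []
for k in range(7):
    p1 = 0.20 + 0.05 * k                         # range chosen so p2, p3 >= 0
    p3 = Smu - 2 * S1 + p1
    p2 = 3 * S1 - Smu - 2 * p1
    results.append((p1, xi({1: p1, 2: p2, 3: p3, **tail})))
for p1, x in results:
    print(f"p1 = {p1:.2f}:  xi = {x:.3f}")
```

Although the tail is identical in every row, ξ moves noticeably as p_1 varies.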

Bounds for a given distribution over small degrees
We also argue that, conversely, fixing the distribution over small degrees typically leaves little room for controlling the component size by tuning the tail. Specifically, the difference between the maximal and the minimal achievable component size when the first L probabilities and the mean are fixed tends to be small already for small values of L. This requires bounds on the component size for a given distribution over small degrees. To formulate our results, let p_L = {p_1, . . . , p_L} denote a fixed set of probabilities associated with the degrees 1, . . . , L for some L ≥ 1, and write F(p_L) for the set of all distributions having those specific initial probabilities. Also write F(µ, p_L) for the set of all distributions in F(p_L) with a given mean µ. It turns out that a crude lower bound for the component size for distributions in F(µ, p_L) is obtained by placing all remaining mass $p_{>L} = 1 - \sum_{i=1}^{L} p_i$ at the point L + 1. Fixing also the mean µ, under a mild technical condition, this bound can be modified into one that is optimal for distributions in F(µ, p_L), that is, any larger bound is violated by some distribution in F(µ, p_L). Under a similar technical condition, an optimal upper bound for distributions in F(µ, p_L) is obtained by placing all remaining mass at two specific consecutive integers.
For a fixed p_L, consider the distribution G = G(p_L) with p_{L+1} = p_{>L} (and p_i = 0 for i ≥ L + 2), write g_G(s) for its probability generating function and ξ_G for the size of the giant component in a configuration graph with this degree distribution.

Proposition 1.1. For each fixed p_L, we have that ξ_F ≥ ξ_G for all F ∈ F(p_L).

Proposition 1.1 is proved in the next section. To formulate (optimal) bounds for distributions in F(µ, p_L), where also the mean µ is fixed, denote

$$\kappa = \frac{\mu - \sum_{d=1}^{L} d\, p_d}{p_{>L}},$$

and note that, for any F ∈ F(µ, p_L), we have for D ∼ F that E[D | D > L] = κ. Next, let H = H(µ, p_L) be a distribution where all remaining mass is placed at the two integers ⌊κ⌋ and ⌈κ⌉ (or at the single integer κ if κ is an integer) in such a way that the mean is preserved, that is,

$$p^{(H)}_{\lfloor\kappa\rfloor} = (\lceil\kappa\rceil - \kappa)\, p_{>L} \quad \text{and} \quad p^{(H)}_{\lceil\kappa\rceil} = (\kappa - \lfloor\kappa\rfloor)\, p_{>L}.$$

Write g_H for the associated generating function and ξ_H for the component size in the corresponding configuration graph. Finally, let z̃_G and z̃_H denote the extinction probabilities in branching processes with offspring distributions given by down-shifted size biased versions of the above distributions. Our bounds on the component size with fixed initial probabilities p_L and fixed mean µ are as follows.
Theorem 1.1. Fix p_L and µ.

(a) If p_L is such that z̃_G ≤ e^{−2/(L+1)}, then ξ_F ≥ 1 − g_G(z̃_G^{(µ)}) for all F ∈ F(µ, p_L), where z̃_G^{(µ)} is the smallest non-negative solution to the equation s = g′_G(s)/µ.

(b) Under an analogous technical condition on z̃_H, we have that ξ_F ≤ ξ_H for all F ∈ F(µ, p_L).
The bounds are optimal under the given conditions, that is, in (a) we have that the infimum of ξ_F over F ∈ F(µ, p_L) equals the bound, and in (b) the bound is attained by H ∈ F(µ, p_L).

Remark 1. The restrictions on p_L and µ are imposed for technical reasons. They imply that, if the extinction probabilities z̃_G and z̃_H are close to 1, then L has to be large, that is, a sufficiently large part of the distribution has to be fixed. We believe that this serves to avoid e.g. situations where F(µ, p_L) contains both subcritical and supercritical distributions. For most distributions, the conditions are mild, in the sense that they are satisfied already for moderate values of L (in relation to µ); see Table 1 for examples. Note however that, for L = 1, when only the probability of degree 1 is fixed, the condition in (a) is not satisfied: in this case the distribution G has mass only at 1 and 2, implying that z̃_G = 1.
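Both bounds in Theorem 1.1, together with the crude bound of Proposition 1.1, are straightforward to evaluate numerically. The sketch below (illustrative code, not from the paper, for the hypothetical choice p_L = {p_1 = 0.3, p_2 = 0.3} and µ = 3) builds G and H and solves the corresponding fixed-point equations; note that the refined lower bound uses the fixed mean µ, rather than the mean of G, in the equation for the extinction probability.

```python
from math import floor, ceil

def smallest_root(p, mu, tol=1e-13, max_iter=100_000):
    """Smallest non-negative solution of s = g'(s)/mu, for g the pgf of p."""
    s = 0.0
    for _ in range(max_iter):
        s_new = sum(d * q * s ** (d - 1) for d, q in p.items()) / mu
        if abs(s_new - s) < tol:
            break
        s = s_new
    return s

def g(p, s):
    return sum(q * s ** d for d, q in p.items())

def bounds(pL, mu):
    L = max(pL)
    p_gt = 1.0 - sum(pL.values())                      # remaining mass above L
    kappa = (mu - sum(d * q for d, q in pL.items())) / p_gt
    G = {**pL, L + 1: p_gt}                            # all remaining mass at L + 1
    H = dict(pL)                                       # remaining mass at floor/ceil of kappa
    if floor(kappa) == ceil(kappa):
        H[int(kappa)] = H.get(int(kappa), 0.0) + p_gt
    else:
        H[floor(kappa)] = H.get(floor(kappa), 0.0) + (ceil(kappa) - kappa) * p_gt
        H[ceil(kappa)] = H.get(ceil(kappa), 0.0) + (kappa - floor(kappa)) * p_gt
    mu_G = sum(d * q for d, q in G.items())
    xi_G = 1.0 - g(G, smallest_root(G, mu_G))          # crude bound of Proposition 1.1
    lower = 1.0 - g(G, smallest_root(G, mu))           # Theorem 1.1(a): G, but with mean mu
    upper = 1.0 - g(H, smallest_root(H, mu))           # Theorem 1.1(b): xi_H
    return xi_G, lower, upper

xi_G, lower, upper = bounds({1: 0.3, 2: 0.3}, mu=3.0)
print(f"crude lower = {xi_G:.4f}, lower = {lower:.4f}, upper = {upper:.4f}")
```

For this choice the three values are approximately 0.900, 0.953 and 0.958, so once p_1, p_2 and µ are fixed, the tail can move ξ by less than half a percentage point in this example.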

Remark 2.
The distribution G can be thought of as the limiting case of a sequence of distributions G_m, where most of the remaining mass p_{>L} is placed at L + 1 and a vanishing amount at another integer m → ∞; see the proof of Theorem 1.1(a). The mean of G_m is kept fixed at µ, and the bound in (a) differs from the component size ξ_G obtained for the distribution G in that the correct mean µ is used instead of the mean of G in the equation defining z̃_G^{(µ)} (explaining the notation). Note that the spread in the distribution of the remaining mass is maximized in the distribution G_m. In the distribution H, on the other hand, the mass is concentrated as much as possible (while still keeping the mean fixed).

Table 1 contains numerical values of the bounds in Proposition 1.1 and Theorem 1.1 for a few different distributions p_L over small degrees (that all fulfill the technical conditions). As explained above, we only analyze distributions with p_0 = 0. We note that, in all cases, the upper and lower bounds on the size of the giant are very close, supporting the claim that, if the distribution over low degrees is fixed, then the size of the giant is not affected much by the tail of the distribution. However, we would like to argue that this is the case for all choices of p_L and µ (satisfying the technical conditions), and for this we need to investigate the bounds more systematically.

Table 2: Maximal difference between the bounds in Theorem 1.1(a) and (b) for different values of L, along with the probabilities p_L and mean µ that give rise to the maximal difference and the corresponding values of the bounds.

If a large part of the distribution is fixed, it is not surprising that the component size cannot be tuned much, and we hence focus on small values of L, say L ≤ 5. For each L ∈ {2, 3, 4, 5} we have made a grid search (with step length 0.05) over all possible distributions p_L, for different values of µ ∈ [1, 5] (with step length 0.2).
Table 2 shows the maximal difference between the upper and lower bounds for distributions fulfilling the technical conditions, together with the distribution p_L and mean µ for which this maximal difference is observed. We note that, for L = 2, the maximal difference is 0.055, and it then decreases with L to 0.024 for L = 5. Throughout, the worst cases occur for small values of µ. This is confirmed by Figure 3, where the maximal difference (over L ∈ {2, 3, 4, 5} and p_L) is plotted against µ. We remark that, in all cases, the maximal difference was observed for L = 2. In summary, this indicates that, if the first L = 5 probabilities are fixed (and the technical conditions are satisfied), then the component size cannot vary by more than approximately 0.024.
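A scaled-down version of this grid search can be sketched as follows (illustrative code, not the paper's implementation: only L = 2 is treated, and only the condition z̃_G ≤ e^{−2/(L+1)} appearing in the proof of Theorem 1.1 is checked, so the reported maximum need not coincide with the values cited above).

```python
from math import floor, ceil, exp

def smallest_root(p, mu, tol=1e-11, max_iter=50_000):
    """Smallest non-negative solution of s = g'(s)/mu by monotone iteration."""
    s = 0.0
    for _ in range(max_iter):
        s_new = sum(d * q * s ** (d - 1) for d, q in p.items()) / mu
        if abs(s_new - s) < tol:
            break
        s = s_new
    return s

def g(p, s):
    return sum(q * s ** d for d, q in p.items())

L = 2
best_gap, best_at = 0.0, None
grid = [round(0.05 * i, 2) for i in range(21)]
for p1 in grid:
    for p2 in grid:
        p_gt = round(1.0 - p1 - p2, 10)
        if p_gt <= 0.0:
            continue
        G = {1: p1, 2: p2, 3: p_gt}
        mu_G = p1 + 2 * p2 + 3 * p_gt
        if smallest_root(G, mu_G) > exp(-2 / (L + 1)):
            continue                       # technical condition on z_G not met
        for j in range(21):
            mu = 1.0 + 0.2 * j
            kappa = (mu - p1 - 2 * p2) / p_gt
            if kappa <= L + 1:             # mean of the tail must exceed L + 1
                continue
            lower = 1.0 - g(G, smallest_root(G, mu))
            H = {1: p1, 2: p2}             # remaining mass at floor/ceil of kappa
            if floor(kappa) == ceil(kappa):
                H[int(kappa)] = p_gt
            else:
                H[floor(kappa)] = (ceil(kappa) - kappa) * p_gt
                H[ceil(kappa)] = (kappa - floor(kappa)) * p_gt
            upper = 1.0 - g(H, smallest_root(H, mu))
            if upper - lower > best_gap:
                best_gap, best_at = upper - lower, (p1, p2, mu)
print(f"max gap {best_gap:.3f} at (p1, p2, mu) = {best_at}")
```

Even this coarse sweep shows that the gap between the bounds stays small over the whole grid.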

Numerical implementations
It would of course be desirable to estimate the difference between the bounds analytically, but it seems complicated to obtain good estimates for small values of L, which is what we are after.
In the next section we prove Proposition 1.1 and Theorem 1.1.

Proofs
Assume throughout this section that p_L is fixed.
Proof of Proposition 1.1. Fix a distribution F ∈ F(p_L). Since the component size ξ_F is given by (2), and ξ_G by the analogous expression for the distribution G, we need to show that g_F(z̃_F) ≤ g_G(z̃_G). It is clear that g_F(s) ≤ g_G(s) for any s ∈ [0, 1], and hence, since generating functions are increasing, it follows that g_F(z̃_F) ≤ g_G(z̃_G) if we show that z̃_F ≤ z̃_G. Let {p̃^{(G)}_d}_{d=0}^{L} denote the probabilities defining the down-shifted size biased version G̃ of G, and recall that {p̃_d}, defined in (1), denotes the corresponding probabilities for F. It is not hard to see that p̃_d ≤ p̃^{(G)}_d for all d = 0, . . . , L (and p̃^{(G)}_d = 0 for d ≥ L + 1). Hence G̃ is stochastically smaller than F̃, implying that z̃_F ≤ z̃_G, as desired.
For the remainder of the section, we fix also the mean µ.
Proof of Theorem 1.1(a). We begin by defining a sequence of distributions {G_m}_{m≥κ}, where a vanishing (as m → ∞) fraction of the remaining mass is placed at m and the rest at L + 1, in such a way that the mean of the distribution is fixed at µ. Let

$$p^{(m)}_m = r_m = \frac{(\kappa - L - 1)\, p_{>L}}{m - L - 1} \quad \text{and} \quad p^{(m)}_{L+1} = p_{>L} - r_m.$$
Note that G_m ∈ F(µ, p_L). Write z̃_m for the extinction probability of a branching process with offspring distribution given by a down-shifted size biased version G̃_m of G_m. Also, let z̃_G^{(µ)} denote the smallest solution of the equation s = g′_G(s)/µ. We will show that (i) g_F(z̃_F) ≤ g_G(z̃_G^{(µ)}) for all F ∈ F(µ, p_L), and then, in order to show that the bound is sharp, that (ii) z̃_m is increasing for large m and converges to z̃_G^{(µ)}.

To establish (i), first fix a distribution F ∈ F(µ, p_L), that is, in addition to p_L we also fix p_d for d ≥ L + 1 such that the mean is µ. Since g_F(s) ≤ g_G(s) for all s and generating functions are increasing, the desired conclusion follows if z̃_F ≤ z̃_G^{(µ)}, which in turn follows if g′_F(z̃_F) ≤ g′_G(z̃_F). The assumption z̃_G ≤ e^{−2/(L+1)} ensures that functions of the form f(d) = d s^{d−1}, with s ≤ z̃_G, are strictly decreasing for d ≥ L + 1. Since z̃_F ≤ z̃_G (as shown in Proposition 1.1), this means that

$$\sum_{d \geq L+1} d\, p_d\, \tilde{z}_F^{\,d-1} \leq (L+1)\, \tilde{z}_F^{\,L} \sum_{d \geq L+1} p_d = (L+1)\, p_{>L}\, \tilde{z}_F^{\,L},$$

which implies that g′_F(z̃_F) ≤ g′_G(z̃_F), as desired.

As for (ii), note that it follows from the proof of Proposition 1.1 that z̃_m ≤ z̃_G, and the assumption z̃_G ≤ e^{−2/(L+1)} ensures that z̃_G < 1, so that z̃_m < 1. The extinction probability z̃_m solves the equation s = g′_m(s)/µ, and hence z̃_m is increasing for large m if g′_m(z̃_m) ≤ g′_{m+1}(z̃_m) when m is large; indeed, the smallest solution z̃_{m+1} of s = g′_{m+1}(s)/µ must then be larger than z̃_m. Noting that $g'_m(s) = \sum_d d\, p^{(m)}_d s^{d-1}$, we obtain that

$$g'_{m+1}(\tilde{z}_m) - g'_m(\tilde{z}_m) = \tilde{z}_m^{\,L}\Big[(L+1)(r_m - r_{m+1}) + (m+1)\, r_{m+1}\, \tilde{z}_m^{\,m-L} - m\, r_m\, \tilde{z}_m^{\,m-L-1}\Big],$$

which is positive for large m, since r_m − r_{m+1} is positive and of order m^{−2}, while m r_m (z̃_m)^{m−L−1} is exponentially decreasing in m (recall that z̃_m ≤ z̃_G < 1 for all m ≥ κ). Since z̃_m is increasing for large m and bounded from above by z̃_G < 1, it converges to some limit z̃_∞ that is strictly smaller than 1. Furthermore, since r_m → 0 and z̃_m ≤ z̃_G < 1, we obtain that

$$\tilde{z}_m = g'_m(\tilde{z}_m)/\mu \to g'_G(\tilde{z}_\infty)/\mu$$

as m → ∞. Therefore, z̃_∞ is the unique solution of the equation s = g′_G(s)/µ in (0, 1), which is also the definition of z̃_G^{(µ)}.
Finally, we obtain that the derived bound, 1 − ξ_F = g_F(z̃_F) ≤ g_G(z̃_G^{(µ)}), is optimal: since g_m(z̃_m) ↗ g_G(z̃_G^{(µ)}), for any ξ > 1 − g_G(z̃_G^{(µ)}) there exists an m such that ξ_{G_m} < ξ.
The following simple lemma will be used in the proof of Theorem 1.1(b). It will be applied to a random variable N distributed as D conditioned on D > L, and its mean is therefore denoted by κ.

Lemma 2.1. Let N be an integer valued random variable with mean κ. There exist integer valued random variables N_1 and N_2 with E[N_1] = ⌊κ⌋ and E[N_2] = ⌊κ⌋ + 1 such that, with Z ∼ Be(⌊κ⌋ + 1 − κ) independent of N_1 and N_2, we have that

$$N \stackrel{d}{=} Z N_1 + (1 - Z) N_2.$$

Proof. Let N_low and N_hi be independent random variables distributed as N | N ≤ ⌊κ⌋ and N | N > ⌊κ⌋, respectively, and write κ_low and κ_hi for the respective means. Furthermore, let X and Y be Bernoulli variables, independent of N_low and N_hi, with parameters (κ_hi − ⌊κ⌋)/(κ_hi − κ_low) and (κ_hi − ⌊κ⌋ − 1)/(κ_hi − κ_low), respectively. Then set

$$N_1 = X N_{\mathrm{low}} + (1 - X) N_{\mathrm{hi}} \quad \text{and} \quad N_2 = Y N_{\mathrm{low}} + (1 - Y) N_{\mathrm{hi}}.$$

It is straightforward to confirm that P(Z N_1 + (1 − Z) N_2 = i) = P(N = i) for all i.
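The construction in the proof can be checked by exact enumeration. The sketch below (illustrative code with a hypothetical pmf for N on {3, 5, 7}, so that κ = 4.4) builds the laws of N_1 and N_2 as mixtures of N_low and N_hi and verifies that the outer Z-mixture reproduces the law of N.

```python
from math import floor

# A hypothetical distribution for N (playing the role of D | D > L).
pN = {3: 0.5, 5: 0.3, 7: 0.2}
kappa = sum(i * q for i, q in pN.items())       # = 4.4, so floor(kappa) = 4
fl = floor(kappa)

# Conditional laws N_low = (N | N <= fl) and N_hi = (N | N > fl).
m_low = sum(q for i, q in pN.items() if i <= fl)
p_low = {i: q / m_low for i, q in pN.items() if i <= fl}
p_hi = {i: q / (1 - m_low) for i, q in pN.items() if i > fl}
k_low = sum(i * q for i, q in p_low.items())
k_hi = sum(i * q for i, q in p_hi.items())

# Bernoulli parameters from the proof of Lemma 2.1.
x = (k_hi - fl) / (k_hi - k_low)                # P(X = 1): N_1 picks N_low
y = (k_hi - fl - 1) / (k_hi - k_low)            # P(Y = 1): N_2 picks N_low
z = fl + 1 - kappa                              # P(Z = 1): the outer mixture

# Laws of N_1 = X*N_low + (1-X)*N_hi and N_2 = Y*N_low + (1-Y)*N_hi.
p1 = {i: x * p_low.get(i, 0) + (1 - x) * p_hi.get(i, 0) for i in pN}
p2 = {i: y * p_low.get(i, 0) + (1 - y) * p_hi.get(i, 0) for i in pN}

mean = lambda p: sum(i * q for i, q in p.items())
recon = {i: z * p1[i] + (1 - z) * p2[i] for i in pN}  # law of Z*N_1 + (1-Z)*N_2
print(mean(p1), mean(p2), recon)
```

The reconstructed law matches pN exactly, with E[N_1] = ⌊κ⌋ = 4 and E[N_2] = ⌊κ⌋ + 1 = 5.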
Proof of Theorem 1.1(b). We need to show that g_F(z̃_F) ≥ g_H(z̃_H) for all F ∈ F(µ, p_L).
To this end, we begin by showing that g_F(s) ≥ g_H(s) for all s ∈ [0, 1] and all F ∈ F(µ, p_L).
Pick F ∈ F(µ, p_L) and let D ∼ F. The probability generating function g_F(s) can be written as

$$g_F(s) = \sum_{d=1}^{L} p_d s^d + p_{>L}\, \mathbb{E}[s^N], \tag{3}$$

where N is distributed as D conditioned on D > L. By Lemma 2.1 and Jensen's inequality, applied conditionally on Z to the function x ↦ s^x, which is convex for s ∈ [0, 1], we obtain

$$\mathbb{E}[s^N] = \mathbb{E}\big[s^{Z N_1 + (1-Z) N_2}\big] \geq (\lfloor\kappa\rfloor + 1 - \kappa)\, s^{\lfloor\kappa\rfloor} + (\kappa - \lfloor\kappa\rfloor)\, s^{\lceil\kappa\rceil},$$

so that g_F(s) ≥ g_H(s) for all s ∈ [0, 1], and the bound follows. That the bound is optimal follows by noting that H ∈ F(µ, p_L), that is, the distribution defining the bound is included in the class.