Weighted Distances in Scale-Free Configuration Models

In this paper we study first-passage percolation in the configuration model with empirical degree distribution that follows a power-law with exponent τ ∈ (2, 3). We assign independent and identically distributed (i.i.d.) weights to the edges of the graph. We investigate the weighted distance (the length of the shortest weighted path) between two uniformly chosen vertices, called typical distances. When the underlying age-dependent branching process approximating the local neighborhoods of vertices produces infinitely many individuals in finite time (a so-called explosive branching process), Baroni, Hofstad and the second author showed in Baroni et al. (J Appl Probab 54(1):146–164, 2017) that typical distances converge in distribution to a bounded random variable. The order of magnitude of typical distances remained open for the τ ∈ (2, 3) case when the underlying branching process is not explosive. We close this gap by determining the first order of magnitude of typical distances in this regime for arbitrary, not necessarily continuous edge-weight distributions that produce a non-explosive age-dependent branching process with infinite mean power-law offspring distributions.
This sequence tends to infinity with the number of vertices and, by choosing an appropriate weight distribution, can be tuned to be any growing function that is O(log log n), where n is the number of vertices in the graph. We show that the result remains valid for the erased configuration model as well, where we delete loops and any second and further edges between two vertices.


Introduction
Every logistics company wants to be the fastest and the cheapest, and to deliver on time. To achieve this, the routes it drives should be (near-)optimal, that is, the least costly and the fastest available. This is just one example where weighted distances in a network play an important role. Other examples include the spreading of epidemics through society, the spreading of rumours, videos and advertisements through (online) social networks, and several other processes spreading on the internet.
The recent interest in understanding complex networks and processes on these networks motivates the study of more and more elaborate models for these (weighted) networks. The analysis of processes on these models often reveals finer topological aspects of the models themselves. And, vice versa, the organisation and topology of a network affect the behaviour of different processes on the network. Many real-life networks turn out to share some common properties, one of them being that the degree distribution follows a power-law [23,37]. Examples include the world-wide web [6], the movie-actor collaboration network [5], the network of citations of scientific publications [38], and many more. Another common property is the small-world phenomenon, popularized by Milgram [35] as: "everyone on this planet is separated from anyone else by only six people". Mathematically speaking, a network exhibits the small-world property if the minimal number of connections needed to go from one node to another is of order log(n), or of order log log(n) for ultra-small worlds, with n the number of nodes in the network. This effect is not only seen in social networks, but also in neurological networks like the brain [1,17] or food webs [36]. A third common property is clustering, as pointed out by Watts and Strogatz [40]. High clustering means that two vertices in the graph are more likely to be connected to one another when they have a common neighbor. This is a common feature in e.g. social networks.
The natural way to model a network from a mathematical point of view is to see it as a graph, where nodes are represented by vertices and their connections by edges. Since real-life networks are large, models often involve randomness to determine the presence of edges between the vertices. Random graph models that incorporate the (first two) above-mentioned properties often serve as null-models for the analysis of real-life networks. Examples include variants of inhomogeneous random graphs such as the Chung-Lu or Norros-Reittu model [18,39], the configuration model [9,15], and the preferential attachment model [3]. Spatial variants are introduced to incorporate clustering, e.g. hyperbolic random graphs [14], geometric inhomogeneous random graphs [16], scale-free percolation [20], spatial preferred attachment [2,30], etc.
When modeling the spread of information in a network, weights can be added to the edges that represent the passage time of the information through the edge. The weighted distance between two vertices is then the weight of the path with smallest total weight, corresponding to the passage time of the information from one vertex to the other. When the edge-weights are i.i.d., the study of the resulting weighted graph is often called first-passage percolation (FPP). Introduced by Hammersley and Welsh [24] for the grid Z^d, FPP can be seen as a flow, starting from a vertex, flowing through the edges at a rate equal to the respective edge-weights, the weighted distance corresponding to the time it takes the front of the flow to reach the other vertex.
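As a minimal illustration of the weighted distance just defined, the following Python sketch computes the smallest total weight of a path between two vertices with Dijkstra's algorithm. The toy graph, the function name `weighted_distance` and the edge-list format are assumptions made for this example only and do not come from the paper.

```python
import heapq

def weighted_distance(n, edges, source, target):
    """Smallest total edge weight of a path from source to target.

    edges is a list of (x, y, weight) triples of an undirected multigraph
    on vertices 0..n-1 with non-negative weights (hypothetical input format).
    Returns float('inf') if target cannot be reached from source.
    """
    adjacency = [[] for _ in range(n)]
    for x, y, w in edges:
        adjacency[x].append((y, w))
        adjacency[y].append((x, w))
    dist = [float("inf")] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist[x]:
            continue  # stale heap entry, a shorter route to x was found earlier
        if x == target:
            return d
        for y, w in adjacency[x]:
            if d + w < dist[y]:
                dist[y] = d + w
                heapq.heappush(heap, (d + w, y))
    return dist[target]

# Toy example: the two-edge path 0-2-3 has weight 3.0,
# the three-edge path 0-1-2-3 has weight 1.75.
toy_edges = [(0, 1, 0.5), (1, 2, 0.25), (0, 2, 2.0), (2, 3, 1.0)]
print(weighted_distance(4, toy_edges, 0, 3))  # prints 1.75
```

In the toy graph the weight-optimal path uses three edges although a two-edge path exists, which is the difference between the weighted distance and the graph distance.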
First-passage percolation has been studied on the Erdős-Rényi random graph, see [12], and on inhomogeneous random graphs, see [33]. FPP on the configuration model with finite mean degrees and exponential edge-weights is treated in [11], with finite variance degrees (i.e., power-law exponent at least 3) and arbitrary edge-weight distributions in [13], and with infinite variance degrees (power-law exponent ∈ (2, 3)) for a class of edge-weights in [8].
In particular, [8] determines the weighted distance when the edge-weights fall into what they call the explosive class. In this case, weighted distances converge in distribution (see Theorem 2.9 below), which heuristically means that regardless of how large the network gets, the average weighted distance in the network stays bounded. This explains the observed phenomenon of extremely fast information spread in e.g. online networks, such as meme spreading or viral spreading. The other class, where the weight distribution is 'non-explosive', is further studied in [7], for the special case when the edge weights are of the form 1 + X. For this case [7] shows that the weighted distance is tight around the typical graph distance (that is, 2 log log n/| log(τ − 2)|, where τ is the power-law exponent) if and only if the extra weight X falls into the explosive class.
In this paper we investigate the missing case, i.e., FPP on the configuration model with infinite variance degrees (power-law exponent ∈ (2, 3)) and i.i.d. edge-weights that fall into the 'non-explosive' class. We determine the first order of weighted distance in the highest generality, thus, together with [7] providing an (almost) full picture of weighted distances in the τ ∈ (2, 3) case. We also extend our results to the erased configuration model, when self-loops are deleted and only one edge of every multiple edge is kept.
Structure In the next section we introduce the configuration model and state our results, as well as discuss related results and open problems. In Sect. 3 we develop a coupling to branching processes (BPs), and state and prove some ingredient lemmas about the degrees and weighted distances within these BPs. In Sect. 4 we develop a crucial tool to prove the upper bound of the main result, degree-dependent percolation. In Sect. 5, we prove the main result and extend it to the erased configuration model.
Notation We say that a sequence of events (E n ) n∈N holds with high probability (whp) if lim n→∞ P(E n ) = 1. For a sequence of random variables (X n ) n≥1 , we say that X n converges in probability to a random variable X, shortly X n P −→ X , if for all ε > 0, lim n→∞ P(|X n − X | > ε) = 0. Similarly, we say that X n converges in distribution to a random variable X, shortly X n d −→ X , if lim n→∞ P(X n ≤ x) = P(X ≤ x) for all x ∈ R at which the distribution function of X is continuous. For a non-decreasing right-continuous function F(x) the generalised inverse of F is defined as F (−1) (x) := inf{y ∈ R : F(y) ≥ x}. For an edge e = (x, y) we write L e for the associated edge-length on e. We write lhs and rhs for the left-hand side and right-hand side, respectively.
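Since the edge-weight distribution is not assumed to be continuous, the generalised inverse is the relevant notion of a quantile throughout. The following Python sketch evaluates F (−1) for a discrete distribution; the helper name `generalised_inverse` and the example distributions are hypothetical choices made for illustration.

```python
from bisect import bisect_left
from fractions import Fraction

def generalised_inverse(atoms, probabilities):
    """Return the map x -> inf{y : F(y) >= x} for a discrete distribution.

    atoms must be sorted increasingly; probabilities are exact Fractions
    summing to 1 (both are hypothetical inputs chosen for this sketch).
    """
    cumulative = []
    total = Fraction(0)
    for p in probabilities:
        total += p
        cumulative.append(total)

    def inverse(x):
        if not 0 < x <= 1:
            raise ValueError("x must lie in (0, 1]")
        # first atom whose cumulative probability is at least x
        return atoms[bisect_left(cumulative, x)]

    return inverse

# A weight taking the values 1, 2, 4 with probabilities 1/2, 1/4, 1/4.
inv = generalised_inverse([1, 2, 4], [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)])
print(inv(Fraction(1, 2)), inv(Fraction(3, 5)), inv(1))  # prints 1 2 4

# Deterministic weight equal to 1: the inverse is constant.
unit = generalised_inverse([1], [Fraction(1)])
print(unit(Fraction(1, 1000)))  # prints 1
```

For the deterministic weight every quantile equals 1, so any sum of K such quantiles equals K.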

Model and Results
In this section we introduce the weighted configuration model and present our results. Then we discuss related research and describe some open problems.

The Model
We consider the configuration model CM n (d) on n vertices with degree sequence d = {d 1 , . . . , d n }. Let $H_n := \sum_{v \in [n]} d_v$ denote the sum of the degrees, with [n] := {1, 2, . . . , n}. If H n is odd we add an additional half-edge to vertex n; this does not influence the analysis and we refrain from discussing this issue further. Given the degree sequence, the model is constructed as follows: To every vertex v ∈ [n] we assign d v half-edges, then we take a uniform random matching of the half-edges, where any two matched half-edges form an edge of the graph. The resulting random graph is denoted by CM n (d). After constructing the edges, we assign each edge e an i.i.d. edge-length L e from distribution L. We denote the resulting weighted random graph by CM n (d, L), with L = (L e ) e∈[H n /2] . We assume that the empirical distribution function of the degrees, defined as $F_n(x) := \frac{1}{n} \sum_{v \in [n]} \mathbb{1}_{\{d_v \le x\}}$, satisfies the conditions for a power-law distribution, as given in the following assumption. Assumption 2.1 (Power-law tail behavior). There exist τ ∈ (2, 3), γ ∈ (0, 1), C > 0 and α > 1/2 such that for all x ∈ [0, n α ), (1)), thus two uniformly chosen vertices lie whp in the same connected component. Let D n denote a random variable with distribution function F n , the degree of a uniformly chosen vertex in [n]. We define B n as the (size-biased version of D n ) − 1.
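The construction above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the function name, the toy degree sequence and the choice of Exp(1) edge lengths are hypothetical, and any other sampler for L could be passed in.

```python
import random

def weighted_configuration_model(degrees, sample_weight, rng):
    """Pair half-edges uniformly at random and attach i.i.d. edge lengths.

    degrees: list with degrees[v] = d_v.
    sample_weight: function taking rng and returning one edge length.
    Returns a list of (x, y, length) triples; self-loops and multiple
    edges are kept, as in the configuration model.
    """
    half_edges = [v for v, d in enumerate(degrees) for _ in range(d)]
    if len(half_edges) % 2 == 1:
        # odd total degree: add one extra half-edge to the last vertex
        half_edges.append(len(degrees) - 1)
    # pairing consecutive entries of a uniform shuffle gives a uniform matching
    rng.shuffle(half_edges)
    edges = []
    for i in range(0, len(half_edges), 2):
        edges.append((half_edges[i], half_edges[i + 1], sample_weight(rng)))
    return edges

rng = random.Random(2018)
degrees = [3, 2, 2, 4, 2]  # toy degree sequence with odd total degree 13
edges = weighted_configuration_model(degrees, lambda r: r.expovariate(1.0), rng)
print(len(edges))  # 7 edges, since one half-edge is added to the last vertex
```

Shuffling the list of half-edges and pairing neighbours is one simple way to realise the uniform matching; pairing half-edges one by one during an exploration gives the same law.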
We write F B n for the distribution function of B n . As shown in [29], F B n also satisfies a similar bound as (2.1), namely, for some C > 0, To be able to relate models with different values of n to each other, we pose an additional assumption.

Assumption 2.2 (Limiting distributions).
There exist distribution functions F D (x), F B (x) such that for some κ > 0, Then, for the weighted distance, For the hopcount, for all ε > 0 and whp, there exists at least one path of length at most (1 + ε) times the denominator in (2.6), with number of edges at most (1 + ε)2 log log n/| log(τ − 2)|.
Convergence in distribution of the hopcount around 2 log log n/| log(τ − 2)| remains an open question, since the upper bound does not follow from our techniques. Namely, we cannot exclude the possibility of a much longer path with optimal total edge-length.
The weighted erased configuration model is defined as follows. After CM n (d, L) is constructed, we remove all self-loops and, if there are multiple edges between two vertices, one of the edges is chosen uniformly at random, independently of the edge weights, and the other edges are deleted. The resulting graph is called the weighted erased configuration model, shortly ECM n (d, L). Let us denote the L-distance in this graph by d e L (u, v) (z) = O(1/ log log(1/z)) also diverge. This corresponds to the family of distributions F L (t) = exp{−C exp{c/t β }}, that give explosion for β < 1 but non-explosion for β ≥ 1.
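The erasure step that defines ECM n (d, L) can be sketched as follows. The function name `erase` and the (x, y, length) edge format are assumptions of this illustration; the point is that the surviving edge of a group of parallel edges is selected without looking at the lengths.

```python
import random

def erase(edges, rng):
    """Remove self-loops and keep one uniformly chosen edge per vertex pair.

    edges: list of (x, y, length) triples of a weighted multigraph.
    Returns a list of (x, y, length) triples of a simple graph with x < y.
    """
    groups = {}
    for x, y, length in edges:
        if x == y:
            continue  # self-loops are deleted
        groups.setdefault((min(x, y), max(x, y)), []).append(length)
    # rng.choice picks a position uniformly, independently of the lengths
    return [(x, y, rng.choice(lengths)) for (x, y), lengths in sorted(groups.items())]

multigraph = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 2, 4.0)]
simple = erase(multigraph, random.Random(7))
print(simple)
```

Keeping the shortest of the parallel edges instead would be a different model, since the choice would then depend on the edge weights.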
Note that (2.5) does not require that F L is continuous. By setting the edge weights to be deterministic and equal to 1 in Theorem 2.4, we obtain the following corollary. Stronger results about the graph distance were already obtained in [28,29].
Theorems 2.4 and 2.9 together describe typical distances in the configuration model with power law degrees, with exponent τ ∈ (2, 3) for all edge-weight distributions L. Next we discuss some related literature and pose some open problems.

Discussion and Open Problems
Relation to age-dependent branching processes The configuration model has a tree-like local structure. Since most cycles are long, the exploration of the local neighborhood of a uniformly chosen vertex can be coupled to a branching process (BP). When the edge-weights are incorporated in the model and in the coupling, this BP becomes age-dependent. In an age-dependent BP, individuals have an i.i.d. lifetime and give birth to their i.i.d. number of offspring upon death. Let us denote such a BP with offspring distribution X and life-time distribution σ by BP(X, σ ). Let us write BP(D, X, σ ) for D i.i.d. copies of BP(X, σ ). Then, the local neighborhood of a vertex in CM n (d, L) can be approximated by BP(D, B, L). Explosion of a BP means that the BP produces infinitely many individuals in finite time, with positive probability. In 2013 Amini et al. [4] gave a necessary and sufficient condition for the explosion of BP(X, L), for offspring distributions X that satisfy P(X ≥ x) ≥ x −1−ε for some ε > 0. In an unpublished note [34], under the stronger assumption that X satisfies x −1−ε ≤ P(X ≥ x) ≤ x −ε for some ε > 0, the second author simplified this criterion to the sum in (2.5) being finite. The criterion (2.5) comes from the following observation: the L-length of any path in a BP leading to infinity can be lower bounded by the sum of the minimum edge-lengths in each generation. In generation k, the number of individuals is doubly exponential in k. The minimum of this many i.i.d. random variables of distribution L is approximately the kth term in the sum in (2.5). If this sum is infinite, the BP cannot explode, thus the summability of the minima in each generation is necessary for the BP to explode. In [4], the authors showed that this notion of minimum-summability is sufficient as well by constructing an algorithm that finds an infinite path with finite total length.
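The heuristic that the minimum of N i.i.d. edge-lengths is of the order of the quantile F (−1) (1/N) can be checked numerically. The sketch below assumes Exp(1) weights and τ = 2.5, so that 1/(τ − 2) = 2 and generation k has size about exp{2^k}; these are illustrative choices, not taken from the paper. For a continuous distribution the probability that the minimum falls below the 1/N quantile is exactly 1 − (1 − 1/N)^N, and for Exp(1) the mean of the minimum is 1/N.

```python
import math

def quantile_exp(u):
    """Generalised inverse of the Exp(1) distribution function 1 - exp(-t)."""
    return -math.log1p(-u)

def prob_min_below_quantile(N):
    """P(min of N i.i.d. continuous variables <= F^(-1)(1/N)) = 1 - (1 - 1/N)^N."""
    return 1.0 - (1.0 - 1.0 / N) ** N

for k in range(1, 5):
    # doubly exponential generation size, assuming 1/(tau - 2) = 2
    N = round(math.exp(2.0 ** k))
    ratio = quantile_exp(1.0 / N) * N  # quantile divided by the mean 1/N of the minimum
    print(k, N, ratio, prob_min_below_quantile(N))
```

The ratio stays close to 1 and the probability stays close to 1 − 1/e for every generation size, so the quantile is the right scale for the minimum in each generation.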
Showing distributional convergence of weighted distances in CM n (d, L), as in Theorem 2.9, when the underlying BP explodes was the content of [8]. It remained open to characterise the growth of weighted distances when explosion does not happen. It follows from [4] that in the non-explosive case, for offspring distribution X satisfying (2.3), the time to reach the first individual in generation grows as This, combined with the fact that the graph distance of a typical vertex to a maximal degree vertex is log log n/| log(τ − 2)|, gives a strong intuitive explanation for the formula for d L (u, v) in Theorem 2.4. Unfortunately, the BP approximation for CM n (d) fails much earlier than reaching the maximal degree vertex, and the BP techniques do not reveal enough information on the structure of the optimal path leading to generation k of the BP; in particular, they do not provide good enough lower bounds on the degrees along the path. These are the reasons why we need to use a different technique, degree-dependent percolation, to show the upper bound on d L (u, v). Unfortunately, this technique is not fine enough to show distributional convergence of the fluctuations of d L (u, v) around its typical value.
are tight sequences of random variables. Do these sequences converge in distribution?
The case of infinite mean degrees, i.e., τ ∈ (1, 2), is investigated in [10,21], where the authors show that the graph distance is whp 2 or 3 and that the weighted distance converges to the sum of two random variables. The case of finite variance degrees, τ > 3, is studied in [11,13,22,27]. In this case typical graph distances are of order log n, weighted distances scale as a constant times log n with converging fluctuations around this value, while the hopcount, centered around another constant times log n, satisfies a central limit theorem.
It remains open to characterise weighted distances for the boundary exponents, i.e., when τ ∈ {2, 3}. For the τ = 3 case, even the explosion of the underlying age-dependent BP is an open question. For τ = 2, local neighborhoods grow faster than double-exponentially and the precise growth depends sensitively on the slowly varying function involved, thus the techniques used here do not apply directly. Problem 2.11 (τ = 2 or 3). Characterise weighted distances for the case when the degree distribution follows a power law (with a slowly varying function correction term) when τ = 2 and when τ = 3.
We further expect that similar results hold for a large class of power-law graph models, especially in the τ ∈ (2, 3) regime, including inhomogeneous random graphs (e.g. the Chung-Lu or Norros-Reittu models) and spatial models such as geometric inhomogeneous random graphs and scale-free percolation.

Overview of the Proof
Next we give an overview of the proof of Theorem 2.4. The proof consists of two parts, a lower and an upper bound that use slightly different techniques.

Lower Bound
Let us denote the graph distance ball of radius k around a vertex q in CM n (d) by B G k (q), and the set of vertices precisely at graph distance k away from q by B G k (q). For the proof of the lower bound we show that B G k (u), B G k (v) can be coupled to two independent branching processes (BPs), with the first generation having distribution function F D and all further generations having distribution function F B from Assumption 2.2. We show that the coupling can be maintained until two random indices κ n (u), κ n (v) such that polynomially many vertices in n are found around both vertices u, v, and that B G Since any path connecting u, v must intersect the boundaries of these sets, we obtain the lower bound where we obtained the second line by lower bounding d L (q, B G κ n (q) (q)) by the sum of the minimal edge lengths connecting We show that this sum of minima is larger than (1 − ε) times the denominator of the lhs of (2.6) whp.

Upper Bound
The upper bound also couples the neighborhoods B k (u), B k (v) to two disjoint BPs, but we exploit the coupling only until we reach vertices u K n , v K n of degree at least K n , for some carefully chosen K n that tends to infinity with n, but d L (u, u K n ) and d L (v, v K n ) are still of negligible length compared to the denominator of the lhs of (2.6). Then we connect the vertices u K n and v K n using degree-dependent percolation that we describe now.
The idea for degree-dependent percolation originates from [7], and it is an extension of a construction by Janson [31]. In the percolated graph, we keep edges independently of each other, with probabilities that depend on the degrees of the end vertices of the edge. We use the i.i.d. edge lengths to realise the percolation, i.e., an edge e = (x, y) is kept if and only if its edge length satisfies for some appropriately chosen threshold function ξ(·, ·). We use a result from [7,31] stating that the percolated graph can be viewed as a subgraph G r of a configuration model with a new degree sequence d r . We choose ξ in such a way that the new degree sequence still satisfies the power-law condition in (2.1), for the same τ but possibly different C, γ . Note that G r is a subgraph of the original CM n (d, L), and as a result any path present in G r is necessarily also present in CM n (d, L). We show that u K n , v K n have percolated degree at least K n . Then, we construct two paths, emanating from u K n and v K n , and reaching vertices u , v of percolated degree at least n α(τ −2) , respectively, in this percolated graph. We control the (growing) degrees of vertices along these paths and as a result (2.12) gives an upper bound on the edge-lengths along these paths. More precisely, analogous to [7], we define a sequence y i (K n ) with y 0 = K n and layers in the graphs i := {v ∈ [n] : d v ≥ y i (K n )}, for 0 ≤ i ≤ i max with i max the number of layers. We show that a vertex in i is connected to a vertex in i+1 whp; moreover, the total error probability over all the layers tends to zero as K n → ∞. Thus whp there exist paths from u K n , v K n where the ith vertex along the path has degree at least y i (K n ). Finally, we connect the vertices u , v in G r via a path of length at most three edges using vertices with degree at least n 1/2+δ for some small δ > 0.
The length of the constructed path is at most (2.13) The first two terms on the rhs, coming from the branching processes, are negligible due to the choice of K n , and the last term also since it tends to zero with n. With the proper choice of ξ(·, ·), the middle term becomes at most (1 + ε/2) times the denominator of (2.6).

Exploration Around Two Vertices
The goal of this section is to couple the neighborhoods B k (u), B k (v) of the two uniformly chosen vertices u, v to two independent BPs. We first show that, for q ∈ {u, v} the coupling can be maintained until k = κ n (q) = log log n/| log(τ − 2)|+ a tight random variable. Then, using the growth of the BPs, we make (2.11) quantitative by giving a whp lower bound on the minimum of edge-lengths connecting consecutive generations in the BPs.
As a preparation for the upper bound, as in (2.13), we determine M n , the number of generations needed to reach a vertex with degree at least K n , that we denote by q K n , for q ∈ {u, v}. Finally, we give an upper bound on d L (q, q K n ) for q ∈ {u, v}.

Coupling of the Exploration to a Branching Process
First we explain the coupling of the neighborhoods of the vertices u and v to branching processes. The coupling uses an exploration, where we reveal the pairs of half-edges and thus the neighbors of vertices together with their degrees one-by-one, in a breadth-first-search manner. By U t , V t we denote the subgraphs consisting of vertices at graph distance at most t from u and v, respectively. The forward degree of a vertex in the exploration then denotes the number of new (not previously discovered) neighbors of the vertex upon exploration. We slightly adjust [29, Lemma 2.2] to our setting, since Assumptions 2.1-2.2 are a special case of the assumptions of [29, Lemma 2.2]. An alternative formulation and proof can be found in [11, Proposition 4.7]. Lemma 3.1 (Coupling error of the exploration process, [29]). Consider CM n (d) satisfying Assumptions 2.1-2.2. Then, in the exploration process started from two uniformly chosen vertices u and v, the forward degrees (X (n) k ) k≤s n of the first s n newly discovered vertices can be coupled to an i.i.d. sequence B k from distribution B as in Assumption 2.2. That is, there is a coupling (X (n) k , B k ) k≤s n with the following error bound An immediate corollary is the following: Corollary 3.2 (Whp coupling of the exploration to BPs, [29]). In the configuration model satisfying Assumptions 2.1 and 2.2, let t be such that

with distribution F B for the offspring in the second and further generations, and with distribution F D for the offspring in the first generation.
The proofs of Lemma 3.1 and Corollary 3.2 can be found in [29,Sect. 2].
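To make the notion of forward degree concrete, the following Python sketch runs a breadth-first exploration on an already constructed multigraph and records, for each explored vertex, the number of newly discovered neighbours. In the paper the half-edges are paired during the exploration; working on a fixed edge list is a simplification assumed here, and the function name and toy graph are hypothetical.

```python
from collections import deque

def explore_forward_degrees(n, edges, root, max_vertices):
    """Breadth-first exploration from root in a multigraph on 0..n-1.

    Returns the forward degrees (numbers of newly discovered neighbours)
    of the first max_vertices explored vertices, in order of exploration.
    """
    adjacency = [[] for _ in range(n)]
    for x, y in edges:
        adjacency[x].append(y)
        adjacency[y].append(x)
    discovered = {root}
    queue = deque([root])
    forward = []
    while queue and len(forward) < max_vertices:
        x = queue.popleft()
        new_neighbours = 0
        for y in adjacency[x]:
            if y not in discovered:  # loops and repeated edges add nothing new
                discovered.add(y)
                queue.append(y)
                new_neighbours += 1
        forward.append(new_neighbours)
    return forward

# Toy multigraph: a triangle 0-1-2, a pendant path 2-3-4 and a double edge 3-4.
toy = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (3, 4)]
print(explore_forward_degrees(5, toy, 0, 5))  # prints [2, 0, 1, 1, 0]
```

In a tree every non-root vertex of degree d would have forward degree d − 1; the triangle and the double edge reduce the forward degrees of vertices 1, 2 and 3, which is exactly the kind of event that the coupling error has to control.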
Davies in [19] shows that for a BP with offspring distribution satisfying the tail behavior in (2.3), the sequence of random variables Y k := (τ − 2) k log(Z k ) converges almost surely. It is elementary to extend his result to a BP where the root has a different offspring distribution (see [8] for details). Having this result in mind for the two BPs coupled to the neighborhoods of u, v, we rewrite the generation sizes as Fixing a small δ > 0, we define for q ∈ {u, v}, and then Corollary 3.2 implies that B κ n (q) (q) has size Z (q) κ n (q) since the coupling can still be maintained. Combining (3.3) and (3.4) we obtain that Y (q) κ n (q) converges in distribution to two independent copies of the same random variable Y . The convergence now is only distributional, since there is no coupling between the BPs for different values of n. Using where κ n (q) is an integer; Y (q) κ n (q) , for q ∈ {u, v} are independent and converge in distribution, and f n (q) ∈ (τ − 2, 1] describes the exponent θ(n) f n (q) that satisfies Z (q) κ n (q) = n θ(n) f n (q) .
The message of this claim is that the coupling can be maintained until log log n/| log(τ −2)| + a tight random variable many generations.
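The order log log n/| log(τ − 2)| can be read off directly from Davies' normalisation Y k = (τ − 2)^k log(Z k ). The following short computation is a heuristic sketch: θ ∈ (0, 1) is a generic exponent introduced here for illustration, and Y k is treated as a positive quantity of constant order.

```latex
\begin{align*}
Y_k = (\tau-2)^k \log Z_k
  &\iff Z_k = \exp\{ Y_k (\tau-2)^{-k} \}, \\
Z_k = n^{\theta}
  &\iff (\tau-2)^{-k} = \frac{\theta \log n}{Y_k}
  \iff k = \frac{\log\log n + \log(\theta / Y_k)}{|\log(\tau-2)|}.
\end{align*}
```

Since Y k converges to a positive limit, the term log(θ/Y k )/| log(τ − 2)| stays of constant order, which is the tight random correction to log log n/| log(τ − 2)|.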
Proof Fix δ > 0 small enough and set θ(δ) as in Corollary 3.2. Corollary 3.2 then implies that the coupling error converges to zero as long as (3.5) implies that this is indeed satisfied for k 1 := κ n (u), k 2 := κ n (v). The value of κ n (q) in (3.5) is obtained by an elementary rearrangement of the formula (3.3) when k is replaced by κ n (q), and we took the integer part of the obtained expression. Note that κ n (q) is well-defined this way, since the generation sizes are increasing doubly exponentially for all large enough k due to (3.3) and the fact that Y (q) k would converge if we let k tend to infinity, and as a result the total size of B k (q) is 1 + o(1) times the last generation size.
Next we make (2.11) quantitative by giving a lower bound on the length of the path from q to generation κ n (q). Recall that B k (w) is the set of vertices at distance k from a vertex w in CM n (d). (2.5). Let u, v be two uniformly chosen vertices. Then, for q ∈ {u, v}, with κ n (q) as in (3.5), Proof Under the coupling between the neighborhoods of u, v to the BPs established in Corollary 3.2 and Claim 3.3, for all i ≤ κ n (q), | B i (q)| = Z (q) i , the size of generation i in the BP coupled to the neighborhood of q. Using the idea in (2.11), d L (q, B κ n (q) (q)) is at least the sum of the minimum edge-lengths between consecutive generations. That is, where L (q) i, j are i.i.d. for all i, j and q ∈ {u, v}. We let N 1+ξ ). Using that the minimum in (3.7) is non-increasing when increasing the number of variables involved, by (3.9), we can set N to be exp{(τ − 2) −i C n } to estimate the ith term in (3.7) from below using (3.10). Conditioning on the value C n , κ n (q), combined with a union bound, yields that the inequality holds with error probability (conditioned on C n , κ n (q)) at most for some constant C 1 > 0. Combining (3.7) and (3.11) yields that under the coupling, with error probability given in (3.12), Next we transform the rhs to match the format in (3.6). With ⌈a⌉ = min{m ∈ Z : m ≥ a}, ⌊a⌋ = max{m ∈ Z : m ≤ a}, we use the following inequalities valid for monotone non-increasing functions g with g(0) (3.14) We use ( ) to bound the rhs of (3.13) from below, then we carry out the variable transformation 1/(τ − 2) x C n (1 + ξ) = 1/(τ − 2) y , and transform the integral back to a sum using ( ). The variable transformation shifts the summation boundaries by C n := log(C n (1 + ξ ))/| log(τ − 2)|, and we obtain that We bound the upper summation boundary on the rhs from below.
Recall C n from (3.8), then Using now the formula for κ n (q) from (3.5), Next, the lower summation boundary on the rhs of (3.15) is not 1, and, if C n → ∞, then this might cause too much difference from the desired sum in (3.6). Thus, for any fixed ε > 0 we define Since the sum on the rhs between the brackets tends to infinity with n, so will R n (ε). Setting C n = R n (ε), combined with (3.17) and the fact that the summands tend to zero then implies that Since R n (ε) tends to infinity, so will C n , ensuring that the error probability in (3.12) tends to zero as well. This finishes the proof of the lower bound.
Next we do some preparations for the proof of the upper bound. First we investigate the number of generations we need to explore to reach a vertex of degree at least K n .
By Davies [19], the limiting variable lim k→∞ Y (q) k is almost surely positive on survival of the BP. By Assumption 2.1, P(B ≥ 1) = 1 and thus the BP cannot go extinct and therefore P(Y (q) M n = 0) = 0. Thus, the second term on the rhs in (3.21) converges to zero. For the first term, Using the lower bound on F B from (2.3), with L(x) := exp{−C (log x) γ } we obtain where we used that M n = M log log K n , with 1 + δ = M| log(τ − 2)|, and the last inequality holds for K n large. Thus both probabilities on the rhs in (3.21) tend to zero as K n tends to infinity, which completes the proof.
In the proof of the upper bound of the main theorem we run the exploration algorithm until we reach a vertex of degree K n . The path to this vertex is the sum of i.i.d. copies of edge weights. We show that whp this sum is less than some ε 1 > 0 times the denominator of the lhs of (2.6).

Lemma 3.6 (Upper bound on the length of the path in the exploration process). Let (L i ) i≥1 be i.i.d. from distribution F L satisfying (2.5). Then for all ε 1 > 0 there exists a choice of M n such that M n tends to infinity with n and
(3.24) The lemma immediately follows from the following, more general result.
Proof We distinguish two cases, based on the tail behavior of L. When F L does not satisfy any of these cases, L can be stochastically dominated by a random variable that does satisfy (at least) one of these cases and then the result follows by a simple stochastic domination argument.
Case (1) . For some small δ, ε 2 > 0, we define z L (m) implicitly by where g (−1) (x) = inf{y ∈ R : g(y) ≥ x}. Since g(x) is non-decreasing and a m tends to infinity, z L (m) tends to infinity as well. Note that when g(x) ≥ x a for some a ∈ (0, 1), capturing regularly varying cases, a lower bound on (3.27) can be explicitly calculated: To estimate the lhs of (3.26) in this case, we use a truncation argument. We condition (3.24) on the maximum of the L i being larger than T m := g (−1) (z L (m)) 1+ε 2 or not, which gives us the following upper bound (3.28) First we focus on the first term in (3.28). Using that P(L i > x) = 1/g(x) and the value T m , that tends to zero as m tends to infinity. Next we investigate the second term in (3.28) which we bound with Markov's inequality, Combining (3.29) and (3.31) implies that (3.28) tends to zero as m tends to infinity. This finishes the proof.

Degree-Dependent Percolation on the Configuration Model
In this section we make precise the degree-dependent percolation that we described in Sect. 2.3.2. Percolation for the configuration model was studied in [31] and later adjusted for the degree-dependent version in [7]. The induced subgraph of G on vertex set S is the largest subgraph of G with edges that have both endpoints in S. We denote the induced graph of a graph G restricted to the vertices in a set S by G |S . Let p(d) : N → [0, 1] be a monotone decreasing function of d. For a half-edge s we write the percolation probability shortly as p s := p(d v(s) ) with v(s) the vertex that s is attached to and d v(s) the degree of vertex v(s). Now we define two different ways to percolate the configuration model. After that we show equality in distribution for the two different percolated graphs. (d 1 , . . . , d n ) and a half-edge s, we keep a half-edge with probability p s independently. If we do not keep it, then we create a new vertex with one half-edge corresponding to the deleted half-edge. We call the newly created vertex and half-edge artificial. We denote the total number of artificial vertices by A. After this procedure is carried out for all half-edges, we pair all the half-edges uniformly at random (including the artificial ones as well). In the end we take the induced subgraph on the n original vertices. We denote the resulting graph by CM By denoting the number of half-edges that are kept at vertex i by d r i , and 1(A) a sequence with A repetitions of the value 1, CM ((d r , 1(A))) |[n] , i.e., the induced subgraph of the first n vertices of a configuration model with n + A vertices, and degree sequence (d r , 1(A)) that is d r i for i ≤ n and 1 for i > n. A result in [7] is the following: for some constants b, c > 0 and η ∈ (0, 1).
Then there exists θ > 0 such that for all x ∈ [θ, n α ] the empirical degree distribution F r n (x) of the degrees after percolation still satisfies Assumption 2.1, except the condition on the minimal degree, with the same τ, α, but possibly different γ ∈ (0, 1).
Proof By Definition 4.2, half-edges are kept independently, and thus, given $(d_1, \dots, d_n)$, the percolated degree $d_i^r$ has distribution $\mathrm{Bin}(d_i, p(d_i))$, where $\mathrm{Bin}(n, p)$ is a binomial random variable with parameters $n$ and $p$. As a result the random variables $(d_i^r)_{i \le n}$ are independent given the initial degrees $(d_1, \dots, d_n)$. The upper bound in Assumption 2.1 for $F_n^r$ is elementary since $d_i^r \le d_i$ for all $i$, so that $1 - F_n^r(x) \le 1 - F_n(x)$.

Next we show the lower bound. First we define, for all $x < n^{\alpha}$, the set $S(x) := \{i \in [n] : d_i > s(x)\}$, where $s(x) > x$ is a function of $x$ that is defined later. Clearly, $1 - F_n^r(x) \ge \frac{1}{n} \sum_{i \in S(x)} \mathbb{1}\{d_i^r > x\}$. We choose the value of $s(x)$ such that the indicators within the sum are 1 with high enough probability, for all $i \in S(x)$. Namely, if we choose $s(x)$ such that the expectation of the binomial, $d_i p(d_i)$, is higher than $2x$ for all vertices in $S(x)$, then we can use the concentration of binomial random variables [25, Theorem 2.21] to bound from below the probability that the indicators are 1. Let $s(x)$ be as in (4.5); then we find that Since $\eta < 1$, the factor $1 + 2c(\log(2x/b))^{\eta-1}$ in the exponent of the last factor is at most $3/2$ whenever $x \ge \frac{b}{2} \exp\{(4c)^{1/(1-\eta)}\} =: \frac{b}{2}\widetilde{\theta}$. Using this fact and the monotonicity of $d\,p(d)$, we find that for all $d_i > s(x)$, Then, for all $d_i > s(x)$, by [25, Theorem 2.21], whenever $x > 4 \log 8$. Using this we get for all $x \ge \max\{\frac{b}{2}\widetilde{\theta}, 4 \log 8\} =: \theta$ since $|S(x)|$ decreases as $x$ increases. Using (2.1), $|S(x)|$ can be bounded from below as follows It is elementary to calculate that, with $s(x)$ as in (4.5), the rhs satisfies the lower bound in Assumption 2.1, with the same $\tau, \alpha$, while the new value of $\gamma$ is $\max\{\gamma_{\mathrm{old}}, \eta\}$. Using this bound for $|S(x)|$ within the probability sign in (4.8) and for $|S(n^{\alpha})|$ on the rhs of (4.8) we arrive at: $\cdots \le e^{\alpha \log n} \exp\{-n^{\varepsilon} e^{-c(\log n)^{\eta}}\} \xrightarrow{n \to \infty} 0.$ (4.9)

Next we prepare further for the proof of the upper bound of Theorem 2.4, by comparing the degree of a fixed vertex before and after the half-edge percolation. This will be used to ensure that $u_{K_n}, v_{K_n}$ in Sect. 2.3.2 still have sufficiently high degree in the percolated subgraph.
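The half-edge percolation of Definition 4.2 can be illustrated by a small simulation. The following is a minimal sketch only: the function names, the parameter values and the concrete form $p(d) = \min\{1, b e^{-c(\log d)^{\eta}}\}$ are assumptions made for illustration, and the uniform pairing is realised by shuffling the list of half-edges and matching consecutive entries.

```python
import math
import random


def percolation_probability(d, b=1.0, c=0.5, eta=0.5):
    """Assumed illustrative form p(d) = min(1, b * exp(-c * (log d)^eta))."""
    return min(1.0, b * math.exp(-c * math.log(d) ** eta))


def half_edge_percolation(degrees, p, rng):
    """Sketch of half-edge percolation on a configuration model.

    Returns (kept_degrees, num_artificial, edges), where edges are the
    pairs with both endpoints among the n original vertices.
    """
    n = len(degrees)
    kept = [0] * n
    half_edges = []        # owner vertex of every half-edge to be paired
    next_artificial = n    # artificial vertices get labels n, n+1, ...
    for v, d in enumerate(degrees):
        for _ in range(d):
            if rng.random() < p(d):
                kept[v] += 1
                half_edges.append(v)
            else:
                # deleted half-edge: moved to a new artificial degree-1 vertex
                half_edges.append(next_artificial)
                next_artificial += 1
    num_artificial = next_artificial - n
    # uniform random matching: shuffle, then pair consecutive entries
    rng.shuffle(half_edges)
    edges = [
        (half_edges[i], half_edges[i + 1])
        for i in range(0, len(half_edges) - 1, 2)
        if half_edges[i] < n and half_edges[i + 1] < n
    ]
    return kept, num_artificial, edges
```

In this sketch the kept degrees play the role of $d^r$ and `num_artificial` the role of $A$; restricting to pairs of original vertices corresponds to taking the induced subgraph on $[n]$.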

Lemma 4.5 (Degree after percolation vs original degree).
Apply half-edge percolation as described in Definition 4.2 with percolation function $p(d)$ satisfying (4.1) on $\mathrm{CM}_n(d)$. Let $K_n = O(\log n)$ be an arbitrary sequence that tends to infinity with $n$. We define $\widetilde{K}_n := \sup\{m : 2m \le K_n b e^{-c(\log K_n)^{\eta}}\}$. (4.10)

Proof First we investigate the expected degree of a vertex after percolation. Consider a vertex $w$ with degree $d_w$; as before, $d_w^r$ denotes its degree after the half-edge percolation. Recall that $d_w^r$ has distribution $\mathrm{Bin}(d_w, p(d_w))$. Thus, by (4.1), $\mathbb{E}[d_w^r] = d_w p(d_w) \ge b\, d_w e^{-c(\log d_w)^{\eta}}$. The rhs is monotone increasing in $d_w$, and tends to infinity as $d_w \to \infty$. Therefore, with $\widetilde{K}_n$ as in (4.10), $\widetilde{K}_n$ tends to infinity when $K_n$ does. By (4.10), the expectation of a $\mathrm{Bin}(d_w, p(d_w))$, for any $d_w \ge K_n$, is at least $2\widetilde{K}_n$. Knowing that, we can use the concentration of binomial random variables [26, Theorem 2.21] to obtain a bound on the probability that the binomial is smaller than $\widetilde{K}_n$, i.e.
Since $\widetilde{K}_n$ tends to infinity, this finishes the proof.
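To get a feeling for the threshold in (4.10), one can compute it numerically and check by simulation that a binomial with mean at least twice the threshold rarely falls below it. The sketch below is illustrative only: it assumes that $m$ in (4.10) ranges over the integers, takes $p(d) = b e^{-c(\log d)^{\eta}}$ with hypothetical constants, and uses a plain Monte Carlo estimate rather than the concentration inequality of the proof.

```python
import math
import random


def k_tilde(K, b, c, eta):
    """Largest integer m with 2m <= K * b * exp(-c * (log K)^eta), cf. (4.10)."""
    return math.floor(K * b * math.exp(-c * math.log(K) ** eta) / 2)


def fraction_below(d, p, threshold, trials, rng):
    """Monte Carlo estimate of P(Bin(d, p) < threshold)."""
    below = 0
    for _ in range(trials):
        kept = sum(rng.random() < p for _ in range(d))  # one Bin(d, p) sample
        below += kept < threshold
    return below / trials
```

For instance, with the assumed values $K = 1000$, $b = 1$, $c = \eta = 1/2$, the expected percolated degree is about 268.7 and `k_tilde` returns 134, roughly nine to ten standard deviations below the mean.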

Upper and Lower Bound on Weighted Distances
In this section we give the proofs of Theorems 2.4 and 2.5. We start with the main result, Theorem 2.4, and then give the proof of Theorem 2.5. We begin with the lower bound, as stated in the following lemma: and for the hopcount

Proof We consider two uniformly chosen vertices $u$ and $v$. We run a BFS exploration from both vertices and, by Lemma 3.3, we can couple these explorations whp to two independent BPs until generations $\kappa_n(u)$ and $\kappa_n(v)$, respectively. We write $B_{\kappa_n(q)}(q)$ for the set of vertices at distance $\kappa_n(q)$ from vertex $q \in \{u, v\}$. By the coupling, these explorations are disjoint whp. Since any path connecting $u, v$ must intersect $B_{\kappa_n(u)}(u)$ and $B_{\kappa_n(v)}(v)$, we have the following lower bounds on the weighted distance and the hopcount between $u, v$: Then, (5.1) directly follows from the first inequality and Lemma 3.4. By (5.3), the result of the lemma follows by a union bound. For the hopcount, the second inequality combined with Lemma 3.3 yields (5.2), since $Y^{(q)}_{\kappa_n(q)}$ converges in distribution.
For the proof of the upper bound we use a proposition, similar to [7, Proposition 2.1], which gives an upper bound on the length of a path between two vertices of fixed degree at least $K$. In our setting the vertices have degree at least $K_n$, with $K_n$ tending to infinity with $n$. We provide the adjusted proof since the adjustments are non-trivial.

Proposition 5.2 Consider $\mathrm{CM}_n(d)$ satisfying (2.1) for all $x \in [\theta, n^{\alpha}]$ for some given $\theta \in \mathbb{R}$ and some $\alpha > 1/2$. Let $w$ be a vertex with degree at least $K_n$. Then, whp, there exists a path $(\pi_0 = w, \pi_1, \dots, \pi_{i_{\max}} = w^{\star})$ from $w$ to a vertex $w^{\star}$ with degree at least $n^{(\tau-2)\alpha}$ such that the degree $d_{\pi_i}$ of the $i$th vertex on the path satisfies

$$d_{\pi_i} \ge K_n^{(1-\delta_n)\left(\frac{1}{\tau-2}\right)^{i}} \qquad (5.4)$$
with δ n → 0 as K n → ∞. Whp, i max , the length of this path is at most Proof We shall denote the number of edges on the path from w to w by i max and we define the following sets of vertices for some increasing sequence y i (K n ) =: y i to be determined shortly. ( i ) i≤i max can be seen as layers of the graph, where i max is the maximal i such that i is non-empty. Our goal is to prove that there exists a sequence y i (K n ), defining the layers i , such that the following holds: where π i is a vertex chosen from i according to the size-biased distribution, equivalently, vertex π i is the vertex that a uniformly chosen half-edge from i is attached to. Conditioning on the total number of half edges H n in CM n (d), and H y i , the number of half-edges attached to vertices in the set i , by pairing the half-edges of a vertex z ∈ i , we can pair at least y i /2 half-edges before all the half-edges of z are paired, and each of these half-edges is paired to a half-edge attached to a vertex in i+1 with probability at least 1 − H y i+1 /H n . Thus, Note in particular that this bound holds when the vertex z is chosen randomly from i in a way that does not take into account its connections, in particular it holds when z is chosen according to the size-biased distribution from i . Since any vertex in i has degree larger than y i and | i | = n(1 − F n (y i )), H y i ≥ y i n(1 − F n (y i )). Under Assumptions 2.1, 2.2, H n ≤ ϕn for some ϕ ∈ R, thus, we have by (5.8) For now we focus on the term in the exponent. Using (2.1), we lower bound , (5.10) with C defined in (2.1) andc some positive constant. Now we would like to choose the sequence y i = y i (K n ) such that (5.9) converges to zero in particular that (5.7) holds. We claim that this holds when y i is given by the following recursion with A > 0 defined later. Note that for sufficiently large K n , since γ < 1, (5.13) in particular, y i is increasing doubly exponentially. 
We use the recursion relation of (5.11) in (5.10)c Choose A ≥ 2C and use that the sequence y i is increasing and the lower bound in (5.13), thenc (5.15) with C = C/(τ − 2 − n ). Combining everything from (5.10), we can use this lower bound in the exponent on the rhs of (5.9), and, since τ − 2 + n < 1, the rhs of (5.9) is summable in i. Summing the lhs of (5.9) over i and then use the above bound we obtain (5.16) which tends to zero with n as K n tends to infinity with n. This result yields the statement of (5.7). Using the result of [7, Lemma 2.6], the lower bound in (5.13) can be improved to with δ n → 0 as K n → ∞. 1 The path in the statement of the lemma is then constructed as follows, starting from the first vertex π 0 := w. By the first term in the sum in (5.7), π 0 is whp connected to at least one vertex in 1 . By the fact that the pairs of the half edges of π 0 are chosen uniformly, π 1 is a vertex attached to a uniformly chosen half-edge in 1 . As a result, π 1 is chosen according to the size-biased distribution within 1 . Then we iterate this procedure to obtain π 2 , π 3 , . . . in 2 , 3 , . . . until the layers become empty by the constraint that the maximal degree guaranteed to exist is of order n α . We denote the last vertex on the path by w . The constructed path might jump across a layer, so the number of layers is an upper bound on the length of the path from w to w . The last layer that is nonempty is then i max with i max is the largest integer with Since δ n → 0, α < 1, the following upper bound then holds We yet have to show that y i max ≥ n α(τ −2) . For this, elementary rearrangement yields that the lhs of (5.18) equals n α(τ −2) β , with β ∈ [0, 1) the fractional part of the middle term in (5.19). This finishes the proof.
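The doubly exponential growth of the layer thresholds, and the resulting bound on the number of layers, can be checked with a few lines of arithmetic. The sketch below is an illustration under simplifying assumptions: it takes $y_i = K^{(1-\delta)(\tau-2)^{-i}}$ with a fixed $\delta$ (the default $\delta = 0$ ignores the correction $\delta_n$), assumes $y_0 \le n^{\alpha}$, and works with $\log y_i$ to avoid overflow. The function names are hypothetical.

```python
import math


def num_layers(n, K, tau, alpha, delta=0.0):
    """Largest i with K^((1 - delta) * (tau - 2)^(-i)) <= n^alpha.

    Assumes y_0 = K^(1 - delta) <= n^alpha and tau in (2, 3).
    """
    log_y = (1.0 - delta) * math.log(K)   # log y_0
    log_cap = alpha * math.log(n)         # log of the maximal degree scale
    i = 0
    # log y_{i+1} = log y_i / (tau - 2): thresholds grow doubly exponentially
    while log_y / (tau - 2.0) <= log_cap:
        log_y /= (tau - 2.0)
        i += 1
    return i


def closed_form_bound(n, K, tau):
    """(log log n - log log K) / |log(tau - 2)|, the delta = 0 upper bound."""
    return (math.log(math.log(n)) - math.log(math.log(K))) / abs(math.log(tau - 2.0))
```

With the assumed values $n = 10^{12}$, $K = 10$, $\tau = 2.5$ and $\alpha = 0.6$, the loop stops after two layers, while the closed-form bound evaluates to roughly 3.6.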
Further, there exists a path between $u, v$ with at most $(1 + \varepsilon)\, 2 \log\log n/|\log(\tau - 2)|$ edges and having total length at most (1 + ε)2

Proof For brevity let a n := . First we construct the initial segments of the connecting path from both ends $u, v$, as described heuristically in Sect. 2.3.2. Let $M_n$ be as in Lemma 3.6, with $\varepsilon_1 := \varepsilon/3$. Then, consider any vertex $w$ at graph distance $M_n$ in $\mathrm{CM}_n(d)$ from $q \in \{u, v\}$, chosen independently of $(L_e)_e$. Then, since the edge-lengths in $\mathrm{CM}_n(d, L)$ are i.i.d. on the edges of the path from $q$ to $w$, by Lemma 3.6, $d_L(q, w) \le a_n \varepsilon/3$ in $\mathrm{CM}_n(d, L)$ whp. As a result of Lemma 3.5, for any $M$ with $M|\log(\tau - 2)| > 1$, at graph distance $M \log\log K_n$ away from $q \in \{u, v\}$, there is at least one vertex with degree at least $K_n$ in $\mathrm{CM}_n(d)$ whp. Thus, by defining $K_n$ via $M_n = M \log\log K_n$ (equivalently, $K_n := \exp\exp\{M_n/M\}$), we find vertices with degree at least $K_n$ at graph distance $M_n$ away from $q \in \{u, v\}$, whp. Then, pick $q_{K_n}$ for $q \in \{u, v\}$ in an arbitrary way among these vertices that is independent of $(L_e)_e$. Then, the previous argument applies and whp, $d_L(q, q_{K_n}) \le a_n \varepsilon/3$ (5.21) in $\mathrm{CM}_n(d, L)$ for $q \in \{u, v\}$.

Next we connect $u_{K_n}, v_{K_n}$ using degree-dependent percolation. When applying edge-dependent percolation (as in Definition 4.1) on $\mathrm{CM}_n(d, L)$, we can use the edge-lengths $(L_e)_e$ as auxiliary variables to decide which edges to keep. Namely, we keep an edge $e$ between vertices of degrees $d$ and $d'$ iff $L_e \le \xi(d, d')$, with $\xi(d, d')$ satisfying $\mathbb{P}(L \le \xi(d, d')) = p(d) p(d')$. By Proposition 4.3, we can consider the percolated (sub)graph as an instance of a configuration model where the new degree sequence is $(d^r, \mathbf{1}(A))$. We have yet to specify the percolation function that we use. For some $c > 0$, $\eta \in (0, 1)$ to be determined later, let $p(d)$ be given by (5.22). The conditions of Lemma 4.5 apply, thus, with $\widetilde{K}_n$ as in (4.10), $d^r_{q_{K_n}} \ge \widetilde{K}_n$ whp for $q \in \{u, v\}$.
Further, the conditions of Lemma 4.4 are also satisfied, thus the (d r , 1(A)) sequence obtained after percolation still satisfies Assumption 2.1 (except the condition on the minimal degree being at least 2). Hence, following Proposition 5.2, we construct a path connecting u K n , v K n in the percolated graph, with good control on the (percolated) degrees along the path. Note that this path only uses vertices with percolated degree strictly larger than one, i.e., artificial vertices are not used, and as a result the constructed path is part of CM For q ∈ {u, v}, we use the constructed path as described in Proposition 5.2 starting from q K n to reach a vertex q with d r q ≥ n α(τ −2) . A lower bound on the degree of the ith vertex on this path is given by y i (K n ) =: y i in (5.4). Since p(d) is monotone decreasing, ξ(d, d ) is non-increasing in both variables. Thus, the edge-lengths on the constructed path are at most ξ( y i , y i+1 ) for i = 0, 1, . . . , i max − 1. Hence, for q ∈ {u, v} (5.23) Next we connect the two high-degree vertices u and v . Recall that H n stands for the total number of half-edges, and is at least some constant ϕ times n under Assumption 2.1 even without the minimal degree assumption. Fix δ ∈ (0, α − 1/2) and write 1/2+δ := {w : d r w ≥ n 1/2+δ }, as well as H 1/2+δ := w∈ 1/2+δ d r w . Then, following (5.8)-(5.10), which tends to zero as n → ∞ since 1/2 + δ < α. Thus, we can find vertices u , v ∈ 1/2+δ such that (q , q ) are kept edges in the percolated graph whp. Finally, we show that the edge (u , v ) is also present whp in the percolated graph.
which tends to zero as n → ∞. By the monotonicity of ξ , whp, the vertices u and v are connected via at most 3 edges with length at most Combining (5.27) with the bound on i max from Proposition 5.2, (5.23) can be bounded above as Similar to the proof of Lemma 3.4, we need to transform the rhs to the desired form in (5.20).
Using similar bounds as in (3.14), we rewrite the sum to an integral, change variables as (τ − 2) −ηx (log K 1−δ n n ) η =: (τ − 2) −y , and change the integral back to a sum. This operation shifts the summation boundaries by η log log K n /| log(τ − 2)| and multiplies the whole sum by η. We obtain By choosing η ∈ (0, 1) in (5.22) such that 1/η < 1 + ε/3 so we obtain that Finally, it is not hard to see that 3ξ(n α (τ − 2), n 1/2 ) ≤ a n ε/3 holds as well for all large enough n. Combining everything, we arrive at This finishes the proof of (5.20). For the second statement, recall that for some M ≥ 1/| log(τ − 2)|, M n = M log log( K n ) and note that the number of edges on the constructed path is at most 2M log log K n + 2 log log n − log log K n | log(τ − 2)| + 3, (5.33) where the relation between K n and K n is described in Lemma 4.5 in (4.10). From (4.10) it is elementary to check that for all n large enough log log K n = log log K n + log(1 − c(log K n ) η−1 ) = log log K n + o(1), (5.34) thus, writing M := (1 + z)/| log(τ − 2)| for some z > 0, the number of edges in the constructed path is at most 2 log log n + 2z log log K n + o(1) | log(τ − 2)| + 3 ≤ (1 + ε) 2 log log n | log(τ − 2)| , (5.35) as desired.
Proof of Theorem 2.4 Lemma 5.1 gives the lower bound and Lemma 5.3 gives the upper bound. Together these prove the statement of the theorem.
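The coupling used in the proof of Lemma 5.3, where an edge between vertices of degrees $d$ and $d'$ is kept iff its length is at most $\xi(d, d')$ with $\mathbb{P}(L \le \xi(d, d')) = p(d)p(d')$, can be made concrete once a weight distribution is fixed. The sketch below assumes, purely for illustration, exponential edge-lengths with rate 1 and a percolation function of the form $p(d) = e^{-c(\log d)^{\eta}}$ with hypothetical constants; neither choice is taken from the results above, which allow general weight distributions.

```python
import math
import random


def p(d, c=0.5, eta=0.5):
    """Assumed percolation function p(d) = exp(-c * (log d)^eta), d >= 1."""
    return math.exp(-c * math.log(d) ** eta)


def xi(d1, d2, c=0.5, eta=0.5):
    """Threshold with P(L <= xi) = p(d1) * p(d2) for L ~ Exp(1) (assumed)."""
    q = p(d1, c, eta) * p(d2, c, eta)
    if q >= 1.0:
        return math.inf               # the edge is always kept
    return -math.log1p(-q)            # inverse of the Exp(1) cdf 1 - exp(-x)


def keep_edge(length, d1, d2):
    """Percolation decision driven by the edge-length itself."""
    return length <= xi(d1, d2)
```

Since $p$ is decreasing, `xi` is non-increasing in both arguments, so edges between high-degree vertices are kept only if they are short. This is the mechanism that bounds the edge-lengths along the constructed path.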

Erased Configuration Model
In this section we prove Theorem 2.5.
Proof of Theorem 2.5, lower bound The strategy of the proof is the following: first we show that the lower bound is also valid in the erased model. Then, we show that the paths constructed in the proof of the upper bound between vertices $q, q_{K_n}$ and $q_{K_n}, q^{\star}$ are whp simple for $q \in \{u, v\}$, and as a result they survive the erasing procedure whp. Finally, we connect $u^{\star}, v^{\star}$ in the erased model in a different way than in the original model. We start with the lower bound. The proof of Lemma 5.1 consists of a BFS exploration around the two vertices $u$ and $v$. These explorations can whp be coupled to two BP trees and therefore all edges within these trees are whp simple. So this lemma remains valid after erasure and thus the lower bound follows both for the weighted distance and for the hopcount.
In the proof of the upper bound we again use a coupling to BP trees to find $u_{K_n}, v_{K_n}$. Thus, the path between $q, q_{K_n}$ is again whp simple and thus it survives erasure. Next we investigate the constructed path between $q_{K_n}, q^{\star}$. This path is constructed in the percolated graph. The erasure happens before the degree-dependent percolation, so edges of the path constructed in Proposition 5.2 could in principle be deleted earlier in the erasure procedure. We show that the edges on the constructed path were not part of a multiple edge whp, meaning that they were whp not erased before. For this, we state a lemma that gives a bound on the original degree of a vertex, given its percolated degree $d^r$. This lemma is the 'reverse' of Lemma 4.5.
Claim 5.4 (Degree after percolation vs original degree). Apply half-edge percolation as described in Definition 4.2 with percolation function p(d) satisfying (4.1) on CM n (d). Let ω(n) be an arbitrary sequence that tends to infinity with n. Let s(x) be defined as in (4.5). Then, for a vertex w ∈ CM n (d), for some c > 0.
Proof The proof directly follows from Bayes' theorem applied to the lhs of (5.36), and following the calculations between (4.5) and (4.7).

Proof Note that the path of Proposition 5.2 is constructed later, in the proof of Theorem 2.4, in the percolated graph. Thus $u_i$, the $i$th vertex on this path, has percolated degree at least as in (5.4). Without loss of generality we can assume that $d^r_{u_i} \le K_n^{(1-\delta_n)\left(\frac{1}{\tau-2}\right)^{i+1}} =: y_{i+1}$, (5.38) since otherwise the path has 'jumped' a layer and one can consider the path to be shorter by an edge. Recall that $d_{u_{i_{\max}}} \ge n^{\alpha(\tau-2)}$ holds as well. Applying Claim 5.4 to $(u_i)_{i \le i_{\max}-1}$, using the upper bound in (5.38), we obtain the bound (5.39), a sum of terms $c \exp\{-y_{i+1}/4\}$, which tends to zero with $n$ since it is at most a constant times the first term. We can rewrite the probability in (5.37) as We investigate the probabilities in (5.40) separately, starting with the probability that there is exactly one edge between those two vertices. We lower bound the probability that precisely the $j$th half-edge of $u_i$ connects to $u_{i+1}$, and the others do not. Note that for the $k$th half-edge the probability of not connecting to $u_{i+1}$ is at least $(H_n - d_{u_{i+1}} - 2(k-1))/(H_n - 2(k-1) - 1) \ge$ The rhs converges to zero as $n$ tends to infinity as long as $\alpha < ((\tau - 2)(\tau - 1))^{-1}$, which we have assumed in Assumption 2.1.
Proof of Theorem 2.5, upper bound As mentioned before, we construct a path in $\mathrm{ECM}_n(d, L)$ to connect $u, v$. For this it is enough to construct a path in $\mathrm{CM}_n(d, L)$ all of whose edges are simple. This path has a large overlap with the path in the upper bound of Theorem 2.4. Namely, the segments between $u, u_{K_n}$ and $v, v_{K_n}$ whp use simple edges by the coupling to BP trees. The segments between $u_{K_n}, u_{i_{\max}-1}$ and $v_{K_n}, v_{i_{\max}-1}$ whp use simple edges as well, so they survive erasure. Next we connect $u_{i_{\max}-1}$ to $v_{i_{\max}-1}$. Note that the constructed path in $\mathrm{CM}_n(d, L)$ might use multiple edges, so we need a different connecting path. However, $q_{i_{\max}-1}$ for $q \in \{u, v\}$ are vertices with degree at least $n^{\alpha(\tau-2)(1+o(1))}$. In the proof of Theorem 2.4, we created a 3-hop connection between $u_{i_{\max}} = u^{\star}$ and $v_{i_{\max}} = v^{\star}$ in the percolated graph, see (5.25)-(5.26). When we erase a multiple edge, we keep one edge independently of its edge-length. Thus, from every multiple edge at least one edge remains.
Hence, a construction analogous to (5.25)-(5.26) can be repeated, not for the percolated graph but for the original graph, yielding a 5-hop connection between $u_{i_{\max}-1}, v_{i_{\max}-1}$.
The edge-lengths on this path are simply i.i.d. copies of $L$. Thus, Treating the terms in (5.45) in the same way as in the proof of Theorem 2.4 (see (5.23) and (5.27)-(5.31)) finishes the proof.
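The erasure step used throughout this section, deleting loops and keeping exactly one edge from every group of parallel edges, independently of the edge-lengths, can be sketched as follows. The representation of a weighted multigraph as a list of `(u, v, length)` triples and the function name are assumptions made for illustration. Keeping the first edge of each group is independent of the lengths provided the list order does not depend on them.

```python
def erase(edges):
    """Return the erased edge list of a weighted multigraph.

    edges: list of (u, v, length) triples. Loops are removed and, of each
    group of parallel edges, the first one in list order is kept, so the
    choice never looks at the edge-lengths.
    """
    kept = {}
    for u, v, length in edges:
        if u == v:
            continue                      # self-loop: deleted
        key = (min(u, v), max(u, v))      # undirected edge identifier
        if key not in kept:
            kept[key] = length            # first copy survives, rest erased
    return [(u, v, length) for (u, v), length in kept.items()]
```

For example, `erase([(0, 1, 2.0), (1, 0, 0.5), (2, 2, 1.0), (1, 2, 3.0)])` returns `[(0, 1, 2.0), (1, 2, 3.0)]`: the shorter parallel edge of length 0.5 is discarded because the surviving copy is not selected by length.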