Probabilistic Analysis of Optimization Problems on Sparse Random Shortest Path Metrics

Simple heuristics for (combinatorial) optimization problems often show a remarkable performance in practice. Worst-case analysis often falls short of explaining this performance. Because of this, “beyond worst-case analysis” of algorithms has recently gained a lot of attention, including probabilistic analysis of algorithms. The instances of many (combinatorial) optimization problems are essentially a discrete metric space. Probabilistic analysis for such metric optimization problems has nevertheless mostly been conducted on instances drawn from Euclidean space, which provides a structure that is usually heavily exploited in the analysis. However, most instances from practice are not Euclidean. Little work has been done on metric instances drawn from other, more realistic, distributions. Some initial results have been obtained in recent years, where random shortest path metrics generated from dense graphs (either complete graphs or Erdős–Rényi random graphs) have been used so far. In this paper we extend these findings to sparse graphs, with a focus on sparse graphs with ‘fast growing cut sizes’, i.e. 
graphs for which $|\delta(U)| = \Omega(|U|^\varepsilon)$ for some constant $\varepsilon \in (0,1)$ for all subsets $U$ of the vertices, where $\delta(U)$ is the set of edges connecting $U$ to the remaining vertices. A random shortest path metric is constructed by drawing independent random edge weights for each edge in the graph and setting the distance between every pair of vertices to the length of a shortest path between them with respect to the drawn weights. For such instances generated from a sparse graph with fast growing cut sizes, we prove that the greedy heuristic for the minimum distance maximum matching problem, and the nearest neighbor and insertion heuristics for the traveling salesman problem all achieve a constant expected approximation ratio. Additionally, for instances generated from an arbitrary sparse graph, we show that the 2-opt heuristic for the traveling salesman problem also achieves a constant expected approximation ratio.


Introduction
Large-scale optimization problems, such as the traveling salesman problem (TSP), are relevant for many applications. Often it is not possible to solve these problems to optimality within a reasonable amount of time, especially when instances get larger. Therefore, in practice these kinds of problems are tackled by using approximation algorithms or ad-hoc heuristics. Even though the worst-case performance of these, often simple, heuristics is usually rather bad, they often show a remarkably good performance in practice.
In order to find theoretical results that are closer to the practical observations, probabilistic analysis has been a useful tool over the last decades. One of the main challenges here is to choose a probability distribution on the set of possible instances of the problem: on the one hand this distribution should be sufficiently simple in order to make the probabilistic analysis possible, but on the other hand the distribution should somehow reflect realistic instances.
In the 'early days' of probabilistic analysis, random instances were either generated by using independent random edge lengths or embedded in Euclidean space (e.g. [3,14]). Although these models have some nice mathematical properties that enable the probabilistic analysis, they have shortcomings regarding their realism: in practice, instances are often metric, but not Euclidean, and independent random edge lengths are not even metric.
Recently, Bringmann et al. [8] widened the scope of models for generating random instances by using the following model, already proposed by Karp and Steele in 1985 [21]: given an undirected complete graph, draw edge weights independently at random and then define the distance between any two vertices as the total weight of the shortest path between them, measured with respect to those random weights. Even though this model broadens the scope of random metric spaces, the resulting instances from this model are not very realistic.
In this paper we adapt this model in the sense that we start with a sparse graph instead of a complete graph. We believe that this yields instances that are more realistic, for instance since in practice the underlying (road, communication, etc.) networks are almost always sparse.

Related Work
The model described above is known by two different names: random shortest path metrics and first-passage percolation. It was introduced by Hammersley and Welsh under the latter name as a model for fluid flow through a (random) porous medium [15,18]. A lot of studies have been conducted on first-passage percolation, mostly on this model defined on the lattice $\mathbb{Z}^d$.
For first-passage percolation defined on complete graphs many structural results exist. We know for instance that the expected distance between two arbitrary fixed vertices is approximately $\ln(n)/n$ and that the distance from a fixed vertex to the vertex that is farthest away from it is approximately $2\ln(n)/n$ [8,19]. We also know that the diameter in this case is approximately $3\ln(n)/n$ [16,19]. Bringmann et al. used this model to analyze heuristics for matching, TSP and k-median [8].
A lot of studies have been conducted on the model of random shortest path metrics (or first-passage percolation). Many of these studies focused on first-passage percolation defined on the integer lattice $\mathbb{Z}^d$. Although very few precise results are known for this model, there are many existential results available. For instance, the distance between the origin and $ne_1$ (where $e_1$ is the unit vector in the first coordinate direction) is known to be $\Theta(n)$. Also, the set of vertices within distance $t$ from a given vertex grows linearly in $t$ and, after rescaling, converges to some convex domain [26]. The survey by Auffinger et al. [1] contains a thorough overview.

Our Results
This paper aims at extending the results of Bringmann et al. [8] and Klootwijk et al. [22] to the more realistic setting of random shortest path metrics generated from sparse graphs, i.e., graphs $G = (V,E)$ for which $|E| = O(|V|)$. We believe that the probabilistic analysis of simple heuristics in different random models will enhance the understanding of the performance of these heuristics, which are used in many applications.
In this paper we provide a probabilistic analysis of some simple heuristics in the model of random shortest path metrics generated from sparse graphs. For most of the results in this paper we need to restrict ourselves to classes of sparse graphs that have 'fast growing cut sizes'. The following definition formalizes the notion of fast growing cut sizes.
Definition 1 Let $\mathcal{G}$ be a family of sparse connected undirected simple graphs. We say that $\mathcal{G}$ has fast growing cut sizes if there exist constants $c > 0$ and $\varepsilon, c' \in (0,1)$ such that for any $G = (V,E) \in \mathcal{G}$ and any $U \subseteq V$ with $|U| \le c'|V|$ we have $|\delta(U)| \ge c|U|^\varepsilon$, where $\delta(U)$ denotes the set of edges connecting $U$ to the remaining vertices.
In the remainder of this paper, whenever we say that a family of sparse graphs has fast growing cut sizes, we implicitly assume that it satisfies Definition 1 for some constants $c, \varepsilon, c'$. Intuitively, this definition implies that a family of sparse graphs with fast growing cut sizes cannot have too many 'bottlenecks'. Loosely speaking, a bottleneck is given by two relatively large sets of vertices with only relatively few edges between them. Even though Definition 1 might seem rather restrictive at first glance, many graph classes actually have fast growing cut sizes. Examples include $d$-dimensional grid graphs (see Example 2 in Sect. 2.2 for a proof), other lattice graphs and random geometric graphs (with high probability). Empirically, many (real-life) network graphs also have fast growing cut sizes. In particular, Definition 1 can be seen as a generalization of the notion of expander graphs, since setting $c' = 1/2$ and $\varepsilon = 1$ yields the definition of a family of expander graphs [17].
In Sect. 3 we provide some structural properties of random shortest path metrics generated from sparse graphs with fast growing cut sizes. Partially, these properties can be seen as a generalization of some of the structural properties found by Bringmann et al. for random shortest path metrics generated from complete graphs [8].
For the probabilistic analyses in this paper we consider two different types of simple heuristics. In Sect. 4 we conduct a probabilistic analysis of three greedy-like heuristics: the greedy heuristic for the minimum-distance perfect matching problem, and the nearest neighbor heuristic and insertion heuristic for the TSP. In Sect. 5 we conduct a probabilistic analysis of a local search heuristic: the 2-opt heuristic for the TSP. We show that all four heuristics yield a constant approximation ratio for random shortest path metrics generated from sparse graphs with fast growing cut sizes (greedy-like in Sect. 4) or arbitrary sparse graphs (local search in Sect. 5). We are aware that our results regarding the 2-opt heuristic are mostly purely theoretical, because, e.g., cheapest insertion already achieves an approximation ratio of 2 and is often used to initialize 2-opt [12,27]. However, they are non-trivial results about practically used algorithms, beyond the classical worst-case analysis.

Notation and Model
For $n \in \mathbb{N}$, we use $[n]$ as shorthand notation for the set $\{1, \ldots, n\}$. Sometimes we use $\exp(\cdot)$ to denote the exponential function with base $e$. We denote by $X \sim P$ that a random variable $X$ is distributed according to a probability distribution $P$. $\mathrm{Exp}(\lambda)$ denotes the exponential distribution with parameter $\lambda$. We write $X \sim \sum_{i=1}^{n} \mathrm{Exp}(\lambda_i)$ if $X$ is the sum of $n$ independent exponentially distributed random variables having parameters $\lambda_1, \ldots, \lambda_n$. In particular, $X \sim \sum_{i=1}^{n} \mathrm{Exp}(\lambda)$ denotes an Erlang distribution with parameters $n$ and $\lambda$. If a random variable $X_1$ is stochastically dominated by another random variable $X_2$, i.e., we have $P(X_1 \le x) \ge P(X_2 \le x)$ for all $x$, we denote this by $X_1 \precsim X_2$.
Furthermore, we use $H_n^{(m)}$ as shorthand notation for the $n$th generalized harmonic number of order $m$, i.e., $H_n^{(m)} = \sum_{i=1}^{n} 1/i^m$. Observe that for $m \in (0,1)$ we can view the generalized harmonic numbers as Riemann sums for $\int 1/x^m \, dx$ and bound them as follows: In particular, for any $y > 1$ (and $m \in (0,1)$) this implies that
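The Riemann-sum argument for generalized harmonic numbers can be checked numerically. The sketch below is only an illustration: it uses the standard integral bounds $\int_1^{n+1} x^{-m}\,dx \le H_n^{(m)} \le \int_0^n x^{-m}\,dx$, which are an assumption here and need not coincide with the exact form of the bounds stated in the paper. The function names are hypothetical.

```python
def generalized_harmonic(n, m):
    """Return H_n^(m) = sum_{i=1}^n 1 / i^m."""
    return sum(1.0 / i ** m for i in range(1, n + 1))


def integral_bounds(n, m):
    """Integral bounds on H_n^(m) for 0 < m < 1 (standard, assumed form).

    lower = integral of x^-m over [1, n + 1]
    upper = integral of x^-m over [0, n]
    Both follow because x^-m is decreasing on (0, infinity).
    """
    lower = ((n + 1) ** (1 - m) - 1) / (1 - m)
    upper = n ** (1 - m) / (1 - m)
    return lower, upper


if __name__ == "__main__":
    for n in (1, 10, 1000):
        for m in (0.25, 0.5, 0.75):
            lower, upper = integral_bounds(n, m)
            print(n, m, lower, generalized_harmonic(n, m), upper)
```

Both bounds grow like $n^{1-m}/(1-m)$, which is the growth rate used when generalized harmonic numbers are bounded later in the analysis.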

Random Shortest Path Metrics
Given an undirected simple connected graph $G = (V,E)$, the corresponding random shortest path metric is constructed as follows. First, for each edge $e \in E$, we draw a random edge weight $w(e)$ independently according to the exponential distribution with parameter 1. Then, we define the distance function $d : V \times V \to \mathbb{R}_{\ge 0}$ as follows: for every $u, v \in V$, $d(u,v)$ is the total weight of a lightest $u,v$-path in $G$ (with respect to the random weights $w(\cdot)$). Observe that this definition immediately implies that $d(v,v) = 0$ for all $v \in V$, that $d(u,v) = d(v,u)$ for all $u, v \in V$, and that $d(u,v) \le d(u,s) + d(s,v)$ for all $u, s, v \in V$. We call the distance function $d$ obtained by this process a random shortest path metric generated from $G$. Note that even though the graph $G$ is not a complete graph, the metric $d(\cdot,\cdot)$ is complete in the sense that between each pair of vertices $u, v \in V$ it has a direct connection of distance $d(u,v)$. It is tempting to refer to these direct connections in the metric space as 'edges' (with weight/length/distance equal to $d(u,v)$). In order to avoid potential confusion with the edges of the graph $G$ that is used to generate the metric space, we write quotation marks around the 'edges' of the metric space.
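The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the choice of a 2-dimensional grid as the sparse graph, the use of Dijkstra's algorithm from every source, and all function names are assumptions made for the example.

```python
import heapq
import random


def grid_graph_edges(side):
    """Edges of a side x side grid graph; vertex (r, c) has index r * side + c."""
    edges = []
    for r in range(side):
        for c in range(side):
            v = r * side + c
            if c + 1 < side:
                edges.append((v, v + 1))      # edge to the right neighbor
            if r + 1 < side:
                edges.append((v, v + side))   # edge to the neighbor below
    return edges


def random_shortest_path_metric(n, edges, rng):
    """Draw independent Exp(1) edge weights and return the distance matrix d."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        w = rng.expovariate(1.0)  # exponential distribution with parameter 1
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = []
    for source in range(n):
        # Dijkstra's algorithm from `source` (all weights are non-negative).
        d = [float("inf")] * n
        d[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            du, u = heapq.heappop(heap)
            if du > d[u]:
                continue  # stale heap entry
            for v, w in adj[u]:
                if du + w < d[v]:
                    d[v] = du + w
                    heapq.heappush(heap, (d[v], v))
        dist.append(d)
    return dist


if __name__ == "__main__":
    side = 4
    d = random_shortest_path_metric(side * side, grid_graph_edges(side), random.Random(0))
    print(max(max(row) for row in d))  # diameter of this sample
```

The resulting matrix is complete in the sense described above: every pair of vertices has a finite distance, even though the generating graph has only $O(n)$ edges.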
We use the following notation to denote some properties of these random shortest path metrics generated from $G = (V,E)$. The diameter of the random metric is denoted by $\Delta_{\max} := \max_{u,v} d(u,v)$. The $\Delta$-ball around a vertex $v$, $B_\Delta(v) := \{u \in V : d(u,v) \le \Delta\}$, is the set of vertices within distance $\Delta$ of $v$. Let $\pi_k(v)$ denote the $k$th closest vertex from $v$ (including $v$ itself and breaking ties arbitrarily). Note that $\pi_1(v) = v$ for all $v \in V$. The distance from a vertex $v$ to the $k$th closest vertex from it is denoted by $\tau_k(v) := d(v, \pi_k(v))$. The set of the $k$ closest vertices to $v$ (including $v$ itself) is $\{\pi_i(v) : i = 1, \ldots, k\}$. The size of the cut in $G$ induced by this set, which plays an important role in our analysis, is denoted by $\chi_k(v)$.

Sparse Graphs
Throughout this paper, we consider random shortest path metrics generated from sparse connected undirected simple graphs on $n$ vertices. We have $|E| = \Theta(n)$ for any sparse graph $G = (V,E)$. The probabilistic analysis of the 2-opt heuristic for the TSP in Sect. 5 works for any such graph. However, for the probabilistic analyses of the greedy-like heuristics in Sect. 4 we need to restrict ourselves to classes of sparse graphs that have 'fast growing cut sizes' as defined in Definition 1.
Looking at this definition, note that $c|U|^\varepsilon$ is a subadditive function. Hence, when checking whether a family of sparse graphs has fast growing cut sizes, we can restrict ourselves to connected subsets $U \subseteq V$ with $|U| \le c'n$: if $|\delta(U)| \ge c|U|^\varepsilon$ for all such connected subsets $U \subseteq V$, then it follows for any unconnected subset $\tilde{U} = U_1 \cup \ldots \cup U_k$ (where the $U_i$ are the maximal connected subsets of $\tilde{U}$) that $|\delta(\tilde{U})| = \sum_{i=1}^{k} |\delta(U_i)| \ge \sum_{i=1}^{k} c|U_i|^\varepsilon \ge c|\tilde{U}|^\varepsilon$.
We end this section by showing that $d$-dimensional grid graphs have fast growing cut sizes. A $d$-dimensional grid graph has vertex set $V = [N]^d$, and two vertices are connected by an edge if and only if they differ by exactly 1 in exactly one coordinate and agree in all other coordinates.
Example 2 For any integer $d > 1$, the family of $d$-dimensional grid graphs has fast growing cut sizes. To see this, let hence for any $d > 1$, the family of $d$-dimensional grid graphs satisfies Definition 1 for $c = 2^{1-1/d}$, $\varepsilon = 1 - 1/d$ and $c' = 1/2$.
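For very small grids, the cut-size condition of Definition 1 can be verified by brute force. The following sketch assumes $d = 2$ and reads the constants of Example 2 as $c = 2^{1-1/d} = \sqrt{2}$, $\varepsilon = 1/2$ and $c' = 1/2$; the condition $|\delta(U)| \ge \sqrt{2}\,|U|^{1/2}$ is squared so that only integers are compared. The function names are hypothetical, and the enumeration is exponential in the number of vertices, so it is only meant for tiny instances.

```python
from itertools import combinations


def grid_edges(side):
    """Edges of a side x side grid graph; vertex (r, c) has index r * side + c."""
    edges = []
    for r in range(side):
        for c in range(side):
            v = r * side + c
            if c + 1 < side:
                edges.append((v, v + 1))
            if r + 1 < side:
                edges.append((v, v + side))
    return edges


def cut_size(subset, edges):
    """|delta(U)|: number of edges with exactly one endpoint in U."""
    inside = set(subset)
    return sum((u in inside) != (v in inside) for u, v in edges)


def has_fast_growing_cuts_2d(side):
    """Check |delta(U)|^2 >= 2 |U| for every non-empty U with |U| <= n / 2."""
    n = side * side
    edges = grid_edges(side)
    for size in range(1, n // 2 + 1):
        for subset in combinations(range(n), size):
            if cut_size(subset, edges) ** 2 < 2 * size:
                return False
    return True


if __name__ == "__main__":
    print(has_fast_growing_cuts_2d(3))
```

The check covers all subsets, connected or not; by the subadditivity argument above it would suffice to enumerate connected subsets only.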

Structural Properties
In this section, we provide some structural properties regarding random shortest path metrics generated from sparse graphs that are used later on in our probabilistic analyses of the greedy heuristic for maximum matching and the 2-opt heuristic for the TSP in such random metric spaces. We start off with some technical lemmas from known literature and some results regarding sums of lightest edge weights in $G$ (which hold for arbitrary sparse graphs). After that, we consider a random growth process on sparse graphs with fast growing cut sizes and use it to derive a clustering result and a tail bound on the diameter $\Delta_{\max}$ for random shortest path metrics generated using these graphs.

Sums of Lightest Edge Weights in G
All main results in this paper make use of some observations related to sums of the $m$ lightest edge weights in a sparse graph $G$. The lemmas and corollary below summarize some structural properties concerning these sums. They hold for arbitrary sparse graphs $G$.

Lemma 6 Let $S_m$ denote the sum of the $m$ lightest edge weights in $G$. Then
Proof Let $\sigma_k$ denote the $k$th lightest edge weight in $G$. Since all edge weights are independent and standard exponentially distributed, we have $\sigma_1 \sim \mathrm{Exp}(|E|)$. Using the memorylessness property of the exponential distribution, it follows that $\sigma_2 \sim \sigma_1 + \mathrm{Exp}(|E| - 1)$, i.e., the second lightest edge weight is equal to the lightest edge weight plus the minimum of $|E| - 1$ standard exponentially distributed random variables. In general, we get $\sigma_{k+1} \sim \sigma_k + \mathrm{Exp}(|E| - k)$.

Now, the first stochastic dominance relation follows from Lemma 5 by observing that
where the inequality follows from applying the well-known inequality $\binom{n}{k} \le (en/k)^k$. The second stochastic dominance relation follows by observing that $|E| \ge m$, which implies that $(|E| - i)/(m - i) \ge |E|/m$ for all $i = 0, \ldots, m - 1$.
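The order-statistics argument in this proof can be illustrated numerically. By memorylessness, the $k$th lightest of $|E|$ independent standard exponential weights has expectation $\sum_{i=0}^{k-1} 1/(|E| - i)$, so $E[S_m]$ is a double sum. The sketch below computes this value and compares it with a Monte Carlo estimate; the function names, the sample sizes and the seed are assumptions made for the example.

```python
import random


def expected_sum_lightest(num_edges, m):
    """E[S_m] for num_edges independent Exp(1) weights.

    Uses E[sigma_k] = sum_{i=0}^{k-1} 1 / (num_edges - i), which follows from
    the memorylessness of the exponential distribution.
    """
    total = 0.0
    expected_sigma = 0.0
    for k in range(1, m + 1):
        expected_sigma += 1.0 / (num_edges - (k - 1))  # now equals E[sigma_k]
        total += expected_sigma
    return total


def simulated_sum_lightest(num_edges, m, trials, rng):
    """Monte Carlo estimate of E[S_m] obtained by sorting sampled weights."""
    acc = 0.0
    for _ in range(trials):
        weights = sorted(rng.expovariate(1.0) for _ in range(num_edges))
        acc += sum(weights[:m])
    return acc / trials


if __name__ == "__main__":
    rng = random.Random(1)
    print(expected_sum_lightest(20, 3), simulated_sum_lightest(20, 3, 20000, rng))
```

As a sanity check, for $m = |E|$ the formula returns $|E|$, the expected total weight of all edges.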

Corollary 7 Let S m denote the sum of the m lightest edge weights in G. Then E[S
Proof From Lemma 6 we can immediately see that The result follows by observing that The result follows immediately.

Lemma 9
Let $S_m$ denote the sum of the $m$ lightest edge weights in $G$. Then we have $\mathrm{TSP} \ge \mathrm{MM} \ge S_{n/2}$, where TSP and MM are the total distance of a shortest TSP tour and a minimum-distance perfect matching, respectively.
Proof The first inequality is trivial. For the second inequality, consider a minimum-distance perfect matching in $G$, and take the union of the shortest paths between each matched pair of vertices. This union must contain at least $n/2$ different edges of $G$. These edges have a total weight of at least $S_{n/2}$ and at most MM. So, $\mathrm{MM} \ge S_{n/2}$.

A Random Growth Process
In this subsection, and the following one, we assume that $G$ is a sparse graph with fast growing cut sizes.
In order to understand the structure of sparse random shortest path metrics it is important to get a feeling for the distribution of the distances in the random metric, in particular the distribution of $\tau_k(v)$. However, this distribution depends heavily on the exact position of $v$ within $G$, which makes it rather complicated to derive it. In order to overcome this, we derive instead a stochastic upper bound on $\tau_k(v)$ which holds for any vertex $v \in V$. The derivation of this result is a generalization of the case in which $G$ is a complete graph, which has been analysed before (e.g. [8,10,19]). The (proof of the) following lemma shows this generalization.
Lemma 10 Let $\mathcal{G}$ be a family of sparse graphs with fast growing cut sizes. Then, for any $G = (V,E) \in \mathcal{G}$, any $v \in V$ and any $k \le c'n$ we have $\tau_k(v) \precsim \sum_{i=1}^{k-1} \mathrm{Exp}(ci^\varepsilon)$.
Proof The values of $\tau_k(v)$ are generated by a birth process as follows. For $k = 1$ we have $\tau_k(v) = 0$ and also $\sum_{i=1}^{k-1} \mathrm{Exp}(ci^\varepsilon) = 0$. For $k \ge 2$ we can obtain $\tau_k(v)$ from $\tau_{k-1}(v)$ by looking at all edges that 'leave' the set of the $k - 1$ closest vertices to $v$. By definition there are $\chi_{k-1}(v)$ such edges, and from Definition 1 it follows that $\chi_{k-1}(v) \ge c(k-1)^\varepsilon$ for $k \le c'n$. Moreover, conditioned on the first $k - 1$ phases of the birth process, these edges must have a weight of at least
Proof From Lemma 10 we can see that Exp ci ε ≥ .
Next, we want to apply the result of Lemma 3(i). For this purpose, set and recall from (1) that Finally, we substitute the values of $\mu$ and $\lambda$ to obtain the desired result.
By observing that $|B_\Delta(v)| \ge k$ if and only if $\tau_k(v) \le \Delta$, we can immediately derive the following corollary.

Corollary 12 Let G be a family of sparse graphs with fast growing cut sizes. Then, for any G
.
We now use this bound to derive a bound on the probability distribution of $|B_\Delta(v)|$ that is a crucial ingredient for the construction of clusterings in the next section.

Lemma 13
Let $\mathcal{G}$ be a family of sparse graphs with fast growing cut sizes. Then, there exists a constant $c_1$ such that for any $\Delta > 0$, any $G = (V,E) \in \mathcal{G}$ with $n$ sufficiently large, and any $v \in V$ we have
Proof For ease of notation, define $\xi := c'(c(1-\varepsilon))^{1/(1-\varepsilon)}$ and assume w.l.o.g. that $c_1 \ge 1/\xi$. Now observe that for $\Delta \le 1/\xi^{1-\varepsilon}$, the statement is trivial since in that case we have $c_1/\Delta^{1/(1-\varepsilon)} \ge c_1\xi \ge 1$. So, we are left with the case where $\Delta > 1/\xi^{1-\varepsilon}$.
For the first case, suppose that $\xi\Delta^{1/(1-\varepsilon)} \le c'n$. Then it follows that $s_\Delta = \xi\Delta^{1/(1-\varepsilon)}$, and we need to show that the function is bounded from above by a constant. Now, observe that $\lambda - 1 - \ln(\lambda)$ is an increasing function of $\lambda$ for $\lambda \ge 1$. Combining this with the observation, following from (2), that c and setting $\gamma := (1/c')^{1-\varepsilon}$ for ease of notation, it follows that where the second inequality follows by using a bound for the generalized harmonic number (cf. (2)). It is well-established that the function on the right-hand side has a finite global maximum (since $\gamma > 1$ implies $\gamma - 1 - \ln(\gamma) > 0$). Therefore, we can conclude that in this case there exists a constant $c_1$ such that $f(\Delta) \le c_1$ for all $\Delta > 1/\xi^{1-\varepsilon}$.
For the second case, suppose that $\xi\Delta^{1/(1-\varepsilon)} \ge c'n$. Then it follows that $s_\Delta = c'n$, and we need to show that the function is bounded from above by a constant as long as $\xi\Delta^{1/(1-\varepsilon)} \ge c'n$ and $n$ is sufficiently large. Observe that we can rewrite the inequality ξ To do so, we compute the partial derivative of $g(\Delta, n)$ with respect to $\Delta$, and show that it is non-positive for all $\Delta \ge n^{1-\varepsilon}/(c(1-\varepsilon))$. The partial derivative equals Now observe that for sufficiently large $n$ we have where we subsequently used the bound on $\Delta$ for this case, the fact that $n$ is sufficiently large, and (2) to bound the generalized harmonic number. Together with the facts that $e^x > 0$ for all $x \in \mathbb{R}$ and H (ε) 2 ) ≥ 0, this shows that the partial derivative of $g(\Delta, n)$ with respect to $\Delta$ is indeed non-positive for all In the first case we have already shown that there exists a constant $c_1$ such that $f(\Delta) \le c_1$ for all $\Delta > 1/\xi^{1-\varepsilon}$. So, it follows immediately that g( , )) ≤ $c_1$ as long as $\xi\Delta^{1/(1-\varepsilon)} \ge c'n$ and $n$ is sufficiently large.
Combining both cases yields the desired result.

Clustering and a Tail Bound for $\Delta_{\max}$
The following theorem shows that we can partition the vertices of random shortest path metrics generated from sparse graphs with fast growing cut sizes into a suitably small number of clusters with a given maximum diameter. Its proof follows closely the ideas of Bringmann et al. [8], albeit with a different value of $s_\Delta$.
Theorem 14 Let $\mathcal{G}$ be a family of sparse graphs with fast growing cut sizes. Then, there exists a constant $c_1$ such that for any $\Delta > 0$ and any $G \in \mathcal{G}$ there exists a partition of the vertices of a random shortest path metric generated from $G$ into clusters, each of diameter at most $4\Delta$, such that the expected number of clusters needed is bounded from above by $O(1 + n/\Delta^{1/(1-\varepsilon)})$.
Proof Let $G \in \mathcal{G}$ with $n$ sufficiently large, and let $s_\Delta := \min\{\xi\Delta^{1/(1-\varepsilon)}, c'n\}$. Consider a random shortest path metric generated from $G$. We call a vertex $v$ $\Delta$-dense if $|B_\Delta(v)| \ge s_\Delta$ and $\Delta$-sparse otherwise. Using Lemma 13 we can bound the expected number of $\Delta$-sparse vertices by $c_1 n/\Delta^{1/(1-\varepsilon)}$. We put each $\Delta$-sparse vertex in its own cluster (of size 1), which has diameter $0 \le 4\Delta$. Now, only the $\Delta$-dense vertices remain. We cluster them according to the following process. Consider an auxiliary graph $H$ whose vertices are the $\Delta$-dense vertices and where two vertices $u, v$ are connected by an edge if and only if $B_\Delta(u) \cap B_\Delta(v) \ne \emptyset$. Consider an arbitrary maximal independent set $S$ in $H$, and observe that $|S| \le n/s_\Delta$ by construction of $H$. We create the initial clusters $C_1, \ldots, C_{|S|}$, each of which equals $B_\Delta(v)$ for some vertex $v \in S$. Observe that these initial clusters have diameter at most $2\Delta$.
Next, consider an arbitrary $\Delta$-dense vertex $v$ that is not yet part of any cluster. By the maximality of $S$, we know that there must exist a vertex $u \in S$ such that $A := B_\Delta(u) \cap B_\Delta(v) \ne \emptyset$. Let $x \in A$ be arbitrarily chosen, and observe that $d(v,u) \le d(v,x) + d(x,u) \le \Delta + \Delta = 2\Delta$. We add $v$ to the initial cluster corresponding to $u$, and repeat this step until all $\Delta$-dense vertices have been added to some initial cluster. By construction, the diameter of all these clusters is now at most $4\Delta$: consider two arbitrary vertices $w, y$ in a cluster that initially corresponded to $u \in S$; then we have $d(w,y) \le d(w,u) + d(u,y) \le 2\Delta + 2\Delta = 4\Delta$. So, now we have in expectation at most $c_1 n/\Delta^{1/(1-\varepsilon)}$ clusters containing one ($\Delta$-sparse) vertex each, and at most $n/s_\Delta \le 1/c' + n/(\xi\Delta^{1/(1-\varepsilon)})$ clusters containing at least $s_\Delta$ ($\Delta$-dense) vertices each, all with diameter at most $4\Delta$. The result follows.
This clustering result is useful as long as $\Delta$ is not too large. However, for large values of $\Delta$, in particular $\Delta \ge \Delta_{\max}/4$, a 'partition' always requires only one cluster. Recall that $\Delta_{\max} = \max_{u,v} d(u,v)$ is the diameter of the random metric space.
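The clustering construction from the proof of Theorem 14 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure verbatim: the density threshold is passed in as a plain parameter `s` instead of being computed from $\xi$ and $c'$, the maximal independent set is built greedily in index order, and sparse vertices that happen to lie inside a center's ball stay in their own singleton clusters so that the result is a partition. The function name is hypothetical.

```python
def cluster_vertices(dist, delta, s):
    """Partition vertices of a metric into clusters of diameter at most 4 * delta.

    dist  : symmetric distance matrix satisfying the triangle inequality
    delta : ball radius
    s     : density threshold; v is dense if its delta-ball has >= s vertices
    """
    n = len(dist)
    balls = [{u for u in range(n) if dist[v][u] <= delta} for v in range(n)]
    dense = [v for v in range(n) if len(balls[v]) >= s]
    dense_set = set(dense)

    # Every sparse vertex forms its own cluster (diameter 0).
    clusters = [[v] for v in range(n) if v not in dense_set]

    # Greedy maximal independent set in the auxiliary graph H:
    # the chosen centers have pairwise disjoint delta-balls.
    centers = []
    for v in dense:
        if all(balls[v].isdisjoint(balls[u]) for u in centers):
            centers.append(v)

    # Each dense vertex joins a center whose ball intersects its own ball,
    # so its distance to that center is at most 2 * delta.
    members = {u: [] for u in centers}
    for v in dense:
        for u in centers:
            if not balls[v].isdisjoint(balls[u]):
                members[u].append(v)
                break
    clusters.extend(members[u] for u in centers)
    return clusters


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 10.0, 20.0, 20.4]  # points on a line
    dist = [[abs(a - b) for b in xs] for a in xs]
    print(cluster_vertices(dist, 1.0, 2))
```

Maximality of the greedy center set guarantees that the inner loop always finds a center for every dense vertex, which mirrors the role of the maximal independent set $S$ in the proof.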
For random shortest path metrics generated from complete graphs we know that $\Delta_{\max} \le O(\log(n)/n)$ with high probability [19]. For random shortest path metrics generated from sparse graphs the diameter is significantly larger. Intuitively this follows from the fact that in a sparse graph there are significantly fewer different paths between most pairs of vertices compared to the number of different paths in a complete graph. Hence, it becomes significantly less likely to have a really short path between every pair of vertices.
For random shortest path metrics generated from arbitrary graphs, the best possible general bound is $\Delta_{\max} \le O(n)$ with high probability. Note that for random shortest path metrics generated from a path graph on $n$ vertices, we can easily derive that $E[\Delta_{\max}] = \Theta(n)$ (this follows from Corollary 7). Hence, the bound in the following lemma is tight.
Lemma 15 Let $G = (V,E)$ be an arbitrary connected graph on $n$ vertices and consider a random shortest path metric generated from $G$. For any $x \ge 6n$ we have $P(\Delta_{\max} \ge x) \le n e^{-x/2}$.
Proof Fix an arbitrary $v \in V$ and let $x \ge 6n$. We first show that $P(\tau_n(v) \ge x) \le e^{-x/2}$. Since $G$ is connected, we know that $|\delta(U)| \ge 1$ for all $\emptyset \ne U \subsetneq V$, and hence in particular $\chi_k(v) \ge 1$ for all $k \in [n - 1]$. Using the same approach as in the proof of Lemma 10, we can derive that $\tau_n(v) \precsim \sum_{i=1}^{n-1} \mathrm{Exp}(1)$.
From this, we can see that In order to bound this probability, we once more use Lemma 3(i). For this purpose, set and $\lambda := x/\mu$, and observe that $\lambda \ge 6$ (since $x \ge 6n$). Lemma 3(i) now yields where we used $\lambda - 1 - \ln(\lambda) \ge \lambda/2$ (which holds for all $\lambda \ge 5.36$) for the second inequality. The final result follows from observing that $\Delta_{\max} = \max_v \tau_n(v)$ and applying the appropriate union bound.

Analysis of Greedy-like Heuristics for Matching and TSP
In this section, we show that three greedy-like heuristics (greedy for minimum-distance perfect matching, and nearest neighbor and insertion for TSP) achieve a constant expected approximation ratio on sparse random shortest path metrics generated from sparse graphs with fast growing cut sizes. The three proofs are very alike, and the ideas behind them are built upon ideas by Bringmann et al. [8]: we divide the steps of the greedy-like heuristics into bins, depending on the value which they add to the total distance of our (partial) matching or TSP tour. Using the clustering (Theorem 14) we bound the total contribution of these bins by $O(n)$, and using our observations regarding sums of lightest edge weights (Lemmas 8 and 9) we show that the optimal matching or TSP tour has a value of $\Omega(n)$ with sufficiently high probability.

Greedy Heuristic for Minimum-Distance Perfect Matching
The first problem that we consider is the minimum-distance perfect matching problem. Even though solving the minimum-distance perfect matching problem to optimality is not very difficult (it can be done in $O(n^3)$ time [23]), in practice this is often too slow, especially if the number of vertices is large. Therefore, people often rely on (simple) heuristics to solve this problem in practical situations. The greedy heuristic is arguably the simplest one among these heuristics. It starts with an empty matching and iteratively adds a pair of currently unmatched vertices (an 'edge') to the matching such that the distance between them is minimal. Let GR denote the total distance of the matching computed by the greedy heuristic, and let MM denote the total distance of an optimal matching.
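The greedy heuristic described above can be sketched in Python as follows. The sketch assumes the metric is given as a full distance matrix with an even number of vertices; sorting all pairs once and scanning them in order is equivalent to repeatedly picking the closest pair of unmatched vertices (ties are broken by vertex index here, which is an arbitrary choice). The function name is hypothetical.

```python
def greedy_matching(dist):
    """Greedy heuristic for minimum-distance perfect matching.

    dist is a symmetric distance matrix on an even number of vertices.
    Returns the list of matched pairs and their total distance GR.
    """
    n = len(dist)
    pairs = sorted((dist[u][v], u, v) for u in range(n) for v in range(u + 1, n))
    matched = [False] * n
    matching = []
    total = 0.0
    for d, u, v in pairs:
        # Take the pair only if both endpoints are still unmatched.
        if not matched[u] and not matched[v]:
            matched[u] = matched[v] = True
            matching.append((u, v))
            total += d
    return matching, total


if __name__ == "__main__":
    xs = [0, 2, 3, 5]  # points on a line
    dist = [[abs(a - b) for b in xs] for a in xs]
    print(greedy_matching(dist))
```

On the four points 0, 2, 3, 5 on a line, the heuristic first matches the two middle points (distance 1) and is then forced to match the two outer points (distance 5), for a total of 6, whereas the optimal matching pairs neighbors for a total of 4.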
It is known that the worst-case approximation ratio for this heuristic on metric instances is $O(n^{\log_2(3/2)})$ [25]. Moreover, for random Euclidean instances, the greedy heuristic has an approximation ratio of $O(1)$ with high probability [3]. For instances with independent edge lengths (thus not necessarily metric), the greedy heuristic returns a matching with an expected distance of $\Theta(\ln(n))$ [2] and the optimal matching has a total distance of $\Theta(1)$ with high probability [30], which gives an approximation ratio of $O(\ln(n))$. For random shortest path metrics generated from complete graphs or Erdős-Rényi random graphs the expected approximation ratio of the greedy heuristic is $O(1)$ [8,22]. We show that a similar result holds for random shortest path metrics generated from sparse graphs with fast growing cut sizes.

Theorem 16 For random shortest path metrics generated from sparse graphs with fast growing cut sizes we have E[GR] = O(n).
Proof We put 'edges' that are being added to the greedy matching into bins according to their distance: bin $i$ receives all 'edges' $\{u,v\}$ satisfying $d(u,v) \in (4(i-1), 4i]$. Let $X_i$ denote the number of 'edges' that end up in bin $i$ and set $Y_i := \sum_{k=i}^{\infty} X_k$, i.e., $Y_i$ denotes the number of 'edges' in the greedy matching with distance more than $4(i-1)$. Observe that $Y_1 = n/2$. For $i > 1$, by Theorem 14, we can partition the vertices in an expected number of at most $O(1 + n/(i-1)^{1/(1-\varepsilon)})$ clusters (where the constant hidden by the O-notation does not depend on $i$), each of diameter at most $4(i-1)$. Just before the greedy heuristic adds for the first time an 'edge' of distance more than $4(i-1)$ to the matching, it must be the case that each of these clusters contains at most one unmatched vertex (otherwise the greedy heuristic could have chosen a shorter 'edge' between two vertices in the same cluster). Therefore, we can conclude that $E[Y_i] = O(1 + n/(i-1)^{1/(1-\varepsilon)})$. On the other hand, for values of $i$ such that $4(i-1) \ge 6n$, it follows from Lemma 15 that $E[Y_i] \le (n/2) \cdot P(\Delta_{\max} \ge 4(i-1)) \le n^2 e^{-2(i-1)}$. Now we sum over all bins, bound the length of each 'edge' in bin $i$ by $4i$, and subsequently use Fubini's theorem and the derived bounds on $E[Y_i]$. This yields $E[\mathrm{GR}] = O(n)$, which finishes the proof.
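The summation over bins in this proof can be sketched as follows. This is a hedged reconstruction of how the stated bounds could be combined, not necessarily the exact chain of inequalities the authors display; the split point $\lceil 3n/2 \rceil$ is chosen here so that $4(i-1) \ge 6n$ holds for every index in the last sum.

```latex
\begin{align*}
\mathbb{E}[\mathrm{GR}]
  &\le \sum_{i=1}^{\infty} 4i \cdot \mathbb{E}[X_i]
   = 4 \sum_{i=1}^{\infty} \mathbb{E}[Y_i]
   && \text{(since } \textstyle\sum_{i \ge 1} i X_i = \sum_{i \ge 1} Y_i \text{)} \\
  &\le 4 \Biggl( \frac{n}{2}
     + \sum_{i=2}^{\lceil 3n/2 \rceil}
         O\Bigl( 1 + \frac{n}{(i-1)^{1/(1-\varepsilon)}} \Bigr)
     + \sum_{i > \lceil 3n/2 \rceil} n^2 e^{-2(i-1)} \Biggr) \\
  &= O(n).
\end{align*}
```

The middle sum is $O(n)$ because it has $O(n)$ terms of size $O(1)$ and because $\sum_{j \ge 1} j^{-1/(1-\varepsilon)}$ converges for $\varepsilon \in (0,1)$; the last sum is a geometric series bounded by $O(n^2 e^{-3n}) = O(1)$.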

Theorem 17 For random shortest path metrics generated from sparse graphs with fast growing cut sizes we have $E[\mathrm{GR}/\mathrm{MM}] = O(1)$.
Proof Let $\hat{c} > 0$ be a sufficiently small constant. Then the approximation ratio of the greedy heuristic on random shortest path metrics generated from sparse graphs with fast growing cut sizes can be bounded by since the worst-case approximation ratio of the greedy heuristic on metric instances is known to be $O(n^{\log_2(3/2)})$ [25]. By Theorem 16 the first term is $O(1)$. Combining Lemmas 8 and 9, the second term can be bounded from above by 1) since $\hat{c}$ is sufficiently small.

Nearest Neighbor Heuristic for TSP
One of the most intuitive heuristics for the TSP is the nearest neighbor heuristic. This greedy-like heuristic starts with an arbitrary vertex as its current vertex and iteratively builds a TSP tour by traveling from its current vertex to the closest unvisited vertex and adding the corresponding 'edge' to the tour (and closing the tour by going back to its first vertex after all vertices have been visited). Let NN denote the total distance of the TSP tour computed by the nearest neighbor heuristic, and let TSP denote the total distance of an optimal TSP tour. It is known that the worst-case approximation ratio for this heuristic on metric instances is $O(\ln(n))$ [27]. Moreover, for random Euclidean instances, the nearest neighbor heuristic has an approximation ratio of $O(1)$ with high probability [5]. For instances with independent edge lengths (thus not necessarily metric), the nearest neighbor heuristic returns a TSP tour with an expected length of $H_{n-1} + n/(n-1) = \Theta(\ln(n))$, while the optimal TSP tour has a total length of $\Theta(1)$ with high probability [13], which gives an approximation ratio of $O(\ln(n))$. For random shortest path metrics generated from complete graphs or Erdős-Rényi random graphs the expected approximation ratio of the nearest neighbor heuristic is $O(1)$ as well [8,22]. We show that a similar result holds for random shortest path metrics generated from sparse graphs with fast growing cut sizes.
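The nearest neighbor heuristic described at the start of this subsection can be sketched as follows. The sketch assumes a full distance matrix as input; the starting vertex and the tie-breaking rule (lowest index first) are arbitrary choices made for the example, and the function name is hypothetical.

```python
def nearest_neighbor_tour(dist, start=0):
    """Nearest neighbor heuristic for the TSP on a distance matrix.

    Returns the order in which vertices are visited and the total tour
    length NN, including the closing 'edge' back to the start vertex.
    """
    n = len(dist)
    visited = [False] * n
    visited[start] = True
    tour = [start]
    total = 0.0
    current = start
    for _ in range(n - 1):
        # Travel to the closest vertex that has not been visited yet.
        nxt = min((v for v in range(n) if not visited[v]),
                  key=lambda v: dist[current][v])
        total += dist[current][nxt]
        visited[nxt] = True
        tour.append(nxt)
        current = nxt
    total += dist[current][start]  # close the tour
    return tour, total


if __name__ == "__main__":
    xs = [0, 1, 3, 6]  # points on a line
    dist = [[abs(a - b) for b in xs] for a in xs]
    print(nearest_neighbor_tour(dist))
```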
Theorem 18 For random shortest path metrics generated from sparse graphs with fast growing cut sizes we have $\mathbb{E}[\mathrm{NN}] = O(n)$.
Proof We put 'edges' that are being added to the nearest neighbor TSP tour into bins according to their distance: bin $i$ receives all 'edges' $\{u, v\}$ satisfying $d(u, v) \in (4(i-1), 4i]$. Let $X_i$ and $Y_i$ be defined as in the proof of Theorem 16. Observe that $Y_1 = n$. For $i > 1$, by Theorem 14, we can partition the vertices in an expected number of at most $O(1 + n/(i-1)^{1/(1-\varepsilon)})$ clusters (where the constant hidden by the O-notation does not depend on $i$), each of diameter at most $4(i-1)$. Every time the nearest neighbor heuristic adds an 'edge' of distance more than $4(i-1)$, this must be an 'edge' from a vertex in some cluster $C_k$ to a vertex in another cluster $C_\ell$, and the tour must have already visited all other vertices in $C_k$ (otherwise the nearest neighbor heuristic could have chosen a shorter 'edge' to an unvisited vertex in $C_k$). Since this can happen at most once per cluster, we can conclude that $\mathbb{E}[Y_i] \le O(1 + n/(i-1)^{1/(1-\varepsilon)})$ for $i > 1$. On the other hand, for values of $i$ such that $4(i-1) \ge 6n$, it follows from Lemma 15 that $\mathbb{E}[Y_i] \le n \cdot \mathbb{P}(\Delta_{\max} \ge 4(i-1)) \le n^2 e^{-2(i-1)}$.
Note that (except for $Y_1$) we have derived exactly the same bounds as in the proof of Theorem 16. Using the same calculations as in that proof, it follows now that $\mathbb{E}[\mathrm{NN}] = O(n)$.
Theorem 19 For random shortest path metrics generated from sparse graphs with fast growing cut sizes we have $\mathbb{E}\left[\frac{\mathrm{NN}}{\mathrm{TSP}}\right] = O(1)$.
The proof of this theorem is similar to that of Theorem 17, with the worst-case approximation ratio of the nearest neighbor heuristic on metric instances being $O(\ln(n))$ [27].
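The summation over bins behind Theorem 18 can be sketched as follows. This is a reconstruction under an assumption, not a quotation of the proof of Theorem 16: we take $X_i$ to be the number of 'edges' in bin $i$ and $Y_i = \sum_{k \ge i} X_k$ to be the number of 'edges' in bins $i$ and higher, which is consistent with $Y_1 = n$ above.

```latex
\begin{align*}
\mathrm{NN} &\le \sum_{i \ge 1} 4i \cdot X_i = 4 \sum_{i \ge 1} Y_i
  && \text{(each 'edge' in bin $i$ has distance at most $4i$)} \\
\mathbb{E}[\mathrm{NN}] &\le 4 \Bigl( n
  + \sum_{i=2}^{\lceil 3n/2 \rceil} O\Bigl(1 + \frac{n}{(i-1)^{1/(1-\varepsilon)}}\Bigr)
  + \sum_{i > \lceil 3n/2 \rceil} n^2 e^{-2(i-1)} \Bigr)
  && \text{(Fubini, bounds on $\mathbb{E}[Y_i]$)} \\
&\le 4 \Bigl( n + O(n) + O(n) \sum_{j \ge 1} j^{-1/(1-\varepsilon)} + o(1) \Bigr) = O(n).
\end{align*}
```

The middle sum is $O(n)$ because $1/(1-\varepsilon) > 1$ for $\varepsilon \in (0,1)$, so the series $\sum_{j \ge 1} j^{-1/(1-\varepsilon)}$ converges, and the last sum is at most $n^2 e^{-3n}/(1 - e^{-2}) = o(1)$. It is essential here that the constant hidden by the O-notation does not depend on $i$.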

Insertion Heuristics for TSP
Another group of greedy-like heuristics for the TSP are the insertion heuristics. An insertion heuristic starts with an initial optimal tour on a few vertices that are selected according to some predefined rule $R$, and iteratively chooses (according to the same rule $R$) a vertex that is not in the tour yet and inserts this vertex in the current tour such that the total distance of the tour increases the least. Usually the rule $R$ prescribes that the initial tour is just some tour on three vertices or an edge (i.e., a tour on two vertices) or even a single vertex. Examples of rules used for choosing a vertex to insert in the tour are 'nearest insertion' (choose the vertex that has the shortest distance to a vertex already in the tour), 'farthest insertion' (choose the vertex whose minimal distance to a vertex already in the tour is maximal) and 'cheapest insertion' (choose the vertex whose insertion causes the smallest increase in the length of the tour) [24]. Let $\mathrm{IN}_R$ denote the total distance of the TSP tour computed by the insertion heuristic using rule $R$, and let TSP denote the total distance of an optimal TSP tour.
It is known that the worst-case approximation ratio for this heuristic for any rule $R$ on metric instances is $O(\ln(n))$ [27]. Moreover, for random Euclidean instances, some insertion rules $R$ have an approximation ratio of $\Omega(\ln(n)/\ln\ln(n))$ [4]. For random shortest path metrics generated from complete graphs or Erdős-Rényi random graphs the expected approximation ratio of the insertion heuristic is $O(1)$ for any rule $R$ [8,22]. We show that a similar result holds for random shortest path metrics generated from sparse graphs with fast growing cut sizes.
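The three example rules can be illustrated with a small Python sketch. It assumes a single start vertex as the initial tour, and uses the shortest path metric of a cycle graph with Exp(1) edge weights as a toy instance; the names `cycle_metric`, `insertion_tour` and `tour_length` are hypothetical helpers introduced only for this illustration.

```python
import random


def cycle_metric(n, rng):
    """Shortest path metric of an n-cycle with i.i.d. Exp(1) edge weights (assumed)."""
    w = [rng.expovariate(1.0) for _ in range(n)]
    pos = [sum(w[:v]) for v in range(n)]  # position of vertex v along the cycle
    total = sum(w)
    return [[min(abs(pos[u] - pos[v]), total - abs(pos[u] - pos[v]))
             for v in range(n)] for u in range(n)]


def tour_length(d, tour):
    n = len(tour)
    return sum(d[tour[t]][tour[(t + 1) % n]] for t in range(n))


def insertion_tour(d, rule="nearest", start=0):
    """Insertion heuristic; `rule` selects the next vertex, which is then
    inserted at the position that increases the tour length the least."""
    tour = [start]
    rest = set(range(len(d))) - {start}

    def dist_to_tour(v):
        return min(d[v][u] for u in tour)

    def best_insertion(v):
        # (increase in tour length, insert position) of the cheapest position
        m = len(tour)
        return min(
            (d[tour[t]][v] + d[v][tour[(t + 1) % m]]
             - d[tour[t]][tour[(t + 1) % m]], t + 1)
            for t in range(m)
        )

    while rest:
        if rule == "nearest":
            v = min(rest, key=dist_to_tour)
        elif rule == "farthest":
            v = max(rest, key=dist_to_tour)
        elif rule == "cheapest":
            v = min(rest, key=lambda u: best_insertion(u)[0])
        else:
            raise ValueError(f"unknown rule: {rule}")
        _, position = best_insertion(v)
        tour.insert(position, v)
        rest.remove(v)
    return tour


rng = random.Random(0)
d = cycle_metric(20, rng)
for rule in ("nearest", "farthest", "cheapest"):
    print(rule, tour_length(d, insertion_tour(d, rule)))
```

Only the selection step differs between the rules; the insertion step is shared, which mirrors the fact that the analysis below holds for any rule $R$.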

Theorem 20 For random shortest path metrics generated from sparse graphs with fast growing cut sizes we have $\mathbb{E}[\mathrm{IN}_R] = O(n)$.
Proof We put the steps of the insertion heuristic into bins according to the distance they add to the tour: bin $i$ receives all steps with a contribution in the range $(8(i-1), 8i]$. Let $X_i$ and $Y_i$ be defined as in the proof of Theorem 16. Observe that $Y_1 \le n$. For $i > 1$, by Theorem 14, we can partition the vertices in an expected number of at most $O(1 + n/(i-1)^{1/(1-\varepsilon)})$ clusters (where the constant hidden by the O-notation does not depend on $i$), each of diameter at most $4(i-1)$. Every time the contribution of a step of the insertion heuristic is more than $8(i-1)$, this step must add a vertex to the tour that is part of a cluster $C_k$ of which no other vertex is in the tour yet (otherwise the contribution of this step would have been less than $8(i-1)$). Therefore, we can conclude that $\mathbb{E}[Y_i] \le O(1 + n/(i-1)^{1/(1-\varepsilon)})$ for $i > 1$. On the other hand, for values of $i$ such that $8(i-1) \ge 6n$, it follows from Lemma 15 that […]. Using the same method as in the proof of Theorem 16 (i.e., summing over all bins, bounding the contribution of each step in bin $i$ by $8i$ and using Fubini's theorem and the derived bounds on $\mathbb{E}[Y_i]$), and adding the expected contribution $\mathbb{E}[T_R]$ of the initial tour, we obtain $\mathbb{E}[\mathrm{IN}_R] = O(n)$.

In this lemma and these theorems we consider the TSP tours as being directed and use the following notation. For each $i, j \in V$, let $P_{ij}$ denote the set of all (directed) edges in the shortest $i,j$-path.
Lemma 22 Let $G = (V, E)$ be an arbitrary connected graph and consider a random shortest path metric generated from this graph. Also, let $S$ denote an arbitrary 2-optimal solution for the TSP on this random metric. Moreover, let $x_{ij} := 1$ if this solution $S$ travels directly from vertex $i$ to vertex $j$, and $x_{ij} := 0$ otherwise. Then, for any $i, j, k, l \in V$ with $x_{ij} = x_{kl} = 1$ we have either $P_{ij} \cap P_{kl} = \emptyset$ or $(i, j) = (k, l)$.
Proof Let $i, j, k, l \in V$ such that $x_{ij} = x_{kl} = 1$, and suppose that $(i, j) \ne (k, l)$. Set $A := \{i, j, k, l\}$ and observe that $|A|$ equals either 3 or 4. ($|A| = 2$ would imply $(i, j) = (k, l)$.) We first look at the case where $|A| = 4$. Suppose, by way of contradiction, that $P_{ij} \cap P_{kl} \ne \emptyset$. Take $e = (s, t) \in P_{ij} \cap P_{kl}$. Then $d(i, j) = d(i, s) + w(e) + d(t, j)$ and $d(k, l) = d(k, s) + w(e) + d(t, l)$. Moreover, using the triangle inequality, we can see that $d(i, k) \le d(i, s) + d(s, k)$ and $d(j, l) \le d(j, t) + d(t, l)$. Let $\delta = \delta(i, j, k, l) = d(i, j) + d(k, l) - d(i, k) - d(j, l)$ denote the improvement of the 2-exchange where $\{i, j\}$ and $\{k, l\}$ are replaced by $\{i, k\}$ and $\{j, l\}$. Note that $\delta \le 0$ since $S$ is a 2-optimal solution for the TSP. It follows that
$$0 \ge \delta \ge d(i, s) + w(e) + d(t, j) + d(k, s) + w(e) + d(t, l) - d(i, s) - d(s, k) - d(j, t) - d(t, l) = 2w(e) > 0,$$
which clearly is a contradiction. Therefore, we must have $P_{ij} \cap P_{kl} = \emptyset$ in this case. Now, we look at the case where $|A| = 3$. Since the $x$ variables describe a solution to the TSP, this implies that either $j = k$ or $i = l$. These cases are analogous, so w.l.o.g. we assume that $j = k$. The proof that $P_{ij} \cap P_{kl} = \emptyset$ in this case is similar to the proof for $|A| = 4$, with the exception that here we have $\delta = d(i, j) + d(j, l) - d(i, j) - d(j, l) = 0$ (instead of $\delta \le 0$). The desired result follows.

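Theorem 23 For random shortest path metrics generated from arbitrary (connected) sparse graphs we have $\mathbb{E}[\mathrm{WLO}] = O(n)$.
Proof Let $x_{ij} = 1$ if WLO travels directly from vertex $i$ to vertex $j$, and $x_{ij} = 0$ otherwise. From Lemma 22 we know that each edge $e \in E$ can appear at most twice in the disjoint union of all shortest $i,j$-paths that form a 2-optimal tour (at most once per direction). This yields
$$\mathrm{WLO} = \sum_{i,j \in V} x_{ij}\, d(i, j) = \sum_{i,j \in V} x_{ij} \sum_{e \in P_{ij}} w(e) \le 2 \sum_{e \in E} w(e) = 2S_{|E|},$$
where $S_m$ denotes the sum of the $m$ lightest edge weights in $G$ (as in Sect. 3.2). Combining this with Corollary 7, it follows that
$$\mathbb{E}[\mathrm{WLO}] \le \mathbb{E}[2S_{|E|}] = O\left(\frac{|E|^2}{|E|}\right) = O(n),$$
where the last equality follows by recalling that $|E| = \Theta(n)$ for (connected) sparse graphs.
Theorem 24 For random shortest path metrics generated from arbitrary (connected) sparse graphs we have $\mathbb{E}\left[\frac{\mathrm{WLO}}{\mathrm{TSP}}\right] = O(1)$.
The proof of this theorem is similar to that of Theorem 17, with the worst-case approximation ratio of the 2-opt heuristic on metric instances being $O(\sqrt{n})$ [9].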
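Lemma 22 and the bound used in the proof of Theorem 23 can be checked numerically on a small instance. The sketch below is a sanity check under assumptions, not part of the analysis: the instance (a cycle plus random chords), the Exp(1) weights, the tolerance `tol`, and all function names are choices made for this illustration. It runs 2-exchanges until none improves the tour and then verifies that no directed edge is used by two shortest paths of the tour.

```python
import heapq
import random


def dijkstra(adj, source):
    """Return (dist, pred) for shortest paths from `source`."""
    dist = [float("inf")] * len(adj)
    pred = [None] * len(adj)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if du + w < dist[v]:
                dist[v], pred[v] = du + w, u
                heapq.heappush(heap, (du + w, v))
    return dist, pred


def sparse_instance(n, rng):
    """Cycle plus n // 2 random chords with Exp(1) weights (assumed model)."""
    edges = {(v, (v + 1) % n) for v in range(n)}  # the cycle keeps G connected
    while len(edges) < n + n // 2:
        u, v = rng.sample(range(n), 2)
        if (u, v) not in edges and (v, u) not in edges:
            edges.add((u, v))
    adj = [[] for _ in range(n)]
    total_weight = 0.0
    for u, v in sorted(edges):
        w = rng.expovariate(1.0)
        adj[u].append((v, w))
        adj[v].append((u, w))
        total_weight += w
    return adj, total_weight


def two_opt(tour, d, tol=1e-12):
    """Apply improving 2-exchanges until none remains (a 2-optimal tour)."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for a in range(n - 2):
            for b in range(a + 2, n):
                if a == 0 and b == n - 1:
                    continue  # these two tour edges share a vertex
                i, j = tour[a], tour[a + 1]
                k, l = tour[b], tour[(b + 1) % n]
                if d[i][j] + d[k][l] - d[i][k] - d[j][l] > tol:
                    tour[a + 1:b + 1] = tour[a + 1:b + 1][::-1]
                    improved = True
    return tour


def check(n=12, seed=1):
    rng = random.Random(seed)
    adj, total_weight = sparse_instance(n, rng)
    sp = [dijkstra(adj, s) for s in range(n)]
    d = [dist for dist, _ in sp]
    tour = list(range(n))
    rng.shuffle(tour)
    two_opt(tour, d)
    used, length = [], 0.0
    for a in range(n):
        i, j = tour[a], tour[(a + 1) % n]
        length += d[i][j]
        v = j
        while v != i:  # walk the shortest i,j-path backwards from j
            used.append((sp[i][1][v], v))
            v = sp[i][1][v]
    return len(used) == len(set(used)), length, total_weight


disjoint, length, total_weight = check()
print(disjoint, length <= 2 * total_weight)
```

The check relies on shortest paths being unique, which holds with probability 1 for continuous edge weights. Note that it inspects one 2-optimal tour, whereas WLO refers to the worst one; the bound by twice the total edge weight applies to every 2-optimal tour.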

Concluding Remarks
We have analyzed simple heuristics for matching and TSP on random shortest path metrics generated from sparse graphs, since we believe that these models yield more realistic metric spaces than random shortest path metrics generated from dense or even complete graphs. However, for the greedy-like heuristics we had to restrict ourselves to sparse graphs with fast growing cut sizes (which includes many classes of sparse graphs). We raise the question whether it is possible to extend our findings for these heuristics to arbitrary sparse graphs. On the other hand, especially if we consider random shortest path metrics generated from grid graphs, in our view the model could be improved by using only a (possibly random) subset of the vertices of $G$ for defining the random metric space, i.e., restricting the distance function $d$ of the metric to some sub-domain $V' \times V'$, where $V' \subset V$. It would be interesting to see whether this model could be analyzed as well.
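The restricted model suggested above could be generated as in the following sketch. It is purely illustrative: the grid size, the Exp(1) edge weights, the uniformly random choice of $V'$, and the names `grid_metric` and `restrict` are assumptions made here, not part of the model definition.

```python
import random


def grid_metric(side, rng):
    """Random shortest path metric on a side x side grid (assumed Exp(1) weights)."""
    n = side * side
    inf = float("inf")
    d = [[0.0 if u == v else inf for v in range(n)] for u in range(n)]
    for r in range(side):
        for c in range(side):
            u = r * side + c
            if c + 1 < side:  # edge to the right neighbor
                d[u][u + 1] = d[u + 1][u] = rng.expovariate(1.0)
            if r + 1 < side:  # edge to the neighbor below
                d[u][u + side] = d[u + side][u] = rng.expovariate(1.0)
    for k in range(n):  # Floyd-Warshall
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def restrict(d, subset):
    """Restrict the distance function to subset x subset."""
    return [[d[u][v] for v in subset] for u in subset]


rng = random.Random(0)
d = grid_metric(5, rng)
subset = sorted(rng.sample(range(25), 8))  # V' as a uniformly random subset of V
d_sub = restrict(d, subset)
```

Shortest paths between vertices of $V'$ may still pass through vertices outside $V'$, so the restricted space is in general not the random shortest path metric of the induced subgraph.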
Finally, in our analysis of the 2-opt local search heuristic, we had to decouple the actual heuristic from the initialization in order to make the analysis tractable. We leave it as an open problem to prove rigorous results about hybrid heuristics that consist of an initialization and a local search algorithm.

Declarations
Conflict of interest We have no conflicts of interest to disclose.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Lemma 8 Let $S_m$ denote the sum of the $m$ lightest edge weights in $G$. Then we have $\mathbb{P}(S_m \le cn) \le \exp\left(m\left(2 + \ln\left(\frac{c|E|n}{m^2}\right)\right)\right)$.
Proof First of all, Lemma 6 yields […] apply Corollary 4 with $\mu = m^2/(e|E|)$, $a^* = e|E|/m$, and $x = cn$ to obtain […] $= \left(\frac{e^2 c|E|n}{m^2}\right)^m$.

[…] and recalling that $|E| = \Theta(n)$ by our restrictions imposed on $G$.

[…] The result follows using induction. Now we use this stochastic upper bound on $\tau_k(v)$ that holds for any $v \in V$ to derive some bounds on the cumulative distribution functions of $\tau_k(v)$ and $|B_\Delta(v)|$. The final bound on $|B_\Delta(v)|$ is a crucial ingredient for the construction of clusterings in the next section.

Lemma 11 Let $\mathcal{G}$ be a family of sparse graphs with fast growing cut sizes. Then, for any $G = (V, E) \in \mathcal{G}$, any $\Delta > 0$, any $v \in V$ and any $k \in [n]$ with $k \le \min\{c n, (c(1-\varepsilon)\Delta)^{1/(1-\varepsilon)}\}$ we have […]

[…] $d(i, s) + w(e) + d(t, j) + d(k, s) + w(e) + d(t, l) - d(i, s) - d(s, k) - d(j, t) - d(t, l) = 2w(e) > 0$,

[…] $= 2S_{|E|}$, where $S_m$ denotes the sum of the $m$ lightest edge weights in $G$ (as in Sect. 3.2). Combining this with Corollary 7, it follows that $\mathbb{E}[\mathrm{WLO}] \le \mathbb{E}[2S_{|E|}] = O\left(\frac{|E|^2}{|E|}\right) = O(n)$,