Smoothed Analysis of Partitioning Algorithms for Euclidean Functionals
Abstract
Euclidean optimization problems such as TSP and minimum-length matching admit fast partitioning algorithms that compute near-optimal solutions on typical instances.
In order to explain this performance, we develop a general framework for the application of smoothed analysis to partitioning algorithms for Euclidean optimization problems. Our framework can be used to analyze both the running-time and the approximation ratio of such algorithms. We apply our framework to obtain smoothed analyses of Dyer and Frieze’s partitioning algorithm for Euclidean matching, Karp’s partitioning scheme for the TSP, a heuristic for Steiner trees, and a heuristic for degree-bounded minimum-length spanning trees.
Keywords
Span Tree, Approximation Ratio, Minimum Span Tree, Steiner Tree, Steiner Tree Problem
1 Introduction
Euclidean optimization problems are a natural class of combinatorial optimization problems. In a Euclidean optimization problem, we are given a set X of points in \(\mathbb{R}^{2}\). The topology used is the complete graph of all points, where the Euclidean distance ∥x−y∥ is the length of the edge connecting the two points x,y∈X.
Many such problems, like the Euclidean traveling salesman problem [22] or the Euclidean Steiner tree problem [14], are NP-hard. For others, like minimum-length perfect matching, there exist polynomial-time algorithms. However, these polynomial-time algorithms are sometimes too slow to solve large instances. Thus, fast heuristics to find near-optimal solutions for Euclidean optimization problems are needed.
A generic approach to designing heuristics for Euclidean optimization problems is the use of partitioning algorithms: They divide the Euclidean plane into a number of cells such that each cell contains only a small number of points. This allows us to quickly compute an optimal solution to the optimization problem for the points within each cell. Finally, the solutions of all cells are joined in order to obtain a solution for the whole set of points.
Although this is a rather simple ad-hoc approach, it works surprisingly well and fast in practice [16, 24]. This is in stark contrast to the worst-case performance of partitioning algorithms: They can both be very slow and output solutions that are far from optimal. Thus, as is often the case, worst-case analysis is too pessimistic to explain the performance of partitioning algorithms. The reason is that worst-case analysis is dominated by artificially constructed instances that often do not resemble practical instances.
Both to explain the performance of partitioning algorithms and to gain probabilistic insights into the structure and value of optimal solutions of Euclidean optimization problems, the average-case performance of partitioning algorithms has been studied extensively. In particular, Steele [31] proved complete convergence of Karp’s partitioning algorithm [18] for Euclidean TSP. Strong central limit theorems for a wide range of optimization problems are also known. We refer to Steele [32] and Yukich [35] for comprehensive surveys.
However, average-case analysis also has its drawbacks: Random instances usually have very specific properties with overwhelming probability. This is often exploited in average-case analysis: One shows that the algorithm at hand performs very well if the input has some of these properties. But this does not mean that typical instances share these properties. Thus, although a good average-case performance can be an indicator that an algorithm performs well, it often fails to explain the performance convincingly.
In order to explain the performance of partitioning schemes for Euclidean optimization problems, we provide a smoothed analysis. Smoothed analysis was introduced by Spielman and Teng [27] in order to explain the performance of the simplex method for linear programming. It is a hybrid of worst-case and average-case analysis: An adversary specifies an instance, and this instance is then slightly randomly perturbed. The perturbation can, for instance, model noise from measurement. Since its invention in 2001, smoothed analysis has been applied in a variety of contexts [3, 4, 6, 12, 26]. We refer to two recent surveys [20, 28] for a broader picture.
We develop a general framework for smoothed analysis of partitioning algorithms for optimization problems in the Euclidean plane (Sect. 3). We consider a very general probabilistic model where the adversary specifies n density functions f _{1},…,f _{ n }:[0,1]^{2}→[0,ϕ], one for each point. Then the actual point set is obtained by drawing x _{ i } independently from the others according to f _{ i }. The parameter ϕ controls the adversary’s power: The larger ϕ, the more powerful the adversary. (See Sect. 2.2 for a formal explanation of the model.) We analyze the expected running-time and approximation performance of a generic partitioning algorithm under this model. The smoothed analysis of the running-time for partitioning algorithms depends crucially on the convexity of the worst-case bound of the running-time of the problem under consideration. The main tool for the analysis of the expected approximation ratio is Rhee’s isoperimetric inequality [25]. Let us note that, even in the average case, convergence to the optimal value for large n does not imply a bound on the expected approximation ratio. The reason is that computing a very bad solution with very small probability is compatible with convergence results, but it deteriorates the expected approximation ratio.
Smoothed bounds for some Euclidean optimization problems
Problem  Running-time  Approximation ratio  Reference
matching [10]  \(O(n\phi^{2}\log^{4} n)\)  \(1+O(\sqrt{\phi}/\log n)\)  Corollaries 4.2 & 4.5
TSP [18]  \(\mathop{\mathrm{poly}}(n)\)  \(1 +O(\sqrt{\phi/\log n})\)  Corollary 5.2
Steiner tree [17]  \(\mathop{\mathrm{poly}}(n)\)  \(1 +O(\sqrt{\phi/\log n})\)  Corollary 6.2
degree-bounded MST  \(\mathop{\mathrm{poly}}(n)\)  \(1 +O(\sqrt{\phi\log\log n/\log n})\)  Corollary 7.2
2 Preliminaries
For \(n \in\mathbb{N}\), let [n]={1,2,…,n}. We denote probabilities by \(\mathbb{P}\) and expected values by \(\mathbb{E}\).
2.1 Euclidean Functionals

\(\mathsf{MM}\) maps a point set to the length of its minimum-length perfect matching (length means Euclidean distance; one point is left out if the cardinality of the point set is odd).
\(\mathsf{TSP}\) maps a point set to the length of its shortest Hamiltonian cycle, i.e., to the length of its optimal traveling salesman tour.
\(\mathsf{MST}\) maps a point set to the length of its minimum-length spanning tree.
\(\mathsf{ST}\) maps a point set to the length of its shortest Steiner tree.
\(\mathsf{dbMST}\) maps a point set to the length of its minimum-length spanning tree, restricted to trees of maximum degree at most b for some given bound b.
The Euclidean functionals that we consider in this paper are all associated with an underlying combinatorial optimization problem. Thus, the function value \(\mathsf{F}(X)\) is associated with an optimal solution (minimum-length perfect matching, optimal TSP tour, …) to the underlying combinatorial optimization problem. In this sense, we can design approximation algorithms for \(\mathsf{F}\): Compute a (near-optimal) solution (where it depends on the functional what a solution actually is; for instance, a perfect matching), and compare the objective value (for instance, the sum of the lengths of its edges) to the function value.
Let C _{1},…,C _{ s } be a partition of [0,1]^{2} into rectangles. We call each C _{ ℓ } a cell. Note that the cells are not necessarily of the same size. For a finite set X⊆[0,1]^{2} of n points, let X _{ ℓ }=X∩C _{ ℓ } be the points of X in cell C _{ ℓ }. Let n _{ ℓ }=|X _{ ℓ }| be the number of points of X in cell C _{ ℓ }. Let \(\mathop{\mathrm{diameter}}(C_{\ell})\) be the diameter of cell C _{ ℓ }.
Unfortunately, the Euclidean functionals \(\mathsf{TSP}\), \(\mathsf {MM}\) and \(\mathsf{MST}\) are smooth and subadditive but not superadditive [31, 32, 35]. However, these functionals can be approximated by their corresponding canonical boundary functionals, which are superadditive [13, 35]. We obtain the canonical boundary functional of a Euclidean functional by considering the boundary of the domain as a single point [35]. This means that two points can either be connected directly or via a detour along the boundary. In the latter case, only the lengths of the two edges connecting the two points to the boundary count; walking along the boundary is free of charge. Yukich [35] has shown that this is a sufficient condition for a Euclidean functional to be near-additive.
Proposition 2.1
(Yukich [35, Lemma 5.7])
Let \(\mathsf{F}\) be a subadditive Euclidean functional. Let \(\mathsf{F}_{\mathop{\mathrm{B}}}\) be a superadditive functional that well-approximates \(\mathsf{F}\). (This means that \(|\mathsf{F}(X) - \mathsf{F}_{\mathop{\mathrm{B}}}(X)| = O(1)\) for all finite X⊆[0,1]^{2}.) Then \(\mathsf{F}\) is near-additive.
The functionals \(\mathsf{MM}\), \(\mathsf{TSP}\), \(\mathsf{MST}\), \(\mathsf{ST}\), and \(\mathsf{dbMST}\) are near-additive.
Limit theorems are powerful tools for the analysis of Euclidean functionals. Rhee [25] proved the following limit theorem for smooth Euclidean functionals over [0,1]^{2}. We will mainly use it to bound the probability that \(\mathsf{F}\) assumes a too small function value.
Theorem 2.2
(Rhee [25])
Remark 2.3
Rhee proved Theorem 2.2 for the case that x _{1},…,x _{ n } are identically distributed. However, as pointed out by Rhee herself [25], the proof carries over to the case when x _{1},…,x _{ n } are drawn independently but their distributions are not necessarily identical.
2.2 Smoothed Analysis
In the classical model of smoothed analysis [27], an adversary specifies a point set \(\bar{X}\), and then this point set is perturbed by independent identically distributed random variables in order to obtain the input set X. A different viewpoint is that the adversary specifies the means of the probability distributions according to which the point set is drawn. This model has been generalized as follows [4]: Instead of only specifying the mean, the adversary can specify a density function for each point, and then we draw the points independently according to their density functions. In order to limit the power of the adversary, we have an upper bound ϕ for the densities: The adversary is allowed to specify any density function [0,1]^{2}→[0,ϕ]. If ϕ=1, then this boils down to the uniform distribution on the unit square [0,1]^{2}. If ϕ gets larger, the adversary becomes more powerful and can specify the location of the points more and more precisely. The role of ϕ is the same as the role of 1/σ in classical smoothed analysis, where σ is the standard deviation of the perturbation. We summarize this model formally in the following assumption.
Assumption 2.4
Let ϕ≥1. An adversary specifies n probability density functions f _{1},…,f _{ n }:[0,1]^{2}→[0,ϕ]. We write f=(f _{1},…,f _{ n }) for short. Let x _{1},…,x _{ n }∈[0,1]^{2} be n random vectors where x _{ i } is drawn according to f _{ i }, independently from the other points. Let X={x _{1},…,x _{ n }}.
If the actual density functions f matter and are not clear from the context, we write X∼f to denote that X is drawn as described above. If we have a performance measure P for an algorithm (P will be either running-time or approximation ratio in this paper), then the smoothed performance is \(\max_{f} (\mathbb{E}_{X \sim f} [P(X)])\). Note that the smoothed performance is a function of the number n of points and the parameter ϕ.
Let \(\mathsf{F}\) be a Euclidean functional. For the rest of this paper, let \(\mu_{\mathsf{F}}(n,\phi)\) be a lower bound for the expected value of \(\mathsf{F}\) if X is drawn according to the probabilistic model described above. More precisely, \(\mu_{\mathsf{F}}\) is some function that fulfills \(\mu_{\mathsf{F}}(n, \phi) \leq\min_{f} (\mathbb{E}_{X \sim f} [\mathsf{F}(X)])\). The function \(\mu_{\mathsf{F}}\) comes into play when we have to bound the objective value of an optimal solution, i.e., \(\mathsf{F}(X)\), from below in order to analyze the approximation ratio.
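To make the model of Assumption 2.4 concrete, the following minimal Python sketch draws an instance for one particular adversary. It is an illustration only: the choice of densities (each point uniform on an axis-aligned square of area 1/ϕ around an adversarial center) and the names `sample_point`, `sample_instance`, and `centers` are assumptions made here and do not come from the paper. The uniform density on a square of area 1/ϕ equals ϕ, so it respects the bound on the densities.

```python
import random

def sample_point(center, phi, rng):
    """Draw one point uniformly from a square of area 1/phi near `center`.

    The square is shifted if necessary so that it lies inside [0,1]^2.
    Its uniform density is exactly phi, which respects the density bound.
    Requires phi >= 1.
    """
    side = (1.0 / phi) ** 0.5
    # Clamp the lower-left corner so the square stays inside the unit square.
    x0 = min(max(center[0] - side / 2, 0.0), 1.0 - side)
    y0 = min(max(center[1] - side / 2, 0.0), 1.0 - side)
    return (x0 + side * rng.random(), y0 + side * rng.random())

def sample_instance(centers, phi, seed=0):
    """Draw the points independently, one per adversarial center."""
    rng = random.Random(seed)
    return [sample_point(c, phi, rng) for c in centers]

# Hypothetical adversary: all 200 points are concentrated near one corner.
centers = [(0.05, 0.05)] * 200
X = sample_instance(centers, phi=25.0)
```

With ϕ=1 the square is the whole unit square and the sketch reduces to uniform sampling; larger ϕ lets the adversary pin the points down more precisely.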
3 Framework
In this section, we present our framework for the performance analysis of partitioning heuristics for Euclidean functionals. Let \(\mathsf{A}_{\mathop{\mathrm{opt}}}\) be an optimal algorithm for some smooth and near-additive Euclidean functional \(\mathsf{F}\), and let \(\mathsf{A}_{\mathop{\mathrm{join}}}\) be an algorithm that combines solutions for each cell into a global solution. We assume that \(\mathsf{A}_{\mathop{\mathrm{join}}}\) runs in time linear in the number of cells. Then we obtain the following algorithm, which we call \(\mathsf{A}\).
Algorithm 3.1
(Generic algorithm \(\mathsf{A}\))
1. Divide [0,1]^{2} into s cells C _{1},…,C _{ s }.
2. Compute optimal solutions for each cell using \(\mathsf{A}_{\mathop{\mathrm{opt}}}\).
3. Join the s partial solutions to a solution for X using \(\mathsf{A}_{\mathop{\mathrm{join}}}\).
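The three steps of Algorithm 3.1 can be sketched as a small higher-order function. This is only a schematic illustration: the function name `partition_algorithm`, the rectangle encoding, and the callbacks `solve_cell` and `join` (standing in for \(\mathsf{A}_{\mathop{\mathrm{opt}}}\) and \(\mathsf{A}_{\mathop{\mathrm{join}}}\)) are hypothetical, and the sketch assumes that the given cells cover [0,1]^{2}.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def partition_algorithm(points: Sequence[Point],
                        cells: Sequence[Rect],
                        solve_cell: Callable[[List[Point]], object],
                        join: Callable[[List[object]], object]) -> object:
    """Schematic version of the generic algorithm A.

    Step 1: assign every point to a cell (the cells are given).
    Step 2: solve each cell with `solve_cell` (placeholder for A_opt).
    Step 3: combine the partial solutions with `join` (placeholder for A_join).
    """
    buckets: List[List[Point]] = [[] for _ in cells]
    for p in points:
        for idx, (x0, y0, x1, y1) in enumerate(cells):
            if x0 <= p[0] <= x1 and y0 <= p[1] <= y1:
                buckets[idx].append(p)
                break  # a point on a shared boundary goes to the first cell
    partial = [solve_cell(bucket) for bucket in buckets]
    return join(partial)

# Toy usage: the "solution" of a cell is its number of points.
cells = [(0.0, 0.0, 0.5, 1.0), (0.5, 0.0, 1.0, 1.0)]
points = [(0.1, 0.2), (0.5, 0.5), (0.9, 0.4)]
total = partition_algorithm(points, cells, solve_cell=len, join=sum)
```

In the toy usage every point is counted exactly once, which is the invariant any real choice of `solve_cell` and `join` has to preserve.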
The cells in the first step of Algorithm 3.1 are rectangles. They are not necessarily of the same size (in this paper, only the algorithm for matching divides the unit square into cells of exactly the same size; the other algorithms choose the division into cells depending on the actual point set). We use the following assumptions in our analysis and mention explicitly whenever they are used.
Assumption 3.2
1. ϕ∈O(s). This basically implies that the adversary cannot concentrate all points in a too small number of cells.
2. ϕ∈ω(s log n/n). This provides a lower bound for the probability mass in a “full” cell, where full is defined in Sect. 3.1.
3. \(\phi\in o(\sqrt{n/\log n})\). With this assumption, the tail bound of Theorem 2.2 becomes subpolynomial.
These assumptions are not too restrictive: For the partitioning algorithms we analyze here, we have s=O(n/log^{O(1)} n) (for matching, we could also use smaller s while maintaining polynomial, albeit worse, running-time; for the other problems, we even need s=O(n/log^{O(1)} n)). Ignoring polylogarithmic terms, the first and third assumptions translate roughly to ϕ=O(n) and \(\phi= o(\sqrt{n})\), respectively. The second assumption roughly says ϕ=ω(1). But for ϕ=O(1), we can expect roughly average-case behavior because the adversary has only little influence on the positions of the points.
3.1 Smoothed RunningTime
Many of the schemes that we analyze choose the partition in such a way that we have a worst-case upper bound on the number of points in each cell. Other algorithms, like the one for matching [10], have a fixed partition independent of the input points. In the latter case, the running-time also depends on ϕ.
We call a cell C _{ ℓ } full with respect to f if f(C _{ ℓ })=nϕ/s. We call C _{ ℓ } empty if f(C _{ ℓ })=0. Our bound (1) on the running-time depends only on the values f _{1}(C _{ ℓ }),…,f _{ n }(C _{ ℓ }), but not on where exactly within the cells the probability mass is assumed.
Lemma 3.3
(Inverse Jensen’s inequality)
Let T be any convex, monotonically increasing function that is bounded by a polynomial, and let B be a binomially distributed random variable with parameters \(n \in\mathbb{N}\) and p∈[0,1] with p∈ω(log n/n). Then \(\mathbb{E}[T(B)] = \varTheta(T(\mathbb{E}[B]))\).
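As a numerical illustration of Lemma 3.3 (not a proof), the following sketch compares \(\mathbb{E}[T(B)]\) with \(T(\mathbb{E}[B])\) for the cubic function T(m)=m^{3}, which matches the cubic matching bound used later. The concrete values n=500 and p=0.1 and the helper names are arbitrary choices made for this example.

```python
def binomial_pmf(n, p):
    """Return [P[B = 0], ..., P[B = n]] for B ~ Bin(n, p) with 0 < p < 1.

    Uses the recurrence P[B = k+1] = P[B = k] * (n-k)/(k+1) * p/(1-p),
    which avoids huge binomial coefficients.
    """
    pmf = [(1.0 - p) ** n]
    for k in range(n):
        pmf.append(pmf[-1] * (n - k) / (k + 1) * p / (1.0 - p))
    return pmf

def expected_value(T, n, p):
    """Compute E[T(B)] exactly (up to rounding) from the pmf."""
    return sum(mass * T(k) for k, mass in enumerate(binomial_pmf(n, p)))

def T(m):
    """Example of a convex, increasing, polynomially bounded function."""
    return m ** 3

n, p = 500, 0.1
ratio = expected_value(T, n, p) / T(n * p)  # E[T(B)] / T(E[B])
```

By Jensen's inequality the ratio is at least 1 for convex T; the lemma states that it is also bounded from above by a constant once p is large enough, and in this example the ratio stays close to 1.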
Proof
What remains to be done is to show that the adversary will indeed make as many cells as possible full. This follows essentially from the convexity of the running-time. In the following series of three lemmas, we make the argument rigorous.
The first lemma basically says that we maximize a convex function of a sum of independent 0/1 random variables if we balance the probabilities of the random variables. This is similar to a result by León and Perron [19]. But when we apply Lemma 3.4 in the proof of Lemma 3.5, we have to deal with the additional constraint p _{ i }∈[ε _{ i },1−ε _{ i }]. This makes León and Perron’s result [19] inapplicable.
Lemma 3.4
Let p∈(0,1). Let X _{1},X _{2} be independent 0/1 random variables with \(\mathbb{P}[X_{1} =1] = p-\delta\) and \(\mathbb{P}[X_{2} =1] = p+\delta \). Let X=X _{1}+X _{2}. Let f be any convex function, and let \(g(\delta) = \mathbb{E}[f(X)]\).
Then g is monotonically decreasing in δ for δ>0 and monotonically increasing for δ<0 and has a global maximum at δ=0.
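The claim of Lemma 3.4 can be checked by a direct calculation. The following is a sketch under the assumptions of the lemma, with \(\mathbb{P}[X_{1}=1]=p-\delta\) and \(\mathbb{P}[X_{2}=1]=p+\delta\); it is not necessarily the argument used in the original proof. Since X only takes the values 0, 1, and 2, we have

```latex
\begin{align*}
\mathbb{P}[X=0] &= (1-p+\delta)(1-p-\delta) = (1-p)^{2}-\delta^{2},\\
\mathbb{P}[X=2] &= (p-\delta)(p+\delta) = p^{2}-\delta^{2},\\
\mathbb{P}[X=1] &= 1-\mathbb{P}[X=0]-\mathbb{P}[X=2] = 2p(1-p)+2\delta^{2},\\
g(\delta) &= g(0) - \delta^{2}\,\bigl(f(0)-2f(1)+f(2)\bigr).
\end{align*}
```

Convexity of f gives \(f(1) \leq (f(0)+f(2))/2\), so the bracket is non-negative. Hence g does not increase as |δ| grows, and g(0) is a global maximum.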
Proof
With Lemma 3.4 above, we can show the following lemma: If we maximize a convex function of n 0/1 random variables and this function is symmetric around n/2, then we should make all probabilities as small as possible (or all as large as possible) in order to maximize the function.
Lemma 3.5
Let f be an arbitrary convex function. Let X _{1},X _{2},…,X _{ n } be independent 0/1 random variables with \(\mathbb{P}[X_{i} = 1] = p_{i} \in[\varepsilon_{i}, 1-\varepsilon_{i}]\), and let \(X = \sum _{i=1}^{n} X_{i}\). Let \(g(p_{1},\ldots, p_{n}) = \mathbb{E}[f(X) + f(n-X)]\). Then g has a global maximum at (ε _{1},…,ε _{ n }).
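A small brute-force experiment illustrates Lemma 3.5 for one convex function. It is only a sanity check on a coarse grid, not a proof; the choices n=3, f(x)=x^{2}, a common ε=0.1 for all i, and the helper names are assumptions made for this example.

```python
from itertools import product

def sum_distribution(probs):
    """Distribution of X = X_1 + ... + X_n for independent 0/1 variables."""
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p)      # X_i = 0
            new[k + 1] += mass * p        # X_i = 1
        dist = new
    return dist

def g(f, probs):
    """g(p_1, ..., p_n) = E[f(X) + f(n - X)]."""
    n = len(probs)
    return sum(mass * (f(k) + f(n - k))
               for k, mass in enumerate(sum_distribution(probs)))

def f(x):
    return x ** 2  # example of a convex function

eps = 0.1
grid = [0.1, 0.3, 0.5, 0.7, 0.9]          # grid points inside [eps, 1 - eps]
best = max(g(f, ps) for ps in product(grid, repeat=3))
at_eps = g(f, (eps, eps, eps))
```

On this grid the largest value of g coincides (up to rounding) with its value at (ε,ε,ε); by the symmetry of g the same value is attained at (1−ε,1−ε,1−ε).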
Proof
In the following, let \(X' = \sum_{i=1}^{n-1} X_{i}\). Without loss of generality, we can assume that \(\sum_{i=1}^{n} p_{i} \leq n/2\). Otherwise, we replace p _{ i } by 1−p _{ i }, which does not change the function value of g by symmetry.
Lemma 3.5 above is the main ingredient for the proof that the adversary wants as many full cells as possible. Lemma 3.6 below makes this rigorous.
Lemma 3.6
1. \(\tilde{f}_{i}(C_{\ell'}) = \min (\phi/s, f_{i}(C_{\ell'}) +f_{i}(C_{\ell''}) )\).
2. \(\tilde{f}_{i}(C_{\ell''}) =(f_{i}(C_{\ell'}) + f_{i}(C_{\ell''}) ) - \tilde{f}_{i}(C_{\ell'})\).
Proof
1. \(\tilde{f}_{1},\ldots, \tilde{f}_{n}\) have ⌊s/ϕ⌋ full cells and at most one cell that is neither full nor empty.
2. The expected value of T on X sampled according to \(\tilde{f}_{1},\ldots,\tilde{f}_{n}\) is not smaller than the expected value of T on X sampled according to f _{1},…,f _{ n }.
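The two update rules for \(\tilde{f}_{i}\) in Lemma 3.6 describe how the probability mass of a single point is shifted from one cell to another until the receiving cell reaches the cap ϕ/s. The following sketch implements just this update for one point and two cells; the function name `shift_mass` and the numerical values are hypothetical, and the rest of the lemma is not modeled.

```python
def shift_mass(mass_a, mass_b, phi, s):
    """Move as much mass as possible from cell b to cell a.

    mass_a and mass_b are the probabilities f_i(C_l') and f_i(C_l'') of one
    point. Cell a receives at most phi / s in total; the remainder stays in
    cell b, so the total mass of the point is preserved.
    """
    cap = phi / s
    new_a = min(cap, mass_a + mass_b)
    new_b = (mass_a + mass_b) - new_a
    return new_a, new_b

# With phi = 4 and s = 100 the cap per cell is 0.04.
filled = shift_mass(0.02, 0.03, phi=4, s=100)   # cell a is filled to the cap
emptied = shift_mass(0.01, 0.02, phi=4, s=100)  # cell b is emptied completely
```

After the update, either the receiving cell holds the maximum mass ϕ/s for this point or the other cell holds none of it, which is the mechanism that pushes the densities towards full and empty cells.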
Theorem 3.7
Proof
3.2 Smoothed Approximation Ratio
For estimating the expected approximation ratio \(\mathbb{E}[\mathsf{A}(X)/\mathsf{F}(X)]\) for some algorithm \(\mathsf{A}\), the main challenge is that \(\mathsf{F}(X)\) stands in the denominator. Thus, even if we have a good (deterministic) upper bound for \(\mathsf{A}(X)\) that we can plug into the expected ratio in order to get an upper bound for the ratio that only depends on \(\mathsf{F}(X)\), we are basically left with the problem of estimating \(\mathbb{E}[1/\mathsf{F}(X)]\). Jensen’s inequality yields \(\mathbb{E}[1/\mathsf{F}(X)] \geq1/\mathbb{E}[\mathsf{F}(X)]\). But this does not help, as we need upper bounds for \(\mathbb{E}[1/\mathsf{F}(X)]\). Unfortunately, such upper bounds cannot be derived easily from \(1/\mathbb{E}[\mathsf{F}(X)]\). The problem is that we need strong upper bounds for the probability that \(\mathsf{F}(X)\) is close to 0. Theorem 2.2 is too weak for this. This problem of bounding the expected value of the inverse of the optimal objective value arises frequently in bounding expected approximation ratios [11, 12].
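A toy distribution shows why \(1/\mathbb{E}[\mathsf{F}(X)]\) gives no useful upper bound on \(\mathbb{E}[1/\mathsf{F}(X)]\). The two-point distribution below is invented purely for illustration and has nothing to do with an actual Euclidean functional; the helper name `inverse_moments` is likewise hypothetical.

```python
def inverse_moments(values, probs):
    """Return (1 / E[F], E[1 / F]) for a discrete positive random variable F."""
    mean = sum(v * q for v, q in zip(values, probs))
    mean_of_inverse = sum(q / v for v, q in zip(values, probs))
    return 1.0 / mean, mean_of_inverse

# F is 10 with probability 0.99 and very close to 0 with probability 0.01.
values = [10.0, 0.001]
probs = [0.99, 0.01]
inv_of_mean, mean_of_inv = inverse_moments(values, probs)
```

Here 1/E[F] is roughly 0.1, whereas E[1/F] exceeds 10: a rare event on which F is tiny dominates the expectation of the inverse. This is why tail bounds for small values of \(\mathsf{F}(X)\) are needed.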
There are two ways to attack this problem: The first and easiest way is if \(\mathsf{A}\) comes with a worst-case guarantee α(n) on its approximation ratio for instances of n points. Then we can apply Theorem 2.2 to bound \(\mathsf{F}(X)\) from below. If \(\mathsf{F}(X) \geq\mu _{\mathsf{F}}(n,\phi)/2\), then we can use (2) to obtain a ratio of \(1+O(\frac{J}{\mu_{\mathsf{F}}(n, \phi)})\). Otherwise, we obtain a ratio of α(n). If α(n) is not too large compared to the tail bound obtained from Theorem 2.2, then this contributes only little to the expected approximation ratio. The following theorem formalizes this.
Theorem 3.8
Proof
Now we turn to the case that the worst-case approximation ratio of \(\mathsf{A}\) cannot be bounded by some α(n). In order to be able to bound the expected approximation ratio, we need an upper bound on \(\mathbb{E}[1/\mathsf{F}(X)]\). Note that we do not explicitly provide an upper bound for \(\mathbb{E}[1/\mathsf {F}(X)]\), but only a sufficiently strong tail bound h _{ n } for \(1/\mathsf{F}(X)\).
Theorem 3.9
Proof
4 Matching
As a first example, we apply our framework to the matching functional \(\mathsf{MM}\) defined by the Euclidean minimum-length perfect matching problem. A partitioning algorithm for approximating \(\mathsf{MM}\) was proposed by Dyer and Frieze [10]. For completeness, let us describe their algorithm.
Algorithm 4.1
(\(\mathsf{DF}\); Dyer, Frieze [10])
1. Partition [0,1]^{2} into s=k ^{2} equal-sized subsquares \(C_{1},\ldots,C_{k^{2}}\), each of side length 1/k, where \(k=\frac{\sqrt{n}}{\log n}\).
2. Compute minimum-length perfect matchings for X _{ ℓ } for each ℓ∈[k ^{2}].
3. Compute a matching for the unmatched points from the previous step using the strip heuristic [33].
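The first step of Algorithm 4.1 can be sketched as follows. The sketch only builds the fixed grid and buckets the points; it does not compute any matching. Rounding k down to an integer, using the natural logarithm, and the names `grid_cells` and `cubic_cost` are assumptions of this example. The quantity `cubic_cost`, the sum of n _{ ℓ }^{3} over all cells, is a rough proxy for the work of a cubic-time matching algorithm applied to each cell.

```python
import math
import random

def grid_cells(points):
    """Bucket points of [0,1]^2 into k x k equal-sized subsquares.

    k is sqrt(n) / log(n), rounded down (an assumption of this sketch);
    requires n >= 2 so that log(n) > 0.
    """
    n = len(points)
    k = max(1, int(math.sqrt(n) / math.log(n)))
    cells = {}
    for x, y in points:
        col = min(int(x * k), k - 1)  # points with x == 1.0 go to the last column
        row = min(int(y * k), k - 1)
        cells.setdefault((row, col), []).append((x, y))
    return k, cells

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(10000)]
k, cells = grid_cells(points)
# Proxy for the total cost of cubic-time matching inside the cells.
cubic_cost = sum(len(cell) ** 3 for cell in cells.values())
```

Because the grid is fixed in advance, an adversary with large ϕ can crowd many points into few cells, which is exactly the situation the running-time analysis of Sect. 3.1 has to handle.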
Let \(\mathsf{DF}(X)\) be the cost of the matching computed by the algorithm above on input X={x _{1},…,x _{ n }}, and let \(\mathsf{MM}(X)\) be the cost of a perfect matching of minimum total length. Dyer and Frieze showed that \(\mathsf{DF}(X)\) converges to \(\mathsf{MM}(X)\) with probability 1 if the points in X are drawn according to the uniform distribution on [0,1]^{2} (this corresponds to Assumption 2.4 with ϕ=1). We extend this to the case when X is drawn as described in Assumption 2.4.
4.1 Smoothed RunningTime
A minimum-length perfect matching can be found in time O(n^{3}) [1]. By Theorem 3.7, we get the following corollary.
Corollary 4.2
4.2 Smoothed Approximation Ratio
Lemma 4.3
Proof
Unfortunately, we cannot bound the worst-case approximation ratio of Dyer and Frieze’s partitioning algorithm. Thus, we cannot apply Theorem 3.8 and have to use Theorem 3.9 instead. For this, we first need a tail bound for \(1/\mathsf{MM}(X)\). The bound in the following lemma suffices for our purposes.
Lemma 4.4
Proof
With this tail bound for \(1/\mathsf{MM}(X)\), we can prove the following bound on the smoothed approximation ratio.
Corollary 4.5
Under Assumptions 2.4 and 3.2(3), the expected approximation ratio of \(\mathsf{DF}\) is \(1 + O(\frac{\sqrt{\phi}}{\log n})\).
Proof
Remark 4.6
1. There exist other partitioning schemes for Euclidean matching [2], which can be analyzed in a similar way.
2. Instead of a standard cubic-time algorithm, we can use Varadarajan’s matching algorithm [34] for computing the optimal matchings within each cell. This algorithm has a running-time of O(m^{1.5} log^{5} m) for m points, which improves the running-time bound to \(O(n\sqrt{\phi}\log(n)\log^{5}(\phi\log n))\).
5 Karp’s Partitioning Scheme for Euclidean TSP
Karp’s partitioning scheme [18] is a heuristic for Euclidean TSP that computes near-optimal solutions on average. It proceeds as follows:
Algorithm 5.1
(\(\mathsf{KP}\), Karp’s partitioning scheme)
1. Partition [0,1]^{2} into \(k = \sqrt{n/\log n}\) stripes such that each stripe contains exactly \(n/k = \sqrt{n \log n}\) points.
2. Partition each stripe into k cells such that each cell contains exactly n/k^{2}=log n points.
3. Compute optimal TSP tours for each cell.
4. Join the tours to obtain a TSP tour for X.
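The adaptive partition in the first two steps of Algorithm 5.1 can be sketched by sorting. This is a simplified illustration and not Karp's original implementation: it assumes base-2 logarithms, rounds k to an integer, lets cell sizes differ by at most one point when n is not divisible by k^{2}, and omits the tour computation and the joining step. The names `split_evenly` and `karp_cells` are hypothetical.

```python
import math
import random

def split_evenly(items, parts):
    """Split a list into `parts` consecutive chunks whose sizes differ by at most 1."""
    q, r = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = q + (1 if i < r else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

def karp_cells(points):
    """Group points into k vertical stripes and each stripe into k cells."""
    n = len(points)
    k = max(1, round(math.sqrt(n / math.log2(n))))
    by_x = sorted(points)                       # sort by x-coordinate
    cells = []
    for stripe in split_evenly(by_x, k):
        by_y = sorted(stripe, key=lambda p: p[1])
        cells.extend(split_evenly(by_y, k))     # cut the stripe along y
    return k, cells

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(1024)]
k, cells = karp_cells(points)
sizes = [len(cell) for cell in cells]
```

For n=1024 this gives k=10 and 100 cells with 10 or 11 points each, close to log n=10 points per cell, regardless of how the points are distributed.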
We remark that the choice of k in Karp’s partitioning scheme is optimal in the following sense: On the one hand, more than Θ(log n) points per cell would yield a superpolynomial running-time, as the running-time is exponential in the number of points per cell. On the other hand, fewer than Θ(log n) points per cell would yield a worse approximation ratio, as the approximation ratio gets worse with increasing k.
For a point set X⊆[0,1]^{2}, let \(\mathsf{KP}(X)\) denote the cost of the tour through X computed by Karp’s scheme. Steele [31] has proved complete convergence of \(\mathsf {KP}(X)\) to \(\mathsf{TSP}(X)\) with probability 1, if the points are chosen uniformly and independently. Using our framework developed in Sect. 3, we extend the analysis of \(\mathsf{KP}\) to the case of non-uniform and non-identical distributions.
Since Karp’s scheme chooses the cells adaptively based on the point set X, our framework for the analysis of the running-time cannot be applied. However, the total running-time of the algorithm is \(T(n)=2^{n/k^{2}} \mathop{\mathrm{poly}}(n/k^{2})+ O(k^{2})\), which is, independent of the randomness, polynomial in n for k^{2}=n/log n.
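To see why this bound is polynomial, one can substitute k^{2}=n/log n, so that n/k^{2}=log n. The following worked step assumes that log denotes the logarithm to base 2; for any other fixed base, \(2^{\log n}\) becomes \(n^{c}\) for a constant c, which is still polynomial.

```latex
\[
T(n) = 2^{\log n}\,\mathop{\mathrm{poly}}(\log n) + O\!\left(\frac{n}{\log n}\right)
     = n\cdot\mathop{\mathrm{poly}}(\log n) + O\!\left(\frac{n}{\log n}\right).
\]
```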
The nice thing about the TSP is that every tour has a worst-case approximation guarantee: Consider any two points x,y∈X. Since any tour must visit both x and y, its length is at least 2∥x−y∥ by the triangle inequality. In particular, \(\mathsf{TSP}(X) \geq 2\|x-y\|\), so every edge between two points of X has length at most \(\mathsf{TSP}(X)/2\). Since a tour consists of n edges, any tour has a length of at most \(\frac{n}{2} \cdot\mathsf{TSP}(X)\). Thus, we can use Theorem 3.8 together with α(n)=n/2 and obtain the following result.
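The worst-case guarantee α(n)=n/2 can be checked by brute force on a tiny instance. The sketch below enumerates all tours of seven random points; the instance, the seed, and the helper names are arbitrary choices for illustration, and the enumeration is only feasible for very small n.

```python
import itertools
import math
import random

def tour_length(points, order):
    """Length of the closed tour that visits the points in the given order."""
    n = len(order)
    return sum(math.dist(points[order[i]], points[order[(i + 1) % n]])
               for i in range(n))

def all_tour_lengths(points):
    """Lengths of all tours; point 0 is fixed as the start to avoid rotations."""
    rest = range(1, len(points))
    return [tour_length(points, (0,) + perm)
            for perm in itertools.permutations(rest)]

rng = random.Random(1)
points = [(rng.random(), rng.random()) for _ in range(7)]
lengths = all_tour_lengths(points)
opt, worst = min(lengths), max(lengths)
n = len(points)
```

Even the longest of the 720 enumerated tours satisfies worst ≤ (n/2)·opt, as the argument above guarantees for every point set.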
Corollary 5.2
Under Assumptions 2.4 and 3.2(3), the expected approximation ratio of \(\mathsf{KP}\) is \(\mathbb{E}[ \frac{\mathsf{KP}(X)}{\mathsf{TSP}(X)}]\leq 1 +O(\sqrt{\phi/\log n})\).
Proof
6 Euclidean Steiner Trees
Kalpakis and Sherman [17] proposed a partitioning algorithm for the Euclidean minimum Steiner tree problem analogous to Karp’s partitioning scheme for Euclidean TSP. The solution produced by their algorithm converges to the optimal value with probability 1−o(1). Their algorithm [17] is also known to produce near-optimal solutions in practice [24]. Let us now describe Kalpakis and Sherman’s algorithm [17].
Algorithm 6.1
(\(\mathsf{KS}\), Kalpakis, Sherman [17])
1. Let s=n/log n. Partition [0,1]^{2} into Θ(s) cells such that each cell contains at most n/s=log n points.
2. Solve the Steiner tree problem optimally within each cell.
3. Compute a minimum-length spanning tree to connect the forest thus obtained.
The running-time of this algorithm is polynomial for the choice of s=n/log n [8]. For the same reason as for Karp’s partitioning scheme, we cannot use our framework to estimate the running-time, because the choice of cells depends on the actual point set.
Since minimum spanning trees are \(2/\sqrt{3}\)-approximations for Euclidean Steiner trees [9], we have \(\mathsf{ST}(X) \geq\frac{\sqrt{3}}{2}\cdot\mathsf{MST}(X)\). Furthermore, we have \(\mathsf{MST}(X) \geq\frac{1}{2} \cdot\mathsf{NN}(X)\). Thus, we can choose \(\mu_{\mathsf{ST}}(n,\phi) = \varTheta(\sqrt {n/\phi})\) by Lemma 4.3.
Like \(\mathsf{KP}\) for the traveling salesman problem, \(\mathsf{KS}\) comes with a worst-case approximation ratio of α(n)=O(n). The reason is that, for any two points x,y∈X, we have \(\|x-y\| \leq\mathsf{ST}(X)\). Since Kalpakis and Sherman’s partitioning algorithm [17] outputs at most a linear number of edges, we have \(\mathsf{KS}(X) \leq O (n \cdot\mathsf {ST}(X) )\). This gives us a worst-case approximation ratio of O(n) and yields the following corollary of Theorem 3.8.
Corollary 6.2
Proof
The proof is almost identical to the proof of Corollary 5.2. □
7 DegreeBounded Minimum Spanning Tree
A b-degree-bounded minimum spanning tree of a given set of points in [0,1]^{2} is a minimum-length spanning tree among those in which the degree of every point is bounded by b. For 2≤b≤4, this problem is NP-hard, and it is solvable in polynomial time for b≥5 [23]. Let \(\mathsf{dbMST}\) denote the Euclidean functional that maps a point set to the length of its shortest b-degree-bounded spanning tree.
Proposition 7.1
\(\mathsf{dbMST}\) is a smooth, subadditive and near-additive Euclidean functional.
Proof
The smoothness and subadditivity properties have been proved by Srivastav and Werth [29]. They have also defined a canonical superadditive boundary functional that well-approximates \(\mathsf {dbMST}\) [29, Lemmas 3 and 4]. This, together with Proposition 2.1, proves that \(\mathsf{dbMST}\) is near-additive. □
Again, we have \(\|x-y\| \leq\mathsf{dbMST}(X)\) for all X and x,y∈X, which implies that any possible tree is at most a factor n worse than the optimal tree. This implies in particular that the worst-case approximation ratio of \(\mathsf{P}\mbox{-}\mathsf{bMST}\) is O(n): \(\mathsf{P}\mbox{-}\mathsf{bMST}(X) =O(n \cdot \mathsf{dbMST}(X))\). Furthermore, we can use \(\mu_{\mathsf{dbMST}}(n, \phi) = \varOmega (\sqrt{n/\phi})\) by Lemma 4.3 as \(\mathsf{dbMST}(X) = \varOmega(\mathsf{NN}(X))\).
We can apply Theorem 3.8 to obtain the following result.
Corollary 7.2
Proof
The proof is almost identical to the proof of Corollary 5.2. The only difference is we now have to use \(J=O(\sqrt{n\log\log n/\log n})\), which leads to the slightly worse bound for the approximation ratio. □
Again, we cannot use our framework for the running-time, but the running-time is guaranteed to be bounded by a polynomial.
8 Concluding Remarks
We have provided a smoothed analysis of partitioning algorithms for Euclidean optimization problems. The results can be extended to distributions over \(\mathbb{R}^{2}\) by scaling down the instance so that the inputs lie inside [0,1]^{2}. The analysis can also be extended to higher dimensions. However, the value of ϕ for which our results are applicable will depend on the dimension d.
Even though solutions computed by most of the partitioning algorithms achieve convergence to the corresponding optimal value with probability 1 under uniform samples, in practice they have constant approximation ratios close to 1 [16, 24]. Our results show that the expected function values computed by partitioning algorithms approach optimality not only under uniform, identical distributions, but also under non-uniform, non-identical distributions, provided that the distributions are not sharply concentrated.
One prominent open problem for which our approach does not work is the functional defined by the total edge weight of a minimum-weight triangulation in the Euclidean plane. The main obstacles for this problem are that, first, the functional corresponding to minimum-weight triangulation is not smooth and, second, the value computed by the partitioning heuristic depends on the number of points in the convex hull of the point set [15]. Damerow and Sohler [7] provide a bound for the smoothed number of points in the convex hull. However, their bound is not strong enough for analyzing triangulations.
Notes
Open Access
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
References
 1.Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows: Theory, Algorithms, and Applications. PrenticeHall, Englewood Cliffs (1993) zbMATHGoogle Scholar
 2.Anthes, B., Rüschendorf, L.: On the weighted Euclidean matching problem in \(\mathbb{R}^{d}\). Appl. Math. 28(2), 181–190 (2001) MathSciNetzbMATHGoogle Scholar
 3.Arthur, D., Manthey, B., Röglin, H.: Smoothed analysis of the kmeans method. J. ACM 58(5), 19 (2011) MathSciNetCrossRefGoogle Scholar
 4.Beier, R., Vöcking, B.: Random knapsack in expected polynomial time. J. Comput. Syst. Sci. 69(3), 306–329 (2004) zbMATHCrossRefGoogle Scholar
 5.Bläser, M., Manthey, B., Rao, B.V.R.: Smoothed analysis of partitioning algorithms for Euclidean functionals. In: Dehne, F., Iacono, J., Sack, J.R. (eds.) Proc. of the 12th Algorithms and Data Structures Symposium (WADS). Lecture Notes in Computer Science, vol. 6844, pp. 110–121. Springer, Berlin (2011) CrossRefGoogle Scholar
 6.Damerow, V., Manthey, B., Meyer auf der Heide, F., Räcke, H., Scheideler, C., Sohler, C., Tantau, T.: Smoothed analysis of lefttoright maxima with applications. ACM Trans. Algorithms (to appear) Google Scholar
 7.Damerow, V., Sohler, C.: Extreme points under random noise. In: Albers, S., Radzik, T. (eds.) Proc. of the 12th Ann. European Symp. on Algorithms (ESA). Lecture Notes in Computer Science, vol. 3221, pp. 264–274. Springer, Berlin (2004) Google Scholar
 8. Dreyfus, S.E., Wagner, R.A.: The Steiner problem in graphs. Networks 1(3), 195–207 (1971)
 9. Du, D.-Z., Hwang, F.K.: A proof of the Gilbert-Pollak conjecture on the Steiner ratio. Algorithmica 7(2&3), 121–135 (1992)
 10. Dyer, M.E., Frieze, A.M.: A partitioning algorithm for minimum weighted Euclidean matching. Inf. Process. Lett. 18(2), 59–62 (1984)
 11. Engels, C., Manthey, B.: Average-case approximation ratio of the 2-opt algorithm for the TSP. Oper. Res. Lett. 37(2), 83–84 (2009)
 12. Englert, M., Röglin, H., Vöcking, B.: Worst case and probabilistic analysis of the 2-Opt algorithm for the TSP. In: Proc. of the 18th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 1295–1304. SIAM, Philadelphia (2007)
 13. Frieze, A.M., Yukich, J.E.: Probabilistic analysis of the traveling salesman problem. In: Gutin, G., Punnen, A.P. (eds.) The Traveling Salesman Problem and Its Variations, pp. 257–308. Kluwer Academic, Dordrecht (2002). Chapter 7
 14. Garey, M.R., Graham, R.L., Johnson, D.S.: The complexity of computing Steiner minimal trees. SIAM J. Appl. Math. 32(4), 835–859 (1977)
 15. Golin, M.J.: Limit theorems for minimum-weight triangulations, other Euclidean functionals, and probabilistic recurrence relations. In: Proc. of the 7th Ann. ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 252–260. SIAM, Philadelphia (1996)
 16. Johnson, D.S., McGeoch, L.A.: Experimental analysis of heuristics for the STSP. In: Gutin, G., Punnen, A.P. (eds.) The Traveling Salesman Problem and Its Variations, pp. 369–443. Kluwer Academic, Dordrecht (2002). Chapter 9
 17. Kalpakis, K., Sherman, A.T.: Probabilistic analysis of an enhanced partitioning algorithm for the Steiner tree problem in \(R^{d}\). Networks 24(3), 147–159 (1994)
 18. Karp, R.M.: Probabilistic analysis of partitioning algorithms for the traveling-salesman problem in the plane. Math. Oper. Res. 2(3), 209–224 (1977)
 19. León, C.A., Perron, F.: Extremal properties of sums of Bernoulli random variables. Stat. Probab. Lett. 62(4), 345–354 (2003)
 20. Manthey, B., Röglin, H.: Smoothed analysis: Analysis of algorithms beyond worst case. it–Inf. Technol. 53(6), 280–286 (2011)
 21. Mitzenmacher, M., Upfal, E.: Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, Cambridge (2005)
 22. Papadimitriou, C.H.: The Euclidean traveling salesman problem is NP-complete. Theor. Comput. Sci. 4(3), 237–244 (1977)
 23. Papadimitriou, C.H., Vazirani, U.V.: On two geometric problems related to the traveling salesman problem. J. Algorithms 5(2), 231–246 (1984)
 24. Ravada, S., Sherman, A.T.: Experimental evaluation of a partitioning algorithm for the Steiner tree problem in \(R^{2}\) and \(R^{3}\). Networks 24(8), 409–415 (1994)
 25. Rhee, W.T.: A matching problem and subadditive Euclidean functionals. Ann. Appl. Probab. 3(3), 794–801 (1993)
 26. Röglin, H., Teng, S.-H.: Smoothed analysis of multiobjective optimization. In: Proc. of the 50th Ann. IEEE Symp. on Foundations of Computer Science (FOCS), pp. 681–690. IEEE Press, New York (2009)
 27. Spielman, D.A., Teng, S.-H.: Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. J. ACM 51(3), 385–463 (2004)
 28. Spielman, D.A., Teng, S.-H.: Smoothed analysis: An attempt to explain the behavior of algorithms in practice. Commun. ACM 52(10), 76–84 (2009)
 29. Srivastav, A., Werth, S.: Probabilistic analysis of the degree bounded minimum spanning tree problem. In: Arvind, V., Prasad, S. (eds.) Proc. of the 27th Int. Conf. on Foundations of Software Technology and Theoretical Computer Science (FSTTCS). Lecture Notes in Computer Science, vol. 4855, pp. 497–507. Springer, Berlin (2007)
 30. Steele, J.M.: Complete convergence of short paths in Karp’s algorithm for the TSP. Math. Oper. Res. 6, 374–378 (1981)
 31. Steele, J.M.: Subadditive Euclidean functionals and nonlinear growth in geometric probability. Ann. Probab. 9(3), 365–376 (1981)
 32. Steele, J.M.: Probability Theory and Combinatorial Optimization. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 69. SIAM, Philadelphia (1987)
 33. Supowit, K.J., Reingold, E.M.: Divide and conquer heuristics for minimum weighted Euclidean matching. SIAM J. Comput. 12(1), 118–143 (1983)
 34. Varadarajan, K.R.: A divide-and-conquer algorithm for min-cost perfect matching in the plane. In: Proc. of the 39th Ann. Symp. on Foundations of Computer Science (FOCS), pp. 320–331. IEEE Press, New York (1998)
 35. Yukich, J.E.: Probability Theory of Classical Euclidean Optimization Problems. Lecture Notes in Mathematics, vol. 1675. Springer, Berlin (1998)