Abstract
Many geometric optimization problems reduce to choosing points in space (centers) that minimize an objective function which depends continuously on the distances from the chosen centers to given input points. We prove that, for any fixed \(\varepsilon >0\), every finite set of points in real space of any dimension admits a polynomial-size set of candidate centers, computable in polynomial time, that contains a \((1+\varepsilon )\)-approximation of every point of space with respect to the Euclidean distances to all the given points. This yields a universal approximation-preserving reduction from any geometric center-based problem whose objective function satisfies a natural continuity-type condition to its discrete version, in which the desired centers are selected from a polynomial-size set of candidates. The polynomial upper bound obtained for the size of a universal set of centers is complemented by a theoretical worst-case lower bound on this size.
The study was carried out within the framework of the state contract of the Sobolev Institute of Mathematics (project 0314-2019-0014).
Appendix
Here, we prove Statements 1 and 2, which provide estimates of the functions
where
Statement 1
If \(\varepsilon \in (0,1)\), then \(a(\varepsilon )\le 1\).
Proof
Case 1: \(\varepsilon \in (0,0.4)\). In this case, by using Taylor’s theorem, we obtain that
It follows that \(a(\varepsilon )<1\).
Case 2: \(\varepsilon \in [0.4,1)\). The interval \([0.4,1)\) can be divided into 8 subintervals on each of which both \(\zeta \) and \(\lceil \frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\rceil \) are constant. On each of these subintervals, the inequality \(a(\varepsilon )\le 1\) is verified directly. \(\square \)
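This case analysis is easy to reproduce numerically. The sketch below makes two assumptions that are consistent with the inequalities used later in the appendix but are not stated explicitly here: \(\zeta \) is taken to be the least positive integer with \((1+0.87\varepsilon )^{\zeta }\ge \frac{1}{0.87\varepsilon }\), and \(\log \) is taken base 2.

```python
import math

def zeta(eps):
    # Assumed definition: least positive integer z with
    # (1 + 0.87*eps)**z >= 1/(0.87*eps).
    t = 0.87 * eps
    z, power = 1, 1.0 + t
    while power < 1.0 / t:
        z += 1
        power *= 1.0 + t
    return z

def K(eps):
    # ceil((1/eps) * log2(2/eps)), the exponent appearing in Statement 1.
    return math.ceil((1.0 / eps) * math.log2(2.0 / eps))

# Sweep [0.4, 1) on a fine grid; each maximal run of a constant pair
# (zeta, K) is one of the subintervals from Case 2.
pairs = []
for i in range(60000):
    eps = (40000 + i) / 100000.0
    p = (zeta(eps), K(eps))
    if not pairs or pairs[-1] != p:
        pairs.append(p)

print(len(pairs))                         # 8 subintervals, as in the proof
print(all(z <= k - 1 for z, k in pairs))  # zeta <= ceil(...) - 1 on [0.4, 1)
```

Under these assumptions the sweep finds exactly 8 constant-value subintervals, and on each of them \(\zeta \le \lceil \frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\rceil -1\), the bound used in the proof of the Simplified estimate below.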
The proof of Statement 2 is more involved. For clarity, we first give a short sketch of the proof of the following weaker statement:
Simplified estimate
If \(\varepsilon \in (0,1)\), then \(\displaystyle \ell (\varepsilon )=\big (\,{\mathcal {O}}({\textstyle \frac{1}{\varepsilon }})\log {\textstyle \frac{2}{\varepsilon }}\,\big )^{\lceil \frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\rceil }\).
Proof
By using the inequality \(\ln (1+x)\le x\), we obtain that \(\displaystyle 1/\zeta \le \frac{0.87\varepsilon }{\ln \frac{1}{0.87\varepsilon }}={\mathcal {O}}(\varepsilon )\), so \((0.87\varepsilon )^{1/\zeta }=\Omega (1)\). Next, by Statement 1, we have \(\zeta <\frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\), therefore, the expression \(\displaystyle \frac{\sqrt{\zeta }}{0.26\varepsilon \,(0.87\varepsilon )^{1/\zeta }}+\frac{1}{0.87\varepsilon }\) in the definition of \(\ell (\varepsilon )\) is \(\displaystyle \frac{{\mathcal {O}}(1)}{\varepsilon ^{1.5}}\sqrt{\log {\textstyle \frac{2}{\varepsilon }}}\). On the other hand, the definition of \(\zeta \) gives the inequality \(\displaystyle (1+0.87\varepsilon )^\zeta \ge \frac{1}{0.87\varepsilon }\), which can be written as \((1+0.87\varepsilon )^{-(\zeta -1)/2}\le \sqrt{0.87\varepsilon \,(1+0.87\varepsilon )}\). It follows that
Finally, we recall that \(\zeta \le \lceil \frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\rceil -1\), as shown in Statement 1, so we obtain the estimate \(\ell (\varepsilon )= \big (\,{\mathcal {O}}({\textstyle \frac{1}{\varepsilon }})\log {\textstyle \frac{2}{\varepsilon }}\,\big )^{\lceil \frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\rceil }\). \(\square \)
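For completeness, the two estimates used in this proof can be written out explicitly (assuming, as the displayed inequalities suggest, that \(\zeta \) is the least positive integer with \((1+0.87\varepsilon )^{\zeta }\ge \frac{1}{0.87\varepsilon }\)). From \(\ln (1+x)\le x\),
\[
\zeta \;\ge \;\frac{\ln \frac{1}{0.87\varepsilon }}{\ln (1+0.87\varepsilon )}\;\ge \;\frac{\ln \frac{1}{0.87\varepsilon }}{0.87\varepsilon },
\qquad \text{so}\qquad
\frac{1}{\zeta }\;\le \;\frac{0.87\varepsilon }{\ln \frac{1}{0.87\varepsilon }},
\]
and the defining inequality \((1+0.87\varepsilon )^{\zeta }\ge \frac{1}{0.87\varepsilon }\) transforms as
\[
(1+0.87\varepsilon )^{-\zeta }\le 0.87\varepsilon
\;\Longrightarrow \;
(1+0.87\varepsilon )^{-(\zeta -1)}\le 0.87\varepsilon \,(1+0.87\varepsilon )
\;\Longrightarrow \;
(1+0.87\varepsilon )^{-(\zeta -1)/2}\le \sqrt{0.87\varepsilon \,(1+0.87\varepsilon )}.
\]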
Now, let us prove the required stronger statement.
Statement 2
If \(\varepsilon \in (0,1)\), then \(b(\varepsilon )< 37\).
Proof
Case 1: \(\varepsilon \in (0,2^{-14})\). The inequality \(\ln (1+x)\le x\) implies that \(\displaystyle \zeta \ge \frac{\ln \frac{1}{0.87\varepsilon }}{0.87\varepsilon }\). Hence, for small \(\varepsilon \), the second term in the sum
is much less than the first, so the sum can be bounded above by, say, the value \(\displaystyle \frac{\sqrt{\zeta }}{0.25\varepsilon \,(0.87\varepsilon )^{1/\zeta }}\). Next, by the definition of \(\zeta \), we have \(\displaystyle (1+0.87\varepsilon )^\zeta \ge \frac{1}{0.87\varepsilon }\). Then
But \(\displaystyle \zeta \le \frac{\log \frac{1}{0.87\varepsilon }}{\log (1+0.87\varepsilon )}+1\) and, for small \(\varepsilon \), it is less than \(\frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\), as shown in the proof of Statement 1. So
It remains to note that the obtained expression is less than \(\big (\frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\big )^{\frac{1}{\varepsilon }\log \frac{2}{\varepsilon }}\) since
for \(\varepsilon =2^{-14}\) and, therefore, for any \(\varepsilon <2^{-14}\). Thus, we have \(b(\varepsilon )<1\).
Case 2: \(\varepsilon \in [2^{-14},1)\). First, let us consider the expression
as a function of \(\varepsilon \in [2^{-14},1)\) and \(\displaystyle z\in \{\zeta -1,\,\zeta \}\). For any fixed positive integer \(z\), the value of this function increases as \(\varepsilon \) decreases. Similarly, if we fix \(\varepsilon \) and increase \(z\) from \(\zeta -1\) to \(\zeta \), then \(\ell (\zeta ,\varepsilon )>\ell (\zeta -1,\varepsilon )\) since the terms
increase by factors of at least \(\displaystyle \frac{(1+0.87\varepsilon )^{1-\zeta }}{0.26\varepsilon }>\frac{0.87\varepsilon }{0.26\varepsilon }>1\) and \(\displaystyle \frac{(1+0.87\varepsilon )^{1-\zeta }}{0.87\varepsilon }>\frac{0.87\varepsilon }{0.87\varepsilon }=1\), respectively. Since \(\zeta \) is an integer-valued function of \(\varepsilon \) that increases as \(\varepsilon \) decreases, the above observations imply that the function \(\ell (\varepsilon )=\ell (\zeta ,\varepsilon )\) increases as \(\varepsilon \) decreases.
On the other hand, the function \(L(\varepsilon )=\big (\frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\big )^{\lceil \frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\rceil }\), the denominator in the expression for \(b(\varepsilon )\), also increases as \(\varepsilon \) decreases. Hence, for any positive integer \(J\) and each \(j=1,\dots ,J\), we obtain the inequality \(\displaystyle \max _{\varepsilon \in [\varepsilon _{j-1},\varepsilon _j]}b(\varepsilon )\le \frac{\ell (\varepsilon _{j-1})}{L(\varepsilon _j)}\), where \(\varepsilon _0=2^{-14}\) and \(\varepsilon _j=\varepsilon _0+(1-\varepsilon _0)\,j/J\). It follows that \(\displaystyle \max _{\varepsilon \in [\varepsilon _0,1)}b(\varepsilon )\le \max _{j=1,\dots ,J}\frac{\ell (\varepsilon _{j-1})}{L(\varepsilon _j)}\).
To finish the proof, we choose \(J=2^{15}\) and verify by computer calculation that \(\displaystyle \frac{\ell (\varepsilon _{j-1})}{L(\varepsilon _j)}<37\) for all \(j=1,\dots ,J\). \(\square \)
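Such a computer verification must work with logarithms: for small \(\varepsilon \), the values \(\ell (\varepsilon )\) and \(L(\varepsilon )\) overflow any floating-point type. A minimal sketch for the denominator \(L(\varepsilon )\), which also checks the monotonicity that the bracketing argument relies on (with \(\log \) assumed base 2, as above):

```python
import math

def log2_L(eps):
    # log2 of L(eps) = f(eps) ** ceil(f(eps)), where f(eps) = (1/eps)*log2(2/eps).
    # Working in logarithms avoids overflow: L(2**-14) has millions of bits.
    f = (1.0 / eps) * math.log2(2.0 / eps)
    return math.ceil(f) * math.log2(f)

# Geometric grid over [2**-14, 1): L(eps) should strictly decrease as eps
# grows, i.e., increase as eps decreases, as claimed in the proof.
grid = [2.0 ** (-14.0 + 14.0 * i / 2000.0) for i in range(2000)]
values = [log2_L(eps) for eps in grid]
print(all(a > b for a, b in zip(values, values[1:])))
```

The same logarithmic comparison would be applied to \(\ell (\varepsilon _{j-1})\) (whose full definition is given earlier in the paper), checking \(\log \ell (\varepsilon _{j-1})-\log L(\varepsilon _j)<\log 37\) for each \(j\).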
Shenmaier, V. Polynomial approximate discretization of geometric centers in high-dimensional Euclidean space. Adv Data Anal Classif 16, 1039–1067 (2022). https://doi.org/10.1007/s11634-021-00481-4
Keywords
- Geometric optimization
- Clustering
- Facility location
- Euclidean space
- High dimensions
- Candidate centers
- Discretization