Abstract
Many geometric optimization problems reduce to choosing points in space (centers) that minimize an objective function which depends continuously on the distances from the chosen centers to given input points. We prove that, for any fixed \(\varepsilon >0\), every finite set of points in real space of any dimension admits a polynomial-size set of candidate centers, computable in polynomial time, that contains a \((1+\varepsilon )\)-approximation of every point of space with respect to the Euclidean distances to all the given points. This yields a universal approximation-preserving reduction from any geometric center-based problem whose objective function satisfies a natural continuity-type condition to its discrete version, in which the desired centers are selected from a polynomial-size set of candidates. The polynomial upper bound obtained for the size of a universal set of centers is complemented by a theoretical worst-case lower bound on this size.
The study was carried out within the framework of the state contract of the Sobolev Institute of Mathematics (project 0314-2019-0014).
Appendix
Here, we prove Statements 1 and 2, which provide estimates of the functions
where
Statement 1
If \(\varepsilon \in (0,1)\), then \(a(\varepsilon )\le 1\).
Proof
Case 1: \(\varepsilon \in (0,0.4)\). In this case, by using Taylor’s theorem, we obtain that
It follows that \(a(\varepsilon )<1\).
Case 2: \(\varepsilon \in [0.4,1)\). The interval \([0.4,1)\) can be divided into 8 subintervals on each of which both \(\zeta \) and \(\lceil \frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\rceil \) are constant. On each of these subintervals, the inequality \(a(\varepsilon )\le 1\) is verified directly. \(\square \)
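This case analysis is easy to reproduce numerically. The sketch below makes two assumptions that are consistent with the inequalities used later in the appendix but are not stated explicitly here: \(\zeta \) is taken to be the least positive integer with \((1+0.87\varepsilon )^{\zeta }\ge \frac{1}{0.87\varepsilon }\), and \(\log \) is taken base 2.

```python
import math

def zeta(eps):
    # Assumed definition: least positive integer z with
    # (1 + 0.87*eps)**z >= 1/(0.87*eps).
    t = 0.87 * eps
    z, power = 1, 1.0 + t
    while power < 1.0 / t:
        z += 1
        power *= 1.0 + t
    return z

def K(eps):
    # ceil((1/eps) * log2(2/eps)), the exponent appearing in Statement 1.
    return math.ceil((1.0 / eps) * math.log2(2.0 / eps))

# Sweep [0.4, 1) on a fine grid; each maximal run of a constant pair
# (zeta, K) is one of the subintervals from Case 2.
pairs = []
for i in range(60000):
    eps = (40000 + i) / 100000.0
    p = (zeta(eps), K(eps))
    if not pairs or pairs[-1] != p:
        pairs.append(p)

print(len(pairs))                         # 8 subintervals, as in the proof
print(all(z <= k - 1 for z, k in pairs))  # zeta <= ceil(...) - 1 on [0.4, 1)
```

Under these assumptions the sweep finds exactly 8 constant-value subintervals, and on each of them \(\zeta \le \lceil \frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\rceil -1\), the bound used in the proof of the Simplified estimate below.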
The proof of Statement 2 is more involved. For clarity, we first give a short sketch of the proof of the following weaker statement:
Simplified estimate
If \(\varepsilon \in (0,1)\), then \(\displaystyle \ell (\varepsilon )=\big (\,{\mathcal {O}}({\textstyle \frac{1}{\varepsilon }})\log {\textstyle \frac{2}{\varepsilon }}\,\big )^{\lceil \frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\rceil }\).
Proof
By using the inequality \(\ln (1+x)\le x\), we obtain that \(\displaystyle 1/\zeta \le \frac{0.87\varepsilon }{\ln \frac{1}{0.87\varepsilon }}={\mathcal {O}}(\varepsilon )\), so \((0.87\varepsilon )^{1/\zeta }=\Omega (1)\). Next, by Statement 1, we have \(\zeta <\frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\), therefore, the expression \(\displaystyle \frac{\sqrt{\zeta }}{0.26\varepsilon \,(0.87\varepsilon )^{1/\zeta }}+\frac{1}{0.87\varepsilon }\) in the definition of \(\ell (\varepsilon )\) is \(\displaystyle \frac{{\mathcal {O}}(1)}{\varepsilon ^{1.5}}\sqrt{\log {\textstyle \frac{2}{\varepsilon }}}\). On the other hand, the definition of \(\zeta \) gives the inequality \(\displaystyle (1+0.87\varepsilon )^\zeta \ge \frac{1}{0.87\varepsilon }\), which can be written as \((1+0.87\varepsilon )^{-(\zeta -1)/2}\le \sqrt{0.87\varepsilon \,(1+0.87\varepsilon )}\). It follows that
Finally, we recall that \(\zeta \le \lceil \frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\rceil -1\), as shown in Statement 1, so we obtain the estimate \(\ell (\varepsilon )= \big (\,{\mathcal {O}}({\textstyle \frac{1}{\varepsilon }})\log {\textstyle \frac{2}{\varepsilon }}\,\big )^{\lceil \frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\rceil }\). \(\square \)
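For completeness, the two estimates used in this proof can be written out explicitly (assuming, as the displayed inequalities suggest, that \(\zeta \) is the least positive integer with \((1+0.87\varepsilon )^{\zeta }\ge \frac{1}{0.87\varepsilon }\)). From \(\ln (1+x)\le x\),
\[
\zeta \;\ge \;\frac{\ln \frac{1}{0.87\varepsilon }}{\ln (1+0.87\varepsilon )}\;\ge \;\frac{\ln \frac{1}{0.87\varepsilon }}{0.87\varepsilon },
\qquad \text{so}\qquad
\frac{1}{\zeta }\;\le \;\frac{0.87\varepsilon }{\ln \frac{1}{0.87\varepsilon }},
\]
and the defining inequality \((1+0.87\varepsilon )^{\zeta }\ge \frac{1}{0.87\varepsilon }\) transforms as
\[
(1+0.87\varepsilon )^{-\zeta }\le 0.87\varepsilon
\;\Longrightarrow \;
(1+0.87\varepsilon )^{-(\zeta -1)}\le 0.87\varepsilon \,(1+0.87\varepsilon )
\;\Longrightarrow \;
(1+0.87\varepsilon )^{-(\zeta -1)/2}\le \sqrt{0.87\varepsilon \,(1+0.87\varepsilon )}.
\]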
Now, let us prove the required stronger statement.
Statement 2
If \(\varepsilon \in (0,1)\), then \(b(\varepsilon )< 37\).
Proof
Case 1: \(\varepsilon \in (0,2^{-14})\). The inequality \(\ln (1+x)\le x\) implies that \(\displaystyle \zeta \ge \frac{\ln \frac{1}{0.87\varepsilon }}{0.87\varepsilon }\). Hence, for small \(\varepsilon \), the second term in the sum
is much less than the first, so the sum can be bounded above by, say, the value \(\displaystyle \frac{\sqrt{\zeta }}{0.25\varepsilon \,(0.87\varepsilon )^{1/\zeta }}\). Next, by the definition of \(\zeta \), we have \(\displaystyle (1+0.87\varepsilon )^\zeta \ge \frac{1}{0.87\varepsilon }\). Then
But \(\displaystyle \zeta \le \frac{\log \frac{1}{0.87\varepsilon }}{\log (1+0.87\varepsilon )}+1\) and, for small \(\varepsilon \), it is less than \(\frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\), as shown in the proof of Statement 1. So
It remains to note that the obtained expression is less than \(\big (\frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\big )^{\frac{1}{\varepsilon }\log \frac{2}{\varepsilon }}\) since
for \(\varepsilon =2^{-14}\) and, therefore, for any \(\varepsilon <2^{-14}\). Thus, we have \(b(\varepsilon )<1\).
Case 2: \(\varepsilon \in [2^{-14},1)\). First, let us consider the expression
as a function of \(\varepsilon \in [2^{-14},1)\) and \(\displaystyle z\in \{\zeta -1,\,\zeta \}\). For any fixed positive integer \(z\), the value of this function increases as \(\varepsilon \) decreases. Similarly, if we fix \(\varepsilon \) and increase \(z\) from \(\zeta -1\) to \(\zeta \), then \(\ell (\zeta ,\varepsilon )>\ell (\zeta -1,\varepsilon )\) since the terms
increase by factors of at least \(\displaystyle \frac{(1+0.87\varepsilon )^{1-\zeta }}{0.26\varepsilon }>\frac{0.87\varepsilon }{0.26\varepsilon }>1\) and \(\displaystyle \frac{(1+0.87\varepsilon )^{1-\zeta }}{0.87\varepsilon }>\frac{0.87\varepsilon }{0.87\varepsilon }=1\), respectively. Since \(\zeta \) is an integer-valued function of \(\varepsilon \) that increases as \(\varepsilon \) decreases, the above observations imply that the function \(\ell (\varepsilon )=\ell (\zeta ,\varepsilon )\) increases as \(\varepsilon \) decreases.
On the other hand, the function \(L(\varepsilon )=\big (\frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\big )^{\lceil \frac{1}{\varepsilon }\log \frac{2}{\varepsilon }\rceil }\), the denominator in the expression for \(b(\varepsilon )\), also increases as \(\varepsilon \) decreases. Hence, for any positive integer \(J\) and each \(j=1,\dots ,J\), we obtain the inequality \(\displaystyle \max _{\varepsilon \in [\varepsilon _{j-1},\varepsilon _j]}b(\varepsilon )\le \frac{\ell (\varepsilon _{j-1})}{L(\varepsilon _j)}\), where \(\varepsilon _0=2^{-14}\) and \(\varepsilon _j=\varepsilon _0+(1-\varepsilon _0)\,j/J\). It follows that \(\displaystyle \max _{\varepsilon \in [\varepsilon _0,1)}b(\varepsilon )\le \max _{j=1,\dots ,J}\frac{\ell (\varepsilon _{j-1})}{L(\varepsilon _j)}\).
To finish the proof, we choose \(J=2^{15}\) and verify by computer calculation that \(\displaystyle \frac{\ell (\varepsilon _{j-1})}{L(\varepsilon _j)}<37\) for all \(j=1,\dots ,J\). \(\square \)
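Such a computer verification must work with logarithms: for small \(\varepsilon \), the values \(\ell (\varepsilon )\) and \(L(\varepsilon )\) overflow any floating-point type. A minimal sketch for the denominator \(L(\varepsilon )\), which also checks the monotonicity that the bracketing argument relies on (with \(\log \) assumed base 2, as above):

```python
import math

def log2_L(eps):
    # log2 of L(eps) = f(eps) ** ceil(f(eps)), where f(eps) = (1/eps)*log2(2/eps).
    # Working in logarithms avoids overflow: L(2**-14) has millions of bits.
    f = (1.0 / eps) * math.log2(2.0 / eps)
    return math.ceil(f) * math.log2(f)

# Geometric grid over [2**-14, 1): L(eps) should strictly decrease as eps
# grows, i.e., increase as eps decreases, as claimed in the proof.
grid = [2.0 ** (-14.0 + 14.0 * i / 2000.0) for i in range(2000)]
values = [log2_L(eps) for eps in grid]
print(all(a > b for a, b in zip(values, values[1:])))
```

The same logarithmic comparison would be applied to \(\ell (\varepsilon _{j-1})\) (whose full definition is given earlier in the paper), checking \(\log \ell (\varepsilon _{j-1})-\log L(\varepsilon _j)<\log 37\) for each \(j\).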
Shenmaier, V. Polynomial approximate discretization of geometric centers in high-dimensional Euclidean space. Adv Data Anal Classif 16, 1039–1067 (2022). https://doi.org/10.1007/s11634-021-00481-4
Keywords
- Geometric optimization
- Clustering
- Facility location
- Euclidean space
- High dimensions
- Candidate centers
- Discretization