
Preclustering Algorithms for Imprecise Points

Published in Algorithmica.

Abstract

We study the problem of preclustering a set B of imprecise points in \({\mathbb {R}}^d\): we wish to cluster the regions specifying the potential locations of the points such that, no matter where the points are located within their regions, the resulting clustering approximates the optimal clustering for those locations. We consider k-center, k-median, and k-means clustering, and obtain the following results. Let \(B := \{b_1,\ldots ,b_n\}\) be a collection of disjoint balls in \({\mathbb {R}}^d\), where each ball \(b_i\) specifies the possible locations of an input point \(p_i\). A partition \(\mathcal {C}\) of B into subsets is called an \((f(k),\alpha )\)-preclustering (with respect to the specific k-clustering variant under consideration) if (i) \(\mathcal {C}\) consists of f(k) preclusters, and (ii) for any realization P of the points \(p_i\) inside their respective balls, the cost of the clustering on P induced by \(\mathcal {C}\) is at most \(\alpha \) times the cost of an optimal k-clustering on P. We call f(k) the size of the preclustering and we call \(\alpha \) its approximation ratio. We prove that, even in \({\mathbb {R}}^1\), one may need at least \(3k-3\) preclusters to obtain a bounded approximation ratio—this holds for the k-center, the k-median, and the k-means problem—and we present a \((3k,1)\)-preclustering for the k-center problem in \({\mathbb {R}}^1\). We also present various preclusterings for balls in \({\mathbb {R}}^d\) with \(d\geqslant 2\), including a \((3k,\alpha )\)-preclustering with \(\alpha \approx 13.9\) for the k-center and the k-median problem, and \(\alpha \approx 193.9\) for the k-means problem.


Notes

  1. The subscript \(\infty \) in \({\textsc {Opt}}_{\infty }\) refers to the fact that if \(d_i\) denotes the distance of point \(p_i\in P\) to its nearest center, then we are minimizing the norm of the vector \(\langle d_1,\ldots ,d_n\rangle \) in the \(\ell _{\infty }\)-metric. For k-median and k-means we are minimizing the norm in the \(\ell _1\)-metric and in the squared \(\ell _2\)-metric, respectively.
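In code, the three objectives can be stated side by side. The helper below is our own illustration (the function name and the 1-dimensional setting are ours, chosen for brevity):

```python
def clustering_costs(points, centers):
    """Given points p_1,...,p_n and a set of centers, let d_i be the
    distance from p_i to its nearest center.  The k-center, k-median,
    and k-means costs are the l_inf, l_1, and squared-l_2 norms of the
    vector (d_1,...,d_n), respectively.  (Points in R^1 for brevity.)"""
    d = [min(abs(p - c) for c in centers) for p in points]
    return max(d), sum(d), sum(x * x for x in d)

# four points on a line, centers at 0 and 10: d = (0, 1, 1, 0)
print(clustering_costs([0, 1, 9, 10], [0, 10]))  # (1, 2, 2)
```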



Corresponding author

Correspondence to Mohammad Ali Abam.


Mark de Berg supported by the Netherlands’ Organisation for Scientific Research (NWO) under project networks (project no. 024.002.003).

Appendices

A Generalizations of Lemmas 3 and 4

We present the generalizations of Lemmas 3 and 4 which are used to prove Theorem 7. Note that p can take any value between 1 and n/k.
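For intuition, the generalized algorithm can be sketched as follows. This is our own illustrative reconstruction in \({\mathbb {R}}^1\): the \((p-1)k\) largest balls become singleton preclusters, and the centers of the remaining small balls are grouped by k-center clustering. A greedy Gonzalez-style 2-approximation stands in here for the exact k-center clustering of Step 2; the function names are ours.

```python
def greedy_k_center(points, k):
    """Gonzalez-style greedy k-center on points in R^1; a stand-in for
    the (optimal) k-center clustering used in Step 2 of the algorithm."""
    centers = [points[0]]
    while len(centers) < min(k, len(points)):
        centers.append(max(points, key=lambda p: min(abs(p - c) for c in centers)))
    groups = [[] for _ in centers]
    for p in points:
        groups[min(range(len(centers)), key=lambda j: abs(p - centers[j]))].append(p)
    return groups

def precluster(balls, k, p):
    """balls: list of (center, radius) pairs of disjoint balls in R^1.
    The (p-1)k largest balls become singleton preclusters; the remaining
    ('small') balls are grouped via k-center on their center points."""
    balls = sorted(balls, key=lambda b: -b[1])          # b_1, ..., b_n by radius
    singletons = [[b] for b in balls[:(p - 1) * k]]
    small = balls[(p - 1) * k:]
    by_center = {}
    for b in small:
        by_center.setdefault(b[0], []).append(b)
    groups = greedy_k_center([c for c, _ in small], k)
    return singletons + [[b for c in g for b in by_center[c]] for g in groups]

# 10 disjoint balls on a line, k = 2, p = 2: two singletons + two preclusters
pcs = precluster([(3.0 * i, 0.5 - 0.02 * i) for i in range(10)], k=2, p=2)
print([len(g) for g in pcs])  # [1, 1, 4, 4]
```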

Lemma 13

For any B-instance P the preclustering \(\mathcal {C}:= \{ \{b_1\}, \ldots , \{b_{(p-1)k} \},B_1,\ldots ,B_{k} \}\) computed by the generalized algorithm satisfies:

  1. (i)

    \(\mathcal {C}{{\text {-}}\textsc {Cost}}_{\infty }(P) \leqslant {\textsc {Opt}}_{\infty }(P,k) + 2 \cdot {{\,\mathrm{radius}\,}}(b_{(p-1)k+1})\)

  2. (ii)

    \(\mathcal {C}{{\text {-}}\textsc {Cost}}_1(P) \leqslant {\textsc {Opt}}_1(P,k) + 2 \sum _{j=(p-1)k+1}^n {{\,\mathrm{radius}\,}}(b_j)\)

  3. (iii)

    \(\sqrt{\mathcal {C}{{\text {-}}\textsc {Cost}}_2(P)} \leqslant \sqrt{{\textsc {Opt}}_2(P,k)} + 2\sqrt{\sum _{j=(p-1)k+1}^n {{\,\mathrm{radius}\,}}(b_j)^2}\).

Proof

We first prove part (i) of the lemma. Let P be any B-instance, let \(p_j\in P\) denote the point inside \(b_j\), and let \(c_j\) be the center of \(b_j\). Recall that \(P_i\subset P\) is the subset of points in the instance corresponding to the precluster \(B_i\). Define \(P_{\mathrm {small}}:= \{ p_{(p-1)k+1},\ldots ,p_n\}\) to be the set of points from P in the small balls, and define \(C_{\mathrm {small}}:= \{ c_{(p-1)k+1},\ldots ,c_n\}\). Note that \(P_{\mathrm {small}}= P_1\cup \cdots \cup P_k\) and that

$$\begin{aligned} |p_j c_j| \leqslant {{\,\mathrm{radius}\,}}(b_j) \leqslant {{\,\mathrm{radius}\,}}(b_{(p-1)k+1}) \end{aligned}$$
(7)

for all \(p_j\in P_{\mathrm {small}}\). We define the following sets of centroids:

  • Let \(Q := \{q_1,\ldots ,q_k\}\) be the set of centroids in an optimal k-center solution for the entire point set P. We have

    $$\begin{aligned} \max _{p_j\in P_{\mathrm {small}}} \; \min _{q_i\in Q} |p_j q_i| \leqslant \max _{p_j\in P} \min _{q_i\in Q} |p_j q_i| = {\textsc {Opt}}_{\infty }(P,k). \end{aligned}$$
    (8)
  • Let \(Q' := \{q'_1,\ldots ,q'_k\}\) be the set of centroids in the optimal k-center clustering on \(C_{\mathrm {small}}\) used in Step 2 of the algorithm. Thus

    $$\begin{aligned} \max _{c_i\in C_{\mathrm {small}}} \min _{q'_j \in Q'} |c_i q'_j| = {\textsc {Opt}}_{\infty }(C_{\mathrm {small}},k) \leqslant \max _{c_i\in C_{\mathrm {small}}} \min _{q_j \in Q} |c_i q_j|. \end{aligned}$$
    (9)
  • Let \(Q'' := \{q''_1,\ldots ,q''_k\}\), where \(q''_i\) is the optimal centroid for \(P_i\). Note that for all \(P_i\) we have

    $$\begin{aligned} \max _{p_j\in P_i}|p_j q''_i| \leqslant \max _{p_j\in P_i}|p_j q'_i|. \end{aligned}$$
    (10)

Since the total cost of the singleton preclusters is trivially zero, we have

$$\begin{aligned} \begin{array}{ll} \mathcal {C}{{\text {-}}\textsc {Cost}}_{\infty }(P) &{} \\ = \max _{1\leqslant i\leqslant k} \max _{p_j\in P_i} |p_j q''_i| &{} \\ \leqslant \max _{1\leqslant i\leqslant k} \max _{p_j\in P_i} |p_j q'_i| &{} \hbox { (Inequality}\,(10)) \\ \leqslant \max _{1\leqslant i\leqslant k} \max _{p_j\in P_i}\big ( |p_j c_j| + |c_j q'_i| \big ) &{} \hbox { (triangle inequality)} \\ \leqslant {{\,\mathrm{radius}\,}}(b_{(p-1)k+1}) + \max _{1\leqslant i\leqslant k} \max _{p_j\in P_i} |c_j q'_i| &{} \hbox { (Inequality}\,(7)) \\ \leqslant {{\,\mathrm{radius}\,}}(b_{(p-1)k+1}) + \max _{c_j\in C_{\mathrm {small}}} \min _{q'_i\in Q'} |c_j q'_i| &{} \hbox { (definition of }C_{\mathrm {small}}) \\ \leqslant {{\,\mathrm{radius}\,}}(b_{(p-1)k+1}) + \max _{c_j\in C_{\mathrm {small}}} \min _{q_i\in Q} |c_j q_i| &{} \hbox { (Inequality}\,(9)) \\ \leqslant {{\,\mathrm{radius}\,}}(b_{(p-1)k+1}) + \max _{p_j\in P_{\mathrm {small}}} \min _{q_i\in Q} \big ( |c_j p_j|+ |p_j q_i| \big ) &{} \hbox { (triangle inequality)} \\ \leqslant 2\cdot {{\,\mathrm{radius}\,}}(b_{(p-1)k+1}) + \max _{p_j\in P_{\mathrm {small}}} \min _{q_i\in Q} |p_j q_i| &{} \hbox { (Inequality}\,(7)) \\ \leqslant 2\cdot {{\,\mathrm{radius}\,}}(b_{(p-1)k+1}) + {\textsc {Opt}}_{\infty }(P,k) &{} \hbox { (Inequality}\,(8)) \end{array} \end{aligned}$$
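As a sanity check, the bound of part (i) can be verified numerically for the case \(p=1\) (no singleton preclusters), using exact 1-dimensional k-center by brute force: an optimal 1-d k-center clustering partitions the sorted points into k contiguous runs, and a run's cost is half its spread. This check is our own illustration, not part of the proof:

```python
import itertools, math, random

def kcenter_1d(points, k):
    """Exact k-center in R^1: try all partitions of the sorted points
    into k contiguous runs; a run's cost is half its spread."""
    pts = sorted(points)
    best_cost, best_parts = math.inf, None
    for splits in itertools.combinations(range(1, len(pts)), k - 1):
        bounds = [0] + list(splits) + [len(pts)]
        parts = [pts[a:b] for a, b in zip(bounds, bounds[1:])]
        cost = max((part[-1] - part[0]) / 2 for part in parts)
        if cost < best_cost:
            best_cost, best_parts = cost, parts
    return best_cost, best_parts

random.seed(0)
k = 2
# disjoint balls b_1,...,b_8 on a line, as (center, radius) pairs
balls = [(4.0 * j + random.uniform(-1, 1), random.uniform(0.1, 0.9)) for j in range(8)]
# Step 2 for p = 1: optimal k-center clustering of the ball centers
_, runs = kcenter_1d([c for c, _ in balls], k)
max_radius = max(r for _, r in balls)
for _ in range(200):
    # realization: one point anywhere inside each ball
    pos = {c: c + random.uniform(-r, r) for c, r in balls}
    realized = [pos[c] for c, _ in balls]
    induced = max((max(pos[c] for c in run) - min(pos[c] for c in run)) / 2
                  for run in runs)
    opt, _ = kcenter_1d(realized, k)
    # Lemma 13(i) with p = 1: induced cost <= Opt + 2 * radius(b_1)
    assert induced <= opt + 2 * max_radius + 1e-9
print("Lemma 13(i) bound held on all sampled realizations")
```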

To prove part (ii) of the lemma, which deals with the k-median problem, note that Inequality (7) still holds, while Inequalities (8)–(10) hold if we replace the \(\max \)-operator by a summation. Part (ii) then follows from a derivation similar to that for part (i).

To prove part (iii), which deals with the k-means problem, we need to work with squared distances. Inequality (7) still holds, while Inequalities (8)–(10) hold if we replace the \(\max \)-operator by a summation and all distances by their squared values. The triangle inequality does not hold for squared distances; instead we use the triangle inequality for the \(\ell _2\)-norm, known as the Minkowski inequality. A computation similar to the one above now proves part (iii); we have

$$\begin{aligned} \begin{array}{ll} \sqrt{\mathcal {C}{{\text {-}}\textsc {Cost}}_2(P)} &{}\\ = \sqrt{{\sum }_{i=1}^k {\sum }_{p_j\in P_i} |p_j q''_i|^2}&{}\\ \leqslant \sqrt{{\sum }_{i=1}^k {\sum }_{p_j\in P_i} |p_j q'_i|^2} &{} \hbox { (Inequality}~(10)) \\ \leqslant \sqrt{{\sum }_{i=1}^k {\sum }_{p_j\in P_i}\big ( |p_j c_j| + |c_j q'_i|\big )^2} &{} \hbox { (triangle inequality)} \\ \leqslant \sqrt{{\sum }_{i=1}^k {\sum }_{p_j\in P_i} |p_j c_j|^2} + \sqrt{{\sum }_{i=1}^k {\sum }_{p_j\in P_i}|c_j q'_i|^2} &{} \hbox { (Minkowski inequality)} \\ \leqslant \sqrt{{\sum }_{j=(p-1)k+1}^n {{\,\mathrm{radius}\,}}(b_j)^2} + \sqrt{{\sum }_{i=1}^k {\sum }_{p_j\in P_i} |c_j q'_i|^2} &{} \hbox { (Inequality}\,(7)) \\ \end{array} \end{aligned}$$

On the other hand

$$\begin{aligned} \begin{array}{ll} \sqrt{{\sum }_{i=1}^k {\sum }_{p_j\in P_i} |c_j q'_i|^2} &{} \\ \leqslant \sqrt{{\sum }_{c_j\in C_{\mathrm {small}}} \min _{q'_i\in Q'} |c_j q'_i|^2} &{} \hbox { (definition of }C_{\mathrm {small}}) \\ \leqslant \sqrt{{\sum }_{c_j\in C_{\mathrm {small}}} \min _{q_i\in Q} |c_j q_i|^2} &{} \hbox { (Inequality}\,(9)) \\ \leqslant \sqrt{{\sum }_{c_j\in C_{\mathrm {small}}} \min _{q_i\in Q} \big (|c_j p_j|+|p_jq_i|\big )^2} &{} \hbox { (triangle inequality)} \\ \leqslant \sqrt{{\sum }_{c_j\in C_{\mathrm {small}}} |c_j p_j|^2}+\sqrt{{\sum }_{c_j\in C_{\mathrm {small}}} \min _{q_i\in Q} |p_j q_i|^2} &{} \hbox { (Minkowski inequality)} \\ \leqslant \sqrt{{\sum }_{j=(p-1)k+1}^n {{\,\mathrm{radius}\,}}(b_j)^2} + \sqrt{{\sum }_{c_j\in C_{\mathrm {small}}} \min _{q_i\in Q} |p_j q_i|^2} &{} \hbox { (Inequality}\,(7)) \\ \leqslant \sqrt{{\sum }_{j=(p-1)k+1}^n {{\,\mathrm{radius}\,}}(b_j)^2} + \sqrt{{\textsc {Opt}}_2(P,k)} &{} \hbox { (Inequality}\,(8))\\ \end{array} \end{aligned}$$

Combining the two chains of inequalities yields part (iii) of the lemma. \(\square \)
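The Minkowski inequality \(\Vert a+b\Vert _2\leqslant \Vert a\Vert _2+\Vert b\Vert _2\), used twice in the derivation above, is easy to confirm numerically. This is a standalone sanity check of ours, not part of the proof:

```python
import math, random

random.seed(42)
for _ in range(1000):
    n = random.randint(1, 30)
    a = [random.uniform(0, 10) for _ in range(n)]
    b = [random.uniform(0, 10) for _ in range(n)]
    # sqrt(sum (a_i + b_i)^2) <= sqrt(sum a_i^2) + sqrt(sum b_i^2)
    lhs = math.sqrt(sum((x + y) ** 2 for x, y in zip(a, b)))
    rhs = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    assert lhs <= rhs + 1e-9
print("Minkowski inequality held on all random trials")
```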

Lemma 14

Let \(r^p_d\) be the smallest possible radius of any ball that intersects p disjoint unit balls in \({\mathbb {R}}^d\). Then

  1. (i)

    \({\textsc {Opt}}_{\infty }(P,k) \geqslant r^p_d \cdot {{\,\mathrm{radius}\,}}(b_{(p-1)k+1})\)

  2. (ii)

    \({\textsc {Opt}}_1(P,k) \geqslant r^p_d \cdot \sum _{j=(p-1)k+1}^{n} {{\,\mathrm{radius}\,}}(b_j)\)

  3. (iii)

    \({\textsc {Opt}}_2(P,k) \geqslant (r^p_d)^2 \cdot \sum _{j=(p-1)k+1}^{n} {{\,\mathrm{radius}\,}}(b_j)^2\)

Proof

For part (i), notice that by the pigeonhole principle an optimal clustering must have a cluster containing at least p points from \(\{p_1,\ldots ,p_{(p-1)k+1}\}\). The cost of this cluster is lower bounded by the radius of the smallest ball intersecting p disjoint balls of radius at least \({{\,\mathrm{radius}\,}}(b_{(p-1)k+1})\), which in turn is lower bounded by \(r^p_d \cdot {{\,\mathrm{radius}\,}}(b_{(p-1)k+1})\).

For part (ii), let \(P_1, P_2, \ldots , P_k\) be the clusters in an optimal k-median clustering on P, and let \(q_i\) be the centroid of \(P_i\) in this clustering. Let \(B_i\) be the set of balls corresponding to the points in \(P_i\). We claim that

$$\begin{aligned} \sum _{p_j\in P_i} |p_jq_i| \geqslant r^p_d \cdot \Big ( \Big (\sum _{b_j \in B_i} {{\,\mathrm{radius}\,}}(b_j)\Big ) - s_i \Big ). \end{aligned}$$
(11)

where \(s_i\) is the sum of the radii of the \(p-1\) largest balls in \(B_i\). To show this, let \(b(q_i,r)\) be the ball of radius r centered at \(q_i\), let \(P_i(r) := \{ p_j\in P_i: b_j \cap b(q_i,r)\ne \emptyset \}\) be the set of points in \(P_i\) whose associated ball intersects \(b(q_i,r)\), and let \(B_i(r)\) be the corresponding set of balls. Since for sufficiently large r we have \(P_i = P_i(r)\), it suffices to show that for all \(r>0\) we have

$$\begin{aligned} \sum _{p_j\in P_i(r)} |p_jq_i| \geqslant r^p_d \cdot \Big ( \Big (\sum _{b_j \in B_i(r)} {{\,\mathrm{radius}\,}}(b_j)\Big ) - s_i(r) \Big ). \end{aligned}$$

where \(s_i(r)\) is the sum of the radii of the \(p-1\) largest balls in \(B_i(r)\). To prove this, consider the inequality as r increases from \(r=0\) to \(r=\infty \). As long as \(|B_i(r)|\leqslant p-1\) the right-hand side is zero, so the inequality trivially holds. As we increase r further, \(b(q_i,r)\) starts intersecting more and more balls from \(B_i\). Consider what happens when \(b(q_i,r)\) starts intersecting another ball \(b_\ell \in B_i\). Then \(p_\ell \) is added to \(P_i(r)\), so the left-hand side of the inequality increases by \(|p_\ell q_i|\), which is at least r. The right-hand side increases by at most \(r^p_d\) times the radius of the p-th largest ball in \(B_i(r)\). By definition of \(r^p_d\), if p disjoint balls intersect a ball of radius r, then the smallest of them has radius at most \(r/r^p_d\). Hence, the right-hand side increases by at most r, and the inequality remains true.

Recall that \(b_1,\ldots ,b_{(p-1)k}\) are the \((p-1)k\) largest balls in B. Hence, summing Inequality (11) over all clusters \(P_1,\ldots ,P_k\) gives

$$\begin{aligned} \begin{array}{ll} {\textsc {Opt}}_1(P,k) &{} = {\sum }_{i=1}^k {\sum }_{p_j\in P_i} |p_jq_i| \\ &{} \geqslant r^p_d \cdot \Big ( {\sum }_{i=1}^{k}{\sum }_{b_j \in B_i} {{\,\mathrm{radius}\,}}(b_j) - {\sum }_{j=1}^{(p-1)k}{{\,\mathrm{radius}\,}}(b_j) \Big ) \\ &{} = r^p_d \cdot {\sum }_{j=(p-1)k+1}^n {{\,\mathrm{radius}\,}}(b_j).\\ \end{array} \end{aligned}$$

For part (iii), the same proof as for part (ii) works if we replace all distances by their squared values (and, correspondingly, \(r^p_d\) by \((r^p_d)^2\)). \(\square \)

B Proof of Theorem 9 for the k-Means Problem

For the k-means problem, observe that it suffices to prove the lower bound for \(k=1\): for larger k we can simply copy the construction k times and place the copies sufficiently far from each other. Now for \(k=1\), let \(n=\frac{2^{2t+1}-2}{3}\) and, for every \(1\leqslant i \leqslant t\), let \(B_i\) be a set of \(2^{2t-2i}\) tiny intervals, all very close to the point \(2^{i-1}\) in \({\mathbb {R}}^1\). Similarly, let \(B_{-i}\) be a set of \(2^{2t-2i}\) tiny intervals, all very close to the point \(-2^{i-1}\). Let B be the union of \(B_{-t},\ldots ,B_{-2},B_{-1},B_1,B_2,\ldots ,B_t\). It is not difficult to see that \({\textsc {Opt}}_2(B,1)=2t\cdot 2^{2t-2}\) (considering the origin as the center).

Now suppose we precluster B into f(1) preclusters, with centers \(q_1,q_2,\ldots ,q_{f(1)}\). For \(1 \leqslant i \leqslant t\), let \(I_i\) be the open interval of length \(2^{i-2}\) whose midpoint is \(2^{i-1}\), and let \(I_{-i}\) be the open interval of length \(2^{i-2}\) whose midpoint is \(-2^{i-1}\). Note that the intervals \(I_{-t},\ldots ,I_{-2},I_{-1},I_1,I_2,\ldots ,I_t\) are disjoint. Thus, at least \(2t-f(1)\) of these intervals contain no center. But when \(I_i\) (similarly \(I_{-i}\)) contains no center, every point in \(B_i\) (similarly \(B_{-i}\)) is at distance at least \(2^{i-3}\) from its nearest center, so the total cost for \(B_i\) is at least \(2^{2t-2i}\cdot (2^{i-3})^2=2^{2t-6}\). Therefore, the total cost of any preclustering of B into f(1) preclusters is at least \((2t-f(1))\cdot 2^{2t-6}\). For an \((f(1),\varepsilon )\)-preclustering we should have

$$\begin{aligned} (2t-f(1))\cdot 2^{2t-6} \leqslant \varepsilon \cdot 2t\cdot 2^{2t-2} \end{aligned}$$

which yields \(f(1)\geqslant (2-32\varepsilon )t\), and hence \(f(1)=\varOmega (t)\) for any constant \(\varepsilon < 1/16\).

This example easily generalizes to \({\mathbb {R}}^d\): take balls whose diameters are the intervals in B introduced above.
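The arithmetic behind this construction can be checked directly, idealizing each tiny interval as a point mass (a standalone verification of ours, not part of the proof):

```python
t = 5

# B_i and B_{-i}: 2^(2t-2i) point masses at +/- 2^(i-1), for i = 1..t
groups = []
for i in range(1, t + 1):
    groups += [(2 ** (i - 1), 2 ** (2 * t - 2 * i)),
               (-(2 ** (i - 1)), 2 ** (2 * t - 2 * i))]

# total number of intervals: n = (2^(2t+1) - 2) / 3
assert sum(m for _, m in groups) == (2 ** (2 * t + 1) - 2) // 3

# cost with a single center at the origin: 2t * 2^(2t-2)
assert sum(m * pos ** 2 for pos, m in groups) == 2 * t * 2 ** (2 * t - 2)

# if I_i contains no center, each of the 2^(2t-2i) points of B_i pays
# at least (2^(i-3))^2, for a total of 2^(2t-6), independent of i
for i in range(1, t + 1):
    assert 2 ** (2 * t - 2 * i) * (2 ** (i - 3)) ** 2 == 2 ** (2 * t - 6)
print("construction arithmetic verified for t =", t)
```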


About this article


Cite this article

Abam, M.A., de Berg, M., Farahzad, S. et al. Preclustering Algorithms for Imprecise Points. Algorithmica 84, 1467–1489 (2022). https://doi.org/10.1007/s00453-022-00929-9
