
A technique for obtaining true approximations for k-center with covering constraints

Abstract

There has been a recent surge of interest in incorporating fairness aspects into classical clustering problems. Two recently introduced variants of the k-Center problem in this spirit are Colorful k-Center, introduced by Bandyapadhyay, Inamdar, Pai, and Varadarajan, and lottery models, such as the Fair Robust k-Center problem introduced by Harris, Pensyl, Srinivasan, and Trinh. To address fairness aspects, these models, compared to traditional k-Center, include additional covering constraints. Prior approximation results for these models require relaxing some of the normally hard constraints, like the number of centers to be opened or the involved covering constraints, and therefore only obtain constant-factor pseudo-approximations. In this paper, we introduce a new approach to deal with such covering constraints that leads to (true) approximations, including a 4-approximation for Colorful k-Center with constantly many colors (settling an open question raised by Bandyapadhyay, Inamdar, Pai, and Varadarajan) and a 4-approximation for Fair Robust k-Center, for which the existence of a (true) constant-factor approximation was also open. We complement our results by showing that if one allows an unbounded number of colors, then Colorful k-Center admits no approximation algorithm with finite approximation guarantee, assuming that \(\mathtt {P}\ne \mathtt {NP}\). Moreover, under the Exponential Time Hypothesis, the problem is inapproximable if the number of colors grows faster than logarithmic in the size of the ground set.

Introduction

Along with k-Median and k-Means, k-Center is one of the most fundamental and heavily studied clustering problems. In k-Center, we are given a finite metric space \((X,d)\) and an integer \(k\in [|X|]:=\{1,\dots , |X|\}\), and the task is to find a set \(C\subseteq X\) with \(|C|\le k\) minimizing the maximum distance of any point in X to its closest point in C. Equivalently, the problem can be phrased as covering X with k balls of radius as small as possible, i.e., finding the smallest radius \(r\in \mathbb {R}_{\ge 0}\) together with a set \(C\subseteq X\) with \(|C|\le k\) such that \(X = B(C,r) :=\bigcup _{c\in C}B(c,r)\), where \(B(c,r):=\{u\in X: d(c,u)\le r\}\) is the ball of radius r around c.
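To make the ball notation concrete, the following minimal sketch computes \(B(c,r)\), \(B(C,r)\), and the covering radius of a given center set. It assumes (purely for illustration) that the metric is given as a symmetric distance matrix `d` indexed by point ids `0..n-1`; the function names are hypothetical and not taken from the paper.

```python
def ball(d, c, r):
    """B(c, r): ids of all points within distance r of center c."""
    return {u for u in range(len(d)) if d[c][u] <= r}


def cover(d, centers, r):
    """B(C, r): union of the radius-r balls around all centers in C."""
    covered = set()
    for c in centers:
        covered |= ball(d, c, r)
    return covered


def covering_radius(d, centers):
    """Smallest r such that X = B(C, r) for the given non-empty center set C."""
    return max(min(d[c][u] for c in centers) for u in range(len(d)))


# Toy instance (an assumption for illustration): four points on the real line.
pts = [0, 1, 5, 6]
d = [[abs(a - b) for b in pts] for a in pts]
radius = covering_radius(d, [0, 2])  # centers at coordinates 0 and 5
```

The k-Center objective is then the minimum of `covering_radius(d, C)` over all center sets `C` with at most k elements.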

k-Center, like most clustering problems, is computationally hard; actually it is \(\mathtt {NP}\)-hard to approximate to within any constant below 2 [21]. On the positive side, various 2-approximations [15, 19] have been found, and thus, its approximability is settled. Many variations of k-Center have been studied, most of which are based on generalizations along one of the following two main axes:

  (i) which sets of centers can be selected, and

  (ii) which sets of points of X need to be covered.

The most prominent variations along (i) are variations where the set of centers is required to be in some down-closed family \(\mathcal {F}\subseteq 2^X\). For example, if centers have non-negative opening costs and there is a global budget for opening centers, Knapsack Center is obtained. If \(\mathcal {F}\) is the set of independent sets of a matroid, the problem is known as Matroid Center. The best-known problem type linked to (ii) is Robust k-Center. Here, an integer \(m\in [|X|]\) is given, and one only needs to cover any m points of X with k balls of radius as small as possible. Research on k-Center variants along one or both of these axes has been very active and fruitful, see, e.g., [8, 10, 11, 20]. In particular, recent work of Chakrabarty and Negahbani [9] presents an elegant and unifying framework for designing best possible approximation algorithms for all above-mentioned variants.

All the above variants have in common that there is a single covering requirement; either all of X needs to be covered or a subset of it. Moreover, they come with different kinds of packing constraints on the centers to be opened as in Knapsack or Matroid Center. However, the desire to address fairness in clustering, which has received significant attention recently, naturally leads to multiple covering constraints. Here, existing techniques only lead to constant-factor pseudo-approximations that violate at least one constraint, like the number of centers to be opened. In this work, we present techniques for obtaining (true) approximations for two recent fairness-inspired generalizations of k-Center along axis (ii), namely

  (i) \(\gamma \)-Colorful k-Center, as introduced by Bandyapadhyay et al. [3], and

  (ii) Fair Robust k-Center, a lottery model introduced by Harris et al. [18].

\(\gamma \)-Colorful k-Center (\(\upgamma \mathrm {C k C}\)) is a fairness-inspired k-Center model imposing covering constraints on subgroups. It is formally defined as follows.

Definition 1

(\(\upgamma \)-Colorful k-Center (\(\upgamma \mathrm {C k C}\)) [3]) Let \(\gamma ,k\in \mathbb {Z}_{\ge 1}\), \((X,d)\) be a finite metric space, \(X_\ell \subseteq X\) and \(m_\ell \in \mathbb {Z}_{\ge 0}\) for \(\ell \in [\gamma ]\). The \(\gamma \)-Colorful k-Center problem (\(\upgamma \mathrm {C k C}\)) asks to find the smallest radius \(r\in \mathbb {R}_{\ge 0}\) together with centers \(C\subseteq X\), \(|C|\le k\), such that

$$\begin{aligned} |B(C,r)\cap X_\ell | \ge m_\ell \quad \forall \ell \in [\gamma ]. \end{aligned}$$

Such a set of centers C is called a \(\upgamma \mathrm {C k C}\) solution of radius r.

We clarify that, unless explicitly stated otherwise, the number \(\gamma \) in the above definition is assumed to be part of the input.

The choice of name for the problem stems from interpreting each set \(X_\ell \) for \(\ell \in [\gamma ]\) as a color assigned to the elements of \(X_{\ell }\). In particular, an element can have multiple colors or no color. In words, the task is to open k balls of smallest possible radius such that, for each color \(\ell \in [\gamma ]\), at least \(m_\ell \) points of color \(\ell \) are covered. Hence, for \(\gamma =1\), we recover the Robust k-Center problem.
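As a small illustration of Definition 1, the sketch below checks whether a given center set is a \(\upgamma \mathrm {C k C}\) solution of radius r. The input encoding (distance matrix `d`, a list `colors` of point-id sets \(X_\ell \), a list `m` of requirements \(m_\ell \)) and the function name are assumptions made for this example only.

```python
def is_colorful_solution(d, centers, r, k, colors, m):
    """Return True iff `centers` has at most k elements and, for every color l,
    the balls of radius r around the centers cover at least m[l] points of colors[l]."""
    if len(set(centers)) > k:
        return False
    covered = {
        u for u in range(len(d))
        if any(d[c][u] <= r for c in centers)
    }
    return all(len(covered & X_l) >= m_l for X_l, m_l in zip(colors, m))


# Toy instance (assumed): four points on the real line, two colors.
pts = [0, 1, 5, 6]
d = [[abs(a - b) for b in pts] for a in pts]
colors = [{0, 1}, {2, 3}]   # X_1 and X_2 as sets of point ids
m = [1, 1]                  # one point of each color must be covered
```

With a single color containing all of X and \(m_1 = m\), the same check describes feasibility for Robust k-Center.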

We briefly contrast \(\upgamma \mathrm {C k C}\) with related fairness models. A related class of models that has received significant attention also assumes that the ground set is colored, but requires that the ratio between colors within each cluster is approximately the same as the global ratio between colors. Such variants have been considered for k-Median, k-Means, and k-Center, e.g., see [2, 4, 5, 12, 28] and references therein. \(\upgamma \mathrm {C k C}\) differentiates itself from the above notion of fairness by not requiring a per-cluster guarantee, but a global fairness guarantee. More precisely, each color can be thought of as representing a certain group of people (demographic), and a global covering requirement is given per demographic. Also notice the difference with the well-known Robust k-Center problem, where a feasible solution might, potentially, completely ignore a certain subgroup, resulting in a heavily unfair treatment. \(\upgamma \mathrm {C k C}\) addresses this issue.

The presence of multiple covering constraints in \(\upgamma \mathrm {C k C}\), imposed by the colors, hinders the use of classical k-Center clustering techniques, which, as mentioned above, have mostly been developed for packing constraints on the centers to be opened. An elegant first step was done by Bandyapadhyay et al. [3]. They exploit sparsity of a well-chosen LP (in a similar spirit as in [18]) to obtain the following pseudo-approximation for \(\upgamma \mathrm {C k C}\): they efficiently compute a solution of twice the optimal radius by opening at most \(k+\gamma -1\) centers. Hence, up to \(\gamma -1\) more centers than allowed may have to be opened. Moreover, [3] shows that in the Euclidean plane, a significantly more involved extension of this technique allows for obtaining a true \((17+\varepsilon )\)-approximation for \(\gamma =O(1)\). Unfortunately, this approach is heavily problem-tailored and does not even extend to 3-dimensional Euclidean spaces. This naturally leads to the main open question raised in [3]:

Does \(\upgamma \mathrm {C k C}\) with \(\gamma =O(1)\) admit an O(1)-approximation, for any finite metric?

Here, we introduce a new approach that answers this question affirmatively.

Together with additional ingredients, our approach also applies to Fair Robust k-Center, which is a natural lottery model introduced by Harris et al. [18]. We introduce the following generalization thereof that can be handled with our techniques, which we name Fair \(\gamma \)-Colorful k-Center problem (Fair \(\upgamma \mathrm {C k C}\)). (The Fair Robust k-Center problem, as introduced in [18], corresponds to \(\gamma =1\).)

Definition 2

(Fair \(\gamma \)-Colorful k-Center problem (Fair \(\upgamma \mathrm {C k C}\))) Given is a \(\upgamma \mathrm {C k C}\) instance on a finite metric space \((X,d)\) together with a vector \(p\in [0,1]^X\). The goal is to find the smallest radius \(r\in \mathbb {R}_{\ge 0}\), for which there exists a distribution \(\mathcal {H}\) over feasible \(\upgamma \mathrm {C k C}\) solutions of radius r such that

$$\begin{aligned} \Pr _{C\sim \mathcal {H}} [u \in B(C,r)] \ge p(u) \quad \forall u\in X. \end{aligned}$$

An algorithm for this problem should return a radius r along with an efficient procedure for sampling a random feasible \(\upgamma \mathrm {C k C}\) solution of radius r.

We note that if there exists a distribution \(\mathcal {H}\) with the desired properties for some radius r, then there exists a distribution of polynomial support with the desired properties (due to sparsity of the natural LP corresponding to the distribution, described in Sect. 3). This, in particular, implies that the corresponding decision problem is in \(\mathtt {NP}\).

Fair \(\upgamma \mathrm {C k C}\) is a generalization of \(\upgamma \mathrm {C k C}\), where each element \(u\in X\) needs to be covered with a prescribed probability p(u). The Fair Robust k-Center problem, i.e., Fair \(\upgamma \mathrm {C k C}\) with \(\gamma =1\), is indeed a fairness-inspired generalization of Robust k-Center, since Robust k-Center is obtained by setting \(p(u)=0\) for all \(u\in X\). One setting that nicely illustrates the additional fairness aspect of Fair \(\upgamma \mathrm {C k C}\) compared to \(\upgamma \mathrm {C k C}\) is when k-Center problems have to be solved repeatedly on the same metric space. The introduction of the probability requirements p allows for obtaining a distribution to draw from that needs to consider all elements of X (as prescribed by p), whereas classical Robust k-Center likely ignores a group of badly-placed elements. We refer to Harris et al. [18] for further motivation of the problem setting. They also discuss the Knapsack and Matroid Center problem under the same notion of fairness.
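The probabilistic requirement of Definition 2 can be checked directly for a distribution of small support. The sketch below assumes, for illustration only, that a distribution is given as a list of `(centers, weight)` pairs and that each center set has already been verified to be a feasible \(\upgamma \mathrm {C k C}\) solution of radius r; only the condition \(\Pr [u\in B(C,r)]\ge p(u)\) is tested.

```python
def coverage_probabilities(d, r, distribution):
    """For each point u, the total weight of center sets C with u in B(C, r)."""
    n = len(d)
    prob = [0.0] * n
    for centers, weight in distribution:
        for u in range(n):
            if any(d[c][u] <= r for c in centers):
                prob[u] += weight
    return prob


def satisfies_lottery(d, r, distribution, p, tol=1e-9):
    """Check that the weights sum to 1 and every u is covered with probability >= p[u]."""
    total = sum(weight for _, weight in distribution)
    if abs(total - 1.0) > tol:
        return False
    prob = coverage_probabilities(d, r, distribution)
    return all(prob[u] >= p[u] - tol for u in range(n_points(d)))


def n_points(d):
    """Number of points in the metric space described by the distance matrix d."""
    return len(d)


# Toy instance (assumed): two far-apart points, one center, radius 0.
d2 = [[0, 10], [10, 0]]
uniform = [((0,), 0.5), ((1,), 0.5)]  # open either point with probability 1/2
```

In this toy instance no single center covers both points at radius 0, yet the uniform lottery covers each point with probability 1/2, which is the kind of guarantee a deterministic Robust k-Center solution cannot express.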

For Fair Robust k-Center, [18] presents a 2-pseudo-approximation that slightly violates both the number of points to be covered and the probability of covering each point. More precisely, for any constant \(\varepsilon >0\), only a \((1-\varepsilon )\)-fraction of the required number of elements are covered, and element \(u\in X\) is covered only with probability \((1-\varepsilon ) p(u)\) instead of p(u). It was left open in [18] whether a true approximation may exist for Fair Robust k-Center.

Our results

Our main contribution is a method to obtain 4-approximations for variants of k-Center with unary encoded covering constraints on the points to be covered. We illustrate our technique in the context of \(\upgamma \mathrm {C k C}\), affirmatively resolving the open question of Bandyapadhyay et al. [3] about the existence of an O(1)-approximation for constantly many colors (without restrictions on the underlying metric space).

Theorem 1

There is a 4-approximation algorithm for \(\upgamma \mathrm {C k C}\) running in time \(|X|^{O(\gamma )}\).

In a second step we extend and generalize our technique to Fair \(\upgamma \mathrm {C k C}\), which, as mentioned, is a generalization of \(\upgamma \mathrm {C k C}\). We show that Fair \(\upgamma \mathrm {C k C}\) admits an O(1)-approximation, which neither violates covering nor probabilistic constraints.

Theorem 2

There is a 4-approximation algorithm for Fair \(\upgamma \mathrm {C k C}\) running in time \({{\,\mathrm{poly}\,}}(L) \cdot |X|^{O(\gamma )}\), where L is the encoding length of the input.

We recall that in our definition of \(\upgamma \mathrm {C k C}\), the number of colors \(\gamma \) is part of the input. In the following, we complement our results above, which lead to efficient algorithms only for constant \(\gamma \), by showing inapproximability of \(\upgamma \mathrm {C k C}\) when \(\gamma \) is not bounded. This holds even on the real line (1-dimensional Euclidean space).

Theorem 3

It is \(\mathtt {NP}\)-hard to decide whether \(\upgamma \mathrm {C k C}\) on the real line admits a solution of radius 0. Moreover, unless the Exponential Time Hypothesis fails, for any function \(f:\mathbb {Z}_{\ge 0}\rightarrow \mathbb {Z}_{\ge 0}\) with \(f(n) = \omega (\log n)\), no polynomial-time algorithm can distinguish whether \(\upgamma \mathrm {C k C}\) on the real line with \(\gamma \le f(|X|)\) admits a solution of radius 0.

Hence, assuming the Exponential Time Hypothesis, there is no polynomial-time approximation algorithm for \(\upgamma \mathrm {C k C}\) if the number of colors grows faster than logarithmic in the size of the ground set. Notice that, for a logarithmic number of colors, our procedures run in quasi-polynomial time.

Finally, we extend the hardness implied by Theorem 3 to bi-criteria algorithms that are allowed to open more than k centers. An \((\alpha ,\beta )\) bi-criteria algorithm for \(\upgamma \mathrm {C k C}\), for \(\alpha , \beta \ge 1\), is an algorithm that returns a solution that picks at most \(\alpha k\) centers and its radius is at most \(\beta r\), where r is the radius of an optimal solution with k centers. More precisely, we prove the following theorem.

Theorem 4

There exists a constant \(c > 0\), such that it is \(\mathtt {NP}\)-hard to decide whether \(\upgamma \mathrm {C k C}\) on the real line admits a solution of radius 0, even if we are allowed to violate the number of open centers by a factor of \(c \log |X|\).

Notice that, unless \(\mathtt {P}=\mathtt {NP}\), the above theorem rules out the existence of a \((c \log |X|, \beta )\) bi-criteria algorithm for \(\upgamma \mathrm {C k C}\) for any value of \(\beta \).

Note: In an independent work, Jia, Sheth, and Svensson [23] also made advances on \(\upgamma \mathrm {C k C}\). We briefly highlight some main differences. In particular, they gave a 3-approximation algorithm for \(\upgamma \mathrm {C k C}\) running in time \(|X|^{O(\gamma ^2)}\). Hence, this algorithm provides a better approximation guarantee than our 4-approximation for \(\upgamma \mathrm {C k C}\), though with a slower running time. Moreover, contrary to [23], we also show that our techniques extend to Fair \(\upgamma \mathrm {C k C}\) (Theorem 2) and obtain the hardness results highlighted in Theorems 3 and 4.

Outline of main technical contributions and paper organization

We introduce two main technical ingredients. The first is a method to deal with additional covering constraints in k-Center problems. We showcase this method in the context of \(\upgamma \mathrm {C k C}\), which leads to Theorem 1. For this, we combine polyhedral sparsity-based arguments as used by Bandyapadhyay et al. [3], which by themselves only lead to pseudo-approximations, with dynamic programming to design a round-or-cut approach. Round-or-cut approaches, first used by Carr et al. [7], leverage the ellipsoid method in a clever way. In each ellipsoid iteration they either separate the current point from a well-defined polyhedron P, or round the current point to a good solution. The rounding step may happen even if the current point is not in P. Round-or-cut methods have found applications in numerous problem settings (see, e.g., [1, 9, 16, 24,25,26,27]). The way we employ round-or-cut is inspired by a powerful round-or-cut approach of Chakrabarty and Negahbani [9] also developed in the context of k-Center. However, their approach is not applicable to k-Center problems as soon as multiple covering constraints exist, like in \(\upgamma \mathrm {C k C}\); see Appendix B for more details.

Our second technical contribution first employs LP duality to transform lottery-type models, like Fair \(\upgamma \mathrm {C k C}\), into an auxiliary problem that corresponds to a weighted version of k-Center with covering constraints. We then show how a certain type of approximate separation over the dual is possible, by leveraging the techniques we introduced in the context of \(\upgamma \mathrm {C k C}\), leading to a 4-approximation.

Even though Theorem 2 is a strictly stronger statement than Theorem 1, we first prove Theorem 1 in Sect. 2, because it allows us to give a significantly cleaner presentation of some of our main technical contributions. In Sect. 3, we then focus on the additional techniques needed to deal with Fair \(\upgamma \mathrm {C k C}\), by reducing it to a problem that can be tackled with the techniques introduced in Sect. 2. Finally, in Sect. 4, we discuss the hardness results stated in Theorems 3 and 4.

A 4-approximation for \(\upgamma \mathrm {C k C}\) with running time \(\varvec{|X|^{O(\gamma )}}\)

In this section, we prove Theorem 1, which implies a polynomial-time 4-approximation algorithm for \(\upgamma \mathrm {C k C}\) with constantly many colors. We assume \(\gamma \ge 2\); notice that \(\gamma =1\) corresponds to Robust k-Center, for which a (tight) polynomial-time 2-approximation is known [8, 18]. Moreover, we assume that \(\gamma < k\), since otherwise, we can simply enumerate over all subsets of X of size k, which leads to an exact algorithm with running time \(|X|^{O(k)} \le |X|^{O(\gamma )}\). Thus, from now on, we have that \(2 \le \gamma \le k - 1\).

We present a procedure that for any \(r\in \mathbb {R}_{\ge 0}\) returns a solution of radius 4r if a solution of radius r exists, and runs in time \(|X|^{O(\gamma )}\). This implies Theorem 1 because the optimal radius is a distance between two points. Hence, we can run the procedure for all possible pairwise distances r between points in X (or, alternatively, do binary search on the set of pairwise distances in order to speed up the algorithm) and return the best solution found. Thus, we fix \(r\in \mathbb {R}_{\ge 0}\) in what follows. We denote by \(\mathcal {P}\) the following canonical relaxation of \(\upgamma \mathrm {C k C}\) with radius r:

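$$\begin{aligned} \mathcal {P}:=\left\{ (x,y)\in [0,1]^X\times [0,1]^X \;\left| \; \begin{array}{ll} \sum _{u\in X_\ell } x(u) \ge m_\ell &{} \forall \ell \in [\gamma ] \\ \sum _{v\in X} y(v) \le k &{} \\ x(u) \le \sum _{v\in B(u,r)} y(v) &{} \forall u\in X \end{array}\right. \right\} \end{aligned}$$
(1)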

Integral points \((x,y)\in \mathcal {P}\) correspond to solutions of radius r, where x and y are characteristic vectors indicating the points that are covered and the centers that are opened, respectively. We denote the integer hull of \(\mathcal {P}\) by \(\mathcal {P}_{I}:={{\,\mathrm{conv}\,}}\left( \mathcal {P}\cap (\{0,1\}^X \times \{0,1\}^X )\right) \) .
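The first step of each iteration described later is to test whether a candidate point lies in \(\mathcal {P}\) and, if not, to report a violated constraint. The sketch below illustrates this under the assumption that \(\mathcal {P}\) consists of the box constraints, the budget \(\sum _{v} y(v)\le k\), the color requirements \(\sum _{u\in X_\ell } x(u)\ge m_\ell \), and the covering constraints \(x(u)\le \sum _{v\in B(u,r)} y(v)\); the last of these is the one invoked in the proof of Lemma 1. The encoding and the returned labels are hypothetical.

```python
def violated_constraint(d, r, k, colors, m, x, y, tol=1e-9):
    """Return a (label, index) pair naming one constraint of the assumed
    relaxation that (x, y) violates, or None if (x, y) satisfies all of them."""
    n = len(d)
    # Box constraints: 0 <= x(v), y(v) <= 1.
    for v in range(n):
        if not (-tol <= x[v] <= 1 + tol and -tol <= y[v] <= 1 + tol):
            return ("box", v)
    # Budget: at most k (fractional) centers are opened.
    if sum(y) > k + tol:
        return ("budget", None)
    # Color requirements: at least m_l (fractional) points of color l are covered.
    for l, (X_l, m_l) in enumerate(zip(colors, m)):
        if sum(x[u] for u in X_l) < m_l - tol:
            return ("color", l)
    # Covering: u can only be covered by centers opened within distance r.
    for u in range(n):
        if x[u] > sum(y[v] for v in range(n) if d[u][v] <= r) + tol:
            return ("cover", u)
    return None


# Toy instance (assumed): four points on the real line, two colors.
pts = [0, 1, 5, 6]
d = [[abs(a - b) for b in pts] for a in pts]
colors, m = [{0, 1}, {2, 3}], [1, 1]
```

Since the number of constraints is polynomial in \(|X|\) and \(\gamma \), this check runs in polynomial time.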

Our algorithm is based on the round-or-cut framework, first used in [7]. The main building block is a procedure that rounds a point \((x,y)\in \mathcal {P}\) to a radius 4r solution under certain conditions. It will turn out that these conditions are always satisfied if \((x,y) \in \mathcal {P}_{I}\). If they are not satisfied, then we can prove that \((x,y) \notin \mathcal {P}_{I}\) and generate in time \(|X|^{O(\gamma )}\) a hyperplane separating \((x,y)\) from \(\mathcal {P}_{I}\). This separation step now becomes an iteration of the ellipsoid method, employed to find a point in \(\mathcal {P}_{I}\), and we continue with a new candidate point \((x,y)\). Schematically, the whole process is described in Fig. 1.

Fig. 1: An iteration of the ellipsoid method

On a high level, we realize our round-or-cut procedure as follows. First, we check whether \((x,y) \in \mathcal {P}\) and return a violated constraint if this is not the case. If \((x,y)\in \mathcal {P}\), we partition the metric space, based on a natural greedy heuristic introduced by Harris et al. [18]. This gives a set of centers \(S=\{s_1,\ldots , s_q\}\subseteq X\) with corresponding clusters \(\mathcal {D}=\{D_1, \ldots , D_q\}\subseteq 2^X\). We now exploit a technique by Bandyapadhyay et al. [3], which implies that if \(y(B(S,r)) \le k - \gamma + 1\), then one can leverage sparsity arguments in a simplified LP to obtain a radius 4r solution that picks centers only within S. (For brevity, we use the shorthand \(y(W):=\sum _{u\in W} y(u)\) for any finite set W and vector \(y\in \mathbb {R}^W\); in particular, \(y(B(S,r))=\sum _{v\in B(S,r)} y(v)\).) We then turn to the case where \(y(B(S, r)) > k - \gamma + 1\). At this point, we show that one can efficiently check whether there exists a solution of radius 2r that opens at most \(k - (k - \gamma + 2) = \gamma - 2\) centers outside of S. This is achieved by guessing the centers outside of S (of which there are at most \(\gamma -2\) many, as noted) and using dynamic programming to find the remaining centers in S. If no such radius 2r solution exists, we argue that any solution of radius r has at most \(k-\gamma +1\) centers in \(B(S,r)\), proving that \(y(B(S, r)) \le k - \gamma + 1\) is an inequality separating \((x,y)\) from \(\mathcal {P}_{I}\).

We now give a formal treatment of each step of this algorithm, which is schematically described in Fig. 1. Given a point \((x,y)\in \mathbb {R}^X \times \mathbb {R}^X\), we first check whether \((x,y)\in \mathcal {P}\), and, if not, return a violated constraint of \(\mathcal {P}\). Such a constraint separates \((x,y)\) from \(\mathcal {P}_{I}\) because \(\mathcal {P}_{I}\subseteq \mathcal {P}\). Hence, we may assume that \((x, y) \in \mathcal {P}\).

We now use a partitioning technique by Harris et al. [18] that, given \((x, y) \in \mathcal {P}\), allows for obtaining what we call an \((x,y)\)-good partition \((S,\mathcal {D})\), defined as follows.

Definition 3

(\((x,y)\)-good partition) Let \((x,y) \in \mathcal {P}\). A tuple \((S,\mathcal {D})\), where the family \(\mathcal {D} = \{D_1, \ldots , D_q\}\) partitions X and \(S = \{s_1, \ldots , s_q\}\subseteq X\) with \(s_i\in D_i\) for \(i\in [q]\), is an \((x,y)\)-good partition if:

  (i) \(d(s_i, s_j) > 4r\) for all \(i,j\in [q], i\ne j\),

  (ii) \(D_i \subseteq B(s_i, 4r)\) for all \(i \in [q]\), and

  (iii) \(y(B(s_i,r)) \ge x(u)\) for all \(i\in [q]\) and for all \(u\in D_i\).

The partitioning procedure of [18] was originally introduced for Robust k-Center and naturally extends to \(\upgamma \mathrm {C k C}\) (see [3]). For completeness, we describe it in Algorithm 1. Contrary to prior procedures, we compute an \((x,y)\)-good partition whose centers have pairwise distances of strictly more than 4r (instead of 2r as in prior work). This large separation avoids overlap of radius 2r balls around centers in S, and allows us to use dynamic programming (DP) to build a radius 2r solution with centers in S under certain conditions. However, it is also the reason why we get a 4-approximation if the DP approach cannot be applied.

Lemma 1

([3, 18]) For \((x,y) \in \mathcal {P}\), Algorithm 1 computes an \((x,y)\)-good partition \((S, \mathcal {D})\) in polynomial time.

For completeness, we present the proof of the above lemma.

Proof of Lemma 1

By construction, the first two properties of the definition of an \((x,y)\)-good partition are satisfied by the generated partition \((S, \mathcal {D})\). We now turn to the third property. For each point \(u \in D_i\), by the greedy criterion we have \(x(u) \le x(s_i)\). Since \((x,y) \in \mathcal {P}\), we also have \(x(s_i) \le y(B(s_i, r))\), implying the statement. \(\square \)
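Algorithm 1 itself is not reproduced in this text. The following is a sketch of a greedy partitioning procedure that is consistent with the proof above: it repeatedly picks a remaining point of largest x-value as the next center and assigns to it all remaining points within distance 4r. The encoding (distance matrix `d`, list `x`) and the tie-breaking rule are assumptions of this example.

```python
def greedy_partition(d, r, x):
    """Return (S, D): centers s_1..s_q and clusters D_1..D_q partitioning X.

    Each round picks a remaining point with maximum x-value (ties broken by
    smallest id) and removes every remaining point within distance 4r of it.
    """
    remaining = set(range(len(d)))
    S, D = [], []
    while remaining:
        s = max(remaining, key=lambda u: (x[u], -u))
        cluster = {u for u in remaining if d[s][u] <= 4 * r}
        S.append(s)
        D.append(cluster)
        remaining -= cluster
    return S, D


# Toy instance (assumed): five points on the real line, r = 1.
pts = [0, 1, 5, 6, 20]
d = [[abs(a - b) for b in pts] for a in pts]
x = [0.2, 0.9, 0.5, 0.4, 0.1]
S, D = greedy_partition(d, 1, x)
```

A later center is never within distance 4r of an earlier one (it would have been removed), which gives property (i); property (ii) holds by the definition of each cluster; and every point of a cluster was still available when its center was chosen, which gives the greedy criterion \(x(u)\le x(s_i)\) used for property (iii).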

The following theorem follows from the results in [3].

Theorem 5

([3]) Let \((x, y) \in \mathcal {P}\) and \((S,\mathcal {D})\) be an \((x,y)\)-good partition. Then, if \(y(B(S, r)) \le k - \gamma + 1\), a solution of radius 4r can be found in polynomial time.

For completeness, we provide in Appendix A a proof of a slightly stronger version of Theorem 5, namely Theorem 8, which we reuse later in a more general context. Theorem 8 easily follows by the same sparsity argument used in [3].

We are left with the case \(y(B(S, r)) > k - \gamma + 1\). In this case we present a procedure that either returns a solution of radius 2r or, if it fails to do so, we show that every point \((\overline{x},\overline{y})\in \mathcal {P}_{I}\) must fulfill \(\overline{y}(B(S, r)) \le k - \gamma + 1\); hence, this is an inequality separating \((x,y)\) from \(\mathcal {P}_I\).

To show the above, we assume that \((x,y)\in \mathcal {P}_I\) holds and provide a procedure obtaining a solution of radius 2r. (Notice that we cannot check whether \((x,y)\in \mathcal {P}_I\), and even if we knew that \((x,y)\in \mathcal {P}_I\), we still need a procedure transforming the possibly fractional point \((x,y)\) to an actual (integral) solution.) Note that if \((x,y) \in \mathcal {P}_{I}\), then \((x,y)\) is a convex combination of integral points of \(\mathcal {P}\), and since \(y(B(S, r)) > k - \gamma + 1\), at least one of these integral points must satisfy the same strict inequality. Hence, there must exist a solution \(C_1\subseteq X\) of radius r with \(|C_1\cap B(S, r)| > k - \gamma + 1\). In particular, we must have \(|C_1 \setminus B(S,r)| \le \gamma - 2\). We observe that if such a solution \(C_1\) exists, then there must be a solution \(C_2\) of radius 2r which has at most \(\gamma - 2\) centers outside of S. This is formalized in the following lemma.

Lemma 2

Let \(S\subseteq X\) with \(d(s,s')>4r\) for all \(s,s'\in S\) with \(s \ne s'\), and \(\tau \in \{0, \ldots , k-1\}\). If there is a radius r solution \(C_1\) with \(|C_1\cap B(S,r)| > \tau \), then there is a radius 2r solution \(C_2\) with \(|C_2\,{\setminus }\, S| \le k - \tau - 1\).

Proof

Assume there is a solution \(C_1\) of radius r with \(|C_1\cap B(S,r)| > \tau \). Let \(A = C_1 \cap B(S,r)\). For each \(p \in A\), let \(\phi (p) \in S\) be the unique point in S such that \(p \in B(\phi (p), r)\); \(\phi (p)\) is well defined because \(d(s, s') > 4r\) for every \(s \ne s' \in S\). Thus, \(|\phi (A)| \le |A|\), where \(\phi (A) :=\{\phi (p):\,p \in A\}\).

Let \(C_2 = \phi (A) \cup (C_1 \setminus A)\). We have \(|C_2| = |\phi (A)| + |C_1 \setminus A| \le |A| + |C_1 \setminus A| \le k\). Moreover, as \(d(p, \phi (p)) \le r\) for every \(p \in A\), we have that \(B(C_1, r) \subseteq B(C_2, 2r)\). Thus, \(C_2\) is a feasible solution of radius 2r. Finally, by construction, \(|C_2 \setminus S| = |C_1 \setminus B(S,r)| \le k - \tau - 1\). \(\square \)
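The construction of \(C_2\) in this proof is simple enough to sketch directly: every center of \(C_1\) lying within distance r of some point of S is replaced by that point, and all other centers are kept. The distance-matrix encoding and the function name are assumptions of this example.

```python
def snap_centers_to_S(d, r, S, C1):
    """Build C2 = phi(A) ∪ (C1 minus A), where A = C1 ∩ B(S, r) and phi(p) is the
    point of S within distance r of p (unique when points of S are > 4r apart)."""
    C2 = set()
    for p in C1:
        near = [s for s in S if d[s][p] <= r]
        # If p lies in B(S, r), move it to phi(p); otherwise keep p itself.
        C2.add(near[0] if near else p)
    return C2


# Toy instance (assumed): five points on the real line, r = 1.
pts = [0, 1, 10, 11, 30]
d = [[abs(a - b) for b in pts] for a in pts]
S = [0, 3]          # coordinates 0 and 11, more than 4r apart
C1 = {1, 2, 4}      # coordinates 1, 10, 30
C2 = snap_centers_to_S(d, 1, S, C1)
```

Each moved center shifts by at most r, so every point covered by \(C_1\) at radius r is covered by \(C_2\) at radius 2r, and the centers of \(C_2\) outside S are exactly those of \(C_1\) outside \(B(S,r)\).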

So, we have now proved that if \(y(B(S,r)) > k-\gamma +1\) and \((x,y)\in \mathcal {P}_{I}\), then there is a solution \(C_2\) of radius 2r with \(|C_2\setminus S|\le \gamma - 2\). The motivation for considering solutions of radius 2r with all centers in S except for constantly many (if \(\gamma =O(1)\)) is that such solutions can be found efficiently via dynamic programming. This is possible because the centers in S are separated by distances strictly larger than 4r, which implies that radius 2r balls centered at points in S do not overlap. Hence, there are no interactions between such balls. This is formalized below.

Lemma 3

Let \(S\subseteq X\) with \(d(s,s')>4r\) for all \(s,s'\in S\) with \(s \ne s'\), and \(\beta \in \mathbb {Z}_{\ge 0}\). If a radius 2r solution \(C\subseteq X\) with \(|C\setminus S|\le \beta \) exists, then we can find such a solution in time \(|X|^{O(\beta + \gamma )}\).

Proof

Suppose there is a solution \(C \subseteq X\) of radius 2r with \(|C \setminus S| \le \beta \). The algorithm has two components. We first guess the set \(Q:=C\setminus S\). Because \(|Q| \le \beta \), there are \(|X|^{O(\beta )}\) choices. Given Q, it remains to select at most \(k-|Q|\) centers \(W\subseteq S\) to fulfill the color requirements. Note that for any \(W\subseteq S\), the number of points of color \(\ell \in [\gamma ]\) that B(W, 2r) covers on top of those already covered by B(Q, 2r) is \(\left| (B(W,2r)\setminus B(Q,2r))\cap X_\ell \right| = \sum _{w\in W} \left| \left( B(w,2r) \setminus B(Q,2r) \right) \cap X_\ell \right| , \) where equality holds because centers in W are separated by distances strictly larger than 4r, and thus B(W, 2r) is the disjoint union of the sets B(w, 2r) for \(w\in W\). Hence, the task of finding a set \(W\subseteq S\) with \(|W|\le k-|Q|\) such that \(Q\cup W\) is a solution of radius 2r can be phrased as finding a feasible solution to the following binary program:

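$$\begin{aligned} \begin{array}{rcll} \sum _{s\in S} z(s)\cdot \left| \left( B(s,2r) \setminus B(Q,2r) \right) \cap X_\ell \right| &{} \ge &{} m_\ell - \left| B(Q,2r)\cap X_\ell \right| &{} \forall \ell \in [\gamma ] \\ \sum _{s\in S} z(s) &{} \le &{} k - |Q| &{} \\ z &{} \in &{} \{0,1\}^S &{} \end{array} \end{aligned}$$
(2)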

The above binary program can be easily solved through standard dynamic programming techniques in \(|X|^{O(\gamma )}\) time, because the coefficients are small. For completeness, we show in Appendix A how this can be done for a slightly more general problem (see Theorem 9), which we will reuse later on. As the dynamic program is run for \(|X|^{O(\beta )}\) many guesses of Q, we obtain an overall running time of \(|X|^{O(\beta +\gamma )}\), as claimed. \(\square \)
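One way such a dynamic program could look is sketched below; it is not the procedure of Appendix A, only an illustration under stated assumptions. For a fixed guess Q, `gains[s]` is assumed to hold, for each color, the number of points of that color in \(B(s,2r)\setminus B(Q,2r)\), `residual` holds the demands \(m_\ell - |B(Q,2r)\cap X_\ell |\), and `budget` is \(k-|Q|\). The DP tracks, for each vector of coverage counts capped at the demands, the fewest centers of S needed to reach it.

```python
def select_centers_dp(gains, residual, budget):
    """Return indices of at most `budget` centers whose per-color gains sum to
    at least `residual` in every color, or None if no such selection exists."""
    target = tuple(max(0, q) for q in residual)
    # best[state] = (fewest centers used, chosen indices) to reach capped coverage `state`.
    best = {tuple(0 for _ in target): (0, [])}
    for s, gain in enumerate(gains):
        updated = dict(best)
        for state, (count, chosen) in best.items():
            # Adding center s increases coverage additively, because the
            # radius-2r balls around distinct points of S are disjoint.
            new_state = tuple(min(t, a + g) for t, a, g in zip(target, state, gain))
            if new_state not in updated or count + 1 < updated[new_state][0]:
                updated[new_state] = (count + 1, chosen + [s])
        best = updated
    if target in best and best[target][0] <= budget:
        return best[target][1]
    return None


# Toy data (assumed): three candidate centers, two colors.
gains = [(2, 0), (0, 2), (1, 1)]
```

Because coverage counts are capped at the demands, the number of states is at most \(\prod _{\ell }(m_\ell +1)\le (|X|+1)^{\gamma }\), which is where the small (unary) coefficients are used.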

This completes the last ingredient for an iteration of our round-or-cut approach as shown in Fig. 1. In summary, assuming \(y(B(S,r)) > k-\gamma +1\) (for otherwise Theorem 5 leads to a solution of radius 4r) we use Lemma 3 (with \(\beta =\gamma -2\)) to check whether there is a radius 2r solution \(C_2\) with \(|C_2\setminus S|\le \gamma -2\). This requires \(|X|^{O(\gamma )}\) time. If this is the case, we are done. If not, the contrapositive of Lemma 2 (with \(\tau =k - \gamma + 1\)) implies that every radius r solution \(C_1\) fulfills \(|C_1\cap B(S,r)| \le k - \gamma + 1\). Hence, every point \((\overline{x},\overline{y})\in \mathcal {P}_{I}\) satisfies \(\overline{y}(B(S,r)) \le k - \gamma +1\). However, this constraint is violated by \((x,y)\), and so it separates \((x,y)\) from \(\mathcal {P}_{I}\). Thus, we proved that the process described in Fig. 1 is a valid round-or-cut procedure that runs in time \(|X|^{O(\gamma )}\).

Corollary 1

There is an algorithm that, given a point \((x,y)\in \mathbb {R}^X \times \mathbb {R}^X\), either returns a \(\upgamma \mathrm {C k C}\) solution of radius 4r or an inequality separating \((x,y)\) from \(\mathcal {P}_{I}\). The running time of the algorithm is \(|X|^{O(\gamma )}\).

We can now prove the main theorem.

Proof of Theorem 1

We run the ellipsoid method on \(\mathcal {P}_{I}\) for each of the \(O(|X|^2)\) candidate radii r. For each r, the number of ellipsoid iterations is polynomially bounded as the separating hyperplanes that are produced by the algorithm have encoding length at most O(|X|) (see Theorem 6.4.9 of [17]). To see this, note that all generated hyperplanes are either inequalities defining \(\mathcal {P}\) or inequalities of the form \(y(B(S,r))\le k-\gamma +1\). For the correct guess of r, \(\mathcal {P}_{I}\) is non-empty and the algorithm terminates by returning a radius 4r solution. Hence, if we return the best solution among those computed for all guesses of r, we have a 4-approximation, and the total running time is \({{\,\mathrm{poly}\,}}(|X|) \cdot |X|^{O(\gamma )} = |X|^{O(\gamma )}\). \(\square \)
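The outer loop over candidate radii can be sketched as follows. Here `try_radius` is a hypothetical stand-in for the round-or-cut procedure for a fixed r: it is assumed to return a solution whenever a solution of radius r exists, and `None` otherwise. For the sake of a runnable example, the stand-in used below is a brute-force exact oracle for plain k-Center, not the ellipsoid-based procedure of this section.

```python
from itertools import combinations


def best_over_candidate_radii(d, try_radius):
    """Try all pairwise distances in increasing order; return the first radius
    for which the oracle succeeds, together with the solution it returned."""
    n = len(d)
    radii = sorted({d[u][v] for u in range(n) for v in range(n)})
    for r in radii:
        solution = try_radius(r)
        if solution is not None:
            return r, solution
    return None


def brute_force_oracle(d, k):
    """Illustrative stand-in oracle: exact feasibility test for plain k-Center."""
    n = len(d)

    def try_radius(r):
        for C in combinations(range(n), k):
            if all(any(d[c][u] <= r for c in C) for u in range(n)):
                return list(C)
        return None

    return try_radius


# Toy instance (assumed): four points on the real line, k = 2.
pts = [0, 1, 5, 6]
d = [[abs(a - b) for b in pts] for a in pts]
result = best_over_candidate_radii(d, brute_force_oracle(d, 2))
```

Since the oracle is guaranteed to succeed at the optimal radius, the first success happens at some \(r\le \mathrm {OPT}\); with the procedure of this section, the returned solution then has radius at most \(4r\le 4\,\mathrm {OPT}\).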

The lottery model of Harris et al. [18]

Our main tool to solve the lottery model of Harris et al. [18] is a reduction to a certain type of weighted k-center problem. A key step of this reduction is to transform the problem through the use of linear duality. In Subsect. 3.1, we first present this reduction before proving in Subsect. 3.2 our algorithmic result for the above-referred version of a weighted k-center problem.

Reduction to weighted version of k-center

Let \((X,d)\) be a Fair \(\upgamma \mathrm {C k C}\) instance, and let \(\mathcal {F}(r)\) be the family of sets of centers satisfying the covering requirements with radius r, i.e.,

$$\begin{aligned} \mathcal {F}(r):=\big \{C \subseteq X \,\big \vert \, |C| \le k \text { and } |B(C,r) \cap X_\ell | \ge m_\ell \;\;\forall \ell \in [\gamma ]\big \} . \end{aligned}$$

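Note that a radius r solution for Fair \(\upgamma \mathrm {C k C}\) defines a distribution over the sets in \(\mathcal {F}(r)\). Given r, such a distribution exists if and only if the following (exponential-size) linear program \(\text {PLP}(r)\) is feasible (with \(\text {DLP}(r)\) being its dual):

$$\begin{aligned} \text {PLP}(r):\quad \min \;&0 \\ \text {s.t.}\;&\sum _{C\in \mathcal {F}(r):\, u\in B(C,r)} \lambda (C) \ge p(u) \quad \forall u\in X \\&\sum _{C\in \mathcal {F}(r)} \lambda (C) = 1 \\&\lambda \in \mathbb {R}_{\ge 0}^{\mathcal {F}(r)} \end{aligned}$$

$$\begin{aligned} \text {DLP}(r):\quad \max \;&\sum _{u\in X} p(u)\alpha (u) - \mu \\ \text {s.t.}\;&\sum _{u\in B(C,r)} \alpha (u) \le \mu \quad \forall C\in \mathcal {F}(r) \\&\alpha \in \mathbb {R}_{\ge 0}^{X},\; \mu \in \mathbb {R} \end{aligned}$$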

The dual problem \(\text {DLP}(r)\) can naturally be interpreted as a packing problem with packing constraints imposed by \(\upgamma \mathrm {C k C}\)-solutions. However, we will mostly be interested in approximately separating over \(\text {DLP}(r)\). This will turn out to reduce to a weighted version of \(\upgamma \mathrm {C k C}\) as we highlight later.

Clearly, if \(\text {PLP}(r)\) is feasible, then its optimal value is 0. As mentioned in the introduction, it is also easy to see that if \(\text {PLP}(r)\) is feasible, then it has a feasible solution with polynomial support (since the number of non-trivial constraints is \(|X| + 1\)).
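To make the objects concrete, the following Python sketch checks membership in \(\mathcal {F}(r)\) and verifies that a given distribution over sets of centers is a valid lottery. It assumes, as the statement of Lemma 5 suggests, that each point u comes with a required coverage probability p(u); the function names and the dictionary-based encoding are illustrative choices, not part of the paper.

```python
from fractions import Fraction

def ball(points, dist, centers, radius):
    """Return B(C, r): all points within distance r of some center in C."""
    return {u for u in points if any(dist(c, u) <= radius for c in centers)}

def in_family(points, dist, colors, demands, k, centers, radius):
    """Check C in F(r): at most k centers, and each color class X_l has >= m_l covered points."""
    if len(centers) > k:
        return False
    covered = ball(points, dist, centers, radius)
    return all(len(covered & X_l) >= m_l for X_l, m_l in zip(colors, demands))

def distribution_is_feasible(points, dist, colors, demands, k, radius, lam, p):
    """lam maps frozenset(centers) -> probability; p maps point -> required coverage probability."""
    if sum(lam.values()) != 1 or any(w < 0 for w in lam.values()):
        return False
    if not all(in_family(points, dist, colors, demands, k, C, radius) for C in lam):
        return False
    for u in points:
        # Probability that u is covered by the randomly drawn set of centers.
        prob = sum(w for C, w in lam.items() if u in ball(points, dist, C, radius))
        if prob < p[u]:
            return False
    return True
```

Exact rationals (`Fraction`) are used so that the check that the probabilities sum to 1 is not affected by floating-point rounding.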

We will again assume that \(\gamma < k\). If \(\gamma \ge k\), then for each fixed radius r, we solve \(\text {PLP}(r)\) in time \({{\,\mathrm{poly}\,}}(L)\cdot |X|^{O(k)} \le {{\,\mathrm{poly}\,}}(L)\cdot |X|^{O(\gamma )}\), where L is the encoding length of the input. If \(\text {PLP}(r)\) is infeasible, then the radius r is too small. Otherwise, we compute a feasible extreme point solution to \(\text {PLP}(r)\) which corresponds to a distribution with support size \({{\,\mathrm{poly}\,}}(|X|)\). Hence, by applying binary search over all candidate radii, which are the \(O(|X|^2)\) pairwise distances between points in X, we can compute an optimal distribution for the smallest possible radius in \({{\,\mathrm{poly}\,}}(L) \cdot |X|^{O(\gamma )}\) time. Thus, from now on, we assume that \(1 \le \gamma < k\).

Observe that, for any \(r \ge 0\), \(\text {DLP}(r)\) always has a feasible solution (the zero vector) of value 0. Thus, by strong duality, \(\text {PLP}(r)\) is feasible if and only if the optimal value of \(\text {DLP}(r)\) is 0. Note that \(\text {DLP}(r)\) is scale-invariant, meaning that if \((\alpha , \mu )\) is feasible for \(\text {DLP}(r)\) then so is \((t\alpha , t\mu )\) for \(t\in \mathbb {R}_{\ge 0}\). This implies that \(\text {DLP}(r)\) has a solution of strictly positive objective value if and only if \(\text {DLP}(r)\) is unbounded. We thus define the following polyhedron \(\mathcal {Q}(r)\), which contains all solutions of \(\text {DLP}(r)\) of value at least 1:

$$\begin{aligned} \mathcal {Q}(r):=\Big \{(\alpha ,\mu )\in \mathbb {R}_{\ge 0}^{X}\times \mathbb {R} \,\Big \vert \, \sum _{u\in X}p(u)\alpha (u)\ge \mu +1 \text { and } \sum _{u\in B(C,r)}\alpha (u)\le \mu \;\;\forall C\in \mathcal {F}(r)\Big \} . \end{aligned}$$

As discussed, the following statement is a direct consequence of strong duality of linear programming.

Lemma 4

\(\mathcal {Q}(r)\) is empty if and only if PLP(r) is feasible.

The main lemma that allows us to obtain our result is the following. It guarantees the existence of an algorithm approximately solving a certain weighted k-center problem, where clients are weighted by \(\alpha \in \mathbb {Q}^X_{\ge 0}\). Before proving the lemma in Subsect. 3.2, we show that it implies Theorem 2.

Lemma 5

There is an algorithm that, given a point \((\alpha ,\mu ) \in \mathbb {Q}_{\ge 0}^{X} \times \mathbb {Q}\) satisfying \(\sum _{u\in X}p(u)\alpha (u) \ge \mu + 1\) and a radius \(r \ge 0\), either certifies that \((\alpha , \mu ) \in \mathcal {Q}(r)\), or outputs a set \(C\in \mathcal {F}(4r)\) with \(\sum _{u\in B(C,4r)} \alpha (u) > \mu \). The running time of the algorithm is \({{\,\mathrm{poly}\,}}(L) \cdot |X|^{O(\gamma )}\), where L is the encoding length of the input.

In words, Lemma 5 either certifies \((\alpha ,\mu )\in \mathcal {Q}(r)\) or returns a hyperplane separating \((\alpha ,\mu )\) from \(\mathcal {Q}(4r)\). Its proof leverages techniques introduced in Sect. 2, and we present it in Subsect. 3.2. Using Lemma 5, we can now prove Theorem 2.
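For intuition, an exact separation routine for the covering constraints can be sketched by brute force: enumerate every \(C\in \mathcal {F}(r)\) and look for one whose covered \(\alpha \)-weight exceeds \(\mu \). The Python sketch below is not the algorithm of Lemma 5 (which runs in \(|X|^{O(\gamma )}\) time at the price of working with radius 4r); it takes exponential time in k and only illustrates which weighted covering problem the oracle has to solve. All names are illustrative.

```python
from itertools import combinations

def covered_points(points, dist, centers, radius):
    """Return B(C, r) for the given centers."""
    return {u for u in points if any(dist(c, u) <= radius for c in centers)}

def find_violated_set(points, dist, colors, demands, k, radius, alpha, mu):
    """Brute-force search for C in F(radius) with alpha(B(C, radius)) > mu.

    Returns such a set (a violated constraint for (alpha, mu)) or None if
    every set in F(radius) has covered weight at most mu.
    """
    for size in range(k + 1):
        for centers in combinations(points, size):
            covered = covered_points(points, dist, centers, radius)
            meets_demands = all(len(covered & X_l) >= m_l
                                for X_l, m_l in zip(colors, demands))
            if meets_demands and sum(alpha[u] for u in covered) > mu:
                return set(centers)
    return None
```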

Proof of Theorem 2

As noted, there are polynomially many choices for the radius r, for each of which we run the ellipsoid method to check emptiness of \(\mathcal {Q}(4r)\) as follows. Whenever there is a call to the separation oracle for a point \((\alpha ,\mu )\in \mathbb {Q}^X \times \mathbb {Q}\), we first check whether \(\alpha \ge 0\) and \(\sum _{u\in X} p(u)\alpha (u) \ge \mu +1\). If one of these constraints is violated, we return it as separating hyperplane. Otherwise, we invoke the algorithm of Lemma 5. The algorithm either returns a constraint in the inequality description of \(\mathcal {Q}(4r)\) violated by \((\alpha ,\mu )\), which solves the separation problem, or certifies \((\alpha ,\mu )\in \mathcal {Q}(r)\).

If, at any iteration of the ellipsoid method, the separation oracle is called for a point \((\alpha ,\mu )\) for which Lemma 5 certifies \((\alpha ,\mu ) \in \mathcal {Q}(r)\), then \(\mathcal {Q}(r)\) is non-empty and Lemma 4 implies that \(\text {PLP}(r)\) is infeasible. Thus, there is no solution to the considered Fair \(\upgamma \mathrm {C k C}\) instance of radius r.

Hence, assume from now on that the separation oracle always returns a separating hyperplane, in which case the ellipsoid method certifies that \(\mathcal {Q}(4r) = \emptyset \) as follows. Let \(\mathcal {H}\subseteq \mathcal {F}(4r)\) be the family of all sets \(C\in \mathcal {F}(4r)\) returned by Lemma 5 through calls to the separation oracle. Then, the following polyhedron

$$\begin{aligned} \mathcal {Q}_{\mathcal {H}}(4r):=\Big \{(\alpha ,\mu )\in \mathbb {R}_{\ge 0}^{X}\times \mathbb {R} \,\Big \vert \, \sum _{u\in X}p(u)\alpha (u)\ge \mu +1 \text { and } \sum _{u\in B(C,4r)}\alpha (u)\le \mu \;\;\forall C\in \mathcal {H}\Big \} , \end{aligned}$$

which clearly contains \(\mathcal {Q}(4r)\), is empty. As the encoding length of any constraint in the inequality description of \(\mathcal {Q}(4r)\) is polynomially bounded in the input, the ellipsoid method runs in polynomial time (see Theorem 6.4.9 of [17]). In particular, the number of calls to the separation oracle, and thus \(|\mathcal {H}|\), is polynomially bounded.

As \(\mathcal {Q}(4r) \subseteq \mathcal {Q}_{\mathcal {H}}(4r) = \emptyset \), Lemma 4 implies that PLP(4r) is feasible. More precisely, because \(\mathcal {Q}_{\mathcal {H}}(4r)=\emptyset \), the linear program obtained from DLP(4r) by replacing \(\mathcal {F}(4r)\), which parameterizes the constraints in DLP(4r), by \(\mathcal {H}\), has optimal value equal to 0. Hence, its dual, which corresponds to PLP(4r) where we replace \(\mathcal {F}(4r)\) by \(\mathcal {H}\), is feasible. As this feasible linear program has polynomial size, because \(|\mathcal {H}|\) is polynomially bounded, we can solve it efficiently to obtain a distribution with the desired properties. Moreover, the total running time is \({{\,\mathrm{poly}\,}}(L) \cdot |X|^{O(\gamma )}\), where L is the encoding length of the input. \(\square \)

Proof of Lemma 5

The desired separation algorithm requires us to find a solution for a \(\upgamma \mathrm {C k C}\) instance with an extra covering constraint; the procedure of Sect. 2 generalizes to handle this extra constraint. We follow similar steps as in Fig. 1.

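Let \((\alpha ,\mu ) \in \mathbb {Q}_{\ge 0}^{X} \times \mathbb {Q}\) be a point satisfying \(\sum _{u\in X}p(u)\alpha (u) \ge \mu + 1\), let \(r \ge 0\), and, moreover, let

$$\begin{aligned} \mathcal {F}^{\alpha ,\mu }(r):=\Big \{C\in \mathcal {F}(r) \,\Big \vert \, \sum _{u\in B(C,r)}\alpha (u) > \mu \Big \} . \end{aligned}$$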

Hence, to prove Lemma 5, we need to find a procedure that either certifies \(\mathcal {F}^{\alpha ,\mu }(r)=\emptyset \) or returns a set \(C\in \mathcal {F}^{\alpha ,\mu }(4r)\). To avoid technical complications later on due to the strict inequality in the definition of \(\mathcal {F}^{\alpha ,\mu }(r)\), we observe, using standard techniques, that one can efficiently compute a polynomially encoded \(\varepsilon >0\) to replace the inequality \(\sum _{u\in B(C,r)}\alpha (u) > \mu \) by \(\sum _{u\in B(C,r)} \alpha (u) \ge \mu + \varepsilon \).

Lemma 6

Let \((\alpha ,\mu ) \in \mathbb {Q}_{\ge 0}^X \times \mathbb {Q}\). Then one can efficiently compute an \(\varepsilon > 0\) with encoding length O(L), where L is the encoding length of \((\alpha ,\mu )\), such that the following holds: For any \(C\in \mathcal {F}(r)\), we have \(\sum _{u\in B(C,r)}\alpha (u) >\mu \) if and only if \(\sum _{u\in B(C,r)} \alpha (u) \ge \mu +\varepsilon \).

Proof

The tuple \((\alpha ,\mu )\) consists of \(N:=|X|+1\) rationals \(\frac{p_i}{q_i}\), \(i\in [N]\), with \(p_i \in \mathbb {Z}\) and \(q_i \in \mathbb {Z}_{>0}\). Let \(\Pi = \prod _{i \in [N]} q_i\). Note that if \(\sum _{u\in B(C,r)} \alpha (u) > \mu \), then \(\sum _{u\in B(C,r)} \alpha (u) - \mu \ge \frac{1}{\Pi }\), because the left-hand side is a positive integer multiple of \(\frac{1}{\Pi }\). Thus, we set \(\varepsilon :=\frac{1}{\Pi }\). Moreover \(\log \Pi = \sum _{i \in [N]} \log q_i\), and so the encoding length of \(\varepsilon \) is O(L). \(\square \)
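The choice of \(\varepsilon \) as the reciprocal of the product of all denominators can be checked on a small example. The following Python sketch uses exact rationals; the function name `strictness_gap` and the dictionary encoding of \(\alpha \) are illustrative assumptions.

```python
from fractions import Fraction
from math import prod

def strictness_gap(alpha, mu):
    """Return 1 / (product of the denominators of all entries of alpha and of mu).

    Any sum of alpha-values minus mu is an integer multiple of this quantity,
    so such a difference is positive exactly when it is at least this large.
    """
    denominators = [Fraction(a).denominator for a in alpha.values()]
    denominators.append(Fraction(mu).denominator)
    return Fraction(1, prod(denominators))
```

With this value, a strict inequality "covered weight > mu" can be replaced by the closed inequality "covered weight >= mu + eps" without changing which sets satisfy it.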

Let \(\mathcal {P}^{\alpha ,\mu }\) be the following modified relaxation of \(\upgamma \mathrm {C k C}\), defined for given \((\alpha , \mu ) \in \mathbb {Q}_{\ge 0}^X \times \mathbb {Q}\), and a corresponding \(\varepsilon > 0\) as per Lemma 6, where the polytope \(\mathcal {P}\) is defined for a fixed radius r, as in Sect. 2 (see (1)):

$$\begin{aligned} \mathcal {P}^{\alpha ,\mu }:=\Big \{(x,y)\in \mathcal {P} \,\Big \vert \, \sum _{u\in X}\alpha (u)x(u)\ge \mu +\varepsilon \Big \} . \end{aligned}$$

Let \(\mathcal {P}_{I}^{\alpha ,\mu }:={{\,\mathrm{conv}\,}}\left( \mathcal {P}^{\alpha ,\mu }\cap (\{0,1\}^X \times \{0,1\}^X)\right) \) be the integer hull of \(\mathcal {P}^{\alpha ,\mu }\). We now state the following straightforward observation, whose proof is an immediate consequence of the definitions of the corresponding polytopes and Lemma 6.

Observation 1

Let \((\alpha , \mu )\in \mathbb {Q}_{\ge 0}^X \times \mathbb {Q}\) be such that \(\sum _{u\in X} p(u)\alpha (u) \ge \mu + 1\) and \(\mathcal {P}_{I}^{\alpha ,\mu }=\emptyset \). Then \((\alpha , \mu )\in \mathcal {Q}(r)\).

The following lemma is a slightly modified version of Theorem 5, which is also a direct consequence of Theorem 8 given in Appendix A.

Lemma 7

Let \((\alpha ,\mu ) \in \mathbb {Q}_{\ge 0}^X\times \mathbb {Q}\), let \((x, y) \in \mathcal {P}^{\alpha ,\mu }\), and let \((S, \mathcal {D})\) be an (x, y)-good partition. If \(y(B(S, r)) \le k - \gamma \), a set \(C\in \mathcal {F}^{\alpha ,\mu }(4r)\) can be found in polynomial time.

If \(y(B(S,r)) \le k - \gamma \), then Lemma 7 leads to a set \(C \in \mathcal {F}(4r)\) that satisfies \(\sum _{u \in B(C,4r)} \alpha (u) > \mu \); this gives a constraint separating \((\alpha ,\mu )\) from \(\mathcal {Q}(4r)\).

It remains to consider the case \(y(B(S,r))>k-\gamma \). As in Sect. 2, we can either find a set \(C_2\in \mathcal {F}^{\alpha , \mu }(2r)\) or certify that every \(C_1\in \mathcal {F}^{\alpha ,\mu }(r)\) satisfies \(|C_1\cap B(S,r)|\le k-\gamma \).

Lemma 8

Let \((\alpha ,\mu ) \in \mathbb {Q}_{\ge 0}^X\times \mathbb {Q}\), \(S\subseteq X\) with \(d(s,s')>4r\) for all \(s, s' \in S\) with \(s\ne s'\), and \(\tau \in \{0, \ldots , k-1\}\). If there is a set \(C_1\in \mathcal {F}^{\alpha ,\mu }(r)\) with \(|C_1\cap B(S,r)|> \tau \), then there is a set \(C_2\in \mathcal {F}^{\alpha ,\mu }(2r)\) with \(|C_2 \setminus S| \le k - \tau - 1\).

The proof of the above lemma is identical to the proof of Lemma 2, and thus is omitted.

Lemma 9

Let \((\alpha ,\mu ) \in \mathbb {Q}_{\ge 0}^X\times \mathbb {Q}\), \(S\subseteq X\) with \(d(s,s')>4r\) for all \(s,s'\in S\) with \(s \ne s'\), and \(\beta \in \mathbb {Z}_{\ge 0}\). If there exists a set \(C\in \mathcal {F}^{\alpha ,\mu }(2r)\) with \(|C\setminus S|\le \beta \), then we can find such a set in time \(|X|^{O(\beta + \gamma )}\).

Proof

As in the proof of Lemma 3, we first guess up to \(\beta \) centers \(Q \subseteq X\setminus S\). For each of those guesses, we consider the binary program (2) with objective function \(\sum _{s\in S} z(s) \cdot \alpha (B(s,2r) \setminus B(Q,2r))\) to be maximized. Again, this is a special case of the binary program presented in Theorem 9, given in Appendix A, and thus can be solved in time \(|X|^{O(\gamma )}\). For the guess \(Q=C\setminus S\), the characteristic vector \(\chi ^{C\cap S}\) is feasible for this binary program, implying that the optimal centers \(Z\subseteq S\) chosen by the binary program fulfill \(Z\cup Q \in \mathcal {F}^{\alpha ,\mu }(2r)\). \(\square \)

Corollary 2

Let \((\alpha ,\mu ) \in \mathbb {Q}_{\ge 0}^X\times \mathbb {Q}\). There is an algorithm that, given \((x,y)\in \mathbb {R}^X \times \mathbb {R}^X\), either returns a set \(C\in \mathcal {F}^{\alpha ,\mu }(4r)\) or returns a hyperplane separating (x, y) from \(\mathcal {P}_{I}^{\alpha ,\mu }\). The running time of the algorithm is \({{\,\mathrm{poly}\,}}(L) \cdot |X|^{O(\gamma )}\), where L is the encoding length of the input.

Proof

If \((x,y)\notin \mathcal {P}^{\alpha ,\mu }\), we return a violated constraint separating (x, y) from \(\mathcal {P}^{\alpha ,\mu }\supseteq \mathcal {P}_{I}^{\alpha ,\mu }\). Hence we assume \((x,y)\in \mathcal {P}^{\alpha ,\mu }\). Since \(\mathcal {P}^{\alpha ,\mu }\subseteq \mathcal {P}\), we can use Lemma 1 to get an (x, y)-good partition \((S,\mathcal {D})\). If \(y(B(S,r))\le k-\gamma \), Lemma 7 gives a set \(C\in \mathcal {F}^{\alpha ,\mu }(4r)\). So, assuming \(y(B(S,r))> k-\gamma \), we use Lemma 9 (with \(\beta =\gamma -1\)) to check whether there is \(C_2\in \mathcal {F}^{\alpha ,\mu }(2r)\) with \(|C_2\setminus S| \le \gamma -1\). If this is the case, we are done because \(\mathcal {F}^{\alpha ,\mu }(2r)\subseteq \mathcal {F}^{\alpha ,\mu }(4r)\). If not, the contrapositive of Lemma 8 (with \(\tau =k-\gamma \)) implies that every \(C_1\in \mathcal {F}^{\alpha ,\mu }(r)\) fulfills \(|C_1\cap B(S,r)|\le k-\gamma \). Hence, every point \((\overline{x},\overline{y})\in \mathcal {P}_{I}^{\alpha ,\mu }\) satisfies \(\overline{y}(B(S,r))\le k-\gamma \). However, this constraint is violated by (x, y), and it thus separates (x, y) from \(\mathcal {P}_{I}^{\alpha ,\mu }\). \(\square \)

Proof of Lemma 5

We use the ellipsoid method to check emptiness of \(\mathcal {P}_{I}^{\alpha ,\mu }\). Whenever the separation oracle gets called for a point \((x,y)\in \mathbb {R}^X \times \mathbb {R}^X\), we invoke the algorithm of Corollary 2. If the algorithm returns at any point a set \(C\in \mathcal {F}^{\alpha ,\mu }(4r)\), then C corresponds to a constraint in the inequality description of \(\mathcal {Q}(4r)\) violated by \((\alpha ,\mu )\). Otherwise, the ellipsoid method certifies that \(\mathcal {P}_{I}^{\alpha ,\mu }=\emptyset \), which implies \((\alpha ,\mu )\in \mathcal {Q}(r)\) by Observation 1. Note that the number of iterations of the ellipsoid method is polynomial as the separating hyperplanes used by the procedure above have encoding length \({{\,\mathrm{poly}\,}}(L)\), where L is the encoding length of the input (see Theorem 6.4.9 of [17]). Thus, the total running time is \({{\,\mathrm{poly}\,}}(L) \cdot |X|^{O(\gamma )}\). \(\square \)

Hardness results for Colorful k-Center

We now prove our hardness results. We start in Subsect. 4.1 by showing Theorem 3, i.e., that \(\upgamma \mathrm {C k C}\) becomes hard to approximate when the number of colors is unbounded. Then, in Subsect. 4.2, we prove Theorem 4, which shows our bi-criteria inapproximability result, i.e., there is an approximation hardness even when one is allowed to exceed the number of centers to be opened by up to a factor \(c\log |X|\) for some constant c.

We note that all of our hardness results apply even to real-line metrics. These are \(\upgamma \mathrm {C k C}\) instances where the underlying metric is given by a set of real numbers \(X \subseteq \mathbb {R}\), and the distance function d is defined as \(d(x,y) = |x-y|\) for every \(x,y \in X\). The task that we prove to be hard is distinguishing whether such an instance admits a solution of radius 0 or not.

We start by discussing a reduction from the well-known Set Cover problem to \(\upgamma \mathrm {C k C}\) on the real line. More precisely, we will show that deciding whether a given Set Cover instance has a solution of size at most k is equivalent to deciding whether a certain real-line \(\upgamma \mathrm {C k C}\) instance admits a solution of radius 0. We note that the reduction is a straightforward adaptation of the reduction appearing in [22] in the context of the Partial Set Cover problem in geometric settings. For completeness, we first define the (decision version of the) Set Cover problem.

Definition 4

Let U be a finite set, let \(\mathcal {S} \subseteq 2^U\) be a family of subsets of U, and let \(k\in \mathbb {Z}_{\ge 0}\). The (decision) Set Cover problem, denoted as \({{\,\mathrm{SC}\,}}(U,\mathcal {S}, k)\), asks to decide whether there exists a subset \(\mathcal {S}' \subseteq \mathcal {S}\) such that \(|\mathcal {S'}| \le k\) and \(\bigcup _{S \in \mathcal {S}'} S = U\).

The following lemma, mimicking the ideas in [22], shows a simple yet very useful reduction from Set Cover to \(\upgamma \mathrm {C k C}\).

Lemma 10

Let \({{\,\mathrm{SC}\,}}(U,\mathcal {S},k)\) be a Set Cover instance. Then, in time polynomial in |U| and \(|\mathcal {S}|\), we can construct a real-line \(\upgamma \mathrm {C k C}\) instance with \(|X|=|\mathcal {S}|\) points and \(\gamma =|U|\) colors such that \({{\,\mathrm{SC}\,}}(U,\mathcal {S},k)\) is a “yes” instance if and only if the \(\upgamma \mathrm {C k C}\) instance admits a solution of radius 0. Moreover, any \(\upgamma \mathrm {C k C}\) solution of radius 0 can be mapped efficiently to a \({{\,\mathrm{SC}\,}}(U,\mathcal {S},k)\) solution.

This reduction is independent of the parameter k, in the sense that for different values of k, the same \(\upgamma \mathrm {C k C}\) instance is obtained with the only difference that the number k of centers one can open is different.

Proof

We construct a \(\upgamma \mathrm {C k C}\) instance as follows. Let \(\gamma = |U|\) and \(s = |\mathcal {S}|\). Let \(U = \{u_1,\ldots , u_\gamma \}\) and \(\mathcal {S} = \{S_1, \ldots , S_s\}\). We set \(X = \{1, \ldots , s\} \subseteq \mathbb {R}\). Each element \(u_\ell \in U\) corresponds to a distinct color \(X_\ell = \{i \in [s]: u_\ell \in S_i\}\). We also set the covering requirement for each color \(\ell \in [\gamma ]\) to be \(m_\ell = 1\). Note that none of \(X, \gamma , X_\ell , m_\ell \) depend on k. Clearly, the construction can be done in time polynomial in |U| and \(|\mathcal {S}|\).

We now observe that the given \({{\,\mathrm{SC}\,}}(U,\mathcal {S},k)\) is a “yes” instance if and only if the constructed \(\upgamma \mathrm {C k C}\) instance admits a solution of radius 0. Indeed, if \(C \subseteq X\) is a \(\upgamma \mathrm {C k C}\) solution of radius 0, then the set \(\mathcal {S'} = \{S_i: i \in C\}\) is a feasible solution of the Set Cover instance of size \(|\mathcal {S}'| = |C|\le k\). Conversely, if \(\mathcal {S}'\subseteq \mathcal {S}\) is a Set Cover solution of size \(|\mathcal {S}'|\le k\), then \(C = \{i\in X :S_i \in \mathcal {S}'\}\) is a \(\upgamma \mathrm {C k C}\) solution of radius 0 with \(|C| = |\mathcal {S}'|\le k\) many centers. \(\square \)
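The construction in this proof is short enough to write out directly. The Python sketch below builds the real-line instance from a Set Cover instance and checks radius-0 feasibility, using that at radius 0 a center covers only itself because the points 1, ..., s are distinct. Function names and the list-of-sets encoding are illustrative assumptions.

```python
def set_cover_to_colorful_k_center(universe, sets):
    """Build the real-line instance: one point per set, one color per element.

    Returns (points, colors, demands); the budget k is passed separately since
    the instance does not depend on it.
    """
    elements = list(universe)
    points = list(range(1, len(sets) + 1))            # X = {1, ..., s}
    colors = [{i for i in points if u in sets[i - 1]}  # X_l = {i : u_l in S_i}
              for u in elements]
    demands = [1] * len(elements)                      # m_l = 1 for every color
    return points, colors, demands

def is_radius_zero_solution(colors, demands, k, centers):
    """At radius 0, B(C, 0) = C, so each color must contain enough centers."""
    centers = set(centers)
    return len(centers) <= k and all(
        len(centers & X_l) >= m_l for X_l, m_l in zip(colors, demands))

def centers_to_set_cover(sets, centers):
    """Map a radius-0 solution C back to the sub-family {S_i : i in C}."""
    return [sets[i - 1] for i in sorted(centers)]
```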

Hardness of approximation for \(\upgamma \mathrm {C k C}\).

In this section, we prove our main hardness result, Theorem 3. For that, we reduce from the well-known Vertex Cover problem on graphs of maximum degree 3 and cast it as a \(\upgamma \mathrm {C k C}\) problem. We first formally define the problem.

Definition 5

Let \(G = (V, E)\) be a graph of maximum degree 3 and let \(k \in \mathbb {Z}_{\ge 0}\). The Vertex Cover problem on such a graph, denoted as \({{\,\mathrm{VC3}\,}}(G, k)\), asks to decide whether there exists a set \(S \subseteq V\) of size at most k such that \(S \cap e \ne \emptyset \) for every \(e \in E\).

Notice that Vertex Cover is a special case of Set Cover; hence, we can employ the reduction of Lemma 10 to obtain a \(\upgamma \mathrm {C k C}\) problem. Reducing from Vertex Cover on bounded-degree graphs, instead of starting from a general Set Cover problem, has the advantage that a minimum vertex cover in such a graph has, up to constant factors, the same cardinality as the underlying ground set, which is the edge set in the case of Vertex Cover. This relation is relevant in our reduction to derive a contradiction with the Exponential Time Hypothesis.

In order to prove Theorem 3, we will use the following hardness results for \({{\,\mathrm{VC3}\,}}(G, k)\).

Theorem 6

([6, 14])

(i) There is no algorithm for \({{\,\mathrm{VC3}\,}}(G, k)\) that runs in polynomial time, assuming that \(\mathtt {P}\ne \mathtt {NP}\).

(ii) There is no algorithm for \({{\,\mathrm{VC3}\,}}(G,k)\) that runs in time \(2^{o(k)} {{\,\mathrm{poly}\,}}(|V(G)|)\), assuming the Exponential Time Hypothesis.

In our proof of Theorem 3, we reduce \({{\,\mathrm{VC3}\,}}\) to \(\upgamma \mathrm {C k C}\) using Lemma 10 and then derive hardness of \(\upgamma \mathrm {C k C}\) from the hardness given by Theorem 6. Whereas this approach proves the first part of Theorem 3 in a straightforward way, it faces a technical hurdle for the second part. More precisely, note that the second part can be rephrased as follows. The existence of a function \(f:\mathbb {Z}_{\ge 0} \rightarrow \mathbb {Z}_{\ge 0}\) with \(f(n) = \omega (\log n)\) together with a polynomial-time algorithm \(\mathcal {A}\) for \(\upgamma \mathrm {C k C}\) on the real line with \(\gamma \le f(|X|)\) violates the Exponential Time Hypothesis. However, by reducing a general \({{\,\mathrm{VC3}\,}}\) instance to \(\upgamma \mathrm {C k C}\) through Lemma 10, we may obtain a \(\upgamma \mathrm {C k C}\) instance that does not fulfill \(\gamma \le f(|X|)\), which is required to apply algorithm \(\mathcal {A}\), as algorithm \(\mathcal {A}\) only needs to work on instances in this regime. Indeed, the reduction of Lemma 10 would only allow us to use algorithm \(\mathcal {A}\) to obtain a polynomial-time algorithm \(\mathcal {A}'\) for Set Cover instances \({{\,\mathrm{SC}\,}}(U,\mathcal {S},k)\) with \(|U|\le f(|\mathcal {S}|)\); in particular, we would only be able to solve \({{\,\mathrm{VC3}\,}}\) instances whose underlying graph \(G = (V,E)\) satisfies \(|E| \le f(|V|)\). However, \(\mathcal {A}'\) can easily be transformed into an algorithm that works for any \({{\,\mathrm{VC3}\,}}\) instance by artificially inflating the vertex set V to make sure that |E| is small compared to |V|. The following lemma formalizes this quite straightforward, though slightly technical, step.

Lemma 11

Let \(f: \mathbb {Z}_{\ge 0} \rightarrow \mathbb {Z}_{\ge 0}\) be a function satisfying \(f(n) = \omega (\log n)\). Suppose that there exists an algorithm \(\mathcal {A}'\) that solves in polynomial time any \({{\,\mathrm{VC3}\,}}(G', k')\) instance with \(|E(G')|\le f(|V(G')|)\). Then there is an algorithm \(\mathcal {A}\) that solves any \({{\,\mathrm{VC3}\,}}(G,k)\) instance in time \(2^{o(|E(G)|)} {{\,\mathrm{poly}\,}}(|V(G)|)\).

Proof

Let \(\mathcal {I} = {{\,\mathrm{VC3}\,}}(G,k)\) be a Vertex Cover instance on a graph of maximum degree 3. To be able to apply \(\mathcal {A}'\) to \(\mathcal {I}\) we would need \(|E(G)|\le f(|V(G)|)\). If this is satisfied, we simply apply \(\mathcal {A}'\). Hence, assume from now on \(|E(G)| > f(|V(G)|)\). In this case we create a modified \({{\,\mathrm{VC3}\,}}\) instance \(\overline{\mathcal {I}} = {{\,\mathrm{VC3}\,}}(\overline{G},k)\) obtained by inflating \(\mathcal {I}\) through the addition of singleton vertices as discussed in the following. Because \(f(n) = \omega (\log n)\), there is a constant \(n_0\in \mathbb {Z}_{>0}\) and a non-decreasing function \(h:\mathbb {Z}_{\ge 0}\rightarrow \mathbb {Z}_{> 0}\) with

(i) \(\lim _{n\rightarrow \infty } h(n) = \infty \), and

(ii) \(f(n) \ge h(n) \cdot \log n \quad \forall n\in \mathbb {Z}_{\ge n_0}\).

Without loss of generality, we assume that \(|V(G)|\ge n_0\); for otherwise, the instance \(\mathcal {I}\) has constant size and can therefore be solved in constant time. We add

$$\begin{aligned} N :=\max \left\{ 0,2^{\left\lceil \frac{|E(G)|}{h(|V(G)|)}\right\rceil } - |V(G)|\right\} \end{aligned}$$

new singleton vertices to the \({{\,\mathrm{VC3}\,}}\) instance \(\mathcal {I}\) to obtain a new blown-up \({{\,\mathrm{VC3}\,}}\) instance \(\overline{\mathcal {I}}={{\,\mathrm{VC3}\,}}(\overline{G},k)\) that is equivalent to \(\mathcal {I}\) because the introduced singleton vertices are not incident with any edges.

Hence, the new Vertex Cover instance \(\overline{\mathcal {I}}\) fulfills

$$\begin{aligned} |V(\overline{G})| = \max \left\{ |V(G)|, 2^{\left\lceil \frac{|E(G)|}{h(|V(G)|)} \right\rceil } \right\} . \end{aligned}$$
(3)

Notice that

$$\begin{aligned} f(|V(\overline{G})|) \ge h(|V(\overline{G})|)\log |V(\overline{G})| \ge |E(G)| \cdot \frac{h(|V(\overline{G})|)}{h(|V(G)|)} \ge |E(G)|, \end{aligned}$$

where the above inequalities follow by the properties of the function h, including that h is non-decreasing, and (3). Hence, algorithm \(\mathcal {A}'\) is applicable to \(\overline{\mathcal {I}}\) and, because \(\overline{\mathcal {I}}\) and \(\mathcal {I}\) are equivalent instances, \(\mathcal {A}'\) solves the original instance \(\mathcal {I}\). Finally, the running time to construct and solve \(\overline{\mathcal {I}}\) through \(\mathcal {A}'\) is upper bounded by

$$\begin{aligned} \mathrm {poly}\left( 2^{\frac{|E(G)|}{h(|V(G)|)}} |V(G)| \right) = 2^{o(|E(G)|)} \mathrm {poly}(|V(G)|), \end{aligned}$$

where we used the fact that \(h(n) = \omega (1)\).

We highlight that the function h(n) does not need to be known or computed explicitly to perform the reduction. By our choice of N, the number of vertices \(|V(\overline{G})|\) in the blown-up \({{\,\mathrm{VC3}\,}}\) instance \(\overline{\mathcal {I}}\) is either |V(G)| or a power of two between |V(G)| and \(2^{|E(G)|}\). Hence, one can simply run \(\mathcal {A}'\) in parallel for each of the polynomially many options of the size of the blown-up instance and terminate as soon as the first one of these parallel computations terminates. \(\square \)
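The padding step and the remark that h need not be known can be illustrated with a short Python sketch. Here `h_value` is a hypothetical positive integer standing in for \(h(|V(G)|)\), and both function names are illustrative; the second function lists the possible sizes of the blown-up instance that would be tried in parallel.

```python
def padding_count(num_vertices, num_edges, h_value):
    """N = max(0, 2^ceil(|E| / h) - |V|) for an assumed positive integer value h."""
    exponent = -(-num_edges // h_value)  # integer ceiling of |E| / h
    return max(0, 2 ** exponent - num_vertices)

def candidate_padded_sizes(num_vertices, num_edges):
    """All possible values of |V| after padding: |V| itself, or a power of two
    larger than |V| and at most 2^|E|."""
    sizes = [num_vertices]
    sizes += [2 ** j for j in range(num_edges + 1) if 2 ** j > num_vertices]
    return sizes
```

Whatever positive integer value h takes, the padded size lands in the list returned by `candidate_padded_sizes`, which has at most \(|E(G)| + 2\) entries; this is why trying all of them suffices.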

We are now ready to prove Theorem 3.

Proof of Theorem 3

The first part of the theorem is an immediate consequence of part (i) of Theorem 6 and Lemma 10.

For the second part, let \(f: \mathbb {Z}_{\ge 0} \rightarrow \mathbb {Z}_{\ge 0}\) be a function that satisfies \(f(n) = \omega (\log n)\) and assume for the sake of contradiction that there is a polynomial-time algorithm for \(\upgamma \mathrm {C k C}\) on the real line with \(\gamma \le f(|X|)\). Viewing Vertex Cover as Set Cover with ground set E(G) and one set per vertex, the reduction of Lemma 10 yields instances with \(|X| = |V(G)|\) and \(\gamma = |E(G)|\). Hence, by Lemma 10, there exists a polynomial-time algorithm \(\mathcal {A}'\) for Vertex Cover instances \({{\,\mathrm{VC3}\,}}(G,k)\) satisfying \(|E(G)|\le f(|V(G)|)\). By Lemma 11, this implies the existence of an algorithm \(\mathcal {A}\) for solving (arbitrary) \({{\,\mathrm{VC3}\,}}(G,k)\) instances in time \(2^{o(|E(G)|)} {{\,\mathrm{poly}\,}}(|V(G)|)\).

To obtain a contradiction with Theorem 6 (assuming the Exponential Time Hypothesis), it remains to show that this implies the existence of an algorithm for \({{\,\mathrm{VC3}\,}}(G,k)\) running in time \(2^{o(k)} {{\,\mathrm{poly}\,}}(|V(G)|)\). Given a \({{\,\mathrm{VC3}\,}}(G,k)\) instance, we proceed as follows. Because G has no vertex of degree larger than 3, every vertex covers at most 3 edges, and hence any vertex cover in G must have cardinality at least \(\frac{|E(G)|}{3}\). Hence, if \(k < \frac{|E(G)|}{3}\), we know that \({{\,\mathrm{VC3}\,}}(G,k)\) is a “no” instance. Otherwise, if \(k \ge \frac{|E(G)|}{3}\), we have \(|E(G)| \le 3k\), and the running time of algorithm \(\mathcal {A}\) is \(2^{o(|E(G)|)} {{\,\mathrm{poly}\,}}(|V(G)|) = 2^{o(k)} {{\,\mathrm{poly}\,}}(|V(G)|)\), thus leading to the desired contradiction under the Exponential Time Hypothesis. \(\square \)
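The degree argument used here is elementary: in a graph of maximum degree 3, each vertex covers at most 3 edges, so any vertex cover has at least |E(G)|/3 vertices. The Python sketch below checks this bound by brute force on a small graph and implements the resulting quick rejection test; the function names are illustrative, and the brute-force search is only meant for tiny examples.

```python
from itertools import combinations

def min_vertex_cover_size(vertices, edges):
    """Exhaustive search for the size of a minimum vertex cover (tiny graphs only)."""
    for size in range(len(vertices) + 1):
        for cover in combinations(vertices, size):
            chosen = set(cover)
            if all(u in chosen or v in chosen for u, v in edges):
                return size

def degree_lower_bound(num_edges, max_degree=3):
    """Each vertex covers at most max_degree edges, so a cover needs ceil(|E| / max_degree) vertices."""
    return -(-num_edges // max_degree)

def is_trivial_no_instance(num_edges, k, max_degree=3):
    """True if k vertices cannot possibly cover all edges."""
    return k * max_degree < num_edges
```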

Hardness for bi-criteria algorithms

In this section, we extend the hardness result stated in Theorem 3 to bi-criteria algorithms. For this, we reduce from the optimization version of the Set Cover problem, which we refer to as the Minimum Cardinality Set Cover problem to distinguish it from the decision version used earlier. For completeness, we define it formally below.

Definition 6

(Minimum Cardinality Set Cover (\({{\,\mathrm{MCSC}\,}}\))) Let U be a finite set and \(\mathcal {S}\subseteq 2^U\) be a family of subsets of U. The Minimum Cardinality Set Cover problem \({{\,\mathrm{MCSC}\,}}(U,\mathcal {S})\) asks to compute the smallest subset \(\mathcal {S}' \subseteq \mathcal {S}\) such that \(\bigcup _{S \in \mathcal {S}'} S = U\).

\({{\,\mathrm{MCSC}\,}}\) is a well-understood \(\mathtt {NP}\)-hard problem. We are interested in its approximation hardness, which, after a long series of works, was settled by Dinur and Steurer [13]; we state their result as Theorem 7. We note that since we are not interested in optimizing the constant that appears in the main theorem of this section, any known \(\Omega (\log n)\)-hardness result for \({{\,\mathrm{MCSC}\,}}\) suffices to derive Theorem 4, proved below.

Theorem 7

([13]) For every \(\varepsilon > 0\), it is \(\mathtt {NP}\)-hard to approximate \({{\,\mathrm{MCSC}\,}}\) for instances with universe size n and \(m \le {{\,\mathrm{poly}\,}}(n)\) sets to within a factor of \((1 - \varepsilon ) \ln n\).

Combining Theorem 7 with Lemma 10 leads to the desired result.

Proof of Theorem 4

Suppose that, for some constant \(c>0\) to be determined later, there exists an algorithm \(\mathcal {A}\) for \(\upgamma \mathrm {C k C}\) on the real line that, whenever a solution of radius 0 exists, finds a solution of radius 0 by opening at most \(k\cdot c \cdot \log |X|\) centers, where X is the set of points on which the \(\upgamma \mathrm {C k C}\) instance is defined. We now translate this algorithm to \({{\,\mathrm{MCSC}\,}}\) using Lemma 10. To this end, consider an instance \(\mathcal {I}={{\,\mathrm{MCSC}\,}}(U,\mathcal {S})\) with \(|\mathcal {S}|\le {{\,\mathrm{poly}\,}}(|U|)\), where the polynomial \({{\,\mathrm{poly}\,}}(|U|)\) is the one from Theorem 7. Let \(k^*\) be the optimal value of \(\mathcal {I}\).

For every \(k \in \{0, \ldots , \min \{|U|, |\mathcal {S}|\}\}\), we use the reduction of Lemma 10 to get a real-line \(\upgamma \mathrm {C k C}\) instance and run \(\mathcal {A}\) on it. For \(k = k^*\), the resulting \(\upgamma \mathrm {C k C}\) instance, by Lemma 10, has a feasible solution of size at most \(k^*\), and thus, for this instance our algorithm will return a solution of size at most \(k^* \cdot c \cdot \log |\mathcal {S}|\). Because \(|\mathcal {S}| \le {{\,\mathrm{poly}\,}}(|U|)\), this means that the returned Set Cover has size at most \(k^* \cdot c' \cdot \log |U|\), for some constant \(c' > 0\) that depends on c and the hidden universal constants in the \(|\mathcal {S}| \le {{\,\mathrm{poly}\,}}(|U|)\) assumption. Thus, by considering all constructed \(\upgamma \mathrm {C k C}\) instances—which only differ by their value of k—for which a solution was returned and picking the smallest such solution, we obtain a set cover of size at most \(k^* \cdot c' \cdot \log |U|\). By setting the constant c appropriately (it is easy to see that this can always be done for sufficiently small c), this now contradicts Theorem 7. We conclude that it is \(\mathtt {NP}\)-hard to decide whether a \(\upgamma \mathrm {C k C}\) instance has a solution of radius 0, even if we allow solutions that open up to \(k \cdot c \cdot \log |X|\) centers.
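The wrapper used in this argument, which tries every budget k and keeps the smallest returned solution, can be sketched in Python as follows. Here `bicriteria_solver` is a hypothetical stand-in for the assumed algorithm \(\mathcal {A}\): it receives the reduced instance and a budget k and returns a set of centers of radius 0 or `None`. The toy `exact_solver` below exists only to exercise the wrapper and takes exponential time.

```python
from itertools import combinations

def reduce_to_colorful(universe, sets):
    """Reduction of Lemma 10: one point per set, one color per element, all demands 1."""
    points = list(range(1, len(sets) + 1))
    colors = [{i for i in points if u in sets[i - 1]} for u in universe]
    return points, colors, [1] * len(colors)

def smallest_cover_via_bicriteria(universe, sets, bicriteria_solver):
    """Call the (hypothetical) solver for every budget k and keep the smallest solution."""
    points, colors, demands = reduce_to_colorful(universe, sets)
    best = None
    for k in range(min(len(universe), len(sets)) + 1):
        centers = bicriteria_solver(points, colors, demands, k)
        if centers is not None and (best is None or len(centers) < len(best)):
            best = set(centers)
    return None if best is None else [sets[i - 1] for i in sorted(best)]

def exact_solver(points, colors, demands, k):
    """Toy stand-in for the assumed algorithm: exhaustive search with at most k centers."""
    for size in range(k + 1):
        for centers in combinations(points, size):
            if all(len(set(centers) & X_l) >= m_l for X_l, m_l in zip(colors, demands)):
                return set(centers)
    return None
```

All constructed instances share the same points, colors, and demands and differ only in k, which is why the reduction is computed once outside the loop.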

Conclusion

In this work, we presented a technique for obtaining true constant-factor approximation algorithms for k-center problems with multiple covering constraints on the points to be covered. This leads to a polynomial-time 4-approximation algorithm for \(\gamma \)-Colorful k-Center, where \(\gamma \), the number of colors, is assumed to be constant, as well as a polynomial-time 4-approximation algorithm for the more general Fair \(\gamma \)-Colorful k-Center problem.

We note here that our results extend to the supplier setting, where there are distinct sets of facilities and clients, and one is allowed to open k facilities in order to cover clients. For such settings, we obtain a polynomial-time 5-approximation algorithm for the Fair \(\gamma \)-Colorful k-Supplier problem. The extension of our arguments to this setting is done by using a standard technique: we first find clients C that constitute a 4-approximate solution to the corresponding Center problem and then pick a facility \(f_c\in B(c,r)\) for each \(c\in C\); by the triangle inequality, the extra distance of at most r between c and \(f_c\) increases the guarantee from 4 to 5. Using the notation introduced in the description of Algorithm 1, we note that terminating Algorithm 1 once \(\max _{u\in U} x(u) = 0\) does not affect the remaining steps in our approximation algorithms. Hence we may assume that \(x(s)>0\) for all \(s\in S\), which guarantees the existence of a facility in B(s, r). We also clarify that the “guessing a few centers” part of our algorithm performed in Lemma 9 can be applied directly to facilities with no issues arising.
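The last step of the supplier extension, replacing each client center by a nearby facility, can be sketched as follows. This Python fragment is an illustration under assumptions: it picks the nearest facility within distance r of each chosen client (the text allows any facility in the ball), and the function name is hypothetical.

```python
def open_facilities(client_centers, facilities, dist, r):
    """Replace each client center c by a facility f_c with dist(c, f_c) <= r.

    Raises ValueError if some center has no facility within distance r, which
    the text rules out by assuming x(s) > 0 for the selected points.
    """
    opened = set()
    for c in client_centers:
        nearby = [f for f in facilities if dist(c, f) <= r]
        if not nearby:
            raise ValueError(f"no facility within distance {r} of center {c}")
        # Any facility in the ball works; the nearest one is an arbitrary choice.
        opened.add(min(nearby, key=lambda f: dist(c, f)))
    return opened
```

A client within distance 4r of c is then within distance 4r + r = 5r of the opened facility, which is where the factor 5 comes from.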

On the negative side, we show that Colorful k-Center is inapproximable when the number of colors is assumed to be part of the input.

There are still some open questions remaining; we highlight two of them, which we find particularly natural and interesting:

  (i) The currently known hardness of \(\gamma \)-Colorful k-Center is \(2-\varepsilon \), inherited from the standard k-Center problem, while (for constant \(\gamma \)) we give a polynomial-time 4-approximation, and, as already mentioned, in an independent work, Jia, Sheth, and Svensson [23] give a polynomial-time 3-approximation with a worse running time. It would be interesting to close this gap.

  (ii) \(\gamma \)-Colorful k-Center naturally generalizes to the knapsack and matroid versions of it, where the set of centers that are opened must satisfy a knapsack or a matroid constraint. Currently, our technique does not easily generalize to such settings, so new ideas might be needed to handle these problems.

Notes

  1. The version introduced in [3] requires \(X_1,\ldots , X_\gamma \) to partition X. However, this additional condition on the input does not simplify the problem. Indeed, \(\upgamma \mathrm {C k C}\) readily reduces to the model in [3] by introducing a new color \(X_{\gamma + 1} = X \setminus \bigcup _{i \in [\gamma ]}X_i\) with \(m_{\gamma + 1} = 0\) and replacing each element that has \(q > 1\) colors by q elements on the same location with each having a single color.

  2. Program (2) reduces to the one of Theorem 9 by removing any redundant constraint of the first type that has negative right-hand side.

  3. Note that the only reason why this is a slight abuse of terminology is that we defined \((x,y)\)-good partitions only for points in \(\mathcal {P}\). Moreover, contrary to \(\mathcal {T}\), the description of the polytope \(\mathcal {P}\) contains specific constraints for the covering requirements of the colors. However, these constraints did not play any role in showing that Algorithm 1 returns an \((x,y)\)-good partition (see proof of Lemma 1).

References

  1. An, H.C., Singh, M., Svensson, O.: LP-based algorithms for capacitated facility location. SIAM J. Comput. 46(1), 272–306 (2017)
  2. Backurs, A., Indyk, P., Onak, K., Schieber, B., Vakilian, A., Wagner, T.: Scalable fair clustering. In: Proceedings of the 36th International Conference on Machine Learning (ICML), pp. 405–413 (2019)
  3. Bandyapadhyay, S., Inamdar, T., Pai, S., Varadarajan, K.R.: A constant approximation for Colorful \(k\)-Center. In: Proceedings of the 27th Annual European Symposium on Algorithms (ESA), pp. 12:1–12:14 (2019)
  4. Bera, S.K., Chakrabarty, D., Flores, N., Negahbani, M.: Fair algorithms for clustering. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems (NeurIPS), pp. 4955–4966 (2019)
  5. Bercea, I.O., Groß, M., Khuller, S., Kumar, A., Rösner, C., Schmidt, D.R., Schmidt, M.: On the cost of essentially fair clusterings. In: Proceedings of the 22nd International Conference on Approximation Algorithms for Combinatorial Optimization Problems (APPROX/RANDOM), pp. 18:1–18:22 (2019)
  6. Cai, L., Juedes, D.W.: On the existence of subexponential parameterized algorithms. J. Comput. Syst. Sci. 67(4), 789–807 (2003)
  7. Carr, R.D., Fleischer, L.K., Leung, V.J., Phillips, C.A.: Strengthening integrality gaps for capacitated network design and covering problems. In: Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 106–115 (2000)
  8. Chakrabarty, D., Goyal, P., Krishnaswamy, R.: The non-uniform \(k\)-center problem. In: Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming (ICALP), pp. 67:1–67:15 (2016)
  9. Chakrabarty, D., Negahbani, M.: Generalized center problems with outliers. ACM Trans. Algorithm. 15(3), 41:1–41:14 (2019)
  10. Charikar, M., Khuller, S., Mount, D.M., Narasimhan, G.: Algorithms for facility location problems with outliers. In: Proceedings of the 12th Annual Symposium on Discrete Algorithms (SODA), pp. 642–651 (2001)
  11. Chen, D.Z., Li, J., Liang, H., Wang, H.: Matroid and knapsack center problems. Algorithmica 75(1), 27–52 (2016)
  12. Chierichetti, F., Kumar, R., Lattanzi, S., Vassilvitskii, S.: Fair clustering through fairlets. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS), pp. 5029–5037 (2017)
  13. Dinur, I., Steurer, D.: Analytical approach to parallel repetition. In: Proceedings of the 46th Annual Symposium on the Theory of Computing (STOC), pp. 624–633 (2014)
  14. Garey, M.R., Johnson, D.S., Stockmeyer, L.: Some simplified NP-complete problems. In: Proceedings of the 6th Annual ACM Symposium on Theory of Computing (STOC), pp. 47–63 (1974)
  15. Gonzalez, T.F.: Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci. 38, 293–306 (1985)
  16. Grandoni, F., Kalaitzis, C., Zenklusen, R.: Improved approximation for tree augmentation: Saving by rewiring. In: Proceedings of the 50th ACM Symposium on Theory of Computing (STOC), pp. 632–645 (2018)
  17. Grötschel, M., Lovász, L., Schrijver, A.: Geometric Algorithms and Combinatorial Optimization, 2nd edn. Springer, Germany (1993)
  18. Harris, D.G., Pensyl, T., Srinivasan, A., Trinh, K.: A lottery model for center-type problems with outliers. ACM Trans. Algorithm. 15(3), 36:1–36:25 (2019)
  19. Hochbaum, D.S., Shmoys, D.B.: A best possible heuristic for the k-center problem. Math. Op. Res. 10(2), 180–184 (1985)
  20. Hochbaum, D.S., Shmoys, D.B.: A unified approach to approximation algorithms for bottleneck problems. J. ACM 33(3), 533–550 (1986)
  21. Hsu, W., Nemhauser, G.L.: Easy and hard bottleneck location problems. Discret. Appl. Math. 1(3), 209–215 (1979)
  22. Inamdar, T., Varadarajan, K.R.: On the partition set cover problem (2018). https://arxiv.org/abs/1809.06506
  23. Jia, X., Sheth, K., Svensson, O.: Fair colorful k-center clustering. In: Proceedings of the 21st International Conference on Integer Programming and Combinatorial Optimization (IPCO), pp. 209–222 (2020)
  24. Levi, R., Lodi, A., Sviridenko, M.: Approximation algorithms for the capacitated multi-item lot-sizing problem via flow-cover inequalities. Math. Op. Res. 33(2), 461–474 (2008)
  25. Li, S.: Approximating capacitated \(k\)-median with \((1+\epsilon )k\) open facilities. In: Proceedings of the 27th Annual ACM Symposium on Discrete Algorithms (SODA), pp. 786–796 (2016)
  26. Li, S.: On uniform capacitated \(k\)-median beyond the natural LP relaxation. ACM Trans. Algorithm. 13(2), 22:1–22:18 (2017)
  27. Nutov, Z.: On the tree augmentation problem. In: Proceedings of the 25th Annual European Symposium on Algorithms (ESA), pp. 61:1–61:14 (2017)
  28. Rösner, C., Schmidt, M.: Privacy preserving clustering with constraints. In: Proceedings of the 45th International Colloquium on Automata, Languages, and Programming (ICALP), pp. 96:1–96:14 (2018)

Funding

Open Access funding provided by ETH Zurich.

Author information

Corresponding author

Correspondence to Adam Kurpisz.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This project received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 817750) and the Swiss National Science Foundation grants 200021_184622 and PZ00P2_174117. This research was conducted while the second author was at ETH Zurich. A preliminary version of this work was presented at the 21st Conference on Integer Programming and Combinatorial Optimization (IPCO 2020). An independent work of Jia, Sheth, and Svensson [23], presented at the same venue, gave a 3-approximation for Colorful k-Center with constantly many colors using different techniques.

Appendices

Technical theorems

Theorem 8

([3]) Let \((X,d)\) be a finite metric space, and suppose that the following polytope

is not empty, where \(k \in \{1, \ldots , |X|\}\), \(t \in \mathbb {Z}_{\ge 0}\), \(a_\ell \in \mathbb {R}_{\ge 0}^X\) and \(b_{\ell } \in \mathbb {R}_{\ge 0}\) for every \(\ell \in [t]\), and \(r \ge 0\). Let \((x,y) \in \mathcal {T}\), and let \((S,\mathcal {D})\) be a partition obtained by running Algorithm 1 with input \((x,y)\). Then, if \(y(B(S, r)) \le k - t + 1\), we can find in polynomial time a set \(C \subseteq X\) with \(|C| \le k\) satisfying \(a_\ell (B(C,4r)) \ge b_\ell \) for all \(\ell \in [t]\).

Proof

Let \(S = \{s_1, \ldots , s_q\}\) and \(\mathcal {D} = \{D_1, \ldots , D_q\}\) be the partition obtained by running Algorithm 1 with input \((x,y)\). It is easy to see that \((S,\mathcal {D})\) satisfies all three properties of an \((x,y)\)-good partition, and so, by slightly abusing terminology, we will call it an \((x,y)\)-good partition (see Note 3). We now assume that \(q \ge k + 1\), since otherwise, the set of centers S is already a feasible solution, as \(B(S,4r) = X\). We claim that the simplified LP given below is feasible and has optimal value at most \(y(B(S,r))\).

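$$\begin{aligned} \min \quad&\sum _{i\in [q]} z_i \\ \text {s.t.} \quad&\sum _{i\in [q]} a_\ell (D_i)\, z_i \ge b_\ell \qquad \forall \ell \in [t], \\&0 \le z_i \le 1 \qquad \forall i \in [q]. \end{aligned}$$

(4)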

This is indeed the case because we can construct a feasible point to the above LP with objective value at most \(y(B(S,r))\) as follows. Let \(z_i = \min \{1, y(B(s_i,r))\}\) for all \(i\in [q]\). Because \((S, \mathcal {D})\) is an \((x,y)\)-good partition, property 3 of Definition 3 implies that

$$\begin{aligned} \sum _{i\in [q]} a_\ell (D_i) z_i \ge \sum _{i\in [q]} \sum _{u \in D_i}a_\ell (u) x(u) =\sum _{u\in X} a_\ell (u) x(u)\ge b_\ell \qquad \forall \ell \in [t], \end{aligned}$$

(here we also use the fact that \(x(u) \le 1\) for all \(u \in X\), as \((x,y) \in \mathcal {T}\)), i.e., z is a feasible solution of the above LP, and its objective value is \(\sum _{i \in [q]} z_i \le y(B(S, r))\).

Suppose now that the hypothesis holds, i.e., \(y(B(S,r)) \le k - t + 1\). In particular, this means that \(t \le k + 1 \le q\). Note that if \(t = k + 1\), then \(y(B(S,r)) = 0\), which, by the greediness of Algorithm 1, further implies that \(b_\ell = 0\) for every \(\ell \in [t]\). Such a case is trivial, as we can simply set \(C :=\emptyset \). Thus, from now on, we assume that \(t \le k < q\). By the above discussion, the optimal value of the above simplified LP is at most \(k - t + 1\). We consider an optimal extreme point solution \(z^*\) of LP (4). A standard sparsity argument implies that \(z^*\) has at most t fractional variables. Indeed, \(z^*\) is defined by q linearly independent and tight constraints of (4), among which at most t many are not of type \(z_i\ge 0\) or \(z_i \le 1\). Hence, this implies that there are at least \(q-t\) \(z^*\)-tight constraints of (4) of type \(z_i^* =0\) or \(z_i^*=1\). This in turn implies that \(z^*\) has at most t fractional components.

Furthermore, the number of strictly positive components of \(z^*\) is at most k. To see this, note that if \(k-t+1\) components of \(z^*\) are equal to 1, all other entries must be 0 because \(z^*\) is an optimal solution to (4), which has objective value no more than \(k-t+1\). Otherwise, there are at most \(k-t\) variables that are equal to 1 and, together with at most t fractional variables, there are at most k strictly positive entries. Therefore, the set of centers \(C=\{s_i \in X \ | \ z^*_i>0\} \subseteq S\) has size at most k and satisfies \(a_\ell (B(C,4r)) \ge b_\ell \) for all \(\ell \in [t]\), because \(\bigcup _{c \in C} B(c,4r) \supseteq \bigcup _{i: z^*_i>0} D_i\), as \(D_i \subseteq B(s_i, 4r)\) for all \(i\in [q]\). \(\square \)
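The last step of the proof, reading off the centers from an extreme point, can be sketched as follows. This is only an illustration under stated assumptions: the function name `centers_from_extreme_point`, the tolerance `tol`, and the toy data are hypothetical, and the vector `z` is assumed to be an optimal extreme point of LP (4) computed elsewhere (for instance by an LP solver); the sketch does not solve the LP itself.

```python
from typing import Dict, Hashable, List, Sequence, Set


def centers_from_extreme_point(
    z: Sequence[float],
    S: Sequence[Hashable],
    D: Sequence[Set[Hashable]],
    a: Sequence[Dict[Hashable, float]],
    b: Sequence[float],
    k: int,
    tol: float = 1e-9,
) -> List[Hashable]:
    """Return C = {s_i : z_i > 0} and check the guarantees argued in the proof."""
    t = len(b)
    support = [i for i, zi in enumerate(z) if zi > tol]
    fractional = [i for i in support if z[i] < 1 - tol]
    # Sparsity: an extreme point has at most t fractional components.
    assert len(fractional) <= t
    # Counting: at most k strictly positive components.
    assert len(support) <= k
    # Coverage: the union of the selected clusters meets every requirement.
    covered = set().union(*(D[i] for i in support))
    for ell in range(t):
        assert sum(a[ell][u] for u in covered) >= b[ell]
    return [S[i] for i in support]


# Toy data with q = 3 clusters, t = 1 constraint, k = 2 (illustrative only).
S = ["s1", "s2", "s3"]
D = [{"u1", "u2"}, {"u3"}, {"u4"}]
a = [{"u1": 2, "u2": 2, "u3": 2, "u4": 1}]  # a_1(D_1)=4, a_1(D_2)=2, a_1(D_3)=1
b = [5]
z_star = [1.0, 0.5, 0.0]  # minimizes z1+z2+z3 subject to 4*z1 + 2*z2 + z3 >= 5
C = centers_from_extreme_point(z_star, S, D, a, b, k=2)
```

In the toy data the LP value is 1.5, which is at most \(k - t + 1 = 2\), and the support has one integral and one fractional component, matching the counting argument.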

For completeness, we now discuss how the dynamic programming problems appearing in our approaches can be solved in the claimed running time.

Theorem 9

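Consider the following binary program:

$$\begin{aligned} \max \quad&\sum _{i \in [q]} w(i)\, z(i) \\ \text {s.t.} \quad&\sum _{i \in [q]} a_\ell (i)\, z(i) \ge m_\ell \qquad \forall \ell \in [\gamma ], \\&\sum _{i \in [q]} z(i) \le \kappa , \\&z \in \{0,1\}^q, \end{aligned}$$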

where \(\gamma \in \mathbb {Z}_{\ge 1}\), \(w \in \mathbb {R}_{\ge 0}^q\), \(a_\ell \in \{0, \ldots , M\}^q\) and \(m_\ell \in \{0, \ldots , M\}\) for all \(\ell \in [\gamma ]\), where M is some positive integer number, and \(\kappa \in [q]\). Then, the above program can be solved in time \(O(\gamma q^2 M^\gamma )\).

Proof

The above binary program can be solved using standard dynamic programming techniques. More precisely, we define the following DP table. For every \(i \in \{0, \ldots , q\}\), \(M_\ell \in \{0, \ldots , m_\ell \}\) for every \(\ell \in [\gamma ]\), and \(j \in \{0, \ldots , \kappa \}\), let \(A[i, M_1, \ldots , M_\gamma , j]\) be the maximum objective value of any vector \(z\in \{0,1\}^q\) that satisfies

  (i) \(\{t \in [q]: \; z(t) = 1\} \subseteq [i]\),

  (ii) \(\sum _{t = 1}^q z(t) \le j\), and

  (iii) \(\sum _{t = 1}^q a_\ell (t)\cdot z(t) \ge M_\ell \) for every \(\ell \in [\gamma ]\).

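Initialization is straightforward: if \(i = 0\) or \(j = 0\), only \(z = 0\) is admissible, so the entry equals 0 if \(M_\ell = 0\) for every \(\ell \in [\gamma ]\), and \(-\infty \) (infeasible) otherwise. For all non-trivial tuples \([i, M_1, \ldots , M_\gamma , j]\), i.e., those with \(i \ge 1\) and \(j \ge 1\), by setting \(b_\ell :=\max \{M_\ell - a_\ell (i), 0\}\) for every \(\ell \in [\gamma ]\), we have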

$$\begin{aligned}&A[i, M_1, \ldots , M_\gamma , j] = \max \{w(i) + A[i-1, b_1, \ldots , b_\gamma , j-1],\\&A[i-1, M_1, \ldots , M_\gamma , j] \}. \end{aligned}$$

By observing the range of each parameter of the above table, we get that there are \(O(q \kappa M^\gamma )\) table entries in total. Moreover, for each entry we need to compute \(\gamma \) auxiliary quantities \(b_1, \ldots , b_\gamma \), and so we conclude that each entry can be computed in time \(O(\gamma )\). Thus, the DP can be solved in time \(O(\gamma q^2 M^\gamma )\), where we used the fact that \(\kappa \le q\). \(\square \)

We remark that the \(O(\gamma )\) update time per table entry in the above proof can be reduced to O(1) amortized update time per table entry through a more careful analysis. However, the resulting slight reduction in running time from \(O(\gamma q^2 M^{\gamma })\) to \(O(q^2 M^{\gamma })\) is irrelevant for our purposes.
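The recurrence above can be turned into a short memoized routine. The following is a minimal sketch, not the implementation used for the stated running-time bounds: the function name `solve_covering_dp`, the use of recursion with `lru_cache`, and the toy instance are assumptions for illustration. It maximizes \(\sum _i w(i) z(i)\) over binary z that select at most \(\kappa \) items and meet every requirement \(m_\ell \), returning \(-\infty \) when no such z exists.

```python
from functools import lru_cache
from typing import Sequence

NEG_INF = float("-inf")


def solve_covering_dp(
    w: Sequence[float],
    a: Sequence[Sequence[int]],
    m: Sequence[int],
    kappa: int,
) -> float:
    """Evaluate A[q, m_1, ..., m_gamma, kappa] via the recurrence of the proof.

    a[l][i] is the (0-indexed) coefficient of item i in constraint l.
    """
    q, gamma = len(w), len(m)

    @lru_cache(maxsize=None)
    def A(i: int, need: tuple, j: int) -> float:
        # Base case: no items left, so only z = 0 is admissible.
        if i == 0:
            return 0.0 if all(r == 0 for r in need) else NEG_INF
        # Option 1: item i is not selected.
        best = A(i - 1, need, j)
        # Option 2: item i is selected (needs budget j >= 1).
        if j > 0:
            reduced = tuple(max(need[l] - a[l][i - 1], 0) for l in range(gamma))
            rest = A(i - 1, reduced, j - 1)
            if rest != NEG_INF:
                best = max(best, w[i - 1] + rest)
        return best

    return A(q, tuple(m), kappa)


# Toy instance with q = 3 items and gamma = 2 constraints (illustrative only).
weights = [3, 2, 2]
coeffs = [[2, 1, 1], [0, 1, 2]]
value = solve_covering_dp(weights, coeffs, m=[2, 2], kappa=2)
```

In the toy instance the feasible selections of at most two items are items 1 and 3 (weight 5) and items 2 and 3 (weight 4), so the routine returns 5. The recursion depth grows with q, so an iterative table would be preferable for large inputs.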

A limiting example for the framework of Chakrabarty and Negahbani [9]

A natural way to extend the approach of [9] is the following procedure. Given a point \((x,y)\in \mathbb {R}^X\times \mathbb {R}^X\), we first run Algorithm 1 (with balls of radius 2 at each step) to get a partition of X, and then we use dynamic programming to decide whether it is possible to select at most k clusters of this partition so that the covering requirements for all colors are satisfied. Such a selection, if it exists, gives a 2-approximation. If there is no such selection, we want to return a hyperplane separating \((x,y)\) from \(\mathcal {P}_{I}\), as in [9].

However, there is an instance and a point \((x,y)\), given below, such that neither the partition will lead to a solution nor is it possible to separate \((x,y)\) from \(\mathcal {P}_{I}\). Thus any such procedure needs to deal with this limitation.

Fig. 2: A limiting example for the Chakrabarty-Negahbani framework [9]

In Fig. 2, we present an instance of \(\gamma \)-Colorful k-Center with \(\gamma =k=2\) in the one-dimensional Euclidean space; hence \(X\subseteq \mathbb {R}\). There are two colors, red and blue; the red points are represented as red circles and the blue points as blue squares. The color covering requirements are \(m_1=m_2=3\). It is easy to see that there are no integral solutions of radius 0, hence any solution with radius 1 is optimal. We consider two different optimal solutions:

  • \(C_1 = \{1, M + 1\}\) with corresponding clustering \(\mathcal {C}_1 = \left\{ \{1,2\}, \{M+1, M+2 \} \right\} \),

  • \(C_2 = \{4, M+4\}\) with corresponding clustering \(\mathcal {C}_2 = \left\{ \{3,4\}, \{M+3, M+4\} \right\} \).

We clarify that in the above, we slightly abuse notation; if there are multiple points in a location, we only pick one of them as a center, while in the corresponding clustering, all points in a covered location participate in the clustering. It is easy to verify that the above clusterings are indeed feasible solutions of radius 1, and thus, they are optimal solutions.

We now define the fractional solution \((x,y) \in \mathbb {R}^X \times \mathbb {R}^X\), where \(x=\frac{1}{2} \left( \chi ^{\mathcal {C}_1}+\chi ^{\mathcal {C}_2}\right) \) and \(y=\frac{1}{2} \left( \chi ^{C_1} + \chi ^{C_2}\right) \). Observe that we have \(x(u)=\frac{1}{2}\) for all \(u\in X\).
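The construction of \((x,y)\) as the midpoint of the two integral solutions can be checked with a few lines of code. This is a sketch under assumptions: the figure is not reproduced here, so the value chosen for `M` is hypothetical, the ground set is taken to be exactly the locations appearing in the two clusterings, and each location is treated as a single point.

```python
from fractions import Fraction

M = 100  # hypothetical value for the offset M used in the example

# The two optimal solutions from the text: centers and the locations they cover.
centers_1, clusters_1 = [1, M + 1], [{1, 2}, {M + 1, M + 2}]
centers_2, clusters_2 = [4, M + 4], [{3, 4}, {M + 3, M + 4}]

covered_1 = set().union(*clusters_1)
covered_2 = set().union(*clusters_2)
# Assumption: these are all locations of the instance.
locations = sorted(covered_1 | covered_2)


def midpoint(first, second):
    """Average of the indicator vectors of two sets of locations."""
    half = Fraction(1, 2)
    return {u: half * (u in first) + half * (u in second) for u in locations}


x = midpoint(covered_1, covered_2)            # coverage values
y = midpoint(set(centers_1), set(centers_2))  # opening values
```

Each location is covered by exactly one of the two solutions, so every coordinate of `x` equals 1/2, and the opening values in `y` sum to \(k = 2\).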

In the above example, given the defined point \((x,y)\) as input, Algorithm 1 may return the indicated partitioning \(\{D_1, D_2, D_3, D_4\}\). We stress here that there are ties, and in order to get this partitioning we resolve them adversarially. Note that there is no specified way to resolve such ties in [9], and it seems highly unclear how to design a procedure that always breaks ties in a good way, even if there is a good way to break them. Observe now that no combination of two of these resulting clusters satisfies the covering requirement, so the partitioning does not lead to a solution. However, we cannot possibly find an appropriate separating hyperplane because \((x,y)\in \mathcal {P}_{I}\) by construction.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Anegg, G., Angelidakis, H., Kurpisz, A. et al. A technique for obtaining true approximations for k-center with covering constraints. Math. Program. (2021). https://doi.org/10.1007/s10107-021-01645-y

Keywords

  • Approximation algorithms
  • k-Center
  • Clustering
  • Polyhedral techniques

Mathematics Subject Classification

  • 90C27
  • 68W40
  • 68Q25
  • 90C05