Algorithms for covering multiple submodular constraints and applications

We consider the problem of covering multiple submodular constraints. Given a finite ground set $N$, a weight function $w: N \rightarrow \mathbb{R}_+$, $r$ monotone submodular functions $f_1, f_2, \ldots, f_r$ over $N$ and requirements $k_1, k_2, \ldots, k_r$, the goal is to find a minimum weight subset $S \subseteq N$ such that $f_i(S) \ge k_i$ for $1 \le i \le r$. We refer to this problem as Multi-Submod-Cover; it was recently considered by Har-Peled and Jones (Few cuts meet many point sets. CoRR. arXiv:abs/1808.03260, Har-Peled and Jones 2018), who were motivated by an application in geometry. Even with $r=1$, Multi-Submod-Cover generalizes the well-known Submodular Set Cover problem (Submod-SC), and it can also be easily reduced to Submod-SC. A simple greedy algorithm gives an $O(\log (kr))$ approximation where $k = \sum_i k_i$, and this ratio cannot be improved in the general case. In this paper, motivated by several concrete applications, we consider two ways to improve upon the approximation given by the greedy algorithm. First, we give a bicriteria approximation algorithm for Multi-Submod-Cover that covers each constraint to within a factor of $(1-1/e-\varepsilon)$ while incurring an approximation of $O(\frac{1}{\varepsilon}\log r)$ in the cost. Second, we consider the special case when each $f_i$ is obtained from a truncated coverage function and obtain an algorithm that generalizes previous work on partial set cover (Partial-SC), covering integer programs (CIPs) and multiple vertex cover constraints (Bera et al., Theoret Comput Sci 555:2–8, 2014). Both these algorithms are based on mathematical programming relaxations that avoid the limitations of the greedy algorithm. We demonstrate the implications of our algorithms and related ideas to several applications ranging from geometric covering problems to clustering with outliers. Our work highlights the utility of the high-level model and the lens of submodularity in addressing this class of covering problems.

Keywords Set Cover · Partial Set Cover · Submodular functions

Introduction
Set Cover is a well-studied problem in combinatorial optimization and is a canonical covering problem. The input is a set system $(U, \mathcal{S})$ consisting of a finite set $U$ and a collection $\mathcal{S} = \{S_1, S_2, \ldots, S_m\}$ of subsets of $U$. The goal is to find a minimum cardinality subcollection $\mathcal{S}' \subseteq \mathcal{S}$ such that $\bigcup_{A \in \mathcal{S}'} A = U$. In the weighted version each $S_i$ has a weight $w_i \ge 0$ and the goal is to find a minimum weight subcollection of sets whose union is $U$. Set Cover is NP-Hard and approximation algorithms have been extensively studied. A very simple greedy algorithm yields a $(1+\ln d)$-approximation where $d = \max_i |S_i|$ even in the weighted case (Dobson 1982). Moreover this bound is essentially tight unless P = NP (Dinur and Steurer 2014).
Various special cases and generalizations of Set Cover have been studied over the years for their applications and theoretical interest. We describe three generalizations that are of interest to us.
• Partial Set Cover (Partial-SC): In Partial-SC the input is a set system $(U, \mathcal{S})$ and an integer parameter $k$ and the goal is to find a minimum (weight) subcollection of the given sets whose union is of size at least $k$. Set Cover is a special case when $k = |U|$.
• Covering Integer Program (CIP): A CIP is an integer program of the form $\min\{wx \mid Ax \ge b,\ x \le d,\ x \in \mathbb{Z}^n_+\}$ where $A$ is a non-negative $m \times n$ matrix and $b \ge 0$. Set Cover is a special case of CIP when $A$ is a $\{0, 1\}$ matrix and $b$ and $d$ are the all ones vectors; each constraint row of $A$ corresponds to covering an element of $U$.
• Submodular Set Cover (Submod-SC): In Submod-SC we are given a finite ground set $N$, a non-negative weight function $w : N \to \mathbb{R}_+$, and a polymatroid $f : 2^N \to \mathbb{Z}_+$ via a value oracle. The goal is to find a minimum weight subset $S \subseteq N$ such that $f(S) = f(N)$. Set Cover is a special case where $N$ represents the sets in the set system and $f$ captures the coverage function which is submodular.
Submodularity is a powerful abstraction and Submod-SC can be seen to generalize Partial-SC and CIPs. The greedy algorithm for Set Cover admits a natural generalization to Submod-SC; Wolsey (1982) showed that it yields a $(1 + \ln d)$-approximation where $d = \max_{i \in N} f(i)$. The abstraction of submodularity comes at a cost, however. For instance CIPs admit an $O(\ln m)$-approximation via an LP relaxation strengthened with knapsack cover (KC) inequalities (Carr et al. 2000; Kolliopoulos and Young 2005; Chen et al. 2016), while using the greedy algorithm yields only an $O(\ln d)$ approximation where $d$ depends on the maximum sum of the entries in a column of $A$, and in fact can be as large as $m$ (Dobson 1982). CIPs provide the explicit ability to model multiple covering constraints and this is often useful in applications. In this paper we consider an abstraction that generalizes Submod-SC by explicitly allowing multiple submodular covering constraints.

Multiple Submodular Covering Constraints:
The input consists of a ground set $N$ and a weight function $w : N \to \mathbb{R}_+$, together with $r$ polymatroids $f_1, f_2, \ldots, f_r$ over $N$ and integers $k_1, k_2, \ldots, k_r$. The goal is to find $S \subseteq N$ of minimum weight such that $f_i(S) \ge k_i$ for $1 \le i \le r$. We refer to this as Multi-Submod-Cover.
Har-Peled and Jones (2018), motivated by an application from computational geometry, appear to be the first ones to consider Multi-Submod-Cover explicitly. As noted in Har-Peled and Jones (2018), it is not hard to reduce Multi-Submod-Cover to Submod-SC. We simply define a new submodular set function $g : 2^N \to \mathbb{R}_+$ where $g(A) = \sum_{i=1}^{r} \min\{k_i, f_i(A)\}$. Via Wolsey's result for Submod-SC this implies an $O(\log r + \log K)$ approximation via the greedy algorithm where $K = \sum_{j=1}^{r} k_j$. Although Multi-Submod-Cover can be reduced to Submod-SC it is useful to treat it separately when the functions $f_i$ are known to belong to a special class of submodular functions. For instance CIP can be seen as a special case of Multi-Submod-Cover where each $f_i$ is a truncated/partial linear function. Another example, which is the main motivation for this work, comes from Bera et al. (2014),  who considered the case when each $f_i$ is a truncated/partial coverage function (partial vertex cover in Bera et al. (2014) and partial set cover in ). These special cases have several applications that we outline below.
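The reduction above can be illustrated with a small sketch. It builds $g(A) = \sum_i \min\{k_i, f_i(A)\}$ from value oracles and runs a weighted greedy until $g$ reaches $\sum_i k_i$. The function names, the toy instance (`SETS`, `RED`, `BLUE`) and the assumption of strictly positive weights are illustrative choices, not part of the paper.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Set

SetFunction = Callable[[FrozenSet[Hashable]], float]


def truncated_sum(fs: List[SetFunction], ks: List[float]) -> SetFunction:
    """Return g(A) = sum_i min(k_i, f_i(A)), the single function of the reduction."""
    def g(A: FrozenSet[Hashable]) -> float:
        return sum(min(k, f(A)) for f, k in zip(fs, ks))
    return g


def greedy_submod_cover(ground: Iterable[Hashable],
                        weight: Dict[Hashable, float],
                        g: SetFunction,
                        target: float) -> Set[Hashable]:
    """Repeatedly add the element with the largest marginal gain per unit
    weight until g(S) reaches the target. Assumes strictly positive weights."""
    chosen: Set[Hashable] = set()
    value = g(frozenset())
    remaining = sorted(ground)  # fixed order so ties break deterministically
    while value < target:
        best, best_ratio = None, 0.0
        for e in remaining:
            gain = g(frozenset(chosen | {e})) - value
            ratio = gain / weight[e]
            if ratio > best_ratio:
                best, best_ratio = e, ratio
        if best is None:
            raise ValueError("no element makes progress; instance is infeasible")
        chosen.add(best)
        remaining.remove(best)
        value = g(frozenset(chosen))
    return chosen


# Hypothetical toy instance: three sets over red points {1, 2, 3} and blue points {4, 5}.
SETS = {"a": {1, 2, 4}, "b": {3}, "c": {5}}
RED, BLUE = {1, 2, 3}, {4, 5}


def coverage(points: Set[int]) -> SetFunction:
    """Coverage function counting covered points of one colour class."""
    def f(A: FrozenSet[Hashable]) -> float:
        covered = set().union(*(SETS[s] for s in A))
        return len(covered & points)
    return f


fs, ks = [coverage(RED), coverage(BLUE)], [2, 1]
g = truncated_sum(fs, ks)
solution = greedy_submod_cover(SETS, {"a": 1.0, "b": 1.0, "c": 1.0}, g, sum(ks))
```

On this instance the set "a" alone covers two red points and one blue point, so the greedy stops after one pick.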
We mention that prior work has considered multiple submodular objectives from a maximization perspective (Chekuri et al. 2010, 2015) rather than from a minimum cost perspective. There are useful connections between these two perspectives. Consider Submod-SC. We could recast the exact version of this problem as $\max f(S)$ subject to the constraint $w(S) \le \text{OPT}$ where OPT is the optimum cost of the given instance of Submod-SC. This is submodular function maximization subject to a knapsack constraint and admits a $(1 - 1/e)$-approximation (Sviridenko 2004). Using this algorithm iteratively will also yield an approximation algorithm for Submod-SC.
We describe several applications that motivate this and some previous work and then state our results formally.

Motivating applications
Splitting point sets: Har-Peled and Jones (2018), as we remarked, were motivated to study Multi-Submod-Cover due to a geometric application that has connections to the classical Ham-Sandwich theorem as well as problems in feature selection in machine learning. Their problem is the following. Given $m$ point sets $P_1, \ldots, P_m$ in $\mathbb{R}^d$ they wish to find the smallest number of hyperplanes (or other geometric shapes) such that no point set $P_i$ has more than a constant factor of its points in any cell of the arrangement induced by the chosen hyperplanes; in particular when the constant is a half, the problem is related to the Ham-Sandwich theorem which implies that when $m \le d$ just one hyperplane suffices! From this one can infer that $\lceil m/d \rceil$ hyperplanes always suffice. However for a given instance it may be possible to do much better. In Har-Peled and Jones (2018) the authors considered the problem of approximating the smallest number of hyperplanes required for the desired partitioning and reduced this problem to Multi-Submod-Cover. Via Wolsey's greedy algorithm they obtain an $O(\log m + \log N)$ approximation where $N$ is the total number of points. In applications $N$ is likely to be large while $m$ is likely to be quite small and hence an approximation that does not depend on $N$ is desirable. It is not obvious how to model this problem as a special case of Multi-Submod-Cover; Har-Peled and Jones (2018) describe one such reduction and we describe a slightly different one later in the paper.
Multiple Partial Set Cover Constraints in Geometric Settings: There has been extensive work on Set Cover specialized to geometric settings via sophisticated techniques (Brönnimann and Goodrich 1995; Clarkson and Varadarajan 2007; Varadarajan 2009, 2010; Chan et al. 2012; Mustafa et al. 2014). Consider for example the problem of covering a given collection of points in the plane by a minimum weight subcollection of a given collection of weighted disks. This admits a constant factor approximation via the natural LP relaxation (Chan et al. 2012) in contrast to the logarithmic integrality gap and hardness known for general Set Cover instances. A natural question is whether this improved result also holds for the Partial-SC version; here we are only required to cover $k$ of the given points rather than all of them.  developed a simple and elegant black box technique for this purpose via a standard LP relaxation. They show that if there is a $\beta$-approximation for a deletion-closed class of Set Cover instances via the standard LP, then there is a $2(\beta + 1)$-approximation for Partial-SC for the same family. A natural extension of Partial-SC is to have multiple constraints. Consider the setting where the points are colored by $r$ colors (equivalently they are partitioned into $r$ sets) and the goal is to find the minimum weight subset of a given collection of disks such that at least $k_i$ points from color class $i$ are covered; one can also consider the setting where the color classes are not disjoint. Bera et al. (2014) considered multiple partial covering constraints in the restricted setting of Vertex Cover and obtained an $O(\log r)$-approximation. This was generalized in Hong and Kao (2018) to instances of Set Cover with maximum frequency $\Delta$ to obtain a $(\Delta H_r + H_r)$-approximation via a primal-dual algorithm.
A natural open question here was whether one can obtain an $O(\Delta + \log r)$-approximation and whether one can generalize further and obtain an $O(\beta + \log r)$-approximation for all deletion-closed families of Set Cover that admit a $\beta$-approximation. We refer to this problem as Multi-Partial-SC. Note that Multi-Partial-SC is a special case of Multi-Submod-Cover where each $f_i$ is a truncated coverage function (equivalently a partial set cover function). Now we discuss two geometric variants of Set Cover that induce deletion-closed set systems, and for which $\beta$ is known to be sub-logarithmic in some special settings. In HittingSet, we are given a collection of geometric objects $\mathcal{U}$ and a collection of points $P$. If a point $p \in P$ is contained in an object $U \in \mathcal{U}$, then $U$ is said to be hit by $p$. In the weighted version, each point has a non-negative weight. The goal is to find a minimum-weight set of points that hits all objects from $\mathcal{U}$. In Geometric DominatingSet, we are given an intersection graph $G = (V, E)$ of geometric objects such as disks, with non-negative weights on vertices. A vertex $v$ is said to dominate itself and its neighbors. The goal is to find a minimum-weight subset of vertices $V' \subseteq V$ that dominates all vertices in $V$. In the partial version of DominatingSet (resp. HittingSet), the goal is to dominate at least $k$ vertices (resp. hit at least $k$ objects). We summarize known results for Geometric Set Cover, HittingSet and Geometric DominatingSet in Table 1.
Facility Location with Multiple Outliers: Facility location is an extensively studied problem and there are several variants. In the basic Uncapacitated Facility Location problem (which we abbreviate to Facility Location) the input consists of a set of facilities $F$ and a set of clients $C$ in a metric space $(F \cup C, d)$. Each facility $i \in F$ has a non-negative opening cost $f_i$. The goal is to open a set of facilities and connect the clients to them to minimize the sum of the opening costs of the facilities plus the sum of the distances of each client to the nearest open facility; mathematically we want to find $S \subseteq F$ to minimize $\sum_{i \in S} f_i + \sum_{j \in C} d(j, S)$. In many scenarios there are outliers and instead of asking for all the clients to be connected we only seek to connect some specified number $k$ of clients; this has been studied under the name the Robust Facility Location problem by Charikar et al. (2001) who obtained a constant factor approximation. We consider here the setting of multiple outlier classes. We have $r$ disjoint classes of clients $C_1, C_2, \ldots, C_r$ and we need to connect to the open facilities some specified number $b_i$ of clients from $C_i$ for $1 \le i \le r$. An $O(r)$-approximation is easy by considering each client class separately but the natural question is whether we can find an $O(\log r)$-approximation; via a reduction from Set Cover one can show an $\Omega(\log r)$ lower bound on the approximability of this problem. We refer to Facility Location with Multiple Outliers as FL-Multi-Outliers. We note that FL-Multi-Outliers is not a special case of Multi-Submod-Cover since the objective function has both facility opening cost as well as client connection cost. Nevertheless, the problem is sufficiently close to Multi-Submod-Cover that the techniques we develop are applicable to this problem as well.
We also consider a related problem of clustering to minimize the sum of radii. This problem was considered by Charikar and Panigrahy (2004) who gave a constant approximation using a primal-dual algorithm. A constant approximation for the outlier version was given by Ahmadian and Swamy (2016). We consider a further generalization of covering r classes of clients, while minimizing the sum of radii. We call this problem MCC-Multi-Outliers. We formally define MCC-Multi-Outliers in Sect. 5, and give a tight O(log r )-approximation using similar techniques.

Results and contributions
In this paper we examine approximation algorithms for Multi-Submod-Cover and Multi-Partial-SC and the motivating applications. Instead of relying on the greedy algorithm we use mathematical programming based approaches and tools from continuous extensions of submodular functions that allow us to handle the special cases of interest that arise from the applications. In addition to the technical results we showcase the utility of the models in capturing interesting applications. Our algorithmic results are summarized below.
- For Multi-Submod-Cover we obtain a bicriteria approximation. We obtain a random solution $S$ such that $f_i(S) \ge (1-1/e-\varepsilon)k_i$ for $1 \le i \le r$ and the expected weight of $S$ is $O(\frac{1}{\varepsilon} \log r) \cdot \text{OPT}$. We obtain the same bound even in a more general setting where the system of constraints is $r$-sparse. We apply this result to the splitting points application and obtain an $O(\log m)$ bicriteria approximation that suffices in many scenarios. This improves the $O(\log m + \log N)$ approximation obtained in Har-Peled and Jones (2018).
- We consider a simultaneous generalization of Multi-Partial-SC and CIPs for deletion-closed set systems that admit a $\beta$-approximation for Set Cover via the natural LP. We obtain a randomized $O(\beta + \log r)$ approximation where $r$ is the sparsity of the system. This generalizes and improves bounds for multiple covering versions of Vertex-Cover from Bera et al. (2014), Hong and Kao (2018). In particular, we obtain an $O(\Delta + \log r)$-approximation for Multi-Partial-SC in set systems with maximum frequency $\Delta$, improving on the $H_r(\Delta + 1)$-approximation by Hong and Kao (2018). Furthermore, we obtain $O(\beta + \log r)$-approximations for several geometric Multi-Partial-SC problems, where $\beta$ is known to be sublogarithmic or constant (cf. Table 1).
- We obtain $O(\log r)$ approximations for FL-Multi-Outliers and MCC-Multi-Outliers generalizing the previous bounds for one class of outliers to multiple classes of outliers. As noted before, these bounds are tight up to constant factors via simple reductions from Set Cover.
- For deletion-closed set systems that have a $\beta$-approximation (cf. Table 1) to Set Cover via the natural LP we obtain an $\frac{e}{e-1}(\beta + 1)$-approximation for Partial-SC. This slightly improves the bound of $2(\beta + 1)$ in  while also simplifying the algorithm and analysis.
A brief discussion of technical ideas: Multi-Submod-Cover admits a reduction to Submod-SC for which the greedy algorithm is a known approach. To obtain our bicriteria approximation we take a different approach based on the multilinear relaxation of submodular functions which plays a fundamental role in submodular function maximization algorithms.
For addressing Multi-Partial-SC and its generalization we follow the high-level approach used already in the special setting of Vertex-Cover by Bera et al. (2014); they used an LP relaxation strengthened with knapsack cover inequalities. We bring two technical ingredients to bear on this problem. First we extend a probabilistic inequality used in Bera et al. (2014) to the general set cover setting, which is not obvious. We provide a proof that relies on continuous extensions of submodular functions and certain concentration properties, which, we believe, provides a clean and transparent explanation. Second, we use randomized rounding plus alteration to extend the results to the sparse setting; this is inspired by recent work on CIPs (Chekuri and Quanrud 2019). Finally, for Partial-SC we simplify the algorithm and analysis from  via connections to submodular function maximization and continuous extensions.
We believe that the problems, applications and technical tools that we demonstrate in this paper are likely to be useful for other problems in the future.

Other related work
Partial-SC has been well-studied in the past for special cases such as Partial Vertex Cover (PartialVC) where the goal is to find a minimum weight subset of nodes in a graph to cover at least $k$ edges. There are several 2-approximations known for PartialVC (Bar-Yehuda 2001; Bshouty and Burroughs 1998; Gandhi et al. 2004). More generally, for set systems with maximum frequency $\Delta$, similar techniques give $O(\Delta)$-approximations (Gandhi et al. 2004; Bera et al. 2014; Könemann et al. 2011). Surprisingly, the black box reduction of  from Partial-SC to Set Cover via the LP relaxation, that we mentioned earlier, is fairly recent. In some restricted geometric settings, PTASes (polynomial time $(1 + \varepsilon)$-approximations for any constant $\varepsilon > 0$) are known via the shifting technique and local search (Chan and Hu 2015; Gandhi et al. 2004; Inamdar 2019).
CIPs have been studied extensively for several years starting with the work of Dobson (1982). The introduction of KC inequalities (Carr et al. 2000) led to the first $O(\log m)$-approximation by Kolliopoulos and Young (2005). Recent work (Chen et al. 2016) has obtained sharp bounds that depend on the $\ell_0$ and $\ell_1$ sparsity of the constraint matrix.
Constrained submodular set function optimization (maximization and minimization) has been a topic of much interest in recent years and it is difficult to provide a concise summary. Continuous extensions of submodular functions and mathematical programming approaches have played an important role. We refer the reader to Buchbinder and Feldman (2017) for a survey on submodular function maximization which provides several pointers.
The literature on clustering and facility location problems is vast. We are motivated by work on approximation algorithms for handling outliers and generalizing it to handle multiple groups of clients. Many papers on clustering are in the model where the number of clusters $k$ is specified and different objectives lead to well-studied problems such as $k$-median, $k$-means and $k$-center.
Recent work on fair clustering (see Bera et al. 2019 and pointers) has also considered multiple groups of clients. The specific problems we consider and techniques we use are different. We leave it to future work to better understand the relationship between clustering with fairness constraints and clustering with outliers.
Organization: In Sect. 2, we introduce necessary background on other related problems and submodular functions. In Sect. 3, we give a bicriteria approximation algorithm for Multi-Submod-Cover, and apply it to a geometric problem in Sect. 3.1. We consider the special case of Multi-Partial-SC in Sect. 4. Next, we adapt these techniques to obtain similar results for FL-Multi-Outliers and MCC-Multi-Outliers in Sect. 5. In Sect. 6, we sketch a proof of the improved approximation for Partial-SC, using techniques similar to those used elsewhere in the paper. We conclude in Sect. 7 with some open problems.

Preliminaries and background
Set Cover and Partial-SC have natural LP relaxations and they are closely related to those for Max k-Cover and Max-Budgeted-Cover. The LP relaxation for Set Cover (SC-LP) is shown in Fig. 1a. It has a variable $x_i$ for each set $S_i \in \mathcal{S}$, which, in the integer programming formulation, indicates whether $S_i$ is picked in the solution. The goal is to minimize the weight of the chosen sets which is captured by the objective $\sum_{S_i \in \mathcal{S}} w_i x_i$ subject to the constraint that each element $e_j$ is covered. The LP relaxation for Partial-SC (PSC-LP) is shown in Fig. 1b. Now we need additional variables to indicate which $k$ elements are going to be covered; for each $e_j \in U$ we thus have a variable $z_j$ for this purpose. In PSC-LP it is important to constrain $z_j$ to be at most 1. The constraint $\sum_{e_j \in U} z_j \ge k$ forces at least $k$ elements to be covered fractionally.
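Written out explicitly, the two relaxations described above take the following standard form. This is a sketch based on the textual description only; the index notation $i : e_j \in S_i$ for the sets containing $e_j$ is an assumption about layout, not a reproduction of the figure.

```latex
\begin{align*}
\textbf{(SC-LP)}\qquad \min\ & \sum_{S_i \in \mathcal{S}} w_i x_i \\
\text{s.t.}\ & \sum_{i \,:\, e_j \in S_i} x_i \ge 1 && \forall e_j \in U \\
& x_i \ge 0 && \forall S_i \in \mathcal{S}
\end{align*}

\begin{align*}
\textbf{(PSC-LP)}\qquad \min\ & \sum_{S_i \in \mathcal{S}} w_i x_i \\
\text{s.t.}\ & \sum_{i \,:\, e_j \in S_i} x_i \ge z_j && \forall e_j \in U \\
& z_j \le 1 && \forall e_j \in U \\
& \sum_{e_j \in U} z_j \ge k \\
& x_i \ge 0,\ z_j \ge 0 && \forall S_i \in \mathcal{S},\ \forall e_j \in U
\end{align*}
```

Without the bound $z_j \le 1$ a single heavily covered element could account for the whole requirement $k$, which is why that constraint matters in PSC-LP.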
As noted in prior work the integrality gap of PSC-LP can be made arbitrarily large but it is easy to fix by guessing the largest cost set in an optimum solution and doing some preprocessing. We discuss this issue in later sections. show LP relaxations for Max k-Cover and Max-Budgeted-Cover respectively. In these problems we maximize the number of elements covered subject to an upper bound on the number of sets or on the total weight of the chosen sets.
Greedy algorithm: The greedy algorithm is a well-known and standard algorithm for the problems studied here. The algorithm iteratively picks the set with the current maximum bang-per-buck ratio and adds it to the current solution until some stopping condition is met. The bang-per-buck of a set $S_i$ is defined as $|S_i \cap U'|/w_i$ where $U'$ is the set of uncovered elements at that point in the algorithm. For minimization problems such as Set Cover and Partial-SC the algorithm is stopped when the required number of elements are covered. For Max k-Cover and Max-Budgeted-Cover the algorithm is stopped if adding the current set would exceed the budget. Since this is a standard algorithm that is extremely well-studied we do not describe all the formal details and the known results. Typically the approximation guarantee of Greedy is analyzed with respect to an optimum integer solution. We need to compare it to the value of the fractional solution. For the setting of the cardinality constraint this was already done in Nemhauser et al. (1978). We need a slight generalization to the budgeted setting and we give a proof for the sake of completeness.
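The bang-per-buck rule for Max-Budgeted-Cover can be sketched as follows. The function name, the `allow_overshoot` flag and the toy instance are hypothetical; with `allow_overshoot=True` the loop runs until the chosen weight meets or exceeds the budget, and with `False` it stops before a set that would exceed it. Weights are assumed strictly positive.

```python
from typing import Dict, Hashable, List, Set, Tuple


def greedy_budgeted_cover(sets: Dict[str, Set[Hashable]],
                          weights: Dict[str, float],
                          budget: float,
                          allow_overshoot: bool = True
                          ) -> Tuple[List[str], Set[Hashable], float]:
    """Pick sets by newly covered elements per unit weight (bang-per-buck)."""
    chosen: List[str] = []
    covered: Set[Hashable] = set()
    total = 0.0
    remaining = sorted(sets)  # fixed order so ties break deterministically
    while remaining and total < budget:
        # Ratio of still-uncovered elements in the set to its weight.
        best = max(remaining, key=lambda s: len(sets[s] - covered) / weights[s])
        if not sets[best] - covered:
            break  # no remaining set covers anything new
        if not allow_overshoot and total + weights[best] > budget:
            break  # the next set would exceed the budget
        chosen.append(best)
        total += weights[best]
        covered |= sets[best]
        remaining.remove(best)
    return chosen, covered, total


# Hypothetical toy instance with budget B = 2.
toy_sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {5}}
toy_weights = {"A": 2.0, "B": 1.0, "C": 1.0}
overshoot_run = greedy_budgeted_cover(toy_sets, toy_weights, budget=2.0)
strict_run = greedy_budgeted_cover(toy_sets, toy_weights, budget=2.0,
                                   allow_overshoot=False)
```

In the overshoot run the second pick pushes the total weight to 3, past the budget of 2; in the strict run only the first pick is kept.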

Lemma 2.1 Let $Z$ be the optimum value of (MBC-LP) for a given instance of Max-Budgeted-Cover with budget $B$.
- Suppose the Greedy algorithm is run until the total weight of the chosen sets is equal to or exceeds $B$. Then the number of elements covered by Greedy is at least $(1 - 1/e)Z$.
- Suppose no set covers more than $cZ$ elements for some $c > 0$. Then the weight of sets chosen by Greedy to cover $(1 - 1/e)Z$ elements is at most $(1 + ec)B$.
Proof We give a short sketch. Greedy's analysis for Max-Budgeted-Cover is based on the following key observation. Consider the first set $S$ picked by Greedy. Then $|S|/w(S) \ge \text{OPT}/B$ where OPT is the value of an optimum integer solution. This follows from submodularity of the coverage function. This observation is applied iteratively with the residual solution as sets are picked and a standard analysis shows that when Greedy first meets or exceeds the budget $B$ then the total number of elements covered is at least $(1 - 1/e)\text{OPT}$. We claim that we can replace OPT in the analysis by $Z$. Given a fractional solution $x, z$ we see that $Z = \sum_{e} z_e \le \sum_{e \in U} \min\{1, \sum_{i : e \in S_i} x_i\}$.
Via simple algebra, we can obtain a contradiction if $|S_i|/w_i < Z/B$ holds for all sets $S_i$. Once we have this property the rest of the analysis is very similar to the standard one where OPT is replaced by $Z$. Now consider the case when no set covers more than $cZ$ elements. Note that without loss of generality, we can assume that the weight of every set is at most $B$; otherwise no feasible solution contains such a set. If Greedy covers $(1 - 1/e)Z$ elements before the weight of sets chosen exceeds $B$ then there is nothing to prove. Otherwise let $S_j$ be the set added by Greedy when its weight exceeds $B$ for the first time. Let $\alpha \le |S_j|$ be the number of new elements covered by the inclusion of $S_j$. Since Greedy had covered less than $(1 - 1/e)Z$ elements, the value of the residual fractional solution is at least $Z/e$. From the same argument as in the preceding paragraph, since Greedy chose $S_j$ at that point, $\alpha/w_j \ge Z/(eB)$, that is, $w_j \le eB\alpha/Z \le ecB$, where the last inequality uses $\alpha \le |S_j| \le cZ$. Since Greedy covers at least $(1 - 1/e)Z$ elements after choosing $S_j$ (follows from the first claim of the lemma), the total weight of the sets chosen by Greedy is at most $B + w_j \le (1 + ec)B$.
We note that the conclusions of the preceding lemma hold even for the following generalization of Max-Budgeted-Cover. Here each element $e \in U$ has a non-negative "profit" $p_e$ associated with it, and the goal is to find a collection of sets with weight at most $B$, such that the overall profit of the covered elements is maximized. One difference in the argument is that the "bang-per-buck" of a set $S$ is defined as $\sum_{e \in S} p_e / w(S)$.

Submodular set functions and continuous extensions
Continuous extensions of submodular set functions have played an important role in algorithmic and structural aspects. The idea is to extend a discrete set function $f : 2^N \to \mathbb{R}$ to the continuous space $[0, 1]^N$. Here we are mainly concerned with extensions motivated by maximization problems, and confine our attention to two extensions and refer the interested reader to Calinescu et al. (2007), Vondrák (2007) for a more detailed discussion.
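The multilinear extension of a real-valued set function $f : 2^N \to \mathbb{R}$, denoted by $F$, is defined as follows: for $x \in [0, 1]^N$, $F(x) = \mathbb{E}[f(R)] = \sum_{S \subseteq N} f(S) \prod_{i \in S} x_i \prod_{j \in N \setminus S} (1 - x_j)$, where $R$ is a random set obtained by picking each $i \in N$ independently with probability $x_i$.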
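As an illustration, the multilinear extension can be evaluated exactly by enumerating all subsets of a small ground set, or estimated by sampling the random set $R$. The sketch below assumes a value oracle `f` taking a frozenset; the function names and the two-set coverage instance are hypothetical.

```python
import itertools
import random
from typing import Callable, Dict, FrozenSet, Hashable, List

SetFunction = Callable[[FrozenSet[Hashable]], float]


def multilinear_exact(ground: List[Hashable], f: SetFunction,
                      x: Dict[Hashable, float]) -> float:
    """Sum f(R) * Pr[R] over all subsets R; exponential in len(ground)."""
    total = 0.0
    for bits in itertools.product([False, True], repeat=len(ground)):
        prob = 1.0
        for e, picked in zip(ground, bits):
            prob *= x[e] if picked else 1.0 - x[e]
        R = frozenset(e for e, picked in zip(ground, bits) if picked)
        total += prob * f(R)
    return total


def multilinear_sampled(ground: List[Hashable], f: SetFunction,
                        x: Dict[Hashable, float],
                        samples: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate: include each element e independently with prob. x[e]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        R = frozenset(e for e in ground if rng.random() < x[e])
        total += f(R)
    return total / samples


# Hypothetical coverage instance: a = {1, 2}, b = {2, 3}.
COVER = {"a": {1, 2}, "b": {2, 3}}


def coverage(A: FrozenSet[Hashable]) -> float:
    return len(set().union(*(COVER[s] for s in A)))


point = {"a": 0.5, "b": 0.5}
exact_value = multilinear_exact(["a", "b"], coverage, point)
estimate = multilinear_sampled(["a", "b"], coverage, point)
```

Here the four subsets have values 0, 2, 2 and 3, each with probability 1/4, so the exact value is 1.75.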
The concave closure of a real-valued set function $f : 2^N \to \mathbb{R}$, denoted by $f^+$, is defined as the optimum of an exponential sized linear program: $f^+(x) = \max\{\sum_{S \subseteq N} \alpha_S f(S) : \sum_{S \subseteq N} \alpha_S = 1,\ \sum_{S \ni i} \alpha_S = x_i \ \forall i \in N,\ \alpha_S \ge 0 \ \forall S \subseteq N\}$.
A special case of submodular functions are non-negative weighted sums of rank functions of matroids. More formally suppose $N$ is a finite ground set and $M_1, M_2, \ldots, M_\ell$ are matroids on the same ground set $N$. Let $g_1, \ldots, g_\ell$ be the rank functions of the matroids; these are monotone submodular. Suppose $f = \sum_{h=1}^{\ell} w_h g_h$ where $w_h \ge 0$ for all $h \in [\ell]$, then $f$ is monotone submodular. We note that (weighted) coverage functions belong to this class. For such a submodular function we can consider an extension $\tilde{f}$ where $\tilde{f}(x) = \sum_h w_h g_h^+(x)$. We capture two useful facts which are shown in Calinescu et al. (2007).

Remark 2.3
Let $f : 2^{\mathcal{S}} \to \mathbb{Z}_+$ be the coverage function associated with a set system $(U, \mathcal{S})$. Then $\tilde{f}(x) = \sum_{e \in U} \min\{1, \sum_{i : e \in S_i} x_i\}$, where $\tilde{f} = \sum_{e \in U} g_e^+$ and $g_e^+(x) = \min\{1, \sum_{i : e \in S_i} x_i\}$ is the concave closure of $g_e$, the rank function of a simple uniform matroid. One can see PSC-LP in a more compact fashion: $\min \sum_i w_i x_i$ subject to $\tilde{f}(x) \ge k$, $x \ge 0$.
Lemma 2.4 (Vondrák 2010) Let $f : 2^N \to \mathbb{R}_+$ be a 1-Lipschitz monotone submodular function. For $x \in [0, 1]^N$ let $R$ be a random set drawn from the product distribution induced by $x$. Then for $\delta \ge 0$, $\Pr[f(R) \le (1 - \delta)F(x)] \le e^{-\delta^2 F(x)/2}$.
Greedy algorithm under a knapsack constraint: Consider the problem of maximizing a monotone submodular function subject to a knapsack constraint; formally $\max f(S)$ s.t. $w(S) \le B$ where $w : N \to \mathbb{R}_+$ is a non-negative weight function on the elements of the ground set $N$. Note that when all $w(i) = 1$ and $B = k$ this is the problem of maximizing a monotone submodular function subject to a cardinality constraint. For the cardinality constraint case, the simple greedy algorithm that iteratively picks the element with the largest marginal value yields a $(1 - 1/e)$-approximation (Nemhauser et al. 1978). Greedy extends in a natural fashion to the knapsack constraint setting; in each iteration the element $i = \arg\max_j f_S(j)/w_j$ is chosen where $S$ is the set of already chosen elements (here $f_S(j) = f(S \cup \{j\}) - f(S)$ denotes the marginal value of $j$ with respect to $S$). Sviridenko (2004), building on earlier work on the coverage function (Khuller et al. 1999), showed that greedy with some partial enumeration yields a $(1 - 1/e)$-approximation for the knapsack constraint. The following lemma quantifies the performance of the basic Greedy when it is stopped after meeting or exceeding the budget $B$.
Lemma 2.5 Consider an instance of monotone submodular function maximization subject to a knapsack constraint. Let Z be the optimum value for the given knapsack budget B. Suppose the greedy algorithm is run until the total weight of the chosen sets is equal to or exceeds B. Letting S be the greedy solution we have f (S) ≥ (1 − 1/e)Z.
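The stopping rule in Lemma 2.5 can be illustrated with a minimal Python sketch. It assumes positive weights and a value oracle `f` that accepts a list of elements; the function name `greedy_until_budget`, the toy coverage instance, and the tie-breaking by sorted order are choices made for this example only and do not come from the paper.

```python
def greedy_until_budget(ground, weight, f, budget):
    """Density greedy, stopped once the chosen weight meets or exceeds budget."""
    chosen, total = [], 0
    remaining = set(ground)
    while remaining and total < budget:
        base = f(chosen)
        best, best_ratio = None, 0.0
        for j in sorted(remaining):  # sorted only to make ties deterministic
            ratio = (f(chosen + [j]) - base) / weight[j]  # marginal value per unit weight
            if ratio > best_ratio:
                best, best_ratio = j, ratio
        if best is None:  # no remaining element has positive marginal value
            break
        chosen.append(best)
        total += weight[best]
        remaining.remove(best)
    return chosen, total


# Hypothetical toy instance: an (unweighted) coverage function over four sets.
SETS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2, 3, 4, 5}}
WEIGHT = {"a": 2, "b": 1, "c": 1, "d": 10}


def coverage(names):
    """Number of elements covered by the named sets."""
    return len(set().union(*(SETS[n] for n in names)))


picked, spent = greedy_until_budget(SETS, WEIGHT, coverage, budget=3)
```

On this instance the sketch picks `b` and then `a`, stopping as soon as the spent weight reaches the budget of 3. In general the last element picked may push the total weight above the budget, which is exactly the slack that Lemma 2.5 allows.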

A bicriteria approximation for MULTI-SUBMOD-COVER
In this section we consider Multi-Submod-Cover. Let $N$ be a finite ground set. For each $j \in [h]$ we are given a submodular function $f_j : 2^N \rightarrow \mathbb{R}_+$ and the corresponding requirement $k_j \ge 0$. We are also given a non-negative weight function $w : N \rightarrow \mathbb{R}_+$. The goal is to solve the following covering problem:

$$\min_{S \subseteq N} \ w(S) \quad \text{s.t.} \quad f_j(S) \ge k_j \ \text{ for all } j \in [h].$$

We say that $i \in N$ is active in constraint $j$ if $f_j(i) > 0$; otherwise it is inactive. We say that the given instance is $r$-sparse if each element $i \in N$ is active in at most $r$ constraints.
Theorem 3.1 There is a randomized polynomial-time approximation algorithm that given an $r$-sparse instance of Multi-Submod-Cover outputs a set $S \subseteq N$ such that (i) $f_j(S) \ge (1 - 1/e - \varepsilon) k_j$ for all $j \in [h]$ and (ii) $E[w(S)] = O(\frac{1}{\varepsilon} \ln r) \cdot \mathrm{OPT}$.

The rest of the section is devoted to the proof of the preceding theorem. We will assume without loss of generality that $k_i = 1$ for each $i$, which can be arranged by scaling. Also, we will assume that $f_i(N) = 1$; otherwise we can work with the truncated function $\min\{1, f_i(S)\}$, which is also submodular. This technical assumption plays a role in the analysis later.
We consider a continuous relaxation of the problem based on the multilinear extension. Instead of finding a set S we consider finding a fractional point x ∈ [0, 1] N . For any value B ≥ OPT where OPT is the optimum value of the original problem, the following continuous optimization problem has a feasible solution.
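(MP-Submod-Relax) $\quad \sum_{i \in N} w_i x_i \le B, \qquad F_j(x) \ge 1 \ \text{ for all } j \in [h], \qquad x \in [0, 1]^N,$

where $F_j$ denotes the multilinear extension of $f_j$.

One cannot hope to solve the preceding continuous optimization problem exactly since it is NP-Hard. However, the following approximation result is known and is based on extending the continuous greedy algorithm of Vondrák (2008) and Cȃlinescu et al. (2011).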
Theorem 3.2 (Chekuri et al. 2010, 2015) For any fixed $\varepsilon > 0$, there is a randomized polynomial-time algorithm that given an instance of MP-Submod-Relax and value oracle access to the submodular functions $f_1, \ldots, f_h$, with high probability, either correctly outputs that the instance is not feasible or outputs an $x$ such that (i) $\sum_i w_i x_i \le B$ and (ii) $F_j(x) \ge (1 - 1/e - \varepsilon)$ for $1 \le j \le h$.

Using the preceding theorem and binary search over $B$ one can obtain an $x$ such that $\sum_i w_i x_i \le (1 + \varepsilon)\mathrm{OPT}$ and $F_j(x) \ge (1 - 1/e - \varepsilon)$ for $1 \le j \le h$. It remains to round this solution. We use the following algorithm based on the high-level framework of randomized rounding plus alteration.
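1. Let $S_1, S_2, \ldots, S_\ell$ be random sets obtained by picking elements independently and randomly $\ell$ times according to the fractional solution $x$. Let $S = \cup_{k=1}^{\ell} S_k$.
2. For each $j \in [h]$, if $f_j(S) < (1 - 1/e - 2\varepsilon)$, fix the constraint. That is, find a set $T_j$ using the greedy algorithm (via Lemma 2.5) such that $f_j(T_j) \ge (1 - 1/e)$. We let $T = \cup_j T_j$, where $T_j = \emptyset$ for constraints that did not need fixing.
3. Output $S \cup T$.

It remains to choose $\ell$ and bound the expected cost of $S \cup T$.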
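The two stages of randomized rounding plus alteration can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `round_then_fix`, the per-constraint `fixers` (which stand in for the greedy fix and here simply return a full block), and the toy instance are hypothetical and not taken from the paper.

```python
import math
import random


def round_then_fix(x, constraints, fixers, rounds, target, rng):
    """Union of `rounds` independent roundings of x, then fix unmet constraints.

    x: dict element -> probability; constraints: list of monotone set functions;
    fixers: list of callables returning a set T_j that satisfies constraint j.
    """
    S = set()
    for _ in range(rounds):
        S |= {i for i in sorted(x) if rng.random() < x[i]}
    T = set()
    for f_j, fix_j in zip(constraints, fixers):
        if f_j(S) < target:  # constraint j is not satisfied after rounding
            T |= fix_j()
    return S, T


EPS = 0.05
TARGET = 1 - 1 / math.e - 2 * EPS

# Hypothetical instance: constraint j asks for coverage of its own block of elements.
BLOCKS = [{0, 1}, {2, 3}]
constraints = [lambda S, b=b: len(S & b) / len(b) for b in BLOCKS]
fixers = [lambda b=b: set(b) for b in BLOCKS]  # stand-in for the greedy fix
x = {i: 0.5 for i in range(4)}

S, T = round_then_fix(x, constraints, fixers, rounds=3, target=TARGET, rng=random.Random(0))
```

Whatever the random choices are, every constraint reaches the target on the output, because each constraint is either satisfied by the rounded set or is fixed explicitly and the functions are monotone.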
The following bound is immediate from the randomized rounding stage of the algorithm.

Lemma 3.3 $E[w(S)] \le \ell \sum_{i \in N} w_i x_i$.
We now bound the probability that any fixed constraint is not satisfied after the randomized rounding stage of the algorithm. Let I j be the indicator for the event that f j (S) < (1 − 1/e − 2ε).

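Lemma 3.4 For any $j \in [h]$, $\Pr[I_j] \le (1 - \varepsilon)^{\ell}$.
Proof Let $I_{j,k}$ be the indicator for the event that $f_j(S_k) < (1 - 1/e - 2\varepsilon)$. From the definition of the multilinear extension, for any $k$, $E[f_j(S_k)] = F_j(x) \ge (1 - 1/e - \varepsilon)$. Let $\alpha = \Pr[I_{j,k}]$. We upper bound $\alpha$ as follows. Recall that $f_j(N) \le 1$ and hence by monotonicity we have $f_j(S_k) \le 1$. Therefore $(1 - 1/e - \varepsilon) \le E[f_j(S_k)] \le \alpha (1 - 1/e - 2\varepsilon) + (1 - \alpha)$, and rearranging we can upper bound $\alpha$ by the following: $\alpha \le \frac{1/e + \varepsilon}{1/e + 2\varepsilon} = \frac{1 + e\varepsilon}{1 + 2e\varepsilon} = \frac{1}{1 + e\varepsilon/(1 + e\varepsilon)}$. Using the fact that $\frac{1}{1+x} \le 1 - x/2$ for sufficiently small $x > 0$, we simplify and see that $\alpha \le 1 - \frac{e\varepsilon}{2(1 + e\varepsilon)} \le 1 - \varepsilon$ for sufficiently small $\varepsilon > 0$. By monotonicity $f_j(S) \ge f_j(S_k)$ for every $k$, so $I_j$ implies $I_{j,k}$ for all $k$. Since the sets $S_1, \ldots, S_\ell$ are chosen independently, $\Pr[I_j] \le \prod_{k=1}^{\ell} \Pr[I_{j,k}] \le \alpha^{\ell} \le (1 - \varepsilon)^{\ell}$.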

Remark 3.5
The simplicity of the previous proof is based on the use of the multilinear extension which is well-suited for randomized rounding.
The assumption that f j (N ) ≤ 1 is technically important and it is easy to ensure in the general submodular case but is not straightforward when working with specific classes of functions.
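Lemma 3.6 Let $\mathrm{OPT}_j$ be the value of an optimum solution to the problem $\min w(S)$ s.t. $f_j(S) \ge 1$. Then $\sum_{j=1}^{h} \mathrm{OPT}_j \le r \cdot \mathrm{OPT}$.

Proof Let $S^*$ be an optimum solution to the problem of covering all $h$ constraints. Let $N_j$ be the set of active elements for constraint $j$. It follows that $S^* \cap N_j$ is a feasible solution for the problem of covering just $f_j$. Thus $\mathrm{OPT}_j \le w(S^* \cap N_j)$. Since the instance is $r$-sparse, each element belongs to at most $r$ of the sets $N_j$. Hence

$$\sum_{j=1}^{h} \mathrm{OPT}_j \le \sum_{j=1}^{h} w(S^* \cap N_j) \le r \cdot w(S^*) = r \cdot \mathrm{OPT}.$$

We now bound the expected cost of $T$. We claim that $w(T_j) \le 2\,\mathrm{OPT}_j$ for each $j$. Assuming the claim, from the description of the algorithm, we have

$$E[w(T)] \le \sum_{j=1}^{h} \Pr[I_j]\, w(T_j) \le 2 (1 - \varepsilon)^{\ell} \sum_{j=1}^{h} \mathrm{OPT}_j \le 2 r (1 - \varepsilon)^{\ell}\, \mathrm{OPT}.$$

Now we prove the claim. Consider the problem $\min w(S)$ s.t. $f_j(S) \ge 1$. $\mathrm{OPT}_j$ is the optimum solution value to this problem. Now consider the following submodular function maximization problem subject to a knapsack constraint: $\max f_j(S)$ s.t. $w(S) \le \mathrm{OPT}_j$. Clearly the optimum value of this maximization problem is at least 1. From Lemma 2.1, the greedy algorithm, when run on the maximization problem, outputs a solution $T_j$ such that $f_j(T_j) \ge (1 - 1/e)$ and $w(T_j) \le \mathrm{OPT}_j + \max_i w_i$. By guessing the maximum weight element in an optimum solution to the maximization problem we can ensure that $\max_i w_i \le \mathrm{OPT}_j$, and hence $w(T_j) \le 2\,\mathrm{OPT}_j$.

From the preceding lemmas it follows that

$$E[w(S \cup T)] \le \ell \sum_{i \in N} w_i x_i + 2 r (1 - \varepsilon)^{\ell}\, \mathrm{OPT},$$

and choosing $\ell = \lceil \frac{1}{\varepsilon} \ln r \rceil$ makes the second term at most $2\,\mathrm{OPT}$.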

An application to splitting point sets
Har-Peled and Jones (2018), as we remarked, were motivated to study Multi-Submod-Cover due to a geometric application. We recall the problem. Given $m$ point sets $P_1, \ldots, P_m$ in $\mathbb{R}^d$, find the smallest number of hyperplanes (or other geometric shapes) such that no point set $P_i$ has more than an $\alpha$ fraction of its points in any cell of the arrangement induced by the chosen hyperplanes; in particular when $\alpha = 1/2$ the problem is related to the Ham-Sandwich theorem, which implies that when $m \le d$ just one hyperplane suffices. From this one can infer that $\lceil m/d \rceil$ hyperplanes always suffice; however, we are interested in approximating the optimum number of hyperplanes for a given instance. Let $k_i = |P_i|$ and let $P = \cup_i P_i$. We will assume, for notational simplicity, that the sets $P_i$ are disjoint. The assumption can be dispensed with.
In Har-Peled and Jones (2018) the authors reduce their problem to Multi-Submod-Cover as follows. Let N be the set of all hyperplanes in R d ; we can confine attention to a finite subset by restricting to those half-spaces that are supported by d points of P. For each point set P i they consider a complete graph G i on the vertex set P i . For each p ∈ ∪ i P i they define a submodular function f p : 2 N → R + where f p (S) is the number of edges incident to p that are cut by S; an edge ( p, q) with p, q ∈ P i is cut if p and q are separated by at least one of the hyperplanes in S. Thus one can formulate the original problem as choosing the smallest number of hyperplanes such that for each p ∈ P the number of edges that are cut is at least k p where k p is the demand of p. To ensure that P i is partitioned such that no cell has more than k i /2 points we set k p = k i /2 for each p ∈ P i ; more generally if we wish no cell to have more than βk i points of P i we set k p = (1 − β)k i for each p ∈ P i . As a special case of Multi-Submod-Cover we have We now show that one can obtain an O(log m)-approximation if we settle for a bicriteria approximation where we compare the cost of the solution to that of an optimum solution, but guarantee a slightly weaker bound on the partition quality. This could be useful since one can imagine several applications where m, the number of different point sets, is much smaller than the total number of points. Consider the formulation from Har-Peled and Jones (2018). Suppose we used our bicriteria approximation algorithm for Multi-Submod-Cover. The algorithm would cut (1 − 1/e − ε)k p edges for each p and hence for 1 ≤ i ≤ m we will only be guaranteed that each cell in the arrangement contains at most (1 − (1 − 1/e − ε)/2)k i points from P i . This is acceptable in many applications. However, the approximation ratio still depends on n since the number of constraints in the formulation is n. 
We describe a related but slightly modified formulation to obtain an O(log m)-approximation by using only m constraints.
Given a collection $S \subseteq N$ let $f_i(S)$ denote the number of pairs of points in $P_i$ that are separated by $S$ (equivalently the number of edges of $G_i$ cut by $S$). It is easy to see that $f_i(S)$ is a monotone submodular function over $N$. Suppose $S \subseteq N$ induces an arrangement such that no cell in the arrangement contains more than $(1 - \beta) k_i$ points for some $0 < \beta < 1$. Consider a point $p \in P_i$ belonging to some cell. Note that the degree of $p$ in $G_i$ is $k_i - 1$. Furthermore, the edges incident to $p$ that are not cut by $S$ are exactly the edges of the form $(p, q)$, where $q$ belongs to the same cell as $p$ in the arrangement. Therefore, the number of edges incident to $p$ that are cut by $S$ is at least $k_i - 1 - (k_i (1 - \beta) - 1) = \beta k_i$. Thus, the total number of edges cut from $G_i$ is at least $\beta k_i^2 / 2$, since each edge is counted twice from its endpoints. In particular if $\beta = 1/2$ then $S$ cuts at least $k_i^2 / 4$ edges. Conversely, if $S$ cuts at least $\alpha k_i^2$ edges for some $\alpha < 1/2$ then no cell in the arrangement induced by $S$ has more than $(1 - \Omega(\alpha)) k_i$ points from $P_i$. Given this we can consider the formulation below.

$$\min_{S \subseteq N} \ |S| \quad \text{s.t.} \quad f_i(S) \ge k_i^2 / 4 \ \text{ for } 1 \le i \le m.$$
We apply our bicriteria approximation for Multi-Submod-Cover with some fixed ε to obtain an O(log m)-approximation to the objective but we are only guaranteed that the output S satisfies the property that f i (S) ≥ (1 − 1/e − ε)k 2 i /4 for each i. This is sufficient to ensure that no P i has more than a constant factor in each cell of the arrangement.
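To make the function $f_i$ concrete, the following Python sketch counts the pairs of one point set that are separated by a collection of hyperplanes, by grouping the points into cells of the arrangement. The function name `separated_pairs`, the representation of a hyperplane as a `(normal, offset)` pair, and the assumption that no point lies exactly on a hyperplane are choices made for this illustration.

```python
from collections import Counter
from math import comb


def separated_pairs(points, hyperplanes):
    """Number of point pairs separated by at least one hyperplane.

    Each hyperplane is (normal, offset) and stands for {p : normal . p = offset}.
    Assumes no point lies exactly on a hyperplane.
    """
    def cell(p):
        # Sign vector of p: which side of each hyperplane it is on.
        return tuple(
            sum(a * c for a, c in zip(normal, p)) > offset
            for normal, offset in hyperplanes
        )

    cells = Counter(cell(p) for p in points)
    # Pairs that are NOT separated are exactly the pairs sharing a cell.
    return comb(len(points), 2) - sum(comb(c, 2) for c in cells.values())


# Hypothetical example: the four corners of the unit square.
PTS = [(0, 0), (1, 0), (0, 1), (1, 1)]
one_cut = separated_pairs(PTS, [((1, 0), 0.5)])
two_cuts = separated_pairs(PTS, [((1, 0), 0.5), ((0, 1), 0.5)])
```

With $k_i = 4$ the requirement $k_i^2/4$ equals 4, which the single vertical line already meets: it separates 4 of the 6 pairs and leaves two points in each cell.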
The running time of the algorithm we describe depends polynomially on $|N|$ and $m$, and $|N|$ can be upper bounded by $n^d$. The running time in Har-Peled and Jones (2018) is $O(m n^{d+2})$. Finding a running time that depends polynomially on $n$, $m$ and $d$ is an interesting open problem.

Approximating MULTI-PARTIAL-SC
In this section we consider a problem that generalizes Multi-Partial-SC and CIPs while being a special case of Multi-Submod-Cover. We call this problem CCF (Covering Coverage Functions). Bera et al. (2014) already considered this version in the restricted context of Vertex-Cover. Formally the input is a weighted set system $(U, \mathcal{S})$ and a set of inequalities of the form $Az \ge b$ where $A \in [0, 1]^{h \times n}$ is a matrix and $b \in \mathbb{R}^h_+$ is a positive vector. The goal is to optimize the integer program CCF-IP shown in Fig. 3a. Multi-Partial-SC is a special case of CCF when the matrix $A$ contains only $\{0, 1\}$ entries. On the other hand CIP is a special case when the set system is very restricted and each set $S_i$ consists of a single element. We say that an instance is $r$-sparse if each set $S_i$ "influences" at most $r$ rows of $A$; in other words the elements of $S_i$ have non-zero coefficients in at most $r$ rows of $A$. This notion of sparsity coincides in the case of CIPs with column sparsity and in the case of Multi-Submod-Cover with the sparsity that we saw in Sect. 3.

It is useful to explicitly see why CCF is a special case of Multi-Submod-Cover. The ground set $N = [m]$ corresponds to the sets $S_1, \ldots, S_m$ in the given set system $(U, \mathcal{S})$. Consider the row $k$ of the covering constraint matrix $Az \ge b$. We can model it as a constraint $f_k(S) \ge b_k$ where the submodular set function $f_k : 2^N \rightarrow \mathbb{R}_+$ is defined as follows: for a set $X \subseteq N$ we let $f_k(X) = \sum_{e_j \in \cup_{i \in X} S_i} A_{k,j}$, which is simply a weighted coverage function with the weights coming from the coefficients of the matrix $A$. Note that when formulating via these submodular functions, the auxiliary variables $z_1, \ldots, z_n$ that correspond to the elements of $U$ are unnecessary. We prove the following theorem.

Fig. 3 Modeling CCF: (a) the integer program CCF-IP, (b) the natural LP relaxation
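The reduction from CCF to Multi-Submod-Cover can be illustrated with a short Python sketch that evaluates the weighted coverage function $f_k$ for a row of $A$ and computes the sparsity $r$ of an instance. The function names `coverage_value` and `sparsity` and the small matrix below are hypothetical and only serve as an example.

```python
def coverage_value(row, sets, X):
    """f_k(X): total coefficient (from row k of A) of the elements covered by X."""
    covered = set().union(*(sets[i] for i in X))
    return sum(row[j] for j in covered)


def sparsity(A, sets):
    """Largest number of rows of A influenced by a single set."""
    return max(
        sum(1 for row in A if any(row[j] > 0 for j in S))
        for S in sets
    )


# Hypothetical instance with n = 4 elements, m = 3 sets and h = 2 constraints.
A = [
    [1.0, 0.5, 0.0, 0.25],
    [0.0, 0.0, 1.0, 1.0],
]
SETS = [{0, 1}, {1, 2}, {3}]

value_row0 = coverage_value(A[0], SETS, {0, 1})  # elements 0, 1, 2 are covered
value_row1 = coverage_value(A[1], SETS, {0, 1})
r = sparsity(A, SETS)
```

Here the first set influences only the first row, while the other two sets influence both rows, so the instance is 2-sparse.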

Theorem 4.1 Consider an instance of r -sparse CCF induced by a set system (U, S) from a deletion-closed family with a β-approximation for Set Cover via the natural LP. There is a randomized polynomial-time algorithm that outputs a feasible solution of expected cost O(β + ln r )OPT.
The natural LP relaxation for CCF is shown in Fig. 3b. It is well-known that this LP relaxation, even for CIPs with only one constraint, has an unbounded integrality gap (Carr et al. 2000). For CIPs knapsack-cover inequalities are used to strengthen the LP. KC-inequalities in this context were first introduced in the influential work of Carr et al. (2000) and have since become a standard tool in developing stronger LP relaxations. Bera et al. (2014) adapt KC-inequalities to the setting of PartitionVC, and it is straightforward to extend this to CCF (this is implicit in Bera et al. (2014)).

Remark 4.2
Weighted coverage functions are a special case of sums of weighted rank functions of matroids. The natural LP for CCF can be viewed as using a different, and in fact a tighter extension, than the multilinear relaxation (Calinescu et al. 2007). The fact that one can use an LP relaxation here is crucial to the scaling idea that will play a role in the eventual algorithm. The main difficulty, however, is the large integrality gap which arises due to the partial covering constraints.
We set up and explain the notation to describe the use of KC-inequalities for CCF. It is convenient here to use the reduction of CCF to Multi-Submod-Cover.
Writing the preceding inequality for every possible choice of D and for every k we obtained a strengthened LP that we show in Fig. 4. CCF-KC-LP has an exponential number of constraints and the separation problem involves submodular functions. A priori it is not clear that there is even an approximate separation oracle. However, as noted in Bera et al. (2014), one can combine the rounding procedure with the Ellipsoid method to obtain the desired guarantees even though we do not obtain a fractional solution that satisfies all the KC inequalities. This observation holds for rounding as well. In particular, it suffices to obtain a fractional solution that satisfies constraints (2), (3), (5), and (6) (note that there are polynomially many such constraints), and constraint 4 for a particular D ⊆ S satisfying certain properties. Then, it can be shown that the ellipsoid algorithm returns such a solution in polynomially many iterations. In the following, we discuss the rounding and the analysis under the assumption that the LP can be solved exactly, and return to the issue of KC inequalities at the end of the section. Rounding and analysis assuming LP can be solved exactly: Let (x, z) be an optimum solution to CCF-KC-LP. We can assume without loss of generality that for each element e j ∈ U we have z j = min{1, i:e j ∈S i x i }. As in Sect. 6 we split the elements in U into heavily covered elements and shallow elements. For some fixed threshold τ that we will specify later, let U he = {e j | z j ≥ τ }, and U sh = U \ U he . We will also choose another threshold. The rounding algorithm is the following.
1. Solve a Set Cover problem via the natural LP to cover all elements in $U_{he}$. Let $Y_1$ be the sets chosen in this step.
2. Let $Y_2 = \{S_i \mid x_i \ge \tau\}$ be the heavy sets.
3. Repeat for $\ell = \Theta(\ln r)$ rounds: independently pick each set with probability $\frac{1}{\tau} x_i$. Let $Y_3$ be the sets chosen in this randomized rounding step.
4. For $k = 1, \ldots, h$, if constraint $k$ is not satisfied: (a) Let $b'_k = b_k - f_k(Y_1 \cup Y_2)$ be the residual requirement of the $k$'th constraint. (b) Run the modified Greedy algorithm to satisfy the residual requirement. Let $F_k$ be the sets chosen to fix the constraint (could be empty).
5. Output $Y_1 \cup Y_2 \cup Y_3 \cup (\cup_{k=1}^{h} F_k)$.

The algorithm at a high level is similar to that in Bera et al. (2014). There are two main differences. First, we explicitly fix the constraints after the randomized rounding phase using a slight variant of the Greedy algorithm. This ensures that the output of the algorithm is always a feasible solution; this makes it easier to analyze the $r$-sparse case, where a straightforward union bound will not work. Second, the analysis relies on a probabilistic inequality that is simpler in the Vertex-Cover case, while it requires a more sophisticated approach here.

We now describe the modified Greedy algorithm to fix a constraint. For an unsatisfied constraint $k$ we consider the collection of sets that influence the residual requirement for $k$, and partition it into $H_k$ and $L_k$. $H_k$ is the collection of all sets such that choosing any of them completely satisfies the residual requirement for $k$, and $L_k$ are the remaining sets. The modified Greedy algorithm for fixing constraint $k$ picks the better of two solutions: (i) the first solution is the cheapest set in $H_k$ (this makes sense only if $H_k \ne \emptyset$) and (ii) the second solution is obtained by running Greedy on sets in $L_k$ until the constraint is satisfied.
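The modified Greedy step can be sketched in Python as follows, for a single constraint given by element coefficients. This is a hedged illustration: the function name `modified_greedy_fix`, the use of plain marginal coverage per unit weight as the greedy rule, and the toy instance are assumptions of this example rather than details specified in the paper.

```python
def modified_greedy_fix(sets, weight, coeff, need):
    """Return the cheaper of (i) the cheapest heavy set and (ii) Greedy on light sets.

    sets: dict name -> set of elements; coeff: dict element -> coefficient;
    need: residual requirement of the constraint being fixed.
    """
    def cov(names):
        covered = set().union(*(sets[n] for n in names))
        return sum(coeff[e] for e in covered)

    influencing = [n for n in sorted(sets) if cov([n]) > 0]
    heavy = [n for n in influencing if cov([n]) >= need]  # H_k
    light = [n for n in influencing if cov([n]) < need]   # L_k

    options = []
    if heavy:  # option (i) only makes sense when H_k is non-empty
        options.append([min(heavy, key=lambda n: weight[n])])

    picked = []  # option (ii): Greedy restricted to L_k
    while cov(picked) < need:
        base = cov(picked)
        gains = {n: cov(picked + [n]) - base for n in light if n not in picked}
        gains = {n: g for n, g in gains.items() if g > 0}
        if not gains:  # the light sets alone cannot meet the requirement
            picked = None
            break
        picked.append(max(gains, key=lambda n: gains[n] / weight[n]))
    if picked is not None:
        options.append(picked)

    if not options:
        raise ValueError("residual requirement cannot be met")
    return min(options, key=lambda sol: sum(weight[n] for n in sol))


# Hypothetical instance: unit coefficients and residual requirement 3.
SETS = {"H": {1, 2, 3, 4}, "a": {1, 2}, "b": {3}, "c": {2, 4}}
COEFF = {1: 1, 2: 1, 3: 1, 4: 1}
fix_light = modified_greedy_fix(SETS, {"H": 5, "a": 1, "b": 1, "c": 3}, COEFF, need=3)
fix_heavy = modified_greedy_fix(SETS, {"H": 1.5, "a": 1, "b": 1, "c": 3}, COEFF, need=3)
```

When the heavy set `H` is expensive the sketch returns the greedy solution on the light sets; when `H` is cheap it returns `H` alone.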
Analysis: We now analyze the expected cost of the solution output by the algorithm. The lemma below bounds the cost of Y 1 .

Lemma 4.3 The cost of $Y_1$ is at most $\frac{\beta}{\tau} \sum_i w_i x_i$.
Proof Recall that $z_j \ge \tau$ for each $e_j \in U_{he}$. Consider $x'_i = \min\{1, \frac{1}{\tau} x_i\}$. It is easy to see that $x'$ is a feasible fractional solution for SC-LP to cover $U_{he}$ using sets in $\mathcal{S}$. Since the set family is deletion-closed, and the integrality gap of the SC-LP is at most $\beta$ for all instances in the family, there is an integral solution covering $U_{he}$ of cost at most $\beta \sum_i w_i x'_i \le \frac{1}{\tau} \beta \sum_i w_i x_i$.

The expected cost of the randomized rounding step is easy to bound.

Lemma 4.4 The expected cost of $Y_3$ is at most $\frac{\ell}{\tau} \sum_i w_i x_i$.
An analog of the following lemma for PartitionVC was proved by Bera et al. (2014). However, in PartitionVC, each element is contained in at most two sets, which is crucially used in their proof. Consequently, their proof does not readily generalize to set systems with unbounded frequency. We rely on tools from submodularity to prove this lemma even in the general case of CCF.
Lemma 4.5 Fix a constraint k. If τ is a sufficiently small but fixed constant, the probability that constraint k is satisfied after one round of randomized rounding is at least a fixed constant c τ .
Before we give a proof of this lemma, we finish the rest of the analysis first. Let $I_k = \{i \mid S_i \text{ influences constraint } k\}$. Note that for any $i \in N$, $|\{k \in [h] : i \in I_k\}| \le r$ by our sparsity assumption.

Lemma 4.6 Let $\rho_k$ be the cost of fixing constraint $k$ if it is not satisfied after randomized rounding. Then $\rho_k \le c_\tau \sum_{i \in I_k} w_i x_i$ for some constant $c_\tau$.
Proof We will assume that $\tau < (1 - 1/e)/2$. Let $D = Y_1 \cup Y_2$ and let $b'_k = b_k - f_k(D)$ be the residual requirement of constraint $k$ after choosing $Y_1$ and $Y_2$. Let $U' = U \setminus U_D$ be the elements in the residual instance; all these are shallow elements. Consider the scaled solution $x'$ where $x'_i = 1$ if $S_i \in D$ and $x'_i = \frac{1}{\tau} x_i$ for other sets. For any shallow element $e_j$ let $z'_j = \min\{1, \sum_{i: j \in S_i} x'_i\}$; since $e_j$ is shallow we have $z'_j = \frac{1}{\tau} z_j = \sum_{i: j \in S_i, i \notin D} x'_i$. Recall from the description of the modified Greedy algorithm that a set $S_i$ is in $H_k$ if choosing it alone satisfies the residual requirement $b'_k$. Suppose $\sum_{i \in H_k} x'_i \ge 1/2$. Then it is not hard to see that the cheapest set from $H_k$ will cover the residual requirement and has cost at most $2 \sum_{i \in H_k} w_i x'_i$ and we are done. We now consider the case when $\sum_{i \in H_k} x'_i < 1/2$. Let $L_k = I_k \setminus H_k$. For each $j \in U'$ let $z''_j = \sum_{i: j \in S_i, i \in L_k} x'_i$. We claim that $\sum_{j \in U'} A_{k,j} z''_j \ge \frac{1}{2\tau} b'_k$. Since $\tau \le (1 - 1/e)/2$ this implies $\sum_{j \in U'} A_{k,j} z''_j \ge \frac{1}{(1 - 1/e)} b'_k$. Assuming the claim, if we run Greedy on $L_k$ to cover at least $b'_k$ elements then the total cost, by Lemma 2.1, is at most $(1 + e) \sum_{i \in L_k} w_i x'_i$; note that we use the fact that no set in $L_k$ has coverage more than $b'_k$ and hence $c = 1$ in applying Lemma 2.1.
We now prove the claim. Since the x, z satisfy KC inequalities: We split the LHS into two terms based on sets in H k and L k . Note that if i ∈ H k then Putting together the preceding two inequalities and the condition that i∈H k x i < 1/2 (recall that We have, by swapping the order of summation, The preceding two inequalities prove the claim.
With the preceding lemmas we can finish the analysis of the total expected cost of the sets output by the algorithm. From Lemma 4.5 the probability that any fixed constraint $k$ is not satisfied after the randomized rounding step is at most $c^{\ell}$, for some constant $c < 1$. By choosing $\ell \ge 1 + \log_{1/c} r$ we can reduce this probability to at most $1/r$. Thus, as in the preceding section, the expected fixing cost is at most $\frac{1}{r} \sum_k \rho_k$. By Lemma 4.6, and since the given instance is $r$-sparse, the expected fixing cost is at most

$$\frac{1}{r} \sum_{k} \rho_k \le \frac{c_\tau}{r} \sum_{k} \sum_{i \in I_k} w_i x_i \le c_\tau \sum_i w_i x_i.$$

Putting together, the total expected cost is at most $O(\beta + \log r) \sum_i w_i x_i$ where the constants depend on $\tau$. We need to choose $\tau$ to be sufficiently small to ensure that Lemma 4.5 holds. We do not attempt to optimize the constants or specify them here.
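The choice of the number of rounds can be checked numerically with a small Python sketch. The helper name `rounds_needed` and the sample values of the per-round failure probability $c$ are illustrative assumptions; the paper itself does not fix these constants.

```python
import math


def rounds_needed(c, r):
    """Smallest integer of the form 1 + ceil(log_{1/c} r); guarantees c**l <= 1/r."""
    if r <= 1:
        return 1
    return 1 + math.ceil(math.log(r) / math.log(1.0 / c))


# Hypothetical per-round failure probabilities and sparsity values.
table = {(c, r): rounds_needed(c, r) for c in (0.5, 0.78, 0.9) for r in (1, 2, 8, 1000)}
```

For every pair in the table the failure probability after `rounds_needed(c, r)` independent rounds is at most $1/r$, and the number of rounds grows as $\Theta(\ln r)$ for fixed $c$.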
Submodularity and proof of Lemma 4.5: We follow some notation that we used in the proof of Lemma 4.6. Let $D = Y_1 \cup Y_2$ and consider the residual instance obtained by removing the elements covered by $D$ and reducing the coverage requirement of each constraint. The lemma is essentially only about the residual instance. Fix a constraint $k$ and recall that $b'_k$ is the residual coverage requirement and that each set in $H_k$ fully satisfies the requirement by itself. Recall that $x'_i = \frac{1}{\tau} x_i \le 1$ for each set $i \notin D$ and $z'_j = \frac{1}{\tau} z_j = \sum_{i: e_j \in S_i} x'_i$ for each residual element $e_j$. As in the proof of Lemma 4.6 we consider two cases. If $\sum_{i \in H_k} x'_i \ge 1/2$ then with probability at least $(1 - 1/\sqrt{e})$ at least one set from $H_k$ is picked and will satisfy the requirement by itself. Thus the interesting case is when $\sum_{i \in H_k} x'_i < 1/2$. Let $U' = \cup_{i \in L_k} S_i$. As we saw earlier, in this case $\sum_{j \in U'} A_{k,j} \min\{1, \sum_{i \in L_k: j \in S_i} x'_i\} \ge \frac{1}{2\tau} b'_k$.

For ease of notation we let $N = L_k$ be a ground set. Consider the weighted coverage function $g : 2^N \rightarrow \mathbb{R}_+$ where $g(T)$ for $T \subseteq L_k$ is given by $\sum_{j \in \cup_{i \in T} S_i} A_{k,j}$. Then for a vector $y \in [0, 1]^N$ the quantity $\sum_{j \in U'} A_{k,j} \min\{1, \sum_{i: j \in S_i} y_i\}$ is the continuous extension $\tilde{g}(y)$ discussed in Sect. 2. Thus we have $\tilde{g}(x') \ge \frac{1}{2\tau} b'_k$. From Lemma 2.2, we have $G(x') \ge (1 - 1/e) \frac{1}{2\tau} b'_k$ where $G$ is the multilinear extension of $g$. If we choose $\tau \le (1 - 1/e)/4$ then $G(x') \ge 2 b'_k$. Let $Z$ be the random variable denoting the value of $g(R)$ where $R \sim x'$. Independent random rounding of $x'$ preserves $G(x')$ in expectation by the definition of the multilinear extension, therefore $E[Z] = G(x') \ge 2 b'_k$. Moreover, by Lemma 2.4, $Z$ is concentrated around its expectation since $g(\{i\}) \le b'_k$ for each $i \in L_k$. An easy calculation (Lemma 2.4 applied to $g / b'_k$ with $\delta = 1/2$) shows that $\Pr[Z < b'_k] \le e^{-1/4} < 0.78$. Thus with constant probability $g(R) \ge b'_k$.
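The relation between the two extensions used in this proof can be checked on a small example. The Python sketch below evaluates, for a weighted coverage function, the LP-style extension $\tilde{g}$, the multilinear extension $G$ in closed form, and the expectation of $g(R)$ by brute-force enumeration. The function names and the toy instance are assumptions of this illustration.

```python
from itertools import combinations
from math import e, prod


def lp_extension(coeff, sets, y):
    """g~(y) = sum_j A_j * min(1, sum of y_i over sets containing j)."""
    return sum(a * min(1.0, sum(y[i] for i in sets if j in sets[i]))
               for j, a in coeff.items())


def multilinear_extension(coeff, sets, y):
    """G(y) = sum_j A_j * (1 - prod over sets containing j of (1 - y_i))."""
    return sum(a * (1.0 - prod(1.0 - y[i] for i in sets if j in sets[i]))
               for j, a in coeff.items())


def expected_value(coeff, sets, y):
    """E[g(R)] when each set i is picked independently with probability y_i."""
    names = sorted(sets)
    total = 0.0
    for size in range(len(names) + 1):
        for R in combinations(names, size):
            p = prod(y[i] if i in R else 1.0 - y[i] for i in names)
            covered = set().union(*(sets[i] for i in R))
            total += p * sum(coeff[j] for j in covered)
    return total


# Hypothetical instance: two sets sharing one element of larger coefficient.
COEFF = {1: 1.0, 2: 2.0, 3: 1.0}
SETS = {"a": {1, 2}, "b": {2, 3}}
Y = {"a": 0.5, "b": 0.5}

g_tilde = lp_extension(COEFF, SETS, Y)
G = multilinear_extension(COEFF, SETS, Y)
EZ = expected_value(COEFF, SETS, Y)
```

On this instance the enumeration agrees with the closed form for $G$, and $G$ is at least a $(1 - 1/e)$ fraction of $\tilde{g}$, as the proof uses.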

Solving the LP with KC inequalities:
The proof of the performance guarantee of the algorithm relies on the fractional solution satisfying KC inequalities with respect to the set D = Y 1 ∪ Y 2 . Thus, given a fractional solution (x, z) for the LP we can check the easy constraints in polynomial time and implement the first two steps of the algorithm. Once Y 1 , Y 2 are determined we have D and one can check if (x, z) satisfies KC inequalities with respect to D (for each row of A). If it does then the rest of the proof goes through and performance guarantee holds with respect to the cost of (x, z) which is a lower bound on OPT. If some constraint does not satisfy the KC inequality with respect to D we can use this as a separation oracle in the Ellipsoid method.

Facility location and minimum sum of radii clustering
In this section, we consider two well-studied problems related to clustering in the setting where there are r partial covering constraints. As we mentioned previously, the generalization of Facility Location problem does not quite fit in the Multi-Submod-Cover framework. However, we are able to adapt the techniques to this problem. We obtain O(log r )-approximations for these two problems and they are treated in the next two sections.

Facility location with multiple outliers
Here, we consider a generalization of the Facility Location Problem that is analogous to Multi-Partial-SC. We show how to adapt the randomized rounding framework, along with existing LP-based approximation algorithms for the standard Facility Location problem, to obtain an O(log r ) approximation for this generalization.
In FL-Multi-Outliers, we are given a set of facilities $F$ and a set of clients $C$, belonging to a metric space $(F \cup C, d)$. Each facility $i \in F$ has a non-negative opening cost $f_i$. We are given $r$ non-empty subsets of clients $C_1, \ldots, C_r$ (color classes) that partition the set $C$ of clients. Each color class $C_k$ has a connection requirement $1 \le b_k \le |C_k|$. The objective of FL-Multi-Outliers is to find a solution $(F^*, C^*)$ that minimizes $\sum_{i \in F'} f_i + \sum_{j \in C'} d(j, F')$ over all feasible solutions $(F', C')$. We say that a solution $(F', C')$ is feasible if (i) $|F'| \ge 1$ and (ii) for all classes $C_k$, $|C_k \cap C'| \ge b_k$. Note that the special case with just one class is the Robust Facility Location problem, first considered by Charikar et al. (2001).
A natural LP formulation of this problem is as follows. Next, we discuss strengthening of this LP by adding KC inequalities and solving the strengthened LP.
Solving a Strengthened LP: First, we convert the LP into a feasibility LP by guessing the optimal cost up to a factor of 2, say $\Delta$, and by adding a cost constraint $\sum_{i \in F} f_i x_i + \sum_{i \in F, j \in C} y_{ij} \cdot d(i, j) \le \Delta$. Similar to Sect. 4, we use the Ellipsoid algorithm to find a feasible LP solution that satisfies the cost constraint, constraints 7 to 10, and an additional KC inequality specified below. Fix an LP solution that satisfies all these constraints (except possibly the additional one). With respect to this solution, let $H = \{j \in C \mid \sum_{i \in F} y_{ij} \ge \tau\}$ be the set of heavy clients, where $\tau$ is a constant as in Sect. 4.
Let $L = C \setminus H$ be the set of light clients. Note that for any light client $j \in L$, $z_j \le \sum_{i \in F} y_{ij} < \tau$. Also, for a class $C_k$, let $C_k(H) := C_k \setminus H$ denote the light clients from $C_k$, and let $b'_k := b_k - |C_k \cap H|$ denote its residual connection requirement. Now, we check whether the following constraint (Constraint 11) holds for all color classes $C_k$:

$$\sum_{i \in F} \min\Big\{ x_i \cdot b'_k,\ \sum_{j \in C_k(H)} y_{ij} \Big\} \ge b'_k.$$

First, note that this can be formulated as an LP constraint by introducing auxiliary variables. It is easy to see that any integral solution satisfies this constraint for any $H \subseteq C$, and hence it is valid. If this constraint is not satisfied for some class $C_k$, we report it as a violated constraint to the Ellipsoid algorithm.
Rounding: Now, suppose we have an LP solution (x, y, z) that satisfies 7 to 10, and has cost at most Δ. Let H and L be the corresponding heavy and light clients. Furthermore, suppose the LP solution satisfies Constraint 11 with respect to H and L.
For any $i \in F$, let $x'_i := \min\{1, \frac{1}{\tau} x_i\}$, and for any $i \in F$, $j \in H$, let $y'_{ij} := \min\{1, \frac{1}{\tau} y_{ij}\}$. It is easy to see that $(x', y')$ is a feasible Facility Location (without outliers) LP solution for the instance induced by the heavy clients, and its cost is at most $\Delta/\tau$. We use an LP-based algorithm (such as Byrka et al. (2010)) with a constant approximation guarantee to round this solution to an integral solution $(F_H, H)$.

For handling light clients, we "split" the facilities into multiple co-located copies if necessary, so that the following two conditions hold: 1. For any facility $i \in F$, $x_i < \tau$. 2. For any client $j \in L$ and any facility $i \in F$, $y_{ij} > 0 \Rightarrow y_{ij} = x_i$. This has to be done in a careful manner; we give the details in Appendix A. This procedure results in a feasible LP solution of essentially the same cost. Henceforth, we treat all co-located copies of a facility as distinct facilities for the sake of the analysis. We now show that the rounding for the light clients can be reduced to the randomized rounding algorithm for Multi-Partial-SC from the previous section.
For any facility i ∈ F, let S i := { j ∈ L | x i = y i j } denote the set of light clients that are fractionally connected to i. The cost of opening facility i and connecting all j ∈ S i to i is equal to w i := f i + j∈S i d (i, j). Consider a Multi-Partial-SC instance (U, S), where S = {S i | i ∈ F} with weights w i , and residual coverage requirement b k for each class C k (H ). We obtain an LP solution (x, z) for this instance of Multi-Partial-SC from the LP solution (x, y, z) for the Facility Location problem, by taking the variables x i for i ∈ F and z j for j ∈ L. The following properties are satisfied by this LP solution (x, z).
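The construction of the Multi-Partial-SC instance from the facility location LP solution can be sketched in Python as follows. The function name `light_client_sets`, the dictionary-based representation of $x$, $y$ and $d$, and the use of exact fractions (so that the test $y_{ij} = x_i$ is meaningful) are assumptions of this illustration. The sketch also requires $y_{ij} > 0$, which matches the description of $S_i$ as the light clients fractionally connected to $i$.

```python
from fractions import Fraction


def light_client_sets(x, y, dist, opening_cost, light):
    """Build the sets S_i and weights w_i = f_i + sum of d(i, j) over j in S_i."""
    sets, weights = {}, {}
    for i in x:
        S_i = {j for j in light
               if y.get((i, j), 0) > 0 and y[(i, j)] == x[i]}
        sets[i] = S_i
        weights[i] = opening_cost[i] + sum(dist[(i, j)] for j in S_i)
    return sets, weights


# Hypothetical LP solution after splitting: every positive y_ij equals x_i.
quarter = Fraction(1, 4)
X = {"f1": quarter, "f2": quarter}
Y = {("f1", "c1"): quarter, ("f1", "c2"): quarter, ("f2", "c3"): quarter}
DIST = {("f1", "c1"): 1, ("f1", "c2"): 2, ("f2", "c3"): 3}
OPEN = {"f1": 10, "f2": 5}

S, W = light_client_sets(X, Y, DIST, OPEN, light=["c1", "c2", "c3"])
```

Each weight $w_i$ charges the opening cost of the facility together with the connection cost of all clients in $S_i$, so selecting $S_i$ in the set cover instance pays for opening $i$ and connecting its clients.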
1. All the elements are light, and all the sets S i ∈ S have x i < τ. 2. The costs of the two LP solutions are equal: 3. Constraint 11 is equivalent to: This is exactly the KC inequality (1) required for rounding Multi-Partial-SC.
Therefore, we can use the randomized rounding plus alteration algorithm from Sect. 4 to obtain a solution Y for the Multi-Partial-SC. It has cost at most O(log r ) · Δ, and for each class C k (H ), it covers at least b k clients. To obtain a solution for the original facility location problem, we open a facility i ∈ F, if its corresponding set S i is selected in Y , and connect to it all the clients in S i . Notice that we connect b k clients from C k (H ) to the set of opened facilities in this manner. The cost of this solution is upper bounded by w(Y ) ≤ O(log r ) · Δ. Combining this with the solution (F H , H ) for the heavy clients of cost at most O(1) · Δ, we obtain our overall solution for the given instance. It is easy to see that this is an O(log r ) approximation.
Theorem 5.1 There is a randomized polynomial-time algorithm that outputs a feasible solution of expected cost O(log r )· OPT for FL-Multi-Outliers.

Minimum sum of radii clustering with multiple outliers
Here, we are given a set of facilities F, a set of clients C and a metric space (F ∪C, d). Each facility i ∈ F has a non-negative opening cost f i . We are given r classes of clients C 1 , . . . , C r . Each color class C k has a coverage requirement b k where 1 ≤ b k ≤ |C k |. A ball centered at a facility i ∈ F of radius ρ ≥ 0 is the set B(i, ρ) := { j ∈ C | d(i, j) ≤ ρ}. The goal is to select a set of balls B = {B i = B(i, ρ i ) | i ∈ F ⊆ F} centered at some subset of facilities F ⊆ F, such that (i) the set of balls B satisfies 2(β + 1)-approximation. The rounding algorithm in  can be seen as an adaptation of pipage rounding Ageev and Sviridenko (2004) for Max-Budgeted-Cover. The details are somewhat technical and perhaps obscure the high-level intuition that scaling up the LP solution allows one to use a bicriteria approximation for Max-Budgeted-Cover. Our contribution is to simplify the fourth step in the preceding algorithm. Here is the last step in our algorithm; the other steps are the same modulo the specific choice of τ . 4'. Run Greedy to cover k elements from U .
We now analyze the performance of our modified algorithm. Lemma 6.1 Suppose τ ≤ (1 − 1/e). Then running Greedy in the final step outputs a solution of total weight at most Proof It is easy to see that e j ∈U z * j ≥ k since e j ∈U z * j ≥ k and z * j ≤ 1 for each e j . Let (U , S ) be the set system obtained by restricting (U, S) to U , and let (x , z ) be the restriction of (x * , z * ) to the set system (U , S ). We have ( Consider (x , z ) obtained from (x , z ) as follows. For each e j ∈ U set z j = 1 τ z j and note that z j ≤ 1. For each set S i set is also a feasible solution to the LP formulation MBC-LP. We apply Lemma 2.1 to this fractional solution. Suppose we stop Greedy when it covers k elements or when it first crosses the budget B, whichever comes first. Clearly the total weight is at most B + max i w i . We argue that at least k elements are covered when we stop Greedy. The only case to argue is when Greedy is stopped when the weight of sets picked by it exceeds B for the first time. From Lemma 2.1 it follows that Greedy covers at least (1 − 1/e)Z elements but since Z ≥ 1 τ k it implies that Greedy covers at least k elements when it is stopped.
We formally state a lemma to bound the cost of covering U h . The proof of this lemma is identical to that of Lemma 4.3, and therefore omitted.

Lemma 6.2 The cost of covering $U_h$ is at most $\frac{\beta}{\tau} \sum_i w_i x^*_i$.
Finally, we can analyze the approximation guarantee of the overall solution.
Theorem 6.3 Setting $\tau = (1 - 1/e)$, the algorithm outputs a feasible solution of total cost at most $\frac{e}{e-1}(\beta + 1)\mathrm{OPT}$ where $\mathrm{OPT}$ is the value of an optimum integral solution.

Proof Fix an optimum solution. Let $W$ be the weight of a maximum weight set in the optimum solution. In the first step of the algorithm we can assume that the algorithm has correctly guessed a maximum weight set from the fixed optimum solution. Let $\mathrm{OPT}' = \mathrm{OPT} - W$. In the residual instance the weight of every set is at most $W$. The optimum solution value for PSC-LP, after guessing the largest weight set and removing it, is at most $\mathrm{OPT}'$. From Lemma 6.2, the cost of covering $U_h$ is at most $\frac{e}{e-1} \beta\, \mathrm{OPT}'$.

Concluding remarks
The paper shows the utility of viewing Partial-SC and its generalizations as special cases of Multi-Submod-Cover. The coverage function in set systems is a submodular function that belongs to the class of sums of weighted matroid rank functions. Certain ideas for the coverage function extend to this larger class. Are there interesting problems that can be understood through this viewpoint? Are there other special classes of submodular functions for which one can obtain uni-criteria approximation algorithms for Multi-Submod-Cover, unlike the bicriteria one we presented? An interesting example is the problem considered in Har-Peled and Jones (2018). The algorithm in this paper for Multi-Partial-SC, like the ones in Bera et al. (2014), relies on using the Ellipsoid method to solve the LP with KC inequalities. It may be possible to avoid the inherent inefficiency in this way of solving the LP via some ideas from recent and past work (Carr et al. 2000;.

A Splitting Facilities
In this section, we show how to "split" the facilities in order to satisfy the following properties required by our rounding algorithm for the Facility Location with Multiple Outliers problem.
1. For any facility $i \in F$, $x_i < \tau$.
2. For any client $j \in L$ and any facility $i \in F$, $y_{ij} > 0 \Rightarrow y_{ij} = x_i$.
3. $\sum_{i \in F} \min\{x_i \cdot b_k, \sum_{j \in C_k(H)} y_{ij}\}$
Let $0 < \delta < \tau$ be a small quantity such that all positive $x$- and $y$-values are integral multiples of $\delta$. Note that, assuming all the variables are rational, such a $\delta$ must exist.
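One way to see that such a $\delta$ exists for rational values is to take the reciprocal of a common denominator and shrink it until it drops below $\tau$. The sketch below is illustrative only; the function name `common_unit` and the halving step are assumptions, not part of the paper, and it does not address keeping $1/\delta$ polynomially bounded.

```python
from fractions import Fraction
from math import lcm  # available in Python 3.9+


def common_unit(values, tau):
    """Return delta with 0 < delta < tau such that every positive value
    in `values` (given as Fractions) is an integer multiple of delta."""
    positive = [v for v in values if v > 0]
    # 1 / lcm(denominators) divides every value exactly.
    delta = Fraction(1, lcm(*(v.denominator for v in positive)))
    # Halving delta keeps every value an integer multiple of it.
    while delta >= tau:
        delta /= 2
    return delta


values = [Fraction(1, 2), Fraction(3, 4), Fraction(1, 4), Fraction(0)]
delta = common_unit(values, Fraction(1, 5))
multiples = [v / delta for v in values]  # all integers
```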
Fix any facility $i \in F$. Let $X = x_i/\delta$ and $Y_j = y_{ij}/\delta$ for any $j \in L$. By assumption, $X$ and $Y_j$ are integers, and by the LP constraint, $Y_j \le X$ for any $j \in L$. We replace $i$ with $X$ co-located copies, and denote this set by $\mathrm{copies}(i) = \{i_1, i_2, \ldots, i_X\}$. The $x$-value of each copy is set to $\delta$. For clarity, we denote the new $x$-values of the copies by $\tilde{x}$. Now, fix any class $C_k$. For any client $j \in C_k(H)$ with $y_{ij} > 0$, we will connect $j$ to $Y_j$ distinct copies from $\mathrm{copies}(i)$, and set the $y$-value of each such assignment equal to $\delta$ (which we will again denote by $\tilde{y}$ for clarity). Notice that this will satisfy the second property. For any $\ell \in \mathrm{copies}(i)$, let $C_k^\ell$ denote the clients from $C_k(H)$ that will be assigned to $\ell$ in this manner. We will also satisfy the following property while making these assignments.
$$\sum_{\ell \in \mathrm{copies}(i)} \tilde{x}_\ell \cdot \min\{b_k, |C_k^\ell|\} \ge \min\Big\{x_i \cdot b_k, \sum_{j \in C_k(H)} y_{ij}\Big\} \qquad (13)$$
Notice that the term on the RHS is exactly the contribution of $i$ to the LHS of Constraint 12, whereas the sum on the LHS is the contribution of the copies after splitting. Therefore, maintaining this property for all the facilities will guarantee that Constraint 12 holds at the end. Now, we consider two cases about the term on the RHS of Constraint 13.
Case 1: $x_i \cdot b_k \ge \sum_{j \in C_k(H)} y_{ij}$, that is, $\sum_{j \in C_k(H)} Y_j \le X \cdot b_k$. We process clients $j \in C_k(H)$ with $Y_j > 0$ in an arbitrary order. Let $j$ be the first client in this order. We connect it to the copies $i_1, i_2, \ldots, i_{Y_j}$, i.e., set $y_{i_1 j} = y_{i_2 j} = \cdots = y_{i_{Y_j} j} = \delta$. Since $Y_j \le X$, we connect $j$ to at most $X$ copies. Now, we consider the second client $j'$, and connect it to the copies $i_{Y_j + 1}, i_{Y_j + 2}, \ldots$, where the indices in the subscript are used modulo $X$. That is, after using $i_X$ for a connection, we use $i_1$ for the next connection. We continue in this manner for all clients in $C_k(H)$.
Since $Y_j \le X$ and $\sum_{j \in C_k(H)} Y_j \le X \cdot b_k$, it is easy to see that we process all clients while connecting at most $b_k$ clients to each copy. Therefore, we ensure
$$\sum_{\ell \in \mathrm{copies}(i)} \min\{b_k, |C_k^\ell|\} \ge \sum_{j \in C_k(H)} Y_j.$$
Multiplying both sides by $\delta$, we obtain $\sum_{\ell \in \mathrm{copies}(i)} \tilde{x}_\ell \cdot \min\{b_k, |C_k^\ell|\} \ge \sum_{j \in C_k(H)} y_{ij}$, thus ensuring (13).
Case 2: $x_i \cdot b_k < \sum_{j \in C_k(H)} y_{ij}$, that is, $\sum_{j \in C_k(H)} Y_j > X \cdot b_k$. In this case, we arbitrarily and integrally decrease the $Y$-values of clients to obtain $Y'$-values, so that we have $X \cdot b_k = \sum_{j \in C_k(H)} Y'_j$. Now, we use the assignment scheme from the previous case to obtain:
$$\sum_{\ell \in \mathrm{copies}(i)} \min\{b_k, |C_k^\ell|\} \ge \sum_{j \in C_k(H)} Y'_j = X \cdot b_k \qquad (14)$$
Now, we increase $Y'_j$ back to $Y_j$ and arbitrarily connect client $j$ to $Y_j - Y'_j$ copies to which it is not already connected. Again, since $Y_j \le X$, there are enough copies available for making these extra connections. This may increase $|C_k^\ell|$, which can only increase the LHS of (14). Therefore, we ensure that:
$$\sum_{\ell \in \mathrm{copies}(i)} \min\{b_k, |C_k^\ell|\} \ge X \cdot b_k.$$
Multiplying both sides by $\delta$, we obtain $\sum_{\ell \in \mathrm{copies}(i)} \tilde{x}_\ell \cdot \min\{b_k, |C_k^\ell|\} \ge x_i \cdot b_k = \min\{x_i \cdot b_k, \sum_{j \in C_k(H)} y_{ij}\}$, thus ensuring (13).
Since any client $j \in L$ belongs to exactly one class, it is part of exactly one reassignment process for each original facility $i$. Therefore, processing each facility and each class in this manner will result in a new set of facilities that satisfies all the desired properties. In particular, note that for every facility $\ell$ after splitting, we have $\tilde{x}_\ell = \delta < \tau$. Furthermore, $x_i = \sum_{\ell \in \mathrm{copies}(i)} \tilde{x}_\ell$ and $y_{ij} = \sum_{\ell \in \mathrm{copies}(i)} \tilde{y}_{\ell j}$ for all $i \in F$, $j \in L$. Therefore, the cost of the new solution is equal to the cost of the original solution.
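The reassignment for a single facility and a single class can be sketched in units of $\delta$ as follows. This is an illustrative sketch under stated assumptions rather than the paper's code: the name `split_assignments`, the 0-based copy indices, and the particular way of trimming and restoring connections are hypothetical choices among the arbitrary ones the argument allows.

```python
def split_assignments(X, Y, b):
    """Connect each client j to Y[j] distinct copies among 0..X-1.

    X: number of copies (x_i / delta).
    Y: dict client -> y_ij / delta, with 1 <= Y[j] <= X.
    b: the class bound b_k.
    Returns a list with, for each copy, the set of clients connected to it.
    """
    assert all(1 <= y <= X for y in Y.values())
    # If the Y-values sum to more than X * b, trim them down to exactly X * b.
    excess = max(0, sum(Y.values()) - X * b)
    trimmed = {}
    for j, y in Y.items():
        cut = min(excess, y)
        trimmed[j] = y - cut
        excess -= cut
    # Round-robin: consecutive copies, with indices taken modulo X.
    copies = [set() for _ in range(X)]
    nxt = 0
    for j, y in trimmed.items():
        for _ in range(y):
            copies[nxt].add(j)
            nxt = (nxt + 1) % X
    # Restore trimmed connections on copies the client does not use yet.
    for j, y in Y.items():
        free = [c for c in copies if j not in c]
        for c in free[: y - trimmed[j]]:
            c.add(j)
    return copies


Y = {"a": 2, "b": 2, "c": 1}
copies = split_assignments(2, Y, 1)
# Property (13) divided by delta: sum of min(b, |C|) >= min(X * b, sum of Y).
lhs = sum(min(1, len(c)) for c in copies)
```

When the trimming step does nothing, the code reduces to the plain round-robin scheme, so both cases of the argument go through the same loop.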
Note that using standard tricks, we can ensure that $1/\delta$ is polynomial in the input, at the expense of a tiny increase in the cost of the solution, which can be absorbed in the $O(\log r)$ approximation guarantee. Thus, the splitting procedure runs in polynomial time, and the size of the resulting instance is also polynomial in the input.