Algorithmica, Volume 74, Issue 4, pp 1205–1223

Shrinking Maxima, Decreasing Costs: New Online Packing and Covering Problems

  • Pierre Fraigniaud
  • Magnús M. Halldórsson
  • Boaz Patt-Shamir
  • Dror Rawitz
  • Adi Rosén

Abstract

We consider two new variants of online integer programs that are duals. In the packing problem we are given a set of items and a collection of knapsack constraints over these items that are revealed over time in an online fashion. Upon arrival of a constraint we may need to remove several items (irrevocably) so as to maintain feasibility of the solution. Hence, the set of packed items becomes smaller over time. The goal is to maximize the number, or value, of packed items. The problem originates from a buffer-overflow model in communication networks, where items represent information units broken into multiple packets. The other problem considered is online covering: there is a universe to be covered. Sets arrive online, and we must decide for each set whether we add it to the cover or give it up. The cost of a solution is the total cost of sets taken, plus a penalty for each uncovered element. The number of sets in the solution grows over time, but its cost goes down. This problem is motivated by team formation, where the universe consists of skills, and sets represent candidates we may hire. The packing problem was introduced in Emek et al. (SIAM J Comput 41(4):728–746, 2012) for the special case where the matrix is binary; in this paper we extend the solution to general matrices with non-negative integer entries. The covering problem is introduced in this paper; we present matching upper and lower bounds on its competitive ratio.

Keywords

Competitive analysis · Randomized algorithm · Packing integer programs · Online set packing · Team formation · Prize-collecting multi-covering

1 Introduction

In this paper we study two related online problems based on the classic packing and covering integer programs. The first is a general packing problem called Online Packing Integer Programs (abbreviated opip). In this problem we are given a set of \(n\) items and a collection of knapsack constraints over these items. Initially the constraints are unknown and all items are considered packed. In each time step, a new constraint arrives, and the online algorithm needs to remove some items (irrevocably) so as to maintain feasibility of its solution. The goal is to maximize the number, or value, of packed items. Formally, the offline version of the problem we consider is expressed by the following linear integer program (\(\mathbb {N}\) denotes the set of non-negative integers):
$$\begin{aligned} \begin{array}{lll} \max &\quad \sum \limits _{j=1}^n b_j x_j \\ \text {s.t.} &\quad \sum \limits _{j=1}^n a_{ij} x_j \le c_i &\quad \forall i \\ &\quad x_j \le p_j &\quad \forall j \\ &\quad x_j \in \mathbb {N}&\quad \forall j \end{array} \end{aligned}$$
(PIP)
We assume that \(A \in \mathbb {N}^{m \times n}\) and \(c \in \mathbb {N}^m\). The value of \(x_j\) represents the number of copies of item \(j\) that are packed, \(p_j\) is a cap (an upper bound) on the number of copies of item \(j\), \(b_j\) is the benefit obtained by packing item \(j\), and \(c_i\) is the capacity of the \(i\)th constraint. The online character of opip is expressed by the following additional assumptions: (1) knapsack constraints arrive one by one, and (2) the variables can only be decreased. The special case where \(A \in \{0,1\}^{m \times n}\) and \(c = 1^m\) is known as Online Set Packing [8].
An LP-relaxation of (PIP) is obtained by replacing the integrality constraints by \(x_j \ge 0\), for every \(j\). It follows that the integral version of the dual of the LP-relaxation is:
$$\begin{aligned} \begin{array}{lll} \min &\quad \sum \limits _{i=1}^m c_i y_i + \sum \limits _{j=1}^n p_j z_j \\ \text {s.t.} &\quad \sum \limits _{i=1}^m a_{ij} y_i + z_j \ge b_j &\quad \forall j \\ &\quad y_i \in \mathbb {N}&\quad \forall i \\ &\quad z_j \in \mathbb {N}&\quad \forall j \end{array} \end{aligned}$$
(TF)
The program (TF) describes the offline version of the second problem considered in this paper, called the Team Formation problem (for reasons that will become apparent below). In this problem we are given \(n\) elements, where element \(j\) has a covering requirement \(b_j\) and a penalty \(p_j\). There are \(m\) sets,¹ where the coverage of element \(j\) by set \(i\) is \(a_{ij}\), and the cost of set \(i\) is \(c_i\). A solution is a collection of the sets, where multiple copies of a set are allowed. The cost of a solution is the cost of the selected sets plus the penalties for unsatisfied covering requirements. In (TF), the value of \(y_i\) represents the number of copies of set \(i\) taken by the solution, and \(z_j\) is the amount of unsatisfied coverage of element \(j\) (for which we pay a penalty).

Our online version of the Team Formation problem, denoted otf, is as follows. Initially, the elements are uncovered, and hence each element \(j\) incurs a penalty \(p_j\) per unit of unsatisfied covering requirement. Sets with various coverages and costs arrive online. In each time step, a new set arrives, and the algorithm must decide how many copies of the arriving set to add to the solution. The goal is to minimize the total cost of sets taken plus penalties for uncovered elements.
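As a small numerical illustration (the numbers are ours): suppose there are two elements with requirements \(b = (1,1)\) and penalties \(p = (5,5)\), and a single set arrives with coverage \(a = (1,1)\) and cost \(4\). Taking one copy costs \(4\) and leaves nothing uncovered; declining it costs \(p_1 + p_2 = 10\) in penalties, so an algorithm that declines pays \(2.5\) times the optimum on this instance.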

Our main measure, as is customary with online algorithms, is the competitive ratio: in the covering case, the ratio of the cost incurred by the algorithm (expected cost, if the algorithm is randomized) to the best possible cost for the given instance, and in the packing case, the ratio between the benefit earned by the optimum solution and the (expected) benefit earned by the algorithm.

Motivation. The otf problem is an abstraction of the following situation (corresponding to a binary matrix and binary requirements). We are embarking on a new project that requires some \(n\) skills. The requirement for skill \(j\) can be satisfied by outsourcing for some cost \(p_j\), or by hiring an employee who possesses skill \(j\). The goal is to minimize the project cost under the following procedure: We interview candidates one by one. After each interview we know the skills and the hiring cost of the candidate and must then decide irrevocably whether to hire the candidate.

The opip problem originates from the following natural networking situation [8]. High-level information units, called frames, can be too large to fit in a single network packet, in which case the frames are fragmented into multiple packets. As packets traverse the network, they may arrive at a bottleneck link that cannot deliver them all, giving rise to a basic online question: which packets to drop so as to maximize the number of frames that are delivered in full. If we ignore buffers, this question is precisely our version of opip. Namely, in each time step \(i\), a burst of packets arrives, corresponding to the \(i\)th constraint in (PIP): \(a_{ij}\) is the size of the packet from frame \(j\) that arrives at step \(i\), and \(c_i\) is the total size that the link can deliver at time \(i\).

Our problems appear unique in the literature of online computation in that solutions get progressively smaller with time. Traditionally, the initial solution is expected to be the empty set, and its value or cost only increases as the input is progressively presented. In our class of problems, some aspects of the input are known, inducing a naïve initial solution. The presented input progressively elucidates the structure of the instance, adding more constraints (in maximization problems) or providing increasing opportunities for cost reductions or optimizations (in minimization problems). In reality, the issue is often less what to include than what to keep. We feel that this complementary viewpoint is natural and deserves further treatment.

Contribution and Results. The contributions of this paper are twofold. On the conceptual level, we are the first to formalize the otf problem, to the best of our knowledge (the opip problem was introduced in [8]). On the technical level, we present nearly tight results for both the opip and the otf problems.

For opip, we extend the results of [8] from a binary matrix to the case of general non-negative integer demands. This is a useful extension when we consider our motivating network bottleneck scenario: it allows the algorithm to deal with packets of different sizes, while previous solutions were restricted to uniform-size packets. The competitive ratio of our algorithm is \(O(C_{\mathrm{max}} \sqrt{\rho _{\mathrm{max}}})\), where \(C_{\mathrm{max}}\) is the maximal sum of entries in a column, and \(\rho _{\mathrm{max}}\) is the maximal ratio of the load on constraint \(i\), namely \(\sum _j p_j a_{ij}\), to its capacity \(c_i\). Observe that in the case of unit caps (i.e., \(p=1\)), \(\rho _{\mathrm{max}}\) is the maximal ratio of the sum of entries in a row \(i\) to its capacity \(c_i\). We remark that the extension is non-trivial, although it uses known techniques.

Regarding otf, we prove matching upper and lower bounds on the competitive ratio: We show that even randomized algorithms cannot have competitive ratio better than \(\Omega (\sqrt{\rho _{\mathrm{max}}})\), where \(\rho _{\mathrm{max}}\) is the maximal ratio, over all elements, between the highest and lowest cost of covering a given element. This result holds even for the case where the algorithm may discard a set from its running solution (but never takes back a set that was dismissed). On the other hand, we give a simple deterministic algorithm with a competitive ratio of \(O(\sqrt{\rho _{\mathrm{max}}})\). The algorithm requires prior knowledge of the value of \(\rho _{\mathrm{max}}\); we show that without such knowledge only the trivial \(O(\rho _{\mathrm{max}})\) bound is possible.

We note that our techniques can be used for the variant of otf in which \(y_i\) is bounded (e.g., there is only one copy of a given candidate).

Related Work. Online packing was studied in the past, but traditionally the elements of the universe (equivalently, the constraints) are given ahead of time and sets arrive on-line (e.g., in [2]). In a similar vein, online set cover was defined in [1] as follows. A collection of sets is given ahead of time. Elements arrive online, and the algorithm is required to maintain a cover of the elements that arrived: if the arriving element is not already covered, then some set from the given collection must be added to the solution. Our problems have the complementary view of what is known in advance and what arrives online (see also [5]).

Let us first review some results for the offline packing problem pip. The single constraint case (\(m=1\)) is simply the Knapsack problem, which is NP-hard and has an FPTAS [17, 21]. If the number of constraints is constant, the offline version of pip becomes the Multi-dimensional Knapsack problem, which has a PTAS [11], while obtaining an FPTAS is NP-hard [18]. Raghavan and Thompson [20] used randomized rounding to obtain solutions whose benefit is \(t_1 = \Omega (\textsc {opt}/m^{1/\alpha })\) for pip, where \(\alpha = \min _i \min _j \frac{c_i}{a_{ij}}\). A solution of benefit \(t_2 = \Omega (\textsc {opt}/m^{1/(\alpha +1)})\) is also given for the case where \(A \in \left\{ 0,1 \right\} ^{m \times n}\) (in this case \(\alpha = \min _i c_i\)). Srinivasan [22] improved these results by obtaining solutions whose benefits are \(\Omega (t_1^{\alpha /(\alpha -1)})\) and \(\Omega (t_2^{\alpha /(\alpha -1)})\). Chekuri and Khanna [6] showed that, for every fixed integer \(\alpha \) and fixed \(\varepsilon >0\), pip with \(c = \alpha ^m\) and \(A \in \left\{ 0,1 \right\} ^{m \times n}\) cannot be approximated within a factor of \(m^{1/(\alpha +1) - \varepsilon }\), unless NP \(=\) ZPP. They also showed that pip with uniform capacities cannot be approximated within a factor of \(m^{1/(\alpha +1) - \varepsilon }\), unless NP \(=\) ZPP, even with a resource augmentation factor \(\alpha \) (in this case the solution \(x\) satisfies \(Ax \le \alpha c\)).

As mentioned before, the special case of pip where \(A \in \{0,1\}^{m \times n}\) and \(c = 1^m\) is known as set packing. This problem is as hard as Maximum Independent Set even when all elements have degree \(2\) (i.e., \(A\) contains at most two non-zero entries in each row), and therefore cannot be approximated to within a factor of \(n^{1-\varepsilon }\), for any \(\varepsilon >0\) [15]. In terms of the number of elements (constraints, in pip terms), set packing is \(O(\sqrt{m})\)-approximable and hard to approximate within \(m^{1/2-\varepsilon }\), for any \(\varepsilon >0\) [13]. When set sizes are at most \(k\) (\(A\) contains at most \(k\) non-zero entries in each column), it is approximable to within \((k+1)/3+\varepsilon \), for any \(\varepsilon > 0\) [7], and within \((k+1)/2\) in the weighted case [4], but known to be hard to approximate to within a factor of \(\Omega (k/\log k)\) [16].

opip was introduced in [8], assuming that the matrix is binary, namely that each set requires either one or zero copies of each item. A randomized algorithm was given for that case with a competitive ratio of \(O(k\sqrt{\nu })\), where \(k\) is the maximal set size and \(\nu \) is the maximal ratio, over all items, between the number of sets containing that item and the number of its copies. In opip terms this bound is \(O(C_{\mathrm{max}} \sqrt{\rho _{\mathrm{max}}})\). A nearly matching lower bound of \(\tilde{\Omega }(k\sqrt{\nu })\) was also given for the unit capacities case. This translates to an \(\tilde{\Omega }(C_{\mathrm{max}}\sqrt{\rho _{\mathrm{max}}})\) lower bound for opip. Subsequent work extended these results to allow for redundancy [19], i.e., when the benefit of a set is earned when at least a \(\beta \)-fraction of its elements are assigned to it, for some fixed \(\beta > 0\). For the special case of unit capacity opip in which the constraint matrix has the consecutive ones property, a deterministic \(O(\log R_{\mathrm{max}})\)-competitive algorithm was given in [14], where \(R_{\mathrm{max}}\) is the maximal sum of entries in a row, as well as a matching lower bound.

Previously, the online packing problem where sets arrive online and constraints are fixed was defined in [2], and an \(O(\log n)\)-competitive algorithm was given for the case when each set requires at most a \(1/\log n\)-fraction of the cap of any element. A matching lower bound shows that this requirement is necessary to obtain a polylogarithmic competitive ratio.

Regarding team formation, we are unaware of any prior formalization of the problem, let alone analysis. The online cover problem defined in [1] has an algorithm with competitive ratio \(O(\log n\log m)\). Another related problem is the secretary problem (see, e.g., [10, 12]; further results and references can be found in [3, 9]). In this family of problems, \(n\) candidates arrive in random order (or with random value), and the goal is to pick \(k\) of them (classically, \(k=1\)) that optimize some function of the set, such as the probability of picking the candidates with the top \(k\) values, or the average rank of selected candidates. The difficulty, similar to our otf formulation, is that the decision must be taken immediately upon the candidate’s arrival. However, the stipulation that the input is random makes the secretary problem very different from otf. Another difference is that unlike otf, the number of candidates to pick is set in advance.

Paper Organization. The remainder of this paper is organized as follows. In Sect. 2 we introduce some notation. In Sect. 3 we describe and analyze our online algorithm for opip, and in Sect. 4 we consider otf.

2 Preliminaries

In this section we define our notation. Given a matrix \(A \in \mathbb {N}^{m \times n}\), let \(R(i) = \sum _j a_{ij}\) be the sum of entries in the \(i\)th row, and let \(C(j) = \sum _i a_{ij}\) be the sum of entries in the \(j\)th column. Denote \(R_{\mathrm{max}} = \max _i R(i)\) and \(C_{\mathrm{max}} = \max _j C(j)\). Define \(\rho (i) = (\sum _j p_ja_{ij})/c_i\), for every \(i\), and \(\rho _{\mathrm{max}}= \max _i \rho (i)\).

Observe that if \(\sum _j p_j a_{ij} \le c_i\) for some \(i\), in an opip instance, then constraint \(i\) is redundant. Hence, we assume w.l.o.g. that \(\sum _j p_j a_{ij} > c_i\) for every \(i\), which means that \(\rho (i) > 1\), for every \(i\).

We assume hereafter that \(\text {gcd}(a_{i1},\ldots ,a_{in},c_i)=1\), for every \(i\). Otherwise, we may divide \(a_{i1},\ldots ,a_{in}\), and \(c_i\) by this common factor. This does not change \(\rho (i)\), but it may decrease \(C_{\mathrm{max}}\) and our bound on the competitive ratio. At the other extreme, we assume that \(a_{ij} \le c_i\) for every \(i\) and \(j\): if \(a_{ij}> c_i\) then item \(j\) is not a member in any feasible solution.
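A minimal sketch of this normalization (the function name is ours): each arriving constraint can be divided through by the common factor before any further processing.

```python
from math import gcd
from functools import reduce

def normalize(a_i, c_i):
    """Divide the constraint sum_j a_i[j] * x_j <= c_i through by
    gcd(a_i1, ..., a_in, c_i); this preserves rho(i) but may
    decrease the column sums and hence C_max."""
    g = reduce(gcd, a_i, c_i)
    return [a // g for a in a_i], c_i // g
```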

Given a subset \(J\) of items and a constraint \(i\), let \(J(i) = \left\{ j \in J : a_{ij}>0 \right\} \) be the subset of items from \(J\) that participate in constraint \(i\). For example, if \(\textsc {opt} \) is the set of items in some fixed optimal solution, then \(\textsc {opt} (i)\) denotes the items in \(\textsc {opt} \) that are active in constraint \(i\). Also, let \(R_J(i) = \sum _{j \in J} a_{ij}\), and define the weighted benefit of a constraint \(i\) as \(wb(i) = \sum _j a_{ij} \cdot b_j\).

Given an otf instance, \(R(i) = \sum _j a_{ij}\) is the coverage potential of a single copy of set \(i\), and \(\sum _j p_j a_{ij}\) is the potential savings in penalties of a single copy of set \(i\). Hence, \(\rho (i)\) is the ratio between the savings and cost of set \(i\), namely it is the cost effectiveness of set \(i\). Observe that we may assume that \(\rho (i)>1\), since otherwise we may ignore the set. Intuitively, the cheapest possible way to cover the elements is by sets with maximum cost effectiveness. Hence, ignoring the sets and simply paying the penalties (i.e., the solution \(y=0\) and \(z=b\)) is a \(\rho _{\mathrm{max}}\)-approximate solution.

3 Online Packing Integer Programs

In this section we present a randomized algorithm for opip whose competitive ratio is \(2C_{\mathrm{max}} \sqrt{\rho _{\mathrm{max}}}\). We describe an algorithm for opip with unit caps, namely for the case where \(p_j=1\), for every \(j\), that is a slight generalization of the algorithm given in [8], allowing us to deal with non-binary instances. We solve the general case by simply treating each item \(j\) as \(p_j\) items, namely by duplicating the \(j\)th column \(p_j\) times. Observe that this transformation does not change \(C_{\mathrm{max}}\) or \(\rho _{\mathrm{max}}\).
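A minimal numpy sketch of this column-duplication step (the function name is ours):

```python
import numpy as np

def expand_caps(A, p):
    """Reduce general caps to unit caps by repeating column j of A
    exactly p[j] times.  Column sums (hence C_max) and the row loads
    sum_j p[j] * A[i][j] (hence rho_max) are unchanged."""
    return np.repeat(A, p, axis=1)
```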

For the rest of this section we assume unit item upper bounds, namely that \(p=1\). In particular, \(\rho (i) = R(i)/c_i\), for every \(i\).

Random Variables. For \(w >0\), let \(D_w: \mathbb {R}\rightarrow [0,1]\) be a (cumulative) distribution function of a random variable \(Z\) that is defined by
$$\begin{aligned} D_w(z) = \Pr [Z \le z] = \left\{ \begin{array}{ll} 0 &{}\quad \text {if}\, z < 0; \\ z^w &{}\quad \text {if}\, 0 \le z < 1; \\ 1 &{}\quad \text {if}\, 1 \le z. \end{array} \right. \end{aligned}$$
Note that \(D_1\) is the uniform distribution over \([0,1]\) and, in general, for a positive integer \(q\), \(D_q\) is the distribution of the maximum of \(q\) independent and identically distributed variables, each uniformly distributed over \([0,1]\).
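Priorities with distribution \(D_w\) can be sampled by inverse transform: if \(U\) is uniform on \([0,1]\), then \(\Pr [U^{1/w} \le z] = \Pr [U \le z^w] = z^w\). A minimal sketch (the function name is ours):

```python
import random

def sample_priority(w):
    """Sample a priority with CDF D_w(z) = z**w on [0, 1]
    via the inverse-transform method."""
    return random.random() ** (1.0 / w)
```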

Algorithm RP. Initially, we independently choose for each item \(j\) a random priority \(r(j) \in [0,1]\) with distribution \(D_{b_j}\). When constraint \(i\) arrives, we construct \(c_i\) subsets \(S_{i1},\ldots ,S_{ic_i}\) as follows. Each item \(j\) chooses \(a_{ij}\) subsets at random. Then, for each subset \(S_{i\ell },\,\ell \in \left\{ 1,\ldots ,c_i \right\} \), we reject all items but the one with the highest priority. Observe that an item survives only if it has the highest priority in all of its chosen sets.

Example 1

Suppose that the instance contains four items whose priorities are \(r(1) = 0.5\), \(r(2) = 0.8\), \(r(3) = 0.4\), and \(r(4) = 0.9\). Upon arrival of the \(i\)th constraint, \(x_1 + 3x_2 + 2x_3 + 2 x_4 \le 4\), Algorithm RP constructs \(c_i = 4\) random subsets, say \(S_{i1} = \left\{ 1,3 \right\} \), \(S_{i2} = \left\{ 2,3,4 \right\} \), \(S_{i3} = \left\{ 2,4 \right\} \), and \(S_{i4} = \left\{ 2 \right\} \). Item \(2\) is eliminated due to \(S_{i2}\) and \(S_{i3}\), while Item \(3\) is eliminated due to \(S_{i1}\) and \(S_{i2}\). Items \(1\) and \(4\) are not eliminated by this constraint.

Intuitively, the approach is to prefer items with high priority. In the special case where \(a_{ij} \in \left\{ 0,1 \right\} \), one may simply choose the \(c_i\) items with highest priority. A somewhat more subtle approach, based on a reduction to the unit capacity case, is used in [8]: items are randomly partitioned into \(c_i\) equal-size subsets; from each subset only the top priority item survives. Our Algorithm RP uses a variation of this approach: we construct \(c_i\) subsets whose expected sizes are equal, such that item \(j\) is contained in exactly \(a_{ij}\) subsets.
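A minimal sketch of one step of Algorithm RP (the names and data layout are ours; the priorities r[j] are drawn once, up front, e.g., as above):

```python
import random

def rp_step(a_i, c_i, r, alive):
    """Process one arriving constraint sum_j a_i[j] * x_j <= c_i.
    Each live item j joins a_i[j] of the c_i random subsets; in each
    subset, every item except the highest-priority one is rejected,
    so an item survives the step only if it wins all its subsets."""
    subsets = [[] for _ in range(c_i)]
    for j in list(alive):
        for s in random.sample(range(c_i), a_i[j]):
            subsets[s].append(j)
    for members in subsets:
        if members:
            winner = max(members, key=lambda j: r[j])
            alive -= {j for j in members if j != winner}
    return alive
```

Note that random.sample requires \(a_{ij} \le c_i\), which is exactly the assumption made in Sect. 2.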

Analysis. Observe that each subset \(S_{i \ell }\) induces the following constraint: \(\sum _{j \in S_{i\ell }} x_j \le 1\). Hence, the algorithm implicitly constructs a new uniform capacity opip instance by defining the matrix \(A' \in \left\{ 0,1 \right\} ^{(\sum _i c_i) \times n}\) as follows: \(a'_{\sum _{t<i} c_t + \ell ,\,j} = 1\) if and only if \(j \in S_{i \ell }\). Each row of \(A'\) corresponds to one of the random constraints generated by the algorithm. See the example in Fig. 1.
Fig. 1 The inequalities that are induced by the sets in Example 1: \(x_1 + x_3 \le 1\), \(x_2 + x_3 + x_4 \le 1\), \(x_2 + x_4 \le 1\), and \(x_2 \le 1\)

In what follows we use \(m'\) to denote the number of rows in \(A'\), namely \(m' = \sum _i c_i\). Also, we use \(R'(i)\) to denote \(\sum _j a'_{ij}\), \(C'(j)\) to denote \(\sum _{i=1}^{m'} a'_{ij}\), and so forth. See the example in Fig. 2. Notice that \(b\) remains the same, since the item set did not change. However, the weighted benefit of a new constraint \(i\) is \(wb'(i) = \sum _j a'_{ij} \cdot b_j\). Since \(A'\) is binary, \(wb'(i)\) is the sum of the benefits of the items that appear in new constraint \(i\).
Fig. 2 The rows of \(A'\) that correspond to the inequalities given in Fig. 1. In this case we have that \(R'(\sum _{t < i} c_t + 1) = 2\)
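As a small illustration (a sketch; the function name is ours), the rows of \(A'\) for a single original constraint can be built directly from the subsets \(S_{i\ell }\); applied to the subsets of Example 1 (with items indexed from 0), it reproduces the rows of Fig. 2:

```python
import numpy as np

def expand_constraint(subsets, n):
    """Binary rows of A' for one original constraint: one row per
    subset S_{i,l}, with a 1 in column j iff item j is in the subset."""
    rows = np.zeros((len(subsets), n), dtype=int)
    for l, S in enumerate(subsets):
        rows[l, list(S)] = 1
    return rows

# The subsets of Example 1, with items 1..4 renumbered 0..3:
print(expand_constraint([{0, 2}, {1, 2, 3}, {1, 3}, {1}], 4))
```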

Observation 1

\(C(j) = C'(j)\), for every \(j\), and \(\mathbb {E}[R'(\sum _{t<i} c_t + \ell )] = \rho (i)\), for every \(i\) and \(\ell \).

Proof

\(C(j) = C'(j)\), since, for each original constraint \(i\), item \(j\) appears in exactly \(a_{ij}\) new constraints with coefficient 1. Each item \(j\) participates in the \(\ell \)th new constraint corresponding to original constraint \(i\) with probability \(a_{ij}/c_i\). Hence,
$$\begin{aligned} \mathbb {E}\left[ R'\left( \sum _{t<i} c_t + \ell \right) \right] = \sum _j \mathbb {E}\left[ a'_{\sum _{t<i} c_t + \ell , j}\right] = \sum _j \frac{a_{ij}}{c_i} = \frac{R(i)}{c_i} = \rho (i), \end{aligned}$$
where the last equality holds in the unit caps case. \(\square \)
Let \(N[j]\) denote the items that are in conflict with item \(j\), namely
$$\begin{aligned} N[j] = \left\{ k : \exists i,\ell \text { s.t. } j,k \in S_{i\ell } \right\} . \end{aligned}$$
Notice that \(j \in N[j]\). We also define \(N(j) = N[j] \setminus \{j\}\). Clearly, item \(j\) survives if and only if its priority is higher than that of all other items with which it competes, i.e., if \(r(j) > r(k)\), for every \(k \in N(j)\).

First, consider the probability that a given item \(j\) survives.

Lemma 2

\(\Pr [r(j) > \max \{r(k): k \in N(j)\}] = \mathbb {E}\left[ \frac{b_j}{b(N[j])} \right] .\)

Proof

Suppose that \(N(j)=N\) and let \(r_{\mathrm{max}} = \max \{r(k) : k \in N\}\). Then, for any \(z \in [0,1]\) we have
$$\begin{aligned} \Pr [r_{\mathrm{max}} < z] ~=~ \prod _{k \in N} \Pr [r(k) < z] ~=~ \prod _{k \in N} z^{b_k} ~=~ z^{\sum _{k \in N} b_k} ~=~ z^{b(N)}; \end{aligned}$$
that is, \(r_{\mathrm{max}}\) has distribution \(D_{b(N)}\). Hence,
$$\begin{aligned} \Pr [r(j)> r_{\mathrm{max}}]= & {} \int _0^1 \Pr [r_{\mathrm{max}} < z] \cdot f_{r(j)}(z) dz ~=~ \int _0^1 z^{b(N)} \cdot b_j z^{b_j-1} dz \\= & {} \frac{b_j}{b(N) + b_j}, \end{aligned}$$
where \(f_{r(j)}\) denotes the probability density function of the random variable \(r(j)\). It follows that
$$\begin{aligned}&\Pr [r(j) > \max \{r(k): k \in N(j)\}] \\&~=~ \sum _N \Pr [N(j) = N] \cdot \Pr [r(j) > \max \{r(k): k \in N\} | N(j)=N] \\&~=~ \sum _N \Pr [N(j) = N] \cdot \frac{b_j}{b(N) + b_j} \\&~=~ \mathbb {E}\left[ \frac{b_j}{b(N(j)) + b_j} \right] , \end{aligned}$$
as required. \(\square \)

Next, we provide a lower bound on the expected performance of Algorithm RP. We abuse notation by referring to the output of the algorithm by RP, as well.

Lemma 3

For any subset of items \(J\), \(\mathbb {E}[b(\textsc {RP})] \ge \frac{\left( \sum _{j \in J} b_j\right) ^2}{\mathbb {E}\left[ \sum _{j \in J} b(N[j]) \right] }\).

Proof

By Lemma 2 and by linearity of expectation we obtain
$$\begin{aligned} \mathbb {E}[b(\textsc {RP})]&~\ge ~ \sum _{j \in J} b_j \cdot \Pr [j \in \textsc {RP} ] \\&~=~ \sum _{j \in J} b_j \cdot \mathbb {E}\left[ \frac{b_j}{b(N[j])} \right] \\&~=~ \mathbb {E}\left[ \sum _{j \in J} \frac{b_j^2}{b(N[j])} \right] \\&~\ge ~ \mathbb {E}\left[ \frac{(\sum _{j \in J} b_j)^2}{\sum _{j \in J} b(N[j])} \right] , \end{aligned}$$
where the last inequality is due to the following consequence of the Cauchy–Schwarz inequality (with \(b_j\) for \(\alpha _j\) and \(b(N[j])\) for \(\beta _j\)): for positive reals \(\alpha _1, \ldots , \alpha _n\) and \(\beta _1, \ldots , \beta _n\), we have \(\sum _j \frac{\alpha _j^2}{\beta _j} \ge \frac{\left( \sum _j \alpha _j\right) ^2}{\sum _j \beta _j}\). Jensen’s inequality (for a positive random variable \(X\), \(\mathbb {E}\left[ \frac{1}{X}\right] \ge \frac{1}{\mathbb {E}\left[ X\right] }\)) then implies that
$$\begin{aligned} \mathbb {E}[b(\textsc {RP})] ~\ge ~ \mathbb {E}\left[ \frac{\left( \sum _{j \in J} b_j\right) ^2}{\sum _{j \in J} b(N[j])} \right] ~\ge ~ \frac{\left( \sum _{j \in J} b_j\right) ^2}{\mathbb {E}\left[ \sum _{j \in J} b(N[j]) \right] }, \end{aligned}$$
and the lemma follows. \(\square \)
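For completeness, the quoted consequence of Cauchy–Schwarz follows by writing \(\alpha _j = (\alpha _j/\sqrt{\beta _j}) \cdot \sqrt{\beta _j}\) and applying the inequality to the two resulting vectors:
$$\begin{aligned} \left( \sum _j \alpha _j\right) ^2 = \left( \sum _j \frac{\alpha _j}{\sqrt{\beta _j}} \cdot \sqrt{\beta _j}\right) ^2 \le \left( \sum _j \frac{\alpha _j^2}{\beta _j}\right) \left( \sum _j \beta _j\right) . \end{aligned}$$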

Our next step is to bound \(\sum _{j \in J} b(N[j])\). Recall that, since \(A'\) is binary, \(wb'(i)\) is the sum of benefits that appear in new constraint \(i\). Hence, if \(j\) appears in new constraint \(i\), its weighted competition is at most \(wb'(i)\).

Lemma 4

Let \(J\) be a subset of items. Then, \(\sum _{j \in J} b(N[j]) ~\le ~ \sum _{i=1}^{m'} R'_J(i) \cdot wb'(i)\).

Proof

Observe that
$$\begin{aligned} \sum _{j \in J} b(N[j])&= \sum _{j \in J} \sum _{k \in N[j]} b_k \\&\le \sum _{j \in J} \sum _{(i,\ell ) : j \in S_{i\ell }} \sum _{k \in S_{i\ell }} b_k \\&= \sum _{j \in J} \sum _{(i,\ell ) : j \in S_{i\ell }} b(S_{i\ell }) \\&= \sum _{i=1}^{m} \sum _{\ell =1}^{c_i} |S_{i\ell } \cap J| \cdot b(S_{i\ell }) \\&= \sum _{i=1}^{m'} R'_J(i) \cdot wb'(i), \end{aligned}$$
where the first and the last two steps are by definition, and the inequality holds since a pair of items may appear together in more than one subset \(S_{i\ell }\). \(\square \)

To complete the analysis we derive appropriate upper bounds for the denominator when \(J=[n]\) and when \(J=\textsc {opt} \).

Lemma 5

$$\begin{aligned} \mathbb {E}\left[ \sum _{i=1}^{m'} R'_{[n]}(i) \cdot wb'(i) \right]&< 2\sum _{i=1}^m \rho (i) \cdot wb(i) ~, \end{aligned}$$
(4)
$$\begin{aligned} \mathbb {E}\left[ \sum _{i=1}^{m'} R'_{\textsc {opt}}(i) \cdot wb'(i) \right]&\le \sum _{j \in [n]} C(j) b_j + \sum _{j \in \textsc {opt}} C(j) b_j \le 2\sum _{j \in [n]} C(j) b_j ~. \end{aligned}$$
(5)

Proof

Consider \(i' \in [m']\) that corresponds to the \(\ell \)th new constraint of original constraint \(i\), and two items \(j \ne k\). We have that
$$\begin{aligned} \Pr [ j,k \in S_{i \ell }] = \Pr \left[ j \in S_{i\ell }\right] \cdot \Pr [k \in S_{i\ell }] = \frac{a_{ij}}{c_i} \cdot \frac{a_{ik}}{c_i}, \end{aligned}$$
due to the independence of the random choices of \(j\) and \(k\). Hence, for \(i \in [m]\) we have that
$$\begin{aligned} \mathbb {E}\left[ \sum _{\ell =1}^{c_i} R'_J\left( \sum _{t<i} c_t + \ell \right) \cdot wb'\left( \sum _{t<i} c_t + \ell \right) \right]&~=~ \sum _{j \in J(i)} \sum _k \sum _{\ell =1}^{c_i} b_k \Pr [ j,k \in S_{i \ell }] \\&~=~ \sum _{j \in J(i)} \! \frac{a_{ij}}{c_i} \sum _{k \ne j} c_i b_k \frac{a_{ik}}{c_i} + \!\! \sum _{j \in J(i)} \!\! c_i b_j \frac{a_{ij}}{c_i} \\&~\le ~ \sum _{j \in J(i)} \frac{a_{ij}}{c_i} \cdot wb(i) + \sum _{j \in J(i)} a_{ij} \cdot b_j \\&~\le ~ \rho _J(i) \cdot wb(i) + wb_J(i), \end{aligned}$$
where \(\rho _J(i) = R_J(i)/c_i\) and \(wb_J(i) = \sum _{j \in J} a_{ij} b_j\).
It follows that
$$\begin{aligned} \mathbb {E}\left[ \sum _{i=1}^{m'} R'_J(i) \cdot wb'(i) \right]&~\le ~ \sum _i \rho _J(i) \cdot wb(i) + \sum _i wb_J(i). \end{aligned}$$
(6)
Since \(\rho (i)>1\), for every \(i\), Inequality (4) is obtained by assigning \(J = [n]\) in (6).
To prove Inequality (5) we assign \(J=\textsc {opt} \). In this case, \(\rho _{\textsc {opt}}(i) \le 1\), for every \(i\), since opt is a feasible solution. Hence
$$\begin{aligned} \mathbb {E}\left[ \sum _{i=1}^{m'} R'_\textsc {opt} (i) \cdot wb'(i) \right]&\le \sum _i wb(i) + \sum _i wb_{\textsc {opt}}(i) \\&= \sum _i \sum _j a_{ij} b_j + \sum _i \sum _{j \in \textsc {opt}} a_{ij} b_j \\&= \sum _j b_j \sum _i a_{ij} + \sum _{j \in \textsc {opt}} b_j \sum _i a_{ij} \\&= \sum _j b_j C(j) + \sum _{j \in \textsc {opt}} b_j C(j), \end{aligned}$$
and the lemma follows. \(\square \)

Together, Lemmas 3, 4 and 5 imply the following.

Theorem 1

$$\begin{aligned} \mathbb {E}[b(\textsc {RP})]&\ge \max \left\{ \frac{b([n])^2}{2\sum _i \rho (i) \cdot wb(i)}, \frac{b(\textsc {opt})^2}{2\sum _j C(j)b_j} \right\} \\&\ge \frac{b([n])\, b(\textsc {opt})}{2\sqrt{\sum _i \rho (i) \cdot wb(i) \cdot \sum _j C(j)b_j}}. \end{aligned}$$

Theorem 1 implies the following:

Corollary 2

There is an opip algorithm with competitive ratio at most \(2C_{\mathrm{max}} \sqrt{\rho _{\mathrm{max}}}\).

Proof

By definition,
$$\begin{aligned} \sum _i \rho (i) \cdot wb(i) \le \rho _{\mathrm{max}}\sum _i \sum _j a_{ij} b_j = \rho _{\mathrm{max}}\sum _j b_j C(j) \le \rho _{\mathrm{max}}b([n]) C_{\mathrm{max}}, \end{aligned}$$
and
$$\begin{aligned} \sum _j C(j) b_j \le C_{\mathrm{max}} b([n]). \end{aligned}$$
Hence, it follows from Theorem 1 that
$$\begin{aligned} \mathbb {E}[b(\textsc {RP})] ~\ge ~ \frac{b([n]) b(\textsc {opt})}{2 \sqrt{\rho _{\mathrm{max}}b([n]) C_{\mathrm{max}} \cdot C_{\mathrm{max}} b([n])}} ~=~ \frac{b(\textsc {opt})}{2 C_{\mathrm{max}} \sqrt{\rho _{\mathrm{max}}}}, \end{aligned}$$
and we are done. \(\square \)

4 Competitive Team Formation

In this section we provide a deterministic online algorithm for otf and a matching lower bound that holds even for randomized algorithms. Furthermore, our lower bound holds for a more general case, where the commitment of the online algorithm is only “one way” in the following sense. Once a set is dismissed it cannot be recruited again, but a set in the solution at one point may be thrown out of the solution later.

4.1 An Online Algorithm

Our algorithm generates a monotonically growing collection of sets based on a simple deterministic threshold rule. Recall that \(\rho _{\mathrm{max}}\) is the maximum cost effectiveness, over all sets. Algorithm Threshold assumes knowledge of \(\rho _{\mathrm{max}}\) and works as follows. Let \(y\) be the set vector constructed by Threshold, and define \(z^i_j = \max \left\{ b_j - \sum _{\ell \le i} a_{\ell j} y_\ell ,0 \right\} \), i.e., \(z^i_j\) is the amount of missing coverage for element \(j\) after the introduction of set \(i\). Note that \(z_j^i\) is monotone non-increasing with \(i\).

The solution is constructed as follows. Upon arrival of a new candidate \(i\), assign \(y_i \leftarrow v\), where \(v\) is the maximum integer that satisfies
$$\begin{aligned} v \cdot c_i \le \frac{\sum _j \min \left\{ v \cdot a_{ij} ,z^{i-1}_j \right\} \cdot p_j}{\sqrt{\rho _{\mathrm{max}}}}. \end{aligned}$$
(7)
Intuitively, we take the maximum possible number of units of set \(i\) that allows us to save a factor of at least \(\sqrt{\rho _{\mathrm{max}}}\) over the penalties it replaces. Note that \(\min \{v a_{ij},z^{i-1}_j\}\) is the amount of coverage that \(v\) copies of set \(i\) add to element \(j\). Hence, the total amount of penalties that are saved by \(v\) copies of set \(i\) is \(\sum _j \min \{v a_{ij},z^{i-1}_j\} p_j\). Also notice that \(v\) is well defined: (7) is always satisfied by \(v=0\), and it fails for all sufficiently large \(v\), since the right-hand side is at most \(\sum _j z^{i-1}_j p_j/\sqrt{\rho _{\mathrm{max}}}\).
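A minimal sketch of one arrival step (the names are ours). The right-hand side of (7) is concave in \(v\) and both sides vanish at \(v=0\), so the integers satisfying (7) form an interval starting at \(0\), and a simple upward scan finds the maximum:

```python
import math

def threshold_step(c_i, a_i, z, p, rho_max):
    """Choose the largest integer v satisfying condition (7),
    v * c_i <= sum_j min(v * a_i[j], z[j]) * p[j] / sqrt(rho_max),
    then update the residual requirements z."""
    s = math.sqrt(rho_max)
    v = 0
    while (v + 1) * c_i * s <= sum(min((v + 1) * a, zj) * pj
                                   for a, zj, pj in zip(a_i, z, p)):
        v += 1
    for j, a in enumerate(a_i):
        z[j] = max(z[j] - v * a, 0)
    return v
```

A binary search over the same interval would avoid the linear scan when the chosen \(v\) is large.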

We show that the competitive ratio of Threshold is at most \(2\sqrt{\rho _{\mathrm{max}}}-1\).

Theorem 3

Let \((y,z)\) be the solution computed by Algorithm Threshold, and let \((y^*,z^*)\) be an optimal (integral) solution. Then,
$$\begin{aligned} \sum _i c_i y_i + \sum _j p_j z_j ~\le ~ \left( 2\sqrt{\rho _{\mathrm{max}}}-1\right) \sum _i c_i y^*_i + \sum _j p_j z^*_j. \end{aligned}$$

Proof

We first bound \(\sum _i c_i y_i\). By condition (7),
$$\begin{aligned} \sum _i c_i y_i&\le \frac{1}{\sqrt{\rho _{\mathrm{max}}}} \sum _i \sum _j \min \left\{ a_{ij} y_i,z^{i-1}_j\right\} \cdot p_j \\&= \frac{1}{\sqrt{\rho _{\mathrm{max}}}} \sum _j p_j \sum _i \min \left\{ a_{ij} y_i,z^{i-1}_j\right\} \\&\le \frac{1}{\sqrt{\rho _{\mathrm{max}}}} \sum _j p_j (b_j-z_j), \end{aligned}$$
where the second inequality follows since \(\min \{a_{ij} y_i,z^{i-1}_j\}\) is the amount of coverage that is added to \(j\) in the \(i\)th round, and therefore the total coverage of \(j,\,\sum _i \min \{a_{ij} y_i,z^{i-1}_j\}\), is at most \(b_j - z_j\).
On the other hand, since \(\rho (i) = (\sum _j p_j a_{ij})/c_i\), for every \(i\), we have that
$$\begin{aligned} \sum _i c_i y^*_i= & {} \sum _i \frac{1}{\rho (i)} y^*_i \sum _j p_j a_{ij} ~\ge ~ \frac{1}{\rho _{\mathrm{max}}} \sum _j p_j \sum _i y^*_i a_{ij}\\\ge & {} \frac{1}{\rho _{\mathrm{max}}} \sum _j p_j \left( b_j - z^*_j\right) . \end{aligned}$$
It follows that
$$\begin{aligned} \sum _i c_i y_i&\le \frac{1}{\sqrt{\rho _{\mathrm{max}}}} \sum _j p_j (b_j-z_j) \\&= \frac{1}{\sqrt{\rho _{\mathrm{max}}}} \left( \sum _j p_j (b_j - z^*_j) + \sum _j p_j z^*_j - \sum _j p_j z_j \right) \\&\le \sqrt{\rho _{\mathrm{max}}} \sum _i c_i y^*_i + \frac{1}{\sqrt{\rho _{\mathrm{max}}}} \sum _j p_j \left( z^*_j - z_j\right) . \end{aligned}$$
Next, we turn to bound the penalties that \((y,z)\) pays and \((y^*,z^*)\) does not pay, namely we bound \(\sum _j p_j \max \{z_j-z^*_j,0\}\). Define
$$\begin{aligned} \Delta _i = \max \left\{ y^*_i - y_i,0 \right\} . \end{aligned}$$
If \(\Delta = 0\), then \(z_j \le z^*_j\), for every \(j\), and we are done. Otherwise, let \(i\) be an index such that \(\Delta _i>0\). Due to condition (7) in the \(i\)th step, we have that
$$\begin{aligned} c_i y_i \le \frac{\sum _j \min \left\{ a_{ij} y_i,z^{i-1}_j\right\} \cdot p_j}{\sqrt{\rho _{\mathrm{max}}}} \end{aligned}$$
while
$$\begin{aligned} c_i y^*_i > \frac{\sum _j \min \{a_{ij} y^*_i,z^{i-1}_j\} \cdot p_j}{\sqrt{\rho _{\mathrm{max}}}}. \end{aligned}$$
Observe that \(j\)’s coverage increases by \(\min \{a_{ij} y_i, z^{i-1}_j\} = z^{i-1}_j - z^i_j\) in the \(i\)th step. If we further increase \(y_i\) to \(y^*_i\) we may gain \(\min \{\Delta _i a_{ij},z^i_j\}\) additional coverage for element \(j\). Hence,
$$\begin{aligned} c_i \Delta _i ~=~ c_i y^*_i - c_i y_i ~>~ \frac{\sum _j \min \{a_{ij} \Delta _i,z^i_j\} \cdot p_j}{\sqrt{\rho _{\mathrm{max}}}} ~\ge ~ \frac{\sum _j \min \left\{ a_{ij} \Delta _i,z_j\right\} \cdot p_j}{\sqrt{\rho _{\mathrm{max}}}}. \end{aligned}$$
It follows that
$$\begin{aligned} \sqrt{\rho _{\mathrm{max}}} \sum _i c_i \Delta _i&> \sum _i \sum _j \min \left\{ a_{ij} \Delta _i,z_j\right\} \cdot p_j \\&\ge \sum _j p_j \min \left\{ \sum _i a_{ij} \Delta _i,z_j \right\} \\&\ge \sum _j p_j \max \left\{ z_j - z^*_j,0\right\} , \end{aligned}$$
where the last inequality follows from the fact that \(y + \Delta \ge y^*\) and therefore \(\Delta \) covers at least \(\max \{z_j - z^*_j,0\}\), for every \(j\). Hence,
$$\begin{aligned} \sum _j p_j \max \left\{ z_j - z^*_j,0\right\} ~\le ~ \sqrt{\rho _{\mathrm{max}}} \sum _i c_i \Delta _i ~\le ~ \sqrt{\rho _{\mathrm{max}}} \sum _i c_i y^*_i. \end{aligned}$$
Putting it all together, using the bound on \(\sum _i c_i y_i\), the algebraic identity \(z_j + (z^*_j - z_j)/\sqrt{\rho _{\mathrm{max}}} = z^*_j + (1 - 1/\sqrt{\rho _{\mathrm{max}}})(z_j - z^*_j)\), and finally the bound on \(\sum _j p_j \max \{z_j - z^*_j,0\}\) together with \(1 - 1/\sqrt{\rho _{\mathrm{max}}} < 1\), we get that
$$\begin{aligned} \sum _i c_i y_i + \sum _j p_j z_j&\le \sqrt{\rho _{\mathrm{max}}} \sum _i c_i y^*_i + \sum _j p_j \left( z_j + \frac{z^*_j - z_j}{\sqrt{\rho _{\mathrm{max}}}} \right) \\&= \sqrt{\rho _{\mathrm{max}}} \sum _i c_i y^*_i + \sum _j p_j z^*_j + \left( 1 - \frac{1}{\sqrt{\rho _{\mathrm{max}}}}\right) \sum _j p_j \left( z_j - z^*_j\right) \\&\le \sqrt{\rho _{\mathrm{max}}} \sum _i c_i y^*_i + \sum _j p_j z^*_j + \left( 1 - \frac{1}{\sqrt{\rho _{\mathrm{max}}}}\right) \sum _j p_j \max \left\{ z_j - z^*_j,0\right\} \\&\le \left( 2\sqrt{\rho _{\mathrm{max}}}-1\right) \sum _i c_i y^*_i + \sum _j p_j z^*_j ~, \end{aligned}$$
as required. \(\square \)

This leads us to an upper bound on the competitive ratio.

Corollary 4

Algorithm Threshold is \((2\sqrt{\rho _{\mathrm{max}}}-1)\)-competitive.

We note that the same approach would work for the variant of otf in which there is an upper bound \(u_i\) on the number of copies of set \(i\) that can be used, i.e., \(y_i \le u_i\). In this case the value of \(v\) in condition (7) is also bounded by \(u_i\). The rest of the details are omitted.

4.2 A Lower Bound

In this section we present a matching lower bound, which holds for randomized algorithms, and even for the case where the algorithm may discard a set from its running solution (but never takes back a set that was dismissed).

We start with a couple of simple constructions. In the first construction, the input consists of sets of size one, and in the second all costs and penalties are the same.

Theorem 5

The competitive ratio of any randomized online algorithm for otf is \(\Omega (\sqrt{\rho _{\mathrm{max}}})\). This bound holds for inputs with only two elements and sets of size one, with unit coverage and uniform penalties.

Proof

Let alg be a randomized algorithm. Consider an input sequence consisting of two elements with unit covering requirement and penalty \(p\). The arrival sequence is composed of two or three sets. The first set to arrive is \(\left\{ 1 \right\} \) of cost \(1\). (The goal of the first set is to make sure that the ratio between the penalty and the minimum cost is \(p\).) The second set is \(\left\{ 2 \right\} \) of cost \(\sqrt{p}\). If alg takes this set with probability less than half, then the sequence ends; otherwise, the third set \(\left\{ 2 \right\} \) of cost \(1\) arrives.

In the first case the optimal cost is \(1+\sqrt{p}\), while alg pays at least \(1 + \frac{1}{2}p\). Otherwise, the optimal cost is \(2\), while alg pays at least \(1 + \frac{1}{2}\sqrt{p}\). Notice that we may repeat the second part of this sequence as many times as needed. Finally, notice that \(\rho _{\mathrm{max}}= p\). \(\square \)

Theorem 6

The competitive ratio of any randomized online algorithm for otf is \(\Omega (\sqrt{\rho _{\mathrm{max}}})\). This bound holds for inputs with unit costs and penalties.

Proof

Let alg be a randomized algorithm. Assume unit penalties and unit coverage requirements. Consider the input sequence that starts with \(\sqrt{n}\) candidates, each with \(\sqrt{n}\) fresh skills and cost \(1\). Let \(\ell \) be the expected number of candidates alg takes from this sequence. If \(\ell < \sqrt{n}/2\), this is the whole input. In this case the expected cost of alg is at least \(\frac{1}{2}n\), whereas the optimal cost is \(\sqrt{n}\). If \(\ell \ge \frac{1}{2}\sqrt{n}\), then we add an omnipotent candidate (who has all skills) at the end, with cost \(1\). It follows that alg pays at least \(\frac{1}{2}\sqrt{n}\) in expectation, while opt pays only \(1\). Finally, notice that \(\rho _{\mathrm{max}}= n\). \(\square \)

Next, we give a lower bound construction that applies to the more general setting in which the algorithm may discard a set from its solution.

Theorem 7

The competitive ratio of any randomized online algorithm for otf is \(\Omega (\sqrt{\rho _{\mathrm{max}}})\). This bound holds even if the algorithm is allowed to discard sets. Furthermore, it holds also in the binary case, where all demands, coverages, penalties and costs are either \(0\) or \(1\).

Proof

Our lower bound construction uses affine planes, defined as follows. Let \(n = q^2\), where \(q\) is prime. In our construction, each pair \((a,b) \in \mathbb {Z}_q \times \mathbb {Z}_q\) corresponds to an element. Sets correspond to lines: a line in this finite geometry is either the collection of pairs \((x,y) \in \mathbb {Z}_q \times \mathbb {Z}_q\) satisfying \(y \equiv ax+b \pmod q\), for some given \(a,b \in \mathbb {Z}_q\), or the vertical line \(\left\{ (c,y) : y \in \mathbb {Z}_q \right\} \), for some given \(c \in \mathbb {Z}_q\). There are \(q^2 + q = \Theta (n)\) such lines.

The important properties we use are the following:
  1. All points can be covered by \(q\) disjoint (parallel) lines.
  2. Two lines that intersect in more than a single point are necessarily identical.
We now describe the lower bound scenario. The elements correspond (in a 1–1 fashion) to the points in the affine plane. All elements have unit penalty and unit covering requirement, i.e., \(p_j = 1\) and \(b_j = 1\), for every \(j\). The input sequence starts with a sequence of \(q^2+q\) sets corresponding to all distinct lines of the plane, each with unit cost. Fix any randomized online algorithm alg. We proceed by cases, depending on the expected number \(r\) of these sets that alg retains at this point. If \(r\le \sqrt{n}/2\) or \(r>n/2\), then we are already done: at this time the cost to the algorithm is \(\Omega (n)\) (due either to penalties or to the cost of sets retained), while the optimal cost at this time is \(\sqrt{n}\) by virtue of Property (1) above.

Otherwise, \(\sqrt{n}/2<r\le n/2\). Let \(L\) be a line chosen uniformly at random. The probability that \(L\) is retained by the algorithm is at most \(1/2\), since \(r\le n/2\). We now extend the input sequence by one more set \(L^c \stackrel{\mathrm{def}}{=} \left\{ 1,\ldots ,n \right\} \setminus L\), and assign \(L^c\) unit cost. Note that by Property (2), if \(L\) is not retained by the algorithm, then the number of other lines needed to cover the points of \(L\) cannot be smaller than \(|L|=\sqrt{n}\), and hence the expected cost of alg due only to the points of \(L\) (either by covering set costs or by incurred penalties) is at least \(\sqrt{n}/2\). Obviously, throwing out any set from the solution at this time will not help to reduce the cost. On the other hand, the optimal solution to this scenario consists of the sets \(L\) and \(L^c\), whose cost is \(2\), and hence the competitive ratio is \(\Omega (\sqrt{n})\). \(\square \)
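Both properties are easy to check programmatically for a small prime \(q\); a sketch (with points as coordinate pairs, as in the proof):

```python
from itertools import combinations

q = 5  # any prime
points = {(x, y) for x in range(q) for y in range(q)}
# Lines y = a*x + b (mod q), followed by the q vertical lines x = c.
lines = [frozenset((x, (a * x + b) % q) for x in range(q))
         for a in range(q) for b in range(q)]
lines += [frozenset((c, y) for y in range(q)) for c in range(q)]
assert len(lines) == q * q + q

# Property 1: for each slope a, the q lines {y = a*x + b : b in Z_q}
# partition the plane (they cover all q*q points with q*q memberships).
for a in range(q):
    parallel = lines[a * q:(a + 1) * q]
    assert set().union(*parallel) == points and sum(map(len, parallel)) == q * q

# Property 2: two distinct lines intersect in at most one point.
assert all(len(L1 & L2) <= 1 for L1, L2 in combinations(lines, 2))
```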

Remarks. First, we note that in the proof above, the unit-cost set \(L^c\) can be replaced by \(\sqrt{n}-1\) sets, where each set covers \(\sqrt{n}\) elements and costs \(\frac{1}{\sqrt{n}-1}\). Second, we note that one may be concerned that in the first case, the actual \(\rho _{\mathrm{max}}\) of the instance is not \(n\). This can be easily remedied as follows. Let the instance consist of \(2n\) elements: \(n\) elements in the affine plane as in the proof, and another \(n\) dummy elements. The dummy elements will all be covered by a single set that arrives first in the input sequence. The remainder of the input sequence is as in the proof. This allows us to argue that the actual \(\rho _{\mathrm{max}}\) is indeed \(n\), whatever the ensuing scenario is, while decreasing the lower bound by no more than a constant factor.

The above theorems hold even if \(\rho _{\mathrm{max}}\) is known to the algorithm. However, if \(\rho _{\mathrm{max}}\) is unknown, and discarding sets is not allowed, then we get a stronger lower bound.

Theorem 8

The competitive ratio of any randomized online algorithm for otf is \(\Omega (\rho _{\mathrm{max}})\), if the algorithm cannot discard sets and has no knowledge of \(\rho _{\mathrm{max}}\). It holds even in the case of unit penalties, demands and coverage.

Proof

Let alg be a randomized algorithm. Suppose, for the sake of deriving a contradiction, that there is an invertible, arbitrarily slowly growing function \(h\) such that alg has competitive ratio at most \(\rho _{\mathrm{max}}/h(\rho _{\mathrm{max}})\).

For every sufficiently large \(x\), we shall construct an instance \(I\) with \(\rho _{\mathrm{max}}= \rho _{\mathrm{max}}(I) \ge x\) for which the performance ratio of alg is at least \(2\rho _{\mathrm{max}}/h(\rho _{\mathrm{max}})\). This contradicts the assumption on the competitive ratio of alg, implying the theorem.

Let \(x\) be a value satisfying \(h(x) \ge 4\), and let \(f(x) = h(x)/4\). The instance, constructed as follows, has only one element, with unit penalty. A set arrives with cost \(1/x\). If alg takes the set with probability less than \(\frac{1}{2}\), we stop. Otherwise, we present a second set with cost \(1/f^{-1}(x)\).

In the former case, \(\rho _{\mathrm{max}}= x\), the expected cost of alg is at least \(\frac{1}{2}\), and opt pays \(1/x\), for a competitive ratio of
$$\begin{aligned} \frac{\mathbb {E}[\textsc {alg} ]}{\textsc {opt}} \ge \frac{1/2}{1/x} = \frac{\rho _{\mathrm{max}}}{2} \ge \frac{2\rho _{\mathrm{max}}}{h\left( \rho _{\mathrm{max}}\right) }. \end{aligned}$$
In the latter case, \(\rho _{\mathrm{max}}= f^{-1}(x)\), which means that \(x = f(\rho _{\mathrm{max}})\). The expected cost of alg is at least \(1/(2x) = 1/(2 f(\rho _{\mathrm{max}}))\), while the cost of opt is \(1/f^{-1}(x) = 1/\rho _{\mathrm{max}}\). The performance ratio is then at least \(\rho _{\mathrm{max}}/(2 f(\rho _{\mathrm{max}})) \ge 2 \rho _{\mathrm{max}}/h(\rho _{\mathrm{max}})\). In both cases, we obtain a contradiction, implying the theorem. \(\square \)

It follows that when \(\rho _{\mathrm{max}}\) is unknown, it will not be possible to obtain an \(O(\sqrt{\rho _{\mathrm{max}}})\)-competitive algorithm without the ability to discard sets.

5 Conclusion

As mentioned in the introduction, the special case of opip in which the matrix is binary (i.e., each set requires either one or zero copies of each item) was considered in [8], where an upper bound and an almost tight lower bound on the competitive ratio of randomized algorithms were presented. We have shown that a variant of the algorithm from [8] applies to general opip. We note that the above lower bound applies to the unit capacities case. However, there is no lower bound for opip with non-unit uniform capacities.

We have proven matching upper and lower bounds on the competitive ratio for otf. We have shown that even randomized algorithms cannot have a competitive ratio better than \(\Omega (\sqrt{\rho _{\mathrm{max}}})\). The lower bound holds even if \(\rho _{\mathrm{max}}\) is known, and even if one is allowed to drop previously selected sets. On the other hand, the upper bound is obtained by a simple deterministic algorithm that does not drop sets. Unfortunately, our algorithm is based on prior knowledge of \(\rho _{\mathrm{max}}\). It remains an open question whether there is an \(O(\sqrt{\rho _{\mathrm{max}}})\)-competitive algorithm that has no knowledge of \(\rho _{\mathrm{max}}\). We have eliminated the possibility of an \(O(\sqrt{\rho _{\mathrm{max}}})\) upper bound for such an algorithm when it is not allowed to discard sets.

Footnotes

  1. We misuse the term “set” for simplicity.


Acknowledgments

We thank Moti Medina for going to Berkeley to represent us.

References

  1. Alon, N., Awerbuch, B., Azar, Y., Buchbinder, N., Naor, J.: The online set cover problem. SIAM J. Comput. 39(2), 361–370 (2009)
  2. Awerbuch, B., Azar, Y., Plotkin, S.A.: Throughput-competitive on-line routing. In: 34th IEEE Annual Symposium on Foundations of Computer Science, pp. 32–40 (1993)
  3. Bateni, M., Hajiaghayi, M., Zadimoghaddam, M.: Submodular secretary problem and extensions. In: 13th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, Volume 6302 of LNCS, pp. 39–52 (2010)
  4. Berman, P.: A \(d/2\) approximation for maximum weight independent set in \(d\)-claw free graphs. Nord. J. Comput. 7(3), 178–184 (2000)
  5. Buchbinder, N., Naor, J.: Online primal-dual algorithms for covering and packing. Math. Oper. Res. 34(2), 270–286 (2009)
  6. Chekuri, C., Khanna, S.: On multidimensional packing problems. SIAM J. Comput. 33(4), 837–851 (2004)
  7. Cygan, M.: Improved approximation for 3-dimensional matching via bounded pathwidth local search. In: 54th IEEE Annual Symposium on Foundations of Computer Science, pp. 509–518 (2013)
  8. Emek, Y., Halldórsson, M.M., Mansour, Y., Patt-Shamir, B., Radhakrishnan, J., Rawitz, D.: Online set packing. SIAM J. Comput. 41(4), 728–746 (2012)
  9. Feldman, M., Naor, J.S., Schwartz, R.: Improved competitive ratios for submodular secretary problems. In: 14th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, Volume 6845 of LNCS, pp. 218–229 (2011)
  10. Freeman, P.: The secretary problem and its extensions: a review. Int. Stat. Rev. 51(2), 189–206 (1983)
  11. Frieze, A.M., Clarke, M.R.B.: Approximation algorithms for the \(m\)-dimensional 0–1 knapsack problem: worst-case and probabilistic analyses. Eur. J. Oper. Res. 15, 100–109 (1984)
  12. Gilbert, J.P., Mosteller, F.: Recognizing the maximum of a sequence. J. Am. Stat. Assoc. 61(313), 35–73 (1966)
  13. Halldórsson, M.M., Kratochvíl, J., Telle, J.A.: Independent sets with domination constraints. Discrete Appl. Math. 99(1–3), 39–54 (2000)
  14. Halldórsson, M.M., Patt-Shamir, B., Rawitz, D.: Online scheduling with interval conflicts. Theory Comput. Syst. 53(2), 300–317 (2013)
  15. Håstad, J.: Clique is hard to approximate within \(n^{1-\varepsilon }\). Acta Math. 182(1), 105–142 (1999)
  16. Hazan, E., Safra, S., Schwartz, O.: On the complexity of approximating k-set packing. Comput. Complex. 15(1), 20–39 (2006)
  17. Ibarra, O.H., Kim, C.E.: Fast approximation algorithms for the knapsack and sum of subset problems. J. ACM 22(4), 463–468 (1975)
  18. Magazine, M.J., Chern, M.-S.: A note on approximation schemes for multidimensional knapsack problems. Math. Oper. Res. 9(2), 244–247 (1984)
  19. Mansour, Y., Patt-Shamir, B., Rawitz, D.: Competitive router scheduling with structured data. Theor. Comput. Sci. 530, 12–22 (2014)
  20. Raghavan, P., Thompson, C.D.: Randomized rounding: a technique for provably good algorithms and algorithmic proofs. Combinatorica 7(4), 365–374 (1987)
  21. Sahni, S.: Approximate algorithms for the 0/1 knapsack problem. J. ACM 22(1), 115–124 (1975)
  22. Srinivasan, A.: Improved approximations of packing and covering problems. In: 27th Annual ACM Symposium on the Theory of Computing, pp. 268–276 (1995)

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Pierre Fraigniaud (1)
  • Magnús M. Halldórsson (2)
  • Boaz Patt-Shamir (3)
  • Dror Rawitz (4)
  • Adi Rosén (1)

  1. LIAFA, CNRS and University Paris Diderot, Paris, France
  2. ICE-TCS, School of Computer Science, Reykjavik University, Reykjavík, Iceland
  3. School of Electrical Engineering, Tel Aviv University, Tel Aviv, Israel
  4. Faculty of Engineering, Bar-Ilan University, Ramat Gan, Israel
