# Shrinking Maxima, Decreasing Costs: New Online Packing and Covering Problems

## Abstract

We consider two new variants of online integer programs that are duals. In the packing problem we are given a set of items and a collection of knapsack constraints over these items that are revealed over time in an online fashion. Upon arrival of a constraint we may need to remove several items (irrevocably) so as to maintain feasibility of the solution. Hence, the set of packed items becomes smaller over time. The goal is to maximize the number, or value, of packed items. The problem originates from a buffer-overflow model in communication networks, where items represent information units broken into multiple packets. The other problem considered is online covering: there is a universe to be covered. Sets arrive online, and we must decide for each set whether we add it to the cover or give it up. The cost of a solution is the total cost of sets taken, plus a penalty for each uncovered element. The number of sets in the solution grows over time, but its cost goes down. This problem is motivated by team formation, where the universe consists of skills, and sets represent candidates we may hire. The packing problem was introduced in Emek et al. (SIAM J Comput 41(4):728–746, 2012) for the special case where the matrix is binary; in this paper we extend the solution to general matrices with non-negative integer entries. The covering problem is introduced in this paper; we present matching upper and lower bounds on its competitive ratio.

### Keywords

Competitive analysis · Randomized algorithm · Packing integer programs · Online set packing · Team formation · Prize-collecting multi-covering

## 1 Introduction

The *packing integer programming* problem, denoted pip, is defined by the following integer program, denoted (PIP):

\[
\max \Big\{ \textstyle\sum _j b_j x_j \;:\; Ax \le c,\ x \le p,\ x \in \mathbb {N}^n \Big\},
\]

where \(A \in \mathbb {N}^{m \times n}\), \(p_j\) is the *cap* (an upper bound) on the number of copies of item \(j\), \(b_j\) is the *benefit* obtained by packing item \(j\), and \(c_i\) is the *capacity* of the \(i\)th constraint. The online character of opip is expressed by the following additional assumptions: (1) knapsack constraints arrive one by one, and (2) the variables can only be decreased. The special case, where \(A \in \{0,1\}^{m \times n}\) and \(c = 1^m\), is known as Online Set Packing [8].

The (offline) Team Formation problem is captured by the following covering integer program, denoted (TF):

\[
\min \Big\{ \textstyle\sum _i c_i y_i + \sum _j p_j z_j \;:\; A^T y + z \ge b,\ y \in \mathbb {N}^m,\ z \in \mathbb {N}^n \Big\},
\]

where the coverage of set\(^{1}\) \(i\) of element \(j\) is \(a_{ij}\) and its cost is \(c_i\). A solution is a collection of the sets, where multiple copies of sets are allowed. The cost of a solution is the cost of the selected sets plus the penalties for unsatisfied covering requirements. In (TF), the value of \(y_i\) represents the number of copies of set \(i\) taken by the solution, and \(z_j\) is the amount of unsatisfied coverage of element \(j\) (for which we pay a penalty).

Our online version of the Team Formation problem, denoted otf, is as follows. Initially, the elements are uncovered, and hence each unit of unsatisfied coverage incurs its penalty. Sets with various coverage and cost arrive online. In each time step, a new set arrives, and the algorithm must decide how many copies of the arriving set to add to the solution. The goal is to minimize the total cost of sets taken plus penalties for uncovered elements.

Our main measure, as is customary with online algorithms, is the *competitive ratio*: in the covering case, the ratio of the cost incurred by the algorithm (expected cost if the algorithm is randomized) to the best possible cost for the given instance, and in the packing case, the ratio of the benefit earned by the optimum solution to the (expected) benefit earned by the algorithm.

*Motivation.* The otf problem is an abstraction of the following situation (corresponding to a binary matrix and binary requirements). We are embarking on a new project that requires some \(n\) skills. The requirement for skill \(j\) can be satisfied by outsourcing for some cost \(p_j\), or by hiring an employee who possesses skill \(j\). The goal is to minimize the project cost under the following procedure: We interview candidates one by one. After each interview we know the skills and the hiring cost of the candidate and must then decide irrevocably whether to hire the candidate.

The opip problem originates from the following natural networking situation [8]. High-level information units, called *frames*, can be too large to fit in a single network packet, in which case the frames are fragmented into multiple packets. As packets traverse the network, they may arrive at a bottleneck link that cannot deliver them all, giving rise to a basic online question: which packets to drop so as to maximize the number of frames that are delivered in full. If we ignore buffers, this question is precisely our version of opip. Namely, in each time step \(i\), a burst of packets arrives, corresponding to the \(i\)th constraint in (PIP): \(a_{ij}\) is the size of the packet from frame \(j\) that arrives at step \(i\), and \(c_i\) is the total size that the link can deliver at time \(i\).

Our problems appear unique in the literature of online computation in that solutions get progressively *smaller* with time. Traditionally, the initial solution is expected to be the empty set, and its value or cost only increases as the input is progressively presented. In our class of problems, some aspects of the input are known, inducing a naïve initial solution. The presented input progressively elucidates the structure of the instance, adding more constraints (in maximization problems) or providing increasing opportunities for cost reductions or optimizations (in minimization problems). In reality, the issue is often less what to include than what to keep. We feel that this complementary viewpoint is natural and deserves further treatment.

*Contribution and Results.* The contributions of this paper are twofold. On the conceptual level, we are the first to formalize the otf problem, to the best of our knowledge (the opip problem was introduced in [8]). On the technical level, we present nearly tight results for both the opip and the otf problems.

For opip, we extend the results of [8] from a binary matrix to the case of general non-negative integer demands. This is a useful extension when we consider our motivating network bottleneck scenario: it allows the algorithm to deal with packets of different sizes, while previous solutions were restricted to uniform-size packets. The competitive ratio of our algorithm is \(O(C_{\mathrm{max}} \sqrt{\rho _{\mathrm{max}}})\), where \(C_{\mathrm{max}}\) is the maximal sum of entries in a column, and \(\rho _{\mathrm{max}}\) is the maximal ratio, over constraints \(i\), of the load on the constraint, namely \(\sum _j p_j a_{ij}\), to its capacity \(c_i\). Observe that in the case of unit caps (i.e., \(p=1^n\)), \(\rho _{\mathrm{max}}\) is the maximal ratio of the sum of entries in a row \(i\) to its capacity \(c_i\). We remark that the extension is non-trivial, although it uses known techniques.

Regarding otf, we prove matching upper and lower bounds on the competitive ratio: We show that even randomized algorithms cannot have competitive ratio better than \(\Omega (\sqrt{\rho _{\mathrm{max}}})\), where \(\rho _{\mathrm{max}}\) is the maximal ratio, over all elements, between the highest and lowest cost of covering a given element. This result holds even for the case where the algorithm may discard a set from its running solution (but never takes back a set that was dismissed). On the other hand, we give a simple deterministic algorithm with a competitive ratio of \(O(\sqrt{\rho _{\mathrm{max}}})\). The algorithm requires prior knowledge of the value of \(\rho _{\mathrm{max}}\); we show that without such knowledge only the trivial \(O(\rho _{\mathrm{max}})\) bound is possible.

We note that our techniques can be used for the variant of otf in which \(y_i\) is bounded (e.g., there is only one copy of a given candidate).

*Related Work.* Online packing was studied in the past, but traditionally the elements of the universe (equivalently, the constraints) are given ahead of time and sets arrive on-line (e.g., in [2]). In a similar vein, online set cover was defined in [1] as follows. A collection of sets is given ahead of time. Elements arrive online, and the algorithm is required to maintain a cover of the elements that arrived: if the arriving element is not already covered, then some set from the given collection must be added to the solution. Our problems have the complementary view of what is known in advance and what arrives online (see also [5]).

Let us first review some results for the offline packing problem pip. The single-constraint case (\(m=1\)) is simply the Knapsack problem, which is NP-hard and has an FPTAS [17, 21]. If the number of constraints is constant, the offline version of pip becomes the Multi-dimensional Knapsack problem, which has a PTAS [11], while obtaining an FPTAS is NP-hard [18]. Raghavan and Thompson [20] used randomized rounding to obtain solutions whose benefit is \(t_1 = \Omega (\textsc {opt}/m^{1/\alpha })\) for pip, where \(\alpha = \min _i \min _j \frac{c_i}{a_{ij}}\). A solution of benefit \(t_2 = \Omega (\textsc {opt}/m^{1/(\alpha +1)})\) is also given for the case where \(A \in \left\{ 0,1 \right\} ^{m \times n}\) (in this case \(\alpha = \min _i c_i\)). Srinivasan [22] improved these results by obtaining solutions whose benefits are \(\Omega (t_1^{\alpha /(\alpha -1)})\) and \(\Omega (t_2^{\alpha /(\alpha -1)})\). Chekuri and Khanna [6] showed that, for every fixed integer \(\alpha \) and fixed \(\varepsilon >0\), pip with \(c = \alpha \cdot 1^m\) and \(A \in \left\{ 0,1 \right\} ^{m \times n}\) cannot be approximated within a factor of \(m^{1/(\alpha +1) - \varepsilon }\), unless NP \(=\) ZPP. They also showed that pip with uniform capacities cannot be approximated within a factor of \(m^{1/(\alpha +1) - \varepsilon }\), unless NP \(=\) ZPP, even with a resource augmentation factor \(\alpha \) (in this case the solution \(x\) satisfies \(Ax \le \alpha c\)).

As mentioned before, the special case of pip where \(A \in \{0,1\}^{m \times n}\) and \(c = 1^m\) is known as set packing. This problem is as hard as Maximum Independent Set even when all elements have degree \(2\) (i.e., \(A\) contains at most two non-zero entries in each row), and therefore cannot be approximated to within a factor of \(n^{1-\varepsilon }\), for any \(\varepsilon >0\) [15]. In terms of the number of elements (constraints, in pip terms), set packing is \(O(\sqrt{m})\)-approximable and hard to approximate within \(m^{1/2-\varepsilon }\), for any \(\varepsilon >0\) [13]. When set sizes are at most \(k\) (\(A\) contains at most \(k\) non-zero entries in each column), it is approximable to within \((k+1)/3+\varepsilon \), for any \(\varepsilon > 0\) [7], and within \((k+1)/2\) in the weighted case [4], but known to be hard to approximate to within an \(o(k/\log k)\) factor [16].

opip was introduced in [8], assuming that the matrix is binary, namely that each set requires either one or zero copies of each item. A randomized algorithm was given for that case with a competitive ratio of \(O(k\sqrt{\nu })\), where \(k\) is the maximal set size and \(\nu \) is the maximal ratio, over all items, between the number of sets containing that item and the number of its copies. In opip terms this bound is \(O(C_{\mathrm{max}} \sqrt{\rho _{\mathrm{max}}})\). A nearly matching lower bound of \(\tilde{\Omega }(k\sqrt{\nu })\) was also given for the unit capacities case. This translates to an \(\tilde{\Omega }(C_{\mathrm{max}}\sqrt{\rho _{\mathrm{max}}})\) lower bound for opip. Subsequent work extended these results to allow for redundancy [19], i.e., when the benefit of a set is earned when at least a \(\beta \)-fraction of its elements are assigned to it, for some fixed \(\beta > 0\). For the special case of unit capacity opip in which the constraint matrix has the consecutive ones property, a deterministic \(O(\log R_{\mathrm{max}})\)-competitive algorithm was given in [14], where \(R_{\mathrm{max}}\) is the maximal sum of entries in a row, as well as a matching lower bound.

Previously, the online packing problem where *sets* arrive online and constraints are fixed was defined in [2], and an \(O(\log n)\)-competitive algorithm was given for the case where each set requires at most a \(1/\log n\)-fraction of the cap of any element. A matching lower bound shows that this requirement is necessary to obtain a polylogarithmic competitive ratio.

Regarding team formation, we are unaware of any prior formalization of the problem, let alone analysis. The online set cover problem defined in [1] admits an algorithm with competitive ratio \(O(\log n\log m)\). Another related problem is the secretary problem (see, e.g., [10, 12]; further results and references can be found in [3, 9]). In this family of problems, \(n\) candidates arrive in random order (or with random values), and the goal is to pick \(k\) of them (classically, \(k=1\)) so as to optimize some function of the picked set, such as the probability of picking the candidates with the top \(k\) values, or the average rank of the selected candidates. The difficulty, as in our otf formulation, is that the decision must be taken immediately upon a candidate's arrival. However, the stipulation that the input is random makes the secretary problem very different from otf. Another difference is that, unlike in otf, the number of candidates to pick is set in advance.

*Paper Organization.* The remainder of this paper is organized as follows. In Sect. 2 we introduce some notation. In Sect. 3 we describe and analyze our online algorithm for opip, and in Sect. 4 we consider otf.

## 2 Preliminaries

In this section we define our notation. Given a matrix \(A \in \mathbb {N}^{m \times n}\), let \(R(i) = \sum _j a_{ij}\) be the sum of entries in the \(i\)th row, and let \(C(j) = \sum _i a_{ij}\) be the sum of entries in the \(j\)th column. Denote \(R_{\mathrm{max}} = \max _i R(i)\) and \(C_{\mathrm{max}} = \max _j C(j)\). Define \(\rho (i) = (\sum _j p_ja_{ij})/c_i\), for every \(i\), and \(\rho _{\mathrm{max}}= \max _i \rho (i)\).
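As a quick illustration of this notation, the quantities above can be computed directly. The matrix, caps, and capacities below are arbitrary example values, not an instance from the paper:

```python
# Row sums R(i), column sums C(j), and load ratios rho(i) for a small
# example opip instance (arbitrary numbers, for illustration only).
A = [[1, 3, 2, 2],   # a_{ij}: m = 2 constraints over n = 4 items
     [2, 0, 1, 4]]
p = [1, 1, 1, 1]     # caps p_j
c = [4, 3]           # capacities c_i

m, n = len(A), len(A[0])
R = [sum(A[i][j] for j in range(n)) for i in range(m)]            # R(i)
C = [sum(A[i][j] for i in range(m)) for j in range(n)]            # C(j)
R_max, C_max = max(R), max(C)
rho = [sum(p[j] * A[i][j] for j in range(n)) / c[i] for i in range(m)]
rho_max = max(rho)

print(R, C, R_max, C_max, rho_max)
```

For this instance, \(R = (8, 7)\), \(C_{\mathrm{max}} = 6\), and \(\rho _{\mathrm{max}} = 7/3\).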

Observe that if \(\sum _j p_j a_{ij} \le c_i\) for some \(i\) in an opip instance, then constraint \(i\) is redundant. Hence, we assume w.l.o.g. that \(\sum _j p_j a_{ij} > c_i\), and therefore \(\rho (i) > 1\), for every \(i\).

We assume hereafter that \(\text {gcd}(a_{i1},\ldots ,a_{in},c_i)=1\), for every \(i\). Otherwise, we may divide \(a_{i1},\ldots ,a_{in}\), and \(c_i\) by this common factor. This does not change \(\rho (i)\), but it may decrease \(C_{\mathrm{max}}\) and our bound on the competitive ratio. On the other extreme, we assume that \(a_{ij} \le c_i\) for every \(i\) and \(j\): if \(a_{ij}> c_i\) then item \(j\) is not a member in any feasible solution.
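The gcd normalization can be sketched as follows (`normalize_constraint` is a hypothetical helper written for illustration; it is not from the paper):

```python
from functools import reduce
from math import gcd

def normalize_constraint(row, cap):
    """Divide a_{i1},...,a_{in} and c_i by their greatest common divisor.
    This leaves rho(i) unchanged but may decrease C_max."""
    g = reduce(gcd, row, cap)
    return [a // g for a in row], cap // g

row, cap = normalize_constraint([2, 4, 6], 8)   # -> ([1, 2, 3], 4)
```

Note that \(\rho (i)\) is indeed preserved: both the load \(\sum _j p_j a_{ij}\) and the capacity \(c_i\) shrink by the same factor.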

Given a subset \(J\) of items and a constraint \(i\), let \(J(i) = \left\{ j \in J : a_{ij}>0 \right\} \) be the subset of items from \(J\) that participate in constraint \(i\). For example, if \(\textsc {opt} \) is the set of items in some fixed optimal solution, then \(\textsc {opt} (i)\) denotes the items in \(\textsc {opt} \) that are active in constraint \(i\). Also, let \(R_J(i) = \sum _{j \in J} a_{ij}\), and define the *weighted benefit* of a constraint \(i\) as \(wb(i) = \sum _j a_{ij} \cdot b_j\).

Given an otf instance, \(R(i) = \sum _j a_{ij}\) is the coverage potential of a single copy of set \(i\), and \(\sum _j p_j a_{ij}\) is the potential savings in penalties of a single copy of set \(i\). Hence, \(\rho (i)\) is the ratio between the savings and cost of set \(i\), namely it is the *cost effectiveness* of set \(i\). Observe that we may assume that \(\rho (i)>1\), since otherwise we may ignore the set. Intuitively, the cheapest possible way to cover the elements is by sets with maximum cost effectiveness. Hence, ignoring the sets and simply paying the penalties (i.e., the solution \(y=0\) and \(z=b\)) is a \(\rho _{\mathrm{max}}\)-approximate solution.

## 3 Online Packing Integer Programs

In this section we present a randomized algorithm for opip whose competitive ratio is \(2C_{\mathrm{max}} \sqrt{\rho _{\mathrm{max}}}\). We describe an algorithm for opip with unit caps, namely for the case where \(p_j=1\), for every \(j\), that is a slight generalization of the algorithm given in [8], allowing us to deal with non-binary instances. We solve the general case by simply treating each item \(j\) as \(p_j\) items, namely by duplicating the \(j\)th column \(p_j\) times. Observe that this transformation does not change \(C_{\mathrm{max}}\) or \(\rho _{\mathrm{max}}\).
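The column-duplication reduction can be sketched as follows (`to_unit_caps` is a hypothetical helper and the instance is arbitrary, for illustration only):

```python
def to_unit_caps(A, p):
    """Reduce general caps to unit caps: replace item j by p_j identical
    unit-cap items, i.e., duplicate the j-th column p_j times."""
    return [[row[j] for j in range(len(p)) for _ in range(p[j])] for row in A]

A = [[1, 2],
     [3, 1]]
p = [2, 3]
A_unit = to_unit_caps(A, p)   # -> [[1, 1, 2, 2, 2], [3, 3, 1, 1, 1]]

# C_max is unchanged: every column of A reappears with the same sum, and each
# row sum of A_unit equals the load sum_j p_j * a_{ij} of that row in A.
cols = lambda M: [sum(row[j] for row in M) for j in range(len(M[0]))]
assert max(cols(A_unit)) == max(cols(A))
```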

For the rest of this section we assume unit item upper bounds, namely that \(p=1^n\). In particular, \(\rho (i) = R(i)/c_i\), for every \(i\).

*Random Variables.* For \(w >0\), let \(D_w: \mathbb {R}\rightarrow [0,1]\) be the (cumulative) distribution function of a random variable \(Z\) that is defined by

\[
D_w(x) \;=\; \Pr [Z \le x] \;=\; \begin{cases} 0 & x < 0,\\ x^{w} & 0 \le x \le 1,\\ 1 & x > 1. \end{cases}
\]

*Algorithm RP.* Initially, we independently choose for each item \(j\) a random priority \(r(j) \in [0,1]\) with distribution \(D_{b_j}\). When constraint \(i\) arrives, we construct \(c_i\) subsets \(S_{i1},\ldots ,S_{ic_i}\) as follows. Each item \(j\) chooses \(a_{ij}\) subsets at random. Then, for each subset \(S_{i\ell },\,\ell \in \left\{ 1,\ldots ,c_i \right\} \), we reject all items but the one with the highest priority. Observe that an item survives only if it has the highest priority in all of its chosen sets.

*Example 1*

Suppose that the instance contains four items whose priorities are \(r(1) = 0.5,\,r(2) = 0.8,\,r(3) = 0.4\), and \(r(4) = 0.9\). Upon arrival of the \(i\)th constraint: \(x_1 + 3x_2 + 2x_3 + 2 x_4 \le 4\), Algorithm RP constructs \(c_i = 4\) random subsets: \(S_{i1} = \left\{ 1,3 \right\} ,\,S_{i2} = \left\{ 2,3,4 \right\} ,\,S_{i3} = \left\{ 2,4 \right\} \), and \(S_{i4} = \left\{ 2 \right\} \). Item \(2\) is eliminated due to \(S_{i2}\) and \(S_{i3}\), while Item \(3\) is eliminated due to \(S_{i1}\) and \(S_{i2}\). Items \(1\) and \(4\) are not eliminated by this constraint.
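Example 1 can be replayed in code (a minimal sketch: the priorities and the random choice of subsets are fixed to the values in the example, and each subset rejects all items but its top-priority one):

```python
# One step of Algorithm RP on the constraint x1 + 3*x2 + 2*x3 + 2*x4 <= 4,
# replaying Example 1 with fixed priorities and fixed subset choices.
r = {1: 0.5, 2: 0.8, 3: 0.4, 4: 0.9}    # priorities r(j)
S = [{1, 3}, {2, 3, 4}, {2, 4}, {2}]    # S_{i1}, ..., S_{i4} (c_i = 4 subsets)

surviving = set(r)
for subset in S:
    top = max(subset, key=r.get)        # highest-priority item in the subset
    surviving -= (subset - {top})       # reject all other items in the subset

print(sorted(surviving))                # -> [1, 4]
```

An item survives only if it is the top-priority item in every subset it chose, matching the observation at the end of the algorithm's description.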

Intuitively, the approach is to prefer items with high priority. In the special case where \(a_{ij} \in \left\{ 0,1 \right\} \), one may simply choose the \(c_i\) items with highest priority. A somewhat more subtle approach, based on a reduction to the unit capacity case, is used in [8]: items are randomly partitioned into \(c_i\) equal-size subsets; from each subset only the top priority item survives. Our Algorithm RP uses a variation of this approach: we construct \(c_i\) subsets whose expected sizes are equal, such that item \(j\) is contained in exactly \(a_{ij}\) of them.

*Analysis.* Observe that each subset \(S_{i \ell }\) induces the following constraint: \(\sum _{j \in S_{i\ell }} x_j \le 1\). Hence, the algorithm implicitly constructs a new uniform capacity opip instance by defining the matrix \(A' \in \left\{ 0,1 \right\} ^{(\sum _i c_i) \times n}\) as follows: \(a'_{\sum _{t<i} c_t + \ell ,\,j} = 1\) if and only if \(j \in S_{i \ell }\). Each row of \(A'\) corresponds to one of the random constraints generated by the algorithm. See example in Fig. 1.

**Observation 1**

\(C(j) = C'(j)\), for every \(j\), and \(\mathbb {E}[R'(\sum _{t<i} c_t + \ell )] = \rho (i)\), for every \(i\) and \(\ell \).

*Proof*

First, consider the probability of satisfying an item \(j\).

**Lemma 2**

\(\Pr [r(j) > \max \{r(k): k \in N(j)\}] = \mathbb {E}\left[ \frac{b_j}{b(N[j])} \right] .\)

*Proof*
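The identity in Lemma 2 can be sanity-checked numerically. The sketch below assumes priorities are drawn with CDF \(D_w(x) = x^w\) on \([0,1]\), and a fixed neighborhood \(N[1] = \{1,2,3\}\) with arbitrary benefits; it compares a Monte Carlo estimate against the closed form \(b_j/b(N[j])\):

```python
import random

random.seed(0)
b = {1: 2.0, 2: 3.0, 3: 5.0}    # benefits; here b(N[1]) = 2 + 3 + 5 = 10

def sample_priority(w):
    # Inverse-CDF sampling: if Pr[Z <= x] = x**w on [0, 1], then U**(1/w) ~ Z.
    return random.random() ** (1.0 / w)

trials = 200_000
wins = 0
for _ in range(trials):
    r = {j: sample_priority(w) for j, w in b.items()}
    if r[1] > max(r[2], r[3]):   # item 1 has the top priority in N[1]
        wins += 1

closed_form = b[1] / sum(b.values())     # b_j / b(N[j]) = 0.2
assert abs(wins / trials - closed_form) < 0.01
```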

Next, we provide a lower bound on the expected performance of Algorithm RP. We abuse notation by referring to the output of the algorithm by RP, as well.

**Lemma 3**

For any subset of items \(J,\,\mathbb {E}[b(\textsc {RP})] \ge \frac{\left( \sum _{j \in J} b_j\right) ^2}{\mathbb {E}\left[ \sum _{j \in J} b(N[j]) \right] }\).

*Proof*

Our next step is to bound \(\sum _{j \in J} b(N[j])\). Recall that, since \(A'\) is binary, \(wb'(i)\) is the sum of benefits that appear in new constraint \(i\). Hence, if \(j\) appears in new constraint \(i\), its weighted competition is at most \(wb'(i)\).

**Lemma 4**

Let \(J\) be a subset of items. Then, \(\sum _{j \in J} b(N[j]) ~\le ~ \sum _{i=1}^{m'} R'_J(i) \cdot wb'(i)\).

*Proof*

To complete the analysis we derive appropriate upper bounds for the denominator when \(J=[n]\) and when \(J=\textsc {opt} \).

**Lemma 5**

*Proof*

Lemma 5 implies that

**Theorem 1**

Theorem 1 implies the following:

**Corollary 2**

There is an opip algorithm with competitive ratio at most \(2C_{\mathrm{max}} \sqrt{\rho _{\mathrm{max}}}\).

*Proof*

## 4 Competitive Team Formation

In this section we provide a deterministic online algorithm for otf and a matching lower bound that holds even for randomized algorithms. Furthermore, our lower bound holds for a more general case, where the commitment of the online algorithm is only “one way” in the following sense. Once a set is dismissed it cannot be recruited again, but a set in the solution at one point may be thrown out of the solution later.

### 4.1 An Online Algorithm

Our algorithm generates a monotonically growing collection of sets based on a simple deterministic threshold rule. Recall that \(\rho _{\mathrm{max}}\) is the maximum cost effectiveness, over all sets. Algorithm Threshold assumes knowledge of \(\rho _{\mathrm{max}}\) and works as follows. Let \(y\) be the solution vector constructed by Threshold, and define \(z^i_j = \max \left\{ b_j - \sum _{\ell \le i} a_{\ell j} y_\ell ,0 \right\} \), i.e., \(z^i_j\) is the amount of missing coverage for element \(j\) after the arrival of set \(i\). Note that \(z_j^i\) is monotone non-increasing in \(i\).
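Condition (7) and the exact threshold rule are not reproduced in this excerpt. The following is a hypothetical reconstruction, assuming the rule buys, upon arrival of set \(i\), the largest number of copies \(v\) whose total penalty savings are at least \(\sqrt{\rho _{\mathrm{max}}}\) times their total cost; it is meant only to make the \(z^i_j\) bookkeeping concrete:

```python
import math

def threshold_step(z, a_i, c_i, p, rho_max):
    """One arrival step of a Threshold-style rule (hedged reconstruction:
    condition (7) is not in this excerpt).  Buy the largest v such that the
    penalty savings of v copies is at least sqrt(rho_max) times their cost."""
    t = math.sqrt(rho_max)
    def savings(v):
        # penalty saved by v copies, capped at the remaining coverage z_j
        return sum(p[j] * min(v * a_i[j], z[j]) for j in range(len(z)))
    v = 0
    while savings(v + 1) >= t * c_i * (v + 1):
        v += 1
    # update remaining coverage: z_j^i = max(z_j - a_{ij} * v, 0)
    return v, [max(z[j] - a_i[j] * v, 0) for j in range(len(z))]

v, z_new = threshold_step(z=[2, 2], a_i=[1, 1], c_i=1, p=[1, 1], rho_max=4.0)
# with sqrt(rho_max) = 2: two copies save 4 >= 2*2, a third would save 4 < 6
```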

We show that the competitive ratio of Threshold is at most \(2\sqrt{\rho _{\mathrm{max}}}-1\).

**Theorem 3**

*Proof*

This leads us to an upper bound on the competitive ratio.

**Corollary 4**

Algorithm Threshold is \((2\sqrt{\rho _{\mathrm{max}}}-1)\)-competitive.

We note that the same approach would work for the variant of otf in which there is an upper bound \(u_i\) on the number of copies of set \(i\) that can be used, i.e., \(y_i \le u_i\). In this case the value of \(v\) in condition (7) is also bounded by \(u_i\). The rest of the details are omitted.

### 4.2 A Lower Bound

In this section we present a matching lower bound, which holds for randomized algorithms, and even for the case where the algorithm may discard a set from its running solution (but never takes back a set that was dismissed).

We start with a couple of simple constructions. In the first construction, the input consists of sets of size one, and in the second all costs and penalties are the same.

**Theorem 5**

The competitive ratio of any randomized online algorithm for otf is \(\Omega (\sqrt{\rho _{\mathrm{max}}})\). This bound holds for inputs with only two elements and sets of size one, with unit coverage and uniform penalties.

*Proof*

Let alg be a randomized algorithm. Consider an input sequence consisting of two elements with unit covering requirement and penalty \(p\). The arrival sequence is composed of two or three sets. The first set to arrive is \(\left\{ 1 \right\} \) of cost \(1\). (The goal of the first set is to make sure that the ratio between the penalty and the minimum cost is \(p\).) The second set is \(\left\{ 2 \right\} \) of cost \(\sqrt{p}\). If alg takes this set with probability less than half, then the sequence ends; otherwise, the third set \(\left\{ 2 \right\} \) of cost \(1\) arrives.

In the first case the optimal cost is \(1+\sqrt{p}\), while alg pays at least \(1 + \frac{1}{2}p\). Otherwise, the optimal cost is \(2\), while alg pays at least \(1 + \frac{1}{2}\sqrt{p}\). Notice that we may repeat the second part of this sequence as many times as needed. Finally, notice that \(\rho _{\mathrm{max}}= p\). \(\square \)
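The arithmetic of the two cases can be checked numerically (a sketch; the penalty \(p\) and the probability \(q < \frac{1}{2}\) of taking the second set are arbitrary example values):

```python
import math

p = 10_000.0    # penalty; here rho_max = p
q = 0.4         # probability < 1/2 that alg takes the second set (case 1)

# Case 1: the sequence ends after the second set.
alg1 = 1 + q * math.sqrt(p) + (1 - q) * p   # >= 1 + p/2 whenever q < 1/2
opt1 = 1 + math.sqrt(p)
# Case 2: q >= 1/2, and a third set of cost 1 arrives.
alg2 = 1 + 0.5 * math.sqrt(p)               # lower bound on alg's expected cost
opt2 = 2.0

assert alg1 / opt1 >= math.sqrt(p) / 4      # Omega(sqrt(rho_max)) in case 1
assert alg2 / opt2 >= math.sqrt(p) / 8      # Omega(sqrt(rho_max)) in case 2
```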

**Theorem 6**

The competitive ratio of any randomized online algorithm for otf is \(\Omega (\sqrt{\rho _{\mathrm{max}}})\). This bound holds for inputs with unit costs and penalties.

*Proof*

Let alg be a randomized algorithm. Assume unit penalties and unit coverage requirements. Consider the input sequence that starts with \(\sqrt{n}\) candidates, each with \(\sqrt{n}\) fresh skills and cost \(1\). Let \(\ell \) be the expected number of candidates alg takes from this sequence. If \(\ell < \sqrt{n}/2\), this is the whole input. In this case the expected cost of alg is at least \(\frac{1}{2}n\), whereas the optimal cost is \(\sqrt{n}\). If \(\ell \ge \frac{1}{2}\sqrt{n}\), then we add an omnipotent candidate (who has all skills) at the end, with cost \(1\). It follows that alg pays at least \(\frac{1}{2}\sqrt{n}\) in expectation, while opt pays only \(1\). Finally, notice that \(\rho _{\mathrm{max}}= n\). \(\square \)

Next, we give a lower bound construction that applies to the more general setting in which the algorithm may discard a set from its solution.

**Theorem 7**

The competitive ratio of any randomized online algorithm for otf is \(\Omega (\sqrt{\rho _{\mathrm{max}}})\). This bound holds even if the algorithm is allowed to discard sets. Furthermore, it holds also in the binary case, where all demands, coverages, penalties and costs are either \(0\) or \(1\).

*Proof*

Our lower bound construction uses affine planes, defined as follows. Let \(n = q^2\), where \(q\) is prime. In our construction, each pair \((a,b) \in \mathbb {Z}_q \times \mathbb {Z}_q\) corresponds to an element. Sets will correspond to lines: a line in this finite geometry is either the collection of pairs \((x,y) \in \mathbb {Z}_q \times \mathbb {Z}_q\) satisfying \(y \equiv ax+b \pmod q\), for some given \(a,b \in \mathbb {Z}_q\), or a vertical line \(\left\{ (c,y) : y \in \mathbb {Z}_q \right\} \), for some given \(c \in \mathbb {Z}_q\). There are \(q^2 + q = \Theta (n)\) such lines.

We use two properties of these lines:

- 1.
All points can be covered by \(q\) disjoint (parallel) lines.

- 2.
Two lines that intersect in more than a single point are necessarily identical.

Otherwise, \(\sqrt{n}/2<r\le n/2\). Let \(L\) be a line chosen uniformly at random. The probability that \(L\) is retained by the algorithm is at most \(1/2\), since \(r\le n/2\). We now extend the input sequence by one more set \(L^c \mathop {=}\limits ^\mathrm{def}\left\{ 1,\ldots ,n \right\} \setminus L\), and assign \(L^c\) unit cost. Note that by Property (2), if \(L\) is not retained by the algorithm, then the number of other lines that cover the points of \(L\) cannot be smaller than \(|L|=\sqrt{n}\), and hence the expected cost of alg due only to the points of \(L\) (either by covering set costs or by incurred penalties) is at least \(\sqrt{n}/2\). Obviously, throwing out any set from the solution at this time will not help to reduce the cost. On the other hand, the optimal solution to this scenario is the sets \(L\) and \(L^c\), whose cost is \(2\), and hence the competitive ratio is at least \(\Omega (\sqrt{n})\). \(\square \)
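The two properties of affine-plane lines used above can be verified directly (a small sketch over \(\mathbb {Z}_q\) for a concrete prime \(q\)):

```python
from itertools import combinations

q = 5                               # a prime; n = q*q points
points = {(x, y) for x in range(q) for y in range(q)}

# Lines y = a*x + b (mod q), plus the q vertical lines x = c.
lines = [frozenset((x, (a * x + b) % q) for x in range(q))
         for a in range(q) for b in range(q)]
lines += [frozenset((c, y) for y in range(q)) for c in range(q)]
assert len(lines) == q * q + q      # Theta(n) lines, each of size q = sqrt(n)

# Property 1: for a fixed slope, the q parallel lines partition the points.
parallel = [frozenset((x, b) for x in range(q)) for b in range(q)]
assert set().union(*parallel) == points
assert sum(len(L) for L in parallel) == q * q

# Property 2: two distinct lines intersect in at most one point.
assert all(len(L1 & L2) <= 1 for L1, L2 in combinations(lines, 2))
```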

*Remarks.* First, we note that in the proof above, the unit-cost set \(L^c\) can be replaced by \(\sqrt{n}-1\) sets, where each set covers \(\sqrt{n}\) elements and costs \(\frac{1}{\sqrt{n}-1}\). Second, one may be concerned that in the first case, the actual \(\rho _{\mathrm{max}}\) of the instance is not \(n\). This can be easily remedied as follows. Let the instance consist of \(2n\) elements: \(n\) elements in the affine plane as in the proof, and another \(n\) dummy elements. The dummy elements will all be covered by a single set that arrives first in the input sequence. The remainder of the input sequence is as in the proof. This allows us to argue that the *actual* \(\rho _{\mathrm{max}}\) is indeed \(n\), whatever the ensuing scenario is, while decreasing the lower bound by no more than a constant factor.

The above theorems hold even if \(\rho _{\mathrm{max}}\) is known to the algorithm. However, if \(\rho _{\mathrm{max}}\) is unknown, and discarding sets is not allowed, then we get a stronger lower bound.

**Theorem 8**

The competitive ratio of any randomized online algorithm for otf is \(\Omega (\rho _{\mathrm{max}})\), if the algorithm cannot discard sets and has no knowledge of \(\rho _{\mathrm{max}}\). It holds even in the case of unit penalties, demands and coverage.

*Proof*

Let alg be a randomized algorithm. Suppose for the sake of deriving a contradiction that there is an arbitrarily slow growing invertible function \(h\) such that alg has competitive ratio at most \(\rho _{\mathrm{max}}/h(\rho _{\mathrm{max}})\).

For every sufficiently large \(x\), we shall construct an instance \(I\) with \(\rho _{\mathrm{max}}= \rho _{\mathrm{max}}(I) \ge x\) for which the performance ratio of alg is at least \(2\rho _{\mathrm{max}}/h(\rho _{\mathrm{max}})\). This contradicts the assumption of the competitive ratio of alg, implying the theorem.

Let \(x\) be a value satisfying \(h(x) \ge 4\), and let \(f(x) = h(x)/4\). The instance constructed below has only one element, with unit penalty. A set arrives with cost \(1/x\). If alg takes the set with probability less than \(\frac{1}{2}\), we stop. Otherwise, we present a second set with cost \(1/f^{-1}(x)\).

It follows that when \(\rho _{\mathrm{max}}\) is unknown, it will not be possible to obtain an \(O(\sqrt{\rho _{\mathrm{max}}})\)-competitive algorithm without the ability to discard sets.

## 5 Conclusion

As mentioned in the introduction, the special case of opip in which the matrix is binary (i.e., each set requires either one or zero copies of each item) was considered in [8], where an upper bound and an almost tight lower bound on the competitive ratio of randomized algorithms were presented. We have shown that a variant of the algorithm from [8] applies to general opip. We note that the above lower bound applies to the unit capacities case. However, there is no lower bound for opip with non-unit uniform capacities.

We have proven matching upper and lower bounds on the competitive ratio for otf. We have shown that even randomized algorithms cannot have a competitive ratio better than \(\Omega (\sqrt{\rho _{\mathrm{max}}})\). The lower bound holds even if \(\rho _{\mathrm{max}}\) is known, and even if one is allowed to drop previously selected sets. On the other hand, the upper bound is obtained by a simple deterministic algorithm that does not drop sets. Unfortunately, our algorithm is based on prior knowledge of \(\rho _{\mathrm{max}}\). It remains an open question whether there is an \(O(\sqrt{\rho _{\mathrm{max}}})\)-competitive algorithm that has no knowledge of \(\rho _{\mathrm{max}}\). We have eliminated the possibility of an \(O(\sqrt{\rho _{\mathrm{max}}})\) upper bound for an algorithm that has no knowledge of \(\rho _{\mathrm{max}}\) and is not allowed to discard sets.

## Footnotes

- 1.
We misuse the term “set” for simplicity.

## Notes

### Acknowledgments

We thank Moti Medina for going to Berkeley to represent us.

### References

- 1. Alon, N., Awerbuch, B., Azar, Y., Buchbinder, N., Naor, J.: The online set cover problem. SIAM J. Comput. **39**(2), 361–370 (2009)
- 2. Awerbuch, B., Azar, Y., Plotkin, S.A.: Throughput-competitive on-line routing. In: 34th IEEE Annual Symposium on Foundations of Computer Science, pp. 32–40 (1993)
- 3. Bateni, M., Hajiaghayi, M., Zadimoghaddam, M.: Submodular secretary problem and extensions. In: 13th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, Volume 6302 of LNCS, pp. 39–52 (2010)
- 4. Berman, P.: A \(d/2\) approximation for maximum weight independent set in \(d\)-claw free graphs. Nord. J. Comput. **7**(3), 178–184 (2000)
- 5. Buchbinder, N., Naor, J.: Online primal-dual algorithms for covering and packing. Math. Oper. Res. **34**(2), 270–286 (2009)
- 6. Chekuri, C., Khanna, S.: On multidimensional packing problems. SIAM J. Comput. **33**(4), 837–851 (2004)
- 7. Cygan, M.: Improved approximation for 3-dimensional matching via bounded pathwidth local search. In: 54th IEEE Annual Symposium on Foundations of Computer Science, pp. 509–518 (2013)
- 8. Emek, Y., Halldórsson, M.M., Mansour, Y., Patt-Shamir, B., Radhakrishnan, J., Rawitz, D.: Online set packing. SIAM J. Comput. **41**(4), 728–746 (2012)
- 9. Feldman, M., Naor, J.S., Schwartz, R.: Improved competitive ratios for submodular secretary problems. In: 14th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, Volume 6845 of LNCS, pp. 218–229 (2011)
- 10. Freeman, P.: The secretary problem and its extensions: a review. Int. Stat. Rev. **51**(2), 189–206 (1983)
- 11. Frieze, A.M., Clarke, M.R.B.: Approximation algorithms for the \(m\)-dimensional 0–1 knapsack problem: worst-case and probabilistic analyses. Eur. J. Oper. Res. **15**, 100–109 (1984)
- 12. Gilbert, J.P., Mosteller, F.: Recognizing the maximum of a sequence. J. Am. Stat. Assoc. **61**(313), 35–73 (1966)
- 13. Halldórsson, M.M., Kratochvíl, J., Telle, J.A.: Independent sets with domination constraints. Discrete Appl. Math. **99**(1–3), 39–54 (2000)
- 14. Halldórsson, M.M., Patt-Shamir, B., Rawitz, D.: Online scheduling with interval conflicts. Theory Comput. Syst. **53**(2), 300–317 (2013)
- 15. Håstad, J.: Clique is hard to approximate within \(n^{1-\epsilon }\). Acta Math. **182**(1), 105–142 (1999)
- 16. Hazan, E., Safra, S., Schwartz, O.: On the complexity of approximating k-set packing. Comput. Complex. **15**(1), 20–39 (2006)
- 17. Ibarra, O.H., Kim, C.E.: Fast approximation algorithms for the knapsack and sum of subset problems. J. ACM **22**(4), 463–468 (1975)
- 18. Magazine, M.J., Chern, M.-S.: A note on approximation schemes for multidimensional knapsack problems. Math. Oper. Res. **9**(2), 244–247 (1984)
- 19. Mansour, Y., Patt-Shamir, B., Rawitz, D.: Competitive router scheduling with structured data. Theor. Comput. Sci. **530**, 12–22 (2014)
- 20. Raghavan, P., Thompson, C.D.: Randomized rounding: a technique for provably good algorithms and algorithmic proofs. Combinatorica **7**(4), 365–374 (1987)
- 21. Sahni, S.: Approximate algorithms for the 0/1 knapsack problem. J. ACM **22**(1), 115–124 (1975)
- 22. Srinivasan, A.: Improved approximations of packing and covering problems. In: 27th Annual ACM Symposium on the Theory of Computing, pp. 268–276 (1995)