Streaming Algorithms for Bin Packing and Vector Scheduling

Problems involving the efficient arrangement of simple objects, as captured by bin packing and makespan scheduling, are fundamental tasks in combinatorial optimization. These are well understood in the traditional online and offline cases, but have been less well-studied when the volume of the input is truly massive, and cannot even be read into memory. This is captured by the streaming model of computation, where the aim is to approximate the cost of the solution in one pass over the data, using small space. As a result, streaming algorithms produce concise input summaries that approximately preserve the optimum value. We design the first efficient streaming algorithms for these fundamental problems in combinatorial optimization. For Bin Packing, we provide a streaming asymptotic $1+\varepsilon$-approximation with $\widetilde{O}\left(\frac{1}{\varepsilon}\right)$ memory, where $\widetilde{O}$ hides logarithmic factors. Moreover, such a space bound is essentially optimal. Our algorithm implies a streaming $d+\varepsilon$-approximation for Vector Bin Packing in $d$ dimensions, running in space $\widetilde{O}\left(\frac{d}{\varepsilon}\right)$. For the related Vector Scheduling problem, we show how to construct an input summary in space $\widetilde{O}(d^2\cdot m / \varepsilon^2)$ that preserves the optimum value up to a factor of $2 - \frac{1}{m} +\varepsilon$, where $m$ is the number of identical machines.


Introduction
The streaming model captures many scenarios when we must process very large volumes of data, which cannot fit into the working memory. The algorithm makes one or more passes over the data with a limited memory, but does not have random access to the data. Thus, it needs to extract a concise summary of the huge input, which can be used to approximately answer the problem under consideration. The main aim is to provide a good trade-off between the space used for processing the input stream (and hence, the summary size) and the accuracy of the (best possible) answer computed from the summary. Other relevant parameters are the time and space needed to make the estimate, and the number of passes, ideally equal to one.
While there have been many effective streaming algorithms designed for a range of problems in statistics, optimization, and graph algorithms (see surveys by Muthukrishnan [39] and McGregor [38]), there has been little attention paid to the core problems of packing and scheduling. These are fundamental abstractions, which form the basis of many generalizations and extensions [13,14]. In this work, we present the first efficient algorithms for packing and scheduling that work in the streaming model.
A first conceptual challenge is to resolve what form of answer is desirable in this setting. If items in the input are too many to store, then it is also unfeasible to require a streaming algorithm to provide an explicit description of how each item is to be handled. Rather, our objective is for the algorithm to provide the cost of the solution, in the form of the number of bins or the duration of the schedule. Moreover, our algorithms can provide a concise description of the solution, which describes in outline how the jobs are treated in the design.
A second issue is that the problems we consider, even in their simplest form, are NP-hard. The additional constraints of streaming computation do not erase the computational challenge. In some cases, our algorithms proceed by adopting and extending known polynomial-time approximation schemes for the offline versions of the problems, while in other cases, we come up with new approaches. The streaming model effectively emphasizes the question of how compactly the input can be summarized to allow subsequent approximation of the problem of interest. Our main results show that the inputs for many of our problems of interest can in fact be "compressed" to very small intermediate descriptions which suffice to extract near-optimal solutions for the original input. This implies that they can be solved in scenarios which are storage or communication constrained.
We proceed by formalizing the streaming model, after which we summarize our results. We continue by presenting related work, and contrast with the online setting.

Problems and Streaming Model
Bin packing. The BIN PACKING problem is defined as follows: The input consists of $N$ items with sizes $s_1, \dots, s_N$ (each between 0 and 1), which need to be packed into bins of unit capacity. That is, we seek a partition of the set of items $\{1, \dots, N\}$ into subsets $B_1, \dots, B_m$, called bins, such that for any bin $B_i$, it holds that $\sum_{j \in B_i} s_j \le 1$. The goal is to minimize the number $m$ of bins used.
We also consider the natural generalization to VECTOR BIN PACKING, where the input consists of $d$-dimensional vectors, with the value of each coordinate between 0 and 1 (i.e., the scalar items $s_i$ are replaced with vectors $v_i$). The vectors need to be packed into $d$-dimensional bins with unit capacity in each dimension; we thus require that $\left\| \sum_{v \in B_i} v \right\|_\infty \le 1$ (where the infinity norm is $\|v\|_\infty = \max_i v_i$).
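As a small illustration of the two definitions above, the following Python sketch checks whether a proposed assignment of items to bins is a feasible packing. The function name `is_valid_vector_packing`, the index-based representation of bins, and the tolerance `tol` are assumptions made for this example only; scalar BIN PACKING is the special case $d = 1$.

```python
from typing import Sequence


def is_valid_vector_packing(vectors: Sequence[Sequence[float]],
                            bins: Sequence[Sequence[int]],
                            tol: float = 1e-9) -> bool:
    """Return True if `bins` is a partition of the item indices in which
    every bin has load at most 1 in every coordinate."""
    # Every item index must appear in exactly one bin.
    packed = sorted(i for b in bins for i in b)
    if packed != list(range(len(vectors))):
        return False
    for b in bins:
        if not b:
            continue
        d = len(vectors[b[0]])
        # Coordinate-wise load of the bin; its maximum is the infinity norm.
        load = [sum(vectors[i][c] for i in b) for c in range(d)]
        if max(load) > 1 + tol:
            return False
    return True
```

For instance, with the vectors `(0.6, 0.2)`, `(0.3, 0.7)`, and `(0.5, 0.5)`, the first two fit together in one bin, whereas the first and the third do not, since their loads in the first coordinate sum to 1.1.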

Scheduling
The MAKESPAN SCHEDULING problem is closely related to BIN PACKING but, instead of filling bins with bounded capacity, we try to balance the loads assigned to a fixed number of bins. Now we refer to the input as comprising a set of jobs, with each job $j$ defined by its processing time $p_j$. Our goal is to assign each job to one of $m$ identical machines to minimize the makespan, which is the maximum load over all machines.
In VECTOR SCHEDULING, a job is described not only by its processing time, but also by, say, memory or bandwidth requirements. The input is thus a set of jobs, each job $j$ characterized by a vector $v_j$. The goal is to assign each job to one of $m$ identical machines such that the maximum load over all machines and dimensions is minimized.
Streaming model. In the streaming scenario, the algorithm receives the input as a sequence of items, called the input stream. We do not assume that the stream is ordered in any particular way (e.g., randomly or by item sizes), so our algorithms must work for arbitrarily ordered streams. The items arrive one by one and upon receiving each item, the algorithm updates its memory state. A streaming algorithm is required to use space sublinear in the length of the stream, ideally just $\mathrm{polylog}(N)$, while it processes the stream. After the last item arrives, the algorithm computes its estimate of the optimal value, and the space or time used during this final computation is not restricted.
For many natural optimization problems outputting some explicit solution of the problem is not possible owing to the memory restriction (as the algorithm can store only a small subset of items). Thus the goal is to find a good approximation of the value of an offline optimal solution. Since our model does not assume that item sizes are integers, we express the space complexity not in bits, but in words (or memory cells), where each word can store any number from the input; a linear combination of numbers from the input; or any integer with O(log N) bits (for counters, pointers, etc.).

Our Results
Bin packing. In Section 3, we present a streaming algorithm for BIN PACKING, which outputs an asymptotic $(1 + \varepsilon)$-approximation of OPT, the optimal number of bins, using $O\left(\frac{1}{\varepsilon} \cdot \log \frac{1}{\varepsilon} \cdot \log \mathrm{OPT}\right)$ memory. This means that the algorithm uses at most $(1 + \varepsilon) \cdot \mathrm{OPT} + o(\mathrm{OPT})$ bins, and in our case, the additive $o(\mathrm{OPT})$ term is bounded by the space used. The novelty of our contribution is to combine a data structure that approximately tracks all quantiles in a numeric stream [27] with techniques for approximation schemes [19,34]. We show that we can improve upon the $\log \mathrm{OPT}$ factor in the space complexity if randomization is allowed or if item sizes are drawn from a bounded-size set of real numbers. On the other hand, we argue that our result is close to optimal, up to a factor of $O\left(\log \frac{1}{\varepsilon}\right)$, if item sizes are accessed only by comparisons (including comparisons with some fixed constants). Thus, one cannot get an estimate with at most $\mathrm{OPT} + o(\mathrm{OPT})$ bins by a streaming algorithm, unlike in the offline setting [29]. The hardness emerges from the space complexity of the quantiles problem in the streaming model.
For VECTOR BIN PACKING, we design a streaming asymptotic $(d + \varepsilon)$-approximation algorithm running in space $O\left(\frac{d}{\varepsilon} \cdot \log \frac{d}{\varepsilon} \cdot \log \mathrm{OPT}\right)$; see Section 3.3. We remark that if vectors are rounded into a sublinear number of types, then better than $d$-approximation is not possible [7].
Scheduling. For MAKESPAN SCHEDULING, one can obtain a straightforward streaming $(1 + \varepsilon)$-approximation with space of only $O\left(\frac{1}{\varepsilon} \cdot \log \frac{1}{\varepsilon}\right)$ by rounding sizes of suitably large jobs to powers of $1 + \varepsilon$ and counting the total size of small jobs. In a higher dimension, it is also possible to get a streaming $(1 + \varepsilon)$-approximation, by the rounding introduced by Bansal et al. [8]. However, the memory required for this algorithm is exponential in $d$, and thus it is only practical when $d$ is a very small constant. Moreover, such a huge amount of memory is needed even if the number $m$ of machines (and hence, of big jobs) is small, as the algorithm rounds small jobs into exponentially many types. See Section 4.2 for more details.
In case $m$ and $d$ make this feasible, we design a new streaming $\left(2 - \frac{1}{m} + \varepsilon\right)$-approximation with $O\left(\frac{1}{\varepsilon^2} \cdot d^2 \cdot m \cdot \log \frac{d}{\varepsilon}\right)$ memory, which implies a streaming 2-approximation algorithm running in space $O(d^2 \cdot m^3 \cdot \log dm)$. We thus obtain a much better approximation than for VECTOR BIN PACKING with a reasonable amount of memory (although to compute the actual makespan from our input summary, it takes time doubly exponential in $d$ [8]). Our algorithm is not based on rounding, as in the aforementioned algorithms, but on combining small jobs into containers, and the approximation ratio of this approach is at least $2 - \frac{1}{m}$, which we demonstrate by an example. We describe the algorithm in Section 4.

Related Work
We give an overview of related work in offline, online, and sublinear algorithms, and highlight the differences between online and streaming algorithms. Recent surveys of Christensen et al. [13] and Coffman et al. [14] have a more comprehensive overview.

Bin Packing
Offline approximation algorithms. BIN PACKING is an NP-complete problem and indeed it is NP-hard even to decide whether two bins are sufficient or at least three bins are necessary. This follows by a simple reduction from the PARTITION problem and presents the strongest inapproximability to date. Most work in the offline model focused on providing asymptotic $R$-approximation algorithms, which use at most $R \cdot \mathrm{OPT} + o(\mathrm{OPT})$ bins. In the following, when we refer to an approximation for BIN PACKING we implicitly mean the asymptotic approximation. The first polynomial-time approximation scheme (PTAS), that is, a $(1 + \varepsilon)$-approximation for any $\varepsilon > 0$, was given by Fernandez de la Vega and Lueker [19]. Karmarkar and Karp [34] provided an algorithm which returns a solution with $\mathrm{OPT} + O(\log^2 \mathrm{OPT})$ bins. Recently, Hoberg and Rothvoß [29] proved it is possible to find a solution with $\mathrm{OPT} + O(\log \mathrm{OPT})$ bins in polynomial time.
The input for BIN PACKING can be described by $N$ numbers, corresponding to item sizes. While in general these sizes may be distinct, in some cases the input description can be compressed significantly by specifying the number of items of each size in the input. Namely, in the HIGH-MULTIPLICITY BIN PACKING problem, the input is a set of pairs $(a_1, s_1), \dots, (a_\sigma, s_\sigma)$, where for $i = 1, \dots, \sigma$, $a_i$ is the number of items of size $s_i$ (and all $s_i$'s are distinct). Thus, $\sigma$ encodes the number of item sizes, and hence the size of the description. The goal is again to pack these items into bins, using as few bins as possible. For a constant number of sizes, $\sigma$, Goemans and Rothvoß [25] recently gave an exact algorithm for the case of rational item sizes running in time $(\log \Delta)^{2^{O(\sigma)}}$, where $\Delta$ is the largest multiplicity of an item or the largest denominator of an item size, whichever is the greater.
While these algorithms provide satisfying theoretical guarantees, simple heuristics are often adopted in practice to provide a "good-enough" performance. FIRST FIT [33], which puts each incoming item into the first bin where it fits and opens a new bin only when the item does not fit anywhere else, achieves 1.7-approximation [17]. For the high-multiplicity variant, using an LP-based Gilmore-Gomory cutting stock heuristic [23,24] gives a good running time in practice [2] and produces a solution with at most $\mathrm{OPT} + \sigma$ bins. However, neither of these algorithms adapts well to the streaming setting with possibly distinct item sizes. For example, FIRST FIT has to remember the remaining capacity of each open bin, which in general can require space proportional to OPT.
VECTOR BIN PACKING proves to be substantially harder to approximate, even in a constant dimension. For fixed $d$, Bansal, Eliáš, and Khan [7] showed an approximation factor of approximately $0.807 + \ln(d + 1) + \varepsilon$. For general $d$, a relatively simple algorithm based on an LP relaxation, due to Chekuri and Khanna [11], remains the best known, with an approximation guarantee of $1 + \varepsilon d + O\left(\log \frac{1}{\varepsilon}\right)$. The problem is APX-hard even for $d = 2$ [41], and cannot be approximated within a factor better than $d^{1-\varepsilon}$ for any fixed $\varepsilon > 0$ [13] if $d$ is arbitrarily large. Hence, our streaming $(d + \varepsilon)$-approximation for VECTOR BIN PACKING asymptotically matches the offline lower bound.
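To make the space issue of the FIRST FIT heuristic mentioned above concrete, here is a minimal Python sketch of it. The function name `first_fit` and the tolerance `tol` are assumptions of this example. The list `free` holds one number per open bin, which is exactly the state that can grow proportionally to OPT.

```python
def first_fit(sizes, tol=1e-9):
    """Pack items online by First Fit and return the number of bins used."""
    free = []  # remaining capacity of every bin opened so far
    for s in sizes:
        for i, cap in enumerate(free):
            if s <= cap + tol:       # first bin where the item fits
                free[i] = cap - s
                break
        else:                        # the item fits nowhere: open a new bin
            free.append(1.0 - s)
    return len(free)
```

On the stream `0.5, 0.7, 0.5, 0.3`, this sketch opens two bins: the two items of size 0.5 share the first bin, and the items of size 0.7 and 0.3 share the second.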
Sampling-based algorithms. Sublinear-time approximation schemes constitute a model related to, but distinct from, streaming algorithms. Batu, Berenbrink, and Sohler [9] provide an algorithm that takes $O\left(\sqrt{N} \cdot \mathrm{poly}\left(\frac{1}{\varepsilon}\right)\right)$ weighted samples, meaning that the probability of sampling an item is proportional to its size. It outputs an asymptotic $(1 + \varepsilon)$-approximation of OPT. If uniform samples are also available, then sampling $O\left(N^{1/3} \cdot \mathrm{poly}\left(\frac{1}{\varepsilon}\right)\right)$ items is sufficient. These results are tight, up to a $\mathrm{poly}\left(\frac{1}{\varepsilon}, \log N\right)$ factor. Later, Beigel and Fu [10] focused on uniform sampling of items, proving that $\widetilde{\Theta}(N/\mathrm{SIZE})$ samples are sufficient and necessary, where SIZE is the total size of all items. Their approach implies a streaming approximation scheme by uniform sampling of the substream of big items. However, the space complexity in terms of $\frac{1}{\varepsilon}$ is not stated in the paper, and we calculate it to be $\varepsilon^{-c}$ for a constant $c \ge 10$. Moreover, $\Omega\left(\frac{1}{\varepsilon^2}\right)$ samples are clearly needed to estimate the number of items with size close to 1. Note that our approach is deterministic and substantially different from taking a random sample from the stream.
Online algorithms. Online and streaming algorithms are similar in the sense that they are required to process items one by one. However, an online algorithm must make all its decisions immediately: it must fix the placement of each incoming item on arrival. A streaming algorithm can postpone such decisions to the very end, but is required to keep its memory small, whereas an online algorithm may remember all items that have arrived so far. Hence, online algorithms apply in the streaming setting only when they have small space cost, including the space needed to store the solution constructed so far. The performance of online algorithms is usually quantified by the competitive ratio, which is defined as the worst-case ratio between the online algorithm's cost and that of the optimal offline algorithm, analogous to the approximation ratio for approximation algorithms.
For BIN PACKING, the best possible competitive ratio is substantially worse than what we can achieve offline or even in the streaming setting. Balogh et al. [5] designed an asymptotically 1.5783-competitive algorithm, while the current lower bound on the asymptotic competitive ratio is 1.5403 [6]. This (relatively complicated) online algorithm is based on the HARMONIC algorithm [36], which for some integer $K$ classifies items into size groups $\left(0, \frac{1}{K}\right], \left(\frac{1}{K}, \frac{1}{K-1}\right], \dots, \left(\frac{1}{2}, 1\right]$. It packs each group separately by NEXT FIT, keeping just one bin open, which is closed whenever the next item does not fit. Thus HARMONIC can run in memory of size $K$ and be implemented in the streaming model, unlike most other online algorithms, which require maintaining the levels of all bins opened so far. Its competitive ratio tends to approximately 1.691 as $K$ goes to infinity. Surprisingly, this is also the best possible ratio if only a bounded number of bins is allowed to be open for an online algorithm [36], which can be seen as the intersection of online and streaming models.
For VECTOR BIN PACKING, the best known competitive ratio of $d + 0.7$ [21] is achieved by FIRST FIT. A lower bound of $\Omega(d^{1-\varepsilon})$ on the competitive ratio was shown by Azar et al. [3]. It is thus currently unknown whether or not online algorithms outperform streaming algorithms in the vector setting.
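The bounded-memory behavior of HARMONIC can be sketched in a few lines of Python. This is only an illustration under the usual definition of the size classes, namely $\left(\frac{1}{k+1}, \frac{1}{k}\right]$ for $k < K$ and $\left(0, \frac{1}{K}\right]$ for the smallest items; the class name `Harmonic` and its methods are hypothetical. A bin of class $k < K$ holds exactly $k$ items, so NEXT FIT within such a class reduces to counting.

```python
import math


class Harmonic:
    """Sketch of HARMONIC with parameter K, using O(K) counters."""

    def __init__(self, K):
        self.K = K
        self.closed = 0          # number of bins already closed
        self.count = [0] * K     # count[k]: items in the open bin of class k < K
        self.small_load = 0.0    # load of the open bin for sizes <= 1/K

    def add(self, s):
        # s in (1/(k+1), 1/k] gives floor(1/s) == k; sizes <= 1/K go to class K.
        k = min(self.K, math.floor(1 / s))
        if k < self.K:
            self.count[k] += 1
            if self.count[k] == k:           # the bin is full: close it
                self.closed += 1
                self.count[k] = 0
        else:
            if self.small_load + s > 1:      # Next Fit: close, open a new bin
                self.closed += 1
                self.small_load = 0.0
            self.small_load += s

    def bins_used(self):
        open_bins = sum(1 for c in self.count if c > 0)
        open_bins += 1 if self.small_load > 0 else 0
        return self.closed + open_bins
```

Note that the state consists of $K$ counters and one load value only, independently of the number of bins opened so far.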

Scheduling
Offline approximation algorithms. MAKESPAN SCHEDULING is strongly NP-complete [22], which in particular rules out the possibility of a PTAS with time complexity $\mathrm{poly}\left(\frac{1}{\varepsilon}, n\right)$. After a sequence of improvements, Jansen, Klein, and Verschae [32] gave a PTAS with time complexity $2^{\widetilde{O}(1/\varepsilon)} + O(n \log n)$, which is essentially tight under the Exponential Time Hypothesis (ETH) [12].
For constant dimension $d$, VECTOR SCHEDULING also admits a PTAS, as shown by Chekuri and Khanna [11]. However, the running time is of order $n^{(1/\varepsilon)^{O(d)}}$. The approximation scheme for a fixed $d$ was improved to an efficient PTAS, namely to an algorithm running in time $2^{(1/\varepsilon)^{O(d)}} + O(dn)$, by Bansal et al. [8], who also showed that the running time cannot be significantly improved under ETH. In contrast, our streaming $\mathrm{poly}(d, m)$-space algorithm computes an input summary that preserves a 2-approximation of the original input. This respects the lower bound, since to compute the actual makespan from the summary, we still need to execute an offline algorithm, with running time doubly exponential in $d$. The state-of-the-art approximation ratio for large $d$ is $O(\log d/(\log \log d))$ [28,31], while $\alpha$-approximation is not possible in polynomial time for any constant $\alpha > 1$ and arbitrary $d$, unless NP = ZPP [11].
Online algorithms. For the scalar problem, the optimal competitive ratio is known to lie in the interval (1.88, 1.9201) [1,20,26,30], which is substantially worse than what can be done by a simple streaming $(1 + \varepsilon)$-approximation in space $O\left(\frac{1}{\varepsilon} \cdot \log \frac{1}{\varepsilon}\right)$. Interestingly, for VECTOR SCHEDULING, the algorithm by Im et al. [31] with ratio $O(\log d/(\log \log d))$ actually works in the online setting as well and needs space $O(d \cdot m)$ only during its execution (if the solution itself is not stored), which makes it possible to implement it in the streaming setting. This online ratio cannot be improved as there is a lower bound of $\Omega(\log d/(\log \log d))$ [4,31], whereas in the streaming setting we can achieve a 2-approximation with a reasonable memory (or even a $(1 + \varepsilon)$-approximation for a fixed $d$). If all jobs have sufficiently small size, we improve the analysis in [31] and show that the online algorithm achieves a $(1 + \varepsilon)$-approximation; see Section 4.

Bin Packing
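Notation. For an instance $I$, let $N(I)$ be the number of items in $I$, let $\mathrm{SIZE}(I)$ be the total size of all items in $I$, and let $\mathrm{OPT}(I)$ be the number of bins used in an optimal solution for $I$. Clearly, $\mathrm{SIZE}(I) \le \mathrm{OPT}(I)$. For a bin $B$, let $s(B)$ be the total size of items in $B$. For a given $\varepsilon > 0$, we use $\widetilde{O}\left(f\left(\frac{1}{\varepsilon}\right)\right)$ to hide factors logarithmic in $\frac{1}{\varepsilon}$ and $\mathrm{OPT}(I)$, i.e., to denote $O\left(f\left(\frac{1}{\varepsilon}\right) \cdot \mathrm{polylog} \frac{1}{\varepsilon} \cdot \mathrm{polylog}\, \mathrm{OPT}(I)\right)$.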
Overview. We first briefly describe the approximation scheme of Fernandez de la Vega and Lueker [19], whose structure we follow in outline. Let $I$ be an instance of BIN PACKING. Given a precision requirement $\varepsilon > 0$, we say that an item is small if its size is at most $\varepsilon$; otherwise, it is big. Note that there are at most $\frac{1}{\varepsilon} \mathrm{SIZE}(I)$ big items. The rounding scheme in [19], called "linear grouping", works as follows: We sort the big items by size non-increasingly and divide them into groups of $k = \varepsilon \cdot \mathrm{SIZE}(I)$ items (the first group thus contains the $k$ biggest items). In each group, we round up the sizes of all items to the size of the biggest item in that group. It follows that the number of groups and thus the number of distinct item sizes (after rounding) is bounded by $\frac{1}{\varepsilon^2}$. Let $I_R$ be the instance of HIGH-MULTIPLICITY BIN PACKING consisting of the big items with rounded sizes. It can be shown that $\mathrm{OPT}(I_B) \le \mathrm{OPT}(I_R) \le \mathrm{OPT}(I_B) + k$, where $I_B$ is the set of big items in $I$ (we detail a similar argument in Section 3.1). Due to the bounded number of distinct item sizes, we can find a close-to-optimal solution for $I_R$ efficiently. We then translate this solution into a packing for $I_B$ in the natural way. Finally, small items are filled greedily (e.g., by First Fit) and it can be shown that the resulting complete solution for $I$ is a $(1 + O(\varepsilon))$-approximation.
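The linear grouping step can be illustrated by the following offline Python sketch. The function name `linear_grouping` is hypothetical and the group size `k` is passed in as a parameter rather than derived from $\varepsilon$ and $\mathrm{SIZE}(I)$; the output is a high-multiplicity instance given as pairs of a rounded size and its multiplicity.

```python
def linear_grouping(big_sizes, k):
    """Round big items up by linear grouping with groups of k items.

    Returns a list of (rounded size, multiplicity) pairs, one per group,
    ordered from the biggest group to the smallest.
    """
    ordered = sorted(big_sizes, reverse=True)   # non-increasing order
    rounded = []
    for start in range(0, len(ordered), k):
        group = ordered[start:start + k]
        # Every item of the group is rounded up to the group's biggest item.
        rounded.append((group[0], len(group)))
    return rounded
```

For example, the sizes `0.9, 0.5, 0.8, 0.6, 0.7` with `k = 2` yield two items of size 0.9, two of size 0.7, and one of size 0.5. Sorting requires all items to be available, which is the step a streaming algorithm has to avoid.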
Karmarkar and Karp [34] proposed an improved rounding scheme, called "geometric grouping". It is based on the observation that item sizes close to 1 should be approximated substantially better than item sizes close to ε. We present a version of such a rounding scheme in Section 3.1.
Our algorithm follows a similar outline with two stages (rounding and finding a solution for the rounded instance), but working in the streaming model brings two challenges: First, in the rounding stage, we need to process the stream of items and output a rounded high-multiplicity instance with few item sizes that are not too small, while keeping only a small number of items in memory. Second, the rounding of big items needs to be done carefully so that not much space is "wasted", since in the case when the total size of small items is relatively large, we argue that our solution is close to optimal by showing that the bins are nearly full on average.
Input summary properties. More precisely, we fix some $\varepsilon > 0$ that is used to control the approximation guarantee. During the first stage, our algorithm has one variable which accumulates the total size of all small items in the input stream, i.e., those of size at most $\varepsilon$. Let $I_B$ be the substream consisting of all big items. We process $I_B$ and output a rounded high-multiplicity instance $I_R$ with the following properties:
(P1) There are at most $\sigma$ item sizes in $I_R$, all of them larger than $\varepsilon$, and the memory required for processing $I_B$ is $O(\sigma)$.
(P2) The $i$-th biggest item in $I_R$ is at least as large as the $i$-th biggest item in $I_B$ (and the number of items in $I_R$ is the same as in $I_B$). This immediately implies that any packing of $I_R$ can be used as a packing of $I_B$ (in the same number of bins), so $\mathrm{OPT}(I_B) \le \mathrm{OPT}(I_R)$, and moreover, $\mathrm{SIZE}(I_B) \le \mathrm{SIZE}(I_R)$.
(P3) $\mathrm{OPT}(I_R) \le (1 + \varepsilon) \cdot \mathrm{OPT}(I_B) + O\left(\log \frac{1}{\varepsilon}\right)$.
(P4) $\mathrm{SIZE}(I_R) \le (1 + \varepsilon) \cdot \mathrm{SIZE}(I_B)$.
In words, (P2) means that we are rounding item sizes up and, together with (P3), it implies that the optimal solution for the rounded instance approximates $\mathrm{OPT}(I_B)$ well. The last property is used in the case when the total size of small items constitutes a large fraction of the total size of all items. Note that $\mathrm{SIZE}(I_R) - \mathrm{SIZE}(I_B)$ can be thought of as bin space "wasted" by rounding.
Observe that the succinctness of the rounded instance depends on $\sigma$. First, we show a streaming algorithm for rounding with $\sigma = \widetilde{O}\left(\frac{1}{\varepsilon^2}\right)$. Then we improve upon it and give an algorithm with $\sigma = \widetilde{O}\left(\frac{1}{\varepsilon}\right)$, which is essentially the best possible, while guaranteeing an error of $\varepsilon \cdot \mathrm{OPT}(I_B)$ introduced by rounding (elaborated on in Section 3.2). More precisely, we show the following:

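Lemma 1 Given a stream $I_B$ of big items, there is a deterministic streaming algorithm that outputs a HIGH-MULTIPLICITY BIN PACKING instance satisfying (P1)-(P4) with $\sigma = O\left(\frac{1}{\varepsilon} \cdot \log \frac{1}{\varepsilon} \cdot \log \mathrm{OPT}(I_B)\right)$.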
Before describing the rounding itself and proving Lemma 1, we explain how to use it to calculate an accurate estimate of the number of bins. This part follows a similar outline as in other approximation schemes for BIN PACKING. In a nutshell, rounding the sizes of big items allows us to efficiently find an approximate solution for them, and then there are two cases: Either we know for sure that all small items fit into bins with big items, or all bins except for one are nearly full. Below, we provide details for completeness.
Calculating a bound on the number of bins after rounding. First, we obtain a solution $S$ of the rounded instance $I_R$. For instance, we may round the solution of the linear program introduced by Gilmore and Gomory [23,24], and get a solution with at most $\mathrm{OPT}(I_R) + \sigma$ bins. Or, if item sizes are rational numbers, we may compute an optimal solution for $I_R$ by the algorithm of Goemans and Rothvoß [25]; however, the former approach appears to be more efficient and more general. In the following, we thus assume that $S$ uses at most $\mathrm{OPT}(I_R) + \sigma$ bins.
We now calculate a bound on the number of bins in the original instance. Let W be the total free space in the bins of S that can be used for small items. To be precise, W equals the sum over all bins B in S of max(0, 1 − ε − s(B)). The reason behind the definition of W is the following: Our definition of small items means that in the worst case, each small item could be as big as ε. To allow us to handle these in the second phase of the algorithm, we will cap the capacity of bins at 1 − ε as items of size ε do not fit into bins already containing items of total size more than 1 − ε. On the other hand, if a small item does not fit into a bin, then the remaining space in the bin is smaller than ε.
Let $s$ be the total size of all small items in the input stream. If $s \le W$, then all small items surely fit into the free space of bins in $S$ (and can be assigned there greedily by FIRST FIT). Consequently, we output that the number of bins needed for the stream of items is at most $|S|$, i.e., the number of bins in solution $S$ for $I_R$. Otherwise, we need to place small items of total size at most $s' = s - W$ into new bins and it is easy to see that opening at most $\lceil s'/(1 - \varepsilon) \rceil \le (1 + O(\varepsilon)) \cdot s' + 1$ bins for these small items suffices. Hence, in the case $s > W$, we output that $|S| + \lceil s'/(1 - \varepsilon) \rceil$ bins are sufficient to pack all items in the stream.
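The two cases of this calculation can be written down directly. The sketch below assumes that the solution $S$ is given by the list of loads $s(B)$ of its bins; the names `estimate_bins`, `bin_loads`, and `small_total` are hypothetical.

```python
import math


def estimate_bins(bin_loads, small_total, eps):
    """Bound the number of bins for the whole stream.

    bin_loads   -- total size s(B) of the (rounded) big items in each bin of S
    small_total -- total size s of all small items (each of size at most eps)
    """
    # Free space usable by small items, with bin capacity capped at 1 - eps.
    W = sum(max(0.0, 1 - eps - load) for load in bin_loads)
    if small_total <= W:
        return len(bin_loads)                # small items fit into bins of S
    leftover = small_total - W               # s' = s - W goes to new bins
    return len(bin_loads) + math.ceil(leftover / (1 - eps))
```

With `eps = 0.25` and two bins of loads 0.5 and 0.75, the usable free space is `W = 0.25`; small items of total size 0.2 then need no extra bin, whereas small items of total size 1.0 need one extra bin.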
We prove that the number of bins that we output in either case is a good approximation of the optimal number of bins, provided that $S$ is a good solution for $I_R$.
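Lemma 2 Let $I$ be given as a stream of items. Suppose that $0 < \varepsilon \le \frac{1}{3}$, that the rounded instance $I_R$, created from $I$, satisfies properties (P1)-(P4), and that the solution $S$ of $I_R$ uses at most $\mathrm{OPT}(I_R) + \sigma$ bins. Let $\mathrm{ALG}(I)$ be the number of bins that our algorithm outputs. Then, it holds that $\mathrm{OPT}(I) \le \mathrm{ALG}(I) \le (1 + 3\varepsilon) \cdot \mathrm{OPT}(I) + \sigma + O\left(\log \frac{1}{\varepsilon}\right)$.
Proof We analyze the two cases of the algorithm:
Case $s \le W$. In this case, small items fit into the bins of $S$ and $\mathrm{ALG}(I) = |S|$. For the inequality $\mathrm{OPT}(I) \le \mathrm{ALG}(I)$, observe that the packing $S$ can be used as a packing of items in $I_B$ (in a straightforward way) with no less free space for small items by property (P2). Thus $\mathrm{OPT}(I) \le |S|$.
To upper bound $\mathrm{ALG}(I)$, note that $\mathrm{ALG}(I) = |S| \le \mathrm{OPT}(I_R) + \sigma \le (1 + \varepsilon) \cdot \mathrm{OPT}(I_B) + O\left(\log \frac{1}{\varepsilon}\right) + \sigma \le (1 + \varepsilon) \cdot \mathrm{OPT}(I) + O\left(\log \frac{1}{\varepsilon}\right) + \sigma$, where the second inequality follows from property (P3) and the third inequality holds as $I_B$ is a subinstance of $I$.
Case $s > W$. Recall that $\mathrm{ALG}(I) = |S| + \lceil s'/(1 - \varepsilon) \rceil$ with $s' = s - W$. We again have that $S$ can be used as a packing of $I_B$ with no less free space for small items. Thus, the total size of small items that do not fit into bins in $S$ is at most $s'$ and these items clearly fit into $\lceil s'/(1 - \varepsilon) \rceil$ bins. Hence, $\mathrm{OPT}(I) \le \mathrm{ALG}(I)$.
For the other inequality, consider starting with solution $S$ for $I_R$, first to (almost) fill up the bins of $S$ with small items of total size $W$, then using $\lceil s'/(1 - \varepsilon) \rceil$ additional bins for the remaining small items. Note that in each bin, except the last one, the unused space is less than $\varepsilon$, thus the total size of items in $I_R$ and small items is more than $(\mathrm{ALG}(I) - 1) \cdot (1 - \varepsilon)$. Finally, we replace items in $I_R$ by items in $I_B$ and the total size of items decreases by $\mathrm{SIZE}(I_R) - \mathrm{SIZE}(I_B) \le \varepsilon \cdot \mathrm{SIZE}(I_B) \le \varepsilon \cdot \mathrm{OPT}(I)$ by property (P4). Hence, $\mathrm{OPT}(I) \ge \mathrm{SIZE}(I) > (\mathrm{ALG}(I) - 1) \cdot (1 - \varepsilon) - \varepsilon \cdot \mathrm{OPT}(I)$. Rearranging and using $\varepsilon \le \frac{1}{3}$, we get $\mathrm{ALG}(I) \le \frac{1 + \varepsilon}{1 - \varepsilon} \cdot \mathrm{OPT}(I) + 1 \le (1 + 3\varepsilon) \cdot \mathrm{OPT}(I) + 1$.
Considered together, these two cases both meet the claimed bound.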
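The role of the assumption $\varepsilon \le \frac{1}{3}$ can be made explicit with a short calculation. As a sketch, assuming that the rearrangement in the second case leaves a multiplicative factor of $\frac{1 + \varepsilon}{1 - \varepsilon}$ in front of $\mathrm{OPT}(I)$ (our reading of the argument), this factor is at most $1 + 3\varepsilon$:

$$(1 + 3\varepsilon)(1 - \varepsilon) - (1 + \varepsilon) = \varepsilon - 3\varepsilon^2 = \varepsilon (1 - 3\varepsilon) \ge 0 \quad \text{for } 0 < \varepsilon \le \tfrac{1}{3},$$

so that $\frac{1 + \varepsilon}{1 - \varepsilon} \le 1 + 3\varepsilon$. For example, $\varepsilon = \frac{1}{4}$ gives $\frac{5/4}{3/4} = \frac{5}{3} \le \frac{7}{4}$.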

Processing the Stream and Rounding
In this section, we describe the streaming algorithm of the rounding stage and prove Lemma 1. The algorithm makes use of the deterministic quantile summary of Greenwald and Khanna [27]. Given a precision $\delta > 0$ and an input stream of numbers $s_1, \dots, s_N$, their algorithm computes a data structure $Q(\delta)$ which is able to answer a quantile query with precision $\delta N$. Namely, for any $0 \le \phi \le 1$, it returns an element $s$ of the input stream such that the rank of $s$ is in $[(\phi - \delta) N, (\phi + \delta) N]$, where the rank of $s$ is the position of $s$ in the non-increasing ordering of the input stream. The data structure stores an ordered sequence of tuples, each consisting of an input number $s_i$ and valid lower and upper bounds on the true rank of $s_i$ in the input sequence. The first and last stored items correspond to the maximum and minimum numbers in the stream, respectively. Note that the lower and upper bounds on the rank of any stored number differ by at most $2\delta N$ and upper (or lower) bounds on the rank of two consecutive stored numbers differ by at most $2\delta N$ as well. The space requirement of $Q(\delta)$ is $O\left(\frac{1}{\delta} \cdot \log \delta N\right)$ in the worst case; however, in practice the space used is observed to scale linearly with $\frac{1}{\delta}$ [37]. (We remark that an offline optimal data structure for $\delta$-approximate quantiles uses space $O\left(\frac{1}{\delta}\right)$.) We use the data structure $Q(\delta)$ to construct our algorithm for processing the stream $I_B$ of big items. Note that it is possible to use any other quantile summary instead of $Q(\delta)$ in a similar way.
Simple rounding algorithm. We begin by describing a simpler solution with $\delta = \frac{1}{4} \varepsilon^2$, resulting in a rounded instance with $\widetilde{O}\left(\frac{1}{\varepsilon^2}\right)$ item sizes. Subsequently, we introduce a more involved solution with smaller space cost, which proves Lemma 1. The algorithm uses a quantile summary structure to determine the rounding scheme. Given a (big) item $s_i$ from the input, we insert it into $Q(\delta)$. After processing all items, we extract from $Q(\delta)$ the set of stored input items (i.e., their sizes) together with upper bounds on their rank (where the largest size has highest rank 1, and the smallest size has least rank $N_B$). Note that the number $N_B$ of big items in $I_B$ is less than $\frac{1}{\varepsilon} \mathrm{SIZE}(I_B) \le \frac{1}{\varepsilon} \mathrm{OPT}(I_B)$ as each is of size more than $\varepsilon$. Let $q$ be the number of items (or tuples) extracted from $Q(\delta)$; we get that $q = O\left(\frac{1}{\delta} \cdot \log \delta N_B\right) = O\left(\frac{1}{\varepsilon^2} \cdot \log(\varepsilon \cdot \mathrm{OPT}(I_B))\right)$. Let $(a_1, u_1 = 1), (a_2, u_2), \dots, (a_q, u_q = N_B)$ be the output pairs of an item size and the bound on its rank, sorted so that $a_1 \ge a_2 \ge \dots \ge a_q$. We define the rounded instance $I_R$ with at most $q$ item sizes as follows: $I_R$ contains $(u_{j+1} - u_j)$ items of size $a_j$ for each $j = 1, \dots, q - 1$, plus one item of size $a_q$. (See Fig. 1.)
We show that the desired properties (P1)-(P4) hold with $\sigma = q$. Property (P1) follows easily from the definition of $I_R$ and the design of data structure $Q(\delta)$. Note that the number of items is preserved. To show (P2), suppose for a contradiction that the $i$-th biggest item in $I_B$ is bigger than the $i$-th biggest item in $I_R$, whose size is $a_j$ for some $j \in \{1, \dots, q - 1\}$, i.e., $i \in [u_j, u_{j+1})$ (note that $j < q$ as $a_q$ is the smallest item in $I_B$ and is present only once in $I_R$). We get that the rank of item $a_j$ in $I_B$ is strictly more than $i$, and as $i \ge u_j$, we get a contradiction with the fact that $u_j$ is a valid upper bound on the rank of $a_j$ in $I_B$.
Fig. 1 An illustration of the original distribution of sizes of big items in $I_B$, depicted by a smooth curve, and the distribution of item sizes in the rounded instance $I_R$, depicted by a bold "staircase" function. The distribution of $I'_R$ (which is $I_R$ without the $4\delta N_B$ biggest items) is depicted by a (blue) dash-dotted line. Selected items $a_1, \dots, a_q$, with $q = 11$, are illustrated by (red) dots, and the upper bounds $u_1, \dots, u_q$ on the ranks appear on the x axis.
Next, we give bounds for $\mathrm{OPT}(I_R)$ and $\mathrm{SIZE}(I_R)$, which are required by properties (P3) and (P4). We pack the $4\delta N_B$ biggest items in $I_R$ separately into "extra" bins. Using the choice of $\delta = \frac{1}{4} \varepsilon^2$ and $N_B \le \frac{1}{\varepsilon} \mathrm{SIZE}(I_B)$, we bound the number of these items and thus extra bins by $4\delta N_B \le \varepsilon \cdot \mathrm{SIZE}(I_B) \le \varepsilon \cdot \mathrm{OPT}(I_B)$. Let $I'_R$ be the remaining items in $I_R$. We claim that the $i$-th biggest item $b_i$ in $I_B$ is at least as large as the $i$-th biggest item in $I'_R$, whose size equals $a_j$ for some $j \in \{1, \dots, q\}$. For a contradiction, suppose that $b_i < a_j$, which implies that the rank $r_j$ of $a_j$ in $I_B$ is less than $i$. Note that $j < q$ as $a_q$ is the smallest item in $I_B$. Since we packed the $4\delta N_B$ biggest items from $I_R$ separately, one of the positions of $a_j$ in the ordering of $I_R$ is $i + 4\delta N_B$ and so we have $i + 4\delta N_B < u_{j+1} \le u_j + 2\delta N_B$, where the first inequality holds by the construction of $I_R$ and the second inequality is by the design of data structure $Q(\delta)$. It follows that $i < u_j - 2\delta N_B$. Combining this with $r_j < i$, we obtain that the rank of $a_j$ in $I_B$ is less than $u_j - 2\delta N_B$, which contradicts that $u_j - 2\delta N_B$ is a valid lower bound on the rank of $a_j$.
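The claim implies $OPT(I'_R) \le OPT(I_B)$ and $SIZE(I'_R) \le SIZE(I_B)$, where $I'_R$ is the instance $I_R$ without its $4\delta N_B$ biggest items. We thus get that $OPT(I_R) \le OPT(I'_R) + 4\delta N_B \le (1 + \varepsilon) \cdot OPT(I_B)$ and, similarly, $SIZE(I_R) \le SIZE(I'_R) + 4\delta N_B \le (1 + \varepsilon) \cdot SIZE(I_B)$.

Better rounding algorithm. Our improved rounding algorithm reduces the number of sizes in the rounded instance (and also the memory requirement). It is based on the observation that the number of items of sizes close to $\varepsilon$ can be approximated with much lower accuracy than the number of items with sizes close to 1, without affecting the quality of the overall approximation. This was observed already by Karmarkar and Karp [34].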
We have now built up all the pieces we need in order to prove Lemma 1, that there is a small space streaming algorithm outputting a HIGH-MULTIPLICITY BIN PACKING instance satisfying (P1)-(P4).
Proof of Lemma 1. Let $k = \lceil \log_2 \frac{1}{\varepsilon} \rceil$. We first group big items in $k$ groups $0, \dots, k-1$ by size such that in group $j$ there are items with sizes in $(2^{-j-1}, 2^{-j}]$. That is, the size intervals for groups are $(0.5, 1]$, $(0.25, 0.5]$, etc. Let $N_j$, $j = 0, \dots, k-1$, be the number of big items in group $j$; clearly, $N_j < 2^{j+1} SIZE(I_B) \le 2^{j+1} OPT(I_B)$. Note that the total size of items in group $j$ is in $(2^{-j-1} \cdot N_j, 2^{-j} \cdot N_j]$. Summing over all groups, we get in particular that

$$\sum_{j=0}^{k-1} \frac{N_j}{2^{j+1}} < SIZE(I_B) \le OPT(I_B). \qquad (1)$$

For each group $j$, we use a separate data structure $Q_j := Q(\delta)$ with $\delta = \frac{1}{8}\varepsilon$, where $Q(\delta)$ is the quantile summary from [27] with precision $\delta$. So when a big item of size $s_i$ arrives, we find $j$ such that $s_i \in (2^{-j-1}, 2^{-j}]$ and insert $s_i$ into $Q_j$. After processing all items, for each group $j$, we do the following: We extract from $Q_j$ the set of stored input items (i.e., their sizes) together with upper bounds on their rank. Let $(a^j_1, u^j_1 = 1), \dots, (a^j_{q_j}, u^j_{q_j} = N_j)$ be the output pairs of an item size and the bound on its rank in group $j$, sorted so that $a^j_1 \ge \dots \ge a^j_{q_j}$. An auxiliary instance $I^j_R$ is formed, as in the simple rounding algorithm, by $(u^j_{p+1} - u^j_p)$ items of size $a^j_p$ for each $p = 1, \dots, q_j - 1$, plus one item of size $a^j_{q_j}$. The rounded instance $I_R$ is the union of the instances $I^j_R$ over all groups $j$.

We show that the desired properties (P1)-(P4) are satisfied. Property (P1) follows easily from the definition of $I_R$ as the union of instances $I^j_R$ and the design of data structures $Q_j$. To see property (P2), for every group $j$, it holds that the $i$-th biggest item in group $j$ in $I_R$ is at least as large as the $i$-th biggest item in group $j$ in $I_B$. Indeed, for any $p = 1, \dots, q_j$, $u^j_p$ is a valid upper bound on the rank of $a^j_p$ in group $j$ in $I_B$ and ranks of items of size $a^j_p$ in group $j$ in $I_R$ are at least $u^j_p$. Moreover, the number of items is preserved in every group. Hence, overall, the $i$-th biggest item in $I_R$ cannot be smaller than the $i$-th biggest item in $I_B$.
Next, we prove properties (P3) and (P4), i.e., the bounds on $OPT(I_R)$ and on $SIZE(I_R)$. For each group $j$, we pack the $4\delta N_j$ biggest items in $I_R$ with size in group $j$ into "extra" bins, each containing $2^j$ items, except for at most one extra bin which may contain fewer than $2^j$ items. This is possible as any item in group $j$ has size at most $2^{-j}$. Using the choice of $\delta = \frac{1}{8}\varepsilon$ and (1), we bound the total number of extra bins by

$$\sum_{j=0}^{k-1} \left\lceil \frac{4\delta N_j}{2^j} \right\rceil \le \sum_{j=0}^{k-1} \frac{\varepsilon N_j}{2^{j+1}} + k \le \varepsilon \cdot OPT(I_B) + k. \qquad (2)$$

Let $I'_R$ be the remaining items in $I_R$. Consider group $j$ and let $I_B(j)$ and $I'_R(j)$ be the items with sizes in $(2^{-j-1}, 2^{-j}]$ in $I_B$ and in $I'_R$, respectively. We claim that the $i$-th biggest item $b_i$ in $I_B(j)$ is at least as large as the $i$-th biggest item in $I'_R(j)$, whose size equals $a_p$ for some $p \in \{1, \dots, q_j\}$ (dropping the superscript $j$ for readability). For a contradiction, suppose that $b_i < a_p$, which implies that the rank $r_p$ of $a_p$ in $I_B(j)$ is less than $i$. Note that $p < q_j$ as $a_{q_j}$ is the smallest item in $I_B(j)$. Since we packed the largest $4\delta N_j$ items from $I_R(j)$ separately, we have $i + 4\delta N_j < u_{p+1} \le u_p + 2\delta N_j$, where the last inequality is by the design of data structure $Q_j$. It follows that $i < u_p - 2\delta N_j$. Combining it with $r_p < i$, we obtain that the rank of $a_p$ in $I_B(j)$ is less than $u_p - 2\delta N_j$, which contradicts that $u_p - 2\delta N_j$ is a valid lower bound on the rank of $a_p$. Hence, the claim holds for any group and it immediately implies $OPT(I'_R) \le OPT(I_B)$ and $SIZE(I'_R) \le SIZE(I_B)$.
Combining with (2), we get that $OPT(I_R) \le OPT(I'_R) + \varepsilon \cdot OPT(I_B) + k \le (1 + \varepsilon) \cdot OPT(I_B) + k$, thus (P3) holds. Similarly, to bound the total wasted space, observe that the total size of items of $I_R$ that are not in $I'_R$ is bounded by

$$\sum_{j=0}^{k-1} 4\delta N_j \cdot 2^{-j} = \sum_{j=0}^{k-1} \frac{\varepsilon N_j}{2^{j+1}} \le \varepsilon \cdot SIZE(I_B),$$

where we use (1) in the last inequality. We obtain that $SIZE(I_R) \le SIZE(I'_R) + \varepsilon \cdot SIZE(I_B) \le (1 + \varepsilon) \cdot SIZE(I_B)$. We conclude that properties (P1)-(P4) hold for the rounded instance $I_R$.
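The grouping and rounding used in the proof can be illustrated by a small offline sketch. It is not the streaming algorithm: instead of the quantile summary from [27], it sorts each group and uses exact ranks spaced roughly $2\delta N_j$ apart, which is an assumption made only to keep the example short. The function names (`group_index`, `round_group`, `round_big_items`) are hypothetical.

```python
import math

def group_index(size):
    """Index j of the size class (2^-(j+1), 2^-j] that contains `size`."""
    j = math.floor(-math.log2(size))
    # Guard against floating-point error at the class boundaries.
    while size > 2.0 ** -j:
        j -= 1
    while size <= 2.0 ** -(j + 1):
        j += 1
    return j

def round_group(sizes, delta):
    """Round one group, using exact ranks as a stand-in for the summary Q_j."""
    ordered = sorted(sizes, reverse=True)
    n = len(ordered)
    step = max(1, math.floor(2 * delta * n))
    ranks = list(range(1, n + 1, step))
    if ranks[-1] != n:
        ranks.append(n)  # the smallest item is always kept, with rank N_j
    rounded = []
    for p in range(len(ranks) - 1):
        # (u_{p+1} - u_p) copies of the item whose rank is u_p
        rounded += [ordered[ranks[p] - 1]] * (ranks[p + 1] - ranks[p])
    rounded.append(ordered[-1])  # plus one copy of the smallest item
    return rounded

def round_big_items(items, eps):
    """Rounded instance for the big items (size > eps), with delta = eps / 8."""
    groups = {}
    for s in items:
        if s > eps:
            groups.setdefault(group_index(s), []).append(s)
    rounded = []
    for sizes in groups.values():
        rounded += round_group(sizes, eps / 8)
    return rounded
```

In this simplified setting, the number of items is preserved in every group and each rounded item is at least as large as the item of the same rank, which mirrors property (P2); the total size grows by at most a factor of $1 + \varepsilon$.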
Lemmas 1 and 2 directly imply our main result for BIN PACKING.

Theorem 3 There is a streaming algorithm for BIN PACKING that for each instance I given as a stream outputs an estimate ALG(I ) of OPT(I ) such that
The space requirement of the algorithm is $O\left(\frac{1}{\varepsilon} \cdot \log \frac{1}{\varepsilon} \cdot \log OPT(I)\right)$.

Bin Packing and Quantile Summaries
In the previous section, the deterministic quantile summary data structure from [27] allows us to obtain a streaming approximation scheme for BIN PACKING. We argue that this connection runs deeper. We start with two scenarios for which there exist better quantile summaries. First, suppose that all big item sizes belong to a universe $U$ of bounded size (for example, when item sizes are floating-point numbers with low precision). Then it can be better to use the quantile summary of Shrivastava et al. [40], which provides a guarantee of $O(\frac{1}{\delta} \cdot \log |U|)$ on the space complexity, where $\delta$ is the precision requirement. Thus, by using $k$ copies of this quantile summary in a similar way as in Section 3.1, we get a streaming $(1 + \varepsilon)$-approximation algorithm for BIN PACKING that runs in space $O(\frac{1}{\varepsilon} \cdot \log \frac{1}{\varepsilon} \cdot \log |U|)$. Second, if we allow the algorithm to use randomization and fail with probability $\gamma$, we can employ the optimal randomized quantile summary of Karnin, Lang, and Liberty [35], which, for a given precision $\delta$ and failure probability $\eta$, uses space $O(\frac{1}{\delta} \cdot \log \log \frac{1}{\delta\eta})$ and does not provide a $\delta$-approximate quantile for some quantile query with probability at most $\eta$. In particular, using $k$ copies of their data structure with precision $\delta = \Theta(\varepsilon)$ and failure probability $\eta = \gamma / k$, similarly as in Section 3.1, gives a streaming $(1 + \varepsilon)$-approximation algorithm for BIN PACKING which fails with probability at most $\gamma$ and runs in space $O\left(\frac{1}{\varepsilon} \cdot \log \frac{1}{\varepsilon} \cdot \log \log \left(\log \frac{1}{\varepsilon} / (\varepsilon\gamma)\right)\right)$.

More intriguingly, the connection between quantile summaries and BIN PACKING also goes in the other direction. Namely, we show that a streaming $(1 + \varepsilon)$-approximation algorithm for BIN PACKING with space bounded by $S(\varepsilon, OPT)$ (or $S(\varepsilon, N)$) implies a data structure of size $S(\varepsilon, N)$ for the following ESTIMATING RANK problem: Create a summary of a stream of $N$ numbers which is able to provide a $\delta$-approximate rank of any query $q$, i.e., the number of items in the stream which are larger than $q$, up to an additive error of $\pm\delta N$. A summary for ESTIMATING RANK is essentially a quantile summary and we can actually use it to find an approximate quantile by doing a binary search over possible item names. However, this approach does not guarantee that the item name returned will correspond to one of the items present in the stream.
The reduction from ESTIMATING RANK to BIN PACKING goes as follows: Suppose that all numbers in the input stream for ESTIMATING RANK are from interval $(\frac{1}{2}, \frac{2}{3})$ (this is without loss of generality by scaling and translating) and let $q$ be a query in $(\frac{1}{2}, \frac{2}{3})$. For each number $a_i$ in the stream for ESTIMATING RANK, we introduce two items of size $a_i$ in the stream for BIN PACKING. After these $2N$ items (two copies each of $a_1, \dots, a_N$) are inserted in the same order as in the stream for ESTIMATING RANK, we then insert a further $2N$ items in the stream for BIN PACKING, all of size $1 - q$. Observe first that no pair of the first $2N$ items can be placed in the same bin, so we must open at least $2N$ bins, two for each of $a_1, \dots, a_N$. Since $\frac{1}{3} < 1 - q < \frac{1}{2}$ and $a_i > \frac{1}{2}$, we can place at most one of the $2N$ items of size $1 - q$ in a bin with $a_i$ in it, provided that $a_i + (1 - q) \le 1$, i.e., $a_i \le q$. Thus, we can pack $2(N - \mathrm{rank}(q))$ of the $(1 - q)$-sized items in the first $2N$ bins. This leaves $2\,\mathrm{rank}(q)$ items, all of size $1 - q$. As exactly two such items fit in one bin, we pack these optimally into $\mathrm{rank}(q)$ additional bins, for a total of $2N + \mathrm{rank}(q)$ bins.
We claim that a $(1 + \varepsilon)$-approximation of the optimum number of bins provides a $4\varepsilon$-approximate rank of $q$. Indeed, let $m$ be the number of bins returned by the algorithm and let $r = m - 2N$ be the estimate of $\mathrm{rank}(q)$. We have that the optimal number of bins equals $2N + \mathrm{rank}(q)$ and thus $2N + \mathrm{rank}(q) \le m \le (1 + \varepsilon) \cdot (2N + \mathrm{rank}(q)) + o(N)$. By using $r = m - 2N$ and rearranging, we get $\mathrm{rank}(q) \le r \le \mathrm{rank}(q) + \varepsilon\,\mathrm{rank}(q) + 2\varepsilon N + o(N)$.
Since $\mathrm{rank}(q) \le N$, the right-hand side can be upper bounded by $\mathrm{rank}(q) + 4\varepsilon N$ (provided that $o(N) < \varepsilon N$, which holds for a large enough $N$), so $r$ is a $4\varepsilon$-approximate rank of $q$. Hence, the memory state of an algorithm for BIN PACKING after processing the first $2N$ items (of sizes $a_1, \dots, a_N$) can be used as a data structure for ESTIMATING RANK.
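The reduction can be checked on a small example. The sketch below builds the BIN PACKING stream for a given query and packs it offline with First Fit Decreasing, which is used here only as a convenient stand-in for an optimal packing: on instances of this particular shape it opens $2N + \mathrm{rank}(q)$ bins, following the counting argument above. Exact rational arithmetic avoids rounding issues when $a_i = q$. All function names are hypothetical.

```python
from fractions import Fraction

def first_fit_decreasing(items):
    """Pack items with sizes in (0, 1] by First Fit Decreasing; return the bin count."""
    loads = []
    for size in sorted(items, reverse=True):
        for b in range(len(loads)):
            if loads[b] + size <= 1:
                loads[b] += size
                break
        else:
            loads.append(size)  # no open bin fits, so open a new one
    return len(loads)

def reduction_stream(numbers, q):
    """BIN PACKING stream built from numbers in (1/2, 2/3) and a query q."""
    stream = []
    for a in numbers:
        stream += [a, a]                        # two items per input number
    stream += [1 - q] * (2 * len(numbers))      # then 2N items of size 1 - q
    return stream

def rank(numbers, q):
    """Number of stream elements that are larger than q."""
    return sum(1 for a in numbers if a > q)

nums = [Fraction(51, 100), Fraction(55, 100), Fraction(60, 100), Fraction(65, 100)]
q = Fraction(58, 100)
bins = first_fit_decreasing(reduction_stream(nums, q))
estimate = bins - 2 * len(nums)   # r = m - 2N, here equal to rank(q) = 2
```

With an exact bin count the estimate coincides with the rank; a $(1 + \varepsilon)$-approximate count gives the additive $4\varepsilon N$ error derived above.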
In [16] we show a space lower bound of $\Omega(\frac{1}{\varepsilon} \cdot \log \varepsilon N)$ for comparison-based data structures for ESTIMATING RANK (and for quantile summaries as well).
Theorem 4 (Theorem 13 in [16]) For any $0 < \varepsilon < \frac{1}{16}$, there is no deterministic comparison-based data structure for ESTIMATING RANK which stores $o(\frac{1}{\varepsilon} \cdot \log \varepsilon N)$ items on any input stream of length $N$.
We conclude that there is no comparison-based streaming algorithm for BIN PACKING which stores $o(\frac{1}{\varepsilon} \cdot \log OPT)$ items on any input stream (recall that $N = O(OPT)$ in our reduction). Note that our algorithm is comparison-based if we employ the comparison-based quantile summary of Greenwald and Khanna [27], except that it needs to determine the size group for each item, which can be done by comparisons with $2^{-j}$ for integer values of $j$. Nevertheless, comparisons with a fixed set of constants do not affect the reduction from ESTIMATING RANK (i.e., the reduction can choose an interval contained in $(\frac{1}{2}, \frac{2}{3})$ to avoid all constants fixed in the algorithm), thus the lower bound of $\Omega(\frac{1}{\varepsilon} \cdot \log OPT)$ applies to our algorithm as well. This yields near optimality of our approach, up to a factor of $O(\log \frac{1}{\varepsilon})$. Finally, we remark that the lower bound of $\Omega(\frac{1}{\varepsilon} \cdot \log \log \frac{1}{\delta})$ for randomized comparison-based quantile summaries [35] translates to BIN PACKING as well.

Vector Bin Packing
As already observed by Fernandez de la Vega and Lueker [19], a $(1 + \varepsilon)$-approximation algorithm for (scalar) BIN PACKING implies a $d \cdot (1 + \varepsilon)$-approximation algorithm for VECTOR BIN PACKING, where items are $d$-dimensional vectors and bins have unit capacity in every dimension. Indeed, we split the vectors into $d$ groups according to the largest dimension (chosen arbitrarily among dimensions that have the largest value) and in each group we apply the approximation scheme for BIN PACKING, packing just according to the largest dimension. Finally, we take the union of opened bins over all groups. Since the optimum of the BIN PACKING instance for each group is a lower bound on the optimum of VECTOR BIN PACKING, we get that the solution is a $d \cdot (1 + \varepsilon)$-approximation.
This can be done in the same way in the streaming model. Hence, there is a streaming algorithm for VECTOR BIN PACKING which outputs a $d \cdot (1 + \varepsilon)$-approximation of $OPT$, the offline optimal number of bins, using $O(\frac{d}{\varepsilon} \cdot \log \frac{1}{\varepsilon} \cdot \log OPT)$ memory. By scaling $\varepsilon$, there is a $(d + \varepsilon)$-approximation algorithm with $\widetilde{O}(\frac{d^2}{\varepsilon})$ memory. We can, however, do better by one factor of $d$.

Theorem 5 There is a streaming $(d + \varepsilon)$-approximation algorithm for VECTOR BIN PACKING that uses $O\left(\frac{d}{\varepsilon} \cdot \log \frac{d}{\varepsilon} \cdot \log OPT\right)$ memory.
Proof Given an input stream $I$ of vectors, we create an input stream $I'$ for BIN PACKING by replacing each vector $v$ by a single (scalar) item $a$ of size $\|v\|_\infty$. We use our streaming algorithm for BIN PACKING with precision $\delta = \frac{\varepsilon}{d}$, which uses $O(\frac{1}{\delta} \cdot \log \frac{1}{\delta} \cdot \log OPT)$ memory and returns a solution with at most $B = (1 + \delta) \cdot OPT(I') + O(\frac{1}{\delta})$ scalar bins. Clearly, $B$ bins are sufficient for the stream $I$ of vectors, since in the solution for $I'$ we replace each item by the corresponding vector and obtain a valid solution for $I$.
Finally, we show that $(1 + \delta) \cdot OPT(I') \le (d + \varepsilon) \cdot OPT(I)$, for which it is sufficient to prove that $OPT(I') \le d \cdot OPT(I)$ as $\delta = \frac{\varepsilon}{d}$. Namely, from an optimal solution $S$ for $I$, we create a solution for $I'$ with at most $d \cdot OPT(I)$ bins. For each bin $B$ in $S$, we split the vectors assigned to $B$ into $d$ groups according to the largest dimension (chosen arbitrarily among those with the largest value) and for each group $i$ we create bin $B_i$ with vectors in group $i$. Then we just replace each vector $v$ by an item of size $\|v\|_\infty$ and obtain a valid solution for $I'$ with at most $d \cdot OPT(I)$ bins.
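The first half of the proof, namely that a packing of the scalar items $\|v\|_\infty$ is also a valid packing of the vectors, can be illustrated as follows. First Fit is used here purely as a placeholder for the streaming scalar algorithm, which is an assumption of this sketch; the function names are hypothetical.

```python
def linf_norm(v):
    """l_infinity norm of a vector with non-negative coordinates."""
    return max(v)

def pack_vectors_by_max_coordinate(vectors):
    """Pack vectors by running First Fit on their l_infinity norms.

    Returns a list of bins, each bin being a list of the original vectors.
    """
    bins, loads = [], []   # loads[b] is the total scalar size placed in bin b
    for v in vectors:
        size = linf_norm(v)
        for b in range(len(bins)):
            if loads[b] + size <= 1.0:
                bins[b].append(v)
                loads[b] += size
                break
        else:
            bins.append([v])
            loads.append(size)
    return bins
```

Every bin is feasible in each dimension because the load in dimension $k$ is at most the sum of the $\ell_\infty$ norms, which the scalar packing keeps at most 1.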
Interestingly, a better than d-approximation using sublinear memory, which is rounding-based, is not possible, due to the following result in [7]. (Note that the result requires that the numbers in the input vectors can take arbitrary values in [0, 1], i.e., vectors do not belong to a bounded universe.)

Vector Scheduling
We provide a novel approach for creating an input summary for VECTOR SCHEDULING, based on combining small items into containers. Our streaming algorithm stores all big jobs and all containers, created from small items, and these containers are relatively big as well. Thus, there is a bounded number of big jobs and containers, and the space used is bounded as well. We show that this simple summarization preserves the optimal makespan up to a factor of $2 - \frac{1}{m} + \varepsilon$ for any $0 < \varepsilon \le 1$. We assume that the algorithm knows (an upper bound on) $m \ge 2$ in advance.
Description of algorithm AGGREGATESMALLJOBS. For $0 < \varepsilon \le 1$ and $m \ge 2$, the algorithm works as follows: For each $k = 1, \dots, d$, it keeps track of the total load of all jobs in dimension $k$, denoted $L_k$. Note that the optimal makespan satisfies $OPT \ge \max_k \frac{1}{m} \cdot L_k$ (an alternative lower bound on $OPT$ is the maximum $\ell_\infty$ norm of a job seen so far, but our algorithm does not use this). For brevity, let $LB = \max_k \frac{1}{m} \cdot L_k$. Let $\gamma = \Theta\left(\varepsilon^2 / \log \frac{d^2}{\varepsilon}\right)$; the constant hidden in $\Theta$ follows from the analysis below. We also ensure that $\gamma \le \frac{1}{4}\varepsilon$. We say that a job with vector $v$ is big if $\|v\|_\infty > \gamma \cdot LB$; otherwise it is small. The algorithm stores all big jobs (i.e., the full vector of each big job), while it aggregates small jobs into containers, and does not store any small job directly. A container is simply a vector $c$ that equals the sum of vectors for small jobs assigned to this container, and we ensure that $\|c\|_\infty \le 2\gamma \cdot LB$. This completes the description of the algorithm. (We remark that for packing the containers, we may also use another, more efficient algorithm, such as FIRST FIT, which however makes no difference in the approximation guarantee.)

Properties of the input summary. After all jobs are processed, we can assume that $LB = \max_k \frac{1}{m} \cdot L_k = 1$, which implies that $OPT \ge 1$. This is without loss of generality by scaling every quantity by $1/LB$. First, we bound the space needed to store the input summary. Since any big job and any closed container, each characterized by a vector $v$, satisfy $\|v\|_\infty > \gamma$, and the total load in each of the $d$ dimensions is at most $m$, it holds that there are at most $\frac{1}{\gamma} \cdot d \cdot m$ big jobs and closed containers. As at most one container remains open in the end and any job or container is described by $d$ numbers, the space cost is $O\left(\frac{1}{\gamma} \cdot d^2 \cdot m\right) = O\left(\frac{1}{\varepsilon^2} \cdot d^2 \cdot m \cdot \log \frac{d}{\varepsilon}\right)$.

We now analyze the maximum approximation factor that can be lost by this summarization. Let $I_R$ be the resulting instance formed by big jobs and containers with small items (i.e., the input summary produced by algorithm AGGREGATESMALLJOBS), and let $I$ be the original instance, consisting of jobs in the input stream. We show that $OPT(I_R)$ and $OPT(I)$ are close together, up to a factor of $2 - \frac{1}{m} + \varepsilon$, and an example in Section 4.1 shows that this bound is tight for our approach. Note, however, that we still need to execute an offline algorithm to get (an approximation of) $OPT(I_R)$, which is not an explicit part of the summary; see the proof of Theorem 9 below.
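To make the shape of the summary concrete, here is a simplified offline sketch. It assumes that the final value of $LB$ is known before the jobs are scanned, and it closes a container as soon as its $\ell_\infty$ norm exceeds $\gamma \cdot LB$; both are assumptions of this example and not the streaming rule of AGGREGATESMALLJOBS, which has to cope with $LB$ growing over time. The function name is hypothetical.

```python
def aggregate_small_jobs(jobs, m, gamma):
    """Offline illustration: returns (big jobs, containers, LB)."""
    d = len(jobs[0])
    totals = [sum(v[k] for v in jobs) for k in range(d)]
    lb = max(totals) / m                   # LB = max_k L_k / m
    threshold = gamma * lb
    big, containers = [], []
    current = [0.0] * d                    # the single open container
    for v in jobs:
        if max(v) > threshold:
            big.append(list(v))            # big jobs are stored in full
            continue
        current = [c + x for c, x in zip(current, v)]
        if max(current) > threshold:       # container is "relatively big": close it
            containers.append(current)
            current = [0.0] * d
    if any(current):
        containers.append(current)         # at most one container stays open
    return big, containers, lb
```

Under these assumptions every container has $\ell_\infty$ norm at most $2\gamma \cdot LB$, every closed container has norm above $\gamma \cdot LB$, and the number of stored vectors is at most $\frac{1}{\gamma} \cdot d \cdot m + 1$, matching the counting argument above.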
The crucial part of the analysis is to show that containers for small items can be assigned to machines so that the loads of all machines are nearly balanced in every dimension, especially in the case when containers constitute a large fraction of the total load of all jobs. Let $L^C_k$ be the total load of containers in dimension $k$ (equal to the total load of small jobs). Let $I_C \subseteq I_R$ be the instance consisting of all containers in $I_R$. The following lemma establishes the key properties of the input summary $I_R$.
Lemma 7 Supposing that $\max_k \frac{1}{m} \cdot L_k = 1$, the following holds: (i) There is a solution for instance $I_C$ with load at most $\max\left(\frac{1}{2}, \frac{1}{m} \cdot L^C_k\right) + 2\varepsilon + 4\gamma$ in each dimension $k$ on every machine. (ii) $OPT(I) \le OPT(I_R) \le \left(2 - \frac{1}{m} + 3\varepsilon\right) \cdot OPT(I)$.
Proof (i) We obtain the solution from the randomized online algorithm by Im et al. [31].
Although this algorithm has ratio O(log d/ log log d) on general instances, we show that it behaves substantially better when jobs are small enough. In a nutshell, this algorithm works by first assigning each job j to a uniformly random machine i and if the load of machine i exceeds a certain threshold, then the job is reassigned by GREEDY.
The online GREEDY algorithm works by assigning jobs one by one, each to a machine so that the makespan increases as little as possible (breaking ties arbitrarily).
Let $L'_k = \max\left(\frac{1}{2}, \frac{1}{m} \cdot L^C_k\right)$. We assume that each machine has its capacity of $L'_k + 2\varepsilon + 4\gamma$ in each dimension $k$ split into two parts: The first part has capacity $L'_k + \varepsilon + 2\gamma$ in dimension $k$ for the containers assigned randomly, and the second part has capacity $\varepsilon + 2\gamma$ in all dimensions for the containers assigned by GREEDY. Note that GREEDY cares about the load in the second part only.
The algorithm assigns containers one by one as follows: For each container $c$, it first chooses a machine $i$ uniformly and independently at random. If the load of the first part of machine $i$ already exceeds $L'_k + \varepsilon$ in some dimension $k$, then $c$ is passed to GREEDY, which assigns it according to the loads in the second part. Otherwise, the algorithm assigns $c$ to machine $i$.
As each container $c$ satisfies $\|c\|_\infty \le 2\gamma$, it holds that randomly assigned containers fit into capacity $L'_k + \varepsilon + 2\gamma$ in any dimension $k$ on any machine. We show that the expected amount of containers assigned by GREEDY is small enough so that they fit into machines with capacity of $\varepsilon + 2\gamma$, which in turn implies that there is a choice of random bits for the assignment so that the capacity for GREEDY is not exceeded. The existence of a solution with capacity $L'_k + 2\varepsilon + 4\gamma$ in each dimension $k$ will follow.
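The two-part assignment used in this proof can be sketched as below. The code only illustrates the assignment rule from the text; it does not reproduce the algorithm of Im et al. [31] in full, and the names `assign_containers` and `base_cap` (the latter playing the role of the per-dimension base capacity $\max(\frac{1}{2}, \frac{1}{m} L^C_k)$) are hypothetical.

```python
def assign_containers(containers, m, base_cap, eps, rng):
    """Random assignment with a GREEDY fallback.

    containers: list of d-dimensional vectors with l_infinity norm <= 2 * gamma
    base_cap:   per-dimension base capacity of the first part of each machine
    rng:        a random.Random instance
    Returns the load tables (first, second) of the two parts of every machine.
    """
    d = len(base_cap)
    first = [[0.0] * d for _ in range(m)]    # loads of randomly assigned containers
    second = [[0.0] * d for _ in range(m)]   # loads of containers assigned by GREEDY
    for c in containers:
        i = rng.randrange(m)                 # uniformly random machine
        if any(first[i][k] > base_cap[k] + eps for k in range(d)):
            # GREEDY looks at the second parts only and picks the machine
            # whose maximum load after adding c is smallest.
            i = min(range(m),
                    key=lambda j: max(second[j][k] + c[k] for k in range(d)))
            target = second[i]
        else:
            target = first[i]
        for k in range(d):
            target[k] += c[k]
    return first, second
```

By construction, the first part of a machine never exceeds the base capacity plus $\varepsilon + 2\gamma$ in any dimension, for every outcome of the random choices; the probabilistic part of the argument is only needed to bound the loads in the second part.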
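Consider a container $c$ and let $i$ be the machine chosen randomly for $c$. We claim that for any dimension $k$, the load on machine $i$ in dimension $k$, assigned before processing $c$, exceeds $L'_k + \varepsilon$ with probability of at most $\frac{\varepsilon}{d^2}$ (recall that $L'_k = \max(\frac{1}{2}, \frac{1}{m} \cdot L^C_k)$). To show the claim, we use the following Chernoff-Hoeffding bound:

Fact 8 Let $X_1, \dots, X_n$ be independent binary random variables and let $a_1, \dots, a_n$ be coefficients in $[0, 1]$. Let $X = \sum_i a_i X_i$. Then, for $0 < \delta \le 1$ and $\mu \ge E[X]$, it holds that $\Pr[X > (1 + \delta) \cdot \mu] \le \exp\left(-\frac{1}{3} \cdot \delta^2 \cdot \mu\right)$.

We use this bound with variable $X_{c'}$ for each vector $c'$ assigned randomly before vector $c$ and not reassigned by GREEDY. We have $X_{c'} = 1$ if $c'$ is assigned on machine $i$. Let $a_{c'} = \frac{1}{2\gamma} \cdot c'_k \le 1$. Let $X = \sum_{c'} a_{c'} X_{c'}$ be the random variable equal to the load on machine $i$ in dimension $k$, scaled by $\frac{1}{2\gamma}$. It holds that $E[X] \le \frac{1}{m} \cdot \frac{1}{2\gamma} \cdot L'_k \cdot m = \frac{1}{2\gamma} \cdot L'_k$, since each container $c'$ is assigned to machine $i$ with probability $\frac{1}{m}$ and $L'_k \cdot m$ is the upper bound on the total load of containers in dimension $k$. Using the Chernoff-Hoeffding bound with $\mu = \frac{1}{2\gamma} \cdot L'_k$ and $\delta = \varepsilon \le 1$, we get that

$$\Pr\left[X > (1 + \varepsilon) \cdot \frac{1}{2\gamma} \cdot L'_k\right] \le \exp\left(-\frac{\varepsilon^2}{6\gamma} \cdot L'_k\right).$$

Using $L'_k \ge \frac{1}{2}$, we bound the right-hand side by

$$\exp\left(-\frac{\varepsilon^2}{12\gamma}\right) \le \frac{\varepsilon}{d^2},$$

where the last inequality holds for a suitable choice of the multiplicative constant in the definition of $\gamma$. This is sufficient to show the claim as $X > (1 + \varepsilon) \cdot \frac{1}{2\gamma} \cdot L'_k$ if and only if the load on machine $i$ in dimension $k$, assigned randomly before $c$, exceeds $(1 + \varepsilon) \cdot L'_k$, and $(1 + \varepsilon) \cdot L'_k \le L'_k + \varepsilon$ as $L'_k \le 1$.

By the union bound, the claim implies that each container $c$ is reassigned by GREEDY with probability at most $\frac{\varepsilon}{d}$. Let $G$ be the random variable equal to the sum of the $\ell_1$ norms (where $\|c\|_1 = \sum_{k=1}^{d} c_k$) of containers assigned by GREEDY. Using the linearity of expectation and the claim, we have

$$E[G] \le \frac{\varepsilon}{d} \cdot \sum_{c} \|c\|_1 \le \frac{\varepsilon}{d} \cdot d \cdot m = \varepsilon \cdot m,$$

where the second inequality uses that the total load of containers in each dimension is at most $m$. Let $\mu_G$ be the makespan of the solution created by GREEDY. Observe that each machine has a dimension with load at least $\mu_G - 2\gamma$. Indeed, otherwise, if there is a machine $i$ with load less than $\mu_G - 2\gamma$ in all coordinates, the last container $c$ assigned by GREEDY that caused the increase of the makespan to $\mu_G$ would be assigned to machine $i$, and the makespan after assigning $c$ would be smaller than $\mu_G$ (using $\|c\|_\infty \le 2\gamma$). It follows that $\mu_G - 2\gamma \le \frac{1}{m} \cdot G$ and, using $E[G] \le \varepsilon \cdot m$, we get $E[\mu_G] \le \varepsilon + 2\gamma$. Hence, there is a choice of random bits for which the containers assigned by GREEDY fit into capacity $\varepsilon + 2\gamma$, which shows (i).

(ii) The first inequality is straightforward as any solution for $I_R$ can be used as a solution for $I$, just packing small items first in containers and then the containers according to the solution for $I_R$.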
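We create a solution of $I_R$ of makespan at most $\left(2 - \frac{1}{m} + 3\varepsilon\right) \cdot OPT(I)$ as follows: We take an optimal solution $S_B$ for instance $I_R \setminus I_C$, i.e., for big jobs only, and combine it, in an arbitrary way, with solution $S_C$ for containers from (i), to obtain a solution $S$ for $I_R$. Let $\mu_k$ be the largest load assigned to a machine in dimension $k$ in solution $S_B$; we have $\mu_k \le OPT(I)$. Note that $L^C_k \le m - \mu_k$, since the total load of big jobs and containers together is at most $m$, by the assumption of the lemma.
Consider the load on machine $i$ in dimension $k$ in solution $S$. If $\frac{1}{m} \cdot L^C_k \ge \frac{1}{2}$, then this load is bounded by

$$\mu_k + \frac{1}{m} \cdot L^C_k + 2\varepsilon + 4\gamma \le \mu_k + \frac{m - \mu_k}{m} + 3\varepsilon = 1 + \left(1 - \frac{1}{m}\right) \cdot \mu_k + 3\varepsilon \le \left(2 - \frac{1}{m} + 3\varepsilon\right) \cdot OPT(I),$$

where the first inequality uses $L^C_k \le m - \mu_k$ and $\gamma \le \frac{1}{4}\varepsilon$ (ensured by the definition of $\gamma$), and the last inequality holds by $\mu_k \le OPT(I)$ and $1 \le OPT(I)$. Otherwise, the load is at most $\mu_k + \frac{1}{2} + 2\varepsilon + 4\gamma \le \left(\frac{3}{2} + 3\varepsilon\right) \cdot OPT(I) \le \left(2 - \frac{1}{m} + 3\varepsilon\right) \cdot OPT(I)$, using $m \ge 2$.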
Finally, we have built up all the pieces for our main result for VECTOR SCHEDULING.

Theorem 9 There is a streaming $\left(2 - \frac{1}{m} + \varepsilon\right)$-approximation algorithm for VECTOR SCHEDULING running in space $O\left(\frac{1}{\varepsilon^2} \cdot d^2 \cdot m \cdot \log \frac{d}{\varepsilon}\right)$.

Proof We process the stream by algorithm AGGREGATESMALLJOBS that outputs an input summary $I_R$ such that $OPT(I_R)/OPT(I) \le 2 - \frac{1}{m} + \varepsilon$ by Lemma 7. Given summary $I_R$, we compute a $(1 + \varepsilon)$-approximation of $OPT(I_R)$ using the algorithm in [8], which requires time doubly exponential in $d$ (recall that such a time cost is needed to get an approximation of the makespan). This gives a solution of makespan at most $\left(2 - \frac{1}{m} + O(\varepsilon)\right) \cdot OPT(I)$. The space bound follows from the analysis given above.
It remains open whether or not algorithm AGGREGATESMALLJOBS with $\gamma = \Theta(\varepsilon)$ also gives a $\left(2 - \frac{1}{m} + \varepsilon\right)$-approximation, which would imply a better space bound of $O(\frac{1}{\varepsilon} \cdot d^2 \cdot m)$. The approximation guarantee of this approach cannot be improved, however, which we demonstrate by an example in the next section.

Tight Example for the Algorithm for Vector Scheduling
For any $m \ge 2$, we present an instance $I$ in $d = m + 1$ dimensions such that $OPT(I) = 1$, but $OPT(I_R) \ge 2 - \frac{1}{m}$, where $I_R$ is the instance created by algorithm AGGREGATESMALLJOBS described in Section 4.
The groups arrive one by one, with an arbitrary ordering inside the group. Note, however, that these jobs with $\ell_\infty$ norm equal to $\gamma$ become small for the algorithm only once the first job from the last group arrives as they are compared to the total load in each dimension, which increases gradually. When they become small, the algorithm will combine each group into one container $(\gamma, \gamma, \dots, \gamma, 0)$, which can be achieved by processing the jobs in their arrival order and by having the last vector of the group larger by an infinitesimal amount (we do not take these infinitesimals into account in further calculations). Thus, $I_R$ consists of $m$ big jobs and $(m - 1) \cdot \frac{1}{\gamma}$ containers $(\gamma, \gamma, \dots, \gamma, 0)$.
Observe that $OPT(I) = 1$, since in the optimal solution, each machine $i$ is assigned big job $v_i$ and $\frac{1}{\gamma}$ small jobs with $\gamma$ in dimension $k$ for each $k \in \{1, \dots, d - 1\} \setminus \{i\}$. Thus the load equals one on any machine and dimension.
We claim that $OPT(I_R) \ge 2 - \frac{1}{m}$. Indeed, in a solution with makespan below 2, only one big job can be assigned on one machine, as all of them have value one in dimension $m + 1$, so each machine contains one big job. Observe that some machine gets at least $\frac{m-1}{m} \cdot \frac{1}{\gamma}$ containers and thus, it has load of at least $2 - \frac{1}{m}$ in one of the $d - 1$ first dimensions, which shows the claim.
Note that for this instance to show ratio $2 - \frac{1}{m}$ it suffices that the algorithm creates $(m - 1) \cdot \frac{1}{\gamma}$ containers $(\gamma, \gamma, \dots, \gamma, 0)$. This can be enforced for various greedy algorithms used for packing the small jobs into containers. We conclude that we need a different approach for input summarization to get a ratio below $2 - \frac{1}{m}$.

Makespan Scheduling
We start by outlining a simple streaming algorithm for $d = 1$ based on rounding. Here, each job $j$ on input is characterized by its processing time $p_j$ only. The algorithm uses the size of the largest job seen so far, denoted $p_{\max}$, as a lower bound on the optimum makespan. This makes the rounding procedure (and hence, the input summary) oblivious of $m$, the number of machines, which is in contrast with the algorithm in Section 4 that uses just the sum of job sizes divided by $m$ as the lower bound. The rounding works as follows: Let $q$ be an integer such that $p_{\max} \in ((1+\varepsilon)^q, (1+\varepsilon)^{q+1}]$, and let $k = \lceil \log_{1+\varepsilon} \frac{1}{\varepsilon} \rceil$. A job is big if its size exceeds $(1+\varepsilon)^{q-k}$; note that any big job is larger than $\varepsilon \cdot p_{\max} / (1+\varepsilon)^2$. All other jobs are small and have size less than $\varepsilon \cdot p_{\max}$. The algorithm maintains one variable $s$ for the total size of all small jobs and variables $L_i$, $i = q - k, \dots, q$, for the number of big jobs with size in $((1+\varepsilon)^i, (1+\varepsilon)^{i+1}]$ (note that this interval is not scaled by $p_{\max}$, i.e., increasing $p_{\max}$ slightly does not move the intervals).
Maintaining these variables when a new job arrives can be done in a straightforward way. In particular, when an increase of $p_{\max}$ causes $q$ to increase (by 1 or more as it is integral), we discard all variables $L_i$ that do not correspond to big jobs any more, and account for previously big jobs that are now small in variable $s$. However, as the size of these jobs was rounded to a power of $1 + \varepsilon$, variable $s$ can differ from the exact total size of small jobs by a factor of at most $1 + \varepsilon$.
The created input summary, consisting of $O(\frac{1}{\varepsilon} \log \frac{1}{\varepsilon})$ variables $L_i$ and variable $s$, preserves the optimal value up to a factor of $1 + O(\varepsilon)$. This follows, since big jobs are stored with size rounded up to the nearest power of $1 + \varepsilon$, and, although we just know the approximate total size of small jobs, they can be taken into account similarly as when calculating a bound on the number of bins in our algorithm for BIN PACKING.
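The bookkeeping described above can be sketched as a small class. This is a minimal illustration under the assumption that all job sizes are positive floating-point numbers; the class and method names are hypothetical, and the helper `_index` corrects the logarithm for floating-point error so that it returns the integer $i$ with $(1+\varepsilon)^i < p \le (1+\varepsilon)^{i+1}$.

```python
import math

class MakespanSummary:
    def __init__(self, eps):
        self.base = 1.0 + eps
        # k = ceil(log_{1+eps}(1/eps))
        self.k = math.ceil(math.log(1.0 / eps) / math.log(self.base))
        self.p_max = 0.0
        self.q = None
        self.counts = {}   # counts[i]: number of big jobs in (base^i, base^(i+1)]
        self.small = 0.0   # (approximate) total size of small jobs

    def _index(self, p):
        """Integer i with base^i < p <= base^(i+1)."""
        i = math.ceil(math.log(p) / math.log(self.base)) - 1
        while self.base ** i >= p:
            i -= 1
        while self.base ** (i + 1) < p:
            i += 1
        return i

    def add(self, p):
        i = self._index(p)
        if p > self.p_max:
            self.p_max, self.q = p, i
            # Formerly big jobs that are now small move to `small`,
            # counted with their rounded-up size.
            for old in [t for t in self.counts if t < self.q - self.k]:
                self.small += self.counts.pop(old) * self.base ** (old + 1)
        if i >= self.q - self.k:           # size exceeds base^(q-k): big job
            self.counts[i] = self.counts.get(i, 0) + 1
        else:
            self.small += p

    def total(self):
        """Total size represented by the summary (big jobs rounded up)."""
        return self.small + sum(n * self.base ** (i + 1)
                                for i, n in self.counts.items())
```

At any time the summary holds at most $k + 1$ counters, and the total size it represents lies between the true total and $(1 + \varepsilon)$ times the true total.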

Vector Scheduling
We describe the rounding introduced by Bansal et al. [8], which we can adjust into a streaming $(1 + \varepsilon)$-approximation for VECTOR SCHEDULING in a constant dimension. The downside of this approach is that it requires memory exceeding $\left(\frac{2}{\varepsilon}\right)^d$, which becomes infeasible even for $\varepsilon = 1$ and $d$ being a relatively small constant. Moreover, such an amount of memory may be needed also in the case of a small number of machines.
We first use the following lemma by Chekuri and Khanna [11], where $\delta = \frac{\varepsilon}{d}$:

In the following, we assume that the algorithm receives vectors from instance $I'$, created as in Lemma 10. Let $p_{\max}$ be the maximum $\ell_\infty$ norm over all vectors that arrived so far; we use it as a lower bound on $OPT$. We again do not use the total volume in each dimension as a lower bound, which makes the input summarization oblivious of $m$. A job, characterized by vector $v$, is said to be big if $\|v\|_\infty > \delta \cdot p_{\max}$; otherwise, $v$ is small.
We round all values in big jobs to the powers of $1 + \varepsilon$. By Lemma 10, we have that either $v_k > \delta^2 \cdot p_{\max}$ or $v_k = 0$ for any big $v$ and dimension $k$, thus there are $\left(O(\log_{1+\varepsilon} \frac{1}{\delta})\right)^d$ types of big jobs at any time. We have one variable $L_t$ counting the number of jobs for each big type $t$, where $t$ is an integer vector consisting of the exponents, i.e., if $v$ is a big vector of type $t$, then $v_i \in ((1+\varepsilon)^{t_i}, (1+\varepsilon)^{t_i+1}]$ (we set $t_i = -\infty$ if $v_i = 0$). As in the 1-dimensional case, big types change over time, when $p_{\max}$ (sufficiently) increases.
Note that small jobs cannot be rounded to powers of $1 + \varepsilon$ directly. Instead, they are rounded relative to their $\ell_\infty$ norms. More precisely, consider a small vector $v$ and let $\gamma = \|v\|_\infty$. For each dimension $k$, if $v_k > 0$, let $t_k \ge 0$ be the largest integer such that $v_k \le \gamma \cdot (1+\varepsilon)^{-t_k}$, and if $v_k = 0$, we set $t_k$ to $\infty$. Then $(t_1, \dots, t_d)$ is the type of small vector $v$. Observe that small types do not change over time and there are at most $\left(O(\log_{1+\varepsilon} \frac{1}{\delta})\right)^d$ of them. For each small type $t$, we maintain a variable $s_t$ that stores (approximately) the sum of the $\ell_\infty$ norms of all small jobs of type $t$.

The variables can be maintained in an online fashion. Namely, when $p_{\max}$ increases, the types for previously big jobs that are now small are discarded, while the jobs that become small are accounted for in small types. For each such former big type $t$, we compute the corresponding small type as follows: Let $t_{\max} = \max_i t_i$ be the maximum value in $t$ (which is not $-\infty$). The corresponding small type $\hat{t}$ has $\hat{t}_i = t_{\max} - t_i$ if $t_i \ne -\infty$, and $\hat{t}_i = \infty$ otherwise. Then we increase $s_{\hat{t}}$ by $L_t \cdot (1+\varepsilon)^{t_{\max}+1}$.
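The two type computations can be written down directly. The sketch below assumes non-negative floating-point coordinates and a non-zero vector; the function names are hypothetical, and the small loops only correct the logarithm for floating-point error.

```python
import math

def small_type(v, eps):
    """Type (t_1, ..., t_d) of a small vector v, relative to its l_infinity norm."""
    norm = max(v)
    base = 1.0 + eps
    t = []
    for x in v:
        if x == 0:
            t.append(math.inf)
            continue
        e = math.floor(math.log(norm / x) / math.log(base))
        # Find the largest e with x <= norm * base^(-e).
        while x > norm * base ** -e:
            e -= 1
        while x <= norm * base ** -(e + 1):
            e += 1
        t.append(max(e, 0))
    return tuple(t)

def demote_big_type(t):
    """Small type that corresponds to a former big type t.

    t holds the exponents of a big type, with -inf marking zero coordinates.
    """
    top = max(t)   # largest exponent, which is finite for a non-zero vector
    return tuple(top - e if e != -math.inf else math.inf for e in t)

print(small_type((1.0, 0.5, 0.0), eps=1.0))   # (0, 1, inf)
print(demote_big_type((3, 1, -math.inf)))     # (0, 2, inf)
```

When a big type $t$ is demoted, its count $L_t$ would be added to the variable of the small type returned by `demote_big_type`, weighted by the rounded norm as described above.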
There are two types of errors introduced due to maintaining variables in the streaming scenario and not offline, where we know the final value of $p_{\max}$ in advance. First, it may happen that a vector $v$ that was big upon its arrival becomes small, and the small type of $v$ is different from the small type computed for the former big type of $v$ (i.e., the small type of $v$ with values rounded to powers of $1 + \varepsilon$). Second, the sum of $\ell_\infty$ norms of small vectors of a small type $t$ is in $(s_t / (1+\varepsilon), s_t]$, and moreover, the error in some dimension $i$ with $t_i > 0$ (i.e., not the largest one for this type) may be of factor up to $(1+\varepsilon)^2$, since we may round such a dimension two times for some jobs. Note, however, that by giving up a factor of $1 + O(\varepsilon)$, we may disregard both issues.
The offline algorithm of Bansal et al. [8] implies that such an input summary, consisting of variables for both small and big types, is sufficient for computing a $(1 + \varepsilon)$-approximation.

Conclusions and Open Problems
In this paper, we provide the first efficient streaming algorithms for two fundamental problems in packing and scheduling. For BIN PACKING, our streaming asymptotic $(1 + \varepsilon)$-approximation achieves a nearly optimal space bound. Both the algorithm and the lower bound follow from the close connection to estimating quantiles in data streams. We believe that the extra factor of $O(\log \frac{1}{\varepsilon})$ in the space complexity of BIN PACKING is necessary; intuitively, BIN PACKING seems to be a harder problem than estimating quantiles (this is true of the time complexity in the offline case, as one can find any quantile exactly in linear time, while BIN PACKING is NP-hard).
The aggregation algorithm for VECTOR SCHEDULING achieves a 2-approximation in $\mathrm{poly}(d, m)$ space. Interestingly, if the dimension is relatively large, this "outperforms" offline algorithms, since computing an $O(1)$-approximation is not possible in polynomial time, unless NP = ZPP [11]. Note, however, that this does not provide a complexity breakthrough, since a streaming algorithm also needs to compute the actual estimate of the optimal makespan from the input summary. Rather, this emphasises that the time and space cost to make a compact summary of the input is small, to which (exponential time) post-processing can be applied to find a solution. The downside of our approach is that the dependence of the space complexity on $m$, the number of machines, is linear. It would be interesting to know whether or not there is a streaming $O(1)$-approximation with space $\mathrm{poly}(d, \log m)$, that is, polynomial in $d$ and polylogarithmic in $m$, or even independent of $m$. Recall that the rounding from [8] works in space independent of $m$, but exponential in $d$.
Further results follow by using our approaches for some generalizations and variants of the aforementioned basic problems. For instance, we presented our algorithm for VECTOR SCHEDULING assuming that all machines have the same speed of processing the jobs, i.e., for identical machines. The algorithm still works in the case of uniform machines, where each machine has a certain speed of processing jobs.
However, as a small job must be small on any machine, the space needed increases by a factor of $s_{\max}/s_{\min}$, where $s_{\max}$ and $s_{\min}$ are the largest and the smallest speeds of a machine, respectively. This ratio can be very large, so a better summarization technique is desired in such a case, even for 1-dimensional jobs.