1 Introduction

We deal with a problem called Bin covering with delivery, BCD for short. The problem was introduced in the preliminary version of this paper in Benkő et al. (2010). BCD is a modification of the classical bin covering problem (BC), which is defined as follows. We are given items with positive sizes and bins of unit capacity. A bin is covered if the sum of the sizes of the items assigned to it (in other words, packed into it) is at least the bin capacity, i.e., 1. The goal is to cover as many bins as possible.

Now, in the case of the BCD problem, we not only pack the items into bins but also deliver the bins to some destination point. We get money for the delivered bins. More exactly, some amount of profit (or gain) is realized for each covered and delivered bin, and our goal is to maximize the total profit.

In the present paper we significantly extend this investigation, using a large set of benchmark instances. The offline version of the problem is studied in Benkő et al. (2013). Some closely related problems are introduced in Dósa and Tuza (2012).

It is well known that the bin covering problem is NP-hard to solve optimally (Garey and Johnson 1979). If a problem is NP-hard, then no algorithm can find an optimal solution for every input in polynomial time, unless \(P\ne NP\). In many cases, however, we would like to get a good solution in reasonable time.

In the online version of the problem the items arrive one by one, and an item must be packed immediately when it is revealed. The items are supposed to be packed without many open bins; in our model the cost of open bins is a decrease of efficiency with respect to the value of the solution. An extreme example of using only a few open bins is the bin packing algorithm Next Fit (NF), which uses only one open bin at a time. If the current item fits into the open bin, it is packed there; otherwise the current open bin is closed, and a new bin is opened for the current item. For NF the number of used bins can be (in the worst case) twice what it would be in an optimal solution. For a comprehensive review of (offline and online) bin packing and covering we refer to Coffman et al. (1984); Galambos and Woeginger (1995); Csirik and Woeginger (1998).
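For concreteness, the Next Fit rule just described can be sketched in a few lines of Python (a minimal illustration of the classical bin packing variant; the function name and the unit capacity normalization are ours):

```python
def next_fit(items, capacity=1.0):
    """Next Fit for classical bin packing: keep a single open bin."""
    bins = []      # levels of the closed bins
    level = 0.0    # level of the one open bin
    for size in items:
        if level + size <= capacity:   # item fits: pack it into the open bin
            level += size
        else:                          # close the bin, open a new one
            bins.append(level)
            level = size
    bins.append(level)                 # the last open bin is also counted
    return len(bins)
```

The worst-case factor of two mentioned above shows up when every second item barely fails to fit into the open bin.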

In this paper we deal with the online version of the BCD problem. The objective depends not only on the quality of the covering, but also on the number of open bins. Hence, we are looking for a “good and fast” covering: we penalize the algorithm if “too many” bins are opened, since more open bins naturally need more time to handle.

It is clear that for a given algorithm (in our BCD model), if the mean level of the delivered bins is close to the bin size, then we can gain more money: the bins are less overpacked, so more bins are covered, and more bins mean more profit. This conclusion gives the idea of trying to avoid, as much as possible, the overpacking of the bins. (In other words, let us cover the bins, but keep the total size of the items packed into a bin as close as possible to the bin capacity.)

Note that “Scheduling with delivery” (see Zhong et al. (2007)) is a related problem in some sense. There also exist works where the time spent between opening a bin and releasing it (i.e., the waiting time of a bin) is penalized, or even the waiting time of the items is penalized as well; see, e.g., Epstein (2021) and Ahlroth et al. (2013).

Our model differs from both scenarios studied in the above-mentioned papers, since we penalize the algorithm for opening more bins, rather than penalizing the waiting times of items or bins (and we deal with covering, not with packing).

In Sect. 2, we define the BCD problem in detail, and we also mention some properties of the offline problem. In the same section we introduce the instance classes that we will use in our investigations.

Then, in Sect. 3, some natural online algorithms are proposed. In Sect. 4 we define a new, flexible algorithm class, called MMask. Any member of this class is a parameterized algorithm. In Sect. 5 we present our parameter optimization meta-algorithm. Intensive computer-aided experiments, demonstrating the abilities of our algorithms, are reported in Sect. 6. Finally, we conclude in Sect. 7.

2 Problem definition and some properties

In this section we first give the exact definition of our model. Items arrive one by one according to a list L, but this list is not known in advance by the algorithm. The size of the i-th item is \(p_{i}>0\). We also have an infinite number of bins with the same capacity C. When a new item is revealed, it must immediately be assigned to a bin (in other words, packed into a bin). The number of items is n; it is known in the offline case, but in the online case n is not known in advance and becomes known only at the end of the list, when no more items arrive. A fixed integer \(K>0\) is also given, and there can be at most K open bins simultaneously.

This means that a covering algorithm can open a new bin only if the number of open bins is less than K; otherwise the item must be packed into an existing bin. (Recall that a bin B is covered if the sum of the sizes of the items packed into B is not less than C.) Moreover, if a bin is covered, it is immediately delivered.

The objective is defined by a positive, non-increasing profit function \(G:\left\{ 1,\dots ,K\right\} \rightarrow R^{+}\) as follows: if there are k open bins (\(1\le k\le K\)) just when bin B becomes covered (we assume that the bin is delivered at this very moment), then profit G(k) is realized for covering this bin. Naturally, if a bin is covered and delivered, the number of open bins decreases by one. In this way we model that with more open bins we need more time to decide where to pack the current item. The objective is to maximize the total profit for the covered and delivered bins. We already mention here that it is also an option to always keep k equal to 1; this makes sense if the gain function drops sharply from \(k=1\) to \(k=2\).

For an application, consider the following. At a small producer, some kind of fruit is packed into small boxes. Each box needs to contain a minimum total weight (but should not contain much more than necessary), and having fewer open boxes is advantageous, as it is hard to handle many open boxes at once.

Since small excess weight and few open boxes work against each other, we model this trade-off by a gain per box that decreases with the number of open boxes.

The efficiency of offline or online algorithms is usually measured by approximation or competitive analysis, respectively. This means that the value \(C_{A}(I)\) of the solution determined by an offline/online algorithm \(A\) on an input sequence I is compared to the offline optimum value \(C^{*}(I)\). In case of a maximization problem (as in our case) the infimum of the ratio \(C_{A}(I)/C^{*}(I)\), taken over all sequences I (which is a number at most one), is called the approximation/competitive ratio of algorithm A.

By an “offline solution”, we mean the solution of an algorithm that knows everything about the items (their order and their sizes) as well as the gain function G, and packs the items in the order of the given list L. The offline algorithm is optimal if it reaches the maximum possible value of the defined objective, i.e., it gets the maximum possible money for the covered bins.

On the other hand, in the online case, nothing is known in advance about the input (except the gain function G), and every decision regarding the packing of each item must be made just after revealing the item.

For any finite list L of the items to be packed, and any profit function G, let \(C_{A}(L,G)\) denote the value of the solution obtained by an offline/online algorithm \(A\). We compare it to the offline optimum value, denoted by \(C^{*}(L,G)\). An online algorithm \(A\) is said to be \(\rho \)-competitive (\(0\le \rho \le 1\)) if

$$\begin{aligned} C_{A }(L,G)/ C ^* (L,G) \ge \rho \end{aligned}$$

holds for all L and G. The largest \(\rho \) for which \(A\) is \(\rho \)-competitive is called the competitive ratio of \(A\). (If \(A\) is an offline algorithm, the ratio is called the approximation ratio.) On the other hand, if for some fixed G there exists an instance L (called an adversary) for which

$$\begin{aligned} \mu \ge C_{A }(L,G)/ C ^* (L,G) \end{aligned}$$

holds for every online algorithm \(A\), then \(\mu \) is called an upper bound for the problem. An online algorithm is called optimal if its competitive ratio \(\rho \) matches the infimum of the upper bounds \(\mu \) of the problem.

2.1 Some properties of the offline model

Here we only mention some properties of the offline model of BCD from our previous paper Benkő et al. (2013), for illustration. One main result is that the offline optimum can be determined efficiently (i.e., in polynomial time) if the item sizes are bounded from below by a positive value (from above they are bounded by the bin size), for an arbitrary choice of the profit function.

It is also possible to show that the following result holds: There exists a suitable choice of integer K, profit function G, and a class of item-sequences L for which no polynomial-time algorithm can achieve an approximation ratio better than 6/7, if \(P\ne NP\).

This latter result shows that in some sense BCD is even harder than the BC problem (which is already NP-hard). For more details, see Benkő et al. (2013).

2.2 Benchmark classes

In a wide variety of optimization problems, it is typical to consider benchmark instances. For bin packing, a large collection of benchmark instances is available on the homepage of the Operations Research Group Bologna (http://or.dei.unibo.it/library/bpplib). Since our problem is not pure bin packing (or covering), as a gain function is also taken into account, we need to add this information to the instances. (This means that there is no “ready” benchmark set for our model; we need to generate these instance sets.) For our experiments we adapt some instances from (http://or.dei.unibo.it/library/bpplib) by adding a gain function.

In our recent paper Ábrahám et al. (2021), we considered two main types from these benchmarks, the Schwerin type and the Falkenauer U type (for pure bin packing). We again choose these two classes, in the following way.

The Schwerin instances are defined in Schwerin and Wäscher (1997). These 200 instances are grouped into two subsets (“Schwerin 1” and “Schwerin 2”), both having 100 instances. For them, \(n=100\) and \(n=120\), respectively, and the bin capacity is \(C=1000\). The item sizes are randomly drawn and uniformly distributed between 150 and 200. (It means that any bin will be covered by at least 5 and at most 7 items.)

The other type is the Falkenauer U class, defined in Falkenauer (1996). This type has 80 instances, divided into four subclasses of 20 instances each. In the subclasses the number of items is \(n=120\), \(n=250\), \(n=500\), and \(n=1000\), respectively. In all subclasses, the item sizes are uniformly distributed between 20 and 100, and the bin capacity is \(C=150\). It means that (because of the uniform distribution) the expected size of an item is 60, so, roughly speaking, we would expect about 3 items to cover an average bin.

2.2.1 Treating the instances

A, disordering. Since these instances are ordered by non-increasing size, we first create a random order of the items.

B, normalizing the Falkenauer type. We normalize the Falkenauer instances to bin capacity 1000, the same as the Schwerin instances, and multiply the size of each item by 1000/150 (rounding down to have integer sizes).
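This normalization step amounts to a one-line helper (a sketch; the function name is ours). It also confirms the [133; 666] size range of the normalized Falkenauer items mentioned later:

```python
def normalize(size, old_capacity=150, new_capacity=1000):
    """Rescale an item size to the new bin capacity, rounding down to an integer."""
    return size * new_capacity // old_capacity

# the original Falkenauer size range [20, 100] maps to [133, 666]
```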

After performing steps A and B, we denote the input types as S1, S2, F1, F2, F3, F4.

C, a new instance class. As we have seen, the items in the Schwerin class come from a very restricted interval: the bin size is 1000, and all items are drawn from the interval [150; 200]. In case of the Falkenauer class, after the normalization, the bin size is again 1000, and the item sizes are in [133; 666]. In our preliminary paper Benkő et al. (2010), we also had a third item class, where the item sizes were taken from an even wider interval. Thus, we create here another instance class, where there is no restriction on the item sizes: any size between 0 and the bin capacity C is possible. (So, roughly speaking, 2–3 items cover a bin on average.)

As we did not find such an input class on the Unibo homepage (http://or.dei.unibo.it/library/bpplib), we created it. Here, the bin size is again 1000, and the item sizes are from [1; 1000]. The name of the class is LR (large range). We created 100 such instances, each with 1000 items. Note that here the item sizes are not ordered! The name of this subclass is LR4. Then we cut the “tail” of the instances (i.e., the last 880, 750, or 500 items, respectively), keeping only the first 120, 250, or 500 items, getting the LR1, LR2, LR3 input subclasses.
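The construction of the LR subclasses can be sketched as follows (a hypothetical generator; the actual benchmark instances are fixed files, so the function name and seed here are ours, for illustration only):

```python
import random

def make_lr_instance(n, capacity=1000, seed=None):
    """Draw n item sizes uniformly at random from [1, capacity]."""
    rng = random.Random(seed)
    return [rng.randint(1, capacity) for _ in range(n)]

# LR1-LR3 are prefixes of the corresponding LR4 instance
lr4 = make_lr_instance(1000, seed=0)
lr1, lr2, lr3 = lr4[:120], lr4[:250], lr4[:500]
```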

The instance classes are summarized in Table 1.

Table 1 The instance classes

We still need to introduce gain functions.

D, introducing gain functions. As mentioned above, the original input classes contain no gain functions (as they were made for another model, pure bin packing). We introduce three types of gain functions, defined for any \(k=1,2,\dots\), as follows:

a, \(G1(k)=10.1-0.1 \times k\), so \(G1(1)=10\), \(G1(2)=9.9\), \(G1(3)=9.8\), and so on. This is a slightly decreasing gain function (dark blue dotted line in Fig. 1.).

b, \(G2(k)=11-k\) , so \(G2(1)=10\), \(G2(2)=9\), \(G2(3)=8\), and so on. This is a strongly decreasing gain function (red straight line in Fig. 1.)

c, \(G3(k)=10.05-0.05 \times k^{2}\), i.e. \(G3(1)=10\), \(G3(2)=9.85\), \(G3(3)=9.6\), and so on. This gain function decreases slightly in the beginning, but then it decreases fast (green dashed curve in Fig. 1.), so it is between G1 and G2.

Fig. 1

The graphs of the three gain functions
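The three gain functions translate directly into code (a direct transcription; all three give gain 10 for a single open bin):

```python
def G1(k):  # slightly decreasing
    return 10.1 - 0.1 * k

def G2(k):  # strongly decreasing
    return 11 - k

def G3(k):  # in between: decreases slowly at first, then fast
    return 10.05 - 0.05 * k ** 2
```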

E, combinations of the item types with the gain types. We combine each item type S, F and LR with each gain function type G, and denote the instance subclasses as SiGu, FjGu, and LRjGu, where \(i=1,2\), \(j=1,2,3,4\), \(u=1,2,3\). The benchmark instances can be found in Schwerin and Wäscher (1997) and Abraham (2021).

In the next several sections we introduce algorithms; to study and demonstrate their performance, we will choose only one instance from each input class. Later, in Sect. 6, we will perform a very detailed analysis, taking into account the performance on all \(3 \times 680\) instances within the 30 input subclasses.

Note that in the preliminary version of this paper, Benkő et al. (2010), only six input subclasses were defined (in some sense in an ad-hoc way), with only ten instances in each class. So, our framework and investigation are much deeper and more detailed in the present paper.

3 Natural online algorithms

In this section we adapt two simple online algorithms from the literature, DNF and Harmonic(K); for reference, see Galambos and Woeginger (1995) or Csirik and Woeginger (1998). Then we introduce a simple modification of Harmonic(K) to make it “smarter”.

3.1 Algorithm dual next fit (DNF)

This algorithm allows only one open bin at a time. The next item is always packed into the open bin. If the bin becomes covered, it is delivered immediately and a new bin is opened for the further items. The algorithm stops when there are no more items.
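A minimal Python sketch of DNF in our covering-with-delivery setting (the function name is ours; since DNF never has more than one open bin, every delivery earns G(1)):

```python
def dnf(items, capacity, gain):
    """Dual Next Fit: keep a single open bin, deliver it as soon as it is covered."""
    profit, level, delivered = 0.0, 0, []
    for size in items:
        level += size
        if level >= capacity:      # the open bin is covered
            profit += gain(1)      # DNF never has more than one open bin
            delivered.append(level)
            level = 0              # start a fresh bin
    return profit, delivered
```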

It is trivial to see that even this simple algorithm is optimal when the profit for a delivered bin becomes zero as soon as more than one bin is open. Several further simple properties of DNF are also shown in Benkő et al. (2010), as follows:

  • Let the bin size be normalized to \(C=1\), and let \(k\ge 2\) be an integer. If \(1/k\le p_{i}<1/(k-1)\) holds for each item size, then algorithm DNF is optimal.

  • The competitive ratio of algorithm DNF is at least 1/2, with arbitrary non-increasing gain function G.

Let us note that the case of the S1 and S2 types is quite similar to the case when all sizes are in \(1/k\le p_{i}<1/(k-1)\) for some k. After normalization, all sizes of the Schwerin type instances lie in (1/7; 1/5]. It means that DNF can overpack the bins only slightly: the load of any covered bin is at most 6/5. Let us fix an instance I of type S1 or S2. Suppose that in the optimal solution there are c covered bins. Then the optimum value is at most \(c \cdot G(1)\). On the other hand, it is easy to see that DNF covers at least \(\left\lfloor 5/6\cdot c \right\rfloor \) bins, and gets exactly G(1) gain for each covered bin, thus the next claim is valid:

Claim. In case of the Schwerin instances the objective value of DNF is at least \(\left\lfloor 5/6\cdot C^* (I) \right\rfloor \).

Since the case is very simple, we can state a more general claim as well:

Claim. Suppose that all item sizes are in (1/q; 1/t] for some integers q and t. Then the objective value of DNF is at least \(\left\lfloor t/(t+1)\cdot C^* (I) \right\rfloor \).
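The overpacking bound behind these claims is easy to check numerically (a sketch with hypothetical random data; with \(q=6\), \(t=5\) and \(C=1000\), all sizes lie in (1000/6; 1000/5], so every delivered bin level must stay below \(C\,(t+1)/t = 1200\)):

```python
import random

def dnf_bin_levels(items, capacity):
    """Final levels of the bins delivered by Dual Next Fit."""
    levels, level = [], 0
    for size in items:
        level += size
        if level >= capacity:        # covered: deliver and start a new bin
            levels.append(level)
            level = 0
    return levels

rng = random.Random(42)
schwerin_like = [rng.randint(167, 200) for _ in range(10_000)]
# every delivered level lies in [1000, 1200)
```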

In this paper we report several new experiments for DNF. Since the cases are quite similar, we chose only the input classes S1Gu, F1Gu, F4Gu and LRjGu, for \(u=1,2,3\), \(j=1,2,3,4\), and from each input class we considered only the first instance. (As promised, we will perform a very careful investigation in Sect. 6; here we only demonstrate the performance of the algorithms.)

We measured the average level of the delivered bins (rounded down), the number of delivered bins, and the money gained, in each case. Since DNF has only one open bin at any time, and the gain functions have the same value for \(k=1\), we show the results below only for G1.

The result of the experiments can be seen in Table 2.

Table 2 Runs for the DNF algorithm

Note that for the Schwerin 1 subclass it is known that the items can be packed into 18 bins, and cannot be packed into fewer. From our previous investigations in Ábrahám et al. (2021) it can easily be seen that the items can cover 17 bins and not more (as the total size of the items is less than 18 times the bin capacity). So, in row 1, compared to the “best possible” covering of 17 bins, DNF covers 16 bins in a very simple way, so this solution is very good. But, at this point, it is not at all clear whether DNF is also very good for the other input classes.

However, we can state a claim where we compare the optimum solution and the value of DNF, as follows. Let us fix some instance I. Let the average level of the covered bins be a, and the number of covered bins be N. Then the total size s of the items is \(a \cdot N\) plus the total size of the items that possibly remain in one uncovered bin. Thus, taking these items into account in case of the DNF algorithm, we have \(s \le a \cdot N +C\), and hence the optimum value is at most \( (a N /C +1) \cdot G(1)\); here we used that the total size of items in any covered bin is at least C, and also that the gain function is non-increasing. On the other hand, the value of the solution provided by the DNF algorithm is exactly \(N \cdot G(1)\), since DNF always has a single open bin. Thus we have the following claim:

Claim. The objective value of the DNF algorithm cannot be smaller than \( \frac{1}{a/C+1/N} \cdot C^* (I) \).
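Written out, the claim follows from the chain

$$\begin{aligned} C^* (I) \le \frac{s}{C}\, G(1) \le \left( \frac{aN}{C}+1\right) G(1) = \left( \frac{a}{C}+\frac{1}{N}\right) N \cdot G(1), \end{aligned}$$

since the value of the DNF solution is exactly \(N \cdot G(1)\).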

In the following, we compare the performance of the DNF algorithm with other algorithms.

3.2 Algorithm dual harmonic DH(K)

If we want to save at least some portion of the lost size, more than one bin should be open, making it possible that no bin is overpacked too much. We consider another classical bin packing algorithm, Dual Harmonic(K) (DH(K) for short). Here \(K\ge 1\) is an integer, the maximum number of bins that can be open at the same time. DH(K) packs only items of similar size into a common bin. The items are grouped into classes according to their sizes, and each bin also belongs to a class. Items with sizes from the interval \(I_{k}=(\frac{1}{k+1},\frac{1}{k}]\) are packed only into the k-th bin type, for \(k=1,\dots ,K-1\). The smallest items, i.e., items with sizes from the interval \(I_{K}=(0,\frac{1}{K}]\), are packed into the K-th bin type. Hence, when a new item is revealed, we determine its type. If this type is k and there exists an open bin of the same type, we pack the item into that bin. (There is at most one open bin of each type, so this bin is unique.) If the bin becomes covered, we deliver it. If there is no appropriate bin to accommodate the current item, we open a new bin for it, and pack the item into this new bin.
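The type rule and the delivery step can be sketched as follows (Python; names and data structures are ours, and the number of open bins is counted at the moment of delivery, as required by the profit function G):

```python
def dh(items, capacity, K, gain):
    """Dual Harmonic(K): at most one open bin per size class."""
    levels = {}                          # bin type -> level of its open bin
    profit = 0.0

    def bin_type(size):
        # sizes in (C/(k+1), C/k] belong to type k; the smallest items to type K
        for k in range(1, K):
            if capacity / (k + 1) < size <= capacity / k:
                return k
        return K

    for size in items:
        t = bin_type(size)
        levels[t] = levels.get(t, 0) + size
        if levels[t] >= capacity:        # covered: deliver immediately
            profit += gain(len(levels))  # open bins counted at delivery time
            del levels[t]
    return profit
```

Since there is at most one open bin per type, the bound of at most K simultaneously open bins holds automatically.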

We note that another idea that would possibly be useful is to reserve some of the large items from the (1/2, 1] class, waiting for smaller items, as in the case of the Refined or Modified Harmonic algorithms; see Coffman et al. (1984).

We give here the results of the algorithm DH(K) for \(K=2,3,4,5\). The considered input classes and instances are almost the same as in Table 2 (our previous investigation for algorithm DNF), but here, as we may open more than one bin, we also made investigations for the other two gain functions G2 and G3. The results of the test can be seen in Table 3 (S and F classes) and Table 4 (LR class). The tables contain the amount of money that can be obtained for the delivered bins. (For comparison, we also provide the result of DNF.) All values are rounded down. The biggest value in each row is set in bold.

Table 3 Profit for the DH(K) algorithm on the S and F classes

Conclusions: It may seem surprising that for the S1 set the total gain remains 160 for all gain functions (see the first three lines). But here the item sizes are between 150 and 200 (the bin size is 1000), so all items are packed into the K-th (i.e., last) bin type, and only one bin is used at a time. This fact can change only if K is bigger (\(K\ge 6\)), but then the gain function decreases so much that opening so many bins is not advantageous.

The case is more interesting in the next three lines, where the F1 set is packed with the three different gain functions. (Recall that G1 is slightly decreasing, G2 decreases fast, and G3 is in between; see Fig. 1.) Here, DH(2) produces a better result than DNF for G1 and G3. For all three gain functions, the gains decrease when K grows from 2 to 3, 4, or 5. We conclude that if the gain function decreases only slightly (this holds for G1 and, to some extent, for G3), it is advantageous to open more than one bin, as this gives more opportunities to pack the items “well”. Regarding G2, it decreases steeply; this can be the reason why DNF is competitive with the other algorithms.

The tendency is similar in case of the last three rows (for F4), but here there are many more items in the instance, and this results in a bigger growth of the total gain compared to DNF. For F4G1, even DH(4) is better than DNF.

Summarizing our conclusions, we observed that DH(K) can perform better, or even much better, than DNF, if K is “not too big” and the gain function does not decrease “too fast”. Let us also note that the performance of DH(K) depends on the interval from which the items are uniformly generated.

Now let us see the results of similar examinations for the LR input classes below.

Table 4 Performance for the DH(K) algorithm in case of the LR classes

For the LR1 subclass, the results are similar to the ones obtained in Table 3. But for LR2–LR4 the case is different. Here it is still true that the result of DH(K) gets worse as K grows, but no DH(K) algorithm can outperform DNF! Even DH(2) is worse, even for G1 (the gain function that decreases the least). The reason can be that in the LR classes the diversity of the items is high (all sizes from 1 to 1000 are possible, where the bin size is 1000), so it can happen that, e.g., in case of DH(2), “always” two bins are open; then G1(1) is never or very rarely realized, and the algorithm can get only G1(2), which is smaller than G1(1).

Another issue that can be important is that, on average, every second item is large (larger than half the bin capacity); this can also influence the efficiency of the algorithms.

3.3 Algorithm SDH(K)

In this subsection we modify DH(K) a little to get a smarter algorithm, which we call Smart Dual Harmonic(K), or SDH(K) for short. The only difference between DH(K) and SDH(K) is the following: if the next item can cover a bin, we pack the item into such a bin with the smallest level and deliver the bin. In any other case SDH(K) behaves the same way as DH(K).
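The modification amounts to one extra check before the DH(K) type rule (a sketch; names are ours). In the toy list below, SDH(2) covers a bin with an item that DH(2) would have sent to another type:

```python
def sdh(items, capacity, K, gain):
    """Smart Dual Harmonic(K): cover-and-deliver whenever possible,
    otherwise fall back to the DH(K) type rule."""
    levels = {}                          # bin type -> level of its open bin
    profit = 0.0

    def bin_type(size):
        for k in range(1, K):
            if capacity / (k + 1) < size <= capacity / k:
                return k
        return K

    for size in items:
        # bins that the current item would cover, regardless of type
        covering = [t for t, lv in levels.items() if lv + size >= capacity]
        if covering:
            least = min(covering, key=lambda t: levels[t])  # smallest level
            profit += gain(len(levels))
            del levels[least]
        else:                            # behave exactly like DH(K)
            t = bin_type(size)
            levels[t] = levels.get(t, 0) + size
            if levels[t] >= capacity:    # a single item may cover a fresh bin
                profit += gain(len(levels))
                del levels[t]
    return profit
```

With capacity 1000, K=2 and G2, the list [600, 300, 500] lets SDH cover the bin holding 600 with the item 500 (gain G2(2) = 9), while DH(2) would put 500 into the second type and cover nothing.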

The intuition behind SDH is that if the current item is used to cover a bin whenever possible, then we can hope that, on average, the bins wait less and the total gain will be bigger. Another possible advantage is that two big items are less likely to end up in the same bin, or at least fewer such pairs occur than before.

Let us see the results (all values are rounded down) (Table 5).

Table 5 Performance for the SDH(K) algorithm on the S and F classes

Our experiences are as follows:

  • In case of the S1 class, we got exactly the same results as for DH(K); here the “smart” version was not really smarter.

  • But in case of the F1 class, we find an interesting phenomenon: here, Smart Dual Harmonic is significantly better than the “pure” Dual Harmonic. Focusing on F1G1 (where the gain function decreases only slightly), the total gain grows until \(K=4\). The case is similar for F1G3, but there we find the highest total gain at \(K=3\).

  • Regarding the last three rows (F4 class), SDH is also significantly better than DNF (even for the quickly decreasing gain function with \(K=2\)!), and also better than the pure DH(K) algorithm. The highest value for the F4 class is attained for F4G1 and \(K=5\).

As a summary, we conclude that SDH(K) really performs better than DH(K), at least in the majority of the instances.

Now, let us consider the results of the examinations for the LR input classes:

Table 6 Runs for the SDH(K) algorithm on the LR class

Previously, we found that for LR2–LR4 no DH(K) algorithm can outperform DNF. Now, for SDH(K), the case is completely different. For the LR1 class, even the quickly decreasing gain function (i.e., G2) yields better results than DNF. For LR1, the highest value is attained for LR1G1 and \(K=4\). The case is similar for LR2–LR4.

3.4 Collection of the results of the previous algorithms

In this subsection, we collect all previous results, comparing the considered algorithms. In each row, the solutions provided by the different algorithms are shown for a specific problem class (rounded down to an integer value). In the last column, we show the maximum value of the row; in each row, the maximal value is set in bold.

Results for S and F classes are presented in Table 7, and results for LR in Table 8.

Table 7 Comparison in case of the S and F classes
Table 8 Comparison on the LR class

Now we want to create a new algorithm that outperforms all previous algorithms (if possible). For this, we need a new idea. This new idea and the resulting algorithm are described in the next section.

4 A new, parameterized algorithm, MMask

In this section we introduce a new, parameterized algorithm. We will see that it is competitive with any of the previous algorithms. We call the algorithm MMask. A previous version, proposed in Benkő et al. (2010), was called Mask. As we made a small modification to make it even more efficient, we call the new version Modified Mask, or simply MMask.

The MMask algorithm has a parameter K that denotes the maximum number of bins that are allowed to be simultaneously open.

It has two further parameters, \(\alpha \) and \(\beta \). Here, \(\alpha \) is a K-dimensional nonnegative vector, and \(\beta \) is a positive real number.

The MMask algorithm applies an accepting-rejecting policy. It accepts an item and packs it into a bin if the increased load of the bin falls into the “accept” area of the bin, and rejects packing the item there otherwise (i.e., if the increased load of the bin would fall into the “reject” area of the bin). The accept area and the reject area of a bin are determined as follows (supposing that the bin capacity is normalized to 1):

The accept area of the k-th bin in Algorithm MMask(\(\alpha \), \(\beta \), K) is \(\left[ 0;1-\alpha _{k}\right] \cup \left[ 1;1+\beta \right] \). Then the reject area is \((1-\alpha _{k};1)\cup (1+\beta ;\infty )\). We can suppose that the \(\alpha _{k}\) values are non-decreasing, i.e. \(\alpha _1 \le \alpha _2\), and so on.

The idea behind the above is the following: we do not allow packing the current item into a bin if the bin would not be covered but would be “almost” covered, i.e., the increased load of the bin would be within \((1-\alpha _{k};1)\). The reason is that, in an unfortunate case, the bin may later be covered by a “big” item, so the bin would be heavily overcovered. Nor would we like to overpack the bin heavily; in other words, we do not allow the increased load of a bin to be bigger than \(1+\beta \). Summarizing the idea of the algorithm, we can expect that only a few bins will be heavily overcovered.

Usually, we choose different components of \(\alpha \): we keep the level of some bins low, where we suppose that a large item will be packed later, and we can fill some bins almost fully, where we expect that some small items will arrive later, just filling the bin.
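The accept/reject test is a simple predicate (a sketch; the function name is ours, with the capacity normalized to 1 as above):

```python
def in_accept_area(new_level, alpha_k, beta, capacity=1.0):
    """Accept area of the k-th bin: [0, C - alpha_k] together with [C, C + beta]."""
    return (0 <= new_level <= capacity - alpha_k
            or capacity <= new_level <= capacity + beta)
```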

Below we give the formal description of MMask(\(\alpha \), \(\beta \), K).

Algorithm MMask(\(\alpha \), \(\beta \), K)

  1. If there exists a bin so that by packing the current item into this bin, the bin will be covered in the accept area, then pack the current item into such a bin with minimum level (ties are broken arbitrarily), deliver the bin, and goto 5.

  2. If the current item can be packed into a bin within the accept area, but the bin will not be covered, pack the item into one such bin (arbitrarily) and goto 5.

  3. If \(k<K\) (where k is the number of currently open bins), open a new bin with the smallest available alpha value, pack the current item into this bin, and goto 5.

  4. If \(k=K\), pack the current item into a bin with minimum level. If the bin is covered after packing the current item, deliver the bin. Goto 5.

  5. Stop if there are no more items, otherwise goto 1.

Note that if a bin is covered in Step 4, the covering necessarily happens outside the “accept” area, by the algorithmic rule (otherwise Step 1 would have applied). But at this point we have no better choice than covering the least full bin.

To illustrate the machinery of the algorithm, we give a simple example below.

Example. Let \(K=4\), let the bin size be 100, \(\alpha =[10,20,30,40]\), and \(\beta =30\). Let us also suppose that the items come from the interval [10; 40], and that they are the following, in this order: 24, 35, 18, 22, 16, 29, 20, 17, 38, 14, 31, 28, 32.

In the beginning there is only one open bin, and this bin is empty. The parameter \(\alpha _{1}\) means that an item is allowed to be put into this bin if the level of the bin (after the item is packed there) is at most 90. The \(\beta \) parameter means that an item can be assigned to any bin if the increased level of the bin is between 100 and 130. It means that the reject area of the first bin is from 90 to 100 and above 130, while its accept area is from 0 to 90 and from 100 to 130. MMask packs the first item (24) into this open (first) bin. The next item (35) can be packed into the same bin, so the level of this bin becomes \(24+35=59\). The next item (18) is also packed into the first bin, since the level of the bin becomes \(59+18=77\). But the next item (22) cannot be packed into this bin, because the increased level would be \(77+22=99\), and this value is in the reject area.

Thus, the current item (22) is packed into a new bin, i.e. into the second bin. The accept areas of the second bin are from 0 to 80 and from 100 to 130, based on the \(\alpha _{2}\) and \(\beta \) parameters. Now there are two open bins with levels 77 and 22. The next item (16) cannot be packed into the first bin, but can be packed into the second bin, since \(22+16=38\) lies in the accept area of the second bin. At this moment, the levels of the bins are 77 and 38.

Then the next item (29) is big enough to be packed into the first bin: the increased level of the first bin is \(77+29=106\), which is in the accept area. At this moment this bin is delivered. There are two open bins at the moment of delivery, so the gain is G(2). Now only one open bin remains (its index stays 2, so \( \alpha _2=20\) belongs to this bin in the following as well), and its level is 38.

We pack the next item (20) into this open bin; the level of the bin grows to \(38+20=58\). The next item (17) is also packed into this bin, and the level grows to \(58+17=75\). Then the next item arrives, with size 38; it is allowed to be packed into the bin, as \(75+38=113\) is in the accept area. We deliver the bin, and the gain is G(1), as there is only one open bin at the moment. After this delivery, two bins have been delivered in total, and no open bin remains.

We open a new bin (with parameters \(\alpha _{1}\) and \(\beta \)) and put the next item (14) into it; its level becomes 14. We pack the next item (31) into this bin, and the level grows to \(14+31=45\). The next item (28) is also packed into the bin, raising the level to \(45+28=73\). Then the next item (32) is packed into this bin as well; the increased level is \(73+32=105\), which is in the accept area. We deliver this covered bin and gain G(1) again. The total gain is \(2G(1)+G(2)\).

Note that if algorithm DNF is applied to this list of items instead of MMask, then the gained money is only 2G(1), and one uncovered, undelivered bin with level \(14+31+28=73\) remains. So in this case our MMask algorithm provides a (much) better packing than DNF; in general, however, the outcome strongly depends on the instance, and also on the appropriate choice of the \(\alpha _{k}\) and \(\beta \) parameters.
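
For comparison, DNF can be sketched as follows. This is a hedged sketch under the assumption that Dual Next Fit keeps a single open bin, packs every item into it, and delivers the bin as soon as its level reaches the capacity; the function and variable names are ours. Since there is always exactly one open bin, each delivered bin earns G(1).

```python
def dnf(items, capacity):
    """Dual Next Fit: a single open bin; deliver it as soon as it is covered.

    Returns the number of delivered (covered) bins and the level of the
    last, possibly uncovered, open bin.
    """
    delivered = 0
    level = 0
    for item in items:
        level += item           # the item always goes into the single open bin
        if level >= capacity:   # the open bin is covered
            delivered += 1      # deliver it and open a fresh, empty bin
            level = 0
    return delivered, level
```

For example, `dnf([60, 50, 30, 80], 100)` delivers two covered bins and leaves an empty open bin.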

We will see that with a “good” choice of the parameters, the MMask algorithm can be competitive with any of the previously treated algorithms: for almost all instances considered before, the parameters can be chosen in such a way that MMask outperforms the previous algorithms. Below we give the results of our experiments with certain choices of the parameters of MMask. For each input class, we compare the result of MMask (with some choice of the parameters) with the previous best result (denoted by MAX). We emphasize that here we found these “good” parameter settings “by hand”: we tried some set of parameters, and if it did not work, we tried another; this kind of optimization is rather ad hoc. For the S and F instances the results are given below (rounded down to integers). We denote the MMask variants by M followed by a version number: M1, M2, and so on.

In the first column of Tables 9 and 10 (column “MAX” ) we show the previous best results for the considered problem classes, i.e., the last columns of Tables 7 and 8, for easy comparison.

Table 9 Different choices for the parameters of MMask

Conclusions: In Tables 9 and 10 we mark in bold the values that are better than or equal to the previous “best” results. In case of the S1 class, the previous best result was provided by DNF. In fact, all other previous algorithms give the same packing as DNF, since the items are chosen from a very narrow interval. As a consequence, MMask has no chance to overcome it if it opens at least two bins (see all columns from M1 to M8). But for \(K=1\), we can reach the same result.

Every other row contains bold numbers, which means that for the considered instance class we could find a parameter setting for which MMask performs better than any previous algorithm. Some rows contain many bold values, like the row of F4G1; others only a few, like F1G2. This means that a given parameter setting can be a good choice for some of the instance classes and a bad one for others. This is completely natural, since the instance classes differ significantly. So it cannot be our purpose to find a “universal” parameter setting that is good for all possible instances. We will be satisfied if for any (or almost any) instance class some specific, good choice of the parameters can be found. For the instances above this appears to be possible. The chosen parameters of MMask in each case are as follows:

  • M0: K=1, \({\small \ \alpha =[200],\ \beta =200}\).

  • M1: K=4, \({\small \ \alpha =[100,200,300,400],\ \beta =200}\).

  • M2: K=5, \({\small \ \alpha =[100,200,300,400,500],\ \beta =200}\).

  • M3: K=2, \({\small \ \alpha =[100,100],\ \beta =200}\).

  • M4: K=3, \({\small \ \alpha =[100,100,100],\ \beta =200}\).

  • M5: K=4, \({\small \ \alpha =[100,100,100,100],\ \beta =500}\).

  • M6: K=4, \({\small \ \alpha =[225,225,230,230],\ \beta =560}\).

  • M7: K=4, \({\small \ \alpha =[200,200,200,200],\ \beta =600}\).

  • M8: K=4, \({\small \ \alpha =[200,200,300,400],\ \beta =350}\).

Now, let us see the results for the LR classes. (As M0 is designed specifically for the Schwerin-type instances and it opens only one bin, it cannot be competitive for the other classes, so we do not consider it. Moreover, after running the algorithm, we realized that M3 and M4 were not competitive for any instance class, so we do not show their results either.)

Table 10 Different choices for the parameters of MMask, on the LR class

Now, for these input classes the situation is as follows: among the 12 rows, there are 3 rows (LR2G2, LR2G3 and LR3G2) with no bold number, i.e., we could not outperform the previous best algorithm. (In row LR2G2 the result of M6 is only slightly below the previous best result.)

Note that in our earlier investigations in Benkő et al. (2010) we did not consider SDH(4) and SDH(5), but now we extended the investigation to them. It turned out that sometimes SDH(4) is much better than SDH with other K values (see Table 6, and mainly Table 7). This means that now we must work harder than before: we would like to outperform a much better solution. However, in the other 9 rows we again found some parameter setting for MMask which is competitive.

Summarizing our experiences, in this section we have seen that, although no single choice of the parameters makes MMask competitive on every input class, for any fixed input class it is possible to determine a parameter setting for which MMask is competitive with the other algorithms. So the remaining question is how to find a good choice of the parameters. This question is answered in the next section.

5 Improving the parameters

We propose a natural method to solve the parameter optimization of the online algorithm (we call it EoA for short, as in Benkő et al. (2010)). Such optimization is often called hyperparameter optimization (for example in neural network learning) and is done for many algorithms. Hyperparameter optimization is mostly an offline procedure (as it is in our case), since the optimization is done using some predefined datasets.

We define a neighborhood structure among the parameter settings of the algorithm in a natural way: a neighbor of a specific \(MMASK(K,\alpha ,\beta )\) is obtained by modifying “slightly” one parameter among K, \(\alpha \), \(\beta \). For a small \(\Delta >0\), if any \(\alpha _{i}\) or \(\beta \) is decreased or increased by \(\Delta \) (so that the changed value remains valid), we get a neighbor of the algorithm. We also get a neighbor if K is changed by one (keeping it at least 1). We do not change the value of \(\Delta \) during the process. (However, we note that generating a random \(\Delta \), e.g. in case the objective value is not improved for a while, can be useful. We keep this option for later research.)
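
The neighbor generation just described can be sketched as follows. This is a sketch under our own naming; the validity bounds (each \(\alpha _{i}\ge 1\), \(\beta \ge 0\)) and the rule for extending \(\alpha \) when K grows are our assumptions.

```python
import random

def random_neighbor(K, alpha, beta, delta):
    """Return a random neighbor of the setting (K, alpha, beta).

    Exactly one parameter is modified "slightly": either some alpha_i or
    beta is moved by +/- delta, or K is changed by one (kept at least 1).
    """
    alpha = list(alpha)                       # do not mutate the caller's list
    choice = random.randrange(K + 2)          # which parameter to perturb
    step = delta if random.random() < 0.5 else -delta
    if choice < K:                            # perturb one alpha_i
        alpha[choice] = max(1, alpha[choice] + step)
    elif choice == K:                         # perturb beta
        beta = max(0, beta + step)
    else:                                     # change K by one
        if random.random() < 0.5 and K > 1:
            K -= 1
            alpha.pop()
        else:
            K += 1
            alpha.append(alpha[-1])           # reuse the last alpha for the new bin
    return K, alpha, beta
```

The invariant is that the returned setting is always valid: the \(\alpha \) vector has exactly K entries, and all values stay within their bounds.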

To keep the main process as simple as possible, we applied local search for algorithm MMask. This simply means that after generating a neighbor, we run MMask both with the “original” settings and with the “new” settings of the parameters, where “new” means a randomly generated neighbor.

Whichever setting produces the better solution is kept, the other one is forgotten, and this step is repeated many times (until some stopping criterion holds, e.g. the number of iterations reaches some predetermined value).
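
The whole local search is then a short loop. In this sketch, `evaluate` stands for running MMask with a given setting on the benchmark instance and returning the total gain, and `neighbor` stands for the random neighbor generation described above; both are passed in as functions, since their concrete implementations are not shown here.

```python
def local_search(setting, neighbor, evaluate, iterations):
    """Repeatedly keep the better of the current setting and a random neighbor."""
    best = setting
    best_value = evaluate(best)
    for _ in range(iterations):
        candidate = neighbor(best)
        value = evaluate(candidate)
        if value > best_value:            # maximization: keep the better setting
            best, best_value = candidate, value
    return best, best_value
```

For instance, with a toy objective \(-(x-7)^{2}\) and the deterministic neighbor \(x\mapsto x+1\), the search climbs from 0 to the maximizer 7 and then rejects every further step.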

In Table 11, for the first group of the instance classes (S and F classes) we again give the best result of the algorithms introduced before MMask (DNF, DH(K) and SDH(K) for certain K values); this is denoted again by MAX. Then we show the best result of the MMask algorithms (the best solution among MMask1–MMask8) in column BM (Best MMask with ad-hoc choice). Then, in new columns, we show new versions of MMask where the parameters are optimized by the local search described above.

We did not try to surpass the results in case of the S1 class (first three rows of Table 8), as we already concluded that no algorithm can surpass DNF there, since it makes no sense to open more than one bin. But for the other rows (F1 and F4 classes) we performed parameter optimization, to see whether we finally obtain an even better choice of the parameters, i.e. more efficient MMask settings. These are columns O1, O2, ..., O5 (MMask algorithms with optimized parameters). Each of these variants is tuned for a specific problem class: e.g. O1 is designed specifically for F1G1, so we made the optimization separately for this input class. Similarly, O2 is designed for F1G3, etc. As one can see, we could not find better parameters for input class F1G2. However, for every other problem class the optimization was successful: we could improve upon the previous ad-hoc settings. The new settings of the parameters are also given below.

Table 11 Optimization of the parameters of MMask, in case of F classes

  • O1: \(K=4\), \({\small \ \alpha =[113,222,286,388],\ \beta =175}\).

  • O2: \(K=4\), \({\small \ \alpha =[94,221,316,410],\ \beta =267}\).

  • O3: \(K=5\), \({\small \ \alpha =[64,197,286,395,495],\ \beta =199}\).

  • O4: \(K=2\), \({\small \ \alpha =[100,101],\ \beta =198}\).

  • O5: \(K=3\), \({\small \ \alpha =[103,104,89],\ \beta =218}\).

Now let us see the same kind of investigation for the remaining part of the instance classes. The new settings are in columns O6–O14. We could not improve upon MAX in rows LR2G3 and LR3G2 (here we put a *). We could not improve upon BM (the previous choice of MMask) in row LR1G2 (denoted by **). We could improve in every other row, i.e. in 9 of the 12 rows. The results can be seen in Table 12.

Table 12 Optimization of the parameters of MMask, in case of LR classes

The applied settings are as follows:

  • O6: \(K=5\), \({\small \ \alpha =[106,205,292,413,490],\ \beta =238}\).

  • O7: \(K=5\), \({\small \ \alpha =[130,204,304,393,473],\ \beta =238}\).

  • O8: \(K=5\), \({\small \ \alpha =[128,194,308,414,481],\ \beta =241}\).

  • O9: \(K=4\), \({\small \ \alpha =[218,197,211,185],\ \beta =558}\).

  • O10: \(K=5\), \({\small \ \alpha =[100,205,299,402,498],\ \beta =214}\).

  • O11: \(K=4\), \({\small \ \alpha =[220,224,320,409],\ \beta =373}\).

  • O12: \(K=5\), \({\small \ \alpha =[102,199,302,399,498],\ \beta =214}\).

  • O13: \(K=4\), \({\small \ \alpha =[225,220,206,214],\ \beta =563}\).

  • O14: \(K=4\), \({\small \ \alpha =[201,199,301,400],\ \beta =353}\).

Remark 1

Above we started the run of the local search from the best setting found by hand previously. We also made runs where the search was started from some “basic” setting. One basic setting was the following: K is chosen to be neither too big nor too small, for example \(K=4\); then \({\small \alpha }_{i}{\small =200}\) for all \(1\le i\le K\) and \({\small \beta =200}\). This was the BASIC1 setting. The other, BASIC2, setting was the following: \(K=4\), \({\small \ \alpha =[100,200,300,400],\ \beta =200}\). (Note that BASIC2 is the same as the M1 setup.) The reason for these choices is that previously, when we performed the optimization in the ad-hoc way, these two settings produced good results. However, our experiments show that whether the optimization is started from an ad-hoc starting point or from some basic setting does not significantly influence the efficiency of the optimization.

We conclude that “good parameters” can be found by local search. Naturally, it is likely that even better settings can be found by some more advanced method, but this is outside the scope of this paper and can be a task for further research.

Note that we applied Simulated Annealing (SA) (van Laarhoven and Aarts 1987) in the preliminary version of this paper, and we found that SA can provide a slightly better choice of the settings than pure Local Search (LS), see Miettinen et al. (1999). We do not repeat that verification here; instead, we made much more detailed experiments.

6 Detailed experiments

In this section we present a more exhaustive set of experiments. We consider the instance classes one by one. Suppose that we choose a certain instance class, e.g. F1G1. For this instance class, we pick the best algorithm among all algorithms considered before introducing MMask; this was SDH(4), see Table 6. The result of SDH(4) is 414.

Then we compare the result of this algorithm (i.e. SDH(4)) with MMask. For the latter, we apply the parameter setting that was found to be the “best” one for the first member of the F1G1 class. This was the O1 setting, see Table 11. The result of O1 was 436.

Let us emphasize again that this result belongs to the first member of the F1G1 class, so the parameters are tuned for this one member of the class. Now we run both algorithms on all 20 members of this class, and compare SDH(4) with MMask using the O1 setting. The obtained results are shown (after rounding down) in Table 13.

Table 13 Comparison between SDH(4) and O1 for the F1G1 class

We conclude that O1 is better than SDH(4) for every member of the instance class except the very last one, with the parameter setting that we optimized only for the first member of this class. In fact (recall that we rounded all objective values down), in case of the last instance the exact objective value of SDH(4) is 423.1, while that of O1 is 422.9, very close to each other. This means that we can tune the parameters in this way, i.e. we can learn the “best” parameters in this fashion, since we know how the items are generated (we know the interval from which they are drawn by uniform distribution), and therefore the members of a class are “similar” in some sense.

We performed similar investigations for the other two gain functions G2 and G3 and the same instance class (i.e. F1), and the results were similar: O1 won against SDH(4) for every member. To save space, in Table 14 we provide only the worst result, the average result (denoted by ave), and the best result over the 20 members of the instance classes. Note that for each class, the best algorithm before MMask was the SDH algorithm; we provide the K value for which SDH works best for that instance class. Similarly, for MMask we provide the parameter setting that previously worked best for the first member of the considered instance class.

Table 14 Comparison between SDH and MMask for the F1 and F4 classes

Conclusions

  • We can see that all values are better for MMask than for SDH. This fact shows experimentally that our parameter optimization algorithm works.

  • We could get even better results for MMask if the parameters were optimized for each member of the class, one by one. But then the algorithm could no longer be considered an online algorithm. An intermediate option is the following: we consider a class, and we tune the algorithm’s parameters using not only one but several members of the class. For example, if we have 20 members of a class, we train the algorithm on, say, 5 members of the class, and check its efficiency on the other 15 members. We did not make such an investigation here, so this remains for further research.

After having made the investigations for the F classes, let us see the similar investigation for the LR classes. The results are in Table 15.

Table 15 Comparison between SDH and MMask for the LR classes

Here it no longer holds that all values in the table are better for the MMask algorithms. We denote the places where SDH is better by bold numbers. However, we also have some interesting conclusions, as follows.

  • As one can see, there are only a few bold numbers in the table, which means that MMask is usually better; moreover, at the places where MMask is not better, it is almost as good as the SDH algorithm.

  • For LR1, there are 5 values among the 9 where MMask is not better (but almost the same as the previous best algorithm). For LR2, there is only one such place. And for LR3 and LR4, all values are better with MMask. This means that a growing number of items in the list is advantageous for MMask, regarding its efficiency.

7 Conclusions, discussion and further work

Here we mention several options that may provide further improvements. First of all, we note that there are some points where MMask can possibly be further improved. We list them below.

  • Our parameter optimization algorithm (i.e. local search) is very simple. It may be that with some further parameter settings that we have not tried yet, MMask performs (much) better. Instead of local search, we can also apply e.g. Simulated Annealing for the parameter optimization. Will it help to get much better algorithms? This can be clarified by further investigation. We also note that, since there may be valleys attracting convergence to local minima, it seems a good idea to allow a “jump” (up or down) in the value of \(\Delta \).

  • We realized that SDH(4) is very efficient for several problem instances of the LR class (see Table 8). So it may be a good idea to combine SDH(4) and MMask in some sense, if we can do that. At this point this option remains for further research.

  • In Step 2 of MMask, we pack the current item arbitrarily into some appropriate bin. It may happen that if we choose a bin (from the bins where the packing is allowed) more carefully, the efficiency of MMask improves. We already performed two such experiments. First, we packed the current item into the most loaded bin (among the appropriate bins). For this version we did not find significant improvement: for some instances this modification gives a better result, but for some others it gives slightly worse results.

  • The other possible modification of Step 2 was as follows. Let \(B_{1},B_{2},...,B_{t}\) be the appropriate bins where the current item can be packed by Step 2. If \(t=1\), then naturally there is no other choice: we pack the item into the only possible bin. Otherwise, we can choose from several possible bins. We calculate the average sizes of the already packed items in \(B_{1},B_{2},...,B_{t}\), say, \(x_{1},x_{2},...,x_{t}\). Then we pack the current item, say x, into the bin where the size of x is most “similar” to the average item size: the bin for which \((x-x_{i})^{2}\) is smallest among \(1\le i\le t\). Note that this calculation mimics the behavior of SDH. Unfortunately, this kind of modification did not lead to significantly better performance either: the results of the modified algorithm are sometimes a bit better, sometimes worse.

  • Another option is to try the bin for which the distance to the neighboring bin levels is the smallest. In this way we can try to achieve the biggest diversity of the levels after packing the item.
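
The “most similar average” selection rule from the second Step 2 modification above can be sketched as follows. The names are ours; `bins` holds, for each appropriate bin \(B_{i}\), the list of item sizes already packed into it.

```python
def most_similar_bin(bins, x):
    """Choose the index of the bin whose average packed item size is closest to x.

    `bins` is a list of non-empty lists of item sizes (the appropriate bins
    B_1, ..., B_t of Step 2). The bin minimizing (x - x_i)^2, where x_i is
    the average item size in bin i, is selected.
    """
    averages = [sum(b) / len(b) for b in bins]
    return min(range(len(bins)), key=lambda i: (x - averages[i]) ** 2)
```

For example, with bins holding items [10, 20] (average 15) and [40, 40] (average 40), an item of size 42 is placed into the second bin.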

There can be, of course, many other options. We only started the investigation of the problem, and proposed several options to be considered in further research. Some other questions regarding the research are as follows:

  • The items in the Falkenauer class were put in a uniformly random order. What happens if the (decreasing) order is disturbed less significantly, e.g. by rearranging just a certain fraction of the items?

  • What happens if the list of items is put in decreasing order of sizes? Will we get significantly different results? This question seems interesting also because, in the beginning, most algorithms would pack the big items (those bigger than half of the bin capacity) in pairs, so MMask may perform well.

We need to mention that we got many useful proposals from the referees, in particular, both referees proposed some algorithm versions to consider.

One proposal concerned Algorithm DH(K). Note that (applying normalization, i.e. considering the bin capacity to be \(C=1\)) we applied 1/i endpoints. But the endpoints of the intervals can also be chosen in some different way; namely, for some K it may be better to have a classification where we group elements into the following intervals: smaller than 0.2, then between 0.2 and 0.3, then between 0.3 and 0.4, and so on. Then, say, an element from interval (0.3; 0.4] can be complemented with another one from interval (0.7; 0.8]. The items between, say, 0.5 and 0.6 can be kept in pairs.

Another idea regarding DH(K) would be to dedicate some of the big items and possibly pair them with smaller ones, i.e. to consider a simple version of the Modified DH(K) algorithm.

Another option (proposed by the same referee) is to consider the simplified classification of Woeginger (1993), which is applied in the algorithm called Simplified Harmonic, e.g. for the Schwerin (S1 and S2) classes. It is based on the union of some classes of the DH(K) algorithm, following the Sylvester series.

These are interesting questions (whether such modifications can help to get a more efficient algorithm); we leave their investigation for further research.

Another proposal was the next algorithm: for a given k with \(1 \le k \le K\), we always keep k open bins. The current item is always packed into a bin with smallest level. Any time a bin is covered, it is delivered, and we open a new bin instead of the delivered one. In fact, this is the dual version of the Next-K Fit algorithm; we denote it by DN(k).
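
DN(k) as just described can be sketched as follows. This is a sketch under our own naming, where `gain(j)` returns the money for a bin delivered while j bins are open; since DN(k) keeps exactly k bins open at all times, every delivery earns gain(k).

```python
def dn(items, capacity, k, gain):
    """Dual Next-k Fit: keep k open bins, pack each item into a least-full bin,
    and deliver a bin as soon as it is covered, replacing it with an empty one.

    `gain(j)` is the money received for a bin delivered while j bins are open.
    Returns the total gain collected over the whole item list.
    """
    levels = [0] * k          # k open bins, all initially empty
    total = 0
    for item in items:
        i = min(range(k), key=lambda j: levels[j])   # a bin with smallest level
        levels[i] += item
        if levels[i] >= capacity:                    # covered: deliver it
            total += gain(k)                         # exactly k bins are open
            levels[i] = 0                            # open a new bin instead
    return total
```

For example, with capacity 100, \(k=2\), and items 60, 60, 60, 60, exactly one bin is covered (at level 120), so the total gain is G(2).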

We tested the performance of the algorithm: we applied it with different k values, and its results were compared with the results of the other algorithms. We provide the detailed results below. Here, in the column “MAX”, we give the maximal values from Tables 7 and 8, where we collected the results of all algorithms tried before the MMask algorithms. (All values are rounded down to integers.)

And the other part of the results:

Recall that MMask outperformed the “MAX” values in most cases. But it turned out that DN(k) cannot be competitive even with the “MAX” values, i.e. with the best algorithms among the ones treated in the paper before introducing the MMask class (Tables 16 and 17). The reason may be that keeping k bins open simultaneously is not advantageous, since this way we can never get more than G(k) for a covered bin.

Table 16 The results of DN(k) compared to the best results of all algorithms before MMask, on the S and F classes
Table 17 The results of DN(k) compared to the best results of all algorithms before MMask, on the LR class