Variable and constraint reduction techniques for the temporal bin packing problem with fire-ups

The aim of this letter is to design and computationally test several improvements for the compact integer linear programming (ILP) formulations of the temporal bin packing problem with fire-ups (TBPP-FU). This problem is a challenging generalization of the classical bin packing problem in which the items, interpreted as jobs of given weight, are active only during an associated time window. The TBPP-FU objective function asks for the minimization of the weighted sum of the number of bins, viewed as servers of given capacity, to execute all the jobs and the total number of fire-ups. The fire-ups count the number of times the servers are activated due to the presence of assigned active jobs. Our contributions are effective procedures to reduce the number of variables and constraints of the ILP formulations proposed in the literature as well as the introduction of new valid inequalities. By extensive computational tests we show that substantial improvements can be achieved and several instances from the literature can be solved to proven optimality for the first time.


Introduction
The temporal bin packing problem (TBPP) introduces a temporal dimension to the classical bin packing problem (BPP), see [11,18,20], by associating with each item a time window in which it is active. Formally, given a set of items (or jobs), the TBPP asks for determining the minimum number of bins (or servers) of given capacity to execute all jobs. Each job is characterized by a size (or resource demand) and a lifespan (time window in which the job is active); a jobs-to-servers assignment is feasible if and only if the capacity of the servers is respected at any instant of time. The TBPP is a challenging problem of high practical and theoretical interest which has recently been introduced in the literature, see [9,10]. It belongs to the rich family of cutting and packing problems, which has been the object of intense research in recent decades. The TBPP can also be interpreted as a high-dimensional vector packing problem [7,19], and it partially shares the mathematical structure of two-dimensional packing problems, where time can be seen as one of the dimensions and the other one is related to the capacity of the servers. We also refer the interested reader to [13,16] for the strip packing versions of these problems.
Another problem closely related to the TBPP is the temporal knapsack problem (TKP). The TKP asks for determining a maximum-profit subset of jobs (also active for a given time window) which can be executed by a single server, see [2,4]. As far as the exact approaches to solve the TKP to proven optimality are concerned, this problem is usually tackled by a Dantzig-Wolfe reformulation and branch-and-price algorithms [5,6] or by dynamic programming algorithms [8].
The TBPP is introduced in an application-oriented article, see [9], dealing with efficient workload server management in data centers, one of the key challenges in light of the ever-growing energy demand of the IT industry [1,12,14]. As far as the TBPP solution methods are concerned, we refer the reader to [10], an article which presents several heuristic and exact approaches. The state-of-the-art exact algorithm for the TBPP is a branch-and-price algorithm, which exploits in preprocessing a plethora of heuristic algorithms (see [10] for further details).
In the TBPP, the quality of a jobs-to-servers assignment is evaluated solely by the number of servers in use. However, several recent publications point out that the specific operating mode of the servers is also crucial. This means, in particular, that an inactive server can be temporarily put into a sleep mode (or can be completely shut down) for the purpose of saving energy. In this case the server must then be turned on again, if required. A server restart is called a fire-up, see also Fig. 1 for a graphical representation, and, from the perspective of energy efficiency, this naturally leads to a second optimization goal, introduced in [3]. The authors of that article propose two compact ILP formulations (denoted M1 and M2) taking into consideration two different objectives: (i) the number of servers and (ii) the number of fire-ups. This new objective function is a weighted sum, using the input parameter γ > 0 to scale the contribution of the fire-ups. In this way, a multi-objective variant of the TBPP has been proposed in [3], hereinafter referred to as the temporal bin packing problem with fire-ups (TBPP-FU). The models M1 and M2 of [3] are based on the classic Kantorovich-type structure for the BPP, see [15], and the experiments show that the proposed exact solution method using these models is very challenging from a computational point of view. In order to quickly find good-quality heuristic solutions, a constructive look-ahead heuristic (CLH), together with an advanced recovery algorithm based on mathematical programming techniques, is also presented in [3].

Fig. 1 An exemplary assignment of five jobs to two servers. (This assignment is optimal for the instance appearing in Example 1 with γ = 1)
In that article, an interesting insight of the multi-objective function is also shown: for rather small values of γ , i.e., for γ ≤ 1/n (where n is the number of jobs), it is possible to considerably reduce the size of the ILP formulations by exploiting the information given by the heuristic solutions. However, the important question on how to reduce the models in the general case of arbitrary values of γ is still an open problem which we tackle in this letter.
Very recently, several methods for improving the TBPP-FU ILP models have been proposed in [17]. These techniques generate substantial performance gains, including the possibility of efficiently handling larger values of γ. In addition, a third formulation (called M3) is proposed which is not based on the classical job-to-server assignment variables. In more detail, M3 is based on a job-to-job relation, which makes it possible to discard some infeasible solutions a priori by removing a set of variables. As a result, model M3 is a more compact ILP formulation (compared to M1 and M2) which is particularly effective for fast calculations. However, from an overall point of view, the tests in [17] show that the "optimized" version of M1 remains on average the state-of-the-art approach in terms of instances solved to proven optimality.
In this article, we aim to enrich and complete the structural analysis of the compact ILP formulations for the TBPP-FU. To this end, we present new methods to improve the ILP models which lead to better numerical results. In more detail, we show how to transfer parts of the previously mentioned favorable properties of M3, i.e., mainly a small model size and the possibility to easily account for job incompatibilities, to M1 and M2, thus obtaining very robust and more powerful compact formulations. The main contributions of our investigation are:
- We optimize and reduce the set of constraints of M1 by removing redundancies and tightening some conditions.
- We present a new class of valid inequalities for M1 and M2 mimicking the inherent property of M3 to avoid forbidden item pairs.
- We theoretically show how heuristic-based information can be used for arbitrary choices of γ, generalizing the results of the literature. A side effect of investigating the heuristic used for that purpose is that we can also pass a good-quality starting point to the ILP solver.
The remainder of this letter is structured as follows: In the next section we introduce the notation and the M1 and M2 formulations from the literature. In the core Sect. 3, we present the new reduction procedures and the new family of valid inequalities.
Finally, extensive computational tests are reported in Sect. 4, aiming at computationally evaluating the beneficial effect of the newly proposed techniques.

Preliminaries and basic models
In this section, we present the most important terminology and notation, as well as a brief overview of the current ILP formulations from the literature. Let us consider $n \in \mathbb{N}$ items (jobs), each of which possesses a resource demand (item size) $c_i \in \mathbb{N}$ and is active only in the interval $[s_i, e_i)$ formed by the starting time $s_i \in \mathbb{N}$ and the ending time $e_i \in \mathbb{N}$, $i \in I := \{1, \ldots, n\}$. Then, these items have to be assigned to bins (servers) of capacity $C \in \mathbb{N}$ so that a weighted sum consisting of (i) the number of servers required and (ii) the number of fire-ups is minimized, where the second term is scaled by a weighting parameter $\gamma > 0$. Let $K := \{1, \ldots, n\}$ denote the set of servers and let $T := \bigcup_{i \in I} \{s_i, e_i\}$ and $T_S := \bigcup_{i \in I} \{s_i\}$ collect the relevant instants of (starting) times. Moreover, we assume the items to be ordered with respect to non-decreasing starting times (where ties are broken arbitrarily), and we use $I_t := \{i \in I \mid t \in [s_i, e_i)\}$ to indicate the active jobs at time $t \in T$.
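The notation above can be made concrete with a small sketch. The toy instance and the names `Job`, `time_sets`, and `active_jobs` are hypothetical and only serve as an illustration; they do not appear in the models from the literature.

```python
from typing import NamedTuple

class Job(NamedTuple):
    c: int  # resource demand c_i
    s: int  # starting time s_i
    e: int  # ending time e_i (the job is active on [s, e))

# Hypothetical toy instance, already sorted by non-decreasing starting time.
jobs = [Job(4, 0, 3), Job(5, 1, 4), Job(3, 3, 6), Job(6, 5, 8)]

def time_sets(jobs):
    """Return (T, T_S): all relevant instants and the starting times only."""
    starts = {j.s for j in jobs}
    ends = {j.e for j in jobs}
    return sorted(starts | ends), sorted(starts)

def active_jobs(jobs, t):
    """Return I_t, the 1-based indices of the jobs with t in [s_i, e_i)."""
    return [i for i, j in enumerate(jobs, start=1) if j.s <= t < j.e]

T, T_S = time_sets(jobs)
print(T, T_S, active_jobs(jobs, 3))
```

Note that job 1 is no longer contained in `active_jobs(jobs, 3)` because the activity intervals are half-open.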
In total, three different compact models for the TBPP-FU have so far been formulated in the literature. To reiterate, the original publication [3] contained two ILP formulations (called M1 and M2) which were based on classic job-to-server assignment variables. Some first reduction methods together with a new third exact approach (called M3) have been presented in [17], all of which contributed to significantly better numerical results. To facilitate understanding the ILP formulations, here we only present the raw versions from [3] in some detail. However, we emphasize that all of our contributions presented later are directly applied to the models from [17], which correspond to the current state of the literature. In contrast, a formal introduction of M3 is skipped since our new reductions either specifically address the set of servers to be modeled (which, however, is not explicitly contained in M3) or exclude combinations of items (which is automatically done in M3 through job-to-job coupling).

Remark 1
Nevertheless, we will also consider an improved version of M3 in the numerical part, but its differences with the version from [17] are very general and therefore traceable even without explicit knowledge of the model itself.
To state M1, let us consider the following four types of binary variables:
- We have $z_k = 1$ if and only if server $k \in K$ is used.
- We set $x_{ik} = 1$ if and only if job $i \in I$ runs on server $k \in K$.
- We use $y_{tk} = 1$ if and only if server $k \in K$ is active at time $t \in T$.
- We have $w_{tk} = 1$ if and only if server $k \in K$ is activated at time $t \in T_S$.
Consequently, the z-variables measure the number of servers required whereas the w-variables are responsible for counting the fire-ups. The original version of M1, as presented in [3], then reads as follows:

The objective function minimizes a weighted sum of the aforementioned criteria. Constraints (1) make sure that the capacity of an active server is respected (right hand side), and that an empty server is deactivated (left hand side). Conditions (2) demand that any job is assigned exactly once, while linking the different types of variables is done by Restrictions (3)-(5). In particular, (5) is responsible for recognizing a fire-up in precisely those cases where the considered server is currently active, but was inactive at the preceding instant of time (indicated by $t - 1$ in a symbolic way).

The second formulation M2 appearing in [3] addresses the temporal aspect of the problem in a less explicit way, e.g., by not making use of the y-variables measuring the activity of the servers. Instead, however, the sets $\delta_i := \{j < i \mid s_i < e_j\}$ and $\delta_i^+ := \{j < i \mid s_i \le e_j\}$, $i \in I$, gathering jobs being active at $s_i$ (see $\delta_i$ and $\delta_i^+$) or just having finished at $s_i$ (only $\delta_i^+$) are required. With these ingredients, M2 from [3] is given by:

The objective function and Conditions (11) already appeared in exactly the same way in M1, whereas Constraints (10) again ensure that the server capacity is respected. Restrictions (12)-(13) are responsible for linking the different types of variables. In particular, Conditions (13) state that a fire-up at time $s_i$ has to be perceived on server $k$, if item $i$, but none of the items from $\delta_i^+$, has been placed on it.
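To illustrate what the two objective terms measure, the following minimal sketch evaluates the weighted sum for a given jobs-to-servers assignment. It is not one of the ILP models. The function names are hypothetical, and the sketch assumes (in line with the role of the sets containing jobs that have just finished) that a job starting exactly when the previous one ends does not cause a new fire-up. Capacity feasibility is not checked here.

```python
def count_fire_ups(intervals):
    """Count the activations of one server given the [s, e) windows of its jobs."""
    fire_ups, busy_until = 0, None
    for s, e in sorted(intervals):
        # Assumption: only a start strictly after the server ran idle is a fire-up;
        # a start at s == busy_until keeps the server switched on.
        if busy_until is None or s > busy_until:
            fire_ups += 1
            busy_until = e
        else:
            busy_until = max(busy_until, e)
    return fire_ups

def objective(jobs, assignment, gamma):
    """jobs: list of (c, s, e); assignment: one server label per job."""
    servers = {}
    for (c, s, e), k in zip(jobs, assignment):
        servers.setdefault(k, []).append((s, e))
    total_fire_ups = sum(count_fire_ups(iv) for iv in servers.values())
    return len(servers) + gamma * total_fire_ups

jobs = [(4, 0, 3), (5, 1, 4), (3, 3, 6), (6, 5, 8)]
print(objective(jobs, [1, 2, 1, 2], gamma=1))
```

In this hypothetical example, server 1 is activated once (its two jobs touch at time 3), whereas server 2 is idle between times 4 and 5 and therefore needs two fire-ups.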
Besides being relatively large in size, the original versions of M1 and M2 possess some structural drawbacks: (i) The solution space is highly symmetric. (ii) The LP relaxation is rather poor. (iii) The set K is much larger than required in an optimal solution.
Note that the first two problems are mainly related to the Kantorovich-type structure of the models and were already successfully addressed in parts in [17]. Without going into too much detail (for this, we refer the reader to the contributions of [17]), this was done primarily by:
(R1) Renumbering the servers so that only index pairs $(i, k)$ from the set $\Delta := \{(i, k) \mid k \le i\}$ have to be considered. This also made it possible to move to server-dependent time sets, which helped to save some of the y- and w-variables as well.
(R2) Introducing valid inequalities $z_k \le \sum_{t \in T_S(k)} w_{tk}$ for any $k \in K$, implying at least one fire-up on every server. In particular, this prevents the two variable types in the objective function from becoming arbitrarily small independently of each other.
(R3) Lifting the item sizes $c_i$, $i \in I$, to possibly tighten Conditions (1) and (10).
Furthermore, by additional minor reductions (referred to as (R4) and (R5) in [17]), some redundancies within the set of constraints could be eliminated in M2. Based on numerical tests, it was shown that the listed techniques lead to significant improvements of the compact models, with (R1) and (R2) proving to be particularly valuable for the benchmark instances considered. This was due mainly to the fact that the size of the models could be reduced by roughly 50% (both in terms of variables and constraints), but also to significantly better LP bounds. For this reason, it seems worthwhile to us to explore further reduction possibilities (for the approaches from [17]) and present them in the next section, thus arriving at the somewhat "best possible" version of the compact formulations M1 and M2 (and, in the light of Remark 1, also M3).
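As a rough plausibility check of the size reduction attributed to (R1), the following sketch counts the index pairs of the assignment variables with and without the restriction k ≤ i. The function name `assignment_pairs` is hypothetical, and the count covers the x-variables only, not the complete models.

```python
def assignment_pairs(n, reduced=True):
    """Index pairs (i, k) of the x-variables, with 1-based items and servers."""
    return [
        (i, k)
        for i in range(1, n + 1)
        for k in range(1, n + 1)
        if not reduced or k <= i  # (R1): item i may only use servers k <= i
    ]

n = 200
full = len(assignment_pairs(n, reduced=False))  # n * n pairs
small = len(assignment_pairs(n))                # n * (n + 1) / 2 pairs
print(full, small, small / full)
```

For growing n, the ratio (n + 1) / (2n) approaches one half.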

New reduction methods
The new reduction methods to be proposed in this section can be roughly divided into three categories: (a) "optimizing" the set of constraints, (b) adding new valid inequalities, (c) a heuristic-based reduction of the model size.
While item (a) is specific to the y-variables and Constraints (1) from M1, (b) and (c) can be applied to M1 and M2. For this reason, we will discuss all these methods using M1, but also briefly point out how they might need to be modified for M2. For the sake of completeness, we again point out that the specification of a starting point for the ILP solver, which is implicitly associated with (c), can of course be used for all three existing models.

Reduction (a)
We first observe that after having applied Reduction (R1), Constraints (1) from M1 only need to be formulated for the time steps $t \in T(k)$ relevant on server $k \in K$. However, we do not require the whole inequality chain for each $t \in T(k)$, since each of the two parts has a very specific purpose. To be more precise, the left hand side is only important to perceive the deactivation of a server, while the right hand side helps to recognize an active server. Hence, it suffices to state the first inequality for ending times $t \in T_E(k) = \bigcup_{i \ge k} \{e_i\}$ (on server $k$) only, whereas the second inequality needs to be formulated just for the server-dependent starting times $t \in T_S(k) = \bigcup_{i \ge k} \{s_i\}$. In the latter case, we can even go a step further by noting that only the non-dominated starting times $T_S^{nd}(k)$ have to be considered. Observe that a starting time $t_1$ is called dominated if it is directly followed by another starting time $t_2$ (so that every job that is active at $t_1$ is still active at $t_2$), see also [10,17]. After we have accordingly broken the chain of inequalities into two parts, we can strengthen the first part by moving from $y_{tk} \le \sum_{i \in I_t : (i,k) \in \Delta} c_i x_{ik}$ to $y_{tk} \le \sum_{i \in I_t : (i,k) \in \Delta} x_{ik}$. Obviously, the latter better conveys the important message that an empty server cannot be active and an active server needs to have at least one item running on it. Overall, Reduction (a) helps to save some redundant conditions appearing in (1), while other conditions of that type are even tightened.
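The server-dependent time sets used in Reduction (a) can be sketched as follows. The function name `server_time_sets` is hypothetical, and the dominance test reflects one reading of the definition above: a starting time is treated as dominated if no job (among those allowed on the server) ends before or at the next starting time.

```python
def server_time_sets(jobs, k):
    """jobs: list of (c, s, e) sorted by starting time; k: 1-based server index.

    Returns the ending times, the starting times, and the non-dominated
    starting times of the items i >= k that may be placed on server k.
    """
    items = jobs[k - 1:]  # after (R1), only items i >= k are relevant
    ends = sorted({e for _, _, e in items})
    starts = sorted({s for _, s, _ in items})
    nondominated = []
    for pos, t1 in enumerate(starts):
        if pos + 1 == len(starts):
            nondominated.append(t1)  # the last starting time is never dominated
            continue
        t2 = starts[pos + 1]
        # t1 is dominated by t2 unless some job ends within (t1, t2]
        if any(t1 < e <= t2 for e in ends):
            nondominated.append(t1)
    return ends, starts, nondominated

jobs = [(4, 0, 3), (5, 1, 4), (3, 3, 6), (6, 5, 8)]
print(server_time_sets(jobs, 1))
print(server_time_sets(jobs, 3))
```

In this toy instance, the starting time 0 is dominated on server 1, since the job starting at 0 is still active when the next job starts at time 1.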

Reduction (b)
One of the major advantages of Model M3 from [17] consisted of the fact that the set of variables to be considered was very small due to the elimination of forbidden item pairs. By that, we mean that any pair of jobs appearing in the set $F$ of forbidden pairs, i.e., two jobs that are simultaneously active at some instant of time and whose sizes sum up to more than $C$, cannot be executed on the same server. While this could be incorporated directly into the model generation of M3 due to the job-to-job coupling, additional valid inequalities are needed for M1 and M2 to avoid such pairs. However, the naive way of just demanding $x_{ik} + x_{jk} \le z_k$ for any $k \in K$ and any $(i, j) \in F$ with $(i, k), (j, k) \in \Delta$ normally leads to a very large number of further constraints (up to $O(n^3)$ of them in the worst case). For this reason, here we propose a more sophisticated strategy that produces fewer and at the same time (partly) better such inequalities. To this end, note that for a fixed server $k$ it would be sufficient to collect (a reasonable subset of) the maximal cliques of the incompatibility graph $G(k) = (I(k), F)$ formed by the relation $F$ on the set $I(k) := \{i \in I \mid i \ge k\}$ of items that can be assigned to server $k$. Then, for any such clique $\mathcal{C}$ (related to server $k$) with $|\mathcal{C}| \ge 2$, the condition

$$\sum_{i \in \mathcal{C}} x_{ik} \le z_k \qquad (17)$$

has to be added. For M1, we recommend replacing $z_k$ on the right hand side of this inequality by $y_{tk}$, where $t$ is the last starting time of the items from $\mathcal{C}$ (that is, one specific instant of time where all these items are active). Thanks to the coupling conditions (4) from M1, this will impose an even stronger constraint. To find the maximal cliques, we propose the following strategy, starting with $k = 1$ (so that $I(k) = I$ holds):
(i) Find the maximal cliques of the subgraph (of $G(1)$) formed only by the items having $c_i > C/2$.
(ii) For any fixed item $i \in I$ with $c_i \le C/2$: consider the subgraph formed by the items $j$ with $c_j > C - c_i$ that are adjacent to $i$ in $G(1)$. When we add $\{i\}$ to a maximal clique of this subgraph, then we end up with a maximal clique of $G(1)$.
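By these two steps, we will find all maximal cliques of $G(1)$ thanks to the following result: any clique of $G(1)$ contains at most one item $i$ with $c_i \le C/2$. Proof Indeed, if there were two such items in the clique, their sizes would sum up to at most $C$, so we would not have an edge between them, which gives the contradiction.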
For any remaining server $k \ge 2$, the maximal cliques of $G(k)$ (having size $|\mathcal{C}| \ge 2$) can be iteratively obtained from the information of the previous step $k - 1$ by deleting the item $i = k - 1$ from any maximal clique of $G(k - 1)$ and applying cardinality and dominance tests to the obtained subsets of items.
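The clique-based inequalities can be illustrated by the following sketch. It assumes that a pair of jobs is forbidden if the two time windows overlap and the sizes sum up to more than the capacity. For brevity, it enumerates the maximal cliques with a generic Bron-Kerbosch recursion instead of the size-based two-step strategy and the iterative update described above, so it should be read as a substitute technique for small examples only. All function names are hypothetical.

```python
def forbidden_pairs(jobs, C):
    """Pairs (i, j), i < j (1-based), that overlap in time and exceed capacity C."""
    F = set()
    for i, (ci, si, ei) in enumerate(jobs, start=1):
        for j, (cj, sj, ej) in enumerate(jobs[i:], start=i + 1):
            if max(si, sj) < min(ei, ej) and ci + cj > C:
                F.add((i, j))
    return F

def maximal_cliques(vertices, F):
    """Maximal cliques of size >= 2 in the graph (vertices, F), via Bron-Kerbosch."""
    adj = {v: set() for v in vertices}
    for i, j in F:
        if i in adj and j in adj:
            adj[i].add(j)
            adj[j].add(i)
    cliques = []

    def expand(R, P, X):
        if not P and not X:
            if len(R) >= 2:  # single items do not yield an inequality
                cliques.append(sorted(R))
            return
        for v in list(P):
            expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    expand(set(), set(vertices), set())
    return sorted(cliques)

jobs = [(6, 0, 4), (7, 1, 5), (5, 2, 6), (3, 3, 7)]  # hypothetical (c, s, e) data
F = forbidden_pairs(jobs, C=10)
for k in (1, 2):
    items = range(k, len(jobs) + 1)  # I(k): items that may use server k
    print(k, maximal_cliques(items, F))
```

For server 1, the three pairwise conditions on the items 1, 2, and 3 collapse into one clique inequality with three variables on the left hand side.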

Reduction (c)
The techniques presented so far (in Sects. 2 and 3) have not yet contributed to the reduction of the quantity $|K|$, which has a decisive influence on the model size as, for instance, the index $k$ appears in any of the four variable types of M1. To deal with this challenge, which was already established as item (iii) in Sect. 2, appropriate heuristic information can be applied. However, from a theoretical point of view, the latter has only been successfully achieved for very small choices of γ: Let γ ≤ 1/n. Then, the number of servers required for the TBPP-FU is equal to the number of servers in an optimal solution to the TBPP.
The previous result allows either computing the optimal size of K beforehand by solving a somewhat easier auxiliary problem (that is, the TBPP) or at least limiting the number of servers to any value obtained by a heuristic solution. However, as also reported in [3, Example 2.2], this result does not hold for larger choices of γ, so that reducing the size of K cannot be performed in the majority of the cases. To tackle this issue, the following result presents an easy way of using heuristic information in the general case, too.

Theorem 2 Let $z^{heu}$ be the objective value obtained by any heuristic for the TBPP-FU. Then, the number of servers required in an optimal solution is at most $\bar{k} := \lfloor z^{heu} / (1 + \gamma) \rfloor$.
Proof If the claim was wrong, we would need at least $\bar{k} + 1$ servers in an optimal solution. Since every server is switched on at least once, this would lead to an objective value of at least $(1 + \gamma)(\bar{k} + 1) > z^{heu}$, giving the contradiction because the heuristic would have to be better than the optimal solution.
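The counting argument of the proof (every used server contributes at least 1 + γ to the objective value) translates into a one-line computation. The following sketch assumes the bound to be the largest integer k with k (1 + γ) not exceeding the heuristic value; the function name `server_upper_bound` is hypothetical, and exact rational arithmetic is used only to avoid floating-point rounding at the boundary.

```python
import math
from fractions import Fraction

def server_upper_bound(z_heu, gamma):
    """Largest integer k with k * (1 + gamma) <= z_heu."""
    # str() keeps decimal inputs such as 0.1 exact when converting to Fraction
    z = Fraction(str(z_heu))
    g = Fraction(str(gamma))
    return math.floor(z / (1 + g))

print(server_upper_bound(5, 1))      # heuristic value 5 with gamma = 1
print(server_upper_bound(3.3, 0.1))  # boundary case: 3 * 1.1 = 3.3
```

For example, a heuristic value of 5 with γ = 1 rules out any solution using three or more servers, since three servers would already cost at least 6.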
This result allows us to replace the set $K = \{1, \ldots, n\}$ at all positions in M1 and M2 with an appropriately defined and greatly reduced set $\bar{K} := \{1, \ldots, \bar{k}\}$. Moreover, we recommend passing the heuristic solution to the solver to give it a warm start. For our investigations, we will use the constructive look-ahead heuristic (CLH) described in [3, Sect. 3], but in a slightly more exploratory way. Before explaining the precise meaning of this intention, let us briefly collect the main idea of that heuristic: As stated in Sect. 2, we start with an item list ordered with respect to non-decreasing starting times $s_i$ (where ties are broken in an arbitrary way) and process the items one by one. Moreover, we require a look-ahead parameter $q \in \mathbb{N}$ indicating the number of future items to be taken into account when making the current decision.
In a specific iteration, we consider a fixed item $i \in I$ and assign it to every open server that is able to accommodate it, and (as another alternative) also to a new empty server. By that, we obtain various different assignments $A_1, \ldots, A_p$. Now, we add the next $q$ items to each of these assignments in a best-fit fashion, leading to the extended assignments $\tilde{A}_1, \ldots, \tilde{A}_p$. Finally, we compute the corresponding objective values (i.e., the weighted sum of servers and fire-ups) and place item $i$ into the bin whose extended assignment led to the lowest objective value. Since in [3] the parameter q = 3 was used without compelling justification, we will first preface our actual test calculations in the next section with a somewhat more detailed consideration of the CLH algorithm.
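The main loop just described can be sketched as follows. This is a simplified reading and not the implementation from [3]: "best fit" is assumed to mean the feasible open server with the highest load at the starting time of the item, ties between candidates are assumed to be broken in favor of the lower server index, and touching jobs are assumed not to cause a new fire-up. All function names are hypothetical, and every item size is assumed to be at most C.

```python
def load_at(server, t):
    return sum(c for c, s, e in server if s <= t < e)

def fits(server, job, C):
    c, s, e = job
    # The load can only increase at starting times, so these points suffice.
    points = [s] + [s2 for _, s2, _ in server if s < s2 < e]
    return all(load_at(server, t) + c <= C for t in points)

def fire_ups(server):
    count, busy_until = 0, None
    for _, s, e in sorted(server, key=lambda j: (j[1], j[2])):
        if busy_until is None or s > busy_until:
            count, busy_until = count + 1, e
        else:
            busy_until = max(busy_until, e)
    return count

def value(servers, gamma):
    return len(servers) + gamma * sum(fire_ups(sv) for sv in servers)

def best_fit(servers, job, C):
    """Copy of servers with job on the fullest feasible server (assumed rule)."""
    new = [list(sv) for sv in servers]
    options = [k for k, sv in enumerate(servers) if fits(sv, job, C)]
    if options:
        new[max(options, key=lambda k: load_at(servers[k], job[1]))].append(job)
    else:
        new.append([job])
    return new

def clh(jobs, C, gamma, q):
    """jobs: list of (c, s, e) sorted by starting time; returns a list of servers."""
    servers = []
    for pos, job in enumerate(jobs):
        candidates = []
        for k in range(len(servers) + 1):  # every open server plus a new one
            trial = [list(sv) for sv in servers]
            if k == len(servers):
                trial.append([])
            if not fits(trial[k], job, C):
                continue
            trial[k].append(job)
            extended = trial
            for nxt in jobs[pos + 1: pos + 1 + q]:  # look ahead q items
                extended = best_fit(extended, nxt, C)
            candidates.append((value(extended, gamma), k, trial))
        _, _, servers = min(candidates, key=lambda cand: cand[:2])
    return servers

jobs = [(6, 0, 4), (6, 1, 5), (4, 4, 8)]
solution = clh(jobs, C=10, gamma=1, q=1)
print(solution, value(solution, gamma=1))
```

Running the sketch for several values of q and keeping the best result mirrors the exploratory use of the heuristic; the resulting objective value can then serve as the heuristic value required for bounding the number of servers.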

Data sets and methodology
For our numerical calculations, we coded the above approaches in Python (version 3.9.2) and used its Gurobi (version 9.1.1) interface to solve the resulting ILP formulations with default settings and a time limit of 30 min per instance. All the experiments were run on an AMD A10-5800K processor with 16 GB RAM, that is, the same hardware as in [17]. Due to its novelty, the TBPP-FU has not yet been able to leave a large scientific footprint in the relevant literature, so that only one set of benchmark instances has been specifically designed for the problem under consideration, see [3]. In that publication, the authors propose 160 instances formed by 32 groups of five instances each, all sharing the same capacity C = 100. Apart from that, any group is determined by four indicators: Even though the total number of instances appears to be relatively small, they can be considered very suitable for numerical test calculations due to their difficulty, especially because only 63 of them could be solved in the original publication [3]. Also the improvements discussed in [17] could increase this number to only just over 50% (that is, 85 out of 160), so that their solution still represents a serious challenge from today's point of view. In the following discussions, the improved versions of the compact models from [17] will be referred to as M1 , M2 , and M3 . To reiterate, beyond providing a heuristic starting point, M1 and M2 differ from their previous versions by applying the techniques proposed in Sect. 3. For model M3, we note that based on the impressions of some internal preliminary test calculations, the final implementation in [17] sometimes did contain only a subset of the valid inequalities of type x ik ≤ x kk , but unfortunately this was not sufficiently clearly stated in the text itself. 
Although this approach suggested slight performance advantages from the point of view of that time, (in the meantime) these expectations have not been generally confirmed with respect to the whole benchmark set tested here. For this reason, in contrast to [17], here we use the complete set of constraints of the above type (and, of course, the same heuristic starting point as for the other models) to define M3 .

Computational results for CLH
In a first experiment, we study the influence of the look-ahead parameter q on the performance of the CLH approach from [3]. For this purpose, we consider, as an example, the instances with n ∈ {100, 200} items and refer to the results in Table 1. Based on this data, one can see the rough trend that a deeper look into the future (that is, a larger value of q) typically leads to a reduction in the heuristic value. However, this is by no means a strictly monotonic relationship, because for two different choices of q the obtained assignments will be considerably different, in general. Consequently, although the tabulated data give a very consistent picture in that large values of q are to be preferred, there is no single universally best choice of that parameter.

Remark 2
To better evaluate these data, column 'LB' in Table 1 contains the average rounded-up LP values of M1, i.e., a lower bound for the integer optimal value. By that, we see for instance that the average difference between the heuristic and the optimal value is bounded by roughly 27% for the hard instances with n = 200 items, but for a few constellations (especially with $c_H$) it is also (considerably) larger since it is much harder to obtain a dense heuristic packing in these cases.
As for the computational efforts, it is important to note that all the heuristic values can be determined very quickly, meaning that for many scenarios (n, q) the heuristic solution is available in less than 1 s. Even checking all the look-ahead parameters q (mentioned in Table 1) for an instance with n = 200 jobs and then deciding on the best result (see column 'best' in Table 1) takes only about 17 s on average, which is quite acceptable when measured against the time limit of 30 min permitted for the exact solution of these rather challenging instances. For this reason, and considering that our intention is to present a preferably maximally reduced compact formulation, we will always choose the best heuristic value to define $\bar{k}$ (that is, the number of initialized servers) appearing in Theorem 2. We note, however, that one could alternatively agree on a compromise between computational effort and quality of the heuristic solution and always use a fixed value of the look-ahead parameter (say q = 20), since already this leads to a significant improvement (e.g., on average about 15% better heuristic values for n = 200) compared to the relatively arbitrary choice of q = 3 from [3], without noticeably increasing the time required. Either way, as the cardinality of K strongly influences the number of variables and constraints, very powerful reductions in terms of the model size can be expected.
Table 1 (notes) The best average value for each subset is printed in bold. The column 'best' refers to the average over the best value obtained per instance from the considered subset. The column 'LB' provides the (average) rounded-up LP value of M1 to enable a rough evaluation of the heuristic solution

For the sake of exposition, we take a closer look at the associated numbers in Table 2. Due to space limitations, we again consider only the subset of instances that was also used in Table 1, but finally we also report the average results over all 160 instances in the last row of Table 2 to allow for a better overall picture. In addition, we also include the values of M3 from [17] (and of its updated version, which just has slightly more constraints for some instance groups), as this was the best formulation so far in terms of model size. Compared to that approach, we notice that the ideas presented in Sect. 3 lead to significant reductions of the integer programs associated with M1 and M2. To be more precise, while the savings in the number of constraints are about 60% in both cases (compared to M3), the savings in the number of variables range from about 32% (for the improved M1) to 45% (for the improved M2). These reductions become even more remarkable when referring only to the comparison of the literature version of M1 (resp. M2) with its version improved in the context of this work. On average, here we end up with roughly 75% fewer variables (in both cases) and, depending on the formulation, between 67% and 80% fewer constraints. While for a fixed number of items n and a fixed model we previously saw very little variation in the indicators $n_{var}$ and $n_{con}$ among the several groups of instances, we now observe that our reductions are particularly successful when short and/or low-resource jobs are considered (see $d_S$ and $c_L$).
On the one hand, these constellations tend to lead to particularly few interactions between the jobs and therefore allow for better heuristic solutions (typically leading to a small value of $\bar{k}$), which is also clearly supported by the results from Table 1. On the other hand, for the case $c_H$, the number of forbidden item combinations increases significantly, so that, for example, a sometimes substantial number of valid inequalities (see Reduction (b) in Sect. 3) must be added to the model.

Computational results for the compact models
In a next step, we focus on the performance of M1 , M2 , and M3 when addressing the exact solution of the benchmark instances. To this end, we tabulate the obtained results in Table 3 and compare them with the previous state of the literature (that is, M1, M2, and M3 from [17]). First, we note that the contributions from Sect. 3 (and also the warm start of the solver) helped to significantly increase the number of instances solved to optimality. More specifically, the modifications to M1 (M2, and M3) resulted in 18 (18, and 15) additional proven optimal solutions, so that all formulations now perform considerably better than their original versions from [17]. A table containing more information about which model was able to solve which instance can be found in the "Appendix" section.

Remark 3
Interestingly, in at least ten additional cases M2 already had the correct optimal value, but failed to prove the optimality within the given time limit. For M1 and M3 , these numbers resulted in 0 and 3 instances, respectively, see also Table 7 in the "Appendix" section.
Table 3 For the sake of completeness, also the average exit gaps (computed based on all 160 instances) are displayed. We use bold numbers to indicate the best formulation per subset

Fig. 2 Temporal development of the number of instances solved to optimality by the various formulations

Due to its small model size and the fact that, for example, the reduction related to K is particularly promising for large values of n, M3 is the best formulation for the very small instances with n = 50 items, but constantly loses this leading position for larger instances (especially in comparison with M1). Overall, it is noticeable in Table 3 that the performance of M1, M2, and M3 has improved, especially for many instances from the constellation $(d_S, c_L)$, which further supports the observations made in Table 2 that the reductions (of M1 and M2) are particularly strong for these cases. However, the ideas from Sect. 3 not only contribute to an overall improvement in the number of optimally solved instances, but (in many cases) also to considerably lower computation times. On the one hand, this can be seen from the average values in Table 3, but it becomes somewhat more evident if we look at the percentage of optimally solved instances over time, see Fig. 2. Therein, it is clearly visible that at any point in time a fixed updated formulation dominates its corresponding original model from [17]. Moreover, given the additional improvements from Sect. 3, either M1 or M2 always possesses the most convincing performance. The generally smaller model size of M2 causes it to dominate M1 within the first 2 min, while in the long term M3 offers similar and M1 even better numerical results.
We attribute the latter to the fact that the numerous coupling conditions in M1 (i.e., the implications inherent to Constraints (3)-(5)) have a large effect in the deeper layers of the branch-and-bound tree, since already the specification of a small set of variables actually fixes a much larger set of variables to integer values. Moreover, the methods contained in Reduction (a), and the fact that the valid cuts from Reduction (b) can be formulated in a stronger way may also have contributed to the slightly better overall performance of M1 .
While these considerations refer only to the successfully solved instances, the exit gaps provided in the last row of Table 3 also indicate the improvements with respect to all instances. Roughly speaking, all compact models were able to reduce their exit gap by at least 45%, with M1 standing out here with a reduction of more than 80%. This observation is partly due to the fact that the specification of a starting point now generally leads to reasonable approximate solutions even for very difficult instances.
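The solved-over-time view used in Fig. 2 can be reproduced from a list of solution times. The following is a minimal sketch, not the authors' evaluation script: the function name `solved_fraction_over_time`, the checkpoints, and the toy runtimes are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def solved_fraction_over_time(solve_times, n_instances, checkpoints):
    """Fraction of instances solved to optimality by each checkpoint.

    solve_times: runtimes (in seconds) of the solved instances only;
                 instances that hit the time limit are simply omitted.
    n_instances: size of the whole instance set (solved and unsolved).
    checkpoints: time stamps (in seconds) at which the curve is sampled.
    """
    ordered = sorted(solve_times)
    # bisect_right counts how many runtimes are <= t.
    return [bisect_right(ordered, t) / n_instances for t in checkpoints]


# Hypothetical toy data: 5 instances, 3 of them solved within the limit.
toy_times = [12.0, 95.5, 400.0]
curve = solved_fraction_over_time(toy_times, 5, [60, 120, 1800])
print(curve)
```

Plotting one such curve per formulation gives a picture in the style of Fig. 2; a formulation dominates another if its curve is never below the other one.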

Remark 4
To gain a rough insight into which of the presented reductions have which effect, we exemplarily solved again the more difficult half of the instances (those with n ≥ 150) by different variants of M1. More precisely, we start with the version from [17] and then gradually add the methods from Sect. 3, see Table 4. Note that for Reduction (c) we distinguish between the mere heuristic-based reduction (called "M1 (cold)") and the additional use of the feasible solution as a starting point for the ILP solver.
In terms of model size, one can clearly see that Reduction (c) makes the largest contribution, reducing both variable and constraint numbers very significantly. However, Reduction (a) also yields a remarkable improvement, especially by already removing roughly 40% of all constraints. In contrast, as expected, adding valid cuts (i.e., applying Reduction (b)) again leads to an increase (of 22%) in constraints, but this still results in better overall performance, especially for n = 150. With respect to the number of instances solved to proven optimality and the solution times required, it can be observed that for n = 150 each individual method has an approximately equal contribution. For the even more difficult instances with n = 200 items, Reductions (a) and (b) lead in particular to significantly better objective function values (in total, we see a reduction of almost 50% from M1 to M1+(a)+(b)), but optimality for additional instances can only be proven after having added Reduction (c). Hence, from the point of view of model performance, Reduction (c) as a whole (i.e., using heuristic information plus warm start) is typically slightly superior to the other two individual techniques. Some more detailed results can also be found in Table 8 in the "Appendix" section.
Table 4 Some key indicators summarizing the effects of the step-by-step reduction for n = 150 (upper half, i.e., rows 1-6) and n = 200 (lower half, i.e., rows 7-12). In addition to the notation already used before, n_nz represents the number of nonzero elements appearing in the constraint matrix of the optimization problems
Moreover, Fig. 3 additionally gives an overview of the average objective values (again only for the 80 harder instances with n ≥ 150) over time. Besides the obvious and substantial improvements of M1 , M2 , and M3 (over the original versions), we highlight the very good performance of M2 (see Fig. 4 for an enlarged section) at almost all instants of time, which we again particularly attribute to the much smaller model size. Although Fig. 4 might suggest this, M2 is nevertheless not better than M1 or M3 for every single instance, see also Table 5.
We observe that the comparison between M1 and M2 ends in a draw here, with 16 wins for each of the models, while both of the previously mentioned formulations dominate the M3 model in terms of the objective function value found much more often than they are defeated by it. From a general point of view, it can be said that the advantages of M1 lie in particular in its ability to obtain proven optimal solutions, while M2 is able to (on average) produce slightly better feasible points even for challenging instances of the TBPP-FU. The high quality of the approximate solutions obtained from M1 and M2 is also confirmed by the fact that the best (rounded-up)  LP bound (over all models) for the instances considered in Fig. 4 averaged 34.55. The fact that M3 , in the light of Table 5, now tends to perform worst in the comparison of the three formulations is mainly due to the fact that, according to Table 2, it has lost its former leading position (in terms of model size) to the other two formulations as a result of the very powerful improvements from Sect. 3.
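A pairwise comparison of this kind amounts to counting, instance by instance, which model reports the smaller objective value. A minimal sketch is given below; the function name `pairwise_wins` and the toy values are hypothetical and do not reproduce the data of Table 5.

```python
def pairwise_wins(values_a, values_b):
    """Count wins and draws between two models on the same instances.

    values_a, values_b: best objective values found by model A and B,
    listed in the same instance order. Smaller is better, since the
    TBPP-FU is a minimization problem.
    Returns (wins of A, wins of B, draws).
    """
    if len(values_a) != len(values_b):
        raise ValueError("both models must be evaluated on the same instances")
    wins_a = sum(a < b for a, b in zip(values_a, values_b))
    wins_b = sum(b < a for a, b in zip(values_a, values_b))
    draws = len(values_a) - wins_a - wins_b
    return wins_a, wins_b, draws


# Hypothetical objective values on three instances.
print(pairwise_wins([3, 5, 4], [4, 5, 2]))
```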
Overall, it can be concluded that the methods presented in Sect. 3 (and the warm start of the solver) not only substantially improve the performance of the models individually, but also result in the advantageous features of the M3-type approaches (listed in [17]) now being barely discernible in most of the numerical comparisons. As a consequence, also M3 is outperformed in many respects by M1 and M2 .

An outlook: valid cuts from lot sizing
In the literature, there are many operations research problems which, similar to the TBPP-FU, assign additional costs to the start-up of a machine, see for example the uncapacitated lot sizing problem [21]. For that problem, classes of valid inequalities are also known, whose applicability and usefulness for the TBPP-FU we want to briefly discuss here as a conclusion of our considerations. In particular, this also serves to emphasize that the improvements in compact models achieved in this article have indeed reached a certain plateau level and, as a consequence, that obvious ideas from adjacent fields do not easily lead to further numerical advantages. For the sake of exposition, in a final experiment, let us therefore add the following types of constraints to M1. The first two sets are directly taken from [21], establish an additional w-y-coupling of the variables appearing in M1 , and can be added for all k ∈ K and t ∈ T_S(k).
In particular, these restrictions imply that a server that is active at a given instant of time cannot be activated at the following point in time. The third set of conditions involves a lower bound lb_t ∈ N which is defined as the optimal value of a bin packing problem containing precisely the subset I_t of jobs being available at time t ∈ T_S. By that, we make sure that sufficiently many servers are active at every time instant to accommodate the items that are executed at that moment.
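To illustrate how a per-time-instant bound of this type can be obtained, the following sketch collects the jobs active at each starting time and computes a simple bound. Note the assumptions: instead of the optimal bin packing value used in the text, the sketch uses the weaker (but still valid) continuous bound, i.e., total active demand divided by the capacity and rounded up. The job tuple layout `(start, end, size)`, the function names, and the toy data are hypothetical.

```python
def active_jobs(jobs, t):
    """Indices of jobs whose half-open time window [start, end) contains t."""
    return [i for i, (start, end, _) in enumerate(jobs) if start <= t < end]


def simple_lower_bounds(jobs, capacity):
    """Map every job starting time t to ceil(active demand at t / capacity).

    This is NOT the optimal bin packing value from the text, only a
    cheaper lower bound on it (assumes integer sizes and capacity).
    """
    start_times = sorted({start for start, _, _ in jobs})
    bounds = {}
    for t in start_times:
        demand = sum(jobs[i][2] for i in active_jobs(jobs, t))
        bounds[t] = -(-demand // capacity)  # integer ceiling division
    return bounds


# Hypothetical toy instance: (start, end, size) with server capacity 10.
toy_jobs = [(0, 4, 6), (1, 3, 5), (2, 5, 4), (4, 6, 7)]
print(simple_lower_bounds(toy_jobs, 10))
```

Each resulting value could then serve as the right-hand side of a constraint requiring at least that many active servers at the corresponding time instant; replacing the continuous bound by an exact bin packing solve yields a bound that is at least as strong.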

Remark 5
Since all these valid cuts explicitly require the presence of the y-variables, they cannot be applied to M2 or M3 .
Let us refer to the model containing all these new valid inequalities by M1 . Then, for the benchmark set and hardware specified in Sect. 4.1, we obtain the average numbers collected in Table 6. In terms of model size, we note that the number of variables remains unchanged, while there is an obvious increase in the number n_con of constraints and also in the number n_nz of non-zero elements in the system matrix. In contrast, the LP bound z_LP at the root node improves only negligibly. In our observations, these opposing effects nevertheless lead to a marginal improvement of the model performance overall. As can be seen from the exit gap, slightly better feasible points are found on average, so that in the end exactly one more instance could be optimally solved. We attribute this in particular to the fact that although the additional inequalities do not necessarily contribute to raising the continuous bound in the root node, they do help to keep the resulting branch-and-bound trees (over the entire time period) somewhat smaller, see also Fig. 5. Finally, however, we would like to point out that despite the same software and hardware, the solution process applied by the Gurobi solver is subject to a high degree of randomization and thus a comparison of both sets of experiments (with such a small difference) is difficult. In particular, we do not want to claim that the small performance deviations between the two variants are an actual advancement of the model itself, since they could have been caused by other effects. Even though we will refer to M1 as state-of-the-art for these reasons, it was important to us to at least briefly discuss this alternative variant in the context of an outlook, since one or the other version could prove to be more advantageous for concrete problem instances from practice or other (future) benchmark sets.

Conclusions
In this article, we dealt with the temporal bin packing problem with fire-ups, a relatively new decision making problem in operations research typically leading to integer models of challenging size. Even though some fundamental methods for obtaining more tractable formulations have already been described in the recent literature, these investigations do not yet turn out to be "complete" upon closer inspection, especially because the incorporation of heuristic information has so far only been possible for a few special cases. Therefore, the contributions of this article were aimed in particular at three methods to improve existing ILP models: "optimizing" the set of constraints (by removing redundancies and tightening some inequalities), adding valid inequalities, and reducing the number of servers to be considered (thus considerably decreasing the overall model size). Based on numerical computations, the positive effects of the new techniques (together with the warm start of the solver) were demonstrated. We underline not only the fact that, as a result of the improvements, each model was able to solve at least 15 additional instances (compared to its previous version from [17]) of the challenging benchmark set from [3], but also highlight that, in particular, the optimal solution of 14 instances (twelve by M1 , nine by M2 , seven by M3 , and six by all three models) was obtained for the first time. Now that the investigation of assignment models for the TBPP-FU is somewhat "complete", future research should focus in particular on flow-based models or branch-and-price approaches. Moreover, theoretical results dealing with the worst-case performance of heuristics for the TBPP-FU have not yet been addressed at all in the literature.
From a practical point of view, generalizations of the problem considered here could also be explored, in which, for example, the time interval [s_i, e_i) of a job can be shifted slightly, which is then associated with penalty costs (for early or late execution). A related idea has already been mentioned in the concluding parts of [10] for the TBPP.

Funding
Open Access funding enabled and organized by Projekt DEAL.

Availability of data and material
The instances used in this paper were originally designed in [3] and can be found online, see https://github.com/sibirbil/TemporalBinPacking.

Conflict of interest
The authors declare that they do not have any conflicts of interest.

Code availability
The instances were solved by the commercial software Gurobi. The underlying implementation of the models in Python can be found in https://github.com/wotzlaff/tbpp-cf2.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

A Further numerical results
See Tables 7 and 8.
Table 7 Detailed numerical results for every instance (numbered from 1 to 5 for any parameter constellation) of the considered benchmark set. The table contains the best objective value z_best found by any of the models M1 , M2 , and M3 as well as the information which approach led to that value (indicated by 'V'). Whenever, in addition, an instance was solved to proven optimality by a given formulation, we use the symbol 'S' (instead of 'V'). This table also supports the observations from Remark 3.
Table 8 We start with the original version from [17], which is then improved step-by-step with the methods presented in Sect. 3. Reduction (c) is split into two parts: first, we just reduce the number of servers by using heuristic information, but we do not provide the feasible starting point (called "M1 (cold)" in the table); then, in a second step, we make use of the warm start option (leading to M1 ).