Lot sizing with storage losses under demand uncertainty

We address a variant of the single item lot sizing problem affected by proportional storage (or inventory) losses and uncertainty in the product demand. The problem has applications in, among others, the energy sector, where storage losses (or storage deteriorations) are often unavoidable and, due to the need for planning ahead, the demands can be largely uncertain. We first propose a two-stage robust optimization approach with second-stage storage variables, showing how the arising robust problem can be solved as an instance of the deterministic one. We then consider a two-stage approach where not only the storage but also the production variables are determined in the second stage. After showing that, in the general case, solutions to this problem can suffer from acausality (or anticipativity), we introduce a flexible affine rule approach which, albeit restricting the solution set, allows for causal production plans. A hybrid robust-stochastic approach where the objective function is optimized in expectation, as opposed to in the worst-case, while retaining robust optimization guarantees of feasibility in the worst-case, is also discussed. We conclude with an application to heat production, in the context of which we compare the different approaches via computational experiments on real-world data.


Introduction
Lot Sizing (LS) is a fundamental problem in a large part of modern production planning systems. In its basic version, given a demand for a single good over a finite time horizon, the problem calls for a production plan which minimizes storage, production, and setup costs, while keeping production and storage levels within the prescribed lower and upper bounds.
In the paper, we focus on a variant of LS where the storage suffers from proportional losses and the product demands are subject to uncertainty. 1 This variant suits the case of many applications in the energy sector where a small portion of the energy that is stored is lost over time and the demands (of, e.g., heat, as in the application that we will consider) are often not known in advance. This is especially relevant when the decision maker has to commit to a production plan ahead of when it becomes operational.
To cope with uncertainty, we develop a robust optimization approach. After observing that a fully first-stage approach is not viable, we adopt an approach with first-stage set-up and production variables and second-stage storage variables. For it, we establish a reduction showing that this problem can be solved as an instance of its deterministic counterpart with suitably (re)defined demands and storage upper bounds. Then, we consider the case where the second-stage decisions comprise not just the storage but also the production variables, while still retaining first-stage set-up variables. After showing that solutions to this problem may be acausal (or anticipative), i.e., that they may contain production variables whose value, at time t, can only be determined by knowing the realization of the demand at time t > t, we introduce an affine rule approach. This technique, although restricting the solution set, allows for causal production plans. We then illustrate a set of computational experiments in the context of an application to heat production.
The paper, which is an extended version of Coniglio et al. (2016), is organized as follows. In Sect. 2, we summarize some key results on the deterministic version of LS and present some relevant previous work on robust optimization. The version of LS with storage losses that we tackle in this paper is formally introduced in Sect. 3, together with its uncertainty aspect. In Sect. 4, we introduce the robust counterpart of the problem with second-stage inventory variables and present our reduction. The approach with second-stage production and storage variables is proposed in Sect. 5, while, in Sect. 6, we present a hybrid stochastic-robust extension. Computational results are reported and illustrated in Sect. 7. Concluding remarks are drawn in Sect. 8. regardless of the accuracy of the estimation, such stochastic techniques are, in many cases, intrinsically doomed to suffer from the curse of dimensionality (Bertsimas and Thiele 2006). This is because they usually require a computing time which is, at least, linear in the size of the probability space (assumed to be discrete) to which the realized demand belongs-such space is, typically, exponential in the length of the time horizon. For an alternative way of modeling stochastic demands, via socalled service-level constraints, see Bookbinder and Tan (1988), Tempelmeier andHerpers (2011), andTempelmeier (2013). Also see Calafiore (2008) for a stochastic programming approach where feasibility is enforced in expectation.
A different option, often more affordable from a computational standpoint, is resorting to a robust optimization approach. The idea is of looking for a solution which is feasible for every realization of the uncertain demand belonging to a given uncertainty set and which, among all such feasible solutions, is of minimum cost in the worst case. Two-stage robust optimization is an extension of this approach in which the variables are partitioned in two sets. The first-stage (or here-and-now) variables are required to take a value independent of the realization of the uncertain parameters, while the second-stage (or wait-and-see) variables are allowed to vary as a function of the specific realization that takes or has taken place, depending on whether the problem has a dynamical aspect or not.
Two seminal papers applying a robust optimization approach to LS are Bertsimas and Thiele (2004) and Bertsimas and Thiele (2006). The authors address the uncapacitated version of LS with backlogging, static storage, and production costs, with demands subject to a so-called -robustness model of uncertainty Sim 2003, 2004). When applied to LS, the idea of -robustness is of assuming that the uncertain parameters, i.e., the product demand for LS, (i) belong to symmetric intervals and (ii) given an integer , the total number of time steps in which the uncertain demand deviates from its nominal value to one of the extremes of the interval is bounded by , in any constraint of the problem.
Among other results, the authors of Thiele (2004, 2006) show that the -robust counterpart of the variant of LS they consider (with backlogging) can be solved as an instance of the deterministic problem with modified demands, also for the case where production bounds are in place. Notice that, for this result to hold, bounds on the storage cannot be enforced. Unfortunately, such bounds are necessary in many applications, such as, e.g., those in the energy sector, where backlogging is not tolerable as the demand of energy (e.g., heat) must be typically satisfied when issued.
(the latter is incurred when production is active) at time t ∈ T . We consider timedependent bounds U t ≤ u t ≤ U t on the storage, as well as time-dependent bounds Q t ≤ q t ≤ Q t on production, when active.

The deterministic case
The (DETerministic) variant of LS with storage losses tackled in this paper can be cast as the following Mixed-Integer Linear Programming (MILP) problem: For notational consistency with the remainder of the paper, the problem is written in epigraph form, with the objective function moved into Constraint (1b). Constraints (1c) are balance constraints, which only differ from those in the standard LS problem for the (possibly time-dependent) conservation factor α t ∈ (0, 1]. They account for the fact that, at the beginning of each time period t ∈ T , a fraction (1 − α t ) of the previously stored product is lost, whereas, at the end of time period t, the remaining product is conserved. Constraints (1d) and (1e) impose lower and upper bounds on storage and production. Constraints (1f)-(1h) specify the nature of the decision variables.

The uncertain case
Let us assume an uncertain product demand d, belonging to an uncertainty set D -as better illustrated in Sect. 7, we will construct this set from a mixture of historical data and forecasts. In the combinatorial optimization literature, two types of uncertainty sets are predominantly employed: finite uncertainty sets and polyhedral uncertainty sets. In the former case, D is assumed to be a finite set, albeit possibly extremely large. 2 In the latter, D is assumed to be a bounded polyhedron. The -robustness approach used in Bertsimas and Sim (2004) falls into this category. Its uncertainty set, the uncertainty set, corresponds to assuming, for each time period t ∈ T , that the uncertain demand d t belongs to a symmetric interval [d t −d t ,d t +d t ] centered around a nom-inal valued t and that, in each constraint of the problem, the demand could deviate fromd t by ±d t for, at most, time steps.

Robust LS with first-stage production and second-stage storage
As a first approach to the problem, we assume, as natural in a number of applications, that the storage variables u could be determined in a wait-and-see fashion as a function of the realization of the uncertain demand d, while all the other variables are required to a take a here-and-now value independently of the realization. This corresponds to adopting a two-stage setting with second-stage storage variables, as better explained in the following. 3 In terms of the production and set-up variables, this approach can be regarded as an open-loop approach (see Calafiore 2008), in which the value of the production variables at time t does not depend on the realization of d up to time t. Although we will consider more flexible approaches in the next section, this one is of interest since, as shown in this section, it allows for solving the corresponding robust problem very efficiently, as an instance of the deterministic problem, obtained after modifying a few of its givens.
The problem with second-stage u, which we report in full for notational convenience, reads: It only differs from LS-DET in that, here, we have a vector u(d) for each realization d ∈ D.
As for the classical version of LS without storage losses, see, e.g., Bertsimas and Thiele (2006), Gaussian elimination shows that, in any feasible solution, the storage variables u are uniquely defined as a function of d and q: with: A compact formulation for this problem is thus obtained by substituting, in the previous formulation, the right-hand side of Eq. (3) for u(d).
With the next theorem, we show how LS-2RO u can be reduced to an instance of LS-DET with identical givens except for a different demand d and different storage upper bounds U , defined as follows: The idea is that, with this definition, the newly introduced demand d t specifies a lower bound on the product which has to be available at time t ∈ T to ensure that every demand d ∈ D can be met. The value t reduces the storage upper bounds of the transformed problem to prevent that, if a large production is realized but a large deficit in demand occurs (an event which would result in an overflow of storage), the storage upper bound U t is not exceeded. Formally: Theorem 1 Assuming d t and t are well defined for each t ∈ T , LS-2RO u can be reduced, with respect to the first-stage variables (η, z, q), to an instance of LS-DET with d = d and U = U , as defined in Eq. (4).
Proof First note that, due to the assumptions on D, the values for d t and U t can be computed iteratively from t = 1 to t = |T |. Let u t (d) be defined as in Eq.
(3) and let: For a given (z, q), adopting d = d and U = U , we now show that (η, z, q, u(d)), with u(d) and η defined as above, is feasible for LS-2RO u if and only if it is feasible for LS-DET with demand d and storage upper bounds U . Following the derivations reported in "Appendix A", we deduce: What remains to be shown is that Constraints (2d) are satisfied by u(d) if and only if Constraints (1d) are satisfied by u. This is done by observing that, for all t ∈ T , the following holds true: Since the objective functions moved to Constraints (2b) and (1b) are equal up to a constant additive term, as highlighted in Eq. (6d), we deduce that (η, z, q) is optimal for LS-2RO u if and only if it is optimal for LS-DET.
We remark that the reduction in Theorem 1 allows us to cast LS-2RO u as an instance of LS-DET with static lower bounds on u. This can be quite relevant as, to the best of our knowledge, most of the algorithms for LS-DET, such as the one by Atamtürk and Küçükyavuz (2008), require a storage with either zero or static lower bounds. From a computational complexity perspective, Theorem 1 allows us to establish a direct relationship between LS-2RO u and LS-DET:

Corollary 1 Given an uncertainty set D over which a linear function can be optimized in polynomial time, the reduction in Theorem 1 is a polynomial reduction. As such, LS-ROB2-u is in P (respectively, N P-hard) if and only if the corresponding version of LS-DET is in P (respectively, N P-hard).
As a consequence of Corollary 1, LS-ROB2-u is in P for all the polynomially solvable cases of LS-DET that we reported in Sect. 2, provided that their algorithm allows for the introduction of time-dependent upper bounds U on u. This is, for instance, the case of the problem studied in Hellion et al. (2012). Note that the result still holds if we introduce additional constraints on z and q or assumptions on the givens (except for d and U ). It is also valid for U t = ∞ and for not necessarily nonnegative demands d.
The assumptions on the uncertainty set in Corollary 1 subsume the cases of many robustness models, including finite uncertainty sets, polyhedral uncertainty sets (such as the -robustness one), and ellipsoidal uncertainty sets, such as those employed in Ben-Tal et al. (2009). The advantages in terms of computing times of this robust optimization approach are better illustrated in Sect. 7.

Robust LS with second-stage production and storage variables
In a number of applications, it is reasonable to assume that not only the storage variables but also the production variables could be adjusted (possibly, in real time) in a second-stage fashion, after the uncertain demand reveals.
Still retaining first-stage set-up variables, we can cast the problem with explicit second-stage variables u(d) and q(d), for all d ∈ D, as this (infinite, for a continuous D) MILP: LS-2RO q,u suffers from an important drawback: acausality (or anticipativity). Indeed: Proposition 1 In the general case, optimal solutions to LS-2RO q,u can be acausal, with q t (d) depending, for some t ∈ T , from d t for t > t.
Note that q 1 1 = q 2 1 , even though d 1 1 = d 2 1 = 1. To be able to decide whether, at time t = 1, q 1 := q 1 1 = 2 or q 1 := q 2 1 = 1, we need to know d 2 (so to be able to determine whether d = d 1 or d = d 2 ). The (unique) optimal solution to the problem is, thus, acausal.

Affine decision rules
To prevent acausality, we resort to affine rules, a technique which has been successfully adopted in a number of optimization and control problems, see, e.g., Ben-Tal et al. (2004), Calafiore (2008), and Bertsimas and Georghiou (2015).
While still allowing q(d) to be a function of d, the idea is to restrict the feasible region of the problem by requiring q t (d) to be, for each t ∈ T , an affine function of the uncertain demand vector d. We impose: which stipulates that q t (d) be composed of two parts: a first-stage part t , which, as in LS-2RO u , does not depend on the realization of d, and a second-stage part j∈T : j≤t ξ jt d j , which does, but only for components of d up to time t. Notice that, with this approach, the coefficients ξ it and t must be determined as first-stage variables. As it is clear, this approach generalizes the case of LS-2RO u with a first-stage q, as the latter is obtained when requiring ξ jt = 0 for all j, t ∈ T .
Overall, we obtain the following (infinite, for a nonfinite D) MILP with Affine Rules (AR): To arrive at a finite MILP, we can first derive a semi-infinite formulation by substituting the right-hand sides of Eq. (3) and Constraints (9f) for variables u(d) and q(d). For finite scenario uncertainty sets, the problem thus obtained can be rewritten as a finite MILP by duplicating the constraints involving the demand vector d exactly |D| times. For the case of -robustness, we can obtain a reformulation of polynomial size via the dualization procedure introduced by Bertsimas and Sim (2004). We report the formulation thus obtained in "Appendix B". Note that, since the feasible region of the problem belongs to a higher-dimensional space than that of LS-2RO u , we cannot hope for a result similar to that of Theorem 1.

A regularization approach
As mentioned in Caramanis (2006), affine rules add an affine regression subproblem to the robust counterpart of a deterministic problem.
Let q * t (d) be the (assumed unique, for simplicity) optimal production value at time t for LS-2RO q,u (in which the second-stage q is not restricted by affine rules and, thus, it can be anticipative), expressed as a function of the realization d ∈ D. If LS-2RO q,u -AR admits a solution with coefficients ξ jt and t satisfying: q t (d) can then take the same value that it takes in LS-2RO q,u and, thus, the affine rules pose no restriction to the problem. If Eq. (10) cannot be satisfied for all t ∈ T and d ∈ D, LS-2RO q,u -AR calls for coefficients which minimize the objective function loss that is incurred when adopting q t (d) = j∈T ξ jt d j + t instead of q * t (d). As typical in many applications, D is usually an estimate of the "hidden" uncertainty set, let us call itD, to which the realizations of the uncertain demand belong. In that case, one should look for coefficients which, rather than just minimizing the worstcase objective function value on demands d ∈ D, are also likely to achieve a good objective function value on realizations d ∈D \ D, thus enjoying good generalization properties. In a number of machine learning works, see Bishop (2006), it is observed that, both in theory and in practice, a better generalization is obtained by introducing a regularization term which limits the magnitude of the coefficients of the function that is being learned-those in Eq. (8) in our case.
For these reasons, we also consider, in the computational experiments, an alternative approach encompassing a restriction on the norm of the vector of coefficients ξ t = (ξ 1t , . . . , ξ jt , . . . , ξ tt ). More precisely, we consider, together with the original approach with unrestricted ξ t , a version where the coefficients ξ jt are restricted to the interval [−1, 1]. This corresponds to introducing, for all t ∈ T , the ∞-norm constraint ||ξ t || ∞ ≤ 1 on the vector of coefficients ξ t adopted in each affine rule.

Hybrid robust-stochastic approach
A robust approach is, arguably, bound not to be viable in a free market where the decision maker is facing a number of not risk-averse competitors. While interested in solutions which are feasible even in the worst-case (which is particularly relevant in the energy sector, as in the application considered in the next section), minimizing the objective function in expectation, as opposed to in the worst-case, is likely to offer solutions which, by being less risk-averse, can be more profitable in practice.
As customary in many results in robust optimization, such as, e.g., those in Bertsimas and Sim (2004) on -robustness, let us assume that d ∈ D is a symmetrically distributed random variable with E[d] =d. With this, the following holds: Proposition 2 Let D ⊆ R n be a set centrally symmetric aboutd, d ∈ D a random variable with density function p : D → [0, 1] symmetric aboutd, and g : D → R an affine function. Then, E[g(d) Then, p is symmetric about 0. Let f be a linear function. We have: (0) is a linear function, we deduce: which concludes the proof.
For LS-2RO q,u (the problem with second-stage u and q and affine rules), the objective function constraint of the hybrid stochastic-robust formulation reads: In the next section, we will apply this approach for the case of -robustness. For a finite D and assuming a uniform distribution with first momentd, the objective can be restated as: In the next section, we adopt this approach for the case of finite uncertainty sets which, by construction (see further), are not symmetric in our application. We remark that the hybrid approach is different from applying affine rules in a stochastic optimization setting, as, here, all constraints, except for the objective function one, are enforced in the worst case. We also remark that the hybrid stochastic-robust approach differs from the purely robust one only when affine rules are employed. This is because, if q is first-stage, the objective function reads: ). After rearranging the terms, it becomes: Since this function only depends on d in the additive term c i t t i=1 α it d i which involves no variables, the additive term can be dropped without affecting the optimality of the solutions that are found. The same holds in expectation over d, where, by linearity,

Computational results
We report on a set of computational results obtained with the models and methods that we have proposed and discussed. As an application, we consider a problem arising within the project Robuste Optimierung der Stromversorgungsysteme (Robust Optimization of Power Supply Systems), funded by the German Bundesministerium für Wirtschaft und Energie (Federal Ministry for Economic Affairs and Energy, BMWi). Originally, this is a Coproduction of Heat and Power (CHP) problem in which fuel is transformed into heat and power (at a fixed heat-to-power ratio) by a CHP device. When assuming that there is no internal power demand and that all the power which is produced can be sold to the market, the problem reduces to an instance of LS with heat as the unique good. Since heat can be stored (as hot water) for free, we set c i t = 0 for all t ∈ T . The cost of fuel is transferred to the static costs c v , together with the profit generated by the amount of power that is sold. More precisely, given a static fuel cost c f per unit of heat, a heat-to-fuel ratio ν, a heat-to-power ratio ρ, and a static market profit c m per unit of power, c v is set to c v := c f ν − c m ρ.

Instances and setup
We consider a dataset spanning a period of 2 years (with some missing months) with hourly time steps. The first half of the data set (338 days) is used as training set, the second half (232 days) as testing set. The heat demand is taken from historical data of a portion of Frankfurt (of around 50000 households). For all t ∈ T , the storage bounds are set to U t = 0 MWh and U t = 120 MWh, while we set the production bounds to Q t = 37.5 MWh and Q t = 125 MWh. We also set u 0 = 36 MWh. The market prices for the power market are taken from EPEX SPOT (the European Power Exchange).
We consider two cases: finite uncertainty sets and -robustness, with uncertainty sets. Both sets are built based on the training set as well as on a heat demand forecast provided by our industrial partner ProCom GmbH. This forecast is generated in a twostage fashion, with an autoregressive component and a neural network one, relying on temperature and calendar events as main influence factors.
The finite uncertainty sets D are constructed as follows. From the original forecast d for the current day, we single out the (up to 70) days from the set of historical time series (the training set) in which demand and forecast are closest in 1-norm. After computing the forecast error between the two, we create a realization where such error is added to the forecast of the current day (for which the problem is being solved).
For the -robustness approach, the hours of the training set days are partitioned into three groups: morning, midday, and evening. Similarly, the demand forecasts are partitioned in the groups low, medium, and high. This way, we can associate each pair (t, d t ) with a category in {morning, midday, evening} × {low, medium, high}. We consider all the hours in a given category over all the training days and look at their deviation (i.e., the forecast error) between the historical demand and the forecast d t . We then take the 95%-quantiles w.r.t. positive and negative deviations and let the deviationd t at time t be equal to their maximum.
In our experiments, we do not introduce constraints on the end-of-horizon value of u |T | . If required by the application though, it is possible to introduce a lower and/or upper bound on u |T | , solve the problem thus obtained and, in case of infeasibility, adjust those values, iterating until two values yielding a feasible solution are found. 4 See Calafiore (2008) for more details on a similar procedure.
The experiments are run on an Intel i7-3770 3.40 GHz machine with 32 GB RAM using CPLEX 12.6 and AMPL as modeling language. All the instances are solved to optimality within default precision.

Realized robustness
While we optimize over uncertainty sets built with a combination of training set and forecast for the current day, we evaluate the quality of the results on the testing set (i.e., out of sample) in terms of realized robustness, simulating a real-world situation in which a solution found via robust optimization is implemented with a realized demandd not necessarily belonging to the uncertainty set D. For a givend, we first transfer any violation occurring in the lower and upper bound constraints imposed on u and q to the balance constraints. This is achieved by redefining q(d) and u(d), for all t ∈ T , as follows: where u 0 := u 0 . This way, the lower and upper bound constraints on storage and production are always satisfied. We then define the violation of the balance constraints for the newly introduced q (d) and u (d) as: For the case of a first-stage q, we simply let: for all t ∈ T , while adopting the same definition as in Eq. (11b) for u (d).
We remark that, if v (d) ≤ 0 andd + v (d) ≥ 0, the solution with q (d) and u (d) corresponds to what a practitioner would arguably do when an infeasibility is detected: production and storage are adjusted so to stay within their prescribed bounds, while the infeasibility is transferred, if possible, to the customer, by satisfying only the reduced demandd + v (d) (remember that v (d) ≤ 0 is assumed to hold here), as opposed to the originald.
For each instance of the problem (i.e., for each realizationd), we evaluate the quality of a robust solution in terms of how it performs when implemented with the given realizationd in terms of (total) realized violation: and realized cost: Note that the realized cost can, in general, be quite different from the objective function value achieved by an optimal robust solution. Indeed, the latter accounts for either the worst-case or the expected cost over D (depending on whether a worst-case or a stochastic objective function is employed), as opposed to the former, which amounts to the cost that realizes for a givend. The adoption of affine rules, although advantageous in terms of the extra flexibility it offers, introduces an element of so-called nervousness to the solutions as, for different realizations ofd, the realized production values q t (d) are likely to be quite different. As mentioned in Tempelmeier and Herpers (2011), a large nervousness can have a negative impact on other managerial decisions which are taken after the lot sizing problem has been solved. To quantify this phenomenon, we also report a cumulated measure of nervousness, defined as: whered is the nominal value of d (the forecast).

Computational study
When affine rules are in place, all the experiments with either |D| = 0 or = 0 (that is, those relying only on the forecast) are obtained by letting ξ jt = 0 for all j, t ∈ T .
When presenting our results, we report the sum over all the instances in our data set of the realized violation, realized nervousness, and realized cost, for each value of |D| or . The charts that we report show the violation-versus-cost and nervousness-versuscost curves, thus allowing for, visually, assessing the tradeoff between these quantities that is achieved with the different approaches. The curves are obtained by joining each pair of points obtained with two consecutive values of |D| or . Computing times are reported in Tables 1, 2, and 3.   Costs  2675  2689  2694  2705  2712  2714  2719  2721   Time  15  767  2842  6975  9971  13792  17358  21161  Table 3 Results for LS-2RO q,u -AR with uncertainty sets

First-stage production and second-stage storage variables (LS-2RO u )
The results for LS-2RO u (the approach with first-stage production variables q and second-stage storage variables u) are obtained by solving the problem as an instance of LS-DET, relying on Theorem 1. We let |D| = 0, 10, 20, 30, 50, 60, 70 for the finite uncertainty set case and = 0, 1, 2, 3, 4, 5, 6 for -robustness. We do not go beyond = 6 as, for larger values, the problem becomes infeasible for at least a few instances. The results are reported in Table 1, with an illustration in terms of realized violations and costs provided in Fig. 1. Nervousness is not reported as, due to q being first-stage, it is equal to zero for all instances.
Overall, we observe that the adoption of -robustness provides slightly better results, with a curve, as seen in Fig. 1, which dominates that for finite scenarios, providing smaller realized violations and costs, especially for small values of |D| and . Interestingly, as these two parameters increase, the two approaches become almost equivalent.
Although, as we will see in the remainder of the section, this approach is limited in terms of quality of the solutions, it comes at the advantage of being very fast from a computational standpoint and, by definition, of not suffering from nervousness.

Second-stage storage and production variables with affine rules (LS-2RO q,u -AR)
We consider LS-2RO q,u -AR (the two stage approach with second-stage u and q). We illustrate the results with finite uncertainty sets and -robustness, considering both the cases of a worst-case objective function and a stochastic one, with either unrestricted coefficients ξ t j or coefficients restricted to ξ t j ∈ [−1, 1] for all j, t ∈ T .
The results for the case of finite uncertainty sets are reported in Table 2 and illustrated in Fig. 2.
Considering violations and costs, when adopting a worst-case objective, we register highly "unstable" solutions when no restrictions are imposed in ξ jt . The solutions that  Fig. 2 Violation and nervousness VS realized costs for LS-2RO q,u with finite scenarios and affine rules, with a worst-case or stochastic objective are obtained when imposing ξ t j ∈ [−1, 1] for all j, t ∈ T are much more stable and largely dominate the previous ones. The introduction of a stochastic objective yields very large improvements w.r.t. both realized cost and violation, while also introducing a good deal of stability even for small values of |D|. The restriction of the coefficients provides an improvement also with a stochastic objective, but not as substantial as with a worst-case one.
Notice that, with |D| = 10, 20, the results are substantially worse than those obtained for larger values of |D|. This is a consequence of overfitting. Indeed, if |D| |T |, there are infinitely many (with |D| − |T | degrees of freedom) affine functions satisfying Eq. (10) and, as such, we are likely to obtain solutions with a large fitting error on demand vectorsd / ∈ D, such as all those in the testing set. We remark that, although |D| = 30 > 24 = |T |, and, thus, the fitting subproblem (assuming the vectors d ∈ D are in general position) is well-posed, |D| = 30 seems to be still insufficient to suitably capture the structure of the "hidden" uncertainty set D as, for |D| ≥ 40, much better results are obtained. Overall, as expected, the quality of the results on the testing set increases with the size of the finite uncertainty set D.
As to nervousness, we observe larger values when unrestricted coefficients are used, as opposed to the restricted case, and when the objective function is minimized in the worst-case, as opposed to in expectation. Not surprisingly, higher values of nervousness are observed in presence of overfitting (|D| < |T |), which then start declining for |D| ≥ 30. Overall, the values of nervousness tend to decrease for larger values of |D| in all the configurations, although not completely monotonically. In all approaches but the worst-case unrestricted one, the nervousness levels seem to converge for larger |D|. Although it is unclear whether the nervousness values are converging for the worst-case unrestricted approach, they are clearly getting smaller for larger values of |D|.
Analogous experiments for the case of -uncertainty sets and -robustness are reported in Table 3 and illustrated in Fig. 3  Considering violations and costs, the results seem quite stable, even with unrestricted coefficients. This is possibly due to the fact that uncertainty sets are continuous and full-dimensional and, as a consequence, the set of points on top of which the affine rules are constructed always contains (except for = 0) sufficiently many data points in general position and, thus, overfitting cannot happen. What is more, we see that, in this case, restricting the coefficients hinders the quality of the solutions. As expected, we also observe that the results with a stochastic objective function dominate those with a worst-case one.
As to nervousness, we observe that, although its value increases substantially with small values of , it seems to converge to a constant for a larger . This value is comparable to that obtained with the three more "stable" approaches with finite uncertainty sets (worst-case restricted, stochastic unrestricted, and stochastic restricted).

A note on computing times
We remark that, for each setting, the hybrid stochastic-robust approach requires, for bigger uncertainty sets (larger |D| or larger ), much shorter computing times as opposed to the approach with a worst-case objective. For -robustness and with the largest values of , the difference in computing time can amount to a factor as large as eight.
We also observe that the -robustness approach is much faster than the finite uncertainty sets one. This is, most likely, due to the fact that, with a finite D, we have to replicate each constraint containing d exactly |D| times whereas, with -robustness, we can employ the compact reformulation reported in "Appendix B".
As it is clear when comparing Tables 1, 2, and 3, the method with first-stage production is faster than the other ones employing affine rules by almost two orders of magnitude. Fig. 4 Hybrid stochastic-robust approaches with unrestricted coefficients with either finite uncertainty sets or -robustness, also compared to the approach with first-stage q and second-stage u with -robustness 1

Comparison of the best methods
We conclude with a final comparison of the best method, the one with second-stage production and affine rules employing a stochastic objective with unrestricted coefficients in the two cases of finite uncertainty sets and -robustness. For the sake of comparisons, we also consider LS-2RO u (that with first-stage production) with -robustness (which performs better than the one with finite uncertainty sets). The results are reported in Fig. 4. Here, LS-2RO u with -robustness is clearly dominated by the other two. While, with finite uncertainty sets, the affine rules fare rather well in terms of costs, the approach with -robustness yields solutions with substantially smaller violations, differently from the method with a finite uncertainty set, whose solutions converge, experimentally, to a total violation of 400, without further improvements for a larger |D|. Assuming the larger computational effort that is required to construct solutions with this method can be undertaken, it should be preferred due to offering higher quality solutions.

Concluding remarks
We have considered a variant of the lot sizing problem with storage losses subject to demand uncertainty. We have developed two two-stage robust optimization approaches, both with first-stage set-up variables, one with second-stage storage variables but first-stage production variables, and another one in which production and storage variables are both second-stage. For greater tractability, we have introduced affine rules to tackle the second problem, also encompassing a regularization aspect. We have then proposed a hybrid stochastic-robust approach with a stochastic objective function and robustness in all its constraints, arguably less conservative than the purely robust method with a worst-case objective.
Computational experiments on a heat production problem have shown that the twostage approach with second-stage production and storage variables outperforms that with first-stage production variables. The solutions that we have obtained in this set-ting with -robustness are better in terms of cost and violations than those obtained with a finite uncertainty set, and are also much faster to be computed from a computational perspective. We have found that the hybrid stochastic-robust approach yields the overall best results, resulting in smaller costs and smaller violations when compared to the one with a worst-case objective, and also being, with -robustness, much faster than the former.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendix A: Derivations in the proof of Theorem 1
For t > 1, defineď ∈ D such that the following holds: As a consequence, we deduce: With this, we can then show: Similarly, we deduce: Lastly, we compute η: