In this section we describe a new MINLP CG method. The method, called minlpcg, solves MINLP sub-problems to generate partial feasible solutions \(x_k\in X_k\) and \(w_k\in W_k\), which are finally combined into a global solution that also fulfills the global constraints. The search directions for finding better partial solutions are provided by master problems in the resource space. The basic steps of minlpcg are:
1. Initialize column sets \(R_k, k\in K\).
2. Solve (21) to compute an approximate solution \({\tilde{w}}\) of (19).
3. Project \({\tilde{w}}\) onto the feasible set to find a solution candidate \(w^*\).
4. Add new columns to \(R_k, k\in K\).
5. Repeat from step 2 until a stopping criterion is met.
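The following sketch summarizes these steps in Python; all procedure names and signatures (init_columns, solve_inner_lp, project, price) are placeholders of ours for illustration only, not the actual implementation of minlpcg.

```python
def minlpcg(block_ids, init_columns, solve_inner_lp, project, price,
            max_iter=50, tol=1e-6):
    """block_ids: block indices k in K.
    init_columns(k)   -> initial columns of R_k                      (step 1)
    solve_inner_lp(R) -> (w_tilde, duals) of the LP-IA master        (step 2)
    project(w_tilde)  -> (candidate, objective) or None              (step 3)
    price(k, duals)   -> (new column A_k y_k, reduced cost delta_k)  (step 4)"""
    R = {k: list(init_columns(k)) for k in block_ids}
    best = None
    for _ in range(max_iter):
        w_tilde, duals = solve_inner_lp(R)
        cand = project(w_tilde)
        if cand is not None and (best is None or cand[1] < best[1]):
            best = cand
        improved = False
        for k in block_ids:
            col, delta_k = price(k, duals)
            if delta_k > tol:          # a positive reduced cost may improve the master
                R[k].append(col)
                improved = True
        if not improved:               # step 5: stopping criterion
            break
    return best
```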
Since the sub-models of (10) represent difficult MINLP problems, it is not easy to quickly generate many columns. Therefore, we developed several strategies for accelerating the column generation (CG).
Traditional generation of columns
Dealing with linear block (sub-problem), adapting inner approximation
In the DESS models the dimension \(n_1\) of the linear block (14) is much higher than the dimensions \(n_k\) of the other blocks. Preliminary numerical experiments showed that, if we treat the linear block separately instead of running traditional CG over all blocks, the running time is reduced by an order of magnitude, from about 2 days to 1-2 hours. Mathematically this can be explained by the large number of vertices of the polytope of the linear block, which would have to be generated as columns if the block were treated like a nonlinear block. Therefore, we use the following modified LP-IA master problem (25), in which the linear constraints of \(X_1\) in (14) are directly integrated into the LP-IA:
$$\begin{aligned} \begin{aligned} \min&\,F(w(z,x_1)) \\ {{\,\mathrm{\quad s.t. \,\, }\,}}&\sum _{k\in K} w_{ki}(z_k,x_1) \le b_i \ , \, i\in M_1,\\&\sum _{k\in K} w_{ki}(z_k,x_1) = b_i \ , \, i\in M_2,\\&z_k\in \varDelta _{|R_k|} ,\,\,k\in K\setminus \{1\},\quad x_1\in X_1, \end{aligned} \end{aligned}$$
(25)
where \(w_k(z_k,x_1)\) is defined similarly to (23) by
$$\begin{aligned} w_k(z_k,x_1):=\left\{ \begin{array}{cl} w_k(z_k) &{} : k\in K\setminus \{1\}\\ A_1x_1 &{} : k=1. \end{array} \right. \end{aligned}$$
(26)
Thus, \(K\setminus \{1\}\) denotes the set of nonlinear blocks, whereas \(x_1\in {\mathbb {R}}^{n_1}\) collects the variables of the linear block, and \(X_1\) is defined by local linear constraints only, as in (14).
Lemma 1
Let \(R_1=\text {vert}(W_1)\) be the set of extreme points of \(W_1\). Then problems (25) and (21) are equivalent.
Proof
By definition (14), \(X_1\) defines a polytope. Hence, \(W_1\) is a polytope defined by a linear transformation of \(X_1\), and \(W_1={{\,\mathrm{conv}\,}}(R_1)\). Therefore, \(A_1x_1\) in (26) can be replaced by \(w_1(z_1)\), which proves the statement. \(\square \)
The procedure solveinnerlp(R) solves (25). It returns a primal solution \((x_k)_{k\in K}\), where \(x_1\) is the solution of the linear block and \(x_k=x_k(z_k) \) with
$$\begin{aligned} x_k(z_k):=\sum _{j\in [S_k]} z_{kj} y_{kj}, \quad z_k \in \varDelta _{|S_k|}, \quad k\in K\setminus \{1\}, \end{aligned}$$
(27)
and \(S_k\) is the set of generated feasible points \(y_{kj}\in X_k\) corresponding to the columns in \(R_k\), i.e. \(r_{kj}=A_ky_{kj}\). Moreover, the procedure returns the dual values \(\mu \) of the global resource constraints.
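For illustration, the recovery of a block solution from the master weights in (27) is a plain convex combination; a minimal numpy sketch (variable names are ours):

```python
import numpy as np

def block_solution(z_k, Y_k):
    """Recover x_k(z_k) = sum_j z_kj * y_kj from the master weights z_k
    (shape (S_k,), nonnegative, summing to one) and the stored feasible
    points Y_k (shape (S_k, n_k)), cf. (27)."""
    z_k = np.asarray(z_k, dtype=float)
    Y_k = np.asarray(Y_k, dtype=float)
    assert np.all(z_k >= -1e-9) and abs(z_k.sum() - 1.0) < 1e-6  # z_k in the simplex
    return Y_k.T @ z_k

# Example: two stored points in R^3 combined with weights (0.25, 0.75)
x_k = block_solution([0.25, 0.75], [[0.0, 1.0, 2.0], [4.0, 1.0, 0.0]])
```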
Generation of columns by solving sub-problems
The MINLP CG-algorithm generates columns by solving the following MINLP sub-problems
$$\begin{aligned} y_k \in {{\,\mathrm{argmin}\,}}\,\, d^TA_kx_k {{\,\mathrm{\quad s.t. \,\, }\,}}x_k\in X_k \end{aligned}$$
(28)
regarding a search direction \(d\in {\mathbb {R}}^{m+1}\), where d is typically defined by a dual solution \(\mu \in {\mathbb {R}}^m\) of the LP-IA (21), i.e. \(d=(1,\mu ^T)\). Notice that the result \(y_k\) corresponds to an extreme point of \(X_k\) as well as \(W_k\) and is a so-called supported Pareto point in the resource space, see Muts et al. (2020b).
The procedure solveMinlpSubProbl(d) solves (28) and is used in procedure addCol(\(d,R_k\)), described in Algorithm 1, to add a column \(A_ky_k\) to \(R_k\). Moreover, the procedure computes the reduced cost
$$\begin{aligned} \delta _k:=\max \{0,\min \{ d^Tr_k - d^TA_ky_k : r_k \in R_k \}\}, \end{aligned}$$
(29)
which is used later to measure the impact of the procedure. If \(\delta _k > 0\) for some \(k\in K\), then column \(A_ky_k\) may improve the objective value of (25). Otherwise, if \(\delta _k=0\) for all \(k\in K\), the objective value of (25) cannot be improved (Nowak 2005) and the column generation algorithm can be stopped.
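A direct transcription of (29), assuming columns are stored as dense vectors, could look as follows (our sketch, not the paper's code):

```python
import numpy as np

def reduced_cost(d, R_k, A_k, y_k):
    """delta_k = max(0, min_{r_k in R_k} d^T r_k - d^T A_k y_k), cf. (29).
    R_k: iterable of existing columns r_k (vectors in R^{m+1}),
    A_k y_k: candidate column obtained from sub-problem (28)."""
    d = np.asarray(d, dtype=float)
    new_col_value = d @ (np.asarray(A_k, dtype=float) @ np.asarray(y_k, dtype=float))
    best_existing = min(d @ np.asarray(r_k, dtype=float) for r_k in R_k)
    return max(0.0, best_existing - new_col_value)
```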
Algorithm 2 shows a method for generating columns by alternately solving an LP-IA master problem and generating columns using Algorithm 1. The set \({\hat{K}} \subseteq K\) is a subset of the block set K. Note that the columns can be generated in parallel.
Initializing columns
In order to initialize the column sets \(R_k\), we apply a subgradient method (Algorithm 3) for maximizing the Lagrangian dual function of (10) with respect to the global constraints:
$$\begin{aligned} L(\mu ) := \sum _{k \in K}\left( \min _{y_k\in X_k}(1, \mu ^T) A_k y_k\right) - \mu ^Tb. \end{aligned}$$
(30)
We compute the step length \(\alpha ^{p}\) by comparing the values of the function \(L(\mu )\) defined in (30) at different iterations p of Algorithm 3, similarly to Shor (1985):
$$\begin{aligned} \alpha ^{p + 1} = {\left\{ \begin{array}{ll} 0.5 \alpha ^{p}&{}: L(\mu ^p) \le L(\mu ^{p - 1}), \\ 2 \alpha ^{p}&{}: \text {otherwise}. \end{array}\right. } \end{aligned}$$
(31)
Note that procedure addcol in Algorithm 3 can be performed in parallel.
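A minimal sketch of this initialization phase is given below, assuming a pricing oracle solve_sub(k, d) that returns the resource column \(A_ky_k\) of a minimizer of (28); the oracle name, its signature, and the handling of the dual sign restrictions are our own simplifications for illustration.

```python
import numpy as np

def init_columns_by_subgradient(block_ids, solve_sub, b, m, num_iter=20, alpha0=1.0):
    """Subgradient ascent on the Lagrangian dual L(mu) of (30) with the
    step-length rule (31).  b: right-hand side of the global constraints
    (length m).  Sign restrictions on mu for inequality constraints are
    ignored in this sketch."""
    b = np.asarray(b, dtype=float)
    mu = np.zeros(m)
    alpha = alpha0
    L_prev = -np.inf
    columns = {k: [] for k in block_ids}
    for _ in range(num_iter):
        d = np.concatenate(([1.0], mu))
        cols = {k: np.asarray(solve_sub(k, d), dtype=float) for k in block_ids}
        for k, c in cols.items():
            columns[k].append(c)                      # collect columns for R_k
        L_val = sum(d @ c for c in cols.values()) - mu @ b
        g = sum(c[1:] for c in cols.values()) - b     # subgradient of L at mu
        alpha = 0.5 * alpha if L_val <= L_prev else 2.0 * alpha   # rule (31)
        mu = mu + alpha * g
        L_prev = L_val
    return mu, columns
```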
Accelerating column generation
In order to accelerate CG without calling an MINLP routine, we developed two methods. The first method generates columns by performing NLP local search from good starting points provided by the LP-IA. The second method is a Frank-Wolfe algorithm for quickly solving the convex hull relaxation (19).
Fast generation of columns using NLP local search and rounding
The master problem used to provide starting points for the column generation is the LP-IA problem (21). Since in the beginning only a few columns are available, the LP-IA master problem (21) is often infeasible, i.e. \(H\cap \prod _{k\in K} {{\,\mathrm{conv}\,}}(R_k)=\emptyset \). Therefore, we use an LP-IA master problem (32) that includes slacks and a penalty term \(\varPsi (\theta ,s)\), similar to du Merle et al. (1999):
$$\begin{aligned} \begin{aligned} \min&\ F(w(z,x_1)) + \varPsi (\theta ,s)\\ {{\,\mathrm{\quad s.t. \,\, }\,}}&\sum _{k\in K} w_{ki}(z_k,x_1) \le b_i+ s_i \ , \, i\in M_1,\\&\sum _{k\in K} w_{ki}(z_k,x_1) = b_i+ s_{i1}-s_{i2} \ , \, i\in M_2,\\&z_k\in \varDelta _{|R_k|} ,\,\,k\in K\setminus \{1\},\quad x_1\in X_1, \\&s_i\ge 0,\quad i \in [m], \end{aligned} \end{aligned}$$
(32)
where the penalty term for slack variables is
$$\begin{aligned} \varPsi (\theta ,s):=\sum _{i\in M_1} \theta _i s_i + \sum _{i\in M_2} \theta _i (s_{i1}+s_{i2}), \end{aligned}$$
(33)
and penalty weights \(\theta > 0\) are sufficiently large. We define \(w_k(z_k,x_1)\) similarly to (23) as follows
$$\begin{aligned} w_k(z_k,x_1):=\left\{ \begin{array}{cl} w_k(z_k) &{} : k\in K\setminus \{1\},\\ A_1x_1 &{} : k=1, \end{array} \right. \end{aligned}$$
and \(x_1\in {\mathbb {R}}^{n_1}\) are the variables of the linear block defined in (14). The procedure solveslackinnerlp(R) solves (32) and returns a solution point x, a dual solution \(\mu \) and the slack values s. If the slack variables are nonzero, i.e. \(s\not =0\), the method slackdirections computes a new search direction \(d \in {\mathbb {R}}^m\) in order to eliminate them:
$$\begin{aligned} d := \sum \limits _{\begin{array}{c} s_{i}> 0.1 \max (s), \\ i \in M_1 \end{array}} e_i + \sum \limits _{\begin{array}{c} s_{i1}> 0.1 \max (s), \\ i \in M_2 \end{array}} e_i-\sum \limits _{\begin{array}{c} s_{i2}> 0.1 \max (s), \\ i \in M_2 \end{array}} e_i, \end{aligned}$$
where \(e_i \in {\mathbb {R}}^m\) denotes the i-th unit vector.
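This direction is cheap to evaluate; a direct numpy transcription of the formula above (index handling and argument layout are our own) is:

```python
import numpy as np

def slack_directions(s_ineq, s_eq_plus, s_eq_minus, M1, M2, m):
    """Search direction d in R^m from the slack values of (32): one unit
    entry per slack exceeding 10% of the largest slack.
    M1, M2: 0-based indices of the inequality / equality global constraints;
    s_ineq, s_eq_plus, s_eq_minus: corresponding slack values."""
    all_slacks = np.concatenate([s_ineq, s_eq_plus, s_eq_minus])
    threshold = 0.1 * all_slacks.max()
    d = np.zeros(m)
    for i, si in zip(M1, s_ineq):
        if si > threshold:
            d[i] += 1.0
    for i, sp, sm in zip(M2, s_eq_plus, s_eq_minus):
        if sp > threshold:
            d[i] += 1.0
        if sm > threshold:
            d[i] -= 1.0
    return d
```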
Since, for the CG algorithm, it is sufficient to compute high-quality locally feasible solutions, we present a local search procedure approxsolveminlpsubprobl in Algorithm 4 based on rounding a locally feasible point. The goal of this procedure is to avoid calling an MINLP solver for the sub-problems and thereby to reduce the time spent on sub-problem solving. The inputs of the local search procedure approxsolveminlpsubprobl are the block solution \(x_k\) as starting point and the direction d or \((1,\mu )\) as search direction. It starts by running procedure solvenlpsubproblem, which computes a local minimizer of the integer-relaxed sub-problem
$$\begin{aligned} \begin{aligned} {\tilde{y}}_k:= {{\,\mathrm{argmin}\,}}&\ d^TA_kx \\ {{\,\mathrm{\quad s.t. \,\, }\,}}&x \in G_k \end{aligned} \end{aligned}$$
(34)
starting from the primal solution \(x_k\) of the LP-IA. Then the procedure round rounds the integer variables of block k in \({\tilde{y}}_k\) to obtain \({\hat{x}}_k\). Finally, procedure solvefixednlpsubproblem solves an NLP problem again, fixing the rounded integer variables of \({\hat{x}}_k\):
$$\begin{aligned} {\tilde{x}}_k:={{\,\mathrm{argmin}\,}}\,\,&c_k^Tx_k {{\,\mathrm{\quad s.t. \,\, }\,}}x_k\in G_k, \quad x_{ki}={\hat{x}}_{ki},\quad i\in [n_{k2}], \end{aligned}$$
(35)
and using the continuous variable values of \({\hat{x}}_k\) as starting point. The complete column generation procedure is depicted in Algorithm 5. Note that procedure approxsolveminlpsubprobl in Algorithm 5 can be performed in parallel.
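The round-and-fix local search can be sketched as below; the two NLP oracles (solve_nlp for (34), solve_fixed_nlp for (35)) are placeholders whose names and signatures are our assumptions.

```python
import numpy as np

def approx_solve_minlp_subproblem(k, x_start, d, solve_nlp, solve_fixed_nlp, int_idx):
    """Sketch of Algorithm 4 (approxsolveminlpsubprobl) for block k.
    int_idx: indices of the integer variables of block k."""
    y_tilde = solve_nlp(k, d, x_start)           # (34): integer-relaxed NLP local search
    x_hat = np.array(y_tilde, dtype=float)
    x_hat[int_idx] = np.round(x_hat[int_idx])    # round the integer variables
    # (35): re-solve the NLP with the rounded integers fixed, warm-started at x_hat
    x_tilde = solve_fixed_nlp(k, x_hat, int_idx)
    return x_tilde
```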
CG by solving the convex hull relaxation using a Frank-Wolfe algorithm
In this section, we present a Frank-Wolfe algorithm which is an alternative way to generate columns. It is based on solving convex hull relaxation (19) by a quadratic penalty function approach:
$$\begin{aligned} Q(w,\sigma ):=F(w)+ \sum _{i\in [m]} \sigma _i \left( \sum _{k\in K} w_{ki} -b_i\right) ^2 \end{aligned}$$
(36)
where \(\sigma \in {\mathbb {R}}^m_+\) is a vector of penalty weights. Consider the convex optimization problem
$$\begin{aligned} \min Q(w,\sigma ) {{\,\mathrm{\quad s.t. \,\, }\,}}\,\,w_k\in {{\,\mathrm{conv}\,}}(W_k),\,\,k\in K. \end{aligned}$$
(37)
Let \(\mu ^*\) be an optimal dual solution of (19) regarding the global constraints \(w\in H\), and set the penalty weights \(\sigma _i=0\) if \(\mu _i^*=0\) and \(\sigma _i\ge |\mu _i^*|\) otherwise, for \(i\in [m]\). Then it can be shown that (36) is an exact penalty function and (37) is a reformulation of the convex hull relaxation (19), i.e. (19) is equivalent to (37).
Algorithm 6 presents a Frank-Wolfe (FW) algorithm for approximately solving the convex penalty problem (37). For acceleration, we use the Nesterov direction update rule (Nesterov 1983), see line 17. We set the penalty weights \(\sigma =|\mu |\), where \(\mu \) is a dual solution of the LP-IA (32). One step of the FW algorithm is performed by approximately solving the problem with a linearized objective
$$\begin{aligned} \min \nabla _w Q({\tilde{w}},\sigma )^Tw {{\,\mathrm{\quad s.t. \,\, }\,}}w_k \in {{\,\mathrm{conv}\,}}(W_k), \quad k\in K, \end{aligned}$$
(38)
which is equivalent to solving the sub-problems
$$\begin{aligned} \min \nabla _{w_k} Q({\tilde{w}},\sigma )^Tw_k {{\,\mathrm{\quad s.t. \,\, }\,}}w_k \in {{\,\mathrm{conv}\,}}(W_k). \end{aligned}$$
(39)
The sub-problem (39) is solved with approxsolveminlpsubprobl, depicted in Algorithm 4, in order to quickly compute new columns. The columns can be computed in parallel. Note that the gradient \(\nabla _{w_k} Q({\tilde{w}},\sigma )\) is defined by
$$\begin{aligned} \frac{\partial }{\partial w_{k0}}Q(w,\sigma )=1,\quad \frac{\partial }{\partial w_{ki}}Q(w,\sigma )= 2\sigma _i\left( \sum _{\ell \in K} w_{\ell i}-b_i\right) =:\eta _i (w,\sigma ). \end{aligned}$$
Hence, \(\nabla _{w_k}Q({\tilde{w}},\sigma )=(1,\eta ({\tilde{w}},\sigma )^T)\) for all \(k\in K\). The quadratic line search problem
$$\begin{aligned} \theta = {{\,\mathrm{argmin}\,}}_{t\in [0,1]} Q({\tilde{w}}+t(r-{\tilde{w}}),\sigma ) \end{aligned}$$
in step 14 of Algorithm 6 can easily be solved, since Q is quadratic along the segment.
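Assuming, as the gradient formula above suggests, that \(F(w)=\sum _{k\in K} w_{k0}\) is linear in the resource variables, the line search has a closed form; the following sketch (with our own array layout, not the paper's data structures) computes the optimal step length:

```python
import numpy as np

def fw_step_length(w_tilde, r, sigma, b):
    """Exact line search for the quadratic penalty (36) along the Frank-Wolfe
    segment w_tilde + t (r - w_tilde), t in [0, 1].
    w_tilde, r: arrays of shape (|K|, m+1) with column 0 the objective resource;
    sigma, b: arrays of length m."""
    w_tilde, r = np.asarray(w_tilde, float), np.asarray(r, float)
    sigma, b = np.asarray(sigma, float), np.asarray(b, float)
    delta = r - w_tilde
    # residuals of the global constraints and their change along the segment
    res = w_tilde[:, 1:].sum(axis=0) - b
    dres = delta[:, 1:].sum(axis=0)
    # Q(t) = const + lin * t + quad * t^2
    lin = delta[:, 0].sum() + 2.0 * np.dot(sigma * res, dres)
    quad = np.dot(sigma * dres, dres)
    if quad <= 1e-12:                 # Q is (numerically) linear in t
        return 0.0 if lin >= 0.0 else 1.0
    return float(np.clip(-lin / (2.0 * quad), 0.0, 1.0))
```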
A primal heuristic for finding solution candidates
In this section, we present two heuristic procedures for computing solution candidates. The first one computes a feasible solution starting from the solution of the slack LP-IA problem (32). The second one computes high-quality solution candidates.
Algorithm 7 presents the initial primal heuristic, which aims at eliminating slacks in the LP-IA master problem (32). It starts by running procedure nlpresourceproject, which performs an NLP local search on the following integer-relaxed resource-projection NLP master problem
$$\begin{aligned} \begin{aligned} \min&\sum _{k\in K} \Vert A_kx_k-A_k{\check{x}}_k\Vert ^2,\\ {{\,\mathrm{\quad s.t. \,\, }\,}}&x\in P,\quad x_k\in G_k,\quad k\in K,\\ \end{aligned} \end{aligned}$$
(40)
where \({\check{x}}\) is the solution of the LP-IA master problem (32).
Using the potentially fractional solution \({\tilde{y}}\) of (40), the algorithm computes an integer globally feasible solution \({\hat{y}}\) by calling the procedure mipproject(\({\tilde{y}}\)), which solves the MIP-projection master problem
$$\begin{aligned} \begin{aligned} {\hat{y}} = {{\,\mathrm{argmin}\,}}\,\,&\sum _{k\in K}\Vert x_k-{\tilde{y}}_k\Vert _\infty \\ {{\,\mathrm{\quad s.t. \,\, }\,}}&x\in P,\,\, \, x_k\in Y_k,\, k\in K.\\ \end{aligned} \end{aligned}$$
(41)
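For completeness, the \(\infty \)-norm objective in (41) can be linearized by a standard epigraph reformulation with one auxiliary variable per block, so that (41) is solved as a MIP; this restatement is ours and not given explicitly in the text:
$$\begin{aligned} \begin{aligned} {\hat{y}} = {{\,\mathrm{argmin}\,}}\,\,&\sum _{k\in K} t_k \\ {{\,\mathrm{\quad s.t. \,\, }\,}}&-t_k \le x_{ki}-{\tilde{y}}_{ki} \le t_k,\quad i\in [n_k],\\&x\in P,\,\,\, x_k\in Y_k,\, k\in K. \end{aligned} \end{aligned}$$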
The integer globally feasible solution \({\hat{y}}\) is then used as a starting point for an NLP local search in which the integer variables are fixed to \({\hat{y}}\), performed by procedure solvefixednlp(\({\hat{y}}\)):
$$\begin{aligned} \begin{aligned} x^*={{\,\mathrm{argmin}\,}}\,\,&c^Tx + \sum _{i\in M_1} \theta _i s_i + \sum _{i\in M_2} \theta _i (s_{i1}+s_{i2})\\ {{\,\mathrm{\quad s.t. \,\, }\,}}&\sum _{k\in K} (A_kx_k)_i \le b_i+ s_i \ , \, i\in M_1,\\&\sum _{k\in K} (A_kx_k)_i = b_i+ s_{i1}-s_{i2} \ , \, i\in M_2,\\&x_k\in G_k,\quad x_{ki}={\hat{y}}_{ki},\quad i\in [n_{k2}],\quad k\in K. \end{aligned} \end{aligned}$$
(42)
Algorithm 8 presents a primal heuristic for computing a high-quality solution candidate of the MINLP problem (10). The procedure is very similar to Algorithm 7, but it does not use the NLP resource-projection problem (40). Instead, the solution of the LP-IA (25) is used directly in the MIP-projection problem (41). There is no guarantee that the optimal solution of (41) provides the best primal bound. Moreover, it may be infeasible for the original problem (10). Therefore, we generate a pool \({\widehat{Y}}\) of feasible solutions of problem (41) provided by the MIP solver. The solution pool \({\widehat{Y}}\) provides good starting points for an NLP local search over the global space and increases the chance of improving the quality of the solution candidate.
Similarly to Algorithm 7, Algorithm 8 starts by computing an inner LP solution \({\check{x}}\) of problem (25) by calling procedure solveinnerlp. Point \({\check{x}}\) is used in the procedure solpoolmipproject(\({\check{x}},N\)) to generate a solution set (pool) \({\widehat{Y}}\) of (41) of size N, which also includes the optimal solution. As in Algorithm 7, these alternative solutions are used to perform an NLP local search over the global space, defined in (42), by fixing the integer-valued variables. In order to find better solution candidates, these steps are repeated iteratively with an updated point \({\check{x}}\). In each iteration, the point \({\check{x}}\) is shifted towards the point \(x^*\) corresponding to the best current primal bound of the original problem (10). This is a typical heuristic local search procedure, which aims to generate a different solution pool \({\widehat{Y}}\) in each iteration of the algorithm. Algorithm 8 terminates when the maximum number of iterations is reached or when the best primal bound of the current iteration does not improve the best primal bound of the previous iteration.
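A sketch of this heuristic is given below. The three oracles (solution-pool MIP projection (41), NLP local search with fixed integers (42), and the original objective) are placeholders, and the shift factor 0.5 is an illustrative choice of ours, not a value from the text.

```python
def find_solution(x_check, solve_pool_mip_project, solve_fixed_nlp, objective,
                  pool_size=5, max_iter=10, shift=0.5):
    """Sketch of the primal heuristic of Algorithm 8; x_check is a numpy array."""
    best_x, best_obj = None, float("inf")
    for _ in range(max_iter):
        improved = False
        pool = solve_pool_mip_project(x_check, pool_size)   # feasible points of (41)
        for y_hat in pool:
            x_star = solve_fixed_nlp(y_hat)                 # fix integers, NLP local search (42)
            obj = objective(x_star)
            if obj < best_obj - 1e-8:
                best_x, best_obj, improved = x_star, obj, True
        if not improved:
            break
        # shift the reference point towards the incumbent to diversify the next pool
        x_check = x_check + shift * (best_x - x_check)
    return best_x, best_obj
```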
Main algorithm
Algorithm 9 describes a MINLP-CG method for computing a solution candidate of (10). The algorithm starts with the initialization of the IA with the procedure iainit (Algorithm 3). Since problem (32) might have nonzero slack values, the algorithm tries to eliminate them by computing a first primal solution. This is done by alternately calling the procedures approxcolgen (Algorithm 5) and findsolutioninit (Algorithm 7). For quickly improving the convex relaxation (19), the algorithm calls the FW-based column generation procedure fwcolgen (Algorithm 6).
In the main loop, the algorithm alternately performs colgen (Algorithm 2) and the heuristic procedure findsolution (Algorithm 8) for computing solution candidates. The procedure colgen is performed for a subset of blocks \({\hat{K}} \subseteq K \setminus {\{1\}}\) in order to keep the number of solved MINLP sub-problems low. Moreover, focusing on a subset of blocks helps to avoid recomputing already existing columns. Blocks can be excluded for a while based on the value of the reduced cost \(\delta _k, k \in K\setminus {\{1\}}\), which is computed in line 14 as defined in (29). The reduced block set \({\hat{K}}\) contains the blocks with positive reduced cost, i.e. \(\delta _k > 0, k \in K\setminus {\{1\}}\), and is updated at each main iteration by solving the sub-problems for the full set K. Note that if the reduced cost vanishes for all blocks, i.e. \(\delta _k = 0\) for all \(k \in K\setminus {\{1\}}\), the column generation has converged and the algorithm terminates.
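The main loop can be summarized by the following sketch; colgen, find_solution and reduced_costs stand for Algorithm 2, Algorithm 8 and the pricing step of (29), and their names and signatures are our own placeholders.

```python
def minlpcg_main_loop(block_ids, colgen, find_solution, reduced_costs, max_iter=30):
    """colgen(K_hat)   : run CG (Algorithm 2) on the given block subset
    find_solution()    : primal heuristic (Algorithm 8), returns (point, objective) or None
    reduced_costs()    : {k: delta_k} from (29), pricing over the full block set"""
    K_hat = set(block_ids)
    best = None
    for _ in range(max_iter):
        colgen(K_hat)                      # generate columns for the reduced block set
        cand = find_solution()
        if cand is not None and (best is None or cand[1] < best[1]):
            best = cand
        delta = reduced_costs()
        K_hat = {k for k in block_ids if delta[k] > 0.0}
        if not K_hat:                      # delta_k = 0 for all blocks: CG converged
            break
    return best
```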
Convergence analysis
Column Generation and Frank-Wolfe algorithms are well-known approaches and their convergence has already been proven. In this section, we discuss the convergence of Column Generation in Algorithm 9 and the Frank-Wolfe method in Algorithm 6.
Convergence of Algorithm 9
The convergence proof of the column generation algorithm relies on its equivalence to the dual cutting-plane algorithm, see Lemma 4.10 in Nowak (2005). Note that the proof is not based on the computation of the reduced cost \(\delta _k, k \in K\), defined in (29). However, the reduced cost can be used for measuring the impact of the new columns and as a termination criterion, see Nowak (2005). For the convergence proof, we assume that all LP master problems (21) and MINLP sub-problems (28) are solved to optimality. Since Algorithm 2 is performed only for the subset of blocks \({\hat{K}}\), we ensure that the main Algorithm 9 converges by performing a standard CG step over all blocks in line 14. Note that the direct integration of the linear block (14) into the LP-IA master problem (25) is equivalent to performing the CG algorithm for that block until convergence, as shown in Lemma 1.
Proposition 1
Let \(x^p\) be the solution of LP-IA (21) at the p-th iteration of Algorithm 9 at line 12 and \(\nu ^*\) be the optimal value of convex hull relaxation (19). Then \(\lim \limits _{p \rightarrow \infty }c^Tx^p=\nu ^*\).
Proof
The proof is analogous to the proof of Proposition 4.11 in Nowak (2005). \(\square \)
Convergence of Algorithm 6
Algorithm 6 combines the original Frank-Wolfe algorithm (see Algorithm 1 in Jaggi (2013)) with the Nesterov update rule (Nesterov 1983). The approach proposed in Nesterov (1983) has a convergence rate of \(O(1/p^2)\), whereas the original Frank-Wolfe algorithm has a slower convergence rate of O(1/p), see Theorem 1 in Jaggi (2013). In order to prove the convergence of Algorithm 6, we assume that all sub-problems (39) are solved to global optimality. Next, we state that Algorithm 6 has a convergence rate of \(O(1/p^2)\).
Proposition 2
Let \(\nu ^p:=Q({\tilde{w}}^p,\sigma )\) be the value of the quadratic penalty function (36) at the p-th iteration of Algorithm 6 and \(\nu ^*\) be the optimal value of the convex hull relaxation (19). Assume that \(\sigma _i \ge |\mu ^*_i|, i \in [m]\), where \(\mu ^*\) is an optimal dual solution of (19). Then there exists a constant C such that for all \(p\ge 0\)
$$\begin{aligned} \nu ^p - \nu ^* \le \dfrac{C}{(p + 2)^2}. \end{aligned}$$
Proof
The proof is analogous to the proof of Theorem 1 in Mouatasim and Farhaoui (2019). \(\square \)